Background

By far the most pervasive mechanism to control the sequencing of tasks in time-constrained systems is priority scheduling. Priority scheduling means that whenever a task or thread is ready to execute, its priority relative to other threads determines the actual order of execution. Ideally, priorities in a time-constrained system should always reflect the time constraints for the tasks. A useful priority assignment heuristic is to assign higher priorities to tasks that are part of tighter time constraints. Thus, a task that needs to respond in a shorter time will get higher priority than a task that needs to respond in a longer time.

The most desirable state of a priority scheduler is that the system should always be executing the task with the highest priority. However, it is impossible for an operating system to invariably achieve this state. When circumstances within the system force a higher-priority task to wait for a lower-priority task, priority inversion occurs. This can lead to unbounded priority inversion (discussed in the next section), where a high-priority thread misses its deadline.

优先级反转（Priority Inversion）

Priority inversion is a bug that occurs when a high priority task is indirectly preempted by a low priority task. For example, the low priority task holds a mutex that the high priority task must wait for to continue executing.

This violates the priority model that high priority tasks can only be prevented from running by higher priority tasks and briefly by low priority tasks which will quickly complete their use of a resource shared by the high and low priority tasks.

Bounder Priority Inversion

In the simple case, the high priority task (Task H) would be blocked as long as the low priority task (Task L) held the lock. This is known as “bounded priority inversion,” as the length of time of the inversion is bounded by however long the low priority task is in the critical section (holding the lock).

As you can see in the diagram above, Task H is blocked so long as Task L holds the lock. The priority of the tasks have been indirectly “inverted” as now Task L is running before Task H.

Unbounder Priority Inversion

Unbounded priority inversion occurs when a medium priority task (Task M) interrupts Task L while it holds the lock. It’s called “unbounded” because Task M can now effectively block Task H for any amount of time, as Task M is preempting Task L (which still holds the lock).

Priority inversion nearly ended the Mars Pathfinder mission in 1997. After deploying the rover, the lander would randomly reset every few days due to an intermittent priority inversion bug that caused the watchdog timer to trigger a full system restart. Engineers eventually found the bug and sent an update patch to the lander. You can read about the mission and bug here.

There are a few ways to combat unbounded priority inversion. Two popular methods include priority ceiling protocol and priority inheritance.

Solution

Priority ceiling protocol

Priority ceiling protocol involves assigning a “priority ceiling level” to each resource or lock. Whenever a task works with a particular resource or takes a lock, the task’s priority level is automatically boosted to that of the priority ceiling associated with the lock or resource. The priority ceiling is determined by the maximum priority of any task that needs to use the resource or lock.

优先级天花板是当任务申请某资源时，把该任务的优先级提升到可访问这个资源（只是可能性，但不意味着一定会去访问）的所有任务中的最高优先级（即使并不是每一个”所有任务“中的每一个任何都一定能够访问到这个资源），这个优先级称为该资源的优先级天花板。这种方法简单易行，不必进行复杂的判断，不管任务是否阻塞了高优先级任务的运行，只要任务访问共享资源都会提升任务的优先级。

As the priority ceiling of the lock is 3, whenever Task L takes the lock, its priority is boosted to 3 so that it will run at the same priority as Task H. This prevents Task M (priority 2) from running until Tasks L and H are done with the lock.

Priority Inheritance

Another method, known as “priority inheritance,” involves boosting the priority of a task holding a lock to that of any other (higher priority) task that tries to take the lock.

优先级继承是当任务 L 申请共享资源S 时，如果S正在被任务 H 使用，通过比较任务 L 与自身的优先级，如发现任务 L 的优先级小于自身的优先级，则将任务 L 的优先级提升到 H 任务的优先级，任务 L 释放资源S 后，再恢复任务 L 到原原优先级。这种方法只在占有资源的低优先级任务阻塞了高优先级任务时，才动态的改变任务的优先级。

Task L takes the lock. Only when Task H attempts to take the lock is the priority of Task L boosted to that of Task H’s. Once again, Task M can no longer interrupt Task L until both tasks are finished in the critical section.

Note that in both priority ceiling protocol and priority inheritance, Task L’s priority is dropped back to its original level once it releases the lock. Also note that both systems only prevent unbounded priority inversion. Bounded priority inversion can still occur.

Priority Inheritance vs. Priority Ceiling Emulation

Both the priority inheritance protocol and the priority ceiling emulation protocol***** have strengths and weaknesses. When used correctly, each are useful components of a real-time designer’s toolbox.

Priority inheritance can be difficult in a complex environment; implementers not familiar with its nuances can easily make mistakes. However, there is ample evidence that it can also be used to great advantage. For example, in one jitter test (where jitter refers to the maximum deviation in the measured period of a periodic task), using an RTOS running on a heavily loaded 1.4GHz Pentium processor, the worst-case jitter dropped from 76,492 microseconds to 52 microseconds simply by enabling priority inheritance in the kernel. 1.4GHz is somewhat faster than is usually available in embedded systems, but even at this speed the worst-case jitter is still just over 70 milliseconds. Priority inheritance illustrates that faster hardware doesn’t make an application behave more predictably, but avoiding priority inversion does.

Priority ceiling emulation has certain desirable properties — it has good worst-case priority inversion control, it handles nested locks well, and can avoid deadlock in some cases. Priority inheritance can occasionally lead to poorer worst-case behavior when there are nested locks, and does not avoid deadlocks. However, most performance-critical applications and real-time operating systems minimize their use of nested locks, and there are other mechanisms to avoid deadlock when nesting cannot be avoided.

On the other hand, priority inheritance can be implemented so that there is no penalty when the locks are not contended, which covers the vast majority of time-constrained systems. This, in addition to the fact that many extra context switches are avoided, and medium-priority tasks are not preempted unnecessarily, leads to excellent average performance. In practical systems, including both hard and soft real-time systems, average performance is as important as worst-case response. In contrast, priority ceiling emulation will pay the cost of changing a thread’s priority twice regardless of whether there is contention for the lock or not, resulting in higher overhead and many unnecessary context switches and blocking in unrelated tasks.

Priority ceiling emulation is an attractive choice when the set of threads contending for the lock is known, when there may be nested locks, and when worst-case behavior is the only concern. On the other hand, priority inheritance is very effective when a lock is seldom part of a nested set, and when average performance is relevant in addition to worst-case performance.

Another important aspect to understand is that optimal use of priority ceiling emulation requires static analysis of the system to find the priority ceiling of each lock. While static analysis is highly desirable (and even necessary) for many applications with critical response requirements, it may be neither desirable nor cost-effective in many other applications in which only a few parts of the system are critical. Also, formal real-time analysis is primarily applicable to systems that are constructed according to a set of precise rules with limited dynamic operations. For systems that cannot be readily analyzed, priority inheritance is likely to be a more effective mechanism since it does not require static analysis.

Consequences

In some cases, priority inversion can occur without causing immediate harm—the delayed execution of the high priority task goes unnoticed, and eventually the low priority task releases the shared resource. However, there are also many situations in which priority inversion can cause serious problems. If the high priority task is left starved of the resources, it might lead to a system malfunction or the triggering of pre-defined corrective measures, such as a watchdog timer resetting the entire system. The trouble experienced by the Mars Pathfinder lander in 1997[1][2] is a classic example of problems caused by priority inversion in realtime systems.

Priority inversion can also reduce the perceived performance of the system. Low priority tasks usually have a low priority because it is not important for them to finish promptly (for example, they might be a batch job or another non-interactive activity). Similarly, a high priority task has a high priority because it is more likely to be subject to strict time constraints—it may be providing data to an interactive user, or acting subject to realtime response guarantees. Because priority inversion results in the execution of a lower priority task blocking the high priority task, it can lead to reduced system responsiveness, or even the violation of response time guarantees.

【Operating System】进程 - 优先级翻转（Priority Inversion）