When Linux began life on an Intel i386 processor, no one ever expected the success Linux would enjoy in server applications. That success has led to Linux being ported to many different architectures and used by developers for embedded systems from cellular handsets to telecommunications switches. Not long ago, if your application had real-time requirements, you might not have included Linux among the choices for your operating system. That has all changed with the developments in real-time Linux driven, in large part, by audio and multimedia applications.
In this chapter, we start with a brief look at the historical development of real-time Linux features. Then we look at the facilities available to the real-time programmer and how these facilities are used.
Ask five people what "real time" means, and, chances are, you will get five different answers. Some might even cite specific numbers. For the purposes of the discussion that follows, we examine some scenarios and then propose a definition. Many requirements might be said to be soft real time, while others are called hard real time.
Most agree that soft real time means that the operation has a deadline, but if the deadline is missed, the quality of the experience is diminished, though the consequences are not fatal. Your desktop workstation is a perfect example of soft real-time requirements. When you are editing a document, you expect to see the results of your keystrokes immediately on the screen. When playing your favorite .mp3 file, you expect to have high-quality audio without any clicks, pops, or gaps in the music.
In general terms, humans cannot see or hear delays below a few tens of milliseconds. Of course, the musicians in the crowd will tell you that music can be colored by delays smaller than that. If a deadline is missed by these so-called soft real-time events, the results may be undesirable, leading to a lower level of "quality" of the experience, but not catastrophic.
Hard real time is characterized by the results of a missed deadline. In a hard real-time system, if a deadline is missed, the results are often catastrophic. Of course, catastrophic is a relative term. If your embedded device is controlling the fuel flow to a jet aircraft engine, missing a deadline to respond to pilot input or a change in operational characteristics can lead to disastrous results.
Note that the duration of the deadline has no bearing on the real-time characteristic. Servicing the tick on an atomic clock is such an example. As long as the tick is processed within the 1-second window before the next tick, the data remains valid. Missing the processing on a tick might throw off our global positioning systems by feet or even miles!
With this in mind, we draw on a commonly used set of definitions for soft and hard real time. For soft real-time systems, the value of a computation or result is diminished if a deadline is missed. For hard real-time systems, if a single deadline is missed, the system is considered to have failed, and might have catastrophic consequences.
UNIX and Linux were both designed for fairness in their process scheduling. That is, the scheduler tries its best to allocate available resources across all processes that need the CPU and to guarantee that each process can make progress. This very design objective runs counter to the requirement for a real-time process. A real-time process must run as soon as possible after it is ready to run. Real time means having predictable and repeatable latency.
Real-time processes are often associated with a physical event, such as an interrupt arriving from a peripheral device. Figure 17-1 illustrates the latency components in a Linux system. Latency measurement begins upon receipt of the interrupt we want to process. This is indicated by time t0 in Figure 17-1. Sometime later, the interrupt is taken and control is passed to the Interrupt Service Routine (ISR). This is indicated by time t1. This interrupt latency is almost entirely dictated by the maximum interrupt off time,[115] the time spent in a thread of execution that has hardware interrupts disabled.
Figure 17-1. Latency components
It is considered good design practice to minimize the processing done in the actual interrupt service routine. Indeed, this execution context is limited in capability (for example, an ISR cannot call a blocking function, one that might sleep), so it is desirable to simply service the hardware device and leave the data processing to a Linux bottom half,[116] also called a softIRQ.
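To make this division of labor concrete, here is a minimal sketch of a driver's interrupt path. The device-specific names (my_isr, my_bh_func, ack_device_interrupt) are hypothetical; the tasklet shown is one common bottom-half mechanism, and the three-argument ISR signature matches 2.6 kernels of this vintage:

/* Bottom half: performs the data processing outside interrupt context */
static void my_bh_func(unsigned long data);
DECLARE_TASKLET(my_tasklet, my_bh_func, 0);

/* Top half: quiet the hardware, defer the real work, return quickly */
static irqreturn_t my_isr(int irq, void *dev_id, struct pt_regs *regs)
{
        ack_device_interrupt();         /* hypothetical device access */
        tasklet_schedule(&my_tasklet);  /* my_bh_func() runs later, in softirq context */
        return IRQ_HANDLED;
}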
When the ISR/bottom half has finished its processing, the usual case is to wake up a user space process that is waiting for the data. This is indicated by time t2 in Figure 17-1. At some point in time later, the real-time process is selected by the scheduler to run and is given the CPU. This is indicated by time t3 in Figure 17-1. Scheduling latency is affected primarily by the number of processes waiting for the CPU and the priorities among them. Setting the Real Time attribute on a process gives it a higher priority than normal Linux processes and allows it to be the next process selected to run, assuming that it is the highest-priority real-time process waiting for the CPU. The highest-priority real-time process that is ready to run (not blocked on I/O) will always run. You'll see how to set this attribute shortly.
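In terms of Figure 17-1, the total latency seen by the real-time process is simply the sum of these components:

latency = (t1 - t0)   /* interrupt latency */
        + (t2 - t1)   /* ISR and bottom-half processing */
        + (t3 - t2)   /* scheduling latency */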
In the early days of Linux 1.x, there was no kernel preemption. This meant that when a user space process requested kernel services, no other task could be scheduled to run until that process either blocked (went to sleep) waiting on something (usually I/O) or its kernel request completed. Making the kernel preemptable[117] means that while one process is running in the kernel, another process can preempt the first and be allowed to run even though the first process has not completed its in-kernel processing. Figure 17-2 illustrates this.
Figure 17-2. Kernel preemption
In this figure, Process A has entered the kernel via a system call. Perhaps it was a call to write() to a device such as the console or a file. While executing in the kernel on behalf of Process A, Process B with higher priority is woken up by an interrupt. The kernel preempts Process A and assigns the CPU to Process B, even though Process A had neither blocked nor completed its kernel processing.
The challenge in making the kernel fully preemptable is to identify all the places in the kernel that must be protected from preemption. These are the critical sections within the kernel where preemption cannot be allowed to occur. For example, assume that Process A in Figure 17-2 is executing in the kernel performing a file system operation. At some point, the code might need to write to an in-kernel data structure representing a file on the file system. To protect that data structure from corruption, the process must lock out all other processes from accessing the shared data structure. Listing 17-1 illustrates this concept using C syntax.
Listing 17-1. Locking Critical Sections
...
preempt_disable();
...
/* Critical section */
update_shared_data();
...
preempt_enable();
...
If we did not protect shared data in this fashion, the process updating the shared data structure could be preempted in the middle of the update. If another process attempted to update the same shared data, corruption of the data would be virtually certain. The classic example is when two processes are operating directly on common variables and making decisions on their values. Figure 17-3 illustrates such a case.
Figure 17-3. Shared data concurrency error
In Figure 17-3, Process A is interrupted after updating the shared data but before it makes a decision based on it. By design, Process A cannot detect that it has been preempted. Process B changes the value of the shared data before Process A gets to run again. As you can see, Process A will be making a decision based on a value determined by Process B. If this is not the behavior you seek, you must disable preemption in Process A around the shared data; in this case, the operation and decision on the variable count.
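As a minimal sketch of that fix, Process A's code path could close the preemption window around both the update and the decision. The shared counter count comes from Figure 17-3; THRESHOLD and take_action() are hypothetical stand-ins for the decision and its consequence:

static int count;               /* data shared between Process A and Process B */

void process_a_update(void)
{
        preempt_disable();      /* no preemption from here...              */
        count++;                /* update the shared data                  */
        if (count > THRESHOLD)  /* ...so the decision sees our own update  */
                take_action();  /* hypothetical response                   */
        preempt_enable();       /* ...to here                              */
}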
The first solution to kernel preemption was to place checks at strategic locations within the kernel code where it was known to be safe to preempt the current thread of execution. These locations included entry and exit to system calls, release of certain kernel locks, and return from interrupt processing. At each of these points, code similar to Listing 17-2 was used to perform preemption.
Listing 17-2. Check for Preemption a la Linux 2.4 + Preempt Patch
...
/*
* This code is executed at strategic locations within
* the Linux kernel where it is known to be safe to
* preempt the current thread of execution
*/
if (kernel_is_preemptable() && current->need_resched)
        preempt_schedule();
...
/*
* This code is in .../kernel/sched.c and is invoked from
* those strategic locations as above
*/
#ifdef CONFIG_PREEMPT
asmlinkage void preempt_schedule(void)
{
        while (current->need_resched) {
                ctx_sw_off();
                current->state |= TASK_PREEMPTED;
                schedule();
                current->state &= ~TASK_PREEMPTED;
                ctx_sw_on_no_preempt();
        }
}
#endif
...
The first snippet of code in Listing 17-2 (simplified from the actual code) is invoked at those strategic locations described earlier, where it is known that the kernel is safe to preempt. The second snippet of code in Listing 17-2 is the actual code from an early Linux 2.4 kernel with the preempt patch applied. This interesting while loop causes a context switch via the call to schedule() until all requests for preemption have been satisfied.
Although this approach led to reduced latencies in the Linux system, it was not ideal. The developers working on low-latency improvements soon realized the need to "flip the logic." With the earlier preemption models, we had this:
• The Linux kernel was fundamentally nonpreemptable.
• Preemption checks were sprinkled around the kernel at strategic locations known to be safe for preemption.
• Preemption was enabled only at these known-safe points.
To achieve a further significant reduction in latency, we want this in a preemptable kernel:
• The Linux kernel is fully preemptable everywhere.
• Preemption is disabled only around critical sections.
This is where the kernel developers have been heading since the original preemptable kernel patch series. However, this is no easy task. It involves poring over the entire kernel source code base, analyzing exactly what data must be protected from concurrency, and disabling preemption at only those locations. The method used for this has been to instrument the kernel for latency measurements, find the longest latency code paths, and fix them. The more recent Linux 2.6 kernels can be configured for very low-latency applications because of the effort that has gone into this "lock-breaking" methodology.
It is interesting to note that much of the work involved in creating an efficient multiprocessor architecture also benefits real time. The SMP challenge is more complex than the uniprocessor challenge because there is an additional element of concurrency to protect against. In the uniprocessor model, only a single task can be executing in the kernel at a time. Protection from concurrency involves only protection from interrupt or exception processing. In the SMP model, multiple threads of execution in the kernel are possible in addition to the threat from interrupt and exception processing.
SMP has been supported since early Linux 2.x kernels. A Big Kernel Lock (BKL) was used to protect against concurrency in the transition from uniprocessor to SMP operation. The BKL is a global spinlock that prevents any other tasks from executing in the kernel. In his excellent book Linux Kernel Development (Novell Press, 2005), Robert Love characterized the BKL as the "redheaded stepchild of the kernel." In describing the characteristics of the BKL, Robert jokingly added "evil" to its list of attributes!
Early implementations of the SMP kernel based on the BKL led to significant inefficiencies in scheduling. It was found that one of the CPUs could be kept idle for long periods of time. Much of the work that led to an efficient SMP kernel also directly benefited real-time applications, primarily through lowered latency. Replacing the BKL with smaller-grained locking surrounding only the actual shared data to be protected led to significantly reduced preemption latency.
A real-time system must be capable of servicing its real-time tasks within a specified upper boundary of time. Achieving consistently low preemption latency is critical to a real-time system. The two single largest contributors to preemption latency are interrupt-context processing and critical section processing where interrupts are disabled. You have already learned that a great deal of effort has been targeted at reducing the size (and thus, duration) of the critical sections. This leaves interrupt-context processing as the next challenge. This was answered with the Linux 2.6 real-time patch.
Support for hard real time is not in the mainline kernel.org source tree. To enable hard real time, a patch must be applied. The real-time kernel patch is the cumulative result of several initiatives to reduce Linux kernel latency. The patch had many contributors, and it is currently maintained by Ingo Molnar; you can find it at http://people.redhat.com/~mingo/realtime-preempt. The soft real-time performance of the 2.6 Linux kernel has improved significantly since the early 2.6 kernel releases. When 2.6 was first released, the 2.4 Linux kernel was substantially better in soft real-time performance. Since about Linux 2.6.12, soft real-time performance in the single-digit milliseconds on a reasonably fast x86 processor is readily achieved. To get repeatable performance beyond this requires the real-time patch.
The real-time patch adds several important features to the Linux kernel. Figure 17-4 displays the configuration options for Preemption mode when the real-time patch has been applied.
Figure 17-4. Preemption modes with real-time patch
The real-time patch adds a fourth preemption mode called PREEMPT_RT, or Preempt Real Time. The four preemption modes are as follows:
• PREEMPT_NONE : No forced preemption. Overall latency is, on average, good, but there can be some occasional long delays. Best suited for applications for which overall throughput is the top design criterion.
• PREEMPT_VOLUNTARY : First stage of latency reduction. Additional explicit preemption points are placed at strategic locations in the kernel to reduce latency. Some loss of overall throughput is traded for lower latency.
• PREEMPT_DESKTOP : This mode enables preemption everywhere in the kernel except when processing within critical sections. This mode is useful for soft real-time applications such as audio and multimedia. Overall throughput is traded for further reductions in latency.
• PREEMPT_RT : Features from the real-time patch are added, including replacing spinlocks with preemptable mutexes. This enables involuntary preemption everywhere within the kernel except for those areas protected by preempt_disable(). This mode significantly smoothes out the variation in latency (jitter) and allows a low and predictable latency for time-critical real-time applications.
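For illustration, a .config fragment for a kernel built with the fourth mode selected might look like this; the option names shown are as they appeared in real-time patched kernels of this era, so treat the exact symbols as an assumption for your kernel version:

# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT_DESKTOP is not set
CONFIG_PREEMPT_RT=y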
If kernel preemption is enabled in your kernel configuration, it can be disabled at boot time by adding the following kernel parameter to the kernel command line:
preempt=0
Several new Linux kernel features are enabled with CONFIG_PREEMPT_RT. From Figure 17-4, we see several new configuration settings. These and other features of the real-time Linux kernel patch are described here.
The real-time patch converts most spinlocks in the system to mutexes. This reduces overall latency at the cost of slightly reduced throughput. The benefit of converting spinlocks to mutexes is that code holding them can be preempted. If Process A is holding a lock and a higher-priority Process B needs the CPU, Process B can preempt Process A in the case where Process A is holding a mutex, something that is impossible while a spinlock is held.
With CONFIG_PREEMPT_HARDIRQ selected, interrupt service routines[118] (ISRs) are forced to run in process context. This gives the developer control over the priority of ISRs because they become schedulable entities. As such, they also become preemptable to allow higher-priority hardware interrupts to be handled first.
This is a powerful feature. Some hardware architectures do not enforce interrupt priorities. Those that do might not enforce the priorities consistent with your specified real-time design goals. Using CONFIG_PREEMPT_HARDIRQ, you are free to define the priorities at which each IRQ will run.
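For example, once ISRs run as threads, standard tools can inspect and adjust them. The thread names, PIDs, and priorities below are illustrative (they vary by kernel and hardware); chrt is part of util-linux:

# ps -eo pid,rtprio,comm | grep IRQ
  275     50 IRQ-14
  276     50 IRQ-15

To raise the IRQ 14 handler thread to SCHED_FIFO priority 60:

# chrt -f -p 60 275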
Conversion of ISRs to threads can be disabled at runtime through the /proc file system or at boot time by entering a parameter on the kernel command line. When enabled in the configuration, unless you specify otherwise, ISR threading is enabled by default.
To disable ISR threading at runtime, issue the following command as root:
# echo '0' >/proc/sys/kernel/hardirq_preemption
To verify the setting, display it as follows:
# cat /proc/sys/kernel/hardirq_preemption
1
To disable ISR threading at boot time, add the following parameter to the kernel command line:
hardirq-preempt=0
CONFIG_PREEMPT_SOFTIRQ reduces latency by running softirqs within the context of the kernel's softirq daemon (ksoftirqd). ksoftirqd is a proper Linux task (process). As such, it can be prioritized and scheduled along with other tasks. If your kernel is configured for real time, and CONFIG_PREEMPT_SOFTIRQ is enabled, the ksoftirqd kernel task is elevated to real-time priority to handle the softirq processing.[119] Listing 17-3 shows the code responsible for this from a recent Linux kernel, found in .../kernel/softirq.c.
Listing 17-3. Promoting ksoftirq to Real-Time Status
static int ksoftirqd(void * __bind_cpu)
{
        struct sched_param param = { .sched_priority = 24 };

        printk("ksoftirqd started up.\n");

#ifdef CONFIG_PREEMPT_SOFTIRQS
        printk("softirq RT prio: %d.\n", param.sched_priority);
        sys_sched_setscheduler(current->pid, SCHED_FIFO, &param);
#else
        set_user_nice(current, -10);
#endif
        ...
Here we see that if CONFIG_PREEMPT_SOFTIRQS is enabled in the kernel configuration, the ksoftirqd kernel task is promoted to a real-time task (SCHED_FIFO) at a real-time priority of 24 using the sys_sched_setscheduler() kernel function.
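You can confirm this from user space. The output below is illustrative; thread names vary with kernel version, and FF in ps's CLS column denotes SCHED_FIFO:

# ps -eo pid,class,rtprio,comm | grep softirq
    3 FF      24 softirq-timer/0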
SoftIRQ threading can be disabled at runtime through the /proc file system, as well as through the kernel command line at boot time. When enabled in the configuration, unless you specify otherwise, SoftIRQ threading is enabled by default. To disable SoftIRQ threading at runtime, issue the following command as root:
# echo '0' >/proc/sys/kernel/softirq_preemption
To verify the setting, display it as follows:
# cat /proc/sys/kernel/softirq_preemption
1
To disable SoftIRQ threading at boot time, add the following parameter to the kernel command line:
softirq-preempt=0
RCU (Read-Copy-Update)[120] is a special form of synchronization primitive in the Linux kernel designed for data that is read frequently but updated infrequently. You can think of RCU as an optimized reader lock. The real-time patch adds CONFIG_PREEMPT_RCU, which improves latency by making certain RCU sections preemptable.
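As a brief sketch of the read side, using the kernel's standard RCU primitives (the structure, field, and pointer names are hypothetical, and the more involved update side is omitted):

struct my_data {
        int value;
};
static struct my_data *global_ptr;

int read_value(void)
{
        struct my_data *p;
        int v;

        rcu_read_lock();                  /* enter read-side critical section */
        p = rcu_dereference(global_ptr);  /* safely fetch the shared pointer  */
        v = p ? p->value : -1;
        rcu_read_unlock();                /* exit; updaters may now reclaim   */
        return v;
}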
The O(1) scheduler has been around since the days of Linux 2.5. It is mentioned here because it is a critical component of a real-time solution. The O(1) scheduler is a significant improvement over the previous Linux scheduler. It scales better for systems with many processes and helps produce lower overall latency.
In case you are wondering, O(1) is "big O" notation for constant time. In this context, it means that the time it takes to make a scheduling decision does not depend on the number of processes on a given runqueue. The old Linux scheduler did not have this characteristic, and its performance degraded with the number of processes.[121]
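A minimal sketch of the idea, not the actual scheduler code: keep one bit per priority level, and picking the highest-priority runnable level reduces to a constant-time find-first-bit, no matter how many tasks are queued (in the kernel's convention, lower numbers mean higher priority):

static unsigned long runnable_bitmap;  /* bit N set => priority-N queue non-empty */

static inline int highest_runnable_prio(void)
{
        /* __builtin_ffsl() returns one plus the index of the least
         * significant set bit, or 0 if no bits are set, so this
         * yields the priority index, or -1 when nothing is runnable */
        return __builtin_ffsl(runnable_bitmap) - 1;
}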
You can designate a process as real time by setting a process attribute that the scheduler uses as part of its scheduling algorithm. Listing 17-4 shows the general method.
Listing 17-4. Creating a Real-Time Process
#include <sched.h>

int main(int argc, char **argv)
{
    ...
    int rc, old_scheduler_policy;
    struct sched_param my_params;
    ...
    /* Passing zero specifies caller's (our) policy */
    old_scheduler_policy = sched_getscheduler(0);

    /* Request the highest real-time priority the system allows */
    my_params.sched_priority = sched_get_priority_max(SCHED_RR);

    /* Passing zero specifies caller's (our) pid */
    rc = sched_setscheduler(0, SCHED_RR, &my_params);
    if (rc == -1)
        handle_error();
    ...
}
This code snippet does two things in the call to sched_setscheduler(). It changes the scheduling policy to SCHED_RR and raises its priority to the maximum possible on the system. Linux supports three scheduling policies:
• SCHED_OTHER : Normal Linux process, fairness scheduling
• SCHED_RR : Real-time process with a time slice; that is, if it does not block, it is allowed to run for a given period of time determined by the scheduler
• SCHED_FIFO : Real-time process that runs until it either blocks or explicitly yields the processor, or until another higher-priority SCHED_FIFO process becomes runnable
The man page for sched_setscheduler provides more detail on the three different scheduling policies.
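Rather than hard-coding priority values, a portable program can query the valid range for each policy at runtime. This short example uses only standard POSIX calls:

#include <stdio.h>
#include <sched.h>

int main(void)
{
        /* Print the valid static priority range for each policy */
        printf("SCHED_OTHER: %d..%d\n",
               sched_get_priority_min(SCHED_OTHER),
               sched_get_priority_max(SCHED_OTHER));
        printf("SCHED_RR:    %d..%d\n",
               sched_get_priority_min(SCHED_RR),
               sched_get_priority_max(SCHED_RR));
        printf("SCHED_FIFO:  %d..%d\n",
               sched_get_priority_min(SCHED_FIFO),
               sched_get_priority_max(SCHED_FIFO));
        return 0;
}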
When writing kernel code, such as a custom device driver, you will encounter data structures that you must protect from concurrent access. The easiest way to protect critical data is to disable preemption around the critical section. Keep the critical path as short as possible to maintain a low maximum latency for your system. Listing 17-5 shows an example.
Listing 17-5. Protecting Critical Section in Kernel Code
...
/*
* Declare and initialize a global lock for your
* critical data
*/
DEFINE_SPINLOCK(my_lock);
...
int operate_on_critical_data(void)
{
        ...
        spin_lock(&my_lock);
        ...
        /* Update critical/shared data */
        ...
        spin_unlock(&my_lock);
        ...
}
When a task successfully acquires a spinlock, preemption is disabled and the task that acquired the spinlock is allowed into the critical section. No task switches can occur until a spin_unlock operation takes place. The spin_lock() function is actually a macro that has several forms, depending on the kernel configuration. They are defined at the top level (architecture-independent definitions) in .../include/linux/spinlock.h. When the kernel is patched with the real-time patch, these spinlocks are promoted to mutexes, allowing higher-priority processes to preempt the task holding the lock.
Because the real-time patch is largely transparent to the device driver and kernel developer, the familiar constructs can be used to protect critical sections, as described in Listing 17-5. This is a major advantage of the real-time patch for real-time applications; it preserves the well-known semantics for locking and interrupt service routines.
Using the macro DEFINE_SPINLOCK as in Listing 17-5 preserves future compatibility. These macros are defined in .../include/linux/spinlock_types.h.
Several configuration options facilitate debugging and performance analysis of the real-time patched kernel. They are detailed in the following subsections.
To enable soft lockup detection, enable CONFIG_DETECT_SOFTLOCKUP in the kernel configuration. This feature enables the detection of long periods of running in kernel mode without a context switch. This feature exists in non-real-time kernels but is useful for detecting very high latency paths or soft deadlock conditions. To use it, simply enable the feature and watch for any reports on the console or system log. Reports will be emitted similar to this:
BUG: soft lockup detected on CPU0
When this message is emitted by the kernel, it is usually accompanied by a backtrace and other information such as the process name and PID. It will look similar to a kernel oops message complete with processor registers. See .../kernel/softlockup.c for details. This information can be used to help track down the source of the lockup condition.
To enable preemption debugging, enable CONFIG_DEBUG_PREEMPT in the kernel configuration. This debug feature enables the detection of unsafe use of preemption semantics such as preemption count underflows and attempts to sleep while in an invalid context. To use it, simply enable the feature and watch for any reports on the console or system log. Here is just a small sample of reports possible when preemption debugging is enabled:
BUG: <me> <mypid>, possible wake_up race on <proc> <pid>
BUG: lock recursion deadlock detected! <more info>
BUG: nonzero lock count <n> at exit time?
Many more messages are possible; these are just a few examples of the kinds of problems that can be detected. These messages will help you avoid deadlocks and other erroneous or dangerous programming semantics when using real-time kernel features. For more details on the messages and the conditions under which they are emitted, browse the Linux kernel source file .../kernel/rt-debug.c.
To enable wakeup timing, enable CONFIG_WAKEUP_TIMING in the kernel configuration. This debug option enables measurement of the time taken from waking up a high-priority process to when it is scheduled on a CPU. Using it is simple. When configured, measurement is disabled. To enable the measurement, do the following as root:
# echo '0' >/proc/sys/kernel/preempt_max_latency
When this /proc file is set to zero, each successive maximum wakeup timing result is written to this file. To read the current maximum, simply display the value:
# cat /proc/sys/kernel/preempt_max_latency
84
As long as any of the latency-measurement modes are enabled in the kernel configuration, preempt_max_latency will always be updated with the maximum latency value. It cannot be disabled. Writing 0 to this /proc variable simply resets the maximum to zero to restart the cumulative measurement.
To enable wakeup latency history, enable CONFIG_WAKEUP_LATENCY_HIST while CONFIG_WAKEUP_TIMING is also enabled. This option dumps all the wakeup timing measurements enabled by CONFIG_WAKEUP_TIMING into a file for later analysis. An example of this file and its contents is presented shortly when we examine interrupt off history.
Two related configuration options extend these latency measurements:
• CRITICAL_PREEMPT_TIMING : Measures the time spent in critical sections with preemption disabled.
• PREEMPT_OFF_HIST : Similar to WAKEUP_LATENCY_HIST. Gathers preempt off timing measurements into bins for later analysis.
To enable measurement of maximum interrupt off timing, configure your kernel with CRITICAL_IRQSOFF_TIMING enabled. This option measures time spent in critical sections with irqs disabled. This feature works in the same way as wakeup latency timing. To enable the measurement, do the following as root:
# echo '0' >/proc/sys/kernel/preempt_max_latency
When this /proc file is set to zero, each successive maximum interrupt off timing result is written to this file. To read the current maximum, simply display the value:
# cat /proc/sys/kernel/preempt_max_latency
97
You will notice that the latency measurements for both wakeup latency and interrupt off latency are enabled and displayed using the same /proc file. This means, of course, that only one measurement can be configured at a time, or the results might not be valid. Because these measurements add significant runtime overhead, it isn't wise to enable them all at once anyway.
Enabling INTERRUPT_OFF_HIST provides functionality similar to that with WAKEUP_LATENCY_HIST. This option gathers interrupt off timing measurements into a file for later analysis. This data is formatted as a histogram, with bins ranging from 0 microseconds to just over 10,000 microseconds. In the example just given, we saw that the maximum latency was 97 microseconds from that particular sample. Therefore, we can conclude that the latency data in histogram form will not contain any useful information beyond the 97-microsecond bin.
History data is obtained by reading a special /proc file. This output is redirected to a regular file for analysis or plotting as follows:
# cat /proc/latency_hist/interrupt_off_latency/CPU0 > hist_data.txt
Listing 17-6 displays the first 10 lines of the history data.
Listing 17-6. Interrupt Off Latency History (Head)
$ cat /proc/latency_hist/interrupt_off_latency/CPU0 | head
#Minimum latency: 0 microseconds.
#Average latency: 1 microseconds.
#Maximum latency: 97 microseconds.
#Total samples: 60097595
#There are 0 samples greater or equal than 10240 microseconds
#usecs samples
0 13475417
1 38914907
2 2714349
3 442308
...
From Listing 17-6 we can see the minimum and maximum values, the average of all the values, and the total number of samples. In this case, we accumulated slightly more than 60 million samples. The histogram data follows the summary and contains up to around 10,000 bins. We can easily plot this data using gnuplot as shown in Figure 17-5.
Figure 17-5. Interrupt off latency data
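A short gnuplot script along these lines can produce such a plot. The file name hist_data.txt comes from the redirection shown earlier, and the x range reflects the 97-microsecond maximum we observed; gnuplot skips the leading comment lines in the data by default:

set logscale y
set xlabel "Latency (usecs)"
set ylabel "Samples"
set xrange [0:100]
plot "hist_data.txt" using 1:2 with impulses title "interrupt off latency"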
The LATENCY_TRACE configuration option enables generation of kernel trace data associated with the last maximum latency measurement. It is also made available through the /proc file system. A latency trace can help you isolate the longest-latency code path. For each new maximum latency measurement, an associated trace is generated that facilitates tracing the code path of the associated maximum latency.
Listing 17-7 reproduces an example trace for a 78-microsecond maximum. As with the other measurement tools, enable the measurement by writing a 0 to /proc/sys/kernel/preempt_max_latency.
Listing 17-7. Interrupt Off Maximum Latency Trace
$ cat /proc/latency_trace
preemption latency trace v1.1.5 on 2.6.14-rt-intoff-tim_trace
-------------------------------------------------------------
latency: 78 us, #50/50, CPU#0 | (M:rt VP:0, KP:0, SP:1 HP:1)
-----------------
| task: softirq-timer/0-3 (uid:0 nice:0 policy:1 rt_prio:1)
-----------------
_------=> CPU#
/ _-----=> irqs-off
| / _----=> need-resched
|| / _---=> hardirq/softirq
||| / _--=> preempt-depth
|||| /
||||| delay
cmd pid ||||| time | caller
\ / ||||| \ | /
cat-6637 0D... 1us : common_interrupt ((0))
cat-6637 0D.h. 2us : do_IRQ (c013d91c 0 0)
cat-6637 0D.h1 3us+: mask_and_ack_8259A (__do_IRQ)
cat-6637 0D.h1 10us : redirect_hardirq (__do_IRQ)
cat-6637 0D.h. 12us : handle_IRQ_event (__do_IRQ)
cat-6637 0D.h. 13us : timer_interrupt (handle_IRQ_event)
cat-6637 0D.h. 15us : handle_tick_update (timer_interrupt)
cat-6637 0D.h1 16us : do_timer (handle_tick_update)
... <we're in the timer interrupt function>
cat-6637 0D.h. 22us : run_local_timers (update_process_times)
cat-6637 0D.h. 22us : raise_softirq (run_local_timers)
cat-6637 0D.h. 23us : wakeup_softirqd (raise_softirq)
... <softirq work pending - need to preempt is signaled>
cat-6637 0Dnh. 34us : wake_up_process (wakeup_softirqd)
cat-6637 0Dnh. 35us+: rcu_pending (update_process_times)
cat-6637 0Dnh. 39us : scheduler_tick (update_process_times)
cat-6637 0Dnh. 39us : sched_clock (scheduler_tick)
cat-6637 0Dnh1 41us : task_timeslice (scheduler_tick)
cat-6637 0Dnh. 42us+: preempt_schedule (scheduler_tick)
cat-6637 0Dnh1 45us : note_interrupt (__do_IRQ)
cat-6637 0Dnh1 45us : enable_8259A_irq (__do_IRQ)
cat-6637 0Dnh1 47us : preempt_schedule (enable_8259A_irq)
cat-6637 0Dnh. 48us : preempt_schedule (__do_IRQ)
cat-6637 0Dnh. 48us : irq_exit (do_IRQ)
cat-6637 0Dn.. 49us : preempt_schedule_irq (need_resched)
cat-6637 0Dn.. 50us : __schedule (preempt_schedule_irq)
... <here is the context switch to softirqd-timer thread>
<...>-3 0D..2 74us+: __switch_to (__schedule)
<...>-3 0D..2 76us : __schedule <cat-6637> (74 62)
<...>-3 0D..2 77us : __schedule (schedule)
<...>-3 0D..2 78us : trace_irqs_on (__schedule)
... <output truncated here for brevity>
We have trimmed this listing significantly for clarity, but the key elements of this trace are obvious. This trace resulted from a timer interrupt. In the hardirq thread, little is done beyond queuing up some work for later in a softirq context. This is seen by the wakeup_softirqd() function at 23 microseconds and is typical for interrupt processing. This triggers the need_resched flag, as shown in the trace by the n in the third column of the second field. At 49 microseconds, after some processing in the timer softirq, the scheduler is invoked for preemption. At 74 microseconds, control is passed to the actual softirqd-timer/0 thread running in this particular kernel as PID 3. (The process name was truncated to fit the field width and is shown as <...> .)
Most of the fields of Listing 17-7 have obvious meanings. The irqs-off field contains a D for sections of code where interrupts are off. Because this latency trace is an interrupts off trace, we see this indicated throughout the trace. The need_resched field mirrors the state of the kernel's need_resched flag. An n indicates that the scheduler should be run at the soonest opportunity, and a period (.) means that this flag is not active. The hardirq/softirq field indicates a thread of execution in hardirq context with h, and softirq context with s. The preempt-depth field indicates the value of the kernel's preempt_count variable, an indicator of nesting level of locks within the kernel. Preemption can occur only when this variable is at zero.
The DEBUG_DEADLOCKS kernel configuration option enables detection and reporting of deadlock conditions associated with the semaphores and spinlocks in the kernel. When enabled, potential deadlock conditions are reported in a fashion similar to this:
==========================================
[ BUG: lock recursion deadlock detected! |
------------------------------------------
...
Much information is displayed after the banner line announcing the deadlock detection, including the lock descriptor, lock name (if available), lock file and name (if available), lock owner, who is currently holding the lock, and so on. Using this debug tool, it is possible to immediately determine the offending processes. Of course, fixing it might not be so easy!
The DEBUG_RT_LOCKING_MODE option enables a runtime control to switch the real-time mutex back into a nonpreemptable mode, effectively changing the behavior of the real-time (spinlocks as mutexes) kernel back to a spinlock-based kernel. As with the other configuration options we have covered here, this tool should be considered a development aid to be used only in a development environment.
It does not make sense to enable all of these debug modes at once. As you might imagine, most of these debug modes add size and significant processing overhead to the kernel. They are meant to be used as development aids and should be disabled for production code.
• Linux is increasingly being used in systems where real-time performance is required. Examples include multimedia applications and robot, industrial, and automotive controllers.
• Real-time systems are characterized by deadlines. When a missed deadline results in inconvenience or a diminished customer experience, we refer to this as soft real time. In contrast, hard real-time systems are considered failed when a deadline is missed.
• Kernel preemption was the first significant feature in the Linux kernel that addressed system-wide latency.
• Recent Linux kernels support several preemption modes, ranging from no preemption to full real-time preemption.
• The real-time patch adds several key features to the Linux kernel, resulting in reliable low latencies.
• The real-time patch includes several important measurement tools to aid in debugging and characterizing a real-time Linux implementation.
Linux Kernel Development, 2nd Edition
Robert Love
Novell Press, 2005
[115] We neglect the context switching time for interrupt processing because it is often negligible compared to interrupt off time.
[116] Robert Love explains bottom-half processing in great detail in his book Linux Kernel Development. See Section 17.5.1, "Suggestions for Additional Reading," at the end of this chapter for the reference.
[117] Interestingly, there is much debate on the correct spelling of preemptable! I defer to the survey done by Rick Lehrbaum on www.linuxdevices.com/articles/AT5136316996.html.
[118] Also called HARDIRQs.
[119] See Linux Kernel Development, referenced at the end of this chapter, to learn more about softirqs.
[120] See www.rdrop.com/users/paulmck/RCU/ for an in-depth discussion of RCU.
[121] We refer you again to Robert Love's book for an excellent discussion of the O(1) scheduler, and a delightful diatribe on algorithmic complexity, from which the notation O(1) derives.