A typical embedded Linux distribution includes many useful tools. Some are complex and require a great deal of proficiency to master. Others are simple and have been all but ignored by developers of embedded systems. Some tools might require customization for a particular environment. Many will run "right out of the box" and provide the developer with useful information without much effort. This chapter presents a cross-section of the most important (and frequently neglected) tools available to the embedded Linux engineer.
It is impossible to provide complete details on the tools and utilities presented in this chapter. That would take an entire book by itself! Rather than provide a complete reference, our goal is to provide an introduction on the basic usage of each one. You are encouraged to pursue additional study on these and other important development tools. The man page (or other documentation) for each tool is a great place to start.
The GNU Debugger (GDB) is introduced first, followed by a brief look at the Data Display Debugger, a graphical front end for GDB. Next we introduce a series of utilities designed to give the developer a look at the behavior of programs and the system as a whole. These include strace, ltrace, top, and ps, often overlooked by inexperienced Linux developers. We then present some crash dump and memory-analysis tools. The chapter concludes by introducing some of the more useful binary utilities.
If you spend much time developing Linux applications, you will undoubtedly spend many hours getting to know the GNU Debugger. GDB is arguably the most important tool in the developer's toolbox. It has a long history, and its capabilities have blossomed to include low-level hardware-specific debugging support for a wide variety of architectures and microprocessors. It should be noted that the user manual for GDB is nearly as large as this book. Our intention here is to introduce GDB to get you started. You are encouraged to study the user manual referenced later under Section 13.7.1, "Suggestions for Additional Reading."
Because this is a book about embedded Linux development, we use a version of GDB that has been compiled as a cross-debugger. That is, the debugger itself runs on your development host, but it understands binary executables in the architecture for which it was configured at compile time. In the next few examples, we use GDB compiled for a Red Hat Linux-compatible development host and an XScale (ARM) target processor. Although we use the short name gdb, we are presenting examples based on the XScale-enabled cross-gdb from the MontaVista embedded Linux distribution for ARM XScale. The binary is called xscale_be-gdb. It is still GDB, simply configured for a cross-development environment.
The GDB debugger is a complex program with many configuration options during the build process. It is not our intention to provide guidance on building gdb; that topic has been covered in other literature. For the purposes of this chapter, we assume that you have obtained a working GDB configured for the architecture and host development environment you will be using.
One of the most common reasons to drag GDB out of the toolbox is to evaluate a core dump. It is quick and easy, and often leads to immediate identification of the offending code. A core dump results when an application program generates a fault, such as accessing a memory location that it does not own. Many conditions can trigger a core dump,[80] but SIGSEGV (segmentation fault) is by far the most common. A SIGSEGV is a Linux kernel signal that is generated on illegal memory accesses by a user process. When this signal is generated, the kernel terminates the process. The kernel then dumps a core image, if so enabled.
To enable generation of a core dump, your process's resource limits must permit it. This is achieved by setting the process's resource limits using the setrlimit() function call, or from a BASH or BusyBox shell command prompt, using ulimit. It is not uncommon to find the following line in the initialization scripts of an embedded system to enable the generation of core dumps on process errors:
$ ulimit -c unlimited
This BASH built-in command is used to set the size limit of a core dump. In the previous instance, the size is set to unlimited.
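For reference, a process can also raise this limit on itself from C. The following is a minimal sketch (not taken from the book's example code, and with return-value checks omitted) that uses setrlimit() to allow an unlimited core file and then dereferences a null pointer so that a core dump is produced for study:

#include <sys/resource.h>
#include <stddef.h>

int main(void)
{
    /* Equivalent to "ulimit -c unlimited" for this process and its children */
    struct rlimit rl = { RLIM_INFINITY, RLIM_INFINITY };
    setrlimit(RLIMIT_CORE, &rl);

    char *ptr = NULL;
    *ptr = 0;        /* illegal write through a null pointer: SIGSEGV, core dumped */
    return 0;
}

Compiled with your cross-toolchain (with -g) and run on the target, this produces a core file that can be loaded into the cross-gdb in the same manner as Listing 13-1.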
When an application program generates a segmentation fault (for example, by writing to a memory address outside its permissible range), Linux terminates the process and generates a core dump, if so enabled. The core dump is a snapshot of the running process at the time the segmentation fault occurred.
It helps to have debugging symbols enabled in your binary. GDB produces much more useful output with debugging symbols (gcc -g) enabled during the build. However, it is still possible to determine the sequence of events leading to the segmentation fault, even if the binary was compiled without debugging symbols. You might need to do a bit more investigative work without the aid of debugging symbols. You must manually correlate virtual addresses to locations within your program.
Listing 13-1 shows the results of a core dump analysis session using GDB. The output has been reformatted slightly to fit the page. We have used some demonstration software to intentionally produce a segmentation fault. Here is the output of the process (called webs) that generated the segmentation fault:
root@coyote:/workspace/websdemo# ./webs
Segmentation fault (core dumped)
Listing 13-1. Core Dump Analysis Using GDB
$ xscale_be-gdb webs core
GNU gdb 6.3 (MontaVista 6.3-20.0.22.0501131 2005-07-23)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public
License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "--host=i686-pc-linux-gnu --target=armv5teb-montavista-linuxeabi"...
Core was generated by './webs'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /opt/montavista/pro/.../libc.so.6...done.
Loaded symbols for /opt/montavista/pro/.../libc.so.6
Reading symbols from /opt/montavista/pro/.../ld-linux.so.3...done.
Loaded symbols for /opt/montavista/pro/.../ld-linux.so.3
#0 0x00012ac4 in ClearBlock (RealBigBlockPtr=0x0, l=100000000) at led.c:43
43 *ptr = 0;
(gdb) l
38
39 static int ClearBlock(char * BlockPtr, int l)
40 {
41 char * ptr;
42 for (ptr = BlockPtr; (ptr - BlockPtr) < l; ptr++)
43 *ptr = 0;
44 return 0;
45 }
46 static int InitBlock(char * ptr, int n)
47 {
(gdb) p ptr
$1 = 0x0
(gdb)
The first line of Listing 13-1 shows how GDB was invoked from the command line. Because we are doing cross-debugging, we need the cross-version of GDB that has been compiled for our host and target system. We invoke our version of cross-gdb as shown and pass xscale_be-gdb the name of the binary followed by the name of the core dump file, in this case simply core. After GDB prints several banner lines describing its configuration and other information, it prints the reason for the termination: signal 11, the indication of a segmentation fault.[81] Several lines follow as GDB loads the binary, the libraries it depends on, and the core file. The last line printed upon GDB startup is the current location of the program when the fault occurred. The line preceded by the #0 string indicates the stack frame (stack frame zero in a function called ClearBlock() at virtual address 0x00012ac4). The following line preceded by 43 is the line number of the offending source line from a file called led.c. From there, GDB displays its command prompt and waits for input.
To provide some context, we enter the gdb list command, using its abbreviated form l. GDB recognizes command abbreviations where there is no ambiguity. Here the program error begins to present itself. The offending line, according to GDB's analysis of the core dump, is:
43 *ptr = 0;
Next we issue the gdb print command on the ptr variable, again abbreviated as p. As you can see from Listing 13-1, the value of the pointer ptr is 0. So we conclude that the reason for the segmentation fault is the dereference of a null pointer, a common programming error. From here, we can elect to use the backtrace command to see the call chain leading to this error, which might lead us back to the actual source of the error. Listing 13-2 displays these results.
Listing 13-2. Backtrace Command
(gdb) bt
#0 0x00012ac4 in ClearBlock (RealBigBlockPtr=0x0, l=100000000) at led.c:43
#1 0x00012b08 in InitBlock (ptr=0x0, n=100000000) at led.c:48
#2 0x00012b50 in ErrorInHandler (wp=0x325c8, urlPrefix=0x2f648 "/Error",
webDir=0x2f660 "", arg=0, url=0x34f30 "/Error", path=0x34d68 "/Error",
query=0x321d8 "") at led.c:61
#3 0x000126cc in websUrlHandlerRequest (wp=0x325c8) at handler.c:273
#4 0x0001f518 in websGetInput (wp=0x325c8, ptext=0xbefffc40,
pnbytes=0xbefffc38) at webs.c:664
#5 0x0001ede0 in websReadEvent (wp=0x325c8) at webs.c:362
#6 0x0001ed34 in websSocketEvent (sid=1, mask=2, iwp=206280) at webs.c:319
#7 0x00019740 in socketDoEvent (sp=0x34fc8) at sockGen.c:903
#8 0x00019598 in socketProcess (sid=1) at sockGen.c:845
#9 0x00012be8 in main (argc=1, argv=0xbefffe14) at main.c:99
(gdb)
The backtrace displays the call chain all the way back to main(), the start of the user's program. A stack frame number precedes each line of the backtrace. You can switch to any given stack frame using the gdb frame command. Listing 13-3 is an example of this. Here we switch to stack frame 2 and display the source code in that frame. As in the previous examples, the lines preceded with (gdb) are the commands we issue to GDB, and the other lines are the GDB output.
Listing 13-3. Moving Around Stack Frames in GDB
(gdb) frame 2
#2 0x00012b50 in ErrorInHandler (wp=0x325c8, urlPrefix=0x2f648 "/Error",
webDir=0x2f660 "", arg=0, url=0x34f30 "/Error", path=0x34d68 "/Error",
query=0x321d8 "") at led.c:61
61 return InitBlock(p, siz);
(gdb) l
56
57 siz = 10000 * sizeof(BigBlock);
58
59 p = malloc(siz);
60 /* if (p) */
61 return InitBlock(p, siz);
62 /* else return (0); */
63 }
64
65
(gdb)
As you can see, with a little help from the source code available using the list command, it would not be difficult to trace the code back to the source of the errant null pointer. In fact, the astute reader will notice the source of the segmentation fault we have produced for this example. From Listing 13-3, we see that the check of the return value in the call to malloc() has been commented out. In this example, the malloc() call failed, leading to the operation on a null pointer two frames later in the call chain. Although this example is both contrived and trivial, many crashes of this type are remarkably easy to track down using a similar method with GDB and core dumps. You can also see the null pointer by looking at the parameter values in the function call. This often leads you directly to the frame where the null pointer originated.
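For comparison, a corrected version of the fragment shown in Listing 13-3 would simply restore the commented-out test. This is only a sketch of the obvious fix, not the original project code:

        p = malloc(siz);
        if (p)
                return InitBlock(p, siz);
        else
                return (0);

With the check in place, a failed allocation is reported to the caller instead of being passed along as a null pointer.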
We conclude this introduction to GDB by showing a typical debug session. In the previous demonstration of a program crash, we could have elected to step through the code to narrow down the cause of the failure. Of course, if you get a core dump, you should always start there. However, in other situations, you might want to set breakpoints and step through running code. Listing 13-4 details how we start GDB in preparation for a debug session. Note that the program must have been compiled with the debug flag enabled in the gcc command line for GDB to be useful in this context. Refer back to Figure 12-1 in Chapter 12, "Embedded Development Environment"; this is a cross-debug session with GDB running on your development host, debugging a program running on your target. We cover complete details of remote application debugging in Chapter 15, "Debugging Embedded Linux Applications."
Listing 13-4. Initiating a GDB Debug Session
$ xscale_be-gdb -silent webs
(gdb) target remote 192.168.1.21:2001
0x40000790 in ?? ()
(gdb) b main
Breakpoint 1 at 0x12b74: file main.c, line 78.
(gdb) c
Continuing.
Breakpoint 1, main (argc=1, argv=0xbefffe04) at main.c:78
78 bopen(NULL, (60 * 1024), B_USE_MALLOC);
(gdb) b ErrorInHandler
Breakpoint 2 at 0x12b30: file led.c, line 57.
(gdb) c
Continuing.
Breakpoint 2, ErrorInHandler (wp=0x311a0, urlPrefix=0x2f648 "/Error",
webDir=0x2f660 "", arg=0, url=0x31e88 "/Error", path=0x31918 "/Error",
query=0x318e8 "") at led.c:57
57 siz = 10000 * sizeof(BigBlock);
(gdb) next
59 p = malloc(siz);
(gdb) next
61 return InitBlock(p, siz);
(gdb) p p
$1 =(unsigned char *) 0x0
(gdb) p siz
$2 = 100000000
(gdb)
Following through this simple debug session, first we connect to our target board using the gdb target command. We cover remote debugging in more detail in Chapter 15. When we are connected to our target hardware, we set a breakpoint at main() using the gdb break (abbreviated b) command. Then we issue the gdb continue (abbreviated c) command to resume execution of the program. If we had any program arguments, we could have issued them on the command line when we invoked GDB.
We hit the breakpoint set at main(), and set another one at ErrorInHandler(), followed by the continue command, again abbreviated. When this new breakpoint is hit, we begin to step through the code using the next command. There we encounter the call to malloc(). Following the malloc() call, we examine the return value and discover the failure as indicated by the null return value. Finally, we print the value of the parameter in the malloc() call and see that a very large memory region (100 million bytes) is being requested, which fails.
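As a side note, the arithmetic in this session is self-consistent. The source line at 57 computes siz = 10000 * sizeof(BigBlock), and GDB reports siz = 100000000, which implies that sizeof(BigBlock) is 10,000 bytes. A hypothetical definition along these lines (not shown in the original source) would produce exactly this request:

typedef struct {
    char data[10000];           /* 10,000 bytes per block */
} BigBlock;
/* 10000 * sizeof(BigBlock) == 100,000,000 bytes requested from malloc() */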
Although trivial, the GDB examples in this section should enable the newcomer to become immediately productive with GDB. Few of us have really mastered GDB; it is very complex and has many capabilities. Later in Section 13.2, "Data Display Debugger," we introduce a graphical front end to GDB that can ease the transition for those unfamiliar with GDB.
One final note about GDB: No doubt you have noticed the many banner lines GDB displays on the console when it is first invoked, as in Listing 13-1. In these examples, as stated earlier, we used a cross-gdb from the MontaVista embedded Linux distribution. The banner lines contain a vital piece of information that the embedded developer must be aware of: GDB's host and target specifications. From Listing 13-1, we saw the following output when GDB was invoked:
This GDB was configured as "--host=i686-pc-linux-gnu --target=armv5teb-montavista-linuxeabi"
In this instance, we were invoking a version of GDB that was compiled to execute from a Linux PC; specifically, an i686 running the GNU/Linux operating system. Equally critical, this instance of GDB was compiled to debug ARM binary code generated from the armv5teb big-endian toolchain.
One of the most common mistakes made by newcomers to embedded development is to use the wrong GDB while trying to debug target executables. If something isn't working right, you should immediately check your GDB configuration to make sure that it makes sense for your environment. You cannot use your native GDB to debug target code!
The Data Display Debugger (DDD) is a graphical front end to GDB and other command line debuggers. DDD has many advanced features beyond simply viewing source code and stepping through a debug session. Figure 13-1 is a screenshot of DDD's main screen.
Figure 13-1. Data Display Debugger
DDD is invoked as follows:
$ ddd --debugger xscale_be-gdb webs
Without the --debugger flag, DDD would attempt to invoke the native GDB on your development host, which is not what you want if you are planning to debug an application on your target system. The second argument on the DDD command line is the program you will be debugging. See the man page for DDD for additional details.
Using the command tool as shown in Figure 13-1, you can step through your program. You can set breakpoints either graphically or via the GDB console window at the bottom of the DDD screen. For target debugging, you must first connect your debugger to the target system as we did in Listing 13-4, using the target command. This command is issued in the GDB window of the DDD main screen.
When you are connected to the target, you can execute similar commands to the sequence described in the previous example to isolate the program failure. Figure 13-2 shows the DDD display during the later phase of this debugging session.
Figure 13-2. Debug session in DDD
Notice that in Figure 13-2 we have initiated the display of some important program variables that can help us narrow the cause of the segmentation fault. We can watch these variables as we step through the program using the command tool shown in the figure.
DDD is a powerful graphical front end for GDB. It is relatively easy to use and widely supported for many development hosts. Consult Section 13.7.1 at the end of this chapter for a link to the GNU DDD documentation.
We mention cbrowser here because support for this handy tool has found its way into the Linux kernel source tree.[82] cbrowser is a simple source-code browsing tool that makes it easy to bounce around a large source tree following symbols.
The Linux kernel makefile supports building the database that cbrowser uses. Here is an example invocation from a recent Linux kernel snapshot:
$ make ARCH=ppc CROSS_COMPILE=ppc_82xx- cscope
This produces the cscope symbol database that cbrowser uses. cscope is the engine; cbrowser is the graphical user interface. You can use cscope on its own if you want. It is command line driven and very powerful, but not quite as quick or easy for navigating a large source tree in this point-and-click era. If vi is still your favorite editor, cscope might be just for you!
To invoke cbrowser, enter the directory that contains your cscope database, and simply type the cbrowser command without arguments. Figure 13-3 shows an example session. You can read more about both of these useful tools in the references listed in Section 13.7.1 at the end of this chapter.
Figure 13-3. cbrowser in action
Many useful tools can provide you with various views of the system. Some tools offer a high-level perspective, such as what processes are running on your system and which processes are consuming the most CPU bandwidth. Other tools can provide detailed analysis, such as where memory is being allocated or, even more useful, where it is being leaked. The next few sections introduce the most important tools and utilities in this category. We have space for only a cursory introduction to these tools; references are provided where appropriate if you want more details.
This useful system trace utility is found in virtually all Linux distributions. strace captures and displays useful information for every kernel system call executed by a Linux application program. strace is especially handy because it can be run on programs for which no source code is available. It is not necessary to compile the program with debug symbols as it is with GDB. Furthermore, strace can be a very insightful educational tool. As the man page states, "Students, hackers and the overly-curious will find that a great deal can be learned about a system and its system calls by tracing even ordinary programs."
While preparing the example software for the GDB section earlier in this chapter, I decided to use a software project unfamiliar to me, an early version of the GoAhead web server. The first attempt at compiling and linking the project led to an interesting example for strace. Starting the application from the command line silently returned control back to the console. No error messages were produced, and a look into the system logs also produced no clues! It simply would not run.
strace quickly identified the problem. The output from invoking strace on this software package is produced in Listing 13-5. Many lines from this output have been deleted due to space considerations. The unedited output is over one hundred lines long.
Listing 13-5.[83] strace Output: GoAhead Web Demo
01 root@coyote:/home/websdemo$ strace ./websdemo
02 execve("./websdemo", ["./websdemo"], [/* 14 vars */]) = 0
03 uname({sys="Linux", node="coyote", ...}) = 0
04 brk(0) = 0x10031050
05 open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
06 open("/etc/ld.so.cache", O_RDONLY) = -1 ENOENT (No such file or directory)
07 open("/lib/libc.so.6", O_RDONLY) = 3
08 read(3, "\177ELF\1\2\1\0\0\0\0\0\0\0\0\0\0\3\0\24\0\0\0\1\0\1\322"..., 1024) = 1024
09 fstat64(0x3, 0x7fffefc8) = 0
10 mmap(0xfe9f000, 1379388, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xfe9f000
11 mprotect(0xffd8000, 97340, PROT_NONE) = 0
12 mmap(0xffdf000, 61440, PROT_READ|PROT_WRITE|PROT_EXEC,MAP_PRIVATE|MAP_FIXED, 3,
0x130000) = 0xffdf000
13 mmap(0xffee000, 7228, PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xffee000
14 close(3) = 0
15 brk(0) = 0x10031050
16 brk(0x10032050) = 0x10032050
17 brk(0x10033000) = 0x10033000
18 brk(0x10041000) = 0x10041000
19 rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0
20 stat("./umconfig.txt", 0x7ffff9b8) = -1 ENOENT (No such file or directory)
21 uname({sys="Linux", node="coyote", ...}) = 0
22 gettimeofday({3301, 178955}, NULL) = 0
23 getpid() = 156
24 open("/etc/resolv.conf", O_RDONLY) = 3
25 fstat64(0x3, 0x7fffd7f8) = 0
26 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x30017000
27 read(3, "#\n# resolv.conf This file is th"..., 4096) = 83
28 read(3, "", 4096) = 0
29 close(3) = 0
... <<< Lines 30-81 removed for brevity
82 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
83 connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("0.0.0.0")}, 28) = 0
84 send(3, "\267s\1\0\0\1\0\0\0\0\0\0\6coyotea\0\0\1\0\1", 24, 0) = 24
85 gettimeofday({3301, 549664}, NULL) = 0
86 poll([{fd=3, events=POLLIN, revents=POLLERR}], 1, 5000) = 1
87 ioctl(3, 0x4004667f, 0x7fffe6a8) = 0
88 recvfrom(3, 0x7ffff1f0, 1024, 0, 0x7fffe668, 0x7fffe6ac) = -1 ECONNREFUSED (Connection
refused)
89 close(3) = 0
90 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
91 connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("0.0.0.0")}, 28) = 0
92 send(3, "\267s\1\0\0\1\0\0\0\0\0\0\6coyote\0\0\1\0\1", 24, 0) = 24
93 gettimeofday({3301, 552839}, NULL) = 0
94 poll([{fd=3, events=POLLIN, revents=POLLERR}], 1, 5000) = 1
95 ioctl(3, 0x4004667f, 0x7fffe6a8) = 0
96 recvfrom(3, 0x7ffff1f0, 1024, 0, 0x7fffe668, 0x7fffe6ac) = -1 ECONNREFUSED (Connection
refused)
97 close(3) = 0
98 exit(-1) = ?
99 root@coyote:/home/websdemo#
Line numbers have been added to the output produced by strace to make this listing more readable. Invocation of the command is found on line number 01. In its simplest form, simply add the strace command directly in front of the program you want to examine. This is how the output in Listing 13-5 was produced.
Each line of this trace represents the websdemo process making a system call into the kernel. We don't need to analyze and understand each line of the trace, although it is quite instructive to do so. We are looking for any anomalies that might help pinpoint why the program won't run. In the first several lines, Linux is setting up the environment in which the program will execute. We see several open() system calls to /etc/ld.so.*, which are the Linux dynamic linker-loader (ld.so) doing its job. In fact, line 06 was my clue that this example embedded board had not been properly configured. There should be a linker cache produced by running ldconfig. (The linker cache substantially speeds up searching for shared library references.) This was subsequently resolved by running ldconfig on the target.
Down through line 19 is more basic housekeeping, mostly by the loader and libc initializing. Notice in line 20 that the program is looking for a configuration file but did not find one. That could be an important issue when we get the software running. Starting with line 24, the program begins to set up and configure the appropriate networking resources that it needs. Lines 24 through 29 open and read a Linux system file containing instructions for the DNS service to resolve hostnames. Local network configuration activity continues through line 81. Most of this activity consists of network setup and configuration necessary to build the networking infrastructure for the program itself. This portion of the listing has been removed for brevity and clarity.
Notice especially the network activity starting with line 82. Here we have the program trying to establish a connection, over a datagram (UDP) socket, to an IP address of all zeros. Line 82 is reproduced here for convenience:
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
A couple of points about Listing 13-5 are worth noting. We might not know all the details of every system call, but we can get a general idea of what is happening. The socket() system call is similar to a file system open() call. The return value, indicated by the = sign, in this case represents a Linux file descriptor. Knowing this, we can associate the activity from line 82 through the close() system call in line 89 with file descriptor 3.
We are interested in this group of related system calls because we see an error message in line 88: "Connection refused." At this point, we still don't know why the program won't run, but this appears abnormal. Let's investigate. Line 82, the system call to socket(), establishes an endpoint for IP communication. Line 83 is quite curious because it tries to establish a connection to a remote endpoint (socket) containing an IP address of all zeros. We don't have to be network experts to suspect that this might be causing trouble.[84] Line 83 provides another important clue: The port parameter is set to 53. A quick Google search for TCP/IP port numbers reveals that port 53 is the Domain Name Service, or DNS.
Line 84 provides yet another clue. Our board has a hostname of coyote. This can be seen as part of the command prompt in line 01 of Listing 13-5. It appears that this activity is a DNS lookup for our board's hostname, which is failing. As an experiment, we add an entry in our target system's /etc/hosts[85] file to associate our locally defined hostname with the board's locally assigned IP address, as follows:
192.168.1.21    coyote    #The IP address we assigned
Voilà: Our program begins to function normally. Although we might not know exactly why this would lead to a program failure (TCP/IP networking experts might), our strace output led us to the fact that a DNS lookup for our board name was failing. When we corrected that, the program started up happily and began serving web pages. To recap, this was a program for which we had no source code to reference, and it had no symbols compiled into its binary image. Using strace, we were able to determine the cause of the program failure, and implement a solution.
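Although we never examined the web server's source code, the trace is consistent with a startup sequence that resolves the board's own hostname and exits quietly when the lookup fails. A hypothetical sketch of that pattern follows; the structure and names are assumptions for illustration, not code from the GoAhead sources:

#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    char host[256];
    struct hostent *he;

    gethostname(host, sizeof(host));   /* returns "coyote" on our board            */
    he = gethostbyname(host);          /* the DNS lookup visible in the strace log */
    if (he == NULL)
        exit(-1);                      /* silent exit, matching line 98 of the trace */

    printf("Resolved %s\n", he->h_name);
    return 0;
}

Adding the hostname to /etc/hosts lets the resolver satisfy the lookup locally, so the lookup succeeds and startup continues.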
The strace utility has many command line options. One of the more useful includes the capability to select a subset of system calls for tracing. For example, if you want to see only the network-related activity of a given process, issue the command as follows:
$ strace -e trace=network process_name
This produces a trace of all the network-related system calls, such as socket(), connect(), recvfrom(), and send(). This is a powerful way to view the network activity of a given program. Several other subsets are available. For example, you can view only the file-related activities of a program, with open(), close(), read(), write(), and so on. Additional subsets include process-related system calls, signal-related system calls, and IPC-related system calls.
It is worth noting that strace is capable of dealing with tracing programs that spawn additional processes. Invoking strace with the -f option instructs strace to follow child processes that are created using the fork() system call. Numerous possibilities exist with the strace command. The best way to become proficient with this powerful utility is to use it. Make it a point with this and all the tools we present to seek out and read the latest open-source documentation. In this case, man strace on most Linux hosts will produce enough material to keep you experimenting for an afternoon!
One very useful way to employ strace is using the -c option. This option produces a high-level profiling of your application. Using the -c option, strace accumulates statistics on each system call, how many times it was encountered, how many times errors were returned, and the time spent in each system call. Listing 13-6 is an example of running strace -c on the webs demo from the previous example.
Listing 13-6. Profiling Using strace
root@coyote$ strace -c ./webs
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- --------
29.80 0.034262 189 181 send
18.46 0.021226 1011 21 10 open
14.11 0.016221 130 125 read
11.87 0.013651 506 27 8 stat64
5.88 0.006762 193 35 select
5.28 0.006072 76 80 fcntl64
3.47 0.003994 65 61 time
2.79 0.003205 3205 1 execve
1.71 0.001970 90 22 3 recv
1.62 0.001868 85 22 close
1.61 0.001856 169 11 shutdown
1.38 0.001586 144 11 accept
0.41 0.000470 94 5 mmap2
0.26 0.000301 100 3 mprotect
0.24 0.000281 94 3 brk
0.17 0.000194 194 1 1 access
0.13 0.000150 150 1 lseek
0.12 0.000141 47 3 uname
0.11 0.000132 132 1 listen
0.11 0.000128 128 1 socket
0.09 0.000105 53 2 fstat64
0.08 0.000097 97 1 munmap
0.06 0.000064 64 1 getcwd
0.05 0.000063 63 1 bind
0.05 0.000054 54 1 setsockopt
0.04 0.000048 48 1 rt_sigaction
0.04 0.000046 46 1 gettimeofday
0.03 0.000038 38 1 getpid
------ ----------- ----------- --------- --------- -----------
100.00 0.114985 624 22 total
This is a very useful way to get a high-level view of where your application is consuming time and where errors are occurring. Some errors might be a normal part of your application's operation, but others might be consuming time that you hadn't intended. From Listing 13-6, we can see that the syscall with the longest duration was the execve(), which is the call that the shell used to spawn the application. As you can see, it was called only once. Another interesting observation is that the send() system call was the most frequently used syscall. This makes sense; the application is a small web server.
Bear in mind that, like the other tools we have been discussing here, strace must be compiled for your target architecture. strace is executed on your target board, not your development host. You must use a version that is compatible with your architecture. If you purchase a commercial embedded Linux distribution, you should make sure that this utility is included for your chosen architecture.
The ltrace and strace utilities are closely related. The ltrace utility does for library calls what strace does for system calls. It is invoked in a similar fashion: Precede the program to be traced by the tracer utility, as follows:
$ ltrace ./example
Listing 13-7 reproduces the output of ltrace on a small example program that executes a handful of standard C library calls.
Listing 13-7. Example ltrace Output
$ ltrace ./example
__libc_start_main(0x8048594, 1, 0xbffff944, 0x80486b4, 0x80486fc <unfinished ...>
malloc(256) = 0x804a008
getenv("HOME") = "/home/chris"
strncpy(0x804a008, "/home", 5) = 0x804a008
fopen("foo.txt", "w") = 0x804a110
printf("$HOME = %s\n", "/home/chris"$HOME = /home/chris
) = 20
fprintf(0x804a110, "$HOME = %s\n", "/home/chris") = 20
fclose(0x804a110) = 0
remove("foo.txt") = 0
free(0x804a008) = <void>
+++ exited (status 0) +++
$
For each library call, the name of the call is displayed, along with varying portions of the parameters to the call. Similar to strace, the return value of the library call is then displayed. As with strace, this tool can be used on programs for which source code is not available.
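The program traced in Listing 13-7 is not reproduced in the text, but its behavior can be inferred from the trace itself. A reconstruction consistent with that output might look like the following; details such as the buffer size and the absence of error handling are assumptions:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *buf  = malloc(256);           /* malloc(256) in the trace        */
    char *home = getenv("HOME");        /* getenv("HOME")                  */
    FILE *fp;

    strncpy(buf, home, 5);              /* copies "/home"                  */
    fp = fopen("foo.txt", "w");
    printf("$HOME = %s\n", home);       /* output interleaved in the trace */
    fprintf(fp, "$HOME = %s\n", home);
    fclose(fp);
    remove("foo.txt");
    free(buf);
    return 0;
}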
As with strace, a variety of switches affect the behavior of ltrace. You can display the value of the program counter at each library call, which can be helpful in understanding your application's program flow. As with strace, you can use -c to accumulate and report count, error, and time statistics, making a useful simple profiling tool. Listing 13-8 displays the results of our simple example program using the -c option.
Listing 13-8. Profiling Using ltrace
$ ltrace -c ./example
$HOME = /home/chris
% time seconds usecs/call calls function
------ ----------- ----------- --------- ----------------
24.16 0.000231 231 1 printf
16.53 0.000158 158 1 fclose
16.00 0.000153 153 1 fopen
13.70 0.000131 131 1 malloc
10.67 0.000102 102 1 remove
9.31 0.000089 89 1 fprintf
3.35 0.000032 32 1 getenv
3.14 0.000030 30 1 free
3.14 0.000030 30 1 strncpy
------ ----------- ----------- --------- ----------------
100.00 0.000956 9 total
The ltrace tool is available only for programs that have been compiled to use dynamically linked shared library objects. This is the usual default, so unless you explicitly specify -static when compiling, you can use ltrace on the resulting binary. Again similar to strace, you must use an ltrace binary that has been compiled for your target architecture. These utilities are run on the target, not the host development system.
With the possible exception of strace and ltrace, no tools are more often neglected by the embedded systems developer than top and ps. Given the myriad options available for each utility, we could easily devote an entire chapter to these useful system-profiling tools. They are almost universally available in embedded Linux distributions.
Both of these utilities make use of the /proc file system, as described in Chapter 9, "File Systems." Much of the information they convey can be learned from the /proc file system if you know what to look for and how to parse the resulting information. These tools present that information in a convenient human-readable form.
The ps utility lists all the running processes on a machine. However, it is very flexible and can be tailored to provide much useful data on the state of a running machine and the processes running on it. For example, ps can display the scheduling policy of each process. This is particularly useful for systems that employ real-time processes.
Without any options, ps displays all processes with the same user ID as the user who invoked the command, and only those processes associated with the terminal on which the command was issued. This is useful when many jobs have been spawned by that user and terminal.
Passing options to ps can be confusing because ps supports a wide variety of standards (as in POSIX versus UNIX) and three distinct options styles: BSD, UNIX, and GNU. In general, BSD options are single or multiple letters, with no dash. UNIX options are the familiar dash-letter combinations, and GNU uses long argument formats preceded by double dashes. Refer to the man page for details of your ps implementation.
Everyone who uses ps likely has a favorite invocation. One particularly useful general-purpose invocation is ps aux. This displays every process on the system. Listing 13-9 is an example from a running embedded target board.
Listing 13-9. Process Listing
$ ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.8 1416 508 ? S 00:00 0:00 init [3]
root 2 0.0 0.0 0 0 ? S< 00:00 0:00 [ksoftirqd/0]
root 3 0.0 0.0 0 0 ? S< 00:00 0:00 [desched/0]
root 4 0.0 0.0 0 0 ? S< 00:00 0:00 [events/0]
root 5 0.0 0.0 0 0 ? S< 00:00 0:00 [khelper]
root 10 0.0 0.0 0 0 ? S< 00:00 0:00 [kthread]
root 21 0.0 0.0 0 0 ? S< 00:00 0:00 [kblockd/0]
root 62 0.0 0.0 0 0 ? S 00:00 0:00 [pdflush]
root 63 0.0 0.0 0 0 ? S 00:00 0:00 [pdflush]
root 65 0.0 0.0 0 0 ? S< 00:00 0:00 [aio/0]
root 36 0.0 0.0 0 0 ? S 00:00 0:00 [kapmd]
root 64 0.0 0.0 0 0 ? S 00:00 0:00 [kswapd0]
root 617 0.0 0.0 0 0 ? S 00:00 0:00 [mtdblockd]
root 638 0.0 0.0 0 0 ? S 00:00 0:00 [rpciod]
bin 834 0.0 0.7 1568 444 ? Ss 00:00 0:00 /sbin/portmap
root 861 0.0 0.0 0 0 ? S 00:00 0:00 [lockd]
root 868 0.0 0.9 1488 596 ? Ss 00:00 0:00 /sbin/syslogd -r
root 876 0.0 0.7 1416 456 ? Ss 00:00 0:00 /sbin/klogd -x
root 884 0.0 1.1 1660 700 ? Ss 00:00 0:00 /usr/sbin/rpc.statd
root 896 0.0 0.9 1668 584 ? Ss 00:00 0:00 /usr/sbin/inetd
root 909 0.0 2.2 2412 1372 ? Ss+ 00:00 0:00 -bash
telnetd 953 0.3 1.1 1736 732 ? S 05:58 0:00 in.telnetd
root 954 0.2 2.1 2384 1348 pts/0 Ss 05:58 0:00 -bash
root 960 0.0 1.2 2312 772 pts/0 R+ 05:59 0:00 ps aux
This is but one of the many ways to view output data using ps. The columns are explained in the following text.
• The USER and process ID (PID) fields should be self-explanatory.
• The %CPU field expresses the percent of CPU utilization since the beginning of the process's lifetime; thus, CPU usage will virtually never add up to 100 percent.
• The %MEM field indicates the ratio of the process's resident memory footprint to the total available physical memory.
• The VSZ field is the virtual memory size of the process in kilobytes.
• RSS is resident set size and indicates the nonswapped physical memory that a process has used, also in kilobytes.
• TTY is the controlling terminal of the process.
Most of the processes in this example are not associated with a controlling terminal. The ps command that generated Listing 13-9 was issued from a Telnet session, which is indicated by the pts/0 terminal device.
The STAT field describes the state of the process at the time this snapshot was produced. Here, S means that the process is sleeping, waiting on an event of some type, often I/O. R means that the process is in a runnable state (that is, the scheduler is free to give it control of the CPU if nothing of a higher priority is waiting). The less-than sign (<) next to the state letter indicates that the process is running at a higher-than-normal priority.
The final column is the command name. Those listed in brackets are kernel threads. Many more symbols and options are available; refer to the man page for ps for complete details.
Whereas ps is a one-time snapshot of the current system, top takes periodic snapshots of the state of the system and its processes. Similar to ps, top has numerous command line and configuration options. It is interactive and can be reconfigured while operating to customize the display to your particular needs.
Entered without options, top displays all running processes in a fashion very similar to the ps aux command presented in Listing 13-9, updated every 3 seconds. Of course, this and many other aspects of top are user configurable. The first few lines of the top screen display system information, also updated every 3 seconds. This includes the system uptime, the number of users, information on the number of processes and their state, and much more.
Listing 13-10 shows top in its default configuration, resulting from executing top from the command line without parameters.
Listing 13-10. top
top - 06:23:14 up 6:23, 2 users, load average: 0.00, 0.00, 0.00
Tasks: 24 total, 1 running, 23 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0% us, 0.3% sy, 0.0% ni, 99.7% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 62060k total, 17292k used, 44768k free, 0k buffers
Swap: 0k total, 0k used, 0k free, 11840k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
978 root 16 0 1924 952 780 R 0.3 1.5 0:01.22 top
1 root 16 0 1416 508 452 S 0.0 0.8 0:00.47 init
2 root 5 -10 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
3 root 5 -10 0 0 0 S 0.0 0.0 0:00.00 desched/0
4 root -2 -5 0 0 0 S 0.0 0.0 0:00.00 events/0
5 root 10 -5 0 0 0 S 0.0 0.0 0:00.09 khelper
10 root 18 -5 0 0 0 S 0.0 0.0 0:00.00 kthread
21 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 kblockd/0
62 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pdflush
63 root 15 0 0 0 0 S 0.0 0.0 0:00.00 pdflush
65 root 19 -5 0 0 0 S 0.0 0.0 0:00.00 aio/0
36 root 25 0 0 0 0 S 0.0 0.0 0:00.00 kapmd
64 root 25 0 0 0 0 S 0.0 0.0 0:00.00 kswapd0
617 root 25 0 0 0 0 S 0.0 0.0 0:00.00 mtdblockd
638 root 15 0 0 0 0 S 0.0 0.0 0:00.34 rpciod
834 bin 15 0 1568 444 364 S 0.0 0.7 0:00.00 portmap
861 root 20 0 0 0 0 S 0.0 0.0 0:00.00 lockd
868 root 16 0 1488 596 504 S 0.0 1.0 0:00.11 syslogd
876 root 19 0 1416 456 396 S 0.0 0.7 0:00.00 klogd
884 root 18 0 1660 700 612 S 0.0 1.1 0:00.02 rpc.statd
896 root 16 0 1668 584 504 S 0.0 0.9 0:00.00 inetd
909 root 15 0 2412 1372 1092 S 0.0 2.2 0:00.34 bash
953 telnetd 16 0 1736 736 616 S 0.0 1.2 0:00.27 in.telnetd
954 root 15 0 2384 1348 1096 S 0.0 2.2 0:00.16 bash
The default columns from Listing 13-10 are the PID, the user, the process priority, the process nice value, the virtual memory used by the process, the resident memory footprint, the amount of shared memory used by the task, and other fields that are identical to those described in the previous ps example.
Space permits only a cursory introduction to these useful utilities. You are encouraged to spend an afternoon with the man pages for top and ps to explore the richness of their capabilities.
The mtrace package is a simple utility that analyzes and reports on calls to malloc(), realloc(), and free() in your application. It is easy to use and can potentially help spot trouble in your application. As with other userland tools we have been describing in this chapter, you must have the mtrace package configured and compiled for your architecture. mtrace is a malloc replacement library that is installed on your target. Your application enables it with a special function call. Your embedded Linux distribution should contain the mtrace package.
To demonstrate this utility, we created a simple program that creates dynamic data on a simple linked list. Each list item was dynamically generated, as was each data item we placed on the list. Listing 13-11 reproduces the simple list structure.
Listing 13-11. Simple Linear Linked List
struct blist_s {
struct blist_s *next;
char *data_item;
int item_size;
int index;
};
Each list item was dynamically created using malloc() as follows and subsequently placed at the end of the linked list:
struct blist_s *p = malloc(sizeof(struct blist_s));
Each variable-sized data item in the list was also dynamically generated and added to the list item before being placed at the end of the list. This way, every list item was created using two calls to malloc(), one for the list item itself, represented by struct blist_s just shown, and one for the variable data item. We then generated 10,000 records on the list containing variable string data, resulting in 20,000 calls to malloc().
To use mtrace, three conditions must be satisfied (a short sketch follows this list):
• A header file, mcheck.h, must be included in the source file.
• The application must call mtrace() to install the handlers.
• The environment variable MALLOC_TRACE must specify the name of a writeable file to which the trace data is written.
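A minimal sketch of the first two conditions might look like this; the allocation shown is only a placeholder for your application's real activity:

#include <stdlib.h>
#include <mcheck.h>

int main(void)
{
    mtrace();                 /* install the malloc/realloc/free trace handlers */

    char *p = malloc(16);     /* traced allocation   */
    free(p);                  /* traced deallocation */
    /* Any allocation without a matching free() shows up in the report */
    return 0;
}

The third condition is satisfied from the shell, for example by exporting MALLOC_TRACE=mtrace.log in the environment before running the program on the target.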
When these conditions are satisfied, each call to one of the traced functions generates a line in the raw trace file defined by MALLOC_TRACE. The trace data looks like this:
@ ./mt_ex:[0x80486ec] + 0x804a5f8 0x10
The @ sign signals that the trace line contains an address or function name. In the previous example, the program was executing at the address in square brackets, 0x80486ec. Using binary utilities or a debugger, we could easily associate this address with a function. The plus sign (+) indicates that this is a call to allocate memory. A call to free() would be indicated by a minus sign. The next field indicates the virtual address of the memory location being allocated or freed. The last field is the size, which is included in every call to allocate memory.
This data format is not very user friendly. For this reason, the mtrace package includes a utility[86] that analyzes the raw trace data and reports on any inconsistencies. In the simplest case, the Perl script simply prints a single line with the message "No memory leaks". Listing 13-12 contains the output when memory leaks are detected.
Listing 13-12. mtrace Error Report
$ mtrace ./mt_ex mtrace.log
Memory not freed:
-----------------
Address Size Caller
0x0804aa70 0x0a at /home/chris/temp/mt_ex.c:64
0x0804abc0 0x10 at /home/chris/temp/mt_ex.c:26
0x0804ac60 0x10 at /home/chris/temp/mt_ex.c:26
0x0804acc8 0x0a at /home/chris/temp/mt_ex.c:64
As you can see, this simple tool can help you spot trouble before it happens, as well as find it when it does. Notice that the Perl script has displayed the filename and line number of each call to malloc() that does not have a corresponding call to free() for the given memory location. This requires debugging information in the executable file generated by passing the -g flag to the compiler. If no debugging information is found, the script simply reports the address of the function calling malloc().
dmalloc picks up where mtrace leaves off. The mtrace package is a simple, relatively nonintrusive package most useful for simple detection of malloc/free unbalance conditions. The dmalloc package enables the detection of a much wider range of dynamic memory-management errors. Compared to mtrace, dmalloc is highly intrusive. Depending on the configuration, dmalloc can slow your application to a crawl. It is definitely not the right tool if you suspect memory errors due to race conditions or other timing issues. dmalloc (and mtrace, to a lesser extent) will definitely change the timing of your application.
dmalloc is a very powerful dynamic memory-analysis tool. It is highly configurable and, therefore, somewhat complex. It takes some time to learn and master this tool. However, from QA testing to bug squashing, it could become one of your favorite development tools.
dmalloc is a debug malloc library replacement. These conditions must be satisfied to use dmalloc:
• Application code must include the dmalloc.h header file.
• The application must be linked against the dmalloc library.
• The dmalloc library and utility must be installed on your embedded target.
• Certain environment variables that the dmalloc library references must be defined before running your application on the target.
Although it is not strictly necessary, you should include dmalloc.h in your application program. This allows dmalloc to include file and line number information in the output.
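In practice, the include is commonly wrapped in a conditional so that the same source file builds with or without the debug library. This convention, using a DMALLOC preprocessor symbol, is a common usage pattern rather than a requirement of the package:

#include <stdlib.h>
/* other includes ... */

#ifdef DMALLOC
#include "dmalloc.h"    /* include after the other headers so dmalloc can wrap the allocation calls */
#endif

Building with -DDMALLOC then enables the dmalloc-aware build, while omitting the flag produces an ordinary binary.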
Link your application against the dmalloc library of your choice. The dmalloc package can be configured to generate several different libraries, depending on your selections during package configuration. In the examples to follow, we have chosen to use the libdmalloc.so shared library object. Place the library (or a symlink to it) in a path where your compiler can find it. The command to compile your application might look something like this:
$ ppc_82xx-gcc -g -Wall -o mtest_ex -L../dmalloc-5.4.2/ \
  -ldmalloc mtest_ex.c
This command line assumes that you've placed the dmalloc library (libdmalloc.so) in a location searched by the -L switch on the command line; namely, the ../dmalloc-5.4.2 directory just above the current directory.
To install the dmalloc library on your target, place it in your favorite location (perhaps /usr/local/lib). You might need to configure your system to find this library. On our example PowerPC system, we added the path /usr/local/lib to the /etc/ld.so.conf file and invoked the ldconfig utility to update the library search cache.
The last step in preparation is to set an environment variable that the dmalloc library uses to determine the level of debugging that will be enabled. The environment variable contains a debug bit mask that concatenates a number of features into a single convenient variable. Yours might look something like this:
DMALLOC_OPTIONS=debug=0x4f4ed03,inter=100,log=dmalloc.log
Here, debug is the debug-level bit mask, and inter sets an interval count at which the dmalloc library performs extensive checks on itself and the heap. The dmalloc library writes its log output to the file indicated by the log variable.
The dmalloc package comes with a utility to generate the DMALLOC_OPTIONS environment variable based on flags passed to it. The previous example was generated with the following dmalloc invocation. The documentation in the dmalloc package details this quite thoroughly, so we shall not reproduce that here.
$ dmalloc -p check-fence -l dmalloc.log -i 100 high
When these steps are complete, you should be able to run your application against the dmalloc debug library.
dmalloc produces a quite detailed output log. Listing 13-13 reproduces a sample dmalloc log output for an example program that intentionally generates some memory leaks.
Listing 13-13. dmalloc Log Output
2592: 4002: Dmalloc version '5.4.2' from 'http://dmalloc.com/'
2592: 4002: flags = 0x4f4e503, logfile 'dmalloc.log'
2592: 4002: interval = 100, addr = 0, seen # = 0, limit = 0
2592: 4002: starting time = 2592
2592: 4002: process pid = 442
2592: 4002: Dumping Chunk Statistics:
2592: 4002: basic-block 4096 bytes, alignment 8 bytes
2592: 4002: heap address range: 0x30015000 to 0x3004f000, 237568 bytes
2592: 4002: user blocks: 18 blocks, 73652 bytes (38%)
2592: 4002: admin blocks: 29 blocks, 118784 bytes (61%)
2592: 4002: total blocks: 47 blocks, 192512 bytes
2592: 4002: heap checked 41
2592: 4002: alloc calls: malloc 2003, calloc 0, realloc 0, free 1999
2592: 4002: alloc calls: recalloc 0, memalign 0, valloc 0
2592: 4002: alloc calls: new 0, delete 0
2592: 4002: current memory in use: 52 bytes (4 pnts)
2592: 4002: total memory allocated: 27546 bytes (2003 pnts)
2592: 4002: max in use at one time: 27546 bytes (2003 pnts)
2592: 4002: max alloced with 1 call: 376 bytes
2592: 4002: max unused memory space: 37542 bytes (57%)
2592: 4002: top 10 allocations:
2592: 4002: total-size count in-use-size count source
2592: 4002: 16000 1000 32 2 mtest_ex.c:36
2592: 4002: 10890 1000 20 2 mtest_ex.c:74
2592: 4002: 256 1 0 0 mtest_ex.c:154
2592: 4002: 27146 2001 52 4 Total of 3
2592: 4002: Dumping Not-Freed Pointers Changed Since Start:
2592: 4002: not freed: '0x300204e8|s1' (10 bytes) from 'mtest_ex.c:74'
2592: 4002: not freed: '0x30020588|s1' (16 bytes) from 'mtest_ex.c:36'
2592: 4002: not freed: '0x30020688|s1' (16 bytes) from 'mtest_ex.c:36'
2592: 4002: not freed: '0x300208a8|s1' (10 bytes) from 'mtest_ex.c:74'
2592: 4002: total-size count source
2592: 4002: 32 2 mtest_ex.c:36
2592: 4002: 20 2 mtest_ex.c:74
2592: 4002: 52 4 Total of 2
2592: 4002: ending time = 2592, elapsed since start = 0:00:00
It is important to note that this log is generated upon program exit. (dmalloc has many options and modes of operation; it is possible to configure dmalloc to print output lines when errors are detected.)
The first half of the output log reports high-level statistics about the heap and the overall memory usage of the application. Totals are produced for each of the malloc library calls, such as malloc(), free(), and realloc(). Interestingly, this default log reports on the top 10 allocations and the source location where they occurred. This can be very useful for overall system-level profiling.
Toward the end of the log, we see evidence of memory leaks in our application. You can see that the dmalloc library detected four instances of memory that was allocated that was apparently never freed. Because we included dmalloc.h and compiled with debug symbols, the source location where the memory was allocated is indicated in the log.
As with the other tools we've covered in this chapter, space permits only a brief introduction of this very powerful debug tool. dmalloc can detect many other conditions and limits. For example, dmalloc can detect when a freed pointer has been written. It can tell whether a pointer was used to access data outside its bounds but within the application's permissible address range. In fact, dmalloc can be configured to log almost any memory transaction through the malloc family of calls. dmalloc is a tool that is sure to pay back many times the effort taken to become proficient with it.
Although not strictly a tool, a kernel oops contains much useful information to help you troubleshoot the cause. A kernel oops results from a variety of kernel errors from simple memory errors produced by a process (fully recoverable, in most cases) to a hard kernel panic. Recent Linux kernels support display of symbolic information in addition to the raw hexadecimal address values. Listing 13-14 reproduces a kernel oops from a PowerPC target.
Listing 13-14. Kernel Oops
$ modprobe loop
Oops: kernel access of bad area, sig: 11 [#1]
NIP: C000D058 LR: C0085650 SP: C7787E80 REGS: c7787dd0 TRAP: 0300 Not tainted
MSR: 00009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
DAR: 00000000, DSISR: 22000000
TASK = c7d187b0[323] 'modprobe' THREAD: c7786000
Last syscall: 128
GPR00: 0000006C C7787E80 C7D187B0 00000000 C7CD25CC FFFFFFFF 00000000 80808081
GPR08: 00000001 C034AD80 C036D41C C034AD80 C0335AB0 1001E3C0 00000000 00000000
GPR16: 00000000 00000000 00000000 100170D8 100013E0 C9040000 C903DFD8 C9040000
GPR24: 00000000 C9040000 C9040000 00000940 C778A000 C7CD25C0 C7CD25C0 C7CD25CC
NIP [c000d058] strcpy+0x10/0x1c
LR [c0085650] register_disk+0xec/0xf0
Call trace:
[c00e170c] add_disk+0x58/0x74
[c90061e0] loop_init+0x1e0/0x430 [loop]
[c002fc90] sys_init_module+0x1f4/0x2e0
[c00040a0] ret_from_syscall+0x0/0x44
Segmentation fault
Notice that the register dump includes symbolic information, where appropriate. Your kernel must have KALLSYMS enabled for this symbolic information to be available. Figure 13-4 shows the configuration options under the General Setup main menu.
Figure 13-4. Symbol support for oops
Much of the information in a kernel oops message is directly related to the processor. Having some knowledge of the underlying architecture is necessary to fully understand the oops message.
Analyzing the oops in Listing 13-14, we see right away that the oops was generated due to a "kernel access of bad area, sig: 11". We already know from previous examples in this chapter that signal 11 is a segmentation fault.
The first section is a summary showing the reason for the oops, a few important pointers, and the offending task. In Listing 13-14, NIP is the next instruction pointer, which is decoded later in the oops message. This points to the offending code that led to the oops. LR is a PowerPC register and usually indicates the return address for the currently executing subroutine. SP is the stack pointer. REGS indicates the kernel address for the data structure containing the register dump data, and TRAP indicates the type of exception that this oops message relates to. Referring to the PowerPC architecture reference manual cited at the end of Chapter 7, "Bootloaders," we see that a TRAP 0300 is the PowerPC Data Storage Interrupt, which is triggered by a data memory access error.
On the third line of the oops message, we see additional PowerPC machine registers, such as MSR (machine state register) and a decode of some of its bits. On the next line, we see the DAR (data access register), which often contains the offending memory address. The DSISR register contents can be used in conjunction with the PowerPC architecture reference to discover much detail about the specific reason for the exception.
An oops message also contains the task pointer and the decoded task name to quickly determine what task or thread was running at the time of the oops. We also see a detailed processor register dump, which can be used for additional clues. Again, we need knowledge of the architecture and compiler register usage to make sense of the clues from the register values. For example, the PowerPC architecture uses the r3 register for return values from C functions.
The last part of the oops message provides a stack backtrace with symbol decode if symbols are enabled in the kernel. Using this information, we can construct a sequence of events that led to the offending condition.
In this simple example, we have learned a great deal of information from this oops message. We know that it was a PowerPC Data Storage Exception, caused by an error in a data memory access (as opposed to an instruction fetch memory access). The DAR register tells us that the data address that generated this exception was 0x0000_0000. We know that the modprobe process produced the error. From the backtrace and NIP (next instruction pointer), we know that it was in a call to strcpy() that can be traced directly back to the loop_init() function in the loop.ko module, which modprobe was trying to insert at the time of the exception. Given this information, tracking down the source of this errant null pointer dereference should be quite trivial.
Binary utilities, or binutils, are a critical component of any toolchain. Indeed, to build a compiler, you must first have successfully built binutils. In this section, we briefly introduce the more useful tools that the embedded developer needs to know about. As with most of the other tools in this chapter, these are cross-utilities and must be built to execute on your development host while operating on binary files targeted to your chosen architecture. Alternatively, you could compile or obtain versions of these to run on your target, but we assume a cross-development environment for these examples.
The readelf utility examines the composition of your target ELF binary file. This is particularly useful for building images targeted for ROM or Flash memory where explicit control of the image layout is required. It is also a great tool for learning how your toolchain builds images and for understanding the ELF file format.
For example, to display the symbol table in an ELF image, use this command:
$ readelf -s <elf-image>
To discover and display all the sections in your ELF image, use this command:
$ readelf -e <elf-image>
Use the -S flag to list the section headers in your ELF image. You might be surprised to learn that even a simple seven-line "hello world" program contains 38 separate sections. Some of them will be familiar to you, such as the .text and .data sections. Listing 13-15 contains a partial listing of sections from our "hello world" example. For simplicity, we have listed only those sections that are likely to be familiar or relevant to the embedded developer.
Listing 13-15. readelf Section Headers
$ ppc_82xx-readelf -S hello-ex
There are 38 section headers, starting at offset 0x32f4:
Section Headers:
[ Nr] Name Type Addr Off Size ES Flg Lk Inf Al
...
[11] .text PROGBITS 100002f0 0002f0 000568 00 AX 0 0 4
...
[13] .rodata PROGBITS 10000878 000878 000068 00 A 0 0 4
...
[15] .data PROGBITS 100108e0 0008e0 00000c 00 WA 0 0 4
...
[22] .sdata PROGBITS 100109e0 0009e0 00001c 00 WA 0 0 4
[23] .sbss NOBITS 100109fc 0009fc 000000 00 WA 0 0 1
...
[25] .bss NOBITS 10010a74 0009fc 00001c 00 WA 0 0 4
...
The .text section contains the executable program code. The .rodata section contains constant data in your program. The .data section generally contains initialized global data used by the C library prologue code and can contain large initialized data items from your application. The .sdata section is used for smaller initialized global data items and exists only on some architectures. Some processor architectures can make use of optimized data access when the attributes of the memory area are known; the .sdata and .sbss sections enable these optimizations. The .bss and .sbss sections contain uninitialized data in your program. These sections occupy no space in the program image; their memory space is allocated and initialized to zero at program startup by C library prologue code.
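You can verify this behavior using the section headers from Listing 13-15. A section of type NOBITS has a size, but it contributes nothing to the file itself. This quick check uses the same (hypothetical) hello-ex binary:
$ ppc_82xx-readelf -S hello-ex | grep bss
The .sbss and .bss entries are reported as NOBITS, in contrast to the PROGBITS type of the .data and .sdata sections, which do occupy space in the image.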
We can dump any of these sections and display the contents. Given the following line, declared outside of any function in your C program, we can examine how the string it references is placed in the .rodata section:
char *hello_rodata = "This is a read-only data string\n";
Issue the readelf command, specifying the number of the section you want to dump from Listing 13-15:
$ ppc_82xx-readelf -x 13 hello-ex
Hex dump of section '.rodata':
0x10000878 100189e0 10000488 1000050c 1000058c ................
0x10000888 00020001 54686973 20697320 61207265 ....This is a re
0x10000898 61642d6f 6e6c7920 64617461 20737472 ad-only data str
0x100008a8 696e670a 00000000 54686973 20697320 ing.....This is
0x100008b8 73746174 69632064 6174610a 00000000 static data.....
0x100008c8 48656c6c 6f20456d 62656464 65640a00 Hello Embedded..
0x100008d8 25730a00 25780a00 %s..%x..
We see that the string referenced by the initialized global pointer we declared is placed in the .rodata section, together with all the other constant strings defined in the program.
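Note that the pointer variable itself is not stored in .rodata; only the string it references is. To see where the pointer lands, dump one of the writable data sections in the same manner. This is a sketch; whether the pointer is placed in .data or .sdata (sections 15 and 22 in Listing 13-15) depends on your toolchain's small-data settings:
$ ppc_82xx-readelf -x 22 hello-ex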
One of the more useful features of readelf is its ability to display the debug information contained in an ELF file. When the -g flag is passed to the compiler, it generates debug information in a series of sections within the resulting ELF file. We can use readelf to display the headers of these sections:
$ ppc-linux-readelf -S ex_sync | grep debug
[28] .debug_aranges PROGBITS 00000000 000c38 0000b8 00 0 0 8
[29] .debug_pubnames PROGBITS 00000000 000cf0 00007a 00 0 0 1
[30] .debug_info PROGBITS 00000000 000d6a 00079b 00 0 0 1
[31] .debug_abbrev PROGBITS 00000000 001505 000207 00 0 0 1
[32] .debug_line PROGBITS 00000000 00170c 000354 00 0 0 1
[33] .debug_frame PROGBITS 00000000 001a60 000080 00 0 0 4
[34] .debug_str PROGBITS 00000000 001ae0 00014d 00 0 0 1
Using readelf with the --debug-dump option, we can display the contents of any one of these .debug_* sections. You will see how this information can be useful in Chapter 14, "Kernel Debugging Techniques," when we discuss the challenge of debugging optimized kernel code.
Debug information can be very large. Displaying all the debug information in the Linux kernel ELF file vmlinux produces more than six million lines of output. However daunting it might appear, at least a passing familiarity with debug information will make you a better embedded engineer.
Listing 13-16 is a partial listing of the contents of the .debug_info section from a small example application. For space considerations, we have shown only a few records.
Listing 13-16. Partial Debug Info Dump
$ ppc-linux-readelf --debug-dump=info ex_sync
1 The section .debug_info contains:
2
3 Compilation Unit @ 0:
4 Length: 109
5 Version: 2
6 Abbrev Offset: 0
7 Pointer Size: 4
8 <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
9 DW_AT_stmt_list : 0
10 DW_AT_low_pc : 0x10000368
11 DW_AT_high_pc : 0x1000038c
12 DW_AT_name : ../sysdeps/powerpc/powerpc32/elf/start.S
13 DW_AT_comp_dir : /var/tmp/BUILD/glibc-2.3.3/csu
14 DW_AT_producer : GNU AS 2.15.94
15 DW_AT_language : 32769 (MIPS assembler)
...
394 <1><5a1>: Abbrev Number: 14 (DW_TAG_subprogram)
395 DW_AT_sibling : <5fa>
396 DW_AT_external : 1
397 DW_AT_name : main
398 DW_AT_decl_file : 1
399 DW_AT_decl_line : 9
400 DW_AT_prototyped : 1
401 DW_AT_type : <248>
402 DW_AT_low_pc : 0x100004b8
403 DW_AT_high_pc : 0x10000570
404 DW_AT_frame_base : 1 byte block: 6f (DW_OP_reg31)
...
423 <2><5e9>: Abbrev Number: 16 (DW_TAG_variable)
424 DW_AT_name : mybuf
425 DW_AT_decl_file : 1
426 DW_AT_decl_line : 11
427 DW_AT_type : <600>
428 DW_AT_location : 2 byte block: 91 20 (DW_OP_fbreg: 32)
...
The first record, identified by the Dwarf2[87] tag DW_TAG_compile_unit, describes the first compilation unit of this PowerPC executable: a file called start.S, which provides the startup prologue for a C program. The next record, identified by DW_TAG_subprogram, marks the start of the user program, the familiar function main(). This Dwarf2 debug record contains a reference to the file and line number where main() is found. The final record in Listing 13-16 identifies a local variable in the main() routine called mybuf. Again, the line number and file are provided by this record. You can deduce from this information that main() is declared at line 9 and mybuf at line 11 of the source file. Other debug records in the ELF file correlate the DW_AT_decl_file attribute with the actual filename.
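The line-number information that maps program addresses back to source lines lives in the .debug_line section, and readelf can dump it in the same manner. The output is lengthy and is omitted here:
$ ppc-linux-readelf --debug-dump=line ex_sync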
You can discover all the details of the Dwarf2 debug information format via the reference given in Section 13.7.1 at the end of this chapter.
The objdump utility has considerable overlap with the readelf tool. However, one of the more useful features of objdump is its capability to display disassembled object code. Listing 13-17 provides an example of disassembly of the .text section of the simple "hello world" PowerPC version. We include only the main() routine, to save space. The entire dump, including C library prologue and epilogue, would consume many pages.
Listing 13-17. Disassembly Using objdump
$ ppc_82xx-objdump -S -m powerpc:common -j .text hello
...
10000488 <main>:
10000488: 94 21 ff e0 stwu r1,-32(r1)
1000048c: 7c 08 02 a6 mflr r0
10000490: 93 e1 00 1c stw r31,28(r1)
10000494: 90 01 00 24 stw r0,36(r1)
10000498: 7c 3f 0b 78 mr r31,r1
1000049c: 90 7f 00 08 stw r3,8(r31)
100004a0: 90 9f 00 0c stw r4,12(r31)
100004a4: 3d 20 10 00 lis r9,4096
100004a8: 38 69 08 54 addi r3,r9,2132
100004ac: 4c c6 31 82 crclr 4*cr1+eq
100004b0: 48 01 05 11 bl 100109c0 <__bss_start+0x60>
100004b4: 38 00 00 00 li r0,0
100004b8: 7c 03 03 78 mr r3,r0
100004bc: 81 61 00 00 lwz r11,0(r1)
100004c0: 80 0b 00 04 lwz r0,4(r11)
100004c4: 7c 08 03 a6 mtlr r0
100004c8: 83 eb ff fc lwz r31,-4(r11)
100004cc: 7d 61 5b 78 mr r1,r11
100004d0: 4e 80 00 20 blr
...
Much of the code from the simple main() routine is stack frame creation and destruction. The actual call to printf() is represented by the branch link (bl) instruction near the center of the listing at address 0x100004b0. This is a PowerPC function call. Because this program was compiled as a dynamically linked object, we will not have an address for the printf() function until runtime, when it is linked with the shared library printf() routine. Had we compiled this as a statically linked object, we would see the symbol and corresponding address for the call to printf().
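If you want to see a resolved address for printf() in the disassembly, one option is to rebuild the example as a statically linked binary and disassemble it again. This is a minimal sketch; the compiler name ppc_82xx-gcc and the file names are assumptions chosen to match the toolchain prefix used in this chapter:
$ ppc_82xx-gcc -static -o hello-static hello.c
$ ppc_82xx-objdump -d hello-static | grep '<printf>'
The grep output should include the printf label itself as well as the bl instruction in main() that branches to it, now with a fixed link-time address.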
The objcopy utility copies the contents of an object file and, optionally, converts it from one object file format to another. This utility is quite useful for generating code for ROM or Flash resident images. The U-Boot bootloader introduced in Chapter 7 makes use of objcopy to produce binary and s-record[88] output formats from the final ELF file. The following example illustrates how objcopy is used to build a Flash image.
$ ppc_82xx-objcopy --gap-fill=0xff -O binary u-boot u-boot.bin
This objcopy invocation shows how an image might be prepared for Flash memory. The input file (u-boot, in this example) is the complete ELF U-Boot image, including symbols and relocation information. The objcopy utility takes only the relevant sections containing program code and data and places the image in the output file, specified here as u-boot.bin.
Flash memory contains all ones in its erased state. Therefore, filling gaps in a binary image with all ones improves programming efficiency and prolongs the life of the Flash memory, which supports only a limited number of write cycles. This is done with the --gap-fill parameter to objcopy.
This is but one simple example usage of objcopy. This utility can be used to generate s-records and convert from one format to another. See the man page for complete details.
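For example, producing an S-record file from the same ELF image requires only a different output target:
$ ppc_82xx-objcopy -O srec u-boot u-boot.srec
The resulting u-boot.srec file can be fed to the many device programmers and download utilities that expect S-record input.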
Your toolchain contains several other useful utilities. They are straightforward to learn, and you will find many uses for them.
The strip utility can be used to remove symbols and debug information from a binary. This is frequently used to save space on an embedded device. In the cross-development model, it is convenient to place stripped binaries on the target system and leave the unstripped version on your development host. Using this method, symbols are available for cross-debugging on your development host while saving space on the target. strip has many options, which are described in the man page.
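A minimal sketch of this workflow follows; the staging directory rootfs/ is a hypothetical copy of your target's root file system, and the -o option writes the stripped copy to a new file so that the original, symbol-rich binary remains untouched on the host:
$ xscale_be-strip -o rootfs/sbin/init init
$ ls -l init rootfs/sbin/init
Comparing the two file sizes gives a quick sense of how much space the symbols and debug information consume.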
When we highlighted mtrace in Listing 13-12, you saw that the output from the mtrace analysis script contained file and line number information. The mtrace Perl script uses the addr2line utility to read the debug information contained in the executable ELF file and display the line number corresponding to a given address. Using the same mtrace example executable, we can find the filename and line number for a virtual address:
$ addr2line -f -e mt_ex 0x80487c6
put_data
/home/chris/examples/mt_ex.c:64
Notice that the function put_data() is listed together with the file and line number. This tells us that the address 0x80487c6 falls on line 64 of the mt_ex.c file, inside the put_data() function. addr2line is even more useful with larger binaries built from many source files, such as the Linux kernel:
$ ppc_82xx-addr2line -f -e vmlinux c000d95c
mpc52xx_restart
arch/ppc/syslib/mpc52xx_setup.c:41
This particular example highlights one of the points repeated throughout this chapter: This is an architecture-specific tool. You must use a tool configured and compiled to match the architecture of the target binary that you are using. As with the cross-compiler, addr2line is a cross-tool and part of the binary utilities package.
The strings utility extracts and displays ASCII string data from binary files. This is especially useful for examining memory dumps when source code or debug symbols might not be available. You can often narrow down the cause of a crash by tracing strings found in a dump back to the offending binary. Although strings does have a few command line options, it is easy to learn and use. See the man page for further details.
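For example, if you have a raw memory dump from a crashed system, a quick scan for recognizable text can often tell you which program or library was involved. The file name and search pattern here are hypothetical:
$ strings memory-dump.bin | grep -i version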
Although not strictly a binary utility, the ldd script is another useful tool for the embedded developer. It is part of the C library package and exists on virtually every Linux distribution. ldd lists the shared object library dependencies of a given object file or files. We introduced ldd in Chapter 11, "BusyBox." See Listing 11-2 for an example usage. The ldd script is particularly useful during development of ramdisk images. One of the most frequently asked-about failures on the various embedded Linux mailing lists is a kernel panic just after the root file system has been mounted:
VFS: Mounted root (nfs filesystem).
Freeing unused kernel memory: 96k init
Kernel panic - not syncing: No init found. Try passing init= option to kernel.
One of the most common causes is that the root file system image (be it ramdisk, Flash, or NFS root file system) does not have the supporting libraries for the binaries that the kernel is trying to execute. Using ldd, you can determine which libraries each of your binaries requires and make sure that you include them in your ramdisk or other root file system image. In the previous example kernel panic, init was indeed on the file system, but the Linux dynamic loader, ld.so.1, was missing. Using ldd is quite straightforward:
$ xscale_be-ldd init
libc.so.6 => /opt/mvl/.../lib/libc.so.6 (0xdead1000)
ld-linux.so.3 => /opt/mvl/.../lib/ld-linux.so.3 (0xdead2000)
This simple example demonstrates that the init binary requires two dynamic library objects: libc and ld-linux. Both must be on your target and must be accessible to your init binary; that is, they must be readable and executable.
The nm utility displays the symbols in an object file. This can be useful for a variety of tasks. For example, when cross-compiling a large application, you might encounter unresolved symbols. You can use nm to find which object module defines those symbols and then modify your build environment to include it.
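For example, to find which of your object files defines a symbol that the linker reports as unresolved, run nm across the objects with the -A flag, which prefixes each output line with the file name. The symbol name here is hypothetical:
$ ppc_82xx-nm -A *.o | grep 'T my_missing_function'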
The nm utility provides attributes for each symbol. For example, you can discover whether this symbol is local or global, or whether it is defined or referenced only in a particular object module. Listing 13-18 reproduces several lines from the output of nm run on the U-Boot ELF image u-boot.
Listing 13-18. Displaying Symbols Using nm
$ ppc_85xx-nm u-boot
...
fff23140 b base_address
fff24c98 B BootFile
fff06d64 T BootpRequest
fff00118 t boot_warm
fff21010 d border
fff23000 A __bss_start
...
Notice the link addresses of these U-Boot symbols. They were linked for a Flash device that lives in the highest portion of the memory map on this particular board. This listing contains only a few example symbols, for discussion purposes. The middle column is the symbol type. A capitalized letter indicates a global symbol, and lower case indicates a local symbol. B indicates that the symbol is located in the .bss section. T indicates that the symbol is located in the .text section. D indicates that the symbol is located in the .data section. A indicates that this address is absolute and is not subject to modification by an additional link stage. This absolute symbol indicates the start of the .bss section and is used by the code that clears the .bss on startup, as required for a C execution environment.
The prelink utility is often used on systems in which startup time is important. A dynamically linked ELF executable must be linked at runtime when the program is first loaded, and this can take significant time in a large application. prelink prepares the shared libraries, and the executables that depend on them, so that these library references are resolved ahead of time. In effect, this can reduce the startup time of a given application. The man page has complete details on the use of this handy utility.
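prelink is normally run on the target itself (or against a copy of the target root file system with a prelink built for cross use) and typically requires root privileges. The following is offered only as a hedged sketch of a commonly used invocation, which processes the executables and libraries configured on the system, verbosely, while conserving memory and randomizing library base addresses:
$ prelink -avmR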
• The GNU Debugger (GDB) is a complex and powerful debugger with many capabilities. We presented the basics to get you started.
• The DDD graphical front end for GDB integrates source code and data display with the power of GDB command line interface capabilities.
• cbrowser is a useful aid for understanding large projects. It uses the cscope database to rapidly find and display symbols and other elements of C source code.
• Linux is supported by many profiling and trace tools. We presented several, including strace, ltrace, top, and ps, and the memory profilers mtrace and dmalloc.
• Embedded developers often need to build custom images such as those required for bootloaders and firmware images. For these tasks, knowledge of binutils is indispensable. We presented many of the utilities found in binutils, including readelf, objdump, objcopy, and several others.
GDB: The GNU Project Debugger:
www.gnu.org/software/gdb/gdb.html
GDB Pocket Reference
Arnold Robbins
O'Reilly Media, 2005
Data Display Debugger:
www.gnu.org/software/ddd/
cbrowser home page:
http://cbrowser.sourceforge.net/
cscope home page:
http://cscope.sourceforge.net/index.html
dmalloc (Debug Malloc Library):
http://dmalloc.com/
Tool Interface Standard (TIS) Executable and Linking Format (ELF) Specification
Version 1.2
TIS Committee, May 1995
Tool Interface Standard (TIS) DWARF Debugging Information Format Specification
Version 2.0
TIS Committee, May 1995
See SIG_KERNEL_COREDUMP_MASK in .../kernel/signal.c for a definition of which signals generate a core dump.
Signals and their associated numbers are defined in .../asm-/signal.h in your Linux kernel source tree.
Actually, support for the underlying engine that cbrowser uses is in the Linux build system.
See man ldconfig for details on creating a linker cache for your target system.
Sometimes an all-zeros address is appropriate in this context. However, we are investigating why the program bailed abnormally, so we should consider this suspect.
See man hosts for details of this system administration file.
The analysis utility is a Perl script supplied with the mtrace package.
A reference for the Dwarf2 Debug Information Specification is provided at the end of this chapter.
S-record files are an ASCII representation of a binary file, used by many device programmers and software binary utilities.