When power is applied to an embedded Linux system, a complex sequence of events begins. After a few seconds, the Linux kernel is operational and has spawned a series of application programs as specified by the system init scripts. A significant portion of this activity is governed by system configuration and is under the control of the embedded developer.
This chapter examines the initial sequence of events in the Linux kernel. We take a detailed look at the mechanisms and processes used during kernel initialization. We describe the Linux kernel command line and its use to customize the Linux environment on startup. With this knowledge, you will be able to customize and control the initialization sequence to meet the requirements of your particular embedded system.
At power-on, the bootloader in an embedded system is first to get processor control. After the bootloader has performed some low-level hardware initialization, control is passed to the Linux kernel. This can be a manual sequence of events to facilitate the development process (for example, the user types interactive load/boot commands at the bootloader prompt), or an automated startup sequence typical of a production environment. We have dedicated Chapter 7, "Bootloaders," to this subject, so we defer any detailed bootloader discussion to that chapter.
In Chapter 4, "The Linux Kernel: A Different Perspective," we examined the components that make up the Linux kernel image. Recall that one of the common files built for every architecture is the ELF binary named vmlinux. This binary file is the monolithic kernel itself, or what we have been calling the kernel proper. In fact, when we looked at the link stage that constructs vmlinux, we pointed out where to look for the first line of code to be executed. In most architectures, it is found in an assembly language source file called head.S or similar. In the PowerPC (ppc) branch of the kernel, several versions of head.S are present, depending on the processor. For example, the AMCC 440 series processors are initialized from a file called head_44x.S.
Some architectures and bootloaders are capable of directly booting the vmlinux kernel image. For example, platforms based on PowerPC architecture and the U-Boot bootloader can usually boot the vmlinux image directly[36] (after conversion from ELF to binary, as you will shortly see). In other combinations of architecture and bootloader, additional functionality might be needed to set up the proper context and provide the necessary utilities for loading and booting the kernel.
Listing 5-1 details the final sequence of steps in the kernel build process for a hardware platform based on the ADI Engineering Coyote Reference Platform, which contains an Intel IXP425 network processor. This listing uses the quiet form of output from the kernel build system, which is the default. As pointed out in Chapter 4, it is a useful shorthand notation, allowing more focus on errors and warnings during the build.
Listing 5-1. Final Kernel Build Sequence: ARM/IXP425 (Coyote)
$ make ARCH=arm CROSS_COMPILE=xscale_be- zImage
... < many build steps omitted for clarity>
LD vmlinux
SYSMAP System.map
OBJCOPY arch/arm/boot/Image
Kernel: arch/arm/boot/Image is ready
AS arch/arm/boot/compressed/head.o
GZIP arch/arm/boot/compressed/piggy.gz
AS arch/arm/boot/compressed/piggy.o
CC arch/arm/boot/compressed/misc.o
AS arch/arm/boot/compressed/head-xscale.o
AS arch/arm/boot/compressed/big-endian.o
LD arch/arm/boot/compressed/vmlinux
OBJCOPY arch/arm/boot/zImage
Kernel: arch/arm/boot/zImage is ready
Building modules, stage 2.
...
In the third line of Listing 5-1, the vmlinux image (the kernel proper) is linked. Following that, a number of additional object modules are processed. These include head.o, piggy.o,[37] and the architecture-specific head-xscale.o, among others. (The tags identify what is happening on each line. For example, AS indicates that the assembler is invoked, GZIP indicates compression, and so on.) In general, these object modules are specific to a given architecture (ARM/XScale, in this example) and contain low-level utility routines needed to boot the kernel on this particular architecture. Table 5-1 details the components from Listing 5-1.
Table 5-1. ARM/XScale Low-Level Architecture Objects
Component | Function/Description |
---|---|
vmlinux | Kernel proper, in ELF format, including symbols, comments, debug info (if compiled with -g) and architecture-generic components. |
System.map | Text-based kernel symbol table for vmlinux module. |
Image | Binary kernel module, stripped of symbols, notes, and comments. |
head.o | ARM-specific startup code generic to ARM processors. It is this object that is passed control by the bootloader. |
piggy.gz | The file Image compressed with gzip. |
piggy.o | The file piggy.gz in assembly language format so it can be linked with a subsequent object, misc.o (see the text). |
misc.o | Routines used for decompressing the kernel image (piggy.gz), and the source of the familiar boot message: "Uncompressing Linux … Done" on some architectures. |
head-xscale.o | Processor initialization specific to the XScale processor family. |
big-endian.o | Tiny assembly language routine to switch the XScale processor into big-endian mode. |
vmlinux | Composite kernel image. Note this is an unfortunate choice of names, because it duplicates the name for the kernel proper; the two are not the same. This binary image is the result when the kernel proper is linked with the objects in this table. See the text for an explanation. |
zImage | Final composite kernel image loaded by bootloader. See the following text. |
An illustration will help you understand this structure and the following discussion. Figure 5-1 shows the image components and their metamorphosis during the build process leading up to a bootable kernel image. The following sections describe the components and process in detail.
Figure 5-1. Composite kernel image construction
After the vmlinux kernel ELF file has been built, the kernel build system continues to process the targets described in Table 5-1. The Image object is created from the vmlinux object. Image is basically the vmlinux ELF file stripped of redundant sections (notes and comments) and also stripped of any debugging symbols that might have been present. The following command is used for this:
xscale_be-objcopy -O binary -R .note -R .comment -S \
vmlinux arch/arm/boot/Image
In the previous objcopy command, the -O option tells objcopy to generate a binary file, the -R option removes the ELF sections named .note and .comment, and the -S option is the flag to strip debugging symbols. Notice that objcopy takes the vmlinux ELF image as input and generates the target binary file called Image. In summary, Image is nothing more than the kernel proper in binary form stripped of debug symbols and the .note and .comment ELF sections.
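One way to see the difference between the two files is to look at their first few bytes. The following minimal C sketch checks whether a buffer begins with the four-byte ELF magic number. A vmlinux file should pass this check, whereas the raw Image produced by objcopy -O binary generally will not, because the ELF headers are no longer present. The function name has_elf_magic is purely illustrative; it is not part of the kernel or of binutils.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if buf starts with the ELF magic
 * bytes 0x7f 'E' 'L' 'F', and 0 otherwise. */
int has_elf_magic(const unsigned char *buf, size_t len)
{
    if (buf == NULL || len < 4)
        return 0;
    return buf[0] == 0x7f && buf[1] == 'E' &&
           buf[2] == 'L' && buf[3] == 'F';
}
```

To use it, read the first four bytes of each file into a buffer and pass the buffer and its length to has_elf_magic().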
Following the build sequence further, a number of small modules are compiled. These include several assembly language files (head.o, head-xscale.o, and so on) that perform low-level architecture and processor-specific tasks. Each of these objects is summarized in Table 5-1. Of particular note is the sequence creating the object called piggy.o. First, the Image file (binary kernel image) is compressed using this gzip command:
gzip -f -9 < Image > piggy.gz
This creates a new file called piggy.gz, which is simply a compressed version of the binary kernel Image. You can see this graphically in Figure 5-1. What follows next is rather interesting. An assembly language file called piggy.S is assembled, which contains a reference to the compressed piggy.gz. In essence, the binary kernel image is being piggybacked into a low-level assembly language bootstrap loader.[38] This bootstrap loader initializes the processor and required memory regions, decompresses the binary kernel image, and loads it into the proper place in system memory before passing control to it. Listing 5-2 reproduces .../arch/arm/boot/compressed/piggy.S in its entirety.
Listing 5-2. Assembly File Piggy.S
.section .piggydata,#alloc
.globl input_data
input_data:
.incbin "arch/arm/boot/compressed/piggy.gz"
.globl input_data_end
input_data_end:
This small assembly language file is simple yet produces a complexity that is not immediately obvious. The purpose of this file is to cause the compressed, binary kernel image to be emitted by the assembler as an ELF section called .piggydata. It is triggered by the .incbin assembler preprocessor directive, which can be viewed as the assembler's version of a #include file. In summary, the net result of this assembly language file is to contain the compressed binary kernel image as a payload within another image: the bootstrap loader. Notice the labels input_data and input_data_end. The bootstrap loader uses these to identify the boundaries of the binary payload, the kernel image.
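The role of the two boundary labels can be illustrated with a small C sketch. It is not the bootstrap loader's actual code: the function names payload_size and looks_like_gzip are hypothetical, and an ordinary byte array stands in for the .piggydata section. The sketch computes the payload length from a start and an end pointer and checks for the two-byte gzip magic number (0x1f 0x8b) at the start of the payload.

```c
#include <stddef.h>

/* Size in bytes of the payload delimited by two boundary pointers,
 * in the spirit of input_data and input_data_end. */
size_t payload_size(const unsigned char *start, const unsigned char *end)
{
    return (end > start) ? (size_t)(end - start) : 0;
}

/* Returns 1 if the payload begins with the gzip magic bytes 0x1f 0x8b. */
int looks_like_gzip(const unsigned char *start, const unsigned char *end)
{
    return payload_size(start, end) >= 2 &&
           start[0] == 0x1f && start[1] == 0x8b;
}
```

Because the length is derived purely from the two addresses, nothing needs to record the size of the compressed image separately.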
Not to be confused with a bootloader, many architectures use a bootstrap loader (or second-stage loader) to load the Linux kernel image into memory. Some bootstrap loaders perform checksum verification of the kernel image, and most perform decompression and relocation of the kernel image. The difference between a bootloader and a bootstrap loader in this context is simple: The bootloader controls the board upon power-up and does not rely on the Linux kernel in any way. In contrast, the bootstrap loader's primary purpose in life is to act as the glue between a board-level bootloader and the Linux kernel. It is the bootstrap loader's responsibility to provide a proper context for the kernel to run in, as well as perform the necessary steps to decompress and relocate the kernel binary image. It is similar to the concept of a primary and secondary loader found in the PC architecture.
Figure 5-2 makes this concept clear. The bootstrap loader is concatenated to the kernel image for loading.
Figure 5-2. Composite kernel image for ARM XScale
In the example we have been studying, the bootstrap loader consists of the binary images shown in Figure 5-2. The functions performed by this bootstrap loader include the following:
• Low-level assembly processor initialization, which includes support for enabling the processor's internal instruction and data caches, disabling interrupts, and setting up a C runtime environment. These include head.o and head-xscale.o.
• Decompression and relocation code, embodied in misc.o.
• Other processor-specific initialization, such as big-endian.o, which enables the big endian mode for this particular processor.
It is worth noting that the details we have been examining in the preceding sections are specific to the ARM/XScale kernel implementation. Each architecture has different details, although the concepts are similar. Using a similar analysis to that presented here, you can learn the requirements of your own architecture.
Perhaps you've seen a PC workstation booting a desktop Linux distribution such as Red Hat or SUSE Linux. After the PC's own BIOS messages, you see a flurry of console messages being displayed by Linux as it initializes the various kernel subsystems. Significant portions of the output are common across disparate architectures and machines. Two of the more interesting early boot messages are the kernel version string and the kernel command line, which is detailed shortly. Listing 5-3 reproduces the kernel boot messages for the ADI Engineering Coyote Reference Platform booting Linux on the Intel XScale IXP425 processor. The listing has been formatted with line numbers for easy reference.
Listing 5-3. Linux Boot Messages on IXP425
1 Uncompressing Linux... done, booting the kernel.
2 Linux version 2.6.14-clh (chris@pluto) (gcc version 3.4.3 (MontaVista 3.4.3-25.0.30.0501131 2005-07-23)) #11 Sat Mar 25 11:16:33 EST 2006
3 CPU: XScale-IXP42x Family [690541c1] revision 1 (ARMv5TE)
4 Machine: ADI Engineering Coyote
5 Memory policy: ECC disabled, Data cache writeback
6 CPU0: D VIVT undefined 5 cache
7 CPU0: I cache: 32768 bytes, associativity 32, 32 byte lines, 32 sets
8 CPU0: D cache: 32768 bytes, associativity 32, 32 byte lines, 32 sets
9 Built 1 zonelists
10 Kernel command line: console=ttyS0,115200 ip=bootp root=/dev/nfs
11 PID hash table entries: 512 (order: 9, 8192 bytes)
12 Console: colour dummy device 80x30
13 Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
14 Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
15 Memory: 64MB = 64MB total
16 Memory: 62592KB available (1727K code, 339K data, 112K init)
17 Mount-cache hash table entries: 512
18 CPU: Testing write buffer coherency: ok
19 softlockup thread 0 started up.
20 NET: Registered protocol family 16
21 PCI: IXP4xx is host
22 PCI: IXP4xx Using direct access for memory space
23 PCI: bus0: Fast back to back transfers enabled
24 dmabounce: registered device 0000:00:0f.0 on pci bus
25 NetWinder Floating Point Emulator V0.97 (double precision)
26 JFFS2 version 2.2. (NAND) (C) 2001-2003 Red Hat, Inc.
27 Serial: 8250/16550 driver $Revision: 1.90 $ 2 ports, IRQ sharing disabled
28 ttyS0 at MMIO 0xc8001000 (irq = 13) is a XScale
29 io scheduler noop registered
30 io scheduler anticipatory registered
31 io scheduler deadline registered
32 io scheduler cfq registered
33 RAMDISK driver initialized: 16 RAM disks of 8192K size 1024 blocksize
34 loop: loaded (max 8 devices)
35 eepro100.c:v1.09j-t 9/29/99 Donald Becker http://www.scyld.com/network/eepro100.html
36 eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <[email protected]
.com.sg> and others
37 eth0: 0000:00:0f.0, 00:0E:0C:00:82:F8, IRQ 28.
38 Board assembly 741462-016, Physical connectors present: RJ45
39 Primary interface chip i82555 PHY #1.
40 General self-test: passed.
41 Serial sub-system self-test: passed.
42 Internal registers self-test: passed.
43 ROM checksum self-test: passed (0x8b51f404).
44 IXP4XX-Flash.0: Found 1 x16 devices at 0x0 in 16-bit bank
45 Intel/Sharp Extended Query Table at 0x0031
46 Using buffer write method
47 cfi_cmdset_0001: Erase suspend on write enabled
48 Searching for RedBoot partition table in IXP4XX-Flash.0 at offset 0xfe0000
49 5 RedBoot partitions found on MTD device IXP4XX-Flash.0
50 Creating 5 MTD partitions on "IXP4XX-Flash.0":
51 0x00000000-0x00060000 : "RedBoot"
52 0x00100000-0x00260000 : "MyKernel"
53 0x00300000-0x00900000 : "RootFS"
54 0x00fc0000-0x00fc1000 : "RedBoot config"
55 mtd: partition "RedBoot config" doesn't end on an erase block -- force read-only
0x00fe0000-0x01000000 : "FIS directory"
56 NET: Registered protocol family 2
57 IP route cache hash table entries: 1024 (order: 0, 4096 bytes)
58 TCP established hash table entries: 4096 (order: 2, 16384 bytes)
59 TCP bind hash table entries: 4096 (order: 2, 16384 bytes)
60 TCP: Hash tables configured (established 4096 bind 4096)
61 TCP reno registered
62 TCP bic registered
63 NET: Registered protocol family 1
64 Sending BOOTP requests . OK
65 IP-Config: Got BOOTP answer from 192.168.1.10, my address is 192.168.1.141
66 IP-Config: Complete:
67 device=eth0, addr=192.168.1.141, mask=255.255.255.0, gw=255.255.255.255,
68 host=192.168.1.141, domain=, nis-domain=(none),
69 bootserver=192.168.1.10, rootserver=192.168.1.10,
rootpath=/home/chris/sandbox/coyote-target
70 Looking up port of RPC 100003/2 on 192.168.1.10
71 Looking up port of RPC 100005/1 on 192.168.1.10
72 VFS: Mounted root (nfs filesystem).
73 Freeing init memory: 112K
74 Mounting proc
75 Starting system loggers
76 Configuring lo
77 Starting inetd
78 / #
The kernel produces much useful information during startup, as shown in Listing 5-3. We study this output in some detail in the next few sections. Line 1 is produced by the bootstrap loader we presented earlier in this chapter, specifically by the decompression code found in .../arch/arm/boot/compressed/misc.c.
Line 2 of Listing 5-3 is the kernel version string. It is the first line of output from the kernel itself. One of the first lines of C code executed by the kernel (in .../init/main.c) upon entering start_kernel() is as follows:
printk(linux_banner);
This line produces the output just described: the kernel version string, Line 2 of Listing 5-3. This version string contains a number of pertinent data points related to the kernel image:
• Kernel version: Linux version 2.6.14-clh
• Username/machine name where kernel was compiled
• Toolchain info: gcc version 3.4.3, supplied by MontaVista Software
• Build number
• Date and time compiled
This is useful information both during development and later in production. All but one of the entries are self-explanatory. The build number is simply a tool that the developers added to the version string to indicate that something more substantial than the date and time changed from one build to the next. It is a way for developers to keep track of the build in a generic and automatic fashion. You will notice in this example that this was the eleventh build in this series, as indicated by the #11 on line 2 of Listing 5-3. The build number is stored in a hidden file called .version in the top-level Linux directory. It is automatically incremented by a build script found in .../scripts/mkversion and by the top-level makefile. In short, it is a counter that is automatically incremented whenever anything substantial in the kernel is rebuilt.
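If you log boot messages from many units, it can be handy to extract the build number from the version string programmatically. The following is a minimal user-space sketch, assuming only that the build number is the decimal value following the first # character in the banner, as in line 2 of Listing 5-3. The helper name banner_build_number is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the build number that follows the first
 * '#' in a version banner, or -1 if no number is found there. */
long banner_build_number(const char *banner)
{
    const char *hash = strchr(banner, '#');
    char *end;
    long n;

    if (hash == NULL)
        return -1;
    n = strtol(hash + 1, &end, 10);
    return (end == hash + 1) ? -1 : n;  /* no digits after '#' */
}
```

Passing the version string from Listing 5-3 to this function yields the build number shown after the # character.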
Now that we have an understanding of the structure and components of the composite kernel image, let's examine the flow of control from the bootloader to the kernel in a complete boot cycle. As we discussed in Chapter 2, "Your First Embedded Experience," the bootloader is the low-level component resident in system nonvolatile memory (Flash or ROM) that takes control immediately after the power has been applied. It is typically a small, simple set of routines designed primarily to do low-level initialization, boot image loading, and system diagnostics. It might contain memory dump and fill routines for examining and modifying the contents of memory. It might also contain low-level board self-test routines, including memory and I/O tests. Finally, a bootloader contains logic for loading and passing control to another program, usually an operating system such as Linux.
The ARM XScale platform used as a basis for the examples in this chapter contains the Redboot bootloader. When power is first applied, this bootloader is invoked and proceeds to load the operating system (OS). When the bootloader locates and loads the OS image (which could be resident locally in Flash, on a hard drive, or via a local area network or other device), control is passed to that image.
On this particular XScale platform, the bootloader passes control to our head.o module at the label Start in the bootstrap loader. This is illustrated in Figure 5-3.
Figure 5-3. ARM boot control flow
As detailed earlier, the bootstrap loader prepended to the kernel image has a single primary responsibility: to create the proper environment to decompress and relocate the kernel, and pass control to it. Control is passed from the bootstrap loader directly to the kernel proper, to a module called head.o for most architectures. It is an unfortunate historical artifact that both the bootstrap loader and the kernel proper contain a module called head.o because it is a source of confusion to the new embedded Linux developer. The head.o module in the bootstrap loader might be more appropriately called kernel_bootstrap_loader_head.o, although I doubt that the kernel developers would accept this patch. In fact, a recent Linux 2.6 source tree contains no fewer than 37 source files named head.S. This is another reason why you need to know your way around the kernel source tree.
Refer back to Figure 5-3 for a graphical view of the flow of control. When the bootstrap loader has completed its job, control is passed to the kernel proper's head.o, and from there to start_kernel() in main.c.
The intention of the kernel developers was to keep the architecture-specific head.o module very generic, without any specific machine[39] dependencies. This module, derived from the assembly language file head.S, is located at .../arch/<ARCH>/kernel/head.S, where <ARCH> is replaced by the given architecture. The examples in this chapter are based on the ARM/XScale, as you have seen, with <ARCH>=arm.
The head.o module performs architecture- and often CPU-specific initialization in preparation for the main body of the kernel. CPU-specific tasks are kept as generic as possible across processor families. Machine-specific initialization is performed elsewhere, as you will discover shortly. Among other low-level tasks, head.o performs the following tasks:
• Checks for valid processor and architecture
• Creates initial page table entries
• Enables the processor's memory management unit (MMU)
• Establishes limited error detection and reporting
• Jumps to the start of the kernel proper, main.c
These functions contain some hidden complexities. Many novice embedded developers have tried to single-step through parts of this code, only to find that the debugger becomes hopelessly lost. Although a discussion of the complexities of assembly language and the hardware details of virtual memory is beyond the scope of this book, a few things are worth noting about this complicated module.
When control is first passed to the kernel's head.o from the bootstrap loader, the processor is operating in what we used to call real mode in x86 terminology. In effect, the logical address contained in the processor's program counter[40] (or any other register, for that matter) is the actual physical address driven onto the processor's electrical memory address pins. Soon after the processor's registers and kernel data structures are initialized to enable memory translation, the processor's memory management unit (MMU) is turned on. Suddenly, the address space as seen by the processor is yanked from beneath it and replaced by an arbitrary virtual addressing scheme determined by the kernel developers. This creates a complexity that can really be understood only by a detailed analysis of both the assembly language constructs and logical flow, as well as a detailed knowledge of the CPU and its hardware address translation mechanism. In short, physical addresses are replaced by logical addresses the moment the MMU is enabled. That's why a debugger can't single-step through this portion of code as with ordinary code.
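To make the address switch concrete, consider a simple linear mapping in which every kernel virtual address differs from its physical address by a fixed offset. The sketch below is illustrative only. The two constants are assumptions chosen for the example (a kernel virtual base of 0xC0000000 and RAM starting at physical address 0), and the function names are hypothetical rather than kernel APIs. The actual values depend on your architecture and platform.

```c
#include <stdint.h>

#define SKETCH_PAGE_OFFSET 0xC0000000u  /* assumed kernel virtual base */
#define SKETCH_PHYS_OFFSET 0x00000000u  /* assumed physical base of RAM */

/* Physical address -> kernel virtual address under a linear mapping. */
uint32_t sketch_phys_to_virt(uint32_t phys)
{
    return phys - SKETCH_PHYS_OFFSET + SKETCH_PAGE_OFFSET;
}

/* Kernel virtual address -> physical address under the same mapping. */
uint32_t sketch_virt_to_phys(uint32_t virt)
{
    return virt - SKETCH_PAGE_OFFSET + SKETCH_PHYS_OFFSET;
}
```

Under these assumptions, code that was executing at one address before the MMU was enabled is fetched from a numerically very different address afterward, which is one reason a debugger tracking only the program counter value loses its place.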
The second point worth noting is the limited available mapping at this early stage of the kernel boot process. Many developers have stumbled into this limitation while trying to modify head.o for their particular platform.[41] One such scenario might go like this. Let's say you have a hardware device that needs a firmware load very early in the boot cycle. One possible solution is to compile the necessary firmware statically into the kernel image and then reference it via a pointer to download it to your device. However, because of the limited memory mapping done at this point, it is quite possible that your firmware image will exist beyond the range that has been mapped at this early stage in the boot cycle. When your code executes, it generates a page fault because you have attempted to access a memory region for which no valid mapping has been created inside the processor. Worse yet, a page fault handler has not yet been installed at this early stage, so all you get is an unexplained system crash. At this early stage in the boot cycle, you are pretty much guaranteed not to have any error messages to help you figure out what's wrong.
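The firmware scenario comes down to a range check: does the whole object lie inside the region that has been mapped so far? The following sketch expresses that check in C. The function name range_is_mapped is hypothetical, and the size of the early mapping is something you would have to determine for your own architecture and kernel version; no particular value is implied here.

```c
#include <stdint.h>

/* Returns 1 if [addr, addr + len) lies entirely inside the mapped
 * window [map_base, map_base + map_size), and 0 otherwise. */
int range_is_mapped(uint32_t addr, uint32_t len,
                    uint32_t map_base, uint32_t map_size)
{
    if (len == 0 || addr < map_base)
        return 0;
    /* Compare offsets rather than end addresses to avoid overflow. */
    uint32_t off = addr - map_base;
    return off < map_size && len <= map_size - off;
}
```

An object that starts inside the window but extends past its end fails the check, which corresponds to the case in which early code faults partway through copying a firmware image.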
You are wise to consider delaying any custom hardware initialization until after the kernel has booted, if at all possible. In this manner, you can rely on the well-known device driver model for access to custom hardware instead of trying to customize the much more complicated assembly language startup code. Numerous undocumented techniques are used at this level. One common example of this is to work around hardware errata that may or may not be documented. A much higher price will be paid in development time, cost, and complexity if you must make changes to the early startup assembly language code. Hardware and software engineers should discuss these facts during early stages of hardware development, when often a minor hardware change can lead to significant savings in software development time.
It is important to recognize the constraints placed upon the developer in a virtual memory environment. Many experienced embedded developers have little or no experience in this environment, and the scenario presented earlier is but one small example of the pitfalls that await the developer new to virtual memory architectures. Nearly all modern 32-bit and larger microprocessors have memory-management hardware used to implement virtual memory architectures. One of the most significant advantages of virtual memory machines is that they allow separate teams of developers to write large, complex applications while protecting other software modules, and the kernel itself, from programming errors.
The final task performed by the kernel's own head.o module is to pass control to the primary kernel startup file written in C. We spend a good portion of the rest of this chapter on this important file.
For each architecture, there is a different syntax and methodology, but every architecture's head.o module has a similar construct for passing control to the kernel proper. For the ARM architecture it looks as simple as this:
b start_kernel
For PowerPC, it looks similar to this:
lis r4,start_kernel@h
ori r4,r4,start_kernel@l
lis r3,MSR_KERNEL@h
ori r3,r3,MSR_KERNEL@l
mtspr SRR0,r4
mtspr SRR1,r3
rfi
Without going into details of the specific assembly language syntax, both of these examples result in the same thing. Control is passed from the kernel's first object module (head.o) to the C language routine start_kernel() located in .../init/main.c. Here the kernel begins to develop a life of its own.
The file main.c should be studied carefully by anyone seeking a deeper understanding of the Linux kernel, what components make it up, and how they are initialized and/or instantiated. main.c does all the startup work for the Linux kernel, from initializing the first kernel thread all the way to mounting a root file system and executing the very first user space Linux application program.
The function start_kernel() is by far the largest function in main.c. Most of the Linux kernel initialization takes place in this routine. Our purpose here is to highlight those particular elements that will prove useful in the context of embedded systems development. It is worth repeating: Studying main.c is a great way to spend your time if you want to develop a better understanding of the Linux kernel as a system.
Among the first few things that happen in .../init/main.c in the start_kernel() function is the call to setup_arch(). This function takes a single parameter, a pointer to the kernel command line introduced earlier and detailed in the next section.
setup_arch(&command_line);
This statement calls an architecture-specific setup routine responsible for performing initialization tasks common across each major architecture. Among other functions, setup_arch() calls functions that identify the specific CPU and provides a mechanism for calling high-level CPU-specific initialization routines. One such function, called directly by setup_arch(), is setup_processor(), found in .../arch/arm/kernel/setup.c. This function verifies the CPU ID and revision, calls CPU-specific initialization functions, and displays several lines of information on the console during boot.
An example of this output can be found in Listing 5-3, lines 3 through 8. Here you can see the CPU type, ID string, and revision read directly from the processor core. This is followed by details of the processor cache type and size. In this example, the IXP425 has a 32KB I (instruction) cache and 32KB D (data) cache, along with other implementation details of the internal processor cache.
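The cache figures on lines 7 and 8 of Listing 5-3 are consistent with one another: the total size of a set-associative cache is the associativity multiplied by the line size and the number of sets. The short C function below, whose name is chosen only for illustration, performs that arithmetic.

```c
/* Total cache size in bytes = ways * line size * number of sets. */
unsigned long cache_size_bytes(unsigned long ways,
                               unsigned long line_bytes,
                               unsigned long sets)
{
    return ways * line_bytes * sets;
}
```

With the values reported for the IXP425 (associativity 32, 32-byte lines, 32 sets), the result is 32 x 32 x 32 = 32768 bytes, or 32KB, matching the size printed by the kernel.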
One of the final actions of the architecture setup routines is to perform any machine-dependent initialization. The exact mechanism for this varies across different architectures. For ARM, you will find machine-specific initialization in the .../arch/arm/mach-* series of directories, depending on your machine type. MIPS architecture also contains directories specific to supported reference platforms. For PowerPC, there is a machine-dependent structure that contains pointers to many common setup functions. We examine this in more detail in Chapter 16, "Porting Linux."
Following the architecture setup, main.c performs generic early kernel initialization and then displays the kernel command line. Line 10 of Listing 5-3 is reproduced here for convenience.
Kernel command line: console=ttyS0,115200 ip=bootp root=/dev/nfs
In this simple example, the kernel being booted is instructed to open a console device on serial port device ttyS0 (usually the first serial port) at a baud rate of 115,200bps. It is being instructed to obtain its initial IP address information from a BOOTP server and to mount a root file system via the NFS protocol. (We cover BOOTP later in Chapter 12, "Embedded Development Environment," and NFS in Chapters 9, "File Systems," and 12. For now, we limit the discussion to the kernel command line mechanism.)
Linux is typically launched by a bootloader (or bootstrap loader) with a series of parameters that have come to be called the kernel command line. Although we don't actually invoke the kernel using a command prompt from a shell, many bootloaders can pass parameters to the kernel in a fashion that resembles this well-known model. On some platforms whose bootloaders are not Linux aware, the kernel command line can be defined at compile time and becomes hard coded as part of the kernel binary image. On other platforms (such as a desktop PC running Red Hat Linux), the command line can be modified by the user without having to recompile the kernel. The bootloader (Grub or Lilo in the desktop PC case) builds the kernel command line from a configuration file and passes it to the kernel during the boot process. These command line parameters are a boot mechanism to set initial configuration necessary for proper boot on a given machine.
Numerous command line parameters are defined throughout the kernel. The .../Documentation subdirectory in the kernel source contains a file called kernel-parameters.txt containing a list of kernel command line parameters in dictionary order. Remember the previous warning about kernel documentation: The kernel changes far faster than the documentation. Use this file as a guide, but not a definitive reference. More than 400 distinct kernel command line parameters are documented in this file, and it cannot be considered a comprehensive list. For that, you must refer directly to the source code.
The basic syntax for kernel command line parameters is fairly simple and mostly evident from the example in line 10 of Listing 5-3. A kernel command line parameter can be a single text word, a key=value pair, or a key with multiple comma-separated values in the form key=value1,value2, and so on. It is up to the consumer of this information to process the data as delivered. The command line is available globally and is processed by many modules as needed. As noted earlier, setup_arch() is called from main.c with the kernel command line as its only argument. This is to pass architecture-specific parameters and configuration directives to the relevant portions of architecture- and machine-specific code.
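The syntax just described is simple enough to parse in a few lines. The following user-space C sketch looks up the value of a given key in a space-separated command line string. It is not the kernel's parser: the function name cmdline_get and its interface are hypothetical, and it handles only the key=value form, returning a comma-separated value list unsplit.

```c
#include <string.h>

/* Hypothetical helper: copies the value of "key=" from cmdline into out
 * (at most outlen - 1 characters). Returns 1 if found, 0 otherwise. */
int cmdline_get(const char *cmdline, const char *key,
                char *out, size_t outlen)
{
    size_t klen = strlen(key);
    const char *p = cmdline;

    if (outlen == 0)
        return 0;
    while (*p != '\0') {
        while (*p == ' ')
            p++;                          /* skip separators */
        size_t tlen = strcspn(p, " ");    /* length of this token */
        if (tlen > klen && strncmp(p, key, klen) == 0 && p[klen] == '=') {
            size_t vlen = tlen - klen - 1;
            if (vlen >= outlen)
                vlen = outlen - 1;        /* truncate to fit */
            memcpy(out, p + klen + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
        p += tlen;
    }
    return 0;
}
```

Given the command line from Listing 5-3, looking up console yields the string ttyS0,115200, and it remains the caller's job to split that value at the comma. This mirrors the point made above: the consumer decides how to interpret the data.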
Device driver writers and kernel developers can add additional kernel command-line parameters for their own specific needs. Let's take a look at the mechanism. Unfortunately, some complications are involved in using and processing kernel command line parameters. The first of these is that the original mechanism is being deprecated in favor of a much more robust implementation. The second complication is that we need to comprehend the complexities of a linker script file to fully understand the mechanism.[42]
It's not necessarily all that complex, but most of us never need to understand a linker script file. The embedded engineer does. It is well documented in the GNU LD manual referenced at the end of this chapter.
As an example of the use of kernel command line parameters, consider the specification of the console device. We want this device to be initialized early in the boot cycle so that we have a destination for console messages during boot. This initialization takes place in a kernel object called printk.o. The C source file for this module is found in .../kernel/printk.c. The console initialization routine is called console_setup() and takes the kernel command line parameter string as its only argument.
The challenge is to communicate the console parameters specified on the kernel command line to the setup and device driver routines that require this data in a modular and general fashion. Further complicating the issue is that typically the command line parameters are required early, before (or in time for) those modules that need them. The startup code in main.c, where the main processing of the kernel command line takes place, cannot possibly know the destination functions for each of hundreds of kernel command line parameters without being hopelessly polluted with knowledge from every consumer of these parameters. What is needed is a flexible and generic way to pass these kernel command line parameters to their consumers.
In Linux 2.4 and earlier kernels, developers used a simple macro to generate a not-so-simple sequence of code. Although it is being deprecated, the __setup macro is still in widespread use throughout the kernel. We next use the kernel command line from Listing 5-3 to demonstrate how the __setup macro works.
From the previous kernel command line (line 10 of Listing 5-3), this is the first complete command line parameter passed to the kernel:
console=ttyS0,115200
For the purposes of this example, the actual meaning of the parameters is irrelevant. Our goal here is to illustrate the mechanism, so don't be concerned if you don't understand the argument or its values.
Listing 5-4 is a snippet of code from .../kernel/printk.c. The body of the function has been stripped because it is not relevant to the discussion. The most relevant part of Listing 5-4 is the last line, the invocation of the __setup macro. This macro expects two arguments; in this case, it is passed a string literal and a function pointer. It is no coincidence that the string literal passed to the __setup macro is the same as the first eight characters of the kernel command line related to the console: console=.
Listing 5-4. Console Setup Code Snippet
/*
 * Setup a list of consoles. Called from init/main.c
 */
static int __init console_setup(char *str)
{
    char name[sizeof(console_cmdline[0].name)];
    char *s, *options;
    int idx;

    /*
     * Decode str into name, index, options.
     */
    return 1;
}

__setup("console=", console_setup);
You can think of this macro as a registration function for the kernel command-line console parameter. In effect, it says: When the console= string is encountered on the kernel command line, invoke the function represented by the second __setup macro argument (in this case, the console_setup() function). But how is this information communicated to the early setup code, outside this module, which has no knowledge of the console functions? The mechanism is both clever and somewhat complicated, and relies on lists built by the linker.
The details are hidden in a set of macros designed to conceal the syntactical tedium of adding section attributes (and other attributes) to a portion of object code. The objective is to build a static list of string literals associated with function pointers. This list is emitted by the compiler in a separately named ELF section in the final vmlinux ELF image. It is important to understand this technique; it is used in several places within the kernel for special-purpose processing.
Let's now examine how this is done for the __setup macro case. Listing 5-5 is a portion of code from the header file .../include/linux/init.h defining the __setup family of macros.
Listing 5-5. Family of __setup Macro Definitions from init.h
...
#define __setup_param(str, unique_id, fn, early) \
    static char __setup_str_##unique_id[] __initdata = str; \
    static struct obs_kernel_param __setup_##unique_id \
        __attribute_used__ \
        __attribute__((__section__(".init.setup"))) \
        __attribute__((aligned((sizeof(long))))) \
        = { __setup_str_##unique_id, fn, early }

#define __setup_null_param(str, unique_id) \
    __setup_param(str, unique_id, NULL, 0)

#define __setup(str, fn) \
    __setup_param(str, fn, fn, 0)
...
Listing 5-5 is the author's definition of syntactical tedium! Recall from Listing 5-4 that our invocation of the original __setup macro looked like this:
__setup("console=", console_setup);
With some slight simplification, here is what the compiler's preprocessor produces after macro expansion:
static char __setup_str_console_setup[] __initdata = "console=";
static struct obs_kernel_param __setup_console_setup \
__attribute__((__section__(".init.setup"))) =
{__setup_str_console_setup, console_setup, 0};
To make this more readable, we have split the second and third lines, as indicated by the UNIX line-continuation character \.
We have intentionally left out two compiler attributes whose description does not add any insight to this discussion. Briefly, the __attribute_used__ (itself a macro hiding further syntactical tedium) tells the compiler to emit the function or variable, even if the optimizer determines that it is unused.[43] The __attribute__ (aligned) tells the compiler to align the structures on a specific boundary, in this case sizeof(long).
What we have left after simplification is the heart of the mechanism. First, the compiler generates an array of characters called __setup_str_console_setup[] initialized to contain the string console=. Next, the compiler generates a structure that contains three members: a pointer to the kernel command line string (the array just declared), the pointer to the setup function itself, and a simple flag. The key to the magic here is the section attribute attached to the structure. This attribute instructs the compiler to emit this structure into a special section within the ELF object module, called .init.setup. During the link stage, all the structures defined using the __setup macro are collected and placed into this .init.setup section, in effect creating an array of these structures. Listing 5-6, a snippet from .../init/main.c, shows how this data is accessed and used.
Listing 5-6. Kernel Command Line Processing
extern struct obs_kernel_param __setup_start[], __setup_end[];

static int __init obsolete_checksetup(char *line)
{
    struct obs_kernel_param *p;

    p = __setup_start;
    do {
        int n = strlen(p->str);
        if (!strncmp(line, p->str, n)) {
            if (p->early) {
                /* Already done in parse_early_param? (Needs
                 * exact match on param part) */
                if (line[n] == '\0' || line[n] == '=')
                    return 1;
            } else if (!p->setup_func) {
                printk(KERN_WARNING "Parameter %s is obsolete,"
                       " ignored\n", p->str);
                return 1;
            } else if (p->setup_func(line + n))
                return 1;
        }
        p++;
    } while (p < __setup_end);
    return 0;
}
Examination of this code should be fairly straightforward, with a couple of explanations. The function is called with a single command line argument, parsed elsewhere within main.c. In the example we've been discussing, line would point to the string console=ttyS0,115200, which is one component from the kernel command line. The two external structure pointers __setup_start and __setup_end are defined in a linker script file, not in a C source or header file. These labels mark the start and end of the array of obs_kernel_param structures that were placed in the .init.setup section of the object file.
The code in Listing 5-6 scans all these structures via the pointer p to find a match for this particular kernel command line parameter. In this case, the code is searching for the string console= and finds a match. From the relevant structure, the function pointer element returns a pointer to the console_setup() function, which is called with the balance of the parameter (the string ttyS0,115200) as its only argument. This process is repeated for every element in the kernel command line until the kernel command line has been completely exhausted.
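The same registration-and-scan pattern can be tried outside the kernel. The following is a minimal user-space sketch, assuming GCC or Clang with the GNU linker (or a compatible one) on an ELF platform such as Linux. The names DEMO_SETUP, struct demo_param, and demo_checksetup() are hypothetical stand-ins for __setup, struct obs_kernel_param, and obsolete_checksetup(), and the early flag is omitted. Instead of a linker script, the sketch relies on the __start_ and __stop_ symbols that the GNU linker defines automatically for a section whose name is a valid C identifier.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical user-space counterpart of struct obs_kernel_param. */
struct demo_param {
    const char *str;                  /* prefix to match, e.g. "console=" */
    int (*setup_func)(const char *);  /* handler for the rest of the token */
};

/*
 * Hypothetical registration macro. Token pasting (##) derives a unique
 * variable name from the handler name, and the section attribute places
 * every entry in an ELF section named demo_setup. The explicit
 * alignment keeps the compiler from padding between entries.
 */
#define DEMO_SETUP(prefix, fn)                                \
    static const struct demo_param demo_setup_##fn            \
        __attribute__((used, section("demo_setup"),           \
                       aligned(sizeof(void *)))) = { prefix, fn }

/* Defined by the linker, not by any C source file. */
extern const struct demo_param __start_demo_setup[];
extern const struct demo_param __stop_demo_setup[];

static char console_args[32];
static char root_args[32];

static int console_setup(const char *str)
{
    snprintf(console_args, sizeof(console_args), "%s", str);
    return 1;
}
DEMO_SETUP("console=", console_setup);

static int root_setup(const char *str)
{
    snprintf(root_args, sizeof(root_args), "%s", str);
    return 1;
}
DEMO_SETUP("root=", root_setup);

/* Scan the linker-built array; return 1 if some handler took the token. */
int demo_checksetup(const char *token)
{
    const struct demo_param *p;

    for (p = __start_demo_setup; p < __stop_demo_setup; p++) {
        size_t n = strlen(p->str);

        if (strncmp(token, p->str, n) == 0 && p->setup_func(token + n))
            return 1;
    }
    return 0;
}
```

Notice that demo_checksetup() never names console_setup() or root_setup(). Adding a third parameter requires only another handler and another DEMO_SETUP line, possibly in a different source file, which is exactly the decoupling that main.c needs.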
The technique just described, collecting objects into lists in uniquely named ELF sections, is used in many places in the kernel. Another example of this technique is the use of the __init family of macros to place one-time initialization routines into a common section in the object file. Its cousin __initdata, used to mark one-time-use data items, is used by the __setup macro. Functions and data marked as initialization using these macros are collected into a specially named ELF section. Later, after these one-time initialization functions and data objects have been used, the kernel frees the memory occupied by these items. You might have seen the familiar kernel message near the final part of the boot process saying, "Freeing init memory: 296K." Your mileage may vary, but a third of a megabyte is well worth the effort of using the __init family of macros. This is exactly the purpose of the __initdata macro in the earlier declaration of __setup_str_console_setup[].
You might have been wondering about the use of symbol names preceded with obsolete_. This is because the kernel developers are replacing the kernel command line processing mechanism with a more generic mechanism for registering both boot time and loadable module parameters. At the present time, hundreds of parameters are declared with the __setup macro. However, new development is expected to use the family of functions defined by the kernel header file .../include/linux/moduleparam.h, most notably, the family of module_param* macros. These are explained in more detail in Chapter 8, "Device Driver Basics," when we introduce device drivers.
The new mechanism maintains backward compatibility by accepting, in the parsing routine, a pointer to a function that handles unknown parameters. Parameters that the module_param* infrastructure does not recognize are handed to that function, and the processing falls back to the old mechanism under control of the developer. This is easily understood by examining the well-written code in .../kernel/params.c and the parse_args() calls in .../init/main.c.
The last point worth mentioning is the purpose of the flag member of the obs_kernel_param structure created by the __setup macro. Examination of the code in Listing 5-6 should make it clear. The flag in the structure, called early, is used to indicate whether this particular command line parameter was already consumed earlier in the boot process. Some command line parameters are intended for consumption very early in the boot process, and this flag provides a mechanism for an early parsing algorithm. You will find a function in main.c called do_early_param() that traverses the linker-generated array of __setup-generated structures and processes each one marked for early consumption. This gives the developer some control over when in the boot process this processing is done.
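The effect of the early flag can be illustrated with a small user-space sketch. It uses an ordinary static table rather than a linker-built one, and every name in it (demo_param, demo_parse(), and the earlyopt= parameter) is hypothetical. It shows only the general idea of two passes over the same command line, not the kernel's actual control flow.

```c
#include <stddef.h>
#include <string.h>

struct demo_param {
    const char *str;                  /* prefix to match */
    int (*setup_func)(const char *);  /* handler for the remainder */
    int early;                        /* nonzero: handle in the early pass */
};

static char demo_log[8];              /* order in which handlers ran */
static size_t demo_log_len;

static void demo_record(char c)
{
    if (demo_log_len < sizeof(demo_log) - 1)
        demo_log[demo_log_len++] = c;
}

static int console_setup(const char *str)
{
    (void)str;
    demo_record('c');
    return 1;
}

static int earlyopt_setup(const char *str)
{
    (void)str;
    demo_record('e');
    return 1;
}

static const struct demo_param demo_params[] = {
    { "console=",  console_setup,  0 },
    { "earlyopt=", earlyopt_setup, 1 },  /* hypothetical early parameter */
};

/* Offer one token to every entry whose early flag matches the pass. */
static void demo_pass(const char *token, int early)
{
    size_t i;

    for (i = 0; i < sizeof(demo_params) / sizeof(demo_params[0]); i++) {
        const struct demo_param *p = &demo_params[i];
        size_t n = strlen(p->str);

        if (p->early == early && strncmp(token, p->str, n) == 0)
            p->setup_func(token + n);
    }
}

/* Walk the token list twice: early entries first, all others afterward. */
void demo_parse(const char *const *tokens, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++)
        demo_pass(tokens[i], 1);
    for (i = 0; i < count; i++)
        demo_pass(tokens[i], 0);
}
```

Given the tokens console=ttyS0,115200 and earlyopt=on in that order, the early handler still runs first. The position of a parameter on the command line does not determine when it is consumed; the flag in its registration does.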
Many kernel subsystems are initialized by the code found in main.c. Some are initialized explicitly, as with the calls to init_timers() and console_init(), which need to be called very early. Others are initialized using a technique very similar to that described earlier for the __setup macro. In short, the linker builds lists of function pointers to various initialization routines, and a simple loop is used to execute each in turn. Listing 5-7 shows how this works.
Listing 5-7. Example Initialization Routine
static int __init customize_machine(void)
{
    /* customizes platform devices, or adds new ones */
    if (init_machine)
        init_machine();
    return 0;
}
arch_initcall(customize_machine);
This code snippet comes from .../arch/arm/kernel/setup.c. It is a simple routine designed to provide a customization hook for a particular board.
Notice two important things about the initialization routine in Listing 5-7. First, it is defined with the __init macro. As we saw earlier, this macro applies the section attribute to declare that this function gets placed into a section called .init.text in the vmlinux ELF file. Recall that the purpose of placing this function into a special section of the object file is so the memory space that it occupies can be reclaimed when it is no longer needed.
The second thing to notice is the macro immediately following the definition of the function: arch_initcall(customize_machine). This macro is part of a family of macros defined in .../include/linux/init.h. These macros are reproduced here as Listing 5-8.
Listing 5-8. initcall Family of Macros
#define __define_initcall(level,fn) \
    static initcall_t __initcall_##fn __attribute_used__ \
    __attribute__((__section__(".initcall" level ".init"))) = fn

#define core_initcall(fn) __define_initcall("1",fn)
#define postcore_initcall(fn) __define_initcall("2",fn)
#define arch_initcall(fn) __define_initcall("3",fn)
#define subsys_initcall(fn) __define_initcall("4",fn)
#define fs_initcall(fn) __define_initcall("5",fn)
#define device_initcall(fn) __define_initcall("6",fn)
#define late_initcall(fn) __define_initcall("7",fn)
In a similar fashion to the __setup macro previously detailed, these macros declare a data item based on the name of the function, and use the section attribute to place this data item into a uniquely named section of the vmlinux ELF file. The benefit of this approach is that main.c can call an arbitrary initialization function for a subsystem that it has no knowledge of. The only other option, as mentioned earlier, is to pollute main.c with knowledge of every subsystem in the kernel.
As you can see from Listing 5-8, the name of the section is .initcallN.init, where N is the level defined between 1 and 7. The data item is assigned the address of the function being named in the macro. In the example defined by Listings 5-7 and 5-8, the data item would be as follows (simplified by omitting the section attribute):
static initcall_t __initcall_customize_machine = customize_machine;
This data item is placed in the kernel's object file in a section called .initcall3.init, because arch_initcall() is defined as a level 3 initcall.
The level (N) is used to provide an ordering of initialization calls. Functions declared using the core_initcall() macro are called before all others. Functions declared using the postcore_initcall() macro are called next, and so on, while those declared with late_initcall() are the last initialization functions to be called.
In a fashion similar to the __setup macro, you can think of this family of *_initcall macros as registration functions for kernel subsystem initialization routines that need to be run once at kernel startup and then never used again. These macros provide a mechanism for causing the initialization routine to be executed during system startup, and a mechanism to discard the code and reclaim the memory after the routine has been executed. The developer can also choose among seven levels to control when an initialization routine runs. Therefore, if you have a subsystem that relies on another being available, you can enforce this ordering using these levels. If you grep the kernel for the string [a-z]*_initcall, you will see that this family of macros is used extensively.
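The ordering guarantee can be sketched in user space. Note the substitution: the kernel obtains its ordering from the section names, which the linker script lays out in sequence, whereas this sketch stores an explicit level number in an ordinary table and loops over the levels. The names demo_initcall and demo_do_initcalls() and the four init functions are hypothetical.

```c
#include <stddef.h>

typedef int (*initcall_t)(void);

struct demo_initcall {
    int level;       /* 1 (earliest) through 7 (latest) */
    initcall_t fn;
};

static char demo_trace[8];   /* order in which the routines ran */
static size_t demo_trace_len;

static void demo_record(char c)
{
    if (demo_trace_len < sizeof(demo_trace) - 1)
        demo_trace[demo_trace_len++] = c;
}

static int core_init(void)   { demo_record('c'); return 0; }
static int arch_init(void)   { demo_record('a'); return 0; }
static int device_init(void) { demo_record('d'); return 0; }
static int late_init(void)   { demo_record('l'); return 0; }

/* Deliberately registered out of order. */
static const struct demo_initcall demo_initcalls[] = {
    { 6, device_init },
    { 1, core_init },
    { 7, late_init },
    { 3, arch_init },
};

/* Run every registered routine, lowest level first. */
void demo_do_initcalls(void)
{
    int level;
    size_t i;

    for (level = 1; level <= 7; level++)
        for (i = 0; i < sizeof(demo_initcalls) / sizeof(demo_initcalls[0]); i++)
            if (demo_initcalls[i].level == level)
                demo_initcalls[i].fn();
}
```

Although device_init() appears first in the table, it runs third, after the level 1 and level 3 routines. A subsystem that depends on another simply registers at a higher level.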
One final note about the *_initcall family of macros: The use of multiple levels was introduced during the development of the 2.6 kernel series. Earlier kernel versions used the __initcall() macro for this purpose. This macro is still in widespread use, especially in device drivers. To maintain backward compatibility, this macro has been defined to device_initcall(), which has been defined as a level 6 initcall.
The code found in .../init/main.c is responsible for bringing the kernel to life. After start_kernel() performs some basic kernel initialization, calling early initialization functions explicitly by name, the very first kernel thread is spawned. This thread eventually becomes the kernel thread called init(), with a process id (PID) of 1. As you will learn, init() becomes the parent of all Linux processes in user space. At this point in the boot sequence, two distinct threads are running: that represented by start_kernel() and now init(). The former goes on to become the idle process, having completed its work. The latter becomes the init process. This can be seen in Listing 5-9.
Listing 5-9. Creation of Kernel init Thread
static void noinline rest_init(void)
    __releases(kernel_lock)
{
    kernel_thread(init, NULL, CLONE_FS | CLONE_SIGHAND);
    numa_default_policy();
    unlock_kernel();
    preempt_enable_no_resched();

    /*
     * The boot idle thread must execute schedule()
     * at least one to get things moving:
     */
    schedule();

    cpu_idle();
}
The start_kernel() function calls rest_init(), reproduced in Listing 5-9. The kernel's init process is spawned by the call to kernel_thread(). The init thread goes on to complete the rest of the system initialization, while the thread of execution started by start_kernel() loops forever in the call to cpu_idle().
The reason for this structure is interesting. You might have noticed that start_kernel(), a relatively large function, was marked with the __init macro. This means that the memory it occupies will be reclaimed during the final stages of kernel initialization. Execution must leave this function, and the memory that it occupies, before that memory can be reclaimed. The solution was for start_kernel() to call rest_init(), shown in Listing 5-9, a much smaller function that becomes the idle process.
When init() is spawned, it eventually calls do_initcalls(), which is the function responsible for calling all the initialization functions registered with the *_initcall family of macros. The code is reproduced in Listing 5-10 in simplified form.
Listing 5-10. Initialization via initcalls
static void __init do_initcalls(void)
{
    initcall_t *call;

    for (call = &__initcall_start; call < &__initcall_end; call++) {
        if (initcall_debug) {
            printk(KERN_DEBUG "Calling initcall 0x%p", *call);
            print_symbol(":%s()", (unsigned long) *call);
            printk("\n");
        }
        (*call)();
    }
}
This code is self-explanatory, except for the two labels marking the loop boundaries: __initcall_start and __initcall_end. These labels are not found in any C source or header file. They are defined in the linker script file used during the link stage of vmlinux. These labels mark the beginning and end of the list of initialization functions populated using the *_initcall family of macros. You can see each of the labels by looking at the System.map file in the top-level kernel directory. They all begin with the string __initcall, as described in Listing 5-8.
In case you were wondering about the debug print statements in do_initcalls(), you can watch these calls being executed during bootup by setting the kernel command line parameter initcall_debug. This command line parameter enables the printing of the debug information shown in Listing 5-10. Simply start your kernel with the kernel command line parameter initcall_debug to enable this diagnostic output.[44]
Here is an example of what you will see when you enable these debug statements:
...
Calling initcall 0xc00168f4: tty_class_init+0x0/0x3c()
Calling initcall 0xc000c32c: customize_machine+0x0/0x2c()
Calling initcall 0xc000c4f0: topology_init+0x0/0x24()
Calling initcall 0xc000e8f4: coyote_pci_init+0x0/0x20()
PCI: IXP4xx is host
PCI: IXP4xx Using direct access for memory space
...
Notice the call to customize_machine(), the example of Listing 5-7. The debug output includes the virtual kernel address of the function (0xc000c32c, in this case) and the size of the function (0x2c here). This is a useful way to see the details of kernel initialization, especially the order in which various subsystems and modules get called. Even on a modestly configured embedded system, dozens of these initialization functions are invoked in this manner. In this example taken from an ARM XScale embedded target, there are 92 such calls to various kernel-initialization routines.
After the init() thread has been spawned and all the various initialization calls have completed, the kernel performs its final steps in the boot sequence. These include freeing the memory used by the initialization functions and data, opening a system console device, and starting the first userspace process. Listing 5-11 reproduces the last steps in the kernel's init() from main.c.
Listing 5-11. Final Kernel Boot Steps from main.c
if (execute_command) {
    run_init_process(execute_command);
    printk(KERN_WARNING "Failed to execute %s. Attempting defaults...\n",
           execute_command);
}

run_init_process("/sbin/init");
run_init_process("/etc/init");
run_init_process("/bin/init");
run_init_process("/bin/sh");

panic("No init found. Try passing init= option to kernel.");
Notice that if the code proceeds to the end of the init() function, a kernel panic results. If you've spent any time experimenting with embedded systems or custom root file systems, you've undoubtedly encountered this very common error message as the last line of output on your console. It is one of the most frequently asked questions (FAQs) on a variety of public forums related to Linux and embedded systems.
One way or another, one of these run_init_process() commands must proceed without error. The run_init_process() function does not return on successful invocation. It overwrites the calling process with the new program, using the familiar execve() system call for this functionality. The most common system configurations spawn /sbin/init as the userland[45] initialization process. We study this functionality in depth in the next chapter.
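The "try each candidate until one succeeds" pattern follows directly from the behavior of execve(), and a user-space sketch makes it visible. The function name demo_run_first_available() is hypothetical, and the sketch assumes a POSIX system. It is an analogue of the sequence in Listing 5-11, not a copy of run_init_process().

```c
#include <stddef.h>
#include <unistd.h>

extern char **environ;

/*
 * Try each candidate path in turn. execve() returns only on failure,
 * so reaching the end of the loop means that no candidate could be
 * executed, and the function returns -1. On success it never returns,
 * because the calling process image has been replaced.
 */
int demo_run_first_available(const char *const *paths, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        char *const argv[] = { (char *)paths[i], NULL };

        execve(paths[i], argv, environ);
        /* Only reached if execve() failed; try the next candidate. */
    }
    return -1;
}
```

The return value of -1 corresponds to falling through to the panic() call in Listing 5-11: every candidate was tried, and none could be started.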
One option available to the embedded system developer is to use a custom userland initialization program. That is the purpose of the conditional statement in the previous code snippet. If execute_command is non-null, it points to a string containing a custom user-supplied command to be executed in user space. The developer specifies this command on the kernel command line, and it is set via the __setup macro we examined earlier in this chapter. An example kernel command line incorporating several concepts discussed in this chapter might look like this:
initcall_debug init=/sbin/myinit console=ttyS1,115200 root=/dev/hda1
This kernel command line instructs the kernel to display all the initialization routines as they are called, configures the initial console device as /dev/ttyS1 at 115,200 bps, and executes a custom user space initialization process called myinit, located in the /sbin directory on the root file system. It directs the kernel to mount its root file system from the device /dev/hda1, which is the first partition on the first IDE hard drive. Note that, in general, the order of parameters given on the kernel command line is irrelevant. The next chapter covers the details of user space system initialization.
• The Linux kernel project is large and complex. Understanding the structure and composition of the final image is key to learning how to customize your own embedded project.
• Many architectures concatenate an architecture-specific bootstrap loader onto the kernel binary image to set up the proper execution environment required by the Linux kernel. We presented the bootstrap loader build steps to differentiate this functionality from the kernel proper.
• Understanding the initialization flow of control will help deepen your knowledge of the Linux kernel and provide insight into how to customize for your particular set of requirements.
• We found the kernel entry point in head.o and followed the flow of control into the first kernel C file, main.c. We looked at a booting system and the messages it produced, along with an overview of many of the important initialization concepts.
• The kernel command line processing and the mechanisms used to declare and process kernel command line parameters were presented. This included a detailed look at some advanced coding techniques for calling arbitrary unknown setup routines using linker-produced tables.
• The final kernel boot steps produce the first userspace processes. Understanding this mechanism and its options will enable you to customize and troubleshoot embedded Linux startup issues.
GNU Compiler Collection documentation:
http://gcc.gnu.org/onlinedocs/gcc [46]
Using LD, the GNU linker
http://www.gnu.org/software/binutils/manual/ld-2.9.1/ld.html
Kernel documentation:
.../Documentation/kernel-parameters.txt
The kernel image is nearly always stored in compressed format, unless boot time is a critical issue. In this case, the image might be called uImage, a compressed vmlinux file with a U-Boot header. See Chapter 7, "Bootloaders."
The term piggy was originally used to describe a "piggy-back" concept. In this case, the binary kernel image is piggy-backed onto the bootstrap loader to produce the composite kernel image.
Not to be confused with the bootloader, a bootstrap loader can be considered a second-stage loader, where the bootloader itself can be thought of as a first-stage loader.
The term machine as used here refers to a specific hardware reference platform.
Often called Instruction Pointer, the register which holds the address of the next machine instruction in memory.
Modifying head.S for your custom platform is highly discouraged. There is almost always a better way. See Chapter 16, "Porting Linux," for additional information.
Normally, the compiler will complain if a variable is defined static and never referenced in the compilation unit. Because these variables are not explicitly referenced, the warning would be emitted without this directive.
You might have to lower the default loglevel on your system to see these debug messages. This is described in many references about Linux system administration. In any case, you should see them in the kernel log file.
Userland is an often-used term for any program, library, script, or anything else in user space.
Especially the sections on function attributes, type attributes, and variable attributes.