52938.fb2
It takes just minutes for a developer to compile and run a “Hello World!” application on a non-embedded system. On the other hand, for an embedded developer, the task is not so trivial. It might take days before seeing a successful result. This process can be a frustrating experience for a developer new to embedded system development.
Booting the target system, whether a third-party evaluation board or a custom design, can be a mystery to many newcomers. Indeed, it is daunting to pick up a programmer’s reference manual for the target board and pore over tables of memory addresses and registers or to review the hardware component interconnection diagrams, wondering what it all means, what to do with the information (some of which makes little sense), and how to relate the information to running an image on the target system.
Questions to resolve at this stage are
· how to load the image onto the target system,
· where in memory to load the image,
· how to initiate program execution, and
· how the program produces recognizable output.
We answer these questions in this chapter and hopefully reduce frustration by demystifying the booting and initialization process of embedded systems.
Chapter 2 discusses constructing an executable image with multiple program sections according to the target system memory layout. After the final image is successfully built and residing on the host system, the next step is to execute it on the target.
The focus of this chapter is
· image transfer from the host to the target system,
· the embedded monitor and debug agent,
· the target system loader,
· the embedded system booting process,
· various initialization procedures, and
· an introduction to BDM and JTAG interfaces.
An executable image built for a target embedded system can be transferred from the host development system onto the target, which is called loading the image, by:
· Programming the entire image into the EEPROM or flash memory.
· Downloading the image over either a serial (typically RS-232) or network connection. This process requires the presence of a data transfer utility program on the host system, as well as the presence of a target loader, an embedded monitor, or a target debug agent on the target system.
· Downloading the image through either a JTAG or BDM interface (discussed in section 3.5).
These approaches are the most common, and this list is by no means comprehensive. Some of the possible host-to-target connectivity solutions are shown in Figure 2.1. Figure 3.1 exemplifies a target embedded system. We refer to the ELF image format (introduced in Chapter 2) exclusively throughout this chapter.
Figure 3.1: View of the target embedded system.
The embedded software for the final product is commonly stored in either ROM or the flash memory. The entire executable image is burned into the ROM or flash memory using special equipment. If ROM is used, the ROM chip is set into its socket on the target board. For embedded system boards that have both ROM and flash memory, the next step is to set the necessary jumpers. Jumpers are the part of the target board's wiring that controls which memory chip the processor uses to start executing its first set of instructions upon reboot. For example, if the image is stored in the flash memory and the jumpers are set to use the flash memory, the processor fetches its first instruction from the starting address where the flash is mapped. Therefore, set the jumpers appropriately according to the image storage.
This final production method, however, is impractical during the development stage because developers construct software in incremental steps with high frequency. The process is interactive in that a portion of the code is written, debugged, and tested, and the entire process then repeats for the new code. Reprogramming the EEPROM or the flash memory each time the code changes due to bugs or code addition is time consuming. The methods for downloading the image over a serial or a network connection or for downloading the image through a JTAG or BDM interface solve this problem by transferring the image directly into the target system's RAM memory.
A common approach taken at the early development phase is to write a loader program for the target side, which is called the loader, and use the loader to download the image from the host system. In the scenario shown in Figure 3.1, the loader has a small memory footprint, so it typically can be programmed into a ROM chip. A data transfer utility resides on the host system side. The loader works in conjunction with its host utility counterpart to perform the image transfer.
After the loader is written, it is programmed into the ROM. Part of the same ROM chip is occupied by the boot image. At a minimum, this boot image (typically written by a hardware engineer) consists of the code that executes on system power up. This code initializes the target hardware, such as the memory system and the physical RAM, into a known state. In other words, the boot image prepares the system to execute the loader. The loader begins execution after this boot image completes the necessary initialization work.
For this transfer method to work, a data transfer protocol, as well as the communication parameters, must be agreed upon between the host utility and the target loader. The data transfer protocol refers to the transfer rules. For example, a transfer protocol might be that the image transfer request should be initiated from the loader to the host utility; in which case, the host utility sends out the image file size followed by the actual image, and the loader sends an acknowledgement to the host utility upon completion. Data transfer rate, such as the baud rate for the serial connection, and per packet size are examples of communication parameters. The loader and the utility program operate as a unit, which is often capable of using more than one type of connection. At a minimum, the transfer takes place over the serial connection. More sophisticated loaders can download images over the network, for example, over the Ethernet using protocols such as the Trivial File Transfer Protocol (TFTP) or the File Transfer Protocol (FTP). In this case, the host utility program is either the TFTP server or the FTP server respectively.
Both proprietary and well-known transfer protocols can be applied in either the serial or the network connection, but more commonly proprietary protocols are used with a serial connection.
The loader downloads the image directly into the RAM memory. The loader needs to understand the object file format (for example, the ELF format) because, as discussed in Chapter 2, the object file contains information such as the load address, which the loader uses for section placement.
The loader transfers control to the downloaded image after the transfer completes. A loader with flash programming capability can also transfer the image into the flash memory. In that case, the board jumpers must be set appropriately so that the processor executes out of flash memory after the image download completes.
A loader can be part of the final application program, and it can perform other functions in addition to downloading images, as discussed in more detail later in this chapter.
An alternative to the boot image plus loader approach is to use an embedded monitor. A monitor is an embedded software application commonly provided by the target system manufacturer for its evaluation boards. The monitor enables developers to examine and debug the target system at run time. Similar to the boot image, the monitor is executed on power up and performs system initialization such as
· initializing the required peripheral devices, for example, the serial interface and the system timer chip for memory refresh, at a minimum,
· initializing the memory system for downloading the image, and
· initializing the interrupt controller and installing default interrupt handlers.
The monitor has a well-defined user interface accessible through a terminal emulation program over the serial interface. The monitor defines a set of commands allowing the developer to
· download the image,
· read from and write to system memory locations,
· read and write system registers,
· set and clear different types of breakpoints,
· single-step instructions, and
· reset the system.
The way in which the monitor downloads the image from the host system over the serial or network connection is similar to how the loader does it. The monitor is capable of downloading the image into either the RAM memory or the flash memory. In essence, the monitor has both the boot image and the loader functionalities incorporated but with the added interactive debug capability. The monitor is still present while the newly downloaded image executes. A special keystroke on the host system, for example, CTRL+D, interrupts the program execution and reactivates the monitor user interface so the developer can conduct interactive debugging activities.
The monitor is generally developed by the hardware engineers and is also used by the hardware engineers to perform both system device diagnostics and low-level code debugging. Some manufactures give the monitor source code to their customers. In that case, the code can be extracted and modified to work with a custom-designed target board.
The target debug agent functions much like the monitor does but with one added feature: the target agent gives the host debugger enough information to provide visual source-level debug capability. Again, an agreed-upon communication protocol must be established between the host debugger and the target agent. The host debugger is something that the host tools vendor offers. Sometimes a RTOS vendor offers a host-based debugger simply because the debug agent is an integral part of the RTOS. The host debugger vendor works closely with the RTOS vendor to provide a fully compatible tool. The debug agent has built-in knowledge of the RTOS objects and services, which allows the developer to explore such objects and services fully and visually.
We have described the software components involved in transferring images from the host to the target. In this section, we describe the details of the loading process itself and how control is transferred to the newly acquired image.
Embedded processors, after they are powered on, fetch and execute code from a predefined and hard-wired address offset. The code contained at this memory location is called the reset vector. The reset vector is usually a jump instruction into another part of the memory space where the real initialization code is found. The reason for jumping to another part of memory is to keep the reset vector small. The reset vector belongs to a small range of memory space reserved by the system for special purposes. The reset vector, as well as the system boot startup code, must be in permanent storage. Because of this issue, the system startup code, called the bootstrap code, resides in the system ROM, the on-board flash memory, or other types of non-volatile memory devices. We will revisit the loader program from the system-bootstrapping perspective. In the discussions to follow, the loader refers to the code that performs system bootstrapping, image downloading, and initialization.
The concepts are best explained through an example. In this example, assume an embedded loader has been developed and programmed into the on-board flash memory. Also, assume that the target image contains various program sections. Each section has a designated location in the memory map. The reset vector is contained in a small ROM, which is mapped to location 0x0h of the address space. The ROM contains some essential initial values required by the processor on reset. These values are the reset vector, the initial stack pointer, and the usable RAM address.
In the example shown in Figure 3.2, the reset vector is a jump instruction to memory location 0x00040h; the reset vector transfers program control to the instruction at this address. Startup initialization code begins at this flash memory address. This system initialization code contains, among other things, the target image loader program and the default system exception vectors. The system exception vectors point to instructions that reside in the flash memory. See Chapter 10 for detailed discussions on interrupts, exceptions, and exception vectors and handlers.
Figure 3.2: Example bootstrap overview.
The first part of the system bootstrap process is putting the system into a known state. The processor registers are set with appropriate default values. The stack pointer is set with the value found in the ROM. The loader disables the system interrupts because the system is not yet prepared to handle the interrupts. The loader also initializes the RAM memory and possibly the on-processor caches. At this point, the loader performs limited hardware diagnostics on those devices needed for its operation.
As discussed in Chapter 2, program execution is faster in RAM than if the executable code runs directly out of the flash memory. To this end, the loader optionally can copy the code from the flash memory into the RAM. Because of this capability, a program section can have both a load address and a run address. The load address is the address in which the program sections reside, while the run address is the address to which the loader program copies the program sections and prepares it for execution. Enabling runtime debugging is another main reason for a program to execute out of the RAM. For example, the debugger must be able to modify the runtime code in order to insert breakpoints.
An executable image contains initialized and uninitialized data sections. These sections are both readable and writeable. These sections must reside in RAM and therefore are copied out of the flash memory into RAM as part of system initialization. The initialized data sections (.data and.sdata) contain the initial values for the global and static variables. The content of these sections, therefore, is part of the final executable image and is transferred verbatim by the loader. On the other hand, the content for the uninitialized data sections.bss and.sbss) is empty. The linker reserves space for these sections in the memory map. The allocation information for these sections, such as the section size and the section run address, is part of the section header. It is the loader’s job to retrieve this information from the section header and allocate the same amount of memory in RAM during the loading process. The loader places these sections into RAM according to the section’s run address.
An executable image is likely to have constants. Constant data is part of the.const section, which is read-only. Therefore, it is possible to keep the.const section in read-only memory during program execution. Frequently accessed constants, such as lookup tables, should be transferred into RAM for performance gain.
The next step in the boot process is for the loader program to initialize the system devices. Only the necessary devices that the loader requires are initialized at this stage. In other words, a needed device is initialized to the extent that a required subset of the device capabilities and features are enabled and operational. In the majority of cases, these devices are part of the I/O system; therefore, these devices are fully initialized when the downloaded image performs I/O system initialization as part of the startup sequence.
Now the loader program is ready to transfer the application image to the target system. The application image contains the RTOS, the kernel, and the application code written by the embedded developer. The application image can come from two places:
· the read-only memory devices on the target, or
· the host development system.
We describe three common image execution scenarios:
· execute from ROM while using RAM for data,
· execute from RAM after being copied from ROM, and
· execute from RAM after being downloaded from a host system.
In the discussions to follow, the term ROM refers to read-only memory devices in general.
Some embedded devices have such limited memory resources that the program image executes directly out of the ROM. Sometimes the board vendor provides the boot ROM, and the code in the boot ROM does not copy instructions out to RAM for execution. In these cases, however, the data sections must still reside in RAM. Figure 3.3 shows this boot scenario.
Figure 3.3: Boot sequence for an image running from ROM.
Two CPU registers are of concern: the Instruction Pointer (IP) register and the Stack Pointer (SP) register. The IP points to the next instruction (code in the.text section) that the CPU must execute, while the SP points to the next free address in the stack. The C programming language uses the stack to pass function parameters during function invocation. The stack is created from a space in RAM, and the system stack pointer registers must be set appropriately at start up.
The boot sequence for an image running from ROM is as follows:
1. The CPU’s IP is hardwired to execute the first instruction in memory (the reset vector).
2. The reset vector jumps to the first instruction of the.text section of the boot image. The.text section remains in ROM; the CPU uses the IP to execute.text. The code initializes the memory system, including the RAM.
3. The.data section of the boot image is copied into RAM because it is both readable and writeable.
4. Space is reserved in RAM for the.bss section of the boot image because it is both readable and writeable. There is nothing to transfer because the content for the.bss section is empty.
5. Stack space is reserved in RAM.
6. The CPU’s SP register is set to point to the beginning of the newly created stack. At this point, the boot completes. The CPU continues to execute the code in the.text section until it is complete or until the system is shut down.
Note that the boot image is not in the ELF format but contains binary machine code ready for execution. The boot image is created in the ELF format. The EEPROM programmer software, however, removes the ELF-specific data, such as the program header table and the section header table, when programming the boot image into the ROM, so that it is ready for execution upon processor reset.
The boot image needs to keep internal information in its program, which is critical to initializing the data sections, because the section header table is not present. As shown in Figure 3.3, the.data section is copied into RAM in its entirety. Therefore, the boot image must know the starting address of its data section and how big the data section is. One approach to this issue is to insert two special labels into the.data section: one label placed at the section’s beginning and the other placed at the end. Special assembly code is written to retrieve the addresses of these labels. These are the load addresses of the labels. The linker reference manual should contain the specific program code syntax and link commander file syntax used for retrieving the load address of a symbol. The difference between these two addresses is the size of the section. A similar approach is taken for the.bss section.
If the.text section is copied into RAM, two dummy functions can be defined. These dummy functions do nothing other than return from function. One function is placed at the beginning of the.text section, while the other is placed at the end. This reason is one why an embedded developer might create custom sections and instruct the linker on where to place a section, as well as how to combine the various sections into a single output section through the linker command file.
In the second boot scenario, the boot loader transfers an application image from ROM to RAM for execution. The large application image is stored in ROM in a compressed form to reduce the storage space required. The loader must decompress this image before it can initialize the sections of that image. Depending on the compression algorithm used and whether enough space is left in the ROM, some state information produced from the compression work can be stored to simplify image decompression. The loader needs a work area in RAM for the decompression process. It is common and good practice to perform checksum calculations over the boot image to ensure the image integrity before loading and execution.
The first six steps are identical to the previous boot scenario. After completing those steps, the process continues as follows:
7. The compressed application image is copied from ROM to RAM.
8-10. Initialization steps that are part of the decompression procedure are completed.
11. The loader transfers control to the image. This is done by “jumping” to the beginning address of the initialized image using a processor-specific “jump” instruction. This “jump” instruction effectively sets a new value into the instruction pointer.
12. As shown in Figure 3.4, the memory area that the loader program occupies is recycled. Specifically, the stack pointer is reinitialized (see the dotted line) to point to this area, so it can be used as the stack for the new program. The decompression work area is also recycled into the available memory space implicitly.
Figure 3.4: Boot sequence for an image executing from RAM after transfer from ROM.
Note that the loader program is still available for use because it is stored in ROM. Making the loader available for later use is often intentional on the designer’s part. Imagine a situation in which the loader program has a built-in monitor. As mentioned earlier, part of the monitor startup sequence is to install default interrupt handlers. This issue is extremely important because during the development phase the program under construction is incomplete and is being constantly updated. As such, this program might not be able to handle certain system interrupts and exceptions. It is beneficial to have the monitor conduct default processing in such cases. For example, a program avoids processing memory access exceptions by not installing an exception handler for it. In this case, the monitor takes control of the system when the program execution triggers such an exception, for example, when the program crashes. The developer then gets the opportunity to debug and back-trace the execution sequence through the monitor inter- face. As indicated earlier, a monitor allows the developer to modify the processor registers. Therefore, as soon as the bug is uncovered and a new program image is built, the developer can set the instruction pointer register to the starting address of the loader program in ROM, effectively transferring control to the loader. The result is that the loader begins to download the new image and reinitializes the entire system without having to power cycle on the system.
Similarly, another benefit of running the loader out of the ROM is to prevent a program that is behaving badly from corrupting its code in systems without protection from the MMU.
In this example, the loader image is in an executable machine code format. The application image is in the ELF format but has been compressed through an algorithm that works independently of the object file format. The application image is in the ELF format so that the loader can be written as a generic utility, able to load many application program images. If the application image is in the ELF format, the loader program can extract the necessary information from the image for initialization.
In the third boot scenario, the target debug agent transfers an application image from the host system into RAM for execution. This practice is typical during the later development phases when the majority of the device drivers have been fully implemented and debugged. The system can handle interrupts and exceptions correctly. At this stage, the target system facilitates a stable environment for further application development, allowing the embedded developer to focus on application design and implementation rather than the low-level hardware details.
The debug agent is RTOS-aware and understands RTOS objects and services. The debug agent can communicate with a host debugger and transfer target images through the host debugger. The debug agent can also function as a standalone monitor. The developer can access the command line interface for the target debug agent through a simple terminal program over the serial link. The developer can issue commands over the command line interface to instruct the debug agent on the target image’s location on the host system and to initiate the transfer.
The debug agent downloads the image into a temporary area in RAM first. After the download is complete and the image integrity verified, the debug agent initializes the image according to the information presented in the program section header table. This boot scenario is shown in Figure 3.5.
Figure 3.5: Boot sequence for an image executing from RAM after transfer from the host system.
The first six steps are identical to the initial boot scenario. After completing those steps, the process continues as follows:
7. The application image is downloaded from the host development system.
8. The image integrity is verified.
9. The image is decompressed if necessary.
10-12. The debug agent loads the image sections into their respective run addresses in RAM.
13. The debug agent transfers control to the download image.
There is a good reason why the memory area used by the debug agent is not recycled. In this example, the downloaded image contains an RTOS, which is introduced in Chapter 4. One of the core components of a RTOS is a scheduler, which facilitates the simultaneous existence and execution of multiple programs, called tasks or threads. The scheduler can save the execution state information of the debug agent and revive the agent later. Thus, the debug agent can continue to communicate with the host debugger while the downloaded image executes, providing interactive, visual, source-level debugging.
The target image referred to repeatedly in the last section is a combination of sophisticated software components and modules as shown in Figure 3.6. The software components include the following: the board support package (BSP), which contains a full spectrum of drivers for the system hardware components and devices; the RTOS, which provides basic services, such as resource synchronization services, I/O services, and scheduling services needed by the embedded applications; and the other components, which provide additional services, such as file system services and network services.
Figure 3.6: Software components of a target image.
These software components perform full system initialization after the target image gains control from the loading program.
Assuming the target image is structured as shown in Figure 3.6, then Figure 3.7 illustrates the steps required to initialize most target systems. The main stages are
· hardware initialization,
· RTOS initialization, and
· application initialization.
Note that these steps are not all that are required to initialize the target system. Rather, this summary provides a high-level example from which to learn. Each stage is discussed more thoroughly in the following sections.
The previous sections described aspects of steps 1 and 2 in Figure 3.7 in which a boot image executes after the CPU begins executing instructions from the reset vector. Typically at this stage, the minimum hardware initialization required to get the boot image to execute is performed, which includes:
1. starting execution at the reset vector
2. putting the processor into a known state by setting the appropriate registers:
○ getting the processor type
○ getting or setting the CPU’s clock speed
3. disabling interrupts and caches
4. initializing memory controller, memory chips, and cache units:
○ getting the start addresses for memory
○ getting the size of memory
○ performing preliminary memory tests, if required
Figure 3.7: The software initialization process.
After the boot sequence initializes the CPU and memory, the boot sequence copies and decompresses, if necessary, the sections of code that need to run. It also copies and decompresses its data into RAM.
Most of the early initialization code is in low-level assembly language that is specific to the target system’s CPU architecture. Later-stage initialization code might be written in a higher-level programming language, such as C.
As the boot code executes, the code calls the appropriate functions to initialize other hardware components, if present, on the target system. Eventually, all devices on the target board are initialized (as shown in step 3 of Figure 3.7). These might include the following:
· setting up execution handlers;
· initializing interrupt handlers;
· initializing bus interfaces, such as VME, PCI, and USB; and
· initializing board peripherals such as serial, LAN, and SCSI.
Most embedded systems developers consider steps 1 and 2 in Figure 3.7 as the initial boot sequence, and steps 1 to 3 as the BSP initialization phase. Steps 1 to 3 are also called the hardware initialization stage.
Writing a BSP for a particular target system is not trivial. The developer must have a good understanding of the underlying hardware components. Along with understanding the target system’s block diagrams, data flow, memory map, and interrupt map, the developer must also know the assembly language for the target system’s microprocessor.
Developers can save a great deal of time and effort by using sample BSPs if they come with the target evaluation board or from the RTOS vendor. Typically, the microprocessor registers that a developer needs to program are listed in these BSPs, along with the sequence in which to work with them to properly initialize target-system hardware.
A completed BSP initialization phase has initialized all of the target-system hardware and has provided a set of function calls that upper layers of software (for example, the RTOS) can use to communicate with the hardware components of the target system.
Step 4 of Figure 3.7 begins the RTOS software initialization. Key things that can happen in steps 4 to 6 include:
1. initializing the RTOS
2. initializing different RTOS objects and services, if present (usually controlled with a user-configurable header file):
○ task objects
○ semaphore objects
○ message-queue objects
○ timer services
○ interrupt services
○ memory-management services
3. creating necessary stacks for RTOS
4. initializing additional RTOS extensions, such as:
○ TCP/IP stack
○ file systems
5. starting the RTOS and its initial tasks
The components of an RTOS (for example, tasks, semaphores, and message queues) are discussed in more detail in later chapters of this book. For now, note that the RTOS abstracts the application code from the hardware and provides software objects and services that facilitate embedded-systems application development.
After the RTOS is initialized and running with the required components, control is transferred to a user-defined application. This transfer takes place when the RTOS code calls a predefined function (that is RTOS dependent) which is implemented by the user-defined application. At this point, the RTOS services are available. This application also goes through initialization, during which all necessary objects, services, data structures, variables, and other constructs are declared and implemented. For a simple, user application such as the “hello world” application, all the work can be done in this function. This user-defined application (maybe the “hello world” application) might finally produce its impressive output. On the other hand, for a complex application, it will create task or tasks to perform the work. These application-created tasks will execute once the kernel scheduler runs. The kernel scheduler runs when this control-transfer function exits.
Many silicon vendors recognize the need for built-in microprocessor debugging, called on-chip debugging (OCD). BDM and JTAG are two types of OCD solutions that allow direct access and control over the microprocessor and system resources without needing software debug agents on the target or expensive in-circuit emulators. As shown in Figure 3.1, the embedded processor with OCD capability provides an external interface. The developer can use the external interface to download code, read or write processor registers, modify system memory, and command the processor to execute one instruction and halt, thus facilitating single-step debugging. Depending on the selected processor, it might be possible to disable the on-chip peripherals while OCD is in effect. It might also be possible to gain a near real-time view of the executing system state. OCD is used to solve the chicken-and-egg problem often encountered at the beginning development stage-if the monitor is the tool for debugging a running program, what debugs the monitor while it's developed? The powerful debug capabilities offered by the OCD combined with the quick turnaround time required to set up the connection means that software engineers find OCD solutions invaluable when writing hardware initialization code, low-level drivers, and even applications.
JTAG stands for Joint Test Action Group, which was founded by electronics manufacturers to develop a new and cost-effective test solution. The result, produced by the JTAG consortium, is sanctioned by the IEEE1149.1 standard.
BDM stands for background debug mode. It refers to the microprocessor debug inter- face introduced by Motorola and found on its processor chips. The term also describes the non-intrusive nature (on the executing system) of the debug method provided by the OCD solutions.
An OCD solution is comprised of both hardware and software. Special hardware devices, called personality modules, are built for the specific processor type and are required to connect between the OCD interface on the target system and the host development system. The interface on the target system is usually an 8- or 10-pin connector. The host side of the connection can be the parallel port, the serial port, or the network interface. The OCD-aware host debugger displays system state information, such as the contents of the processor registers, the system memory dump, and the current executing instruction. The host debugger provides the interface between the embedded software developer and the target processor and its resources.
Some points to remember include the following:
· Developers have many choices for downloading an executable image to a target system. They can use target-monitor-based, debug-agent-based, or hardware-assisted connections.
· The boot ROM can contain a boot image, loader image, monitor image, debug agent, or even executable image.
· Hardware-assisted connections are ideal, both when first initializing a physical target system as well as later, for programming the final executable image into ROM or flash memory.
· Some of the different ways to boot a target system include running an image out of ROM, running an image out of RAM after copying it from ROM, and running an image out of RAM after downloading it from a host.
· A system typically undergoes three distinct initialization stages: hardware initialization, OS initialization (RTOS), and application initialization.
· After the target system is initialized, application developers can use this platform to download, test, and debug applications that use an underlying RTOS.