Introduction to Microprocessors and Microcontrollers

9. Programming – using machine code and assembly

We can use a microprocessor as part of a computer or as part of a dynamite process controller and the only essential difference is in the instructions that it is given.

Since the microprocessor has no intelligence at all, it relies entirely on following a sequence of instructions as we discussed in the fetch-execute cycle.

If you were sitting in a lecture theatre and the speaker said, ‘We’ll finish now’, you would know that it was time to pick up your bits and pieces and leave the room. Those words were not the only possibility; other instructions would have served the same purpose, like ‘Right, that’s it, thank you’; ‘We’ll break here’; ‘It seems a good time for a break’; ‘We’ll stop at this point and continue at the next session’, and many other variations. Teachers can clear a whole classroom instantly just by looking at their watch and saying, ‘Well, …’. There are literally dozens of ways of telling people that it is time to leave a room even before resorting to the less polite possibilities.

Microprocessors, on the other hand, have nothing like this degree of flexibility. In fact, they have almost no flexibility at all. It is very frustrating, but we do not share any common language with a microprocessor (Figure 9.1). Each microprocessor has a built-in list of instructions that it can understand. This list is called its ‘instruction set’ and may consist of about a hundred or so instructions which must be put together in the right order to carry out the function required. This is the job of the programmer and is similar to that of a builder constructing a house by putting a lot of simple pieces like wood, tiles and bricks together in the correct order. When the microprocessor is designed, its instruction decoder is built to recognize these instructions and to start an internal process that allows the microprocessor to carry out each one. This presupposes that the microprocessor is familiar with the instruction or, to put it another way, the instruction is in a language that is understood by that particular microprocessor.

Figure 9.1 A small communication problem

As we have seen, we cannot talk directly to a microprocessor and, even worse, microprocessors often cannot talk directly to each other (Figure 9.2). This is particularly the case when the microprocessors have been developed by rival organizations.

Figure 9.2 They are all saying ‘ADD’

Within one company, such as Intel, there is commercial pressure to ensure that each succeeding microprocessor understands the binary codes of the previous designs. This is referred to as being ‘upward compatible’. There is nothing to prevent a company from designing a microprocessor that has the same pins and programming capability as another. The Z80180, for instance, was designed as an updated version of the Zilog Z80 which, in turn, was a revised version of the Intel 8080A. Such a copy may even be a pin-for-pin compatible plug-in replacement. This may be irritating for the original designers but is accepted provided the internal design has not been copied. Indeed, it often does the original company no harm. If several compatible microprocessors are being sold it will induce many programmers to write programs using this code. This will increase the sales of these microprocessors and, perhaps, no-one will suffer.

The Intel Pentium was under similar attack in the late 1990s from other microprocessors like the Athlon series made by AMD. These can run Pentium programs and, for some purposes, are superior to the Pentium.

Machine code

The binary code that is understood by the microprocessor is called machine code and consists of streams of binary bits. They are fed from the RAM or ROM memory chips in blocks of 8, 16, 32 or 64 bits depending on the microprocessor in use. To us the binary stream is total gibberish.

Example

If we refer to the Z80180 block diagram in Figure 9.3 we can investigate the instruction necessary to add two numbers together. One of the numbers is already stored in the accumulator or register A. Let’s assume this is the number 25H.

Figure 9.3 The microprocessor adds two numbers

In comes the instruction: 11000110 00010101. It is in two parts. The first byte, 11000110, means add the following number to the number stored in the accumulator. This first byte, which contains the instruction, is called the operation code, usually abbreviated to ‘op code’. The second byte, 00010101, is the number 15H. This particular instruction has two bytes. Some instructions have only one byte and others three or more bytes. The additional bytes contain the data to be used by the op code and are called the operand.

And here’s the action

1 The first byte goes into the instruction decoder where it is decoded into a sequence of internal operations.

2 It then copies the number 15H from the data buffer into the ALU, the arithmetic and logic unit.

3 The number 25H from the accumulator is then copied into the ALU to join the 15H which is already there. The two numbers are added.

4 And the result is copied back to the accumulator.

What is the result?

It is 3AH. Be careful not to let your brain jump back to decimal mode and shout 40 at you.
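The four steps above can be imitated in a few lines of Python. This is only a toy sketch, not a model of the real Z80180: the function name execute is made up for the illustration, flags are ignored, and the only op code it knows is the 11000110 from the example.

```python
# Toy model of the worked example: the accumulator holds 25H and the
# two-byte instruction 11000110 00010101 arrives from memory.
ADD_A_IMMEDIATE = 0b11000110  # op code from the text (C6H)


def execute(accumulator, instruction):
    """Run one two-byte instruction and return the new accumulator value."""
    op_code, operand = instruction
    if op_code != ADD_A_IMMEDIATE:
        raise ValueError(f"op code {op_code:08b} is not in this toy instruction set")
    # The ALU adds the operand to the accumulator. An 8-bit register
    # keeps only the low eight bits of the result.
    return (accumulator + operand) & 0xFF


result = execute(0x25, (0b11000110, 0b00010101))
print(f"{result:02X}H")  # prints 3AH
```

Printing the result in hex rather than decimal is deliberate: in decimal the same value would appear as 58, which is neither the 3AH we want nor the tempting 40.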

At the end of a machine-code program, we must include an instruction to stop the program, otherwise we can get some unexpected or unwanted results. You may remember that, at least in the development stages, the program is often stored in RAM.

Now, RAM locations take up random values when they are switched on so when the program has completed the last instruction, it will start executing the random values as if they were a program. These instructions may, of course, do anything at all. They could even delete or change the program that we have just written. Overall, the effect is like an aircraft overshooting the end of a runway.

The problems with machine code

There are so many. The program is not friendly: 11000110 00010101 hardly compares with ‘Add 15H to the number 25H’ for easy understanding. There is nothing about 11000110 which reminds us of its meaning ‘add the following number to the number stored in the accumulator’ so a program would need to be laboriously decoded byte by byte.

Typing in streams of ones and zeros is so boring that we will make many mistakes, particularly when we remember that a real program may be ten thousand times longer than this. Can you imagine typing in half a million bits, finding the program does not run correctly and then settling down to look for the mistakes?

Another problem is that the programmer must be aware of the internal structure of the microprocessor. How else could you know which register to use, or even which registers exist? So you master all this and then change to another microprocessor and then what? The whole learning process has to start again – new instructions, new registers, and new coding requirements. It’s all too horrible.

The difficulties with machine code hardly mattered in the early days of the microprocessor. Those who programmed them were fanatics who loved the complexity, and there were few serious jobs for the microprocessor to do. This first program language was called a ‘low-level’ language to differentiate it from our own verbal communication language which was called a high-level language. Machine code was later referred to as the ‘First generation’ language (see Figure 9.4).

Figure 9.4 High- and low-level languages

Very soon, the microprocessor was used for an increasing range of tasks and revolutionary ideas like ‘speed and ease’ crept into the discussions. This resulted in a new language called Assembly which overcame the most immediate failings of machine code.

Assembly language, the second generation language

Assembly language was designed to do the same work as machine code but be much easier to use. It replaced all the ones and zeros with letters that were easier to remember but it is still a low-level language.

The assembly equivalent of our machine code example 11000110 00010101 is the code ADD A,m. This means ‘add any nuMber, m, to the value stored in the accumulator’. We can see immediately that it would be far easier to guess the meaning of ADD A,m than 11000110 00010101 and so it makes programming much easier. If we had to choose letters to represent the ‘add’ command, ADD A,m was obviously a good choice, and a big improvement over alternatives like XYZ k,g or ABC r,h. The code ADD A,m is called a mnemonic.

A mnemonic (pronounced as nemonic) is just an aid to memory and is used for all assembly codes. Here are a couple of examples:

SLA E shift the contents of register E one place to the left.

LD B,25H load the B register with the number 25H.

Now see how easy it makes it by guessing the meaning of these:

INC H

LD C,48H

Finally, have a go at one that we have not considered yet.

LD B,B′

If SLA E means shift the contents of register E one place to the left, then SRA E means shift the bits one place to the right. INC H means increment, that is add one to, the number in register H. LD C,48H means put the number 48H into the C register. LD B,B′ enables us to copy the number stored in the B′ register into the B register. Note: the actual mnemonics differ between microprocessors. The manufacturers issue an ‘instruction set’ that lists all the codes for each of their microprocessors, together with the number of clock cycles taken by each instruction and a summary of the function of each.
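To see what a shift or an increment does to the bits, here is a small Python sketch of three of these mnemonics acting on a single 8-bit register. The function names simply echo the mnemonics and are not a real library. The carry flag is left out, and SRA is assumed to behave as it does in the Z80 family, where the leftmost bit keeps its value as the others move right.

```python
def sla(value):
    """Shift an 8-bit value one place to the left; a 0 enters at the right."""
    return (value << 1) & 0xFF


def sra(value):
    """Shift an 8-bit value one place to the right, leaving bit 7 unchanged."""
    return (value >> 1) | (value & 0x80)


def inc(value):
    """Add one to an 8-bit value, wrapping round from FFH to 00H."""
    return (value + 1) & 0xFF


register_e = 0b00010110
print(f"{sla(register_e):08b}")  # 00101100
print(f"{sra(register_e):08b}")  # 00001011
print(f"{inc(0x48):02X}H")       # 49H
```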

Non-destructive readout

Incidentally, microprocessors, in common with memories, always use non-destructive readouts. This means that information is shifted from one place to another by copying it and leaving the original number unaltered. For example, after the instruction LD A,C, the registers A and C will both finish up with the same information in them. This enables stored information to be used over and over again.
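A minimal Python sketch of the same idea, using a dictionary to stand in for the registers. The starting values and the helper name ld are invented for the illustration.

```python
# Made-up starting values: A is empty and C holds 48H.
registers = {"A": 0x00, "C": 0x48}


def ld(registers, destination, source):
    """Copy the source register into the destination; the source is not altered."""
    registers[destination] = registers[source]


ld(registers, "A", "C")
print(f"A = {registers['A']:02X}H, C = {registers['C']:02X}H")  # A = 48H, C = 48H
```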

To allow us to type in ADD A,15H rather than 11000110 00010101, we need another program to do the conversion. This program is called an assembler (see Figure 9.5).

Figure 9.5 An assembler

The program allows us to type in the assembly code, called the source code, and converts it to machine code referred to as object code. It can show the object code on a monitor screen or print it out or it can load it into the RAM ready for use. When starting the assembler, we have to state the RAM starting address that we wish to use. This is normally only a matter of making sure it is in RAM and avoiding the other programs already installed. The object code is shown in hex numbers rather than binary to make it easier for us. An assembler can only work within the instruction set provided by the microprocessor designer. It cannot add any new instructions and is (almost) just a simple converter or translator between mnemonics and machine code.
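The translation job can be sketched in Python as a very small, one-instruction ‘assembler’. It is only an illustration: the names parse_hex and assemble_line are invented here, it understands nothing except the ADD A,m example from this chapter, and a real assembler would work from a table covering the whole instruction set.

```python
def parse_hex(text):
    """Convert a number written in the form 15H into an integer."""
    if not text.upper().endswith("H"):
        raise ValueError(f"expected a hex number ending in H, got {text!r}")
    value = int(text[:-1], 16)
    if value > 0xFF:
        raise ValueError(f"{text} does not fit in one byte")
    return value


def assemble_line(source):
    """Translate one line of source code into a list of object-code bytes."""
    mnemonic, _, operand = source.strip().upper().partition(",")
    if mnemonic != "ADD A":
        raise ValueError(f"syntax error: code not recognized: {source!r}")
    # 11000110 (C6H) is the op code from the text; the operand follows it.
    return [0b11000110, parse_hex(operand.strip())]


object_code = assemble_line("ADD A,15H")
print(" ".join(f"{byte:02X}" for byte in object_code))   # C6 15
print(" ".join(f"{byte:08b}" for byte in object_code))   # 11000110 00010101
```

The first line of output is the object code as an assembler would normally show it, in hex. The second line is the same two bytes in the binary form that the microprocessor actually receives.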

Assemblers are available from many sources and all provide the necessary conversion from source code to object code. In addition, they may provide other features that will help us in the programming.

Syntax help

Syntax is the structure of statements in a language, whether it be English or a computer language. In English, most people would recognize something is incorrect about saying ‘He are going’ rather than ‘He is going’. This is an example of a syntax error.

As an example, if we mistyped the instruction LD A,C as LD A,V then the assembler would be unable to convert this to object code since it will not recognize the ‘V’ as a valid register. A real cheapie assembler may just stop or miss out this instruction. A somewhat better one may put a message on the screen saying ‘syntax error: code not recognized’ and a very helpful one may suggest a likely cause of the trouble. It may say, ‘syntax error. Invalid register. Register name may only be A, B, C, D, E, H or L.’
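The ‘very helpful’ kind of check might look something like the following Python sketch. The function name check_ld is invented, the check covers only register-to-register LD lines, and the messages are the ones quoted above. The register names may be separated by a space or a comma.

```python
VALID_REGISTERS = ("A", "B", "C", "D", "E", "H", "L")


def check_ld(source):
    """Return None if the LD line names two valid registers, otherwise a message."""
    parts = source.replace(",", " ").split()
    if len(parts) != 3 or parts[0].upper() != "LD":
        return "syntax error: code not recognized"
    for name in parts[1:]:
        if name.upper() not in VALID_REGISTERS:
            return ("syntax error. Invalid register. "
                    "Register name may only be A, B, C, D, E, H or L.")
    return None


print(check_ld("LD A C"))  # None, so the line is acceptable
print(check_ld("LD A V"))  # the 'Invalid register' message
```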

Labels

Another facility offered by a good assembler is the label. A label is a word that can be used to represent an address while the program is being written.

You will recall that we have to instruct the microprocessor to stop at the end of a program otherwise it will go off following random instructions. One way of stopping the microprocessor is to give it something quite meaningless to do. Suppose you opened an envelope, and the message inside read ‘put the envelope on the desk, pick it up, open the envelope and obey the instructions’. So you put the envelope on the desk, pick it up, open it and read the instructions. So you put the envelope on the desk, pick it up, open it… and I’m sure you can guess the next bit. We, of course, would soon stop doing this because we would see that it is pointless. A microprocessor, on the other hand, has no memory of ever having seen the instruction before and will be quite happy to do it forever if required.

The bytes of a program are stored in consecutive memory locations so if we started a program at address 0300H we may have got to the address 0950H when the stop instruction is required. The last line of the program would then be ‘jump to address 0950H’. The jump instruction tells the microprocessor to go to the address that follows, in this case 0950H, and perform the instruction to be found there. The microprocessor now finds itself in the endless loop shown in Figure 9.6. Once the microprocessor is in one of these loops it will run continuously until the power supply is switched off, or the reset pin is taken low, or one of the interrupt pins is activated.

Figure 9.6 One way of stopping a microprocessor
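The loop can be watched in action with a short Python sketch. It assumes the Z80-family coding of an unconditional jump: the op code C3H followed by the two address bytes, low byte first. The function name run and the step limit are additions for the illustration, since a real microprocessor would never give up.

```python
JP = 0xC3  # Z80-family op code for an unconditional jump to a 16-bit address

# 'Jump to address 0950H' stored at address 0950H, low address byte first.
memory = {0x0950: JP, 0x0951: 0x50, 0x0952: 0x09}


def run(memory, program_counter, max_steps):
    """Follow jump instructions for a limited number of steps.

    Returns the list of addresses from which an instruction was fetched.
    """
    visited = []
    for _ in range(max_steps):
        visited.append(program_counter)
        op_code = memory[program_counter]
        if op_code != JP:
            raise ValueError(f"op code {op_code:02X}H is not in this toy model")
        low = memory[program_counter + 1]
        high = memory[program_counter + 2]
        program_counter = (high << 8) | low
    return visited


print([f"{address:04X}H" for address in run(memory, 0x0950, 4)])
# ['0950H', '0950H', '0950H', '0950H']
```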

So where does the label come in?

Let’s imagine that we find that we have to add an extra byte at the start of the program. This will result in each byte being shifted one place down in the memory and the last line of the program actually starting at address 0951H rather than 0950H. When the program tells the microprocessor to jump to address 0950H it will no longer be in the little loop. The new contents of address 0950H may provide any random instruction and the whole program could crash. This problem, shown in Figure 9.7, will not be seen as a mistake by the assembler. Remember that the assembler is only checking for known codes, not sensible programs.

Figure 9.7 A late change to the program can ruin everything

If we type a word over the position where the memory address normally resides it will be recognized by the assembler as a label. That is a word which is equivalent to the address it replaced. If we use the same label anywhere else in the program it will be given the same value. The good thing about this is that if the program is changed and the addresses change, then the value of the label is changed automatically so the two ‘STOP’s in the last line will always have the same value and the loop will be safe from accidental destruction (Figure 9.8).

Figure 9.8 Labels to the rescue!
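How an assembler might work out the value of a label can be sketched in Python. This is an assumption about the general approach rather than a description of any particular assembler. Each program line is reduced to a made-up pair (label, size in bytes), and the main body of the program is lumped together as a single block of 0650H bytes so that the numbers match the 0300H and 0950H used in the text.

```python
def resolve_labels(lines, start_address):
    """Work through the program, noting the address at which each labelled line starts."""
    labels = {}
    address = start_address
    for label, size in lines:
        if label is not None:
            if label in labels:
                raise ValueError(f"label {label} has been used for two different lines")
            labels[label] = address
        address += size
    return labels


# The body of the program (0650H bytes) followed by the line: STOP JP STOP
program = [(None, 0x0650), ("STOP", 3)]
print(f"{resolve_labels(program, 0x0300)['STOP']:04X}H")  # 0950H

# Add one extra byte at the start and the label follows the code automatically.
longer_program = [(None, 1)] + program
print(f"{resolve_labels(longer_program, 0x0300)['STOP']:04X}H")  # 0951H
```

Because the jump refers to STOP rather than to a fixed number, both uses of the label move together and the loop survives the change.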

Once the assembler accepts the label, it can be used as often as we like in the program but remember that a label can only ever have one meaning within a single program. The choice of label is usually restricted to avoid clashing with words that have a special meaning to the assembler. Words that can do this are called ‘reserved’ words and usually include words such as jump, LD, ADD, HALT etc. The actual list is provided with the assembler program.

Remarks

Assemblers and all higher languages allow us to write remarks or notes to act as reminders but to be ignored by the assembler. To tell the assembler program not to attempt converting it to object code we precede the note by a semicolon or the word REM: or something similar.

The last line of our program, using the mnemonic JP for jump could be written as:

STOP JP STOP ;this will hold the microprocessor in a loop.
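Ignoring a remark is one of the simpler jobs an assembler has to do. A minimal Python sketch, assuming the semicolon convention used in the line above (the function name strip_remark is invented):

```python
def strip_remark(line):
    """Return the part of the line before the first semicolon, without trailing spaces."""
    code, _, _remark = line.partition(";")
    return code.rstrip()


print(strip_remark("STOP JP STOP ;this will hold the microprocessor in a loop."))
# STOP JP STOP
```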

Being able to add remarks to a program makes it much easier for the program to be understood at a later date. At the time of coding, the program seems, to the programmer at least, a model of clarity. When the programmer is off sick and we have to take over, their strategy is not obvious at all. It is even more embarrassing if the program that we cannot understand is the one that we wrote ourselves a few weeks earlier. The moral of this story is to add notes even if it seems ridiculously obvious at the time.

Summary of assembly language

Assembly and machine code are not portable. This means that they are designed to be used on a particular microprocessor and are generally not able to be used on another type. They also require the programmer to have knowledge of the internal layout or architecture of the microprocessor.

Despite the two problems of portability and architecture knowledge, assembly language has survived the onslaught of the new, modern ‘improved’ languages considered in the next chapter.

Why?

Assembly languages have two overriding advantages in the hands of a competent programmer (note the ‘competent’). A program written in Assembly is faster and is more compact, i.e. it takes less memory space to store it. Machine code and assembly languages are called procedural languages. This means that the program instructs the microprocessor to complete the first instruction, then start the next, then the next and so on until it has finished the job. This is just like a recipe.

Nearly all microprocessor-based systems are designed to operate in this way and it seems so obvious that it is difficult to think that there is any alternative – but there is, as we will see later.

Quiz time 9

In each case, choose the best option.

1 An assembler:

(a) converts assembly programs into machine code.

(b) is a type of microprocessor.

(d) is essential for converting mnemonics into source code.

2 The data that follows the op code:

(a) is always present and consists of one or more bytes.

(b) is called the object code.

(d) uses denary numbers.

3 A label is:

(a) an important feature in designer clothing.

(b) a form of syntax error.

(d) the part of the program that comes before an operand.

4 Machine code is:

(a) not a low-level language.

(b) written using mnemonics.

(d) an object code.

5 The part of the microprocessor that can follow the machine code is called the:

(a) Assembler.

(b) Instruction decoder.

(d) Mechanic.