As the complexity of microprocessors and other digital integrated circuits has increased, there has been an inevitable increase in the number of transistors that are incorporated in their design.
In the list below, we have counted transistors or their equivalent. These classifications are not universally accepted; there are different names and numbers floating around, so a degree of flexibility should be employed when comparing different sources. This is particularly true at the large end, where the terminology has not yet ‘firmed up’.
SSI | Small scale integration | 1–10 transistors
MSI | Medium scale integration | 10–1000 transistors
LSI | Large scale integration | 1000–10 000 transistors
VLSI | Very large scale integration | 10 000–100 000 transistors
SLSI | Super large scale integration | 100 000–1 million transistors
ULSI | Ultra large scale integration | 1–10 million transistors
The increase in the number of devices has also had the effect of necessarily decreasing the size of each component. If the same component size were used for the current front runners as was used for the original 4004 microprocessor, they would be about the same size as a page of this book. For reasons that we will look at in a moment, unless we reduced the size of the components, we couldn’t increase the speed of operation and so the current microprocessors would have a maximum clock speed of under 1 MHz.
Measuring speed is a lot more difficult than we might think, because the developers of microprocessors are in competition with each other: as soon as a method of measurement is suggested, each tries to exploit the situation to present their microprocessor as faster than all the others.
Be wary when reading comparisons – which tests have they chosen, and why? While watching the last Olympic games, it occurred to me that I was probably faster than any of those competing in the 100 metres. Yes, I felt confident that I could build a working microprocessor-based system quicker than any of the athletes. You see, comparisons all depend on the test that we have decided to use. Anyone can be world champion: it’s only a matter of choosing the tests well enough. With that in mind, here are a few popular speed comparisons.
MIPS (millions of instructions per second)
This appears an easy measurement to take. It is simply a matter of multiplying the number of clock cycles in a second by the number of instructions completed in each clock cycle – or, for older designs that need several cycles per instruction, dividing the clock frequency by the number of cycles each instruction takes.
The current Athlon, for example, can run at 2 GHz or 2000 MHz. It can perform up to 9 instructions per clock cycle, so its number of instructions per second is simply 2×10⁹×9 = 18 000 million instructions per second, or 18 000 MIPS.
Life is never that simple. Some instructions are more time consuming than others as they take a different number of clock cycles to perform the task. Competitors will obviously choose instructions that give the most impressive results on their own microprocessor.
An extreme example occurred about ten years ago with the Intel 80386. We could ask it to perform some additions and, by consulting the instruction set, see that each one takes two clock cycles to complete. Now, with a clock frequency of 25 MHz, each clock cycle lasts for 40 ns, so an ‘add’ instruction takes 80 ns. This equates to a speed of 12.5 MIPS. An unkind competitor could take ‘at random’ the divide instruction, which needs 46 clock cycles, and reduce its MIPS rating to a measly 0.54 MIPS. A really vindictive person could search through the instruction set with a magnifying glass and find a really obscure instruction that takes 316 clock cycles. This would provide a speed of 0.08 MIPS – about the same as a four-bit microprocessor.
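If you want to check these sums for yourself, they reduce to a few lines of Python (a sketch using the figures quoted above; the function name is mine):

    # MIPS = instructions per second / 1 000 000
    def mips(clock_hz, cycles_per_instruction):
        return clock_hz / cycles_per_instruction / 1e6

    clock = 25e6                  # 80386 running at 25 MHz
    print(mips(clock, 2))         # 'add':    12.5 MIPS
    print(mips(clock, 46))        # 'divide': about 0.54 MIPS
    print(mips(clock, 316))       # obscure:  about 0.08 MIPS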
We cannot even make it fair by using the same instruction for each microprocessor, since they don’t always have their talents in the same area. If we had two microprocessors, each with a ‘load’ instruction taking five clock cycles, and they both ran on a 10 MHz clock, what would be their speed? 10/5 = 2 MIPS. (In working this out we can ignore the fact that the clock is in megahertz, since the speed is measured in millions of instructions – the ‘millions’ cancel.) But can they do the job at the same speed? Possibly, possibly not. What if one was a 64-bit microprocessor and the other an 8-bit microprocessor? The 64-bit one could shovel data at eight times the speed. For good reason, MIPS has been referred to as ‘Meaningless Indication of Performance by Sales reps’.
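The bit-width point is easy to demonstrate. Assuming, purely for illustration, that each instruction moves one full register of data, two processors with identical MIPS ratings differ enormously in throughput:

    # Same MIPS rating, very different data-moving ability.
    mips_rating = 2                                # 10 MHz / 5 cycles
    for name, width_bytes in [("8-bit", 1), ("64-bit", 8)]:
        print(name, mips_rating * width_bytes, "Mbytes of data per second")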
FLOPS (floating-point operations per second)
To overcome the problem of which instructions should be employed, some standard floating-point operations can be used instead, giving a speed measured in FLOPS.
As a quick reminder, a floating point number is one in which we have moved the decimal point so that it sits just after the first digit, so 123.456 would be converted to 1.23456×10². This makes the mathematics faster. Modern microprocessors would have values of the order of ten GFLOPS (gigaFLOPS, or thousands of millions of FLOPS).
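The normalisation step is easy to see in code. A decimal sketch in Python (real floating-point hardware does the same thing in base 2):

    import math

    # 123.456 -> mantissa 1.23456, exponent 2, i.e. 1.23456 x 10^2
    def normalise(x):
        exponent = math.floor(math.log10(abs(x)))
        mantissa = x / 10 ** exponent
        return mantissa, exponent

    print(normalise(123.456))   # approximately (1.23456, 2)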
FLOPS also meets with objections. The obvious question is, ‘What “operation” is being measured?’ Choose your operation carefully and the opposition is left far behind.
Both of these tests, MIPS and FLOPS, are supposed to be microprocessor tests and not system tests.
System tests
Other speed measurements tend to be system tests rather than microprocessor tests but a brief overview may be in order since they are often quoted, almost as alternatives.
Benchmarks
These tests are based on making the microprocessor-based system run a standard or ‘benchmark’ program. The immediate failing here is that getting one program to run on microprocessors with incompatible code means that the compilers are also being tested, and they are not part of the system. There is also an immediate outcry from people who disagree with the program chosen because it doesn’t suit their system.
I/O operations (input/output operations)
As the name suggests, this measures the speed of accepting information in and sending it out again. But loading information from a CD, for example, will depend on whether the information is being read from the same track or whether the head has to move, adding seek time.
TPS (Transactions Per Second)
This is a move to base the tasks set on real-life situations. It requires the system to take in information, modify it and then store it again. It puts a heavy significance on memory access times and compilers.
SPECmark (Systems Performance Evaluation Co-operative’s benchmark)
This is the average result of carrying out ten agreed benchmark tests, in an attempt to measure the system performance in a range of situations. Recent changes include a test using floating point arithmetic, which is of more interest for serious number crunching in science and engineering, and an integer test for the rest of us. These are referred to as SPECfp95 and SPECint95. As a starting point for comparison, the 200 MHz Intel Pentium Pro delivers a value of 8.71 on SPECint95.
Increase the clock speed
This seems the obvious answer, and generating the necessary square wave shown in Figure 6.4 is no problem. In a modern microprocessor-based system there are two clocks that we need to consider. There is a square-wave clock that controls the internal operation of the microprocessor; this is the headline speed seen in the adverts: ‘the 2 GHz Pentium’. There is also an operating clock, at about 133 MHz, for the system to control the external devices and memory. This saves us from upgrading all the external devices to match each new processor.
Internal clock speeds will probably continue to increase at least into the low gigahertz range but there are limiting factors that will make continuous increases in speed difficult to achieve. It would be a trivial electronic problem to generate a square wave of 1000 GHz or more, so there is obviously more to it. And there is.
Power dissipation
Heat is an unwanted by-product of any activity inside a microprocessor, and the power wasted as heat is given by:
Power = voltage × current
So, to reduce heat production, we have to reduce either the voltage or the current, or both.
We will start with voltage since it is a little more straightforward. The early microprocessors used a 15 V supply; this has been steadily reduced and the latest designs are pushing at 1.5 V. How much further can we go along this line? It appears as if we have nearly reached the limit. The integrated circuits are made from a semiconductor called silicon, which passes electricity under the control of electric charges, and it is the applied voltage that creates these charges. In silicon, the simplest device needs at least 0.6 V to operate although, by adding minute traces of other materials, this figure can be reduced a little. A single transistor can do little with voltages of less than 1 V, so in a complex circuit it is already amazing that the total voltages can be as low as they are. The chances of a microprocessor running on less than 1 V are slight indeed. If I were braver, I would say impossible. Another point with regard to the voltage is that we must not forget the effects of random electrical noise, as we saw earlier in Figure 2.2. Sudden changes in current flow in nearby circuits can cause random changes. If the voltages are reduced too far, the microprocessor will become more prone to random errors.
Bursts of current are promoted by the steep leading and trailing edges of the clock pulses, so the higher the clock frequency, the more edges per second, the more current flows and hence the more heat is generated. To reduce the heat generated, simply reduce the clock speed – which is exactly what we don’t want to do.
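A toy model makes the trade-off visible. Purely for illustration, assume the average current is proportional to the number of clock edges per second (two per cycle); the constant k and the example figures below are made up, not measurements:

    # Toy model: power = voltage x current, with current taken as
    # proportional to clock edges per second.
    def relative_heat(voltage, clock_mhz, k=0.001):
        current = k * 2 * clock_mhz     # more edges -> more current
        return voltage * current

    print(relative_heat(5.0, 100))      # older design: 5 V, 100 MHz -> 1.0
    print(relative_heat(1.5, 2000))     # newer design: 1.5 V, 2 GHz -> 6.0

Even with the supply voltage slashed, the rising clock rate wins and the newer part runs hotter.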
Size of architecture and its effects
As the electric charges move through the transistors inside the microprocessor, they take a finite time to do so. It follows that if we reduce the size of the transistors, we can move data around faster – and this is true.
The smallest feature that could be fabricated in a microprocessor had an initial size of about 10 μm when microprocessors were first produced; it has now been reduced to 0.13 μm, a significant reduction. This reduction has two drawbacks. Firstly, it is much more difficult and expensive to manufacture without accepting enormous failure rates. Secondly, the heat generated has not changed, since it is a product of voltage and current but not size. This means that the temperature will increase unless we can dissipate the power. An unfortunate problem with semiconductors is that they are heat sensitive and will auto-destruct if the temperature rises too far. We do our best with heat sinks, which are basically slabs of aluminium with fins to increase the surface area, and fans to keep the heat moving. A typical operating range is 0–85°C when measured in the centre of the outer case of the microprocessor (not the heat sink).
To increase the system speed
When we look at the overall system, it is apparent that not all things have progressed at the same rate. The largest bottleneck is the memory. We already use a slower clock speed, but the microprocessor still spends a lot of its time humming a tune and bending paper clips while waiting for information to arrive from the memory. During the life of the microprocessor, the clock speed has increased from 0.1 MHz to about 3 GHz, an increase of about 30 000 times. During this time, DRAM memories have got much bigger, but only about 2000 times faster.
Modern microprocessors have about 128 kbyte of on-board RAM, called a cache. When the microprocessor has to go to the external memory for information, it saves a copy of the address and the information in case it is needed again. It also saves the address and information from the next memory location. The reasoning behind this is that since nearly all code is procedural, the next location is likely to be accessed next. If not, the program may jump back to a previous address to repeat part of a program, as in a counting loop used to produce a delay. When the microprocessor next requires access to the memory, it first checks the high-speed cache to see if the information is stored there. If it is, we have scored a ‘hit’ and the system has increased its speed. If it is not there, it is a ‘miss’ and the main memory is used. The new information is then stored in the cache for later.
This cache is sometimes called a level 1 cache, or L1 cache. This implies that there may be a level 2 cache – and there is. The L2 cache is usually 256 kbyte.
When data is needed, the microprocessor checks cache level 1, then 2 and lastly, the main memory.
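The lookup order can be sketched as a simple search through progressively slower stores. This is a Python toy, not how the hardware is wired – real caches work on fixed-size lines with dedicated comparison logic:

    # L1 first, then L2, then main memory on a 'miss'.
    l1_cache, l2_cache = {}, {}
    main_memory = {address: address * 2 for address in range(1024)}

    def read(address):
        if address in l1_cache:
            return l1_cache[address]        # L1 'hit': fastest path
        if address in l2_cache:
            value = l2_cache[address]       # L2 hit: still fast
        else:
            value = main_memory[address]    # 'miss': slow main memory
            l2_cache[address] = value
        l1_cache[address] = value           # keep a copy for next time
        return value

    read(42)    # miss - fetched from main memory and cached
    read(42)    # hit  - served straight from the L1 cache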
Pipelining
To put too much reliance on the clock frequency is like saying that the maximum rpm of the engine determines the maximum speed of a vehicle. Yes, true, but other things like gearbox ratios are also significant. Doing 9000 rpm in first gear will not break any speed records. The real speed of a microprocessor also depends on how much useful work is done during each clock cycle. This is where pipelining is really helpful and is now incorporated in all microprocessors.
Let’s assume we have some numbers to move from the memory to the arithmetic and logic unit (ALU):
Clock pulse 1 | A number is moved from a memory location to the accumulator.
Clock pulse 2 | It is then moved from the accumulator to the ALU.
If we have another number to be loaded, this would have to repeat the process so loading two numbers would take four clock pulses. Three numbers would take six clock pulses and so on.
During the first clock pulse, a number is being moved along the bus between the memory and the accumulator and so the other part of the bus between the accumulator and the ALU is not used. During the second pulse, we still have one section of the bus idle (Figure 11.1).
Figure 11.1 One clock pulse moves one number
Pipelining is the process of making better use of the buses. While one number is shifted from the memory to the accumulator, we can use the same clock pulse to shift another number from the accumulator into the ALU along the other section of the bus. In this way, we get more action for each clock pulse and so the microprocessor completes instructions faster without an increase in the clock speed (Figure 11.2).
Figure 11.2 One clock pulse moves two numbers
If we get two jobs done on the same clock cycle, then this has made a significant improvement to the speed without increasing the clock speed. If we can manage to get three pieces of information moving, or jobs done, this is even better. Incidentally, the Pentium manages five, the Pentium Pro can manage 12 and the Pentium 4 can keep up to 126 instructions ‘in flight’. Unfortunately, we can never get pipelining to work this well on all instructions, but every little helps.
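The saving is easy to count. With the two sections of bus working on the same pulse, the two-step move overlaps like this (a counting sketch, not a model of real bus hardware):

    # Two-stage move: memory -> accumulator -> ALU.
    def pulses_without_pipelining(n):
        return 2 * n       # each number needs both pulses to itself

    def pulses_with_pipelining(n):
        return n + 1       # a new number can start on every pulse

    for n in (1, 2, 3, 10):
        print(n, pulses_without_pipelining(n), pulses_with_pipelining(n))
    # ten numbers: 20 pulses unpipelined, only 11 pulses pipelined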
If we wished to AND two binary numbers, we could do it by using a logic gate as we saw in Chapter 5 or we could use a microprocessor executing an instruction code. Now, comparing middle-of-the-range devices, the logic gate would complete the task in 8 ns but a comparable microprocessor (80386, 25 MHz) would take a minimum of 80 ns.
This type of comparison established the belief that, given a choice, hardware is always faster than software. In the above case, it is 10 times faster.
Given the job of carrying out a hundred such instructions we had a choice:
Software method = 100 operations × 80 ns = 8000 ns (8 μs)
Hardware method = 1 operation at, say, 240 ns + 100 hardware operations × 8 ns = 1040 ns
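In Python the comparison is just arithmetic (the 240 ns figure is the ‘say’ value from the text, an assumed one-off cost of invoking the hardware):

    GATE_NS = 8           # one AND performed by a logic gate
    INSTRUCTION_NS = 80   # one AND instruction (80386 at 25 MHz)
    SETUP_NS = 240        # assumed one-off set-up cost

    operations = 100
    software_total = operations * INSTRUCTION_NS        # 8000 ns
    hardware_total = SETUP_NS + operations * GATE_NS    # 1040 ns
    print(software_total, hardware_total)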
This philosophy was followed throughout the development of 4- and 8-bit microprocessors. It gave rise to more complex hardware and a steady increase in the size of the instruction set, from a little under 50 instructions for the 4004 up to nearly 250 in the case of the Pentium Pro.
In the mid 1980s, the hardware-for-speed approach began to be questioned. The ever-increasing number and complexity of the operating codes was reversed in some designs. These microprocessors were called RISC (Reduced Instruction Set Computers) and the ‘old fashioned’ designs were dubbed CISC (Complex Instruction Set Computers). History has not proved so black and white as this suggests. It is much more a matter of shades of grey, with new designs being neither wholly CISC nor RISC. Predominantly CISC microprocessors outnumber RISC designs by a wide margin, at least 60:1. This does not imply that they are better; they simply have a greater proportion of the market. As we know, there is a lot more to market dominance than having the best product. Sadly. CISC designs include all the 8-bit microprocessors, the Pentium and Pentium Pro and all of the 68000 family, whereas RISC includes the Digital Alphas and the IBM/Motorola PowerPCs.
RISC versus CISC
Both RISC and CISC microprocessors employ all the go-faster techniques such as pipelining, superscalar structures and caches. A superscalar architecture is one in which two ALUs share the processing, rather like having two microprocessors. So, what are the real differences?
By analysing the code actually produced by compilers, we find that a small number of different instructions account for a very large proportion of the object code produced. Most popular are the instructions that deal with data being moved around.
At this point a curious switch of design occurred. You will remember that the ‘normal’ or CISC microprocessor included a microprogram in its instruction decoder or control unit. This microprogram was responsible for the internal steps necessary to carry out the instructions in the instruction code. So the microprocessor that we have been praising for its use of hardware to gain speed is actually being run internally by software.
The RISC approach was to reduce the number of instructions available but keep them simple and do them fast; the number of instructions was reduced to under a hundred. In the CISC camp, since instruction codes can easily be enhanced by adding some extras to the microprogram, it was always tempting to do so, and no pruning of previous instructions was possible owing to the need to maintain compatibility with previous versions.
Following the cries of ‘hardware is faster than software’ it seemed a logical step to do away with the microprogram and replace it with hardware that could carry out the simple steps necessary. This hardware was made simpler by keeping all the instructions the same length, so that pipelining was easier to organize. The only disadvantage of these constant-length instructions is that they all have to be the same length as the longest, and so the total program length will be increased.
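The program-length penalty is simple to picture. With a hypothetical mix of instruction lengths (the byte counts below are made up for illustration), padding everything to the longest length roughly doubles the program size:

    # Hypothetical instruction mix for a small program (lengths in bytes).
    variable_lengths = [1, 2, 1, 4, 2, 1, 1, 2, 4, 1]   # CISC-style mix
    fixed_length = max(variable_lengths)                # pad all to 4 bytes

    print(sum(variable_lengths))                 # 19 bytes, variable-length
    print(fixed_length * len(variable_lengths))  # 40 bytes, fixed-length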
CISC kept shovelling bucketfuls of data backwards and forwards between the microprocessor and the external memory, using many different types of instruction. RISC designs just had simple load and store instructions; everything else was done internally, using a large number of registers to replace the external memory.
By the use of hardware for handling instructions and internal moves between registers, all instructions could be reduced to a single clock cycle, which gave a significant increase in speed. Generally, the Pentium Pro has managed to match this speed by its extensive use of pipelining.
As time goes by, there is an increasing tendency for the RISC/CISC difference to decrease. Modern RISC microprocessors like the PowerPC 970 have an increasing number of instructions, even though these do tend to be simple and fast, while the more traditional CISC approach in the Pentium 4 also employs simple yet extremely fast instructions.
As we know, our starting point was the Intel 4004 in 1971, very quickly followed by the 8-bit 8008 processor. These are shown in Figure 11.3. Notice that, even within a company, there was no agreement about the operating voltages.
Figure 11.3 The first 4- and 8-bit microprocessors
It also started the trend for numbering, rather than naming, a microprocessor. This made good sense since each basic design generated a series of variations: different speeds, modified instruction sets and so on. The numbers can give a clue as to some of the basic characteristics and the hierarchy. Sometimes an X is used to signify a family of devices, like the 80X86; by giving different values to the X, we include the 80286, 80386 etc. The tendency now is to use a name and a number. This was due more to a legal problem owing to the difficulties of ‘owning’ a number. This was highlighted after Intel produced the 80286 followed by the 80386 and 80486, which gave its competitors rather advanced warning of the name of the next one. Intel couldn’t trademark the number 80586, otherwise mathematicians would fear prosecution if any calculation resulted in this number. They tried to call it the P5, claiming world rights over the letter P. Finally they went for ‘Pentium’. Apart from the fun of watching it all going on, the main beneficiaries, as usual, were the corporate lawyers (for whom we all pay, of course).
Intel versus Motorola
In December 1973, Intel introduced the 8080A. This was a very popular processor that had a 16-bit address bus, so it could address up to 64 kbytes of memory or, as the adverts said at the time, ‘a MASSIVE 64k of memory’. The power supplies changed again, this time to +5 V, +12 V and –5 V. The number of instructions increased again and the number of pins increased to 40. Internally, there was the normal accumulator and eight general-purpose registers. This, then, became the standard package size for future 8-bit microprocessors, as shown in Figure 11.4.
Figure 11.4 A standard size for 8-bit microprocessors
At about the same time, Motorola introduced a rival in the form of the MC6800. Like the Intel 8080A, this one was an 8-bit microprocessor with 16 address pins. At this point, the similarities ended. It was not compatible with the 8080A and so fought toe to toe in the market place, and came second!
The power supplies were simplified, now requiring only a single +5 V supply. A block diagram of the MC6800 is shown in Figure 11.5. We can see that it was unusual in not having any general purpose registers but it did have two accumulators. The approach here was to use an external memory location for the temporary storage of information that all previous microprocessors would have put into an internal register. The remainder of the microprocessor is quite familiar from our look at the Z80 in Chapter 8. The 6800 performed slightly faster than the 8080 on average, but not enough to break the hold of Intel in the marketplace.
Figure 11.5 MC6800 Motorola’s answer to the Intel 8080A
The final 8-bit microprocessors
At this time, Intel was working on the design of a replacement for the 8080A. It was a response to the criticisms of the 8080A: why does it need three different power supplies when the 6800 needs only one? And why only one interrupt pin when the 6800 has two?
At this time some of the engineers who had been working on the 8080A and were developing its replacement, called the 8085A, decided, or were persuaded, to move to a rival company called Zilog.
Meanwhile, back at Intel, the 8085 was produced. It answered all the gripes about the 8080A and increased the clock speed to 5.5 MHz. Still 8 bits with a 16-bit address bus, it stayed with its eight internal general purpose registers. It followed Motorola’s lead and opted for a single +5 V power supply but to keep its customer base it kept its instruction code compatible with the 8080A.
Back at Zilog, the other group of engineers, who had also been brought up on the 8080A, set to work on the Z80, which was later developed into the Z80180. They combined much that was good about the Intel designs with some good ideas from the MC6800. A single power supply was used, and the design not only returned to the use of internal general purpose registers but increased their number to 14. The instruction code included all the code from the 8080A but added some new ones to nearly double the total number.
Meanwhile a new player, MOS Technology, entered the fray with its own MCS650X family, of which the MCS6502 is probably the best known. This flew to fame with the rise of the microcomputer in the 1980s. It was basically an enhancement of the Motorola MC6800 and followed the 8-bit trend of a 16-bit address bus and a single +5 V power supply. Its contribution to progress was the idea of pipelining. Two billion 6502s were sold.
Returning to the plot
The 6502 was said to be ‘accumulator-based’ in that it had no general purpose registers, just a single accumulator through which all the incoming and outgoing data passed, but with pipelining and some fast instructions it was very popular for a few years. In comparing Figure 11.6 with Figure 11.5 we can see the influence of the MC6800: most of the blocks are the same. The MC6800 had two accumulators and one index register; the MCS6502 has two index registers and one accumulator.
Figure 11.6 MCS6502 – an enhancement of the MC6800
There was little to choose between these microprocessors, as the 6502 had the edge in the microcomputer market while the MC6800 was more popular in the industrial control field.
The one-chip microcomputer
To make a stand-alone system, the microprocessor would require some RAM and ROM memory, as we saw in Figure 7.6. In many industrial control situations the memory would not need to be very large, and it occurred to the designers that all these necessary parts could be built into a single chip. When this occurred, the name ‘microprocessor’ was superseded by ‘one-chip microcomputer’.
Intel produced the 8048. This was not simply an 8085+RAM+ROM, but a new design. It included (yet another) change of instruction code so it was not compatible with either the original 8080A or the 8085A. The on-board memory consisted of 1 kbyte of ROM and 64 bytes of RAM which was quickly doubled to 2 kbytes and 128 bytes on a new version, the 8049. It could also access 4 kbytes of external ROM. A further enhancement was a timer. This can count up or down to provide time delays. Without this, the microprocessor would have to be used for this function which would prevent it from getting on with something more useful.
Zilog, of course, were not far behind and brought out the Z8. This was similar in concept but slightly upgraded. It had two counter/timers that could be used for counting incoming pulses as well as providing the time delays. It had 2 kbytes of ROM and 128 bytes of RAM that it called general purpose registers. A big improvement was its ability to access 64 kbytes of external ROM and 64 kbytes of external RAM, allowing the ‘one-chip’ to become more than one if needed.
Motorola replied with its MC6801 that included many of the features of others in its generation. It included 2 kbytes of ROM and 128 bytes of RAM together with a timer and UART. Its instruction set was compatible with the MC6800 with a glimmer of 16-bit arithmetic creeping in.
Rockwell launched their R6500, which was basically a 6502 microprocessor with the addition of 2 kbytes of ROM and 64 bytes of RAM. One interesting feature was that the internal RAM had a separate power supply in the form of a battery so that the data wasn’t lost during a power failure. It also included a universal asynchronous receiver/transmitter (UART) that we will save for Chapter 15. The first four generations of microprocessors are shown in Figure 11.7.
Figure 11.7 The first four generations
As the home computer was being developed, the headlines were only interested in the ever-increasing speed and capability of the microprocessors. However, behind the scenes, the microcontroller was selling in greater numbers with little publicity. There is not much in the way of attention-grabbing headlines in a microcontroller being fitted into a video recorder.
The mammoth engines of today’s computers were just not required for many uses in the real world. We are the limiting factor when it comes to everyday uses of microelectronics. We may want our computers to work faster, but we still work at the same speed as our ancestors did. Just how fast do we want a vending machine to work? We want it to work at the same speed today as it did yesterday, or ten years ago. After all, we cannot press the buttons any faster and we don’t want the coffee to fly down the chute at ten times the previous speed.
If we want any change, we may like it to be more reliable. Is the Pentium 4 any more reliable than the Z80? I doubt it. We would certainly want it to be cheaper, to use less power, and not to require a fan whenever it thinks about something.
The trend is to rethink what we really want and the answer in the majority of cases is something much closer to the present day microcontroller.
We will be coming back to microcontrollers in more detail in Chapters 15 and 16.
It was inevitable that the 4-bit microprocessor that turned into the 8-bit should, in turn, grow into a 16-bit microprocessor but first a few moments to answer a seemingly obvious question.
What do we mean by a 16-bit microprocessor?
The ‘size’ of a microprocessor is the width of the data registers, so an 8-bit microprocessor can handle 8-bit numbers. It was traditional that 4- and 8-bit microprocessors had an address bus that was twice the width of the data registers, but this was just a coincidence and doesn’t follow these days, since no one wants, or could afford, the memory to make full use of a 128-bit address bus (the number of locations is more than 3 followed by 38 zeros!).
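The number of addressable locations is just 2 raised to the width of the address bus, which is easy to check:

    # Addressable locations = 2 ** (width of the address bus)
    for width in (16, 24, 32, 128):
        print(width, 2 ** width)
    # 16  -> 65 536          (the 8-bit era's 'MASSIVE 64k')
    # 24  -> 16 777 216      (the 68000's 16 Mbytes)
    # 128 -> about 3.4 x 10^38 locations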
The other fallacy is that the size is necessarily the same as the width of the data bus. It is not. The Pentium family uses a 64-bit data bus but has 32-bit data registers, so its members are 32-bit devices; the 64-bit data bus is used to load two 32-bit registers at a time. The PowerPC and the Digital Alpha families are real 64-bit devices and have a 128-bit data bus.
Curiously, the Intel 8088 was a real 16-bit microprocessor but had an 8-bit external data bus. This was to allow it to be compatible with the cheaper 8-bit circuitry, which had not quite caught up with the idea of using 16 bits.
Even so, the data registers are not universally accepted as defining the size of a microprocessor. Some people stick with the data bus to be the defining size. So, in reference books and catalogues you may find microprocessors referred to as a different ‘size’ to the one you expected. In this book I will stick with the data register as being the defining feature.
The 68000 family
The M68000, first produced in 1979, was a VLSI chip employing about 70 000 transistors. It is well known as a 16-bit microprocessor but, in reality, it is a 32-bit device if we stick to our definition above. It certainly has a 16-bit data bus, but the internal registers are 32-bit, although some arithmetic operations can only use 16-bit data. Occasionally this format is called a 16/32-bit processor. It came in a 64-pin dil (dual-in-line) package of the style shown in Figure 11.4, but even longer. Its length was often its undoing, since it could easily snap in half if you attempted to remove it by prising up one end of the chip. It has a 24-bit address bus that can therefore access 16 Mbytes of memory, and ran with a 12 MHz clock frequency.
One feature of M68000 is its pre-fetch action. When an instruction is being worked on, the microprocessor fetches the next instruction from memory and stores it in a little queue, ready to be used. This can be done whenever the present instruction is not using the external address and data buses. This means that the next instruction is already loaded ready to go as soon as it is required, thus saving valuable time. It has a total of seventeen 32-bit general-purpose registers of which eight are data registers, which can be used as 8-, 16-, or 32-bit registers as required.
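The pre-fetch can be pictured as a little queue that is topped up whenever the buses are free. A toy Python model (the instruction stream is made up; the real queue is a few words of dedicated hardware, not software):

    from collections import deque

    program = ["LOAD", "ADD", "STORE", "JUMP"]   # made-up instruction stream
    queue, next_fetch = deque(), 0

    def spare_bus_cycle():
        # While the current instruction isn't using the buses,
        # fetch the next instruction and park it in the queue.
        global next_fetch
        if next_fetch < len(program):
            queue.append(program[next_fetch])
            next_fetch += 1

    def execute_next():
        if not queue:
            spare_bus_cycle()      # nothing waiting: a real (slow) fetch
        return queue.popleft()

    spare_bus_cycle()              # happens 'for free' during execution
    print(execute_next())          # 'LOAD' is already waiting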
One interesting feature is that this microprocessor can operate in two modes, supervisory and user. The essential difference is that the user mode has a restricted list of instructions at its disposal. The operating system can use the supervisory mode and thus use the full set of instructions but user programs only have access to a restricted range – enough to run the programs but, hopefully, not enough to screw things up too much. There is a software route between the two modes if you really want to change.
The M68000 gave rise to its own family as the basic model progressed. The main advances are detailed in Figure 11.8.
Figure 11.8 The M68000 family
As true 32- and 64-bit microprocessors have taken over the computing side, the 68000 family is used increasingly as a high-performance embedded controller for printers and disk drives.
After the early one-chip microcomputer, the decreasing cost of the design and production of the integrated circuits made it easier to increase the complexity of the chips.
This has caused the development to diverge along two separate paths: speed and power or cheap and small.
By heading off in the cheap and small route, we get microcontrollers that are controlling the operation of most of the instruments and machines that we use and even playing tunes in greetings cards. We will meet these in Chapters 15 and 16.
The never-ending pursuit of more speed and more power for computers has resulted in the continuous development of larger and faster microprocessors like the Pentium 4 and its competitors. Each new design is king for a day, and then overtaken and dispatched to the museum. What cost a fortune three years ago is thrown out with the garbage, unwanted. We will look at these here today, gone tomorrow devices in the next three chapters.
If we only needed computers to allow us to handle text on a word processor, there would be little need for the development that has occurred in the last ten years – our typing is not getting faster and text is simple stuff. Introducing coloured pictures does little to increase the stress levels but things really start to change when we want to have moving coloured pictures.
We are never satisfied: we want faster-changing, more lifelike images – with sound effects, of course. Our demands will outstrip the latest microprocessors almost regardless of their capability.
Many computers are used mostly for playing games and simulations but, because they have other functions, they must be designed to operate across a wide range of fields and are not really optimized for any particular task. This has resulted in the development of the dedicated games machine. We have a choice of three at the moment: the Nintendo Gamecube, the Sony Playstation 2 and the Microsoft Xbox.
Nintendo Gamecube
This is the oldest design and, inevitably, the least technically advanced. The cube is generally cheap to buy, the games are also cheap, and it is most popular amongst users in the lower age groups.
Nintendo decided to use a 64-bit, 485 MHz IBM PowerPC 750Cxe microprocessor, also called the ‘Gekko’, which was its original codename during the production stages. It is a slightly modified version of the one used in Macintosh computers and similar to those described in Chapter 13. The modifications were the addition of almost forty instructions to provide specific help in game playing that was not necessary in the original computer applications.
The quality of the graphics during play can be indicated by the speed of handling graphic data, which is conveniently measured by how many shapes it is able to draw in one second. In this case the chosen ATI 162 MHz Flipper GPU (Graphics Processor Unit) has a maximum speed of 12 million polygons/second. This sounds fast but, compared with the Playstation 2 and the Xbox, it is not impressive.
The main game storage is a 1.5 GB 3 inch Nintendo optical disk. There is also a facility for a 64 MB memory card supplied by Panasonic.
Sony Playstation 2
Rather than taking a ready-made microprocessor off the shelf and then designing a games machine around it, Sony started by designing its own microprocessor, called the ‘Emotion Engine’, intended just to run games programs as fast as possible. This ‘single job’ approach allows the design to be very focused. The microprocessor is surrounded by the circuitry needed to provide the inputs and outputs. The overall blocks are shown in Figure 11.9.
Figure 11.9 Playstation 2
If we load a game from a DVD, the startup information is passed to the Emotion Engine which prepares the game graphics and sound in either analog or digital format. It then waits for input instructions arriving from our controller or through the USB ports.
Inside the Emotion Engine, shown in Figure 11.10, our controller information arrives through the Input/Output unit and the fun begins.
Figure 11.10 The Emotion Engine
Games involve serious numbers of calculations and they can be very complex. There are two types of calculations: straightforward calculations (just the sort of thing we could do on a pocket calculator) and geometric calculations. The ordinary calculations are performed by the FPU (Floating-Point Unit) and the others are done by the VU (Vector Unit). So why so many calculations?
Does the car skid on the corner? This depends on how fast you are telling it to turn, what speed you are driving at, the weather selected and the car data. What is going to happen if you ‘accidentally’ hit a tree or another vehicle?
These calculations have to be done in real time – if we turn the steering wheel the car has to respond immediately. The geometric calculations are performed by the Vector unit, which provides the results of what is happening on screen. It also prepares the list of events that control everything that appears on the screen – right down to the path taken by the wheel that has broken off and the reflection in the driver’s mirror.
To a large extent, the final quality of the game experience depends on the speed of these calculations. Floating point calculations are performed at more than six billion a second! That is moving.
So, how good is the Playstation 2?
That is too difficult to answer. Instead we can look at the technical information but there is much more. Does the controller feel good? Are the games exciting and realistic? Does an hour on the PS2 seem like minutes or weeks?
Generally, the console is fairly expensive but, having bought it, the games are cheap(ish). The thinking behind this is that we only buy the PS2 once and soon forget how much it cost, especially if it was a gift, but we can afford plenty of games. Having plenty of games reduces the attraction of changing to another games machine – like the Xbox.
The Emotion Engine runs at only 294 MHz, but these headline speeds are not a good indication of how fast it can do its job – that also depends on how well it has been designed for the job. You may remember that the Gamecube microprocessor runs at 485 MHz, more than 50% faster than the PS2’s, but look at the drawing speeds: Gamecube 12 million polygons/sec, PS2 25 million polygons/sec – double the speed. Such is the difference between a general-purpose computer microprocessor and a dedicated device.
The Microsoft Xbox
The Xbox is the latest offering in this market. Microsoft has opted for the opposite strategy to the Playstation 2 by making the console fairly cheap but charging more for the games. Presumably, the name of Microsoft together with an attractive upfront price will mean the boxes are carried out of the shops while we worry about the games later. It seems to sell to an older group than the other two, so perhaps the price of games is not such a barrier.
Figure 11.11 The Xbox
The microprocessor chosen for this machine is the Pentium 3 running at 733 MHz. Microsoft has gone for a standard micro rather than a specialized design but, even so, it has enough power to blast through a game at a good rate. The geometric drawing speed for textured polygons is 50 million per second – twice the speed of the PS2 and about four times that of the Gamecube. In all cases, the polygon count has texture included. A bare polygon is a simple wire-frame shape without the surface colouring that is essential to give the scene reality.
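The quoted figures reduce to simple ratios:

    # Quoted textured-polygon rates for the three consoles.
    gamecube, ps2, xbox = 12e6, 25e6, 50e6
    print(xbox / ps2)        # 2.0  - twice the PS2
    print(xbox / gamecube)   # ~4.2 - about four times the Gamecube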
The North Bridge chip is the central block of the Xbox and provides interconnections between the other units. It controls access to the 64 MB memory that provides a cache for the use of the CPU (Central Processing Unit) to store program code and a larger share for the GPU (Graphics Processing Unit). The North Bridge also sends signals to the South Bridge chip.
The South Bridge provides all the external inputs and outputs via the USB and network ports together with the audio signals.
The GPU shown in Figure 11.12 is an nVidia design based on the GeForce 3, another popular computer component. The graphics processor converts the CPU output into the finished information sent on to the television or monitor.
Figure 11.12 The graphics chip
The information about each object on the screen, such as its position, the lighting applied and the surface appearance, is prepared together with two features that are used to decrease the computing power necessary, or to improve the appearance with the same processor. The ‘pixel shader’ can apply realistic lighting and surface texture effects over a whole scene without each individual point being calculated separately, while the ‘vertex shader’ oversees the detailed changes that are necessary in critical areas. We tend to be very selective when it comes to details: we would notice the slightest change in a facial expression yet ignore whole leaves on a tree. By controlling light and texture characteristics, the vertex shader fills in these small but important changes.
In each case, choose the best option.
1 The ‘size’ of a microprocessor is determined by the:
(a) width of its data registers.
(b) number of lines in its external data bus.
(c) number of digits in its type number.
(d) width of its address registers.
2 An integrated circuit having 15 000 transistors is classed as a:
(a) LSI device.
(b) SLSI device.
(c) VLSI device.
(d) SSI device.
3 An untextured polygon:
(a) looks like a dinosaur.
(b) is just a wire-frame shape.
(c) has no shape.
(d) is a cube.
4 An L1 cache is usually:
(a) onboard the microprocessor.
(b) constructed from DRAM for maximum speed.
(c) slower than Level 2 cache.
(d) external to the microprocessor.
5 RISC:
(a) means ‘radical instruction set computer’.
(b) has longer instructions and is therefore slower than a CISC chip.
(c) is part of everyday life.
(d) chips employ a smaller instruction set.