Sunday, December 5, 2010

Bus & Complex instruction set computing

In computer architecture, a bus is a subsystem that transfers data between computer components inside a computer or between computers.

Early computer buses were literally parallel electrical buses with multiple connections, but the term is now used for any physical arrangement that provides the same logical functionality as a parallel electrical bus. Modern computer buses can use both parallel and bit-serial connections, and can be wired in either a multidrop (electrical parallel) or daisy chain topology, or connected by switched hubs, as in the case of USB.


First generation

Early computer buses were bundles of wire that attached memory and peripherals. They were named after electrical buses, or busbars.

Almost always, there was one bus for memory and another for peripherals, and these were accessed by separate instructions, with completely different timings and protocols.

One of the first complications was the use of interrupts.

Early computer programs performed I/O by waiting in a loop for the peripheral to become ready. This was a waste of time for programs that had other tasks to do. Also, if the program attempted to perform those other tasks, it might take too long for the program to check again, resulting in loss of data. Engineers thus arranged for the peripherals to interrupt the CPU.
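The trade-off between polling and interrupts can be sketched with a simulated peripheral. Everything here is an assumption made for illustration: the Peripheral class, its one-byte buffer, and the tick counts do not model any real device.

```python
class Peripheral:
    """Hypothetical device that produces one byte every `period` ticks.

    It has a one-byte buffer: if the CPU has not collected the previous
    byte when a new one arrives, the old byte is overwritten and lost.
    """

    def __init__(self, period, on_ready=None):
        self.period = period
        self.on_ready = on_ready  # optional "interrupt handler"
        self.buffer = None
        self.lost = 0
        self._next_byte = 0

    def tick(self, now):
        if now % self.period != 0:
            return
        byte = self._next_byte
        self._next_byte += 1
        if self.on_ready is not None:
            self.on_ready(byte)       # interrupt: the CPU is notified at once
            return
        if self.buffer is not None:
            self.lost += 1            # previous byte was never collected
        self.buffer = byte


def run_polled(total_ticks, period, poll_every):
    """The CPU does other work and checks the device every `poll_every` ticks."""
    dev = Peripheral(period)
    received = []
    for now in range(1, total_ticks + 1):
        dev.tick(now)
        if now % poll_every == 0 and dev.buffer is not None:
            received.append(dev.buffer)
            dev.buffer = None
    return received, dev.lost


def run_interrupt_driven(total_ticks, period):
    """The device calls the handler itself whenever a byte is ready."""
    received = []
    dev = Peripheral(period, on_ready=received.append)
    for now in range(1, total_ticks + 1):
        dev.tick(now)
    return received, dev.lost
```

With a device period of 10 ticks, polling every 25 ticks loses bytes, while polling every 5 ticks or using the interrupt handler collects all of them.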

The interrupts had to be prioritized, because the CPU can only execute code for one peripheral at a time, and some devices are more time-critical than others.

Later computer programs began to share memory common to several CPUs. Access to this memory bus had to be prioritized, as well.

The classic, simple way to prioritize interrupts or bus access was with a daisy chain.
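A daisy chain fixes priority by position: the grant signal leaves the CPU and passes through each device in turn until one that is requesting service keeps it. The following is a minimal sketch of that idea; the function name and the list-of-booleans representation are assumptions made for illustration.

```python
def daisy_chain_grant(requests):
    """Return the position of the device that receives the grant, or None.

    `requests` holds each device's request line in chain order, with the
    device closest to the CPU first. A requesting device absorbs the grant;
    a non-requesting device passes it downstream.
    """
    grant = True  # grant signal leaving the CPU
    winner = None
    for position, requesting in enumerate(requests):
        if grant and requesting:
            winner = position
            grant = False  # devices further down never see the grant
    return winner
```

Devices nearer the CPU therefore always win over devices further along the chain, which is why time-critical peripherals would be placed first.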

DEC noted that having two buses seemed wasteful and expensive for mass-produced minicomputers, and mapped peripherals into the memory bus, so that the devices appeared to be memory locations.

Early microcomputer bus systems were essentially a passive backplane connected directly or through buffer amplifiers to the pins of the CPU. Memory and other devices would be added to the bus using the same address and data pins as the CPU itself used, connected in parallel.

Communication was controlled by the CPU, which read and wrote data to and from the devices as if they were blocks of memory, using the same instructions, all timed by a central clock controlling the speed of the CPU. Still, devices interrupted the CPU by signaling on separate CPU pins.

For instance, a disk drive controller would signal the CPU that new data was ready to be read, at which point the CPU would move the data by reading the "memory location" that corresponded to the disk drive. Almost all early microcomputers were built in this fashion, starting with the S-100 bus in the Altair.

In some instances, most notably in the IBM PC, a similar physical architecture is employed, but the instructions to access peripherals (in and out) and memory (mov and others) have not been made uniform at all and still generate distinct CPU signals that could be used to implement a separate I/O bus.
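The difference between the two approaches can be sketched with a pair of toy buses. The class names, the register address 0xFF00, and the port number 0x1F0 are all assumptions chosen for illustration; they are not taken from any particular machine described here.

```python
class DiskController:
    """Hypothetical device exposing a single read-only data register."""

    def __init__(self, data):
        self._data = list(data)

    def read_register(self):
        return self._data.pop(0) if self._data else 0


class MemoryMappedBus:
    """The device register is reached with ordinary memory reads."""

    DISK_DATA_ADDR = 0xFF00  # assumed address, purely illustrative

    def __init__(self, disk, ram_size=0x1000):
        self.ram = bytearray(ram_size)
        self.disk = disk

    def read(self, addr):
        if addr == self.DISK_DATA_ADDR:
            return self.disk.read_register()  # looks like a memory location
        return self.ram[addr]

    def write(self, addr, value):
        if addr != self.DISK_DATA_ADDR:       # register is read-only here
            self.ram[addr] = value & 0xFF


class PortMappedBus:
    """Memory and I/O live in separate address spaces with separate operations."""

    DISK_DATA_PORT = 0x1F0  # assumed port number, purely illustrative

    def __init__(self, disk, ram_size=0x1000):
        self.ram = bytearray(ram_size)
        self.disk = disk

    def read(self, addr):            # plays the role of a memory instruction
        return self.ram[addr]

    def write(self, addr, value):
        self.ram[addr] = value & 0xFF

    def port_in(self, port):         # plays the role of a dedicated I/O instruction
        if port == self.DISK_DATA_PORT:
            return self.disk.read_register()
        return 0xFF                  # nothing attached at this port
```

In the first class a single read path serves both RAM and the device; in the second, the device can only be reached through the separate port_in operation.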

These simple bus systems had a serious drawback when used for general-purpose computers. All the equipment on the bus has to talk at the same speed, as it shares a single clock.

Increasing the speed of the CPU becomes harder, because the speed of all the devices must increase as well.

When it is not practical or economical to have all devices as fast as the CPU, the CPU must either enter a wait state or temporarily work at a slower clock frequency in order to talk to other devices in the computer. While acceptable in embedded systems, this problem was not tolerated for long in general-purpose, user-expandable computers.
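The cost of a slow device on a shared clock can be estimated with a little arithmetic. This is a rough sketch under stated assumptions: a bus access normally takes a fixed number of clock cycles (base_cycles), and the CPU inserts whole extra cycles until the device's access time has elapsed. The function name and the example figures are hypothetical.

```python
def wait_states(cpu_clock_hz, device_access_ns, base_cycles=1):
    """Number of extra clock cycles the CPU must idle for one access.

    Uses integer arithmetic: the access needs
    ceil(device_access_ns * cpu_clock_hz / 1e9) cycles in total.
    """
    cycles_needed = -(-device_access_ns * cpu_clock_hz // 10**9)  # ceiling division
    return max(0, cycles_needed - base_cycles)
```

For a hypothetical device with a 200 ns access time, a 4 MHz CPU needs no wait states, an 8 MHz CPU needs one, and a 16 MHz CPU needs three, so raising the CPU clock alone yields little for accesses to that device.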

Such bus systems are also difficult to configure when constructed from common off-the-shelf equipment. Typically each added expansion card requires many jumpers in order to set memory addresses, I/O addresses, interrupt priorities, and interrupt numbers.

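Second generation

"Second generation" bus systems like NuBus addressed some of these problems. They typically separated the computer into two "worlds", the CPU and memory on one side and the various devices on the other. A bus controller accepted data from the CPU side to be moved to the peripherals side, thus shifting the communications protocol burden from the CPU itself. This allowed the CPU and memory side to evolve separately from the device bus, or just "bus". Devices on the bus could talk to each other with no CPU intervention.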

This led to much better "real world" performance, but also required the cards to be much more complex. These buses also often addressed speed issues by being "bigger" in terms of the size of the data path, moving from 8-bit parallel buses in the first generation to 16- or 32-bit in the second, as well as adding software setup (now standardised as Plug and Play) to supplant the jumpers.

However, these newer systems shared one quality with their earlier cousins, in that everyone on the bus had to talk at the same speed. While the CPU was now isolated and could increase speed without fear, CPUs and memory continued to increase in speed much faster than the buses they talked to.

The result was that the bus speeds were now very much slower than what a modern system needed, and the machines were left starved for data. A particularly common example of this problem was that video cards quickly outran even the newer bus systems like PCI, and computers began to include AGP just to drive the video card.

By 2004 AGP had again been outgrown by high-end video cards and other peripherals, and it has been replaced by the new PCI Express bus.

An increasing number of external devices started employing their own bus systems as well. When disk drives were first introduced, they would be added to the machine with a card plugged into the bus, which is why computers have so many slots on the bus.

But through the 1980s and 1990s, new systems like SCSI and IDE were introduced to serve this need, leaving most slots in modern systems empty. Today there are likely to be about five different buses in the typical machine, supporting various devices.

Third generation

"Third generation" buses have been emerging into the market since about 2001, including HyperTransport and InfiniBand. They also tend to be very flexible in terms of their physical connections, allowing them to be used both as internal buses, as well as connecting different machines together.

This can lead to complex problems when trying to service different requests, so much of the work on these systems concerns software design, as opposed to the hardware itself. In general, these third generation buses tend to look more like a network than the original concept of a bus, with a higher protocol overhead needed than early systems, while also allowing multiple devices to use the bus at once.

Buses such as Wishbone have been developed by the open source hardware movement in an attempt to further remove legal and patent constraints from computer design.

Description of a bus

At one time, "bus" meant an electrically parallel system, with electrical conductors similar or identical to the pins on the CPU. This is no longer the case, and modern systems are blurring the lines between buses and networks.

Buses can be parallel buses, which carry data words in parallel on multiple wires, or serial buses, which carry data in bit-serial form. The addition of extra power and control connections, differential drivers, and data connections in each direction usually means that most serial buses have more conductors than the minimum of one used in the 1-Wire and UNI/O serial buses.

As data rates increase, the problems of timing skew, power consumption, electromagnetic interference and crosstalk across parallel buses become more and more difficult to circumvent. One partial solution to this problem has been to double pump the bus, that is, to transfer data on both the rising and falling edges of the clock. Often, a serial bus can actually be operated at higher overall data rates than a parallel bus, despite having fewer electrical connections, because a serial bus inherently has no timing skew or crosstalk. USB, FireWire, and Serial ATA are examples of this. Multidrop connections do not work well for fast serial buses, so most modern serial buses use daisy-chain or hub designs.
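The relationship between bus width, clock rate, and double pumping can be written as a one-line formula. The sketch below computes a raw peak figure only; it ignores protocol overhead, line encoding, and arbitration, and the function name and example numbers are illustrative assumptions rather than the ratings of any specific bus.

```python
def peak_bandwidth_bytes_per_s(width_bits, clock_hz, transfers_per_clock=1):
    """Raw peak transfer rate: bits per transfer x transfers per second / 8.

    `transfers_per_clock` is 1 for a conventional bus and 2 for a
    double-pumped bus that moves data on both clock edges.
    """
    return width_bits * clock_hz * transfers_per_clock // 8
```

A 32-bit parallel bus clocked at 33 MHz peaks at 132,000,000 bytes per second, and double pumping it gives 264,000,000. A single-bit serial link signalling at 480 MHz reaches 60,000,000 bytes per second over one data path, and a serial link can usually be clocked far higher than a wide parallel one because there is no skew between lines to manage.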

Most computers have both internal and external buses. An internal bus connects all the internal components of a computer to the motherboard (and thus, the CPU and internal memory). These types of buses are also referred to as local buses, because they are intended to connect to local devices, not to those in other machines or external to the computer.

An external bus connects external peripherals to the motherboard.

Network connections such as Ethernet are not generally regarded as buses, although the difference is largely conceptual rather than practical.

The arrival of technologies such as InfiniBand and HyperTransport is further blurring the boundaries between networks and buses. Even the lines between internal and external are sometimes fuzzy: I²C can be used as either an internal bus or an external bus (where it is known as ACCESS.bus), and InfiniBand is intended to replace both internal buses like PCI and external ones like Fibre Channel. In the typical desktop application, USB serves as a peripheral bus, but it also sees some use as a networking utility and for connectivity between different computers, again blurring the conceptual distinction.

Bus topology

In a bus network, a master scheduler controls the data traffic. If data is to be transferred, the requesting computer sends a message to the scheduler, which puts the request into a queue. The message contains an identification code which is broadcast to all nodes of the network.

The scheduler works out priorities and notifies the receiver as soon as the bus is available.

The identified node takes the message and performs the data transfer between the two computers. Once the data transfer is complete, the bus becomes free for the next request in the scheduler's queue.

  • Advantage: Any computer can be accessed directly and messages can be sent in a relatively simple and fast way.
  • Disadvantage: A scheduler is required to organize the traffic by assigning frequencies and priorities to each signal.
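The queueing behaviour described above can be sketched as a small priority queue. This is a simplified model under stated assumptions: the BusScheduler class and its method names are hypothetical, a lower priority number is served first, and requests of equal priority are served in arrival order. How a real scheduler assigns priorities is not specified here.

```python
import heapq
from itertools import count


class BusScheduler:
    """Hypothetical master scheduler that grants the bus to one request at a time."""

    def __init__(self):
        self._queue = []
        self._arrival = count()  # tie-breaker that preserves arrival order

    def request(self, sender, receiver, priority=0):
        """A node asks to transfer data to `receiver`."""
        heapq.heappush(self._queue, (priority, next(self._arrival), sender, receiver))

    def grant_next(self):
        """Return the (sender, receiver) pair that gets the bus next, or None if idle."""
        if not self._queue:
            return None
        _, _, sender, receiver = heapq.heappop(self._queue)
        return sender, receiver
```

Each call to grant_next stands for the moment the bus becomes free and the scheduler notifies the next receiver in its queue.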

CPU socket

A CPU socket or CPU slot is a mechanical component that provides mechanical and electrical connections between a device (usually a microprocessor) and a printed circuit board (PCB).

For chips with a high number of pins, either zero insertion force (ZIF) sockets or land grid array (LGA) sockets are usually chosen over the alternative socket type, which requires force to insert the device into the socket.

ZIF and LGA sockets avert the need for this insertion force because the retention force (required to hold the chip in place) is applied only once the handle (for the ZIF type) or the surface plate (for the LGA type) is closed.

Common sockets use retention clips that are designed to always apply force, and this force must be overcome when a device is inserted. The newer designs, e.g. ZIF and LGA, apply a compression force once the handle or cover is put into place.

Either design enables the CPU to be replaced without risking the damage typically introduced when using soldering tools. The ZIF and LGA sockets provide superior mechanical retention without the added risk of bending pins when inserting the chip into the socket.

CPU sockets are used in desktop and server computers (laptops typically use surface-mount CPUs). They are also used for prototyping a new circuit because of these advantages.


A CPU socket is often made up of plastic, a metal lever or latch, and metal contacts for each of the pins or lands on the CPU. Most packages are keyed to ensure the proper insertion of the CPU. CPUs with a PGA package are inserted into the socket and the latch is closed.

This has the effect of physically securing and protecting the CPU as well as making an electrical connection between all the CPU pins and the socket. In the case of LGA, the CPU is placed onto the socket and a latch is closed over the CPU, securing it. Most CPU sockets are designed to support the installation of a heatsink.

The socket must be able to protect the CPU from the weight of the heatsink (often very heavy relative to the CPU), particularly during installation and removal, while also ensuring that the heatsink makes good thermal contact with the CPU.

CPU sockets provide an advantage over directly attaching CPUs to the PCB by making it easier to replace the processor in the event of a failure. The CPU is often the most expensive component in the system and the cost of a CPU socket is relatively low, which makes sockets popular among computer system manufacturers.

The nature of a CPU socket requires it not only to make good electrical contact with the CPU, but also to be soldered to the PCB with which it interfaces.

Complex instruction set computing

A complex instruction set computer (CISC, pronounced /ˈsɪsk/) is a computer where single instructions can execute several low-level operations (such as a load from memory, an arithmetic operation, and a memory store) and/or are capable of multi-step operations or addressing modes within single instructions.

The term was retroactively coined in contrast to reduced instruction set computer (RISC).

Examples of CISC instruction set architectures are System/360 through z/Architecture, PDP-11, VAX, Motorola 68k, and x86.
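The definition above can be illustrated with a toy interpreter in which one memory-to-memory instruction does the work of a load, an add, and a store. The mnemonics (LOAD, ADD, STORE, ADDM) and the tuple encoding are invented for this sketch and do not belong to any of the architectures listed.

```python
def run(program, memory):
    """Execute a list of (opcode, operand, operand) tuples on a toy machine."""
    regs = {}
    for op, a, b in program:
        if op == "LOAD":        # LOAD reg, addr: register <- memory
            regs[a] = memory[b]
        elif op == "ADD":       # ADD dst_reg, src_reg: register arithmetic only
            regs[a] += regs[b]
        elif op == "STORE":     # STORE reg, addr: memory <- register
            memory[b] = regs[a]
        elif op == "ADDM":      # ADDM dst_addr, src_addr: load, add, store in one
            memory[a] += memory[b]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory


# One complex instruction ...
cisc_style = [("ADDM", 0, 1)]

# ... versus the equivalent load-store sequence.
load_store_style = [
    ("LOAD", "r1", 0),
    ("LOAD", "r2", 1),
    ("ADD", "r1", "r2"),
    ("STORE", "r1", 0),
]
```

Both programs leave the same result in memory; the first expresses it in one instruction, the second in four simpler ones.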

Historical design context

Incitements and benefits

Before the RISC philosophy became prominent, many computer architects tried to bridge the so-called semantic gap, i.e. to design instruction sets that directly supported high-level programming constructs such as procedure calls, loop control, and complex addressing modes, allowing data structure and array accesses to be combined into single instructions.

Instructions are also typically highly encoded in order to further enhance the code density. The compact nature of such instruction sets results in smaller program sizes and fewer (slow) main memory accesses, which at the time (early 1960s and onwards) resulted in tremendous savings on the cost of computer memory and disc storage, as well as faster execution.

It also meant good programming productivity even in assembly language, as high-level languages such as Fortran or Algol were not always available or appropriate (microprocessors in this category are sometimes still programmed in assembly language for certain types of critical applications).

New instructions

In the 1970s, analysis of high-level languages indicated some complex machine language implementations, and it was determined that new instructions could improve performance. Some instructions were added that were never intended to be used in assembly language but fit well with compiled high-level languages.

Compilers were updated to take advantage of these instructions. The benefits of semantically rich instructions with compact encodings can be seen in modern processors as well, particularly in the high-performance segment where caches are a central component (as opposed to most embedded systems).

This is because these fast, but complex and expensive, memories are inherently limited in size, making compact code beneficial. Of course, the fundamental reason they are needed is that main memories (i.e. dynamic RAM today) remain slow compared to a (high-performance) CPU core.

Design issues

While many designs achieved the aim of higher throughput at lower cost and also allowed high-level language constructs to be expressed by fewer instructions, it was observed that this was not always the case.

For instance, low-end versions of complex architectures (i.e. using less hardware) could lead to situations where it was possible to improve performance by not using a complex instruction (such as a procedure call or enter instruction), but instead using a sequence of simpler instructions.

One reason for this was that architects (microcode writers) sometimes "over-designed" assembler language instructions, i.e. included features which were not possible to implement efficiently on the basic hardware available. This could, for instance, be "side effects" (above conventional flags), such as the setting of a register or memory location that was perhaps seldom used; if this was done via ordinary (non-duplicated) internal buses, or even the external bus, it would demand extra cycles every time, and thus be quite inefficient.

Even in balanced high-performance designs, highly encoded and (relatively) high-level instructions could be complicated to decode and execute efficiently within a limited transistor budget. Such architectures therefore required a great deal of work on the part of the processor designer in cases where a simpler, but (typically) slower, solution based on decode tables and/or microcode sequencing was not appropriate. At a time when transistors and other components were a limited resource, this also left fewer components and less area for other types of performance optimizations.

The RISC idea

The circuitry that performs the actions defined by the microcode in many (but not all) CISC processors is, in itself, a processor which in many ways is reminiscent in structure of very early CPU designs.

This gave rise to ideas to return to simpler processor designs in order to make it more feasible to cope without (then relatively large and expensive) ROM tables, or even without PLA structures, for sequencing and/or decoding. At the same time, simplicity and regularity would make it easier to implement overlapping processor stages (pipelining) at the machine code level (i.e. the level seen by compilers).

The first (retroactively) RISC-labeled processor (the IBM 801, from IBM's Watson Research Center, mid-1970s) was therefore a tightly pipelined machine originally intended to be used as an internal microcode kernel, or engine, in a CISC design. At the time, pipelining at the machine code level was already used in some high-performance CISC computers in order to reduce the instruction cycle time, but it was fairly complicated to implement within the limited component count and wiring complexity that was feasible at the time. (Microcode execution, on the other hand, could be more or less pipelined, depending on the particular design.)
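The appeal of overlapping processor stages can be seen from a simple cycle count. The sketch below assumes an idealised pipeline: every stage takes exactly one cycle, every instruction passes through every stage, and there are no stalls or hazards. The function names are hypothetical.

```python
def cycles_unpipelined(n_instructions, n_stages):
    """Each instruction runs all of its stages before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions, n_stages):
    """A new instruction enters the pipeline every cycle once it is full.

    The first instruction takes n_stages cycles; each later one completes
    one cycle after its predecessor.
    """
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1
```

Under these assumptions, 100 instructions on a five-stage machine take 500 cycles without pipelining and 104 with it. Real designs fall short of this ideal, and irregular, variable-length instructions make the overlap harder to arrange.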


In a more modern context, the complex variable-length encoding used by some of the typical CISC architectures makes it complicated, but still feasible, to build a superscalar implementation of a CISC programming model directly; the in-order superscalar original Pentium and the out-of-order superscalar Cyrix 6x86 are well-known examples of this.

The frequent memory accesses for operands of a typical CISC machine may limit the instruction-level parallelism that can be extracted from the code, although this is strongly mediated by the fast cache structures used in modern designs, as well as by other measures.

Due to inherently compact and semantically rich instructions, the average amount of work performed per machine code unit (i.e. per byte or bit) is higher for a CISC than a RISC processor, which may give it a significant advantage in a modern cache-based implementation.

(Whether the downsides versus the upsides justify a complex design or not is food for a never-ending debate in certain circles.)

Transistors for logic, PLAs, and microcode are no longer scarce resources; only large high-speed cache memories are limited by the maximum number of transistors today.

Although complex, the transistor count of CISC decoders does not grow exponentially like the total number of transistors per processor (the majority typically used for caches). Together with better tools and enhanced technologies, this has led to new implementations of highly encoded and variable-length designs without load-store limitations (i.e. non-RISC).

This applies to re-implementations of older architectures such as the ubiquitous x86 (see below) as well as new designs for microcontrollers for embedded systems, and similar uses. The superscalar complexity in the case of modern x86 was solved with dynamically issued and buffered micro-operations, i.e. indirect and dynamic superscalar execution; the Pentium Pro and AMD K5 are early examples of this. This allows a fairly simple superscalar design to be located after the (fairly complex) decoders (and buffers), giving, so to speak, the best of both worlds in many respects.
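The idea of splitting a complex instruction into simpler internal micro-operations can be sketched as follows. This is purely illustrative and does not describe how any actual x86 decoder works: the tuple format, the bracket notation for memory operands, and the temporary register names are all assumptions.

```python
def is_memory(operand):
    """Memory operands are written in brackets in this toy notation."""
    return operand.startswith("[")


def decode(instruction):
    """Split one (op, dst, src) instruction into load/operate/store micro-ops."""
    op, dst, src = instruction
    uops = []
    src_reg = src
    if is_memory(src):
        uops.append(("load", "tmp_src", src))      # fetch the source operand
        src_reg = "tmp_src"
    if is_memory(dst):
        uops.append(("load", "tmp_dst", dst))      # read-modify-write destination
        uops.append((op, "tmp_dst", src_reg))
        uops.append(("store", dst, "tmp_dst"))
    else:
        uops.append((op, dst, src_reg))            # register destination: one uop
    return uops
```

A register-to-register add stays a single micro-operation, while an add to a memory location becomes a load, an add, and a store. The execution core behind the decoder then only ever sees the simple load-store style operations.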

CISC and RISC terms

The terms CISC and RISC have become less meaningful with the continued evolution of both CISC and RISC designs and implementations.

The first highly (or tightly) pipelined x86 implementations, the 486 designs from Intel, AMD, Cyrix, and IBM, supported every instruction that their predecessors did, but achieved maximum efficiency only on a fairly simple x86 subset that was only a little more than a typical RISC instruction set (i.e. without typical RISC load-store limitations).

The Intel P5 Pentium generation was a superscalar version of these principles. However, modern x86 processors also (typically) decode and split instructions into dynamic sequences of internal buffered micro-operations, which not only helps execute a larger subset of instructions in a pipelined (overlapping) fashion, but also facilitates more advanced extraction of parallelism out of the code stream, for even higher performance.
