Microprocessors - transputers

Table of Contents

Introduction

Bus and Software Snags

Transputers

Applications and Topologies

Software For Parallel Processing

Transputer Applications Areas

Notes on RISC


INTRODUCTION

If today's computer scientists are correct, today's computer architecture will not be sufficient to handle the kinds of tasks computers will be called upon to perform during the rest of this decade, and certainly not in the next.

Many believe that applications such as signal processing, artificial intelligence, supercomputing and the like will place severe demands on conventional processors, demands which will be difficult to meet. Even the new crop of high performance 32 bit microprocessors will be tested by these kinds of applications.

What many say limits the capabilities of today's processors is their architecture - how the various units that make up the processor are set up to handle program instructions and communications with the rest of the system. So-called conventional processors are sequential machines; one instruction must be completed before the next can begin. The serial nature of today's machines limits their ultimate performance in complex applications.

The next decade’s applications will call for processors that can deliver power considerably greater than that available from today’s sequential processors. Simply designing faster and faster sequential processors is not enough. Applications such as machine vision and speech recognition will prove to be too large and too complex to be handled by conventional computer architectures.

A more workable approach to performing these kinds of applications is what computer scientists call parallel processing. Parallel processing systems, and hence parallel computers, achieve their speed by dividing up a problem and processing its parts simultaneously on multiple computing nodes. What makes parallel processing effective is the inherently parallel nature of many of the applications it is used to solve.

Such parallel processing systems can be built using conventional sequential processors as their nodes. But two problems have blocked the widespread commercial acceptance of these systems. The first is communication between nodes; the second, software.

Bus and Software Snags

In a conventional bus-based multiple processor system, the various system facilities (processors, memory, input/output) reside on a common communications bus, the channel over which data and programs move. One benefit of this is that bus-based systems are completely general purpose; there is no set structure that makes a bus-based system better for one application than for another.

On the other hand, a bus-based system has limited information-carrying capacity because its bandwidth is determined by the size of the word the bus can accommodate and by the speed of the system clock. Adding more processors does not improve system performance beyond this limit. Indeed, it bogs down performance.
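
As a rough illustration (the figures here are assumed for the example rather than taken from any particular system): a bus carrying 32 bit words at a 10 MHz clock can transfer at most 4 bytes x 10 million = 40 megabytes per second. That ceiling is fixed by the bus itself, so ten processors sharing it must divide the same 40 megabytes per second between them, and the arbitration needed to share the bus consumes part of even that.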

Software presents problems for parallel system designers because using conventional processors as nodes also means using conventional software for reasons of compatibility. By having to use existing software (compilers, operating systems and the like) designers are not able to take advantage of the parallel nature of their problems; they must remain compatible with the hardware if the systems are to operate. The result is severely restricted performance even in a parallel processing structure.

INMOS International recognised that if system designers were going to build systems for the 1990s and beyond, they would need a processor that would free them from the constraints of bus-based systems and of yesterday’s software. INMOS’ transputer processor and OCCAM programming language were developed to address the issues surrounding parallel processor development. Both the processor and the language are architecturally suited to parallel systems design, such as embedded industrial controllers, defense systems and supercomputers.


TRANSPUTERS

The INMOS transputer consists of a high performance processor, on-chip RAM and interprocessor links, all on a single chip of silicon. The transputer serves as a building block for parallel processing machines. Extra processing power can be added simply by connecting transputers through their communications links. There is no limit to the number of transputers that can be linked to build a system.

Transputers are based on a processor whose architecture is RISC-like in that it has a core of very fast, simple instructions in keeping with the RISC philosophy. But the processor’s capabilities are greatly extended by augmenting this simple core with a number of application specific extensions, giving higher performance and more compact code in areas such as floating point arithmetic and graphics than a purist RISC approach could offer. Rather than hard-wiring the instruction set, INMOS uses the flexibility of microcoding to tailor the instruction set of individual transputer designs to specific applications, without any performance loss. Because of the simplicity of the core architecture, very high performance processors can be designed without the complexities and execution hiccups that arise from a pipelined design; transputers can average 10 million instructions per second in typical applications.

The RISC-based design also allows INMOS to add two features to the transputer. The first is the 4 or 2 kilobytes of on-chip memory already mentioned. Besides speeding memory access (a 32 bit word cycles in 50 nanoseconds, for instance), the on-chip memory permits considerable architectural freedom when designing multiple processing systems. Additionally, the transputer can address up to 4 gigabytes of memory over a 32 bit local memory bus.

The second feature afforded transputers by the RISC-like approach is the means by which many transputers can be interconnected to form multiprocessor systems. Transputers have several - normally four - high speed links by which data can be transferred between transputers. Each link is a fast asynchronous full duplex serial channel able to carry more than one and a half megabytes per second of useful data on each wire. Links are used to provide pairwise connection of transputers in a system.

The design of these interprocessor links allows the transputers in a multitransputer system to work together with considerably greater ease than is practical in a bus-based system. For example, the serial nature of the links eases the physical connections for each processor; being serial, only twisted pair wiring and simple plugs and sockets are needed rather than complex backplane-style technology. In addition, distributing the processors between several cabinets spaced some distance apart is also feasible; transputer links can form local area networks. And the normal timing problems that arise from trying to keep a large collection of processors synchronised so they can communicate are removed when the links are employed; their design insulates the local timing of each transputer from that of its fellow transputers.

As is discussed in the section on software, there are conceptual issues in designing multiprocessor systems as well as physical problems such as interconnect. Central to the resolution of the conceptual issues is the idea of building systems from cooperating tasks, or processes. The transputer processor provides instruction-set support for the implementation of processes, including multi-tasking and interprocess communications.
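
The flavour of this process model can be shown in a few lines of OCCAM. The fragment below is a minimal sketch written for this discussion rather than taken from any INMOS example; the names producer, consumer and c are invented. Two processes run in parallel and cooperate through a channel:

    PROC producer (CHAN OF INT out)
      SEQ i = 0 FOR 10
        out ! i                  -- send ten integers down the channel
    :

    PROC consumer (CHAN OF INT in)
      INT x :
      SEQ i = 0 FOR 10
        in ? x                   -- each communication also synchronises the pair
    :

    CHAN OF INT c :
    PAR                          -- the two processes run concurrently
      producer (c)
      consumer (c)

On a single transputer the two processes are time-sliced by the processor's scheduler; on two transputers the channel c can be carried by a link, with no change to the program text.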

To the processor, the link hardware looks just like an interprocess communications channel. As a result, the processor includes direct instruction-set support for the links; data transfers through links are initiated by a single instruction, and once started continue until completion without any further processor intervention. The process that initiated the transfer is signalled upon completion of the transfer.
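
As an illustration (a sketch only; the placement addresses shown are assumed for the example and vary between transputer variants), an OCCAM process can attach channels to a hardware link with PLACE and then use them exactly as it would use channels to a process on the same chip:

    INT reply :
    CHAN OF INT to.neighbour, from.neighbour :
    PLACE to.neighbour AT 0 :        -- link 0 output channel (address assumed)
    PLACE from.neighbour AT 4 :      -- link 0 input channel (address assumed)
    SEQ
      to.neighbour ! 42              -- a single instruction starts the link transfer
      from.neighbour ? reply         -- the process is descheduled until the reply arrives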

Transputers are designed to be programmed in high level languages such as C, FORTRAN and Pascal or with OCCAM, the INMOS parallel processing language.

Applications and Topologies

Using the transputer means a designer can quickly and easily assemble a high performance system that matches his application, arranging the transputers into a network that matches the data flow pattern of the application. This is a key aspect of parallel processing: being able to build a system that models the problem to be solved.

For example, in video processing applications the processing task is most effectively shared by a rectangular matrix or a linear array of transputers. And in other applications, the best topology, or arrangement of processors, might be a tree structure.
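
A linear array, for instance, corresponds directly to a pipeline of identical processes. The sketch below is hypothetical: stage stands for a PROC, assumed to be defined elsewhere, that reads from one channel and writes to the next. The configuration facilities would then place each stage on its own transputer, with the connecting channels carried by the links:

    [5]CHAN OF INT pipe :            -- pipe[0] and pipe[4] connect to the data source and sink
    PAR i = 0 FOR 4                  -- four identical pipeline stages side by side
      stage (pipe[i], pipe[i + 1])   -- "stage" is a hypothetical PROC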

Video display processors can use arrays of transputers to create life-like three dimensional models of images for mechanical design applications. Animated displays are also possible. And transputers can be linked together to provide massive processing power for such supercomputing applications as astrophysics, meteorology and geophysics.

Robotics and other industrial control applications can use the transputer's modular power and communications to expand the systems as needed. Embedded applications, such as high speed commercial laser printers, can take advantage of the transputer’s processing speed.

This building block approach gives the designer a powerful tool for assembling expandable processing engines, with a performance span from 10 million instructions per second for a single processor system to one thousand million instructions per second for a hundred processor system. Such a system - equivalent to a supercomputer in processing power - would fit into a filing cabinet.


SOFTWARE FOR PARALLEL PROCESSING

The world which we inhabit is inherently concurrent. Events happen in both time and space. It is possible for two events to occur in the same place one after the other in time (ie sequentially), and equally possible for events to occur in different places at the same time (ie concurrently, or in parallel).

In any computer application, the software has to be constructed to model the relevant aspects of the real world. A sharp contrast exists between the concurrent nature of the world and the sequential nature of the digital computer and conventional programming languages. Sequential computers were developed when processor costs dominated the system costs. However, the economics now favour a more sympathetic application design, as VLSI provides the potential of large scale affordable concurrency.

Parallel processing offers huge potential performance benefits: ten processors running concurrently will execute ten times as many instructions in a second as a single processor, fifty processors will execute fifty times as many, and so on.

A large number of today’s computing applications demand such large amounts of computer power that parallel processing and the division of tasks amongst many processors offers the only realistic solution.

In a multiprocessor system, one part of the program will be running on one processor, another part on another processor, and so on. A program designed for a multiprocessor system can be run, though not as fast, on a single processor system. The processor will share its time between different parts of the program, giving the user the illusion of concurrency.

Conventional programming languages are not well equipped to construct programs for multiple processors, as their very design assumes the sequential execution of instructions. For this reason, INMOS developed a high level language, OCCAM. OCCAM is the first commercially available language to be based on the concepts of concurrency and communication. It includes constructs to express both sequential and concurrent execution, and synchronisation is implicit in the communication between concurrent processes.
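
The fragment below (an illustrative sketch with invented names, not drawn from an INMOS manual) shows the two basic compositions. SEQ performs its component processes one after the other; PAR performs them together; and the output c ! x cannot complete until the matching input c ? y is ready, which is where the implicit synchronisation comes from:

    INT x, y :
    CHAN OF INT c :
    SEQ
      x := 42              -- sequential composition: one action after another
      PAR                  -- parallel composition: the two branches run together
        c ! x              -- the output waits until...
        c ? y              -- ...the matching input is ready, and vice versa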

An OCCAM program can be structured to reflect naturally the structure of the application, so that a system can be first described in OCCAM and then built using transputers. A program designed for a network of transputers can be run virtually unchanged on a smaller network, right down to a single transputer. OCCAM can capture the hierarchical structure of a system by allowing an interconnected set of processes to be regarded from the outside as a single process. At any level of detail, the designer is only concerned with a small and manageable set of processes.
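
For example (a sketch with invented names), a PROC can wrap an internal network of parallel processes; from the outside, the rest of the design sees it as a single process with two channels:

    PROC filter.pair (CHAN OF INT in, CHAN OF INT out)
      CHAN OF INT mid :              -- internal channel, invisible from outside
      PAR
        smooth (in, mid)             -- "smooth" and "threshold" stand for PROCs
        threshold (mid, out)         -- assumed to be defined elsewhere
    :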

In addition to OCCAM, software tools offered by INMOS include industry standard C, Pascal and FORTRAN compilers to allow customers to re-use their investment in software already written. Input and output within the conventional languages appears completely standard from the programmer's point of view, and is translated into the necessary sequence of OCCAM channel communications by the appropriate run-time library.

A unique feature of INMOS’ hardware and software combination is that designers can program transputer systems in any mixture of these languages. Thus the inherently concurrent and time consuming parts of the application, such as simulation or design analysis, can be written in OCCAM, and the often bulky sequential parts (for example user friendly input and output) can be written in a conventional sequential language. As another example, each transputer in a network could be running a version of a user's C program, with communications between the various programs being made using library calls which provide OCCAM-like channel communication.


TRANSPUTER APPLICATIONS AREAS

Transputers offer many advantages over conventional microprocessors. For traditional single processor applications, the advantages include sheer performance, very compact packaging, low power dissipation, highly efficient interfaces that reduce component count and competitive pricing. In systems where multiple processors are needed or are advantageous, transputers also provide the simplest, most foolproof and fastest available interprocessor communications. In any system, a transputer solution will provide higher performance per square inch of PC board and per dollar.

As a result, transputers are attractive solutions to many applications needs. Areas included are:

Embedded intelligent applications like high performance laser printers, fingerprint recognition, signature recognition, video telephones, high speed facsimile and other areas.

Application-specific accelerators in workstations and PCs, like circuit simulators, statistical forecasting, molecular modelling, solids modelling for graphics.

Application-specific engines with need for high computational power, such as real-time computer animation for the film and television industry, simulation systems, and artificial intelligence.

Distributed control systems such as multi-joint robotics and factory automation.

Supercomputers for physics, meteorology, fluid dynamics.

Digital signal processing in vision systems, acoustic positioning, autonomous vehicles.

Defense applications in the above areas, including embedded avionics control systems, satellite surveillance systems, image recognition, distributed orbital control systems and simulation.


NOTES ON RISC

'Purist' RISCs are hardwired, not microcoded; normally it is claimed that this makes them go faster.

A hardwired machine’s design is much more difficult to change than a microcoded one.

Most RISCs are pipelined for speed, but the use of a pipeline makes the design more complex, the processor larger, and its response to interrupts and even to jumps in the program significantly slower.