Pipeline Architecture
Published in Pranabananda Chakraborty, Computer Organisation and Architecture, 2020
The instruction pipeline mechanism has been further enhanced to obtain even higher throughput than superpipelined designs by exploiting more instruction-level parallelism (ILP). One popular approach to exploiting this parallelism, which is implicit in ordinary sequential programs, is the superscalar approach. The term superscalar first appeared in 1987 and simply refers to a machine designed to increase the execution performance of scalar instructions. Incidentally, this design arrived on the heels of the RISC architecture; although the simplified instruction set of a RISC machine lends itself readily to superscalar techniques, the superscalar approach can be used on either a RISC or a CISC architecture. A superscalar architecture is essentially a more aggressive approach that replicates the scalar pipeline so that two or more instructions at the same stage of different pipelines can be processed simultaneously.
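To make the idea concrete, here is a minimal C sketch of a hypothetical cycle-count model (not any particular machine): it compares a scalar pipeline, which issues one instruction per cycle, against a 2-wide superscalar pipeline that issues a second instruction in the same cycle whenever it is independent of the first.

#include <stdio.h>

/* Hypothetical model: deps[i] marks whether instruction i depends on
 * instruction i-1. The scalar pipeline needs one cycle per instruction;
 * the 2-wide superscalar pipeline pairs two instructions per cycle
 * whenever the second one is independent. */
int scalar_cycles(int n) { return n; }     /* one issue slot per cycle */

int superscalar_cycles(const int *deps, int n)
{
    int cycles = 0;
    for (int i = 0; i < n; ) {
        /* issue the next two instructions together if the second
         * does not depend on the first */
        if (i + 1 < n && !deps[i + 1])
            i += 2;
        else
            i += 1;
        cycles++;
    }
    return cycles;
}

int main(void)
{
    /* 8 instructions; a 1 means "depends on the previous instruction" */
    int deps[8] = {0, 0, 1, 0, 0, 0, 1, 0};
    printf("scalar:      %d cycles\n", scalar_cycles(8));           /* 8 */
    printf("superscalar: %d cycles\n", superscalar_cycles(deps, 8)); /* 4 */
    return 0;
}

With enough independent instructions the 2-wide machine halves the cycle count; dependence chains limit the gain, which is why superscalar designs live or die by the ILP available in the program.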
Three-Dimensional Molecular Electronics and Integrated Circuits for Signal and Information Processing Platforms
Published in Sergey Edward Lyshevski, Nano and Molecular Electronics Handbook, 2018
The parallel execution capability (called superscalar processing), when added to pipelining of the individual instructions, means that more than one instruction can be executed per basic step; the execution rate can thus be increased. The rate RT of performing basic steps in the processor depends on the processor clock rate. The use of multiprocessors speeds up the execution of large programs by executing subtasks in parallel. The main difficulty in achieving this is decomposing a given task into parallel subtasks and then assigning these subtasks to the individual processors in such a way that communication among the subtasks is performed efficiently and robustly. Figure 6.30 shows a block diagram of a multiprocessor system with the interconnection network needed for data sharing among the processors Pi. Parallel paths are needed in this network in order for parallel activity to proceed in the processors as they access the global memory space, represented by the multiple memory units Mi. This is performed utilizing a 3D organization.
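As a small illustration of the decomposition-and-assignment problem, the following C sketch uses POSIX threads as a stand-in for the processors Pi: a summation is split into per-thread subtasks whose only communication is the final combining of partial results.

#include <pthread.h>
#include <stdio.h>

/* Illustrative sketch: decompose a summation into parallel subtasks,
 * one per thread. The threads stand in for the processors Pi and the
 * shared arrays for the global memory units Mi of the figure. */
#define NTHREADS 4
#define N 1000

static int data[N];
static long partial[NTHREADS];

static void *subtask(void *arg)
{
    long id = (long)arg;
    long sum = 0;
    /* each thread works on its own slice, minimizing communication */
    for (int i = id * (N / NTHREADS); i < (id + 1) * (N / NTHREADS); i++)
        sum += data[i];
    partial[id] = sum;
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    for (int i = 0; i < N; i++) data[i] = 1;
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, subtask, (void *)i);
    long total = 0;
    for (long i = 0; i < NTHREADS; i++) {
        pthread_join(t[i], NULL);
        total += partial[i];        /* the only inter-task communication */
    }
    printf("total = %ld\n", total); /* 1000 */
    return 0;
}

The slicing keeps each subtask on a disjoint region of memory, which is exactly the property the interconnection network's parallel paths are meant to exploit.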
Hardware Design Considerations
Published in Gillian M. Davis, Noise Reduction in Speech Applications, 2018
Superscalar processors are the next step beyond pipelining: in a superscalar architecture, multiple pipelines operate in parallel. A superscalar processor is one that can fetch, execute, and complete more than one instruction in parallel. Superscalar designs use smart hardware controllers to manage the parallelism on chip, so this approach requires less support from the software (compiler) to manage the parallel issuance of instructions. Superscalar devices can execute instruction words of varying widths, which enhances programmer flexibility and also yields higher code density, because less complex instructions occupy fewer bytes of memory.
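A small, hypothetical C example of the kind of instruction stream such hardware exploits: the first two statements are mutually independent, so the hardware controller can issue them in the same cycle with no compiler scheduling, while the third must wait for both.

#include <stdio.h>

int main(void)
{
    int x = 3, y = 4, p = 5, q = 6;
    int a = x + y;  /* independent: may issue in the same cycle...   */
    int b = p * q;  /* ...as this multiply, on a separate pipeline   */
    int c = a + b;  /* depends on both, so it must issue later       */
    printf("%d %d %d\n", a, b, c);  /* 7 30 37 */
    return 0;
}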
Exploration for Software Mitigation to Spectre Attacks of Poisoning Indirect Branches
Published in IETE Technical Review, 2018
Baozi Chen, Qingbo Wu, Yusong Tan, Liu Yang, Peng Zou
Modern superscalar processors are designed to exploit instruction-level parallelism (ILP). Speculative execution and branch prediction are the two most widely used techniques to maximize performance. Speculative execution allows the processor to execute instructions before it is certain that this execution should happen, while branch prediction predicts the next instruction to fetch, avoiding stalls due to control dependencies.
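The following C sketch illustrates the kind of indirect branch these techniques act on; the jump table and handler names are invented for illustration. The processor predicts the call target before `op` is resolved, and in the attack considered here a poisoned predictor can steer that speculative jump to attacker-chosen code, whose effects are squashed architecturally but can leave measurable traces in the cache.

#include <stdio.h>

typedef int (*handler_t)(int);

static int add_one(int x)   { return x + 1; }
static int double_it(int x) { return 2 * x; }

int main(void)
{
    handler_t table[2] = { add_one, double_it };
    int op = 1;
    /* Indirect call: the target comes from memory, so the CPU
     * speculates on it using the branch target buffer (BTB). */
    printf("%d\n", table[op](20));   /* prints 40 */
    return 0;
}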
System performance enhancement with thread suspension for simultaneous multi-threading processors
Published in International Journal of Computers and Applications, 2020
Simultaneous Multi-Threading (SMT) processors allow instructions from different independent threads to be issued within the same clock cycle, improving overall system throughput by exploiting Thread-Level Parallelism among the threads to fill pipeline bubbles caused by the insufficient Instruction-Level Parallelism (ILP) typically present in single-thread superscalar systems [1,2].
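A toy C model (the issue patterns below are invented for illustration) shows how a second thread's ready instructions fill the bubble slots of the first.

#include <stdio.h>

/* Toy model: each thread alternates between a ready instruction (1)
 * and a bubble (0) caused by a dependence stall. Alone, a thread
 * wastes the bubble slots; an SMT core with one issue slot per cycle
 * fills them with the other thread's ready instructions. */
#define CYCLES 8

int main(void)
{
    int t0[CYCLES] = {1, 0, 1, 0, 1, 0, 1, 0};  /* thread 0 */
    int t1[CYCLES] = {0, 1, 0, 1, 0, 1, 0, 1};  /* thread 1 */
    int single = 0, smt = 0;
    for (int c = 0; c < CYCLES; c++) {
        single += t0[c];               /* one thread: bubbles wasted   */
        smt    += t0[c] || t1[c];      /* SMT: other thread fills slot */
    }
    printf("single-thread IPC: %.2f\n", (double)single / CYCLES); /* 0.50 */
    printf("SMT IPC:           %.2f\n", (double)smt / CYCLES);    /* 1.00 */
    return 0;
}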
A programmable ternary CPU using hybrid CMOS/memristor circuits
Published in International Journal of Parallel, Emergent and Distributed Systems, 2018
Daniel Wust, Dietmar Fey, Johannes Knödtel
An easy-to-implement CPU is the MIPS CPU described by Patterson and Hennessy [24]. Each instruction addresses three registers: two for inputs, one for the output. The execution of an instruction is divided into five stages:

Instruction fetch: A sequential program counter holds the address of the instruction to be read from the instruction memory. The retrieved instruction is put in the register of the stage.

Instruction decode: Control signals, the registers to be accessed, the operation to be executed, and immediate values are determined from the instruction code. The results are passed to the execute unit.

Execute: The actual calculation is done by the arithmetic logic unit using the data retrieved from the first register, the second register, and the immediate value, respectively.

Memory: If data memory needs to be accessed, the address calculated by the ALU determines the data memory location to be read into the first operand register for a load operation. For store operations, the data to be written is referenced by the content of the second register.

Write-back: Depending on the instruction, the result of the ALU or the data read from the data memory is written into the register file.

In the CPU, multiplication and division are reduced to successive SD additions (for multiplication) or successive SD additions and subtractions (for division). In this way, multiplication and division can also profit from the SD arithmetic of a carry-free addition in constant time. As the signal delay of the ALU is assessed on the basis of its longest critical path, it is necessary to introduce pipeline stages within the division and multiplication pipeline paths after each addition/subtraction step. We further favour a superscalar architecture with separate pipelines for multiplication, division, and addition/subtraction, which supports out-of-order execution. This includes an out-of-order execution scheduler that enables functional units for both arithmetic operations and data access operations to operate in parallel. The scheduling algorithm distributes the instructions to the dedicated functional units and ensures the consistency of data.
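A minimal C sketch of the five stages follows, executing one instruction at a time with hypothetical opcodes and register/memory sizes, and deliberately omitting pipelining, hazards, and the SD arithmetic discussed above.

#include <stdio.h>

enum opcode { OP_ADD, OP_LOAD, OP_STORE };

struct instr { enum opcode op; int rd, rs, rt, imm; };

static int regs[8];
static int dmem[16];

static void run(struct instr *imem, int n)
{
    for (int pc = 0; pc < n; pc++) {
        struct instr ir = imem[pc];              /* 1. instruction fetch */
        int a = regs[ir.rs], b = regs[ir.rt];    /* 2. decode / reg read */
        int alu = 0;                             /* 3. execute           */
        switch (ir.op) {
        case OP_ADD:   alu = a + b;      break;
        case OP_LOAD:
        case OP_STORE: alu = a + ir.imm; break;  /* address calculation  */
        }
        int mem = 0;                             /* 4. memory access     */
        if (ir.op == OP_LOAD)  mem = dmem[alu];
        if (ir.op == OP_STORE) dmem[alu] = b;
        if (ir.op == OP_ADD)   regs[ir.rd] = alu;   /* 5. write-back     */
        if (ir.op == OP_LOAD)  regs[ir.rd] = mem;
    }
}

int main(void)
{
    struct instr prog[] = {
        { OP_ADD,   3, 1, 2, 0 },   /* r3 = r1 + r2    */
        { OP_STORE, 0, 0, 3, 4 },   /* dmem[r0+4] = r3 */
        { OP_LOAD,  4, 0, 0, 4 },   /* r4 = dmem[r0+4] */
    };
    regs[1] = 5; regs[2] = 7;
    run(prog, 3);
    printf("r4 = %d\n", regs[4]);   /* 12 */
    return 0;
}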