Explore chapters and articles related to this topic
A Retrospective on High Performance Embedded Computing
Published in David R. Martinez, Robert A. Bond, Vai M. Michael, High Performance Embedded Computing Handbook, 2018
Less than a decade ago, defense system applications demanded computing throughputs of a few GOPS while consuming a few thousand watts of power (approximately 1 MOPS/W). However, there was still considerable interest in leveraging commercial off-the-shelf (COTS) systems. In the mid-1990s, the Department of Defense (DoD) therefore initiated an effort to miniaturize the Intel Paragon into a system called the Touchstone, with the goal of delivering 10 GOPS/ft³. As shown in Figure 1-4, the Intel Paragon was based on the Intel i860 programmable microprocessor running at 50 MHz and delivering about 0.07 MFLOPS/W. Its performance was very limited, but it offered programming flexibility. In demonstration, the Touchstone met its goals, but it was overtaken by systems based on more capable DSP microprocessors. At the same time, the DoD also began investing in the Vector Signal and Image Processing Library (VSIPL) to allow more standardized approaches to software development. The initial version of VSIPL targeted only a single processor. As discussed in later chapters, VSIPL has since been extended to many parallel processors operating together. Standardized library functions made it easier to port the same software to other computing platforms and to reuse it in similar algorithm applications.
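The 1 MOPS/W figure quoted above follows from a simple division of throughput by power. The sketch below is illustrative only: the text says "a few" GOPS and "a few" thousand watts, so the values 3 GOPS and 3,000 W are assumptions, and the function name `mops_per_watt` is hypothetical.

```python
def mops_per_watt(throughput_gops: float, power_watts: float) -> float:
    """Return power efficiency in MOPS/W for a throughput given in GOPS."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    # 1 GOPS = 1,000 MOPS
    return throughput_gops * 1_000.0 / power_watts


# Assumed example values standing in for "a few GOPS" and "a few thousand watts".
efficiency = mops_per_watt(3.0, 3_000.0)
print(efficiency)  # 1.0
```

Any pair of values with the same ratio gives the same efficiency, which is why the text can state the figure without fixing either quantity.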
Parallel Architectures
Published in Pranabananda Chakraborty, Computer Organisation and Architecture, 2020
The definition of an MPP system has shifted as computer technology has advanced. Instruction-level parallelism became the major concern. Computers operating in MIMD mode, both multiprocessors and multicomputers, then began to redefine the MPP system. A few representative systems in this category, drawn from SIMD, MIMD, and SIMD–MIMD designs, are as follows:
IBM RP3 was designed to include 512 processors, but only a 64-processor version was built.
IBM MPP model was announced with a target configuration of 1024 processors, each an RS/6000 used as a building block, and a projected peak speed of 50 Gflops.
BBN TC 2000 had a maximum configuration of 512 processors, though an even larger machine was being pursued.
Intel Touchstone Delta, a multicomputer system with 672 node processors, was launched in 1991.
Intel Paragon, a host-free multicomputer built with i860 XP microprocessors using a 2D mesh connection with wormhole routers, was introduced in mid-1992 with a peak performance of about 300 Gflops.
CRAY MPP model (T3D), with a heterogeneous, scalable MIMD microarchitecture using a 3D torus network to connect DEC Alpha microprocessor chips as building blocks, became available in 1993. In larger configurations, this architecture eventually reached teraflops speeds. The machine was intended to work as a back-end accelerator engine compatible with the existing Cray Y-MP series.
The CM-5 from Thinking Machines Corporation, launched in mid-1992, used a hybrid SIMD–MIMD architecture with 32 to 16,384 processing nodes (not elements); each node contained a 32-MHz SPARC processor, 32 Mbytes of memory, and a 128-Mflops vector processing unit capable of executing 64-bit floating-point as well as integer operations. High-performance networks and high-bandwidth I/O interfaces, backed by large-capacity secondary storage serving as a data vault, made this architecture fit for a massively parallel processing environment.
Fujitsu VPP (Vector Parallel Processor) 500, a MIMD vector system, was launched in late 1993. It was a 222-PE system (each PE being a processor with attached memory) with a large crossbar (224 × 224) interconnect. A VP 2000 machine with shared distributed memories was used as the host in solving large-scale problems.
KSR-1, introduced by Kendall Square Research, was a multiprocessor with a configuration of 1088 custom-designed processors using a ring interconnect. It attained a peak performance of about 45 Gflops.
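For a sense of scale, the CM-5 node figures quoted above (32 to 16,384 nodes, each with a 128-Mflops vector unit) imply an aggregate vector peak that can be estimated by multiplication. This is a rough sketch, not a vendor specification: it assumes every node contributes its full 128 Mflops and ignores communication and memory overhead, and the function name `aggregate_peak_gflops` is hypothetical.

```python
def aggregate_peak_gflops(nodes: int, mflops_per_node: float = 128.0) -> float:
    """Estimate the aggregate peak in Gflops as nodes x per-node peak.

    Assumes perfect scaling, which real machines do not achieve.
    """
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    # 1 Gflops = 1,000 Mflops
    return nodes * mflops_per_node / 1_000.0


# Smallest and largest CM-5 configurations mentioned in the text.
for nodes in (32, 16_384):
    print(f"{nodes} nodes: {aggregate_peak_gflops(nodes):.3f} Gflops")
```

Under these assumptions the smallest configuration comes to roughly 4 Gflops and the largest to roughly 2 Tflops, an upper bound rather than a sustained rate.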
Personal reflections on 50 years of scientific computing: 1967–2017
Published in International Journal of Parallel, Emergent and Distributed Systems, 2020
This decade also saw the emergence of parallel computing in almost every conceivable flavor, a truly exciting period for computer and computational scientists. Alas, the market could not sustain this cornucopia of parallel architectures, and almost all the companies and/or products failed. I lost this decade working on the Elxsi 6400 (the first parallel machine at Sandia National Laboratories in 1984), Intel iPSC-32 (the ‘hypercube’, serial number 2, at General Motors Research Laboratories in Warren, Michigan), NCUBE-10 (the company sold numerous machines but went out of business without ever producing a working FORTRAN compiler), Sequent Balance 21000, Sequent Symmetry S81, Cray 2, Cray Y-MP C90, Intel Paragon XP/S, and SGI Origin 2000. Despite the commercial failure of parallel computing, the community learned a lot about parallel programming, interconnection topologies, communication technologies, and the tradeoffs between shared and distributed memory systems. The community also learned that parallel programming was, and still is, very hard.