Introduction to computer architecture
Published in Joseph D. Dumas, Computer Architecture, 2016
Although mobile devices and small computing systems made considerable advances during the sixth generation of computing, high-end systems, including servers and supercomputers, made great strides forward as well. One of the most important techniques to become popular during this time span was the use of Graphics Processing Units (GPUs) to supplement conventional CPUs in handling certain demanding computing jobs. GPUs, as their name suggests, are specialized processors optimized for the mathematically intensive process of creating images to be displayed to a human user, often in the context of a computer-based game, simulation, or virtual environment. Specialized graphics hardware has existed since at least the fourth generation of computing, and the term GPU was coined during the fifth generation (in 1999) by NVIDIA Corporation, which described its GeForce 256 processor as "the world's first GPU." However, several years passed before the practice of using GPUs for nongraphical tasks (General-Purpose computing on Graphics Processing Units [GPGPU]) was widely adopted. As we shall explore further in Section 6.1.2, the same type of hardware parallelism that makes GPUs good at processing image data has also proven extremely efficient for processing other types of data, particularly for scientific computations on highly parallel data sets composed of vectors or matrices. The development of proprietary programming languages, such as NVIDIA's Compute Unified Device Architecture (CUDA), and cross-platform languages, such as OpenCL, has allowed programmers to apply the power of GPUs to many computationally intensive problems. By November 2015, the fraction of the world's top 500 supercomputers that used GPU coprocessors/accelerators to boost their computational performance had risen to 21% (104/500).
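To make the GPGPU idea concrete, here is a minimal sketch (not from the excerpt above), assuming the CuPy library, a CUDA-backed, NumPy-like Python package, and a CUDA-capable GPU. The kernel name vec_add and all parameters are illustrative. Each GPU thread processes one vector element, which is exactly the kind of data parallelism the passage describes.

```python
# Minimal GPGPU sketch: a CUDA kernel launched from Python via CuPy.
# Each GPU thread handles one vector element, illustrating the data
# parallelism that makes GPUs effective on vector/matrix workloads.
import cupy as cp

vec_add = cp.RawKernel(r'''
extern "C" __global__
void vec_add(const float* x, const float* y, float* out, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard the final partial block
        out[i] = x[i] + y[i];
}
''', 'vec_add')

n = 1 << 20                                  # one million elements
x = cp.random.rand(n, dtype=cp.float32)      # arrays allocated on the GPU
y = cp.random.rand(n, dtype=cp.float32)
out = cp.empty_like(x)

threads = 256                                # threads per block
blocks = (n + threads - 1) // threads        # enough blocks to cover all n elements
vec_add((blocks,), (threads,), (x, y, out, cp.int32(n)))

assert cp.allclose(out, x + y)               # check against CuPy's built-in add
```

The same kernel could be written in OpenCL for non-NVIDIA hardware; the structure (one lightweight thread per data element, launched in bulk) is what distinguishes the GPU programming model from conventional CPU code.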
Price-Performance of Computer Technology
Published in Vojin G. Oklobdzija, Digital Design and Fabrication, 2017
Another approach to improving computer system performance is to cluster several computers [104]. One type of cluster provides cooperative sharing and fail-over, enhancing both reliability and performance. Other clusters are collections of nodes with fast interconnections that share computations. The TOP500 list [47] ranks the fastest computer systems in the world based on the LINPACK benchmark. The top entries contain many thousands of processors.
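The LINPACK benchmark behind the TOP500 ranking times the solution of a dense linear system Ax = b. As a rough illustration (a sketch, not the official HPL code), the following measures the effective flop rate of a dense solve in NumPy, using the conventional operation count of about (2/3)n^3 floating-point operations for an LU-based solution:

```python
# Rough, LINPACK-flavoured measurement: time a dense solve of Ax = b
# and convert the conventional LU operation count (~2/3 n^3 flops)
# into a flop rate. Illustrative only; the official benchmark is HPL.
import time
import numpy as np

n = 2000
rng = np.random.default_rng(0)
A = rng.random((n, n))
b = rng.random(n)

t0 = time.perf_counter()
x = np.linalg.solve(A, b)        # LU factorization + triangular solves
elapsed = time.perf_counter() - t0

flops = (2.0 / 3.0) * n**3       # standard operation count for dense LU
print(f"n={n}: {elapsed:.3f} s, ~{flops / elapsed / 1e9:.2f} GFlop/s")
print("residual:", np.linalg.norm(A @ x - b))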
State-of-the-art in the mechanistic modeling of the drying of solids: A review of 40 years of progress and perspectives
Published in Drying Technology, 2023
Patrick Perré, Romain Rémond, Giana Almeida, Pedro Augusto, Ian Turner
We must keep in mind that the advancement in the solution of the macroscopic formulation is due to two important factors. The first is undoubtedly the substantially increased computing power that has become available over this period. For example, in 1985 the supercomputer Cray-2 had a peak performance of 1.9 GFlops (Giga = 10^9, Flops = floating-point operations per second), whereas in 2022 the No. 1 system of the current TOP500 (release of June 2022) has a peak performance of over one EFlops (Exa = 10^18). Regarding processors for personal computers, the Intel 386 launched in 1985 was incapable of reaching one MFlops (Mega = 10^6). Nowadays, the most powerful processors reach the TFlops (Tera = 10^12) range, and this figure is much higher for GPUs (on the order of 100 TFlops). In both cases, this is roughly a hundred million-fold increase in computing power over four decades.
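That "hundred million-fold" figure can be checked directly from the numbers quoted above:

  supercomputers: 10^18 Flops / (1.9 × 10^9 Flops) ≈ 5 × 10^8
  PC processors:  100 TFlops / 1 MFlops = 10^14 / 10^6 = 10^8

Both ratios are on the order of 10^8, i.e., roughly a hundred million-fold.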
Spatial specification of hypertorus interconnect by infinite and reenterable coloured Petri nets
Published in International Journal of Parallel, Emergent and Distributed Systems, 2022
Dmitry A. Zaitsev, Tatiana R. Shmeleva, Birgit Pröll
Recent advances in the high-performance (exascale) computing domain are marked by the wide application of multidimensional torus interconnects, starting with the 3D torus of the IBM Blue Gene computer and the 5D torus of IBM Blue Gene/Q, and further developed in the Fujitsu Tofu Interconnect, a combined lattice-mesh structure that can be described as a 6D torus [1]. The Tofu Interconnect, first applied in the K supercomputer, now connects the nodes of the most powerful computer in the world (according to the TOP500 June 2020 list), Fugaku [1]. For the Fugaku supercomputer, Fujitsu developed the new A64FX processor, which connects 48 computing cores via a network-on-chip. Thus, networks-on-chip also employ a spatial interconnect structure for cores, usually represented by 2D and, more recently, 3D lattices.
Aggregation of clans to speed-up solving linear systems on parallel architectures
Published in International Journal of Parallel, Emergent and Distributed Systems, 2022
Dmitry A. Zaitsev, Tatiana R. Shmeleva, Piotr Luszczek
Hypertorus communicating structures are widely applied as communication facilities of supercomputers and clusters, since a multi-dimensional torus (hypertorus) possesses the ideal quality of minimal distance between any two chosen nodes. For instance, the IBM Blue Gene supercomputer communication system was implemented as a three-dimensional torus, and its communication on chip as a five-dimensional torus [37]; the most recent supercomputer Fugaku [38], ranked number 1 on the TOP500 list, uses the Tofu Interconnect D, whose topology can be characterised as a six-dimensional torus. Models of hypertorus communication structures in the form of Petri nets and process algebra have also been studied [34]. An example of a two-dimensional torus is shown in Figure 8. A node model specifies a packet-switching device with four ports, one situated on each side of a square; the nodes are connected by merging their contact places. The torus topology requires connecting the opposing borders of the resulting lattice. In the d-dimensional case, a node is specified by a hypercube. We use the von Neumann neighbourhood, where neighbouring nodes are connected by facets considered as (d − 1)-dimensional hypercubes, though a generalised neighbourhood [39] can be applied as well. Here, we consider a four-dimensional torus of size 3 and present the source decomposition graph in Figure 9 and its aggregation into seven clans in Figure 10, as generated by METIS [5]. Note that in this case the decomposition graph (Figure 9) is not very informative, and for big real-life models, having thousands of Petri net vertices, a graphical representation does not help much. That is why, in the next section, we use textual forms to specify the obtained decomposition briefly.
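To make the neighbourhood structure concrete, here is a minimal sketch (not code from the paper) that enumerates the von Neumann neighbours of a node in a d-dimensional torus of size k, with wrap-around at the borders. For the four-dimensional torus of size 3 discussed above, each of the 3^4 = 81 nodes has 2d = 8 neighbours; the function name torus_neighbors is illustrative.

```python
# Enumerate the von Neumann neighbours of a node in a d-dimensional torus
# of size k: step +/-1 along each coordinate axis, wrapping modulo k so
# that the opposing borders of the lattice are connected.
from itertools import product

def torus_neighbors(node, k):
    neighbors = []
    for axis in range(len(node)):
        for step in (-1, 1):
            nb = list(node)
            nb[axis] = (nb[axis] + step) % k   # wrap-around at the torus border
            neighbors.append(tuple(nb))
    return neighbors

k, d = 3, 4                                    # 4D torus of size 3, as in the text
nodes = list(product(range(k), repeat=d))
print(len(nodes))                              # 81 nodes (3**4)
print(torus_neighbors((0, 0, 0, 0), k))        # 2*d = 8 neighbours of the origin
```

Each pair of neighbouring nodes in this enumeration corresponds to a merged pair of contact places in the Petri net model, which is how the node models are composed into the full torus specification.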