Parallel Architectures
Published in Pranabananda Chakraborty, Computer Organisation and Architecture, 2020
Relentless progress in VLSI technology has made it feasible to construct massively parallel distributed-memory machines, large multicomputer configurations, clusters of computers, massively parallel processing systems, and supercomputers using various types of static interconnection networks with different topologies, namely Ring, Tree, Mesh, Torus, Hypercube (nCube), etc., as well as numerous types of dynamic interconnection structures, such as multistage interconnection networks. These machines are entwined with improved processor technology providing high-performance computing and communications (1 Teraflops), high-speed, high-capacity (1 Terabyte) hierarchical memory modules, extensive I/O support with high bandwidth (1 Terabyte/second), and synchronization features, all working in concert to realize effective massively parallel processing (MPP) and meet forthcoming challenges. MPP systems require both hardware parallelism and software parallelism, which can be accomplished by various means. Today’s supercomputer is essentially a massively parallel processing system with thousands of “ordinary” CPUs, some being off-the-shelf units such as PowerPC, PA-RISC or UltraSPARC (RISC), and others being custom designs. Most modern supercomputers are now highly tuned computer clusters using commodity processors combined with fast custom interconnects.
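
As an illustrative aside (not part of the chapter), the sketch below shows how neighbours can be determined for two of the static topologies named above, a hypercube (nCube) and a ring; the node counts and addressing scheme are assumptions made for the example.

```python
# Illustrative sketch: neighbour computation for two static interconnection
# topologies mentioned above, an n-dimensional hypercube and a ring.

def hypercube_neighbours(node: int, dimensions: int) -> list[int]:
    """In an n-dimensional hypercube each node differs from its
    neighbours in exactly one address bit, so it has n neighbours."""
    return [node ^ (1 << bit) for bit in range(dimensions)]

def ring_neighbours(node: int, size: int) -> list[int]:
    """In a ring each node is linked only to its predecessor and successor."""
    return [(node - 1) % size, (node + 1) % size]

if __name__ == "__main__":
    # A 4-cube has 16 nodes; node 0b0101 (5) has 4 neighbours.
    print(hypercube_neighbours(0b0101, 4))   # [4, 7, 1, 13]
    # An 8-node ring; node 0 wraps around to node 7.
    print(ring_neighbours(0, 8))             # [7, 1]
```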
Bioinformatics and Applications in Biotechnology
Published in Ram Chandra, R.C. Sobti, Microbes for Sustainable Development and Bioremediation, 2019
With the enormous amount of data being thrown up by powerful experimental and sequencing techniques, the enabling technologies to analyze them call for very-high-end computational capabilities. A computer cluster is assembled to work as one machine, harnessing the power of each computer synergistically. High-speed networks coupled with software for distributed computing have made it possible to link a large number of computers to work as one machine. As of November 2016, the Chinese Sunway TaihuLight is the world’s most powerful supercomputer, reaching 93.015 petaFLOPS (1 petaFLOPS = 10^15 floating point operations per second). It consists of 40,960 processors, each containing 256 processing cores, for a total of about 10 million CPU cores across the entire system (Fu et al., 2016). The Blue Gene high-performance computing system was developed by IBM in collaboration with the Department of Energy’s Lawrence Livermore National Laboratory in California. It was built specifically to observe the process of protein folding and gene development. Blue Gene/L uses 131,000 processors to perform 280 trillion operations every second. The power of the system can be gauged from the fact that, on a calculator, one would have to work nonstop for 177,000 years to perform the operations that Blue Gene can do in 1 s (http://www-03.ibm.com/ibm/history/ibm100/us/en/icons/bluegene/).
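
For orientation (not part of the chapter), a back-of-the-envelope check of the quoted figures, reproducing the roughly 10 million core count and expressing 93.015 petaFLOPS in floating point operations per second:

```python
# Back-of-the-envelope check of the figures quoted above (illustrative only).

processors = 40_960          # Sunway TaihuLight processors, as cited
cores_per_processor = 256    # processing cores per processor, as cited
total_cores = processors * cores_per_processor
print(f"total compute cores: {total_cores:,}")   # 10,485,760 (~10 million)

peak_pflops = 93.015                 # petaFLOPS, as cited
peak_flops = peak_pflops * 1e15      # 1 petaFLOPS = 10**15 FLOP/s
print(f"peak performance: {peak_flops:.3e} floating point operations per second")
```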
An Analysis of Distributed Programming Models and Frameworks for Large-scale Graph Processing
Published in IETE Journal of Research, 2022
Alejandro Corbellini, Daniela Godoy, Cristian Mateos, Silvia Schiaffino, Alejandro Zunino
Owing to their fairly chaotic nature, graph algorithms present new challenges to distributed computing. Graphs can be arbitrarily connected, which means that a vertex may connect to any other vertex in the graph. An algorithm that walks from one vertex to another may span several computational nodes, depending on where the two vertices (and their associated information) are physically located. Maximising vertex co-location is an NP-complete problem, so heuristics are used to preserve data locality. Related to this locality problem is task balancing. Many real-world graphs follow a power-law distribution: there are many low-connected vertices that can be processed very fast, and a smaller number of highly connected vertices that are slower to process. For example, in large-scale social networks, a relatively small number of celebrities have a large neighbourhood (e.g. followers), whereas “normal” users have a rather small one. This type of graph complicates the balanced distribution of tasks across a computer cluster.
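
As a hypothetical illustration of the locality and balancing problems described above (not taken from the article), the sketch below hash-partitions the vertices of a small hub-centred graph across cluster nodes and counts the edges whose endpoints land on different nodes; the partitioning scheme and the toy graph are assumptions.

```python
# Hypothetical sketch: naive hash partitioning of vertices across cluster
# nodes, and the number of edges whose endpoints land on different nodes
# (such edges force remote traversals and hurt data locality).

from collections import Counter

def hash_partition(vertex: int, num_nodes: int) -> int:
    """Assign a vertex to a computational node by hashing its id."""
    return hash(vertex) % num_nodes

def cross_partition_edges(edges, num_nodes):
    """Count edges that span two different nodes and the per-node edge load."""
    load = Counter()
    cut = 0
    for u, v in edges:
        pu, pv = hash_partition(u, num_nodes), hash_partition(v, num_nodes)
        load[pu] += 1
        if pu != pv:
            cut += 1
    return cut, load

if __name__ == "__main__":
    # A tiny star graph mimics a power-law hub: vertex 0 is a "celebrity".
    edges = [(0, i) for i in range(1, 10)]
    cut, load = cross_partition_edges(edges, num_nodes=4)
    print(f"edge cut: {cut} of {len(edges)}")   # most edges cross partitions
    print(f"load per node: {dict(load)}")       # the node holding vertex 0 is overloaded
```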
Taylor mapping method for solving and learning of dynamic processes
Published in Inverse Problems in Science and Engineering, 2021
The employed PNN is of order 2 and uses a fixed step to discretize the time interval. Its input is a quadruple representing the initial data at a given time, and the output is an approximation of the ODE parameters at the next time step. The large amount of training data makes our optimization algorithm computationally expensive. Therefore, we parallelize the differentiation procedure and use a computer cluster for training. We switch back to a regular machine after training. In most applications the minimization problem has to be solved only once, so having to use a computer cluster is not really a restriction.
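
The article does not show its implementation; as a rough sketch of the kind of parallelization described, the example below distributes a per-sample differentiation step over worker processes and sums the partial gradients. The function and variable names (per_sample_gradient, training_data) are hypothetical.

```python
# Hypothetical sketch: the costly per-sample differentiation of the loss is
# distributed over worker processes and the partial gradients are summed.

from multiprocessing import Pool
import numpy as np

def per_sample_gradient(sample):
    """Placeholder for the expensive differentiation of one training sample."""
    inputs, target = sample
    # ... in the real code this would differentiate the PNN loss ...
    return inputs * 0.0  # dummy gradient with the right shape

def parallel_gradient(training_data, processes=32):
    """Compute the total gradient by mapping samples over a process pool."""
    with Pool(processes=processes) as pool:
        partial_gradients = pool.map(per_sample_gradient, training_data)
    return np.sum(partial_gradients, axis=0)

if __name__ == "__main__":
    # Quadruples of initial data (as in the text) paired with dummy targets.
    training_data = [(np.random.rand(4), None) for _ in range(1000)]
    grad = parallel_gradient(training_data, processes=4)
    print(grad.shape)  # (4,)
```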
Scale-adaptive simulation of unsteady flow and dispersion around a model building: spectral and POD analyses
Published in Journal of Building Performance Simulation, 2018
Mohammad Jadidi, Farzad Bazdidi-Tehrani, Mohsen Kiamansouri
All of the present computations are performed on a high-performance PC cluster. In total, 32 CPUs of the computer cluster are allocated in parallel for all the simulations. Each simulation can be divided into two parts. The first part consists of initializing each case until all the initial transient conditions are washed out and the flow is developed; this usually requires around 20,000 time steps. The second part involves time averaging and statistics collection for all the intended quantities. This is carried out over the last 80,000 time steps for each turbulence modelling approach, which is equivalent to 10–15 complete transits of the flow through the computational domain. In total, the 80,000 time steps take about 644, 385 and 204 hours of wall-clock time for the SAS, LES and URANS (SST k–ω based) computations, respectively. In terms of computational performance, SAS imposes an extra CPU cost, about 67% higher than that of LES for the same grid resolution. Furthermore, for the same time step size and convergence criteria, the number of iterations per time step associated with SAS is also found to be greater than for the other turbulence modelling approaches. For instance, the number of iterations per time step is 11, 10 and 4 for SAS, LES and URANS, respectively.
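
As a quick illustrative check (not part of the article), the arithmetic behind the quoted 67% extra CPU cost of SAS relative to LES:

```python
# Quick check of the relative CPU cost quoted above (illustrative arithmetic).

wall_clock_hours = {"SAS": 644, "LES": 385, "URANS": 204}  # for 80,000 time steps

extra_cost_sas_vs_les = wall_clock_hours["SAS"] / wall_clock_hours["LES"] - 1
print(f"SAS vs LES: +{extra_cost_sas_vs_les:.0%}")         # ~ +67%

iterations_per_step = {"SAS": 11, "LES": 10, "URANS": 4}
for model, its in iterations_per_step.items():
    print(model, its, "iterations per time step")
```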