Big Data Stream Processing
Published in Vivek Kale, Parallel Computing Architectures and APIs, 2019
Distributed shared memory (DSM) is a mechanism by which processes can access shared data without explicit interprocess communication. The challenges of implementing a DSM system include data location, data access, data sharing and locking, and data coherence. These problems are connected with transactional models, data migration, concurrent programming, distributed systems, etc. Resilient distributed datasets (RDDs) are an important abstraction in Spark: read-only collections of objects partitioned across a cluster that can rebuild lost partitions. RDDs can be reused in multiple parallel operations through in-memory caching, and they use lineage information about a lost partition to rebuild it.
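The lineage idea can be illustrated with a small conceptual sketch (not Spark's API, and not the book's example): a read-only dataset records the transformations applied to its parent data, so a cached partition that is lost can be rebuilt by replaying that lineage rather than by replicating the data. The `MiniRdd` type and its methods below are purely illustrative.

```cpp
#include <cstdio>
#include <functional>
#include <vector>

// Conceptual sketch of the RDD/lineage idea: a read-only dataset is described
// by its parent data plus the recorded transformations, so a lost (evicted)
// partition can be rebuilt instead of being replicated.
struct MiniRdd {
    std::vector<int> base;                          // parent data
    std::vector<std::function<int(int)>> lineage;   // recorded transformations
    std::vector<int> cached;                        // materialized partition
    bool in_memory = false;

    MiniRdd map(std::function<int(int)> f) const {
        MiniRdd out = *this;
        out.lineage.push_back(f);                   // record, do not execute yet
        out.in_memory = false;
        out.cached.clear();
        return out;
    }

    const std::vector<int>& collect() {             // materialize and cache
        if (!in_memory) {                           // rebuild from lineage
            cached = base;
            for (auto& f : lineage)
                for (auto& x : cached) x = f(x);
            in_memory = true;
        }
        return cached;
    }

    void lose_partition() { in_memory = false; cached.clear(); }
};

int main() {
    MiniRdd rdd{{1, 2, 3, 4}};
    MiniRdd doubled = rdd.map([](int x) { return 2 * x; });

    doubled.collect();                 // first use: computed and cached
    doubled.lose_partition();          // simulate a lost partition
    for (int v : doubled.collect())    // rebuilt from parent data + lineage
        std::printf("%d ", v);
    std::printf("\n");
    return 0;
}
```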
Parallel Architectures
Published in Pranabananda Chakraborty, Computer Organisation and Architecture, 2020
Multiprocessor systems are best suited for general-purpose, multi-user applications where the major thrust is on programmability. Shared-memory multiprocessors can be a very cost-effective approach, but the latency incurred when accessing remote memory is considered a major shortcoming, and lack of scalability is another key limitation of such systems. Distributed shared memory (DSM) multiprocessors, however, address these issues and resolve most of these drawbacks to a considerable extent by providing an extended form of the strict shared-memory multiprocessor architecture.
Adaptive output feedback control with cerebellar model articulation controller-based adaptive PFC and feedforward input
Published in SICE Journal of Control, Measurement, and System Integration, 2022
Nozomu Otakara, Kota Akaike, Sadaaki Kunimatsu, Ikuro Mizumoto
CMAC is well known as a type of neural network based on a mathematical model of the mammalian cerebellum. In the CMAC, inputs to the input space are transformed into a label set, and the output is the average value of the weights in the activated cells, which are referenced through a distributed shared memory structure based on the input label. Since the CMAC is trained on the variables in the specified area, it is simple and can achieve quick learning, adjusting the weights in the activated cells through the error observed at the output. We utilize this CMAC strategy to adjust parameters of the linearly approximated system in a specific domain. Thus, for a system satisfying Assumptions 3.1 and 3.2, we adjust the parameters of the PFC and of the FF control input in each of the subsets via the CMAC strategy.
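A minimal sketch of the table-lookup behavior described above: the input is quantized by several offset tilings into activated cells, the output is the average of the weights in those cells, and only the activated weights are corrected from the observed output error. The 1-D layout, number of tilings, learning rate, and toy target below are illustrative assumptions, not the controller used in the paper.

```cpp
#include <cstdio>
#include <vector>

// Minimal 1-D CMAC sketch: several offset tilings quantize the input into
// "activated cells"; the output is the average of the activated weights.
struct Cmac {
    int tilings, cells_per_tiling;
    double lo, hi;                 // input range covered by the tables
    std::vector<double> w;         // one weight per cell, over all tilings

    Cmac(int tilings, int cells, double lo, double hi)
        : tilings(tilings), cells_per_tiling(cells), lo(lo), hi(hi),
          w(tilings * cells, 0.0) {}

    // Index of the activated cell in tiling t for input x (tilings are offset).
    int cell(int t, double x) const {
        double width = (hi - lo) / cells_per_tiling;
        double offset = width * t / tilings;
        int idx = static_cast<int>((x - lo + offset) / width);
        if (idx < 0) idx = 0;
        if (idx >= cells_per_tiling) idx = cells_per_tiling - 1;
        return t * cells_per_tiling + idx;
    }

    // Output: average of the weights in the activated cells.
    double output(double x) const {
        double sum = 0.0;
        for (int t = 0; t < tilings; ++t) sum += w[cell(t, x)];
        return sum / tilings;
    }

    // Adjust only the activated weights from the observed output error.
    void train(double x, double target, double rate) {
        double err = target - output(x);
        for (int t = 0; t < tilings; ++t)
            w[cell(t, x)] += rate * err / tilings;
    }
};

int main() {
    Cmac cmac(8, 32, -1.0, 1.0);
    for (int it = 0; it < 2000; ++it) {
        double x = -1.0 + 2.0 * (it % 100) / 99.0;
        cmac.train(x, x * x, 0.5);        // learn a toy target, y = x^2
    }
    std::printf("f(0.5) ~ %.3f (target 0.25)\n", cmac.output(0.5));
    return 0;
}
```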
Method of Characteristics for 3D, Full-Core Neutron Transport on Unstructured Mesh
Published in Nuclear Technology, 2021
Derek R. Gaston, Benoit Forget, Kord S. Smith, Logan H. Harbour, Gavin K. Ridley, Guillaume G. Giudicelli
As the number of partitions grows with MRT, so does the memory usage, since the angular flux is stored at the beginning and end of each track on each partition and the total partition surface area increases. To estimate the total memory that would be used by MRT for the 3D BEAVRS problem, a few assumptions are made: a 360 × 360 × 460 cm domain size, 32 azimuthal angles, four polar angles, isotropic tracks, and a domain partitioning using cubic numbers of MPI ranks. Using these assumptions, the graph in Fig. 4 is produced, showing the amount of memory used to store the partition boundary angular fluxes. The growth of memory with the number of MPI ranks under MRT could represent an issue for usability with large-scale compute jobs. However, this could be somewhat alleviated by using hybrid distributed shared memory, in which MPI ranks are used across nodes and local (on-node), non-MPI processes are created that can share on-node memory as needed (for example, for angular fluxes between partitions on the same node).
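The geometric driver of this growth can be sketched quickly (this is only the surface-area part of the argument, not the paper's actual memory estimate): for an n × n × n cubic partitioning of the stated 360 × 360 × 460 cm domain, the total internal partition surface grows roughly linearly with n, and the boundary angular flux storage scales with that surface because it is proportional to the number of tracks crossing it.

```cpp
#include <cstdio>

// Sketch of the scaling argument only: the internal partition surface of an
// n x n x n cubic partitioning grows with the rank count, and the stored
// boundary angular flux grows with it (proportional to the tracks crossing it).
int main() {
    const double X = 360.0, Y = 360.0, Z = 460.0;   // domain size (cm)
    for (int n = 1; n <= 10; ++n) {
        // (n - 1) internal cutting planes in each coordinate direction.
        double surface = (n - 1) * (Y * Z + X * Z + X * Y);  // cm^2
        std::printf("%4d ranks: internal partition surface = %.2e cm^2\n",
                    n * n * n, surface);
    }
    return 0;
}
```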
Implicit discrete ordinates discontinuous Galerkin method for radiation problems on shared-memory multicore CPU/many-core GPU computation architecture
Published in Numerical Heat Transfer, Part B: Fundamentals, 2021
When running SNDG/GMRES implementations on shared-memory, multicore systems, the computer program needs to be parallelized appropriately in order to make full use of all available CPU cores. The application programming interfaces (APIs) of the message-passing interface (MPI) [15] and open multiprocessing (OpenMP) [16] are both feasible for this purpose. The MPI model is highly efficient for shared-memory systems, and MPI-based parallel programs can easily be extended to run on distributed-memory and hybrid distributed/shared-memory architectures, e.g., HPC clusters. However, MPI parallelization might affect the convergence characteristics and simulation data quality of the method [17, 18]. Besides, programming-wise, given an existing serial program, developing an MPI-based parallel program typically requires extensive modifications or even recoding from scratch. By contrast, the OpenMP parallel model is compiler-directive based and allows for an incremental programming style: an existing serial program can be parallelized within a short period of time with little programming work involved. For shared-memory systems, OpenMP is the more economical of the two parallel models. Moreover, within the literature on parallel implementations of radiation problem simulations, most publications use the MPI parallel model [17–23], while the OpenMP parallel model has rarely been explored, which is another reason why this work focuses on the OpenMP model.
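The "incremental" directive-based style can be illustrated with a minimal example (not taken from the SNDG/GMRES code): a single OpenMP pragma added in front of an existing serial loop distributes its iterations across the available CPU cores, while the rest of the program is left unchanged. The loop body here is a placeholder per-element update, not the radiative transfer sweep.

```cpp
#include <omp.h>
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 20;
    std::vector<double> source(n, 1.0), intensity(n, 0.0);

    // An existing serial loop becomes parallel by adding one directive;
    // no other change to the program is required (incremental OpenMP style).
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        intensity[i] = 0.25 * source[i];   // placeholder per-element update
    }

    std::printf("used up to %d threads\n", omp_get_max_threads());
    return 0;
}
```

Compiled with the compiler's OpenMP flag (e.g., `-fopenmp`), the directive is honored; without it, the pragma is ignored and the loop runs serially, which is what makes this style incremental.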