Graphics Programming
Published in Aditi Majumder, M. Gopi, Introduction to Visual Computing, 2018
The CUDA programming model is a parallel programming model that provides an abstract view of how processes run on the underlying GPU architecture. The evolution of GPU architecture and of the CUDA programming language have been largely parallel and interdependent. While the CUDA programming model has stabilized over time, the architecture is still evolving in its capabilities and functionality; GPU architecture has also grown in the number of transistors and computing units over the years, while continuing to support the CUDA programming model. CUDA has been used to implement many algorithms and applications beyond graphics, and this explosion of use, and the permeation of CUDA into hitherto unexplored application domains, has catapulted the GPU to near-ubiquitous use in many areas of science and technology. Since the release of CUDA, all GPUs designed have been CUDA capable. It should be noted that before CUDA was released, there were attempts to create high-level languages and template libraries such as Glift [Lefohn et al. 06] and Scout [McCormick et al. 07], but such efforts tapered off with the introduction of CUDA, and more effort was spent on refining CUDA and on building libraries using its constructs.
Stream Processing Programming with CUDA, OpenCL, and OpenACC
Published in Vivek Kale, Parallel Computing Architectures and APIs, 2019
CUDA, a parallel programming model and software environment developed by Nvidia, enables programmers to write scalable parallel programs for Nvidia's GPUs using a straightforward extension of the C/C++ and Fortran languages. A GPU consists of an array of streaming multiprocessors (SMs), and each SM in turn consists of a number of streaming processors (SPs). The threads of a parallel program run on different SPs, and each SP has a local memory associated with it. All the SPs within an SM communicate via a common shared memory provided by that SM. Different SMs access and share data through the GPU DRAM (distinct from the CPU DRAM), which functions as the GPU main memory and is called the global memory.
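The thread and memory hierarchy described above can be made concrete with a short kernel. Below is a minimal sketch (our illustration, not code from the text) in which each thread runs on an SP, the threads of one block cooperate through the shared memory of their SM, and results move between host and blocks only through global memory; the kernel name blockSum and all sizes are arbitrary choices for the example.

#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each block computes the sum of its 256-element slice of the input.
__global__ void blockSum(const float* in, float* out, int n) {
    __shared__ float partial[256];                // SM shared memory, one copy per block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    partial[threadIdx.x] = (i < n) ? in[i] : 0.0f; // load from global memory
    __syncthreads();                              // synchronize the threads of this block
    // Tree reduction carried out entirely in shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = partial[0];             // write the block's result to global memory
}

int main() {
    const int n = 1024, threads = 256, blocks = n / threads;
    std::vector<float> h_in(n, 1.0f);
    float h_out[4];
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    blockSum<<<blocks, threads>>>(d_in, d_out, n); // 4 blocks of 256 threads
    cudaMemcpy(h_out, d_out, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    for (int b = 0; b < blocks; ++b)
        printf("block %d sum = %f\n", b, h_out[b]); // each block sums 256 ones
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

Note that the shared array is visible only within a block; if the per-block sums had to be combined on the device, a second kernel launch (or an atomic update in global memory) would be needed, which reflects the fact that SMs share data only through global memory.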
A parallel computing framework for solving user equilibrium problem on computer clusters
Published in Transportmetrica A: Transport Science, 2020
Xinyuan Chen, Zhiyuan Liu, Inhi Kim
Implementing parallel computing on a distributed memory architecture is quite different from doing so in a shared memory framework. To exploit parallel computing power efficiently, a parallel programming model (a bridge between hardware and software) is required to express algorithms and their composition in programs. Conventionally, the message passing model (MPM) is widely used on distributed memory architectures (Gropp et al. 1999). In the field of TA problems, Liu and Meng (2013) implemented their proposed distributed computing method on a computing cluster by means of the MPM. However, the MPM has its limitations: converting a serial program into a parallel one requires extensive code changes, and the resulting programs can be hard to debug. A more concise parallel programming model is therefore needed.
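As a rough illustration of the programming effort the MPM entails, consider the following minimal MPI sketch in C (a hypothetical example, not code from the paper). Even a trivial parallel sum requires explicit runtime initialization, manual decomposition of the work by process rank, and an explicit collective call to combine the partial results; none of this bookkeeping exists in the serial version of the same loop.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);                    // explicit runtime setup
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);      // identity of this process
    MPI_Comm_size(MPI_COMM_WORLD, &size);      // total number of processes

    // Manually partition the range 1..1000 among the processes.
    int lo = rank * 1000 / size + 1;
    int hi = (rank + 1) * 1000 / size;
    long local = 0, total = 0;
    for (int i = lo; i <= hi; ++i)
        local += i;

    // Partial results must be combined by an explicit message-passing call.
    MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum = %ld\n", total);          // only the root process holds the answer

    MPI_Finalize();                            // explicit runtime teardown
    return 0;
}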