High-Performance Computing and Its Requirements in Deep Learning
Published in Sanjay Saxena, Sudip Paul, High-Performance Medical Image Processing, 2022
Biswajit Jena, Gopal Krishna Nayak, Sanjay Saxena
The GPU has become an essential component of modern computing with the growing availability of volumetric data such as images, video, and audio. GPUs are used throughout modern computing, in every personal computer, laptop, workstation, mobile phone, game console, and embedded system, as multi-core, multi-threaded multiprocessors. Fields of study related to visual processing, such as image processing, computer graphics, and computer vision, prominently use GPUs for their applications. The high processing capacity of the GPU is credited to its massively parallel processing units [26, 27]. GPUs have taken computation to the next level, popularising Artificial Intelligence (AI) and becoming a crucial part of modern supercomputing.
Understanding Artificial Intelligence (AI)
Published in Louis J. Catania, AI for Immunology, 2021
A GPU is a specialized microprocessor optimized for displaying graphics and performing very specific computational tasks. CPUs and GPUs are both made from hundreds of millions of transistors and can process thousands of operations per second. The GPU uses thousands of cores that are smaller and more efficient than a CPU's, and it can handle many highly parallel data operations at the same time. GPUs are 50–100 times faster in tasks that require multiple parallel processes, such as computer graphics and gaming (for which they were initially developed by Nvidia), but their most significant value lies in the iterative computations over massive data loads required in machine learning, deep learning, and big data analytics.7
Stream Processing Programming with CUDA, OpenCL, and OpenACC
Published in Vivek Kale, Parallel Computing Architectures and APIs, 2019
OpenCL is an open standard for programming heterogeneous systems consisting of devices such as CPUs, GPUs, DSPs (digital signal processors), and FPGAs (field-programmable gate arrays), unlike CUDA, which is proprietary to Nvidia GPUs. The standard, maintained by an industry consortium called the Khronos Group (https://www.khronos.org/opencl/), enables programmers to write application programs in C/C++ that execute on a wide variety of devices produced by different vendors, thereby achieving application code portability. It supports both data-parallel and task-parallel models of computing.
Optimization strategies for GPUs: an overview of architectural approaches
Published in International Journal of Parallel, Emergent and Distributed Systems, 2023
Alessio Masola, Nicola Capodieci
A GPU can be connected to the rest of the system either on the same chip as the CPU host, as an integrated GPU (iGPU), or as a discrete peripheral (dGPU). In the latter case, the GPU moves data between GPU-visible-only memory address spaces and CPU-only memory address spaces through a PCI-express connection. Memory accesses are a crucial aspect to consider when optimizing GPU workloads, as each level of the GPU's memory hierarchy presents significant differences in bandwidth and latency when accessing data. Moreover, accessing global memory, i.e. VRAM in dGPUs or system RAM directly in integrated systems-on-chip, can significantly increase the energy cost of the considered workload, as the activation power of RAM banks accounts for a significant percentage of the power budget of mobile/embedded IoT devices. Not surprisingly, most research efforts in GPU optimization are therefore strongly focused on understanding and proposing novel approaches for orchestrating memory accesses.
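On a dGPU this movement across PCI-express is explicit in the programming model. The following is a minimal CUDA sketch, offered as an illustration rather than code from the article (the scale-by-two kernel and buffer size are our assumptions), showing data crossing the link in both directions:

// Illustrative sketch (not from the article): explicit host<->device copies,
// which traverse the PCI-express link on a discrete GPU.
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

__global__ void scale(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;              // reads/writes global memory (VRAM on a dGPU)
}

int main(void) {
    const int n = 1 << 20;                // assumed size: 1M floats
    size_t bytes = n * sizeof(float);

    float *h = (float *)malloc(bytes);    // CPU-only address space
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *d;
    cudaMalloc(&d, bytes);                // GPU-visible-only address space

    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);   // crosses PCIe
    scale<<<(n + 255) / 256, 256>>>(d, n);
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);   // crosses PCIe again

    printf("h[0] = %f\n", h[0]);          // prints 2.0
    cudaFree(d);
    free(h);
    return 0;
}

Each cudaMemcpy here is exactly the kind of bandwidth- and energy-costly transfer that, as the authors note, GPU optimization work seeks to orchestrate carefully.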
Non-Newtonian effects on MHD thermosolutal free convection and entropy production of nanofluids in a rectangular enclosure using the GPU-based mesoscopic simulation
Published in Waves in Random and Complex Media, 2022
Aimon Rahman, Preetom Nag, Md. Mamun Molla
Compute Unified Device Architecture (CUDA) is a parallel computing platform and application programming interface (API) developed by NVIDIA, programmed in C, to enable general-purpose computing on the GPU. In CUDA computing, the GPU is the device and the CPU is the host. A kernel, executed by an array of threads, is called from the host and runs on the device. These threads are structured into blocks; on a Tesla K40, a block contains at most 1024 threads. A kernel is executed over a 2D-indexed grid of threads, and each thread block is executed on a multiprocessor. A thread ID indexes each thread in terms of its row and column within a block. It is noteworthy that threads from separate blocks cannot cooperate. The dimensions of grids and blocks are defined explicitly by the programmer. The kernel's memory accesses have an important influence on implementation performance. Threads can access data at different scopes: each thread has its own private registers and local memory; threads within the same block communicate through shared memory, whose access latency is low; and global memory is accessible by all threads. Figure 1 illustrates the thread organization, and the memory hierarchy is illustrated in Figure 2. Detailed information about GPU computing is available in previous studies (Molla et al. [47,52,56]).
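The thread organization described above can be made concrete with a short sketch; the tile size, the scale-by-two operation, and the kernel name below are illustrative assumptions, not code from the study:

#include <cuda_runtime.h>

#define TILE 16   // assumed block edge: 16 x 16 = 256 threads per block

__global__ void tileScale(const float *in, float *out, int width, int height) {
    // Shared memory: visible to all threads of this block, low access latency.
    __shared__ float tile[TILE][TILE];

    // Each thread is indexed by row and column within its block (threadIdx),
    // offset by its block's position in the 2D grid (blockIdx).
    int col = blockIdx.x * TILE + threadIdx.x;
    int row = blockIdx.y * TILE + threadIdx.y;

    if (row < height && col < width)
        tile[threadIdx.y][threadIdx.x] = in[row * width + col];  // global -> shared

    __syncthreads();  // synchronizes this block only; separate blocks cannot cooperate

    if (row < height && col < width)
        out[row * width + col] = 2.0f * tile[threadIdx.y][threadIdx.x];
}

// The programmer defines the grid and block dimensions explicitly:
//   dim3 block(TILE, TILE);
//   dim3 grid((width + TILE - 1) / TILE, (height + TILE - 1) / TILE);
//   tileScale<<<grid, block>>>(d_in, d_out, width, height);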
A combined physical and DEM modelling approach to improve performance of rotary dryers by modifying flights design
Published in Drying Technology, 2021
Alireza Ghasemi, Alireza Hasankhoei, Gholamabbas Parsapour, Erfan Razi, Samad Banisi
The best and rather costly commercial CPUs for PCs typically have fewer than 30 cores (e.g., the Intel Xeon Platinum 8173M). It is, therefore, important to have access to thousands of processors to perform parallelization in a cost-effective way. One method is the GPGPU (general-purpose computing on graphics processing units) technique. GPGPU is the use of a graphics processing unit (GPU), which usually handles computer graphics rendering, to perform computational tasks traditionally carried out by the central processing unit (CPU). Typical GPUs include more than two thousand CUDA (Compute Unified Device Architecture) cores (e.g., the GTX Titan Xp includes 3840 CUDA cores), allowing very efficient manipulation of large blocks of data. CUDA is a parallel computing platform and programming interface model developed by NVIDIA for general computing on GPUs. In GPU-accelerated applications, the sequential part of the workload runs on the CPU, which is optimized for single-threaded performance, while the compute-intensive portion of the application runs on thousands of GPU cores in parallel.[45]
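As a hedged illustration of this division of labour (the kernel, names, and sizes below are ours, not the authors'), a sequential CPU loop maps onto the GPU with one loop iteration per thread:

#include <cuda_runtime.h>

// Sequential version: runs on the CPU, which is optimized for
// single-threaded performance.
void scale_cpu(const float *x, float *y, int n) {
    for (int i = 0; i < n; ++i) y[i] = 2.0f * x[i];
}

// Parallel version: the loop body becomes the kernel, and the loop
// index i becomes the global thread index.
__global__ void scale_gpu(const float *x, float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = 2.0f * x[i];
}

// With n = 1,000,000 and 256 threads per block, the launch below spawns
// 3907 blocks (about one million threads), spreading the work over the
// GPU's thousands of CUDA cores (d_x and d_y are assumed device buffers):
//   scale_gpu<<<(n + 255) / 256, 256>>>(d_x, d_y, n);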