OpenACC – Knowledge and References

Explore chapters and articles related to this topic

Stream Processing Programming with CUDA, OpenCL, and OpenACC

Published in Vivek Kale, Parallel Computing Architectures and APIs, 2019

At the API level, OpenACC is similar to OpenMP, because it primarily uses directives to describe potential parallelism in a standard program. While CUDA and OpenCL allow for (comparatively) lower-level programming, with management of computing devices and explicit management of streams and queues respectively, OpenACC uses directives to mark sections of code that may contain parallelism, or to mark specific constructs, such as loops, that may be parallelized. The OpenACC API specifies ways of expressing parallelism in C, C++, and Fortran code, so that some computations can be offloaded to accelerators such as GPUs.
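As a minimal sketch of what such a directive looks like in C, the function below annotates an ordinary loop with `#pragma acc parallel loop`. The function name `saxpy` and its parameters are illustrative and do not come from the excerpt. A compiler without OpenACC support ignores the pragma and runs the loop sequentially, so the result is the same either way.

```c
#include <stddef.h>

/* Illustrative example (names are assumptions, not from the excerpt).
 * Computes y[i] = a * x[i] + y[i]. Every iteration is independent,
 * so the loop is a candidate for offloading. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    /* copyin: x is only read on the device.
     * copy:   y is read and written, so it is transferred both ways. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The directive only describes the parallelism; the loop body itself is unchanged from the sequential version.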

Acceleration of high-order combined compact finite difference scheme for simulating three-dimensional flow and heat transfer problems in GPUs

Published in Numerical Heat Transfer, Part B: Fundamentals, 2020

Neo Shih-Chao Kao, Rex Kuan-Shuo Liu, Tony Wen-Hann Sheu

OpenACC, like OpenMP, is a high-level, portable, directive-based programming model. It provides programmers with a set of compiler directives, runtime routines, and environment variables to specify which parts of a computing task can be accelerated on the GPU. Some implementation details, such as initialization, data transfer between the CPU and GPU, execution configuration, and finalization, are hidden by OpenACC, which greatly reduces the effort of developing a GPU program. However, the simplified OpenACC programming model may prevent full use of the available hardware resources, potentially reducing the speedup compared with a highly tuned, hand-written CUDA program solving the same problem. As a result, a better strategy is to employ the two models together so that the advantages of both CUDA and OpenACC can be exploited.
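The sketch below illustrates two of the three ingredients mentioned above: a runtime routine and a compiler directive. It is not taken from the article; the function names `accelerator_count` and `dot` are assumptions made for illustration. The `_OPENACC` macro and `acc_get_num_devices` are part of the OpenACC specification, and the guard lets the same file build with a compiler that has no OpenACC support.

```c
#ifdef _OPENACC
#include <openacc.h>   /* OpenACC runtime routines */
#endif

/* Hypothetical helper: number of non-host accelerator devices,
 * or 0 when the code was built without OpenACC. */
int accelerator_count(void)
{
#ifdef _OPENACC
    return acc_get_num_devices(acc_device_not_host);
#else
    return 0;
#endif
}

/* Hypothetical example: dot product offloaded with a reduction.
 * Launch configuration (gangs, vector length) is left to the compiler,
 * which is the kind of detail the directive model hides. */
double dot(int n, const double *restrict x, const double *restrict y)
{
    double sum = 0.0;
    #pragma acc parallel loop reduction(+:sum) copyin(x[0:n], y[0:n])
    for (int i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

The third ingredient, environment variables, works outside the source code: for example, `ACC_DEVICE_TYPE` can select the device type at run time without recompiling.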

Acceleration of three-dimensional Tokamak magnetohydrodynamical code with graphics processing unit and OpenACC heterogeneous parallel programming

Published in International Journal of Computational Fluid Dynamics, 2019

H. W. Zhang, J. Zhu, Z. W. Ma, G. Y. Kan, X. Wang, W. Zhang

OpenACC, a user-driven, directive-based, performance-portable parallel programming model, was developed to simplify parallel programming for scientists and engineers. Compared with CUDA and OpenCL, which require considerable effort to redevelop code, OpenACC offers advantages such as satisfactory acceleration with very few modifications to the original source code and good compatibility with other devices, for example the central processing unit (CPU). It has been successfully applied in a number of scientific and engineering codes, including the flow code Nek5000 (Markidis et al. 2015), the computational electromagnetics code Nekton (Otten et al. 2016), the C++ flow solver ZFS (Kraus et al. 2014), the Rational Hybrid Monte Carlo (RHMC) QCD code (Gupta and Majumdar 2018), the Gyrokinetic Toroidal Code (GTC) (Wang et al. 2016), the space plasma particle-in-cell (PIC) code iPIC3D (Peng et al. 2015), the three-dimensional pseudo-spectral compressible magnetohydrodynamic GPU code G-MHD3D (Mukherjee et al. 2018), and the solar MHD code MAS (Caplan et al. 2018). The prospects for applying OpenACC in other scientific and engineering areas are very promising.

Study and evaluation of automatic GPU offloading method from various language applications

Published in International Journal of Parallel, Emergent and Distributed Systems, 2022

Yoji Yamato

Next, to offload loop statements, the loop pattern is converted into a gene pattern. GPU processing is specified with the OpenACC directives #pragma acc kernels and #pragma acc parallel loop, while #pragma acc data copy and #pragma acc data present specify whether data is transferred. The OpenACC code corresponding to each gene pattern is compiled with the PGI compiler, and its performance is measured using Jenkins. Creating the next generation of gene patterns and repeating the genetic algorithm (GA) cycle are handled by the common part.
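To make the four directives concrete, here is a small sketch in C. It is not the author's generated code; the functions `square_all`, `add_one`, and `pipeline` are hypothetical. One loop uses `kernels`, the other uses `parallel loop`, an outer `data copy` region transfers the array once, and `data present` tells the compiler the array is already on the device so no further transfer is made.

```c
/* Hypothetical loop 1: offloaded with the kernels directive, which
 * lets the compiler decide how to parallelize the region. */
static void square_all(int n, double *a)
{
    #pragma acc data present(a[0:n])   /* already on device: no transfer */
    {
        #pragma acc kernels
        for (int i = 0; i < n; ++i) {
            a[i] = a[i] * a[i];
        }
    }
}

/* Hypothetical loop 2: offloaded with parallel loop, where the
 * programmer asserts that the iterations are independent. */
static void add_one(int n, double *a)
{
    #pragma acc data present(a[0:n])   /* already on device: no transfer */
    {
        #pragma acc parallel loop
        for (int i = 0; i < n; ++i) {
            a[i] += 1.0;
        }
    }
}

/* The array is copied to the device on entry and back on exit,
 * so both loops share a single pair of transfers. */
void pipeline(int n, double *a)
{
    #pragma acc data copy(a[0:n])
    {
        square_all(n, a);
        add_one(n, a);
    }
}
```

In a search like the one described, choosing a directive for each loop and deciding where the data regions go are exactly the kinds of options a gene pattern could encode, with each candidate compiled and timed in turn.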