Application Uncertainty Propagation
Published in Bin Jia, Ming Xin, Grid-based Nonlinear Estimation and Its Applications, 2019
The heterogeneous computing architecture uses both a CPU and a GPU. CUDA provides the development environment for implementing parallel computing applications on the GPU. The GPU does not operate as a standalone system; it interacts with a CPU. Although the computation itself can be performed by the GPU, the GPU must interact with the CPU to transfer the initial values and the results. Data exchange between the CPU and the GPU takes place over a PCI-Express bus. The CPU is often called the host and the GPU the device. The CPU is good at computing tasks with complicated control logic, while the GPU is good at computing tasks with large data sets but simple control logic. For the orbital uncertainty propagation problem, the control logic is simple while the data set can be large, so GPU computing is well suited to this problem. In the following, we show the benefit of using the GPU in the orbit uncertainty propagation problem.
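A minimal CUDA sketch of this host/device workflow follows: the host copies initial sample states to the device over the PCI-Express bus, a kernel propagates all samples in parallel, and the results are copied back. The Euler update, the one-dimensional state, and all sizes are illustrative assumptions, not the book's implementation.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical per-sample propagation: one thread advances one sample.
// The control flow is identical for every thread, matching the text's point
// that the problem has simple control logic over a large data set.
__global__ void propagate(float *x, const float *v, int n, float dt, int steps) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float xi = x[i];
    for (int s = 0; s < steps; ++s)
        xi += v[i] * dt;              // placeholder Euler dynamics
    x[i] = xi;
}

int main(void) {
    const int n = 1 << 20;            // number of uncertainty samples (assumed)
    size_t bytes = n * sizeof(float);
    float *hx = (float *)malloc(bytes), *hv = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hx[i] = 0.0f; hv[i] = 1.0f; }

    float *dx, *dv;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dv, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);  // host -> device: initial values
    cudaMemcpy(dv, hv, bytes, cudaMemcpyHostToDevice);

    propagate<<<(n + 255) / 256, 256>>>(dx, dv, n, 0.1f, 100);

    cudaMemcpy(hx, dx, bytes, cudaMemcpyDeviceToHost);  // device -> host: results
    printf("sample 0 after propagation: %f\n", hx[0]);

    cudaFree(dx); cudaFree(dv); free(hx); free(hv);
    return 0;
}
```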
Analytical benchmarking
Published in Phillip A. Laplante, Dictionary of Computer Science, Engineering, and Technology, 2017
analytical benchmarking: the quantification of how effectively each machine in a heterogeneous computing environment can perform different categories of computation. The fact that a given high-performance machine can achieve near-peak performance for only a relatively small set of code types is the underlying motivation for heterogeneous computing. Although some general frameworks and tools for the process of analytical benchmarking have been proposed, more research is needed to achieve complete and effective automation of this process.
Interconnection Network Energy-Aware Scheduling Algorithm
Published in Kenli Li, Xiaoyong Tang, Jing Mei, Longxin Zhang, Wangdong Yang, Keqin Li, Workflow Scheduling on Computing Systems, 2023
Most heterogeneous computing systems consist mainly of CPU+GPU computing nodes and high-speed interconnection networks. For example, each computing node of the Summit supercomputer is equipped with two IBM Power9 CPUs and six NVIDIA Tesla V100 GPUs [1]. These nodes are linked by a dual-rail Mellanox EDR InfiniBand high-speed interconnection network [100].
Implementation of the parallel mean shift-based image segmentation algorithm on a GPU cluster
Published in International Journal of Digital Earth, 2019
Fang Huang, Yinjie Chen, Li Li, Ji Zhou, Jian Tao, Xicheng Tan, Guangsong Fan
In summary, based on existing research on the mean shift algorithm, it can be concluded that:
- Optimization of the serial mean shift-based image segmentation algorithm will not dramatically improve its efficiency. Parallelization is likely to be more fruitful.
- Existing CPU-based cluster computing can deliver adequate performance, but at a poor cost-to-performance ratio, and cannot meet the near real-time response requirement of some specific applications.
- Heterogeneous computing mechanisms, e.g. CPU + GPU, are evolving very fast and can obtain better performance. However, they still face challenges. First, the CUDA programming model works only on NVIDIA GPUs, so it has poor portability and versatility. Second, the traditional approach considers performance improvement only on a single node. In fact, the computing tasks in some real-world RS processing applications demand more than one node, each containing multiple GPU cards (see the sketch after this list). For instance, in real-time multi-temporal RS image change detection, a single computing node is not sufficient.
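To make the multi-GPU point concrete, here is a hedged sketch of striping one image across all GPU cards of a single node with cudaSetDevice (node-to-node distribution, e.g. via MPI, is omitted). The per-pixel kernel is a trivial stand-in for one mean shift iteration, not the paper's segmentation code.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

// Trivial per-pixel stand-in for one mean shift update step.
__global__ void shift_step(float *pix, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) pix[i] *= 0.5f;                   // placeholder update
}

int main(void) {
    int devices = 0;
    cudaGetDeviceCount(&devices);
    if (devices == 0) return 0;                  // no GPU card available

    const int n = 1 << 22;                       // total pixels (assumed)
    float *h = (float *)malloc(n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    int chunk = (n + devices - 1) / devices;     // one stripe per GPU card
    for (int d = 0; d < devices; ++d) {
        int off = d * chunk;
        int len = (off + chunk > n) ? n - off : chunk;
        if (len <= 0) break;
        cudaSetDevice(d);                        // select this GPU card
        float *dp;
        cudaMalloc(&dp, len * sizeof(float));
        cudaMemcpy(dp, h + off, len * sizeof(float), cudaMemcpyHostToDevice);
        shift_step<<<(len + 255) / 256, 256>>>(dp, len);
        cudaMemcpy(h + off, dp, len * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(dp);
    }
    free(h);
    return 0;
}
```

In practice the stripes would be issued on separate streams so transfers and kernels overlap across cards; here they run back to back for brevity.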
Fast hydrological model calibration based on the heterogeneous parallel computing accelerated shuffled complex evolution method
Published in Engineering Optimization, 2018
Guangyuan Kan, Xiaoyan He, Liuqian Ding, Jiren Li, Yang Hong, Depeng Zuo, Minglei Ren, Tianjie Lei, Ke Liang
It has been widely demonstrated that parallel computing is an effective way to accelerate optimization methods. Parallel algorithms must be implemented on parallel computing devices such as multi-core central processing units (CPUs) and many-core GPUs. Before 2005, most processors were single-core devices, and hardware producers improved performance by boosting the processor clock speed or by exploiting instruction-level parallelism; in those days, software enjoyed a 'free lunch' from Moore's law. However, the performance of a single-core processor could no longer keep pace with Moore's law, so hardware producers switched to integrating multiple processor cores into one chip, and processors began to support vectorization: multi-core technology appeared. Even so, the growing computational burden has exceeded the computational power of the multi-core CPU, and new parallel devices were urgently needed. In 2007, the NVIDIA Corporation introduced CUDA and the CUDA-capable GPU. New-generation high-performance computing systems composed of multi-core CPUs and many-core GPUs have since become mainstream. This hardware and the corresponding software programming technologies are named heterogeneous parallel computing. A heterogeneous computing system is much more powerful and energy efficient than a traditional CPU-based supercomputer. It features two concepts: heterogeneity and parallelism. Heterogeneity means that the computing platform is built from multiple types of devices with different hardware architectures, such as the popular x86 CPU + NVIDIA GPU combination. Parallelism means that the programming technology is a parallel one.
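The division of labor this paragraph describes, simple massively repeated arithmetic on the GPU and sequential control logic on the CPU, can be sketched as below: each GPU thread evaluates one candidate parameter set and the host performs the selection step. The quadratic objective is a placeholder assumption, not the paper's hydrological model or its shuffled complex evolution code.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// One thread evaluates one candidate parameter set: the data-parallel,
// simple-control-logic part that suits the many-core GPU.
__global__ void evaluate(const float *params, float *fitness, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float p = params[i];
    fitness[i] = (p - 3.0f) * (p - 3.0f);   // placeholder objective function
}

int main(void) {
    const int n = 4096;                      // candidate parameter sets (assumed)
    size_t bytes = n * sizeof(float);
    float *hp = (float *)malloc(bytes), *hf = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) hp[i] = 10.0f * i / n;

    float *dp, *df;
    cudaMalloc(&dp, bytes);
    cudaMalloc(&df, bytes);
    cudaMemcpy(dp, hp, bytes, cudaMemcpyHostToDevice);
    evaluate<<<(n + 255) / 256, 256>>>(dp, df, n);
    cudaMemcpy(hf, df, bytes, cudaMemcpyDeviceToHost);

    // The sequential, logic-heavy selection step stays on the multi-core CPU.
    int best = 0;
    for (int i = 1; i < n; ++i)
        if (hf[i] < hf[best]) best = i;
    printf("best parameter %.3f, fitness %.4f\n", hp[best], hf[best]);

    cudaFree(dp); cudaFree(df); free(hp); free(hf);
    return 0;
}
```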
Predictive modeling and analysis of runout distance of physical mudflows based on a discrete element method
Published in Journal of the Chinese Institute of Engineers, 2021
Jian Ye, Gordon G. D. Zhou, Jinfeng Liu
To improve the simulation accuracy and efficiency of the DEM-based prediction model of physical mudflows, the CPU was used to store the positions and velocities of the physical mudflow particles, while all visualization tasks were offloaded to the GPU. The Compute Unified Device Architecture (CUDA) was employed to coordinate the CPU and GPU during CPU–GPU heterogeneous computing: the CPU managed the GPU, provided its input data, and received the data transmitted back from it. The GPU, in turn, performed the parallel computing, which allowed the hierarchical GPU memory architecture, especially its caches, to be exploited and thereby improved heterogeneous CPU–GPU computing performance.
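As a hedged illustration of exploiting the GPU memory hierarchy mentioned above, the sketch below stages particle positions through fast on-chip shared memory in tiles, so each position is fetched from global memory once per block rather than once per thread. The interaction rule is a placeholder, not the paper's DEM contact model.

```cuda
#include <cuda_runtime.h>

#define TILE 256

// Pairwise accumulation over particle positions. Each block cooperatively
// loads a tile of positions into shared memory (the fast on-chip level of
// the GPU memory hierarchy) before all of its threads reuse that tile.
__global__ void accumulate(const float *pos, float *acc, int n) {
    __shared__ float tile[TILE];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float xi = (i < n) ? pos[i] : 0.0f;
    float a = 0.0f;
    for (int base = 0; base < n; base += TILE) {
        int j = base + threadIdx.x;
        tile[threadIdx.x] = (j < n) ? pos[j] : 0.0f;   // cooperative load
        __syncthreads();
        for (int k = 0; k < TILE && base + k < n; ++k)
            a += 0.001f * (tile[k] - xi);              // placeholder interaction
        __syncthreads();
    }
    if (i < n) acc[i] = a;
}

int main(void) {
    const int n = 1 << 14;                             // particle count (assumed)
    float *dpos, *dacc;
    cudaMalloc(&dpos, n * sizeof(float));
    cudaMalloc(&dacc, n * sizeof(float));
    cudaMemset(dpos, 0, n * sizeof(float));            // CPU provides the data
    accumulate<<<(n + TILE - 1) / TILE, TILE>>>(dpos, dacc, n);
    cudaDeviceSynchronize();                           // CPU waits for the GPU
    cudaFree(dpos); cudaFree(dacc);
    return 0;
}
```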