External Memory Data Structures
Published in Suman Saha, Shailendra Shukla, Advanced Data Structures, 2019
The memory system of most modern computers consists of a hierarchy of memory levels, with each level acting as a cache for the next; for a typical desktop computer the hierarchy consists of registers, level 1 cache, level 2 cache, level 3 cache, main memory, and disk. The external-memory model abstracts this hierarchy into a computer with two levels:

1. The cache, which is near the CPU, cheap to access, but limited in space.
2. The disk, which is distant from the CPU, expensive to access, but nearly limitless in space.

The key aspect of this model is that transfers between cache and disk occur in blocks of data. As a consequence, the memory access pattern of an algorithm has a major influence on its practical running time. If a program is aware of the cache hardware, that information can be used to optimize its cache complexity for the particular cache size and line length.
Performance engineering for HEVC transform and quantization kernel on GPUs
Published in Automatika, 2020
Mate Čobrnić, Alen Duspara, Leon Dragić, Igor Piljić, Mario Kovač
Shared memory is additionally exploited as the single access point for AZB identification. Since each residual block is mapped to a thread-block, an intermediate array of Booleans is allocated in the shared memory. The group of threads in a thread-block that computes the levels of a TB will initiate a write request to the same corresponding array element if a non-zero level is identified in its vector. As the last step, each array element is tested by a single thread in the group, which sets the related element of the output AZB array in global memory. To ensure correct values in both arrays, the threads are synchronized twice: first after the initialization of the arrays in shared and global memory, and second after each thread has written its value to the matching residual block’s array element. The shared memory access pattern that enables maximum shared-memory throughput is presented below. If the AZB identification stage is skipped in the process kernel, the processing time is shortened by 4% or 2%, depending on the GPU type used.