External Memory Data Structures
Published in Suman Saha, Shailendra Shukla, Advanced Data Structures, 2019
The memory system of most modern computers consists of a hierarchy of memory levels, with each level acting as a cache for the next; for a typical desktop computer the hierarchy consists of registers, level 1 cache, level 2 cache, level 3 cache, main memory, and disk. The external-memory model abstracts this hierarchy into a computer with two levels:

1. The cache, which is near the CPU, cheap to access, but limited in space.
2. The disk, which is distant from the CPU, expensive to access, but nearly limitless in space.

The key aspect of this model is that transfers between cache and disk occur in blocks of data. As a consequence, the memory access pattern of an algorithm has a major influence on its practical running time. If a program is aware of the cache hardware, that information can be used to optimize its cache complexity for the particular cache size and line length.
Performance engineering for HEVC transform and quantization kernel on GPUs
Published in Automatika, 2020
Mate Čobrnić, Alen Duspara, Leon Dragić, Igor Piljić, Mario Kovač
Shared memory is additionally exploited as the single access point for AZB identification. Since each residual block is mapped to a thread-block, an intermediate array of Booleans is allocated in the shared memory. The group of threads in a thread-block that computes the levels of a TB will initiate a write request to the same corresponding array element if a non-zero level is identified in its vector. As the last step, each array element is tested by a single thread in the group, which sets the related element of the output AZB array in global memory. To ensure correct values in both arrays, the threads are synchronized twice: first after the initialization of the arrays in shared and global memory, and second after each thread has written its value to the matching residual block’s array element. The shared memory access pattern that enables maximum shared-memory throughput is presented below. If the AZB identification stage is skipped in the process kernel, the processing time is shortened by 4% or 2%, depending on the GPU type used.