Introduction to computer architecture
Published in Joseph D. Dumas, Computer Architecture, 2016
Memory bandwidth, or the amount of information that can be transferred to or from a memory system per unit of time, depends on both the speed of the memory devices and the width of the pathway between memory and the device(s) that need to access it. The cycle time of the memory devices (divided by the interleave factor, if appropriate) tells us how frequently we can transfer data to or from the memory. Taking the reciprocal of this time gives us the frequency of data transfer; for example, if we can do a transfer every 4 ns, then the frequency of transfers is f = 1/(4 × 10⁻⁹ s) = 250 MHz, or 250,000,000 transfers per second. To compute the bandwidth of the transfers, however, we need to know how much information is transferred at a time. If the bus allows only 8-bit (single-byte) transfers, then the memory bandwidth would be 250 MB/s. If the memory system were constructed of the same type of devices but organized so that 64 bits (8 bytes) of data could be read or written per cycle, then the memory bandwidth would be 2000 MB/s (2 GB/s).
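The arithmetic can be checked with a few lines of code. The following minimal sketch (plain host C++ code, compilable with any C++ compiler or nvcc; the figures are the chapter's own worked example) derives both bandwidth results from the cycle time and bus width:

// Sketch: memory bandwidth = transfer frequency x bytes per transfer.
#include <cstdio>

int main() {
    double cycle_time_s = 4e-9;                    // one transfer every 4 ns
    double transfers_per_s = 1.0 / cycle_time_s;   // f = 1/t = 250,000,000/s

    int bus_width_bytes = 1;                       // 8-bit bus: 1 byte/transfer
    double bw = transfers_per_s * bus_width_bytes;
    printf("8-bit bus:  %.0f MB/s\n", bw / 1e6);   // 250 MB/s

    bus_width_bytes = 8;                           // 64-bit bus: 8 bytes/transfer
    bw = transfers_per_s * bus_width_bytes;
    printf("64-bit bus: %.0f MB/s (%.0f GB/s)\n",
           bw / 1e6, bw / 1e9);                    // 2000 MB/s (2 GB/s)
    return 0;
}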
M
Published in Philip A. Laplante, Comprehensive Dictionary of Electrical Engineering, 2018
memory address register (MAR): a register inside the CPU that holds the address of the memory location being accessed while the access is taking place.

memory alignment: matching data to the physical characteristics of the computer memory. Computer memory is generally addressed in bytes, while memories handle data in units of 4, 8, or 16 bytes. If the "memory width" is 64 bits, then reading or writing an 8-byte (64-bit) quantity is more efficient if data words are aligned to the 64-bit words of the physical memory. Data that is not aligned may require more memory accesses and more-or-less complex masking and shifting, all of which slow the operations. Some computers insist that operands be properly aligned, often raising an exception or interrupt on unaligned addresses. Others allow unaligned data, but at the cost of lower performance.

memory allocation: the act of reserving memory for a particular process.

memory bandwidth: the maximum amount of data per unit time that can be transferred between a processor and memory.

memory bank: a subdivision of memory that can be accessed independently of (and often in parallel with) other memory banks.

memory bank conflict: a conflict that arises when multiple memory accesses are issued to the same memory bank, leading to additional buffer delay for accesses that reach the memory bank while it is busy serving a previous access. See also interleaved memory.

memory block: a contiguous unit of data that is transferred between two adjacent levels of a memory hierarchy. The size of a block varies according to the distance from the CPU, increasing as levels get farther from the CPU, in order to make transfers efficient.
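As an illustration of the memory alignment entry, the sketch below tests whether an address falls on an 8-byte boundary; the helper is_aligned and the 64-bit memory width are assumptions chosen to match the entry's example, not part of the dictionary:

// Sketch: detecting misalignment relative to an assumed 64-bit memory width.
#include <cstdio>
#include <cstdint>
#include <cstddef>

// Hypothetical helper: true if ptr is a multiple of 'alignment'
// (alignment must be a power of two).
static bool is_aligned(const void *ptr, size_t alignment) {
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

int main() {
    // alignas forces the buffer onto an 8-byte (64-bit) boundary.
    alignas(8) unsigned char buf[16];

    printf("buf + 0 aligned to 8: %d\n", is_aligned(buf, 8));     // 1
    printf("buf + 3 aligned to 8: %d\n", is_aligned(buf + 3, 8)); // 0
    // A 64-bit load from buf + 3 would straddle two 64-bit memory words,
    // costing an extra access (or raising an exception on machines that
    // insist on proper alignment).
    return 0;
}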
Unsteady flow simulation using the curvilinear multiple-relaxation-time lattice Boltzmann method: Danube River case study
Published in Journal of Hydraulic Research, 2020
Ljubomir Budinski, Emad Pouryazdanpanah Kermani, Sanja Ožvat, Julius Fabian, Matija Stipić
The computational advantage of the LBM is that the implemented streaming and colliding steps lead to a computational algorithm that is very well suited to parallelization on graphics processing unit (GPU) based architectures. The simulations conducted in this study were performed on a single Dell Precision T7500 workstation with an NVIDIA Tesla™ C2050 computing processor (Santa Clara, California). The NVIDIA Tesla™ C2050 contains 14 multiprocessors with a total of 448 compute unified device architecture (CUDA) cores, giving a peak performance of 1.03 TFLOPS. The memory clock operates at 1.5 GHz, the memory size is 3 GB, the memory interface is 384 bits wide, and the memory bandwidth is 144 GB s⁻¹. Computations on the GPU are organized into kernels (GPU programs) that are executed by multiple threads in parallel. Threads are organized into groups called thread blocks. All threads within a thread block execute the same kernel and communicate with each other through a multiprocessor's local shared memory. They synchronize their computation by means of built-in synchronization instructions. Blocks are grouped into a one- or two-dimensional execution grid, specified at launch time. Blocks are executed asynchronously, and there is no efficient dedicated mechanism to ensure global synchronization.
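The execution model the authors describe maps directly onto CUDA source code. The following minimal sketch (illustrative only, not the authors' lattice Boltzmann kernels; the kernel name reverse_in_block and the block size of 256 are arbitrary choices) shows a kernel run by many threads, a per-block shared-memory buffer, the built-in block-wide barrier __syncthreads(), and a one-dimensional execution grid specified at launch time:

// Sketch: one kernel, threads grouped into blocks, block-local shared memory.
#include <cstdio>

__global__ void reverse_in_block(float *data) {
    __shared__ float tile[256];            // multiprocessor's shared memory
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    tile[threadIdx.x] = data[i];           // stage element through shared memory
    __syncthreads();                       // built-in barrier: block-wide only;
                                           // there is no comparable grid-wide
                                           // synchronization mechanism
    data[i] = tile[blockDim.x - 1 - threadIdx.x];
}

int main() {
    const int n = 1024;
    float *d;
    cudaMalloc((void **)&d, n * sizeof(float));
    // One-dimensional grid specified at launch: n/256 blocks of 256 threads,
    // executed asynchronously with respect to the host.
    reverse_in_block<<<n / 256, 256>>>(d);
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}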