Dynamic Random Access Memory (DRAM)
Published in Shimeng Yu, Semiconductor Memory Devices and Circuits, 2022
What makes HBM attractive is not only its higher integration density (multiple dies stacked over the same 2D footprint), but also the wide I/O interface it offers. As mentioned earlier, DDR/LPDDR typically has a 64-bit-wide I/O interface and GDDR a 32-bit-wide I/O interface, whereas HBM offers a 1024-bit-wide I/O interface. Even when running at a slower I/O clock frequency, which means a lower interface speed (Gbps) per pin, HBM can therefore deliver significantly higher bandwidth (GB/s) at the system level. Table 3.1 summarizes the evolution of the HBM interface protocol standards and compares them with their LPDDR and GDDR counterparts. As of 2020, HBM has gone through three generations (HBM, HBM2, and HBM2E). The capacity per DRAM die has increased from 2 Gb to 16 Gb, and the number of DRAM dies in the stack has increased from 4 to 8; thus, the total capacity has increased from 1 GB to 16 GB. The system bandwidth has increased from 128 GB/s to 410 GB/s. By comparison, LPDDR5 and GDDR6 offer system bandwidths of 37.5 GB/s and 56 GB/s, respectively.
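To make the arithmetic behind these figures concrete, the minimal sketch below computes peak bandwidth as (I/O width × per-pin data rate) / 8. The per-pin rates used here (1.0 Gbps for first-generation HBM, 3.2 Gbps for HBM2E, 14 Gbps for GDDR6) are illustrative assumptions chosen to be consistent with the bandwidth numbers quoted above, not values taken from Table 3.1.

```python
def peak_bandwidth_gbs(io_width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s = (I/O width in bits x per-pin rate in Gbps) / 8."""
    return io_width_bits * gbps_per_pin / 8

# Illustrative per-pin rates (assumptions, not figures from Table 3.1).
print(peak_bandwidth_gbs(1024, 1.0))   # first-gen HBM:   128.0 GB/s
print(peak_bandwidth_gbs(1024, 3.2))   # HBM2E:           409.6 GB/s (~410 GB/s)
print(peak_bandwidth_gbs(32, 14.0))    # GDDR6 (per die):  56.0 GB/s
```

The same arithmetic shows why a wide, slow interface can outrun a narrow, fast one: the 1024-bit HBM bus at 1 Gbps/pin already exceeds a 32-bit GDDR6 bus at 14 Gbps/pin.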
Three-Dimensional Integration: Technology and Design
Published in Katsuyuki Sakuma, Krzysztof Iniewski, 3D Integration in VLSI Circuits, 2018
3D memories do exactly this and are positioned to be the next large-volume application of 3DIC technologies. To date, dynamic random-access memory (DRAM) has relied on one-signal-per-pin signaling with low-cost, low-pin-count, single-chip plastic packaging. As a result, DRAM has continued to lag behind logic in terms of bandwidth potential and power efficiency. Furthermore, the I/O speed of one-signal-per-pin signaling schemes is unlikely to scale much beyond what can be achieved today with double data rate (DDR4) memory (up to 3.2 Gbps per pin) and graphics double data rate (GDDR6) memory (8 Gbps per pin). Beyond these data rates, two-pins-per-signal differential signaling is needed. In addition, the I/O power consumption, measured in pJ/bit, is relatively high, even for the low-power DDR (LPDDR) standards intended for mobile applications.
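To illustrate why the pJ/bit metric matters at high bandwidth, the sketch below converts energy per bit into interface power. The energy values (roughly 20 pJ/bit for a conventional off-package interface versus a few pJ/bit for an in-package wide-I/O interface) and the 25.6 GB/s transfer rate are illustrative assumptions for the calculation, not figures from this chapter.

```python
def io_power_watts(energy_pj_per_bit: float, bandwidth_gb_s: float) -> float:
    """I/O power in W = energy per bit (pJ) x bit rate (bits/s) x 1e-12."""
    bits_per_second = bandwidth_gb_s * 8e9   # GB/s -> bits/s
    return energy_pj_per_bit * 1e-12 * bits_per_second

# Illustrative energies (assumptions): sustaining 25.6 GB/s across the interface.
print(io_power_watts(20.0, 25.6))  # ~4.1 W at 20 pJ/bit (off-package, DDR-style)
print(io_power_watts(3.0, 25.6))   # ~0.6 W at  3 pJ/bit (in-package, wide I/O)
```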
Study and evaluation of automatic GPU offloading method from various language applications
Published in International Journal of Parallel, Emergent and Distributed Systems, 2022
For GPUs, we use an NVIDIA GeForce RTX 2080 Ti (4352 CUDA cores, 11 GB GDDR6 memory) and an NVIDIA Quadro K5200 (2304 CUDA cores, 8 GB GDDR5 memory). We use CUDA Toolkit 10.1 for GPU control. We use the PGI compiler 19.10 for C, PyCUDA 2019.1.2 and CuPy 7.8 for Python, and JCuda 10.1 for Java. Figure 3 shows the evaluation environment and specifications. The application code is specified by the user from the client notebook PC, tuned on the verification machine, and then deployed to the running environment for actual use.
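As a minimal illustration of the kind of GPU offloading these tools enable, the sketch below runs the same matrix multiplication on the CPU with NumPy and on the GPU with CuPy. It assumes a working CUDA installation with a CuPy version comparable to the 7.8 release used here, and is only an example of the offloading style, not the paper's automatic conversion method.

```python
import numpy as np
import cupy as cp

# Host-side (CPU) computation with NumPy.
a = np.random.rand(2048, 2048).astype(np.float32)
b = np.random.rand(2048, 2048).astype(np.float32)
c_cpu = a @ b

# Offloaded (GPU) computation with CuPy: copy inputs to the device,
# compute on the GPU, then copy the result back to the host.
a_gpu = cp.asarray(a)
b_gpu = cp.asarray(b)
c_gpu = cp.asnumpy(a_gpu @ b_gpu)

# Results should agree within float32 accumulation tolerance.
print(np.allclose(c_cpu, c_gpu, rtol=1e-3, atol=1e-3))
```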