Case Studies
Published in Wynand Lambrechts, Saurabh Sinha, Jassem Abdallah, Jaco Prinsloo, Extending Moore’s Law through Advanced Semiconductor Design and Processing Techniques, 2018
Its latest offering, announced in 2017 and planned for mass production in 2018, is the Volta line of GPUs, a processing unit based on a 12 nm FinFET technology node, housing 21 billion transistors in its main processing core on a die size of 815 mm² (Nvidia 2017). Its predecessor, the Pascal line of GPUs, was built on a 14 nm technology and contained 15 billion transistors on a die size of 610 mm² in its primary processing core. During the same keynote address, Huang added that the Volta GPU is at the limits of photolithography, acknowledging that this process step is the primary challenge to keeping pace with Moore’s law. The Volta line of GPUs has a redesigned microprocessor architecture with respect to the Pascal line and operates 50% more efficiently than its predecessor. In addition, these GPUs implement high bandwidth memory (HBM) for their video RAM (VRAM), as opposed to the traditional, albeit less costly, double data rate (DDR) memory, currently type 5 (GDDR5) (Nvidia 2017). HBM uses vertically stacked dynamic RAM (DRAM) chips interconnected by through-silicon vias, shortening the paths between individual memory chips, which reduces power consumption, reduces the required area and allows higher bandwidth at lower clock speeds.
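To make the HBM trade-off concrete, the back-of-the-envelope comparison below computes peak bandwidth from bus width and per-pin data rate. The figures are illustrative assumptions (roughly matching HBM2 as used on Volta-class GPUs and a typical GDDR5 configuration), not exact product specifications:

```cuda
#include <cstdio>

// Peak memory bandwidth (GB/s) = (bus width in bits / 8) * effective data rate (GT/s).
static double peak_bandwidth_gbs(int bus_width_bits, double data_rate_gtps) {
    return (bus_width_bits / 8.0) * data_rate_gtps;
}

int main(void) {
    // Assumed illustrative figures: stacked HBM2 pairs a very wide 4096-bit
    // interface with a modest ~1.75 GT/s data rate, while GDDR5 drives a
    // narrow 384-bit bus at a much higher ~8 GT/s rate.
    printf("HBM2  (4096-bit @ 1.75 GT/s): %.0f GB/s\n", peak_bandwidth_gbs(4096, 1.75));
    printf("GDDR5 (384-bit  @ 8.00 GT/s): %.0f GB/s\n", peak_bandwidth_gbs(384, 8.0));
    return 0;
}
```

Even at a fraction of the per-pin data rate, the much wider stacked interface yields more than twice the peak bandwidth, which is precisely the "higher bandwidth at lower clock speeds" trade-off described above.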
Innumerable Biographies: A Brief History of the Field
Published in John D. Cressler, Silicon Earth, 2017
Ironically, the foundations of electrical engineering (EE—read “double E”), as a now-entrenched, can’t-live-without-it discipline, rest squarely on the shoulders of physicists, not electrical engineers! Skeptical? Start naming the units we commonly employ in EE: hertz (Heinrich Hertz, 1857–1894, physicist), volts (Alessandro Volta, 1745–1827, physicist), ohms (Georg Ohm, 1789–1854, physicist), amps (André-Marie Ampère, 1775–1836, physicist), farads (Michael Faraday, 1791–1867, physicist), kelvins (William Thomson, Lord Kelvin, 1824–1907, physicist), gauss (Carl Friedrich Gauss, 1777–1855, physicist and mathematician), newtons (Isaac Newton, 1642–1727, physicist extraordinaire), and joules (James Joule, 1818–1889, physicist). This list could go on. There are exceptions, of course; watts (James Watt, 1736–1819, inventor and engineer) and teslas (Nikola Tesla, 1856–1943, inventor and engineer, inventor of the famous Tesla coil [Figure 3.2]) come to mind. The fact that the many fruits of engineering rest on the triumphs of scientific theory cannot be overstated. Science and engineering are highly complementary, and to my mind equally noble, pursuits. Remember this the next time you hear EE undergraduates grousing about their woes in a physics or chemistry course!
Source Term-Based Turbulent Flow Simulation on GPU with Link-Wise Artificial Compressibility Method
Published in International Journal of Computational Fluid Dynamics, 2021
Sijiang Fan, Marta Camps Santasmasas, Xiao-Wei Guo, Canqun Yang, Alistair Revell
From the perspective of hardware, the streaming processor (SP) is the basic processing unit, and a streaming multiprocessor (SM) consists of many SPs. Different GPU architectures typically have different numbers of SMs and SPs. In this paper, we use Tesla V100 GPUs based on the Volta architecture, whose full GV100 chip has 84 SMs (80 of them enabled on the Tesla V100), each with 64 FP32 cores, 64 INT32 cores, 32 FP64 cores, 8 Tensor cores and 4 texture units. CUDA programs are generally executed by many threads organised into blocks, which are in turn arranged into a ‘grid’, as illustrated in Figure 3. Each thread is executed on a single CUDA core (SP), and each block resides on a single SM, so that threads within a block can be tightly coupled (for example, through shared memory and barrier synchronisation). Massive parallelism is achieved via the single instruction, multiple threads (SIMT) execution model. Although the SIMT model can make full use of the GPU, it requires that the same operations be applied across as many parts of the domain as possible. In complex cases, where different parts of the domain require different treatment, performance is inevitably lost under the SIMT model.
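As a minimal sketch of this execution model (not code from the paper; the kernel name and launch parameters are illustrative), the kernel below shows how each thread derives a unique global index from its block and thread IDs, so that the same instruction stream is applied uniformly across the domain:

```cuda
#include <cuda_runtime.h>

// Minimal SIMT kernel: every thread executes the same instructions,
// each on its own element of the domain.
__global__ void scale(float *data, int n, float factor) {
    // Global index = block offset within the grid + thread offset within the block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                // guard threads past the end of the domain
        data[i] *= factor;    // identical operation across all threads
}

int main(void) {
    const int n = 1 << 20;
    float *d_data;
    // Device buffer left uninitialised; the point here is the launch mechanics.
    cudaMalloc((void **)&d_data, n * sizeof(float));

    // Launch configuration: the grid is sized so that blocks of 256 threads
    // cover the whole domain; each block is scheduled onto a single SM.
    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    scale<<<blocks, threadsPerBlock>>>(d_data, n, 2.0f);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```

The performance caveat in the paragraph shows up whenever the `if` branch diverges within a warp: threads taking different paths are serialised, so regions of the domain that need different treatment reduce SIMT efficiency.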