High-Performance Computing
Published in Dale A. Anderson, John C. Tannehill, Richard H. Pletcher, Munipalli Ramakanth, Vijaya Shankar, Computational Fluid Mechanics and Heat Transfer, 2020
CFD applications using highly scalable computational methods have been deployed on very large computer clusters. The balance between numerical methods and computational cost is gradually shifting toward exascale computing. As noted by Ashby et al. (2010), exascale computing is now seen as an enabling technology for simulation-based aerodynamic design, and there is a strong likelihood that it will be attained very soon: Intel introduced a teraflops chip (i9) for desktop applications in 2017, and the IBM-built supercomputer “Summit” at Oak Ridge National Laboratory recorded a performance of 148.6 petaflops in 2019. Exascale computing is also believed to be essential for transformative developments in astrophysics, biological systems, climate modeling, combustion, materials science, nuclear engineering, and aspects of national security.
How to Make an Artificial General Intelligence
Published in Calum Chace, Artificial Intelligence and the Two Singularities, 2018
Exascale computing is not just needed to model a human brain. It will also improve climate modelling, astronomy, ballistics analysis, engineering development and numerous other scientific, military and commercial endeavours. The Chinese government announced in January 2017 that it would have an exascale supercomputer by the end of the year, although it would not be fully operational until 2020. The US Department of Energy is funding a slower route to exascale computing, which it thinks will be more effective, and expects to get there by 2023.
Simple Fault-tolerant Computing for Field Solvers
Published in International Journal of Computational Fluid Dynamics, 2020
The current path to exascale computing foresees tens of thousands of heavily populated nodes (i.e. millions of cores) working on the same time-critical problem. One of the emerging problems with millions of cores is the time between failures. Random failures of cores or of the communication between cores – commonly referred to as ‘faults’ – are expected to occur every couple of minutes, posing a serious problem for production runs that need hours or days to complete. This problem has not emerged so far for two reasons. First, for most machines the number of (high-quality) cores allowed for a single run is still in the range of tens of thousands (Löhner and Baum 2013a, 2014), so failures occur only every few hours. Second, due to their Message Passing Interface (MPI) implementations, most computing centres do not allow for fault-tolerant computing: if any MPI process/core fails, the run terminates immediately. It should come as no surprise, then, that none of the production codes currently in place can deal with failing cores/nodes. The approach to date has been to periodically write all restart information to disk (e.g. every hour), so that if the machine experiences a malfunction, only the last hour of computing is lost. This approach requires constant human supervision or elaborate restart scripts to ensure that a considerable number of productive hours is not lost should a node or core fail.
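The periodic checkpoint/restart strategy described above can be illustrated with a short sketch. The following C/MPI fragment is a minimal, hypothetical illustration of checkpointing inside a time-stepping field solver; the names advance_field() and write_checkpoint(), the restart-file layout, and the one-hour interval are assumptions made for illustration and are not taken from the codes cited in the excerpt.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NLOCAL 1000                /* unknowns owned by this rank (illustrative) */
#define CHECKPOINT_SECONDS 3600.0  /* dump restart data roughly once an hour     */

/* Hypothetical stand-in for one explicit update of the local field values. */
static void advance_field(double *u, int n, double dt) {
    for (int i = 0; i < n; ++i)
        u[i] += dt * 0.0;          /* placeholder for the real stencil update    */
}

/* Each rank writes its own restart file; after a crash the run is restarted
 * from these files, so at most one checkpoint interval of work is lost. */
static void write_checkpoint(const double *u, int n, int step, int rank) {
    char name[64];
    snprintf(name, sizeof(name), "restart_rank%05d.bin", rank);
    FILE *f = fopen(name, "wb");
    if (!f) { perror("checkpoint"); return; }
    fwrite(&step, sizeof(int), 1, f);
    fwrite(u, sizeof(double), (size_t)n, f);
    fclose(f);
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *u = calloc(NLOCAL, sizeof(double));
    double last_dump = MPI_Wtime();

    for (int step = 1; step <= 100000; ++step) {
        advance_field(u, NLOCAL, 1.0e-3);

        /* Rank 0 decides when to checkpoint and broadcasts the decision so
         * that all ranks write a consistent restart set for the same step. */
        int do_dump = (rank == 0 && MPI_Wtime() - last_dump > CHECKPOINT_SECONDS);
        MPI_Bcast(&do_dump, 1, MPI_INT, 0, MPI_COMM_WORLD);

        if (do_dump) {
            write_checkpoint(u, NLOCAL, step, rank);
            last_dump = MPI_Wtime();
            if (rank == 0)
                printf("checkpoint written at step %d\n", step);
        }
    }

    free(u);
    MPI_Finalize();
    return 0;
}
```

Note that this pattern only bounds the amount of lost work; it does not make the run itself fault-tolerant, since, as stated above, a conventional MPI job still terminates immediately if any process fails.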