Explore chapters and articles related to this topic
Enhancing CPU performance
Published in Joseph D. Dumas, Computer Architecture, 2016
The VLIW approach has several disadvantages, which IBM, HP, and Intel hoped would be outweighed by its significant advantages. Early VLIW machines performed poorly on branch-intensive code; IBM attempted to address this problem with the tree structure of its system, and the Itanium architecture addresses it with a technique called predication (as opposed to prediction, which has been used in many RISC and superscalar processors). The predication technique uses a set of predicate registers, each of which can hold a true or false condition. Where conditional branches would normally be used in a program to set up a structure such as if/then/else, the operations in each possible sequence are instead predicated (made conditional) on the contents of a given predicate register. Operations from both possible paths (the “then” and “else” paths) then flow through the parallel, pipelined execution units, but only one set of operations (the ones whose predicate evaluates to true) is allowed to write its results.
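The predication scheme described above can be illustrated with a toy model. The following Python sketch is not Itanium code; the register names (`p1`, `p2`, `r`), the function `run_predicated`, and the sample if/then/else computation are all hypothetical and chosen only for illustration. It shows the key idea: a comparison sets a pair of complementary predicates, operations from both paths are executed, and only the operation whose predicate is true writes its result.

```python
def run_predicated(a, b):
    """Toy model of: if a > b: r = a - b  else: r = b - a.

    All names are hypothetical; this only mimics the behavior of
    predicated execution and is not real Itanium code.
    """
    # A compare sets two complementary predicate registers.
    pred = {"p1": a > b}
    pred["p2"] = not pred["p1"]

    regs = {"r": None}  # destination register

    # Both paths are issued; each entry is (predicate, destination, operation).
    program = [
        ("p1", "r", lambda: a - b),  # "then" path
        ("p2", "r", lambda: b - a),  # "else" path
    ]

    issued = 0
    for p, dest, op in program:
        result = op()   # the operation executes regardless of its predicate
        issued += 1
        if pred[p]:     # models write-back being gated by the predicate
            regs[dest] = result

    return regs["r"], issued
```

For example, `run_predicated(7, 3)` returns `(4, 2)`: two operations were executed, but only the one guarded by the true predicate updated `r`. The Python `if` here stands in for the hardware's gating of the write-back stage; in a predicated machine no branch instruction is involved at that point.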
High-Performance Computing for Nuclear Reactor Design and Safety Applications
Published in Nuclear Technology, 2020
Afaque Shams, Dante De Santis, Adam Padee, Piotr Wasiuk, Tobiasz Jarosiewicz, Tomasz Kwiatkowski, Sławomir Potempski
Another interesting topic regarding the scaling of NEK5000 is how efficiently it uses the single instruction, multiple data (SIMD) instructions of the advanced vector extensions (AVX) set. These instructions are designed to perform an arithmetic operation on several operands within one clock cycle. There are many limitations, though, such as slowdowns on memory access, the lack of some operations (e.g., div/sqrt for full-length AVX), and problems with branch predication and with the power envelope of the CPU. The AVX unit is one of the biggest power consumers in the CPU, and upon activation it may slow down other CPU modules. It is therefore difficult to predict the efficiency of a code when using shorter or longer operand vectors.
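The excerpt does not say how NEK5000 handles conditionals in vectorized code, so the following is only a generic, hedged illustration of why branches are awkward for SIMD units. It models one vector register as a Python list of eight lanes (a 256-bit AVX register holds eight 32-bit floats) and evaluates a per-element if/else by computing both sides for every lane and then selecting with a mask. The helper names (`vec_gt`, `vec_blend`, `conditional_kernel`) and the kernel itself are hypothetical.

```python
LANES = 8  # a 256-bit AVX register holds eight 32-bit floats


def vec_gt(x, threshold):
    """Lane-wise compare; produces a mask with one boolean per lane."""
    return [xi > threshold for xi in x]


def vec_blend(mask, if_true, if_false):
    """Lane-wise select: take if_true where the mask is set, else if_false."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]


def conditional_kernel(x, threshold=4.0):
    """Hypothetical kernel: x * 0.5 if x > threshold else x + 1.0, per lane."""
    assert len(x) == LANES
    mask = vec_gt(x, threshold)
    # Both sides are computed for every lane, whether or not they are needed.
    halved = [xi * 0.5 for xi in x]
    bumped = [xi + 1.0 for xi in x]
    return vec_blend(mask, halved, bumped)
```

With this approach every lane pays for both the "then" and the "else" arithmetic, which is one reason the benefit of longer vectors can be hard to predict for code with many data-dependent conditions.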