Explore chapters and articles related to this topic
Low-Power/Energy Compiler Optimizations
Published in Christian Piguet, Low-Power Processors and Systems on Chips, 2018
Early work on optimizing compilers for power and energy management suggested that optimization transformations for performance subsume those for power and energy management, so that power/energy is not an optimization objective in its own right [13]. Traditional optimizations, such as common subexpression elimination, partial redundancy elimination, strength reduction, and dead code elimination, increase the performance of a program by reducing the work to be done during program execution [2,12]. Reducing the workload may also result in power/energy savings. Memory hierarchy optimizations, such as loop tiling and register allocation, try to keep data closer to the processor because such data can be accessed more quickly. Keeping a value in an on-chip cache instead of an off-chip memory, or in a register instead of the cache, also saves power/energy because it reduces switching activity and switching capacitance.
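As a rough illustration of how such transformations reduce work, the following Python sketch compares a loop that recomputes an invariant product in every iteration with a version in which that product has been hoisted, the kind of redundancy that partial redundancy elimination removes. The `OpCounter` class and both function names are hypothetical and exist only to count multiplications; a real compiler performs this rewrite on its intermediate representation, and the actual power/energy effect depends on the hardware.

```python
class OpCounter:
    """Hypothetical helper that counts multiplications for illustration."""

    def __init__(self):
        self.mults = 0

    def mul(self, x, y):
        self.mults += 1
        return x * y


def scaled_sum_naive(values, a, b, ops):
    # a * b does not change inside the loop but is recomputed each time.
    total = 0
    for v in values:
        total += ops.mul(ops.mul(a, b), v)
    return total


def scaled_sum_hoisted(values, a, b, ops):
    # The redundant product is computed once, before the loop.
    ab = ops.mul(a, b)
    total = 0
    for v in values:
        total += ops.mul(ab, v)
    return total


naive_ops, hoisted_ops = OpCounter(), OpCounter()
naive_result = scaled_sum_naive([1, 2, 3, 4], 3, 5, naive_ops)
hoisted_result = scaled_sum_hoisted([1, 2, 3, 4], 3, 5, hoisted_ops)
```

Both versions return the same value, but for four elements the hoisted version performs 5 multiplications instead of 8. Fewer executed operations is the sense in which a performance optimization can also lower energy use.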
Parallelizable adjoint stencil computations using transposed forward-mode algorithmic differentiation
Published in Optimization Methods and Software, 2018
J.C. Hückelheim, P.D. Hovland, M.M. Strout, J.-D. Müller
The first step can be explained as follows. We remind ourselves that the tangent-linear code is an implementation of the matrix-vector product of the Jacobian J with the vector of input derivatives (seeds), where the innermost loop body of the tangent-linear code, as written in Algorithm 2, is an implementation of one dot product within that matrix-vector product [the expressions are not preserved in this excerpt]. Throughout this paper, subscripts denote a selected item from a vector, which is in practice implemented as an array access. For instance, a subscripted derivative residual represents a single element of the derivative residual vector. The diagonal of J consists of all terms that are multiplied with the same index in the input derivative vector as they are assigned to in the derivative residual vector, which is the case for the term marked as diagonal in Equation (6). Conversely, all terms where the input and output indices do not match are part of the off-diagonal of J. As the inner loop of the tangent-linear code iterates over all neighbours, each of its iterations computes a distinct off-diagonal term, while all inner iterations within one outer iteration contribute to the same diagonal term [expressions not preserved]. We can split the computation into diagonal and off-diagonal parts by calling the tangent-linear routine of the loop body twice, each time with one of the seeds set to zero: the diagonal term is computed by Equation (7), and the off-diagonal term by Equation (8). For a more efficient execution, specialized tangent-linear routines can be created to compute the diagonal and off-diagonal terms. These specialized routines omit the arguments that were set to zero in Equations (7) and (8) and use a variant of dead-code elimination to remove all computations that depend on the removed arguments. This technique was described in [2,22], and the split computation is shown in Algorithm 4.
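The seed-splitting idea can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration and is not taken from the paper: the stencil function f(u_i, u_j) = u_i^2 * u_j, the periodic one-dimensional neighbourhood, and all function names. The sketch passes explicit zeros rather than building the specialized, dead-code-eliminated routines that the text describes.

```python
def f_dot(ui, ui_dot, uj, uj_dot):
    # Tangent-linear form of the hypothetical f(ui, uj) = ui * ui * uj.
    return 2 * ui * uj * ui_dot + ui * ui * uj_dot


def neighbours(i, n):
    # Assumed stencil: left and right neighbour on a periodic 1D grid.
    return [(i - 1) % n, (i + 1) % n]


def tangent(u, u_dot, keep_diag=True, keep_offdiag=True):
    """Accumulate derivative residuals; a dropped seed is replaced by zero."""
    n = len(u)
    r_dot = [0] * n
    for i in range(n):
        for j in neighbours(i, n):
            seed_i = u_dot[i] if keep_diag else 0
            seed_j = u_dot[j] if keep_offdiag else 0
            r_dot[i] += f_dot(u[i], seed_i, u[j], seed_j)
    return r_dot


u = [1, 2, 3, 4]
u_dot = [1, 0, -1, 2]

full = tangent(u, u_dot)
diag = tangent(u, u_dot, keep_offdiag=False)     # neighbour seed set to zero
offdiag = tangent(u, u_dot, keep_diag=False)     # own seed set to zero
recombined = [d + o for d, o in zip(diag, offdiag)]
```

Because the tangent-linear code is linear in its seeds, `recombined` equals `full`. Seeding only element 0 with `keep_offdiag=False` yields a result that is non-zero only in row 0, which is what makes that part the diagonal of J.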