Parallelizing High-Level Synthesis: A Code Transformational Approach to High-Level Synthesis
Published in Louis Scheffer, Luciano Lavagno, Grant Martin, EDA for IC System Design, Verification, and Testing, 2018
Gaurav Singh, Sumit Gupta, Sandeep Shukla, Rajesh Gupta
Poor synthesis results can be attributed to several factors. Most of the optimizing transformations proposed over the years are operation-level transformations; that is, they typically operate on three-operand computational expressions or instructions [1]. In contrast, language- or source-level optimizations are transformations that require the structural and hierarchical information available in the source code. For example, loop transformations such as loop unrolling, loop fusion, and loop-invariant code motion (LICM) use information about the loop structure, the loop index variables, and their increments. However, few language-level optimizations have been explored in the context of HLS, and their effects on final circuit area and performance are not well understood. The effectiveness of transformations has often been demonstrated in isolation from other transformations, with results presented in terms of scheduling alone and little or no analysis of control and hardware costs. Furthermore, the designs used as HLS benchmarks are often small, synthetic designs. It is thus difficult to judge whether an optimization has a positive impact, beyond scheduling results, on larger, moderately complex designs.
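As an illustration of one of the source-level transformations named above, the sketch below shows loop fusion applied by hand to two adjacent loops. The function names and the array-based workload are invented for the example; they do not come from the chapter.

```c
/* Sketch of loop fusion: two adjacent loops that iterate the same number
 * of times, with no dependencies between their computations, are merged
 * into one. The merged loop halves the loop-control overhead and gives an
 * HLS scheduler one larger body to extract parallelism from. */

/* Before fusion: two separate n-iteration loops over the same range. */
void add_then_scale(int *out, const int *a, const int *b, int n, int k) {
    for (int i = 0; i < n; i++)
        out[i] = a[i] + b[i];
    for (int i = 0; i < n; i++)
        out[i] = out[i] * k;
}

/* After fusion: one loop computes the same result per element. */
void add_then_scale_fused(int *out, const int *a, const int *b, int n, int k) {
    for (int i = 0; i < n; i++)
        out[i] = (a[i] + b[i]) * k;   /* fused body */
}
```

Fusion is legal here because each output element depends only on the same index of the inputs; a cross-iteration dependency between the two loops would forbid the merge.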
High-Level Synthesis
Published in Luciano Lavagno, Igor L. Markov, Grant Martin, Louis K. Scheffer, Electronic Design Automation for IC System Design, Verification, and Testing, 2017
Felice Balarin, Alex Kondratyev, Yosinori Watanabe
Besides unrolling and breaking loops, there are several well-known compiler techniques that rewrite loops in a simpler way that is more suitable for implementation (both in hardware and in software). The most popular ones are as follows:
- Loop inversion, which changes a standard while loop into a do/while. This is particularly effective when one can statically infer that the loop is always executed at least once.
- Loop-invariant code motion, which moves out of the loop assignments to variables whose values are the same for each iteration.
- Loop fusion, which combines two adjacent loops that iterate the same number of times into a single one, when there are no dependencies between their computations.
Hardware and Software Architectures for Mobile Multimedia Signal Processing
Published in Borko Furht, Syed Ahson, Handbook of Mobile Broadcasting, 2008
Benno Stabernack, Kai-Immo Wels, Heiko Hübert
After a partitioning into hardware and software is found, the software part can be optimized. Numerous techniques exist for optimizing software, such as loop unrolling, loop-invariant code motion, common subexpression elimination, or constant folding and propagation. For computationally intensive parts, arithmetic optimizations or SIMD instructions can be applied, if such instructions are available in the processor. If the performance of the code is significantly influenced by memory accesses, as is mainly the case in video applications, the number of accesses has to be reduced or the accesses have to be accelerated. The profiler gives a detailed overview of the memory accesses and thereby allows their influence to be identified. One optimization mechanism is the conversion of byte (8-bit) to word (32-bit) memory accesses. This can be applied if adjacent bytes in memory are required concurrently or within a short period, for example, pixel data of an image during image processing. A further mechanism is the use of tightly coupled memories (TCMs) for storing frequently used data. To find the most frequently accessed data areas, the memory access statistics of memtrace can be used. These techniques are described in more detail in ITU-T Recommendation H.264.
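The byte-to-word conversion described above can be sketched as follows. The pixel-summing functions are hypothetical examples, not taken from the chapter; memcpy is used for the word load so the sketch stays alignment-safe and portable across endiannesses.

```c
#include <stdint.h>
#include <string.h>

/* Byte-to-word memory access conversion: four adjacent 8-bit pixels are
 * fetched with a single 32-bit access instead of four separate byte
 * accesses, reducing the number of memory transactions. */

/* Byte-wise version: four 8-bit loads. */
uint32_t sum_pixels_bytewise(const uint8_t *p) {
    return (uint32_t)p[0] + p[1] + p[2] + p[3];
}

/* Word-wise version: one 32-bit load, then the four packed bytes are
 * summed in registers. The sum of the bytes is the same regardless of
 * the byte order the word was stored in. */
uint32_t sum_pixels_wordwise(const uint8_t *p) {
    uint32_t w;
    memcpy(&w, p, sizeof w);          /* single word-sized memory access */
    return (w & 0xFFu)
         + ((w >> 8) & 0xFFu)
         + ((w >> 16) & 0xFFu)
         + (w >> 24);
}
```

On processors where unaligned word loads are cheap, a compiler will typically turn the memcpy into a single load instruction, which is exactly the access-count reduction the excerpt describes.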
Reduced O3 subsequence labelling: a stepping stone towards optimisation sequence prediction
Published in Connection Science, 2022
An optimising compiler performs a sequence of optimising transformations, implemented as optimisation passes (or optimisation phases), which aim to transform a program into a semantically equivalent program that uses fewer computing resources or instructions. However, some passes are correlated and have positive or negative impacts on each other. For example, a program might not be vectorised if the licm pass (which performs loop-invariant code motion) is not used (Kruse & Grosser, 2018). Determining the best sequence of passes for a program is therefore a major research problem in compiler optimisation. In past research, evolutionary algorithms were used to solve this problem. However, since a compiler contains many optimisation passes, the search space for the best sequence is too large to be explored in reasonable time. Much subsequent research has applied machine-learning techniques to speed up compiler optimisation tuning. The biggest challenge of such research is how to represent program sources for machine-learning models. With the rapid progress of deep learning technology, applying deep learning techniques to compilers has become a new research hotspot.
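The licm/vectorisation interaction mentioned above can be illustrated at the source level. Real passes operate on compiler IR rather than source code, and the function names here are invented for the sketch:

```c
/* Pass-interaction sketch: hoisting an invariant product (what the licm
 * pass does) leaves a loop body that is a pure element-wise multiply,
 * which a later vectorisation pass can turn into SIMD code. With the
 * redundant multiply still inside the loop, the vectoriser may judge the
 * body unprofitable or fail to recognise the pattern. */

/* Before licm: the invariant product a*b is recomputed every iteration. */
void scale_naive(int *out, const int *in, int n, int a, int b) {
    for (int i = 0; i < n; i++)
        out[i] = in[i] * (a * b);
}

/* After licm: the product is hoisted; the remaining loop is a trivially
 * vectorisable multiply-by-constant. */
void scale_after_licm(int *out, const int *in, int n, int a, int b) {
    int k = a * b;                 /* hoisted loop-invariant computation */
    for (int i = 0; i < n; i++)
        out[i] = in[i] * k;
}
```

This ordering dependence is exactly why pass sequences cannot be chosen independently: the benefit of the vectorisation pass here is conditional on licm having run first.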