Energy-Efficient Reconfigurable Processors
Published in Christian Piguet, Low-Power Processors and Systems on Chips, 2018
Raphaël David, Sébastien Pillement, Olivier Sentieys
A portion of code is usually qualified as regular when it executes for a long period of time over a large set of data, without being interrupted by other processing. Loop kernels fit this description because their computation pattern is maintained across all loop iterations. The instruction-level parallelism of such regular processing is often exposed by compilation techniques such as loop unrolling or software pipelining [26]. With these techniques, the computation pattern of the loop kernel is repeated several times, which leads to a highly regular architecture. If such a loop kernel is implemented on several RDPs, their configurations may be redundant. Because specifying the same configuration several times wastes energy, we introduce a concept called single configuration multiple data (SCMD). It can be considered an extension of SIMD (single instruction multiple data), in which several operators execute the same operation on different data sets. Within the SCMD framework, configuration data sharing is no longer limited to the operators but is extended to the RDPs.
Using machine learning techniques for DSP software performance prediction at source code level
Published in Connection Science, 2021
Weihua Liu, Erh-Wen Hu, Bogong Su, Jian Wang
We observed that some testing samples tend to under-predict the execution time, while others tend to over-predict it. Scrutinising the code, we noted that under-prediction is caused by large overhead in the assembly code that is unaccounted for in the source-code analysis. Under-prediction can be very significant when a sample's inner loops contain many function calls; these calls can be very expensive, especially on embedded systems or DSPs, since each call triggers many stack operations. Over-prediction, in turn, is due to the shortened inner-loop body of the assembly code generated by software pipelining, a loop optimisation technique that reduces loop execution time by exploiting instruction-level parallelism across different loop iterations. In this paper, two attributes, “function call” (the total number of function calls inside the inner loop of a sample) and “SWPP” (whether the inner loop is optimised by software pipelining), are closely related to the above-mentioned observations.