Explore chapters and articles related to this topic
The Spidergon STNoC
Published in Marcello Coppola, Miltos D. Grammatikakis, Riccardo Locatelli, Giuseppe Maruccia, Lorenzo Pieralisi, Design of Cost-Efficient Interconnect Processing Units, 2020
Marcello Coppola, Miltos D. Grammatikakis, Riccardo Locatelli, Giuseppe Maruccia, Lorenzo Pieralisi
Besides the MIT Alewife system [5], two-case delivery has also been implemented in the MIT Raw architecture [353]. Raw uses light processors with local SRAMs, connected to each other and to off-chip memory through two physically separate 2-D mesh networks, one static and one dynamic. For the static network, the compiler guarantees deadlock-free fine-grain parallel processing by defining packet routing at compile time. The dynamic network, by contrast, is fully exposed to two major groups of accesses that can potentially deadlock: random or unknown user-defined communication protocols, and memory accesses that are unknown at compile time. Thus, the dynamic network targets interrupts, I/O, DMA and other irregular off-chip communications [352]. Notice that protocol deadlocks in Raw always involve at least one packet in the dynamic network.
Performance and Footprint at the Toolchain Level
Published in Ivan Cibrario Bertolotti, Tingting Hu, Embedded Software Development, 2017
Ivan Cibrario Bertolotti, Tingting Hu
On systems without hardware support for floating-point instructions, floating-point calculations can still be performed in software, by means of two distinct approaches. In the first, the compiler still generates floating-point instructions, as it would if the FPU were available. At runtime, the processor traps them as undefined instructions as soon as it attempts to execute them, because there is no FPU. In turn, the trap handler invokes an appropriate software module that is responsible for emulating the floating-point instruction, and then resumes execution. In the second, the compiler generates calls to a floating-point library at compile time, instead of floating-point instructions. The resulting object code is then linked against the floating-point library, which is responsible for performing floating-point operations in software, to build the executable image.
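To make the library-based approach more concrete, the following is a minimal sketch of the kind of routine such a floating-point library might contain: it converts an unsigned 32-bit integer to an IEEE-754 single-precision bit pattern using integer operations only. The function name soft_u32_to_f32_bits is hypothetical and is not taken from the book; real toolchains ship their own routines (libgcc, for instance, provides __floatunsisf for this conversion), and their implementations may differ considerably from this sketch.

```cpp
#include <cstdint>

// Hypothetical soft-float helper (illustrative only): convert an unsigned
// 32-bit integer to the bit pattern of the nearest IEEE-754 binary32 value,
// using integer operations only and round-to-nearest-even.
constexpr std::uint32_t soft_u32_to_f32_bits(std::uint32_t v) {
    if (v == 0) return 0;                          // +0.0f is all zero bits

    int msb = 31;
    while ((v >> msb) == 0) --msb;                 // position of the leading 1 bit

    std::uint32_t exponent = 127u + static_cast<std::uint32_t>(msb); // biased exponent
    std::uint32_t mantissa = 0;

    if (msb <= 23) {
        // The value fits in 24 significant bits: shift left, no rounding needed.
        mantissa = (v << (23 - msb)) & 0x7FFFFFu;
    } else {
        // More than 24 significant bits: drop the low bits and round.
        const int shift = msb - 23;
        std::uint32_t kept = v >> shift;
        const std::uint32_t rem  = v & ((1u << shift) - 1u);
        const std::uint32_t half = 1u << (shift - 1);
        if (rem > half || (rem == half && (kept & 1u))) ++kept; // nearest, ties to even
        if (kept >> 24) { kept >>= 1; ++exponent; }             // rounding overflowed
        mantissa = kept & 0x7FFFFFu;                            // drop the implicit 1
    }
    return (exponent << 23) | mantissa;            // sign bit is 0 for unsigned input
}
```

With this helper, soft_u32_to_f32_bits(3) yields 0x40400000, the bit pattern of 3.0f. On a target built for software floating point, a conversion such as static_cast<float>(n) would be compiled into a call to a comparable library routine rather than into an FPU instruction.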
Event-Driven Programming
Published in Julio Sanchez, Maria P. Canton, Software Solutions for Engineers and Scientists, 2018
Julio Sanchez, Maria P. Canton
Library files differ from source files. All the program files containing source code are merged at compile time. Library functions are incorporated at link time, although the internal mechanisms of C and C++ development systems sometimes prevent us from noticing the difference. What often happens is that an #include statement in a source file references a header file, which, in turn, contains references to other sources or to object code in a library. These references are usually made by means of the extern keyword, which indicates that a variable or function has external linkage. Figure 17.4 shows the different source, object, and library files that can be part of a C or C++ program.
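As a small illustration of the extern mechanism described above, the sketch below shows a declaration that a header would typically carry, followed by the single definition that the linker resolves it to. The file names counter.h and counter.cpp and the identifiers shared_counter and next_id are hypothetical, chosen only for this example; both parts are shown in one listing so that it compiles on its own.

```cpp
// --- would normally live in a header, e.g. a hypothetical "counter.h" ---
extern int shared_counter;   // declaration only: no storage is allocated here
int next_id();               // functions have external linkage by default

// --- would normally live in exactly one source file, e.g. "counter.cpp" ---
int shared_counter = 0;      // the one definition that the linker binds to

int next_id() {
    // Every translation unit that includes the header refers to this same object.
    return ++shared_counter;
}
```

Any other source file that includes the header can call next_id() or read shared_counter without seeing their definitions; the references stay unresolved in that file's object code until link time.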
Arbogast: Higher order automatic differentiation for special functions with Modular C
Published in Optimization Methods and Software, 2018
Isabelle Charpentier, Jens Gustedt
C is undergoing a continued process of standardization and improvement and, over the years, has added features that are important in the context of this study: complex numbers, variable length arrays (VLA), long double, the restrict keyword, type generic mathematical functions (all in C99), programmable type generic interfaces (_Generic), choosable alignment and Unicode support (in C11) (see [27]). Contrary to common belief, C is not a subset of C++. Features such as VLA, restrict and _Generic that make C interesting for numerical calculus do not translate to C++. Moreover, its static type system, fixed at compile time, and its ability to manage pointer aliasing make C particularly interesting for performance critical code. These are properties that are not met by C++, where dynamic types, indirections and opaque overloading of operators can be a severe impediment for compiler optimization. Unfortunately, these advantages of C are met with some shortcomings. Prominent among these is the lack of two closely related features, modularity and reusability, that are highly desirable in the context of AD.
Divide-and-conquer checkpointing for arbitrary programs with no user annotation
Published in Optimization Methods and Software, 2018
Jeffrey Mark Siskind, Barak A. Pearlmutter
The implementations of vlad and checkpointVLAD are disjoint and use completely different technology. The Stalin∇ [21] implementation of vlad is based on source-code transformation, conceptually applied reflectively at run time but migrated to compile time through partial evaluation. The implementation of checkpointVLAD uses something more akin to operator overloading. Again, nothing turns on this; this simplification is for expository purposes and allows us to focus on the issue at hand (see Section 7.1).
Design of a modern fast Fourier transform and cache effective bit-reversal algorithm
Published in International Journal of Parallel, Emergent and Distributed Systems, 2023
The possibilities of modern C++, specifically compile-time code generation, make it possible to create an FFT design that is highly customizable. The current version has a changeable iterative FFT threshold to adjust cache performance, and adjustable stages, where the planner takes any reasonable sequence of stage R values and creates the desired mixed FFT at compile time. It is, of course, limited by the types of butterflies that are implemented, each defined by R and vector size.
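As a rough illustration of compile-time code generation in this setting, the sketch below builds a bit-reversal index table entirely at compile time using constexpr functions. It is not the article's design: it covers only the plain radix-2 case, ignores cache behaviour, and all names (bit_reverse, log2_exact, BitReversalTable, kTable8) are assumptions made for this example.

```cpp
#include <cstddef>

// Reverse the lowest `bits` bits of `i` (e.g. 0b001 -> 0b100 for bits == 3).
constexpr std::size_t bit_reverse(std::size_t i, unsigned bits) {
    std::size_t r = 0;
    for (unsigned b = 0; b < bits; ++b) {
        r = (r << 1) | (i & 1u);   // move the lowest bit of i to the low end of r
        i >>= 1;
    }
    return r;
}

// Base-2 logarithm of a power of two.
constexpr unsigned log2_exact(std::size_t n) {
    unsigned k = 0;
    while ((std::size_t{1} << k) < n) ++k;
    return k;
}

// Hypothetical table type: index[i] holds the bit-reversed position of i
// for an N-point radix-2 transform. The constructor runs at compile time
// when the object is declared constexpr.
template <std::size_t N>
struct BitReversalTable {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

    std::size_t index[N];

    constexpr BitReversalTable() : index{} {
        for (std::size_t i = 0; i < N; ++i) {
            index[i] = bit_reverse(i, log2_exact(N));
        }
    }
};

// The table for N = 8 is computed by the compiler, not at run time.
constexpr BitReversalTable<8> kTable8{};
```

For N = 8 the table holds the permutation 0, 4, 2, 6, 1, 5, 3, 7. Because the table is a constant expression, its contents can be checked with static_assert and need no initialization code at run time.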