Cancerous or Non-Cancerous Cell Detection on a Field-Programmable Gate Array Medical Image Segmentation Using Xilinx System
Published in Neeraj Mohan, Surbhi Gupta, Chuan-Ming Liu, Society 5.0 and the Future of Emerging Computational Technologies, 2022
C. Gopala Krishnan, Prasannavenkatesan Theerthagiri, A.H. Nishan
The proposed architecture comprises nine blocks, each allocated its own region of memory, spanning the preprocessor stage through the final algorithm implementations. The preprocessing stage samples values from the CANCER signal supplied by the CANCER main processor module. Denoising, feature extraction, and classification are handled by the DHT module, the ANFIS module, and the CANCER norm-selection modules. The SRAM and mux modules hold intermediate results and select among signal conditions, respectively. Together these modules form the stationary wavelet transform (SWT) block, which reduces noise in the CANCER signals. Cache memory is high-speed memory used to speed up data processing: a CPU cache is a cache used by a computer's central processing unit to reduce the average time to access main memory. It is a small, fast memory that stores copies of data from frequently used main-memory locations.
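The last sentences describe a cache in general terms. As a generic illustration only (not part of the proposed Xilinx block design; all names are hypothetical), a small cache that keeps copies of the most frequently used data, evicting the least recently used entry, can be sketched as:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal model of a small, fast memory that holds copies of
    recently used data and evicts the least recently used entry."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()   # address -> data, in recency order
        self.hits = 0
        self.misses = 0

    def access(self, address, load_from_main_memory):
        if address in self.store:                 # fast path: data is cached
            self.hits += 1
            self.store.move_to_end(address)       # mark as most recently used
            return self.store[address]
        self.misses += 1                          # slow path: go to main memory
        data = load_from_main_memory(address)
        self.store[address] = data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)        # evict least recently used
        return data

cache = LRUCache(capacity=2)
main_memory = {0: "a", 1: "b", 2: "c"}
for addr in [0, 1, 0, 2, 0, 1]:
    cache.access(addr, main_memory.__getitem__)
print(cache.hits, cache.misses)  # 2 hits, 4 misses
```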
Shared Memory Architecture
Published in Vivek Kale, Parallel Computing Architectures and APIs, 2019
The size of the cache lines is fixed for a given architecture and cannot be changed during program execution. During program execution, as given by the load and store operations of the machine program, the processor specifies memory addresses to be read or written, independent of the organization of the memory system. After receiving a memory access request from the processor, the cache controller checks whether the specified memory address:
- belongs to a cache line currently stored in the cache, in which case a cache hit occurs and the requested word is delivered to the processor from the cache; or
- does not belong to a cache line currently stored in the cache, in which case a cache miss occurs and the cache line is first copied from main memory into the cache before the requested word is delivered to the processor.
The corresponding delay time is also called the miss penalty. Since the access time to main memory is significantly larger than the access time to the cache, a cache miss delays operand delivery to the processor.
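The hit/miss check and the miss penalty described above can be sketched with a minimal direct-mapped cache model. The line size, line count, and cycle costs below are illustrative assumptions, not figures from the text:

```python
# Direct-mapped cache model: each memory block maps to exactly one
# cache line; comparing stored tags decides between hit and miss.
LINE_SIZE = 4       # words per cache line (fixed for a given architecture)
NUM_LINES = 8       # number of lines in the cache
HIT_TIME = 1        # illustrative cycles for a cache hit
MISS_PENALTY = 100  # illustrative cycles to copy a line from main memory

lines = [None] * NUM_LINES   # stored tag per line, None = empty
total_cycles = 0
hits = misses = 0

def access(address):
    global total_cycles, hits, misses
    block = address // LINE_SIZE    # which memory block the word lies in
    index = block % NUM_LINES       # the one cache line it can occupy
    tag = block // NUM_LINES        # identifies which block occupies the line
    if lines[index] == tag:         # cache hit: word delivered from the cache
        hits += 1
        total_cycles += HIT_TIME
    else:                           # cache miss: copy the line in first
        misses += 1
        total_cycles += HIT_TIME + MISS_PENALTY
        lines[index] = tag

for addr in range(16):   # sequential walk: one miss per line, then hits
    access(addr)
print(hits, misses, total_cycles)  # 12 hits, 4 misses, 416 cycles
```

The sequential walk touches four lines; each costs one miss penalty, after which the remaining words of the line are hits, so the average access time sits between `HIT_TIME` and `HIT_TIME + MISS_PENALTY`.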
Interfacing: the use of the computer as a component
Published in Ross Kirk, Andy Hunt, Digital Sound Processing for Music and Multimedia, 2013
The final aspect of Figure 8.10 worthy of comment here is the memory unit known as the ‘cache’. In a typical computer program, instructions are fetched from relatively localised areas of memory; programs do not, on the whole, fetch instructions from very disparate areas. The idea of a cache is to maintain a snapshot of the most recently used instructions (and, in some processors, data) in a place very local to the processor, in the hope of minimising the number of times the processor must go off-chip to main memory to fetch an instruction. The cache contains a mechanism which determines whether the required information is held in the cache, in which case it is provided from there, or whether the processor must go off-chip to main memory to fetch it.
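The locality argument can be made concrete with a toy model: a small cache of whole lines achieves a high hit rate on a localised fetch trace and a near-zero hit rate on a scattered one. The capacity and line size below are arbitrary choices for illustration:

```python
from collections import OrderedDict

def hit_rate(trace, capacity=8, line_size=4):
    """Hit rate of a small LRU cache of `capacity` lines for an address
    trace; caching whole lines rewards fetches from localised areas."""
    cache = OrderedDict()
    hits = 0
    for addr in trace:
        line = addr // line_size        # whole lines are cached, not words
        if line in cache:
            hits += 1
            cache.move_to_end(line)     # mark line as most recently used
        else:
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict least recently used line
    return hits / len(trace)

local = list(range(100)) * 2              # fetches from one localised region
scattered = [i * 1000 for i in range(200)]  # fetches from disparate areas
print(hit_rate(local), hit_rate(scattered))  # 0.75 vs 0.0
```

Even though the localised region is larger than the cache, neighbouring fetches land in already-cached lines, so most accesses never go off-chip.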
Spherical Harmonics and Discontinuous Galerkin Finite Element Methods for the Three-Dimensional Neutron Transport Equation: Application to Core and Lattice Calculation
Published in Nuclear Science and Engineering, 2023
Kenneth Assogba, Lahbib Bourhrara, Igor Zmijarevic, Grégoire Allaire, Antonio Galia
First, we need to determine the set of equivalence classes of the mesh with respect to the equivalence relation, called the quotient set. The underlying idea is to work in the quotient set as on the mesh itself, but without distinguishing between equivalent elements. It is therefore sufficient to calculate the geometric elementary matrices of each class representative only. Likewise, the matrix-vector product operator is written using the canonical map, which associates each element of the mesh with its class representative. Reusing data already present in cache memory reduces the need to repeatedly fetch data from main memory. In the end, the linear system is solved without assembling the matrix resulting from the bilinear form [Eq. (8)]; it is sufficient to pass the matrix-vector product operator to the Krylov solver. The solvers implemented in NYMO based on this principle are BICGSTAB (Ref. 20) and the generalized minimal residual method (GMRES) (Ref. 21).
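A minimal sketch of this matrix-free idea, using a hypothetical 1-D toy mesh whose elements fall into two equivalence classes (an illustration of the principle, not the NYMO implementation):

```python
# Matrix-free matrix-vector product over a mesh whose elements fall
# into equivalence classes of identical geometry: the elementary
# matrix is computed once per class representative and reused, which
# also keeps it resident in cache across elements.
# (Illustrative toy problem; all names and sizes are hypothetical.)

elementary = {                       # one matrix per class representative
    "small": [[2.0, -1.0], [-1.0, 2.0]],
    "large": [[4.0, -2.0], [-2.0, 4.0]],
}
# canonical map: element index -> its class representative
element_class = ["small", "large", "small", "large"]
n = 5                                # number of unknowns (1-D chain of elements)

def matvec(x):
    """y = A @ x without ever assembling the global matrix A."""
    y = [0.0] * n
    for e, cls in enumerate(element_class):
        m = elementary[cls]          # reuse the representative's matrix
        i, j = e, e + 1              # local-to-global degree-of-freedom map
        y[i] += m[0][0] * x[i] + m[0][1] * x[j]
        y[j] += m[1][0] * x[i] + m[1][1] * x[j]
    return y

print(matvec([1.0, 1.0, 1.0, 1.0, 1.0]))  # [1.0, 3.0, 3.0, 3.0, 2.0]
```

An operator like `matvec` is exactly what a Krylov solver such as GMRES or BiCGSTAB consumes; the global matrix is never formed.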
Statistical regression models for WCET estimation
Published in Quality Technology & Quantitative Management, 2019
Qiong Zhang, Yijie Huangfu, Wei Zhang
In our understanding, the number of Loads and the Cache Miss Rate (MissRate) are the key features determining execution time. A Load is an instruction that reads data from main memory into a register on the CPU so that other computational instructions (e.g., addition, subtraction) can use the data from registers directly for quick computation. If there is a cache memory on chip, the load operation first tries to find the data in the cache. If the data are there (a cache hit), they are loaded into the register; if not (a cache miss), the data are loaded from memory into the register. For cache performance, the MissRate is the fraction of memory accesses that miss in the cache, forcing the CPU to load the data from the much slower DRAM, which can greatly affect performance. By analysing the program, including its loop counts, we can estimate the total number of loads. The cache miss rate can be obtained either by static cache analysis or by reading performance-counter information while running the program. Hence, both features are much easier to obtain than the actual execution time. Our proposal is to incorporate these features into a joint regression model over multiple tasks to improve the WCET estimation.
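The idea of regressing execution time on these two features can be sketched with ordinary least squares on synthetic data. This illustrates only the feature/response relationship, not the paper's joint multi-task model; all numbers are made up:

```python
# Least-squares fit of execution time against the two features named
# above: number of Loads and cache MissRate (synthetic illustration).

def lstsq(X, y):
    """Solve the normal equations (X^T X) b = X^T y by Gaussian elimination."""
    k = len(X[0])
    A = [[sum(X[r][i] * X[r][j] for r in range(len(X))) for j in range(k)]
         for i in range(k)]
    b = [sum(X[r][i] * y[r] for r in range(len(X))) for i in range(k)]
    for i in range(k):                       # forward elimination
        for j in range(i + 1, k):
            f = A[j][i] / A[i][i]
            for c in range(i, k):
                A[j][c] -= f * A[i][c]
            b[j] -= f * b[i]
    coeffs = [0.0] * k                       # back substitution
    for i in reversed(range(k)):
        coeffs[i] = (b[i] - sum(A[i][j] * coeffs[j]
                                for j in range(i + 1, k))) / A[i][i]
    return coeffs

# synthetic training tasks: (loads, miss_rate) -> observed execution time
tasks = [(100, 0.05), (200, 0.10), (400, 0.02), (800, 0.20), (300, 0.15)]
times = [5 + 2 * l + 300 * m for l, m in tasks]   # hidden "true" model
X = [[1.0, l, m] for l, m in tasks]               # intercept, loads, miss rate
b0, b_load, b_miss = lstsq(X, times)
print(round(b0, 3), round(b_load, 3), round(b_miss, 3))  # recovers 5, 2, 300
```

Because the synthetic times follow the model exactly, the fit recovers the coefficients; with real measurements the residuals reflect effects the two features do not capture.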
Assessment of the Lagrange Discrete Ordinates Equations for Three-Dimensional Neutron Transport
Published in Nuclear Science and Engineering, 2019
Kelly L. Rowland, Cory D. Ahrens, Steven Hamilton, R. N. Slaybaugh
Finally, Table III lists the reported run times and memory usage for each quadrature set used in this test case. The run time for the simulation using the LDO quadrature set is one order of magnitude higher than all other listed run times. One of the main causes of the time difference is likely the octahedral asymmetry in angle exhibited by the LDO quadrature set. The KBA parallel sweep algorithm was written and implemented assuming that a given quadrature set has an equal number of angles among all octants, so asymmetric quadratures do not benefit as much from the angle pipelining of KBA. We also see that the simulation using the LDO quadrature set has a greater memory requirement than those with other quadrature types. Recalling Sec. III.B, this is to be expected; the sizes of the matrices in the LDO formulation scale with the number of angles used rather than the number of scattering moments. It also may be the case that the increased memory requirement with the LDO quadrature set contributes to the longer run time. The size of these larger data structures may not fit in lower cache levels, and the reduced data locality may increase run time as well. We note that although the simulation using the LDO quadrature set incurs the highest run time and memory requirement, the general use of LDO quadrature sets remains of interest. The eventual prospect of strategically selecting the discrete angles in an LDO quadrature set used for a given scenario may provide better answers.