State-of-the-Art and Future Trends
Published in Distributed Computer Control for Industrial Automation, 2017
Dobrivoje Popovic, Vijay P. Bhatkar
Another form of parallel processing is achieved by a dataflow architecture. This architecture is of particular importance for future generations of distributed control systems. A dataflow architecture uses coded identifiers, or tokens, to guide data concurrently through the processes. Instructions are executed automatically as soon as their data are available, although calculations that depend upon recursion are not executed efficiently. To program a dataflow processor, a graphical representation called a directed graph is used. Dataflow processors work on different parts of a program in parallel, and several dataflow processors working in parallel increase speed almost linearly. One of the first commercial dataflow computers was the Cydra from Cydrome. This computer architecture is considered important for real-time applications.
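To make the firing rule concrete, here is a minimal, hypothetical sketch (plain Python, not any vendor's tooling) of token-driven execution for the directed graph of (a + b) * (c - d); all names are illustrative.

```python
# Hypothetical sketch of dataflow ("fire when ready") execution.
# Each node holds an operation, its input token names, and its output token.
import operator

nodes = {
    'add': (operator.add, ('a', 'b'), 'x'),    # x = a + b
    'sub': (operator.sub, ('c', 'd'), 'y'),    # y = c - d
    'mul': (operator.mul, ('x', 'y'), 'out'),  # out = x * y
}

def run(initial_tokens):
    tokens = dict(initial_tokens)
    fired = set()
    progress = True
    while progress:                      # keep sweeping until nothing can fire
        progress = False
        for name, (op, ins, out) in nodes.items():
            # A node fires automatically as soon as all its input tokens arrive.
            if name not in fired and all(t in tokens for t in ins):
                tokens[out] = op(*(tokens[t] for t in ins))
                fired.add(name)
                progress = True
    return tokens['out']

print(run({'a': 2, 'b': 3, 'c': 10, 'd': 4}))  # (2 + 3) * (10 - 4) = 30
```

Note that the 'add' and 'sub' nodes have no mutual dependence, so a machine with two functional units could fire them simultaneously, which is the source of the near-linear speedup claimed above.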
Special-purpose and future architectures
Published in Computer Architecture, 2016
Joseph D. Dumas
Although the concept of dataflow processing may be an intuitively appealing model of computation, and the dataflow graph and activity template models are interesting ways to represent the methods for carrying out computational tasks, the reader is no doubt wondering what, if any, advantage such an architecture might have in comparison to a machine using conventional von Neumann–style programming. It turns out that the chief potential advantage of a dataflow architecture is an increased ability to exploit parallel execution hardware without a lot of overhead. If only one hardware unit is available to execute operations, we might as well write programs using the time-honored sequential model, as going to a dataflow model can do nothing to improve performance. However, if we have the ability to construct multiple functional units, then it is possible that a dataflow approach to scheduling them may relieve some of the bottleneck that we usually (and artificially) impose by adopting the von Neumann programming paradigm.
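A small sketch of this point, assuming nothing about any particular machine: with two "functional units" (worker threads below), the independent nodes of a dataflow graph may execute concurrently, while the final node waits for both of its input tokens.

```python
# Hedged illustration: two data-independent operations can execute in
# parallel on separate "functional units"; a sequential von Neumann program
# would order them artificially even though no dependence requires it.
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

def increment(x):
    return x + 1

with ThreadPoolExecutor(max_workers=2) as pool:  # two "functional units"
    # square(3) and increment(4) share no data, so both may fire at once.
    fut1 = pool.submit(square, 3)
    fut2 = pool.submit(increment, 4)
    # The final node depends on both tokens, so it waits for them to arrive.
    result = fut1.result() + fut2.result()

print(result)  # 9 + 5 = 14
```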
D
Published in Philip A. Laplante, Comprehensive Dictionary of Electrical Engineering, 2018
data bus: set of wires or tracks on a printed circuit or integrated circuit that carry binary data, normally one byte at a time.
data cache: a small, fast memory that holds data operands (not instructions) that may be reused by the processor. Typical data cache sizes currently range from 8 kilobytes to 8 megabytes. See cache.
data communications equipment (DCE): a device (such as a modem) that establishes, maintains, and terminates a session on a network.
data compression theorem: Claude Shannon's theorem, presenting a bound on the optimally achievable compression in (lossless) source coding. See also Shannon's source coding theorem.
data dependency: the normal situation in which the data that an instruction uses or produces depends upon the data used or produced by other instructions, such that the instructions must be executed in a specific order to obtain the desired results.
data detection: in communications, a method to extract the transmitted bits from the received signal.
data flow architecture: a computer architecture that operates by having source operands trigger the issue and execution of each operation, without relying on the traditional, sequential von Neumann style of fetching and issuing instructions.
data fusion: analysis of data from multiple sources, a process for which neural networks are particularly suited.
data logger: a special-purpose processor that gathers and stores information for later transfer to another machine for further processing.
data path: the internal bus via which the processor ships data, for example, from the functional units to the register file, and vice versa.
data pipeline: a mechanism for feeding a stream of data to a processing unit. Data is pipelined so that the unit processing the data does not have to wait for the individual data elements.
data preprocessing: the processing of data before it is employed in network training. The usual aim is to reduce the dimensionality of the data by feature extraction.
data processing inequality: an information theoretic inequality, a consequence of which is that no amount of signal processing on a signal can increase the amount of information obtained from that signal. Formally stated, for a Markov chain X → Y → Z, I(X;Z) ≤ I(X;Y). The condition for equality is that I(X;Y|Z) = 0, i.e., X → Z → Y is a Markov chain.
data reduction coding system: any algorithm or process that reduces the amount of digital information required to represent a digital signal.
data register: a CPU register that may be used as an accumulator, a buffer register, or, in some processors, an index register. In processors of the Motorola M68000 family, data registers are separate from address registers in the CPU.
data segment: the portion of a process's virtual address space allocated to storing and accessing the program data (BSS and heap, and possibly the stack, depending on the definition).
data stripe: storage methodology where data is spread over several disks in a disk array. This is done in order to increase the throughput of disk accesses; however, latency is not necessarily improved. See also disk array.
data structure: a particular way of organizing a group of data, usually optimized for efficient
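For readability, the data processing inequality from the entry above can be written out in full (this is the standard statement; the arrow notation denotes a Markov chain):

```latex
% Data processing inequality: processing cannot create information.
% For any Markov chain X -> Y -> Z:
\[
  I(X;Z) \le I(X;Y),
\]
% with equality if and only if I(X;Y \mid Z) = 0,
% i.e., X -> Z -> Y also forms a Markov chain.
```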
Power-Aware Characteristics of Matrix Operations on Multicores
Published in Applied Artificial Intelligence, 2021
Guruprasad Konnurmath, Satyadhyan Chickerur
The implementation of the proposed methodology on GPUs, combining three techniques (TensorFlow, InterPSS, and DVFS), brings supercomputing capability to relatively inexpensive GPUs. From the results we can conclude that the performance, power efficiency, and power consumption of GPU application kernels are determined by the rate at which the GPU cores issue instructions and by the ratio of global memory transactions to total computation instructions. The proposed methodology was tested on dense matrices of up to 512 × 512 elements raised to powers as high as 256, showing an improvement in execution speed of around 15%. TensorFlow's flexible dataflow architecture lets power users achieve excellent performance and supports automatic GPU placement, GPU kernel fusion, efficient GPU memory management, and scheduling, so it can be considered a strong machine learning alternative for power optimization and fast execution. The proposed approach also draws on architectural performance features specific to Nvidia GTX-series GPUs. Moreover, combining the DVFS technique with the TensorFlow engine and InterPSS saves energy in a more optimized way than other energy-saving mechanisms. The combined methodology can be extended in future research toward designing energy-efficient green GPUs and implemented on different GPU architectures.
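As an illustration only (this is not the authors' code), the kind of GPU-placed dense kernel described above might look like the following TensorFlow sketch. The 512 × 512 size comes from the paper; the device string and the DVFS command in the closing comment are assumptions, since clock control is driver-specific and happens outside TensorFlow.

```python
# Hedged sketch: a dense matmul placed explicitly on a GPU, executed by
# TensorFlow's dataflow engine (ops fire when their inputs are ready).
import tensorflow as tf

N = 512  # dense matrix size used in the paper

a = tf.random.normal((N, N))
b = tf.random.normal((N, N))

if tf.config.list_physical_devices('GPU'):
    with tf.device('/GPU:0'):  # explicit GPU placement (assumed device name)
        c = tf.matmul(a, b)
else:
    c = tf.matmul(a, b)        # CPU fallback so the sketch stays runnable

print(float(tf.reduce_sum(c)))

# DVFS would be applied outside TensorFlow, e.g. via the driver
# (assumed, hardware-dependent invocation):
#   nvidia-smi -ac <memory_clock,graphics_clock>
```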