Embedding Parallelism In Image Processing Techniques And Its Applications
Published in Sanjay Saxena, Sudip Paul, High-Performance Medical Image Processing, 2022
Suchismita Das, G. K. Nayak, Sanjay Saxena
Data parallelism and task parallelism contrast with each other. Data parallelism distributes the data across multiple processors, each of which performs the same task on its portion of the data in parallel, whereas task parallelism distributes concurrent tasks, carried out by processes or threads, across different nodes. So, applying different algorithms to the same data (images) is one way of achieving parallelism (task parallelism), while applying the same task to different sets of images is another (data parallelism). Hence, image analysis can be parallelized in both ways to achieve maximum speed-up and better performance. Task parallelism can be achieved through pipelining, and data parallelism through work sharing. There are a number of image analysis operations in which data and task parallelism can be applied. CPUs are suited to task parallelism, where a few threads perform heavy processing, whereas GPUs are better suited to applications that perform a large amount of data-parallel operations. Applications can achieve optimal performance by combining data parallelism on the GPU with task parallelism on the CPU.
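The data-parallel side of this combination can be sketched briefly. The following Python snippet is not from the chapter; the denoise, segment, and analyze functions are hypothetical placeholders. It illustrates data parallelism by mapping the same analysis task over a set of images with a process pool, with the per-image denoise-then-segment pipeline standing in for the stages that task parallelism would pipeline.

```python
from multiprocessing import Pool

# Hypothetical image operations; real code would use actual filters and segmentation.
def denoise(image):
    return [max(p - 1, 0) for p in image]        # placeholder "noise removal"

def segment(image):
    return [1 if p > 128 else 0 for p in image]  # placeholder thresholding

def analyze(image):
    # Task parallelism could pipeline these two stages across processing units;
    # here they simply run in sequence for each image.
    return segment(denoise(image))

if __name__ == "__main__":
    # Data parallelism: the same analysis task is applied to different images,
    # with the image set partitioned across worker processes.
    images = [[i % 256 for i in range(1024)] for _ in range(8)]
    with Pool(processes=4) as pool:
        results = pool.map(analyze, images)
    print(len(results), "images processed in parallel")
```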
Virtualization of the Architectural Components of a System-on-Chip
Published in Lev Kirischian, Reconfigurable Computing Systems Engineering, 2017
As mentioned in Section 1.1, the data-computing process consists of two flows: (1) flow of data to be processed and (2) flow of control information. Therefore, for any function, it is possible to consider data-level parallelism (DLP) and control-level parallelism. In cases when control information is presented in the form of operational code (op-code) in respective instructions, instruction-level parallelism can be considered. Data parallelism assumes that multiple data-elements can be processed simultaneously. In turn, it means that there are certain sets of independent data-elements (e.g., bits, data-words, vectors of data) that can be processed in parallel. Control parallelism assumes that the algorithm consists of sets of elementary operations (algorithm branches) that can be processed in parallel.
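As a rough illustration (not from the book; branch_a and branch_b are invented placeholders), the following Python sketch contrasts data-level parallelism, where one operation is mapped over independent data-elements, with control parallelism, where independent algorithm branches run concurrently.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Data-level parallelism: the same operation applied to independent data-elements.
def square(x):
    return x * x

# Control parallelism: independent algorithm branches that can run concurrently.
def branch_a(data):
    return sum(data)   # one branch, e.g., an accumulation
def branch_b(data):
    return max(data)   # another independent branch, e.g., a maximum reduction

if __name__ == "__main__":
    data = list(range(16))

    # DLP: independent data-words processed in parallel by identical workers.
    with ProcessPoolExecutor() as pool:
        squares = list(pool.map(square, data))

    # Control parallelism: two independent branches of the algorithm in flight at once.
    with ThreadPoolExecutor() as pool:
        fa = pool.submit(branch_a, data)
        fb = pool.submit(branch_b, data)
        print(squares[:4], fa.result(), fb.result())
```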
Parallel Backpropagation Neural Network for Big Data Processing on Many-Core Platform
Published in Kuan-Ching Li, Hai Jiang, Albert Y. Zomaya, Big Data Management and Processing, 2017
As an exemplary study, in our experiment we train the BP neural network to learn the operation of addition on the SCC and Intel® Xeon® Phi™ platforms. We also train the BP neural network to learn the operation of multiplication on the Intel® Xeon® Phi™ platform. In both cases, the network learns the addition and multiplication operations solely from the data provided. The BP neural network built for this purpose has two input variables, one hidden layer, one output layer, and one output variable. The number of training samples is set to 9600. Since there are only two input variables, we observed that too many neurons (nodes) in the hidden layer can lead to overfitting. Hence, the numbers of nodes that we use in the hidden layer are 5, 10, and 20, respectively, for the Intel® SCC. For the Intel® Xeon® Phi™, we employ 10 nodes in the hidden layer. We set the upper limit on training iterations to 20,000. We assume that the performance of the neural network is satisfactory when the average error is less than 0.001. Thus, training stops either when the number of training iterations reaches 20,000 or when the average error falls below 0.001. There are two widely used parallel approaches: task parallelism and data parallelism. In task parallelism, the various tasks are partitioned among the cores to carry out the computation in solving the problem. In data parallelism, the data used in solving the problem are partitioned among the cores and each core carries out more or less similar operations on its part of the data [Pacheco 2011]. In our experiment, data parallelism is employed.
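The data-parallel scheme described here can be sketched as follows. This is not the authors' SCC or Xeon Phi implementation; it is a minimal NumPy/multiprocessing sketch that assumes a single hidden layer of 10 nodes, the 9600 addition samples, and gradient averaging across worker shards.

```python
import numpy as np
from multiprocessing import Pool

HIDDEN = 10
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(9600, 2))   # two input variables
y = X.sum(axis=1, keepdims=True)        # learn the addition operation from data

W1 = rng.normal(0, 0.1, (2, HIDDEN)); b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (HIDDEN, 1)); b2 = np.zeros(1)

def shard_gradients(args):
    """Each core computes gradients on its own shard of the training data."""
    Xs, ys, W1, b1, W2, b2 = args
    h = np.tanh(Xs @ W1 + b1)                     # hidden layer
    out = h @ W2 + b2                             # linear output layer
    err = out - ys
    dW2 = h.T @ err / len(Xs); db2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)              # backpropagate through tanh
    dW1 = Xs.T @ dh / len(Xs); db1 = dh.mean(axis=0)
    return dW1, db1, dW2, db2, float((err ** 2).mean())

if __name__ == "__main__":
    shards = 4   # data partitioned among 4 worker processes (stand-ins for cores)
    with Pool(shards) as pool:
        for it in range(20000):                   # iteration upper limit from the text
            parts = zip(np.array_split(X, shards), np.array_split(y, shards))
            jobs = [(Xs, ys, W1, b1, W2, b2) for Xs, ys in parts]
            grads = pool.map(shard_gradients, jobs)
            # Average the per-shard gradients, then update the shared weights.
            dW1, db1, dW2, db2, mse = [np.mean([g[i] for g in grads], axis=0)
                                       for i in range(5)]
            W1 -= 0.1 * dW1; b1 -= 0.1 * db1; W2 -= 0.1 * dW2; b2 -= 0.1 * db2
            if mse < 0.001:                       # error-based stopping criterion
                break
```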
Adopting GPU computing to support DL-based Earth science applications
Published in International Journal of Digital Earth, 2023
Zifu Wang, Yun Li, Kevin Wang, Jacob Cain, Mary Salami, Daniel Q. Duffy, Michael M. Little, Chaowei Yang
Data parallelism is the most common strategy for parallel model training. It is suitable for computationally intensive layers with a relatively small number of parameters. As shown in Figure 6, the data parallelism approach typically iterates over the following weight initialization and update steps: (1) the dataset is divided into smaller chunks and each chunk is assigned to a different GPU, so that each GPU holds a complete copy of the model and its assigned data chunk; (2) a parameter server initializes the model weights and sends them to the GPUs; (3) every GPU trains the model on its assigned chunk of data and calculates the gradient; (4) the GPUs send the computed gradients to the parameter server; (5) the parameter server produces a single, updated model based on the gradients collected from all GPUs; (6) the parameter server sends the updated model weights to all GPU devices; (7) these steps are repeated until the model has converged. The data parallelism strategy benefits distributed GPU computing by significantly reducing training time, as the model can be trained in parallel on multiple GPUs. It also enables high scalability, since the strategy can be deployed on a large number of GPUs simply by dividing the data into small chunks. Notably, data parallelism is relatively easy to implement, since it does not require significant changes to the model or the training process.
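The seven steps above can be condensed into a short sketch. The Python code below is illustrative rather than the paper's implementation: the ParameterServer class and local_gradient function are invented names, a simple linear model stands in for the deep network, and local computations stand in for GPUs.

```python
import numpy as np

class ParameterServer:
    def __init__(self, n_features):
        self.weights = np.zeros(n_features)        # step (2): initialize weights

    def update(self, gradients, lr=0.1):
        avg = np.mean(gradients, axis=0)           # step (5): combine all gradients
        self.weights -= lr * avg                   # produce the single updated model

def local_gradient(weights, X_chunk, y_chunk):
    """Step (3): each 'GPU' computes the gradient of a linear model on its chunk."""
    preds = X_chunk @ weights
    return X_chunk.T @ (preds - y_chunk) / len(y_chunk)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 3))
    y = X @ np.array([1.0, -2.0, 0.5])

    server = ParameterServer(n_features=3)
    chunks = list(zip(np.array_split(X, 4), np.array_split(y, 4)))  # step (1): divide data

    for step in range(200):                                         # step (7): repeat
        weights = server.weights.copy()                             # steps (2)/(6): broadcast weights
        grads = [local_gradient(weights, Xc, yc) for Xc, yc in chunks]  # steps (3)/(4)
        server.update(grads)
    print("learned weights:", np.round(server.weights, 2))
```

In practice, frameworks replace this loop with collective communication (such as all-reduce) or a dedicated parameter-server process, but the data flow follows the same steps.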
Distributed Learning of CNNs on Heterogeneous CPU/GPU Architectures
Published in Applied Artificial Intelligence, 2018
Jose Marques, Gabriel Falcao, Luís A. Alexandre
In data parallelism, the batch of data is split across the several nodes of the cluster (CPUs, GPUs, or a combination of both). Each node is then responsible for computing the gradients with respect to all the parameters, but does so using only its part of the batch. However, since every node runs a replica of the model, it is necessary to communicate the gradients and parameter values at every update step. Another problem with this approach is that, since every node calculates different gradients, they need to be averaged, which causes a loss of information and may hinder the training process.
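As a rough sketch of this synchronization (not from the paper; the toy linear model and two-replica split are assumptions), each replica computes gradients over its share of the batch while holding a full copy of the parameters, the gradients are averaged, and every replica applies the identical update.

```python
import numpy as np

def grad(w, X, y):
    # Gradient of a mean-squared-error loss for a linear model on one replica's data.
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 2)); y = X @ np.array([2.0, -1.0])
w = np.zeros(2)

# Split the batch across two replicas (e.g., one CPU node and one GPU node).
halves = [(X[:4], y[:4]), (X[4:], y[4:])]

for _ in range(100):
    replica_grads = [grad(w, Xc, yc) for Xc, yc in halves]  # full parameters, partial batch
    avg = np.mean(replica_grads, axis=0)                    # gradients communicated and averaged every step
    w -= 0.1 * avg                                          # all replicas apply the same update
print(np.round(w, 2))   # approaches the true weights [2., -1.]
```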