Language Elements
Published in Joseph Cavanagh, Verilog HDL Digital Design and Modeling, 2017
The wait construct provides a means to synchronize two concurrent processes. Execution is suspended until the conditional expression becomes true; therefore, a means must be provided to ensure that the value of the expression will not remain false indefinitely, otherwise the statement following the conditional expression would never be executed. The wait construct provides an ideal method to synchronize communication between two units that operate over an asynchronous interface.
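The book's examples use Verilog's wait (expression) statement; purely as a language-neutral illustration, the following minimal C++ sketch models the same handshake between two concurrent processes with a condition variable standing in for wait. The names ready and data_bus are invented for this sketch, and the comment in producer() marks the obligation noted above: some process must eventually make the expression true.

    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <thread>

    // Illustrative stand-ins for the two units on the asynchronous interface.
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;   // plays the role of the wait expression
    int  data_bus = 0;

    void producer() {
        {
            std::lock_guard<std::mutex> lk(m);
            data_bus = 42;   // drive the interface
            ready = true;    // make the "expression" true; if no process ever
                             // did this, the consumer would wait forever
        }
        cv.notify_one();     // wake the waiting process
    }

    void consumer() {
        std::unique_lock<std::mutex> lk(m);
        // Analogue of Verilog's `wait (ready)`: suspend until the expression is true.
        cv.wait(lk, [] { return ready; });
        std::cout << "received " << data_bus << '\n';
    }

    int main() {
        std::thread t1(consumer), t2(producer);
        t1.join();
        t2.join();
    }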
Computationally efficient GPU based NS solver for two dimensional high-speed inviscid and viscous compressible flows
Published in Engineering Applications of Computational Fluid Mechanics, 2023
Muhammad Naveed Akhtar, Kamran Rasheed Qureshi, Muhammad Hanif Durad, Anila Usman, Syed Muhammad Mohsin, Band Shahab, Amirhosein Mosavi
Resolving data and computation dependencies is another complex task for parallel applications. These dependencies have been resolved by using multiple GPU kernels and executing them sequentially on the GPU's default stream, which ensures that all relevant data has been computed and is available to the threads before they actually use it. In addition, each grid point of the geometry, together with all related computations for that point, has been assigned to a single thread, so no thread has to wait for data computed by a neighbouring thread. In other words, there is no intra-core data dependency in the proposed approach, while inter-core data dependency is resolved by queuing multiple kernels to the default stream in order of computation. Kernels on the default stream are launched in program order, and each kernel starts its operation only when the previous one finishes; in this way, synchronization between kernels is achieved. Also, where necessary, two separate sets of variables were used for source and destination to avoid conflicts when reading and writing memory.
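A minimal CUDA sketch of this scheme, with two hypothetical kernels (stage1, stage2) standing in for consecutive computation stages; the solver's actual kernels are not shown in this excerpt. Both launches go to the default stream, so the second cannot start before the first finishes, and a separate source/destination buffer pair avoids read/write conflicts.

    #include <cuda_runtime.h>
    #include <cstdio>

    // Hypothetical stages: each grid point is owned by exactly one thread.
    __global__ void stage1(const float* src, float* dst, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) dst[i] = src[i] * 2.0f;   // placeholder computation
    }

    __global__ void stage2(const float* src, float* dst, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) dst[i] = src[i] + 1.0f;   // consumes stage1's output
    }

    int main() {
        const int n = 1 << 20;
        float *a, *b;
        cudaMalloc(&a, n * sizeof(float));
        cudaMalloc(&b, n * sizeof(float));
        cudaMemset(a, 0, n * sizeof(float));

        int threads = 256, blocks = (n + threads - 1) / threads;
        // Both kernels are queued to the default stream, so stage2 begins only
        // after stage1 has finished: kernel ordering provides the synchronization.
        stage1<<<blocks, threads>>>(a, b, n);   // reads a, writes b
        stage2<<<blocks, threads>>>(b, a, n);   // reads b, writes a (separate src/dst)
        cudaDeviceSynchronize();

        cudaFree(a);
        cudaFree(b);
        return 0;
    }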
Parallel algorithm development and testing using Petri-object simulation
Published in International Journal of Parallel, Emergent and Distributed Systems, 2021
Inna V. Stetsenko, Alexander A. Pavlov, Oleksandra Dyfuchyna
Guarded block (Figure 7). This construct orchestrates thread interaction by waiting for a specific state. It uses the pair of wait/notify methods defined in class Object, or the await/signal methods defined in interface Condition. The object's lock is acquired when the guarded block starts checking the condition; if the lock is acquired successfully, the method continues, otherwise the thread waits until the lock becomes available. If the condition is not true, the thread enters the ‘waiting’ state. When another thread calls the signal() method on the same object on which await() was called, execution resumes at the instruction after the wait() method. Since the guarded block places the wait() method inside a while loop, the next action is to check the condition again; only if the condition is true can the thread continue execution. The signal() method wakes only those threads that are waiting on the same object.
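The paper describes Java's wait()/notify() pattern; for consistency with the other sketches here, the same guarded block is rendered in C++ with std::condition_variable. The Guard class and its member names are invented for this sketch; note that the while loop around wait() is exactly the re-check described above (the overload std::condition_variable::wait(lock, predicate) folds that loop into a single call).

    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <thread>

    class Guard {
        std::mutex m;
        std::condition_variable cv;
        bool condition = false;
    public:
        void await() {
            std::unique_lock<std::mutex> lk(m);   // acquire the object's lock
            while (!condition)                    // guarded block: re-check after wakeup
                cv.wait(lk);                      // release lock, enter 'waiting' state
            // condition is true here, and the lock is held again
        }
        void signal() {
            {
                std::lock_guard<std::mutex> lk(m);
                condition = true;
            }
            cv.notify_all();   // wakes only threads waiting on this object
        }
    };

    int main() {
        Guard g;
        std::thread waiter([&] { g.await(); std::cout << "resumed\n"; });
        std::thread notifier([&] { g.signal(); });
        waiter.join();
        notifier.join();
    }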
High-performance attribute reduction on graphics processing unit
Published in Journal of Experimental & Theoretical Artificial Intelligence, 2020
Some interesting phenomena can be observed in Figure 6. First, the speedup ratio of the ‘CPU-GPU’ algorithm decreases when the size of a data set exceeds 1 million. The reason is that the growth of data size increases the overhead of data transmission in the ‘CPU-GPU’ algorithm, which cancels out the performance gain obtained from the GPU. By contrast, the proposed algorithm only needs to transfer a small amount of data during execution, so it maintains its performance as the data size increases. Second, the performance of the proposed algorithm is clearly better when the number of GPU threads per block is set to 256. This is due to the SIMT mechanism of CUDA: even after a thread finishes, it must still wait for the in-execution threads in the same thread block, so the more threads in a thread block, the more threads sit idle when execution divergence happens. However, the number of threads per block cannot be set too small either, because the number of thread blocks a GPU can run concurrently is fixed and limited; with too few threads per block, there may not be enough threads in flight to achieve the best performance.
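A small CUDA sketch of the kind of experiment behind this observation, assuming an invented kernel with deliberately uneven per-thread work: threads that finish early must still wait for their whole block to retire, and timing the same launch at several block sizes (64, 256, and 1024 here, chosen arbitrarily) exposes the trade-off described above. Actual results depend on the GPU.

    #include <cuda_runtime.h>
    #include <cstdio>

    // Per-thread work varies, so some threads finish early but still occupy
    // their block's resources until every thread in the block has retired.
    __global__ void uneven(float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        int iters = 1 + (i % 64);   // divergent workload per thread
        float x = 1.0f;
        for (int k = 0; k < iters * 100; ++k)
            x = x * 1.000001f + 0.5f;
        out[i] = x;
    }

    int main() {
        const int n = 1 << 22;
        float* d;
        cudaMalloc(&d, n * sizeof(float));

        int sizes[] = {64, 256, 1024};   // candidate threads-per-block settings
        for (int threads : sizes) {
            int blocks = (n + threads - 1) / threads;
            cudaEvent_t t0, t1;
            cudaEventCreate(&t0);
            cudaEventCreate(&t1);
            cudaEventRecord(t0);
            uneven<<<blocks, threads>>>(d, n);
            cudaEventRecord(t1);
            cudaEventSynchronize(t1);
            float ms = 0.0f;
            cudaEventElapsedTime(&ms, t0, t1);
            printf("block size %4d: %.3f ms\n", threads, ms);
            cudaEventDestroy(t0);
            cudaEventDestroy(t1);
        }
        cudaFree(d);
        return 0;
    }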