The Ethernet Advantage in Networking
Published in James Aweya, Designing Switch/Routers, 2023
RDMA allows a network adapter to copy data directly into host application memory without involving the CPU: the adapter itself deposits the data in its destination buffers. RDMA transfers bypass the operating-system networking stack on both hosts, improving data transfer throughput. RDMA also conserves memory bandwidth and reduces latency by eliminating the kernel interrupts otherwise needed to copy data between the network adapter's buffer pool and host application buffers. RDMA is particularly useful for moving large blocks of data, such as that required for storage interconnects and cluster computing.
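As an illustration, the following is a minimal, hypothetical sketch of posting a one-sided RDMA write with the libibverbs API. It assumes a queue pair (qp) that is already connected, a memory region (mr) registered over local_buf, and the peer's buffer address and remote key (rkey) exchanged out of band; the helper name post_rdma_write is invented for this example.

/* Sketch: post a one-sided RDMA WRITE with libibverbs.
 * Assumes qp is a connected queue pair, mr is a memory region
 * registered over local_buf, and remote_addr/rkey describe the
 * peer's registered buffer (exchanged out of band). */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

int post_rdma_write(struct ibv_qp *qp, struct ibv_mr *mr,
                    void *local_buf, uint32_t len,
                    uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,  /* source buffer in host memory */
        .length = len,
        .lkey   = mr->lkey,              /* local protection key */
    };
    struct ibv_send_wr wr, *bad_wr = NULL;

    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_WRITE; /* one-sided write */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED; /* request a completion */
    wr.wr.rdma.remote_addr = remote_addr;       /* peer's buffer address */
    wr.wr.rdma.rkey        = rkey;              /* peer's remote key */

    /* The adapter moves the data; the remote CPU takes no interrupt
     * and runs no copy loop. */
    return ibv_post_send(qp, &wr, &bad_wr);
}

Completion of the write would then be observed by polling the send completion queue with ibv_poll_cq.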
Tyche
Published in Kuan-Ching Li, Hai Jiang, Albert Y. Zomaya, Big Data Management and Processing, 2017
Pilar González-Férez, Angelos Bilas
RDMA has been used extensively by protocols such as iSER (iSCSI Extension for RDMA) [1], the SCSI RDMA Protocol (SRP), and RDMA-assisted iSCSI [2], which improve the performance of iSCSI by taking advantage of RDMA operations. Two widely known protocols are the Internet Wide Area RDMA Protocol (iWARP) and RDMA over Converged Ethernet (RoCE): iWARP performs RDMA over TCP, while RoCE performs it directly over Ethernet. However, all these protocols focus on providing RDMA capabilities through hardware support. Tyche instead focuses on using existing Ethernet and on exploring issues at the software interface between the host and the NIC.
The HyTeG finite-element software framework for scalable multigrid solvers
Published in International Journal of Parallel, Emergent and Distributed Systems, 2019
Nils Kohl, Dominik Thönnes, Daniel Drzisga, Dominik Bartuschat, Ulrich Rüde
The control layer manages the communication directions along the graph-edges of the macro-primitive graph. It is used to precisely schedule the communication among pairs of primitives, individually or in groups. The layer allows for correct, optimised, and potentially overlapping communication patterns matching individual routines such as matrix-vector products (cf. Algorithm 3). Before sending and after receiving buffers, it calls the corresponding serialisation and deserialisation routines of the packing layer, without requiring knowledge of the communicated data structures. The control layer employs optimisations such as non-blocking communication, and it calls process-local communication routines if graph-edges do not cross process boundaries. Further optimisations such as shared-memory parallelism to (un-)pack data structures simultaneously, or remote direct memory access, can be integrated here to enhance communication performance.
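The overlap pattern described here can be illustrated with a small, hypothetical sketch using non-blocking MPI (this is not HyTeG's actual API; the function and buffer names are invented): a halo exchange along one graph-edge is started, interior work that needs no remote data proceeds in the meantime, and the exchange is completed only before the dependent boundary work.

/* Sketch of communication/computation overlap with non-blocking MPI.
 * send_buf/recv_buf hold the (de-)serialised data for one graph-edge;
 * interior holds degrees of freedom that need no remote data. */
#include <mpi.h>

void halo_exchange_overlap(double *send_buf, double *recv_buf, int n,
                           int neighbor, double *interior, int m)
{
    MPI_Request reqs[2];

    /* Start the exchange; both calls return immediately. */
    MPI_Irecv(recv_buf, n, MPI_DOUBLE, neighbor, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(send_buf, n, MPI_DOUBLE, neighbor, 0, MPI_COMM_WORLD, &reqs[1]);

    /* Overlap: update interior unknowns while the messages are in flight. */
    for (int i = 0; i < m; ++i)
        interior[i] *= 0.5;   /* placeholder for the real stencil update */

    /* Complete the exchange before any work that reads recv_buf. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
}

If sender and receiver live in the same process, the control layer would instead take the process-local path and skip MPI entirely.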