Improving Concurrency and Asynchrony in Multithreaded MPI Applications using Software Offloading

SESSION: MPI/Communication


EVENT TAG(S): Power, System Software, Clouds and Distributed Computing, Resiliency

TIME: 4:00PM - 4:30PM


AUTHOR(S): Karthikeyan Vaidyanathan, Dhiraj D. Kalamkar, Kiran Pamnany, Jeff R. Hammond, Pavan Balaji, Dipankar Das, Jongsoo Park, Balint Joo



We present a new approach for multithreaded communication
and asynchronous progress in MPI applications, wherein we offload
communication processing to a dedicated thread. The central
premise is that given the rapidly increasing core counts on modern
systems, the improvements in MPI performance arising from
dedicating a thread to drive communication outweigh the small
loss of resources for application computation, particularly when
overlap of communication and computation can be exploited. Our
approach allows application threads to make MPI calls concurrently,
enqueuing these as communication tasks to be processed
by a dedicated communication thread. This not only guarantees
progress for such communication operations, but also reduces load
imbalance. Our implementation
also significantly reduces the overhead of mutual
exclusion seen in existing implementations for applications using
MPI_THREAD_MULTIPLE. Our technique requires no modification
to the application, and we demonstrate significant performance
improvement (up to 2X) for QCD, FFT, and deep learning CNN applications.
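
The offloading pattern described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and makes no MPI calls: the class CommOffloader, its isend method, and the transport callable are hypothetical names introduced only for illustration, with an ordinary Python function standing in for the MPI library. It shows application threads enqueuing communication tasks concurrently while a single dedicated thread processes them, so the transport itself is never called from more than one thread.

```python
import queue
import threading


class CommOffloader:
    """Toy model of software offloading: one thread owns all communication."""

    def __init__(self, transport):
        self._transport = transport      # hypothetical stand-in for the MPI library
        self._tasks = queue.Queue()      # thread-safe queue of communication tasks
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def _drain(self):
        # Only this dedicated thread ever invokes the transport, so the
        # transport does not need its own mutual exclusion.
        while True:
            task = self._tasks.get()
            if task is None:             # shutdown sentinel
                break
            payload, done = task
            self._transport(payload)
            done.set()                   # signal completion to the caller

    def isend(self, payload):
        """Enqueue a send and return immediately with a completion handle."""
        done = threading.Event()
        self._tasks.put((payload, done))
        return done

    def shutdown(self):
        self._tasks.put(None)
        self._worker.join()


def demo(num_threads=4, sends_per_thread=3):
    delivered = []                       # appended to only by the communication thread
    offloader = CommOffloader(delivered.append)

    def app_thread(tid):
        handles = [offloader.isend((tid, i)) for i in range(sends_per_thread)]
        # Application computation could overlap with communication here.
        for handle in handles:
            handle.wait()

    threads = [threading.Thread(target=app_thread, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    offloader.shutdown()
    return delivered
```

Calling demo() returns every (thread id, sequence number) pair exactly once. Because the queue is first-in, first-out, sends from any one application thread are delivered in the order that thread issued them, while sends from different threads may interleave.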

Chair/Author Details:

Yong Chen (Chair) - Texas Tech University

Karthikeyan Vaidyanathan - Intel Corporation

Dhiraj D. Kalamkar - Intel Corporation

Kiran Pamnany - Intel Corporation

Jeff R. Hammond - Intel Corporation

Pavan Balaji - Argonne National Laboratory

Dipankar Das - Intel Corporation

Jongsoo Park - Intel Corporation

Balint Joo - Thomas Jefferson National Accelerator Facility

