Clock Delta Compression for Scalable Order-Replay of Non-Deterministic Parallel Applications

SESSION: Programming Tools


EVENT TAG(S): Programming Systems

TIME: 11:00AM - 11:30AM

SESSION CHAIR(S): Thomas Fahringer

AUTHOR(S): Kento Sato, Dong H. Ahn, Ignacio Laguna, Gregory L. Lee, Martin Schulz



The ability to record and replay program execution significantly aids debugging of non-deterministic parallel applications, because it reproduces message receive orders. However, the large amount of data that traditional record-and-replay techniques record precludes their practical use with massively parallel MPI applications. In this paper, we propose a new compression algorithm, Clock Delta Compression (CDC), for scalable record and replay of non-deterministic MPI applications. CDC defines a reference order of message receives based on a total order derived from Lamport clocks, and records only the differences between this reference logical-clock order and the observed order. Our evaluation shows that CDC significantly reduces the record data size. For example, when we apply CDC to a Monte Carlo particle transport benchmark (MCB), which exhibits non-deterministic communication patterns, CDC reduces the record size by approximately two orders of magnitude compared to traditional techniques and incurs a runtime overhead of 13.1% to 25.5%.
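The core idea (a deterministic reference order plus a recorded diff) can be illustrated with a minimal sketch. This is not CDC's actual encoding, which the abstract does not spell out. It assumes each received message carries its sender's Lamport clock, identifies messages by the hypothetical pair `(clock, src)`, breaks clock ties by sender rank to obtain a total order, and stores only the positions where the observed receive order departs from that reference order. The names `Msg`, `reference_order`, `encode`, and `decode` are illustrative only.

```python
from typing import List, NamedTuple, Tuple


class Msg(NamedTuple):
    """A received message, identified by its piggybacked Lamport clock and sender rank."""
    clock: int
    src: int


def reference_order(msgs: List[Msg]) -> List[Msg]:
    """Total order: sort by Lamport clock, breaking ties by sender rank (assumed tie-break)."""
    return sorted(msgs, key=lambda m: (m.clock, m.src))


def encode(observed: List[Msg]) -> List[Tuple[int, int]]:
    """Record only the positions where the observed order deviates from the reference order.

    Each entry is (observed_position, reference_index). Any position not listed
    is assumed to receive the message at the same index of the reference order.
    """
    ref = reference_order(observed)
    index_of = {m: i for i, m in enumerate(ref)}
    return [(pos, index_of[m]) for pos, m in enumerate(observed) if index_of[m] != pos]


def decode(msgs: List[Msg], diff: List[Tuple[int, int]]) -> List[Msg]:
    """Rebuild the observed receive order from the reference order plus the recorded diff."""
    ref = reference_order(msgs)
    perm = list(range(len(ref)))  # start from the identity, i.e. the reference order
    for pos, ref_idx in diff:
        perm[pos] = ref_idx  # override only the deviating positions
    return [ref[i] for i in perm]


if __name__ == "__main__":
    observed = [Msg(3, 1), Msg(2, 0), Msg(5, 2), Msg(7, 0)]
    diff = encode(observed)  # only the swapped first two receives are stored
    print(diff)
    assert decode(observed, diff) == observed
```

When receives mostly arrive in clock order, `diff` stays short, which is the intuition behind recording deltas instead of the full order. This naive position-based diff is a simplification: a single late message shifts every later position, so a practical encoder would need a more compact representation of such moves.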

Chair/Author Details:

Thomas Fahringer (Chair) - University of Innsbruck

Kento Sato - Lawrence Livermore National Laboratory

Dong H. Ahn - Lawrence Livermore National Laboratory

Ignacio Laguna - Lawrence Livermore National Laboratory

Gregory L. Lee - Lawrence Livermore National Laboratory

Martin Schulz - Lawrence Livermore National Laboratory

