Overcoming Distributed Debugging Challenges in the MPI+OpenMP Programming Model
Authors: Lai Wei (Rice University), Ignacio Laguna (Lawrence Livermore National Laboratory), Dong H. Ahn (Lawrence Livermore National Laboratory), Matthew P. LeGendre (Lawrence Livermore National Laboratory), Gregory L. Lee (Lawrence Livermore National Laboratory)
Abstract: There is a general consensus that exascale computing will embrace a wider range of programming models to harness the many levels of architectural parallelism. To help programmers manage the complexity that arises from combining multiple programming models, debugging tools must let them identify errors at the level of the programming model where the root cause of a bug was introduced. However, the question of which levels are effective for debugging in hybrid distributed models remains unanswered.
In this work, we share the lessons we learned from incorporating OpenMP awareness into a highly scalable, lightweight debugging tool for MPI applications: the Stack Trace Analysis Tool (STAT). Our framework leverages OMPD, an emerging debugging interface for OpenMP, to provide easy-to-understand stack trace views for MPI+OpenMP programs. Our tool helps users debug their programs at the user code level by mapping stack traces to the high-level abstractions provided by the programming models.
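As a rough illustration of what a user-code-level stack trace view might look like, the following minimal Python sketch drops OpenMP runtime frames from each trace and then merges the per-rank traces into a prefix tree. It is not the tool's implementation and does not use the OMPD interface: recognising runtime frames by symbol-name prefix is a simplifying assumption, and the function names `user_level_trace` and `merge_traces`, the prefix list, and the sample traces are all hypothetical.

```python
from collections import defaultdict

# Assumption: runtime-internal frames can be recognised by name prefix.
# A real tool would ask the OpenMP runtime through OMPD instead.
RUNTIME_PREFIXES = ("__kmp", "GOMP_", "gomp_")


def user_level_trace(frames):
    """Return the trace with OpenMP runtime frames removed."""
    return [f for f in frames if not f.startswith(RUNTIME_PREFIXES)]


def merge_traces(traces_by_rank):
    """Merge per-rank traces into a prefix tree.

    Each key is a call-path prefix (a tuple of frames, outermost first);
    each value is the sorted list of ranks whose trace passes through it.
    """
    tree = defaultdict(set)
    for rank, frames in traces_by_rank.items():
        path = user_level_trace(frames)
        for depth in range(1, len(path) + 1):
            tree[tuple(path[:depth])].add(rank)
    return {prefix: sorted(ranks) for prefix, ranks in tree.items()}


# Hypothetical traces from three MPI ranks (outermost frame first).
traces = {
    0: ["main", "solve", "__kmpc_fork_call", "__kmp_invoke_microtask", "compute_block"],
    1: ["main", "solve", "__kmpc_fork_call", "__kmp_invoke_microtask", "compute_block"],
    2: ["main", "solve", "MPI_Barrier"],
}

merged = merge_traces(traces)
for prefix, ranks in sorted(merged.items()):
    print(" > ".join(prefix), ranks)
```

In this sketch, ranks 0 and 1 end up grouped under the same user-level path even though their raw traces pass through runtime frames, while rank 2 appears on a separate branch, which is the kind of grouping that makes an outlier process easy to spot.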
Two-page extended abstract: pdf