SCHEDULE: NOV 15-20, 2015

Relative Debugging for a Highly Parallel Hybrid Computer System

SESSION: Programming Tools


EVENT TAG(S): Programming Systems

TIME: 11:30AM - 12:00PM

SESSION CHAIR(S): Thomas Fahringer

AUTHOR(S): Luiz DeRose, Andrew Gontarek, Aaron Vose, Robert Moench, David Abramson, Minh Dinh, Chao Jin



Relative debugging traces software errors by comparing two executions of a program concurrently: one is a reference version and the other is faulty. Relative debugging is particularly effective when code is migrated from one platform to another, and this is of significant interest for hybrid computer architectures containing CPUs and accelerators or coprocessors. In this paper we extend relative debugging to support porting stencil computations to a hybrid computer. We describe a generic data model that allows programmers to examine the global state across different types of applications, including MPI/OpenMP, MPI/OpenACC, and UPC programs. We present case studies using a hybrid version of the ‘stellarator’ particle simulation DELTA5D on Titan at ORNL, and the UPC version of Shallow Water Equations on Crystal, an internal Cray supercomputer. These case studies used up to 5,120 GPUs and 32,768 CPU cores to illustrate that the debugger is effective and practical.
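
The core idea in the abstract, comparing the state of a reference run against a suspect run at the same point, can be illustrated with a small single-process sketch. This is not the debugger or data model presented in the paper. The function names (`first_divergence`, `stencil_step`), the stencil weights, and the injected fault are all hypothetical and chosen only for illustration.

```python
import math


def first_divergence(reference, suspect, rel_tol=1e-9, abs_tol=0.0):
    """Return the index of the first element where the two states differ, or None."""
    if len(reference) != len(suspect):
        raise ValueError("states must have the same length")
    for i, (r, s) in enumerate(zip(reference, suspect)):
        if not math.isclose(r, s, rel_tol=rel_tol, abs_tol=abs_tol):
            return i
    return None


def stencil_step(u, weight_left, weight_right):
    """One 1-D three-point stencil step; the two boundary values are left unchanged."""
    out = list(u)
    for i in range(1, len(u) - 1):
        out[i] = weight_left * u[i - 1] + 0.5 * u[i] + weight_right * u[i + 1]
    return out


initial = [0.0, 1.0, 4.0, 9.0, 16.0]

# Reference version: symmetric weights.
reference_state = stencil_step(initial, 0.25, 0.25)

# Suspect version: a hypothetical porting bug changes the right-hand weight.
suspect_state = stencil_step(initial, 0.25, 0.20)

# Compare the two states at the same point in the computation.
divergence_index = first_divergence(reference_state, suspect_state)
```

Here the comparison reports the first grid point whose value differs between the two runs, which is where a programmer would start looking for the fault. A real relative debugger performs this kind of comparison on live, distributed data across two running programs rather than on in-memory lists.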

Chair/Author Details:

Thomas Fahringer (Chair) - University of Innsbruck

Luiz DeRose - Cray Inc.

Andrew Gontarek - Cray Inc.

Aaron Vose - Cray Inc.

Robert Moench - Cray Inc.

David Abramson - University of Queensland

Minh Dinh - University of Queensland

Chao Jin - University of Queensland

Paper provided by the ACM Digital Library

Paper also available from IEEE Computer Society