PDE Preconditioner Resilient to Soft and Hard Faults

SESSION: Regular & ACM Student Research Competition Poster Reception

EVENT TYPE: Posters, Receptions, ACM Student Research Competition

EVENT TAG(S): HPC Beginner Friendly, Regular Poster

TIME: 5:15PM - 7:00PM

SESSION CHAIR(S): Michela Becchi, Manish Parashar, Dorian C. Arnold

AUTHOR(S): Francesco Rizzi, Karla Morris, Kathryn Dahlgren, Khachik Sargsyan, Paul Mycek, Cosmin Safta, Olivier LeMaitre, Omar Knio, Bert Debusschere

We present a resilient domain-decomposition preconditioner for partial differential equations (PDEs). The algorithm reformulates the PDE as a sampling problem, followed by a solution update through data manipulation that is resilient to both soft and hard faults. The algorithm exploits data locality to reduce global communication. We discuss a server-client implementation in which the servers hold all state information and the clients serve solely as computational units. Focusing on the stages of the algorithm that are most intensive in communication and computation, we measure the scalability of the actual code on up to 12k cores and build an SST/macro skeleton that allows us to extrapolate to 50k cores. We demonstrate resilience under simulated hard and soft faults for a 2D linear Poisson equation.
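To make the sampling idea concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It uses a 1D Poisson problem (-u'' = 1 on [0, 1] with zero boundary values) instead of the 2D case, two overlapping subdomains, and a Theil-Sen median fit as a stand-in for whatever robust regression the poster actually uses. Faults are injected deterministically so the run is reproducible. All names (solve_dirichlet, run_tasks, robust_affine_fit, resilient_solve) are hypothetical.

```python
import random
from statistics import median

N_CELLS = 40             # uniform grid on [0, 1]
H = 1.0 / N_CELLS
IA, IB = 16, 24          # subdomain 1 = nodes 0..IB, subdomain 2 = nodes IA..N_CELLS

def solve_dirichlet(n_cells, left, right, f=1.0):
    """Solve -u'' = f with Dirichlet end values (Thomas algorithm)."""
    m = n_cells - 1                      # number of interior unknowns
    rhs = [H * H * f] * m
    rhs[0] += left
    rhs[-1] += right
    c = [0.0] * m                        # modified super-diagonal
    d = [0.0] * m                        # modified right-hand side
    c[0], d[0] = -0.5, rhs[0] / 2.0
    for i in range(1, m):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    u = [0.0] * m
    u[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        u[i] = d[i] - c[i] * u[i + 1]
    return [left] + u + [right]

def local_map_1(g):
    """Subdomain 1: boundary value g at node IB -> solution at node IA."""
    return solve_dirichlet(IB, 0.0, g)[IA]

def local_map_2(g):
    """Subdomain 2: boundary value g at node IA -> solution at node IB."""
    return solve_dirichlet(N_CELLS - IA, g, 0.0)[IB - IA]

def run_tasks(local_map, n_samples, rng):
    """Sample boundary values and run independent local solves.
    Assumption: faults are injected on a fixed schedule for reproducibility."""
    samples = []
    for k in range(n_samples):
        g = rng.uniform(-1.0, 1.0)
        if k % 5 == 0:
            continue                     # simulated hard fault: result is lost
        y = local_map(g)
        if k % 7 == 3:
            y += 1.0e3                   # simulated soft fault: silent corruption
        samples.append((g, y))
    return samples

def robust_affine_fit(samples):
    """Theil-Sen fit y ~ alpha + beta * x; tolerates a minority of outliers."""
    slopes = [(y2 - y1) / (x2 - x1)
              for i, (x1, y1) in enumerate(samples)
              for (x2, y2) in samples[i + 1:] if x2 != x1]
    beta = median(slopes)
    alpha = median(y - beta * x for x, y in samples)
    return alpha, beta

def resilient_solve(n_samples=30, seed=0):
    rng = random.Random(seed)
    a1, b1 = robust_affine_fit(run_tasks(local_map_1, n_samples, rng))
    a2, b2 = robust_affine_fit(run_tasks(local_map_2, n_samples, rng))
    # Interface consistency: ua = a1 + b1 * ub and ub = a2 + b2 * ua.
    ua = (a1 + b1 * a2) / (1.0 - b1 * b2)
    ub = a2 + b2 * ua
    # Solution update: one final local solve per subdomain, then merge.
    u1 = solve_dirichlet(IB, 0.0, ub)
    u2 = solve_dirichlet(N_CELLS - IA, ua, 0.0)
    return u1[:IB] + u2[IB - IA:]

u = resilient_solve()
print(u[IA], u[N_CELLS // 2], u[IB])
```

In this sketch each local solve is an independent task, so a lost sample is simply dropped and a corrupted one is outvoted by the median. Only the fitted coefficients and the interface values need to be kept as state, which loosely mirrors the server-client split described in the abstract.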

Michela Becchi, Manish Parashar, Dorian C. Arnold (Chairs) - University of Missouri, Rutgers University, University of New Mexico

Francesco Rizzi - Sandia National Laboratories

Karla Morris - Sandia National Laboratories

Kathryn Dahlgren - Sandia National Laboratories

Khachik Sargsyan - Sandia National Laboratories

Paul Mycek - Duke University

Cosmin Safta - Sandia National Laboratories

Olivier LeMaitre - Duke University

Omar Knio - Duke University

Bert Debusschere - Sandia National Laboratories

