PDE Preconditioner Resilient to Soft and Hard Faults
Authors: Francesco Rizzi (Sandia National Laboratories), Karla Morris (Sandia National Laboratories), Kathryn Dahlgren (Sandia National Laboratories), Khachik Sargsyan (Sandia National Laboratories), Paul Mycek (Duke University), Cosmin Safta (Sandia National Laboratories), Olivier LeMaitre (Duke University), Omar Knio (Duke University), Bert Debusschere (Sandia National Laboratories)
Abstract: We present a resilient domain-decomposition preconditioner for partial differential equations (PDEs). The algorithm reformulates the PDE as a sampling problem, followed by a solution update through data manipulation that is resilient to both soft and hard faults. The algorithm exploits data locality to reduce global communication. We discuss a server-client implementation in which all state information is held by the servers, and the clients act solely as computational units. Focusing on the stages of the algorithm that are most communication- and computation-intensive, we study the scalability of the implementation up to 12k cores, and build an SST/macro skeleton that lets us extrapolate to 50k cores. We demonstrate resilience to simulated hard and soft faults for a 2D linear Poisson equation.
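To make "reformulate the PDE as a sampling problem" concrete, here is a toy sketch, not the authors' code. It assumes a 1D model problem (-u'' = 1 on [0, 1], u(0) = u(1) = 0) split into two overlapping subdomains. A "server" samples boundary values, stateless "clients" perform local solves, and the server fits the linear boundary-to-interface maps and solves the small consistency system on the overlap. Lost samples (hard faults) are simply skipped. Corrupted samples (soft faults) are absorbed by a robust regression; the Theil-Sen estimator used here is a swapped-in choice for illustration. All names (`local_solve`, `client_task`, `sample_map`, `solve_interface`) and the fault-injection pattern are hypothetical.

```python
import random
import statistics

# Hypothetical model problem: -u'' = 1 on [0, 1], u(0) = u(1) = 0,
# with exact solution u(x) = x(1 - x)/2, so u(0.4) = u(0.6) = 0.12.
A, B = 0.4, 0.6  # overlap [A, B]: subdomain 1 is [0, B], subdomain 2 is [A, 1]


def local_solve(left, right, u_left, u_right, x):
    """Stand-in for a local PDE solve: exact solution of -u'' = 1 on
    [left, right] with Dirichlet data u_left, u_right, evaluated at x."""
    linear = u_left + (u_right - u_left) * (x - left) / (right - left)
    return linear + (x - left) * (right - x) / 2.0


def client_task(sub, g, i):
    """Hypothetical stateless client: one sampled local solve, with
    deterministic fault injection keyed on the sample index i."""
    if i % 11 == 5:
        return None  # hard fault: client lost, sample never arrives
    if sub == 1:  # subdomain [0, B] with u(B) = g; report u at A
        y = local_solve(0.0, B, 0.0, g, A)
    else:  # subdomain [A, 1] with u(A) = g; report u at B
        y = local_solve(A, 1.0, g, 0.0, B)
    if i % 7 == 3:
        y *= 1.0e6  # soft fault: silently corrupted result
    return y


def theil_sen(xs, ys):
    """Robust affine fit y ~ alpha + beta * x: beta is the median of
    pairwise slopes, alpha the median of the residual offsets."""
    slopes = [
        (ys[j] - ys[i]) / (xs[j] - xs[i])
        for i in range(len(xs))
        for j in range(i + 1, len(xs))
        if xs[j] != xs[i]
    ]
    beta = statistics.median(slopes)
    alpha = statistics.median([y - beta * x for x, y in zip(xs, ys)])
    return alpha, beta


def sample_map(sub, n, rng):
    """Server side: draw boundary samples, keep the results that arrive,
    and fit the boundary-to-interface map of one subdomain."""
    xs, ys = [], []
    for i in range(n):
        g = rng.uniform(-1.0, 1.0)
        y = client_task(sub, g, i)
        if y is not None:  # skip samples lost to hard faults
            xs.append(g)
            ys.append(y)
    return theil_sen(xs, ys)


def solve_interface(n=20, seed=0):
    rng = random.Random(seed)
    a1, b1 = sample_map(1, n, rng)  # u(A) ~ a1 + b1 * u(B)
    a2, b2 = sample_map(2, n, rng)  # u(B) ~ a2 + b2 * u(A)
    # Consistency on the overlap: g_b = a2 + b2 * (a1 + b1 * g_b).
    g_b = (a2 + b2 * a1) / (1.0 - b1 * b2)
    g_a = a1 + b1 * g_b
    return g_a, g_b


g_a, g_b = solve_interface()
print(g_a, g_b)  # both should be close to the exact value 0.12
```

Because the local problems here are linear, each sampled map is exactly affine, so a handful of clean samples pins it down; the median-based fit tolerates the few corrupted samples as outliers, and the lost ones only reduce the sample count. The actual preconditioner targets 2D problems at scale, so this sketch only mirrors the structure of the idea.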
Two-page extended abstract: pdf