SC15 Austin, TX

Scaling Uncertainty Quantification Studies to Millions of Jobs


Authors: Tamara L. Dahlgren (Lawrence Livermore National Laboratory), David Domyancic (Lawrence Livermore National Laboratory), Scott Brandon (Lawrence Livermore National Laboratory), Todd Gamblin (Lawrence Livermore National Laboratory), John Gyllenhaal (Lawrence Livermore National Laboratory), Rao Nimmakayala (Lawrence Livermore National Laboratory), Richard Klein (Lawrence Livermore National Laboratory)

Abstract: Computer simulations that evaluate the impacts of parameters and assess the likelihood of outcomes typically involve hundreds to millions of executions, each on hundreds of processors, and can take months to complete using current resources. Reducing turnaround time by scaling concurrent ensemble runs to millions of processors is hindered by resource constraints. Our poster describes a preliminary investigation into mitigating these limitations in the LLNL Uncertainty Quantification Pipeline (UQP) using CRAM. CRAM virtualizes MPI by splitting a single program's processes into separate groups, each with its own sub-communicator for running one simulation. We launched a single-process problem on all 1.6 million Sequoia cores in under 40 minutes, versus 4.5 days. The small problem size resulted in 400,000 concurrent ensemble simulations. Preparations are underway to demonstrate how CRAM can maximize the number of simulations we run within our allocated partition for an ensemble that uses a multi-physics package.
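The grouping idea can be illustrated with the arguments one would pass to MPI's `MPI_Comm_split(MPI_COMM_WORLD, color, key, &sub_comm)`: every rank that supplies the same `color` lands in the same sub-communicator, ordered by `key` (ties broken by original rank). The sketch below is not CRAM's implementation. It assumes, hypothetically, that every ensemble member uses the same fixed number of processes, and it only computes the rank-to-group mapping in plain Python. The helper names `split_ranks` and `group_members` are illustrative.

```python
from collections import defaultdict


def split_ranks(world_size: int, procs_per_job: int) -> list[tuple[int, int]]:
    """Return (color, key) for each world rank, assuming (hypothetically) that
    every ensemble member uses exactly procs_per_job processes.

    color selects the ensemble member (sub-communicator); key orders ranks
    within it, as the color/key arguments of MPI_Comm_split would."""
    if procs_per_job <= 0 or world_size % procs_per_job != 0:
        raise ValueError("world_size must be a positive multiple of procs_per_job")
    return [(rank // procs_per_job, rank % procs_per_job) for rank in range(world_size)]


def group_members(assignments: list[tuple[int, int]]) -> dict[int, list[int]]:
    """Collect world ranks per color, sorted by key and then by world rank,
    mirroring how MPI_Comm_split orders ranks in each new communicator."""
    groups: dict[int, list[tuple[int, int]]] = defaultdict(list)
    for world_rank, (color, key) in enumerate(assignments):
        groups[color].append((key, world_rank))
    return {color: [rank for _, rank in sorted(members)] for color, members in groups.items()}


if __name__ == "__main__":
    # Toy example: 8 world ranks, 2 processes per simulation -> 4 simulations.
    for color, ranks in sorted(group_members(split_ranks(8, 2)).items()):
        print(f"simulation {color}: world ranks {ranks}")
```

With one process per simulation (`procs_per_job=1`), each rank becomes its own group, which corresponds to the single-process case described above.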
