Portable Performance of Large-Scale Physics Applications: Toward Targeting Heterogeneous Exascale Architectures Through Application Fitting
Student: William Killian (University of Delaware)
Supervisor: George Zagaris (Lawrence Livermore National Laboratory)
Abstract: Physics simulations are among the driving applications for supercomputing, and this trend is expected to continue as we transition to exascale computing. Modern and upcoming hardware designs expose tens to thousands of threads to applications, and achieving peak performance requires harnessing all available parallelism within a single node. In this work, we focus on two physics micro-benchmarks representative of kernels found in multi-physics codes. We map these onto three target architectures: Intel CPUs, IBM Blue Gene/Q, and NVIDIA GPUs. Speedups on CPUs reached 16x over our baseline, while speedups on Blue Gene/Q and GPUs peaked at 46x and 23x, respectively. We achieved 54% of peak performance on a single core. Combining compiler directives with additional architecture-aware source code utilities allowed for code portability. Based on our experience, we list a set of guidelines for programmers and scientists to follow toward attaining a single, performance-portable implementation.
Two-page extended abstract: pdf