Performance Comparison of the Multi-Zone Scalar Pentadiagonal (SP-MZ) NAS Parallel Benchmark on Many-Core Parallel Platforms
Authors: Christopher P. Stone (Computational Science and Engineering, LLC), Bracy Elton (Engility Corporation)
Abstract: The NAS multi-zone scalar pentadiagonal (SP-MZ) benchmark is representative of many CFD applications. Offloading this class of algorithm to many-core accelerator devices should boost application throughput and reduce time-to-solution. OpenACC and OpenMP compiler directives provide platform portability, hierarchical thread and vector parallelism, and simplified development for legacy applications. We examine the performance of the SP-MZ benchmark on clusters equipped with NVIDIA GPU and Intel Xeon Phi accelerators. Offloading the SP-MZ application to the accelerators was straightforward using the compiler directives; however, significant code restructuring was required to attain acceptable performance on the many-core accelerator devices. We implemented similar optimizations for the Intel Xeon Phi, via OpenMP, and for the NVIDIA Kepler GPU, via OpenACC, to increase both thread and vector parallelism. The two many-core accelerator devices performed comparably to each other and to HPC-grade multi-core host CPUs.
Two-page extended abstract: pdf
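The abstract does not spell out the restructuring it refers to. As a purely illustrative sketch, and not the authors' code, the C fragment below shows one common way to expose both thread and vector parallelism in a scalar pentadiagonal solve: many independent systems are stored "system-fastest", blocks of systems are distributed across threads, and the innermost loop over systems is marked for vectorization. The names penta_solve_batch and PENTA_BLOCK, the storage layout, and the absence of pivoting (which assumes diagonally dominant systems) are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block width: how many independent systems one thread
   sweeps together so that the innermost loop can be vectorized. */
enum { PENTA_BLOCK = 4 };

/* Solve nsys independent pentadiagonal systems of size n in place.
   Row i of each system reads
     a[i]*x[i-2] + b[i]*x[i-1] + c[i]*x[i] + d[i]*x[i+1] + e[i]*x[i+2] = f[i].
   Assumed layout: entry i of system s is stored at index i*nsys + s.
   Assumption: no pivoting, so each system should be diagonally dominant.
   b, c, d and f are overwritten; on return f holds the solution x.
   Band entries that fall outside the matrix are never read. */
void penta_solve_batch(long n, long nsys,
                       const double *a, double *b, double *c, double *d,
                       const double *e, double *f)
{
    assert(n >= 3 && nsys >= 1);

    /* Thread parallelism: each block of systems is independent. */
    #pragma omp parallel for
    for (long s0 = 0; s0 < nsys; s0 += PENTA_BLOCK) {
        long s1 = (s0 + PENTA_BLOCK < nsys) ? s0 + PENTA_BLOCK : nsys;

        /* Forward elimination: row i removes the sub-diagonal entries
           of rows i+1 and i+2. The recurrence in i is sequential. */
        for (long i = 0; i + 1 < n; ++i) {
            /* Vector parallelism: unit-stride loop over systems. */
            #pragma omp simd
            for (long s = s0; s < s1; ++s) {
                long p = i * nsys + s;   /* pivot row i   */
                long q = p + nsys;       /* row i + 1     */
                double m1 = b[q] / c[p];
                c[q] -= m1 * d[p];
                f[q] -= m1 * f[p];
                if (i + 2 < n) {
                    long r = q + nsys;   /* row i + 2     */
                    double m2 = a[r] / c[p];
                    d[q] -= m1 * e[p];
                    b[r] -= m2 * d[p];
                    c[r] -= m2 * e[p];
                    f[r] -= m2 * f[p];
                }
            }
        }

        /* Back substitution on the resulting upper-triangular band. */
        for (long i = n - 1; i >= 0; --i) {
            #pragma omp simd
            for (long s = s0; s < s1; ++s) {
                long p = i * nsys + s;
                double rhs = f[p];
                if (i + 1 < n) rhs -= d[p] * f[p + nsys];
                if (i + 2 < n) rhs -= e[p] * f[p + 2 * nsys];
                f[p] = rhs / c[p];
            }
        }
    }
}
```

Compiled without OpenMP support, the directives are ignored and the routine runs serially with the same results. An OpenACC variant would place the corresponding loop directives on the same two loop levels; that mapping is likewise an assumption here, not something the abstract describes.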