Directive-Based Pipelining Extension for OpenMP
Authors: Xuewen Cui (Virginia Polytechnic Institute and State University), Thomas R. W. Scogland (Lawrence Livermore National Laboratory), Bronis R. de Supinski (Lawrence Livermore National Laboratory), Wu-chun Feng (Virginia Polytechnic Institute and State University)
Abstract: Heterogeneity continues to increase in computing applications, with the rise of accelerators such as GPUs, FPGAs, APUs, and other co-processors. These devices have also become common in state-of-the-art supercomputers on the TOP500 list. Programming models such as CUDA, OpenMP, OpenACC, and OpenCL are designed to offload compute-intensive workloads to co-processors efficiently. However, the naive offload model, which copies data and executes kernels synchronously and in sequence, is inefficient. Pipelining these activities improves efficiency but reduces programmability.
We propose an easy-to-use directive-based pipelining extension for OpenMP. Our extension offers a simple interface to overlap data transfer and kernel computation with an auto-tuning scheduler. We achieve performance improvements between 40% and 60% for a Lattice QCD application.
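The abstract does not show the extension's directive syntax. For reference, here is a minimal sketch of the hand-written chunked pipeline that such an extension aims to replace. It assumes only standard OpenMP 4.5 device constructs. Each chunk's host-to-device copy, kernel, and device-to-host copy are issued as asynchronous target tasks (`nowait`) and chained with `depend` clauses, so the transfer of one chunk can overlap with computation on another. The function name `pipelined_scale`, the scaling kernel, and the fixed chunk count are hypothetical choices for illustration. The abstract instead describes an auto-tuning scheduler, whose tuning strategy is not specified here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: scale a[0..n) by `factor` on the default device,
 * splitting the array into `nchunks` pieces so that the transfer of one
 * chunk can overlap with the kernel running on another chunk. */
void pipelined_scale(double *a, int n, int nchunks, double factor)
{
    assert(a != NULL && n >= 0 && nchunks > 0);
    int chunk = (n + nchunks - 1) / nchunks;   /* ceiling division */
    if (chunk == 0)
        return;                                /* nothing to do for n == 0 */

    #pragma omp parallel
    #pragma omp single
    {
        for (int lo = 0; lo < n; lo += chunk) {
            int len = (n - lo < chunk) ? n - lo : chunk;

            /* Stage 1: copy this chunk to the device asynchronously. */
            #pragma omp target enter data map(to: a[lo:len]) \
                depend(out: a[lo]) nowait

            /* Stage 2: run the kernel once this chunk has arrived.
             * The data is already present, so this map causes no copy. */
            #pragma omp target teams distribute parallel for \
                map(tofrom: a[lo:len]) depend(inout: a[lo]) nowait
            for (int i = lo; i < lo + len; i++)
                a[i] *= factor;

            /* Stage 3: copy the result back and release device memory. */
            #pragma omp target exit data map(from: a[lo:len]) \
                depend(in: a[lo]) nowait
        }
        /* Wait until every chunk has finished all three stages. */
        #pragma omp taskwait
    }
}
```

If no device is available, the target regions fall back to the host. If the code is compiled without OpenMP, the pragmas are ignored. In both cases the result is the same and only the overlap is lost. Even this small kernel needs three directives per chunk plus manual dependence bookkeeping and a hand-picked chunk count. That is the programmability cost the proposed extension targets.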
Two-page extended abstract (PDF).