Improving Application Concurrency on GPUs by Managing Implicit and Explicit Synchronizations
Student: Michael C. Butler (University of Missouri)
Supervisor: Michela Becchi (University of Missouri)
Abstract: GPUs have progressively become part of shared computing environments, such as HPC servers and clusters. Commonly used GPU software stacks (e.g., CUDA and OpenCL), however, are designed for the dedicated use of a GPU by a single application, which can lead to resource underutilization. In recent years, several node-level runtime components have been proposed to allow the efficient sharing of GPUs among concurrent applications; however, they are limited by synchronizations embedded in the applications or implicitly introduced by the GPU software stack.
In this work, we analyze the effect of explicit and implicit synchronizations on application concurrency and GPU utilization, design runtime mechanisms to bypass these synchronizations, and integrate these mechanisms into a GPU virtualization runtime named Sync-Free GPU (SF-GPU). The resulting runtime removes unnecessary blocking caused by multitenancy, ensuring that any two applications running on the same device experience little to no interference. Finally, we evaluate the impact of our proposed mechanisms.
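To illustrate the kind of synchronization the abstract refers to (this is a generic CUDA sketch, not the SF-GPU implementation): operations issued on CUDA's default stream, such as a blocking cudaMemcpy, implicitly synchronize with work on all other streams of the device, which can stall a co-tenant application. Issuing work on a private non-blocking stream with pinned host memory keeps synchronization explicit and scoped to one application. All names below (the `work` kernel, buffer sizes) are illustrative.

```cuda
#include <cuda_runtime.h>

__global__ void work(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    float *h, *d;

    // Implicit synchronization (avoided here): a blocking cudaMemcpy on the
    // default stream serializes with kernels and copies in *all* streams,
    // so one tenant's transfer can stall another tenant's work.

    // Instead: pinned host memory plus a private non-blocking stream lets
    // this application's copies and kernels overlap with other tenants.
    cudaHostAlloc(&h, n * sizeof(float), cudaHostAllocDefault);
    cudaMalloc(&d, n * sizeof(float));
    cudaStream_t s;
    cudaStreamCreateWithFlags(&s, cudaStreamNonBlocking);

    cudaMemcpyAsync(d, h, n * sizeof(float), cudaMemcpyHostToDevice, s);
    work<<<(n + 255) / 256, 256, 0, s>>>(d, n);
    cudaMemcpyAsync(h, d, n * sizeof(float), cudaMemcpyDeviceToHost, s);
    cudaStreamSynchronize(s);  // explicit sync, scoped to this stream only

    cudaStreamDestroy(s);
    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}
```

A runtime mediating multiple tenants would additionally need to intercept or rewrite the blocking calls that applications already contain, which is the problem the work above addresses.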
Two-page extended abstract: pdf