STATuner: Efficient Tuning of CUDA Kernels Parameters
Authors: Ravi Gupta (Purdue University), Ignacio Laguna (Lawrence Livermore National Laboratory), Dong H. Ahn (Lawrence Livermore National Laboratory), Todd Gamblin (Lawrence Livermore National Laboratory), Saurabh Bagchi (Purdue University), Felix Xiaozhu Lin (Purdue University)
Abstract: CUDA programmers must choose a block size for each kernel launch, ideally the one that yields the lowest execution time. However, existing models for predicting the best block size are not always accurate and require substantial manual effort from programmers. We identify a set of static metrics that characterize a kernel and build a machine learning model that predicts the block size minimizing execution time for a kernel launch. We train the model on a set of kernels using these static metrics and compare its predictions on test kernels against NVIDIA's well-known Occupancy Calculator. The block sizes our model predicts give an average error of 4.4%, compared with 6.6% for the Occupancy Calculator. Our model requires no trial runs of the kernel and less effort than the Occupancy Calculator.
Two-page extended abstract: pdf
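
For readers unfamiliar with the occupancy-based baseline mentioned in the abstract, the following is a minimal sketch of how a theoretical occupancy estimate can be computed for candidate block sizes. It is not the authors' model and not NVIDIA's tool: the hardware limits, the function names occupancy and best_block_sizes, and the choice to ignore register and shared-memory allocation granularity are all assumptions made for illustration.

```python
# Simplified theoretical-occupancy estimate for candidate block sizes.
# ASSUMPTION: the per-SM limits below are illustrative values only; they
# are not taken from the abstract and differ across GPU architectures.
WARP_SIZE = 32
MAX_WARPS_PER_SM = 64
MAX_BLOCKS_PER_SM = 16
REGISTERS_PER_SM = 65536
SHARED_MEM_PER_SM = 49152  # bytes


def occupancy(block_size, regs_per_thread, smem_per_block):
    """Return active warps / maximum warps per SM for one block size.

    Allocation granularity is ignored, so this only approximates what a
    real occupancy calculator reports.
    """
    warps_per_block = -(-block_size // WARP_SIZE)  # ceiling division
    # Resident blocks allowed by the warp limit.
    limit_warps = MAX_WARPS_PER_SM // warps_per_block
    # Resident blocks allowed by the register file.
    regs_per_block = regs_per_thread * WARP_SIZE * warps_per_block
    limit_regs = (REGISTERS_PER_SM // regs_per_block
                  if regs_per_block else MAX_BLOCKS_PER_SM)
    # Resident blocks allowed by shared memory.
    limit_smem = (SHARED_MEM_PER_SM // smem_per_block
                  if smem_per_block else MAX_BLOCKS_PER_SM)
    blocks = min(MAX_BLOCKS_PER_SM, limit_warps, limit_regs, limit_smem)
    return blocks * warps_per_block / MAX_WARPS_PER_SM


def best_block_sizes(regs_per_thread, smem_per_block,
                     candidates=range(32, 1025, 32)):
    """Return every candidate block size that reaches the top occupancy."""
    scores = {b: occupancy(b, regs_per_thread, smem_per_block)
              for b in candidates}
    top = max(scores.values())
    return [b for b, s in scores.items() if s == top]


if __name__ == "__main__":
    # Hypothetical kernel: 32 registers per thread, no shared memory.
    print(best_block_sizes(regs_per_thread=32, smem_per_block=0))
```

Under these assumed limits, several block sizes tie at the highest occupancy for the hypothetical kernel, so an estimate of this kind narrows the candidates without singling out one launch configuration.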