Multi-Level Blocking Optimization for Fast Sparse Matrix Vector Multiplication on GPUs
Authors: Yusuke Nagasaka (Tokyo Institute of Technology), Akira Nukada (Tokyo Institute of Technology), Satoshi Matsuoka (Tokyo Institute of Technology)
Abstract: Many scientific and industrial simulations require solving large systems of linear equations, whose bottleneck is sparse matrix-vector multiplication (SpMV). Although previous work has improved SpMV performance on GPUs, critical bottlenecks remain, such as the high memory bandwidth requirement and the low cache hit ratio caused by random memory accesses to the input vector. We propose a new sparse matrix format that reduces memory access on GPUs. The Adaptive Multi-level Blocking (AMB) format compresses column indices to 16-bit integers through several blocking optimizations, and we also devise an efficient SpMV kernel for it. We evaluated the performance of our approach on 62 large positive definite matrices in single precision. The AMB format achieves significant speedups of up to x2.83 (x1.75 on average) over the cuSPARSE library and up to x1.38 (x1.08 on average) over yaSpMV, a recently proposed fast SpMV library.
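The index-compression idea named in the abstract can be sketched as follows. This is an illustrative reconstruction under one assumption, not the authors' AMB code: if the matrix is partitioned into column blocks no wider than 2^16, each nonzero's column index can be stored as a 16-bit offset from the block's first column, halving index storage versus 32-bit indices. The names `compress_columns` and `decompress_columns` are hypothetical.

```python
# Hypothetical sketch of blocked column-index compression (not the
# authors' actual AMB implementation).

BLOCK_WIDTH = 1 << 16  # 65536 columns per block, so offsets fit in uint16


def compress_columns(col_indices):
    """Split 32-bit column indices into (block_id, 16-bit offset) pairs."""
    compressed = []
    for c in col_indices:
        block_id = c // BLOCK_WIDTH
        offset = c % BLOCK_WIDTH  # always < 65536, fits in 16 bits
        compressed.append((block_id, offset))
    return compressed


def decompress_columns(compressed):
    """Recover the original full-width column indices."""
    return [b * BLOCK_WIDTH + off for b, off in compressed]


cols = [3, 70000, 131071, 200000]
packed = compress_columns(cols)
assert decompress_columns(packed) == cols          # lossless round trip
assert all(off < (1 << 16) for _, off in packed)   # offsets are uint16-sized
```

In an actual GPU kernel the block id would typically be stored once per block rather than per nonzero, so the per-nonzero index cost drops to 2 bytes, reducing the memory traffic that dominates SpMV.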
Two-page extended abstract: pdf