LIBXSMM: A High Performance Library for Small Matrix Multiplications
Authors: Alexander Heinecke (Intel Corporation), Hans Pabst (Intel Corporation), Greg Henry (Intel Corporation)
Abstract: In this work we present a library, LIBXSMM, that provides a high performance
implementation of small sparse and dense matrix multiplications on the latest Intel architectures. Such operations are important
building blocks in modern scientific applications, but general math libraries are normally tuned for the case in which all dimensions
are large. LIBXSMM instead follows a code generation approach that produces matrix multiplication kernels specifically matching the application's needs.
Because it provides several interfaces, replacing BLAS calls is simple and straightforward.
We show that, depending on the application's
characteristics, LIBXSMM either leverages the entire DRAM bandwidth or comes close to the processor's
computational peak performance.
Our performance results for CP2K and SeisSol
demonstrate that using LIBXSMM as a highly efficient computational
backend leads to speed-ups of more than 2x compared to compiler-generated
inlined code or calls to highly optimized vendor math libraries.
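
The code generation idea mentioned in the abstract can be illustrated with a toy sketch. It is not LIBXSMM's generator or API: LIBXSMM targets Intel instruction sets, whereas this example only emits Python source, and the names generate_gemm_source, build_kernel, and generic_gemm are hypothetical. What it shows is the principle of fixing the matrix dimensions at generation time so that the resulting kernel contains no loops or index arithmetic, in contrast to a general routine written for arbitrary sizes.

```python
def generate_gemm_source(m, n, k, name="gemm_kernel"):
    """Return Python source computing C += A * B for fixed dimensions.

    Matrices are flat row-major lists: A is m x k, B is k x n, C is m x n.
    All indices are computed here, at generation time, so the emitted
    kernel is fully unrolled. (Toy illustration only.)
    """
    if min(m, n, k) < 1:
        raise ValueError("all dimensions must be at least 1")
    lines = [f"def {name}(a, b, c):"]
    for i in range(m):
        for j in range(n):
            terms = " + ".join(
                f"a[{i * k + p}] * b[{p * n + j}]" for p in range(k)
            )
            lines.append(f"    c[{i * n + j}] += {terms}")
    return "\n".join(lines) + "\n"


def build_kernel(m, n, k):
    """Compile the generated source and return the specialized function."""
    namespace = {}
    exec(generate_gemm_source(m, n, k), namespace)
    return namespace["gemm_kernel"]


def generic_gemm(m, n, k, a, b, c):
    """Reference C += A * B with run-time dimensions (triple loop)."""
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i * k + p] * b[p * n + j]
            c[i * n + j] += acc


# Usage: build a kernel once for a 2x2 times 2x3 product, then reuse it.
kernel = build_kernel(2, 3, 2)
a = [1.0, 2.0, 3.0, 4.0]            # 2 x 2
b = [1.0, 0.0, 2.0, 0.0, 1.0, 3.0]  # 2 x 3
c = [0.0] * 6                       # 2 x 3 result, accumulated in place
kernel(a, b, c)
```

In this sketch the kernel is generated once per (m, n, k) triple and then called many times, which matches the usage pattern of applications that repeatedly multiply matrices of the same small shape.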
Two-page extended abstract: pdf