BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:2.0
BEGIN:VEVENT
DTSTART:20151117T231500Z
DTEND:20151118T010000Z
LOCATION:Level 4 - Lobby
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: Matrix multiplication is a fundamental performance primitive ubiquitous in all areas of science and engineering. In this work we present GiMMiK: a generator of bespoke matrix multiplication kernels for block-by-panel type multiplications where the block matrix is constant. GiMMiK exploits a priori knowledge of this matrix to generate highly performant CUDA code for NVIDIA GPUs. The performance advantage of GiMMiK kernels is particularly apparent when the matrix has some degree of sparsity. GiMMiK embeds matrix entries directly in the code and eliminates multiplications by zero. Together with the ability of GiMMiK kernels to avoid poorly optimised cleanup code, this allows GiMMiK to outperform cuBLAS on a variety of real-world problems. Speedups of 10 times are found on a K40c for a 294 × 1029 matrix with 99% sparsity. GiMMiK is open source and released under a three-clause BSD license.
SUMMARY:Beating cuBLAS: Automatically Generating Bespoke Matrix Multiplication Kernels Using GiMMiK
PRIORITY:3
END:VEVENT
END:VCALENDAR