libSkylark: A Framework for High-Performance Matrix Sketching for Statistical Computing
Authors: Georgios Kollias (IBM Corporation), Yves Ineichen (IBM Corporation), Haim Avron (IBM Corporation), Vikas Sindhwani (Google), Ken Clarkson (IBM Corporation), Costas Bekas (IBM Corporation), Alessandro Curioni (IBM Corporation)
Abstract: Matrix-based operations lie at the heart of many tasks in machine learning and statistics. Sketching the corresponding matrices compresses them while preserving their key properties. Performing the tasks over the sketched matrices dramatically reduces execution time, while the results retain provable error bounds within practically useful approximation tolerances. libSkylark is a high-performance framework for sketching potentially huge, distributed matrices and then applying the associated statistical computing workflows to them. Sketching typically involves projections onto randomized directions, computed in parallel. libSkylark integrates state-of-the-art parallel pseudorandom number generators, whose streams are computed lazily, with communication-minimizing techniques for applying them to distributed matrix objects; the output is then chained into distributed numerical linear algebra and machine learning kernels. We present scalability results for the sketching layer and example applications of our framework in natural language processing and speech recognition.
Two-page extended abstract: pdf
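To make the idea of sketching concrete, here is a minimal single-node sketch in plain NumPy. It does not use libSkylark's API. It assumes a dense Gaussian sketching matrix, which is one common choice of randomized projection among several, and the function names (`gaussian_sketch`, `sketched_least_squares`) and problem sizes are hypothetical, chosen only for illustration. A tall least-squares problem is compressed from n rows to a much smaller number of rows, the small problem is solved, and its residual is compared with that of the exact solution.

```python
import numpy as np


def gaussian_sketch(M, sketch_rows, rng):
    """Return S @ M, where S is a (sketch_rows x n) matrix of i.i.d.
    N(0, 1/sketch_rows) entries, i.e. a random projection of the rows of M."""
    n = M.shape[0]
    S = rng.standard_normal((sketch_rows, n)) / np.sqrt(sketch_rows)
    return S @ M


def sketched_least_squares(A, b, sketch_rows, seed=0):
    """Approximately solve min_x ||A x - b||_2 on a sketched problem."""
    rng = np.random.default_rng(seed)
    # Sketch [A | b] in one pass so that A and b see the same projection S.
    SAb = gaussian_sketch(np.column_stack([A, b]), sketch_rows, rng)
    SA, Sb = SAb[:, :-1], SAb[:, -1]
    x, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
    return x


# Illustrative problem sizes (assumptions, not taken from the paper).
rng = np.random.default_rng(42)
n, d = 5000, 20
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.1 * rng.standard_normal(n)

# Exact solution on the full n x d problem.
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
# Approximate solution on a 400 x d sketched problem.
x_sketch = sketched_least_squares(A, b, sketch_rows=400)

res_exact = np.linalg.norm(A @ x_exact - b)
res_sketch = np.linalg.norm(A @ x_sketch - b)
ratio = res_sketch / res_exact  # never below 1; close to 1 for a good sketch
print(f"residual ratio (sketched / exact): {ratio:.4f}")
```

The sketched solve works on a 400-row matrix instead of a 5000-row one, yet its residual on the original problem stays within a modest factor of the optimum. Forming a dense Gaussian S explicitly, as done here, is only practical at small scale; the abstract's emphasis on lazily computed random streams and communication minimization addresses exactly the cost of applying such projections to large distributed matrices.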