Efficient Large-Scale Sparse Eigenvalue Computations on Heterogeneous Hardware
Authors: Moritz Kreutzer (Friedrich-Alexander University Erlangen-Nürnberg), Andreas Pieper (Ernst Moritz Arndt University of Greifswald), Andreas Alvermann (Ernst Moritz Arndt University of Greifswald), Holger Fehske (Ernst Moritz Arndt University of Greifswald), Georg Hager (Friedrich-Alexander University Erlangen-Nürnberg), Gerhard Wellein (Friedrich-Alexander University Erlangen-Nürnberg), Alan R. Bishop (Los Alamos National Laboratory)
Best Poster Finalist
Abstract: Problems in quantum physics often require determining the spectral properties of large, sparse matrices.
For instance, an approximation to the full spectrum or a number of inner eigenvalues can be computed with algorithms based on the evaluation of Chebyshev polynomials.
We identify the relevant bottlenecks of this class of algorithms and develop a reformulated version that increases the computational intensity, and thus potentially the efficiency, mainly by employing kernel fusion and vector blocking.
The optimized algorithm requires a manual implementation of compute kernels.
Guided by a performance model, we show the capabilities of our fully heterogeneous implementation on a petascale system.
Based on MPI+OpenMP/CUDA, our approach utilizes all parts of a heterogeneous CPU+GPU system with high efficiency.
Finally, our scaling study on up to 4096 heterogeneous nodes reveals a performance of half a petaflop/s, which corresponds to 11% of LINPACK performance for an originally bandwidth-bound sparse linear algebra problem.
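To illustrate what kernel fusion and vector blocking mean for a Chebyshev-based scheme, the following is a minimal NumPy sketch and not the authors' MPI+OpenMP/CUDA implementation. It assumes the matrix A is stored in CSR format and has already been rescaled so that its spectrum lies in [-1, 1]. The function names fused_chebyshev_step and dense_to_csr are hypothetical, and the choice of which dot products to accumulate is an assumption made for illustration. One step of the Chebyshev recurrence, w = 2 A v - u, is applied to a block of vectors at once (vector blocking), and the scaling, subtraction, and dot products are computed in the same pass over the matrix rows instead of in separate vector kernels (kernel fusion).

```python
import numpy as np


def dense_to_csr(a):
    """Convert a small dense matrix to CSR arrays (helper for the demo only)."""
    indptr, indices, data = [0], [], []
    for row in a:
        cols = np.nonzero(row)[0]
        indices.extend(cols)
        data.extend(row[cols])
        indptr.append(len(indices))
    return (np.array(indptr),
            np.array(indices, dtype=int),
            np.array(data, dtype=float))


def fused_chebyshev_step(indptr, indices, data, v, u):
    """One fused, blocked Chebyshev step.

    v and u have shape (n_rows, n_vecs): each column is one vector of the
    block. Returns w = 2*A@v - u together with the column-wise dot
    products <v|v> and <w|v>, all computed in a single sweep over the rows.
    """
    n_rows, n_vecs = v.shape
    w = np.empty_like(v)
    dot_vv = np.zeros(n_vecs)
    dot_wv = np.zeros(n_vecs)
    for i in range(n_rows):
        # Sparse row times block of vectors: each matrix entry is loaded
        # once and reused for all n_vecs columns.
        acc = np.zeros(n_vecs)
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * v[indices[k]]
        # Fused update: recurrence and dot products while row i is at hand.
        w[i] = 2.0 * acc - u[i]
        dot_vv += v[i] * v[i]
        dot_wv += w[i] * v[i]
    return w, dot_vv, dot_wv
```

In this sketch each matrix entry is read once per step and reused for every vector in the block, and v, u, and w are not traversed again for the scaling or the dot products. This reduction in memory traffic per floating-point operation is the kind of increase in computational intensity the abstract refers to. The explicit Python loops are for readability only and say nothing about performance.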
Two-page extended abstract: pdf