LOCATION:18AB
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: We consider techniques to improve the performance of parallel sparse triangular solution on non-uniform memory access (NUMA) multicores by extending earlier coloring and level set schemes for single-core multiprocessors. We develop STS-k, where k represents a small number of transformations that reduce latency by increasing the spatial and temporal locality of data accesses. We propose a graph model of data reuse to inform the development of STS-k and to prove that computing an optimal-cost schedule is NP-complete. We observe significant speed-ups with STS-3 on a 32-core Intel Westmere-Ex. Execution times are reduced on average by a factor of 6 (83%) for STS-3 with coloring compared to a reference implementation using level sets. Incremental gains solely from the k-level transformations in STS-k correspond to reductions in execution times by factors of 1.4 (28%) and 2 (50%), respectively, relative to reference implementations with level sets and coloring.
SUMMARY:STS-k: A Multilevel Sparse Triangular Solution Scheme for NUMA Multicores
