Non-Blocking Preconditioned Conjugate Gradient Methods for Extreme-Scale Computing
Student: Paul Eller (University of Illinois at Urbana-Champaign)
Supervisor: William Gropp (University of Illinois at Urbana-Champaign)
Abstract: To achieve the best performance on extreme-scale systems, we need to develop more scalable methods. For the preconditioned conjugate gradient method (PCG), dot products limit scalability because each one requires an allreduce, which is a global synchronization point. Non-blocking methods have the potential to hide most of the cost of the allreduce and to avoid the synchronization cost caused by performance variation across cores.
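To make the synchronization points concrete, here is a minimal serial sketch of textbook PCG in plain Python. It is not taken from the poster: the Jacobi (diagonal) preconditioner, the helper names `matvec`, `dot`, and `pcg`, and the 2x2 test system are all assumptions chosen for illustration. In a distributed-memory run, each call to `dot` inside the loop would end in a blocking allreduce.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(x, y):
    # Serial stand-in: on a distributed machine this local sum would be
    # followed by a blocking allreduce, i.e. a global synchronization point.
    return sum(a * b for a, b in zip(x, y))


def pcg(A, b, tol=1e-10, max_iter=100):
    """Textbook PCG with a Jacobi preconditioner (an illustrative choice)."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]   # M^-1 for Jacobi
    x = [0.0] * n
    r = list(b)                                    # r = b - A*x with x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]     # z = M^-1 r
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iter):
        if rz ** 0.5 < tol:                        # preconditioned residual norm
            return x, it
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)                    # synchronization point 1
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)                         # synchronization point 2
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Small symmetric positive definite system, made up for this example.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iterations = pcg(A, b)
```

The two reductions per iteration cannot be merged as written, because `alpha` is needed to update `r` before the second dot product can be formed.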
We study three scalable methods that rearrange PCG to reduce communication latency by using a single allreduce (L56PCG, PIPECG) and/or to overlap communication and computation by using a non-blocking allreduce (NBPCG, PIPECG). Tests on up to 32k cores of Blue Waters show that current non-blocking solver implementations cannot overlap communication and computation efficiently enough to overcome the increased cost of vector operations. However, performance models show that non-blocking solvers have the potential to be more scalable than PCG, and that the ability to minimize the impact of noise throughout PCG may be a key benefit.
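As a rough illustration of the single-allreduce idea, the sketch below uses the well-known Chronopoulos-Gear rearrangement of PCG, in which both inner products of an iteration are formed together. This is an assumption made for illustration and is not necessarily the L56PCG, NBPCG, or PIPECG formulation studied in the poster. The helper names (`cg_matvec`, `fused_dots`, `single_reduction_pcg`), the Jacobi preconditioner, and the test system are likewise hypothetical. The code is serial; in an MPI code the pair returned by `fused_dots` would be packed into one buffer and reduced with one allreduce, or with a non-blocking `MPI_Iallreduce` if the reduction is to be overlapped with other work.

```python
def cg_matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def fused_dots(r, u, w):
    # Both local sums are formed in one pass. In a distributed run the two
    # values would travel in a single two-element allreduce.
    gamma = 0.0
    delta = 0.0
    for ri, ui, wi in zip(r, u, w):
        gamma += ri * ui
        delta += wi * ui
    return gamma, delta


def single_reduction_pcg(A, b, tol=1e-10, max_iter=100):
    """Chronopoulos-Gear style PCG with a Jacobi preconditioner (a sketch)."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]   # M^-1 for Jacobi
    x = [0.0] * n
    r = list(b)                                    # r = b - A*x with x = 0
    u = [d * ri for d, ri in zip(inv_diag, r)]     # u = M^-1 r
    w = cg_matvec(A, u)                            # w = A u
    p = [0.0] * n
    s = [0.0] * n                                  # s tracks A p via recurrence
    gamma_old = alpha_old = None
    for it in range(max_iter):
        gamma, delta = fused_dots(r, u, w)         # the only reduction per iteration
        if gamma ** 0.5 < tol:
            return x, it
        if it == 0:
            beta = 0.0
            alpha = gamma / delta
        else:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha_old)
        # Extra vector updates compared with textbook PCG: s is maintained
        # by recurrence instead of being recomputed as A p.
        p = [ui + beta * pi for ui, pi in zip(u, p)]
        s = [wi + beta * si for wi, si in zip(w, s)]
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * si for ri, si in zip(r, s)]
        u = [d * ri for d, ri in zip(inv_diag, r)]
        w = cg_matvec(A, u)
        gamma_old, alpha_old = gamma, alpha
    return x, max_iter


# Small symmetric positive definite system, made up for this example.
A_demo = [[4.0, 1.0], [1.0, 3.0]]
b_demo = [1.0, 2.0]
x_demo, iters_demo = single_reduction_pcg(A_demo, b_demo)
```

The additional recurrence for `s` is one example of the extra vector work that such rearrangements trade for fewer synchronization points.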
Two-page extended abstract: pdf