BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:2.0
BEGIN:VEVENT
DTSTART:20151117T231500Z
DTEND:20151118T010000Z
LOCATION:Level 4 - Lobby
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: To achieve the best performance on extreme-scale systems, we need to develop more scalable methods. For the preconditioned conjugate gradient method (PCG), dot products limit scalability because each one is a synchronization point. Non-blocking methods have the potential to hide most of the cost of the allreduce and avoid the synchronization cost due to performance variation across cores.=0A=0AWe study three scalable methods that rearrange PCG to reduce communication latency by using a single allreduce (L56PCG, PIPECG) and/or to overlap communication and computation using a non-blocking allreduce (NBPCG, PIPECG). Tests on up to 32k cores of Blue Waters show that current non-blocking solver implementations cannot overlap communication and computation efficiently enough to overcome the increased cost of vector operations. However, performance models show potential for non-blocking solvers to be more scalable than PCG. The models also suggest that the ability to minimize the impact of noise throughout PCG may be a key benefit.
SUMMARY:Non-Blocking Preconditioned Conjugate Gradient Methods for Extreme-Scale Computing
PRIORITY:3
END:VEVENT
END:VCALENDAR