BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:1.0
BEGIN:VEVENT
DTSTART:20151117T173000Z
DTEND:20151117T180000Z
LOCATION:18AB
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: We present a scalable implementation of the Linearized Augmented Plane Wave method for distributed memory systems. It relies on an efficient distributed, block-cyclic setup of the Hamiltonian and overlap matrices and allows us to turn around highly accurate 1000+ atom all-electron quantum materials simulations on clusters with a few hundred nodes. The implementation runs efficiently on standard multi-core CPU nodes as well as on hybrid CPU-GPU nodes. Key to the latter is a novel algorithm for solving the generalized eigenvalue problem for dense, complex Hermitian matrices on distributed hybrid CPU-GPU systems. Performance tests for Li-intercalated CoO$_2$ supercells containing 1501 atoms demonstrate that high-accuracy, transferable quantum simulations can now be used in high-throughput materials search problems. A systematic comparison between our new hybrid solver and the ELPA2 library shows that the hybrid CPU-GPU architecture is considerably more energy efficient than traditional multi-core CPU-only systems for such complex applications.
SUMMARY:Efficient Implementation of Quantum Materials Simulations on Distributed CPU-GPU Systems
PRIORITY:3
END:VEVENT
END:VCALENDAR