BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:2.0
BEGIN:VEVENT
DTSTART:20151117T231500Z
DTEND:20151118T010000Z
LOCATION:Level 4 - Lobby
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: We present a new distributed-FFT library. Despite the extensive work on FFTs, we achieve significant speedups. Our library uses novel all-to-all communication algorithms to overcome this barrier. These schemes are modified for GPUs to effectively hide PCI-e overhead. Even though we do not use GPUDirect technology, the GPU times are either better than or almost the same as the CPU times (corresponding to 16 or 20 CPU cores). We present performance results on the Maverick and Stampede platforms at the Texas Advanced Computing Center (TACC) and on the Titan system at the Oak Ridge National Laboratory (ORNL). Comparison with the P3DFFT and PFFT libraries shows a consistent $2-3\times$ speedup across a range of processor counts and problem sizes. Comparison with the FFTE library (GPU only) shows a similar trend, with a $2\times$ speedup. The library has been tested on up to 131K cores and 4,096 GPUs of Titan, and on up to 16K cores of Stampede.
SUMMARY:AccFFT: A New Parallel FFT Library for CPU and GPU Architectures
PRIORITY:3
END:VEVENT
END:VCALENDAR