SCHEDULE: NOV 15-20, 2015

AccFFT: A New Parallel FFT Library for CPU and GPU Architectures

SESSION: Regular & ACM Student Research Competition Poster Reception

EVENT TYPE: Posters, Receptions, ACM Student Research Competition

EVENT TAG(S): HPC Beginner Friendly, ACM Student Research Competition Poster

TIME: 5:15PM - 7:00PM

SESSION CHAIR(S): Michela Becchi, Manish Parashar, Dorian C. Arnold

AUTHOR(S): Amir Gholami

ROOM: Level 4 - Lobby


We present a new distributed-FFT library. Despite the extensive prior work on FFTs, we achieve significant speedups. Our library uses novel all-to-all communication algorithms to overcome the communication bottleneck of distributed FFTs. These schemes are modified for GPUs to effectively hide PCI-e overhead. Even though we do not use GPUDirect technology, the GPU times are either better than or almost the same as the CPU times (corresponding to 16 or 20 CPU cores). We present performance results on the Maverick and Stampede platforms at the Texas Advanced Computing Center (TACC) and on the Titan system at the Oak Ridge National Laboratory (ORNL). Comparisons with the P3DFFT and PFFT libraries show a consistent $2-3\times$ speedup across a range of processor counts and problem sizes. Comparison with the FFTE library (GPU only) shows a similar trend, with a $2\times$ speedup. The library has been tested on up to 131K cores and 4,096 GPUs of Titan, and on up to 16K cores of Stampede.
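For readers new to distributed FFTs, the following minimal sketch illustrates why an all-to-all exchange sits at the heart of such a library. It is not AccFFT's API or its communication algorithm; it is the generic textbook transpose pattern, simulated in a single process. The function names, the slab (row-wise) decomposition, the square input, and the naive O(n^2) DFT standing in for a tuned local FFT are all assumptions made for illustration.

```python
import cmath


def dft(x):
    """Naive O(n^2) 1D DFT; a stand-in for a tuned local FFT routine."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def all_to_all_transpose(slabs):
    """Simulate the all-to-all step (hypothetical helper).

    slabs[p] holds the rows owned by rank p, each of full length n.
    The result gives rank q a contiguous block of full-length columns.
    """
    ranks = len(slabs)
    n = len(slabs[0][0])
    cols_per_rank = n // ranks
    out = []
    for q in range(ranks):
        local_cols = []
        for c in range(q * cols_per_rank, (q + 1) * cols_per_rank):
            # Column c is assembled from pieces held by every rank.
            local_cols.append([row[c] for p in range(ranks) for row in slabs[p]])
        out.append(local_cols)
    return out


def distributed_fft2(a, ranks):
    """2D DFT of a square n x n array split into row slabs over `ranks`."""
    n = len(a)
    assert n % ranks == 0, "sketch assumes n divisible by the rank count"
    rows_per_rank = n // ranks
    slabs = [a[p * rows_per_rank:(p + 1) * rows_per_rank] for p in range(ranks)]
    # Step 1: each rank transforms along the dimension it holds locally.
    slabs = [[dft(row) for row in slab] for slab in slabs]
    # Step 2: global transpose, the communication-heavy all-to-all.
    slabs = all_to_all_transpose(slabs)
    # Step 3: the other dimension is now local, so transform it too.
    slabs = [[dft(col) for col in slab] for slab in slabs]
    # Gather the column slabs back into row-major order for inspection.
    cols = [col for slab in slabs for col in slab]
    return [[cols[c][r] for c in range(n)] for r in range(n)]


def direct_dft2(a):
    """Reference 2D DFT computed straight from the definition."""
    n = len(a)
    return [
        [
            sum(
                a[j1][j2] * cmath.exp(-2j * cmath.pi * (j1 * k1 + j2 * k2) / n)
                for j1 in range(n)
                for j2 in range(n)
            )
            for k2 in range(n)
        ]
        for k1 in range(n)
    ]
```

In a real distributed setting, step 2 moves data between every pair of ranks, which is why the cost of the all-to-all exchange tends to dominate at scale.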

Chair/Author Details:

Michela Becchi, Manish Parashar, Dorian C. Arnold (Chair) - University of Missouri | Rutgers University | University of New Mexico

Amir Gholami - The University of Texas at Austin

