Optimization Strategies for Materials Science Applications on Cori: An Intel Knights Landing, Many Integrated Core Architecture
Student: Luther D. Martin (National Energy Research Scientific Computing Center)
Supervisor: Zhengji Zhao (Lawrence Berkeley National Laboratory)
Abstract: NERSC is preparing for the arrival of its Cray XC40 machine, dubbed Cori. Cori is built on Intel’s Knights Landing architecture. Each compute node will have 72 physical cores with 4 hardware threads per core, which is 6x as many physical cores and 10x as many virtual cores (hardware threads) as NERSC's Cray XC30 machine. Cori also has wider hardware vector units (512 bits) and high-bandwidth, on-package memory. While most applications that currently run on NERSC's XC30 machine will be able to execute on Cori with little or no code refactoring, they will not be optimized for the new architecture and may suffer a loss in performance. This paper reports on the effectiveness of three optimization strategies applied to the materials science application VASP:
1. Increasing on-node parallelism by adding OpenMP where applicable.
2. Refactoring code so that compilers can vectorize loops.
3. Identifying candidate arrays for the high-bandwidth, on-package memory.
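The three strategies can be illustrated with a minimal C sketch. It is not taken from VASP: the helper names `alloc_hot_array`, `free_hot_array`, `axpy`, `dot` and the `USE_MEMKIND` build flag are hypothetical. The loops show strategies 1 and 2: an OpenMP `parallel for simd` construct spreads iterations across threads, while unit-stride, non-aliased (`restrict`) access with no loop-carried dependence lets the compiler fill each 512-bit vector register with 8 double-precision values. The allocator shows strategy 3, assuming the memkind library's `hbwmalloc` interface (`hbw_check_available`, `hbw_malloc`, `hbw_free`) is installed; without it, the sketch falls back to `malloc`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#ifdef USE_MEMKIND          /* hypothetical build flag: -DUSE_MEMKIND -lmemkind */
#include <hbwmalloc.h>      /* memkind's high-bandwidth-memory allocator */
#endif

/* Strategy 3 (sketch): place a bandwidth-critical array in on-package
 * memory when memkind reports it is available; otherwise use malloc. */
static double *alloc_hot_array(size_t n)
{
    assert(n > 0);
#ifdef USE_MEMKIND
    if (hbw_check_available() == 0)
        return hbw_malloc(n * sizeof(double));
#endif
    return malloc(n * sizeof(double));
}

/* Must mirror alloc_hot_array: hbw_malloc'd memory goes to hbw_free. */
static void free_hot_array(double *p)
{
#ifdef USE_MEMKIND
    if (hbw_check_available() == 0) {
        hbw_free(p);
        return;
    }
#endif
    free(p);
}

/* Strategies 1 and 2: y[i] += a * x[i].
 * Unit-stride access, restrict (no aliasing) and no loop-carried
 * dependence let the compiler vectorize; the OpenMP pragma also splits
 * the iterations across threads (it is ignored, and the loop runs
 * serially, if the code is built without OpenMP). */
static void axpy(size_t n, double a, const double *restrict x,
                 double *restrict y)
{
    #pragma omp parallel for simd schedule(static)
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* A reduction needs an explicit clause so that each thread and each
 * vector lane keeps a private partial sum that is combined at the end. */
static double dot(size_t n, const double *restrict x,
                  const double *restrict y)
{
    double sum = 0.0;
    #pragma omp parallel for simd reduction(+:sum) schedule(static)
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

With GCC, for example, the sketch could be built with `-O3 -fopenmp -march=knl`, adding `-DUSE_MEMKIND -lmemkind` where memkind is available. Because on-package memory is much smaller than the node's main memory, placing arrays one at a time, as here, lets only the bandwidth-bound arrays occupy it.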
Two-page extended abstract (PDF).