Investigating Prefetch Potential on the Xeon Phi with Autotuning
Authors: Saami Rahman (Texas State University), Ziliang Zong (Texas State University), Apan Qasem (Texas State University)
Abstract: Prefetching is a well-known technique for hiding memory latency. Modern compilers analyze the program and insert prefetch instructions into the compiled binary. The Intel C Compiler (ICC) lets the programmer specify two parameters, -opt-prefetch and -opt-prefetch-distance, that help the compiler insert more accurate and timely prefetch instructions. When they are unspecified, ICC falls back on default heuristics. In this work, we present the results of autotuning these two parameters and the effect on performance and energy. Choosing the parameters by hand is challenging and time-consuming, as it requires knowledge of the program's memory access patterns. We have developed a simple autotuning framework for the Xeon Phi architecture that automatically tunes these two parameters for any given program. We used the framework on four memory-intensive programs and gained up to 1.47x speedup and 1.39x greenup.
Two-page extended abstract: pdf
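The search over the two parameters named in the abstract can be pictured as a small exhaustive sweep. The sketch below is illustrative only and is not the authors' framework: the value ranges, the function names candidate_flags and autotune, and the idea of passing in a measure callback (which in practice might compile the program with the given flags, run it, and return time or energy) are all assumptions made for this example.

```python
from itertools import product
from math import inf
from typing import Callable, Iterable, List, Sequence, Tuple

Flags = Tuple[str, str]

# Assumed search ranges; the abstract does not say which values were explored.
PREFETCH_LEVELS = range(0, 5)               # hypothetical prefetch levels
PREFETCH_DISTANCES = (1, 2, 4, 8, 16, 32)   # hypothetical distances


def candidate_flags(levels: Iterable[int], distances: Iterable[int]) -> List[Flags]:
    """Build every (level, distance) flag pair in the search space."""
    return [
        (f"-opt-prefetch={lvl}", f"-opt-prefetch-distance={dist}")
        for lvl, dist in product(levels, distances)
    ]


def autotune(measure: Callable[[Flags], float],
             candidates: Sequence[Flags]) -> Tuple[Flags, float]:
    """Return the flag pair with the lowest measured cost.

    `measure` is supplied by the caller (hypothetical); it should return
    a cost such as run time in seconds or energy in joules.
    """
    best_flags, best_cost = None, inf
    for flags in candidates:
        cost = measure(flags)
        if cost < best_cost:
            best_flags, best_cost = flags, cost
    if best_flags is None:
        raise ValueError("no candidates to evaluate")
    return best_flags, best_cost
```

A caller would pass candidate_flags(PREFETCH_LEVELS, PREFETCH_DISTANCES) together with its own measurement function. Exhaustive search is used here only because the space is small; a real tuner might prune or sample it instead.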