Emulating In-Memory Data Rearrangement for HPC Applications
Authors: Christopher W. Hajas (University of Florida), G. Scott Lloyd (Lawrence Livermore National Laboratory), Maya B. Gokhale (Lawrence Livermore National Laboratory)
Abstract: As the bandwidth requirements of scientific applications continue to increase, novel memory architectures are required to support them. The Hybrid Memory Cube is a high-bandwidth memory architecture that combines a logic layer with stacked DRAM. The logic layer supports memory transactions; adding custom logic functions to it for near-memory computation is the subject of ongoing research.
We propose a Data Rearrangement Engine in the logic layer to accelerate data-intensive, cache-unfriendly applications with irregular memory accesses, minimizing DRAM latency by coalescing disjoint memory accesses. Using a custom FPGA emulation framework, we measured a 1.4x speedup on a sparse-matrix, dense-vector multiplication (SpMV) benchmark. We explored the multi-dimensional parameter space to maximize speedup and to determine the best cache invalidation and memory access coalescing schemes for matrices of various sizes and densities.
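To make the access pattern concrete, the following is a minimal software sketch of the idea, not the authors' hardware design or emulation framework. It assumes a matrix in compressed sparse row (CSR) form; the function names `spmv_csr`, `gather`, and `spmv_csr_gathered` are hypothetical. The baseline reads the dense vector through an index array, which is the irregular access the abstract describes. The second version first packs those scattered elements into a contiguous buffer, standing in for what a rearrangement step near memory might do, so that the multiply loop reads both operands sequentially.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Baseline CSR SpMV: x is read through col_idx, an irregular access pattern."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for j in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[j] * x[col_idx[j]]
    return y


def gather(x, col_idx):
    """Software stand-in for rearrangement: pack scattered elements of x contiguously."""
    return [x[c] for c in col_idx]


def spmv_csr_gathered(row_ptr, values, x_packed):
    """SpMV over the packed buffer: values and x_packed are both read sequentially."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for j in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[j] * x_packed[j]
    return y


# Illustrative 3x4 matrix (made-up data, not from the benchmark):
# [[1, 0, 2, 0],
#  [0, 0, 0, 3],
#  [4, 5, 0, 0]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 3, 0, 1]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0, 4.0]

y_base = spmv_csr(row_ptr, col_idx, values, x)
y_packed = spmv_csr_gathered(row_ptr, values, gather(x, col_idx))
```

Both versions compute the same product; the difference is only in where the indirect reads happen. In this sketch the gather still runs on the host, so it shows the data flow rather than any performance benefit.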
Two-page extended abstract: pdf