Out-of-Core Sorting Acceleration using GPU and Flash NVM
Authors: Hitoshi Sato (Tokyo Institute of Technology), Ryo Mizote (Tokyo Institute of Technology), Satoshi Matsuoka (Tokyo Institute of Technology)
Abstract: We propose xtr2sort, a sample-sort-based out-of-core sorting acceleration technique for the multi-level memory hierarchy of GPU, CPU, and Flash NVM, which we take as an instance of future computing systems with deep memory hierarchies. Our approach splits the input records into chunks that fit in GPU memory and asynchronously overlaps three stages: I/O operations between Flash NVM and CPU, data transfers between CPU and GPU, and sorting on the GPU. Experimental results show that xtr2sort can sort inputs up to 64 times larger than in-core GPU sorting and 4 times larger than in-core CPU sorting, and runs 2.16 times faster than out-of-core CPU sorting with 72 threads, even when the input records fit in neither CPU nor GPU memory. The results indicate that combining I/O chunking with latency hiding works well for GPU and Flash NVM, and that it is a promising approach for future big data processing with extreme computing techniques.
Two-page extended abstract: pdf
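
The chunking idea in the abstract can be illustrated with a minimal CPU-only sketch of sample sort. This is not the xtr2sort implementation: the function names (`choose_splitters`, `partition`, `out_of_core_sort`) and the parameters `num_chunks` and `sample_size` are hypothetical, an in-memory list stands in for records on Flash NVM, and a thread pool stands in for the asynchronous GPU pipeline. In CPython the threads only show the structure of the overlap and do not give a real speedup.

```python
import bisect
import random
from concurrent.futures import ThreadPoolExecutor


def choose_splitters(records, num_chunks, sample_size, rng):
    """Pick up to num_chunks - 1 splitters from a random sample of the input."""
    sample = sorted(rng.sample(records, min(sample_size, len(records))))
    if not sample:
        return []
    step = len(sample) / num_chunks
    # Evenly spaced positions in the sorted sample give non-decreasing splitters.
    return [sample[int(i * step)] for i in range(1, num_chunks)]


def partition(records, splitters):
    """Route each record to the bucket whose key range contains it."""
    buckets = [[] for _ in range(len(splitters) + 1)]
    for r in records:
        buckets[bisect.bisect_right(splitters, r)].append(r)
    return buckets


def out_of_core_sort(records, num_chunks=4, sample_size=64, seed=0):
    """Sample-sort sketch: partition by splitters, sort chunks, concatenate."""
    rng = random.Random(seed)
    splitters = choose_splitters(records, num_chunks, sample_size, rng)
    buckets = partition(records, splitters)
    # Assumption: each bucket is small enough for the fast memory level.
    # A real system would have to enforce this, since sampling only makes
    # balanced buckets likely, not guaranteed.
    with ThreadPoolExecutor(max_workers=3) as pool:
        sorted_chunks = list(pool.map(sorted, buckets))
    # Buckets cover disjoint, increasing key ranges, so no merge is needed.
    return [r for chunk in sorted_chunks for r in chunk]


if __name__ == "__main__":
    data = random.Random(1).sample(range(10_000), 1_000)
    print(out_of_core_sort(data) == sorted(data))
```

Because the splitters divide the key space into disjoint ranges, each chunk can be sorted independently and the results concatenated without a final merge pass, which is what makes the per-chunk stages easy to pipeline.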