In this paper, we present a GPU-based sorting algorithm, GPUMemSort, which achieves high performance
in sorting large-scale in-memory data by exploiting highly parallel GPU processors. It consists of two algorithms:
an in-core algorithm, which sorts data efficiently in
GPU global memory, and an out-of-core algorithm,
which divides large-scale data into multiple chunks that fit in GPU global memory. GPUMemSort
is implemented on the NVIDIA CUDA framework, and
its critical optimization methods are
presented in detail. We ran tests of the different algorithms
on multiple data sets. The experimental results show that
our in-core sorting algorithm can outperform other comparison-based
algorithms and that GPUMemSort is highly effective in sorting
large-scale in-memory data.
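
To make the two-level structure concrete, the following is a minimal host-side sketch in C++ and not the GPUMemSort implementation. It assumes that each chunk is sorted independently and that the sorted chunks are then combined with a k-way merge; the abstract does not state how chunks are recombined, so the merge step is our assumption. The names chunked_sort, sort_chunk_in_core, and chunk_capacity are hypothetical, and std::sort stands in for the GPU in-core sort.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for the in-core step. In a GPU setting this would
// sort one chunk resident in GPU global memory; here it is a plain CPU sort.
void sort_chunk_in_core(std::vector<int>::iterator first,
                        std::vector<int>::iterator last) {
    std::sort(first, last);
}

// Out-of-core sketch: split the data into chunks of at most chunk_capacity
// elements (modelling the GPU global memory limit), sort each chunk, then
// merge the sorted chunks. The k-way merge is an assumption of this sketch.
std::vector<int> chunked_sort(std::vector<int> data, std::size_t chunk_capacity) {
    const std::size_t n = data.size();
    if (n == 0) return data;
    if (chunk_capacity == 0) chunk_capacity = n;  // degenerate input: one chunk

    std::vector<std::size_t> cursor;  // next unread position in each chunk
    std::vector<std::size_t> limit;   // one past the last position of each chunk

    // Step 1: sort every chunk independently.
    for (std::size_t begin = 0; begin < n; begin += chunk_capacity) {
        const std::size_t end = std::min(begin + chunk_capacity, n);
        sort_chunk_in_core(data.begin() + static_cast<std::ptrdiff_t>(begin),
                           data.begin() + static_cast<std::ptrdiff_t>(end));
        cursor.push_back(begin);
        limit.push_back(end);
    }

    // Step 2: k-way merge using a min-heap of (value, chunk index) pairs.
    typedef std::pair<int, std::size_t> Entry;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > heap;
    for (std::size_t c = 0; c < cursor.size(); ++c) {
        heap.push(Entry(data[cursor[c]], c));
    }

    std::vector<int> out;
    out.reserve(n);
    while (!heap.empty()) {
        const Entry top = heap.top();
        heap.pop();
        out.push_back(top.first);
        const std::size_t c = top.second;
        // Advance within the chunk the smallest element came from.
        if (++cursor[c] < limit[c]) {
            heap.push(Entry(data[cursor[c]], c));
        }
    }
    return out;
}
```

Calling chunked_sort(values, 4) on an 11-element vector, for example, sorts three chunks of sizes 4, 4, and 3 and then merges them. In a real GPU setting the chunk capacity would be derived from the available global memory, and each chunk would be transferred to the device before sorting.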