To explore chip-level parallelism, the PSC (Parallel
Shared Cache) model is provided in this paper to describe high
performance shared cache of Chip Multi-Processors (CMP).
Then for a specific application, parallel sorting, a
cache-conscious parallel algorithm, PMCC (Partition-Merge
based Cache-Conscious) is designed based on the PSC model.
The PMCC algorithm consists of two steps: the partition-based
in-cache sorting and merge-based k-way merge sorting. In the
first stage, PMCC first divides the input dataset into multiple
blocks so that each block can fit into the shared L2 cache, and
then employs multiple cores to perform parallel cache sorting to
generate sorted blocks. In the second stage, PMCC first selects an
optimized parameter k which can not only improve the parallelism
but also reduce the cache missing rate, then performs a
k-way merge sorting to merge all the sorted blocks. The I/O
complexity of the in-cache sorting step and k-way merge step are
analyzed in detail. The simulation results show that the PSC
based PMCC algorithm can out-performance the latest PEM
based cache-conscious algorithm and the scalability of PMCC is
also discussed. The low I/O complexity, high parallelism and the
high scalability of PMCC can take advantage of CMP to improve
its performance significantly and deal with large scale problem
efficiently.