arch-beer

Weekly Reading

Guru is presenting...

Coherence Miss Classification for Performance Debugging in MultiCore Processors

Guru Venkataramani, Christopher J. Hughes, Sanjeev Kumar, and Milos Prvulovic

To appear in Interact 2009



Multi-core processors offer large performance potential for parallel applications, but writing these applications is notoriously difficult. Tuning a parallel application to achieve scalability, referred to as performance debugging, is often more challenging for programmers than conventional debugging for correctness. Parallel programs have several performance-related issues that are not seen in sequential programs. In particular, increased cache misses triggered by data sharing (coherence misses) are a challenge for programmers. Data sharing misses can stem from true or false sharing, and the solutions for the two types of misses are quite different. Therefore, to minimize coherence misses, it is not sufficient for programmers to identify only the source of the extra misses; they also need to know which type of coherence miss is hurting performance.

In this paper, we propose a programmer-centric definition of false sharing misses for use in performance debugging. We describe our algorithm to classify coherence misses based on this definition, and explore a practical, low-cost solution that keeps no state at all. We find that this low-cost solution can suffer from considerable inaccuracy that might mislead programmers in their performance debugging efforts.
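To make the true/false sharing distinction concrete, here is a minimal sketch of a conventional, stateful, word-granularity classifier. It is NOT the paper's programmer-centric definition or its algorithm. It only illustrates the general idea, and it rests on several simplifying assumptions: an invalidation-based protocol, infinite caches (so there are no capacity or conflict misses), and hypothetical 64-byte lines with 8-byte words. A coherence miss counts as true sharing if another core wrote the accessed word after this core lost the line. Otherwise it counts as false sharing. The class name, the `access` method, and the category labels are all illustrative.

```python
from collections import defaultdict

LINE_SIZE = 64  # bytes per cache line (hypothetical)
WORD_SIZE = 8   # bytes per word (hypothetical)


class CoherenceMissClassifier:
    """Illustrative true/false-sharing classifier; assumes infinite caches."""

    def __init__(self):
        # valid[core] = set of line numbers currently valid in that core's cache
        self.valid = defaultdict(set)
        # lost[(core, line)] = words written by other cores since `core`
        # had `line` invalidated (this per-line state is what a
        # stateless scheme would have to do without)
        self.lost = {}
        self.counts = {"hit": 0, "cold": 0, "true": 0, "false": 0}

    def access(self, core, addr, is_write):
        line = addr // LINE_SIZE
        word = (addr % LINE_SIZE) // WORD_SIZE
        if line in self.valid[core]:
            kind = "hit"
        elif (core, line) in self.lost:
            # Coherence miss: true sharing only if the accessed word itself
            # was written remotely; otherwise the line was shared "falsely".
            kind = "true" if word in self.lost.pop((core, line)) else "false"
        else:
            kind = "cold"  # first access by this core (no capacity misses modeled)
        self.valid[core].add(line)
        if is_write:
            # Invalidate every other core's copy of the line.
            for other, lines in self.valid.items():
                if other != core and line in lines:
                    lines.discard(line)
                    self.lost[(other, line)] = set()
            # Record which word changed for every core that has lost this line.
            for (other, lost_line), words in self.lost.items():
                if lost_line == line and other != core:
                    words.add(word)
        self.counts[kind] += 1
        return kind


if __name__ == "__main__":
    c = CoherenceMissClassifier()
    print(c.access(0, 0, False))  # cold: core 0 reads word 0 of line 0
    print(c.access(1, 8, True))   # cold: core 1 writes word 1, invalidating core 0
    print(c.access(0, 0, False))  # false: word 0 was never written by core 1
    print(c.access(1, 8, True))   # hit: core 1 writes word 1 again, invalidating core 0
    print(c.access(0, 8, False))  # true: core 0 now reads the word core 1 wrote
    print(c.counts)
```

The usage trace shows why the two miss types call for different fixes. The false-sharing miss disappears if the two words are placed on separate cache lines, for example by padding. The true-sharing miss does not, because the cores really are communicating through that word. The `lost` table is the kind of per-line bookkeeping that a zero-state scheme gives up, which is one plausible source of the inaccuracy the abstract reports.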