CS7290 Advanced microarchitecture
Fall 2014
Instructor: Prof. Hyesoon Kim
Due: October 6 (Monday) 6 pm
Improving the shared last-level cache performance
Overview:
In this assignment, you will improve the performance of the shared cache. This time, you will simulate multi-programmed workloads.
Each core has its own private cache, and all cores share the last-level cache (LLC).
The baseline architecture uses an LRU replacement policy for the LLC. You can implement cache partitioning schemes such as UCP or other recently proposed cache replacement policies. You may implement a scheme presented in another paper or propose a new one. The only constraint is that your scheme must improve the performance of the LLC.
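To make the partitioning idea concrete, here is a minimal C++ sketch of how a UCP-style scheme might split the LLC ways among cores. It assumes each core has a utility monitor that counts hits per LRU stack position. The function name allocate_ways and its inputs are hypothetical. The sketch also uses a simple greedy marginal-utility loop, not the lookahead algorithm from the UCP paper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// hits[c][w]: hits core c would get from the LLC way at LRU stack position w
// (0 = MRU), as counted by a per-core utility monitor (hypothetical input).
// Returns the number of ways assigned to each core.
// Simplified greedy marginal-utility allocation, NOT UCP's lookahead algorithm.
std::vector<int> allocate_ways(const std::vector<std::vector<std::uint64_t>>& hits,
                               int total_ways) {
  const std::size_t cores = hits.size();
  assert(cores > 0 && total_ways > 0 &&
         static_cast<std::size_t>(total_ways) >= cores);
  std::vector<int> alloc(cores, 1);  // every core keeps at least one way
  for (int remaining = total_ways - static_cast<int>(cores); remaining > 0; --remaining) {
    std::size_t best = 0;
    std::uint64_t best_gain = 0;
    for (std::size_t c = 0; c < cores; ++c) {
      // Stack position of the next way this core would receive.
      const std::size_t next = static_cast<std::size_t>(alloc[c]);
      const std::uint64_t gain = next < hits[c].size() ? hits[c][next] : 0;
      if (gain > best_gain) {  // strict '>' keeps the lowest-numbered core on ties
        best = c;
        best_gain = gain;
      }
    }
    ++alloc[best];  // give the way to the core that gains the most hits
  }
  return alloc;
}
```

Greedy allocation works well when each core's hit curve falls off steadily with stack position. The lookahead algorithm exists to handle cores whose benefit appears only after several additional ways.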
You may use params_4c as your params.in file.
Grading policy:
- (25 points) Implementation of a last-level cache management scheme.
- (75 points) Discussion in the report
- Extra 10 points: If you propose a new mechanism that performs better than LRU and differs from previously proposed mechanisms, you will receive 10 extra points.
What to submit:
- [1] Source code (only the relevant files)
- [2] Report: a 3-page, double-column report using the IEEE template (http://www.ieee.org/conferences_events/conferences/publishing/templates.html)
The report might have the following 6 sections:
1. Introduction
2. Description of the mechanisms (include the hardware overhead)
3. Evaluation methodology (include architecture parameters and benchmark characteristics)
4. Results and discussion
5. Related work
6. Conclusions
Suggested experiments: 4-core configurations.
- Step 1: Classify applications into high cache miss rate applications (H) and low cache miss rate applications (L).
- Step 2: Create a mixture of applications for evaluations
Randomly select 3 mixes for each category (the same application can be selected multiple times), for a total of 6 mixes.
Categories: H-H-H-H, H-H-L-L
e.g., suppose that mcf, milc, and libquantum belong to the H class and bzip2, dealII, and gcc belong to the L class.
Examples of H-H-H-L mixes are mcf-mcf-mcf-bzip2, mcf-milc-libquantum-gcc, and milc-milc-mcf-gcc.
- Step 3: Evaluate LRU and your scheme on the 4-core configurations.
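Steps 1 and 2 could be scripted along the following lines. This is a hedged C++ sketch under several assumptions. It takes per-application LLC miss and instruction counts from single-core runs as input. It classifies applications by misses per kilo-instruction (MPKI) against a threshold you choose. It then draws each mix with replacement. AppStats, classify, make_mix, and the threshold value are hypothetical and are not part of the simulator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// Hypothetical per-application statistics from a single-core run.
struct AppStats {
  std::string name;
  std::uint64_t llc_misses;
  std::uint64_t instructions;
};

// LLC misses per kilo-instruction (MPKI).
double mpki(const AppStats& a) {
  assert(a.instructions > 0);
  return 1000.0 * static_cast<double>(a.llc_misses) /
         static_cast<double>(a.instructions);
}

// Step 1: MPKI >= threshold goes to H, otherwise L. The threshold is an assumption.
void classify(const std::vector<AppStats>& apps, double threshold,
              std::vector<std::string>& high, std::vector<std::string>& low) {
  for (const AppStats& a : apps) {
    if (mpki(a) >= threshold) high.push_back(a.name);
    else low.push_back(a.name);
  }
}

// Step 2: build one mix for a pattern such as "HHLL", one core per character.
// Applications are drawn with replacement, so a mix may repeat an application.
std::vector<std::string> make_mix(const std::string& pattern,
                                  const std::vector<std::string>& high,
                                  const std::vector<std::string>& low,
                                  std::mt19937& rng) {
  std::vector<std::string> mix;
  for (char slot : pattern) {
    assert(slot == 'H' || slot == 'L');
    const std::vector<std::string>& pool = (slot == 'H') ? high : low;
    assert(!pool.empty());
    std::uniform_int_distribution<std::size_t> pick(0, pool.size() - 1);
    mix.push_back(pool[pick(rng)]);
  }
  return mix;
}
```

Seeding std::mt19937 with a fixed value makes the random mixes reproducible. This helps when you rerun exactly the same mixes for LRU and for your scheme.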
Notes:
Relevant source code: cache.cc
Hints on how to implement new cache policies:
- Similar to the branch predictor factory, create a new cache factory and override the cache_c class with your own implementation.
- The main function you have to implement is find_replacement_line(); you will probably also need several book-keeping mechanisms.
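The following standalone C++ sketch shows the kind of logic find_replacement_line() might contain, using way-partitioned LRU as the example. Line, CacheBase, and PartitionedLru are hypothetical stand-ins, not the simulator's cache_c interface. The owner and last_use fields illustrate the book-keeping such a policy needs. Treat this as an outline to adapt to the real signatures in cache.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for one cache line; these are NOT cache_c's real fields.
struct Line {
  bool valid;
  int owner;               // book-keeping: core that inserted the line
  std::uint64_t last_use;  // book-keeping: time of the most recent access
};

// Hypothetical base class playing the role of the overridable replacement hook.
class CacheBase {
 public:
  virtual ~CacheBase() = default;
  // Returns the way to evict from `set` on a miss by core `core_id`.
  virtual int find_replacement_line(const std::vector<Line>& set, int core_id) const = 0;
};

// Way-partitioned LRU: quota_[c] is the number of ways core c may hold per set
// (the quotas could come from a utility monitor).
class PartitionedLru : public CacheBase {
 public:
  explicit PartitionedLru(std::vector<int> quota) : quota_(std::move(quota)) {}

  int find_replacement_line(const std::vector<Line>& set, int core_id) const override {
    assert(!set.empty());
    assert(core_id >= 0 && static_cast<std::size_t>(core_id) < quota_.size());
    // 1. Fill an invalid way first.
    for (std::size_t w = 0; w < set.size(); ++w)
      if (!set[w].valid) return static_cast<int>(w);

    // 2. Count how many ways each core currently holds in this set.
    std::vector<int> held(quota_.size(), 0);
    for (const Line& l : set) {
      assert(l.owner >= 0 && static_cast<std::size_t>(l.owner) < quota_.size());
      ++held[static_cast<std::size_t>(l.owner)];
    }
    const bool under_quota = held[core_id] < quota_[core_id];

    // 3. Under quota: evict the LRU line of an over-quota core.
    //    At or over quota: evict this core's own LRU line.
    int victim = -1;
    for (std::size_t w = 0; w < set.size(); ++w) {
      const Line& l = set[w];
      const bool eligible = under_quota
          ? (l.owner != core_id && held[l.owner] > quota_[l.owner])
          : (l.owner == core_id);
      if (eligible && (victim < 0 ||
                       l.last_use < set[static_cast<std::size_t>(victim)].last_use))
        victim = static_cast<int>(w);
    }

    // 4. Fallback when nothing is eligible (e.g. a zero quota): global LRU.
    if (victim < 0) {
      victim = 0;
      for (std::size_t w = 1; w < set.size(); ++w)
        if (set[w].last_use < set[static_cast<std::size_t>(victim)].last_use)
          victim = static_cast<int>(w);
    }
    return victim;
  }

 private:
  std::vector<int> quota_;
};
```

Plain LRU is the special case in which step 4 always runs. A new policy mainly changes which lines count as eligible in step 3 and what book-keeping is updated on each access.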