School of Computer Science

Georgia Institute of Technology

CS4290/CS6290 HPCA Fall 2011


Programming assignment #3
Due: Simulator: Thursday (10/20), 6:00 pm
Report: (10/25), hard copy only, before class
Hyesoon Kim, Instructor

This is an individual assignment. You may discuss it with classmates, but you must do the work individually, except for the one extra bonus-point question. Please follow the submission instructions: if you do not use the required submission file names, you will not receive full credit. Please check the class homepage for the latest updates. Your code must run on the jinx cluster with g++ 4.1.


Simulator (80%): Complete the memory system


You will extend your Lab #2 pipeline design. Please add the code in add_me3.txt at the appropriate places. The userknob.h and simknob.h files have been updated; please use the updated files.

Step 1:
You need to complete the dcache_access function in this assignment. The I-cache is still a perfect cache for this assignment.

To activate your dcache structure, you must turn off KNOB_PERFECT_DCACHE.
e.g.) ../../../pin -t obj-intel64/sim.so -perfect_dcache 0 -readtrace 1 -- /bin/ls
Note that KNOB_PERFECT_DCACHE must still work after you implement a data cache. Hence, when KNOB_PERFECT_DCACHE is 1, every cache access should be a cache hit, regardless of the data cache size.

The cache has a 64B block size, true LRU replacement, and a write-through policy. D-cache access latency is set by KNOB_DCACHE_LATENCY. Note that a load/store instruction still spends its load/store instruction latency inside the execution stage. Hence, on a cache miss, the processor needs to wait at least KNOB_MEM_LATENCY_ROW_HIT or KNOB_MEM_LATENCY_ROW_MISS cycles. You implement the write-allocate policy, so for both store and load misses you bring in the entire cache block. However, you can retire store instructions even before the requested block is serviced. We are implementing a non-blocking cache: even if an instruction generates a cache miss, the pipeline continues to execute as long as there are ready instructions.
KNOB_DCACHE_SIZE and KNOB_DCACHE_WAY set the cache configuration. The cache size is specified in KB.

e.g.) ../../../pin -t obj-intel64/sim.so -perfect_dcache 0 -dcache_size 1 -dcache_way 4 -readtrace 1 -- /bin/ls
Cache size = 1KB, so 1024 bytes / 4 ways / 64 bytes per block = 4 sets.
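
The set-count arithmetic above can be sketched in code. This is only an illustration: the helper names (num_sets, set_index, tag_of) are hypothetical and are not part of the provided simulator, and the index/tag split shown is one common choice rather than something the handout mandates. It is written to compile as C++98 so that it stays compatible with g++ 4.1.

```cpp
#include <cassert>
#include <stdint.h>

// Block size fixed by the assignment: 64 bytes.
const uint64_t kBlockBytes = 64;

// Number of sets for a cache of size_kb kilobytes with `ways` ways.
// (Hypothetical helper; the knob values would be passed in by the caller.)
inline uint64_t num_sets(uint64_t size_kb, uint64_t ways) {
  assert(ways > 0);
  uint64_t bytes = size_kb * 1024;
  assert(bytes % (ways * kBlockBytes) == 0);  // size must divide evenly
  return bytes / ways / kBlockBytes;
}

// Block address: the byte address with the 6 block-offset bits dropped.
inline uint64_t block_addr(uint64_t addr) { return addr / kBlockBytes; }

// Assumed mapping: low bits of the block address select the set,
// the remaining high bits form the tag.
inline uint64_t set_index(uint64_t addr, uint64_t sets) {
  return block_addr(addr) % sets;
}
inline uint64_t tag_of(uint64_t addr, uint64_t sets) {
  return block_addr(addr) / sets;
}
```

With the default knobs (512KB, 4-way), the same formula gives 512 * 1024 / 4 / 64 = 2048 sets.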



As a hint for building the cache, a standalone cache simulator, cache.cc, is provided. You may also design your own cache structure.
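
If you design your own structure, one possible shape is sketched below. It is not the provided cache.cc; the class and method names (SimpleCache, access, fill) are hypothetical. It assumes a 64B block, true LRU implemented with a logical timestamp per line, and no dirty bits because the cache is write-through. Separating access (lookup) from fill (install after the miss is serviced) is a design assumption that fits a non-blocking cache, where the block arrives some cycles after the miss.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>
#include <vector>

const uint64_t kBlockBytes = 64;  // block size fixed by the assignment

// One cache line. No dirty bit: the cache is write-through.
struct CacheLine {
  bool valid;
  uint64_t tag;
  uint64_t last_used;  // logical timestamp of the most recent touch
  CacheLine() : valid(false), tag(0), last_used(0) {}
};

// Hypothetical set-associative cache with true LRU replacement.
class SimpleCache {
 public:
  SimpleCache(uint64_t size_kb, uint64_t ways)
      : ways_(ways), sets_(0), clock_(0) {
    assert(ways > 0);
    sets_ = size_kb * 1024 / ways / kBlockBytes;
    assert(sets_ > 0);
    lines_.resize(sets_ * ways_);
  }

  // Returns true on a hit and refreshes that line's LRU timestamp.
  bool access(uint64_t addr) {
    CacheLine* line = find(addr);
    if (line == 0) return false;
    line->last_used = ++clock_;
    return true;
  }

  // Installs the block for addr (used for load and store misses alike,
  // i.e. write-allocate). Prefers an invalid way, otherwise evicts the
  // least recently used way of the set.
  void fill(uint64_t addr) {
    if (access(addr)) return;  // already present, nothing to install
    uint64_t block = addr / kBlockBytes;
    CacheLine* set = &lines_[(block % sets_) * ways_];
    uint64_t victim = 0;
    for (uint64_t w = 0; w < ways_; ++w) {
      if (!set[w].valid) { victim = w; break; }
      if (set[w].last_used < set[victim].last_used) victim = w;
    }
    set[victim].valid = true;
    set[victim].tag = block / sets_;
    set[victim].last_used = ++clock_;
  }

 private:
  // Returns the matching valid line, or 0 if the block is not cached.
  CacheLine* find(uint64_t addr) {
    uint64_t block = addr / kBlockBytes;
    CacheLine* set = &lines_[(block % sets_) * ways_];
    for (uint64_t w = 0; w < ways_; ++w)
      if (set[w].valid && set[w].tag == block / sets_) return &set[w];
    return 0;
  }

  uint64_t ways_;
  uint64_t sets_;
  uint64_t clock_;
  std::vector<CacheLine> lines_;  // sets_ * ways_ lines, set-major order
};
```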


Step 2:
You need to implement an MSHR to handle memory latency correctly. The size of the MSHR is determined by KNOB_MSHR_SIZE.
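
A minimal sketch of a fixed-size MSHR follows. Everything in it is an assumption made for illustration: the names (Mshr, MshrEntry, insert, complete), the use of integer op IDs to identify waiting instructions, and the merging of misses to the same 64B block into one entry, which is common MSHR behavior but is not spelled out in this handout. When the MSHR is full, the caller is expected to retry the request in a later cycle.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>
#include <vector>

// One outstanding miss: the block being fetched and who is waiting on it.
struct MshrEntry {
  uint64_t block_addr;
  std::vector<int> waiting_ops;  // hypothetical instruction IDs
};

// Hypothetical fixed-capacity MSHR (capacity would come from KNOB_MSHR_SIZE).
class Mshr {
 public:
  enum Result { MERGED, ALLOCATED, FULL };

  explicit Mshr(size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  // Records a miss on addr for instruction op_id.
  // MERGED:    the block is already being fetched; no new memory request.
  // ALLOCATED: a new entry was created; a memory request should be sent.
  // FULL:      no free entry; the caller must retry later.
  Result insert(uint64_t addr, int op_id) {
    uint64_t block = addr / 64;
    for (size_t i = 0; i < entries_.size(); ++i) {
      if (entries_[i].block_addr == block) {
        entries_[i].waiting_ops.push_back(op_id);
        return MERGED;
      }
    }
    if (entries_.size() >= capacity_) return FULL;
    MshrEntry e;
    e.block_addr = block;
    e.waiting_ops.push_back(op_id);
    entries_.push_back(e);
    return ALLOCATED;
  }

  // Called when memory returns the block: frees the entry and returns the
  // instructions that were waiting on it (empty if no entry matched).
  std::vector<int> complete(uint64_t addr) {
    uint64_t block = addr / 64;
    std::vector<int> woken;
    for (size_t i = 0; i < entries_.size(); ++i) {
      if (entries_[i].block_addr == block) {
        woken = entries_[i].waiting_ops;
        entries_.erase(entries_.begin() + i);
        break;
      }
    }
    return woken;
  }

 private:
  size_t capacity_;
  std::vector<MshrEntry> entries_;
};
```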


Summary of how to handle memory instructions in the core

Step 3: Modeling a DRAM

relevant data structures:

Knobs related to this assignment

KNOB_DCACHE_SIZE: data cache size in KB (default value: 512, i.e., 512KB)
KNOB_DCACHE_WAY: N-way set-associative data cache (default value: 4)
KNOB_DCACHE_LATENCY: cache latency on a cache hit (default value: 5)
KNOB_MEM_LATENCY_ROW_HIT: DRAM access latency on a row buffer hit (default value: 100)
KNOB_MEM_LATENCY_ROW_MISS: DRAM access latency on a row buffer miss (default value: 200)
KNOB_MSHR_SIZE: the number of entries in the MSHR (default value: 4)
KNOB_DRAM_BANK_INDEX_SIZE: log2 of the number of DRAM banks (default value: 2, i.e., 4 banks)
KNOB_DRAM_BANK_ROW_ADDR_BITS: the number of row address bits (default value: 20)
KNOB_DRAM_PAGE_SIZE: the size of a DRAM page in each bank, in KB (default value: 2, i.e., 2KB)
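
To show how the DRAM knobs could fit together, here is a rough sketch of a row-buffer model. The address layout is an assumption, not something stated in this handout: the low bits are the offset within a page, the next KNOB_DRAM_BANK_INDEX_SIZE bits select the bank, and the next KNOB_DRAM_BANK_ROW_ADDR_BITS bits select the row. The class name DramModel and its methods are hypothetical, each bank is assumed to keep its last accessed row open, and bank busy time and request queuing are ignored.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>
#include <vector>

// Marker for a bank with no open row; cannot collide with a masked row id.
const uint64_t kNoRow = ~(uint64_t)0;

// Hypothetical DRAM model: one open row (row buffer) per bank.
class DramModel {
 public:
  DramModel(unsigned bank_index_bits, unsigned page_kb, unsigned row_bits,
            unsigned hit_latency, unsigned miss_latency)
      : bank_bits_(bank_index_bits), row_bits_(row_bits),
        page_bytes_((uint64_t)page_kb * 1024),
        hit_latency_(hit_latency), miss_latency_(miss_latency) {
    assert(page_kb > 0 && row_bits > 0 && row_bits < 64);
    open_row_.assign((size_t)1 << bank_bits_, kNoRow);
  }

  // Assumed layout: [ row | bank | page offset ].
  uint64_t bank_of(uint64_t addr) const {
    return (addr / page_bytes_) & (((uint64_t)1 << bank_bits_) - 1);
  }
  uint64_t row_of(uint64_t addr) const {
    return ((addr / page_bytes_) >> bank_bits_) &
           (((uint64_t)1 << row_bits_) - 1);
  }

  // Returns the access latency and leaves the accessed row open.
  unsigned access(uint64_t addr) {
    uint64_t bank = bank_of(addr);
    uint64_t row = row_of(addr);
    bool row_hit = (open_row_[bank] == row);
    open_row_[bank] = row;
    return row_hit ? hit_latency_ : miss_latency_;
  }

 private:
  unsigned bank_bits_;
  unsigned row_bits_;
  uint64_t page_bytes_;
  unsigned hit_latency_;
  unsigned miss_latency_;
  std::vector<uint64_t> open_row_;  // open row per bank, or kNoRow
};
```

With the default knob values (4 banks, 2KB pages), two accesses within the same 2KB page hit the row buffer, while an access 8KB away maps to the same bank but a different row under this assumed layout.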


You must update dcache_hit_count and dcache_miss_count accordingly.





Submission Guide
Please do not turn in pzip files (trace files). Trace files are very large and will cause a lot of problems.
(Tar the lab3 directory, gzip the tar file, and submit the lab3.tar.gz file on T-square.)
cd pin-2.8-36111-gcc.3.4.6-ia32_intel64-linux/source/tools

cd lab3
make clean
rm *.pzip
cd ..
tar cvf lab3.tar lab3
gzip lab3.tar



Report (20%)
Include your simulation results in a report. You do not need to submit any traces. Please note that there are many simulation cases, so it will take several hours to simulate all of them. Please consider using the Jinx job batch system to run your simulations. 10M instructions provide enough data, so you can reduce the simulation time by simulating only 10M instructions.

The default configuration is