School of computer science

Georgia Institute of Technology

CS4290/CS6290HPCA Fall 2011


Programming assignment #4
For CS6290 Stduents : Simulator due: Wednesday (11/9) 6:00 PM, Report due: Thursday (11/10) in the class.
For CS4290 Stduents : Simulator due: Friday (11/18) 6:00 pm, Report due: Tuesday (11/22) in the class

Hyesoon Kim, Instructor


This is an individual assignment. You can discuss this assignment with other classmates but you should do your assignment individually.


Part 1 (80 points) Simulator
Overview
In this assignment, you will extend your simulator to support fine-grained MT.

We need multiple steps to support the MT feature. Adding MT feature requires modifications in multiple places in the simulator. Additional data structures must be added. add_me4.txt file is provided.

First, you need to make the simulator run correctly right after you add add_me4.txt file. add_me4.txt file changes get_op function and adds thread_id into op_struct, add additional knobs to userknob.h and simknob.h. You need to update use new trace.cpp, sim_pin.cpp

Because now the system handles multiple traces, before you add MT feature, you need to make it sure that your simulator still runs one thread just like before and then you add features to support multi traces.

In a real architecture simulator, simulation ending condition should be more sophisticated. However, in this assignment, we do not change the ending conditions. Therefore, simulator reads from only remaining traces until all the traces are finished. max_inst_count is based on the sum of all threads.

  • Fetch Stage
  • Decode Stage
  • Memory
  • WB
  • Stats
  • Knobs related to this assignment


    Submission Guide
    Please follow the submission instructions. If you do not follow the submission file names, you will not receive the full credit. Please check the class homepage to see the latest update. Your code must run on jinx with g++4.1.

    Please do not turn in pzip files(trace files). Trace file sizes are huge so they will cause a lot of problems.
    (Tar the lab4 directory. Gzip the tarfile and submit lab4.tar.gz file at T-square)
    Please make it sure the directory name is lab4!
    cd pin-2.8-36111-gcc.3.4.6-ia32_intel64-linux/source/tools

    cd lab4
    make clean
    rm *.pzip
    cd ..
    tar cvf lab4.tar lab4
    gzip lab4.tar


    Part 2:
    Simluations (20 points)
    Include your simulation results in a report. You do not need to submit any traces. Please note that there are many simulation caess so it will take several hours to simulate all of them. 10M instructions will provide enough data so you can reduce the simulation time by simulating only 10M instructions.

    The default configuration is

  • [0 point] Assume that there are two processors and each processor has a cache which is described in Problem 5 ( see the below). The processors use the MSI coherence protocol. The coherence stats are denoted M, S, and I for Modified, Shared, and Invalid.
    A sequence of one or more CPU operations is specified P#: op, address, -> value. P# designates the CPU (e.g., P0), op is the CPU operation (LDB: load byte, STB: store byte). address denotes the memory address, and -> value indicates the new word to be assigned on a write operation.
    Each action occurs after the previous action is finished. (1) Show only the cache blocks that change, for example P0.B0: (I, 1, 0x01) indicates that CPU P0's block B0 (0 is the index number) has the final state of I, tag of 1, and data 0x1. Assume that the memory is initialized by all 0 at the beginning of the program. (2) Show the final state of the caches after all the following operations. (coherence state, cache tags, data, and index). To simplify the problem you do not need to draw the entire cache. You just need to fill out the relevant entries

    P0: LDB 0x450

    P0: STB 0x448 -> 0x4

    P1: STB 0x450 -> 0x5

    P1: LDB 0x450

    P1: STB 0x448 -> 0x6

    P0: LDB 0x450

    P0: LDB 0x052

    P0: STB 0x452 -> 0x7

    P0: STB 0x852 -> 0x8

    P0: LDB 0x1850

    P1: LDB 0x850

    P1: LDB 0x450

  • [0 point] A computer has a 128KB write back cache and 2MB physical memory. Each cache block is 16B, the cache is 2-way set associative and uses the true LRU replacement policy. Assume 26-bit virtual address space and byte-addressable memory. It uses a 1KB page size. The cache uses a physical tag and physical index. How big (in bits) is the tag store? The cache uses MSI protocol and assume that we need 2 bits to indicate cache coherence states.

  • [0 point] The FRFCFS dram memory scheduler shows better performance than the FCFS dram scheduler. Why is it?



  • (Additional question) p% of a program is vectorizable. We wish to run it on a multiprocessor. Assume we have an unlimited number of processing elements. If the maximum speedup achievable on this program is 30, what is p?