School of computer science

Georgia Institute of Technology

CS4290/CS6290HPCA Fall 2010


Homework assignment #2
Due: Thursday, September 30, Before class (no late submission)
Hyesoon Kim, Instructor

This is an individual assignment. You can discuss this assignment with other classmates but you should do your assignment individually.

  1. (10 points) A computer has a 256KB write back cache. Each cache block is 64 bits, the cache is 8-way set associative and uses the true LRU replacement policy. Assume a 24-bit address space and byte-addressable memory. How big (in bits) is the tag store ?


    # of sets = 256KB/8(#way)/8(block size) = 4KB = 2^12B
    # of index bits = 12
    # of Boffset bits = 3
    # of tag bits = 24-12-3 = 9
    Each tag entry = 9 (tag bits) + 1 (valid bit) + 1 (dirty bit) + 3 (LRU bits) = 14 bits
    tag storage size = (tag entry size) * (# of sets) * (# of ways) = 14 * 8 * 2^12 = 448K bits
  2. (10 points) Discuss mechanisms to hide memory latency other than a cache.

    Out of order scheduler, prefetching, (helper threading), write buffer to support store-load forwarding, TLB to hide memory address translation cost. Non-blocking cache.
  3. (10 points) What are purposes of write buffers in a processor?

    load/store forwarding, merge multiple writes in the same cache block
  4. (10 points) Even an in-order scheduling processor might have an out of order complete case if there is no ROB. When would it happen?
    multiple cycle latency How can it solve this problem without using a rob?

    add dummy pipeline stages along with forwarding path, score boarding. recovery mechanisms: future registers, history buffer,
  5. (10 points) When a processor discovers a branch misprediction in an out of order processor, what are the steps for flushing the pipeline and resuming from the correct path?

    1. Invalidate instructions that are younger than the mispredicted branches inside the rob.
    1. Flush the fetch buffer
    2. Reset the PC value with the correct PC value
    3. Wait until all the instructions that are fetched before the branch instruction are retired.
    4. Reset Register Alias Table
    5. Resume renaming stage. (instructions could have been fetched)
    or we can just start to fetch from at this moment.

    If a processor has a check point of RAT, step 3 can be eliminated. Instead of resetting the register alias table, it recovers from the corresponding check pointed RAT.
  6. (10 points) Machine A is 1GHz and IPC is 1.4. Machine B is 1.5GHz and IPC is 1.2. A program has 90B instructions. Between machine A and machine B, which one is faster? How long does it take to finish the program on both machines respectively?

    Machine A: 90B instructions. 90*10^9/1.4 cycles. 1 cycle = 1/10^9 sec. execution time = 90*10^9/1.4*1/(10^9) = 90/1.4 = 64.29 sec.
    Machine B: 90B instructions. 90*10^9/1.2 cycles. 1 cycle = 1/1.5*10^9 sec. execution time = 90*10^9/1.2*1/(1.5*10^9) = 90/(1.2*1.5) = 50 sec.
    Machine B is faster.