School of computer science

Georgia Institute of Technology

CS4290/CS6290HPCA Fall 2010


Homework assignment #1
Due: Thursday, September 21, Before class (no late submission)
Hyesoon Kim, Instructor

This is an individual assignment. You can discuss this assignment with other classmates but you should do your assignment individually.

  1. (20 pts) In lab #1, we used the print_pipeline function to print out the latch in each pipeline stage. Complete the pipeline latch values when the simulator is executing the instructions in the following sequence. Show the value of each pipeline latch at the end of each cycle. The execution time of all instructions is one cycle, except for multiplication (which takes 2 cycles) and division (which takes 4 cycles). An instruction can read the register value at the same cycle when an instruction writes a value at WB_stage.


    MUL R4<-R3*R2 // PC:0x800

    ADD R5<-R1+R2 // PC:0x804

    BR (R5) TARGET // PC:0x808

    ADD R6<-R5+R2 // PC:0x80c

    LD R7<-MEM[R6] // PC:0x810

    TARGET ADD R6<-R1+R2 // PC:0x814

    DIV R5<-R6/R2 // PC:0x818



    *BR is taken in the code. Fetch address is the address which is used for fetching an instruction from the I-cache. The first cycle is already filled. Fill out the rest of the table.
    Cycle Fetch address FE_stage ID_stage EX_stage MEM_stage
    1 0x800 MUL xxx xxx xxx
    2 0x804 ADD MUL xxx xxx
    3 0x804 ADD MUL xxx xxx
    4 0x808; BR ADD MUL xxx
    5 0x80c BR xxx ADD MUL
    6 0x80c BR xxx xxx ADD
    7 0x80c xxx BR XXX XXX
    8 0x80c xxx xxx BR xxx
    9 0x80c xxx xxx xxx BR
    10 0x814 ADD xxx xxx xxx
    11 0x818 DIV ADD xxx xxx
    12 0x818 DIV xxx ADD xxx
    13 0x818 DIV xxx xxx ADD
    14 0x82c ? DIV xxx xxx
    15,16,17 ? ? DIV xxx xxx
    18 ? ? xxx DIV xxx
    19 ? ? xxx xxx DIV


  2. (20 pts) Using Tomasulo's algorithm, for each instruction in the following sequence determine when (in which cycle, counting from the start) it issues, begins execution, and writes its result to the CDB, and commit. Assume that the result of an instruction can be written at the next cycle of its execution. An instruction becomes eligible for execution in the cycle that follows the one in which the last operand was broadcast on the CDB. The execution time of all instructions is two cycles, except for multiplication (which takes 4 cycles) and division (which takes 8 cycles). The processor has one multiply/divide unit and one add/subtract unit. The multiply/divide unit has two reservation stations and the add/subtract unit has four reservation stations. None of the execution units is pipelined-each can only be executing one instruction at a time. If a conflict for use of the CBD occurs, the result of the add/subtract unit has priority over the result of the multiply/divide unit. Assume that at start all instructions are already in the instruction queue, but none has yet been issued to any reservation stations. The processor can issue only one instruction per cycle, and there is only one CDB for writing results. A reservation station and a functional unit are freed at the same cycle when an instruction writes results (broadcast on the CDB).


    solution 1.1: only one mulitply or divide instruction can be executed and only one add or subtract instruction can be exectued at the same time.


    Instruction Operands Issue Execution Write Commit
    MUL F2<-F1*F1 1 2 7 8
    DIV F4<-F2/F6 2 8 16 17
    ADD F4<-F3+F1 3 4 6 18
    MUL F3<-F1*F6 8 17 21 22
    SUB F1<-F2-F5 9 10 12 23
    ADD F4<-F5+F6 10 13 15 24



    solution 1.2: only one mulitply or divide instruction can be executed and only one add or subtract instruction can be exectued at the same time.


    Instruction Operands Issue Execution Write Commit
    MUL F2<-F1*F1 1 2 7 8
    DIV F4<-F2/F6 2 8 16 17
    ADD F4<-F3+F1 3 4 6 18
    MUL F3<-F1*F6 8 16 20 21
    SUB F1<-F2-F5 9 10 12 22
    ADD F4<-F5+F6 10 12 14 23



    solution 2: one mulitply and one divide instructions can be executed. one add and one subtract instructions can be exectued at the same time.


    Instruction Operands Issue Execution Write Commit
    MUL F2<-F1*F1 1 2 7 8
    DIV F4<-F2/F6 2 8 16 17
    ADD F4<-F3+F1 3 4 6 18
    MUL F3<-F1*F6 8 9 14 19
    SUB F1<-F2-F5 9 10 12 20
    ADD F4<-F5+F6 10 11 13 21

  3. (10 points) In the programming assignment #1, even though the processor is an in-order processor, it can cause WAW dependencies. Discuss when WAW dependencies can occur, and how can we solve this problem?
    There are at least more than one solution.

    Because of valid bits in the register file is only one bit, it can causes WAW Dependences.
    Here is one example. Assuming the following sequence of instructions:
    ADD R1, R2, R3 // I1
    SUB R1, R3, R4 // I2
    MUL R3, R1, R5 // I3
    Cycle valid bit register[1] FE_stage_latch ID_stage_latch EX_stage_latch MEM_stage_latch
    1 1 ADD xxx xxx xxx
    2 0 SUB ADD xxx xxx
    3 0 MUL SUB ADD xxx
    4 0 MUL xxx SUB ADD
    5 1   MUL xxx SUB
    At cycle 5, R1 register valid bit it set true by the ADD instruction. However, the MUL instruction should use the outcome of the SUB instruction.
    This is a WAW dependency.


    Possible solutions.
    Solution 1: Check WAW dependence. If there is WAW dependence, it stalls the pipeline.
    Solution 2: Before an instruction sets a valid bit in the WB stage, the processor checks the destination register ids of instructions in the MEM/EX stage instructions. If destination register ids are the same, the processor does not set the valid bit to true.
    Solution 3: A reference counter. Whenever there is a write to a register, it increments the counter and when the processor writes a result into the register file it decrements the counter. It sets the valid bit as true only if the reference counter value is 0.



    [4-5] A graduate student, Mary, likes drinking coffee. She drinks coffee every day. She wrote a simple program to decide which coffee shop to go every day. Here is the code. (to simplify the problem, we assume that Mary drinks coffee only Th,F, Sa and Sunday)
    class date_c{
    	public:
    	void get_coffee();
    };
    class weekday_c: public date_c{
    	public:
    	void get_coffee();
    }
    class saturday_c:public date{
    	public:
    	void get_coffee();
    }
    class sunday_c:public date{
    	public:
    	void get_coffee();
    }
    
    
    void TutoTh_c::get_coffe(){
    // beginning of the function body starts at 0x800
    	If (time >= 7 am && time < 8 am) printf("go to Seattle's Best\n");  // br: pc 0x801
    	else if (time >= 8 am && time < 1pm ) printf ("go to CCB \n"); // br: pc 0x807
    	else if (time < 6 pm ) printf ("go to IIB \n");  // br: pc 0x80a
    	else if (time <=10 pm) printf("go to StarBucks\n");  // br: pc 0x815 
    	else printf("sorry. No more coffee\n"); 
    }
    
    void MF_c::get_coffe(){
    // beginning of the function body starts at 0x820
    	If (time >= 7 am && time < 8 am) printf("go to Seattle's Best\n");  // br: pc 0x821
    	else if (time >= 8 am && time < 1pm ) printf ("go to CCB \n"); // br: pc 0x827
    	else if (time < 5 pm ) printf ("go to IIB \n");  // br: pc0x82a
    	else if (time <=10 pm) printf("go to StarBucks\n");  // br: pc 0x825 
    	else printf("sorry. No more coffee\n"); 
    }
    
    void Sa_c::get_coffee() {
    // beginning of the function body starts at 0x900
    	if (time >= 7 am && time < 10 pm) printf(\"go to Starbucks \n"); // br: pc 0x902
    	else printf("sorry, No more coffee\n"); 
    
    }
    void Su_c::get_coffee(){
    // beginning of the function body starts at 0xa00
    	if (time> =10 am && time < 6 pm) printf("go to Starbucks \n"); // br: pc 0xa03
    	else printf("sorry,  No more coffee\n"); 
    }
    
    main()
    {
    // initial code. // week array is initialized 
    for (today= Thursday ; today <= Sunday; today++) { // br: pc 0x100 : NT: Loop exit 
    	date_c *data = week[today];
           for (time = 7 am ; time < 5 pm; time = time + 6) {  // br: pc 0x102, NT Loop exit
               date->get_coffee(); // indirect branch at PC 0x110
          }
    }
    

  4. (10 points) We execute the code in a processor which uses the BTB to predict indirect branches. Calculate the branch prediction accuracy of indirect branch. (Hint: fill out the target addresses of indirect branch X.)


    Accuracy = 50%


      Th F Sa Su
    7 am 0x800 0x820 0x900 0xa00
    1 pm 0x800 (hit) 0x820 (hit) 0x900 (hit) 0xa00 (hit)



  5. (20 points) We execute the code in a processor which has a 3-bit history gshare branch predictor.
    What is the overall accuracy of branch predictor?
    For each branch shows: which entry (i.e. entry index), is used to predict it, whether the prediction is correct, what is the GHR value (before prediction, after update), what is the entry value (after update). We assume that the entry value and the GHR will be updated--after the branch is predicted--with a correct value.
    The pc_pred address is used the lowermost 3 bits of the branch address. GHR is initialized as "000" at the beginning. All two bit counters in the PHT are initialized as "10"(weekly taken). We assume that only the following branches might be executed. Ignore all other branches in the code. (0x801, 0x807, 0x80a, 0x815, 0x821, 0x827, 0x82a, 0x825, 0x902, 0xa03, 0x100, 0x102).
    Note that indirect branches do not update the GHR.


    Branch prediction accuracy = 18/29



    We provide a table to help you calculate the prediction accuracy but the table is not part of the grading.

    br Address T/NT pc_pred (low 3 bit) GHR (pred) Entry index Prediction Correct? Entry value (updated value) GHR (updated value)
    0x100 T000000000 TY11001
    0x102 T010001011 TY11011
    0x801 T001011010 TY11111
    0x102 T010111101 TY11111
    0x801NT001111110 TN01110
    0x807NT111110001 TN01100
    0x80a T010100110NTN10001
    0x102NT010001011 TN10010
    0x100 T000010010 TY11101
    0x102 T010101111 TY11011
    0x821 T001011010 TY11111
    0x102 T010111101 TY11111
    0x821NT001111110 TN01110
    0x827NT111110001NTY00100
    0x82a T010100110NTN10001
    0x102NT010001011 TN01010
    0x100 T000010010 TY11101
    0x102 T010101111 TY11011
    0x902 T010011001NTN01111
    0x102 T010111101 TY11111
    0x902 T010111101 TY11111
    0x102NT010111101 TN10110
    0x100 T000110110 TY11101
    0x102 T010101111 TY11011
    0xa03NT011011000 TN10110
    0x102 T010110100 TY11101
    0xa03 T011101110 TY11011
    0x102NT010011001NTY00110
    0x100NT000110110 TN10100