School of computer science

Georgia Institute of Technology

CS4290/CS6290HPCA Fall 2010


Homework assignment #1
Due: Thursday, September 21, Before class (no late submission)
Hyesoon Kim, Instructor

This is an individual assignment. You can discuss this assignment with other classmates but you should do your assignment individually.

  1. (20 pts) In lab #1, we used the print_pipeline function to print out the latch in each pipeline stage. Complete the pipeline latch values when the simulator is executing the instructions in the following sequence. Show the value of each pipeline latch at the end of each cycle. The execution time of all instructions is one cycle, except for multiplication (which takes 2 cycles) and division (which takes 4 cycles). An instruction can read the register value at the same cycle when an instruction writes a value at WB_stage.


    MUL R4<-R3*R2 // PC:0x800

    ADD R5<-R1+R2 // PC:0x804

    BR (R5) TARGET // PC:0x808

    ADD R6<-R5+R2 // PC:0x80c

    LD R7<-MEM[R6] // PC:0x810

    TARGET ADD R6<-R1+R2 // PC:0x814

    DIV R5<-R6/R2 // PC:0x818



    *BR is taken in the code. Fetch address is the address which is used for fetching an instruction from the I-cache. The first cycle is already filled. Fill out the rest of the table.
    Cycle Fetch address FE_stage ID_stage EX_stage MEM_stage
    1 0x800 MUL xxx xxx xxx
    2          
    3          
    4          
    5          
    6          
    7          
    8          


  2. (20 pts) Using Tomasulo's algorithm, for each instruction in the following sequence determine when (in which cycle, counting from the start) it issues, begins execution, and writes its result to the CDB, and commit. Assume that the result of an instruction can be written at the next cycle of its execution. An instruction becomes eligible for execution in the cycle that follows the one in which the last operand was broadcast on the CDB. The execution time of all instructions is two cycles, except for multiplication (which takes 4 cycles) and division (which takes 8 cycles). The processor has one multiply/divide unit and one add/subtract unit. The multiply/divide unit has two reservation stations and the add/subtract unit has four reservation stations. None of the execution units is pipelined-each can only be executing one instruction at a time. If a conflict for use of the CBD occurs, the result of the add/subtract unit has priority over the result of the multiply/divide unit. Assume that at start all instructions are already in the instruction queue, but none has yet been issued to any reservation stations. The processor can issue only one instruction per cycle, and there is only one CDB for writing results. A reservation station and a functional unit are freed at the same cycle when an instruction writes results (broadcast on the CDB).


    Instruction Operands Issue Execution Write Commit
    MUL F2<-F1*F1        
    DIV F4<-F2/F6        
    ADD F4<-F3+F1        
    MUL F3<-F1*F6        
    SUB F1<-F2-F5        
    ADD F4<-F5+F6        
  3. (10 points) In the programming assignment #1, even though the processor is an in-order processor, it can cause WAW dependencies. Discuss when WAW dependencies can occur, and how can we solve this problem?
    There are at least more than one solution.

    [4-5] A graduate student, Mary, likes drinking coffee. She drinks coffee every day. She wrote a simple program to decide which coffee shop to go every day. Here is the code. (to simplify the problem, we assume that Mary drinks coffee only Th,F, Sa and Sunday)
    class date_c{
    	public:
    	void get_coffee();
    };
    class weekday_c: public date_c{
    	public:
    	void get_coffee();
    }
    class saturday_c:public date{
    	public:
    	void get_coffee();
    }
    class sunday_c:public date{
    	public:
    	void get_coffee();
    }
    
    
    void TutoTh_c::get_coffe(){
    // beginning of the function body starts at 0x800
    	If (time >= 7 am && time < 8 am) printf("go to Seattle's Best\n");  // br: pc 0x801
    	else if (time >= 8 am && time < 1pm ) printf ("go to CCB \n"); // br: pc 0x807
    	else if (time < 6 pm ) printf ("go to IIB \n");  // br: pc 0x80a
    	else if (time <=10 pm) printf("go to StarBucks\n");  // br: pc 0x815 
    	else printf("sorry. No more coffee\n"); 
    }
    
    void MF_c::get_coffe(){
    // beginning of the function body starts at 0x820
    	If (time >= 7 am && time < 8 am) printf("go to Seattle's Best\n");  // br: pc 0x821
    	else if (time >= 8 am && time < 1pm ) printf ("go to CCB \n"); // br: pc 0x827
    	else if (time < 5 pm ) printf ("go to IIB \n");  // br: pc0x82a
    	else if (time <=10 pm) printf("go to StarBucks\n");  // br: pc 0x825 
    	else printf("sorry. No more coffee\n"); 
    }
    
    void Sa_c::get_coffee() {
    // beginning of the function body starts at 0x900
    	if (time >= 7 am && time < 10 pm) printf(\"go to Starbucks \n"); // br: pc 0x902
    	else printf("sorry, No more coffee\n"); 
    
    }
    void Su_c::get_coffee(){
    // beginning of the function body starts at 0xa00
    	if (time> =10 am && time < 6 pm) printf("go to Starbucks \n"); // br: pc 0xa03
    	else printf("sorry,  No more coffee\n"); 
    }
    
    main()
    {
    // initial code. // week array is initialized 
    for (today= Thursday ; today <= Sunday; today++) { // br: pc 0x100 : NT: Loop exit 
    	date_c *data = week[today];
           for (time = 7 am ; time < 5 pm; time = time + 6) {  // br: pc 0x102, NT Loop exit
               date->get_coffee(); // indirect branch at PC 0x110
          }
    }
    

  4. (10 points) We execute the code in a processor which uses the BTB to predict indirect branches. Calculate the branch prediction accuracy of indirect branch. (Hint: fill out the target addresses of indirect branch X.)
      Th F Sa Su
    7 am        
    1 pm        



  5. (20 points) We execute the code in a processor which has a 3-bit history gshare branch predictor.
    What is the overall accuracy of branch predictor?
    For each branch shows: which entry (i.e. entry index), is used to predict it, whether the prediction is correct, what is the GHR value (before prediction, after update), what is the entry value (after update). We assume that the entry value and the GHR will be updated--after the branch is predicted--with a correct value.
    The pc_pred address is used the lowermost 3 bits of the branch address. GHR is initialized as "000" at the beginning. All two bit counters in the PHT are initialized as "10"(weekly taken). We assume that only the following branches might be executed. Ignore all other branches in the code. (0x801, 0x807, 0x80a, 0x815, 0x821, 0x827, 0x82a, 0x825, 0x902, 0xa03, 0x100, 0x102).
    Note that indirect branches do not update the GHR.


    Branch prediction accuracy =


    We provide a table to help you calculate the prediction accuracy but the table is not part of the grading.

    br Address T/NT pc_pred (low 3 bit) GHR (pred) Entry index Prediction Correct? Entry value (updated value) GHR (updated value)