Project 2--CMPE/CS 4760 (Fall 1998) 1 Purpose This project is intended to give you practice with advanced pipelining features, including dynamic scheduling, branch prediction and speculative execution. 2 Introduction In this project, you will implement the algorithms for dynamic scheduling and speculative execution using Tomasulo's algorithm, reservation stations and reorder buffers. Optionally, you may also implement branch prediction using a branch target buffer. To keep the project reasonably simple, we will not worry about the specifics of the pipeline implementation. Rather, you will do a simple behavioral simulation, described below in more detail. 2.1 Program input As before, the input to your simulator will be a machine code file. Assume the same instruction set and instruction format that you used in the last project: LW, SW ADD, ADDI SUB AND, ANDI BEQZ J HALT NOOP 2.2 Behavioral simulation This project does NOT require a detailed simulation of the pipeline, as in project 1. Rather, we will execute instructions using a simple behavioral model, and focus on the Tomasulo's implementation. For example, for the add instruction, we will determine on what cycle the add will complete, and at that point, we will (logically) drive the result=regA + regB onto the common data bus, latching the value into any waiting reservation station or reorder buffer entries. Your simulation should follow these steps on every clock cycle: 1) First check whether the instruction at the head of the reorder buffer is ready to commit. If so, check whether the instruction is a conditional branch or a jump; for a taken conditional branch or a jump (always taken), you will need to kill the instructions that have been fetched after the branch/jump and start fetching instructions at the correct target. (Your initial implementation should use a predict-branch-not-taken strategy.) (Optional: extra credit. Implement a branch target buffer to predict the outcome of the branch. Use a two-state branch predictor.) If the instruction being committed is not a taken branch or a jump, make any state changes necessary to registers or memory. Free the reorder buffer entry for the committed instruction. 2) Next, for all instructions that have been issued, perform whatever action is required. There are four states of execution: Issuing, for instructions that have been issued but are waiting for all operands to become available; Executing, for instructions in the middle of execution; Writing Result, for instructions that have finished execution and are writing results to reservation stations and reorder buffers; and Committing, for instructions that are waiting to commit. An instruction is in the Issuing state after it has been assigned a reorder buffer entry and a reservation station. It remains in this state until all its operands are available, at which time it moves to the Executing state. We discuss the process of issuing instructions in step 3). To execute an instruction, we fill in an entry in the reservation station called ExTimeLeft, or execution time left. When we start executing, we fill this in with the latency of executing an instruction. For every cycle thereafter, we decrement the count of ExTimeLeft. When it reaches zero, the instruction moves into the WritingResult state. The WritingResult state writes a result to the reorder buffer holding the current instruction. It also sends the value to any reservation stations waiting for the result of that functional unit. Note that the writingResult stage does NOT change the register file or memory. We aren't allowed to do this since the execution is still speculative until commit time. After the result is written to the reorder buffer, we can free the reservation station so that other instructions can issue. An instruction waits in the Committing state until it reaches the head of the reorder buffer queue, where it either commits successfully or causes a flush of the pipeline, as described above. 3) Finally, on every cycle, we check whether we are able to issue a new instruction. To issue an instruction, we must determine that a free reorder buffer entry and a free reservation station for the appropriate functional unit are available. We must issue instructions IN-ORDER. 2.3 Execution Units and Times Assume that we have the following execution units: 2 load buffers 2 store buffers 2 integer units Assume the following execution times, in numbers of cycles, for various instruction types: /* * Execution times in cycles */ #define BRANCHEXEC 3 #define LDEXEC 2 #define STEXEC 2 #define INTEXEC 1 For the branch instruction, assume that the branch is fully resolved after 3 cycles of execution. At that point, you have the branch target and branch condition available. Before that time, you don't know the outcome of the branch, but you can (OPTIONALLY) use the branch target buffer to do branch prediction. 3 Major Components 3.1 Reservation Stations Unlike in class, we'll use reservation stations to implement Tomasulo's algorithm for integer instructions. These aren't as long-running as the floating point operations, but you can still learn how to implement the algorithms. When the instruction is issued, we copy the register VALUES, if they are available, into the Vj and Vk fields of the reservation station. If the values are not available, we indicate in the Qj and Qk fields which execution unit will produce the value we are waiting for. For instructions that take only one register operand (e.g., I-type), set Rk=1 and Vk=0. Logially, the reservation monitors the common data bus waiting for these values to appear. We simulate this, as described above, by consulting the reservation stations when we produce results at an execution unit. typedef struct _resStation { int instr; int busy; int Rj; /* Rj, Rk indicate whether Vj, Vk are valid */ int Rk; int Vj; /* Vj, Vk will hold operand values */ int Vk; int Qj; /* Qj, Qk hold execution unit numbers that */ int Qk; /* will produce the result */ int exTimeLeft; /* Remaining execution time for instruction */ int reorderNum; /* Number of reorder buffer for this instruction. */ }resStation; 3.2 Reorder buffer To issue an instruction, there must be an empty reorder buffer entry. We store the instruction as well as the execution unit that will produce the result. When the result is generated, it is also stored in the reorder buffer entry, and the valid flag is set. The reorder buffer is a circular buffer. When issuing an instruction, we look at the tail of the buffer list to see if a buffer is free. When committing instructions, we look at the entry at the head of the buffer list. We must commit IN ORDER. When the entry at the head of the queue has a valid result, we attempt to commit the instruction. Under what circumstances do we not commit the instruction? Under those conditions, what further actions must we take? #define RBSIZE 16 /* Reorder buffer has 16 entries */ typedef struct _reorderEntry { int busy; /* reorder buffer is in use */ int instr; /* instruction being executed */ int execUnit; /* execution unit producing result */ int instrStatus; /* current status of instruction */ int valid; /* flat thag indicates that result value is valid */ int result; /* holds speculative result until commit */ }reorderEntry; 3.3 Register result status Assume this data structure is part of the register file. It indicates whether the register contains the current (valid) value, or is waiting for a particular reorder buffer to commit an instruction that is producing the value. When filling in the reservation station, we will consult the register result status register to fill in the Vj, Vk, Qj, and Qk fields. typedef struct _regResultEntry { int valid; /* 1 if regFile holds correct value, else 0 */ int reorderNum; /* reorder buffer number that will commit the value */ }regResultEntry; 3.4 HALT and NOOP instructions Assume that HALT and NOOP instructions have to go through the integer unit and do consume a reservation station as well as a reorder buffer. This is a bit wasteful of our reservation stations, but it will allow you to treat them much like other instructions and guarantee that you commit all the instructions in the correct order. 3.5 The SW instruction Unlike the other instructions, the SW instruction requires that you store two values in the reorder buffer: the register to be sent to memory and the address at which the value will be stored. We have added a storeAddress value to the reorder buffer for this purpose. 3.6 The branch and jump instructions *** Slight Change!! *** Calculate the PC-relative branch or jump targets and store these in the storeAddress field of the reorder buffer. On a conditional branch, you can store the branch condition in the result field of the reorder buffer. (For a jump, you can just set this value to 1 unconditionally.) If you implement the branch target buffer, you can get the branch target from that buffer and store it in the storeAddress field. Make sure that you don't change the PC or flush the reorder buffer/reservation stations on a taken branch until the branch reaches the head of the reorder buffer queue. 3.7 Branch Target Buffer (OPTIONAL--EXTRA CREDIT) It is ok to assume that the branch is not taken and then flush the reorder buffer and reservation stations when this prediction is wrong. However, you may optionally implement a branch target buffer to make the predictions more accurate. You could implement the branch target buffer as a fully-associative or a direct-mapped cache. You should use a 2-bit branch history scheme to make your predictions. If the current PC (of the instruction being fetched) matches an entry in the branch target buffer, then the instruction being fetched is a branch instruction. Consult the branch target buffer to see if the branch is predicted taken. If we predict taken, then replace the PC value with the predicted branch target and continue execution. Otherwise, predict not taken and leave the PC unchanged. The first time you execute a branch, add it to the buffer along with the branch target and prediction. If the branch is taken the first time it is executed, make the first prediction in the BTB the weak predict taken state; otherwise, make the first prediction weak not taken. /* * Branch prediction states */ #define STRONGTAKEN 0 #define WEAKTAKEN 1 #define STRONGNOT 2 #define WEAKNOT 3 Every time a branch is executed, update the prediction according to the state transitions described in lecture and the book. When inserting a branch into the BTB, look for an empty location. If there are none, you must replace an existing entry. You may choose either a random or LRU replacement strategy for choosing which entry to replace. #define BTBSIZE 8 /* 8 entries in branch target buffer */ typedef struct _btbEntry{ int valid; /* valid bit set if entry not empty */ int branchPC; /* PC of branch instruction */ int branchTarget; /* when predict taken, update pc with target */ int branchPred; /* prediction; uses 2-bit branch history */ }btbEntry; 3.5 Overall Machine State Here is the overall machine state. typedef struct _machineState{ int pc; int cycles; /* how many cycles have executed */ resStation reservation[NUMUNITS]; reorderEntry reorderBuf[RBSIZE]; regResultEntry regResult[NUMREGS]; btbEntry btBuf[BTBSIZE]; int memory[MEMSIZE]; int regFile[NUMREGS]; }machineState; 4.0 Simulator template This web page contains a simulator template as well as some function names and comments from my implementation. This template is intended to help rather than limit you. Don't feel obligated to structure your code the way I did. Use any functions and define any variables you like. The important thing is that you don't modify the printState function. As before, try to match your output with mine as much as possible. Whether you use my code or not, you should look through the comments, because they give some suggestions on how to fill in various fields. They should make it easier for you to match my sample output.