FAQ
How do I debug a branch predictor?
Use a very short GHR (for example, 3 or 4 bits) so that the PHT has only 8 or 16 entries, and print out all the PHT entries after each branch.
Then check that the 2-bit counters and the GHR are updated as you expect.
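As a rough illustration of this debugging approach, the sketch below builds a debug-sized predictor and renders the GHR and every PHT entry as text. The struct name, the member names, and the gshare-style index (PC XOR GHR) are assumptions for illustration only; use whatever indexing the assignment specifies. The initial values (GHR = 0, counters = 10b) follow this FAQ.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical debug-sized predictor; all names here are illustrative.
struct TinyPredictor {
    unsigned ghr_bits;
    std::uint32_t ghr = 0;              // GHR starts at 0
    std::vector<std::uint8_t> pht;      // 2-bit counters, start at 10b (weakly taken)

    explicit TinyPredictor(unsigned n)
        : ghr_bits(n), pht(std::size_t{1} << n, std::uint8_t{2}) {}

    std::uint32_t mask() const { return (1u << ghr_bits) - 1; }

    // ASSUMPTION: gshare-style index (PC XOR GHR); the assignment may differ.
    std::uint32_t index(std::uint32_t pc) const { return (pc ^ ghr) & mask(); }

    bool predict(std::uint32_t pc) const { return pht[index(pc)] >= 2; }

    void update(std::uint32_t pc, bool taken) {
        std::uint8_t &ctr = pht[index(pc)];
        if (taken) { if (ctr < 3) ++ctr; }   // saturate at 11b
        else       { if (ctr > 0) --ctr; }   // saturate at 00b
        ghr = ((ghr << 1) | (taken ? 1u : 0u)) & mask();
    }

    // Format the low n bits of v as a binary string.
    static std::string to_bits(std::uint32_t v, unsigned n) {
        std::string s;
        for (unsigned i = n; i-- > 0;) s += ((v >> i) & 1u) ? '1' : '0';
        return s;
    }

    // One line of text showing the GHR and all PHT counters.
    std::string dump() const {
        std::string out = "GHR=" + to_bits(ghr, ghr_bits) + " PHT";
        for (std::uint8_t c : pht) out += " " + to_bits(c, 2);
        return out;
    }
};
```

Printing `dump()` after every `update()` on a 3-bit predictor gives eight counters per line, which is small enough to verify by hand against the trace.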
Will there be any pipeline modifications in the later assignments?
In the 3rd assignment, you will add a memory system, which is independent of your current pipeline code. You will also need to add a thread ID to support SMT, but that does not require significant changes to your data structures.
I'm not maintaining the valid bit in the register file that you provided for programming assignment #1. Is that OK?
Yes, we no longer check any register valid bits for grading.
Can we fetch instructions after a branch if the branch is correctly predicted?
This is a very good question. To simplify the homework,
we assume that the processor fetches the instructions after a branch, starting in the following cycle, only if the branch is correctly predicted.
If a branch is mispredicted, the processor should not fetch instructions after the mispredicted branch.
This is because we are developing a trace-driven simulator: the trace contains only correct-path instructions, so there are no wrong-path instructions to fetch.
In real hardware, the processor can keep fetching in the same cycle if a branch is not taken.
Typically the processor brings in a whole cache block, so it can fetch the following instructions from the same cache block.
However, we do not model this behavior in the simulator. A processor with a very aggressive I-cache or trace-cache mechanism can even fetch instructions across branches.
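One way to picture this fetch policy is a flag that blocks fetch after a mispredicted branch. This is only a sketch under assumptions: the `Op` struct here is a stand-in for the simulator's own record (only `actually_taken` is named in this FAQ), and `FetchUnit`, `is_branch`, and `on_branch_resolved` are hypothetical names. The stage at which a branch resolves is also up to your pipeline.

```cpp
// Hypothetical stand-in for the simulator's op record. Only actually_taken
// appears in this FAQ; is_branch is an assumed field.
struct Op {
    bool is_branch;
    int  actually_taken;   // 1: taken, 0: not taken
};

// Hypothetical fetch gate for a trace-driven simulator.
struct FetchUnit {
    bool stalled_on_mispredict = false;

    // Call when the FE stage fetches `op`; predicted_taken comes from the predictor.
    void on_fetch(const Op &op, bool predicted_taken) {
        bool actual = (op.actually_taken == 1);
        if (op.is_branch && predicted_taken != actual)
            stalled_on_mispredict = true;   // no fetch past a mispredicted branch
    }

    // Call when the mispredicted branch resolves (stage is an assumption).
    void on_branch_resolved() { stalled_on_mispredict = false; }

    // FE may fetch the next trace op only when this returns true.
    bool can_fetch() const { return !stalled_on_mispredict; }
};
```

With this shape, a correctly predicted branch never sets the flag, so fetch simply continues in the following cycle.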
How can we know a branch's direction?
Check Op->actually_taken (1: taken, 0: not taken).
Will you check pipeline latches to grade our homework?
No, this time we do not check pipeline latches, so you can modify them or replace them with other data structures if you prefer.
Do we need to check memory dependences?
No. For the assignments, we assume a perfect memory dependence predictor, so you do not need to check memory dependences.
When should the processor update the register file?
It should update the register file at the commit stage. We do not check the register valid bits to grade.
Do we need to collect control hazard and data hazard statistics for this assignment?
No, we will not use these stats for grading.
What does an N-stage pipelined FE stage mean? Does the FE stage stall for N cycles to handle one instruction, or can an N-cycle FE stage hold N instructions at once?
An N-stage pipelined FE stage means that each instruction takes N cycles to pass through FE,
but the FE can start fetching a new instruction every cycle, so up to N instructions can be in FE at once.
Suppose the instruction stream is ADD1, ADD2, ADD3, ... and the FE/ID/WB stage depths are 5/2/2 cycles. Assume that each ADD takes 2 cycles in the EX stage.
Because ADD1 occupies the EX stage for two cycles, the stages behind it stall for one cycle, so cycle 8 and cycle 9 look the same.
Time     | FE1    | FE2    | FE3    | FE4    | FE5    | ID1    | ID2    | EX     | MEM    | WB1    | WB2    |
cycle 1  | ADD1   |        |        |        |        |        |        |        |        |        |        |
cycle 2  | ADD2   | ADD1   |        |        |        |        |        |        |        |        |        |
cycle 3  | ADD3   | ADD2   | ADD1   |        |        |        |        |        |        |        |        |
cycle 4  | ADD4   | ADD3   | ADD2   | ADD1   |        |        |        |        |        |        |        |
cycle 5  | ADD5   | ADD4   | ADD3   | ADD2   | ADD1   |        |        |        |        |        |        |
cycle 6  | ADD6   | ADD5   | ADD4   | ADD3   | ADD2   | ADD1   |        |        |        |        |        |
cycle 7  | ADD7   | ADD6   | ADD5   | ADD4   | ADD3   | ADD2   | ADD1   |        |        |        |        |
cycle 8  | ADD8   | ADD7   | ADD6   | ADD5   | ADD4   | ADD3   | ADD2   | ADD1   |        |        |        |
cycle 9  | ADD8   | ADD7   | ADD6   | ADD5   | ADD4   | ADD3   | ADD2   | ADD1   |        |        |        |
cycle 10 | ADD9   | ADD8   | ADD7   | ADD6   | ADD5   | ADD4   | ADD3   | ADD2   | ADD1   |        |        |
cycle 11 | ADD9   | ADD8   | ADD7   | ADD6   | ADD5   | ADD4   | ADD3   | ADD2   | bubble | ADD1   |        |
cycle 12 | ADD10  | ADD9   | ADD8   | ADD7   | ADD6   | ADD5   | ADD4   | ADD3   | ADD2   | bubble | ADD1   |
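The FE columns of the table above behave like a shift register, which can be sketched as follows. This is a minimal model under assumptions: `FrontEnd`, `slot`, and `tick` are hypothetical names, instructions are represented by their number (0 means an empty slot), and the stall signal is supplied by the caller rather than derived from the EX stage.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kFeDepth = 5;   // FE depth from the example above

// Hypothetical model of a pipelined FE stage as a shift register.
struct FrontEnd {
    std::array<int, kFeDepth> slot{};  // slot[0] = FE1 ... slot[4] = FE5; 0 = empty
    int next_inst = 1;                 // next instruction number to fetch

    // Advance one cycle. When `stall` is true nothing moves.
    // Returns the instruction handed to ID1 this cycle (0 if none).
    int tick(bool stall) {
        if (stall) return 0;
        int out = slot[kFeDepth - 1];                 // leaves FE5
        for (std::size_t i = kFeDepth - 1; i > 0; --i)
            slot[i] = slot[i - 1];                    // FEk moves to FEk+1
        slot[0] = next_inst++;                        // new fetch enters FE1
        return out;
    }
};
```

Calling `tick(false)` for cycles 1 through 8, `tick(true)` for cycle 9, and `tick(false)` for cycle 10 reproduces the FE1 to FE5 columns of the table for those cycles.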
What should be the initial value of the GHR?
The GHR is initialized to 0. Every PHT entry is initialized to 10b (weakly taken).
When is a register value available to the dependent instructions if the WB has multiple stages?
To simplify the hardware design, we assume that the data is available at the last stage of WB. Typical hardware has data forwarding, so data is available right after the execution stage.
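A simple way to model this rule is a busy bit per register that is cleared only by the last WB stage. The sketch below is an assumption-laden illustration: `Scoreboard`, its method names, the register count of 32, and the WB depth of 2 are all hypothetical, and whether a dependent instruction may read in the same cycle or the next one is left to the assignment's rules.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kNumRegs = 32;  // ASSUMPTION: register count
constexpr int kWbDepth = 2;           // ASSUMPTION: WB depth, as in the 5/2/2 example

// Hypothetical scoreboard: a register is busy until its producer
// reaches the last WB stage.
struct Scoreboard {
    std::array<bool, kNumRegs> busy{};   // all registers start ready

    // Call when an instruction that writes `dst` enters the pipeline.
    void on_issue(std::size_t dst) { busy[dst] = true; }

    // Call each cycle for the op sitting in WB sub-stage `wb_stage` (1-based).
    void on_wb(std::size_t dst, int wb_stage) {
        if (wb_stage == kWbDepth) busy[dst] = false;  // only the last WB stage releases
    }

    // A dependent instruction may read `src` only when this returns true.
    bool ready(std::size_t src) const { return !busy[src]; }
};
```

Because there is no forwarding in this model, a consumer of the register stays blocked while its producer is in EX, MEM, and WB1, and is released only at WB2.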