;
; 27-Apr-98
;
; RCS: $Id$
;

Lecture notes for Friday, 24-Apr-98

This lecture returned to the single-bus implementation of the LC.

1. The elements in the datapath were selected to cover the
instructions needed for the LC.  Last Friday, we showed the sequence
of states needed to interpret an ADD instruction.

The BEQ instruction is interpreted by using a subtract operation in
the ALU, then using the Z (zero-test) circuit to check whether the
result is zero.

Here's a state-transition table with states for both ADD and BEQ
instructions:

     OP   Z  Current |  Bus               Next
    3210  0   State  |  Action            State
    ---------------------------------------------
    XXXX  X     0    |  MAR <- PC           1    ! fetch the next
                                                 ! instruction
    XXXX  X     1    |  IR <- RAM           2    ! (i.e. read memory)

    0000  X     2    |                     17    ! ADD
    0001  X     2    |                      ?    ! NAND
    0010  X     2    |                      ?    ! LW
    0011  X     2    |                      ?    ! SW
    0100  X     2    |                     20    ! BEQ
    0101  X     2    |                      ?    ! JALR
    0110  X     2    |                      ?    ! HALT
    0111  X     2    |  -                  57    ! NOOP
    1000  X     2    |                      ?    !
    1001  X     2    |                      ?    !
    1010  X     2    |                      ?    !
    1011  X     2    |                      ?    !
    1100  X     2    |                      ?    !
    1101  X     2    |                      ?    !
    1110  X     2    |                      ?    !
    1111  X     2    |                      ?    !

    [...]

    XXXX  X    17    |  A <- REG[RA]       18    ! perform the req'd
                                                 ! sequence of actions
    XXXX  X    18    |  B <- REG[RB]       19    ! for ADD...
    XXXX  X    19    |  REG[RB] <- A+B     57    ! (i.e. ALU does +)

    [...]

    XXXX  X    20    |  A <- REG[RA]       21    ! interpret a BEQ
                                                 ! by first comparing
    XXXX  X    21    |  B <- REG[RB]       22    ! RA & RB (using subtract)
                                                 ! and then branch
    XXXX  X    22    |  Z <- zero(A - B)   23    ! based on the Z result
    XXXX  0    23    |  -                  57    ! not zero: do nothing
    XXXX  1    23    |  -                  24    ! is zero: do branch

    XXXX  X    24    |  A <- PC            25    ! do the branch:
                                                 ! add the OFFSET
    XXXX  X    25    |  B <- OFFSET        26    ! field from the IR
                                                 ! to the PC, then
    XXXX  X    26    |  PC <- A+B           0    ! start fetching there.

    [...]

    XXXX  X    57    |  A <- PC            58    ! housekeeping:
                                                 ! update the PC to
    XXXX  X    58    |  B <- 1             59    ! point to the
                                                 ! next instruction.
    XXXX  X    59    |  PC <- A+B           0

Stepping back for a moment, consider that the sequence of states in
the FSM looks very much like a program.
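
To make the "looks like a program" observation concrete, here is a
minimal Python sketch that stores the table rows for ADD, BEQ and NOOP
as data and steps through them, one state per clock cycle.  The
tuple-based instruction format (opcode, RA, RB, OFFSET), the dictionary
standing in for the registers, and the names MICROCODE, DISPATCH,
do_action and interpret are all assumptions made for illustration, not
the LC's real encoding.  Word-width wraparound is ignored.

```python
# Sketch only: the ADD/BEQ/NOOP rows of the state table, run as data.
# Instruction layout and register dictionary are illustrative assumptions.

ADD, BEQ, NOOP = 0b0000, 0b0100, 0b0111

# state -> (bus action, next state).  A next state of None is filled in
# from the opcode (state 2) or from the Z bit (state 23).
MICROCODE = {
    0:  ("MAR <- PC", 1),
    1:  ("IR <- RAM", 2),
    2:  ("-", None),
    17: ("A <- REG[RA]", 18),
    18: ("B <- REG[RB]", 19),
    19: ("REG[RB] <- A+B", 57),
    20: ("A <- REG[RA]", 21),
    21: ("B <- REG[RB]", 22),
    22: ("Z <- zero(A - B)", 23),
    23: ("-", None),
    24: ("A <- PC", 25),
    25: ("B <- OFFSET", 26),
    26: ("PC <- A+B", 0),
    57: ("A <- PC", 58),
    58: ("B <- 1", 59),
    59: ("PC <- A+B", 0),
}
DISPATCH = {ADD: 17, BEQ: 20, NOOP: 57}   # only the opcodes the table fills in


def do_action(action, cpu, ram):
    """Perform one bus action on the (hypothetical) register set."""
    _, ra, rb, offset = cpu["IR"] or (None, 0, 0, 0)
    if action == "MAR <- PC":
        cpu["MAR"] = cpu["PC"]
    elif action == "IR <- RAM":
        cpu["IR"] = ram[cpu["MAR"]]
    elif action == "A <- REG[RA]":
        cpu["A"] = cpu["REG"][ra]
    elif action == "B <- REG[RB]":
        cpu["B"] = cpu["REG"][rb]
    elif action == "REG[RB] <- A+B":
        cpu["REG"][rb] = cpu["A"] + cpu["B"]
    elif action == "Z <- zero(A - B)":
        cpu["Z"] = 1 if cpu["A"] - cpu["B"] == 0 else 0
    elif action == "A <- PC":
        cpu["A"] = cpu["PC"]
    elif action == "B <- OFFSET":
        cpu["B"] = offset
    elif action == "B <- 1":
        cpu["B"] = 1
    elif action == "PC <- A+B":
        cpu["PC"] = cpu["A"] + cpu["B"]
    # "-" means no bus action in this cycle


def interpret(cpu, ram):
    """Run one instruction; return the list of states visited."""
    state, trace = 0, []
    while True:
        trace.append(state)
        action, nxt = MICROCODE[state]
        do_action(action, cpu, ram)
        if state == 2:                    # dispatch on the opcode in IR
            nxt = DISPATCH[cpu["IR"][0]]
        elif state == 23:                 # branch on the Z bit
            nxt = 24 if cpu["Z"] else 57
        if nxt == 0:                      # back to fetch: instruction done
            return trace
        state = nxt


cpu = {"PC": 0, "MAR": 0, "IR": None, "A": 0, "B": 0, "Z": 0,
       "REG": [0, 4, 6, 10, 0, 0, 0, 0]}
ram = {0: (ADD, 1, 2, 0),     # REG[2] <- REG[1] + REG[2]
       1: (BEQ, 2, 3, 5)}     # branch if REG[2] == REG[3]

add_trace = interpret(cpu, ram)
beq_trace = interpret(cpu, ram)
print(add_trace)
print(beq_trace)
print(cpu["PC"], cpu["REG"][2])
```

Here the ADD visits states 0, 1, 2, 17, 18, 19, 57, 58, 59, and the
(taken) BEQ visits 0, 1, 2, 20 through 26.  Note that the interpreter
loop knows nothing about ADD or BEQ; all of the instruction-specific
behaviour lives in the MICROCODE and DISPATCH tables.
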
Also consider that the datapath circuit is actually pretty general,
and we could program it to interpret a lot more instructions than are
in the LC.  This notion of building a relatively general datapath and
then "programming" it is called "microprogramming" or "microcoding".
You can think of the contents of the two FSM ROMs as microcode.

I've presented this idea as if it were a completely natural way of
structuring a circuit (i.e. of dealing with complexity), but, like
anything else, it took a while for architects to see the power of the
idea.  The classic paper on microcode is:

    M. V. Wilkes and J. B. Stringer.  Micro-Programming and the
    Design of the Control Circuits in an Electronic Digital Computer.
    Proceedings of the Cambridge Philosophical Society, pages
    230-238, 1953.

2. What is the minimum clock period for this circuit?  It depends on
the worst-case path.  My guess is that the worst-case path is through
the 4 gigaword RAM, in which case:

    Tclock > Tpd_reg + Tpd_RAM + Tpd_RAM-driver + Tsetup

Note, though, that if it's only the RAM that is slow, we can rearrange
the microcode to give the RAM extra cycles to compute.  You give the
RAM extra time by adding "wait" states after writing the MAR but
before driving the RAM output onto the bus.  This sequence gives the
RAM four clock cycles to compute:

    MAR <- PC
    -
    -
    -
    IR <- RAM

Once we do that, the clock period can be made as small as the
next-worst-case path allows.

3. The execution time of a program on this computer (or any computer)
is:

                     Instrs     cycles    seconds
    Execution time = ------- * ------- * -------
                     Program    Instr.    cycle

The instrs/program term is fixed for a given program once we fix the
instruction set.

The seconds/cycle term (the clock period, i.e. the reciprocal of the
clock frequency) is pretty good for a circuit like this, actually.
Implemented on one chip, it could probably be clocked at 1 GHz, the
same as the latest RISC processors.

The real limiting factor is cycles/instr.  This single-bus
architecture puts a serious floor on the number of cycles it takes to
execute an instruction, because only one value can move over the
single bus in each cycle.
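
As a rough worked example of the execution-time formula, the sketch
below plugs in cycle counts read off the state table in section 1,
assuming one clock cycle per state: ADD visits 9 states and BEQ visits
10 (taken or not).  Everything else is made up for illustration: the
700/300 instruction mix, the 4 ns period for a clock limited by the
RAM path, and the 1 ns period for a clock limited by the next-worst
path with three wait states added to every fetch.  The function names
are hypothetical too.

```python
# Sketch: Execution time = instrs/program * cycles/instr * seconds/cycle.
# Cycle counts come from the state table, assuming one cycle per state.
# The instruction mix and clock periods below are made-up numbers.

CYCLES = {"ADD": 9, "BEQ": 10}


def total_cycles(mix, wait_states=0):
    """Cycles for a program; wait_states are added to every fetch."""
    return sum(n * (CYCLES[op] + wait_states) for op, n in mix.items())


def execution_time_ns(mix, period_ns, wait_states=0):
    """Execution time in nanoseconds for a given clock period."""
    return total_cycles(mix, wait_states) * period_ns


mix = {"ADD": 700, "BEQ": 300}            # hypothetical program

# Clock period set by the slow RAM path, no wait states.
slow_clock = execution_time_ns(mix, period_ns=4)

# Faster clock, with three wait states so the RAM still gets 4 cycles.
fast_clock = execution_time_ns(mix, period_ns=1, wait_states=3)

print(total_cycles(mix) / sum(mix.values()))   # average cycles/instr
print(slow_clock, fast_clock)
```

With these particular numbers the wait-state version needs more cycles
per instruction but still finishes sooner, because every cycle that
does not touch the RAM gets the shorter period.  With different delays
the trade-off could go the other way.
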
The higher-performance implementations we will look at exploit "P"
(Parallelism) in various forms to reduce the number of
cycles/instruction.  The first form will be multiple buses...

What about _lower_ performance implementations of the LC?  Or rather,
to be politic, say "lower cost" or "lower power" versions (which also
happen to run slower)?  You can make a cheaper/slower version of the
LC by computing fewer bits in parallel (e.g. 16 or 8).  The extreme is
to compute one bit at a time.  Processors used in calculators and
watches operate this way.
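
To illustrate the one-bit-at-a-time extreme, here is a small Python
sketch of a bit-serial adder: a single full adder plus a carry bit,
reused once per clock cycle from the least significant bit upward.
The function name and the default width of 32 bits are assumptions for
illustration; the point is only that the hardware shrinks to one bit
of adder while the cycle count grows with the word width.

```python
# Sketch: add two words one bit per clock cycle (bit-serial addition).
# The 32-bit default width is an assumption, not taken from the LC spec.

def bit_serial_add(a, b, width=32):
    """Return (sum, carry_out, cycles) for a width-bit serial add."""
    result, carry, cycles = 0, 0, 0
    for i in range(width):               # one loop iteration = one cycle
        x = (a >> i) & 1
        y = (b >> i) & 1
        s = x ^ y ^ carry                # full-adder sum bit
        carry = (x & y) | (carry & (x ^ y))   # full-adder carry out
        result |= s << i
        cycles += 1
    return result, carry, cycles


print(bit_serial_add(5, 7, width=8))
print(bit_serial_add(0xFF, 1, width=8))
```

An 8-bit add takes 8 cycles here and a 32-bit add would take 32, where
a parallel ALU does the same work in one cycle.
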