;
; 10-Apr-98
;
; RCS: $Id$
;
Lecture notes for Friday, 10-Apr-98

This lecture wrapped up the discussion of digital logic with an
example: a methodology for producing a low-cost circuit to compute a
simple equation.  The strategy is to use a minimum number of units
connected to a single common bus.  The flow of data between the units
is sequenced by an FSM.

1. ROMs are the same as ROMs...

One of the first things I talked about was using a ROM as universal
combinational logic.  However, the more ordinary way to think of a ROM
is as a memory, as the name Read-Only Memory implies.  The same
physical component can be used either way; it's just a matter of how
you want to think about it:

            ---------                         ---------
        N   |       |   M                 N   |       |   M
    ----/---|       |---/---          ----/---|       |---/---
            |       |                         |       |
            ---------                         ---------

    Combinational logic: An           Memory: An N-input, M-output
    N-input, M-output ROM             ROM can be viewed as a 2^N
    can implement any N-input,        element array of M-bit words.
    M-output boolean function.        The N-bit input selects one of
                                      the 2^N words to be read.

A ROM can implement combinational logic because it can be programmed
with the truth table for the logic function.

In practice, the full generality of a ROM is overkill for
combinational logic.  What is generally used is a denser Programmable
Logic Array (PLA) (described in the book).  A given PLA can implement
only a subset of the possible truth tables, but it's a "large enough"
subset.  A PLA is like a "compressed" form of a ROM.

One other aside: RAM is generally viewed as memory, but you can also
view RAM as combinational logic.  Logic built from RAMs can be
reconfigured on the fly!  RAM-based logic is the basis of some forms
of the currently hot "Field Programmable Gate Array" (FPGA) logic.
There is a lot of research interest in trying to improve computing by
synthesizing specialized logic for programs.  In other words, instead
of compiling a program into assembly language/machine language,
compile the program all the way down to the gates.
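
To make the two views of a ROM in this section concrete, here is a
minimal Python sketch.  It is not from the lecture: the names make_rom
and full_adder are made up for illustration, and a full adder is just
a convenient 3-input, 2-output function to tabulate.

```python
def make_rom(n_inputs, func):
    """'Program' a ROM by storing func's truth table, one word per address."""
    return [func(addr) for addr in range(2 ** n_inputs)]


def full_adder(addr):
    """3-input, 2-output function: address bits are (a, b, cin)."""
    a, b, cin = (addr >> 2) & 1, (addr >> 1) & 1, addr & 1
    return a + b + cin  # 2-bit word: bit 1 is carry-out, bit 0 is sum


rom = make_rom(3, full_adder)

# Memory view: an array of 2^3 = 8 words, each 2 bits wide.
print(len(rom))

# Logic view: presenting the inputs as an address reads out the outputs.
word = rom[0b101]  # a=1, b=0, cin=1
print("carry-out:", (word >> 1) & 1, "sum:", word & 1)
```

The list is the same object in both views; only the interpretation of
the index (address vs. input bits) changes.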

2. Combinational circuits.

At this point, we know how to build simple functions out of
combinational logic.  For instance, suppose we are given a
specification to produce a unit that will compute

    y = a + bx + cx^2

where x is the input, y is the output and a/b/c are constants.  Assume
a fixed number of bits, say 8.  I.e. we are to build a box that looks
like this:

              ---------
          8   |       |   8
      x --/---|       |---/-- y
              |       |
              ---------

A ready-made "polynomial" box for this function is not likely to exist
(particularly not for the choices of a/b/c).  However, assume that we
can get 8-bit adders and 8-bit multipliers:

              ---------                   ---------
      ----/---|       |           ----/---|       |
              |  Mul  |---/---            |  Add  |---/---
      ----/---|       |           ----/---|       |
              ---------                   ---------

Given these components, it's easy to see how to wire them up to
compute the given function.  [It's not quite so easy to draw in
ascii...]

Advantages of this construction technique:

  -- It's easy, at least for a problem this small.
  -- It's fast: all of the computations we do are required by the
     specification, so the composite propagation delay, Tpd, for our
     finished circuit represents the minimum achievable for these
     components.

However, there are some disadvantages:

  -- It's expensive.  Combinational multipliers in particular are
     expensive (many gates/transistors) and this circuit has three
     (one each for bx, x*x and c times x^2).
  -- It's completely single-purpose.  Any change requires a physical
     alteration to the circuit.

3. A single-bus methodology.

Here's a technique for making a circuit with a minimum number of
(presumably expensive) computational units.  The idea is to make a
*sequential* circuit (i.e. a circuit with registers and a clock) and
spend a number of clock cycles to compute the result a step at a time.
An expensive computational unit can be reused on different clock
cycles.

The method is as follows:

  a. Lay down one of each required unit.
  b. Connect the units with one bus: add a tri-state buffer between
     each unit's output and the bus and a register-with-an-enable
     between the bus and each unit's input(s).
  c. Build an FSM to drive all the tri-state and register enable
     inputs in a sequence that computes the right function.

The hard part is part (c) -- you have to apply some creativity to come
up with the right sequence.

For the example problem, y = a + bx + cx^2, we did the following
[again, hard to draw...]:

  a. The required units in this case are one multiplier, one adder and
     one small ROM to hold the three constants.  The ROM has a 2-bit
     input and one 8-bit output.

  b. Build the bus structure.

       i. We added tri-state buffers to the ROM, multiplier & adder
          and also a fourth tri-state buffer for the input value, X.
          The enable inputs to these buffers get the (hopefully
          mnemonic) names: DrROM (i.e. "Drive the ROM value onto the
          bus"), DrMul, DrAdd and DrX.

      ii. Add registers at the inputs of the multiplier & adder (two
          registers each) and one for the output value, Y.  The
          registers are named Mul1, Mul2, Add1, Add2 and Y.  The
          enable signals for the registers get the names LdMul1,
          LdMul2, LdAdd1, LdAdd2 and LdY.

  c. Build the FSM.  The FSM has zero inputs (we assume it just
     computes y = a + bx + cx^2 over and over), and one output for
     every element that needs to be controlled: all the drive signals
     (4) + all the load signals (5) + the two bits of address for the
     ROM.  That's 11 outputs.  The number of bits of state isn't known
     yet since we haven't yet written out the state transition
     diagram.

In this case, it's easiest to think of the state transition diagram as
a series of statements describing bus transfers.  For each bus
transfer, we'll enable one driver on the bus and enable one register.
This idea is easiest to explain by example.  Here's my example program
(like any program, this took some thought to create):

    0: Mul1 <- X      ! copy x input to Mul1 register
    1: Mul2 <- B      ! copy b constant to Mul2 register
    2: Add1 <- Mul    ! copy bx from the multiplier to Add1
    3: Mul2 <- X      ! Mul1 & Mul2 now both have x
    4: Mul1 <- Mul    ! x^2 into Mul1
    5: Mul2 <- C      ! copy c constant to Mul2 register
    6: Add2 <- Mul    ! cx^2 into Add2 (Add1 has bx)
    7: Add1 <- Add    ! compute bx + cx^2
    8: Add2 <- A      ! copy a constant to Add2 register
    9: Y    <- Add    ! output a + bx + cx^2

Each line specifies a transfer with a register (on the left) as a
destination and a tri-state buffer (on the right) as a source.  Each
transfer takes a clock cycle and only one transfer is possible each
cycle because there's only one bus.

Now that the state transition diagram is available (think of each line
as a bubble connected by arrows downward, and assume there's an arrow
out of state 9 back to state 0), we can declare that the FSM needs 4
state bits (to hold the 10 states).

The FSM is constructed in the usual way, as per Homework 1.  The
next-state and output ROMs need to be specified.  The contents of the
next-state ROM are trivial: just a counter:

    S3 S2 S1 S0 | NS3 NS2 NS1 NS0
    -----------------------------
     0  0  0  0 |  0   0   0   1     ! just count up
     0  0  0  1 |  0   0   1   0
     0  0  1  0 |  0   0   1   1
     0  0  1  1 |  0   1   0   0
     0  1  0  0 |  0   1   0   1
     0  1  0  1 |  0   1   1   0
     0  1  1  0 |  0   1   1   1
     0  1  1  1 |  1   0   0   0
     1  0  0  0 |  1   0   0   1
     1  0  0  1 |  0   0   0   0     ! go back to zero
     1  0  1  0 |  0   0   0   0     ! the rest of the
     1  0  1  1 |  0   0   0   0     ! states don't matter
     1  1  0  0 |  0   0   0   0     ! but we might as
     1  1  0  1 |  0   0   0   0     ! well make them zero
     1  1  1  0 |  0   0   0   0
     1  1  1  1 |  0   0   0   0

The real work is in the contents of the output ROM.  The outputs are
the load/drive signals plus the address inputs to the constant ROM
(addr1/addr0).  Each line in the program is interpreted by setting the
appropriate Dr and Ld bits in the output, setting the ROM address if
necessary, and setting all the other bits to zero.  An x means the
address bits don't matter in that state.  (Read the column names
downward.)

                      L L L L
                D D D d d d d   a a
                r r r M M A A   d d
              D M A R u u d d L d d
    S S S S | r u d O l l d d d r r
    3 2 1 0 | X l d M 1 2 1 2 Y 1 0
    -------------------------------
    0 0 0 0 | 1 0 0 0 1 0 0 0 0 x x   ! Mul1 <- X
    0 0 0 1 | 0 0 0 1 0 1 0 0 0 0 1   ! Mul2 <- B  (B at addr 1)
    0 0 1 0 | 0 1 0 0 0 0 1 0 0 x x   ! Add1 <- Mul
    0 0 1 1 | 1 0 0 0 0 1 0 0 0 x x   ! Mul2 <- X
    0 1 0 0 | 0 1 0 0 1 0 0 0 0 x x   ! Mul1 <- Mul
    0 1 0 1 | 0 0 0 1 0 1 0 0 0 1 0   ! Mul2 <- C  (C at addr 2)
    0 1 1 0 | 0 1 0 0 0 0 0 1 0 x x   ! Add2 <- Mul
    0 1 1 1 | 0 0 1 0 0 0 1 0 0 x x   ! Add1 <- Add
    1 0 0 0 | 0 0 0 1 0 0 0 1 0 0 0   ! Add2 <- A  (A at addr 0)
    1 0 0 1 | 0 0 1 0 0 0 0 0 1 x x   ! Y <- Add
    1 0 1 0 | 0 0 0 0 0 0 0 0 0 x x
    1 0 1 1 | 0 0 0 0 0 0 0 0 0 x x
    1 1 0 0 | 0 0 0 0 0 0 0 0 0 x x
    1 1 0 1 | 0 0 0 0 0 0 0 0 0 x x
    1 1 1 0 | 0 0 0 0 0 0 0 0 0 x x
    1 1 1 1 | 0 0 0 0 0 0 0 0 0 x x

In summary, the single-bus hardware is constructed by a cookbook
method.  The FSM design requires some applied creativity, like any
program.  The FSM state transition diagram is a program written in a
language defined by the available hardware (the registers and
tri-state buffers).

One subtlety: the example program didn't require any temporary
registers beyond the ones available as part of the adder and
multiplier.  If the program had needed temporary storage, we could
have added a RAM for that purpose.

Compared to the pure combinational approach, this single-bus
methodology has some advantages and disadvantages.

Advantages of this construction technique:

  -- It's cheap, assuming registers and tri-state buffers are cheap
     compared to the computational units.
  -- It's somewhat more flexible.  This hardware can be adapted to
     other functions simply by changing the contents of a few ROMs
     rather than rewiring.  It's still not completely flexible.

The main disadvantage is that it's a lot slower.  If multipliers take
100 ns and adders take 10 ns, then the combinational circuit drawn in
class takes 220 ns to compute a + bx + cx^2 (two multiplies and two
adds in series on the longest path: 100 + 100 + 10 + 10).  Assuming
registers and tri-states are perfect, the single-bus hardware will
take more like 1000 ns to compute the same result (10 cycles at 100 ns
each).

There are a bunch of reasons why the single-bus hardware is slower:

  -- We're reusing units instead of duplicating them, so there's no
     possible parallelism.
  -- There's added delay from the registers and tri-state buffers.
     This probably isn't a lot.
  -- The clock period must be selected to accommodate the slowest
     part.  E.g. if the multiplier takes 100 ns and the adder takes
     10 ns and everything else is free, we set the clock period to
     100 ns.  [there are some possible tricks here...]
  -- There are bus cycles which are completely wasted in the sense
     that they involve no computation (i.e. cycles 0, 1, 3, 5 and 8).
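
As a sanity check on the example program and the timing figures, here
is a rough Python sketch that steps through the ten bus transfers.  It
is my own illustration, not something shown in lecture.  The names
PROGRAM and run_single_bus are made up, and it assumes that 8-bit
values simply wrap around on overflow and that registers load at the
end of each cycle (the notes don't specify either).

```python
MASK = 0xFF  # assumption: 8-bit values wrap around on overflow

# One entry per FSM state: (bus driver, register loaded, constant-ROM address).
# The address only matters when the ROM drives the bus (A, B, C at 0, 1, 2).
PROGRAM = [
    ("X",   "Mul1", None),  # 0: Mul1 <- X
    ("ROM", "Mul2", 1),     # 1: Mul2 <- B
    ("Mul", "Add1", None),  # 2: Add1 <- Mul
    ("X",   "Mul2", None),  # 3: Mul2 <- X
    ("Mul", "Mul1", None),  # 4: Mul1 <- Mul
    ("ROM", "Mul2", 2),     # 5: Mul2 <- C
    ("Mul", "Add2", None),  # 6: Add2 <- Mul
    ("Add", "Add1", None),  # 7: Add1 <- Add
    ("ROM", "Add2", 0),     # 8: Add2 <- A
    ("Add", "Y",    None),  # 9: Y <- Add
]


def run_single_bus(x, a, b, c):
    """Step through the ten states once and return the Y register."""
    const_rom = [a & MASK, b & MASK, c & MASK]
    regs = {"Mul1": 0, "Mul2": 0, "Add1": 0, "Add2": 0, "Y": 0}
    for driver, dest, addr in PROGRAM:
        # Combinational outputs as seen during this clock cycle.
        outputs = {
            "X": x & MASK,
            "Mul": (regs["Mul1"] * regs["Mul2"]) & MASK,
            "Add": (regs["Add1"] + regs["Add2"]) & MASK,
            "ROM": const_rom[addr] if addr is not None else 0,
        }
        bus = outputs[driver]  # exactly one tri-state buffer is enabled
        regs[dest] = bus       # exactly one register loads at the clock edge
    return regs["Y"]


T_MUL, T_ADD = 100, 10                            # ns, figures from the notes
t_combinational = 2 * T_MUL + 2 * T_ADD           # longest path: x*x, *c, +bx, +a
t_single_bus = len(PROGRAM) * max(T_MUL, T_ADD)   # every cycle is as long as the slowest unit

print(run_single_bus(3, a=1, b=2, c=3))  # 1 + 2*3 + 3*9 = 34
print(t_combinational, t_single_bus)     # 220 1000
```

Changing the entries of PROGRAM (and the constants) is the software
analogue of reprogramming the output ROM: the "hardware" in
run_single_bus stays the same.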