;-*- Text -*-
; 1-Apr-98
;
; RCS: $Id$
; lecture notes for CS3760, 1-Apr-98

1. Administrivia

"Homework 0" is just a questionnaire; please add your e-mail address.

Prereqs: I'm assuming "digital logic" and "assembly language" as the direct prereqs. You also need to do a fair amount of programming in C for the projects. Since the projects are simulators, though, it's a "stylized" kind of C built around skeleton code that we will provide. The upshot is that it should be straightforward to pick up even if you've had only a little exposure to C.

Reading: Buzz through chapters 1-3 and appendices A & B. This should mostly be either pleasure reading (ch. 1), old news (ch. 3, A & B) or quite lightweight (ch. 2).

2. Overview.

This course is about where digital logic meets assembly language (or vice versa). You take two fields you know about in a disconnected way, smoosh 'em together and get lots of interesting stuff.

The basic idea is that you build digital logic circuits to interpret assembly language programs. One thing that's interesting is that you can build an interpreter for the same assembly language in a lot of different ways depending on your requirements. Requirements might be high speed, low cost, or low energy.

3. Abstraction.

Another way to look at this course is that it fills in a void in a "stack" of abstractions you've been building in previous courses. Abstraction is a crucial technique for structuring complex systems in any engineering discipline.
Consider the following stack of abstractions:

  database + macros:    ----------------
                           macros
  database UI:          ----------------
                           C + SQL calls
  SQL:                  ----------------
                           C + linked lists
  Linked list package:       -----
                           C
  C:                    - - - - - - - - -  <-- compiled layer (others are
  assembly language:    ----------------       interpreted)
X machine code:         ----------------
X
X                          processor/
X                          computer system
X
X         "control" --> --------+------- <--- "datapath"
X                          FSMs + combinational logic
  FSM + ALUs, etc.:     ----------------
                           circuits w/state
  flip-flops:               --------
                           gate circuits
  gates:                ----------------
                           transistor circuits
  transistors:          ----------------
                           silicon structures

The X'd stuff is new to this course.

Starting near the bottom, in digital logic you start with an abstraction of a gate. The beauty of an abstraction is that once you've defined it, you don't have to worry about the lower levels at all. Someone builds you gates out of transistors but you don't have to worry about how it was done, just about the specification of the gate.

Moving up, as should be familiar from digital logic, gates can be composed into structures with state (e.g. flip flops). Flip-flops and combinational logic circuits (gate circuits without state) can be composed into things like finite-state machines (FSMs) and other recognizable components like adders and arithmetic-logical-units (ALUs). That's about as far as the digital logic course gets.

Starting just above the X'd stuff, in the assembly-language course you learned assembly language. Higher-level languages like C are an abstraction above assembly.

As you know from programming, you can (and should) construct your own abstractions to structure your own programs. For instance, you might write a linked-list package and then use that in writing a bigger program, like an SQL (database jargon; I'm outta my depth here :-) server. Then someone else, who knows nothing about how your SQL server works, comes along and writes a database user interface (UI) to it.
Finally, at the pinnacle, along comes a database program user who knows nothing about any of the glorious structure underneath but can extend the database program with simple macros & things.

This class attacks the X'd part in the middle. We're going to build digital circuits (using the best available abstractions like FSMs rather than fooling (much) directly with low-level gates). Since processors are complex, there are layers of abstraction within them. I said that the book mentions "datapaths" and "control" (to be discussed later). I should have pointed out that there are others, e.g. the memory system, the input/output system, etc.

Finally, for completeness, there's a very thin layer of abstraction between assembly language and machine code. Assembly language is the human-readable version of machine code. There's usually a one-to-one correspondence between a line of assembly and a binary word of machine code, so there's not a huge distinction between the two.

Interesting: there are two ways to implement a layer of abstraction: you can *interpret* a layer or *compile* it. Interpretation means the next lower layer sequentially reads-and-executes instructions/lines/commands/whatever from the layer above at "run time". Compilation means that you use an external agent (the compiler) to inspect instructions/lines/commands/whatever at a layer and convert them, ahead of time, into the instructions/lines/commands/whatever of the next lower layer.

In the example, programs written in C are ordinarily compiled while all the other layers are ordinarily interpreted. It needn't be that way. For instance, you could compile SQL down to C. Compiling has a startup (compile time) cost, but the compiled result usually runs much faster at run time.

SHOULD HAVE SAID: one thing you should get out of looking at this (now) whole stack of abstractions is that between this class and the previous ones, you'll have covered "computation" from the ground up...
Now you can go out and happily hack at any level in this stack -- despite the complexity of the whole thing, you work by focussing on one abstraction at a time and blocking the rest out.

4. Performance.

Chapter 2 discusses this topic at length.

The word "performance" begs the question of what it is that we want to perform... "Speed", as in unit computation per unit time, is the obvious answer but there are others. For instance, "cost", as in computation per dollar. If you have a laptop, you're interested in computation per unit energy ("one battery" is a unit of energy).

The idea of "unit computation" is slippery, however. What we really want is to measure the computation we care about. If it's a game of Doom, then the measure is the time to run a game of Doom. Unfortunately, real computation that people care about is hard to come by, so people resort to benchmarks of various kinds. Chapter 2 discusses the standard SPEC benchmark in great detail. The bottom line is that these benchmarks are bogus but they're the best we can do so we have to try.

Performance in terms of execution time can be broken down somewhat, e.g.:

    time/program = instructions/program * time/instruction
                 = instructions/program * cycles/instruction * time/cycle

Time/cycle is the inverse of the "clock rate" widely touted in CPU marketing, e.g. a 300 MHz Pentium II has a time/cycle of 1/(300 MHz) = 3.3 ns. Historically, most improvement in computer performance has come from this number, but you can improve performance by improving any one of the factors in the performance equation.

Even getting past the definition of unit computation, speed performance remains slippery because there are two definitions you might use: "execution time" for one program and "throughput", when you run multiple programs.

    execution time = time to run one program.
    throughput     = programs run per unit time.

Superficially, it seems that throughput should be 1/execution time.
However, if there's any opportunity to run programs in parallel, the throughput can be higher than 1/execution time. One example is a lab full of people running Doom. The execution time of any one game is the same as the execution time on one computer. However, the throughput of the lab as a whole is potentially up to N times 1/execution time on one computer, where N is the number of computers in the lab.

Parallelism of this sort only helps you if you (a) have the means to run things in parallel and (b) have the desire to run things in parallel. For instance, the administrator of the lab may care about throughput, but things the administrator does to improve throughput (e.g. adding more machines) don't help at all if what you, as an individual, want is lower execution time.

Later we will see that there are circuit-level opportunities to trade off execution time and throughput.

5. Course outline.

Hokay, back to explaining what this course is about. As said above, it's about the collision of digital logic and assembly-language programming. What we're going to do is build interpreters for assembly language out of digital circuits.

The plan is to learn this stuff primarily via three reasonably substantial programming projects (in C). The projects are all based on an (extremely) simple processor specification which will be implemented in several ways.

  1. Write a "behavioral" simulator. I.e. one that interprets machine code instructions in C with no regard to circuit structure.

  2. Write a simulator for a "low-cost" style circuit for a pretty loose definition of low-cost.

  3. Write a simulator for a pipelined RISC circuit that uses the same instruction set as the other two. This style corresponds to high-performance microprocessors that were state of the art about 10 years ago.

Interesting: the pipelined RISC is the big contribution of the two authors of the textbook. Their work was quite a revolution at the time. The lessons were twofold:

  a. First and less important, they recognized that by simplifying instructions ("Reduced Instruction Set Computer"), you could reduce both cycles/instruction and time/cycle (in the performance equation in section 4) and that the impact on instructions/program was not severe. There are two things going on here. Simple instructions are faster in hardware. Also, by lowering the level of the instruction set abstraction, the compiler could do a better job.

  b. Second and more important, they introduced the discipline of actually measuring performance into a field that previously depended more on intuition.

6. The fact that we stop about 10 years ago raises the question of what architects are doing now. Well, microprocessors have gotten increasingly complicated; that's why we leave the details to 4760.

Interestingly, the complexity problem is so severe that it's starting to impact computer architecture from unexpected directions. The one I mentioned in class is design time. Companies want to produce new generations of microprocessors at the maximum possible rate to best exploit changes in integrated circuit technology. The extreme complexity of current designs increases design (and verification) time.

For this and other reasons, there appears to be another revolution brewing in architecture with the goal of further simplification. In the "abstraction" discussion, we observed that compiled code often runs faster than interpreted code. Many proposals for simplification hope to leverage the compiler even further to compile down to sub-elements of a processor, like individual functional units. Or even down to the gates. Mucho rhetoric on this topic available if you're interested :-)
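
A footnote to section 4: the performance equation and the execution time vs. throughput distinction are easy to play with in C. What follows is only a sketch. The function names are invented for illustration, and the instruction count and cycles/instruction used in the note afterwards are made-up inputs, not measurements of any real program.

```c
#include <assert.h>

/* time/cycle in seconds, i.e. the inverse of the clock rate in Hz. */
double cycle_time_s(double clock_hz)
{
    assert(clock_hz > 0.0);
    return 1.0 / clock_hz;
}

/* time/program = instructions/program * cycles/instruction * time/cycle */
double execution_time_s(double instructions, double cpi, double clock_hz)
{
    return instructions * cpi * cycle_time_s(clock_hz);
}

/*
 * Best-case throughput (programs per second) of a lab of identical
 * machines, each running its own copy of the program: N * 1/execution time.
 */
double max_throughput_per_s(int machines, double exec_time_s)
{
    assert(machines > 0 && exec_time_s > 0.0);
    return machines / exec_time_s;
}
```

For instance, cycle_time_s(300e6) reproduces the 3.3 ns figure for a 300 MHz clock. With the made-up inputs of 3e8 instructions at 2 cycles/instruction, execution_time_s(3e8, 2.0, 300e6) comes out to about 2 seconds, and max_throughput_per_s(10, 2.0) gives 5 programs per second for a ten-machine lab, even though no individual program finishes any sooner.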
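
A footnote to project 1 in section 5: a "behavioral" simulator is, at heart, a loop that reads machine code words and executes them one at a time, with no regard to circuit structure. Here is a minimal sketch of that idea. The four-instruction machine, its 16-bit encoding, and every name below are hypothetical and invented for this example; this is not the course's processor specification or its skeleton code.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL machine, for illustration only. */
enum { OP_HALT = 0, OP_LOADI = 1, OP_ADD = 2, OP_JNZ = 3 };

#define NUM_REGS 4

/* Assumed word layout: 4-bit opcode, register field, 8-bit operand. */
#define OPCODE(w)  (((w) >> 12) & 0xF)
#define REG_A(w)   (((w) >> 8) & 0x3)   /* only the low 2 bits select a register */
#define OPERAND(w) ((w) & 0xFF)

/*
 * Fetch-decode-execute loop.  Returns the number of instructions executed
 * (counting the HALT), or -1 on an unknown opcode.  Stops early after
 * max_steps instructions; aborts if the pc runs outside the program.
 */
long run(const uint16_t *mem, int mem_words, uint32_t regs[NUM_REGS],
         long max_steps)
{
    int pc = 0;
    long steps = 0;

    while (steps < max_steps) {
        assert(pc >= 0 && pc < mem_words);
        uint16_t w = mem[pc++];              /* fetch */
        steps++;
        switch (OPCODE(w)) {                 /* decode + execute */
        case OP_HALT:                        /* stop */
            return steps;
        case OP_LOADI:                       /* ra = immediate */
            regs[REG_A(w)] = OPERAND(w);
            break;
        case OP_ADD:                         /* ra = ra + rb */
            regs[REG_A(w)] += regs[OPERAND(w) & 0x3];
            break;
        case OP_JNZ:                         /* if ra != 0, jump to operand */
            if (regs[REG_A(w)] != 0)
                pc = OPERAND(w);
            break;
        default:
            return -1;
        }
    }
    return steps;
}
```

Under this made-up encoding, the words {0x1005, 0x1107, 0x2001, 0x0000} mean LOADI r0,5; LOADI r1,7; ADD r0,r1; HALT. Running them with all registers initially zero executes 4 instructions and leaves 12 in r0. The point of the sketch is the shape of the loop: the C switch statement plays the role that the control and datapath circuits play in the later projects.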