;-*- Text -*-
; 29-May-98
;
; RCS: $Id$
;
Notes for Friday, 29-May-98

Caching.  Reading: Chapter 7.
----------------

1. There's a fundamental tradeoff between speed and size: big memories
   are slow.  Memory takes up physical space.  At a minimum, you have
   to wait for the speed-of-light delay for electrical signals to
   cross that space.  More realistically, big things have a lot of
   capacitance, and capacitors take time to charge.

     storage         access (+xfer) time    reasonable size
     -------         -------------------    ---------------
     register        200 ps                 8 bytes
     register file   < 1 ns                 640 bytes (& 8 ports)
     on-chip SRAM    2 ns                   32KB (& 1 port)
     SRAM            5 ns                   1MB
     DRAM            50 ns + 5 ns/byte      8MB/chip
     DRAM system     200 ns + 5 ns/byte     1GB+
     disk            10 ms + 200 ns/byte    18GB/disk
     disk system     10 ms + 200 ns/byte    200GB+
     tape system     200 s + 1 us/byte      +

2. So here's the trick: use a small, fast memory (a cache) between the
   processor and your big, slow memory.  When you want to access
   memory, *check* in the small memory first.

3. Why should this trick be beneficial at all?  "Locality", in
   particular "temporal locality" -- locations in memory that you've
   used recently are likely to be used again.  Think of a loop in
   program code or stack frames in memory.

   a. temporal locality: words of memory are reused.  In particular, a
      word in memory that has just been accessed tends to be
      re-accessed relatively soon.  Like the weather, the recent past
      tends to predict the immediate future.

   b. spatial locality: words tend to be accessed in "runs", e.g.
      instructions, stack accesses, array references.

4. Here's a simple cache memory:

      V    tag          data
     ---------------------------
     | |         |             |
     ---------------------------

   The "tag" contains enough information to tell you whether this is
   the item you want.  I.e., if there's one word of data here, then
   the tag is the address of the data.

   The "V" (valid) bit tells whether the line is valid at all.  That's
   to deal with, say, the power-on case, where the tag matches some
   location in memory but the data does not.
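
   To make the one-line cache concrete, here is a minimal Python
   sketch of it.  It is only an illustration: the class name
   OneLineCache, the dict standing in for the big memory, and the
   hit/miss counters are assumptions made for this example and are
   not part of the notes.

```python
class OneLineCache:
    """One cache line: a valid bit, a tag (the word address), one data word."""

    def __init__(self, memory):
        self.memory = memory   # hypothetical backing store: dict of address -> word
        self.valid = False     # power-on: the line's contents cannot be trusted
        self.tag = 0           # happens to equal address 0, but valid is False
        self.data = 0
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        # A hit requires BOTH a matching tag and a set valid bit.
        hit = self.valid and self.tag == addr
        if hit:
            self.hits += 1
        else:
            # Miss: fetch from the big memory and remember it in the line.
            self.misses += 1
            self.data = self.memory[addr]
            self.tag = addr
            self.valid = True
        return self.data


memory = {a: a * 10 for a in range(8)}
cache = OneLineCache(memory)

# Repeated accesses to the same address (temporal locality) hit in the cache.
values = [cache.read(a) for a in (0, 0, 0, 3, 3, 0)]
print(values, cache.hits, cache.misses)
```

   The very first read of address 0 is a miss even though the tag
   field already holds 0; that is the power-on case the "V" bit is
   there for.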
   Here's a circuit to go with that cache: add a comparator on the tag
   and address, AND its output with the "V" bit, and use the resulting
   signal (named "HIT") to select whether to take data from the cache
   or from the memory.  The HIT signal also needs to be fed back to
   whatever state machine is controlling the processor, so the
   processor knows whether it is getting its result back from the
   cache in one cycle or from the memory in some larger number of
   cycles.

5. One word of memory is pretty boring.  There are three dimensions in
   which to expand a cache like this (show circuits for each):

   a. add more words of data with the same tag.  E.g. four words of
      data with the same upper bits of address.  Treating the address
      as a 32-bit word address, use the upper 30 bits as the tag and
      the bottom 2 bits to choose one of the four words in the block.

      Advantage: only one comparator, so it's still fast.

      Disadvantage: works well only if you happen to use consecutive
      words of data together.  Happily, this happens a lot ("spatial
      locality").

   b. add more tag/data pairs, but pick the line you're going to use
      before checking any of the tags.  E.g. use 8K-entry RAMs for
      both the tag and the data.  Use the bottom 13 bits of address to
      pick the line (tag and data) that you will use.  Check the upper
      19 bits of address against the tag.

   c. duplicate the whole schmoodle (tag & data).  This is extremely
      flexible, but expensive, both in size and in speed.  It also
      adds an ambiguity: when you run out of entries, you have to
      decide which one to replace.

6. The tricks can be combined.  For instance, the configuration in the
   21264 is 32KB, 2-way set associative with 64-byte blocks.
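
   The arithmetic behind a combined configuration can be sketched in
   a few lines of Python.  This is an illustration under stated
   assumptions, not a description of any real chip's logic: the
   function names cache_geometry and split_address are made up here,
   addresses are taken to be byte addresses, all sizes are assumed to
   be powers of two, and the example numbers are simply the ones
   quoted in item 6.

```python
def cache_geometry(cache_bytes, ways, block_bytes):
    """Return (num_sets, offset_bits, index_bits); sizes assumed powers of two."""
    num_sets = cache_bytes // (ways * block_bytes)
    offset_bits = block_bytes.bit_length() - 1   # log2(block_bytes)
    index_bits = num_sets.bit_length() - 1       # log2(num_sets)
    return num_sets, offset_bits, index_bits


def split_address(addr, cache_bytes, ways, block_bytes):
    """Split a byte address (an assumption of this sketch) into (tag, index, offset)."""
    num_sets, offset_bits, index_bits = cache_geometry(cache_bytes, ways, block_bytes)
    offset = addr & (block_bytes - 1)               # which byte within the block
    index = (addr >> offset_bits) & (num_sets - 1)  # which set to look in
    tag = addr >> (offset_bits + index_bits)        # compared against each way's tag
    return tag, index, offset


# Figures as quoted in the notes: 32KB, 2-way, 64-byte blocks.
geometry = cache_geometry(32 * 1024, 2, 64)
fields = split_address(0x12345, 32 * 1024, 2, 64)
print(geometry)   # (256, 6, 8)
print(fields)     # (4, 141, 5)
```

   Setting ways to 1 gives the arrangement in 5b (the index alone
   picks the line), and setting ways so that there is a single set
   gives the arrangement in 5c (every tag must be checked).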