;-*- Text -*-
; 3-Jun-98
;
; RCS: $Id$
; notes for Wednesday, 3-Jun-98

Caching, wrapup

Reading: chapter 7

Topics:
    1. cache associativity
    2. virtual memory (in 5 minutes)
    3. show & tell

----------------

Cache associativity.

1. We talked about adding rows to the cache & indexing using part of
the address (each row has a tag, but the tag is smaller); we talked
about adding blocking to the cache (multiple data with the same tag).
The third dimension in which to expand a cache is to duplicate the
whole schmoodle.  This dimension is called "associativity".

A "direct-mapped" cache (what we've been building (with or without
blocking)) has associativity = 1.

An n-way "set-associative cache" duplicates the direct-mapped hardware
n times.  Conventionally this is drawn as n caches side by side, and
you can say that an n-way associative cache has "n columns".  Also, in
a set-associative cache, the rows are sometimes called "sets".

    4-way set-associative w/128 sets, 8 bytes/block
    = 128 rows, 4 columns, 8 data bytes/block or 32 data bytes/row
    = 4KBytes total cache size.

A "fully-associative cache" has 1 row and a column for every block in
the cache.  Small, fully-associative caches are often used for
special-purpose cache applications, but it's too expensive to make
them very big (all those comparators & distributed mux circuitry).

There's a picture of a four-way set-associative cache in Figure 7.19
of the book.

1.5. Associativity adds a new ambiguity: the replacement policy.  When
we go to replace a line in the cache, we have to decide which existing
line in a set to evict.  The (apparently provably) optimal one to
evict is the line that will-be-reused at the most distant time in the
future ... unfortunately, that algorithm requires prescience!

Here are some possible policies:

    1. apply the principle of "the past predicts the future" and evict
       the line used at the most distant time in the past.  This is
       the "Least Recently Used" or LRU strategy.
       This is a common strategy but requires a lot of bookkeeping as
       the associativity gets large.

    2. FIFO: use up the lines in a set in the order they were
       allocated.  This is sort of like LRU but generally crummier.

    3. Random: no bookkeeping!  The random number can come from the
       clock (assuming the clock isn't correlated to the program).
       Random is widely used.

So here's the effect of associativity.  Consider two caches.  One has
64K entries and is direct-mapped.  The other has 64K total entries but
is two-way set associative: the 64K entries are split into two columns
of 32K entries each.

Now consider the following reference stream:

    0
    0x10000
    0
    0x10000
    [...]

Hit rate on the direct-mapped cache = 0% (!)  Both addresses have the
same 16-bit index (0), and since the direct-mapped cache has only one
possible location per address, they can't both be stored in the cache
at the same time.

However, the hit rate on the 2-way set associative cache is 100%
(ignoring the first two misses).  Both addresses still have the same
index (now 15 bits, still 0), but now there are *two* entries in the
cache that can have that same index (they'll have different tags).

On the other hand, it's easy to see that we can "break" the 2-way set
associative cache by just adding more addresses to our stream that
have the same index, e.g.:

    0
    0x8000
    0x10000
    0
    0x8000
    0x10000
    [...]

All three addresses have 15-bit index 0, so three lines compete for
the two entries in that set.  With an LRU policy, the set-associative
cache will now achieve a hit rate of 0% !  With a FIFO policy it would
also achieve 0%.  The random policy would do better.

Interestingly, the direct-mapped cache does better than 0% with this
stream; it gets a hit rate of 1/3.  (With a 16-bit index, 0x8000 gets
a row to itself and hits every time after its first miss, while 0 and
0x10000 keep evicting each other from row 0.)

It is amusing that one can contrive situations that work better on one
cache than the other *and* vice versa.  Amusing enough to make a
homework problem out of it, anyway.

Since cache effects are subtle and there are tradeoffs to every
possible change, the only real way to judge a cache is to measure its
performance on some programs that matter.

2. Virtual memory is a fascinating topic that I have to leave to
4760...  Or read the rest of Chapter 7.
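
Back to the two reference streams from the associativity notes: the
hit rates quoted there are easy to check with a toy simulator.  What
follows is a minimal Python sketch, not anything from the course or
the book.  The name hit_rate, its parameters, and the warmup counts
are made up for illustration.  It assumes addresses are entry numbers
(index = address mod number of sets, tag = the remaining high bits),
which is what the 16-bit index in the notes implies, and it models
"random" with a seeded software generator rather than a hardware
clock.

```python
import random


def hit_rate(stream, num_sets, ways, policy="lru", warmup=0, seed=0):
    """Fraction of references, after the first `warmup`, that hit.

    Assumption: addresses are entry numbers, so
    index = addr % num_sets and tag = addr // num_sets.
    """
    if policy not in ("lru", "fifo", "random"):
        raise ValueError("unknown policy: " + policy)
    rng = random.Random(seed)   # only consulted by the "random" policy
    sets = {}                   # index -> list of tags, oldest first
    hits = counted = 0
    for i, addr in enumerate(stream):
        index, tag = addr % num_sets, addr // num_sets
        lines = sets.setdefault(index, [])
        hit = tag in lines
        if hit:
            if policy == "lru":
                # Move to the end: the end is "most recently used".
                lines.remove(tag)
                lines.append(tag)
        else:
            if len(lines) == ways:
                if policy == "random":
                    lines.pop(rng.randrange(ways))
                else:
                    # LRU: front is least recently used.
                    # FIFO: front is the oldest allocation.
                    lines.pop(0)
            lines.append(tag)
        if i >= warmup:
            counted += 1
            hits += hit
    return hits / counted


K = 1024
DIRECT = dict(num_sets=64 * K, ways=1)    # 64K entries, direct-mapped
TWO_WAY = dict(num_sets=32 * K, ways=2)   # 64K entries, 2 columns of 32K

STREAM_A = [0, 0x10000] * 300
STREAM_B = [0, 0x8000, 0x10000] * 300

if __name__ == "__main__":
    # warmup skips the first pass over each stream (the cold misses)
    print("A direct-mapped:", hit_rate(STREAM_A, warmup=2, **DIRECT))
    print("A 2-way LRU:    ", hit_rate(STREAM_A, warmup=2, **TWO_WAY))
    for policy in ("lru", "fifo", "random"):
        print("B 2-way", policy, hit_rate(STREAM_B, warmup=3,
                                          policy=policy, **TWO_WAY))
    print("B direct-mapped:", hit_rate(STREAM_B, warmup=3, **DIRECT))
```

Keeping each set as a short list ordered oldest-first means LRU and
FIFO differ only in whether a hit moves the tag to the end, which is
exactly the extra bookkeeping the notes complain about for LRU.  The
number printed for the random policy depends on the seed, so treat it
as one sample rather than a property of the cache.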