1 And in Conclusion
Replacement policies:
For direct-mapped caches, each block of memory maps to one specific block in our cache. On a cache miss, if there is data present in that cache block, then we must evict the block to make room for our new data.
For non-direct-mapped caches, we can choose among multiple cache blocks when placing new data. When the relevant set is full, we must decide which block to evict to make space for the new data. Block replacement policies make that decision.
- Least recently used (LRU)
- First-in, first-out (FIFO)
- Random
Table 1: Comparison of cache replacement policies.
| Feature | LRU | FIFO | Random |
|---|---|---|---|
| Description | Replace the entry that has not been used for the longest time | Replace the oldest block in the set (queue) | Replace a random block |
| Implementation | Bit counters for each cache block; see quick check | FIFO queue or similar approximation | Flip a figurative hardware coin |
| Advantage | Temporal locality | Reasonable approximation to LRU | Easy to implement |
| Disadvantage | Complicated hardware to keep track of access history | Ignores recency: may evict a block that is still heavily used | Terrible if workload leverages high temporal locality |
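The three policies in Table 1 can be compared with a toy simulation of a single 4-way set. This is a sketch, not real hardware: the `simulate` helper, the tag values, and the access pattern are all made up for illustration. The pattern reuses block 0 heavily, so LRU (which tracks recency) misses less often than FIFO (which tracks only age).

```python
import random
from collections import OrderedDict, deque

def simulate(policy, accesses, ways=4):
    """Count misses for one cache set under a given replacement policy."""
    misses = 0
    if policy == "LRU":
        set_ = OrderedDict()               # key order tracks recency of use
        for tag in accesses:
            if tag in set_:
                set_.move_to_end(tag)      # hit: mark as most recently used
            else:
                misses += 1
                if len(set_) == ways:
                    set_.popitem(last=False)  # evict least recently used
                set_[tag] = True
    elif policy == "FIFO":
        set_ = deque()                     # front of queue = oldest block
        for tag in accesses:
            if tag not in set_:
                misses += 1
                if len(set_) == ways:
                    set_.popleft()         # evict oldest, regardless of use
                set_.append(tag)
    else:                                  # Random
        set_ = []
        for tag in accesses:
            if tag not in set_:
                misses += 1
                if len(set_) == ways:
                    set_.pop(random.randrange(ways))  # figurative coin flip
                set_.append(tag)
    return misses

# A pattern with high temporal locality: block 0 is reused constantly.
pattern = [0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6]
print("LRU misses:", simulate("LRU", pattern))    # 7 (only cold misses)
print("FIFO misses:", simulate("FIFO", pattern))  # 8 (evicts hot block 0)
```

FIFO eventually evicts block 0 simply because it is the oldest, even though it is the most useful block, which is exactly the recency-blindness noted in the table.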
Write policies:
Table 2: Comparison of cache write policies.
| Feature | Write-through | Write-back |
|---|---|---|
| Description | Write to the cache and memory at the same time | Write to the cache. Only write back to memory when the block is replaced. |
| Implementation | Easy | More complicated |
| “Synchronized” with memory? | Yes | No. Sometimes the cache will have the most current copy of the data. |
| Optimization | – | Include a dirty bit to only write back modified blocks. |
| Hit time for writes | Each write to a cache block also requires a write to memory. Longer. | Multiple writes within a cache block require only one write to memory (the latter happens only on replacement). Faster. |
| Miss penalty | “Read” misses are fast and never result in a write to memory. Shorter. | A “read” miss has variable time, since a write to memory may also be needed if the block to be replaced is dirty. Longer on average. |
| AMAT | Longer | Shorter |
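The hit-time row of Table 2 comes down to counting memory writes. A minimal sketch, under assumed simplifications (every store hits the same cached block, and a write-back cache flushes the dirty block exactly once when it is replaced); the `memory_writes` helper is invented for illustration:

```python
def memory_writes(writes, policy):
    """Count main-memory writes for a stream of stores to one cache block.

    Toy model: all stores hit the same resident block; a write-back cache
    writes the (dirty) block to memory once, at replacement time.
    """
    if policy == "write-through":
        return len(writes)         # every store also goes to memory
    else:                          # write-back
        return 1 if writes else 0  # single flush when the block is evicted

stores = ["a", "b", "c", "d", "e"]  # five stores to the same block
print(memory_writes(stores, "write-through"))  # 5 memory writes
print(memory_writes(stores, "write-back"))     # 1 memory write
```

Five stores to one block cost five memory writes under write-through but only one under write-back, which is why write-back wins on hit time and AMAT in the tables above.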
2 Textbook Readings
P&H 5.1-5.4, 5.8, 5.9, 5.13
3 Additional References
Amazing Illustrations by Ketrina (Yim) Thompson: CS Illustrated Cache Handouts
4 Exercises
Check your knowledge!
4.1 Short Exercises
Solution to Exercise 1
False.
Solution to Exercise 2
True. A direct-mapped cache needs an index for every block of the cache, whereas a 4-way set-associative cache needs an index only for every set of 4 blocks. With the same total number of blocks, the 4-way set-associative cache will have 2 fewer index bits than the direct-mapped cache.
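The bit counting here can be checked directly: index bits = log2(number of sets), and going from direct-mapped to 4-way divides the set count by 4, removing log2(4) = 2 bits. A quick sketch (the `index_bits` helper and the 64-block size are illustrative, not from the exercise):

```python
import math

def index_bits(num_blocks, ways):
    """Index bits = log2(number of sets), where sets = blocks / ways."""
    sets = num_blocks // ways
    return int(math.log2(sets))

blocks = 64
print(index_bits(blocks, 1))  # direct-mapped: log2(64) = 6 index bits
print(index_bits(blocks, 4))  # 4-way: log2(64 / 4) = 4 index bits, 2 fewer
```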
Solution to Exercise 3
False. Similar to the previous question, the impact depends on the program. If a program iterates through contiguous memory (like an array), having larger block sizes with fewer blocks may be beneficial, as each block contains more contiguous data. For instance, if Cache A has 10 blocks and a block size of 8 bytes while Cache B has 20 blocks and a block size of 4 bytes, and we loop through an array of 80 one-byte characters, Cache A will experience 10 cache misses and 70 hits, while Cache B will have 20 misses and 60 hits.
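The arithmetic in this solution follows from one miss per block touched during a sequential scan (assuming a cold cache and no conflict misses). A sketch that reproduces the numbers; the `misses_for_scan` helper is invented for illustration:

```python
def misses_for_scan(array_bytes, block_size):
    """Sequential scan of a cold cache: one compulsory miss per block
    touched, with all other accesses to that block hitting."""
    return array_bytes // block_size

n = 80  # bytes in the array
for name, block in [("Cache A", 8), ("Cache B", 4)]:
    m = misses_for_scan(n, block)
    print(f"{name}: {m} misses, {n - m} hits")  # A: 10/70, B: 20/60
```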