Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

1And in Conclusion\dots

1.1Cache Misses

In order to evaluate cache performance and hit rate, especially with determining how effective our current cache configuration is, it is useful to analyze the misses that do occur, and adjust accordingly. Below, we categorize cache misses into two types:

  1. Compulsory: A miss that must occur when you bring in a certain block for the first time, hence “compulsory”. Compulsory misses are cache attempts that would never be a hit regardless of the cache design.

  2. Noncompulsory: A cache miss that occurs after the data has already been brought into the cache and then evicted afterwards. If the miss could have been alleviated via increasing the cache size or associativity, then the miss is considered noncompulsory.

1.2Cache Associativity

Direct-Mapped caches, where each block of memory maps to specifically one slot in our cache, are good for fast searching, simple hardware, and quick replacement, but not so good for spatial locality! This is where we bring associativity into the matter. Associativity is the number of slots a memory block can map to in our cache. Thus, a Fully-Associative cache has the most associativity, meaning one memory block can map to any cache block. Our Direct-Mapped cache, on the other hand, has the least (being only 1-way set associative) because one memory block can only map to a single cache block.

For an N-way set associative cache, the following relationships are true:

2Additional References

Amazing Illustrations by Ketrina (Yim) Thompson: CS Illustrated Cache Handouts

2.1Benchmark references

The data in this section is from Patterson & Hennessy, Computer Organization: A General Approach, Fifth Edition.

2.2Short Exercises

Solution to Exercise 1 #

False. Similar to the previous question, the impact depends on the program. If a program iterates through contiguous memory (like an array), having larger block sizes with fewer blocks may be beneficial as each block contains more contiguous data. For instance, if Cache A has 10 blocks and a block size of 8 bytes while Cache B has 20 block and a block size of 4 bytes, and we loop through an array of 80 characters, Cache A will experience 10 cache misses and 70 hits, while Cache B will have 20 misses and 60 hits.

Solution to Exercise 2 #

False. A cache starts off “cold” and requires loading in values in blocks at first directly from memory, forcing compulsory misses. This can be somewhat alleviated by the use of a hardware prefetcher, which uses the current pattern of misses to predict and prefetch data that may be accessed later on. Prefetchers are out of scope for this course.

Solution to Exercise 3 #

False. Whether this improves the hit rate for a given program depends on the characteristics of the program. For example, a program that loops through an array once may have each access be separated by more than one block (e.g., if the block size is 8B but we access every fourth element of an integer array, our accesses are separated by 16B). This results in compulsory misses, which cannot be reduced just by adding more blocks to the cache.

Footnotes
  1. J.F. Cantin and M.D. Hill [2001]. “Cache PErformance for Selected SPEC CPU2000 Benchmarks.” Cantin & Hill (2001) website Version 2.0.

  2. J.D. Gee, M.D. Hill, D.N. Pnevmatikatos, and A.J. Smith [1993]. “Cache performance of the SPEC92 benchmark suite.” IEEE Micro 13:4 (August), 17-27.Gee et al. (1993)

References
  1. Cantin, J. F., & Hill, M. D. (2001). Cache performance for selected SPEC CPU2000 benchmarks. ACM SIGARCH Computer Architecture News, 29(4), 13–18. 10.1145/563519.563522
  2. Gee, J. D., Hill, M. D., Pnevmatikatos, D. N., & Smith, A. J. (1993). Cache performance of the SPEC92 benchmark suite. IEEE Micro, 13(4), 17–27. 10.1109/40.229711