Given the widening gaps between processor speed, main memory (DRAM) speed, and secondary memory (disk) speed, it has become more and more difficult in recent years to feed data and instructions at the speed required by the processor while providing the ever-expanding memory space expected by modern applications. Modern systems rely on a memory hierarchy based on speed, size, and cost, as illustrated in Figure 4.1. Left of the dotted line is the cache hierarchy. Right of the dotted line is the virtual memory hierarchy, which may include a disk cache (not shown).
It has been observed over the years that the speed gap between the processor (clocked at multiple gigahertz and executing multiple instructions per clock) and main memory (with access times in the tens or even hundreds of nanoseconds) is growing exponentially. This problem is commonly referred to as the memory wall. A hierarchy of multiple levels of caches with various sizes and access times are employed to bridge the speed gap. Moreover, caches at every level are becoming more and more complex to help reduce or hide the latency of cache misses. To support OoO dynamically scheduled processors, which may have more than ten memory accesses pending at any time, modern, lockup-free (non-blocking) caches are capable of handling multiple cache hits and misses at a time. Furthermore, data and instructions are prefetched in caches before they are needed. In this chapter we describe these enhancements to cache designs.