The Cache Hierarchy

Jean-Loup Baer

doi:10.1017/CBO9780511811258.007

6 - The Cache Hierarchy

Published online by Cambridge University Press: 05 June 2012


Affiliation: University of Washington

Summary

We reviewed the basics of caches in Chapter 2. In subsequent chapters, when we looked at instruction fetch in the front-end and data load–store operations in the back-end, we assumed most of the time that we had cache hits in the respective first-level instruction and data caches. It is time now to look at the memory hierarchy in a more realistic fashion. In this chapter, our focus is principally on the cache hierarchy.

The challenge for an effective memory hierarchy can be summarized by two technological constraints:

With processors running at a few gigahertz, main memory latencies are now of the order of several hundred cycles.
In order to access first-level caches in 1 or 2 cycles, their size and associativity must be severely limited.

These two facts point to a hierarchy of caches: relatively small, low-associativity first-level instruction and data caches (L1 caches); a large second-level on-chip cache (L2), generally unified, i.e., holding both instructions and data, with access an order of magnitude slower than L1 accesses; often, in high-performance servers, an off-chip third-level cache (L3), with latencies approaching 100 cycles; and then main memory, with latencies of a few hundred cycles. The goal in designing a cache hierarchy is to keep a latency of one or two cycles for L1 caches and to hide as much as possible the latencies of higher cache levels and of main memory.
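To make the effect of such a hierarchy concrete, the sketch below computes an average memory access time (AMAT) by folding each level's hit latency and miss rate into the level above it. The latencies follow the orders of magnitude quoted above; the miss rates and the function name `amat` are illustrative assumptions, not figures from the chapter.

```python
def amat(levels, memory_latency):
    """Average memory access time, in cycles, for a chain of cache levels.

    levels: list of (hit_latency_cycles, local_miss_rate), ordered from L1 outward.
    memory_latency: cycles needed to service a miss in the last cache level.
    """
    access_time = memory_latency
    # Fold from the outermost cache inward:
    #   AMAT_i = hit_latency_i + miss_rate_i * AMAT_(i+1)
    for hit_latency, miss_rate in reversed(levels):
        access_time = hit_latency + miss_rate * access_time
    return access_time


# Latencies match the orders of magnitude in the text.
# The local miss rates are assumed values chosen only for illustration.
hierarchy = [
    (2, 0.05),    # L1: 1 or 2 cycles
    (15, 0.20),   # L2: about an order of magnitude slower than L1
    (100, 0.30),  # L3: approaching 100 cycles
]

with_caches = amat(hierarchy, memory_latency=300)
without_caches = amat([], memory_latency=300)
```

Under these assumed miss rates the average access costs about 4.65 cycles, against 300 cycles when every access goes to main memory, which is the sense in which the hierarchy hides the latencies of the outer levels.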

Type: Chapter
Information: Microprocessor Architecture: From Simple Pipelines to Chip Multiprocessors, pp. 208–259

DOI: https://doi.org/10.1017/CBO9780511811258.007

Publisher: Cambridge University Press

Print publication year: 2009
