  • Print publication year: 2009
  • Online publication date: June 2012

2 - The Basics


This chapter reviews features that are found in all modern microprocessors: (i) instruction pipelining and (ii) a main memory hierarchy with caches, including virtual-to-physical address translation. It does not dwell on details, which are the subject of subsequent chapters; it provides only a basis on which we can build later.


Consider the steps required to execute an arithmetic instruction in the von Neumann machine model, namely:

1. Fetch the (next) instruction (the one at the address given by the program counter).

2. Decode it.

3. Execute it.

4. Store the result and increment the program counter.

In the case of a load or a store instruction, step 3 becomes two steps: calculate a memory address, then activate the memory for a read or for a write. For a store (a write), no subsequent storing of a result is needed. In the case of a branch, step 3 sets the program counter to point to the next instruction to be executed, and step 4 is voided.
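The four steps, and the load, store, and branch variations, can be sketched as a toy interpreter. This is only an illustration under stated assumptions: the tuple instruction format, the opcode names (`add`, `load`, `store`, `branch`, `halt`), and the function `run` are hypothetical and do not correspond to any real instruction set.

```python
# Toy von Neumann machine. The instruction format (op, a, b, c) and the
# opcode names are hypothetical, chosen only to mirror steps 1-4 above.

def run(program, registers, memory, max_steps=1000):
    pc = 0  # program counter
    for _ in range(max_steps):
        if pc >= len(program):
            break
        instr = program[pc]              # 1. fetch the instruction at pc
        op, a, b, c = instr              # 2. decode it
        if op == "add":                  # arithmetic: regs[a] = regs[b] + regs[c]
            result = registers[b] + registers[c]   # 3. execute
            registers[a] = result                  # 4. store the result ...
            pc += 1                                #    ... and increment pc
        elif op == "load":               # regs[a] = memory[regs[b] + c]
            addr = registers[b] + c                # 3a. calculate the address
            result = memory[addr]                  # 3b. activate memory for a read
            registers[a] = result                  # 4. store the result
            pc += 1
        elif op == "store":              # memory[regs[b] + c] = regs[a]
            addr = registers[b] + c                # 3a. calculate the address
            memory[addr] = registers[a]            # 3b. activate memory for a write
            pc += 1                                # no result to store; pc still advances
        elif op == "branch":             # go to instruction a if regs[b] != 0
            # 3. execution itself sets pc; step 4 is voided
            pc = a if registers[b] != 0 else pc + 1
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
    return registers, memory


# Usage: add memory[0] and memory[1], write the sum to memory[2].
program = [
    ("load", 1, 0, 0),    # r1 = memory[r0 + 0]
    ("load", 2, 0, 1),    # r2 = memory[r0 + 1]
    ("add", 3, 1, 2),     # r3 = r1 + r2
    ("store", 3, 0, 2),   # memory[r0 + 2] = r3
    ("halt", 0, 0, 0),
]
regs, mem = run(program, [0, 0, 0, 0], [5, 7, 0])
```

Each instruction here runs to completion before the next one is fetched, which is the strictly sequential behavior of the model.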

Early in the design of processors, it was recognized that complete sequentiality between the executions of instructions was often too restrictive and that parallel execution was possible. One of the first forms of parallelism to be investigated was the overlap of the steps listed above between consecutive instructions. This led to what is now called pipelining.
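The effect of overlapping the steps can be illustrated with a small schedule calculation. The sketch below assumes an idealized pipeline: each of the four steps takes exactly one cycle, a new instruction starts every cycle, and nothing ever stalls. The stage names and the functions `sequential_cycles` and `pipelined_schedule` are hypothetical helpers introduced for this illustration.

```python
# Idealized overlap of the four steps between consecutive instructions.
# Assumptions: one cycle per step, one new instruction per cycle, no stalls.

STAGES = ["fetch", "decode", "execute", "store"]


def sequential_cycles(n, stages=STAGES):
    """Cycles needed when each instruction finishes before the next starts."""
    return n * len(stages)


def pipelined_schedule(n, stages=STAGES):
    """Return one dict per cycle mapping instruction index -> active stage."""
    k = len(stages)
    total = n + k - 1 if n > 0 else 0
    schedule = []
    for cycle in range(total):
        active = {}
        for i in range(n):
            s = cycle - i          # instruction i enters the pipeline at cycle i
            if 0 <= s < k:
                active[i] = stages[s]
        schedule.append(active)
    return schedule


# Usage: compare the two organizations for five instructions.
n = 5
print("sequential cycles:", sequential_cycles(n))
print("pipelined cycles: ", len(pipelined_schedule(n)))
for cycle, active in enumerate(pipelined_schedule(n)):
    print(cycle, active)
```

Under these assumptions, n instructions through k steps take n × k cycles sequentially and n + k − 1 cycles when overlapped. Real pipelines fall short of this ideal whenever an instruction must wait on an earlier one.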
