  • Print publication year: 2009
  • Online publication date: June 2012

4 - Front-End: Branch Prediction, Instruction Fetching, and Register Renaming


In this chapter, we revisit the actions that are taken during the front-end of the instruction pipeline: instruction fetch, instruction decode, and, for out-of-order processors, register renaming.

A large part of this chapter will be devoted to branch prediction, which in turn governs instruction fetch. Previous chapters have already shown why branch prediction matters: (i) branch instructions, or more generally control-transfer instructions, occur very often (about once every five instructions on average), and (ii) branch mispredictions are extremely costly in lost instruction issue slots. The misprediction penalty grows with both the depth and the width of the pipeline. In other words, the faster the processor (the greater the depth) and the more instruction-level parallelism it can exploit (the greater the width), the more important it is to have accurate branch predictors.
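The sketch below gives a first-order illustration of why depth and width compound. It assumes that when a branch is mispredicted, every issue slot between fetch and branch resolution was filled with wrong-path work. Under that assumption, one misprediction squashes roughly (stages to resolution) × (issue width) slots. The function names, the two pipeline shapes, and the 5% misprediction rate are hypothetical. Only the one-branch-in-five frequency comes from the text.

```python
def lost_issue_slots(resolve_stages: int, width: int) -> int:
    """Issue slots squashed by one misprediction (first-order estimate).

    Assumes the branch resolves `resolve_stages` stages after fetch and that
    all `width` slots of every one of those stages held wrong-path work.
    """
    return resolve_stages * width


def lost_slots_per_instruction(resolve_stages: int, width: int,
                               branch_freq: float, mispredict_rate: float) -> float:
    """Average issue slots lost to mispredictions per executed instruction."""
    return branch_freq * mispredict_rate * lost_issue_slots(resolve_stages, width)


# branch_freq = 0.2 reflects "one branch every five instructions";
# the 5% misprediction rate and both pipeline shapes are hypothetical.
for stages, width in [(5, 1), (20, 4)]:
    cost = lost_slots_per_instruction(stages, width, 0.2, 0.05)
    print(f"stages={stages:2d} width={width}: {cost:.2f} slots lost per instruction")
# stages= 5 width=1: 0.05 slots lost per instruction
# stages=20 width=4: 0.80 slots lost per instruction
```

Both designs use an equally accurate predictor. Even so, under these assumptions the deep, wide design wastes about 0.8 issue slots for every useful instruction, sixteen times as many as the shallow, scalar one.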

We shall start our study of branch prediction by examining the anatomy of a branch predictor, an instance of a general prediction model. This model highlights the decision points: when we predict, what we predict, how we predict, and how we provide feedback to the predictor. For both types of prediction, branch direction and branch target address, the emphasis will first be on the “how.” Because hundreds of papers have been devoted to branch direction prediction, we concentrate on the schemes that may be considered the most important, historically and performance-wise.
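Here is a minimal sketch of one possible predictor that makes the four decision points concrete. It is an illustration, not the chapter's design, and it maps onto the model as follows:

- **When:** it predicts at fetch time, from the branch's PC alone.
- **What:** it produces both a direction and a target.
- **How:** a table of 2-bit saturating counters gives the direction, and a dictionary standing in for a branch target buffer (BTB) gives the target.
- **Feedback:** it is updated once the branch resolves.

The class name, table size, and the assumption of 4-byte-aligned instructions are hypothetical. The unbounded dictionary keyed by the full PC is a simplification of a real, finite BTB.

```python
class SimpleBranchPredictor:
    """Hypothetical predictor illustrating when/what/how/feedback."""

    def __init__(self, index_bits: int = 10):
        self.mask = (1 << index_bits) - 1
        # 2-bit saturating counters: 0-1 predict not taken, 2-3 predict taken.
        # Every entry starts at 1 ("weakly not taken").
        self.counters = [1] * (1 << index_bits)
        # Stand-in for a BTB: maps a branch PC to its last taken target.
        # Unlike a real BTB, it is keyed by the full PC and never evicts.
        self.btb = {}

    def _index(self, pc: int) -> int:
        # Assumes 4-byte instructions, so the two low PC bits are dropped.
        return (pc >> 2) & self.mask

    def predict(self, pc: int):
        """When: at fetch, using only the PC. What: (taken?, target)."""
        taken = self.counters[self._index(pc)] >= 2
        target = self.btb.get(pc)
        # A taken prediction is useless without a target, so fall through.
        return taken and target is not None, target

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Feedback: called once the branch outcome is known."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
            self.btb[pc] = target
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


# Hypothetical loop-closing branch: taken three times, then exits.
bp = SimpleBranchPredictor()
pc, target = 0x400100, 0x400040
predictions = []
for outcome in [True, True, True, False]:
    predictions.append(bp.predict(pc)[0])
    bp.update(pc, outcome, target)
print(predictions)  # [False, True, True, True]
```

The first prediction misses because the counter starts weakly not taken and the BTB is empty. The loop exit is also mispredicted, but it only lowers the counter to 2, so the next execution of the loop will again be predicted taken. This hysteresis is the usual argument for two bits per entry rather than one.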
