
5 - Back-End: Instruction Scheduling, Memory Access Instructions, and Clusters

Jean-Loup Baer, University of Washington

Published online by Cambridge University Press: 05 June 2012

Summary

When an instruction has passed through all stages of the front-end of an out-of-order superscalar, it will either reside in an instruction window or be dispatched to a reservation station. In this chapter, we first examine several schemes for holding an instruction before it is issued to one of the functional units. We do not consider the design of the functional units themselves; hence, this chapter is relatively short. Some less common features related to multimedia instructions are described in Chapter 7.

In a given cycle, several instructions awaiting the result of a preceding instruction become ready to be issued. The detection of readiness is the wakeup step. Ideally, in an m-way superscalar, at least m instructions will have been woken up in the current or previous cycles. Since several of them might vie for the same functional unit, a scheduling algorithm must be applied; most scheduling algorithms are variations of first-come–first-served (FCFS, i.e., FIFO). Determining which instructions proceed is the select step. Once an instruction has been selected for a given functional unit, its input operands must be provided. Forwarding, also called bypassing, must be implemented, as already shown in the simple pipelines of Chapter 2 and the examples of Chapter 3.
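The wakeup and select steps described above can be sketched in a toy model. This is only an illustration of the two-step structure, not any particular processor's window design; the entry fields and the age-ordered (FCFS) selection policy are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    age: int          # program order; smaller means older (used for FCFS)
    fu: str           # functional-unit type required, e.g. "alu" or "mem"
    waiting_on: set   # tags of source operands not yet produced
    ready: bool = False

def wakeup(window, produced_tags):
    """Wakeup: mark an entry ready once all of its source tags
    have been broadcast by completing instructions."""
    for e in window:
        e.waiting_on -= produced_tags
        if not e.waiting_on:
            e.ready = True

def select(window, fu_count, m):
    """Select: pick up to m ready instructions in age (FCFS) order,
    honoring the number of available units of each type."""
    issued, slots = [], dict(fu_count)
    for e in sorted((x for x in window if x.ready), key=lambda x: x.age):
        if len(issued) == m:
            break
        if slots.get(e.fu, 0) > 0:
            slots[e.fu] -= 1
            issued.append(e)
    for e in issued:
        window.remove(e)
    return issued
```

For example, with one ALU and one memory unit in a 2-way machine, two ready ALU instructions contend for the single ALU: the older one issues, and the younger one waits even though a younger memory instruction proceeds.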

One particular instruction type that is often found to be on the critical path is the load instruction. As we have seen before, load dependencies are a bottleneck even in the simplest processors.
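A small timing model makes the point about load dependencies concrete: a consumer of a load result must wait out the load's multi-cycle latency, while ALU results can be forwarded back-to-back. The latencies and the one-issue-per-cycle assumption are illustrative, not drawn from any specific machine.

```python
# Illustrative latencies (cycles): a load takes longer than an ALU op.
LATENCY = {"load": 3, "alu": 1}

def finish_times(program):
    """program: list of (name, op, source_names) in program order.
    Returns the cycle at which each result becomes available, assuming
    one issue per cycle, in-order issue, and full forwarding."""
    done, t = {}, 0
    for name, op, srcs in program:
        start = max([t] + [done[s] for s in srcs])  # wait for all operands
        done[name] = start + LATENCY[op]
        t += 1  # the next instruction can issue no earlier than next cycle
    return done

# A load feeding an add: the dependent add cannot start before the load
# finishes, so the load sits on the critical path.
prog = [("r1", "load", []), ("r2", "alu", ["r1"]), ("r3", "alu", [])]
```

Running the model, the independent ALU instruction finishes before the add that depends on the load, even though it issues later, which is why schedulers try to hoist loads and predict their dependences.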

Type: Chapter
In: Microprocessor Architecture: From Simple Pipelines to Chip Multiprocessors, pp. 177–207
Publisher: Cambridge University Press
Print publication year: 2009


