
5 - Back-End: Instruction Scheduling, Memory Access Instructions, and Clusters

Jean-Loup Baer, University of Washington

Published online by Cambridge University Press: 05 June 2012

Summary

When an instruction has passed through all stages of the front-end of an out-of-order superscalar, it will either reside in an instruction window or be dispatched to a reservation station. In this chapter, we first examine several schemes for holding an instruction before it is issued to one of the functional units. We do not consider the design of the functional units themselves; hence, this chapter is relatively short. Some less common features related to multimedia instructions are described in Chapter 7.

In a given cycle, several instructions awaiting the result of a preceding instruction become ready to be issued. The detection of readiness is the wakeup step. Ideally, in an m-way superscalar, at least m instructions will have been woken up in the current or previous cycles. Since several of them might vie for the same functional unit, a scheduling algorithm must be applied; most scheduling algorithms are variations of first-come–first-served (FCFS, i.e., FIFO). Determining which instructions proceed is the select step. Once an instruction has been selected for a given functional unit, its input operands must be provided. Forwarding, also called bypassing, must be implemented, as already shown in the simple pipelines of Chapter 2 and the examples of Chapter 3.
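The wakeup and select steps described above can be sketched in a toy model. This is only an illustration of the two-step structure, not any particular processor's window design; the entry fields and the age-ordered (FCFS) selection policy are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    age: int          # program order; smaller means older (used for FCFS)
    fu: str           # functional-unit type required, e.g. "alu" or "mem"
    waiting_on: set   # tags of source operands not yet produced
    ready: bool = False

def wakeup(window, produced_tags):
    """Wakeup: mark an entry ready once all of its source tags
    have been broadcast by completing instructions."""
    for e in window:
        e.waiting_on -= produced_tags
        if not e.waiting_on:
            e.ready = True

def select(window, fu_count, m):
    """Select: pick up to m ready instructions in age (FCFS) order,
    honoring the number of available units of each type."""
    issued, slots = [], dict(fu_count)
    for e in sorted((x for x in window if x.ready), key=lambda x: x.age):
        if len(issued) == m:
            break
        if slots.get(e.fu, 0) > 0:
            slots[e.fu] -= 1
            issued.append(e)
    for e in issued:
        window.remove(e)
    return issued
```

For example, with one ALU and one memory unit in a 2-way machine, two ready ALU instructions contend for the single ALU: the older one issues, and the younger one waits even though a younger memory instruction proceeds.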

One particular instruction type that is often found to be on the critical path is the load instruction. As we have seen before, load dependencies are a bottleneck even in the simplest processors.
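A small timing model makes the point about load dependencies concrete: a consumer of a load result must wait out the load's multi-cycle latency, while ALU results can be forwarded back-to-back. The latencies and the one-issue-per-cycle assumption are illustrative, not drawn from any specific machine.

```python
# Illustrative latencies (cycles): a load takes longer than an ALU op.
LATENCY = {"load": 3, "alu": 1}

def finish_times(program):
    """program: list of (name, op, source_names) in program order.
    Returns the cycle at which each result becomes available, assuming
    one issue per cycle, in-order issue, and full forwarding."""
    done, t = {}, 0
    for name, op, srcs in program:
        start = max([t] + [done[s] for s in srcs])  # wait for all operands
        done[name] = start + LATENCY[op]
        t += 1  # the next instruction can issue no earlier than next cycle
    return done

# A load feeding an add: the dependent add cannot start before the load
# finishes, so the load sits on the critical path.
prog = [("r1", "load", []), ("r2", "alu", ["r1"]), ("r3", "alu", [])]
```

Running the model, the independent ALU instruction finishes before the add that depends on the load, even though it issues later, which is why schedulers try to hoist loads and predict their dependences.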

Type: Chapter
In: Microprocessor Architecture: From Simple Pipelines to Chip Multiprocessors, pp. 177–207
Publisher: Cambridge University Press
Print publication year: 2009


