Bibliography

Jean-Loup Baer

doi:10.1017/CBO9780511811258.011

Published online by Cambridge University Press: 05 June 2012

Affiliation: University of Washington

Type: Chapter
Information: Microprocessor Architecture: From Simple Pipelines to Chip Multiprocessors, pp. 351–360

DOI: https://doi.org/10.1017/CBO9780511811258.011

Publisher: Cambridge University Press

Print publication year: 2009

References

Abel, N., Budnick, D., Kuck, D., Muraoka, Y., Northcote, R., and Wilhelmson, R., “TRANQUIL: A Language for an Array Processing Computer,” Proc. AFIPS SJCC, 1969, 57–73

Adiletta, M., Rosenbluth, M., Bernstein, D., Wolrich, G., and Wilkinson, H., “The Next Generation of Intel IXP Network Processors,” Intel Tech. Journal, 6, 3, Aug. 2002, 6–18

Adve, S. and Gharachorloo, K., “Shared Memory Consistency Models: A Tutorial,” IEEE Computer, 29, 12, Dec. 1996, 66–76

Agarwal, A., Bianchini, R., Chaiken, D., Johnson, K., Kranz, D., Kubiatowicz, J., Lim, B.-H., Mackenzie, K., and Yeung, D., “The MIT Alewife Machine: Architecture and Performance,” Proc. 22nd Int. Symp. on Computer Architecture, 1995, 2–13

Agarwal, A., Lim, B.-H., Kranz, D., and Kubiatowicz, J., “APRIL: A Processor Architecture for Multiprocessing,” Proc. 17th Int. Symp. on Computer Architecture, 1990, 104–114

Agarwal, A. and Pudar, S., “Column-Associative Caches: A Technique for Reducing the Miss Rate of Direct-Mapped Caches,” Proc. 20th Int. Symp. on Computer Architecture, 1993, 179–190

Agarwal, A., Simoni, R., Hennessy, J., and Horowitz, M., “An Evaluation of Directory Schemes for Cache Coherence,” Proc. 15th Int. Symp. on Computer Architecture, 1988, 280–289

Aggarwal, A. and Franklin, M., “Scalability Aspects of Instruction Distribution Algorithms for Clustered Processors,” IEEE Trans. on Parallel and Distributed Systems, 16, 10, Oct. 2005, 944–955

Akkary, H. and Driscoll, M., “A Dynamic Multithreading Processor,” Proc. 31st Int. Symp. on Microarchitecture, 1998, 226–236

Albonesi, D., Balasubramonian, R., Dropsho, S., Dwarkadas, S., Friedman, E., Huang, M., Kursun, V., Magklis, G., Scott, M., Semeraro, G., Bose, P., Buyuktosunoglu, A., Cook, P., and Schuster, S., “Dynamically Tuning Processor Resources with Adaptive Processing,” IEEE Computer, 36, 12, Dec. 2003, 49–58

Alverson, R., Callahan, D., Cummings, D., Koblenz, B., Porterfield, A., and Smith, B., “The Tera Computer System,” Proc. Int. Conf. on Supercomputing, 1990, 1–6

Amdahl, G., “Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities,” Proc. AFIPS SJCC, 30, Apr. 1967, 483–485

Anderson, D., Sparacio, F., and Tomasulo, R., “Machine Philosophy and Instruction Handling,” IBM Journal of Research and Development, 11, 1, Jan. 1967, 8–24

Anderson, S., Earle, J., Goldschmidt, R., and Powers, D., “The IBM System/360 Model 91: Floating-point Execution Unit,” IBM Journal of Research and Development, 11, Jan. 1967, 34–53

Anderson, T., “The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors,” IEEE Trans. on Parallel and Distributed Systems, 1, 1, Jan. 1990, 6–16

Archibald, J. and Baer, J.-L., “An Economical Solution to the Cache Coherence Problem,” Proc. 12th Int. Symp. on Computer Architecture, 1985, 355–362

Archibald, J. and Baer, J.-L., “Cache Coherence Protocols: Evaluation Using a Multiprocessor Simulation Model,” ACM Trans. on Computer Systems, 4, 4, Nov. 1986, 273–298

August, D., Connors, D., Mahlke, S., Sias, J., Crozier, K., Cheng, B., Eaton, P., Olaniran, Q., and Hwu, W.-m., “Integrated Predicated and Speculative Execution in the IMPACT EPIC Architecture,” Proc. 25th Int. Symp. on Computer Architecture, 1998, 227–237

Austin, T., Larson, D., and Ernst, D., “SimpleScalar: An Infrastructure for Computer System Modeling,” IEEE Computer, 35, 2, Feb. 2002, 59–67

Baer, J.-L. and Wang, W.-H., “On the Inclusion Properties for Multi-Level Cache Hierarchies,” Proc. 15th Int. Symp. on Computer Architecture, 1988, 73–80

Baetke, F., “The CONVEX Exemplar SPP1000 and SPP1200 – New Scalable Parallel Systems with a Virtual Shared Memory Architecture,” in Dongarra, J., Grandinetti, L., Joubert, G., and Kowalik, J., Eds., High Performance Computing: Technology, Methods and Applications, Elsevier Press, 1995, 81–102

Balasubramonian, R., Albonesi, D., Buyuktosunoglu, A., and Dwarkadas, S., “Memory Hierarchy Reconfiguration for Energy and Performance in General-purpose Processor Architectures,” Proc. 33rd Int. Symp. on Microarchitecture, 2000, 245–257

Belady, L., “A Study of Replacement Algorithms for a Virtual Storage Computer,” IBM Systems Journal, 5, 1966, 78–101

Bernstein, A., “Analysis of Programs for Parallel Processing,” IEEE Trans. on Electronic Computers, EC-15, Oct. 1966, 746–757

Bhandarkar, D., Alpha Implementations and Architecture. Complete Reference and Guide, Digital Press, Boston, 1995

Boggs, D., Baktha, A., Hawkins, J., Marr, D., Miller, J., Roussel, P., Singhal, R., Toll, B., and Venkatraman, K., “The Microarchitecture of the Pentium 4 Processor on 90nm Technology,” Intel Tech. Journal, 8, 1, Feb. 2004, 1–17

Borkenhagen, J., Eickemeyer, R., Kalla, R., and Kunkel, S., “A Multithreaded PowerPC Processor for Commercial Servers,” IBM Journal of Research and Development, 44, 6, 2000, 885–899

Brooks, D. and Martonosi, M., “Dynamic Thermal Management in High-Performance Microprocessors,” Proc. 7th Int. Symp. on High-Performance Computer Architecture, 2001, 171–182

Buchholz, W., Ed., Planning a Computer System: Project Stretch, McGraw-Hill, New York, 1962

Calder, B. and Grunwald, D., “Fast & Accurate Instruction Fetch and Branch Prediction,” Proc. 21st Int. Symp. on Computer Architecture, 1994, 2–11

Calder, B. and Grunwald, D., “Next Cache Line and Set Prediction,” Proc. 22nd Int. Symp. on Computer Architecture, 1995, 287–296

Calder, B., Grunwald, D., and Emer, J., “Predictive Sequential Associative Cache,” Proc. 2nd Int. Symp. on High-Performance Computer Architecture, 1996, 244–253

Calder, B. and Reinman, G., “A Comparative Survey of Load Speculation Architectures,” Journal of Instruction-Level Parallelism, 1, 2000, 1–39

Canal, R., Parcerisa, J.M., and González, A., “Dynamic Cluster Assignment Mechanisms,” Proc. 6th Int. Symp. on High-Performance Computer Architecture, 2000, 133–141

Cantin, J. and Hill, M., Cache Performance for SPEC CPU2000 Benchmarks, Version 3.0, May 2003, http://www.cs.wisc.edu/multifacet/misc/spec2000cache-data/

Case, R. and Padegs, A., “The Architecture of the IBM System/370,” Communications of the ACM, 21, 1, Jan. 1978, 73–96

Censier, L. and Feautrier, P., “A New Solution to Coherence Problems in Multicache Systems,” IEEE Trans. on Computers, 27, 12, Dec. 1978, 1112–1118

Chan, K., Hay, C., Keller, J., Kurpanek, G., Shumaker, F., and Zheng, J., “Design of the HP PA 7200 CPU,” Hewlett Packard Journal, 47, 1, Jan. 1996, 25–33

Chaudhry, S., Caprioli, P., Yip, S., and Tremblay, M., “High-Performance Throughput Computing,” IEEE Micro, 25, 3, May 2005, 32–45

Chen, T.-F. and Baer, J.-L., “Effective Hardware-based Data Prefetching for High-Performance Processors,” IEEE Trans. on Computers, 44, 5, May 1995, 609–623

Cheng, I-C., Coffey, J., and Mudge, T., “Analysis of Branch Prediction via Data Compression,” Proc. 7th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Oct. 1996, 128–137

Christie, D., “Developing the AMD-K5 Architecture,” IEEE Micro, 16, 2, Mar. 1996, 16–27

Chrysos, G. and Emer, J., “Memory Dependence Prediction Using Store Sets,” Proc. 25th Int. Symp. on Computer Architecture, 1998, 142–153

Citron, D., Hurani, A., and Gnadrey, A., “The Harmonic or Geometric Mean: Does it Really Matter,” Computer Architecture News, 34, 6, Sep. 2006, 19–26

Colwell, R., Papworth, D., Hinton, G., Fetterman, M., and Glew, A., “Intel's P6 Microarchitecture,” Chapter 7 in Shen, J. P. and Lipasti, M., Eds., Modern Processor Design, 2005, 329–367

Conte, T., Menezes, K., Mills, P., and Patel, B., “Optimization of Instruction Fetch Mechanisms for High Issue Rates,” Proc. 22nd Int. Symp. on Computer Architecture, 1995, 333–344

Conti, C., Gibson, D., and Pitkowsky, S., “Structural Aspects of the IBM System 360/85; General Organization,” IBM Systems Journal, 7, 1968, 2–14

Cooksey, R., Jourdan, S., and Grunwald, D., “A Stateless, Content-Directed Data Prefetching Mechanism,” Proc. 10th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Oct. 2002, 279–290

Crisp, R., “Direct Rambus Technology: The New Main Memory Standard,” IEEE Micro, 17, 6, Nov.–Dec. 1997, 18–28

Culler, D. and Singh, J.P. with Gupta, A., Parallel Computer Architecture: A Hardware/Software Approach, Morgan Kaufmann Publishers, San Francisco, 1999

Cuppu, V., Jacob, B., Davis, B., and Mudge, T., “High-Performance DRAMs in Workstation Environments,” IEEE Trans. on Computers, 50, 11, Nov. 2001, 1133–1153

Curnow, H. and Wichmann, B., “Synthetic Benchmark,” Computer Journal, 19, 1, Feb. 1976

Cvetanovic, Z. and Bhandarkar, D., “Performance Characterization of the Alpha 21164 Microprocessor Using TP and SPEC Workloads,” Proc. 2nd Int. Symp. on High-Performance Computer Architecture, 1996, 270–280

Cvetanovic, Z. and Kessler, R., “Performance Analysis of the Alpha 21264-based Compaq ES40 System,” Proc. 27th Int. Symp. on Computer Architecture, 2000, 192–202

Dally, W., “Virtual-Channel Flow Control,” Proc. 17th Int. Symp. on Computer Architecture, 1990, 60–68

Denning, P., “Virtual Memory,” ACM Computing Surveys, 2, Sep. 1970, 153–189

Dennis, J. and Misunas, D., “A Preliminary Data Flow Architecture for a Basic Data Flow Processor,” Proc. 2nd Int. Symp. on Computer Architecture, 1974, 126–132

Dongarra, J., Bunch, J., Moler, C., and Stewart, G., LINPACK User's Guide, SIAM, Philadelphia, 1979

Dongarra, J., Luszczek, P., and Petitet, A., “The LINPACK Benchmark: Past, Present, and Future,” Concurrency and Computation: Practice and Experience, 15, 2003, 1–18

Dubois, M., Scheurich, C., and Briggs, F., “Memory Access Buffering in Multiprocessors,” Proc. 13th Int. Symp. on Computer Architecture, 1986, 434–442

Eden, A. and Mudge, T., “The YAGS Branch Prediction Scheme,” Proc. 31st Int. Symp. on Microarchitecture, 1998, 69–77

Edmondson, J., Rubinfeld, P., Preston, R., and Rajagopalan, V., “Superscalar Instruction Execution in the 21164 Alpha Microprocessor,” IEEE Micro, 15, 2, Apr. 1995, 33–43

Eggers, S., Emer, J., Levy, H., Lo, J., Stamm, R., and Tullsen, D., “Simultaneous Multithreading: A Platform for Next-Generation Processors,” IEEE Micro, 17, 5, Sep. 1997, 12–19

Fagin, B. and Russell, K., “Partial Resolution in Branch Target Buffers,” Proc. 28th Int. Symp. on Microarchitecture, 1995, 193–198

Farkas, K. and Jouppi, N., “Complexity/Performance Trade-offs with Non-Blocking Loads,” Proc. 21st Int. Symp. on Computer Architecture, 1994, 211–222

Fields, B., Bodik, R., and Hill, M., “Slack: Maximizing Performance under Technological Constraints,” Proc. 29th Int. Symp. on Computer Architecture, 2002, 47–58

Flynn, M., “Very High Speed Computing Systems,” Proc. IEEE, 54, 12, Dec. 1966, 1901–1909

Folegnani, D. and González, A., “Energy-effective Issue Logic,” Proc. 28th Int. Symp. on Computer Architecture, 2001, 230–239

Franklin, M. and Sohi, G., “A Hardware Mechanism for Dynamic Reordering of Memory References,” IEEE Trans. on Computers, 45, 6, Jun. 1996, 552–571

Gharachorloo, K., Gupta, A., and Hennessy, J., “Two Techniques to Enhance the Performance of Memory Consistency Models,” Proc. Int. Conf. on Parallel Processing, 1991, I-355–364

Gochman, S., Ronen, R., Anati, I., Berkovits, R., Kurts, T., Naveh, A., Saeed, A., Sperber, Z., and Valentine, R., “The Intel Pentium M Processor: Microarchitecture and Performance,” Intel Tech. Journal, 7, 2, May 2003, 21–39

Golden, M. and Mudge, T., “A Comparison of Two Pipeline Organizations,” Proc. 27th Int. Symp. on Microarchitecture, 1994, 153–161

Goodman, J., Vernon, M., and Woest, P., “Efficient Synchronization Primitives for Large-Scale Cache Coherent Multiprocessors,” Proc. 3rd Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Apr. 1989, 64–73

Graunke, G. and Thakkar, S., “Synchronization Algorithms for Shared-Memory Multiprocessors,” IEEE Computer, 23, 6, Jun. 1990, 60–70

Grunwald, D., Levis, P., Farkas, K., Morrey, C., and Neufeld, M., “Policies for Dynamic Clock Scheduling,” Proc. 4th USENIX Symp. on Operating Systems Design and Implementation, 2000, 73–86

Gschwind, M., Hofstee, H., Flachs, B., Hopkins, M., Watanabe, Y., and Yamazaki, T., “Synergistic Processing in Cell's Multicore Architecture,” IEEE Micro, 26, 2, Mar. 2006, 11–24

Gunther, S., Binns, F., Carmean, D., and Hall, J., “Managing the Impact of Increasing Power Consumption,” Intel Tech. Journal, 5, 1, Feb. 2001, 1–9

Gwennap, L., “Brainiacs, Speed Demons, and Farewell,” Microprocessor Report Newsletter, 13, 7, Dec. 1999

Hallnor, E. and Reinhardt, S., “A Fully Associative Software-Managed Cache Design,” Proc. 27th Int. Symp. on Computer Architecture, 2000, 107–116

Hao, E., Chang, P.-Y., and Patt, Y., “The Effect of Speculatively Updating Branch History on Branch Prediction Accuracy, Revisited,” Proc. 27th Int. Symp. on Microarchitecture, 1994, 228–232

Hartstein, A. and Puzak, T., “The Optimum Pipeline Depth for a Microprocessor,” Proc. 29th Int. Symp. on Computer Architecture, 2002, 7–13

Hennessy, J. and Patterson, D., Computer Architecture: A Quantitative Approach, Fourth Edition, Elsevier Inc., San Francisco, 2007

Henning, J., Ed., “SPEC CPU2006 Benchmark Descriptions,” Computer Architecture News, 36, 4, Sep. 2006, 1–17

Hill, M., Aspects of Cache Memory and Instruction Buffer Performance, Ph.D. Dissertation, Univ. of California, Berkeley, Nov. 1987

Hill, M., “Multiprocessors Should Support Simple Memory-Consistency Models,” IEEE Computer, 31, 8, Aug. 1998, 28–34

Hinton, G., Sager, D., Upton, M., Boggs, D., Carmean, D., Kyker, A., and Roussel, P., “The Microarchitecture of the Pentium 4 Processor,” Intel Tech. Journal, 1, Feb. 2001

Ho, R., Mai, K., and Horowitz, M., “The Future of Wires,” Proc. of the IEEE, 89, 4, Apr. 2001, 490–504

Hrishikesh, M., Jouppi, N., Farkas, K., Burger, D., Keckler, S., and Shivakumar, P., “The Optimal Logic Depth per Pipeline Stage is 6 to 8 FO4 Inverter Delays,” Proc. 29th Int. Symp. on Computer Architecture, 2002, 14–24

Huck, J., Morris, D., Ross, J., Knies, A., Mulder, H., and Zahir, R., “Introducing the IA-64 Architecture,” IEEE Micro, 20, 5, Sep. 2000, 12–23

Hwu, W.-m. and Patt, Y., “HPSm, A High-Performance Restricted Data Flow Architecture Having Minimal Functionality,” Proc. 13th Int. Symp. on Computer Architecture, 1986, 297–307

Intel Corp., A Tour of the P6 Microarchitecture, 1995, http://www.x86.org/ftp/manuals/686/p6tour.pdf

Jeremiassen, T. and Eggers, S., “Reducing False Sharing on Shared Memory Multiprocessors through Compile Time Data Transformations,” Proc. 5th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, 1995, 179–188

Jiménez, D., Keckler, S., and Lin, C., “The Impact of Delay on the Design of Branch Predictors,” Proc. 33rd Int. Symp. on Microarchitecture, 2000, 67–76

John, L., “More on Finding a Single Number to Indicate Overall Performance of a Benchmark Suite,” Computer Architecture News, 32, 1, Mar. 2004, 3–8

Joseph, D. and Grunwald, D., “Prefetching Using Markov Predictors,” Proc. 24th Int. Symp. on Computer Architecture, 1997, 252–263

Jouppi, N., “Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers,” Proc. 17th Int. Symp. on Computer Architecture, 1990, 364–373

Jourdan, S., Stark, J., Hsing, T.-H., and Patt, Y., “Recovery Requirements of Branch Prediction Storage Structures in the Presence of Mispredicted-path Execution,” International Journal of Parallel Programming, 25, Oct. 1997, 363–383

Kaeli, D. and Emma, P., “Branch History Table Prediction of Moving Target Branches Due to Subroutine Returns,” Proc. 18th Int. Symp. on Computer Architecture, 1991, 34–42

Kagi, A., Burger, D., and Goodman, J., “Efficient Synchronization: Let them Eat QOLB,” Proc. 24th Int. Symp. on Computer Architecture, 1997, 170–180

Kahle, J., Day, M., Hofstee, H., Johns, C., Maeurer, T., and Shippy, D., “Introduction to the Cell Multiprocessor,” IBM Journal of Research and Development, 49, 4/5, Jul. 2005, 589–604

Kalamatianos, J., Khalafi, A., Kaeli, D., and Meleis, W., “Analysis of Temporal-based Program Behavior for Improved Instruction Cache Performance,” IEEE Trans. on Computers, 48, 2, Feb. 1999, 168–175

Kalla, R., Sinharoy, B., and Tendler, J., “IBM Power5 Chip: A Dual-Core Multithreaded Processor,” IEEE Micro, 24, 2, Apr. 2004, 40–47

Kapasi, U., Rixner, S., Dally, W., Khailany, B., Ahn, J., Mattson, P., and Owens, J., “Programmable Stream Processors,” IEEE Computer, 36, 8, Aug. 2003, 54–62

Kaxiras, S., Hu, Z., and Martonosi, M., “Cache Decay: Exploiting Generational Behavior to Reduce Cache Leakage Power,” Proc. 28th Int. Symp. on Computer Architecture, 2001, 240–251

Keller, R., “Look-Ahead Processors,” ACM Computing Surveys, 7, 4, Dec. 1975, 177–195

Keltcher, C., McGrath, J., Ahmed, A., and Conway, P., “The AMD Opteron for Multiprocessor Servers,” IEEE Micro, 23, 2, 2003, 66–76

Kendall Square Research, KSR1 Technology Background, Waltham, MA, 1992

Kermani, P. and Kleinrock, L., “Virtual Cut-through: A New Computer Communication Switching Technique,” Computer Networks, 3, 4, Sep. 1979, 267–286

Kerns, D. and Eggers, S., “Balanced Scheduling: Instruction Scheduling when Memory Latency is Uncertain,” Proc. ACM SIGPLAN Conf. on Programming Language Design and Implementation, SIGPLAN Notices, 28, 6, Jun. 1993, 278–289

Keshava, J. and Pentkovski, V., “Pentium III Processor Implementation Tradeoffs,” Intel Tech. Journal, 2, May 1999

Kessler, R., “The Alpha 21264 Microprocessor,” IEEE Micro, 19, 2, Mar. 1999, 24–36

Kessler, R., Jooss, R., Lebeck, A., and Hill, M., “Inexpensive Implementations of Set-Associativity,” Proc. 16th Int. Symp. on Computer Architecture, 1989, 131–139

Kilburn, T., Edwards, D., Lanigan, M., and Sumner, F., “One-level Storage System,” IRE Trans. on Electronic Computers, EC-11, 2, Apr. 1962, 223–235

Kim, C., Burger, D., and Keckler, S., “An Adaptive, Non-Uniform Cache Structure for Wire-Delay Dominated On-Chip Caches,” Proc. 10th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Oct. 2002, 211–222

Kim, N., Flautner, K., Blaauw, D., and Mudge, T., “Drowsy Instruction Caches – Leakage Power Reduction Using Dynamic Voltage Scaling and Cache Sub-bank Prediction,” Proc. 29th Int. Symp. on Computer Architecture, 2002, 219–230

KleinOsowski, A. and Lilja, D., “MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research,” Computer Architecture Letters, 1, Jun. 2002

Kogge, P., The Architecture of Pipelined Computers, McGraw-Hill, New York, 1981

Kongetira, P., Aingaran, K., and Olukotun, K., “Niagara: A 32-way Multithreaded Sparc Processor,” IEEE Micro, 24, 2, Apr. 2005, 21–29

Koufaty, D. and Marr, D., “Hyperthreading Technology in the Netburst Microarchitecture,” IEEE Micro, 23, 2, Mar. 2003, 56–65

Kroft, D., “Lockup-Free Instruction Fetch/Prefetch Cache Organization,” Proc. 8th Int. Symp. on Computer Architecture, 1981, 81–87

Lai, A., Fide, C., and Falsafi, B., “Dead-block Prediction & Dead-block Correlation Prefetchers,” Proc. 28th Int. Symp. on Computer Architecture, 2001, 144–154

Lam, M., “Software Pipelining: An Effective Scheduling Technique for VLIW Machines,” Proc. ACM SIGPLAN Conf. on Programming Language Design and Implementation, SIGPLAN Notices, 23, 7, Jul. 1988, 318–328

Lamport, L., “How to Make a Multiprocessor Computer that Correctly Executes Programs,” IEEE Trans. on Computers, 28, 9, Sep. 1979, 690–691

Larus, J. and Kozyrakis, C., “Transactional Memory,” Communications of the ACM, 51, 7, Jul. 2008, 80–88

Lee, D., Crowley, P., Baer, J.-L., Anderson, T., and Bershad, B., “Execution Characteristics of Desktop Applications on Windows NT,” Proc. 25th Int. Symp. on Computer Architecture, 1998, 27–38

Lee, J., “Study of ‘Look-Aside’ Memory,” IEEE Trans. on Computers, C-18, 11, Nov. 1969, 1062–1065

Lee, J. and Smith, A., “Branch Prediction Strategies and Branch Target Buffer Design,” IEEE Computer, 17, 1, Jan. 1984, 6–22

Lin, W.-F., Reinhardt, S., and Burger, D., “Designing a Modern Memory Hierarchy with Hardware Prefetching,” IEEE Trans. on Computers, 50, 11, Nov. 2001, 1202–1218

Lipasti, M. and Shen, J.P., “Exceeding the Dataflow Limit with Value Prediction,” Proc. 29th Int. Symp. on Microarchitecture, 1996, 226–237

Liptay, J., “Design of the IBM Enterprise System/9000 High-end Processor,” IBM Journal of Research and Development, 36, 4, Jul. 1992, 713–731

Lo, J., Barroso, L., Eggers, S., Gharachorloo, K., Levy, H., and Parekh, S., “An Analysis of Database Workload Performance on Simultaneous Multithreaded Processors,” Proc. 25th Int. Symp. on Computer Architecture, 1998, 39–50

Loh, G., “Advanced Instruction Flow Techniques,” Chapter 9 in Shen, J. P. and Lipasti, M., Eds., Modern Processor Design, 2005, 453–518

Lovett, T. and Clapp, R., “STiNG: A CC-NUMA Computer System for the Commercial Marketplace,” Proc. 23rd Int. Symp. on Computer Architecture, 1996, 308–317

Lovett, T. and Thakkar, S., “The Symmetry Multiprocessor System,” Proc. Int. Conf. on Parallel Processing, Aug. 1988, 303–310

Mathis, H., Mericas, A., McCalpin, J., Eickemeyer, R., and Kunkel, S., “Characterization of Simultaneous Multithreading (SMT) Efficiency in Power5,” IBM Journal of Research and Development, 49, 4, Jul. 2005, 555–564

Mattson, R., Gecsei, J., Slutz, D., and Traiger, I., “Evaluation Techniques for Storage Hierarchies,” IBM Systems Journal, 9, 1970, 78–117

McFarling, S., “Combining Branch Predictors,” WRL Technical Note, TN-36, Jun. 1993

McMahon, H., “The Livermore Fortran Kernels Test of the Numerical Performance Range,” in Martin, J. L., Ed., Performance Evaluation of Supercomputers, Elsevier Science B.V., North-Holland, Amsterdam, 1988, 143–186

McNairy, C. and Soltis, D., “Itanium 2 Processor Microarchitecture,” IEEE Micro, 23, 2, Mar. 2003, 44–55

Mendelson, A., Mandelblat, J., Gochman, S., Shemer, A., Chabukswar, R., Niemeyer, E., and Kumar, A., “CMP Implementation in Systems Based on the Intel Core Duo Processor,” Intel Tech. Journal, 10, 2, May 2006, 99–107

Moore, G., “Cramming More Components onto Integrated Circuits,” Electronics, 38, 8, Apr. 1965

Moshovos, A., Breach, S., Vijaykumar, T., and Sohi, G., “Dynamic Speculation and Synchronization of Data Dependences,” Proc. 24th Int. Symp. on Computer Architecture, 1997, 181–193

Mowry, T., Lam, M., and Gupta, A., “Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors,” Proc. 5th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Oct. 1992, 62–73

Mutlu, O., Stark, J., Wilkerson, C., and Patt, Y., “Run-ahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors,” Proc. 9th Int. Symp. on High-Performance Computer Architecture, 2003, 129–140

Naveh, A., Rotem, E., Mendelson, A., Gochman, S., Chabukswar, R., Krishnan, K., and Kumar, A., “Power and Thermal Management in the Intel Core Duo Processor,” Intel Tech. Journal, 10, 2, May 2006, 109–122

Ozer, E., Banerjia, S., and Conte, T., “Unified Assign and Schedule: A New Approach to Scheduling for Clustered Register File Microarchitectures,” Proc. 31st Int. Symp. on Microarchitecture, 1998, 308–315

Palacharla, S., Jouppi, N., and Smith, J., “Complexity-Effective Superscalar Processors,” Proc. 24th Int. Symp. on Computer Architecture, 1997, 206–218

Palacharla, S. and Kessler, R., “Evaluating Stream Buffers as a Secondary Cache Replacement,” Proc. 21st Int. Symp. on Computer Architecture, 1994, 24–33

Pan, S., So, K., and Rahmeh, J., “Improving the Accuracy of Dynamic Branch Prediction using Branch Correlation,” Proc. 5th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Oct. 1992, 76–84

Papamarcos, M. and Patel, J., “A Low-overhead Coherence Solution for Multiprocessors with Private Cache Memories,” Proc. 12th Int. Symp. on Computer Architecture, 1985, 348–354

Papworth, D., “Tuning the Pentium Pro Microarchitecture,” IEEE Micro, 16, 2, Mar. 1996, 8–15

Patel, S., Friendly, D., and Patt, Y., “Evaluation of Design Options for the Trace Cache Fetch Mechanism,” IEEE Trans. on Computers, 48, 2, Feb. 1999, 193–204

Patterson, D. and Hennessy, J., Computer Organization & Design: The Hardware/Software Interface, Third Edition, Morgan Kaufmann Publishers, San Francisco, 2004

Patterson, D. and Séquin, C., “RISC I: A Reduced Instruction Set VLSI Computer,” Proc. 8th Int. Symp. on Computer Architecture, 1981, 443–457

Peir, J.-K., Hsu, W., and Smith, A., “Functional Implementation Techniques for CPU Cache Memories,” IEEE Trans. on Computers, 48, 2, Feb. 1999, 100–110

Peleg, A. and Weiser, U., “Dynamic Flow Instruction Cache Memory Organized Around Trace Segments Independent of Virtual Address Line,” U.S. Patent Number 5,381,533, 1994

Peleg, A. and Weiser, U., “MMX Technology Extension to the Intel Architecture,” IEEE Micro, 16, 4, Aug. 1996, 42–50

Perleberg, C. and Smith, A., “Branch Target Buffer Design and Optimization,” IEEE Trans. on Computers, 42, 4, Apr. 1993, 396–412

Pettis, K. and Hansen, R., “Profile Guided Code Positioning,” Proc. ACM SIGPLAN Conf. on Programming Language Design and Implementation, SIGPLAN Notices, 25, Jun. 1990, 16–27

Ponomarev, D., Kucuk, G., and Ghose, K., “Reducing Power Requirements of Instruction Scheduling through Dynamic Allocation of Multiple Datapath Resources,” Proc. 34th Int. Symp. on Microarchitecture, 2001, 90–101

Postiff, M., Tyson, G., and Mudge, T., “Performance Limits of Trace Caches,” Journal of Instruction-Level Parallelism, 1, Sep. 1999, 1–17

Przybylski, S., Cache Design: A Performance Directed Approach, Morgan Kaufmann Publishers, San Francisco, 1990

Pugh, E., Johnson, L., and Palmer, J., IBM's 360 and Early 370 Systems, The MIT Press, Cambridge, MA, 1991

Ranganathan, P., Adve, S., and Jouppi, N., “Performance of Image and Video Processing with General-Purpose Processors and Media ISA Extensions,” Proc. 26th Int. Symp. on Computer Architecture, 1999, 124–135

Riseman, E. and Foster, C., “The Inhibition of Potential Parallelism by Conditional Jumps,” IEEE Trans. on Computers, C-21, 12, Dec. 1972, 1405–1411

Romer, T., Lee, D., Voelker, G., Wolman, A., Wong, W., Baer, J.-L., Bershad, B., and Levy, H., “The Structure and Performance of Interpreters,” Proc. 7th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Oct. 1996, 150–159

Rotenberg, E., Bennett, S., and Smith, J., “Trace Cache: A Low Latency Approach to High Bandwidth Instruction Fetching,” Proc. 29th Int. Symp. on Microarchitecture, 1996, 24–34

Rudolph, L. and Segall, Z., “Dynamic Decentralized Cache Schemes for MIMD Parallel Processors,” Proc. 11th Int. Symp. on Computer Architecture, 1984, 340–347

Salverda, P. and Zilles, C., “A Criticality Analysis of Clustering in Superscalar Processors,” Proc. 38th Int. Symp. on Microarchitecture, 2005, 55–66

Schlansker, M. and Rau, B., “EPIC: Explicitly Parallel Instruction Computing,” IEEE Computer, 33, 2, Feb. 2000, 37–45

Scott, S., “Synchronization and Communication in the Cray T3E Multiprocessor,” Proc. 7th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Oct. 1996, 26–36

Seznec, A., “A Case for Two-way Skewed-Associative Caches,” Proc. 20th Int. Symp. on Computer Architecture, 1993, 169–178Google Scholar

Sharangpani, H. and Arora, K., “Itanium Processor Microarchitecture,” IEEE Micro, 20, 5, Sep. 2000, 24–43CrossRef Google Scholar

Shen, J. P. and Lipasti, M., Modern Processor Design Fundamentals of Superscalar Processors, McGraw-Hill, 2005Google Scholar

Sherwood, T., Perelman, E., Hamerly, G., Sair, S., and Calder, B., “Discovering and Exploiting Program Phases,” IEEE Micro, 23, 6, Nov.–Dec. 2003, 84–93CrossRef Google Scholar

Sima, D., “The Design Space of Register Renaming Techniques,” IEEE Micro, 20, 5, Sep. 2000, 70–83

Skadron, K., Martonosi, M., and Clark, D., “Speculative Updates of Local and Global Branch History: A Quantitative Analysis,” Journal of Instruction-Level Parallelism, 2, 2000, 1–23

Skadron, K., Stan, M., Huang, W., Velusamy, S., Sankaranarayanan, K., and Tarjan, D., “Temperature-Aware Microarchitecture,” Proc. 30th Int. Symp. on Computer Architecture, 2003, 2–13

Slingerland, N. and Smith, A., “Multimedia Extensions for General-Purpose Microprocessors: A Survey,” Microprocessors and Microsystems, 29, 5, Jan. 2005, 225–246

Smith, A., “Cache Memories,” ACM Computing Surveys, 14, 3, Sep. 1982, 473–530

Smith, B., “A Pipelined, Shared Resource MIMD Computer,” Proc. Int. Conf. on Parallel Processing, 1978, 6–8

Smith, J., “A Study of Branch Prediction Strategies,” Proc. 8th Int. Symp. on Computer Architecture, 1981, 135–148

Smith, J., “Characterizing Computer Performance with a Single Number,” Communications of the ACM, 31, 10, Oct. 1988, 1201–1206

Smith, J. and Pleszkun, A., “Implementation of Precise Interrupts in Pipelined Processors,” IEEE Trans. on Computers, C-37, 5, May 1988, 562–573 (an earlier version was published in Proc. 12th Int. Symp. on Computer Architecture, 1985)

Smith, J. and Sohi, G., “The Microarchitecture of Superscalar Processors,” Proc. IEEE, 83, 12, Dec. 1995, 1609–1624

Sohi, G., “Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers,” IEEE Trans. on Computers, C-39, 3, Mar. 1990, 349–359 (an earlier version with co-author S. Vajapeyam was published in Proc. 14th Int. Symp. on Computer Architecture, 1987)

Sohi, G., Breach, S., and Vijaykumar, T., “Multiscalar Processors,” Proc. 22nd Int. Symp. on Computer Architecture, 1995, 414–425

Sohi, G. and Roth, A., “Speculative Multithreaded Processors,” IEEE Computer, 34, 4, Apr. 2001, 66–73

Srinivasan, S., Ju, D.-C., Lebeck, A., and Wilkerson, C., “Locality vs. Criticality,” Proc. 28th Int. Symp. on Computer Architecture, 2001, 132–143

Stark, J., Brown, M., and Patt, Y., “On Pipelining Dynamic Instruction Scheduling Logic,” Proc. 33rd Int. Symp. on Microarchitecture, 2000, 57–66

Stunkel, C., Herring, J., Abali, B., and Sivaram, R., “A New Switch Chip for IBM RS/6000 SP Systems,” Proc. Int. Conf. on Supercomputing, 1999, 16–33

Sweazey, P. and Smith, A., “A Class of Compatible Cache Consistency Protocols and their Support by the IEEE Future Bus,” Proc. 13th Int. Symp. on Computer Architecture, 1986, 414–423

Tendler, J., Dodson, J., Fields, J., Jr., Le, H., and Sinharoy, B., “POWER4 System Microarchitecture,” IBM Journal of Research and Development, 46, 1, Jan. 2002, 5–25

Thornton, J., “Parallel Operation in the Control Data 6600,” Proc. AFIPS FJCC, pt. 2, vol. 26, 1964, 33–40 (reprinted as Chapter 39 of Bell, C. and Newell, A., Eds., Computer Structures: Readings and Examples, McGraw-Hill, New York, 1971, and Chapter 43 of Siewiorek, D., Bell, C., and Newell, A., Eds., Computer Structures: Principles and Examples, McGraw-Hill, New York, 1982)

Thornton, J., Design of a Computer: The Control Data 6600, Scott, Foresman and Co., Glenview, IL, 1970

Tjaden, G. and Flynn, M., “Detection and Parallel Execution of Independent Instructions,” IEEE Trans. on Computers, C-19, 10, Oct. 1970, 889–895

Tomasulo, R., “An Efficient Algorithm for Exploiting Multiple Arithmetic Units,” IBM Journal of Research and Development, 11, 1, Jan. 1967, 25–33

Tremblay, M., Chan, J., Chaudhry, S., Conigliaro, A., and Tse, S., “The MAJC Architecture: A Synthesis of Parallelism and Scalability,” IEEE Micro, 20, 6, Nov. 2000, 12–25

Tremblay, M. and O'Connor, J., “UltraSparc I: A Four-issue Processor Supporting Multimedia,” IEEE Micro, 16, 2, Apr. 1996, 42–50

Tucker, L. and Robertson, G., “Architecture and Applications of the Connection Machine,” IEEE Computer, 21, 8, Aug. 1988, 26–38

Tullsen, D., Eggers, S., and Levy, H., “Simultaneous Multithreading: Maximizing On-chip Parallelism,” Proc. 22nd Int. Symp. on Computer Architecture, 1995, 392–403

Tune, E., Liang, D., Tullsen, D., and Calder, B., “Dynamic Prediction of Critical Path Instructions,” Proc. 7th Int. Symp. on High-Performance Computer Architecture, 2001, 185–195

Uhlig, R. and Mudge, T., “Trace-driven Memory Simulation: A Survey,” ACM Computing Surveys, 29, 2, Jun. 1997, 128–170

Vanderwiel, S. and Lilja, D., “Data Prefetch Mechanisms,” ACM Computing Surveys, 32, 2, Jun. 2000, 174–199

VanVleet, P., Anderson, E., Brown, L., Baer, J.-L., and Karlin, A., “Pursuing the Performance Potential of Dynamic Cache Lines,” Proc. ICCD, Oct. 1999, 528–537

Venkatachalam, V. and Franz, M., “Power Reduction Techniques for Microprocessor Systems,” ACM Computing Surveys, 37, 3, Sep. 2005, 195–237

Weicker, R., “Dhrystone: A Synthetic Systems Programming Benchmark,” Communications of the ACM, 27, Oct. 1984, 1013–1030

Weiser, M., Welch, B., Demers, A., and Shenker, S., “Scheduling for Reduced CPU Energy,” Proc. 1st USENIX Symp. on Operating Systems Design and Implementation, 1994, 13–23

Wechsler, O., “Inside Intel Core Microarchitecture,” Intel White Paper, 2006, http://download.intel.com/technology/architecture/new_architecture_06.pdf

Wilkes, M., “Slave Memories and Dynamic Storage Allocation,” IEEE Trans. on Electronic Computers, EC-14, Apr. 1965, 270–271

Wong, W. and Baer, J.-L., “Modified LRU Policies for Improving Second-Level Cache Behavior,” Proc. 6th Int. Symp. on High-Performance Computer Architecture, 2000, 49–60

Yeager, K., “The MIPS R10000 Superscalar Microprocessor,” IEEE Micro, 16, 2, Apr. 1996, 28–41

Yeh, T.-Y. and Patt, Y., “Alternative Implementations of Two-Level Adaptive Branch Prediction,” Proc. 19th Int. Symp. on Computer Architecture, 1992, 124–134

Yeh, T.-Y. and Patt, Y., “A Comprehensive Instruction Fetch Mechanism for a Processor Supporting Speculative Execution,” Proc. 25th Ann. Symp. on Microarchitecture, 1992, 129–139

Yoaz, A., Erez, M., Ronen, R., and Jourdan, S., “Speculation Techniques for Improving Load Related Instruction Scheduling,” Proc. 26th Int. Symp. on Computer Architecture, 1999, 42–53

Zhang, Z., Zhu, Z., and Zhang, X., “A Permutation-based Page Interleaving Scheme to Reduce Row-buffer Conflicts and Exploit Data Locality,” Proc. 33rd Int. Symp. on Microarchitecture, 2000, 32–41
