Bibliography

Published online by Cambridge University Press: 05 June 2012

Jean-Loup Baer, University of Washington

Chapter in Microprocessor Architecture: From Simple Pipelines to Chip Multiprocessors, pp. 351–360
Publisher: Cambridge University Press
Print publication year: 2009

References

Abel, N., Budnick, D., Kuck, D., Muraoka, Y., Northcote, R., and Wilhelmson, R., “TRANQUIL: A Language for an Array Processing Computer,” Proc. AFIPS SJCC, 1969, 57–73
Adiletta, M., Rosenbluth, M., Bernstein, D., Wolrich, G., and Wilkinson, H., “The Next Generation of Intel IXP Network Processors,” Intel Tech. Journal, 6, 3, Aug. 2002, 6–18
Adve, S. and Gharachorloo, K., “Shared Memory Consistency Models: A Tutorial,” IEEE Computer, 29, 12, Dec. 1996, 66–76
Agarwal, A., Bianchini, R., Chaiken, D., Johnson, K., Kranz, D., Kubiatowicz, J., Lim, B.-H., Mackenzie, K., and Yeung, D., “The MIT Alewife Machine: Architecture and Performance,” Proc. 22nd Int. Symp. on Computer Architecture, 1995, 2–13
Agarwal, A., Lim, B.-H., Kranz, D., and Kubiatowicz, J., “APRIL: A Processor Architecture for Multiprocessing,” Proc. 17th Int. Symp. on Computer Architecture, 1990, 104–114
Agarwal, A. and Pudar, S., “Column-Associative Caches: A Technique for Reducing the Miss Rate of Direct-Mapped Caches,” Proc. 20th Int. Symp. on Computer Architecture, 1993, 179–190
Agarwal, A., Simoni, R., Hennessy, J., and Horowitz, M., “An Evaluation of Directory Schemes for Cache Coherence,” Proc. 15th Int. Symp. on Computer Architecture, 1988, 280–289
Aggarwal, A. and Franklin, M., “Scalability Aspects of Instruction Distribution Algorithms for Clustered Processors,” IEEE Trans. on Parallel and Distributed Systems, 16, 10, Oct. 2005, 944–955
Akkary, H. and Driscoll, M., “A Dynamic Multithreading Processor,” Proc. 31st Int. Symp. on Microarchitecture, 1998, 226–236
Albonesi, D., Balasubramonian, R., Dropsho, S., Dwarkadas, S., Friedman, E., Huang, M., Kursun, V., Magklis, G., Scott, M., Semeraro, G., Bose, P., Buyuktosunoglu, A., Cook, P., and Schuster, S., “Dynamically Tuning Processor Resources with Adaptive Processing,” IEEE Computer, 36, 12, Dec. 2003, 49–58
Alverson, R., Callahan, D., Cummings, D., Koblenz, B., Porterfield, A., and Smith, B., “The Tera Computer System,” Proc. Int. Conf. on Supercomputing, 1990, 1–6
Amdahl, G., “Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities,” Proc. AFIPS SJCC, 30, Apr. 1967, 483–485
Anderson, D., Sparacio, F., and Tomasulo, R., “Machine Philosophy and Instruction Handling,” IBM Journal of Research and Development, 11, 1, Jan. 1967, 8–24
Anderson, S., Earle, J., Goldschmidt, R., and Powers, D., “The IBM System/360 Model 91: Floating-point Execution Unit,” IBM Journal of Research and Development, 11, Jan. 1967, 34–53
Anderson, T., “The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors,” IEEE Trans. on Parallel and Distributed Systems, 1, 1, Jan. 1990, 6–16
Archibald, J. and Baer, J.-L., “An Economical Solution to the Cache Coherence Problem,” Proc. 12th Int. Symp. on Computer Architecture, 1985, 355–362
Archibald, J. and Baer, J.-L., “Cache Coherence Protocols: Evaluation Using a Multiprocessor Simulation Model,” ACM Trans. on Computer Systems, 4, 4, Nov. 1986, 273–298
August, D., Connors, D., Mahlke, S., Sias, J., Crozier, K., Cheng, B., Eaton, P., Olaniran, Q., and Hwu, W.-m., “Integrated Predicated and Speculative Execution in the IMPACT EPIC Architecture,” Proc. 25th Int. Symp. on Computer Architecture, 1998, 227–237
Austin, T., Larson, D., and Ernst, D., “SimpleScalar: An Infrastructure for Computer System Modeling,” IEEE Computer, 35, 2, Feb. 2002, 59–67
Baer, J.-L. and Wang, W.-H., “On the Inclusion Properties for Multi-Level Cache Hierarchies,” Proc. 15th Int. Symp. on Computer Architecture, 1988, 73–80
Baetke, F., “The CONVEX Exemplar SPP1000 and SPP1200 – New Scalable Parallel Systems with a Virtual Shared Memory Architecture,” in Dongarra, J., Grandinetti, L., Joubert, G., and Kowalik, J., Eds., High Performance Computing: Technology, Methods and Applications, Elsevier Press, 1995, 81–102
Balasubramonian, R., Albonesi, D., Buyuktosunoglu, A., and Dwarkadas, S., “Memory Hierarchy Reconfiguration for Energy and Performance in General-purpose Processor Architectures,” Proc. 33rd Int. Symp. on Microarchitecture, 2000, 245–257
Belady, L., “A Study of Replacement Algorithms for a Virtual Storage Computer,” IBM Systems Journal, 5, 1966, 78–101
Bernstein, A., “Analysis of Programs for Parallel Processing,” IEEE Trans. on Electronic Computers, EC-15, Oct. 1966, 746–757
Bhandarkar, D., Alpha Implementations and Architecture: Complete Reference and Guide, Digital Press, Boston, 1995
Boggs, D., Baktha, A., Hawkins, J., Marr, D., Miller, J., Roussel, P., Singhal, R., Toll, B., and Venkatraman, K., “The Microarchitecture of the Pentium 4 Processor on 90nm Technology,” Intel Tech. Journal, 8, 1, Feb. 2004, 1–17
Borkenhagen, J., Eickemeyer, R., Kalla, R., and Kunkel, S., “A Multithreaded PowerPC Processor for Commercial Servers,” IBM Journal of Research and Development, 44, 6, 2000, 885–899
Brooks, D. and Martonosi, M., “Dynamic Thermal Management in High-Performance Microprocessors,” Proc. 7th Int. Symp. on High-Performance Computer Architecture, 2001, 171–182
Buchholz, W., Ed., Planning a Computer System: Project Stretch, McGraw-Hill, New York, 1962
Calder, B. and Grunwald, D., “Fast & Accurate Instruction Fetch and Branch Prediction,” Proc. 21st Int. Symp. on Computer Architecture, 1994, 2–11
Calder, B. and Grunwald, D., “Next Cache Line and Set Prediction,” Proc. 22nd Int. Symp. on Computer Architecture, 1995, 287–296
Calder, B., Grunwald, D., and Emer, J., “Predictive Sequential Associative Cache,” Proc. 2nd Int. Symp. on High-Performance Computer Architecture, 1996, 244–253
Calder, B. and Reinman, G., “A Comparative Survey of Load Speculation Architectures,” Journal of Instruction-Level Parallelism, 1, 2000, 1–39
Canal, R., Parcerisa, J.M., and González, A., “Dynamic Cluster Assignment Mechanisms,” Proc. 6th Int. Symp. on High-Performance Computer Architecture, 2000, 133–141
Cantin, J. and Hill, M., Cache Performance for SPEC CPU2000 Benchmarks, Version 3.0, May 2003, http://www.cs.wisc.edu/multifacet/misc/spec2000cache-data/
Case, R. and Padegs, A., “The Architecture of the IBM System/370,” Communications of the ACM, 21, 1, Jan. 1978, 73–96
Censier, L. and Feautrier, P., “A New Solution to Coherence Problems in Multicache Systems,” IEEE Trans. on Computers, 27, 12, Dec. 1978, 1112–1118
Chan, K., Hay, C., Keller, J., Kurpanek, G., Shumaker, F., and Zheng, J., “Design of the HP PA 7200 CPU,” Hewlett Packard Journal, 47, 1, Jan. 1996, 25–33
Chaudhry, S., Caprioli, P., Yip, S., and Tremblay, M., “High-Performance Throughput Computing,” IEEE Micro, 25, 3, May 2005, 32–45
Chen, T.-F. and Baer, J.-L., “Effective Hardware-based Data Prefetching for High-Performance Processors,” IEEE Trans. on Computers, 44, 5, May 1995, 609–623
Chen, I-C., Coffey, J., and Mudge, T., “Analysis of Branch Prediction via Data Compression,” Proc. 7th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Oct. 1996, 128–137
Christie, D., “Developing the AMD-K5 Architecture,” IEEE Micro, 16, 2, Mar. 1996, 16–27
Chrysos, G. and Emer, J., “Memory Dependence Prediction Using Store Sets,” Proc. 25th Int. Symp. on Computer Architecture, 1998, 142–153
Citron, D., Hurani, A., and Gnadrey, A., “The Harmonic or Geometric Mean: Does it Really Matter,” Computer Architecture News, 34, 6, Sep. 2006, 19–26
Colwell, R., Papworth, D., Hinton, G., Fetterman, M., and Glew, A., “Intel's P6 Microarchitecture,” Chapter 7 in Shen, J. P. and Lipasti, M., Eds., Modern Processor Design, 2005, 329–367
Conte, T., Menezes, K., Mills, P., and Patel, B., “Optimization of Instruction Fetch Mechanisms for High Issue Rates,” Proc. 22nd Int. Symp. on Computer Architecture, 1995, 333–344
Conti, C., Gibson, D., and Pitkowsky, S., “Structural Aspects of the IBM System 360/85; General Organization,” IBM Systems Journal, 7, 1968, 2–14
Cooksey, R., Jourdan, S., and Grunwald, D., “A Stateless, Content-Directed Data Prefetching Mechanism,” Proc. 10th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Oct. 2002, 279–290
Crisp, R., “Direct Rambus Technology: The New Main Memory Standard,” IEEE Micro, 17, 6, Nov.–Dec. 1997, 18–28
Culler, D. and Singh, J.P. with Gupta, A., Parallel Computer Architecture: A Hardware/Software Approach, Morgan Kaufmann Publishers, San Francisco, 1999
Cuppu, V., Jacob, B., Davis, B., and Mudge, T., “High-Performance DRAMs in Workstation Environments,” IEEE Trans. on Computers, 50, 11, Nov. 2001, 1133–1153
Curnow, H. and Wichmann, B., “A Synthetic Benchmark,” Computer Journal, 19, 1, Feb. 1976
Cvetanovic, Z. and Bhandarkar, D., “Performance Characterization of the Alpha 21164 Microprocessor Using TP and SPEC Workloads,” Proc. 2nd Int. Symp. on High-Performance Computer Architecture, 1996, 270–280
Cvetanovic, Z. and Kessler, R., “Performance Analysis of the Alpha 21264-based Compaq ES40 System,” Proc. 27th Int. Symp. on Computer Architecture, 2000, 192–202
Dally, W., “Virtual-Channel Flow Control,” Proc. 17th Int. Symp. on Computer Architecture, 1990, 60–68
Denning, P., “Virtual Memory,” ACM Computing Surveys, 2, Sep. 1970, 153–189
Dennis, J. and Misunas, D., “A Preliminary Data Flow Architecture for a Basic Data Flow Processor,” Proc. 2nd Int. Symp. on Computer Architecture, 1974, 126–132
Dongarra, J., Bunch, J., Moler, C., and Stewart, G., LINPACK User's Guide, SIAM, Philadelphia, 1979
Dongarra, J., Luszczek, P., and Petitet, A., “The LINPACK Benchmark: Past, Present, and Future,” Concurrency and Computation: Practice and Experience, 15, 2003, 1–18
Dubois, M., Scheurich, C., and Briggs, F., “Memory Access Buffering in Multiprocessors,” Proc. 13th Int. Symp. on Computer Architecture, 1986, 434–442
Eden, A. and Mudge, T., “The YAGS Branch Prediction Scheme,” Proc. 31st Int. Symp. on Microarchitecture, 1998, 69–77
Edmondson, J., Rubinfeld, P., Preston, R., and Rajagopalan, V., “Superscalar Instruction Execution in the 21164 Alpha Microprocessor,” IEEE Micro, 15, 2, Apr. 1995, 33–43
Eggers, S., Emer, J., Levy, H., Lo, J., Stamm, R., and Tullsen, D., “Simultaneous Multithreading: A Platform for Next-Generation Processors,” IEEE Micro, 17, 5, Sep. 1997, 12–19
Fagin, B. and Russell, K., “Partial Resolution in Branch Target Buffers,” Proc. 28th Int. Symp. on Microarchitecture, 1995, 193–198
Farkas, K. and Jouppi, N., “Complexity/Performance Trade-offs with Non-Blocking Loads,” Proc. 21st Int. Symp. on Computer Architecture, 1994, 211–222
Fields, B., Bodik, R., and Hill, M., “Slack: Maximizing Performance under Technological Constraints,” Proc. 29th Int. Symp. on Computer Architecture, 2002, 47–58
Flynn, M., “Very High Speed Computing Systems,” Proc. IEEE, 54, 12, Dec. 1966, 1901–1909
Folegnani, D. and González, A., “Energy-effective Issue Logic,” Proc. 28th Int. Symp. on Computer Architecture, 2001, 230–239
Franklin, M. and Sohi, G., “A Hardware Mechanism for Dynamic Reordering of Memory References,” IEEE Trans. on Computers, 45, 6, Jun. 1996, 552–571
Gharachorloo, K., Gupta, A., and Hennessy, J., “Two Techniques to Enhance the Performance of Memory Consistency Models,” Proc. Int. Conf. on Parallel Processing, 1991, I-355–364
Gochman, S., Ronen, R., Anati, I., Berkovits, R., Kurts, T., Naveh, A., Saeed, A., Sperber, Z., and Valentine, R., “The Intel Pentium M Processor: Microarchitecture and Performance,” Intel Tech. Journal, 7, 2, May 2003, 21–39
Golden, M. and Mudge, T., “A Comparison of Two Pipeline Organizations,” Proc. 27th Int. Symp. on Microarchitecture, 1994, 153–161
Goodman, J., Vernon, M., and Woest, P., “Efficient Synchronization Primitives for Large-Scale Cache Coherent Multiprocessors,” Proc. 3rd Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Apr. 1989, 64–73
Graunke, G. and Thakkar, S., “Synchronization Algorithms for Shared-Memory Multiprocessors,” IEEE Computer, 23, 6, Jun. 1990, 60–70
Grunwald, D., Levis, P., Farkas, K., Morrey, C., and Neufeld, M., “Policies for Dynamic Clock Scheduling,” Proc. 4th USENIX Symp. on Operating Systems Design and Implementation, 2000, 73–86
Gschwind, M., Hofstee, H., Flachs, B., Hopkins, M., Watanabe, Y., and Yamazaki, T., “Synergistic Processing in Cell's Multicore Architecture,” IEEE Micro, 26, 2, Mar. 2006, 11–24
Gunther, S., Binns, F., Carmean, D., and Hall, J., “Managing the Impact of Increasing Power Consumption,” Intel Tech. Journal, 5, 1, Feb. 2001, 1–9
Gwennap, L., “Brainiacs, Speed Demons, and Farewell,” Microprocessor Report Newsletter, 13, 7, Dec. 1999
Hallnor, E. and Reinhardt, S., “A Fully Associative Software-Managed Cache Design,” Proc. 27th Int. Symp. on Computer Architecture, 2000, 107–116
Hao, E., Chang, P.-Y., and Patt, Y., “The Effect of Speculatively Updating Branch History on Branch Prediction Accuracy, Revisited,” Proc. 27th Int. Symp. on Microarchitecture, 1994, 228–232
Hartstein, A. and Puzak, T., “The Optimum Pipeline Depth for a Microprocessor,” Proc. 29th Int. Symp. on Computer Architecture, 2002, 7–13
Hennessy, J. and Patterson, D., Computer Architecture: A Quantitative Approach, Fourth Edition, Elsevier Inc., San Francisco, 2007
Henning, J., Ed., “SPEC CPU2006 Benchmark Descriptions,” Computer Architecture News, 34, 4, Sep. 2006, 1–17
Hill, M., Aspects of Cache Memory and Instruction Buffer Performance, Ph.D. Dissertation, Univ. of California, Berkeley, Nov. 1987
Hill, M., “Multiprocessors Should Support Simple Memory-Consistency Models,” IEEE Computer, 31, 8, Aug. 1998, 28–34
Hinton, G., Sager, D., Upton, M., Boggs, D., Carmean, D., Kyker, A., and Roussel, P., “The Microarchitecture of the Pentium 4 Processor,” Intel Tech. Journal, 1, Feb. 2001
Ho, R., Mai, K., and Horowitz, M., “The Future of Wires,” Proc. of the IEEE, 89, 4, Apr. 2001, 490–504
Hrishikesh, M., Jouppi, N., Farkas, K., Burger, D., Keckler, S., and Shivakumar, P., “The Optimal Logic Depth per Pipeline Stage is 6 to 8 FO4 Inverter Delays,” Proc. 29th Int. Symp. on Computer Architecture, 2002, 14–24
Huck, J., Morris, D., Ross, J., Knies, A., Mulder, H., and Zahir, R., “Introducing the IA-64 Architecture,” IEEE Micro, 20, 5, Sep. 2000, 12–23
Hwu, W.-m. and Patt, Y., “HPSm, A High-Performance Restricted Data Flow Architecture Having Minimal Functionality,” Proc. 13th Int. Symp. on Computer Architecture, 1986, 297–307
Intel Corp., A Tour of the P6 Microarchitecture, 1995, http://www.x86.org/ftp/manuals/686/p6tour.pdf
Jeremiassen, T. and Eggers, S., “Reducing False Sharing on Shared Memory Multiprocessors through Compile Time Data Transformations,” Proc. 5th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, 1995, 179–188
Jiménez, D., Keckler, S., and Lin, C., “The Impact of Delay on the Design of Branch Predictors,” Proc. 33rd Int. Symp. on Microarchitecture, 2000, 67–76
John, L., “More on Finding a Single Number to Indicate Overall Performance of a Benchmark Suite,” Computer Architecture News, 32, 1, Mar. 2004, 3–8
Joseph, D. and Grunwald, D., “Prefetching Using Markov Predictors,” Proc. 24th Int. Symp. on Computer Architecture, 1997, 252–263
Jouppi, N., “Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers,” Proc. 17th Int. Symp. on Computer Architecture, 1990, 364–373
Jourdan, S., Stark, J., Hsing, T.-H., and Patt, Y., “Recovery Requirements of Branch Prediction Storage Structures in the Presence of Mispredicted-path Execution,” International Journal of Parallel Programming, 25, Oct. 1997, 363–383
Kaeli, D. and Emma, P., “Branch History Table Prediction of Moving Target Branches Due to Subroutine Returns,” Proc. 18th Int. Symp. on Computer Architecture, 1991, 34–42
Kagi, A., Burger, D., and Goodman, J., “Efficient Synchronization: Let them Eat QOLB,” Proc. 24th Int. Symp. on Computer Architecture, 1997, 170–180
Kahle, J., Day, M., Hofstee, H., Johns, C., Maeurer, T., and Shippy, D., “Introduction to the Cell Multiprocessor,” IBM Journal of Research and Development, 49, 4/5, Jul. 2005, 589–604
Kalamatianos, J., Khalafi, A., Kaeli, D., and Meleis, W., “Analysis of Temporal-based Program Behavior for Improved Instruction Cache Performance,” IEEE Trans. on Computers, 48, 2, Feb. 1999, 168–175
Kalla, R., Sinharoy, B., and Tendler, J., “IBM Power5 Chip: A Dual-Core Multithreaded Processor,” IEEE Micro, 24, 2, Apr. 2004, 40–47
Kapasi, U., Rixner, S., Dally, W., Khailany, B., Ahn, J., Mattson, P., and Owens, J., “Programmable Stream Processors,” IEEE Computer, 36, 8, Aug. 2003, 54–62
Kaxiras, S., Hu, Z., and Martonosi, M., “Cache Decay: Exploiting Generational Behavior to Reduce Cache Leakage Power,” Proc. 28th Int. Symp. on Computer Architecture, 2001, 240–251
Keller, R., “Look-Ahead Processors,” ACM Computing Surveys, 7, 4, Dec. 1975, 177–195
Keltcher, C., McGrath, J., Ahmed, A., and Conway, P., “The AMD Opteron for Multiprocessor Servers,” IEEE Micro, 23, 2, 2003, 66–76
Kendall Square Research, KSR1 Technology Background, Waltham, MA, 1992
Kermani, P. and Kleinrock, L., “Virtual Cut-through: A New Computer Communication Switching Technique,” Computer Networks, 3, 4, Sep. 1979, 267–286
Kerns, D. and Eggers, S., “Balanced Scheduling: Instruction Scheduling when Memory Latency is Uncertain,” Proc. ACM SIGPLAN Conf. on Programming Language Design and Implementation, SIGPLAN Notices, 28, 6, Jun. 1993, 278–289
Keshava, J. and Pentkovski, V., “Pentium III Processor Implementation Tradeoffs,” Intel Tech. Journal, 2, May 1999
Kessler, R., “The Alpha 21264 Microprocessor,” IEEE Micro, 19, 2, Mar. 1999, 24–36
Kessler, R., Jooss, R., Lebeck, A., and Hill, M., “Inexpensive Implementations of Set-Associativity,” Proc. 16th Int. Symp. on Computer Architecture, 1989, 131–139
Kilburn, T., Edwards, D., Lanigan, M., and Sumner, F., “One-level Storage System,” IRE Trans. on Electronic Computers, EC-11, 2, Apr. 1962, 223–235
Kim, C., Burger, D., and Keckler, S., “An Adaptive, Non-Uniform Cache Structure for Wire-Delay Dominated On-Chip Caches,” Proc. 10th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Oct. 2002, 211–222
Kim, N., Flautner, K., Blaauw, D., and Mudge, T., “Drowsy Instruction Caches – Leakage Power Reduction Using Dynamic Voltage Scaling and Cache Sub-bank Prediction,” Proc. 29th Int. Symp. on Computer Architecture, 2002, 219–230
KleinOsowski, A. and Lilja, D., “MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research,” Computer Architecture Letters, 1, Jun. 2002
Kogge, P., The Architecture of Pipelined Computers, McGraw-Hill, New York, 1981
Kongetira, P., Aingaran, K., and Olukotun, K., “Niagara: A 32-way Multithreaded Sparc Processor,” IEEE Micro, 25, 2, Apr. 2005, 21–29
Koufaty, D. and Marr, D., “Hyperthreading Technology in the Netburst Microarchitecture,” IEEE Micro, 23, 2, Mar. 2003, 56–65
Kroft, D., “Lockup-Free Instruction Fetch/Prefetch Cache Organization,” Proc. 8th Int. Symp. on Computer Architecture, 1981, 81–87
Lai, A., Fide, C., and Falsafi, B., “Dead-block Prediction & Dead-block Correlation Prefetchers,” Proc. 28th Int. Symp. on Computer Architecture, 2001, 144–154
Lam, M., “Software Pipelining: An Effective Scheduling Technique for VLIW Machines,” Proc. ACM SIGPLAN Conf. on Programming Language Design and Implementation, SIGPLAN Notices, 23, 7, Jul. 1988, 318–328
Lamport, L., “How to Make a Multiprocessor Computer that Correctly Executes Multiprocess Programs,” IEEE Trans. on Computers, 28, 9, Sep. 1979, 690–691
Larus, J. and Kozyrakis, C., “Transactional Memory,” Communications of the ACM, 51, 7, Jul. 2008, 80–88
Lee, D., Crowley, P., Baer, J.-L., Anderson, T., and Bershad, B., “Execution Characteristics of Desktop Applications on Windows NT,” Proc. 25th Int. Symp. on Computer Architecture, 1998, 27–38
Lee, J., “Study of ‘Look-Aside’ Memory,” IEEE Trans. on Computers, C-18, 11, Nov. 1969, 1062–1065
Lee, J. and Smith, A., “Branch Prediction Strategies and Branch Target Buffer Design,” IEEE Computer, 17, 1, Jan. 1984, 6–22
Lin, W.-F., Reinhardt, S., and Burger, D., “Designing a Modern Memory Hierarchy with Hardware Prefetching,” IEEE Trans. on Computers, 50, 11, Nov. 2001, 1202–1218
Lipasti, M. and Shen, J.P., “Exceeding the Dataflow Limit with Value Prediction,” Proc. 29th Int. Symp. on Microarchitecture, 1996, 226–237
Liptay, J., “Design of the IBM Enterprise System/9000 High-end Processor,” IBM Journal of Research and Development, 36, 4, Jul. 1992, 713–731
Lo, J., Barroso, L., Eggers, S., Gharachorloo, K., Levy, H., and Parekh, S., “An Analysis of Database Workload Performance on Simultaneous Multithreaded Processors,” Proc. 25th Int. Symp. on Computer Architecture, 1998, 39–50
Loh, G., “Advanced Instruction Flow Techniques,” Chapter 9 in Shen, J. P. and Lipasti, M., Eds., Modern Processor Design, 2005, 453–518
Lovett, T. and Clapp, R., “STiNG: A CC-NUMA Computer System for the Commercial Marketplace,” Proc. 23rd Int. Symp. on Computer Architecture, 1996, 308–317
Lovett, T. and Thakkar, S., “The Symmetry Multiprocessor System,” Proc. Int. Conf. on Parallel Processing, Aug. 1988, 303–310
Mathis, H., Mericas, A., McCalpin, J., Eickemeyer, R., and Kunkel, S., “Characterization of Simultaneous Multithreading (SMT) Efficiency in Power5,” IBM Journal of Research and Development, 49, 4, Jul. 2005, 555–564
Mattson, R., Gecsei, J., Slutz, D., and Traiger, I., “Evaluation Techniques for Storage Hierarchies,” IBM Systems Journal, 9, 1970, 78–117
McFarling, S., “Combining Branch Predictors,” WRL Technical Note, TN-36, Jun. 1993
McMahon, H., “The Livermore Fortran Kernels Test of the Numerical Performance Range,” in Martin, J. L., Ed., Performance Evaluation of Supercomputers, Elsevier Science B.V., North-Holland, Amsterdam, 1988, 143–186
McNairy, C. and Soltis, D., “Itanium 2 Processor Microarchitecture,” IEEE Micro, 23, 2, Mar. 2003, 44–55
Mendelson, A., Mandelblat, J., Gochman, S., Shemer, A., Chabukswar, R., Niemeyer, E., and Kumar, A., “CMP Implementation in Systems Based on the Intel Core Duo Processor,” Intel Tech. Journal, 10, 2, May 2006, 99–107
Moore, G., “Cramming More Components onto Integrated Circuits,” Electronics, 38, 8, Apr. 1965
Moshovos, A., Breach, S., Vijaykumar, T., and Sohi, G., “Dynamic Speculation and Synchronization of Data Dependences,” Proc. 24th Int. Symp. on Computer Architecture, 1997, 181–193
Mowry, T., Lam, M., and Gupta, A., “Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors,” Proc. 5th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Oct. 1992, 62–73
Mutlu, O., Stark, J., Wilkerson, C., and Patt, Y., “Run-ahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors,” Proc. 9th Int. Symp. on High-Performance Computer Architecture, 2003, 129–140
Naveh, A., Rotem, E., Mendelson, A., Gochman, S., Chabukswar, R., Krishnan, K., and Kumar, A., “Power and Thermal Management in the Intel Core Duo Processor,” Intel Tech. Journal, 10, 2, May 2006, 109–122
Ozer, E., Banerjia, S., and Conte, T., “Unified Assign and Schedule: A New Approach to Scheduling for Clustered Register File Microarchitectures,” Proc. 31st Int. Symp. on Microarchitecture, 1998, 308–315
Palacharla, S., Jouppi, N., and Smith, J., “Complexity-Effective Superscalar Processors,” Proc. 24th Int. Symp. on Computer Architecture, 1997, 206–218
Palacharla, S. and Kessler, R., “Evaluating Stream Buffers as a Secondary Cache Replacement,” Proc. 21st Int. Symp. on Computer Architecture, 1994, 24–33
Pan, S., So, K., and Rahmeh, J., “Improving the Accuracy of Dynamic Branch Prediction using Branch Correlation,” Proc. 5th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Oct. 1992, 76–84
Papamarcos, M. and Patel, J., “A Low-overhead Coherence Solution for Multiprocessors with Private Cache Memories,” Proc. 12th Int. Symp. on Computer Architecture, 1985, 348–354
Papworth, D., “Tuning the Pentium Pro Microarchitecture,” IEEE Micro, 16, 2, Mar. 1996, 8–15
Patel, S., Friendly, D., and Patt, Y., “Evaluation of Design Options for the Trace Cache Fetch Mechanism,” IEEE Trans. on Computers, 48, 2, Feb. 1999, 193–204
Patterson, D. and Hennessy, J., Computer Organization & Design: The Hardware/Software Interface, Third Edition, Morgan Kaufmann Publishers, San Francisco, 2004
Patterson, D. and Séquin, C., “RISC I: A Reduced Instruction Set VLSI Computer,” Proc. 8th Int. Symp. on Computer Architecture, 1981, 443–457
Peir, J.-K., Hsu, W., and Smith, A., “Functional Implementation Techniques for CPU Cache Memories,” IEEE Trans. on Computers, 48, 2, Feb. 1999, 100–110
Peleg, A. and Weiser, U., “Dynamic Flow Instruction Cache Memory Organized Around Trace Segments Independent of Virtual Address Line,” U.S. Patent Number 5,381,533, 1994
Peleg, A. and Weiser, U., “MMX Technology Extension to the Intel Architecture,” IEEE Micro, 16, 4, Aug. 1996, 42–50
Perleberg, C. and Smith, A., “Branch Target Buffer Design and Optimization,” IEEE Trans. on Computers, 42, 4, Apr. 1993, 396–412
Pettis, K. and Hansen, R., “Profile Guided Code Positioning,” Proc. ACM SIGPLAN Conf. on Programming Language Design and Implementation, SIGPLAN Notices, 25, Jun. 1990, 16–27
Ponomarev, D., Kucuk, G., and Ghose, K., “Reducing Power Requirements of Instruction Scheduling through Dynamic Allocation of Multiple Datapath resources,” Proc. 34th Int. Symp. on Microarchitecture, 2001, 90–101
Postiff, M., Tyson, G., and Mudge, T., “Performance Limits of Trace Caches,” Journal of Instruction-Level Parallelism, 1, Sep. 1999, 1–17
Przybylski, S., Cache Design: A Performance Directed Approach, Morgan Kaufmann Publishers, San Francisco, 1990
Pugh, E., Johnson, L., and Palmer, J., IBM's 360 and Early 370 Systems, The MIT Press, Cambridge, MA, 1991
Ranganathan, P., Adve, S., and Jouppi, N., “Performance of Image and Video Processing with General-Purpose Processors and Media ISA Extensions,” Proc. 26th Int. Symp. on Computer Architecture, 1999, 124–135
Riseman, E. and Foster, C., “The Inhibition of Potential Parallelism by Conditional Jumps,” IEEE Trans. on Computers, C-21, 12, Dec. 1972, 1405–1411
Romer, T., Lee, D., Voelker, G., Wolman, A., Wong, W., Baer, J.-L., Bershad, B., and Levy, H., “The Structure and Performance of Interpreters,” Proc. 7th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Oct. 1996, 150–159
Rotenberg, E., Bennett, S., and Smith, J., “Trace Cache: A Low Latency Approach to High Bandwidth Instruction Fetching,” Proc. 29th Int. Symp. on Microarchitecture, 1996, 24–34
Rudolph, L. and Segall, Z., “Dynamic Decentralized Cache Schemes for MIMD Parallel Processors,” Proc. 11th Int. Symp. on Computer Architecture, 1984, 340–347
Salverda, P. and Zilles, C., “A Criticality Analysis of Clustering in Superscalar Processors,” Proc. 38th Int. Symp. on Microarchitecture, 2005, 55–66
Schlansker, M. and Rau, B., “EPIC: Explicitly Parallel Instruction Computing,” IEEE Computer, 33, 2, Feb. 2000, 37–45
Scott, S., “Synchronization and Communication in the Cray T3E Multiprocessor,” Proc. 7th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Oct. 1996, 26–36
Seznec, A., “A Case for Two-way Skewed-Associative Caches,” Proc. 20th Int. Symp. on Computer Architecture, 1993, 169–178
Sharangpani, H. and Arora, K., “Itanium Processor Microarchitecture,” IEEE Micro, 20, 5, Sep. 2000, 24–43
Shen, J. P. and Lipasti, M., Modern Processor Design: Fundamentals of Superscalar Processors, McGraw-Hill, 2005
Sherwood, T., Perelman, E., Hamerly, G., Sair, S., and Calder, B., “Discovering and Exploiting Program Phases,” IEEE Micro, 23, 6, Nov.–Dec. 2003, 84–93
Sima, D., “The Design Space of Register Renaming Techniques,” IEEE Micro, 20, 5, Sep. 2000, 70–83
Skadron, K., Martonosi, M., and Clark, D., “Speculative Updates of Local and Global Branch History: A Quantitative Analysis,” Journal of Instruction-Level Parallelism, 2, 2000, 1–23
Skadron, K., Stan, M., Huang, W., Velusamy, S., Sankaranarayanan, K., and Tarjan, D., “Temperature-Aware Microarchitecture,” Proc. 30th Int. Symp. on Computer Architecture, 2003, 2–13
Slingerland, N. and Smith, A., “Multimedia Extensions for General-Purpose Microprocessors: A Survey,” Microprocessors and Microsystems, 29, 5, Jan. 2005, 225–246
Smith, A., “Cache Memories,” ACM Computing Surveys, 14, 3, Sep. 1982, 473–530
Smith, B., “A Pipelined, Shared Resource MIMD Computer,” Proc. Int. Conf. on Parallel Processing, 1978, 6–8
Smith, J., “A Study of Branch Prediction Strategies,” Proc. 8th Int. Symp. on Computer Architecture, 1981, 135–148
Smith, J., “Characterizing Computer Performance with a Single Number,” Communications of the ACM, 31, 10, Oct. 1988, 1201–1206
Smith, J. and Pleszkun, A., “Implementation of Precise Interrupts in Pipelined Processors,” IEEE Trans. on Computers, C-37, 5, May 1988, 562–573 (an earlier version was published in Proc. 12th Int. Symp. on Computer Architecture, 1985)
Smith, J. and Sohi, G., “The Microarchitecture of Superscalar Processors,” Proc. IEEE, 83, 12, Dec. 1995, 1609–1624
Sohi, G., “Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers,” IEEE Trans. on Computers, C-39, 3, Mar. 1990, 349–359 (an earlier version with co-author S. Vajapeyam was published in Proc. 14th Int. Symp. on Computer Architecture, 1987)
Sohi, G., Breach, S., and Vijaykumar, T., “Multiscalar Processors,” Proc. 22nd Int. Symp. on Computer Architecture, 1995, 414–425
Sohi, G. and Roth, A., “Speculative Multithreaded Processors,” IEEE Computer, 34, 4, Apr. 2001, 66–73
Srinivasan, S., Ju, D.-C., Lebeck, A., and Wilkerson, C., “Locality vs. Criticality,” Proc. 28th Int. Symp. on Computer Architecture, 2001, 132–143
Stark, J., Brown, M., and Patt, Y., “On Pipelining Dynamic Instruction Scheduling Logic,” Proc. 33rd Int. Symp. on Microarchitecture, 2000, 57–66
Stunkel, C., Herring, J., Abali, B., and Sivaram, R., “A New Switch Chip for IBM RS/6000 SP Systems,” Proc. Int. Conf. on Supercomputing, 1999, 16–33
Sweazey, P. and Smith, A., “A Class of Compatible Cache Consistency Protocols and their Support by the IEEE Futurebus,” Proc. 13th Int. Symp. on Computer Architecture, 1986, 414–423
Tendler, J., Dodson, J., Fields, J., Jr., Le, H., and Sinharoy, B., “POWER4 System Microarchitecture,” IBM Journal of Research and Development, 46, 1, Jan. 2002, 5–25
Thornton, J., “Parallel Operation in the Control Data 6600,” Proc. AFIPS FJCC, pt. 2, vol. 26, 1964, 33–40 (reprinted as Chapter 39 of Bell, C. and Newell, A., Eds., Computer Structures: Readings and Examples, McGraw-Hill, New York, 1971, and Chapter 43 of Siewiorek, D., Bell, C., and Newell, A., Eds., Computer Structures: Principles and Examples, McGraw-Hill, New York, 1982)
Thornton, J., Design of a Computer: The Control Data 6600, Scott, Foresman and Co., Glenview, IL, 1970
Tjaden, G. and Flynn, M., “Detection and Parallel Execution of Independent Instructions,” IEEE Trans. on Computers, C-19, 10, Oct. 1970, 889–895
Tomasulo, R., “An Efficient Algorithm for Exploiting Multiple Arithmetic Units,” IBM Journal of Research and Development, 11, 1, Jan. 1967, 25–33
Tremblay, M., Chan, J., Chaudhry, S., Conigliaro, A., and Tse, S., “The MAJC Architecture: A Synthesis of Parallelism and Scalability,” IEEE Micro, 20, 6, Nov. 2000, 12–25
Tremblay, M. and O'Connor, J., “UltraSparc I: A Four-issue Processor Supporting Multimedia,” IEEE Micro, 16, 2, Apr. 1996, 42–50
Tucker, L. and Robertson, G., “Architecture and Applications of the Connection Machine,” IEEE Computer, 21, 8, Aug. 1988, 26–38
Tullsen, D., Eggers, S., and Levy, H., “Simultaneous Multithreading: Maximizing On-chip Parallelism,” Proc. 22nd Int. Symp. on Computer Architecture, 1995, 392–403
Tune, E., Liang, D., Tullsen, D., and Calder, B., “Dynamic Prediction of Critical Path Instructions,” Proc. 7th Int. Symp. on High-Performance Computer Architecture, 2001, 185–195
Uhlig, R. and Mudge, T., “Trace-driven Memory Simulation: A Survey,” ACM Computing Surveys, 29, 2, Jun. 1997, 128–170
Vanderwiel, S. and Lilja, D., “Data Prefetch Mechanisms,” ACM Computing Surveys, 32, 2, Jun. 2000, 174–199
VanVleet, P., Anderson, E., Brown, L., Baer, J.-L., and Karlin, A., “Pursuing the Performance Potential of Dynamic Cache Lines,” Proc. ICCD, Oct. 1999, 528–537
Venkatachalam, V. and Franz, M., “Power Reduction Techniques for Microprocessor Systems,” ACM Computing Surveys, 37, 3, Sep. 2005, 195–237
Weicker, R., “Dhrystone: A Synthetic Systems Programming Benchmark,” Communications of the ACM, 27, 10, Oct. 1984, 1013–1030
Weiser, M., Welch, B., Demers, A., and Shenker, S., “Scheduling for Reduced CPU Energy,” Proc. 1st USENIX Symp. on Operating Systems Design and Implementation, 1994, 13–23
Wechsler, O., “Inside Intel Core Microarchitecture,” Intel White Paper, 2006, http://download.intel.com/technology/architecture/new_architecture_06.pdf
Wilkes, M., “Slave Memories and Dynamic Storage Allocation,” IEEE Trans. on Electronic Computers, EC-14, Apr. 1965, 270–271
Wong, W. and Baer, J.-L., “Modified LRU Policies for Improving Second-Level Cache Behavior,” Proc. 6th Int. Symp. on High-Performance Computer Architecture, 2000, 49–60
Yeager, K., “The MIPS R10000 Superscalar Microprocessor,” IEEE Micro, 16, 2, Apr. 1996, 28–41
Yeh, T.-Y. and Patt, Y., “Alternative Implementations of Two-Level Adaptive Branch Prediction,” Proc. 19th Int. Symp. on Computer Architecture, 1992, 124–134
Yeh, T.-Y. and Patt, Y., “A Comprehensive Instruction Fetch Mechanism for a Processor Supporting Speculative Execution,” Proc. 25th Int. Symp. on Microarchitecture, 1992, 129–139
Yoaz, A., Erez, M., Ronen, R., and Jourdan, S., “Speculation Techniques for Improving Load Related Instruction Scheduling,” Proc. 26th Int. Symp. on Computer Architecture, 1999, 42–53
Zhang, Z., Zhu, Z., and Zhang, X., “A Permutation-based Page Interleaving Scheme to Reduce Row-buffer Conflicts and Exploit Data Locality,” Proc. 33rd Int. Symp. on Microarchitecture, 2000, 32–41

  • Bibliography
  • Jean-Loup Baer, University of Washington
  • Book: Microprocessor Architecture
  • Online publication: 05 June 2012
  • Chapter DOI: https://doi.org/10.1017/CBO9780511811258.011