In this chapter we sharpen our focus on thread-level parallelism within a single die. Parallelism within a die comes in different forms. Within a single core, multiple threads can execute to improve resource utilization, an approach called core multi-threading. There are three approaches to core multi-threading, depending on how and when instructions are fetched from multiple ready threads: block multi-threading, interleaved multi-threading, and simultaneous multi-threading. We show the hardware additions and modifications necessary for each of these three multi-threading approaches to work within the contexts of traditional (single-threaded) in-order and out-of-order processors. We use example-driven approaches to show the performance advantages of finer-grain multi-threading over coarse-grain multi-threading. These performance advantages come at additional hardware cost.
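The fine-grain versus coarse-grain trade-off can be previewed with a toy cycle-accounting sketch. This is our own illustration, not an example from the chapter: it assumes a single-issue pipeline, a 4-cycle memory-stall penalty, and a 2-cycle context-switch bubble charged only to block (coarse-grain) multi-threading; interleaved (fine-grain) multi-threading rotates among ready threads every cycle at no switch cost.

```python
def simulate(pattern, n_threads, fine_grain, miss_latency=4, switch_cost=2):
    """Count cycles for n_threads copies of `pattern` on one pipeline
    issuing one op per cycle.  'C' = 1-cycle compute op; 'M' = load that
    blocks its thread for miss_latency cycles after issue.
    fine_grain=True  : interleaved MT, round-robin among ready threads.
    fine_grain=False : block MT, run one thread until it stalls, then
                       pay a switch_cost-cycle bubble (assumed values)."""
    pc = [0] * n_threads       # next op index per thread
    ready = [0] * n_threads    # cycle at which each thread may issue again
    cycle, last = 0, None
    while any(p < len(pattern) for p in pc):
        cands = [t for t in range(n_threads)
                 if pc[t] < len(pattern) and ready[t] <= cycle]
        if not cands:          # every unfinished thread is stalled
            cycle += 1
            continue
        if fine_grain:
            # rotate: prefer the thread after the last issuer
            start = 0 if last is None else (last + 1) % n_threads
            order = [(start + i) % n_threads for i in range(n_threads)]
            t = next(x for x in order if x in cands)
        else:
            if last in cands:  # keep running the current thread
                t = last
            else:              # forced switch: charge the bubble
                t = cands[0]
                if last is not None:
                    cycle += switch_cost
        op = pattern[pc[t]]
        pc[t] += 1
        if op == 'M':
            ready[t] = cycle + 1 + miss_latency
        last = t
        cycle += 1
    return cycle

fine = simulate("CCMCCM", 2, fine_grain=True)    # 15 cycles
coarse = simulate("CCMCCM", 2, fine_grain=False)  # 18 cycles
```

With two threads that each alternate two compute ops with a long-latency load, the interleaved policy finishes in 15 cycles versus 18 for block multi-threading, because the block scheme pays a switch bubble at every stall; the exact numbers depend entirely on the assumed latencies above.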
The next paradigm for providing on-die parallelism is exploiting multiple cores on the same chip. Chip multiprocessors (CMPs) are fast becoming ubiquitous in all walks of computing, from cell phones to datacenter servers. We explain the fundamental advantages of CMPs over traditional shared-memory multiprocessors (SMPs), which stem largely from the fact that all cores are tightly integrated on a single die by on-die interconnects. We describe three on-die interconnect topologies common today for building CMPs. When all cores on a CMP are identical, the CMP is said to be homogeneous. The cores in heterogeneous CMPs differ in their capabilities. We describe various heterogeneous CMP designs and the range of performance and functionality they make possible.