In 1989 a forward-looking paper attempted to determine the characteristics of microprocessors in the year 2000. Called “Microprocessors circa 2000”, the paper hypothesized that a high-performance microprocessor in the year 2000 would have an area of 1 square inch (645 sq mm), contain 50 million transistors, and run at above 250 MHz. The overall performance of the microprocessor was estimated at 2000 million instructions per second (MIPS), achieved by the employment of two or three cores, each with a performance of 750 MIPS. Forward-looking papers often have somewhat fanciful conceits of future developments, illustrating the witticism that predictions tend to be difficult if they involve the future. This prediction, however, was based on many years of microprocessor development, leading to a broadly accurate prediction of things to come. The International Solid-State Circuits Conference (ISSCC), held in early 2000, presented a number of microprocessors whose transistor counts and area were within 2× of the prediction. Since much of the area of a microprocessor is composed of on-chip memory, the prediction for transistor count was achieved soon afterwards. The prediction of 2000 MIPS for the maximum performance of the system also proved to be accurate. The interesting discrepancy was in the way that the performance of the microprocessor was achieved. Instead of employing a number of processors operating at 250 MHz, most high-end microprocessors were single-core designs running at or above 1 GHz.
Presenting a methodology for using domino logic in an ASIC design flow developed over several years in an industrial context, this text covers practical issues related to the use of domino logic in an automated framework, and brings together all the knowledge needed to apply these design techniques in practice. Beginning with a discussion of how to achieve high speed in ASIC designs, subsequent chapters detail the design and characterization of standard cell compatible domino logic libraries and an advanced domino logic synthesis flow. The results achieved by using automated domino logic design techniques, including silicon measurements, are used to validate the presented solution. With design examples including the implementation of the execution unit of a microprocessor and a Viterbi decoder, this text is ideal for graduate students and researchers in electrical and computer engineering and also for circuit designers in industry.
This book stems from my experience over the last few years in designing high-speed digital logic using ASIC design flows. I discovered that while it is possible to significantly improve performance in ASIC implementations with deep pipelining and careful physical design, a speed penalty still had to be paid due to their exclusive use of static logic. This spurred an interest in using domino logic with automated synthesis and place and route tools. This book documents my experiences in automating the use of domino logic, and shows that despite the challenges entailed in the process, it is possible to use domino logic with industry-standard ASIC tools and achieve a significant speed improvement in the process.
Engineering is a group activity. The development of our domino logic synthesis system was possible due to the collaboration of many intelligent, enthusiastic, and dedicated co-workers whose contributions I must acknowledge. First of all I would like to thank my two chapter co-authors, Tommy Zounes and Bernard Bourgin. In addition to being gifted and hard-working engineers, Tommy and Bernard have also always been very generous with their knowledge and time, allowing all of their co-workers, including me, to learn a great deal from them. The domino logic library was possible due to the talents and efforts of Scott Anderson, Shaun Forsting, Judy Alvarez-Gallardo, Roger Boates, Michael Lin, and Juneho Park, who helped design the schematics and also contributed to the myriad other tasks involved with taping out a number of chips.
By the late 1970s complementary metal oxide semiconductor (CMOS) had started to become the process of choice for digital semiconductor designs. CMOS had originally been proposed by Frank Wanlass in 1963 as a low standby power technology, since CMOS logic gates dissipate almost no power when the inputs to the gate do not change. This follows because CMOS contains both PMOS field effect transistors (FETs), which can efficiently drive a high voltage, or logic one value, and NMOS transistors, which are good at driving a zero voltage. The presence of complementary transistors allows CMOS logic gates to be implemented so that the output is connected to either the power line or the ground line, but never both. This ability to avoid contention ensures that if the inputs are not changing, then almost no power is dissipated. This was a major advantage of CMOS over the other manufacturing processes then available, which dissipated constant leakage or bias currents.
Figure 1.1 shows the schematic representation of a CMOS static NAND logic gate. The logic gate has two inputs, A and B. A high logic value at both inputs A and B turns on transistors MN1 and MN2, while turning off transistors MP1 and MP2. This causes the output Z to be low. When either input A or B is low, however, the path to the ground line is broken and a path to the power supply (by convention called Vdd) is established. This causes Z to rise.
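The behaviour described above can be checked with a small switch-level model. This is only an illustrative sketch, not a circuit simulation: it assumes the usual static CMOS NAND arrangement (NMOS devices MN1 and MN2 in series to ground, PMOS devices MP1 and MP2 in parallel to Vdd), and it assumes that input A drives MN1/MP1 and input B drives MN2/MP2, which the text does not state. The function names are hypothetical.

```python
from itertools import product


def nand_switch_level(a: bool, b: bool) -> dict:
    """Switch-level view of a two-input static CMOS NAND gate.

    Assumed mapping (not given in the text): A drives MN1/MP1, B drives MN2/MP2.
    """
    # An NMOS transistor conducts when its gate is high,
    # a PMOS transistor conducts when its gate is low.
    mn1_on, mn2_on = a, b
    mp1_on, mp2_on = (not a), (not b)

    pull_down = mn1_on and mn2_on  # series NMOS path from Z to ground
    pull_up = mp1_on or mp2_on     # parallel PMOS paths from Vdd to Z

    # Exactly one network conducts for any static input, so Vdd and
    # ground are never shorted together (no contention).
    assert pull_up != pull_down

    return {"pull_up": pull_up, "pull_down": pull_down, "Z": pull_up}


def truth_table() -> list:
    """Return (A, B, Z) for all four input combinations."""
    return [
        (a, b, nand_switch_level(a, b)["Z"])
        for a, b in product((False, True), repeat=2)
    ]


if __name__ == "__main__":
    for a, b, z in truth_table():
        print(int(a), int(b), int(z))
```

The internal assertion captures the point made about CMOS in general: for any stable input combination only one of the pull-up and pull-down networks conducts, so there is no direct path from Vdd to ground.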
Previous chapters in this book have been devoted to the design of domino logic standard cells and methods to synthesize logic using them. In this chapter we describe some example circuits implemented using different automated domino logic design flows. Since the primary benchmark for synthesizable domino logic is against synthesizable static logic, comparisons are provided between the two. Silicon-measured data is also provided wherever it is available.
Domino integer execution unit
A typical application for high-speed logic is in the execution units of microprocessors. Execution units are the main arithmetic modules in processors, performing integer or floating point arithmetic. In order to understand the speed advantages possible with domino logic, we decided to build a simple integer execution unit. The block has an adder, a shifter, a multiplier, and a bit operations unit. Memory modules interact closely with execution units to provide data and instructions. For this design, two 32-entry, 32-bit wide register files are used in each execution unit. One register file supplies the 32-bit wide data operands that are applied to the datapath modules and stores the result. The other register file holds a simple set of instructions. These instructions allow the data operations to start and stop. They also determine the operations to be performed and the data memory locations to be accessed.
A schematic representation of the execution unit data flow is shown in Figure 5.1. Operation starts via instructions sent from the instruction register file. Each arithmetic function receives operands from the data register file.
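As a rough behavioural illustration of this data flow, the sketch below models the two 32-entry register files and the four datapath functions in Python. It is not the design described in the book: the instruction format `(op, src_a, src_b, dst)`, the operation names, the `stop` instruction, and the truncation of the multiplier result to its low 32 bits are all assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF   # results are kept to 32 bits
NUM_ENTRIES = 32      # each register file has 32 entries

# Hypothetical operation set, one entry per datapath module.
OPS = {
    "add": lambda x, y: (x + y) & MASK32,          # adder
    "shl": lambda x, y: (x << (y & 31)) & MASK32,  # shifter (left shift only)
    "mul": lambda x, y: (x * y) & MASK32,          # multiplier, low 32 bits
    "and": lambda x, y: x & y,                     # bit operations unit
}


def run(instructions, data):
    """Execute instructions against a copy of the data register file.

    instructions: list of (op, src_a, src_b, dst) tuples, at most 32 entries.
    data: list of 32 integers, the initial data register file contents.
    """
    if len(data) != NUM_ENTRIES or len(instructions) > NUM_ENTRIES:
        raise ValueError("register files hold exactly 32 entries")
    regs = [value & MASK32 for value in data]
    for op, src_a, src_b, dst in instructions:
        if op == "stop":  # assumed way of halting the data operations
            break
        # Operands come from the data register file; the result is stored back.
        regs[dst] = OPS[op](regs[src_a], regs[src_b])
    return regs


if __name__ == "__main__":
    data = [0] * NUM_ENTRIES
    data[0], data[1] = 7, 3
    program = [
        ("add", 0, 1, 2),   # r2 = 7 + 3
        ("mul", 2, 1, 3),   # r3 = r2 * 3
        ("shl", 3, 1, 4),   # r4 = r3 << 3
        ("stop", 0, 0, 0),
    ]
    print(run(program, data)[:5])
```

The instruction list plays the role of the instruction register file: it selects the operation, names the data register file locations to read and write, and determines when execution stops.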
Digital ASIC design methodologies are now mature technologies. While EDA tools continue to progress and improve, the basic algorithms on which they are based have been well optimized. In addition, the high-speed needs in an ASIC often tend to be focused on small or medium-sized blocks of logic, while the current focus for EDA tools is on dealing with the massive complexity of systems-on-chip. Static logic libraries, like EDA tools, have also improved in the last few years, especially with the introduction of pulse-based flip-flops [1, 2]. Beyond that, there does not appear to be much one can do to improve performance significantly, apart from the incremental work of increasing the number of cells and types of libraries provided for the synthesis tool. This is common for many maturing industries, where once the low-hanging fruit has been picked, further improvements require considerable effort, often for limited gain.
Before the reader decides to accept the limitations in ASIC design flows with the calm serenity with which it is best to accept the unalterable frailties of the human condition, and other such phenomena, it is perhaps useful to remember that custom designs still remain significantly faster than ASIC implementations in the same process generation. This suggests that there still remains scope for further speed improvements in ASIC flows by using custom design techniques.