3.3 Logical Effort

In this section we explore a delay model based on logical effort, a term coined by Ivan Sutherland and Robert Sproull [1991], that has as its basis the time-constant analysis of Carver Mead, Chuck Seitz, and others.

We add a “catch all” nonideal component of delay, $t_q$, to Eq. 3.2 that includes: (1) delay due to internal parasitic capacitance; (2) the time for the input to reach the switching threshold of the cell; and (3) the dependence of the delay on the slew rate of the input waveform. With these assumptions we can express the delay as follows:

$$t_{PD} = R(C_{out} + C_p) + t_q \tag{3.10}$$

(The input capacitance of the logic cell is $C$, but we do not need it yet.)

We will use a standard-cell library for a 3.3 V, 0.5 μm (0.6 μm drawn) technology (from Compass) to illustrate our model. We call this technology C5; it is almost identical to the G5 process from Section 2.1 (the Compass library uses a more accurate and more complicated SPICE model than the generic process). The equation for the delay of a 1X drive, two-input NAND cell is in the form of Eq. 3.10 ($C_{out}$ is in pF):

$$t_{PD} = (0.07 + 1.46\,C_{out} + 0.15)\ \text{ns} \tag{3.11}$$

The delay due to the intrinsic output capacitance (0.07 ns, equal to $RC_p$) and the nonideal delay ($t_q = 0.15$ ns) are specified separately. The nonideal delay is a considerable fraction of the total delay, so we may hardly ignore it. If data books do not specify these components of delay separately, we have to estimate the fractions of the constant part of a delay equation to assign to $RC_p$ and $t_q$ (here the ratio $t_q / RC_p$ is approximately 2).

The data book tells us the input trip point is 0.5 and the output trip points are 0.35 and 0.65. We can use Eq. 3.11 to estimate the pull resistance for this cell as $R \approx 1.46$ ns pF$^{-1}$, or about 1.5 kΩ. Equation 3.11 is for the falling delay; the data book equation for the rising delay gives slightly different values (but within 10 percent of the falling delay values).
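The linear delay model for the 1X two-input NAND cell can be sketched numerically. This is a minimal illustration, assuming the delay has the form $RC_p + R\,C_{out} + t_q$ with the coefficients quoted in the text (0.07 ns, 1.46 ns/pF, 0.15 ns). The function name and the sample load values are hypothetical and not taken from the data book.

```python
def cell_delay_ns(c_out_pf, rc_p_ns=0.07, r_ns_per_pf=1.46, t_q_ns=0.15):
    """Falling delay in ns for a load c_out_pf given in pF.

    Defaults are the 1X two-input NAND coefficients quoted in the text.
    Note that 1 ns/pF equals 1 kOhm, so r_ns_per_pf is also R in kOhm.
    """
    return rc_p_ns + r_ns_per_pf * c_out_pf + t_q_ns


if __name__ == "__main__":
    # Illustrative loads only; these values are assumptions.
    for c_out in (0.0, 0.1, 0.3):
        print(f"C_out = {c_out:.1f} pF -> t_PD = {cell_delay_ns(c_out):.3f} ns")
```

With no load the delay is the constant part alone (0.22 ns), of which the nonideal delay is roughly two-thirds, which is why it cannot be ignored.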
We can scale any logic cell by a scaling factor $s$ (transistor gates become $s$ times wider, but the gate lengths stay the same), and as a result the pull resistance $R$ will decrease to $R/s$ and the parasitic capacitance $C_p$ will increase to $sC_p$. Since $t_q$ is nonideal, by definition it is hard to predict how it will scale. We shall assume that $t_q$ scales linearly with $s$ for all cells. The total cell delay then scales as follows:

$$t_{PD} = \frac{R}{s}\,(C_{out} + sC_p) + s\,t_q \tag{3.12}$$

For example, the delay equation for a 2X drive ($s = 2$), two-input NAND cell is

$$t_{PD} = (0.03 + 0.75\,C_{out} + 0.51)\ \text{ns} \tag{3.13}$$

Compared to the 1X version (Eq. 3.11), the output parasitic delay has decreased to 0.03 ns (from 0.07 ns), whereas we predicted it would remain constant (the difference is because of the layout); the pull resistance has decreased by a factor of 2 from 1.5 kΩ to 0.75 kΩ, as we would expect; and the nonideal delay has increased to 0.51 ns (from 0.15 ns), more than the 0.30 ns that linear scaling predicts. The differences between our predictions and the actual values give us a measure of the model accuracy.

We rewrite Eq. 3.12 using the input capacitance of the scaled logic cell, $C_{in} = sC$,

$$t_{PD} = RC\,\frac{C_{out}}{C_{in}} + RC_p + s\,t_q \tag{3.14}$$

Finally we normalize the delay using the time constant formed from the pull resistance $R_{inv}$ and the input capacitance $C_{inv}$ of a minimum-size inverter:

$$d = \frac{RC\,(C_{out}/C_{in}) + RC_p + s\,t_q}{\tau} \tag{3.15}$$

The time constant

$$\tau = R_{inv}C_{inv} \tag{3.16}$$

is a basic property of any CMOS technology. We shall measure delays in terms of $\tau$. The delay equation for a 1X (minimum-size) inverter in the C5 library is

$$t_{PD} = (0.06 + 1.60\,C_{out} + 0.10)\ \text{ns} \tag{3.17}$$
Thus $t_{q,inv} = 0.1$ ns and $R_{inv} = 1.60$ kΩ. The input capacitance of the 1X inverter (the standard load for this library) is specified in the data book as $C_{inv} = 0.036$ pF; thus $\tau = (0.036\ \text{pF})(1.60\ \text{k}\Omega) \approx 0.06$ ns for the C5 technology.

The use of logical effort consists of rearranging and understanding the meaning of the various terms in Eq. 3.15. The delay equation is the sum of three terms,

$$d = f + p + q \tag{3.18}$$

We give these terms special names as follows:

$$\text{delay} = \text{effort delay} + \text{parasitic delay} + \text{nonideal delay} \tag{3.19}$$

The effort delay $f$ we write as a product of logical effort, $g$, and electrical effort, $h$:

$$f = gh \tag{3.20}$$

So we can further partition delay into the following terms:

$$d = gh + p + q \tag{3.21}$$
The logical effort $g$ is a function of the type of logic cell,

$$g = \frac{RC}{\tau} \tag{3.22}$$

What size of logic cell do the $R$ and $C$ refer to? It does not matter, because the $R$ and $C$ will change as we scale a logic cell but the $RC$ product stays the same: the logical effort is independent of the size of a logic cell. We can find the logical effort by scaling down the logic cell so that it has the same drive capability as the 1X minimum-size inverter. Then the logical effort, $g$, is the ratio of the input capacitance, $C_{in}$, of the 1X version of the logic cell to $C_{inv}$ (see Figure 3.8).

The electrical effort $h$ depends only on the load capacitance $C_{out}$ connected to the output of the logic cell and the input capacitance of the logic cell, $C_{in}$; thus

$$h = \frac{C_{out}}{C_{in}} \tag{3.23}$$

The parasitic delay $p$ depends on the intrinsic parasitic capacitance $C_p$ of the logic cell, so that

$$p = \frac{RC_p}{\tau} \tag{3.24}$$

Table 3.2 shows the logical efforts for single-stage logic cells. Suppose the minimum-size inverter has an n-channel transistor with W/L = 1 and a p-channel transistor with W/L = 2 (logic ratio, $r$, of 2). Then each two-input NAND logic cell input is connected to an n-channel transistor with W/L = 2 and a p-channel transistor with W/L = 2. The input capacitance of the two-input NAND logic cell divided by that of the inverter is thus 4/3. This is the logical effort of a two-input NAND when $r = 2$. Logical effort depends on the ratio of the logic. For an $n$-input NAND cell with ratio $r$, the p-channel transistors are W/L = $r$/1, and the n-channel transistors are W/L = $n$/1. For a NOR cell the n-channel transistors are 1/1 and the p-channel transistors are $nr$/1.
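The transistor sizes given above lead directly to closed-form logical efforts: the input capacitance per input divided by the inverter's $1 + r$. The following is a minimal sketch under that assumption (gate capacitance proportional to transistor width); the function names are hypothetical, and exact fractions are used so the results can be compared with the text.

```python
from fractions import Fraction


def nand_logical_effort(n, r):
    """n-input NAND: n-channel W/L = n, p-channel W/L = r, inverter = 1 + r."""
    r = Fraction(r)
    return (n + r) / (1 + r)


def nor_logical_effort(n, r):
    """n-input NOR: n-channel W/L = 1, p-channel W/L = n*r, inverter = 1 + r."""
    r = Fraction(r)
    return (1 + n * r) / (1 + r)


if __name__ == "__main__":
    print(nand_logical_effort(2, 2))               # two-input NAND, r = 2
    print(nand_logical_effort(2, Fraction(3, 2)))  # two-input NAND, r = 1.5
    print(nor_logical_effort(3, Fraction(3, 2)))   # three-input NOR, r = 1.5
```

For $r = 2$ the two-input NAND evaluates to 4/3, matching the value derived in the text.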
The parasitic delay arises from parasitic capacitance at the output node of a single-stage logic cell, and most (but not all) of this is due to the source and drain capacitance. The parasitic delay of a minimum-size inverter is

$$p_{inv} = \frac{C_p}{C_{inv}} \tag{3.25}$$

The parasitic delay is a constant for any technology. For our C5 technology we know $RC_p = 0.06$ ns and, using Eq. 3.17 for a minimum-size inverter, we can calculate $p_{inv} = RC_p/\tau = 0.06/0.06 = 1$ (this is purely a coincidence). Thus $C_p$ is about equal to $C_{inv}$ and is approximately 0.036 pF. There is a large error in calculating $p_{inv}$ from extracted delay values that are so small. Often we can calculate $p_{inv}$ more accurately by estimating the parasitic capacitance from layout. Because $RC_p$ is constant, the parasitic delay is equal to the ratio of the parasitic capacitance of a logic cell to the parasitic capacitance of a minimum-size inverter. In practice this ratio is very difficult to calculate; it depends on the layout. We can approximate the parasitic delay by assuming it is proportional to the sum of the widths of the n-channel and p-channel transistors connected to the output. Table 3.2 shows the parasitic delay for different cells in terms of $p_{inv}$.

The nonideal delay $q$ is hard to predict and depends mainly on the physical size of the logic cell (proportional to the cell area in general, or width in the case of a standard cell or a gate-array macro),

$$q = \frac{s\,t_q}{\tau} \tag{3.26}$$

We define $q_{inv}$ in the same way we defined $p_{inv}$. An $n$-input cell is approximately $n$ times larger than an inverter, giving the values for nonideal delay shown in Table 3.2. For our C5 technology, from Eq. 3.17, $q_{inv} = t_{q,inv}/\tau = 0.1\ \text{ns}/0.06\ \text{ns} = 1.7$.

3.3.1 Predicting Delay

As an example, let us predict the delay of a three-input NOR logic cell with 2X drive, driving a net with a fanout of four, with a total load capacitance (comprising the input capacitance of the four cells we are driving plus the interconnect) of 0.3 pF.
From Table 3.2 we see $p = 3p_{inv}$ and $q = 3q_{inv}$ for this cell. We can calculate $C_{in}$ from the fact that the input gate capacitance of a 1X drive, three-input NOR logic cell is equal to $gC_{inv}$, and for a 2X logic cell, $C_{in} = 2gC_{inv}$. Thus,

$$gh = g\,\frac{C_{out}}{C_{in}} = \frac{g\,(0.3\ \text{pF})}{2g\,C_{inv}} = \frac{0.3\ \text{pF}}{(2)(0.036\ \text{pF})} \approx 4.17 \tag{3.27}$$

(Notice that $g$ cancels out in this equation; we shall discuss this in the next section.) The delay of the NOR logic cell, in units of $\tau$, is thus

$$d = gh + p + q = 4.17 + (3)(1) + (3)(1.7) \approx 12.3 \tag{3.28}$$

equivalent to an absolute delay, $t_{PD} \approx 12.3 \times 0.06\ \text{ns} = 0.74$ ns.

The data book delay for a 2X drive, three-input NOR logic cell in the C5 library can be compared with our prediction of 0.74 ns. Almost all of the error here comes from the inaccuracy in predicting the nonideal delay. Logical effort gives us a method to examine relative delays and not accurately calculate absolute delays. More important is that logical effort gives us an insight into why logic has the delay it does.

3.3.2 Logical Area and Logical Efficiency

Figure 3.9 shows a single-stage OR-AND-INVERT cell that has different logical efforts at each input. The logical effort for the OAI221 is the logical-effort vector $g$ = (7/3, 7/3, 5/3). For example, the first element of this vector, 7/3, is the logical effort of inputs A and B in Figure 3.9.

We can calculate the area of the transistors in a logic cell (ignoring the routing area, drain area, and source area) in units of a minimum-size n-channel transistor; we call these units logical squares. We call the transistor area the logical area. For example, the logical area of a 1X drive cell, OAI221X1, is calculated as follows:
Figure 3.10 shows a single-stage AOI221 cell, with $g$ = (8/3, 8/3, 6/3). The calculation of the logical area (for an AOI221X1) is as follows:
These calculations show us that the single-stage OAI221, with an area of 33 logical squares and logical effort of (7/3, 7/3, 5/3), is more logically efficient than the single-stage AOI221 logic cell with a larger area of 39 logical squares and larger logical effort of (8/3, 8/3, 6/3).

3.3.3 Logical Paths

When we calculated the delay of the NOR logic cell in Section 3.3.1, the answer did not depend on the logical effort of the cell, $g$ (it cancelled out in Eqs. 3.27 and 3.28). This is because $g$ is a measure of the input capacitance of a 1X drive logic cell. Since we were not driving the NOR logic cell with another logic cell, the input capacitance of the NOR logic cell had no effect on the delay. This is what we do in a data book: we measure logic-cell delay using an ideal input waveform that is the same no matter what the input capacitance of the cell. Instead let us calculate the delay of a logic cell when it is driven by a minimum-size inverter. To do this we need to extend the notion of logical effort.

So far we have only considered a single-stage logic cell, but we can extend the idea of logical effort to a chain of logic cells or logical path. Consider the logic path when we use a minimum-size inverter ($g_0 = 1$, $p_0 = 1$, $q_0 = 1.7$) to drive one input of a 2X drive, three-input NOR logic cell with $g_1 = (nr + 1)/(r + 1)$, $p_1 = 3p_{inv} = 3$, $q_1 = 3q_{inv} = 5.1$, driving the same fanout-of-four load as in Section 3.3.1 (0.3 pF in total). If the logic ratio is $r = 1.5$, then $g_1 = 5.5/2.5 = 2.2$.
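The two stage delays on this path can be checked with a short calculation. This is a sketch under stated assumptions: it uses the C5 values quoted in the text ($\tau = 0.06$ ns, $C_{inv} = 0.036$ pF, $p_{inv} = 1$, $q_{inv} = 1.7$), takes the NOR cell's load to be the 0.3 pF of Section 3.3.1, and uses hypothetical function names.

```python
TAU_NS = 0.06     # technology time constant quoted in the text
C_INV_PF = 0.036  # input capacitance of the 1X inverter (one standard load)
P_INV = 1.0       # parasitic delay of the inverter, in units of tau
Q_INV = 1.7       # nonideal delay of the inverter, in units of tau


def stage_delay(g, c_out_pf, c_in_pf, p, q):
    """Normalized stage delay d = g*h + p + q, with h = C_out / C_in."""
    return g * (c_out_pf / c_in_pf) + p + q


def nor_path_delays(g_nor=2.2, drive=2, n_inputs=3, c_load_pf=0.3):
    """Return (d_inverter, d_nor, total) in units of tau for the example path."""
    c_in_nor = drive * g_nor * C_INV_PF  # a 2X cell has C_in = 2*g*C_inv
    d_inv = stage_delay(1.0, c_in_nor, C_INV_PF, P_INV, Q_INV)
    d_nor = stage_delay(g_nor, c_load_pf, c_in_nor,
                        n_inputs * P_INV, n_inputs * Q_INV)
    return d_inv, d_nor, d_inv + d_nor


if __name__ == "__main__":
    d_inv, d_nor, total = nor_path_delays()
    print(f"inverter: {d_inv:.1f} tau, NOR: {d_nor:.1f} tau, path: {total:.1f} tau")
    print(f"absolute path delay: {total * TAU_NS:.2f} ns")
```

Because $g$ appears in both the NOR cell's logical effort and its input capacitance, changing `g_nor` alters only the inverter stage delay, which is the point the text makes about $g$ cancelling for an isolated cell.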
The delay of the inverter is

$$d_0 = g_0 h_0 + p_0 + q_0 = (1)(2)(2.2) + 1 + 1.7 = 7.1$$

Of this $7.1\tau$ delay we can attribute $4.4\tau$ to the loading of the NOR logic cell input capacitance, which is $2g_1C_{inv}$. The delay of the NOR logic cell is, as before, $d_1 = g_1h_1 + p_1 + q_1 = 12.3$, making the total delay 7.1 + 12.3 = 19.4, so the absolute delay is (19.4)(0.06 ns) = 1.164 ns, or about 1.2 ns.

We can see that the path delay $D$ is the sum of the effort delay, parasitic delay, and nonideal delay at each stage. In general, we can write the path delay as

$$D = \sum_{i \in \text{path}} g_i h_i + \sum_{i \in \text{path}} (p_i + q_i) \tag{3.32}$$

3.3.4 Multistage Cells

Consider the following function (a multistage AOI221 logic cell): Figure 3.11(a) shows this implementation with each input driven by a minimum-size inverter so we can measure the effect of the cell input capacitance.
The logical efforts of each of the logic cells in Figure 3.11(a) are as follows:
Each of the logic cells in Figure 3.11 has a 1X drive strength. This means that the input capacitance of each logic cell is given, as shown in the figure, by $gC_{inv}$. Using Eq. 3.32 we can calculate the delay from the input of the inverter driving A1 to the output ZN (Eq. 3.35). In Eq. 3.35 we have normalized the output load, $C_L$, by dividing it by a standard load (equal to $C_{inv}$). We can calculate the delays of the other paths similarly.

More interesting is to compare the multistage implementation with the single-stage version. In our C5 technology, with a logic ratio $r = 1.5$, we can calculate the logical effort for a single-stage AOI221 logic cell.
This gives the delay from an inverter driving the A input to the output ZN of the single-stage logic cell. The single-stage delay is very close to the delay for the multistage version of this logic cell. In some ASIC libraries the AOI221 is implemented as a multistage logic cell instead of using a single stage. It raises the question: Can we make the multistage logic cell any faster by adjusting the scale of the intermediate logic cells?

3.3.5 Optimum Delay

Before we can attack the question of how to optimize delay in a logic path, we shall need some more definitions. The path logical effort $G$ is the product of logical efforts on a path:

$$G = \prod_{i \in \text{path}} g_i$$

The path electrical effort $H$ is the product of the electrical efforts on the path,

$$H = \prod_{i \in \text{path}} h_i = \frac{C_{out}}{C_{in}}$$

where $C_{out}$ is the last output capacitance on the path (the load) and $C_{in}$ is the first input capacitance on the path. The path effort $F$ is the product of the path electrical effort and logical efforts,

$$F = GH$$

The optimum effort delay for each stage is found by minimizing the path delay $D$ by varying the electrical efforts of each stage $h_i$, while keeping $H$, the path electrical effort, fixed. The optimum effort delay is achieved when each stage operates with equal effort,

$$\hat{f} = g_i h_i = F^{1/N}$$

This is a useful result. The optimum path delay is then

$$\hat{D} = NF^{1/N} + P + Q$$

where $P + Q$ is the sum of path parasitic delay and nonideal delay,

$$P + Q = \sum_{i \in \text{path}} (p_i + q_i)$$

We can use these results to improve the AOI221 multistage implementation of Figure 3.11(a). Assume that we need a 1X cell, so the output inverter (cell 4) must have 1X drive strength. This fixes the capacitance we must drive as $C_{out} = C_{inv}$ (the capacitance at the input of this inverter). The input inverters are included to measure the effect of the cell input capacitance, so we cannot cheat by altering these. This fixes the input capacitance as $C_{in} = C_{inv}$. In this case $H = 1$. The logic cells that we can scale on the path from the A input to the output are NAND logic cells labeled as 2 and 3.
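The definitions of path effort and optimum stage effort can be sketched as a small function. This is an illustration only: the function name is hypothetical, and the example path (an input inverter with $g = 1$ followed by two NAND cells with $g = 1.4$, and $H = 1$) is an assumption about the A-input path of Figure 3.11(a) based on the values quoted in this section.

```python
import math


def optimum_path(g_list, path_h, pq_sum=0.0):
    """Return (F, optimum stage effort, optimum path delay) in units of tau.

    g_list : logical effort of each stage on the path
    path_h : path electrical effort H = C_out / C_in
    pq_sum : P + Q, the summed parasitic and nonideal delays (0 to ignore)
    """
    n_stages = len(g_list)
    path_f = math.prod(g_list) * path_h   # F = G * H
    f_opt = path_f ** (1.0 / n_stages)    # equal effort in every stage
    return path_f, f_opt, n_stages * f_opt + pq_sum


if __name__ == "__main__":
    # Assumed path: inverter, NAND cell 2, NAND cell 3, with H = 1.
    path_f, f_opt, d_opt = optimum_path([1.0, 1.4, 1.4], 1.0)
    print(f"F = {path_f:.2f}, stage effort = {f_opt:.2f}, N*F^(1/N) = {d_opt:.2f}")
```

Since $P + Q$ does not depend on how the stages are scaled, it shifts the optimum delay but does not change the optimum stage effort, which is why the sketch treats it as an optional additive term.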
In this case $G = 1.95$, and thus $F = GH = 1.95$ and the optimum stage effort is $1.95^{1/3} = 1.25$, so that the optimum effort delay is $NF^{1/N} = 3.75$. From Figure 3.11(a) we see that the corresponding effort delay with every cell at 1X drive is 3.8.
This means that even if we scale the sizes of the cells to their optimum values, we only save a fraction of a $\tau$ (3.8 − 3.75 = 0.05). This is a useful result (and one that is true in general): the delay is not very sensitive to the scale of the cells. In this case it means that we can reduce the size of the two NAND cells in the multicell implementation of an AOI221 without sacrificing speed. We can use logical effort to predict what the change in delay will be for any given cell sizes.

We can use logical effort in the design of logic cells and in the design of logic that uses logic cells. If we do have the flexibility to continuously size each logic cell (which in ASIC design we normally do not; we usually have to choose from 1X, 2X, 4X drive strengths), each logic stage can be sized using the equation for the individual stage electrical efforts,

$$h_i = \frac{F^{1/N}}{g_i}$$

For example, even though we know that it will not improve the delay by much, let us size the cells in Figure 3.11(a). We shall work backward starting at the fixed load capacitance at the input of the last inverter. For NAND cell 3, $gh = 1.25$; thus (since $g = 1.4$), $h = C_{out}/C_{in} = 0.893$. The output capacitance, $C_{out}$, for this NAND cell is the input capacitance of the inverter, fixed as 1 standard load, $C_{inv}$. This fixes the input capacitance, $C_{in}$, of NAND cell 3 at 1/0.893 = 1.12 standard loads. Thus, the scale of NAND cell 3 is 1.12/1.4 or 0.8X. Now for NAND cell 2, $gh = 1.25$; $C_{out}$ for NAND cell 2 is the $C_{in}$ of NAND cell 3. Thus $C_{in}$ for NAND cell 2 is 1.12/0.893 = 1.254 standard loads. This means the scale of NAND cell 2 is 1.254/1.4 or 0.9X.

The optimum sizes of the NAND cells are not very different from 1X in this case because $H = 1$ and we are only driving a load no bigger than the input capacitance. This raises the question: What is the optimum stage effort if we have to drive a large load, $H \gg 1$?
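The backward sizing procedure worked through above can be sketched as a loop from the load toward the input. This is a minimal illustration with a hypothetical function name; capacitances are in standard loads ($C_{inv}$), and the stage effort of 1.25 and the NAND logical effort of 1.4 are the values used in the text.

```python
def size_backward(g_list, c_load, f_opt):
    """Size stages from the load back toward the input.

    g_list : logical efforts of the scalable stages, first to last
    c_load : fixed capacitance driven by the last stage (standard loads)
    f_opt  : target effort g*h for every stage
    Returns a list of (input capacitance, scale) per stage, first to last,
    where scale = C_in / g is the drive strength relative to 1X.
    """
    sized = []
    c_out = c_load
    for g in reversed(g_list):
        c_in = g * c_out / f_opt  # from g * (C_out / C_in) = f_opt
        sized.append((c_in, c_in / g))
        c_out = c_in              # this stage is the load of the previous one
    return list(reversed(sized))


if __name__ == "__main__":
    # NAND cell 2 then NAND cell 3, driving the 1X output inverter.
    for name, (c_in, scale) in zip(("NAND 2", "NAND 3"),
                                   size_backward([1.4, 1.4], 1.0, 1.25)):
        print(f"{name}: C_in = {c_in:.3f} standard loads, scale = {scale:.2f}X")
```

In a real ASIC library the computed scales would then be rounded to an available drive strength, which the text notes costs very little delay.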
Notice that, so far, we have only calculated the optimum stage effort when we have a fixed number of stages, $N$. We have said nothing about the situation in which we are free to choose $N$, the number of stages.

3.3.6 Optimum Number of Stages

Suppose we have a chain of $N$ inverters each with equal stage effort, $f = gh$. Neglecting parasitic and nonideal delay, the total path delay is $Nf = Ngh = Nh$, since $g = 1$ for an inverter. Suppose we need to drive a path electrical effort $H$; then $h^N = H$, or $N \ln h = \ln H$. Thus the delay is $Nh = h \ln H / \ln h$. Since $\ln H$ is fixed, we can only vary $h / \ln h$. Figure 3.12 shows that this is a very shallow function with a minimum at $h = e \approx 2.718$. At this point $\ln h = 1$ and the total delay is $Ne = e \ln H$. This result is particularly useful in driving large loads either on-chip (the clock, for example) or off-chip (I/O pad drivers, for example).
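The shallow minimum can be explored numerically for a whole number of stages. This sketch neglects parasitic and nonideal delay, as the derivation above does; the function names and the example value $H = 1000$ are assumptions chosen for illustration.

```python
import math


def chain_delay(path_h, n_stages):
    """Delay N*h (in units of tau) of N equal-effort inverters, with h**N = H."""
    return n_stages * path_h ** (1.0 / n_stages)


def best_stage_count(path_h, n_max=50):
    """Integer number of stages that minimizes the chain delay."""
    return min(range(1, n_max + 1), key=lambda n: chain_delay(path_h, n))


if __name__ == "__main__":
    path_h = 1000.0  # illustrative large load, an assumed value
    n_best = best_stage_count(path_h)
    print(f"best N = {n_best}, stage effort h = {path_h ** (1.0 / n_best):.2f}")
    print(f"delay = {chain_delay(path_h, n_best):.2f} tau, "
          f"continuous bound e*ln(H) = {math.e * math.log(path_h):.2f} tau")
    # Neighbouring stage counts show how flat the minimum is.
    for n in (n_best - 1, n_best + 1):
        print(f"N = {n}: delay = {chain_delay(path_h, n):.2f} tau")
```

The integer optimum sits close to $\ln H$, and stage counts one either side of it cost only about one percent more delay, which illustrates why the function is described as very shallow.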
Figure 3.12 shows us how to minimize delay regardless of area or power and neglecting parasitic and nonideal delays. More complicated equations can be derived, including nonideal effects, when we wish to trade off delay for smaller area or reduced power.




