# 13.2  The Comparator/MUX Example

As an example we borrow the model from Section 12.2, “A Comparator/MUX”:

// comp_mux.v

module comp_mux(a, b, outp); input [2:0] a, b; output [2:0] outp;

function [2:0] compare; input [2:0] ina, inb;

begin if (ina <= inb) compare = ina; else compare = inb; end

endfunction

assign outp = compare(a, b);

endmodule

We can use the following testbench to generate a sequence of input values (we call these input vectors) that test or exercise the behavioral model, comp_mux.v:

// testbench.v

module comp_mux_testbench;

integer i, j;

reg [2:0] x, y, smaller; wire [2:0] z;

always @(x) $display("t x y actual calculated");

initial $monitor("%4g",$time,,x,,y,,z,,,,,,,smaller);

initial $dumpvars; initial #1000 $finish;

initial

begin

for (i = 0; i <= 7; i = i + 1)

begin

for (j = 0; j <= 7; j = j + 1)

begin

x = i; y = j; smaller = (x <= y) ? x : y;

#1 if (z != smaller) $display("error");

end

end

end

comp_mux v_1 (x, y, z);

endmodule

The results from the behavioral simulation are as follows:

t x y actual calculated

0 0 0 0 0

1 0 1 0 0

... 60 lines omitted...

62 7 6 6 6

63 7 7 7 7

We included a delay of one Verilog time unit in line 15 of the testbench model (allowing time to progress), but we did not specify the units—they could be nanoseconds or days. Thus, behavioral simulation can only tell us if our design does not work; it cannot tell us that real hardware will work.
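The units can be pinned down with the `timescale compiler directive, which the structural models later in this section use. The following is a minimal sketch (the module name timescale_sketch is hypothetical) showing that, once a time unit is declared, a delay of #1 means 1 ns:

```verilog
`timescale 1 ns / 10 ps  // time unit = 1 ns, time precision = 10 ps

module timescale_sketch;  // hypothetical module name, for illustration only
  initial begin
    // With the directive above, #1 advances simulation time by 1 ns.
    #1 $display("after #1 the simulation time is %g ns", $realtime);
    $finish;
  end
endmodule
```

Declaring the units makes the time axis meaningful, but a behavioral model still contains no real cell delays, so the conclusion above is unchanged.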

## 13.2.1  Structural Simulation

We use logic synthesis to produce a structural model from a behavioral model. The following comparator/MUX model is adapted from the example in Section 12.11, “Performance-Driven Synthesis” (optimized for a 0.6 µm standard-cell library):

`timescale 1ns / 10ps // comp_mux_o2.v

module comp_mux_o (a, b, outp);

input [2:0] a; input [2:0] b;

output [2:0] outp;

supply1 VDD; supply0 VSS;

mx21d1 b1_i1 (.i0(a[0]), .i1(b[0]), .s(b1_i6_zn), .z(outp[0]));

oa03d1 b1_i2 (.a1(b1_i9_zn), .a2(a[2]), .b1(a[0]), .b2(a[1]),

.c(b1_i4_zn), .zn(b1_i2_zn));

nd02d0 b1_i3 (.a1(a[1]), .a2(a[0]), .zn(b1_i3_zn));

nd02d0 b1_i4 (.a1(b[1]), .a2(b1_i3_zn), .zn(b1_i4_zn));

mx21d1 b1_i5 (.i0(a[1]), .i1(b[1]), .s(b1_i6_zn), .z(outp[1]));

oa04d1 b1_i6 (.a1(b[2]), .a2(b1_i7_zn), .b(b1_i2_zn),

.zn(b1_i6_zn));

in01d0 b1_i7 (.i(a[2]), .zn(b1_i7_zn));

an02d1 b1_i8 (.a1(b[2]), .a2(a[2]), .z(outp[2]));

in01d0 b1_i9 (.i(b[2]), .zn(b1_i9_zn));

endmodule

Logic simulation requires Verilog models for the following six logic cells: mx21d1 (2:1 MUX), oa03d1 (OAI221), nd02d0 (two-input NAND), oa04d1 (OAI21), in01d0 (inverter), and an02d1 (two-input AND). These models are part of an ASIC library (often encoded so that they cannot be seen) and thus, from this point on, the designer is dependent on a particular ASIC library company. As an example of this dependence, notice that some of the names in the preceding code have changed from uppercase (in Figure 12.8 on p. 624) to lowercase. Verilog is case sensitive and we are using a cell library that uses lowercase. Most unfortunately, there are no standards for names, cell functions, or the use of case in ASIC libraries.

The following code (a simplified model from a 0.8 µm standard-cell library) models a 2:1 MUX and uses fixed delays:

`timescale 1 ns / 10 ps

module mx21d1 (z, i0, i1, s); input i0, i1, s; output z;

not G3(N3, s);

and G4(N4, i0, N3), G5(N5, s, i1), G6(N6, i0, i1);

or G7(z, N4, N5, N6);

specify

(i0*>z) = (0.279:0.504:0.900, 0.276:0.498:0.890);

(i1*>z) = (0.248:0.448:0.800, 0.264:0.476:0.850);

(s*>z)  = (0.285:0.515:0.920, 0.298:0.538:0.960);

endspecify

endmodule

This code uses Verilog primitive models (not, and, or) to describe the behavior of a MUX, but this is not how the logic cell is implemented.

To simulate the optimized structural model, module comp_mux_o2.v, we use the library cell models (module mx21d1 and the other five that are not shown here) together with the following new testbench model:

`timescale 1 ps / 1 ps // comp_mux_testbench2.v

module comp_mux_testbench2;

integer i, j; integer error;

reg [2:0] x, y, smaller; wire [2:0] z, ref;

always @(x) $display("t x y derived reference");

// initial $monitor("%8.2f",$time/1e3,,x,,y,,z,,,,,,,,ref);

initial $dumpvars;

initial begin

error = 0; #1e6 $display("%4g", error, " errors");

$finish;

end

initial begin

for (i = 0; i <= 7; i = i + 1) begin

for (j = 0; j <= 7; j = j + 1) begin

x = i; y = j; #10e3;

$display("%8.2f",$time/1e3,,x,,y,,z,,,,,,,,ref);

if (z != ref)

begin $display("error"); error = error + 1; end

end

end

end

comp_mux_o v_1 (x, y, z); // comp_mux_o2.v

reference v_2 (x, y, ref);

endmodule

// reference.v

module reference(a, b, outp);

input [2:0] a, b; output [2:0] outp;

assign outp = (a <= b) ? a : b; // different from comp_mux

endmodule

In this testbench we have instantiated two models: a reference model (module reference) and a derived model (module comp_mux_o, the optimized structural model). The high-level behavioral model that represents the initial system specification (module reference) may be different from the model that we use as input to the logic-synthesis tool (module comp_mux). Which is the real reference model? We postpone this question until we discuss formal verification in Section 13.8. For the moment, we shall simply perform simulations to check the reference model against the derived model. The simulation results are as follows:

t x y derived reference

10.00 0 0 0 0

20.00 0 1 0 0

... 60 lines omitted...

630.00 7 6 6 6

640.00 7 7 7 7

0 errors

(A summary is printed at the end of the simulation to catch any errors.) The next step is to examine the timing of the structural model (by switching the leading '//' from line 6 to 16 in module comp_mux_testbench2). It is important to simulate using the worst-case delays by using a command-line switch as follows: verilog +maxdelays. We can then find the longest path delay by searching through the simulator output, part of which follows:

t x y derived reference

... lines omitted...

260.00 3 2 1 2

260.80 3 2 3 2

260.85 3 2 2 2

270.00 3 3 2 3

270.80 3 3 3 3

280.00 3 4 3 3

280.85 3 4 0 3

283.17 3 4 3 3

... lines omitted...

At time 280 ns, the input vectors, x and y, switch from (x = 3, y = 3) to (x = 3, y = 4). The output of the derived model (which should be equal to the smaller of x and y) is the same for both of these input vectors and should remain unchanged. In fact there is a glitch at the output of the derived model, as it changes from 3 to 0 and back to 3 again, taking 3.17 ns to settle to its final value (this is the longest delay that occurs using this testbench). The glitch occurs because one of the input vectors (input y) changes from '011' (3 in decimal) to '100' (decimal 4). Changing several input bits simultaneously causes the output to vacillate.

Notice that the nominal and worst-case simulations will not necessarily give the same longest path delay. In addition, the longest path delay found using this testbench is not necessarily the critical path delay. For example, the longest, and therefore critical, path delay might result from a transition from x = 3, y = 4 to x = 4, y = 3 (to choose a random but possible candidate set of input vectors). This testbench does not include tests with such transitions. Finding the critical path using logic simulation requires simulating all possible input transitions (64 × 64 = 4096) and then sifting through the output to find the critical path.
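An exhaustive sweep of this kind might be sketched as follows. This is only an illustration under stated assumptions: the module names dut_stand_in and all_transitions_tb are hypothetical, and the stand-in uses an assumed lumped delay of 1.5 ns so that the example runs on its own. A real test would instantiate comp_mux_o with the library cell models in its place.

```verilog
`timescale 1 ps / 1 ps

// Hypothetical stand-in for the derived model; replace with comp_mux_o
// and the library cell models to measure real path delays.
module dut_stand_in (a, b, outp);
  input [2:0] a, b; output [2:0] outp;
  assign #1500 outp = (a <= b) ? a : b;  // assumed lumped delay of 1.5 ns
endmodule

module all_transitions_tb;  // hypothetical testbench name
  integer v_from, v_to, count;
  reg [2:0] x, y; wire [2:0] z;
  real t_change, t_last, worst;

  dut_stand_in v_1 (x, y, z);

  // Record the time at which the output last changed.
  always @(z) t_last = $realtime;

  initial begin
    worst = 0.0; count = 0; t_last = 0.0;
    // 64 starting vectors x 64 final vectors = 4096 input transitions.
    for (v_from = 0; v_from <= 63; v_from = v_from + 1)
      for (v_to = 0; v_to <= 63; v_to = v_to + 1) begin
        {x, y} = v_from; #10000;  // settle on the starting vector (10 ns)
        t_change = $realtime;
        {x, y} = v_to;   #10000;  // apply the transition and wait 10 ns
        // Settling time = last output change after the input change.
        if (t_last - t_change > worst) worst = t_last - t_change;
        count = count + 1;
      end
    $display("%0d transitions, longest settling time %8.2f ns",
             count, worst / 1e3);
    $finish;
  end
endmodule
```

The sweep reports only the longest settling time it observes; identifying which path caused it would still require inspecting the waveforms for the offending transition.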

Vector-based simulation (or dynamic simulation) can show us that our design functions correctly, hence the name functional simulation. However, functional simulation does not work well if we wish to find the critical path. For this we turn to a different type of simulation: static simulation or static timing analysis.

TABLE 13.1  Timing analysis of the comparator/MUX structural model, comp_mux_o2.v, from Figure 12.8. Command: > report timing. Timing analyzer/logic synthesizer output (see note 1):

| instance name | inPin --> outPin | incr (ns) | arrival (ns) | trs | rampDel (ns) | cap (pF) | cell |
|---|---|---|---|---|---|---|---|
| a[0] | | .00 | .00 | R | .00 | .12 | comp_m... |
| b1_i3 | A2 --> ZN | .31 | .31 | F | .23 | .08 | nd02d0 |
| b1_i4 | A2 --> ZN | .41 | .72 | R | .26 | .07 | nd02d0 |
| b1_i2 | C --> ZN | 1.36 | 2.08 | F | .13 | .07 | oa03d1 |
| b1_i6 | B --> ZN | .94 | 3.01 | R | .24 | .14 | oa04d1 |
| b1_i5 | S --> Z | 1.04 | 4.06 | F | .08 | .04 | mx21d1 |
| outp[0] | | .00 | 4.06 | F | .00 | .00 | comp_m... |

## 13.2.2 Static Timing Analysis

A timing analyzer answers the question: “What is the longest delay in my circuit?” Table 13.1 shows the timing analysis of the comparator/MUX structural model, module comp_mux_o2.v. The longest or critical path delay is 4.06 ns under the following worst-case operating conditions: worst-case process, VDD = 4.75 V, and T = 70 °C (the same conditions as used for the library data book delay values). The timing analyzer gives us only the critical path and its delay. A timing analyzer does not give us the input vectors that will activate the critical path. In fact input vectors may not exist to activate the critical path. For example, it may be that the decimal values of the input vectors to the comparator/MUX may never differ by more than four, but the timing-analysis tool cannot use this information. Future timing-analysis tools may consider such factors, called Boolean relations, but at present they do not.

Section 13.2.1 explained why dynamic functional simulation does not necessarily find the critical path delay. Nevertheless, the difference between the longest path delay found using functional simulation, 3.17 ns, and the critical path delay reported by the static timing-analysis tool, 4.06 ns, is surprising. This difference occurs because the timing analysis accounts for the loading of each logic cell by the input capacitance of the logic cells that follow, but the simplified Verilog models used for functional simulation in Section 13.2.1 did not include the effects of capacitive loading. For example, in the model for the logic cell mx21d1, the (rising) delay from the i0 input to the output z was fixed at 0.900 ns worst case (the maximum delay value is the third number in the first triplet in line 7 of module mx21d1). Normally library models include another portion that adjusts the timing of each logic cell; this portion was removed to simplify the model mx21d1 shown in Section 13.2.1.

Most timing analyzers do not consider the function of the logic when they search for the critical path. Thus, for example, the following code models z = NAND(a, NOT(a)), which means that the output, z, is always '1'.

module check_critical_path_1 (a, z);

input a; output z; supply1 VDD; supply0 VSS;

nd02d0 b1_i3 (.a1(a), .a2(b), .zn(z)); // 2-input NAND

in01d0 b1_i7 (.i(a), .zn(b)); // inverter

endmodule

A timing-analyzer report for this model might show the following critical path:

inPin --> outPin incr arrival trs rampDel cap cell

(ns) (ns) (ns) (pf)

--------------------------------------------------------------------

a .00 .00 R .00 .08    check_...

b1_i7

I --> ZN .38 .38 F .30 .07 in01d0

b1_i3

A2 --> ZN .28 .66 R .13 .04 nd02d0

z .00 .66 R .00 .00 check_...

Paths such as this, which are impossible to activate, are known as false paths. Timing analysis is essential to ASIC design but has limitations. A timing-analysis tool is more logic calculator than logic simulator.
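The static logic function of this example is easy to confirm by simulation. The sketch below is an illustration only: it substitutes the zero-delay Verilog primitives not and nand for the library cells in01d0 and nd02d0, and the module name false_path_check is hypothetical.

```verilog
module false_path_check;  // hypothetical module name
  reg a; wire b, z;

  not  g1 (b, a);     // stands in for the inverter in01d0
  nand g2 (z, a, b);  // stands in for the two-input NAND nd02d0

  initial begin
    // Sample the settled output for both values of the only input.
    a = 1'b0; #1 $display("a=%b z=%b", a, z);
    a = 1'b1; #1 $display("a=%b z=%b", a, z);
  end
endmodule
```

Both lines should show z=1, so no settled input value produces an output transition. Because the primitives here have zero delay, the sketch checks only the logic function and says nothing about timing.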

## 13.2.3  Gate-Level Simulation

To illustrate the differences between functional simulation, timing analysis, and gate-level simulation, we shall simulate the comparator/MUX critical path (the path is shown in Table 13.1). We start by trying to find vectors that activate this critical path by working forward from the beginning of the critical path, the input a[0], toward the end of the critical path, output outp[0], as follows:

1. Input a[0] to the two-input NAND, nd02d0, cell instance b1_i3, changes from a '0' to a '1'. We know this because there is an 'R' (for rising) under the trs (for transition) heading on the first line of the critical path timing analysis report in Table 13.1.
2. Input a[1] to the two-input NAND, nd02d0, cell instance b1_i3, must be a '1'. This allows the change on a[0] to propagate toward the output, outp[0].
3. Similarly, input b[1] to the two-input NAND, cell instance b1_i4, must be a '1'.
4. We skip over the required inputs to cells b1_i2 and b1_i6 for the moment.
5. From the last line of Table 13.1 we know the output of MUX, mx21d1, cell instance b1_i5, changes from '1' to a '0'. From the previous line in Table 13.1 we know that the select input of this MUX changes from '0' to a '1'. This means that the final value of input b[0] (the i1 input, selected when the select input is '1') must be '0' (since this is the final value that must appear at the MUX output). Similarly, the initial value of a[0] must be a '1'.

We have now contradicted ourselves. In step 1 we saw that the initial value of a[0] must be a '0'. The critical path is thus a false path. Nevertheless we shall proceed. We set the initial input vector to (a = '110', b = '111') and then to (a = '111', b = '110'). These vectors allow the change on a[0] to propagate to the select signal of the MUX, mx21d1, cell instance b1_i5. In decimal we are changing a from 6 to 7, and b from 7 to 6; the output should remain unchanged at 6. The simulation results from the gate-level simulator we shall use (CompassSim) can be displayed graphically or in the text form that follows:

...

# The calibration was done at Vdd=4.65V, Vss=0.1V, T=70 degrees C

Time = 0:0 [0 ns]

a = 'D6 [0] (input)(display)

b = 'D7 [0] (input)(display)

outp = 'Buuu ('Du) [0] (display)

outp --> 'B1uu ('Du) [.47]

outp --> 'B11u ('Du) [.97]

outp --> 'D6 [4.08]

a --> 'D7 [10]

b --> 'D6 [10]

outp --> 'D7 [10.97]

outp --> 'D6 [14.15]

Time = 0:0 +20ns [20 ns]

The code 'Buuu denotes that the output is initially, at t = 0 ns, a binary vector of three unknown or unsettled signals. The output bits become valid as follows: outp[2] at 0.47 ns, outp[1] at 0.97 ns, and outp[0] at 4.08 ns. The output is stable at 'D6 (decimal 6) or '110' at t = 10 ns when the input vectors are changed in an attempt to activate the critical path. The output glitches from 'D6 ('110') to 'D7 ('111') at t = 10.97 ns and back to 'D6 again at t = 14.15 ns. Thus, the output bit, outp[0], takes a total of 4.15 ns to settle.

Can we explain this behavior? The data book entry for the mx21d1 logic cell gives the following equation for the rising delay as a function of $C_{ld}$ (the load capacitance, excluding the output capacitance of the logic cell itself, expressed in picofarads):

$$t_{I0Z}\ (\text{I0} \rightarrow \text{Z}) = 0.90 + 0.07 + (1.76 \times C_{ld})\ \text{ns} \tag{13.1}$$

The capacitance, $C_{ld}$, at the output of each MUX is zero (because nothing is connected to the outputs). From Eq. 13.1, the path delay from the input, a[0], to the output, outp[0], is thus 0.97 ns. This explains why the output, outp[0], changes from '0' to '1' at t = 10.97 ns, 0.97 ns after a change occurs on a[0].

The gate-level simulation predicts that the input, a[0], to the MUX will change before the changes on the inputs have time to propagate to the MUX select. Finally, at t = 14.15 ns, the MUX select will change and switch the output, outp[0], back to '0' again. The total delay for this input vector stimulus is thus 4.15 ns. Even though this path is a false path (as far as timing analysis is concerned), it is a critical path. It is indeed necessary to wait for 4.15 ns before using the output signal of this circuit. A timing analyzer can only offer us a guarantee that there is no other path that is slower than the critical path.
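The mechanism behind this glitch can be imitated with a deliberately simplified sketch. Everything here is an assumption made for illustration: the module name glitch_sketch, the signal names, and in particular the lumped 3.18 ns delay on the select path, which is simply 4.15 ns minus the 0.97 ns MUX delay from Eq. 13.1. It is not the library model, only a fast data path racing a slow select path.

```verilog
`timescale 1 ns / 10 ps

module glitch_sketch;  // hypothetical module name
  reg  a0, b0, s;      // data bits a[0], b[0] and the ideal select value
  wire s_late, z;

  // Assumed lumped delay of the logic that computes the MUX select.
  assign #3.18 s_late = s;
  // MUX delay of 0.97 ns (Eq. 13.1 with zero load capacitance).
  assign #0.97 z = s_late ? b0 : a0;

  initial begin
    $monitor("%6.2f a0=%b b0=%b z=%b", $realtime, a0, b0, z);
    a0 = 1'b0; b0 = 1'b1; s = 1'b0;      // a = 6, b = 7: select a
    #10 a0 = 1'b1; b0 = 1'b0; s = 1'b1;  // a = 7, b = 6: select b
    #10 $finish;
  end
endmodule
```

In this sketch z rises 0.97 ns after the inputs change at t = 10 ns, because the new a0 reaches the output while the old select value is still in force, and falls again once the delayed select arrives at t = 14.15 ns.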

## 13.2.4 Net Capacitance

The timing analyzer predicted a critical path delay of 4.06 ns compared to the gate-level simulation prediction of 4.15 ns. We can check our results by using another gate-level simulator (QSim), which uses a slightly different algorithm. Here is the output (with the same input vectors as before):

@nodes

a R10 W1; a[2] a[1] a[0]

b R10 W1; b[2] b[1] b[0]

outp R10 W1; outp[2] outp[1] outp[0]

@data

.00 a -> 'D6

.00 b -> 'D7

.00 outp -> 'Du

.53 outp -> 'Du

.93 outp -> 'Du

4.42 outp -> 'D6

10.00 a -> 'D7

10.00 b -> 'D6

11.03 outp -> 'D7

14.43 outp -> 'D6

### END OF SIMULATION TIME = 20 ns

@end

The output is similar but gives yet another value, 4.43 ns, for the path delay. Can this be explained? The simulator prints the following messages as a clue:

defCapacitance = .1E-01 pF

incCapacitance = .1E-01 pF/pin

The simulator is adding capacitance to the outputs of each of the logic cells to model the parasitic net capacitance (interconnect capacitance or wire capacitance) that will be present in the physical layout. The simulator adds 0.01 pF (defCapacitance) on each node and another 0.01 pF (incCapacitance) for each pin (logic cell input) attached to a node. The model that predicts these values is known as a wire-load model, wire-delay model, or interconnect model. Changing the wire-load model parameters to zero and repeating the simulation changes the critical-path delay to 4.06 ns, which agrees exactly with the logic-synthesizer timing analysis. This emphasizes that the net capacitance may contribute a significant delay.
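As a worked illustration of this default wire-load model, the estimated wire capacitance of a node can be written as follows. The symbols $C_{wire}$, $C_{def}$, $C_{inc}$, and $N_{pins}$ are notation introduced here for defCapacitance, incCapacitance, and the number of attached cell inputs; they do not appear in the simulator output.

$$C_{wire} = C_{def} + N_{pins} \times C_{inc} = (0.01 + 0.01 \times N_{pins})\ \text{pF}$$

A node that drives a single cell input would thus be assigned $0.01 + 0.01 \times 1 = 0.02$ pF of wire capacitance. Over the whole path, the effect of the wire-load model in this example is

$$4.43\ \text{ns} - 4.06\ \text{ns} = 0.37\ \text{ns},$$

or roughly 9 percent of the 4.06 ns delay obtained with no wire capacitance.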

The library data book (VLSI Technology, vsc450) lists the cell input and output capacitances. For example, the values for the nd02d0 logic cell are as follows:

$$C_{in}\ (\text{inputs, a1 and a2}) = 0.042\ \text{pF} \qquad C_{out}\ (\text{output, zn}) = 0.038\ \text{pF} \tag{13.2}$$

Armed with this information, let us return to the timing analysis report of Table 13.1 (the part of this table we shall focus on follows) and examine how a timing analyzer handles net capacitance.

inPin --> outPin incr arrival trs rampDel cap cell

(ns) (ns) (ns) (pf)

---------------------------------------------------------------------

a[0] .00 .00 R .00 .12 comp_m...

b1_i3

A2 --> ZN .31 .31 F .23 .08 nd02d0

...

The total capacitance at the output node of logic cell instance b1_i3 is 0.08 pF. This figure is the sum of the logic cell (nd02d0) output capacitance of cell instance b1_i3 (equal to 0.038 pF) and $C_{ld}$, the input capacitance of the next cell, b1_i4 (also an nd02d0), equal to 0.042 pF.

The capacitance at the input node, a[0] , is equal to the sum of the input capacitances of the logic cells connected to that node. These capacitances (and their sources) are as follows:

1. 0.042 pF (the a2 input of the two-input NAND, instance b1_i3, cell nd02d0)
2. 0.038 pF (the i0 input of the 2:1 MUX, instance b1_i1, cell mx21d1)
3. 0.038 pF (the b1 input of the OAI221, instance b1_i2, cell oa03d1)

The sum of these capacitances is the 0.12 pF shown in the timing-analysis report.
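Written out, the two capacitance figures discussed so far are obtained as follows (the subscripts are labels introduced here for the two nodes):

$$C_{a[0]} = 0.042 + 0.038 + 0.038 = 0.118 \approx 0.12\ \text{pF}$$

$$C_{b1\_i3\ \text{output}} = 0.038 + 0.042 = 0.080\ \text{pF}$$

Both agree with the cap column of the report once rounded to two decimal places.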

Having explained the capacitance figures in the timing-analysis report, let us turn to the delay figures. The fall-time delay equation for an nd02d0 logic cell (again from the vsc450 library data book) is as follows:

$$t_{D}\ (\text{AX} \rightarrow \text{ZN}) = 0.08 + 0.11 + (2.89 \times C_{ld})\ \text{ns} \tag{13.3}$$

Notice that 0.11 ns = 2.89 ns pF⁻¹ × 0.038 pF; this term in Eq. 13.3 is the part of the cell delay attributed to the cell output capacitance. The ramp delay in the timing analysis (under the heading rampDel in Table 13.1) is the sum of the last two terms in Eq. 13.3. Thus, the ramp delay is 0.11 + (2.89 × 0.042) = 0.231 ns (since $C_{ld}$ is 0.042 pF). The total delay (under incr in Table 13.1) is 0.08 + 0.231 = 0.31 ns.
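The same arithmetic can be checked with a few lines of Verilog using real variables. This is a sketch only: the module and variable names are hypothetical, and the coefficients are those quoted above from the data book.

```verilog
module nd02d0_delay_check;  // hypothetical module name
  real k, c_out, c_ld, t_fixed, ramp, incr;

  initial begin
    k       = 2.89;   // ns/pF, slope of the fall-time delay equation
    c_out   = 0.038;  // pF, output capacitance of nd02d0
    c_ld    = 0.042;  // pF, input capacitance of the next cell
    t_fixed = 0.08;   // ns, first term of the delay equation

    ramp = k * c_out + k * c_ld;  // rampDel: the last two terms
    incr = t_fixed + ramp;        // incr: total cell delay
    $display("rampDel = %5.3f ns, incr = %5.3f ns", ramp, incr);
  end
endmodule
```

The computed values round to the .23 ns and .31 ns shown in the rampDel and incr columns of the report for cell instance b1_i3.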

There are thus the following four figures for the critical path delay:

1. 4.06 ns from a static timing analysis using the logic-synthesizer timing engine (worst-case process, VDD = 4.50 V, and T = 70 °C). No wire capacitance.
2. 4.15 ns from a gate-level functional simulation (worst-case process, VSS = 0.1 V, VDD = 4.65 V, and T = 70 °C). No wire capacitance.
3. 4.43 ns from a gate-level functional simulation. Default wire-capacitance model (0.01 pF + 0.01 pF/pin).
4. 4.06 ns from a gate-level functional simulation. No wire capacitance.

Normally we do not check our simulation results this thoroughly. However, we can trust the tools only if we understand what they are doing, how they work, and what their limitations are, and if we can check that the results are reasonable.

1. Using a 0.8 µm standard-cell library, VLSI Technology vsc450. Worst-case environment: worst-case process, VDD = 4.75 V, and T = 70 °C. No wire capacitance, no input or output capacitance, prop–ramp timing model. The structural model was synthesized and optimized using a 0.6 µm library, but this timing analysis was performed using the 0.8 µm library. This is because the library models are simpler for the 0.8 µm library and thus easier to explain in the text.