

Staple This Side

- Problem 1 \_\_\_\_\_ (25 pts)
- Problem 2 \_\_\_\_\_ (30 pts)
- Problem 3 \_\_\_\_\_ (10 pts)
- Problem 4 \_\_\_\_\_ (10 pts)
- Problem 5 \_\_\_\_\_ (15 pts)
- Problem 6 \_\_\_\_\_ (10 pts)
- Exam Total \_\_\_\_\_ (100 pts)

Alias

 $V(\text{mRNA}) \Rightarrow R_e < 1$ 

Good Luck!

Problem 1: [25 pts] Appearing in this problem are two variations on hardware that selects one of four inputs, i, based on the position of the least-significant 1 in a 4-bit quantity, fmt. This is similar to the hardware needed in the solution to Homework 2, except that here i[3] can be selected.

```
module nn_sparse #( int w = 20 )
   ( output logic [w-1:0] o, input uwire [w-1:0] i[4], input uwire [3:0] fmt );
```
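The selection rule described above can be exercised with a small simulation-only sketch. The module name `ls1_select_demo`, the net name `sel`, and the test values 100 to 103 are assumptions made for illustration and are not part of the exam; the conditional expression is the one given for is0 in part (a). It only displays behavior and says nothing about the inferred or optimized hardware.

```systemverilog
// Hypothetical demo, not from the exam: prints which input is selected
// for every value of fmt.
module ls1_select_demo;
   localparam int w = 20;
   logic [w-1:0] i[4];
   logic [3:0] fmt;

   // Same priority expression as is0: the least-significant 1 in fmt wins.
   uwire [w-1:0] sel = fmt[0] ? i[0] : fmt[1] ? i[1] : fmt[2] ? i[2] : i[3];

   initial begin
      // Assumed test values: input k carries the value 100 + k.
      for ( int k=0; k<4; k++ ) i[k] = w'(100 + k);
      for ( int f=0; f<16; f++ ) begin
         fmt = 4'(f);
         #1; // Let the continuous assignment settle before printing.
         $display("fmt=%b selects value %0d", fmt, sel);
      end
   end
endmodule
```

For example, with fmt = 4'b0110 the least-significant 1 is at position 1, so i[1] is selected. Whenever fmt[2:0] is zero the expression falls through to i[3], whatever the value of fmt[3].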

(a) Show the hardware that will be inferred for is0 and show that hardware after optimization.

uwire [w-1:0] is0 = fmt[0] ? i[0] : fmt[1] ? i[1] : fmt[2] ? i[2] : i[3];

Show inferred hardware.

Show optimized hardware. Hardware can be re-arranged to reduce delay.

Use only basic logic gates and multiplexors.

(b) Compute the cost and delay of the optimized hardware for is0 in terms of w. (That's w, not its default value.)

In terms of w cost is:

In terms of w delay is:

(c) Appearing below is an alternative design. Net is0b will have the same value as is0. Show the hardware below before and after optimization. For isi0 do not show multiplexors after optimization. For is0b use two-input multiplexors (as many as needed).

```
uwire [1:0] isi0 = fmt[0] ? 0 : fmt[1] ? 1 : fmt[2] ? 2 : 3;
uwire [w-1:0] is0b = i[isi0];
```

Show inferred hardware.

Show optimized hardware, optimize to reduce delay.

Use basic logic gates and  $\square$  no muxen for isi0 and  $\square$  two-input muxen (plus other logic) for is0b.

(d) Compute the cost and delay of the optimized hardware (from the previous part) in terms of w. (That's w, not its default value.)

In terms of w cost is:

In terms of w delay is:

Problem 2: [30 pts] The next\_dist4 hardware illustrated below consists of several duplicated pieces of hardware, one of which is circled. Call the circled hardware an *ami* unit (for add-minimum).



(a) Compute the cost and delay of the module using the simple model, and show the critical path on the illustration. Assume that the adder and comparison units are based on ripple adders.

Cost in terms of w:

Show critical path.  $\Box$  Delay in terms of w:

Account for any cascading ripple units.

(b) Appearing below are two incomplete modules: one is an ami module, the other is the next\_dist4 module. Complete these modules to match the diagram using as many ami modules as needed. The ami module can use procedural or implicit structural code. The next\_dist4 module must instantiate and use ami modules but can also contain procedural or implicit structural code.

Complete the ami module so that it matches the circled hardware.

Complete the next\_dist4 module using as many ami modules as needed.

Don't forget to declare any intermediate objects that are used.

Noting that there are four adders and the width of each wire is w,  $\Box$  declare and use parameters appropriately.

module ami

endmodule

module next\_dist4

endmodule

(c) Incomplete module next\_dist is a generalization of next\_dist4 to n elements per input. The module includes a generate loop. Use that loop to instantiate ami modules so that the module performs the correct calculation. Keep the loop simple; don't try to fix the delay problem.

Complete module, taking advantage of the generate loop.

Be sure to instantiate ami modules, $\square$ connect the first ami correctly, $\square$ and don't leave e unconnected.

```
module next_dist #( int n = 20, w = 12 )
  ( output uwire [w-1:0] e,
    input uwire [w-1:0] L[n], input uwire [w-1:0] d[n] );
  localparam logic [w-1:0] mv = ~w'(0); // Can use as input to first ami.
  uwire [w-1:0]

  for ( genvar i=0; i<n; i++ ) begin

  end

endmodule
```

Problem 3: [10 pts] Consider the with\_assign module below.

endmodule

(a) Why might the module confuse or annoy humans?

with\_assign could be confusing because:

(b) The module makes extra work for simulators too. Suppose that the input values to with\_assign, b and c, change at t = 10. About how many times will each line below execute in a worst-case scenario? The following sentence was not in the original exam: Use sensitivity lists to justify your answer.

About how many times does each line execute? 
Explain with sensitivity lists.

(c) Complete the sans\_assign routine below so that it does the same thing as with\_assign but is less confusing and less work for simulators.

Complete routine below. (Yes, it's easy but not trivial.)

end

endmodule

Why does sans\_assign make less work for the simulator than with\_assign? Explain using sensitivity lists.

Problem 4: [10 pts] Appearing below is an ordinary multiplier, followed by a multiplier that is naïvely designed to take advantage of special cases (first operand is 0 or 1), followed by a module that instantiates both.

```
module mult #( int w = 32 )
             ( output logic [w-1:0] p, input uwire [w-1:0] a, b );
   always_comb p = a * b;
endmodule
module mult_1a #( int w = 32 )
                ( output logic [w-1:0] p, input uwire [w-1:0] a, b );
   always_comb begin
      if (a == 0) p = 0;
      else if ( a == 1 ) p = b;
      else p = a * b;
   end
endmodule
module nm #( int w = 32, logic [w-1:0] c = 12 )
           ( output uwire [w-1:0] prods[4], input uwire [w-1:0] a[4], b[4] );
  mult #(w)    m1 ( prods[0], a[0], b[0] );
  mult #(w)    m2 ( prods[1], c,    b[1] );
  mult_1a #(w) ma1( prods[2], a[0], b[0] );
  mult_1a #(w) ma2( prods[3], c,    b[1] );
endmodule
```

Explain why m1 will be faster (lower delay) than ma1, even when possible values of a[0] include 0, 1, and other values. Assume good synthesis programs.

How will the cost and performance of m2 and ma2 compare (to each other) using good synthesis programs? That is,  $\square$  which should be chosen when delay is the only concern and,  $\square$  which of the two should be chosen when cost is the only concern. The answer should not depend on any particular value of c.

Problem 5: [15 pts] Answer the following questions about Verilog syntax and semantics.

(a) Appearing below are four variations on a multiplier with a constant input. Most have errors that would prevent them from compiling. For each indicate whether there is an error, and if so, what the error is and a minimal fix.

Module is $\bigcirc$ correct or $\bigcirc$ has the following error and fix:

    module mult_2a #( int w = 32, logic [w-1:0] a = 12 )
       ( output uwire [w-1:0] p, input uwire [w-1:0] b );
       if ( a == 0 ) p = 0;
       else p = a * b;
    endmodule

Module is $\bigcirc$ correct or $\bigcirc$ has the following error and fix:

    module mult_2b #( int w = 32, logic [w-1:0] a = 12 )
       ( output uwire [w-1:0] p, input uwire [w-1:0] b );
       always_comb begin
          if ( a == 0 ) p = 0;
          else if ( a == 1 ) p = b;
          else p = a * b;
       end
    endmodule

Module is $\bigcirc$ correct or $\bigcirc$ has the following error and fix:

    module mult_2c #( int w = 32, logic [w-1:0] a = 12 )
       ( output uwire [w-1:0] p, input uwire [w-1:0] b );
       if ( b == 0 ) p = 0;
       else if ( b == 1 ) p = a;
       else p = a * b;
    endmodule

Module is $\bigcirc$ correct or $\bigcirc$ has the following error and fix:

    module mult_2d #( int w = 32, logic [w-1:0] a = 12 )
       ( output uwire [w-1:0] p, input uwire [w-1:0] b );
       if ( a == 0 ) assign p = 0;
       else if ( a == 1 ) assign p = b;
       else assign p = a * b;
    endmodule

(b) Show the values of **b** and **c** where requested below.

```
module assortment;
logic [15:0] a;
logic [0:15] b;
logic [16:1] c;
initial begin
a = 16'h1234;
b = a;
c = a;
// Show value of b and c after line above executes:
#1; // Not really needed.
for ( int i=0; i<16; i++ ) b[i] = a[i];
// Show value of b after line above executes:
end
endmodule
```

Problem 6: [10 pts] Answer the following synthesis questions.

(a) Cadence Genus defines the following three synthesis steps: syn\_gen (generic), syn\_map (mapped, or technology mapping), and syn\_opt (optimized). Answer the following questions about technology mapping.

Explain what happens during technology mapping.

Even if optimization were done before technology mapping, why is it important to optimize after technology mapping?

(b) What is the big disadvantage of setting the delay target too low when performing synthesis? (The small disadvantage is that it takes a longer time to run.)

Disadvantage of setting delay target too low during synthesis: