Name

## Digital Design using HDLs LSU EE 4755 Final Examination Wednesday, 6 December 2017 15:00-17:00 CST

- Problem 1 \_\_\_\_\_ (15 pts)
- Problem 2 \_\_\_\_\_ (25 pts)
- Problem 3 \_\_\_\_\_ (20 pts)
- Problem 4 \_\_\_\_\_ (10 pts)
- Problem 5 \_\_\_\_\_ (30 pts)
- Exam Total \_\_\_\_\_ (100 pts)

Alias

Good Luck!

Problem 1: [15 pts] The Verilog code below is the solution to Problem 1a of Homework 7. Below that is the hardware for a slightly different pipelined multiplier. Modify the hardware to match the Verilog code. Changes need to be made for each line commented DIFFERS.







Problem 2: [25 pts] Module oldest\_find\_plan\_b, illustrated below, is based on an alternative solution to Homework 7 Problem 1b. Below the hardware illustration is incomplete Verilog code for this module. The Verilog code uses abbreviated names, such as ns; comments show the original names from the assignment, such as nstages. Complete the module. Note: This problem can be solved without having ever seen Homework 7, though not as quickly.



Complete the module so that it matches the hardware above.

```
module oldest_find_plan_b
```

Problem 3: [20 pts] Appearing below are two variations on the oldest index module from the previous problem. The Plan A version is based on the code from the posted Homework 7 solution. The Plan B module is slightly different.

(a) Compute the cost of each module based on the simple model after optimizing for constant values. Use symbols $w$ (for w) and $n$ (for ns). Base the cost of an  $\alpha$ -input,  $\beta$ -bit multiplexor on the tree (recursive) implementation. Recall that the tree implementation consists of  $\alpha - 1$  two-input multiplexors arranged in a tree.
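The multiplexor count recalled above can be checked with a short sum. This is a sketch that assumes $\alpha$ is a power of 2 and counts multiplexors before any optimization for constant inputs. Level $i$ of the tree, counting from the inputs, holds $\alpha/2^i$ two-input multiplexors:

$$
\sum_{i=1}^{\log_2 \alpha} \frac{\alpha}{2^i} = \alpha\left(1 - \frac{1}{\alpha}\right) = \alpha - 1 .
$$

Each of these is $\beta$ bits wide, so the tree contains $(\alpha-1)\beta$ one-bit two-input multiplexors, and a data input passes through $\log_2 \alpha$ of them on its way to the output.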

Plan A cost in terms of w and n.  $\Box$  Show cost components on diagram, such as cost of big mux,  $\Box$  don't forget to account for the constant inputs, and  $\Box$  for the number of bits in each wire.



Plan B cost in terms of w and n.  $\Box$  Show cost components on diagram, such as cost of big mux,  $\Box$  don't forget to account for the constant inputs, and  $\Box$  for the number of bits in each wire.



(b) Show the delay along all paths and show the critical path. Compute delay based on the simple model after optimizing for constant values. Use the tree mux described in the previous part.

Plan A:  $\square$  show delay along all paths,  $\square$  highlight the critical path,  $\square$  and show the delay through each component. Show these  $\square$  in terms of w and n, and  $\square$  account for constant inputs such as the zeros in the equality units.



Plan B: show delay along all paths, highlight the critical path, and show the delay through each component. Show these in terms of w and n, and account for constant inputs such as the zeros in the equality units.



Problem 4: [10 pts] Explain why each of the modules below is not synthesizable by Cadence Encounter (or similar tools), and modify the code so that it is synthesizable without changing what the module does. Note: The warning about not changing what the module does was not in the original exam.

```
module one_run #( int w = 16, int lw = $clog2(w) )
   ( output logic all_1s, input uwire [w-1:0] a, input uwire [lw:0] start, stop );
   always_comb begin
      all_1s = 1;
      for ( int i=start; i<stop; i++ )
        all_1s = all_1s && a[i];
   end
endmodule
```

Reason code above is not synthesizable:

Modify code so that it is.

```
module running_sum #( int w = 32 )
   ( output logic [w-1:0] rsum,
     input uwire [w-1:0] a,
     input uwire reset, clk );
   always @( posedge clk ) begin
      if ( reset ) rsum <= 0;
   end
   always @( posedge clk ) begin
      rsum <= rsum + a;
   end
endmodule
```

Modify code so that it is synthesizable.

Reason code above was not synthesizable:

Explain assumption about intended behavior of this module.

Problem 5: [30 pts] Answer each question below.

(a) Show when each piece of code below executes (use the C labels) up until the start of C5c, and show when and in which region each piece is scheduled. See the table below.

```
module eq;
   logic [7:0] a, b, c, d, x, y, x1, x2, y1, y2, z2;

   always_comb begin         // C1
      x1 = a + b;
      y1 = 2 * b;
   end

   assign x2 = 100 + a + b;  // C2
   assign y2 = 4 * b;        // C3
   assign z2 = y2 + 1;       // C4

   initial begin
      // C5a
      a = 0;
      b = 10;
      #2;
      // C5b
      a = 1;
      b <= 11;
      #2;
      // C5c
      a = 2;
      b = 12;
   end
endmodule
```

Continue the diagram below so that it shows scheduling up to the point where C5c executes.

| Step 1   | Step 2   | Step 3 |
|----------|----------|--------|
| t = 0    | t = 0    | t = 0  |
| Active   | Active   | Active |
| C5a      |          |        |
| Inactive | Inactive |        |
|          | C1       |        |
| NBA      | C2       |        |
|          | C3       |        |
|          | NBA      |        |
|          |          |        |
|          | t=2      |        |
|          | Inactive |        |
|          | C5b      |        |

(b) Which of the two modules does what it looks like it's trying to do? Explain.

```
module Sa1(input logic [7:0] a, b, c, d, output wire [7:0] x, y );
    assign x = a + b;
    assign y = 2 * x;
    assign x = c + d;
endmodule
module Sa2(input logic [7:0] a, b, c, d, output logic [7:0] x, y );
    always_comb begin
       x = a + b;
       y = 2 * x;
       x = c + d;
    end
endmodule
```

Module that is probably correct is:

Major problem with other module.

Provide a possible wrong answer from other module.

(c) Define throughput and latency and indicate where each is preferred. Provide examples appropriate for pipelined systems.

Throughput is:

For example:

Latency is:

For example:

If the goal is to improve throughput, is higher throughput good or bad?

If the goal is to improve latency, is higher latency good or bad?

In what situation is latency more important than throughput?

(d) When we synthesized, we specified a target delay, for example, 400 ns. Does specifying a larger delay mean that there will be less optimization? Explain.