## Digital Design using HDLs LSU EE 4755 Final Examination

Friday, 12 December 2025 12:30-14:30 CST

| Problem    | Points    |
|------------|-----------|
| Problem 1  | (15 pts)  |
| Problem 2  | (20 pts)  |
| Problem 3  | (20 pts)  |
| Problem 4  | (15 pts)  |
| Problem 5  | (20 pts)  |
| Problem 6  | (10 pts)  |
| Exam Total | (100 pts) |

Alias \_\_\_\_\_

Problem 1: [15 pts] Appearing below is a Verilog description of the rotate match module from the 2024 final exam. The facing page shows the inferred hardware for an instantiation at wa=3 and wb=2.

```
module rmatch #( int wa = 3, wb = 2, wm = $clog2(wa+1) )
   ( output logic [wm-1:0] m, pos,
     input uwire [wm-1:0] m0, input uwire [wa-1:0] a, input uwire [wb-1:0] b,
     input uwire clk );
   logic [wa-1:0] a_cpy, as;
   logic [wm-1:0] m0_cpy;
   logic [wb-1:0] b_cpy;
   always_ff @( posedge clk ) begin
      a_cpy <= a;
      b_cpy <= b;
      m0_cpy <= m0;
      as = a_cpy;
      pos = 0;
      m = 0;
      for ( int i=0; i<wa; i++ ) begin
         if ( as[wb-1:0] == b_cpy ) begin
            if ( m == m0_cpy ) pos = i;
            m++;
         end
         as = { as[wa-2:0], as[wa-1] };
      end
   end
endmodule
```

(a) Optimize the inferred hardware including the items below.

- Optimize for constants; there are plenty of them.
- If a unit (equality, adder, mux) has one-bit inputs, replace the unit with gates (that implement the correct function).
- Show one of the upper comparison units as individual gates.
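For readers who want to check their reading of the rmatch code, the following is a minimal software sketch of the function computed between the input registers and the outputs. It is not part of the exam and says nothing about the inferred hardware. The name `rmatch_model` is hypothetical, and the sketch ignores clocking, so it does not model the one-cycle delay through a\_cpy, b\_cpy, and m0\_cpy.

```python
def rmatch_model(a, b, m0, wa=3, wb=2):
    """Software sketch of rmatch (hypothetical helper, not exam code).

    Returns (m, pos): m is the number of rotations of a whose low wb bits
    equal b, and pos is the loop index of match number m0 (0 if none).
    """
    mask_a = (1 << wa) - 1
    mask_b = (1 << wb) - 1
    rot = a & mask_a          # plays the role of `as` in the Verilog
    m = 0
    pos = 0
    for i in range(wa):
        if (rot & mask_b) == (b & mask_b):
            if m == m0:
                pos = i
            m += 1
        # Rotate left by one bit: { as[wa-2:0], as[wa-1] }
        rot = ((rot << 1) & mask_a) | (rot >> (wa - 1))
    return m, pos


if __name__ == "__main__":
    # Every rotation of 3'b111 ends in 2'b11, so m = 3; match number 1 is at i = 1.
    print(rmatch_model(0b111, 0b11, 1))  # (3, 1)
```

The sketch uses unbounded Python integers for m and pos, whereas the module holds them in wm-bit signals.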



Problem 2: [20 pts] On the facing page is the inferred hardware for the rmatch module, but this time the diagram does not show a particular size.

(a) Show the simple-model cost of the items requested below.

Show costs requested below in terms of $w_a$, $w_b$, and $w_m$. Take care to avoid using a generic $w$ or $n$ in your final answers. When showing the cost, account for constant inputs.

- Cost of the upper (b\_cpy) comparison units.
- Cost of the bit rotation hardware.
- Cost of the adder.
- Cost of the m multiplexor.
- Cost of the lower (m0\_cpy) comparison unit.
- Cost of the pos multiplexor.

(b) Show the simple-model delay of each component indicated below in terms of $w_a$, $w_b$, and $w_m$.

- Delay of the upper (b\_cpy) comparison unit.
- Delay of the bit rotation hardware.
- Delay of the adder.
- Delay of the m multiplexor.
- Delay of the lower (m0\_cpy) comparison unit.
- Delay of the pos multiplexor.

(c) Assume signals entering the i=wa-3 section arrive at $t=0$.

- Show the arrival times, especially at the capture points, in terms of $w_a$, $w_b$, and $w_m$.
- On the diagram show the critical path. No credit will be given for a list of numbers or components; instead show a path on the diagram.



Problem 3: [20 pts] Appearing below is a simplified version of the solution to Homework 4, the sequential trend module. Show the hardware that will be inferred for this description.

```
module trend2_simple
  #( int wd = 20, wi = 10 )
   ( output logic [1:0] pr_trend,
     output logic [1:0] cr_trend,
     output logic [wi-1:0] cr_istart,
     input uwire [wd-1:0] samp,
     input uwire reset, clk );
   logic [wd-1:0] samp_prev;
   always_ff @( posedge clk ) samp_prev <= samp;
   uwire [1:0] trl2;
   ist_compare #(wd) c( trl2, samp_prev, samp );
   logic [wi-1:0] idx;
   uwire [wi-1:0] next_idx = reset ? 0 : idx + 1;
   always_ff @( posedge clk ) idx <= next_idx;
   logic [wi-1:0] lc_idx;
   always_ff @( posedge clk )
     if ( reset || trl2 ) lc_idx <= next_idx;
   uwire tr_new;
   magic_box mc( tr_new, cr_trend, trl2, reset, clk );
   //            <-- outputs --->, <--- inputs --->
   always_ff @( posedge clk )
     if ( reset ) begin
        cr_istart <= 0;
        pr_trend <= 0;
     end else if ( tr_new ) begin
        cr_istart <= lc_idx;
        pr_trend <= cr_trend;
     end
endmodule
```

Show hardware that will be inferred for the module.

- Show all module input and output ports.
- Pay attention to whether a wire connects to a register output.
- Do not confuse elaboration-time items with synthesized hardware.

Problem 4: [15 pts] Appearing below are variations on best\_rot\_pipe based on Homework 5.

```
module best_rot_pipe #( int wv = 17, wp = $clog2(wv+1) )
   ( output logic [wp-1:0] pos, dif, input uwire [wv-1:0] val, key, input uwire clk );
   logic [wv-1:0] pl_val_rot[wv:0], pl_key[wv:0];
   logic [wp-1:0] pl_pos[wv:1], pl_dif[wv:1], dif_here[wv-1:0];
   assign pos = pl_pos[wv], dif = pl_dif[wv];
   for ( genvar stage=0; stage<wv; stage++ )
     pop #(wv,wp) p( dif_here[stage], pl_val_rot[stage] ^ pl_key[stage] );
   always_ff @( posedge clk ) begin
      pl_val_rot[0] <= val;
      pl_key[0] <= key;
      for ( int stage=0; stage<wv; stage++ ) begin
         automatic logic new_low = stage==0 || dif_here[stage] < pl_dif[stage];
         pl_dif[stage+1] <= new_low ? dif_here[stage] : pl_dif[stage];
         pl_pos[stage+1] <= new_low ? stage : pl_pos[stage];
         pl_key[stage+1] <= pl_key[stage]; // <----- Hmm, key not changed.
         pl_val_rot[stage+1] <= { pl_val_rot[stage][0], pl_val_rot[stage][wv-1:1] };
      end
   end
endmodule
```
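As a reading aid only, here is a small software sketch of the result the original best\_rot\_pipe appears to produce for a single val and key pair. It rests on two assumptions that the listing does not confirm: that the `pop` module (not shown) computes a population count, and that timing can be ignored, so the pipeline registers are not modeled. The name `best_rot_model` is hypothetical.

```python
def best_rot_model(val, key, wv=17):
    """Software sketch of best_rot_pipe's result (hypothetical helper).

    Assumes pop() is a population count. Returns (pos, dif): the first
    right-rotation amount of val that minimizes popcount(rot ^ key),
    and that minimum count.
    """
    mask = (1 << wv) - 1
    rot = val & mask
    best_pos, best_dif = 0, None
    for stage in range(wv):
        dif = bin((rot ^ key) & mask).count("1")
        # Strict comparison, as in the listing: ties keep the earlier stage.
        if stage == 0 or dif < best_dif:
            best_pos, best_dif = stage, dif
        # Rotate right by one bit: { x[0], x[wv-1:1] }
        rot = (rot >> 1) | ((rot & 1) << (wv - 1))
    return best_pos, best_dif


if __name__ == "__main__":
    # Rotating 4'b0011 right by one gives 4'b1001, which equals the key.
    print(best_rot_model(0b0011, 0b1001, wv=4))  # (1, 0)
```

The sketch describes only the original module; it makes no claim about the behavior of the variations that follow.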

(a) Notice that in the module above pl\_key is never changed. Variation Plan K, below, reduces cost by eliminating pl\_key.

```
module best_rot_pipe_plan_K #( int wv = 17, wp = $clog2(wv+1) )
   ( output logic [wp-1:0] pos, dif, input uwire [wv-1:0] val, key, input uwire clk );
   logic [wv-1:0] pl_val_rot[wv:0];
   logic [wp-1:0] pl_pos[wv:1], pl_dif[wv:1], dif_here[wv-1:0];
   assign pos = pl_pos[wv], dif = pl_dif[wv];
   for ( genvar stage=0; stage<wv; stage++ )
     pop #(wv,wp) p( dif_here[stage], pl_val_rot[stage] ^ key ); // <--- Important change.
   always_ff @( posedge clk ) begin
      pl_val_rot[0] <= val;
      for ( int stage=0; stage<wv; stage++ ) begin
         automatic logic new_low = stage==0 || dif_here[stage] < pl_dif[stage];
         pl_dif[stage+1] <= new_low ? dif_here[stage] : pl_dif[stage];
         pl_pos[stage+1] <= new_low ? stage : pl_pos[stage];
         pl_val_rot[stage+1] <= { pl_val_rot[stage][0], pl_val_rot[stage][wv-1:1] };
      end
   end
endmodule
```

Describe what's wrong with Plan K.

Describe two sets of non-trivial inputs. For the first, Plan K provides wrong answers (different from the original); for the other, Plan K and the original agree.


(b) The two variations below, Plan B and Plan b (case sensitive), use blocking assignments.

```
module best_rot_pipe_plan_B #( int wv = 17, wp = $clog2(wv+1) )
   // Omitted the code above this point. It's the same as the original code.
   always_ff @( posedge clk ) begin
      pl_key[0] = key;
      pl_val_rot[0] = val;
      for ( int stage=0; stage<wv; stage++ ) begin
         automatic logic new_low = stage==0 || dif_here[stage] < pl_dif[stage];
         pl_dif[stage+1] = new_low ? dif_here[stage] : pl_dif[stage];
         pl_pos[stage+1] = new_low ? stage : pl_pos[stage];
         pl_key[stage+1] = pl_key[stage];
         pl_val_rot[stage+1] = { pl_val_rot[stage][0], pl_val_rot[stage][wv-1:1] };
      end
   end
endmodule
```

Will the Plan\_B code, above, compile and if so, run correctly?

```
module best_rot_pipe_plan_b #( int wv = 17, wp = $clog2(wv+1) )
   // Omitted the code above this point. It's the same as the original code.
   always_ff @( posedge clk ) begin
      for ( int stage=wv-1; stage>=0; stage-- ) begin // <--- Note change in loop direction.
         automatic logic new_low = stage==0 || dif_here[stage] < pl_dif[stage];
         pl_dif[stage+1] = new_low ? dif_here[stage] : pl_dif[stage];
         pl_pos[stage+1] = new_low ? stage : pl_pos[stage];
         pl_key[stage+1] = pl_key[stage];
         pl_val_rot[stage+1] = { pl_val_rot[stage][0], pl_val_rot[stage][wv-1:1] };
      end
      pl_val_rot[0] = val;
      pl_key[0] = key;
   end
endmodule
```

Will the Plan\_b code, above, compile and if so, run correctly?

(c) Consider Plan P:

```
module best_rot_pipe_plan_P #( int wv = 17, wp = $clog2(wv+1) )
   ( output logic [wp-1:0] pos, dif, input uwire [wv-1:0] val, key, input uwire clk );
   logic [wv-1:0] pl_val_rot[wv:0], pl_key[wv:0];
   logic [wp-1:0] pl_pos[wv:1], pl_dif[wv:1], dif_here[wv-1:0];
   assign pos = pl_pos[wv], dif = pl_dif[wv];
   always_ff @( posedge clk ) begin
      pl_val_rot[0] <= val;
      pl_key[0] <= key;
      for ( int stage=0; stage<wv; stage++ ) begin
         automatic logic new_low = stage==0 || dif_here[stage] < pl_dif[stage];
         pop #(wv,wp) p( dif_here[stage], pl_val_rot[stage] ^ pl_key[stage] ); // <--- Change.
         pl_dif[stage+1] <= new_low ? dif_here[stage] : pl_dif[stage];
         pl_pos[stage+1] <= new_low ? stage : pl_pos[stage];
         pl_key[stage+1] <= pl_key[stage];
         pl_val_rot[stage+1] <= { pl_val_rot[stage][0], pl_val_rot[stage][wv-1:1] };
      end
   end
endmodule
```

Will the Plan\_P code, above, compile and if so, run correctly?

Problem 5: [20 pts] Synthesis results for modules based on Homework 5 are reported in the tables below. It's not important to understand how these modules compute their results. It is important to understand how their combinational, sequential, and pipelined organizations affect result timing. Let w denote the value of parameter wv, the number of bits in the val and key inputs.

Module best\_rot\_procedural describes combinational logic. Module best\_rot\_seq is sequential and requires w cycles to compute a result. Module best\_rot\_pipe is pipelined and requires w cycles to compute a result. Module best\_rot\_pipe\_extra\_stage is also pipelined but requires w+1 cycles to compute a result. Appearing below is synthesis data for instantiations with wv=8.

| Module Name                   | Area   | Delay Actual | Delay Target | Synth Time |
|-------------------------------|--------|--------------|--------------|------------|
| best_rot_procedural_wv8       | 184379 | 2.34         | 0.1 ns       | 340 s      |
| best_rot_seq_wv8              | 52448  | 2.02         | 0.1 ns       | 26 s       |
| best_rot_pipe_wv8             | 255256 | 2.27         | 0.1 ns       | 66 s       |
| best_rot_pipe_extra_stage_wv8 | 265623 | 1.63         | 0.1 ns       | 69 s       |

Based on this data:

- What is the latency and throughput of best\_rot\_procedural?
- What is the latency and throughput of best\_rot\_seq?
- What is the latency and throughput of best\_rot\_pipe?
- What is the latency and throughput of best\_rot\_pipe\_extra\_stage?

(a) Appearing below is another table of synthesis data for wv=8 instantiations of the modules. The costs (areas) are lower compared to the syntheses appearing above.

| Module Name                   | Area   | Delay Actual | Delay Target | Synth Time |
|-------------------------------|--------|--------------|--------------|------------|
| best_rot_procedural_wv8       | 77188  | 4.59         | 100.0 ns     | 51 s       |
| best_rot_seq_wv8              | 31271  | 3.58         | 100.0 ns     | 3 s        |
| best_rot_pipe_wv8             | 181612 | 3.81         | 100.0 ns     | 24 s       |
| best_rot_pipe_extra_stage_wv8 | 198426 | 2.89         | 100.0 ns     | 25 s       |

- Why are the costs in this second set lower?
- Why is best\_rot\_seq less expensive than best\_rot\_procedural?
- Why is best\_rot\_pipe more expensive than best\_rot\_procedural?

Problem 6: [10 pts] Answer each question below.

(a) Show the values of the variables where indicated.

(b) In the simple model the time to compute the sum of three integers, such as a + b + c, is much less than twice the time to compute the sum of two integers, such as a + b.

Why is the time to compute a + b + c much less than twice the time to compute a + b? What about the numbers and hardware are we assuming?

Can the same technique be applied to floating-point numbers? Explain.