///  LSU EE 4755 -- Fall 2025 -- Digital Design / HDL
//
/// Verilog Notes 014 -- Synthesis Overview

/// Under Construction


/// Contents
 //
 // Synthesis Overview
 // Synthesis Steps
 // Synthesis Example - Cadence Genus Synthesis
 // Synthesis of Simple Logic
 // Synthesizing Arithmetic

/// References

// :SV: IEEE 1800-2023 -- The SystemVerilog Standard
//      https://ieeexplore.ieee.org/document/10458102
//      This is for those already familiar with Verilog.
//
// :BV3:  Brown & Vranesic, Fundamentals of Digital Logic with Verilog, 3rd Ed.
//        The text used in LSU EE 2740.


////////////////////////////////////////////////////////////////////////////////
/// Synthesis Overview

// :BV3: 10 (The whole chapter. It's short.)


 /// :Def: Synthesis
//
//  The steps needed to convert a Verilog (or other HDL) description into
//  a form that can be manufactured or downloaded into an FPGA or other
//  programmable device.

 /// :Def: Inference         <-- Make sure that this is clearly understood.
//
//  The process of converting behavioral code into primitives
//  or modules recognized by the synthesis program.

 /// :Def: Optimization   <-- Did it do a good job? It's up to us to know.
//
//  The process of simplifying logic to realize a set of goals.
//  Typical goals are to minimize cost while meeting a timing
//  constraint.

 /// :Def: Synthesis Technology Target
//
//   The type of semiconductor (or other) technology used to build the design.
//   Sometimes shortened to ``target''.

 ///  Common Targets
//
//  ASIC
//   Application-Specific Integrated Circuit
//   A fully custom chip.
//   Output of synthesis program drives machines fabricating chip.

//  FPGA
//   Field-Programmable Gate Array
//   Programmable logic.
//   Output of synthesis program downloaded into pre-manufactured chip.


 /// What a Synthesis Program Does
//
//  First Step
//    Read a Verilog description.
//
//  Last Step (Informally known as Tape-Out)
//    Write a design file that specifies (for ASICs):
//      The components that are needed. (and, mux, ram)
//      How they are connected.
//      The components' locations and how the wires are routed.

 /// Synthesis In a Perfect World
//
//  Engineer prepares a Verilog description.
//  Using simulation, verify that the description works as desired.
//  Click ``synthesize'' and enter credit card number.
//  Wait a few weeks.
//  Chips arrive, and perform exactly as simulated chips did.

 /// Synthesis In the Early 21st Century
//
//  Engineer prepares a Verilog (or other HDL) description following
//    the synthesis program's /design rules/.
//
//  Using simulation, engineer verifies that description works as desired.
//
//  Runs synthesis program,
//   compares area and timing of synthesized design to expectations,
//   if satisfied, the design is done,
//   otherwise modify description to fix problem or tune design.


////////////////////////////////////////////////////////////////////////////////
/// Synthesis Steps

// :BV3: 10 (The whole chapter.) Description of steps is slightly different.

 /// Synthesis Steps
//
//
//  (0)  Elaborate
//       -- Gather and "wire together" modules based on Verilog, libraries, etc.
//
//       Details on elaboration will be covered in set l025-gen-elab.v ..
//       .. don't worry if you don't understand what it is now.
//
//  (1)  Infer
//       -- Transform Verilog code to /generic gates/ (a simple form).
//
//       Details on inference will be covered in later sets ..
//       .. inference matters mostly for procedural code.

//  (1a) High-Level Optimization
//       -- Reduce number of gates, etc.
//
//  (2)  Technology Mapping
//       -- Replace generic gates with /cells/ from /technology target/.
//
//  (3)  Optimization
//       -- Reduce number of gates.
//       -- Fit within timing constraint.
//
//  (4)  Place and Route (Description is for ASICs)
//       -- Place: Determine where on the chip to place each cell.
//       -- Route: Determine route of wires interconnecting cells.


 /// Step 0: Elaboration
//
//  Note: Elaboration is also performed before simulation.
//
//  Input:  A Verilog description, information on libraries, etc.
//  Output: A Verilog description in which all modules are present.
//
//  Elaboration will be covered in set l025-gen-elab.v.
//
//  Briefly, the two main activities:
//    Modules that are instantiated in the code but which are not
//      defined in the code are located by looking in places such
//      as libraries.
//    Generate statements are executed. (Generate statements form
//      programs that write Verilog descriptions.)
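
// :Example:
//
// A sketch of a generate loop, using a hypothetical module name,
// shift_right_log_gen (not part of the course code). It computes the
// same thing as shift_right_logarithmic (later in this set), but the
// four stages are written as one loop. During elaboration the loop is
// executed, producing lg_w copies of the stage, so later synthesis
// steps see an ordinary description with no loop in it.

module shift_right_log_gen   // Hypothetical name.
  #( parameter w = 16, parameter lg_w = 4 )
   ( output uwire [w-1:0] sh,
     input uwire [w-1:0] s0,
     input uwire [lg_w-1:0] amt );

   // s[i] is the input to stage i; s[lg_w] is the final result.
   uwire [w-1:0] s [0:lg_w];

   assign s[0] = s0;

   genvar i;
   for ( i=0; i<lg_w; i=i+1 ) begin : stage
      // Stage i shifts right by either 0 or 2**i bit positions.
      assign s[i+1] = amt[i] ? s[i] >> ( 1 << i ) : s[i];
   end

   assign sh = s[lg_w];

endmodule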


 /// Step 1:  Inference
//
//  Input:  An elaborated Verilog description.
//  Output: An Explicit Structural Description (RTL) using Generic Gates

//  For implicit structural input, operators are replaced with pre-defined
//  modules or gates.
//
//  For example,
//  -- an "&&" operator is replaced with an AND gate
//  -- an "!" operator is replaced with an inverter (NOT gate).
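
// :Example:
//
// A sketch of inference on a small expression. The module names
// infer_src and infer_gates are hypothetical. The first uses operators;
// the second is a hand-written approximation of what inference could
// produce, with Verilog's built-in gate primitives standing in for
// generic gates. (A real tool chooses its own instance and wire names.)

module infer_src( output uwire y, input uwire p, q, r );   // Hypothetical.
   assign y = p && !q || r;      // Same as ( p && !q ) || r.
endmodule

module infer_gates( output uwire y, input uwire p, q, r ); // Hypothetical.
   uwire nq, p_and_nq;
   not n1( nq, q );              // "!"  replaced with an inverter.
   and a1( p_and_nq, p, nq );    // "&&" replaced with an AND gate.
   or  o1( y, p_and_nq, r );     // "||" replaced with an OR gate.
endmodule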

//  For arithmetic operators, the synthesis program might use Verilog
//  descriptions from a library.
//
//  For example, for Cadence Encounter with the ChipWare library
//  -- a "+" operator is replaced with Verilog module CW_add.

//  Synthesis programs can also recognize certain types of flip-flops
//  and latches.
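
// :Example:
//
// Two common patterns that synthesis programs recognize as storage
// elements. The module names are hypothetical. The first describes an
// edge-triggered D flip-flop. The second describes a level-sensitive
// latch: q follows d while en is 1 and keeps its value while en is 0,
// because there is no else branch assigning q.

module infer_dff( output reg q, input uwire clk, d );      // Hypothetical.
   always @( posedge clk ) q <= d;     // Recognized as a D flip-flop.
endmodule

module infer_latch( output reg q, input uwire en, d );     // Hypothetical.
   always @* if ( en ) q = d;          // Recognized as a latch.
endmodule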

//  The output of this step, often called an RTL description, consists
//  of generic gates and modules.
//  


 /// Step 2:  Technology Mapping
//
//  The generic gates and modules are replaced with /cells/
//  from the target technology library.

//    ASIC technology libraries' cells typically consist of ...
//    ... a set of gates of various sizes ( 2-input AND, 3-input AND,...) ...
//    ... multiplexors, binary full and half adders ...
//    ... and perhaps larger components that are expected to be useful.

//    FPGA technology libraries' cells usually consist mostly of multiplexors
//    and lookup tables, perhaps with a few memory banks.
//    The lookup tables can be configured to perform small logic functions.
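
// :Example:
//
// A behavioral sketch of a 4-input lookup table. The module name lut4,
// its parameter init, and my_logic_lut are hypothetical, not taken
// from any vendor library. Configuring the LUT means choosing the 16
// bits of init: bit k of init is the output when the inputs, read as
// the binary number {i3,i2,i1,i0}, equal k. With i0=a, i1=b, i2=c, i3=d
// and init = 16'haaa8, the LUT computes x = ab + ac + ad (the function
// used in the Synthesis of Simple Logic section): the output is 1 at
// the odd addresses (a=1) except address 1 (where b=c=d=0).

module lut4 #( parameter [15:0] init = 16'h0000 )   // Hypothetical.
   ( output uwire y, input uwire i0, i1, i2, i3 );
   assign y = init[ {i3,i2,i1,i0} ];   // The inputs select one bit of init.
endmodule

module my_logic_lut( output uwire x, input uwire a, b, c, d );  // Hypothetical.
   lut4 #( .init(16'haaa8) ) l1( x, a, b, c, d );
endmodule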

//  Target gates don't always match generic ones, so technology mapping
//    code finds substitutes.

//    For example, a generic 3-input AND gate might be mapped to a
//    four-input AND gate in the technology library with one input
//    tied to logic 1.
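
// :Example:
//
// A sketch of this substitution. fab_fab_and_4 is a made-up cell (like
// the other Fab Fab Tech cells in this set) and and3_mapped is a
// hypothetical name; a behavioral model of the cell is given so the
// example can be simulated. Tying the extra input to 1 works because 1
// is the identity for AND: p & 1 = p.

module fab_fab_and_4( output uwire y, input uwire a1, a2, a3, a4 ); // Made up.
   assign y = a1 & a2 & a3 & a4;
endmodule

module and3_mapped( output uwire y, input uwire a, b, c );  // Hypothetical.
   fab_fab_and_4 g1( y, a, b, c, 1'b1 );   // Unused input tied to logic 1.
endmodule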

//  Several target cells may implement the same function (for example,
//  with different drive strengths); one is chosen based on fanout needs.

//  Generic arithmetic modules replaced by technology modules, if available,
//  otherwise by gates.
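
// :Example:
//
// A sketch of an 8-bit generic adder replaced by cells when the target
// has no adder module: a ripple-carry chain of full adders. The cell
// name fab_fab_fa and the module name add8_ripple are hypothetical. A
// real tool may choose a faster (and larger) adder structure when the
// timing constraint requires it.

module fab_fab_fa( output uwire s, co, input uwire a, b, ci );  // Made up.
   assign s  = a ^ b ^ ci;
   assign co = a & b  |  a & ci  |  b & ci;
endmodule

module add8_ripple( output uwire [8:0] sum, input uwire [7:0] a, b ); // Hypothetical.

   // Plain wires here because different bits have different drivers.
   wire [8:0] c;     // c[i] is the carry into bit i.
   wire [7:0] s;     // Sum bits from the full adders.

   assign c[0] = 1'b0;

   genvar i;
   for ( i=0; i<8; i=i+1 ) begin : slice
      fab_fab_fa fa( s[i], c[i+1], a[i], b[i], c[i] );
   end

   // The carry out of the last full adder is the ninth sum bit.
   assign sum = { c[8], s };

endmodule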


 /// Step 3:  Optimization

//  The goal of optimization is to transform the design ...
//  ... into a logically equivalent design ...
//  ... that is of lower cost and that meets timing constraints.

//  When running synthesis, the user provides timing constraints, for
//  example, that a module output must be available 1.2 ns after the
//  module inputs arrive.
//
//  In Cadence Genus, commands such as "create_clock" and "set_input_delay"
//  specify timing constraints.


 /// Step 4: Place and Route

//  Finally, locations are chosen for components in the
//  optimized design and routes are chosen for wires connecting them.

//  No further details will be given for place and route.


////////////////////////////////////////////////////////////////////////////////
/// Synthesis Example - Cadence Genus Synthesis

 /// References
 //
 //  Step-by-step Instructions (For this Course)
 //  https://www.ece.lsu.edu/v/proc.html#synthesis
 //
 //  Genus User Guide -- Access from LSU Only.
 //  https://www.ece.lsu.edu/v/s/genus_user.pdf
 //
 //  Links to More Documentation
 //  https://www.ece.lsu.edu/v/ref.html

module shift_right_logarithmic
  ( output uwire [15:0] sh,
    input uwire [15:0] s0,
    input uwire [3:0] amt );

   uwire [15:0] s1, s2, s3;

   mux2 st0( s1, amt[0], s0, {1'b0, s0[15:1]} ); // Shift by 0 or 1
   mux2 st1( s2, amt[1], s1, {2'b0, s1[15:2]} ); // Shift by 0 or 2
   mux2 st2( s3, amt[2], s2, {4'b0, s2[15:4]} ); // Shift by 0 or 4
   mux2 st3( sh, amt[3], s3, {8'b0, s3[15:8]} ); // Shift by 0 or 8

endmodule

module mux2
  ( output uwire [15:0] x,
    input uwire select,
    input uwire [15:0] a0, a1 );
    assign x = select ? a1 : a0;
endmodule
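
// :Example:
//
// A sketch of what inference might turn mux2's "?:" into if only AND,
// OR, and NOT were used: each output bit is (a1 & select) | (a0 & ~select).
// The name mux2_gates is hypothetical. A synthesis program could just
// as well infer a generic multiplexor, which technology mapping could
// then replace with a mux cell.

module mux2_gates                       // Hypothetical name.
  ( output uwire [15:0] x,
    input uwire select,
    input uwire [15:0] a0, a1 );
   // {16{select}} replicates select so it can be ANDed with every bit.
   assign x = a1 & {16{select}}  |  a0 & {16{~select}};
endmodule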



 /// Start Program

 // [cyc.ece.lsu.edu] % genus
 // :snip:
 // Cadence Genus(TM) Synthesis Solution.
 // Copyright 2023 Cadence Design Systems, Inc. All rights reserved worldwide.
 // :snip:
 // Version: 21.17-s066_1, built Wed Mar 15 07:43:30 PDT 2023
 // :snip:
 // Setting attribute of root '/': 'stdout_log' = genus.log
 // @genus:root: 1>

 /// Specify Technology Library
 //
 //  set_db library osu035_stdcells.lib
 //  -- Note: No need to type this, it's placed in genus_startup.tcl

 /// Load Verilog Source File to be Synthesized
 //
 //  @genus:root: 1> read_hdl l014-example-shifter.v 


 /// Specify What to Synthesize and Infer
 //
 //  @genus:root: 3> elaborate shift_right_logarithmic


 /// Specify Timing Constraints
 //
 //  create_clock -name clk -period 0.5
 //  set_input_delay -clock clk 0.0 [all_inputs]
 //  set_output_delay -clock clk 0.0 [all_outputs]

 /// Perform High-Level Optimizations
 //
 //  syn_gen

 /// Write Out Verilog (For debugging or familiarization.)
 //
 //  write_hdl > mrr-generic.v

 /// Technology Map
 //
 //  syn_map

 /// Perform Low-Level Optimization
 //
 //  syn_opt





////////////////////////////////////////////////////////////////////////////////
/// Synthesis of Simple Logic

// Verilog description to implement:  x = ab + ac + ad;
// Target: ASIC produced by Fab Fab Tech [tm] (Name made up.)

 ///
 /// Engineer-Written Verilog Input and Logic Diagram
 ///

// :Example:
//
// Verilog code, written by human, to implement x = ab + ac + ad;

module my_logic(x,a,b,c,d);
   input uwire a, b, c, d;
   output uwire x;

   assign x = a & b  |  a & c  |  a & d;

endmodule

// 


 ///
 /// Steps 0 and 1: Elaboration and Inference
 ///

 ///  Simplified output of inference stage.
//
//  The Verilog below was hand-written, but it is designed to show what
//  would be produced by the synthesis steps.  The actual output is
//  less readable; see the next example.

module logic_rtl(x,a,b,c,d);
   input a, b, c, d;
   output x;

   uwire ab, ac, ad;

   and a1(ab,a,b);
   and a2(ac,a,c);
   and a3(ad,a,d);

   or o1(x,ab,ac,ad);

endmodule

 ///  The real output of the inference stage (by Cadence Genus)
//
// Generated by Cadence Genus(TM) Synthesis Solution 25.10-p002_1
// Generated on: Sep  5 2025 15:20:07 CDT (Sep  5 2025 20:20:07 UTC)
module my_logic_real_inf(x, a, b, c, d);
  input a, b, c, d;
  output x;
  wire a, b, c, d;
  wire x;
  wire n_4, n_5, n_7, n_8;
  and g1 (n_4, a, b);
  and g2 (n_5, a, c);
  or g3 (n_7, n_4, n_5);
  and g4 (n_8, a, d);
  or g5 (x, n_7, n_8);
endmodule


 ///
 /// Step 1a -- High-Level Optimization
 ///

 // The module computes x = ab + ac + ad
 // Note that           x = a ( b + c + d )  // Looks less expensive!
 //
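
// :Example:
//
// A quick check that the two forms are logically equivalent, which is
// what allows the optimizer to substitute one for the other. By the
// distributive law, ab + ac + ad = a(b + c + d). The module name
// sop_vs_factored is hypothetical; it computes both forms so they can
// be compared on all 16 input combinations.

module sop_vs_factored                 // Hypothetical name.
  ( output uwire x_sop, x_fac, input uwire a, b, c, d );
   assign x_sop = a & b  |  a & c  |  a & d;   // Three ANDs and an OR.
   assign x_fac = a & ( b | c | d );           // One OR and one AND.
endmodule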

// 

 /// Simplified output of the high-level optimization stage.
//
module logic_hlo(x,a,b,c,d);
   input a, b, c, d;
   output x;

   uwire borcord;

   or o1( borcord, b, c, d );
   and a1( x, a, borcord );

endmodule


 ///  The real output of the high-level optimization stage (by Cadence Genus)
//
// Generated by Cadence Genus(TM) Synthesis Solution 25.10-p002_1
// Generated on: Sep  5 2025 15:21:16 CDT (Sep  5 2025 20:21:16 UTC)
//
module my_logic_real_hlo(x, a, b, c, d);
  input a, b, c, d;
  output x;
  wire a, b, c, d;
  wire x;
  wire n_21, n_24;
  or g32 (n_21, d, n_24);
  or g33 (n_24, b, c);
  and g34 (x, n_21, a);
endmodule


 ///
 /// Step 2 -- Technology Mapping
 ///

// Simplified output of the technology mapping step.
//
// Here generic gates (and, or) are replaced with specific
// gates from the target technology, fab_fab_and_2 and fab_fab_or_4.

module logic_tech(x,a,b,c,d);
   input a, b, c, d;
   output x;

   uwire borcord;

   fab_fab_or_4 o1( borcord, b, c, d, 1'b0); // Unused input tied to 0.
   fab_fab_and_2 a1(x, a, borcord);

endmodule


 ///  The real output of the tech mapping stage (by Cadence Genus)
//
// Generated by Cadence Genus(TM) Synthesis Solution 25.10-p002_1
// Generated on: Sep  5 2025 15:45:50 CDT (Sep  5 2025 20:45:50 UTC)
module my_logic_real_map(x, a, b, c, d);
  input a, b, c, d;
  output x;
  wire a, b, c, d;
  wire x;
  wire n_0, n_1;
  INVX1 g75(.A (n_1), .Y (x));
  OAI21X1 g76__2398(.A (d), .B (n_0), .C (a), .Y (n_1));
  OR2X2 g77__5107(.A (b), .B (c), .Y (n_0));
endmodule
//
//  Note that OAI21X1 performs the operation Y = ! ( (A|B) & C )
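
// :Example:
//
// A sketch checking that the mapped netlist still computes
// x = a ( b + c + d ). The cells are replaced by behavioral stand-ins
// written from the cell functions described above; the names
// oai21_model, inv_model, or2_model, and my_logic_mapped_model are
// hypothetical (the real cells come from the technology library).
// The OAI21 produces the complement of the desired function, so the
// inverter restores it:
//   n_1 = !( ( d | (b|c) ) & a ),   x = !n_1 = a & ( b | c | d ).

module oai21_model( output uwire Y, input uwire A, B, C );   // Hypothetical.
   assign Y = !( ( A | B ) & C );
endmodule

module inv_model( output uwire Y, input uwire A );            // Hypothetical.
   assign Y = !A;
endmodule

module or2_model( output uwire Y, input uwire A, B );         // Hypothetical.
   assign Y = A | B;
endmodule

module my_logic_mapped_model( output uwire x, input uwire a, b, c, d );
   uwire n_0, n_1;
   inv_model   g75( .A (n_1), .Y (x) );
   oai21_model g76( .A (d), .B (n_0), .C (a), .Y (n_1) );
   or2_model   g77( .A (b), .B (c), .Y (n_0) );
endmodule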


 ///
 /// Step 3 -- Optimization
 ///
 // Further optimization steps applied after technology mapping.

 /// Simplified output of the optimization step.
//
// The four-input OR gate is replaced by two two-input OR gates to lower
// cost.
//
module logic_opt(x,a,b,c,d);
   input a, b, c, d;
   output x;

   uwire bc, bcd;

   fab_fab_or_2 o1( bc, b, c );
   fab_fab_or_2 o2( bcd, bc, d );
   fab_fab_and_2 a1( x, a, bcd );

endmodule


 /// The real output of the optimization stage.
//
// In this case, Genus makes no further optimizations. In more complex
// designs it would have plenty to do.
//
module my_logic_real_opt(x, a, b, c, d);
  input a, b, c, d;
  output x;
  wire a, b, c, d;
  wire x;
  wire n_0, n_1;
  INVX1 g53(.A (n_1), .Y (x));
  OAI21X1 g54__8780(.A (d), .B (n_0), .C (a), .Y (n_1));
  OR2X2 g55__4296(.A (b), .B (c), .Y (n_0));
endmodule



////////////////////////////////////////////////////////////////////////////////
/// Synthesizing Arithmetic

// Many synthesis programs recognize common integer arithmetic operators
// and will substitute appropriate library functions.

// Design hardware with two 8-bit inputs and a 1-bit output that is
// true if the sum of the unsigned integers on the inputs is greater
// than 120, and false otherwise.

// :Example:
//
// Verilog description of hardware written by human.

module too_bit(output uwire x, input uwire [7:0] a,b);

   assign      x = a + b > 120;

endmodule
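
// :Example:
//
// Why the sum in too_bit does not overflow: the constant 120 is an
// unsized integer (32 bits), so a and b are extended to 32 bits before
// the addition. If the sum were first stored in an 8-bit wire, then
// 200 + 100 = 300 would wrap to 300 - 256 = 44, and the comparison
// would wrongly be false. The sketch below (hypothetical name
// too_bit_sized) states the width explicitly instead; 9 bits suffice
// since the largest sum is 255 + 255 = 510.

module too_bit_sized( output uwire x, input uwire [7:0] a, b ); // Hypothetical.
   uwire [8:0] sum = a + b;      // 9-bit context, so the carry out is kept.
   assign x = sum > 9'd120;
endmodule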


// :Example:
//
// Simplified output of inference step.  Operators replaced with
// instantiation of generic (can be fitted to many technologies) adder
// and comparison units.

module too_bit_rtl(x,a,b);
   input [7:0] a,b;
   output      x;

   uwire [8:0] ab;

   generic_add_8 a1(ab,a,b);
   generic_compare_gt_9 gc1(x,ab,9'd120);

endmodule


// :Example:
//
// The actual output of the inference step.

module too_bit_inf ( x, a, b ) ;

    output x ;
    input [7:0]a ;
    input [7:0]b ;

    wire GND, nx2, nx3, nx4, nx5, nx6, nx7, nx8, nx9, nx10, PWR;
    wire [0:0] \$dummy ;

    assign GND = 0 ;
    add_9u_9u_9u_0_0 x_add_0 (.cin (GND), .a ({GND,a[7],a[6],a[5],a[4],a[3],a[2]
                     ,a[1],a[0]}), .b ({GND,b[7],b[6],b[5],b[4],b[3],b[2],b[1],
                     b[0]}), .d ({nx2,nx3,nx4,nx5,nx6,nx7,nx8,nx9,nx10}), .cout (
                     \$dummy [0])) ;
    assign PWR = 1 ;
    gt_9u_9u x_gt_1 (.a ({nx2,nx3,nx4,nx5,nx6,nx7,nx8,nx9,nx10}), .b ({GND,GND,
             PWR,PWR,PWR,PWR,GND,GND,GND}), .d (x)) ;
endmodule


// :Example:
//
// Simplified output of technology mapping.
//
//  The synthesis program used individual gates rather than a comparison
//  unit because one input is a constant, so the gates can be simplified.

module too_bit_tech(x,a,b);
   input [7:0] a,b;
   output      x;

   uwire [8:0] ab;

   fab_fab_add_8 a1(ab,a,b);

   // Gates implementing a comparison circuit. (Not shown.)

endmodule


// :Example:
//
// Simplified output of optimization step.
//
// Since one input to the comparison is a constant, the comparison was
// optimized.

module too_bit_opt(x,a,b);
   input [7:0] a,b;
   output      x;

   uwire [8:0]  ab;

   fab_fab_add_8 a1(ab,a,b);

   // Gates implementing a comparison circuit, simplified because
   // one operand is a constant.

endmodule
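
// :Example:
//
// A sketch of how a comparison against a constant can shrink. The
// module name gt_120 is hypothetical. Since 120 = 9'b0_0111_1000, a
// 9-bit sum s exceeds 120 exactly when s >= 128 (s[8] or s[7] is set),
// or when s[6:3] are all 1 (so s[6:0] >= 120) and s[2:0] is nonzero
// (so s >= 121). A general comparator would instead have to handle
// every possible value of its second operand.

module gt_120( output uwire gt, input uwire [8:0] s );   // Hypothetical.
   assign gt = s[8] | s[7] | ( ( &s[6:3] ) & ( |s[2:0] ) );
endmodule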


// :Example:
//
// The actual output of optimization.

module too_bit_opt_real ( x, a, b ) ;
    output x ;
    input [7:0]a ;
    input [7:0]b ;

    wire nx8, nx10, nx12, nx18, nx30, nx38, nx44, nx98, nx102, nx125, nx127,
         nx129, nx131, nx134, nx136, nx138, nx141, nx158, nx161, nx163, nx165,
         nx168, nx183, nx187, nx199, nx200, nx201, nx202, nx203, nx204, nx205,
         nx206, nx207, nx208, nx209, nx174, nx210, nx211, nx212, nx213, nx214,
         nx215, nx216, nx217, nx218, nx219, nx220, nx221, nx222, nx223, nx88,
         nx224, nx225, nx226, nx227, nx228, nx229, nx230, nx231, nx232, nx233;

    OAI2N0 ix45 (.X (nx44), .A1 (nx125), .A2 (nx127), .B1 (nx129), .B2 (nx131)
           ) ;
    IV1N0 ix126 (.X (nx125), .A (b[3])) ;
    IV1N0 ix128 (.X (nx127), .A (a[3])) ;
    XN2R0 ix130 (.X (nx129), .A1 (b[3]), .A2 (a[3])) ;
    AO2I0 ix132 (.X (nx131), .A1 (b[2]), .A2 (a[2]), .B1 (nx10), .B2 (nx134)) ;
    XR2T0 ix11 (.X (nx10), .A1 (b[2]), .A2 (a[2])) ;
    OAI2N0 ix135 (.X (nx134), .A1 (nx136), .A2 (nx138), .B1 (nx18), .B2 (nx141)
           ) ;
    IV1N0 ix137 (.X (nx136), .A (a[1])) ;
    IV1N0 ix139 (.X (nx138), .A (b[1])) ;
    ND2N0 ix19 (.X (nx18), .A1 (a[0]), .A2 (b[0])) ;
    XN2R0 ix142 (.X (nx141), .A1 (b[1]), .A2 (a[1])) ;
    XR2T0 ix9 (.X (nx8), .A1 (b[3]), .A2 (a[3])) ;
    AO2I0 ix159 (.X (nx158), .A1 (b[3]), .A2 (a[3]), .B1 (nx8), .B2 (nx38)) ;
    OAI2N0 ix39 (.X (nx38), .A1 (nx161), .A2 (nx163), .B1 (nx165), .B2 (nx30)) ;
    IV1N0 ix162 (.X (nx161), .A (b[2])) ;
    IV1N0 ix164 (.X (nx163), .A (a[2])) ;
    XN2R0 ix166 (.X (nx165), .A1 (b[2]), .A2 (a[2])) ;
    AO2I0 ix31 (.X (nx30), .A1 (a[1]), .A2 (b[1]), .B1 (nx168), .B2 (nx12)) ;
    IV1N0 ix169 (.X (nx168), .A (nx18)) ;
    XR2T0 ix13 (.X (nx12), .A1 (b[1]), .A2 (a[1])) ;
    NR2R0 ix184 (.X (nx183), .A1 (nx102), .A2 (nx98)) ;
    XR2T0 ix103 (.X (nx102), .A1 (a[0]), .A2 (b[0])) ;
    XN2R0 ix99 (.X (nx98), .A1 (nx12), .A2 (nx18)) ;
    XR2T0 ix188 (.X (nx187), .A1 (nx10), .A2 (nx30)) ;
    IV1NP ix234 (.X (nx199), .A (b[6])) ;
    IV1NP ix235 (.X (nx200), .A (a[6])) ;
    AO2I1 ix236 (.X (nx201), .A1 (a[6]), .A2 (nx199), .B1 (b[6]), .B2 (nx200)) ;
    OR2T0 ix237 (.X (nx202), .A1 (b[5]), .A2 (a[5])) ;
    IV1NP ix238 (.X (nx203), .A (a[4])) ;
    IV1NP ix239 (.X (nx204), .A (b[4])) ;
    NR2Q1 ix240 (.X (nx205), .A1 (a[4]), .A2 (b[4])) ;
    IV1NP ix241 (.X (nx206), .A (b[5])) ;
    IV1NP ix242 (.X (nx207), .A (a[5])) ;
    OAI5N1 ix243 (.X (nx208), .A1 (nx203), .A2 (nx204), .B1 (nx205), .B2 (nx232)
           , .C1 (nx206), .C2 (nx207)) ;
    AN2T0 ix244 (.X (nx209), .A1 (b[5]), .A2 (a[5])) ;
    OAI2N1 nx174_rename (.X (nx174), .A1 (nx203), .A2 (nx204), .B1 (nx205), .B2 (
           nx232)) ;
    OAOI1 ix245 (.X (nx210), .A1 (nx209), .A2 (nx174), .B (nx202), .C (nx201)) ;
    AO3I1 ix246 (.X (nx211), .A1 (nx201), .A2 (nx202), .A3 (nx208), .B (nx210)
          ) ;
    NR2Q1 ix247 (.X (nx212), .A1 (b[5]), .A2 (a[5])) ;
    AN2T0 ix248 (.X (nx213), .A1 (a[4]), .A2 (b[4])) ;
    NR2R1 ix249 (.X (nx214), .A1 (nx213), .A2 (nx233)) ;
    OAI2N1 ix250 (.X (nx215), .A1 (nx212), .A2 (nx209), .B1 (nx214), .B2 (nx205)
           ) ;
    NR2Q1 ix251 (.X (nx216), .A1 (nx206), .A2 (a[5])) ;
    NR2Q1 ix252 (.X (nx217), .A1 (nx207), .A2 (b[5])) ;
    OAI5N1 ix253 (.X (nx218), .A1 (a[4]), .A2 (b[4]), .B1 (nx213), .B2 (nx233),
           .C1 (nx216), .C2 (nx217)) ;
    IV1N2 ix254 (.X (nx219), .A (nx232)) ;
    IV1NP ix255 (.X (nx220), .A (a[4])) ;
    IV1NP ix256 (.X (nx221), .A (b[4])) ;
    AO2I1 ix257 (.X (nx222), .A1 (b[4]), .A2 (nx220), .B1 (a[4]), .B2 (nx221)) ;
    AO1I1 ix258 (.X (nx223), .A1 (a[4]), .A2 (b[4]), .B (nx205)) ;
    OAI2N1 nx88_rename (.X (nx88), .A1 (nx219), .A2 (nx222), .B1 (nx223), .B2 (
           nx232)) ;
    OR2T0 ix259 (.X (nx224), .A1 (nx8), .A2 (nx131)) ;
    ND2N1 ix260 (.X (nx225), .A1 (nx8), .A2 (nx131)) ;
    AO2I1 ix261 (.X (nx226), .A1 (nx224), .A2 (nx225), .B1 (nx183), .B2 (nx187)
          ) ;
    ND4N0 ix262 (.X (nx227), .A1 (nx215), .A2 (nx218), .A3 (nx88), .A4 (nx226)
          ) ;
    AO2LP ix263 (.X (nx228), .A1 (a[6]), .A2 (b[6]), .B1 (b[7]), .B2 (a[7])) ;
    OAI5N1 ix264 (.X (nx229), .A1 (a[4]), .A2 (b[4]), .B1 (nx213), .B2 (nx233),
           .C1 (b[5]), .C2 (a[5])) ;
    ND2N1 ix265 (.X (nx230), .A1 (b[5]), .A2 (a[5])) ;
    AO1A0 ix266 (.X (nx231), .A1 (nx229), .A2 (nx230), .B (nx201)) ;
    OAI3N1 x_rename_rename (.X (x), .A1 (nx211), .A2 (nx227), .B1 (nx228), .B2 (
           nx231)) ;
    WGT1 ix267 (.X (nx232), .CK (nx158)) ;
    WGT1 ix268 (.X (nx233), .CK (nx44)) ;
endmodule