///  LSU EE 4755 -- Fall 2020 -- Digital Design / HDL
/// Verilog Notes 014 -- Synthesis Overview

/// Under Construction

/// Contents
 // Synthesis Overview
 // Synthesis Steps
 // Synthesis Example - Cadence Genus Synthesis
 // Synthesis of Simple Logic
 // Synthesizing Arithmetic

/// References

// :SV17: IEEE 1800-2017 -- The SystemVerilog Standard
//        https://ieeexplore.ieee.org/document/8299595/
//        This is for those already familiar with Verilog.
// :BV3:  Brown & Vranesic, Fundamentals of Digital Logic with Verilog, 3rd Ed.
//        The text used in LSU EE 2740.

/// Synthesis Overview

// :BV3: 10 (The whole chapter. It's short.)

 /// :Def: Synthesis
//  The steps needed to convert a Verilog (or other HDL) description into
//  a form that can be manufactured or downloaded into an FPGA or other
//  programmable device.

 /// :Def: Inference         <-- Make sure that this is clearly understood.
//  The process of converting behavioral code into primitives
//  or modules recognized by the synthesis program.

 /// :Def: Optimization   <-- Did it do a good job? It's up to us to know.
//  The process of simplifying logic to realize a set of goals.
//  Typical goals are to minimize cost while meeting a timing
//  constraint.

 /// :Def: Synthesis Technology Target
//   The type of semiconductor (or other) technology used to build the design.
//   Sometimes shortened to ``target''.

 ///  Common Targets
//  ASIC
//   Application-Specific Integrated Circuit
//   A fully custom chip.
//   Output of synthesis program drives machines fabricating chip.

//  FPGA
//   Field-Programmable Gate Array
//   Programmable logic.
//   Output of synthesis program downloaded into pre-manufactured chip.

 /// What a Synthesis Program Does
//  First Step
//    Read a Verilog description.
//  Last Step (Informally known as Tape-Out)
//    Write a design file that specifies (for ASICs):
//      The components that are needed. (and, mux, ram)
//      How they are connected.
//      The components' locations and how the wires are routed.

 /// Synthesis In a Perfect World
//  Engineer prepares a Verilog description.
//  Using simulation, engineer verifies that description works as desired.
//  Click ``synthesize'' and enter credit card number.
//  Wait a few weeks.
//  Chips arrive, and perform exactly as simulated chips do.

 /// Synthesis In the Early 21st Century
//  Engineer prepares a Verilog (or other HDL) description following
//    the synthesis program's /design rules/.
//  Using simulation, engineer verifies that description works as desired.
//  Runs synthesis program,
//   compares area and timing of synthesized design to expectations,
//   if satisfied, the design is done,
//   otherwise modify description to fix problem or tune design.

/// Synthesis Steps

// :BV3: 10 (The whole chapter.) Description of steps is slightly different.

 /// Synthesis Steps
//  (0)  Elaborate
//       -- Gather and "wire together" modules based on Verilog, libraries, etc.
//  (1)  Infer
//       -- Transform Verilog code to /generic gates/ (a simple form).
//  (1a) High-Level Optimization
//       -- Reduce number of gates, etc.
//  (2)  Technology Mapping
//       -- Replace generic gates with /cells/ from /technology target/.
//  (3)  Optimization
//       -- Reduce number of gates.
//       -- Fit within timing constraint.
//  (3a) Timing Simulation
//       -- Simulate (e.g., using Verilog) with cell timing.
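//  (4)  Place and Route
//       -- Choose locations for cells and routes for the wires connecting them.
//  (4a) Timing Simulation
//       -- Simulate (e.g., using Verilog) with cell and interconnect timing.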

 /// Step 0: Elaboration
//  Note: Elaboration is also performed before simulation.
//  Input:  A Verilog description, information on libraries, etc.
//  Output: A Verilog description in which all modules are present.
//  Elaboration will be covered in detail elsewhere.
//  Briefly, two main activities:
//    Modules that are instantiated in the code but which are not
//      defined in the code are located by looking in places such
//      as libraries.
//    Generate statements are executed. (Generate statements form
//      programs that write Verilog descriptions.)
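
// :Example:
//  A minimal sketch (not one of the course examples) of a generate
//  loop that is "executed" during elaboration. The module name
//  bitwise_and_gen and the parameter w are made up for illustration.

module bitwise_and_gen
  #( int w = 4 )
   ( output uwire [w-1:0] x,
     input uwire [w-1:0] a, b );

   // Elaboration unrolls this loop, leaving w AND gates with
   // hierarchical names bit_loop[0].g, bit_loop[1].g, and so on.
   for ( genvar i = 0; i < w; i++ ) begin : bit_loop
      and g( x[i], a[i], b[i] );
   end

endmodule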

 /// Step 1:  Inference
//  Input:  An elaborated Verilog description.
//  Output: An Explicit Structural Description using Generic Gates

//  For implicit structural input, operators are replaced with pre-defined
//  modules or gates.
//  For example,
//  -- an "&&" operator is replaced with an AND gate
//  -- an "!" operator is replaced with an inverter (NOT gate).
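
// :Example:
//  A hand-written sketch of the replacements described above. Actual
//  tool output will differ. The module names are made up.

//  Implicit structural input:
module infer_demo( output uwire x, input uwire a, b );
   assign x = !a && b;
endmodule

//  What inference might produce, using Verilog gate primitives as
//  stand-ins for generic gates:
module infer_demo_generic( output uwire x, input uwire a, b );
   uwire na;
   not n1( na, a );     // From the "!" operator.
   and a1( x, na, b );  // From the "&&" operator.
endmodule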

//  For arithmetic operators, the synthesis program might use Verilog
//  descriptions from a library.
//  For example, for Cadence Encounter with ChipWare library
//  -- a "+" operator is replaced with Verilog module CW_add.

//  Synthesis programs can also recognize certain types of flip-flops
//  and latches.
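
// :Example:
//  A sketch of behavioral code from which synthesis programs typically
//  infer storage elements. Module and port names are made up, and
//  whether a particular form is recognized depends on the synthesis
//  program.

module ff_latch_demo
  ( output logic q_ff, q_latch,
    input uwire d, en, clk );

   // Typically inferred as an edge-triggered (D) flip-flop.
   always_ff @( posedge clk ) q_ff <= d;

   // Typically inferred as a level-sensitive latch:
   // q_latch follows d while en is 1 and holds its value otherwise.
   always_latch if ( en ) q_latch <= d;

endmodule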

//  The output of this step consists of generic gates and modules
//  (see next step).

//  --- Easy Stuff
//      Inference of explicit and implicit structural Verilog.
//      Inference of combinational logic.
//  --- Not-So-Easy Stuff
//      Inference of behavioral code.
//      Inference of generated code (Verilog using generate statements).
//      Inference of sequential logic.

 /// Step 2:  Technology Mapping
//  The generic gates and modules are replaced with /cells/
//  from the target technology library.

//    ASIC technology libraries have a ``normal'' set of gates and
//    modules: AND, OR, adders, etc.

//    FPGA technology libraries usually consist of mostly multiplexors
//    or lookup tables, perhaps with a few memory banks.

//  Target gates don't always match generic ones, so technology mapping
//    code finds substitutes.

//    For example, a generic 3-input AND gate might be mapped to a
//    four-input AND gate in the technology library with one input
//    tied to logic 1.
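
// :Example:
//  A sketch of the mapping described above. The cell fab_fab_and_4 is
//  hypothetical (as is Fab Fab Tech). A simple behavioral model of it
//  is included so that the example is self-contained.

module fab_fab_and_4( output uwire x, input uwire a, b, c, d );
   assign x = a & b & c & d;
endmodule

//  Before mapping: a generic 3-input AND gate.
module and3_generic( output uwire x, input uwire a, b, c );
   and g1( x, a, b, c );
endmodule

//  After mapping: the unused fourth input is tied to logic 1,
//  which does not change the value of an AND.
module and3_mapped( output uwire x, input uwire a, b, c );
   fab_fab_and_4 g1( x, a, b, c, 1'b1 );
endmodule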

//  One of several target gates might be chosen based on fanout needs.

//  Generic arithmetic modules are replaced by technology modules, if
//  available, otherwise by gates.

 /// Step 3:  Optimization

//  The output of the last step is modified to achieve any specified
//  timing goal and then to reduce the cost (roughly the number of
//  gates).

//  Synthesis programs are usually provided with detailed timing
//  constraints. For example, a module's outputs must be available
//  1.2 ns after its inputs arrive.

 /// Step 4: Place and Route

//  Finally, locations are chosen for components in the
//  optimized design and routes are chosen for wires connecting them.

//  No further details will be given for this step.

/// Synthesis Example - Cadence Genus Synthesis

 /// References
 //  Step-by-step Instructions (For this Course)
 //  https://www.ece.lsu.edu/v/proc.html#synthesis
 //  Genus User Guide -- Access from LSU Only.
 //  https://www.ece.lsu.edu/v/s/genus_user.pdf
 //  Links to More Documentation
 //  https://www.ece.lsu.edu/v/ref.html

module shift_right_logarithmic
  ( output uwire [15:0] sh,
    input uwire [15:0] s0,
    input uwire [3:0] amt );

   uwire [15:0] s1, s2, s3;

   mux2 st0( s1, amt[0], s0, {1'b0, s0[15:1]} ); // Shift by 0 or 1
   mux2 st1( s2, amt[1], s1, {2'b0, s1[15:2]} ); // Shift by 0 or 2
   mux2 st2( s3, amt[2], s2, {4'b0, s2[15:4]} ); // Shift by 0 or 4
   mux2 st3( sh, amt[3], s3, {8'b0, s3[15:8]} ); // Shift by 0 or 8

endmodule


module mux2
  ( output uwire [15:0] x,
    input uwire select,
    input uwire [15:0] a0, a1 );
    assign x = select ? a1 : a0;
endmodule
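
// :Example:
//  A minimal testbench sketch for the shifter above, in keeping with
//  verifying a description by simulation before synthesizing it. The
//  module name shift_right_tb and the choice of 1000 random inputs are
//  arbitrary. Each output is compared to the Verilog >> operator.

module shift_right_tb;

   logic [15:0] s0;
   logic [3:0] amt;
   uwire [15:0] sh;
   int n_err = 0;

   shift_right_logarithmic srl( sh, s0, amt );

   initial begin
      for ( int i = 0; i < 1000; i++ ) begin
         s0 = $urandom;   // Truncated to 16 bits.
         amt = $urandom;  // Truncated to 4 bits.
         #1;
         if ( sh !== ( s0 >> amt ) ) begin
            n_err++;
            $display("Error: %h >> %0d: %h (correct) != %h (module)",
                     s0, amt, s0 >> amt, sh);
         end
      end
      $display("Done with tests, %0d errors.", n_err);
   end

endmodule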

 /// Start Program

 // [cyc.ece.lsu.edu] % genus
 // Cadence Genus(TM) Synthesis Solution.
 // Copyright 2017 Cadence Design Systems, Inc. All rights reserved worldwide.
 // Cadence and the Cadence logo are registered trademarks and Genus is a trademark
 // of Cadence Design Systems, Inc. in the United States and other countries.

 // Version: 17.14-s037_1, built Wed Apr 18 2018
 /// [snip]
 // @genus:root: 1> 

 /// Specify Technology Library
 //  set_db library osu035_stdcells.lib
 //  -- Note: No need to type this, it's placed in genus_startup.tcl

 /// Load Verilog Source File to be Synthesized
 //  @genus:root: 1> read_hdl l014-example-shifter.v 

 /// Specify What to Synthesize and Infer
 //  @genus:root: 3> elaborate shift_right_logarithmic

 /// Specify Timing Constraints
 //  create_clock -name clk -period 0.5
 //  set_input_delay -clock clk 0.0 [all_inputs]
 //  set_output_delay -clock clk 0.0 [all_outputs]

 /// Perform High-Level Optimizations
 //  syn_gen

 /// Write Out Verilog (For debugging or familiarization.)
 //  write_hdl > mrr-generic.v

 /// Technology Map
 //  syn_map

 /// Perform Low-Level Optimization
 //  syn_opt

/// Synthesis of Simple Logic

// Verilog description to implement:  x = ab + ac + ad;
// Target: ASIC produced by Fab Fab Tech [tm] (Name made up.)

// :Example:
// Verilog code, written by human, to implement x = ab + ac + ad;

module my_logic(x,a,b,c,d);
   input uwire a, b, c, d;
   output uwire x;

   assign x = a & b  |  a & c  |  a & d;

endmodule



// :Example:
//  Simplified output of inference stage.
//  The Verilog below was hand written but designed to show what would
//  be produced by the synthesis steps.  The actual output is
//  less readable, see next example.

module logic_rtl(x,a,b,c,d);
   input a, b, c, d;
   output x;

   wire   ab, ac, ad;

   and a1(ab,a,b);
   and a2(ac,a,c);
   and a3(ad,a,d);

   or o1(x,ab,ac,ad);

endmodule


// :Example:
//  The real output of the inference stage.
//  Generated by Cadence Genus(TM) Synthesis Solution 17.14-s037_1

module my_logic_rtl_real(x, a, b, c, d);
  input a, b, c, d;
  output x;
  wire a, b, c, d;
  wire x;
  wire n_4, n_5, n_7, n_8;
  and g1 (n_4, a, b);
  and g2 (n_5, a, c);
  or g3 (n_7, n_4, n_5);
  and g4 (n_8, a, d);
  or g5 (x, n_7, n_8);
endmodule

// :Example:
// Simplified output of the technology mapping step.
// Here generic gates (and, or) are replaced with specific
// gates from the target technology, fab_fab_and_2 and fab_fab_or_4.

module logic_tech(x,a,b,c,d);
   input a, b, c, d;
   output x;

   wire   ab, ac, ad;

   fab_fab_and_2 a1(ab,a,b);
   fab_fab_and_2 a2(ac,a,c);
   fab_fab_and_2 a3(ad,a,d);

   fab_fab_or_4 o1(x,ab,ac,ad,1'b0);

endmodule


// :Example:
// Simplified output of the optimization step. Logic is
// simplified by optimization program.

module logic_opt(x,a,b,c,d);
   input a, b, c, d;
   output x;

   wire   bcd;

   fab_fab_or_4 o1(bcd,b,c,d,1'b0);
   fab_fab_and_2 a1(x,a,bcd);

endmodule
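
// :Example:
//  Hypothetical behavioral models of the two made-up Fab Fab Tech
//  cells used above. A real technology library would supply its own
//  simulation models. These are stand-ins, written under the assumption
//  that the first port is the output, so that logic_tech and logic_opt
//  can be simulated.

module fab_fab_and_2( output uwire x, input uwire a, b );
   assign x = a & b;
endmodule

module fab_fab_or_4( output uwire x, input uwire a, b, c, d );
   assign x = a | b | c | d;
endmodule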


// :Example:
// The real output of the optimization stage.
//  Note that OAI21X1 performs the operation Y = ! ( (A|B) & C )

// Generated by Cadence Genus(TM) Synthesis Solution 17.14-s037_1

module my_logic_real_opt(x, a, b, c, d);
  input a, b, c, d;
  output x;
  wire a, b, c, d;
  wire x;
  wire n_0, n_1;
  INVX1 g53(.A (n_1), .Y (x));
  OAI21X1 g54__8780(.A (d), .B (n_0), .C (a), .Y (n_1));
  OR2X2 g55__4296(.A (b), .B (c), .Y (n_0));
endmodule
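
// :Example:
//  A sketch of an exhaustive check that the forms of the logic shown
//  above compute the same function. The expressions were transcribed
//  by hand from the examples (they are not tool output), and the
//  module name my_logic_check is made up.

module my_logic_check;

   logic a, b, c, d;
   int n_err = 0;

   // As written by the human: x = ab + ac + ad.
   uwire x_orig = a & b  |  a & c  |  a & d;

   // As in logic_opt: x = a ( b + c + d ).
   uwire x_fact = a & ( b | c | d );

   // As in my_logic_real_opt: OR2X2, then OAI21X1, then INVX1.
   uwire n_0 = b | c;
   uwire n_1 = !( ( d | n_0 ) & a );
   uwire x_cell = !n_1;

   initial begin
      for ( int i = 0; i < 16; i++ ) begin
         {a,b,c,d} = 4'(i);
         #1;
         if ( x_orig !== x_fact || x_orig !== x_cell ) n_err++;
      end
      $display("Checked 16 input combinations, %0d mismatches.", n_err);
   end

endmodule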

/// Synthesizing Arithmetic

// Many synthesis programs recognize common integer arithmetic operators
// and will substitute appropriate library functions.

// Design hardware with two 8-bit inputs and a 1-bit output that is
// true if the sum of the unsigned integers on the inputs is greater
// than 120, and false otherwise.

// :Example:
// Verilog description of hardware written by human.

module too_bit(output uwire x, input uwire [7:0] a,b);

   assign      x = a + b > 120;

endmodule
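
// :Example:
//  A sketch that makes the width of the sum explicit. In too_bit the
//  unsized constant 120 is at least 32 bits wide, so a + b is evaluated
//  with enough bits that the carry out of bit 7 is kept (see :SV17:
//  Section 11.6, Expression bit lengths). The variant below keeps the
//  carry using an explicit 9-bit sum. The module names are made up.

module too_bit_explicit( output uwire x, input uwire [7:0] a, b );
   uwire [8:0] sum = a + b;   // Nine bits: the carry out is sum[8].
   assign x = sum > 9'd120;
endmodule

module too_bit_explicit_tb;

   logic [7:0] a, b;
   uwire x;
   int n_err = 0;

   too_bit_explicit tb1( x, a, b );

   initial begin
      for ( int i = 0; i < 65536; i++ ) begin
         {a,b} = 16'(i);
         #1;
         // Reference computed using ints, so no carry can be lost.
         if ( x !== ( int'(a) + int'(b) > 120 ) ) n_err++;
      end
      $display("Checked all 65536 input pairs, %0d mismatches.", n_err);
   end

endmodule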


// :Example:
// Simplified output of inference step.  Operators replaced with
// instantiation of generic (can be fitted to many technologies) adder
// and comparison units.

module too_bit_rtl(x,a,b);
   input [7:0] a,b;
   output      x;

   wire [8:0]  ab;

   generic_add_8 a1(ab,a,b);
   generic_compare_gt_9 gc1(x,ab,9'd120);

endmodule


// :Example:
// The actual output of the inference step.

module too_bit_inf ( x, a, b ) ;

    output x ;
    input [7:0]a ;
    input [7:0]b ;

    wire GND, nx2, nx3, nx4, nx5, nx6, nx7, nx8, nx9, nx10, PWR;
    wire [0:0] \$dummy ;

    assign GND = 0 ;
    add_9u_9u_9u_0_0 x_add_0 (.cin (GND), .a ({GND,a[7],a[6],a[5],a[4],a[3],a[2]
                     ,a[1],a[0]}), .b ({GND,b[7],b[6],b[5],b[4],b[3],b[2],b[1],
                     b[0]}), .d ({nx2,nx3,nx4,nx5,nx6,nx7,nx8,nx9,nx10}), .cout (
                     \$dummy [0])) ;
    assign PWR = 1 ;
    gt_9u_9u x_gt_1 (.a ({nx2,nx3,nx4,nx5,nx6,nx7,nx8,nx9,nx10}), .b ({GND,GND,
             PWR,PWR,PWR,PWR,GND,GND,GND}), .d (x)) ;
endmodule

// :Example:
// Simplified output of technology mapping.
//  Synthesis program used individual gates rather than a comparison unit
//  because one input is a constant, and so the gates can be optimized.

module too_bit_tech(x,a,b);
   input [7:0] a,b;
   output      x;

   wire [8:0]  ab;

   fab_fab_add_8 a1(ab,a,b);

   // Gates implementing a comparison circuit. (Not shown.)

endmodule


// :Example:
// Simplified output of optimization step.
// Since one input to the comparison is a constant, the comparison
// was optimized.

module too_bit_opt(x,a,b);
   input [7:0] a,b;
   output      x;

   wire [8:0]  ab;

   fab_fab_add_8 a1(ab,a,b);

   // Gates implementing a comparison circuit, simplified because
   // one operand is a constant.

endmodule
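
// :Example:
//  A hand-derived sketch (not tool output) of why a comparison with a
//  constant needs little logic. Since 120 = 9'b0_0111_1000, a 9-bit
//  value is greater than 120 if bit 8 or bit 7 is set, or if bits 6:3
//  are all 1 and at least one of bits 2:0 is set.
//  The module names are made up.

module gt_120_simplified( output uwire x, input uwire [8:0] ab );
   assign x = ab[8] | ab[7] | ( ( &ab[6:3] ) & ( |ab[2:0] ) );
endmodule

module gt_120_tb;

   logic [8:0] ab;
   uwire x;
   int n_err = 0;

   gt_120_simplified g1( x, ab );

   initial begin
      for ( int i = 0; i < 512; i++ ) begin
         ab = 9'(i);
         #1;
         // Compare against the > operator for every 9-bit value.
         if ( x !== ( ab > 9'd120 ) ) n_err++;
      end
      $display("Checked all 512 values, %0d mismatches.", n_err);
   end

endmodule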


// :Example:
// The actual output of optimization.

module too_bit_opt_real ( x, a, b ) ;
    output x ;
    input [7:0]a ;
    input [7:0]b ;

    wire nx8, nx10, nx12, nx18, nx30, nx38, nx44, nx98, nx102, nx125, nx127,
         nx129, nx131, nx134, nx136, nx138, nx141, nx158, nx161, nx163, nx165,
         nx168, nx183, nx187, nx199, nx200, nx201, nx202, nx203, nx204, nx205,
         nx206, nx207, nx208, nx209, nx174, nx210, nx211, nx212, nx213, nx214,
         nx215, nx216, nx217, nx218, nx219, nx220, nx221, nx222, nx223, nx88,
         nx224, nx225, nx226, nx227, nx228, nx229, nx230, nx231, nx232, nx233;

    OAI2N0 ix45 (.X (nx44), .A1 (nx125), .A2 (nx127), .B1 (nx129), .B2 (nx131)
           ) ;
    IV1N0 ix126 (.X (nx125), .A (b[3])) ;
    IV1N0 ix128 (.X (nx127), .A (a[3])) ;
    XN2R0 ix130 (.X (nx129), .A1 (b[3]), .A2 (a[3])) ;
    AO2I0 ix132 (.X (nx131), .A1 (b[2]), .A2 (a[2]), .B1 (nx10), .B2 (nx134)) ;
    XR2T0 ix11 (.X (nx10), .A1 (b[2]), .A2 (a[2])) ;
    OAI2N0 ix135 (.X (nx134), .A1 (nx136), .A2 (nx138), .B1 (nx18), .B2 (nx141)
           ) ;
    IV1N0 ix137 (.X (nx136), .A (a[1])) ;
    IV1N0 ix139 (.X (nx138), .A (b[1])) ;
    ND2N0 ix19 (.X (nx18), .A1 (a[0]), .A2 (b[0])) ;
    XN2R0 ix142 (.X (nx141), .A1 (b[1]), .A2 (a[1])) ;
    XR2T0 ix9 (.X (nx8), .A1 (b[3]), .A2 (a[3])) ;
    AO2I0 ix159 (.X (nx158), .A1 (b[3]), .A2 (a[3]), .B1 (nx8), .B2 (nx38)) ;
    OAI2N0 ix39 (.X (nx38), .A1 (nx161), .A2 (nx163), .B1 (nx165), .B2 (nx30)) ;
    IV1N0 ix162 (.X (nx161), .A (b[2])) ;
    IV1N0 ix164 (.X (nx163), .A (a[2])) ;
    XN2R0 ix166 (.X (nx165), .A1 (b[2]), .A2 (a[2])) ;
    AO2I0 ix31 (.X (nx30), .A1 (a[1]), .A2 (b[1]), .B1 (nx168), .B2 (nx12)) ;
    IV1N0 ix169 (.X (nx168), .A (nx18)) ;
    XR2T0 ix13 (.X (nx12), .A1 (b[1]), .A2 (a[1])) ;
    NR2R0 ix184 (.X (nx183), .A1 (nx102), .A2 (nx98)) ;
    XR2T0 ix103 (.X (nx102), .A1 (a[0]), .A2 (b[0])) ;
    XN2R0 ix99 (.X (nx98), .A1 (nx12), .A2 (nx18)) ;
    XR2T0 ix188 (.X (nx187), .A1 (nx10), .A2 (nx30)) ;
    IV1NP ix234 (.X (nx199), .A (b[6])) ;
    IV1NP ix235 (.X (nx200), .A (a[6])) ;
    AO2I1 ix236 (.X (nx201), .A1 (a[6]), .A2 (nx199), .B1 (b[6]), .B2 (nx200)) ;
    OR2T0 ix237 (.X (nx202), .A1 (b[5]), .A2 (a[5])) ;
    IV1NP ix238 (.X (nx203), .A (a[4])) ;
    IV1NP ix239 (.X (nx204), .A (b[4])) ;
    NR2Q1 ix240 (.X (nx205), .A1 (a[4]), .A2 (b[4])) ;
    IV1NP ix241 (.X (nx206), .A (b[5])) ;
    IV1NP ix242 (.X (nx207), .A (a[5])) ;
    OAI5N1 ix243 (.X (nx208), .A1 (nx203), .A2 (nx204), .B1 (nx205), .B2 (nx232)
           , .C1 (nx206), .C2 (nx207)) ;
    AN2T0 ix244 (.X (nx209), .A1 (b[5]), .A2 (a[5])) ;
    OAI2N1 nx174_rename (.X (nx174), .A1 (nx203), .A2 (nx204), .B1 (nx205), .B2 (
           nx232)) ;
    OAOI1 ix245 (.X (nx210), .A1 (nx209), .A2 (nx174), .B (nx202), .C (nx201)) ;
    AO3I1 ix246 (.X (nx211), .A1 (nx201), .A2 (nx202), .A3 (nx208), .B (nx210)
          ) ;
    NR2Q1 ix247 (.X (nx212), .A1 (b[5]), .A2 (a[5])) ;
    AN2T0 ix248 (.X (nx213), .A1 (a[4]), .A2 (b[4])) ;
    NR2R1 ix249 (.X (nx214), .A1 (nx213), .A2 (nx233)) ;
    OAI2N1 ix250 (.X (nx215), .A1 (nx212), .A2 (nx209), .B1 (nx214), .B2 (nx205)
           ) ;
    NR2Q1 ix251 (.X (nx216), .A1 (nx206), .A2 (a[5])) ;
    NR2Q1 ix252 (.X (nx217), .A1 (nx207), .A2 (b[5])) ;
    OAI5N1 ix253 (.X (nx218), .A1 (a[4]), .A2 (b[4]), .B1 (nx213), .B2 (nx233),
           .C1 (nx216), .C2 (nx217)) ;
    IV1N2 ix254 (.X (nx219), .A (nx232)) ;
    IV1NP ix255 (.X (nx220), .A (a[4])) ;
    IV1NP ix256 (.X (nx221), .A (b[4])) ;
    AO2I1 ix257 (.X (nx222), .A1 (b[4]), .A2 (nx220), .B1 (a[4]), .B2 (nx221)) ;
    AO1I1 ix258 (.X (nx223), .A1 (a[4]), .A2 (b[4]), .B (nx205)) ;
    OAI2N1 nx88_rename (.X (nx88), .A1 (nx219), .A2 (nx222), .B1 (nx223), .B2 (
           nx232)) ;
    OR2T0 ix259 (.X (nx224), .A1 (nx8), .A2 (nx131)) ;
    ND2N1 ix260 (.X (nx225), .A1 (nx8), .A2 (nx131)) ;
    AO2I1 ix261 (.X (nx226), .A1 (nx224), .A2 (nx225), .B1 (nx183), .B2 (nx187)
          ) ;
    ND4N0 ix262 (.X (nx227), .A1 (nx215), .A2 (nx218), .A3 (nx88), .A4 (nx226)
          ) ;
    AO2LP ix263 (.X (nx228), .A1 (a[6]), .A2 (b[6]), .B1 (b[7]), .B2 (a[7])) ;
    OAI5N1 ix264 (.X (nx229), .A1 (a[4]), .A2 (b[4]), .B1 (nx213), .B2 (nx233),
           .C1 (b[5]), .C2 (a[5])) ;
    ND2N1 ix265 (.X (nx230), .A1 (b[5]), .A2 (a[5])) ;
    AO1A0 ix266 (.X (nx231), .A1 (nx229), .A2 (nx230), .B (nx201)) ;
    OAI3N1 x_rename_rename (.X (x), .A1 (nx211), .A2 (nx227), .B1 (nx228), .B2 (
           nx231)) ;
    WGT1 ix267 (.X (nx232), .CK (nx158)) ;
    WGT1 ix268 (.X (nx233), .CK (nx44)) ;
endmodule