/// LSU EE 3755 -- Spring 2002 -- Computer Organization
//
/// Note Set 13 -- Our MIPS Implementations
// Time-stamp: <29 April 2002, 15:21:40 CDT, koppel@neptune>
/// Contents
//
// Class MIPS Implementations
// Strawman MIPS Implementation
// Hardwired Control MIPS Implementation
// Microcoded Control MIPS Implementation
// Other Implementations
/// References
//
// :P: Palnitkar, "Verilog HDL"
// :Q: Qualis, "Verilog HDL Quick Reference Card Revision 1.0"
// :H: Hyde, "Handbook on Verilog HDL"
// :LRM: IEEE, Verilog Language Reference Manual (Hawaii Section Numbering)
// :PH: Patterson & Hennessy, "Computer Organization & Design"
// :HP: Hennessy & Patterson, "Computer Architecture: A Quantitative Approach"
// :Mv1: MIPS Technologies, "MIPS32 Architecture for Programmers Vol I: Intro"
// :Mv2: MIPS Technologies, "MIPS32 Architecture for Programmers Vol II: Instr"
////////////////////////////////////////////////////////////////////////////////
/// Class MIPS Implementations
/// WARNING: Links below to code will change in April and May 2002.
/// Summary of Implementation Types
// All Verilog descriptions are synthesizable.
// They execute only a subset of MIPS instructions.
/// Functional Simulator (EE 3755)
//
// http://www.ece.lsu.edu/3755/2002/mips_fs.html
//
// Not intended for synthesis, but synthesizable nevertheless.
// Illustrates basic implementation techniques.
// Can be used to validate other models.
/// Hardwired Control Implementation (EE 3755)
//
// http://www.ece.lsu.edu/3755/2002/mips_hc.html
//
// Low cost, small size.
//
// Processors implemented this way may be used where the very smallest
// size is needed, for example as part of a complete system on a
// chip used to control, say, a microwave oven. In reality,
// even low-cost MIPS implementations are pipelined, unlike this
// implementation (see below).
/// Microcoded Control Implementation (EE 3755)
//
// http://www.ece.lsu.edu/3755/2001f/mipsmc.html
//
// Low cost.
//
// Microcoding is usually applied to processors with more complex
// instruction sets.
/// Pipelined, Statically Scheduled (EE 4720)
//
// http://www.ece.lsu.edu/ee4720/v/mipspipeby.html
//
// Similar to hardwired control, but much faster at little additional cost.
// Many current processors are implemented this way, e.g., the UltraSPARC III.
//
/// Pipelined, Dynamically Scheduled (EE 4720)
//
// http://www.ece.lsu.edu/ee4720/v/mipspipeds.html
//
// Some additional speed, lots of additional complexity.
// Many current processors are implemented this way, e.g., the Pentium 4 and
// the MIPS R10000.
////////////////////////////////////////////////////////////////////////////////
/// MIPS Functional Simulator
// http://www.ece.lsu.edu/3755/2002/mips_fs.html
/// As simple as possible.
/// Uses
//
// Illustrate basics of implementation to students.
// Verify more complex implementations.
/// Key Characteristics
//
/// Each instruction executed in a single cycle.
//
// Unrealistic because it precludes hardware sharing, for example,
// using the ALU to compute a branch condition and target.
/// Memory is within processor.
//
// Unrealistic because memory is usually too large to fit on the
// same chip as the CPU.
//
// Even if some memory is on-chip, an interface to that memory
// (addr, we, etc.) is usually needed for external communication.
//
// Even if an external interface is present, we might not trust
// the synthesis program to properly infer a memory from our
// "casual" use of the mem array.
/// Almost no thought given to synthesized hardware.
//
// Synthesis programs cannot yet be trusted to do a good job.
// The problems above are only relevant if the description is synthesized.
// If simulation is our goal then the MIPS functional simulator
// implementation is fine because of its simplicity. For simulation
// a description should be as simple as possible to minimize the
// chance of errors.
////////////////////////////////////////////////////////////////////////////////
/// Hardwired Control MIPS Implementation
// http://www.ece.lsu.edu/3755/2002/mips_hc.html
/// Low Cost Implementation
/// Key Characteristics
//
/// Memory is outside the CPU.
//
// This is realistic: the CPU reaches memory through an explicit interface.
/// The ALU is a separate module.
//
// In the functional simulator a synthesis program might have
// synthesized an adder each place an addition operator is used, a
// bitwise AND for each AND, etc. The Hardwired code instantiates
// one ALU, so that's all the synthesis program will create.
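//
// A minimal sketch of the idea, not the course's actual ALU. The
// module name alu_sketch, its port names, and the 3-bit op encoding
// are assumptions made for illustration only. Because every operation
// sits in one module that is instantiated once, the synthesis program
// has just one place in which to build and share arithmetic hardware.
module alu_sketch(result, a, b, op);
   input [31:0]  a, b;
   input [2:0]   op;
   output [31:0] result;
   reg [31:0]    result;

   // Hypothetical operation encoding (not the encoding used in class).
   parameter OP_ADD = 3'd0, OP_SUB = 3'd1, OP_AND = 3'd2,
             OP_OR  = 3'd3, OP_SLT = 3'd4;

   always @( a or b or op )
     case ( op )
       OP_ADD  : result = a + b;
       OP_SUB  : result = a - b;
       OP_AND  : result = a & b;
       OP_OR   : result = a | b;
       // Signed less-than: if the signs differ, a is smaller exactly
       // when a is negative; otherwise an unsigned compare works.
       OP_SLT  : result = { 31'b0, ( a[31] != b[31] ) ? a[31] : ( a < b ) };
       default : result = 32'b0;
     endcase
endmodule
// The control logic would then contain a single instantiation, along
// the lines of: alu_sketch our_alu(alu_out, alu_a, alu_b, alu_op);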
/// Each instruction is executed in multiple cycles.
/// Further Refinements That Might Be Made
//
/// Separate out additional modules.
//
// In real systems arithmetic and shifting are done by separate
// units because shifting logic cannot be shared with arithmetic
// or logic.
//
// Dividing parts of the design into separate modules allows them
// to be developed concurrently and makes tuning of cost (gates)
// and performance (cycle time) easier.
//
// Humans know the GPR is a memory with two read ports and a write
// port. The synthesis program might not realize it (perhaps due to
// bad Verilog coding style) and instead provide more write ports
// than necessary or it might not realize it can use the
// streamlined memory cell provided in the target technology
// library. Having a separate GPR module can avoid these problems.
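//
// A minimal sketch of such a module, assuming hypothetical names
// (gpr_sketch, wr_reg, wr_val, we) that do not appear in the class
// code. The two read ports and the single write port are explicit,
// which leaves the synthesis program little room to infer extra ports.
module gpr_sketch(rs_val, rt_val, rs, rt, wr_reg, wr_val, we, clk);
   input [4:0]   rs, rt, wr_reg;
   input [31:0]  wr_val;
   input         we, clk;
   output [31:0] rs_val, rt_val;

   reg [31:0]    regs [0:31];

   // Two combinational read ports. Register 0 always reads as zero.
   assign rs_val = ( rs == 5'd0 ) ? 32'd0 : regs[rs];
   assign rt_val = ( rt == 5'd0 ) ? 32'd0 : regs[rt];

   // One write port, active on the positive clock edge when we is 1.
   always @( posedge clk ) if ( we ) regs[wr_reg] <= wr_val;
endmodule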
//
/// Perform some limited pipelining.
//
// The fetch of one instruction can be overlapped with the
// execution of the previous one. See Fall 2001 Homework 7
// Problem 2.
/// Refine the ALU design.
//
// Will the synthesis program efficiently combine the different
// operations in the ALU module? If not, redo the ALU with a
// lower level design, something like alu2 in l05.v (Section 4.5
// of the text).
/// Move code for simmed, etc. out of the procedural block to simplify hardware.
//
// The synthesis program might create more registers and other
// logic than necessary. For example, the assignment
rs_val = gpr[rs];
// in the ID state might force the synthesis program to use a
// register for rs_val (if it were accessed outside the ID state).
// Making that a continuous assignment outside of procedural code
// would solve the problem:
wire [31:0] rs_val = gpr[rs];
// Another candidate for de-proceduralization is:
{opcode,rs,rt,rd,sa,func} = ir;
// since the fields are always the same bits of ir, and ir doesn't change
// while an instruction executes.
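//
// A sketch of the field split written as a continuous assignment. The
// module wrapper ir_fields_sketch is an assumption added only to make
// the fragment self-contained; in the processor the assign would sit
// alongside the other declarations, outside the procedural block.
module ir_fields_sketch(opcode, rs, rt, rd, sa, func, ir);
   input [31:0] ir;
   output [5:0] opcode, func;
   output [4:0] rs, rt, rd, sa;

   // 6 + 5 + 5 + 5 + 5 + 6 = 32 bits. This is only wiring, so no
   // registers are needed for the fields.
   assign {opcode, rs, rt, rd, sa, func} = ir;
endmodule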
/// Check timing and move hardware between states.
//
// For example, "move" the alu input multiplexors from ID to the
// EX state. The ID state would compute an alu control input,
// like alu_a_src in the microcoded design.
/// Use the ALU to compute NPC + 4.
/// Add the remaining MIPS instructions!
////////////////////////////////////////////////////////////////////////////////
/// Microcoded Control MIPS Implementation
// http://www.ece.lsu.edu/3755/2001f/mipsmc.html
/// Different Type of Control Logic
// In the Hardwired Control Implementation the control signals (the
// control inputs to the multiplexors and the enable inputs to the
// registers) are output by combinational logic. (The combinational
// logic is generated from the case expressions in case statements
// and the condition in if statements. To reduce the number of
// multiplexor inputs, optimization might make this logic quite
// complex.)
// In the Microcoded Control Implementation the control signals are
// generated by a small computer, called the microcontroller.
// Microcode is used in older mainframe-class systems.
// Microprocessors might use microcode, or something like it, for
// only a few instructions.
// A microcoded MIPS implementation is very unlikely.
/// Key Characteristics
//
/// Control signals generated by microcontroller.
//
/// Control logic is very simple, complexity is in the microprogram ROM.
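//
// A much-simplified sketch of the idea, not the class implementation.
// The module name, the 7-bit micro-instruction layout, the signals
// gpr_we and ir_we, and the ROM contents are all assumptions chosen
// for illustration. A real microcontroller would also branch on the
// opcode, which is omitted here.
module micro_ctl_sketch(alu_a_src, gpr_we, ir_we, reset, clk);
   input        reset, clk;
   output [1:0] alu_a_src;
   output       gpr_we, ir_we;

   reg [2:0]    upc;      // Micro program counter.
   reg [6:0]    uinsn;    // Current micro-instruction (ROM output).
   wire [2:0]   next_upc;

   // Micro ROM. Hypothetical layout: {alu_a_src, gpr_we, ir_we, next_upc}
   always @( upc )
     case ( upc )
       3'd0    : uinsn = { 2'd0, 1'b0, 1'b1, 3'd1 };  // Fetch: load ir.
       3'd1    : uinsn = { 2'd1, 1'b0, 1'b0, 3'd2 };  // Decode: select ALU input.
       3'd2    : uinsn = { 2'd1, 1'b1, 1'b0, 3'd0 };  // Execute: write GPR.
       default : uinsn = { 2'd0, 1'b0, 1'b0, 3'd0 };  // Unused: return to fetch.
     endcase

   // The control logic itself is just wiring from the ROM output.
   assign {alu_a_src, gpr_we, ir_we, next_upc} = uinsn;

   always @( posedge clk ) upc <= reset ? 3'd0 : next_upc;
endmodule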
/// Further Refinements That Might Be Made
/// Make part of micro ROM writable.
//
// This would allow new instructions to be added after
// the part is manufactured. (And yes, if it were writable
// it probably shouldn't be called a ROM anymore.)
/// Implement all instructions!
////////////////////////////////////////////////////////////////////////////////
/// Other Implementations
// http://www.ece.lsu.edu/ee4720
/// Implementation Techniques Covered in EE 4720
/// Pipelined, Statically Scheduled
//
// http://www.ece.lsu.edu/ee4720/v/mipspipeby.html
//
// Overlap execution states, so the ID for one instruction
// is done at the same time as IF for the next one.
//
// "States" for a pipelined processor are carefully chosen to
// allow overlap.
//
// One section of hardware, called a stage, is dedicated to doing IF,
// one is dedicated to doing ID, and so on.
//
// Each stage does what the corresponding state did in the hardwired
// processor.
//
// The stages are: IF, ID, EX, MEM, WB. The last is used to write
// back the register file.
//
// At any cycle there can be up to five instructions in the processor,
// each in a different stage.
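//
// A sketch showing only how instructions advance from stage to stage.
// The module and signal names are assumptions. No stage does any work
// here, and stalls and hazards are ignored. Each register holds the
// instruction currently occupying that stage, so together with the
// instruction being fetched there can be five in flight at once.
module pipe_sketch(insn_id, insn_ex, insn_mem, insn_wb, insn_if, clk);
   input [31:0]  insn_if;   // Instruction fetched by IF this cycle.
   input         clk;
   output [31:0] insn_id, insn_ex, insn_mem, insn_wb;
   reg [31:0]    insn_id, insn_ex, insn_mem, insn_wb;

   // On every clock edge each instruction moves to the next stage.
   always @( posedge clk ) begin
      insn_id  <= insn_if;
      insn_ex  <= insn_id;
      insn_mem <= insn_ex;
      insn_wb  <= insn_mem;
   end
endmodule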
//
// RISC ISAs (MIPS is one, IA-32 [80x86] is not) were designed to
// facilitate pipelined implementations, which is why most (if not
// all) MIPS implementations are pipelined.
//
// This pipelining opens up a can of worms, which we'll have fun
// playing with in EE 4720.
/// Pipelined, Dynamically Scheduled
//
// http://www.ece.lsu.edu/ee4720/v/mipspipeds.html
//
// Some instructions take a long time to produce results, floating-point
// instructions and memory loads, for example. (This problem
// was completely avoided in EE 3755 by omitting floating-point
// instructions and having perfect memory.)
//
// In a dynamically scheduled processor, instructions are fetched and
// decoded in program order, but they execute when their operands
// become available, which might not be in program order. An
// instruction waiting for its operands (because a previous
// instruction is taking a long time) does not prevent instructions
// that follow it from being fetched, decoded, and executed.