l13.v

/// LSU EE 3755 -- Fall 2001 -- Computer Organization
//
/// Note Set 13 -- Our MIPS Implementations

// Time-stamp: <5 December 2001, 18:29:25 CST, koppel@sol>

These notes are outdated. The lectures page contains links to the latest versions of these notes.

/// Contents
//
// Class MIPS Implementations
// Strawman MIPS Implementation
// Hardwired Control MIPS Implementation
// Microcoded Control MIPS Implementation
// Other Implementations

/// References
//
// :P:   Palnitkar, "Verilog HDL"
// :Q:   Qualis, "Verilog HDL Quick Reference Card Revision 1.0"
// :H:   Hyde, "Handbook on Verilog HDL"
// :LRM: IEEE, Verilog Language Reference Manual (Hawaii Section Numbering)
// :PH:  Patterson & Hennessy, "Computer Organization & Design"
// :HP:  Hennessy & Patterson, "Computer Architecture: A Quantitative Approach"
// :Mv1: MIPS Technologies, "MIPS32 Architecture for Programmers Vol I: Intro"
// :Mv2: MIPS Technologies, "MIPS32 Architecture for Programmers Vol II: Instr"

////////////////////////////////////////////////////////////////////////////////
/// Class MIPS Implementations

/// Summary of Implementation Types

// All Verilog descriptions are synthesizable.
// They execute only a subset of MIPS instructions.

/// Strawman Implementation (EE 3755)
//
// http://www.ece.lsu.edu/3755/2001f/mips.html
//
// Not intended for synthesis.
// Illustrates basic implementation techniques.
// Can be used to validate other models.

/// Hardwired Control Implementation (EE 3755)
//
// http://www.ece.lsu.edu/3755/2001f/mips3lt.html
//
// Low cost, small size.
//
// Processors implemented this way may be used where the smallest
// size is needed, for example as part of a complete system on a
// chip used to control, say, a microwave oven.

/// Microcoded Control Implementation (EE 3755)
//
// http://www.ece.lsu.edu/3755/2001f/mipsmc.html
//
// Low cost.
//
// Microcoding usually applied to processors with more complex
// instruction sets.

/// Pipelined, Statically Scheduled (EE 4720)
//
// Similar to hardwired control, but much faster at little additional cost.
// Many current processors implemented this way, e.g., Ultrasparc-III

/// Pipelined, Dynamically Scheduled (EE 4720)
//
// Some additional speed, lots of additional complexity.
// Many current processors implemented this way, e.g., Pentium IV, MIPS R10000

////////////////////////////////////////////////////////////////////////////////
/// Strawman MIPS Implementation

// http://www.ece.lsu.edu/3755/2001f/mips.html

/// Simple as possible.

/// Uses
//
// Illustrate basics of implementation to students.
// Verify more complex implementations.

/// Key Characteristics
//
/// Each instruction executed in a single cycle.
//
// Unrealistic because it precludes hardware sharing, for example,
// using the ALU to compute a branch condition and target.

/// Memory is within processor.
//
// Unrealistic because memory is usually too large to fit on the
// same chip as the CPU.
//
// Even if some memory is on-chip an interface to that memory
// (addr, we, etc) is usually needed for external communication.
//
// Even if an external interface is present, we might not trust
// the synthesis program to properly infer a memory from our
// "casual" use of the mem array.

/// Almost no thought given to synthesized hardware.
//
// Synthesis programs cannot yet be trusted to do a good job.

// The problems above are only relevant if the description is synthesized.
// If simulation is our goal then the Strawman MIPS implementation is
// fine because of its simplicity. For simulation a description should
// be as simple as possible to minimize the chance of errors.

////////////////////////////////////////////////////////////////////////////////
/// Hardwired Control MIPS Implementation

// http://www.ece.lsu.edu/3755/2001f/mips3lt.html

/// Low Cost Implementation

/// Key Characteristics
//
/// Memory is outside the CPU.
//
// This is the way it should be.

/// The ALU is a separate module.
//
// In the Strawman Implementation a synthesis program might have
// synthesized an adder each place an addition operator is used, a
// bitwise AND for each AND, etc. The Hardwired code instantiates
// one ALU, so that's all the synthesis program will create.
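// The idea of a single, separately instantiated ALU can be sketched
// roughly as below. This is only an illustration: the module names,
// port names, and operation encodings are assumptions and are not
// taken from the class's mips3lt code. The point is that the
// arithmetic operators appear in exactly one module, and the CPU
// selects operands with multiplexors in front of that one instance.

```verilog
// Sketch only: names and op encodings are hypothetical.

module alu_sketch(result, a, b, op);
   output [31:0] result;
   input [31:0]  a, b;
   input [1:0]   op;
   reg [31:0]    result;

   parameter     OP_ADD = 2'd0, OP_SUB = 2'd1, OP_AND = 2'd2, OP_OR = 2'd3;

   // Combinational logic. All four values of op are covered, so no
   // storage is implied for result.
   always @( a or b or op )
     case ( op )
       OP_ADD: result = a + b;
       OP_SUB: result = a - b;
       OP_AND: result = a & b;
       OP_OR:  result = a | b;
     endcase
endmodule

// The user of the ALU chooses operands with multiplexors and
// instantiates the ALU exactly once.
module alu_user_sketch(alu_out, rs_val, rt_val, pc, imm, a_is_pc, b_is_imm, op);
   output [31:0] alu_out;
   input [31:0]  rs_val, rt_val, pc, imm;
   input         a_is_pc, b_is_imm;
   input [1:0]   op;

   wire [31:0]   alu_a = a_is_pc  ? pc  : rs_val;
   wire [31:0]   alu_b = b_is_imm ? imm : rt_val;

   alu_sketch alu(alu_out, alu_a, alu_b, op);
endmodule
```

// With this arrangement a register-register add and a branch target
// computation (pc plus an immediate) both use the same adder; only
// the multiplexor control inputs differ.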
/// Each instruction is executed in multiple cycles.

/// Further Refinements That Might Be Made
//
/// Separate out additional modules.
//
// In real systems arithmetic and shifting are done by separate
// units because shifting logic cannot be shared with arithmetic
// or logic.
//
// Humans know the GPR is a memory with two read ports and a write
// port. The synthesis program might not realize it (perhaps due
// to bad Verilog coding style) and instead provide more write ports
// than necessary or it might not realize it can use the
// streamlined memory cell provided in the target technology
// library. Having a separate GPR module can avoid these problems.
//
/// Perform some limited pipelining.
//
// The fetch of one instruction can be overlapped with the
// execution of the previous one. See Fall 2001 Homework 7
// Problem 2.

/// Refine the ALU design.
//
// Will the synthesis program efficiently combine the different
// operations in the ALU module? If not, re-do the alu with a
// lower level design, something like alu2 in l05.v (Section 4.5
// of the text).

/// Move code for simmed, etc. out of the procedural block to simplify hardware.
//
// The synthesis program might create more registers and other
// logic than necessary. For example, the assignment rs_val = gpr[rs];
// in the ID state might force the synthesis program to use a
// register for rs_val (if it were accessed outside the ID state).
// Making that a continuous assignment outside of procedural code
// would solve the problem: wire [31:0] rs_val = gpr[rs];
// Another candidate for de-proceduralization is: {opcode,rs,rt,rd,sa,func} = ir;
// since ir doesn't change.

/// Check timing and move hardware between states.
//
// For example, "move" the alu input multiplexors from ID to the
// EX state. The ID state would compute an alu control input,
// like alu_a_src in the microcoded design.

/// Use the ALU to compute NPC + 4.

/// Add the remaining MIPS instructions!
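// Two of the refinements above, a separate GPR module and moving
// field extraction out of procedural code, might look something like
// the sketch below. The module and port names are assumptions made
// for illustration and do not come from the class code; special
// handling of register 0 is omitted for brevity.

```verilog
// Sketch only: module and port names are hypothetical.

// Register file with exactly two read ports and one write port.
module gpr_sketch(rs_val, rt_val, rs, rt, wr_reg, wr_val, wr_en, clk);
   output [31:0] rs_val, rt_val;
   input [4:0]   rs, rt, wr_reg;
   input [31:0]  wr_val;
   input         wr_en, clk;

   reg [31:0]    gpr [0:31];

   // Read ports are continuous assignments, so no registers are
   // implied for rs_val and rt_val.
   assign        rs_val = gpr[rs];
   assign        rt_val = gpr[rt];

   // The single write port.
   always @( posedge clk ) if ( wr_en ) gpr[wr_reg] <= wr_val;
endmodule

// Instruction fields split by a continuous assignment rather than
// by an assignment inside a procedural block.
module decode_fields_sketch(opcode, rs, rt, rd, sa, func, ir);
   output [5:0] opcode, func;
   output [4:0] rs, rt, rd, sa;
   input [31:0] ir;

   // Field widths are 6+5+5+5+5+6 = 32 bits.
   assign       {opcode, rs, rt, rd, sa, func} = ir;
endmodule
```

// Because the port structure is explicit in gpr_sketch, there is less
// room for a synthesis program to infer extra write ports, and the
// module could later be replaced by a technology-library memory cell
// without touching the rest of the design.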
////////////////////////////////////////////////////////////////////////////////
/// Microcoded Control MIPS Implementation

// http://www.ece.lsu.edu/3755/2001f/mipsmc.html

/// Different Type of Control Logic

// In the Hardwired Control Implementation the control signals (the
// control inputs to the multiplexors and the enable inputs to the
// registers) are output by combinational logic. (The combinational
// logic is generated from the case expressions in case statements
// and the condition in if statements. To reduce the number of
// multiplexor inputs, optimization might make this logic quite
// complex.)

// In the Microcoded Control Implementation the control signals are
// generated by a small computer, called the microcontroller.

/// Key Characteristics
//
/// Control signals generated by microcontroller.
//
/// Control logic is very simple, complexity is in the microprogram ROM.

/// Further Refinements That Might Be Made

/// Make part of micro ROM writable.
//
// This would allow new instructions to be added after
// the part is manufactured. (And yes, if it were writable
// it probably shouldn't be called a ROM anymore.)

/// Implement all instructions!

////////////////////////////////////////////////////////////////////////////////
/// Other Implementations

// http://www.ece.lsu.edu/ee4720

/// Implementation Techniques Covered in EE 4720

/// Pipelined, Statically Scheduled
//
// Overlap execution states, so the ID for one instruction
// is done at the same time as IF for the next one.
//
// "States" for a pipelined processor are carefully chosen to
// allow overlap.
//
// One section of hardware, called a stage, is dedicated to doing IF,
// one is dedicated to doing ID, and so on.
//
// Each stage does what the corresponding state did in the hardwired
// processor.
//
// The stages are: IF, ID, EX, MEM, WB. The last is used to write
// back the register file.
//
// At any cycle there can be up to five instructions in the processor,
// each in a different stage.
//
// This pipelining opens up a can of worms, which we'll have fun playing
// with in EE 4720.

/// Pipelined, Dynamically Scheduled
//
// Some instructions take a long time to produce results, floating
// point instructions and memory loads, for example. (This problem
// was completely avoided in EE 3755 by omitting floating-point
// instructions and having perfect memory.)
//
// In a dynamically scheduled processor instructions are fetched and
// decoded in program order, but execute when their operands are
// available. An instruction waiting for its operands (because a
// previous instruction is taking a long time) does not prevent
// instructions that follow it from being fetched, decoded, and
// executed.