EE 4720 1997 Midterm Exam Solution

Problem 1

Part (a)

The modified pipe below implements a high-speed version of mems, the modified portions are shown in red. Pipeline modification for problem 1

mems executes as follows:

      0   1   2   3    4        19   20
mems  IF  ID  EX₁ MEM₁ MEM₁ ... MEM₁ MEM₁ MEM₂ WB
                  EX₂  EX₂      EX₂  EX₃

As usual, the instruction is fetched in IF. In ID rs1 (e.g., r1) is loaded into latch buffer A and rs2 (e.g., r2) is loaded into B. The EX stage is used three different ways, EX₁, EX₂, and EX₃. In EX₁ the address, rs2, is passed through the ALU unchanged. The ID/EX latches are not clocked until the cycle after the element is found, so that rs1 is held in the B buffer. (rs2 is also held, but is not needed.) In MEM₁ the element is retrieved from memory while at the same time the address of the next element is computed in EX₂. Using an added comparison unit, EX₂ also compares the element retrieved in the previous cycle with rs1. The mems instruction will continue using the EX and MEM stages (inserting NOPs into WB and stalling instructions in IF and ID), until the comparison unit signals a match (cycle 19 in the example). In the cycle following a match (20) the execute stage performs EX₃, subtracting 4 to compute the address of the element found in the previous cycle. The address makes its way back to rs2 in MEM₂ and WB.

Part (b)

LOOP:
 lw   r3, 0(r2)
 addi r2, r2, #4	! Words are four bytes long.
 seq  r3, r3, r1	! Compare. Note that lw stall avoided.
 beqz r3, LOOP
 subi r2, r2, #4	! Saves a branch inside or before loop.

Part (c)

The mems instruction executes for three cycles before reaching MEM₁. MEM₁ is used once for each element fetched, plus 1, for a total of 1001 cycles. Two more cycles complete execution for a total of 1006 cycles.

An iteration of the program above executes as follows (on a implementation that includes bypass paths)

      0	  1   2   3   4   5   6   7
lw    IF  ID  EX  MEM WB          IF
addi	  IF  ID  EX  MEM WB
seq	      IF  ID  EX  MEM WB
beqz		  IF  ID  EX  MEM WB

Each iteration takes seven cycles, one additional cycle is needed after the last iteration to complete the subtraction. The total number of cycles is 7 × 1000 + 1 = 7001.

Advanced techniques, covered in chapter 4, will reduce mems advantage.

Problem 2

Part (a)

     0   1   2   3   4
lw   IF  ID  EX  MEM WB
add	 IF  ID  EX  WB

The hazard occurs in cycle 4.

Part (b)

The modifications (additions) are shown in red.
Problem 2b solution.

If the "=ARITH" comparison unit in the EX stage detects an arithmetic instruction, the IR and ALU outputs are written to the MEM/WB latches, skipping a stage. Note that this replaces whatever was in MEM, resulting in incorrect execution if a WB structural hazard is present.

Part (c)

The structural hazard could be avoided by adding a second write port to the register file (additional changes would have to be made to the part (b) solution) or by having an arithmetic instruction execute normally if skipping the MEM stage would result in a structural hazard.

Part (d)

The benefit of skipping the MEM stage is that registers are written one cycle early. If register forwarding is not used then early writeback reduces the number of stall cycles due to RAW hazards, which is worthwhile. If register forwarding is used, then there is no benefit to early writeback since results from any arithmetic instruction are available to instructions immediately following.

Problem 3

Part (a)

An undefined instruction in an ISA might still do something useful in an implementation of that ISA. (This might happen by accident.) Suppose programmers discover such an instruction in a current implementation and use it in their code. These programs would run fine on the current implementation, but the computer engineers working on the new implementation of the ISA may make no attempt to see that undefined instructions execute the same way. (Why should they, the instructions were undefined and programmers were warned not to use them.) Therefore, computers using the new chip may not be able to run the programs using the undefined instructions.

The problem is avoided if undefined instructions raise an illegal instruction exception.

Part (b)

The target address of a trap instruction is found in a table using an index provided in the instruction. The target address of a subroutine call is given in the instruction, be it PC relative, register indirect, immediate, etc.

The target address of a trap instruction is typically in the system part of the address space while subroutines are in the user part.

As a result of executing a trap, the processor switches to privileged mode, while subroutine calls do not change the processor mode.

Part (c)

All instructions that read registers have the register numbers in the same place, bits 6..10 and 11..15, so they can be fetched without knowing the identity of the instruction. There is no harm in fetching registers for an instruction that does not need them, so registers can be fetched for any instruction and so no decoding is necessary.

Part (d)

Program weights in a geometric mean of normalized execution times (or rates) do not depend on their run time on the base machine, so a particular base machine will not exaggerate or diminish the impact of any program in the suite.

Part (e)

The clock rate might change because of faster technology (e.g., the new fabrication process has a smaller feature size) or because engineers chose to make pipeline segments shorter, allowing for a faster clock.

The instruction count might change in two implementations of the same ISA because relative execution times for individual instructions might change (although the instructions available in the old and new implementations are the same), leading programmers to re-write time-critical portions. For example, an integer add might be replaced by a speeded-up floating-point add, perhaps saving integer-to-floating-point and floating-point-to-integer conversion steps.

CPI might change because the number of stall cycles was reduced using advanced techniques, etc.

David M. Koppelman - koppel@ee.lsu.edu

Modified 17 Mar 1998 17:31 (23:31 UTC)