Problem 1,
EE 4720 HW 4 Solution

The most informative notation would indicate both the hardware an instruction is in and how far along in execution it is. (The existing notation shows only how far along execution an instruction is, which unambiguously identifies the hardware only when the initiation interval is one.) In one possible solution the location is indicated with a 3-part label. The first part shows the functional unit using a capital letter (A for add, etc.). The second part, in parentheses, shows which part of the functional unit the instruction is in. The third part shows the execution step. For example, A(2)2 shows an instruction in the second adder segment, which is also the second step of execution. (When the initiation interval is one, the functional-unit part and the step will always be the same.) As another example, D(1)20 shows an instruction in the one-and-only divide unit part, in the twentieth step of execution.

The notation is used in the pipeline execution below, which runs on the implementation described in Problem 2.

div f11, f4, f5   IF    ID    D(1)1 D(1)2 D(1)3 D(1)4 D(1)5 D(1)6 D(1)7 D(1)8 D(1)9
mul f0, f1, f2          IF    ID    M(1)1 M(1)2 M(2)3 M(2)4 MEM   WB
mul f3, f6, f7                IF    ID    ----> M(1)1 M(1)2 M(2)3 M(2)4 MEM   WB
sub f8, f9, f10                     IF    ----> ID    A(1)1 A(2)2 A(3)3 -->   MEM   WB
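The 3-part labels can be generated mechanically. The following is a minimal Python sketch, not part of the original solution; the function name `stage_labels` and its parameters are hypothetical, and it assumes that an instruction spends exactly one initiation interval in each segment of a functional unit, which matches the multiply and add labels shown above.

```python
def stage_labels(unit, steps, initiation_interval):
    """Return the 3-part labels an instruction passes through in one unit.

    unit: capital letter naming the functional unit (e.g. "M").
    steps: number of execution steps the unit needs.
    initiation_interval: cycles spent in each segment (assumption of
        this sketch: every segment is occupied for the same time).
    """
    labels = []
    for step in range(1, steps + 1):
        # Segments are numbered from 1; the instruction moves to the
        # next segment every `initiation_interval` steps.
        segment = (step - 1) // initiation_interval + 1
        labels.append(f"{unit}({segment}){step}")
    return labels


print(" ".join(stage_labels("M", 4, 2)))  # multiply: two 2-cycle segments
print(" ".join(stage_labels("A", 3, 1)))  # add: initiation interval of one
print(stage_labels("D", 20, 20)[-1])      # single-segment unit, step 20
```

With an initiation interval of one the segment number and the step number coincide, as noted in the text; with a single-segment unit the segment number is always 1.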

Problem 2

The problem did not specify how WAW hazards were to be resolved. They could be resolved by stalling the second multiply so that it writes after the divide, or by canceling the divide when the multiply is in either ID or WB. (The divide can be canceled because no instruction reads the value it produces.) The second approach will be used since it does not stall following instructions. The execution diagram appears below, using the notation from Problem 1.

Time              0     1     2     3     4     5     6     7     8     9     10    11    12
div f3, f4, f5    IF    ID    D(1)1 D(1)2 D(1)3 D(1)4 D(1)5 D(1)6 D(1)7 D(1)8 x
mul f0, f1, f2          IF    ID    M(1)1 M(1)2 M(2)3 M(2)4 MEM   WB
mul f3, f6, f7                IF    ID    ----> M(1)1 M(1)2 M(2)3 M(2)4 MEM   WB
sub f8, f9, f10                     IF    ----> ID    A(1)1 A(2)2 A(3)3 -->   MEM   WB
mul f11, f0, f12                                IF    ID    M(1)1 M(1)2 M(2)3 M(2)4 MEM   WB

The execution above has two stalls: one in cycle 4 due to the multiply-unit structural hazard, the other in cycle 9 due to the memory-stage structural hazard.
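Both stalls follow the same rule: an instruction waits until the resource it needs is free. The sketch below illustrates that rule in Python; the helper `first_free_cycle` is hypothetical, and the busy cycles are read off the execution diagram for this problem rather than computed.

```python
def first_free_cycle(wanted, busy):
    """Return the first cycle >= wanted in which the resource is free."""
    cycle = wanted
    while cycle in busy:
        cycle += 1
    return cycle


# Multiply segment M(1): mul f0 occupies it in cycles 3 and 4,
# and mul f3 would otherwise enter it in cycle 4.
mul_f3_start = first_free_cycle(4, {3, 4})

# MEM stage: mul f0 uses it in cycle 7 and mul f3 in cycle 9,
# and sub would otherwise enter it in cycle 9.
sub_mem = first_free_cycle(9, {7, 9})

print(mul_f3_start - 4, sub_mem - 9)  # stall cycles for mul f3 and sub
```

Each instruction is delayed by one cycle, which corresponds to the `---->` in cycle 4 and the `-->` in cycle 9.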

Problem 3

As with the previous problem, WAW hazards will be handled by canceling the first instruction writing a register when the second instruction writing that register is in the WB stage.

Time              0     1     2     3     4     5     6     7     8     9     10    11    12    13
div f3, f4, f5    IF    ID    D(1)1 D(1)2 D(1)3 D(1)4 D(1)5 D(1)6 D(1)7 D(1)8 x
mul f0, f1, f2          IF    ID    M(1)1 M(1)2 M(2)3 M(2)4 MEM   WB
mul f3, f6, f7                IF    ID    ----> M(1)1 M(1)2 M(2)3 M(2)4 MEM   WB
sub f8, f9, f10                     IF    ----> ID    ----> A(1)1 A(2)2 A(3)3 MEM   WB
mul f11, f0, f12                                IF    ----> ID    M(1)1 M(1)2 M(2)3 M(2)4 MEM   WB

Notice that the ID-stage stall delays the third multiply by one cycle.
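The condition that makes canceling the divide safe (a later instruction writes the same register and nothing in between reads it) can be checked from the program text alone. The following Python sketch is an illustration only; `cancellable_writes` and the tuple encoding of instructions are hypothetical, and the check looks only at program order, not at whether the later instruction actually reaches WB first.

```python
def cancellable_writes(program):
    """Find earlier writes made dead by a later write to the same register.

    program: list of (dest, sources) tuples in program order.
    Returns a list of (earlier_index, later_index) pairs.
    """
    pairs = []
    for i, (dest, _) in enumerate(program):
        for j in range(i + 1, len(program)):
            later_dest, later_sources = program[j]
            if dest in later_sources:
                break  # the value is read, so the write must complete
            if later_dest == dest:
                pairs.append((i, j))  # overwritten before any read
                break
    return pairs


program = [
    ("f3", ("f4", "f5")),    # div f3, f4, f5
    ("f0", ("f1", "f2")),    # mul f0, f1, f2
    ("f3", ("f6", "f7")),    # mul f3, f6, f7
    ("f8", ("f9", "f10")),   # sub f8, f9, f10
    ("f11", ("f0", "f12")),  # mul f11, f0, f12
]
print(cancellable_writes(program))  # [(0, 2)]
```

The only pair reported is the divide (index 0) and the second multiply (index 2), the WAW hazard handled in the diagrams for Problems 2 and 3.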

Problem 4

In the execution below, the integer unit uses reservation stations 3 and 4. Branch instructions are shown stopping after ID since they do nothing useful after that, so no reservation station is shown for them. (If the branch had to wait for r1 it would sit in a reservation station, which would be shown.) The execution is shown until cycle 25, two cycles after the second multiply writeback.

Time      0    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25
addi      IF   ID   3:EX 3:WB
LOOP: ld       IF   ID   4:EX 4:ME 4:WB IF   ID   4:EX 4:ME 4:WB IF   ID   4:EX 4:ME 4:WB IF   ID   4:EX 4:ME 4:WB                     IF
subi                IF   ID   3:EX ---> 3:WB IF   ID   3:EX ---> 3:WB IF   ID   3:EX ---> 3:WB IF   ID   3:EX ---> 3:WB
mul                      IF   ID   1:M1 1:M2 1:M3 1:M4 1:M5 1:M6 1:M7 1:M8 1:M9 1:WB
                                                  IF   ID   2:RS 2:RS 2:RS 2:RS 2:M1 2:M2 2:M3 2:M4 2:M5 2:M6 2:M7 2:M8 2:M9 2:WB
                                                                           IF   ID   1:RS 1:RS 1:RS 1:RS 1:RS 1:RS 1:RS 1:RS 1:M1 1:M2 1:M3
                                                                                                    IF   ID   ------------------> 2:RS 2:RS
bneq                          IF   ID                  IF   ID                  IF   ID                  IF   ------------------> ID
I1                                 IF                       IF                       IF                                           IF

Before reservation stations run out, each iteration of the loop above takes five cycles; after they run out, nine cycles per iteration are needed. The reservation stations allow the "integer" part of the loop to get several cycles ahead of the floating-point part.
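The switch from five to nine cycles per iteration can be reproduced with a small model of the multiply instructions alone. This is a simplified sketch under stated assumptions, not the method used to draw the diagram: the function `multiply_schedule` and its parameters are hypothetical, each multiply is assumed to depend on the previous iteration's multiply, a multiply may start executing in the cycle its producer is in WB, and a reservation station becomes reusable the cycle after its occupant's WB. The default values (first ID in cycle 4, five instructions fetched per iteration, nine execute steps, two multiply reservation stations) are read from the diagram.

```python
def multiply_schedule(n, first_id=4, fetch_period=5, exec_cycles=9, stations=2):
    """Model when each iteration's multiply enters ID, starts, and writes back.

    Returns a list of (enter_id, start, wb) cycle tuples, one per iteration.
    """
    schedule = []
    enter_id = first_id
    for k in range(n):
        # The multiply leaves ID in the next cycle unless every station is
        # taken; a station is reusable the cycle after its occupant's WB.
        leave_id = enter_id + 1
        if k >= stations:
            leave_id = max(leave_id, schedule[k - stations][2] + 1)
        # Execution can begin in the cycle the previous multiply is in WB.
        start = leave_id if k == 0 else max(leave_id, schedule[k - 1][2])
        wb = start + exec_cycles
        schedule.append((enter_id, start, wb))
        # An ID stall delays everything fetched afterwards by the same amount.
        stall = leave_id - (enter_id + 1)
        enter_id += fetch_period + stall
    return schedule


ids = [enter for enter, _, _ in multiply_schedule(6)]
print(ids)                                     # [4, 9, 14, 19, 28, 37]
print([b - a for a, b in zip(ids, ids[1:])])   # [5, 5, 5, 9, 9]
```

The first four ID cycles (4, 9, 14, 19) and the multiply start cycles (5, 14, 23) agree with the diagram; the later values are extrapolations of the model, not taken from the original solution.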

Problem 5

In the solution to the previous problem, in cycle 9 the multiply instruction is in ID for the second time. It can move into the execute stage only if the result from the previous iteration is ready. Since the first multiply is in ID in cycle 4, the multiply unit would have to produce a result in 5 (or fewer) cycles to avoid delaying the second multiply. If the execution of the multiply is delayed by any amount, all reservation stations will eventually be used up. A multiply unit that produces a result in 5 cycles has a latency of 4.

David M. Koppelman - koppel@ee.lsu.edu | Modified 12 Apr 1999 13:46 (18:46 UTC)