EE 4720 Computer Architecture - HW 4 Solution (Spring 1998)

Problem 1, EE 4720 HW 4 Solution

The most informative notation would indicate both the hardware an instruction is in and how far along its execution is. (The existing notation shows only how far along execution an instruction is, which unambiguously identifies the hardware only when the initiation interval is one.) In one possible solution the location is indicated with a three-part label. The first part shows the functional unit using a capital letter (A for add, etc.). The second part, in parentheses, shows which segment of the functional unit the instruction is in. The third part shows the execution step. For example, A(2)2 shows an instruction in the second adder segment, which is also the second step of execution. (When the initiation interval is one, the segment number and the step number will always be the same.) As another example, D(1)20 shows an instruction in the one-and-only divide unit segment, in the twentieth step of execution.

The notation is used in the pipeline execution below, which runs on the implementation described in problem 2.

div f11, f4, f5 IF ID D(1)1 D(1)2 D(1)3 D(1)4 D(1)5 D(1)6 D(1)7 D(1)8 D(1)9
mul f0, f1, f2     IF ID    M(1)1 M(1)2 M(2)3 M(2)4 MEM   WB
mul f3, f6, f7        IF    ID    ----> M(1)1 M(1)2 M(2)3 M(2)4 MEM   WB
sub f8, f9, f10             IF    ----> ID    A(1)1 A(2)2 A(3)3 -->   MEM   WB
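
The labels in the diagram above can be generated mechanically. The following is a minimal Python sketch, not part of the original solution. It assumes that each segment of a functional unit holds an instruction for a number of consecutive steps equal to the initiation interval, which matches the multiply and add labels shown above. The function names and the divide initiation interval of 20 used in the usage note are hypothetical.

```python
def stage_label(unit: str, step: int, initiation_interval: int = 1) -> str:
    """Return a three-part label such as 'M(2)3' for one execution step.

    Assumption (not stated in the problem): every segment of the unit is
    occupied for `initiation_interval` consecutive steps.
    """
    if step < 1 or initiation_interval < 1:
        raise ValueError("step and initiation_interval must be positive")
    # Steps 1..ii fall in segment 1, steps ii+1..2*ii in segment 2, and so on.
    segment = (step - 1) // initiation_interval + 1
    return f"{unit}({segment}){step}"


def execution_labels(unit: str, steps: int, initiation_interval: int = 1) -> list:
    """Return the labels for every execution step of one instruction."""
    return [stage_label(unit, s, initiation_interval) for s in range(1, steps + 1)]


print(" ".join(execution_labels("M", 4, 2)))  # M(1)1 M(1)2 M(2)3 M(2)4
print(" ".join(execution_labels("A", 3, 1)))  # A(1)1 A(2)2 A(3)3
```

With an initiation interval of one the segment and step always agree, as in the adder labels. For an unpipelined unit such as the divider, any initiation interval at least as large as the step number gives segment 1, so stage_label("D", 20, 20) returns D(1)20.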

Problem 2, EE 4720 HW 4 Solution

The problem did not specify how WAW hazards were to be resolved. They could be resolved by stalling the second multiply so that it writes after the divide, or by cancelling the divide when the multiply is in either ID or WB. (The divide can be cancelled because no instruction reads the value it produces.) The second approach will be used since it does not stall following instructions. The execution diagram appears below, using the notation from problem 1.

Time           0   1  2     3     4     5     6     7     8     9     10    11    12
div f3, f4, f5 IF  ID D(1)1 D(1)2 D(1)3 D(1)4 D(1)5 D(1)6 D(1)7 D(1)8 x
mul f0, f1, f2     IF ID    M(1)1 M(1)2 M(2)3 M(2)4 MEM   WB
mul f3, f6, f7        IF    ID    ----> M(1)1 M(1)2 M(2)3 M(2)4 MEM   WB
sub f8, f9, f10             IF    ----> ID    A(1)1 A(2)2 A(3)3 -->   MEM   WB
mul f11, f0, f12                        IF    ID    M(1)1 M(1)2 M(2)3 M(2)4 MEM   WB

The execution above has two stalls, one in cycle 4 due to the multiply-unit structural hazard, the other in cycle 9 due to the memory stage structural hazard.
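
The cancellation rule used for WAW hazards can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the hardware's actual logic: it only compares destination registers and writeback cycles, and it does not check that no intervening instruction reads the cancelled result (true here, as noted above). The writeback cycles for the multiplies and the subtract are read from the diagram; the divide's writeback cycle of 30 is a hypothetical placeholder, since the divide never reaches WB in the diagram.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    text: str       # assembly text, used only for reporting
    dest: str       # destination register
    wb_cycle: int   # cycle in which the instruction would write its result


def waw_cancellations(program):
    """Return (earlier, later) pairs in which `earlier` is cancelled.

    An earlier instruction is cancelled when a later instruction in program
    order writes the same register first. Source operands are not checked.
    """
    cancelled = []
    for i, earlier in enumerate(program):
        for later in program[i + 1:]:
            if earlier.dest == later.dest and earlier.wb_cycle > later.wb_cycle:
                cancelled.append((earlier.text, later.text))
                break  # one overtaking writer is enough to cancel `earlier`
    return cancelled


program = [
    Instr("div f3, f4, f5", "f3", 30),    # hypothetical WB cycle (assumption)
    Instr("mul f0, f1, f2", "f0", 8),
    Instr("mul f3, f6, f7", "f3", 10),
    Instr("sub f8, f9, f10", "f8", 11),
    Instr("mul f11, f0, f12", "f11", 12),
]

for earlier, later in waw_cancellations(program):
    print(f"cancel '{earlier}' because '{later}' writes the same register first")
```

Only the divide is reported, which agrees with the x shown in its row when the second multiply reaches WB.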

Problem 3, EE 4720 HW 4 Solution

As with the previous problem, WAW hazards will be handled by cancelling the first instruction writing a register when the second instruction writing that register is in the WB stage.

Time           0   1  2     3     4     5     6     7     8     9     10    11    12    13
div f3, f4, f5 IF  ID D(1)1 D(1)2 D(1)3 D(1)4 D(1)5 D(1)6 D(1)7 D(1)8 x
mul f0, f1, f2     IF ID    M(1)1 M(1)2 M(2)3 M(2)4 MEM   WB
mul f3, f6, f7        IF    ID    ----> M(1)1 M(1)2 M(2)3 M(2)4 MEM   WB
sub f8, f9, f10             IF    ----> ID    ----> A(1)1 A(2)2 A(3)3 MEM   WB
mul f11, f0, f12                        IF    ----> ID    M(1)1 M(1)2 M(2)3 M(2)4 MEM   WB

Notice that the ID-stage stall delays the third multiply by one cycle.

Problem 4, EE 4720 HW 4 Solution

In the execution below, the integer unit uses reservation stations 3 and 4. Branch instructions are shown stopping after ID since they do nothing useful after that, so no reservation stations are shown for them. (If the branch had to wait for r1 it would sit in a reservation station, which would be shown.) The execution is shown until cycle 25, two cycles after the second multiply writeback.

Time     0    1  2      3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25
addi     IF   ID 3:EX 3:WB
LOOP:
ld            IF   ID 4:EX 4:ME 4:WB   IF   ID 4:EX 4:ME 4:WB   IF   ID 4:EX 4:ME 4:WB   IF   ID 4:EX 4:ME 4:WB                       IF
subi               IF   ID 3:EX ---> 3:WB   IF   ID 3:EX ---> 3:WB   IF   ID 3:EX ---> 3:WB   IF   ID 3:EX ---> 3:WB   
mul                     IF   ID 1:M1 1:M2 1:M3 1:M4 1:M5 1:M6 1:M7 1:M8 1:M9 1:WB
                                                 IF   ID 2:RS 2:RS 2:RS 2:RS 2:M1 2:M2 2:M3 2:M4 2:M5 2:M6 2:M7 2:M8 2:M9 2:WB
                                                                          IF   ID 1:RS 1:RS 1:RS 1:RS 1:RS 1:RS 1:RS 1:RS 1:M1 1:M2 1:M3
                                                                                                   IF   ID ------------------> 2:RS 2:RS
bneq                         IF   ID                  IF   ID                  IF   ID                  IF ------------------>   ID
I1                                IF                       IF                       IF                                           IF

Before reservation stations run out, each iteration of the loop above takes five cycles; after they run out, nine cycles per iteration will be needed. The reservation stations allow the "integer" part of the loop to get several cycles ahead of the floating-point part.

Problem 5, EE 4720 HW 4 Solution

In the solution to the previous problem, in cycle 9 the multiply instruction is in ID for the second time. It can move into the execute stage only if the result from the previous iteration is ready. Since the first multiply is in ID in cycle 4, the multiply unit would have to produce a result in 5 (or fewer) cycles to avoid delaying the second multiply. If the execution of the multiply is delayed by any amount, all reservation stations will eventually be used up. A multiply unit that produces a result in 5 cycles has a latency of 4.
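
The arithmetic above can be written out as a short Python sketch. The function name is hypothetical, and the sketch follows the convention used in this solution that a unit producing a result in n cycles has a latency of n - 1.

```python
def max_multiply_latency(first_id_cycle: int, next_id_cycle: int) -> int:
    """Largest multiply latency that does not delay the next iteration.

    The result must be ready within the cycles separating the two times the
    multiply is in ID; latency is taken to be one less than that count.
    """
    cycles_available = next_id_cycle - first_id_cycle
    return cycles_available - 1


# The multiply is in ID in cycle 4 and again in cycle 9.
print(max_multiply_latency(4, 9))  # 4
```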

David M. Koppelman - koppel@ee.lsu.edu

Modified 12 Apr 1999 13:46 (18:46 UTC)