# EE 4720 Computer Architecture - HW 3 Solution (Spring 1998)

 Problem 1,  EE 4720 HW 3 Solution Top   Next

 The pipeline execution diagram appears below. With bypass paths many of the stalls present in the hw 2 execution are not present here, but two remain. The add following lw must stall one cycle, and the bneq also stalls one cycle. The bneq instruction stalls, in the ID stage, because when it first arrives, at cycle 8, the value it needs to make a decision (prodcued by the slt instruction) is in the MEM stage. At cycle 9 the needed value (r3) arrives and so the ID stage of bneq can complete. At the end of bneq's ID stage the target address is clocked into PC, and so at cycle 10 the target, lw, starts IF. The total delay for the branch here is two cycles.
 The execution of a single instruction uses just one line, and MEM is abbreviated to ME. Abandoned instruction are shown in gray.
```Cycle  0  1  2  3  4  5  6  7   8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
add    IF ID EX ME WB
add       IF ID EX ME WB
Loop:
lw           IF ID EX ME WB          IF ID EX ME WB
add             IF ID -> EX ME WB	IF ID -> EX ME WB
slt                IF -> ID EX ME WB	   IF -> ID EX ME WB
addi                     IF ID EX ME WB		 IF ID EX ME WB
bneq                        IF ID -> EX ME WB	    IF ID -> EX ME WB
i1                             IF -> ID EX ME WB       IF -> ID EX ME WB
```
 The CPI is based on corresponding points in consecutive iterations, here when the first instruction (lw) is in IF, the occurs at cycle 2 and cycle 10. The average instruction execution time (CPI) is thus (10-2)/5 =1.6 CPI.

 Problem 2,  EE 4720 HW 3 Solution Top   Previous   Next

 As illustrated in the assignment, the branch condition is read from the register file and so branches must stall until the value they need is written or in the WB stage (assuming simultaneous read and write). As was done for arithmetic and logical instructions, bypass paths can be added to the ID stage. For the code fragment in problem 1, bypass paths from MEM to ID would be needed. A path from EX to ID would help when the register is written by an instruction immediately preceding the branch (although it might lengthen critical paths):
```slt  r3, r1, r2
bneq r3, TARGET
```
 With the forwarding paths the branch could determine if r3 were zero in cycle 8, avoiding the stall.

 Problem 3,  EE 4720 HW 3 Solution Top   Previous

 The key here is to re-write the code to avoid the stalls. The stall after lw can be avoided by placing some instruction between lw and the add that uses the fetched result, one possibility is the addi:
```        add     r2, r0, r0    ! Clear sum register.
add     r5, r10, rll  ! Set r5 to first element.
LOOP:
lw      r6, 0(r5)
slt     r3, r2, r4
bneq    r3, LOOP
```
 The next problem is, what to put in the branch delay slot. There seems to be no way to fill it by just rearranging the instructions or filling it with addi, which would eliminate one bubble, but open another:
```        add     r2, r0, r0    ! Clear sum register.
add     r5, r10, rll  ! Set r5 to first element.
LOOP:
lw      r6, 0(r5)
slt     r3, r2, r4
bneqd   r3, LOOP
addi    r5, r5, #4    ! In delay slot, part of loop.
```
 (bneqd is the delayed branch mnemonic.) If just rearranging instructions won't work, then adding some might:
```        add     r2, r0, r0    ! Clear sum register.
add     r5, r10, rll  ! Set r5 to first element.
lw      r6, 0(r5)     ! Added instruction.
LOOP:
slt     r3, r2, r4
bneqd   r3, LOOP
lw      r6, 0(r5)     ! In delay slot, part of loop.
```
 In the code above, lw has been duplicated. The first lw initializes r6, the second one executes at the end of the loop. The code above executes bubble free, with a CPI of 1. The pipeline execution diagram appears below:
```Cycle  0  1  2  3  4  5  6  7   8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
add    IF ID EX ME WB
add       IF ID EX ME WB
lw           IF ID EX ME WB
Loop:
addi            IF ID EX ME WB IF ID EX ME WB
add                IF ID EX ME WB IF ID EX ME WB
slt                   IF ID EX ME WB IF ID EX ME WB
bneqd                    IF ID EX ME WB IF ID EX ME WB
lw                          IF ID EX ME WB IF ID EX ME WB
```
 A repeating iteration starts on cycle 3 and ends before cycle 8, for a total of 5 cycles. Containing 5 instructions, the CPI is 1.0.

 David M. Koppelman - koppel@ee.lsu.edu Modified 16 Apr 1998 17:49 (22:49 UTC)