EE 4720 Computer Architecture - HW 3 Solution (Spring 1998)
|
The pipeline execution diagram appears below. With bypass paths
many of the stalls present in the hw 2 execution are not present here,
but two remain. The add following lw must stall one cycle, and the
bneq also stalls one cycle. The bneq instruction stalls, in
the ID stage, because when it first arrives, at cycle 8, the value it
needs to make a decision (prodcued by the slt instruction) is in the
MEM stage. At cycle 9 the needed value (r3) arrives and so the ID stage
of bneq can complete. At the end of bneq's ID stage the target
address is clocked into PC, and so at cycle 10 the target, lw, starts
IF.
The total delay for the branch here is two
cycles. | |
|
The execution of a single instruction uses just one line, and
MEM is abbreviated to ME. Abandoned instruction are shown in gray.
| |
Cycle 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
add IF ID EX ME WB
add IF ID EX ME WB
Loop:
lw IF ID EX ME WB IF ID EX ME WB
add IF ID -> EX ME WB IF ID -> EX ME WB
slt IF -> ID EX ME WB IF -> ID EX ME WB
addi IF ID EX ME WB IF ID EX ME WB
bneq IF ID -> EX ME WB IF ID -> EX ME WB
i1 IF -> ID EX ME WB IF -> ID EX ME WB
|
The CPI is based on corresponding points in consecutive iterations,
here when the first instruction (lw) is in IF, the occurs
at cycle 2 and cycle 10. The average instruction execution
time (CPI) is thus
(10-2)/5 =1.6 CPI. | |
|
As illustrated in the assignment, the branch condition is read
from the register file and so branches must stall until the value
they need is written or in the WB stage (assuming simultaneous
read and write). As was done for arithmetic and logical instructions,
bypass paths can be added to the ID stage. For the code fragment
in problem 1, bypass paths from MEM to ID would be needed. A path
from EX to ID would help when the register is written by an
instruction immediately preceding the branch (although it
might lengthen critical paths): | |
slt r3, r1, r2
bneq r3, TARGET
|
With the forwarding paths the branch could determine if r3 were
zero in cycle 8, avoiding the stall. | |
|
The key here is to re-write the code to avoid the stalls.
The stall after lw can be avoided by placing some instruction between
lw and the add that uses the fetched result, one possibility is
the addi: | |
add r2, r0, r0 ! Clear sum register.
add r5, r10, rll ! Set r5 to first element.
LOOP:
lw r6, 0(r5)
addi r5, r5, #4
add r2, r2, r6
slt r3, r2, r4
bneq r3, LOOP
|
The next problem is, what to put in the branch delay slot.
There seems to be no way to fill it by just rearranging the
instructions or filling it with addi, which would eliminate one bubble,
but open another: | |
add r2, r0, r0 ! Clear sum register.
add r5, r10, rll ! Set r5 to first element.
LOOP:
lw r6, 0(r5)
add r2, r2, r6
slt r3, r2, r4
bneqd r3, LOOP
addi r5, r5, #4 ! In delay slot, part of loop.
|
(bneqd is the delayed branch mnemonic.)
If just rearranging instructions won't work, then adding some might: | |
add r2, r0, r0 ! Clear sum register.
add r5, r10, rll ! Set r5 to first element.
lw r6, 0(r5) ! Added instruction.
LOOP:
addi r5, r5, #4
add r2, r2, r6
slt r3, r2, r4
bneqd r3, LOOP
lw r6, 0(r5) ! In delay slot, part of loop.
|
In the code above, lw has been duplicated. The first lw
initializes r6, the second one executes at the end of the loop.
The code above executes bubble free, with a CPI of 1. The pipeline
execution diagram appears below: | |
Cycle 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
add IF ID EX ME WB
add IF ID EX ME WB
lw IF ID EX ME WB
Loop:
addi IF ID EX ME WB IF ID EX ME WB
add IF ID EX ME WB IF ID EX ME WB
slt IF ID EX ME WB IF ID EX ME WB
bneqd IF ID EX ME WB IF ID EX ME WB
lw IF ID EX ME WB IF ID EX ME WB
|
A repeating iteration starts on cycle 3 and ends before cycle 8,
for a total of 5 cycles. Containing 5 instructions, the CPI is 1.0. | |