    1: add r1, r2, r3
    2: sub r6, r7, r8
    3: lw r10, 0(r20)
    4: add r11, r10, r12
    5: sub r14, r1, r9
    6: add r1, r14, r15
    7: sub r16, r17, r18
    8: add r19, r21, r22
    9: sw 0(r20), r6

The instructions are rearranged and placed in groups of three for 3VDLX. The first three instructions can be, and are, grouped together because none of them uses a value produced by another instruction in the group. The next three, on lines 4, 5, and 6, cannot be grouped together because the value produced on line 5 is needed on line 6. The inefficient solution is to keep the instructions in the same order and use nops to avoid hazards, starting with the load latency:

    add r1, r2, r3; sub r6, r7, r8; lw r10, 0(r20)
    nop; nop; nop
    add r11, r10, r12; sub r14, r1, r9; nop
    add r1, r14, r15; sub r16, r17, r18; add r19, r21, r22
    sw 0(r20), r6; nop; nop

The first set of nops is needed because of the load instruction. The nop in the third 3VDLX instruction is needed because of the true dependency between lines 5 and 6 in the original program. The nops in the last 3VDLX instruction would be needed if we had no further instructions (which is unlikely when the last instruction is not a CTI, but that's where the problem ends).
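The in-order packing described above can be sketched in Python. This is only an illustration under stated assumptions: the `Instr` record and the `pack_in_order` function are hypothetical names, only true (read-after-write) dependencies are modeled, and a loaded value is assumed to be unusable in the 3VDLX instruction immediately following the load.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    text: str              # assembly text, used only for printing
    dest: str              # register written, "" if none
    srcs: tuple            # registers read
    is_load: bool = False


# The nine-instruction program from the problem, in original order.
PROGRAM = [
    Instr("add r1, r2, r3", "r1", ("r2", "r3")),
    Instr("sub r6, r7, r8", "r6", ("r7", "r8")),
    Instr("lw r10, 0(r20)", "r10", ("r20",), is_load=True),
    Instr("add r11, r10, r12", "r11", ("r10", "r12")),
    Instr("sub r14, r1, r9", "r14", ("r1", "r9")),
    Instr("add r1, r14, r15", "r1", ("r14", "r15")),
    Instr("sub r16, r17, r18", "r16", ("r17", "r18")),
    Instr("add r19, r21, r22", "r19", ("r21", "r22")),
    Instr("sw 0(r20), r6", "", ("r20", "r6")),
]

WIDTH = 3  # operations per 3VDLX instruction


def pack_in_order(program):
    """Pack operations in program order, padding with nops on hazards."""
    bundles = []
    current = []
    prev_loads = set()  # registers loaded by the previous bundle (assumed delay)
    i = 0
    while i < len(program):
        ins = program[i]
        written_here = {x.dest for x in current if x.dest}
        raw_in_bundle = any(s in written_here for s in ins.srcs)
        load_use = any(s in prev_loads for s in ins.srcs)
        if len(current) == WIDTH or raw_in_bundle or load_use:
            # Close the current bundle (possibly empty) and retry this operation.
            bundles.append(current)
            prev_loads = {x.dest for x in current if x.is_load}
            current = []
            continue
        current.append(ins)
        i += 1
    if current:
        bundles.append(current)
    return [[x.text for x in b] + ["nop"] * (WIDTH - len(b)) for b in bundles]


bundles = pack_in_order(PROGRAM)
for b in bundles:
    print("; ".join(b))
```

Under these assumptions the sketch produces five 3VDLX instructions, with an all-nop second instruction and a single nop in the third, matching the listing above.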

The instructions can be rearranged for efficient execution. The instructions on lines 5, 7, and 8 can be placed in the second 3VDLX instruction. The remaining instructions can be placed in the third 3VDLX instruction, yielding:

    add r1, r2, r3; sub r6, r7, r8; lw r10, 0(r20)
    sub r16, r17, r18; add r19, r21, r22; sub r14, r1, r9
    add r11, r10, r12; add r1, r14, r15; sw 0(r20), r6

Since the VLIW instructions execute as a unit, there is no need to distinguish between the separate parts of a pipeline stage, as is done for superscalar.

    add sub lw     IF  ID  EX  MEM WB
    sub add sub        IF  ID  EX  MEM WB
    add add sw             IF  ID  EX  MEM WB
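One way to check a grouping such as the rearranged one is to list the true dependencies of the original program and confirm that each producer sits in an earlier 3VDLX instruction than its consumer. The Python sketch below does this. The names `true_dependences` and `violations` are hypothetical, only true dependencies are checked, and a load is assumed to need one intervening 3VDLX instruction before its value can be used.

```python
# Original line number -> (destination, sources, is_load).
PROGRAM = {
    1: ("r1", ("r2", "r3"), False),
    2: ("r6", ("r7", "r8"), False),
    3: ("r10", ("r20",), True),
    4: ("r11", ("r10", "r12"), False),
    5: ("r14", ("r1", "r9"), False),
    6: ("r1", ("r14", "r15"), False),
    7: ("r16", ("r17", "r18"), False),
    8: ("r19", ("r21", "r22"), False),
    9: (None, ("r20", "r6"), False),
}


def true_dependences(program):
    """Return (producer, consumer, register) triples, in program order."""
    last_writer = {}
    deps = []
    for line in sorted(program):
        dest, srcs, _ = program[line]
        for reg in srcs:
            if reg in last_writer:
                deps.append((last_writer[reg], line, reg))
        if dest:
            last_writer[dest] = line
    return deps


def violations(schedule, program):
    """Return the true dependencies that a schedule fails to respect.

    schedule is a list of bundles, each a list of original line numbers.
    """
    where = {line: b for b, bundle in enumerate(schedule) for line in bundle}
    problems = []
    for prod, cons, reg in true_dependences(program):
        gap = where[cons] - where[prod]
        need = 2 if program[prod][2] else 1  # assumed one-bundle load delay
        if gap < need:
            problems.append((prod, cons, reg))
    return problems


IN_ORDER_GROUPS = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
REARRANGED = [[1, 2, 3], [7, 8, 5], [4, 6, 9]]

in_order_problems = violations(IN_ORDER_GROUPS, PROGRAM)
rearranged_problems = violations(REARRANGED, PROGRAM)
print(in_order_problems)
print(rearranged_problems)
```

With the in-order groups, the sketch reports the load-use pair on lines 3 and 4 and the pair on lines 5 and 6; with the rearranged groups it reports nothing.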

The first three instructions can start executing without delay. Of the next three, only the sub on line 5 can start: the add on line 4 must wait for the load, and the add on line 6 must wait for the sub on line 5. With two instructions held in ID, the last group of three instructions is stalled at time 3.

Time: cycles 0 to 7. Each row lists one instruction's stages in order.

    add   IF^{1}  ID^{1}  EX^{1}  M^{1}  WB^{1}
    sub   IF^{2}  ID^{2}  EX^{2}  M^{2}  WB^{2}
    lw    IF^{3}  ID^{3}  EX^{3}  M^{3}  WB^{3}
    add   IF^{1}  ID^{1}  EX^{1}  M^{1}  WB^{1}
    sub   IF^{2}  ID^{2}  EX^{2}  M^{2}  WB^{2}
    add   IF^{3}  ID^{3}  EX^{3}  M^{3}  WB^{3}
    sub   IF^{1}  ID^{1}  EX^{1}  M^{1}  WB^{1}
    add   IF^{2}  ID^{2}  EX^{2}  M^{2}  WB^{2}
    sw    IF^{3}  ID^{3}  EX^{3}  M^{3}  WB^{3}

    Unit   Reservation stations
    Int     1   2
    L/S     3   4
    Int     5   6
    L/S     7   8
    Int     9  10
    L/S    11  12

where Int refers to an integer execution unit and L/S refers to a load/store unit. Note that reservation stations are dedicated to functional units. If ready instructions were waiting in reservation stations 1 and 2, only one could start in a cycle, even if the other two integer execution units were free.
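The effect of dedicated reservation stations can be illustrated with a short Python sketch. The station-to-unit mapping follows the table above; the unit numbers (1 to 3 for each kind), the function name `issue`, and the policy of starting the lowest-numbered ready station are assumptions made for illustration.

```python
# Reservation station -> (unit kind, unit number).
# Unit numbering within each kind is an assumption.
RS_TO_UNIT = {
    1: ("Int", 1), 2: ("Int", 1),
    3: ("L/S", 1), 4: ("L/S", 1),
    5: ("Int", 2), 6: ("Int", 2),
    7: ("L/S", 2), 8: ("L/S", 2),
    9: ("Int", 3), 10: ("Int", 3),
    11: ("L/S", 3), 12: ("L/S", 3),
}


def issue(ready_stations):
    """Pick at most one ready reservation station per functional unit.

    Assumed policy: the lowest-numbered ready station of each unit starts.
    """
    started = {}
    for rs in sorted(ready_stations):
        unit = RS_TO_UNIT[rs]
        if unit not in started:
            started[unit] = rs
    return sorted(started.values())


same_unit = issue({1, 2})         # both stations feed the same Int unit
spread_out = issue({1, 5, 9})     # one station per Int unit
print(same_unit, spread_out)
```

Stations 1 and 2 compete for the same integer unit, so only one of them starts, while stations 1, 5, and 9 can all start in the same cycle.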

The pipeline notation is the same as that used in class; for example,
5:EX^{3} indicates that execution unit 3 is executing an
instruction from reservation station 5.

Load and store instructions need an ALU to compute addresses. In this solution each load/store unit has its own ALU, and the address-computation stage is indicated by AD.

Execution is given below:

Time: cycles 0 to 7. Each row lists one instruction's stages in order.

    add   IF^{1}  ID^{1}  1:EX^{1}   1:WB^{1}
    sub   IF^{2}  ID^{2}  5:EX^{2}   5:WB^{2}
    lw    IF^{3}  ID^{3}  11:AD^{3}  11:M^{3}  11:WB^{3}
    add   IF^{1}  ID^{1}  2:RS  2:RS  2:EX^{1}  2:WB^{1}
    sub   IF^{2}  ID^{2}  6:RS  6:EX^{2}  6:WB^{2}
    add   IF^{3}  ID^{3}  9:RS  9:RS  9:RS  9:EX^{3}  9:WB^{3}
    sub   IF^{1}  ID^{1}  1:EX^{1}   1:WB^{1}
    add   IF^{2}  ID^{2}  5:EX^{2}   5:WB^{2}
    sw    IF^{3}  ID^{3}  12:AD^{3}  12:M^{3}  12:WB^{3}

David M. Koppelman - koppel@ee.lsu.edu | Modified 2 May 1997 18:34 (23:34 UTC) |