# EE 4720 Computer Architecture - HW 4 Solution

### Problem 1

In the timing diagrams below, the active PM mask bit is indicated for each cycle. In the solution to homework 3, instructions are nullified (replaced with NOPs) in the EX stage; the segments used by nullified instructions are marked with an X prefix (XEX, XMX, XWX). Note that the mask bit is checked in the EX stage and that, because of pipeline stalls, mask bits can become misaligned with instructions. For example, in the first execution below, the SUB instruction stalls and so "misses" its zero.

With R4 = 0:

```
Time: 0   1   2   3   4   5   6   7   8   9   10
Mask:             0   1   0   1   1   1   1   1
PM    IF  ID  EX  MEM WB
ADD       IF  ID  XEX XMX XWX
LW            IF  ID  EX  MEM WB
SUB               IF  ID      EX  MEM WB
```
In the execution above, the SUB instruction executes (albeit after a stall) instead of being nullified: the stall NOP was inserted into the EX stage while SUB was held in ID, so the NOP, not SUB, received SUB's zero mask bit.
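The misalignment can be reproduced with a minimal Python sketch, assuming the mask register shifts every cycle and the bit active in a cycle applies to whatever occupies EX, bubble or instruction. The names `apply_mask` and `BUBBLE` are hypothetical and only illustrate the R4 = 0 execution.

```python
# Sketch: each cycle's mask bit is paired with the EX-stage occupant.
# A stall bubble still uses up a bit, which shifts later bits onto
# later instructions.

BUBBLE = None  # a stall bubble (NOP) occupying EX for one cycle


def apply_mask(ex_occupants, mask_bits):
    """Return {instruction: executes?} for every non-bubble EX occupant.

    ex_occupants: instruction name (or BUBBLE) in EX, one entry per cycle.
    mask_bits: the mask bit active in each of those cycles.
    """
    result = {}
    for insn, bit in zip(ex_occupants, mask_bits):
        if insn is BUBBLE:
            continue  # the bubble consumes this cycle's bit
        result[insn] = bool(bit)
    return result


# R4 = 0, cycles 3-6: the intended bits are ADD 0, LW 1, SUB 0, but SUB
# stalls in ID during cycle 5, so the bubble in EX takes SUB's zero.
outcome = apply_mask(["ADD", "LW", BUBBLE, "SUB"], [0, 1, 0, 1])
print(outcome)  # {'ADD': False, 'LW': True, 'SUB': True}
```

SUB ends up executing because the bit it sees in cycle 6 is the one that was meant for the instruction after it.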

With R4 = 1:

```
Time: 0   1   2   3   4   5   6   7   8   9   10
Mask:             1   0   1   1   1   1   1   1
PM    IF  ID  EX  MEM WB
ADD       IF  ID  EX  MEM WB
LW            IF  ID  XEX XMX XWX
SUB               IF  ID      EX  MEM WB
```
Note that if the controller does not know that LW will be nullified, it will have to stall SUB to avoid the possible RAW hazard. (The stall is unnecessary in this case, since the nullified LW never writes its destination register.)

For R4 = 1, R5 = 0, Mask = 00010011:

```
Time: 0   1   2   3   4   5   6   7   8   9   10
Mask:             1   1   0   0   1   0   0   0
PM    IF  ID  EX  MEM WB
ADD       IF  ID  EX  MEM WB  IF  ID  XEX XMX XWX
BEQZ          IF  ID  EX  MEM WB  IF  ID  XEX XMX XWX
SUB               IF  ID              IF  ID  XEX XMX XWX
```

For R4 = 1, R5 = 1, Mask = 00010011:

```
Time: 0   1   2   3   4   5   6   7   8   9   10
Mask:             1   1   0   0   1   0   0   0
PM    IF  ID  EX  MEM WB
ADD       IF  ID  EX  MEM WB
BEQZ          IF  ID  EX  MEM WB
SUB               IF  ID  XEX XMX XWX
```

For R4 = 0 (R5 = 0 or 1), Mask = 11111100:

```
Time: 0   1   2   3   4   5   6   7   8   9   10
Mask:             0   0   1   1   1   1   1   1
PM    IF  ID  EX  MEM WB
ADD       IF  ID  XEX XMX XWX
BEQZ          IF  ID  XEX XMX XWX
SUB               IF  ID  EX  MEM WB
```
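Comparing the Mask rows with the stated mask values suggests the bits reach EX least-significant bit first (00010011 shows up as 1 1 0 0 1 0 0 0); that ordering is inferred from the diagrams, and how R4 and R5 determine the mask is not modeled. The Python sketch below, with hypothetical helper names, pairs those bits with the EX occupants of the R4 = 1, R5 = 0 execution, where the three bubbles after the taken BEQZ use up three mask bits.

```python
BUBBLE = None  # a squashed or stall cycle in EX


def mask_sequence(mask, width=8):
    """Mask bits in the order they reach EX, least-significant bit first."""
    return [(mask >> i) & 1 for i in range(width)]


def apply_mask(ex_occupants, mask_bits):
    """(instruction, executes?) for every non-bubble EX occupant, in order."""
    return [(insn, bool(bit))
            for insn, bit in zip(ex_occupants, mask_bits)
            if insn is not BUBBLE]


bits = mask_sequence(0b00010011)
print(bits)  # [1, 1, 0, 0, 1, 0, 0, 0]

# R4 = 1, R5 = 0: BEQZ is taken, so EX holds bubbles in cycles 5-7 before
# ADD, BEQZ and SUB from the target path reach EX in cycles 8-10.
ex = ["ADD", "BEQZ", BUBBLE, BUBBLE, BUBBLE, "ADD", "BEQZ", "SUB"]
print(apply_mask(ex, bits))
# [('ADD', True), ('BEQZ', True), ('ADD', False), ('BEQZ', False), ('SUB', False)]
```

With R5 = 1 the branch is not taken, so the same bits fall on ADD, BEQZ and SUB in order and only SUB is nullified, as in the second diagram.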

### Problem 2

Use a shift register in which all bits are available, not just the bit at the end of the register. AND these bits together and call the result DONE; DONE indicates that the current and next seven instructions will execute. Generate a second signal, PM_ILL, by detecting an LW or CTI opcode in the EX stage. Then PM_VIOL = PM_ILL AND NOT DONE. The exception can easily be made precise: it is detected while the faulting instruction is in the EX stage, so the following instructions, which are only in the IF and ID segments, can simply be abandoned.

These changes are shown in the accompanying figure; blue indicates changes for homework 3 and red indicates changes for this problem. Note the exaggerated inversion bubble at the shift register shift input.
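The violation logic can be expressed as a few Boolean functions. This Python sketch assumes an 8-bit PM shift register whose bit 0 is the current instruction's mask bit. The opcode set uses DLX mnemonics chosen for illustration and may not be complete; the function names are hypothetical.

```python
MASK_WIDTH = 8
ALL_ONES = (1 << MASK_WIDTH) - 1  # 0b11111111

# Assumed DLX control-transfer mnemonics (illustrative, not exhaustive).
CTI_OPCODES = {"BEQZ", "BNEZ", "J", "JAL", "JR", "JALR"}


def done(pm_shift_reg: int) -> bool:
    """AND of all eight shift-register bits: the current and the next
    seven instructions will all execute."""
    return (pm_shift_reg & ALL_ONES) == ALL_ONES


def pm_ill(ex_opcode: str) -> bool:
    """True when an LW or a CTI is in the EX stage."""
    return ex_opcode == "LW" or ex_opcode in CTI_OPCODES


def pm_viol(pm_shift_reg: int, ex_opcode: str) -> bool:
    """PM_VIOL = PM_ILL AND NOT DONE."""
    return pm_ill(ex_opcode) and not done(pm_shift_reg)


print(pm_viol(0b00010011, "BEQZ"))  # True: zeros are still pending
print(pm_viol(0b11111111, "LW"))    # False: every remaining bit is 1
```

In hardware, `done` corresponds to an 8-input AND gate across the shift-register outputs, and `pm_viol` to a single AND gate with an inverted DONE input.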

### Problem 3

For loads and branches to be properly nullified, the shift register must not be clocked when bubbles are passing through the EX stage. The simplest solution is to assume that the controller can provide a "bubble bit" in the EX stage; when that bit is 1, the shift register is not clocked.

If a bubble bit is not already provided, it can be synthesized by checking for the two stall conditions: a load instruction with a RAW hazard, and a taken branch or any other CTI. A bubble shift register is loaded with one bit per stall cycle: one bit for a load stall and three bits for a CTI. Each cycle the register is either loaded, as described above, or shifted. If the bit shifted out is 1, the PM shift register is not shifted.

The load stall is detected by checking for a load instruction in the EX stage and an instruction reading registers in the ID stage. If the register written by the load is the same as either of the registers read by the instruction in ID, one bit in the bubble register is set. Instructions using one and two source registers need to be distinguished; otherwise a register field that is not actually read could cause an unnecessary stall. CTIs can be detected in the EX stage: if the instruction in EX is a jump, jump/link, or taken branch, three bits in the bubble shift register are set.

These changes are shown in the accompanying figure; blue indicates changes for homework 3 and red indicates changes for this problem. The solution assumes that NOPs are inserted into the IR in place of the stalled instructions. What if that assumption is not correct?
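A cycle-level Python sketch of the bubble shift register gating the PM register follows. It is a model under stated assumptions, not the circuit in the figure. `Insn`, `stall_bubbles` and `run` are hypothetical names, register numbers are made up, 1s are assumed to be shifted into the PM register behind the mask, and the caller supplies each branch's `taken` flag. A load stall is still inserted for a nullified load, matching a controller that does not know about the nullification.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MASK_WIDTH = 8
LOAD_STALL_BUBBLES = 1  # bubble behind a load with a RAW hazard
CTI_BUBBLES = 3         # bubbles behind a jump, jump/link, or taken branch


@dataclass
class Insn:
    name: str
    kind: str = "alu"            # "alu", "load", "jump", or "branch"
    dest: Optional[int] = None   # register written, if any
    srcs: Tuple[int, ...] = ()   # registers actually read (one or two)
    taken: bool = False          # used only for branches


def stall_bubbles(ex: Insn, id_insn: Optional[Insn]) -> int:
    """Number of bubble-register bits to set for the instruction in EX."""
    if ex.kind == "jump" or (ex.kind == "branch" and ex.taken):
        return CTI_BUBBLES
    # Compare only against registers the ID instruction really reads, so
    # one- and two-source instructions are handled differently.
    if ex.kind == "load" and id_insn is not None and ex.dest in id_insn.srcs:
        return LOAD_STALL_BUBBLES
    return 0


def run(stream: List[Insn], mask: int) -> List[Tuple[str, bool]]:
    """Return (name, executes) for each instruction in `stream`.

    `stream` lists instructions in the order they reach EX (the target
    path follows a taken CTI). Mask bits are consumed LSB first.
    """
    pm = mask      # PM shift register; bit 0 is the active mask bit
    bubble = 0     # bubble shift register; bit 0 = bubble in EX this cycle
    results = []
    i = 0
    while i < len(stream):
        if bubble & 1:
            # Bubble in EX: shift the bubble register, hold the PM register.
            bubble >>= 1
            continue
        ex = stream[i]
        id_insn = stream[i + 1] if i + 1 < len(stream) else None
        results.append((ex.name, bool(pm & 1)))
        # Clock the PM register (assumption: 1s are shifted in).
        pm = (pm >> 1) | (1 << (MASK_WIDTH - 1))
        # Either load the bubble register or shift it.
        n = stall_bubbles(ex, id_insn)
        bubble = (1 << n) - 1 if n else bubble >> 1
        i += 1
    return results


# The R4 = 0 execution from Problem 1: intended bits ADD 0, LW 1, SUB 0.
prog = [
    Insn("ADD", dest=1, srcs=(2, 3)),
    Insn("LW", kind="load", dest=8, srcs=(5,)),
    Insn("SUB", dest=6, srcs=(8, 7)),
]
print(run(prog, 0b11111010))  # [('ADD', False), ('LW', True), ('SUB', False)]
```

Because the PM register is held while the load-stall bubble is in EX, SUB now receives its own zero and is nullified, unlike the Problem 1 execution.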

 David M. Koppelman - koppel@ee.lsu.edu Modified 18 Mar 1997 18:39 (0:39 UTC)