## LSU EE 4720

Homework 7

The PPC 970, which was the subject of a question in Homework 6, is very similar to the POWER4 chip. The main differences are that the POWER4 lacks the packed-operand instructions and that the POWER4 includes two processors on a single chip.

Answer the following questions about the POWER4 based on information in "POWER4 System Microarchitecture," by Tendler et al., available via http://www.ece.lsu.edu/ee4720/doc/power4.pdf. The questions can be answered without reading the entire paper. In particular, there is no need to read past page 17.

**Problem 1**: Translate the following terms, as used in class, to their nearest equivalent in the paper.

- Integer Instruction
- Instruction Queue
- Reorder Buffer
- Physical Register

**Problem 2**: The pipeline execution diagram below shows MIPS code on the dynamically scheduled system described in the study guide.

(a) Re-draw the diagram using the stages from POWER4. (Do not translate the instructions into the POWER assembly language.) Just show one iteration and assume that the four instructions are formed into one group. Also assume that the branch does not have a delay slot. Use stages F1, F2, and F3 for the multiply.

(b) In your diagram identify the *fetch* and *execute* pipelines, as defined in class.

    # Cycle 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
    LOOP:
    ldc1 f0, 0(t1)      IF ID Q RR EA ME WB C
    mul f2, f2, f0      IF ID Q RR M1 M2 M3 WB C
    addi t1, t1, 8      IF ID Q RR EX WB C
    bneq t1, t2 LOOP    IF ID Q RR B WB C

**Problem 3**: The POWER4 uses what is commonly called a *hybrid predictor*, in which each branch is predicted by two different predictors and a third predictor, the selector, chooses which of the two predictions to use. One predictor is something like the bimodal predictor discussed in class and the other is something like the gshare predictor discussed in class.

(a) Provide a code example in which the bimodal predictor described in class will do better than the POWER4's almost equivalent predictor. (Ignore the selector.)

(b) How might the POWER4 designers justify the differences with the bimodal predictor given the lower performance in the example above?

(c) Provide a code example in which the gshare predictor described in class outperforms the POWER4's almost equivalent predictor. (Ignore the selector.)

(d) How might the POWER4 designers justify the differences with the gshare predictor given the lower performance in the example above?