The SPARC assembly language program below is used in the problems that follow. SPARC register names are %g0-%g7, %i0-%i7, %l0-%l7, and %o0-%o7, and %g0 is a zero register (like r0 in DLX). The destination for arithmetic, logical, and load instructions is the rightmost register (add %l1,%l2,%l3 means %l3 = %l1 + %l2). SPARC branches test a condition code register, which is written only by special condition-code-setting instructions (those ending in cc, such as addcc). Branches include a one-instruction delay slot.
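The condition-code mechanism can be sketched in Python. This is a minimal model that assumes only the Z (zero) and N (negative) flags matter for this program and ignores V and C. The helper names (`set_cc`, `be_taken`, `bpos_taken`) are hypothetical. When a cc-setting instruction writes to %g0, the numeric result is discarded and only the flags remain.

```python
# Minimal model of the SPARC integer condition codes used by this program.
# Assumption: only Z (zero) and N (negative) matter here; V and C are ignored.
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def set_cc(result: int) -> dict:
    """Flags written by addcc/andcc/subcc; with %g0 as destination only these survive."""
    r = to_signed32(result)
    return {"Z": r == 0, "N": r < 0}


def be_taken(cc: dict) -> bool:
    """be (branch on equal): taken when the last cc-setting result was zero."""
    return cc["Z"]


def bpos_taken(cc: dict) -> bool:
    """bpos (branch on positive): taken when the result was >= 0, i.e. N is clear."""
    return not cc["N"]


# subcc %l3, 1000, %g0 ; bpos SKIP2 is taken exactly when l3 >= 1000 (ignoring overflow).
for l3 in (999, 1000, 1500):
    print(l3, bpos_taken(set_cc(l3 - 1000)))
```

Likewise, `andcc %l3, 1, %g0` followed by `be` is taken exactly when %l3 is even.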

```
LOOP:
        ld    [%l1], %l2          ! Load l2 = MEM[ l1 ]
        addcc %l2, %g0, %g0       ! g0 = l2 + g0. Sets cond. codes. Note: g0 is zero reg.
        be    DONE                ! Branch if result zero.
        nop                       ! Fill delay slot with nop.
        add   %l6, %l2, %l6       ! l6 = l6 + l2
        andcc %l3, 1, %g0         ! g0 = l3 & 1. Sets cond. codes. Note: g0 is zero reg.
        be    SKIP1
        nop
        add   %l4, 1, %l4
SKIP1:
        subcc %l3, 1000, %g0      ! g0 = l3 - 1000. Sets cond. codes.
        bpos  SKIP2               ! Branch if >= 0.
        nop
        add   %l4, %l3, %l4
SKIP2:
        andcc %l3, 1, %g0
        be    SKIP3
        nop
        add   %l4, %l4, %l4
SKIP3:
        add   %l1, 4, %l1
        ba    LOOP                ! Branch always. (Jump.)
        nop
DONE:
```
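The sketch below gives a high-level reference for what the loop computes. It is useful for checking the rewrites in the later problems. It assumes memory is a hypothetical Python dict mapping byte addresses to words and that %l1 points at a zero-terminated array. It also ignores 32-bit wraparound. `run_loop` is an illustrative name. The loop never writes %l3, so each of the three tests on it has the same outcome on every iteration.

```python
def run_loop(mem, l1, l3, l4, l6):
    """High-level model of the nop-filled SPARC loop.

    mem: hypothetical dict mapping byte address -> 32-bit word.
    Returns the final (l1, l4, l6). 32-bit wraparound is ignored.
    """
    while True:
        l2 = mem[l1]          # ld [%l1], %l2
        if l2 == 0:           # addcc %l2, %g0, %g0 ; be DONE
            break
        l6 += l2              # add %l6, %l2, %l6
        if l3 & 1:            # andcc %l3, 1, %g0 ; be SKIP1 (skip when even)
            l4 += 1
        if l3 < 1000:         # subcc %l3, 1000, %g0 ; bpos SKIP2 (skip when l3 >= 1000)
            l4 += l3
        if l3 & 1:            # andcc %l3, 1, %g0 ; be SKIP3 (skip when even)
            l4 += l4
        l1 += 4               # add %l1, 4, %l1 ; ba LOOP
    return l1, l4, l6


# Hypothetical data: two nonzero words followed by the terminating zero.
mem = {0x100: 5, 0x104: 7, 0x108: 0}
print(run_loop(mem, 0x100, 3, 0, 0))   # -> (264, 24, 12)
```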

**Problem 1:** An execution of the code above on a SPARC implementation takes 1000 cycles. The dynamic instruction count is $\mathrm{IC_{all}}$, of which $\mathrm{IC_{nop}}$ instructions are nops. Consider two ways of computing CPI:

$$\mathrm{CPI_A} = \frac{t}{\mathrm{IC_{all}}} \qquad \mathrm{and} \qquad \mathrm{CPI_B} = \frac{t}{\mathrm{IC_{all}} - \mathrm{IC_{nop}}},$$

where t is the execution time in cycles. Which is better? Justify your answer; an argument for either formula can be correct.

$\mathrm{CPI_A}$ is better because it measures how efficiently the processor executes the code it is given. The $\mathtt{nop}$ instructions are part of that code, and each one is fetched and passes through the pipeline like any other instruction.
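The two definitions can be compared directly. The sketch below counts dynamic instructions for the nop-filled loop on a hypothetical input of ten nonzero words with %l3 = 7. Only the 1000-cycle figure comes from the problem, so the resulting CPIs are illustrative and are not those of the actual run. Each non-final pass executes 17 fixed instructions (5 of them nops) plus the conditional `add`s to %l4. The exit pass executes `ld`, `addcc`, `be` and one `nop`. `original_counts` is a hypothetical helper.

```python
def original_counts(n_nonzero, l3):
    """Return (IC_all, IC_nop) for the nop-filled loop.

    n_nonzero nonzero words are processed, then the terminating zero word ends the loop.
    %l3 is loop-invariant, so the conditional adds behave the same on every pass.
    """
    extra = (2 if l3 & 1 else 0) + (1 if l3 < 1000 else 0)
    ic_all = n_nonzero * (17 + extra) + 4   # final pass: ld, addcc, be, nop
    ic_nop = n_nonzero * 5 + 1              # five delay-slot nops per pass, one on exit
    return ic_all, ic_nop


t = 1000                                    # cycles (from Problem 1)
ic_all, ic_nop = original_counts(10, 7)     # hypothetical input
cpi_a = t / ic_all
cpi_b = t / (ic_all - ic_nop)
print(ic_all, ic_nop, round(cpi_a, 2), round(cpi_b, 2))   # 204 51 4.9 6.54
```

In this example a quarter of the executed instructions are nops. $\mathrm{CPI_B}$ charges their cycles to the remaining instructions, so the gap between the two measures grows with the nop fraction.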

**Problem 2:** SPARC branches have a one-instruction delay slot; in the code above, the slots are filled with nops. Rewrite the code, filling as many slots as possible with useful instructions and thereby reducing the number of instructions in the program.

Solution:

```
        ld    [%l1], %l2          ! Load l2 = MEM[ l1 ] (first element, before the loop)
LOOP:
        addcc %l2, %g0, %g0       ! g0 = l2 + g0. Sets cond. codes. Note: g0 is zero reg.
        be    DONE                ! Branch if result zero.
        andcc %l3, 1, %g0         ! (delay slot) g0 = l3 & 1. Sets cond. codes.
        add   %l6, %l2, %l6       ! l6 = l6 + l2
        be    SKIP1
        subcc %l3, 1000, %g0      ! (delay slot)
        add   %l4, 1, %l4
SKIP1:
        bpos  SKIP2               ! Branch if >= 0.
        andcc %l3, 1, %g0         ! (delay slot)
        add   %l4, %l3, %l4
SKIP2:
        be    SKIP3
        add   %l1, 4, %l1         ! (delay slot)
        add   %l4, %l4, %l4
SKIP3:
        ba    LOOP                ! Branch always. (Jump.)
        ld    [%l1], %l2          ! (delay slot) Load l2 = MEM[ l1 ] for the next iteration
DONE:
```

**Problem 3:** Rewrite the program in DLX, taking advantage of DLX's use of general-purpose registers for specifying branch conditions.

Solution:

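```
LOOP:
        lw    r2, 0(r1)           ! r2 = MEM[ r1 ]
        beqz  r2, DONE            ! Test r2 directly; no compare instruction needed.
        add   r6, r2, r6          ! r6 = r6 + r2
        andi  r10, r3, #1         ! r10 = r3 & 1
        beqz  r10, SKIP1
        addi  r4, r4, #1
SKIP1:
        sgei  r11, r3, #1000      ! r11 = 1 if r3 >= 1000, else 0
        bnez  r11, SKIP2
        add   r4, r3, r4
SKIP2:
        beqz  r10, SKIP3          ! r10 computed before SKIP1.
        add   r4, r4, r4
SKIP3:
        addi  r1, r1, #4
        beqz  r0, LOOP            ! r0 is always zero, so this branch is always taken.
DONE:
```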
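The effect of the two rewrites on dynamic instruction count can be tallied per nonzero word. This sketch assumes, as the DLX solution does, that DLX branches have no delay slot. The input (three nonzero words, %l3 = 7) is hypothetical, and the helper names are illustrative. Each nonzero word costs 17 fixed instructions in the nop-filled SPARC loop, 12 once the slots are filled (plus the one `ld` hoisted above LOOP), and 10 in DLX. Each version also executes the conditional `add`s to %l4 (r4).

```python
def extra_adds(l3):
    """Conditional adds to %l4 (r4 in DLX) executed per nonzero word."""
    # add %l4,1 and add %l4,%l4 run when l3 is odd; add %l4,%l3 runs when l3 < 1000.
    return (2 if l3 & 1 else 0) + (1 if l3 < 1000 else 0)


def dynamic_counts(n_nonzero, l3):
    """Dynamic instruction counts for a zero-terminated list of n_nonzero words."""
    e = extra_adds(l3)
    return {
        # 17 per word; exit pass is ld, addcc, be, nop
        "SPARC, nop-filled": n_nonzero * (17 + e) + 4,
        # hoisted ld + 12 per word; exit pass is addcc, be, andcc (delay slot)
        "SPARC, slots filled": 1 + n_nonzero * (12 + e) + 3,
        # 10 per word; exit pass is lw, beqz
        "DLX": n_nonzero * (10 + e) + 2,
    }


for name, ic in dynamic_counts(3, 7).items():
    print(f"{name:20s} {ic}")
```

With %l3 = 7 every conditional add executes, so the per-word costs are 20, 15 and 13 instructions respectively.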

**Problem 4:** The program below executes on the DLX implementation shown below. The comments show the results of the xori, or, and lw instructions.



The table below shows the contents of pipeline registers and changes to architecturally visible registers r1-r31 over time. Cycle zero is the time that xori is in instruction fetch. The first two columns are completed; continue filling the table until the sw instruction finishes writeback. Ignore values that are not used and that depend on the func field of type-R instructions. Values that are not used and don't depend on the func field should be shown. The output of the data memory is zero when a store or no memory operation is performed. The row labeled "Reg. Chng." shows a new register value that is available at the beginning of the cycle. If no register value is written, leave the entry blank.
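When no stall occurs, the IR rows of the table follow a simple rule. Number the instructions k = 0 (xori), 1 (or), 2 (lw), 3 (sw), 4 (addi). Instruction k sits in IF/ID at cycle k+1, ID/EX at k+2, EX/MEM at k+3 and MEM/WB at k+4, and its register write becomes visible at cycle k+5. The sketch below encodes that rule. It assumes straight-line code with no stalls, takes the instruction order from the table, and treats the `addi` instructions already in the pipeline at cycle 0 as filler. The function names are hypothetical.

```python
PROGRAM = ["xori", "or", "lw", "sw", "addi"]   # order from the table; k = 0 is xori
LATCHES = ["IF/ID", "ID/EX", "EX/MEM", "MEM/WB"]
BASE_PC = 0x50                                 # PC in cycle 0 (xori in IF)


def ir_in_latches(cycle):
    """IR value in each pipeline latch at `cycle`, assuming no stalls."""
    row = {}
    for depth, latch in enumerate(LATCHES, start=1):
        k = cycle - depth                      # instruction k reaches latch `depth` at cycle k + depth
        if k < 0:
            row[latch] = "addi"                # filler already in the pipeline at cycle 0
        elif k < len(PROGRAM):
            row[latch] = PROGRAM[k]
        else:
            row[latch] = "..."                 # instructions beyond those listed in the table
    return row


def pc(cycle):
    return BASE_PC + 4 * cycle                 # straight-line code: PC advances by 4 each cycle


def reg_change_cycle(k):
    return k + 5                               # written in WB at k + 4, visible at start of k + 5


for c in range(9):
    print(c, hex(pc(c)), ir_in_latches(c))
```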

| Cycle      | 0      | 1      | 2      | 3      | 4      | 5        | 6       | 7       | 8    | 9      | 10 |
|------------|--------|--------|--------|--------|--------|----------|---------|---------|------|--------|----|
| PC         | 0x50   | 0x54   | 0x58   | 0x5c   | 0x60   | 0x64     | 0x68    | 0x6c    | 0x70 | ...    |    |
| IF/ID.IR   | addi   | xori   | or     | lw     | sw     | addi     | ...     |         |      |        |    |
| Reg. Chng. | r0 ← 0 | r0 ← 0 | r0 ← 0 | r0 ← 0 | r0 ← 0 | r1 ← 100 | r2 ← 45 | r5 ← 42 | X    | r0 ← 0 |    |
| ID/EX.IR   | addi   | addi   | xori   | or     | lw     | sw       | addi    |         |      |        |    |
| ID/EX.A    | 0      | 0      | 99     | 33     | 66     | 77       | 0       |         |      |        |    |
| ID/EX.B    | 0      | 0      | 11     | 44     | 55     | 88       | 0       |         |      |        |    |
| ID/EX.IMM  | 0      | 0      | 7      | X      | 9      | 10       | 0       | ...     |      |        |    |
| EX/MEM.IR  | addi   | addi   | addi   | xori   | or     | lw       | sw      | addi    |      |        |    |
| EX/MEM.ALU | 0      | 0      | 0      | 100    | 45     | 75       | 87      | 0       |      |        |    |
| EX/MEM.B   | 0      | 0      | 0      | 11     | 44     | 55       | 88      | 0       |      |        |    |
| MEM/WB.IR  | addi   | addi   | addi   | addi   | xori   | or       | lw      | sw      | addi |        |    |
| MEM/WB.ALU | 0      | 0      | 0      | 0      | 100    | 45       | 75      | 87      | 0    |        |    |
| MEM/WB.MD  | 0      | 0      | 0      | 0      | 0      | 0        | 42      | 0       | 0    |        |    |
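The EX/MEM.ALU entries can be checked against the ID/EX operands. In this sketch `alu` is a hypothetical helper that covers only the four opcodes in the table. `xori` combines A with the immediate, `or` combines A with B, and `lw`/`sw` add the immediate offset to A to form the effective address. The value 42 that `lw` returns comes from data memory, so it cannot be derived this way.

```python
def alu(op, a, b, imm):
    """ALU output for the four opcodes in the table (hypothetical helper)."""
    if op == "xori":
        return a ^ imm            # A XOR immediate
    if op == "or":
        return a | b              # A OR B (type-R; immediate field unused)
    if op in ("lw", "sw"):
        return a + imm            # effective address = base register + offset
    raise ValueError(f"opcode not modelled: {op}")


# (A, B, IMM) taken from the ID/EX rows; None marks the func-dependent "X" entry.
operands = {
    "xori": (99, 11, 7),
    "or":   (33, 44, None),
    "lw":   (66, 55, 9),
    "sw":   (77, 88, 10),
}
for op, (a, b, imm) in operands.items():
    print(op, alu(op, a, b, imm))
```

These are the EX/MEM.ALU values 100, 45, 75 and 87 in cycles 3 through 6. For `sw`, EX/MEM.B (88) is the value stored at address 87.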