EE 4720 Computer Architecture - HW 1 Solution (Spring 1998)


Problem 1

Temporal Locality

Instruction fetches for instructions within loop. (Since they are fetched multiple times.)

The store of the incremented histogram element, since the same address was used to load the element.

SW   0(r10), rll   ! hist[e] = r11

Spatial Locality

Accesses to the array elements. (Since they are done sequentially.) For example:

LH   r10, 0(r10)   ! r10 = array[ i*JSIZE + j ]

Possibly Neither Temporal Nor Spatial Locality

Accesses to the hist array, since the pattern of access depends on the value of elements of array, which are not known in advance.

LW   r11, 0(r10)   ! r11 = hist[ e ];

Problem 2

The static count is simply the number of instructions, 24. The dynamic count is 3275 instructions, the table below shows the dynamic count of each instruction. The labels (e.g., NEXTI:) are included for clarity.

Dynamic Instruction
1 ADDI r10, r0, #20
1 MOVI2FP f0, r10
1 ADDI r1, r0, #0
NEXTI:
11 SGEI r10, r1, #10
11 BNEZ r10, DONEI
10 ADDI r2, r0, #0
NEXTJ:
210 SGEI r10, r2, #20
210 BNEZ r10, DONEJ
200 MOVI2PF f1, r1
200 MULT f1, f1, f0
200 MOVFP2I r10, f1
200 ADD r10, r10, r2
200 SLLI r10, r10, #1
200 ADD r10, r10, r4
200 LH r10, 0(r10)
200 SLLI r10, r10, #2
200 ADD r10, r10, r3
200 LW r11, 0(r10)
200 ADDI r11, rll, #1
200 SW 0(r10), rll
200 ADDI r2, r2, #1
200 J NEXTJ
DONEJ:
10 ADDI r1, r1, #1
10 J NEXTI
DONEI:

Problem 3

The scaled addressing mode makes accessing both arrays much easier. Element array[i*JSIZE+j] is still loaded into a register, but the histogram element is incremented in a memory-memory add instruction.


        ADDI r1, r0, #0    ! i=0
NEXTI:
        SGEI r10, r1, #10  ! if i >= ISIZE ...
        BNEZ r10, DONEI    ! ... exit loop.
        ADDI r2, r0, #0    ! j=0
NEXTJ:
        SGEI r10, r2, #20  ! if j >= JSIZE ...
        BNEZ r10, DONEJ    ! ... exit loop
        MULTI r10, r1, JSIZE      !  r10 = i * JSIZE
        ADD   r10, r10, r2        !  r10 = i * JSIZE + j
        LH    r10, 0(r4)[r10]     !  d = 2; r10 = array[r10]
        ADDI O(r3)[r10],0(r3)[r10],#1  ! d=4; hist[r10] = hist[r10]+1;
        ADDI r2, r2, #1    ! r2 = j+1
        J NEXTJ
DONEJ:
        ADDI r1, r1, #1    ! r1 = i+1
        J NEXTI
DONEI:

The new static count is 14, the dynamic count is 1674, the table below lists count by instruction.

1 ADDI r1, r0, #0
NEXTI:
11 SGEI r10, r1, #10
11 BNEZ r10, DONEI
11 ADDI r2, r0, #0
NEXTJ:
210 SGEI r10, r2, #20
210 BNEZ r10, DONEJ
200 MULTI r10, r1, JSIZE
200 ADD r10, r10, r2
200 LH r10, 0(r4)[r10]
200 ADDI O(r3)[r10],0(r3)[r10],#1
200 ADDI r2, r2, #1
200 J NEXTJ
DONEJ:
10 ADDI r1, r1, #1
10 J NEXTI
DONEI:

Problem 4

Clock frequency for fixed CPI

Need to solve: ICnew CPInew / fnew = ICold CPIold / fold for fnew. Solving yields, fnew = 1 GHz ICnew / ICold = 511 MHz

So, even though an implementation using complex addressing modes might have a slower clock, it can still run faster. Note that the assumption that the CPI is the same is unrealistic, since it would take much longer to execute, for example, an instruction that had three memory operands.

CPI for fixed clock frequency.

Need to solve: ICnew CPInew / fnew = ICold CPIold / fold for CPInew. The resulting instruction execution time is CPInew = CPIold ICold / ICnew = 0.489 CPI.

So, even though CPI will suffer in an ISA using elaborate addressing, it is still possible to gain performance.


Problem 5

Unoptimized Assembler

Some notes on Sparc assembler:

The output of the Sun compiler is shown below. The compiler helpfully shows the source lines corresponding to the assembler instructions, and also breaks code into basic blocks which is convenient for finding the dynamic instruction count. (A basic block is a group of instructions that can only be entered from the first instructions [can't branch into the middle] and can only be exited from the last instruction [only the last instruction can be a CTI]. All instructions in a basic block are executed the same number of times.)

Static count: 55 (yes, nops count, they take space). Dynamic count, 6938 (yes, nops count, they at least take fetch and decode space). The count is detailed below, by basic block, followed by an annotated assembly language listing.

Block
Number
Static
Count
Dynamic Count
(Per instr)
Dynamic Count
(Total)
1 6 1 6
2 6 10 60
3 34 200 6800
4 7 10 70
5 2 1 2
! File hw1.c:
!    2  extern short int *array;
!    3  extern int *hist;
!    4  
!    5  #define ISIZE 10
!    6  #define JSIZE 20
!    7  
!    8  void histo()
!    9  {
!   10    int i,j;
!   11  
!   12    for(i=0; i<ISIZE; i++)

        mov     0,%l0
        st      %l0,[%fp-4]   ! Store reg. l0 into frame pointer - 4 =  &i.
        ld      [%fp-4],%l0   ! Load the 0 just written. 
        cmp     %l0,10        ! Compare i to 10, set condition bits.
        bge     .L19          ! Branch if i greater than 10.
        nop                   ! Sparc has a branch delay slot, wasted here.

        ! block 2
.L20:
.L17:

!   13      for(j=0; j<JSIZE; j++)

        mov     0,%l0
        st      %l0,[%fp-8]   ! Store reg l0 info frame pointer - 8 = &j...
        ld      [%fp-8],%l0   ! ...and load it back in.
        cmp     %l0,20        ! Compare to 20
        bge     .L23          ! Branch if not smaller.
        nop                   

        ! block 3
.L24:
.L21:

!   14        {
!   15          int e= array[ i * JSIZE + j ];

        sethi   %hi(array),%l0        ! Load hi bits of array pointer addr.
        or      %l0,%lo(array),%l0    ! Load rest of array pointer address.
        ld      [%l0+0],%l3           ! Load address of first element.
        ld      [%fp-4],%l0           ! Load i into l0
        sll     %l0,2,%l2             ! l2 = i * 4 
        sll     %l0,4,%l1             ! l1 = i * 16
        add     %l2,%l1,%l0           ! l0 = i * 20 = i * JSIZE;
        ld      [%fp-8],%l1           ! l1 = j
        add     %l0,%l1,%l0           ! l0 = j + i * JSIZE
        sll     %l0,1,%l1             ! l1 = l0 * 2 = l0 * sizeof(short int)
        add     %l3,%l1,%l0           ! l0 = &array[ j + i * JSIZE ]
        ldsh    [%l0+0],%l0           ! l0 = array[ j + i * JSIZE ] = e
        sll     %l0,16,%l0            ! l0 = l0 << 16  (Part 1 of sign ext.)
        sra     %l0,16,%l0            ! l0 = l0 >> 16  (Part 2 of sign ext.)
        st      %l0,[%fp-12]          ! e = l0

!   16          hist[ e ]++;

        sethi   %hi(hist),%l0         ! Load hi bits of addr of hist pointer
        or      %l0,%lo(hist),%l0     ! Load lo bits of addr of hist pointer
        ld      [%l0+0],%l2           ! Load hist pointer, &hist[0]
        ld      [%fp-12],%l0          ! Load e 
        sll     %l0,2,%l1             ! l1 = e * sizeof(int)
        add     %l2,%l1,%l0           ! l0 = &hist[e]
        st      %l0,[%fp-16]          ! Write &hist[e] into stack
        ld      [%fp-16],%l0          ! Read &hist[e] from stack.
        ld      [%l0+0],%l0           ! l0 = hist[e]
        add     %l0,1,%l1             ! l1 = hist[e] + 1
        ld      [%fp-16],%l0          ! Read &hist[e] from stack.
        st      %l1,[%l0+0]           ! Store hist[e]+1

!   17        }

        ld      [%fp-8],%l0           ! l0 = j
        add     %l0,1,%l0             ! l0 = l0 + 1
        st      %l0,[%fp-8]           ! Store new value of j.
        ld      [%fp-8],%l0           ! And load it again.
        cmp     %l0,20                ! Compare to twenty
        bl      .L21                  ! If less, continue.
        nop

        ! block 4
.L25:
.L23:
        ld      [%fp-4],%l0           ! l0 = i.
        add     %l0,1,%l0             ! l0 = l0 + 1
        st      %l0,[%fp-4]           ! i = l0, store new value of i.
        ld      [%fp-4],%l0           ! Load again. 
        cmp     %l0,10                ! Compare to ten..
        bl      .L17                  ! ...if less continue.
        nop

        ! block 5
.L26:
.L19:
.L15:
        jmp     %i7+8                 ! Return
        restore

Optimized (for speed) Assembler

Notes on optimized code

Static count, 26, dynamic count 2476. Alot fewer!

Block
Number
Static
Count
Dynamic Count
(Per Instr)
Dynamic Count
(Total)
1 5 1 5
2 2 10 20
3 12 200 2400
4 5 10 50
5 2 1 1
! FILE hw1.c

!    2		      !extern short int *array;
!    3		      !extern int *hist;
!    5		      !#define ISIZE 10
!    6		      !#define JSIZE 20
!    8		      !void histo()
!    9		      !{
!   10		      !  int i,j;
!   12		      !  for(i=0; i<ISIZE; i++)

! Block 1

/* 000000	  12 */		or	%g0,0,%g4
/* 0x0004	     */		or	%g0,0,%o5

!   13		      !    for(j=0; j<JSIZE; j++)

/* 0x0008	  13 */		or	%g0,0,%g1       ! j = 0;
/* 0x000c	     */		sethi	%hi(hist),%g3
/* 0x0010	     */		sethi	%hi(array),%g2

! Block 2

                       .L900000109:

!   14		      !      {
!   15		      !	int e= array[ i * JSIZE + j ];
!   16		      !	hist[ e ]++;

/* 0x0014	  16 */		sll	%o5,1,%o4
                       .L77000005:
			        ! o3 = &array[0]
/* 0x0018	     */		ld	[%g2+%lo(array)],%o3 ! volatile

! Block 3

                       .L900000110:
			        ! o3 = array[ i * JSIZE + j ] = e
/* 0x001c	     */		ldsh	[%o3+%o4],%o3 ! volatile

!   17		      !      }

/* 0x0020	  17 */		add	%g1,1,%g1  ! j = j + 1
/* 0x0024	     */		add	%o4,2,%o4  ! 
				! Below, o2 = &hist
/* 0x0028	  16 */		ld	[%g3+%lo(hist)],%o2 ! volatile
/* 0x002c	  17 */		cmp	%g1,20     ! Compare j to 20.
/* 0x0030	  16 */		sll	%o3,2,%o3  ! e * sizeof(int)
/* 0x0034	     */		add	%o2,%o3,%o1 ! o1 = &hist[e]
				! Below, o1 = hist[e];
/* 0x0038	     */		ld	[%o1],%o1 ! volatile 
/* 0x003c	     */		add	%o1,1,%o1  ! o1 = hist[e] + 1
				! Below, save: hist[e] = hist[e] + 1
/* 0x0040	     */		st	%o1,[%o3+%o2] ! volatile
				! Referring to cmp above, branch if j < 20.
/* 0x0044	  17 */		bl,a	.L900000110  ! Branch less than.
				! Instruction below is skipped if branch
				! not taken.  o3 = &array[0];
/* 0x0048	     */		ld	[%g2+%lo(array)],%o3 ! volatile

! Block 4

                       .L77000007:
/* 0x004c	     */		add	%g4,1,%g4   ! i = i + 1
				! Below, instead of multiplying i * JSIZE,
				! just add JSIZE (20) each iteration.
/* 0x0050	     */		add	%o5,20,%o5  
/* 0x0054	     */		cmp	%g4,10     ! Compare i to ISIZE 
/* 0x0058	     */		bl	.L900000109  ! If less, continue
				! Instruction below executed even if
				! branch above is taken.
/* 0x005c	     */		or	%g0,0,%g1    ! j = 0

! Block 5

                       .L77000010:
/* 0x0060	     */		retl
/* 0x0064	     */		nop
/* 0x0068	   0 */		.type	histo,2
/* 0x0068	     */		.size	histo,(.-histo)

ECE Home Page 4720 Home Page
David M. Koppelman - koppel@ee.lsu.edu
Modified 20 Feb 1998 12:09 (18:09 UTC)