From: "Saved by Windows Internet Explorer 7" Subject: Control logic for Page 85 of Datapath Date: Sun, 4 Oct 2009 22:12:45 -0600 MIME-Version: 1.0 Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Content-Location: http://www.ece.lsu.edu/alex/EE3755/fall05/CONTROL.HTM X-MimeOLE: Produced By Microsoft MimeOLE V6.0.6001.18049
## =
LSU EE=20
3755 -- -- Computer Organization
#
## =
Control=20
Logic for MIPS -- fall 2006
##=20
Contents
#
# Single Cycle =
control=20
logic for the Datapath (page 85)
# Hard wired =
and Micro=20
Programmed Controller
# Multi Cycle =
control=20
logic
# Finite State =
Diagram=20
and Finite State Machine
# Programmable =
Logic=20
Array
# Single Cycle =
control=20
logic for the Datapath (page 85)
Control logic for Page 85 of=20
Datapath..
We simply have to give a control signal for =
each=20
Multiplexor , the ALU
and design logic for control circuit =
near the =
PC.
1) the ALU
2) Mux at ALU
3)Mux at =20
data Memory and ALU.
4)Mux at Rd for =20
Register file
5)Mux at line 15-11 for Register=20
file
6)Mux at D/IN for Register=20
file
7) Mux and control for =
NPC
8)Mux for the control after the=20
ALU
1) the ALU
ALU =
can=20
perform
&n=
bsp; =20
AND,
&n=
bsp; =20
OR,
&n=
bsp; =20
SLT,
&n=
bsp; =20
ADD,
&n=
bsp; =20
SUB
&n=
bsp; =20
operations.
We =
will generate=20
those control signals from
OP =
code=20
field[31:26] and func field[5:0]
=
We=20
have control inputs for ALU.
&n=
bsp;=20
=20
=20
ALU_OP[2:0] =20
Operation
&n=
bsp;=20
000 &n=
bsp; &nb=
sp; =20
AND
&n=
bsp;=20
001 &n=
bsp; &nb=
sp; =20
OR
&n=
bsp;=20
010 &n=
bsp; &nb=
sp; =20
SLT
&n=
bsp;=20
011 &n=
bsp; &nb=
sp; =20
ADD
&n=
bsp;=20
100 &n=
bsp; &nb=
sp; =20
SUB
&n=
bsp; =20
OP =20
FUN =20
ALUoperation =
: =
OP &n=
bsp; =20
FUN =20
ALU_OP
=20
AND =20
0 =20
0X 24 =20
AND &n=
bsp; =20
000000 =
100100 &n=
bsp; =20
000
=20
ANDI =20
6 =20
X &n=
bsp; =20
AND &n=
bsp; =20
000110 =
XXXXXX &n=
bsp;=20
000
=20
OR =20
0 =20
0x25 &n=
bsp; =20
OR &n=
bsp; =20
000000 =
100101 &n=
bsp; =20
001
=20
ORI =20
d =20
X &n=
bsp; =20
OR &n=
bsp; =20
001101 =
XXXXXX &n=
bsp;=20
001
=20
SLT =20
0 =20
0x2a &n=
bsp;=20
SLT &n=
bsp; =20
000000 =
101010 &n=
bsp; =20
010 =20
=20
SLTI =20
a =20
X &n=
bsp; =20
SLT &n=
bsp; =20
001010 =
XXXXXX &n=
bsp; =20
010
=20
ADD =20
0 =20
0X20 &n=
bsp; =20
ADD &n=
bsp;=20
000000 =
100000 &n=
bsp; =20
011
=20
ADDI =20
8 =20
X &n=
bsp; =20
ADD &n=
bsp; =20
001000 =
XXXXXX &n=
bsp; =20
011
=20
SUB =20
0 =20
0X22 =20
SUB &n=
bsp; =20
000000 =20
100010 &n=
bsp; =20
100
=20
LW &n=
bsp;=20
23 =20
X &n=
bsp; =20
ADD &n=
bsp; =20
100011 =
XXXXXX &n=
bsp; =20
011
=20
LB &n=
bsp; =20
20 =20
X &n=
bsp; =20
ADD &n=
bsp; =20
100000 =
XXXXXX =20
011
=20
SW &n=
bsp;=20
2b =20
X &n=
bsp; =20
ADD &n=
bsp; =20
100000 =
XXXXXX &n=
bsp; =20
011
=20
SB &n=
bsp; =20
28 =20
X &n=
bsp; =20
ADD &n=
bsp; =20
101000 =
XXXXXX &n=
bsp; =20
011
=20
BEQ =20
4 =20
X &n=
bsp; =20
SUB &n=
bsp; =20
000100 XXXXXX &n=
bsp; =20
100
=20
BNE =20
5 =20
X &n=
bsp; =20
SUB &n=
bsp; =20
000101 XXXXXX &n=
bsp; =20
100
=20
&n=
bsp;=20
By using AND and =
OR gates,=20
We could easily generate ALU_OP.
&n=
bsp;=20
After logic minimization, the output will be=20
simpler.
=20
##############################################################
&n=
bsp;=20
There are many other ways to do this.
&n=
bsp;=20
One simple way is using decoders to decode OP code and =
function
&n=
bsp;=20
field.
=20
&n=
bsp; =20
6 inputs and 64 outputs for each =
decoder.
&n=
bsp;=20
000 &n=
bsp; &nb=
sp; =20
AND
&n=
bsp;=20
001 &n=
bsp; &nb=
sp; =20
OR
&n=
bsp;=20
010 &n=
bsp; &nb=
sp; =20
SLT
&n=
bsp;=20
011 =20
&n=
bsp; ADD
&n=
bsp;=20
100 &n=
bsp; &nb=
sp; =20
SUB =20
=20
ALU_OP[2] =3D x0 y34 + =20
x4 + x5
=20
ALU_OP[1] =3D slt add (0 2a) + a + (0 20) + 8 + 23 + 20 +2b =
+ 28=20
&n=
bsp; &nb=
sp; =20
=3D =20
x0y42+x10+x0y32+x8+x35+x32+x43+x40
=20
ALU_OP[0] =3D or add (0 25) + d+ (0 20) + 8 + 23 + 20 +2b + 28=20
&n=
bsp; &nb=
sp; =20
=3D =
x0y37+x13+ =20
x0y32+x8+x35+x32+x43+x40
=20
=20
=
## &n=
bsp; =20
Fig. for Decoders
#######################################################=
############ &n=
bsp; =20
&n=
bsp; =20
second simple way is using a ROM.
&n=
bsp; =20
the address for the ROM is OP code and funcion field and the =
ouput is=20
ALU_OP.
## &n=
bsp; =20
Fig. for ROM
##################################################################=
##
2) Mux at ALU
=20
the Mux will select D/rt or sign extended=20
immed.
=20
if select for the Mux is 0, it will select D/rt otherwise select =
sign=20
extended immed.
=20
We call the select signal Mux_ALU_CNT.
=20
ANDI,ORI,SLTI,ADDI,LW,LB,SW,SB use
&n=
bsp; =20
immediate values.
=20
OP FUN =
ALUoperation : =
OP =
FUN =20
Mux_ALU_CNT
=20
ANDI 6 =
X =20
AND &n=
bsp; =20
000110 =
XXXXXX =20
1
ORI d =
X =20
OR &n=
bsp; =20
001101 =
XXXXXX =20
1
SLTI a =20
X =20
SLT &n=
bsp; =20
001010 =
XXXXXX =20
1
ADDI 8 =20
X =20
ADD &n=
bsp; =20
001000 =
XXXXXX =20
1
LW 23 X =20
ADD &n=
bsp; =20
100011 =
XXXXXX =20
1
LB =
20 =
X =20
ADD &n=
bsp; =20
100000 =
XXXXXX =20
1
SW 2b =
X =20
ADD &n=
bsp; =20
100000 =
XXXXXX =20
1
SB 28 =20
X =20
ADD =
&n=
bsp; 101000 XXXXXX =20
1
## We could implement the logic with AND =
and OR=20
gates.
## We could use a decoder with OR gate to =
implement=20
this.
## We could use a ROM with 6bit address =
field.
3)Mux at =20
data Memory and ALU.
=20
the Mux will select ALU_output or Dout.
=20
if select for the Mux is 0, it will select ALU_output=20
,
=20
otherwise select Dout.
=20
We call the select signal Mux_DataMem.
=20
Only LW and LB will set Mux_DataMem.
=20
OP =20
FUN =20
ALUoperation =
: OP =
FUN =20
Mux_DataMem
LW 23 =20
X =20
ADD &n=
bsp; =20
100011 =
XXXXXX =20
1
LB 20 =20
X =20
ADD &n=
bsp; =20
100000 =
XXXXXX =20
1
=20
## We could implement the logic with AND =
and OR=20
gates.
## We could use a decoder with OR gate to =
implement=20
this.
## We could use a ROM with 6bit address =
field.
4)Mux at Rd for =20
Register file
=20
the Mux will select =
"#31"=20
output of another Mux.
=20
if select=20
for the Mux is 0, it will select output of another Mux=20
&n=
bsp;=20
,otherwise select "#31".
=20
We call the select signal Mux_Rd_CNT.
=20
Only Jal will set Mux_Rd_CNT.
=20
OP =20
FUN =20
ALUoperation =
: =
OP FUN =20
Mux_Rd_CNT
=20
JAL =
3 =20
X &n=
bsp; =20
X &n=
bsp; =20
000011 =
XXXXXX =20
1
## We =
could=20
implement the logic with AND and OR gates.
## We =
could use=20
a decoder to implement=20
this.
5)Mux at line 15-11 for Register=20
file
=20
the Mux will select =
lines 15=20
to 11 or lines 20 to 16.
=20
if select for the Mux is 0, it will select lines 15 to 11=20
,
=20
otherwise select lines 20 to 16.
=20
We call the select signal Mux_Line11.
=20
Whenever RT field acts like RD field, we will set=20
Mux_Line11.
=20
Whenever they try to write something,
&n=
bsp; =20
they need Rd field(some times Rt) . =20
&n=
bsp;=20
OP =20
FUN =20
ALUoperation =
: =
OP FUN =20
Mux_Line11. =20
ANDI 6 =20
X &n=
bsp; =20
AND =20
000110 =
XXXXXX &n=
bsp;=20
1 =20
ORI =20
d =20
X &n=
bsp; =20
OR =20
001101 =
XXXXXX &n=
bsp;=20
1
SLTI =
a =20
X &n=
bsp; =20
SLT =20
001010 =
XXXXXX =20
1
ADDI 8 =20
X &n=
bsp; =20
ADD =20
001000 =
XXXXXX &n=
bsp; =20
1
LW =20
23 =20
X &n=
bsp; =20
ADD =20
100011 =
XXXXXX &n=
bsp; =20
1
LB =20
20 =20
X &n=
bsp; =20
ADD =20
100000 =
XXXXXX &n=
bsp; =20
1
=20
## We could implement the logic with AND and OR=20
gates.
=20
## We could use a decoder with OR gate to implement=20
this.
=20
## We could use a ROM with 6bit address field.
6)Mux at D/IN for Register=20
file
=20
the Mux will select =
either=20
output from Mux 3(control signal for this is
=20
Mux_dataMem) or lines from NPC +4(PC =
+8).
=20
if select for the Mux is 0, it will select output from Mux3,=20
=20
otherwise select =
lines from=20
NPC + 4.
=20
We call the select signal =
Mux_Line_NPC_plus4.
=20
Whenever we have to save NPC + 4, we will set=20
Mux_Line_NPC_plus4.
=20
=
Only=20
Jal and Jalr will set Mux_Line_NPC_plus4.
=20
=20
OP FUN ALUoperation : OP FUN ALU_OP:=20
Mux_Line_NPC_plus4.
jal 3 =
X &n=
bsp;=20
X =20
000011 XXXXXX =
X &n=
bsp; =20
1
jalr 0 =20
9 &n=
bsp; =20
X =20
000000 001001 =20
X &n=
bsp; =20
1 =
=20
=20
## We could implement the logic with AND and OR=20
gates.
=20
## We could use two decoders with OR gate and AND gate=20
=20
to implement this.
=20
## We could use a ROM with 12bit address field.
7) Mux and control for =
NPC
=20
We have to design control logic for =20
NPC_CNT
=20
(controller before the Mux).
=20
The controller will =
do one=20
of four things.
=20
1)check if branch =
condition=20
is met and if the =
instruction is=20
branch
&n=
bsp; =20
instruction, it will generate select signal =
&n=
bsp; =20
for the Mux(Mux_NPC_CNT =3D00).
&n=
bsp; =20
the mux will select branch address(Target address at the=20
figure).
=20
2) check if the instruction is jump instruction,=20
&n=
bsp; =20
it will generate select signal
&n=
bsp; =20
for the Mux(Mux_NPC_CNT =3D01).
&n=
bsp; =20
the mux will select jump address.
=20
3)check if the instruction is jr or jalr=20
instruction,
&n=
bsp; =20
it will generate select signal
&n=
bsp; =20
for the Mux(Mux_NPC_CNT =3D10).
&n=
bsp; =20
the mux will select jump address(Drs from the register=20
file).
=20
4)otherwise, =
either branch condition failed or regular =
instruction.
&n=
bsp; =20
it will generate select signal
&n=
bsp; =20
for the Mux(Mux_NPC_CNT =3D11).
&n=
bsp; =20
the mux will select NPC +4 address =
lines.
&n=
bsp; =20
The part of inputs for this controller come from=20
&n=
bsp; =20
Control Signal which tell =
the instruction is
&n=
bsp; =20
branch instruction(NPC_CNT_SIG =3D 00) , or =
&n=
bsp; =20
the instruction is jump or jal =20
instruction(NPC_CNT_SIG =3D 01), or
=
t=
he=20
instruction is jr or jalr instruction(NPC_CNT_SIG =3D=20
10).
&n=
bsp; =20
the instruction is regular instruction(NPC_CNT_SIG =3D=20
11).
&n=
bsp; =20
The other part of inputs for this controller come=20
&n=
bsp; =20
from Mux 8 which tells =
the=20
branch condition met or
&n=
bsp; &nb=
sp; =20
not(Mux8_output =3D 1 ; branch condition met). =
=20
The Control Signal should generate=20
NPC_CNT_SIG.
&n=
bsp;=20
&n=
bsp; =20
OP =20
FUN =20
ALUoperation =
: =
OP &n=
bsp; =20
FUN =20
NPC_CNT_SIG[1:0]
=20
BEQ =
4 =20
X &n=
bsp; =20
SUB &n=
bsp; =20
000100 =20
XXXXXX &n=
bsp;=20
00
BNE =
5 =20
X &n=
bsp; =20
SUB &n=
bsp; =20
000101 =20
XXXXXX =20
00=20
J =20
2 =20
X &n=
bsp; =20
X &n=
bsp; =20
000010 =20
XXXXXX &n=
bsp;=20
01
jal =20
3 =20
X &n=
bsp; =20
X &n=
bsp; =20
000011 =20
XXXXXX &n=
bsp;=20
01
Jr =20
0 =20
8 &n=
bsp; =20
X &n=
bsp; =20
000000 =20
001000 &n=
bsp; =20
10
jalr =20
0 =20
9 &n=
bsp; =20
X &n=
bsp; =20
000000 =20
001001 &n=
bsp; =20
10 =20
=20
=20
## We could implement the logic with AND and OR=20
gates.
=20
## We could use two decoders with Inverters and AND gates to
=20
# implement this.
=20
## We could use a ROM with 12bit address field.
Now we need to generate signals =
MuX_NPC_CNT.
=20
1)check if branch =
condition=20
is met and if the =
instruction is=20
branch
&n=
bsp; =20
instruction, it will generate select signal for=20
the
&n=
bsp; =20
Mux(Mux_NPC_CNT =3D00).
&n=
bsp; =20
the mux will select branch address(Target address at the=20
figure).
&n=
bsp; =20
if NPC_CNT_SIG =3D00 and Mux_output =3D 1 , =
then=20
&n=
bsp;=20
Mux_NPC_CNT[1:0]=20
=3D00.
&n=
bsp; &nb=
sp;=20
=20
2) check if the instruction is jump instruction, it will generate =
select=20
&n=
bsp; =20
signal for the=20
Mux(Mux_NPC_CNT =3D01).
&n=
bsp; =20
the mux will select jump address.
&n=
bsp; =20
if NPC_CNT_SIG =3D01 then Mux_NPC_CNT[1:0] =3D =
01.
=20
3)check if the instruction is jr instruction, it will generate =
select=20
&n=
bsp; =20
signal for the =
Mux(Mux_NPC_CNT=20
=3D10).
&n=
bsp; =20
the mux will select jump address(Drs from the register=20
file).
&n=
bsp; =20
if NPC_CNT_SIG =3D10 then Mux_NPC_CNT[1:0] =3D =
10.
=20
4)otherwise, either =
regular=20
instruction, =
or branch instruction =
and=20
&n=
bsp; =20
branch condition failed =20
.
&n=
bsp; =20
it will generate select signal
&n=
bsp; =20
for the Mux(Mux_NPC_CNT =3D11).
&n=
bsp; =20
the mux will select NPC +4 address lines.
&n=
bsp; =20
=20
i=
f=20
NPC_CNT_SIG =3D11 or
&n=
bsp; =20
NPC_CNT_SIG =3D00 =
and=20
Mux_output =3D 0 ,
&n=
bsp; &nb=
sp;=20
then =
Mux_NPC_CNT[1:0]=20
=3D11.
################################################################
=20
## We could implement the logic with AND and OR=20
gates.
=20
## We could use two decoders with OR gate and AND gate=20
## to =
implement=20
this.
=20
## We could use a ROM with 12bit address field.
The logic is =
simple,=20
so try to implement with gates.
&n=
bsp; &nb=
sp; &nbs=
p; =20
Mux_NPC_CNT[0] =
=3D=20
SIG[0] + MUX' =
*SIG[1]'
&n=
bsp; =20
SIG &n=
bsp; =20
00 01 11 1=20
0
&n=
bsp; &nb=
sp; &nbs=
p; =20
0 =
1 =
1 =
1 =
0
&n=
bsp; =20
MUX =20
1 =20
0 =
1 1 =20
0
&n=
bsp; &nb=
sp; &nbs=
p; =20
&n=
bsp; =20
&n=
bsp; &nb=
sp; &nbs=
p; =20
Mux_NPC_CNT[1] =3D SIG[1 ] =20
+ MUX'=20
*SIG[0]'
&n=
bsp; =20
SIG &n=
bsp; =20
00 =
01 11 1 =
0
&n=
bsp; =20
&n=
bsp; 0 1 =
0 1 =20
1
&n=
bsp; =20
MUX =20
1 =
0 =
0 1 1
=20
8)Mux for the control after the=20
ALU
=20
the Mux will select =
either=20
zero from ALU or=20
zero'.
=20
if select for the Mux is 0, it will select zero, otherwise=20
select
=20
zero'.
=20
We call the select signal Mux_Branch.
=20
Whenever we have =
BEQ, Mux=20
will select zero otherwise , select zero'.
&n=
bsp; =20
OP =20
FUN =20
ALUoperation =
: =
OP &n=
bsp; =20
FUN =20
Mux_Branch
=20
=20
BEQ =
4 =20
X &n=
bsp; =20
SUB &n=
bsp; =20
000100 =20
XXXXXX &n=
bsp;=20
0
=20
BNE =
5 =20
X &n=
bsp; =20
SUB &n=
bsp; =20
000101 =20
XXXXXX &n=
bsp;=20
1
################################################################
## We could implement the logic with AND =
gate and=20
Inverters.
## We could use a decoder to implement=20
this.
=20
this is just output =
x5 of OP=20
code decoder.
9)R/W signal for Data=20
memory.
R/W =
=3D 0 means=20
read.
R/W =
=3D 1 means=20
write.
&n=
bsp; =20
OP =20
FUN =20
ALUoperation =
: =
OP &n=
bsp; =20
FUN =20
R/W
=20
LW =20
23 =20
X &n=
bsp; =20
ADD &n=
bsp; =20
100011 =
XXXXXX &n=
bsp; =20
0
=20
LB =20
20 =20
X &n=
bsp; =20
ADD &n=
bsp; =20
100000 =
XXXXXX &n=
bsp; =20
0
=20
SW =20
2b =20
X &n=
bsp; =20
ADD &n=
bsp; =20
100000 =
XXXXXX &n=
bsp; =20
1
=20
SB =20
28 =20
X &n=
bsp; =20
ADD &n=
bsp; =20
101000 =
XXXXXX &n=
bsp; =20
1
## We could use a decoder to implement=20
this.
=20
this is ( x43 + =
x40) of OP=20
code decoder.
=20
# Multi Cycle =
control=20
logic
### =
Hard Wired and =
Micro=20
programmed Controller ##
We have a high =
level language program(c program=20
..)
We could compile =
it to=20
Mips assembly language instruction(compiler).
We could translate assembly language to Mips = machine=20 language(assembler).
(c program =
=3D> assembly=20
language =3D> machine language).
For =
a CPU with=20
a hardwired controller:
each machine =
language=20
instruction is decoded =
and executed=20
like a finite state machine.
For a CPU with a micro =
programmed controller:
Each =
machine=20
language instruction is =
defined by=20
a set of microinstructions and =
each=20
microinstruction is decoded and
Executed by =
micro=20
sequencer (Micro=20
CPU)
## Multi Cycle=20
Implementation####
##### Performance=20
###################################
=20
So far,=20
we assume we could finish every instruction in a single cycle, which means =
we have to finish every =
instruction in a=20
fixed time.
That means we have to set the clock =
frequency to the=20
slowest instruction.
Usually memory access instruction or =
floating point=20
instruction is
=20
the slowest=20
one.
Although the CPI(cycle per =
instruction) is 1, the overall performance =
of a=20
single-cycle implementation is not likely to be very good, since several =
of the=20
instruction classes could fit in a shorter clock cycle.
One way=20
to solve the problem of single cycle is :
=20
Break =
the=20
instruction into smaller steps
=
When=20
we break the instruction make =
sure=20
all the steps
&n=
bsp;=20
to have similar length(this is =
important)
=20
Execute each=20
step (instead of the entire instruction) in one =
cycle
=20
Cycle time: time it takes to execute the longest=20
step
=20
The advantages of the =
multiple cycle=20
processor:
=20
Cycle =
time is=20
much shorter
=20
Different=20
instructions take different number of number of steps =
=20
to complete=20
=20
&n=
bsp;=20
We have =
examples of=20
cylces below.
=20
Use =
ALU more=20
than once per instruction
=20
=20
Question: What =
will be one=20
instruction which uses ALU
&n=
bsp; =20
&n=
bsp; more=20
than once per instruction?
=20
Answer: =20
######Execution time =20
####################
We have to think execution time=20
equation.
=
Execution=20
time =3D Instructions/Program =
* Clock=20
cycles/Instruction
&n=
bsp; &nb=
sp; =20
*=
=20
Seconds/Clock cycle
## Performance of single cycle =
machine. =20
=20
one program consists of =
24%=20
loads, 12%stores,
&n=
bsp;=20
44% ALU instructions,18% branches
=20
and 2% jumps( Instruction mix).
=20
=20
Assume operation times for the major functional units are=20
following:
=20
Memory units: 2 ns(nano seconds).
=20
ALU and adders: 2 ns.
=20
Register file( read and write): 1ns.
=
Instruction
=
class =20
&n=
bsp;Fuctional=20
units used by the instruction class
(IF : instruction fetch; MEM: Memory =
access;) &n=
bsp; &nb=
sp; =20
ALU =
type =20
IF: =
Register=20
access : ALU =
: Register=20
access
=20
Load &n=
bsp; =20
IF: =
Register=20
access: =
ALU: MEM: Register=20
access
=20
Store &n=
bsp; =20
IF: =
Register=20
access: =
ALU: MEM:
=20
Branch &n=
bsp; =20
IF: =
Register=20
access: =
ALU: =20
=20
Jump &n=
bsp; =20
IF: =20
=20
We will compute the required time =20
for each instruction class
Instruction =
=
class =
instruction register ALU =20
Data =20
Register
&n=
bsp; =20
memory =20
read =20
operation =
memory write =20
total
=20
ALU type =
2 &n=
bsp; =20
1 &n=
bsp; 2 &n=
bsp; =20
0 &n=
bsp; =20
1 =20
6ns
=20
Load &n=
bsp; =20
2 &n=
bsp; =20
1 &n=
bsp; =20
2 &n=
bsp; =20
2 &n=
bsp; =20
1 =
8ns
=20
Store &n=
bsp; =20
2 &n=
bsp; =20
1 &n=
bsp; =20
2 &n=
bsp; =20
2 &n=
bsp; =20
0 =
7ns
=20
Branch =20
2 &n=
bsp; =20
1 &n=
bsp; =20
2 &n=
bsp; =20
0 &n=
bsp; =20
0 =
5ns
=20
Jump &n=
bsp;=20
2 &n=
bsp; =20
0 &n=
bsp; =20
0 &n=
bsp; =20
0 &n=
bsp; =20
0 =
2ns
Question: What will the =
clock cycle=20
for a single clock cycle machine?
## Answer:
#################################################
A machine with a variable clock will have a =
clock=20
cycle that varies
between=20
2ns and 8ns.
What will the average time per=20
instruction?
=20
CPU clock cycle =3D 8*24% + 7 *12%+ 6*44%+=20
5*18%+2*2%
&n=
bsp; &nb=
sp; &nbs=
p; =20
=3D 6.6ns. =20
A little bit fast but not much=20
difference.
##Performance of a single -cycle cpu with =
floating=20
point instructions#####
loads comprise =20
31% of the instructions. =
stores comprise 21% of the =
instructions. =
R format instructions comprise 27% of the=20
mix
Branches comprise =
5%
Jumps comprise 2%
FP add and subtract take the same time and =
together=20
total 7% of
&n=
bsp;=20
the instructions
FP multiply and divide take the same time =
and=20
together total 7% of
=20
the instructions
Assume we have a floating point unit that =
takes 8ns=20
for addition
=20
and 16ns for multiply.
##
Question: How long will it take for the =
floating=20
point add?
Answer:
##
Question: How long will it take for the =
floating=20
point multiply?
Answer:
CPU clock cycle =3D8*31% + 7 *21%+ 6*27%+ =
5*5%+2*2%=20
+12*7%+20*7%
&n=
bsp; &nb=
sp; =20
=3D 8ns.
Improved performance =3D =20
20/8
## This is part of the reason we have to =
make=20
multi-cycle implementation.
###############################################
Go back to page 85=20
datapath.
Question: Is there any way to reduce=20
H/W?
Answer:
#########################
We have two memories : we will combine two =
memories=20
&n=
bsp; &nb=
sp; =20
into a single memory unit for both instructions and data.=20
We have two ALUs: we will combine two ALUs =
into a=20
single ALU.
##
Question : What kind of H/W do we need =
after making=20
the modification.
Answer: =20
##############
New data Path. =20
Figure 5.33 .page 383.
##############
1)Instruction fetch =
step
2)Instruction decode and register fetch=20
step
3)Execution, memory address computation, or =
branch=20
completion
=20
(change the PC with branch target =
address)
4)Memory access or R type instruction =
completion=20
step
5)Memory read completion=20
step
1)Instruction fetch =
step
=
IR =3D=20
Memory[PC];
=
PC =3D PC=20
+4;
=
IR : Instruction=20
Register.
2)Instruction decode and register fetch=20
step
A =3D=20
GPR[IR[25-21]];
B =3D=20
GPR[IR[20-16]];
ALUOUT =
=3D PC +4 +=20
(sign-extend(IR[15-0]) << 2);
3)Execution,memory address computation, or =
branch=20
completion
=20
(change the PC with branch target =
address)
=20
Memory reference:
=20
ALUOut =
=3D A +=20
sign extend(IR[15-0]);
=20
Arithmetic logical instruction(R type)
=20
=20
ALUOut =3D A op B;
=20
Branch:
=
=20
if (A =3D B=3D) PC =3D ALUOut;
=20
Jump: =20
=20
PC =3D PC[31-28] || (IR[25-0] << =
2);
4)Memory access or R type instruction =
completion=20
step
=20
Memory reference:
=20
MDR =3D Memory [ALUOut];
=
or=20
=20
Memory [ ALUOut] =3D B;
=20
Arithmetic logical instruction(R type)
=20
GPR[IR[15-11]] =3D ALUOut;
5)Memory read completion=20
step
=20
Load
=20
GPR[IR[15-11]] =3D MDR;
=20
############## Control for Each State=20
###############
Finite State =
Machine(FSM).
FSM is =
used to=20
specify the multi cycle control.
=
FSM consists of a set of states =
and=20
directions on how to change
=20
states.
FSM =
review.
=20
Finite =
state=20
machines:
=
a=20
set of states and internal storage
=20
next =
state=20
function (determined by current state and the =
input)
=20
output =
function=20
(determined by current state and possibly input)
=20
We=92ll use a Moore machine (output based only on current=20
state)
=20
=20
Finite State Machine Concept.
=
Finite State Machine=20
Controller
=
Finite State =
Diagram Fig. =
5.42.
=B7 Next state from current=20
state
=B7 State 0 ->=20
State1
=B7 State 1 ->=20
S2,S6,S8,S9
=B7 State 2=20
->S3,S5
=B7 State 3=20
->S4
=B7 State 4 ->State=20
0
=B7 State 5 -> State=20
0
=B7 State 6 ->=20
S7
=B7 State 7 -> State=20
0
=B7 State 8 -> State=20
0
=B7 State 9 -> State=20
0
Or How can we reach each=20
state.
prior state &=20
condition
S4, S5, S7, S8, S9 &n=
bsp;=20
-> State0
S0 &n=
bsp; &nb=
sp; &nbs=
p; =20
-> State 1
S1 &(op =3D sw || op =3D =
lw) -> State =
2
S2 &op =3D lw &n=
bsp; =20
-> State 3
S3 &n=
bsp; &nb=
sp; &nbs=
p; =20
-> State 4
State 2 & op =3D sw &n=
bsp;=20
-> State 5
S1 & op =3D Rtype &n=
bsp; =20
-> State 6
State 6 &n=
bsp; &nb=
sp; =20
-> State 7
S1 & op =3D beq &n=
bsp; =20
-> State 8
State 1 & op =3D j &n=
bsp; =20
-> State 9
Implementation Technique: Programmed Logic=20
Arrays
Fig. PLA...
=B7 Given numbers of FSM, can =
determine=20
next state as function of inputs,
including current=20
state
Turn these into Boolean =
equations for=20
each bit of the next state lines
Can implement easily using=20
PLA
What if many more states, =
many more=20
conditions?
What if need to add a=20
state?