## LSU EE 3755 -- -- Computer Organization
#
## Control Logic for MIPS -- fall 2003
## Dec. 1. 2003
## Contents
#
# Single Cycle control logic for the Datapath (page 85)
# Hard wired and Micro Programmed Controller
# Multi Cycle control logic
# Finite State Diagram and Finite State Machine
# Programmable Logic Array
# Single Cycle control logic for the Datapath (page 85)
Control logic for Page
85 of Datapath..
We simply have to give
a control signal for each Multiplexor , the ALU
and design logic for
control circuit near the PC.
1) the ALU
2) Mux at ALU
3)Mux at data Memory and ALU.
4)Mux at Rd for Register file
5)Mux at line 15-11 for
Register file
6)Mux at D/IN for
Register file
7) Mux and control for
NPC
8)Mux for the control
after the ALU
1) the ALU
ALU can perform
AND,
OR,
SLT,
ADD,
SUB
operations.
We will generate those control signals
from
OP code field[31:26] and func field[5:0]
We have control inputs for ALU.
ALU_OP[2:0] Operation
000 AND
001 OR
010 SLT
011 ADD
100 SUB
OP FUN
ALUoperation : OP FUN
ALU_OP
AND 0 0X
24 AND 000000 100100 000
ANDI 6 X AND 000110 XXXXXX 000
OR 0
0x25 OR 000000 100101 001
ORI d X OR
001101 XXXXXX 001
SLT 0
0x2a SLT 000000 101010 010
SLTI a X SLT 001010 XXXXXX 010
ADD 0
0X20 ADD 000000 100000
011
ADDI 8
X ADD 001000 XXXXXX 011
SUB 0
0X22 SUB 000000
100010 100
LW 23
X ADD 100011 XXXXXX 011
LB 20
X ADD 100000 XXXXXX 011
SW 2b
X ADD 100000 XXXXXX 011
SB 28
X ADD 101000 XXXXXX 011
BEQ 4
X SUB 000100
XXXXXX 100
BNE 5
X SUB 000101 XXXXXX 100
By using AND and OR gates, We could easily generate
ALU_OP.
After logic minimization, the
output will be simpler.
##############################################################
There are many other ways to do
this.
One simple way is using decoders
to decode OP code and function
field.
6 inputs and 64 outputs for
each decoder.
000 AND
001 OR
010 SLT
011 ADD
100 SUB
ALU_OP[2] = x0 y34 + x4
+ x5
ALU_OP[1] = slt add
(0 2a) + a + (0 20) + 8 + 23 +
20 +2b + 28
= x0y42+x10+x0y32+x8+x35+x32+x43+x40
ALU_OP[0] = or add (0 25) + d+ (0 20) + 8 + 23 + 20 +2b + 28
= x0y37+x13+ x0y32+x8+x35+x32+x43+x40
## Fig. for Decoders
###################################################################
second simple way is using a
ROM.
the address for the ROM is OP
code and funcion field and the ouput is ALU_OP.
## Fig. for ROM
####################################################################
2) Mux at ALU
the Mux will select D/rt or sign
extended immed.
if select for the Mux is 0, it will
select D/rt otherwise select sign extended immed.
We call the select signal
Mux_ALU_CNT.
ANDI,ORI,SLTI,ADDI,LW,LB,SW,SB use
immediate values.
OP FUN ALUoperation :
OP FUN Mux_ALU_CNT
ANDI 6
X AND 000110 XXXXXX 1
ORI
d X OR 001101
XXXXXX 1
SLTI
a X SLT 001010
XXXXXX 1
ADDI
8 X ADD 001000
XXXXXX 1
LW
23 X ADD 100011 XXXXXX 1
LB
20 X ADD 100000
XXXXXX 1
SW
2b X ADD 100000
XXXXXX 1
SB
28 X ADD
101000 XXXXXX 1
## We could implement
the logic with AND and OR gates.
## We could use a
decoder with OR gate to implement this.
## We could use a ROM
with 6bit address field.
3)Mux at data Memory and ALU.
the Mux will select ALU_output or
Dout.
if select for the Mux is 0, it will
select ALU_output ,
otherwise select Dout.
We call the select signal
Mux_DataMem.
Only LW and LB will set Mux_DataMem.
OP FUN
ALUoperation : OP FUN Mux_DataMem
LW
23 X ADD 100011
XXXXXX 1
LB 20 X ADD
100000 XXXXXX 1
## We could implement
the logic with AND and OR gates.
## We could use a
decoder with OR gate to implement this.
## We could use a ROM
with 6bit address field.
4)Mux at Rd for Register file
the Mux will select "#31" output of another Mux.
if select for the Mux is 0, it will select output of another
Mux
,otherwise select
"#31".
We call the select signal Mux_Rd_CNT.
Only Jal will set Mux_Rd_CNT.
OP FUN
ALUoperation : OP
FUN Mux_Rd_CNT
JAL
3 X X 000011
XXXXXX 1
## We could implement the logic with AND
and OR gates.
## We could use a decoder to implement this.
5)Mux at line 15-11 for
Register file
the Mux will select lines 15 to 11 or lines 20 to 16.
if select for the Mux is 0, it will
select lines 15 to 11 ,
otherwise select lines 20 to 16.
We call the select signal Mux_Line11.
Whenever RT field acts like RD field,
we will set Mux_Line11.
Whenever they try to write something,
they need Rd field(some times
Rt) .
OP FUN
ALUoperation : OP
FUN Mux_Line11.
ANDI 6 X
AND 000110 XXXXXX 1
ORI d X
OR 001101 XXXXXX 1
SLTI a X
SLT 001010 XXXXXX 1
ADDI 8 X
ADD 001000 XXXXXX 1
LW 23 X
ADD 100011 XXXXXX 1
LB 20 X
ADD 100000 XXXXXX 1
## We could implement the logic with AND
and OR gates.
## We could use a decoder with OR gate to
implement this.
## We could use a ROM with 6bit address
field.
6)Mux at D/IN for
Register file
the Mux will select either output from Mux 3(control signal for
this is
Mux_dataMem) or lines from NPC +4(PC
+8).
if select for the Mux is 0, it will
select output from Mux3,
otherwise select lines from NPC + 4.
We call the select signal
Mux_Line_NPC_plus4.
Whenever we have to save NPC + 4, we
will set Mux_Line_NPC_plus4.
Only Jal and Jalr will set
Mux_Line_NPC_plus4.
OP
FUN ALUoperation : OP
FUN ALU_OP: Mux_Line_NPC_plus4.
jal
3 X X 000011 XXXXXX
X 1
jalr
0 9 X 000000 001001 X 1
## We could implement the logic with AND
and OR gates.
## We could use two decoders with OR gate
and AND gate
to implement this.
## We could use a ROM with 12bit address
field.
7) Mux and control for
NPC
We have to design control logic for NPC_CNT
(controller before the Mux).
The controller will do one of four things.
1)check if branch condition is met and if the instruction is branch
instruction, it will generate
select signal
for the Mux(Mux_NPC_CNT =00).
the mux will select branch
address(Target address at the figure).
2) check if the instruction is
jump instruction,
it will generate select
signal
for the Mux(Mux_NPC_CNT =01).
the mux will select jump
address.
3)check if the instruction is jr
or jalr instruction,
it will generate select signal
for the Mux(Mux_NPC_CNT =10).
the mux will select jump
address(Drs from the register file).
4)otherwise, either branch condition failed or regular
instruction.
it will generate select
signal
for the Mux(Mux_NPC_CNT =11).
the mux will select NPC +4
address lines.
The part of inputs for this
controller come from
Control Signal which
tell the instruction is
branch
instruction(NPC_CNT_SIG = 00) , or
the instruction is jump or
jal instruction(NPC_CNT_SIG = 01), or
the instruction is jr or jalr instruction(NPC_CNT_SIG =
10).
the instruction is regular
instruction(NPC_CNT_SIG = 11).
The other part of inputs for
this controller come
from Mux 8 which tells the branch condition met or
not(Mux8_output = 1
; branch condition met).
The Control Signal
should generate NPC_CNT_SIG.
OP FUN
ALUoperation : OP FUN
NPC_CNT_SIG[1:0]
BEQ
4 X SUB 000100 XXXXXX 00
BNE
5 X SUB 000101 XXXXXX 00
J 2 X X 000010 XXXXXX 01
jal 3 X X 000011 XXXXXX 01
Jr 0 8 X
000000 001000 10
jalr 0 9 X 000000 001001 10
## We could implement the logic with AND
and OR gates.
## We could use two decoders with Inverters
and AND gates to
# implement this.
## We could use a ROM with 12bit address
field.
Now we need to generate
signals MuX_NPC_CNT.
1)check if branch condition is met and if the instruction is branch
instruction, it will generate
select signal for the
Mux(Mux_NPC_CNT =00).
the mux will select branch
address(Target address at the figure).
if NPC_CNT_SIG =00 and Mux_output = 1 , then
Mux_NPC_CNT[1:0] =00.
2) check if the instruction is
jump instruction, it will generate select
signal for the Mux(Mux_NPC_CNT =01).
the mux will select jump
address.
if NPC_CNT_SIG =01 then
Mux_NPC_CNT[1:0] = 01.
3)check if the instruction is jr
instruction, it will generate select
signal for the
Mux(Mux_NPC_CNT =10).
the mux will select jump
address(Drs from the register file).
if NPC_CNT_SIG =10 then
Mux_NPC_CNT[1:0] = 10.
4)otherwise, either regular instruction, or
branch instruction and
branch condition failed .
it will generate select
signal
for the Mux(Mux_NPC_CNT =11).
the mux will select NPC +4
address lines.
if NPC_CNT_SIG =11 or
NPC_CNT_SIG =00 and Mux_output = 0 ,
then Mux_NPC_CNT[1:0] =11.
################################################################
## We could implement the logic with AND
and OR gates.
## We could use two decoders with OR gate
and AND gate
## to implement this.
## We could use a ROM with 12bit address
field.
The logic is simple, so try to implement
with gates.
Mux_NPC_CNT[0] = SIG[0] +
MUX' *SIG[1]'
SIG 00 01 11 1 0
0 1
1 1 0
MUX 1
0 1 1 0
Mux_NPC_CNT[1] = SIG[1 ] + MUX' *SIG[0]'
SIG 00 01 11 1 0
0 1 0 1
1
MUX 1
0 0 1
1
8)Mux for the control
after the ALU
the Mux will select either zero
from ALU or zero'.
if select for the Mux is 0, it will
select zero, otherwise select
zero'.
We call the select signal Mux_Branch.
Whenever we have BEQ, Mux will select zero otherwise , select zero'.
OP FUN
ALUoperation : OP FUN Mux_Branch
BEQ
4 X SUB 000100 XXXXXX 0
BNE
5 X SUB 000101 XXXXXX 1
################################################################
## We could implement
the logic with AND gate and Inverters.
## We could use a
decoder to implement this.
this is just output x5 of OP code decoder.
9)R/W signal for Data
memory.
R/W = 0 means read.
R/W = 1 means write.
OP FUN
ALUoperation : OP FUN
R/W
LW 23 X ADD 100011 XXXXXX 0
LB 20 X ADD 100000 XXXXXX 0
SW 2b X ADD 100000 XXXXXX 1
SB 28 X ADD 101000 XXXXXX 1
## We could use a
decoder to implement this.
this is ( x43 + x40) of OP code decoder.
# Multi Cycle control logic
### Hard
Wired and Micro programmed Controller ##
We
have a high level language program(c
program ..)
We
could compile it to Mips assembly language instruction(compiler).
We could translate assembly language to Mips machine language(assembler).
(c
program => assembly language => machine language).
For a CPU with a hardwired controller:
each machine language instruction is decoded and executed like a finite state
machine.
For a CPU with a micro programmed controller:
Each machine
language instruction is defined by a
set of microinstructions and each
microinstruction is decoded and
Executed
by micro sequencer (Micro CPU)
## Multi Cycle
Implementation####
##### Performance
###################################
So far, we assume we could finish every
instruction in a single cycle, which
means we have to finish every
instruction in a fixed time.
That means we have to
set the clock frequency to the slowest instruction.
Usually memory access
instruction or floating point instruction is
the
slowest one.
Although the CPI(cycle
per instruction) is 1, the overall
performance of a single-cycle implementation is not likely to be very good,
since several of the instruction classes could fit in a shorter clock
cycle.
One way to solve the problem of single cycle
is :
Break the instruction into smaller steps
When we break the instruction
make sure all the steps
to have similar length(this is
important)
Execute each step (instead of the entire instruction) in one
cycle
Cycle time: time it takes to execute
the longest step
The advantages of the multiple cycle processor:
Cycle time is much shorter
Different instructions take different number of number of
steps
to
complete
We have examples of cylces below.
Use ALU more than once per instruction
Question: What will be one instruction which uses ALU
more than once per instruction?
Answer:
######Execution
time ####################
We have to think
execution time equation.
Execution time =
Instructions/Program * Clock
cycles/Instruction
* Seconds/Clock cycle
## Performance of
single cycle machine.
one program consists of 24% loads, 12%stores,
44% ALU instructions,18% branches
and 2% jumps( Instruction mix).
Assume operation times for the
major functional units are following:
Memory units: 2 ns(nano seconds).
ALU and adders: 2 ns.
Register file( read and write):
1ns.
Instruction
class Fuctional units used by the instruction class
(IF : instruction
fetch; MEM: Memory access;)
ALU type IF: Register
access : ALU : Register access
Load IF:
Register access: ALU: MEM:
Register access
Store IF:
Register access: ALU: MEM:
Branch IF:
Register access: ALU:
Jump IF:
We will compute the required time for each instruction class
Instruction
class instruction
register ALU Data Register
memory read operation memory write total
ALU type 2
1 2
0 1 6ns
Load 2
1 2 2 1 8ns
Store 2
1 2 2 0 7ns
Branch 2
1 2 0 0 5ns
Jump 2
0 0 0 0 2ns
Question: What will the clock cycle for a single
clock cycle machine?
## Answer:
#################################################
A machine with a variable
clock will have a clock cycle that varies
between 2ns and 8ns.
What will the average
time per instruction?
CPU clock cycle = 8*24% + 7 *12%+
6*44%+ 5*18%+2*2%
=
6.6ns.
A little bit fast but not much difference.
##Performance of a
single -cycle cpu with floating point instructions#####
loads comprise 31% of the instructions.
stores comprise 21% of
the instructions.
R format instructions
comprise 27% of the mix
Branches comprise 5%
Jumps comprise 2%
FP add and subtract
take the same time and together total 7% of
the instructions
FP multiply and divide
take the same time and together total 7% of
the instructions
Assume we have a
floating point unit that takes 8ns for addition
and 16ns for multiply.
##
Question: How long will
it take for the floating point add?
Answer:
##
Question: How long will
it take for the floating point multiply?
Answer:
CPU clock cycle =8*31%
+ 7 *21%+ 6*27%+ 5*5%+2*2% +12*7%+20*7%
= 8ns.
Improved
performance = 20/8
## This is part of the
reason we have to make multi-cycle implementation.
###############################################
Go back to page 85
datapath.
Question: Is there any
way to reduce H/W?
Answer:
#########################
We have two memories :
we will combine two memories
into a single memory
unit for both instructions and data.
We have two ALUs: we
will combine two ALUs into a single ALU.
##
Question : What kind of
H/W do we need after making the modification.
Answer:
##############
New data Path. Figure 5.33 .page 383.
##############
1)Instruction fetch
step
2)Instruction decode
and register fetch step
3)Execution, memory
address computation, or branch completion
(change the PC with branch target address)
4)Memory access or R
type instruction completion step
5)Memory read
completion step
1)Instruction fetch
step
IR = Memory[PC];
PC = PC +4;
IR
: Instruction Register.
2)Instruction decode
and register fetch step
A = GPR[IR[25-21]];
B = GPR[IR[20-16]];
ALUOUT = PC +4 + (sign-extend(IR[15-0])
<< 2);
3)Execution,memory
address computation, or branch completion
(change the PC with branch target address)
Memory reference:
ALUOut
= A + sign extend(IR[15-0]);
Arithmetic logical instruction(R type)
ALUOut
= A op B;
Branch:
if
(A = B=) PC = ALUOut;
Jump:
PC = PC[31-28] || (IR[25-0] <<
2);
4)Memory access or R
type instruction completion step
Memory reference:
MDR = Memory [ALUOut];
or
Memory [ ALUOut] = B;
Arithmetic logical instruction(R type)
GPR[IR[15-11]] = ALUOut;
5)Memory read
completion step
Load
GPR[IR[15-11]] = MDR;
############## Control for Each State
###############
Finite State
Machine(FSM).
FSM is used to specify the multi cycle
control.
FSM
consists of a set of states and directions on how to change
states.
FSM review.
Finite state machines:
a set of states and internal storage
next state function (determined by current state and the
input)
output function (determined by current state and possibly
input)
We’ll use a Moore machine (output
based only on current state)
Finite State Machine Concept.
Finite State Machine Controller
Finite State Diagram Fig.
5.42.
· Next state from current state
· State 0 -> State1
· State 1 -> S2,S6,S8,S9
· State 2 ->S3,S5
· State 3 ->S4
· State 4 ->State 0
· State 5 -> State 0
· State 6 -> S7
· State 7 -> State 0
· State 8 -> State 0
· State 9 -> State 0
Or How can
we reach each state.
prior state
& condition
S4, S5, S7,
S8, S9 -> State0
S0 ->
State 1
S1 &(op
= sw || op = lw) -> State 2
S2 &op
= lw -> State
3
S3 ->
State 4
State 2
& op = sw -> State 5
S1 & op
= Rtype -> State 6
State
6 ->
State 7
S1 & op
= beq -> State 8
State 1
& op = j ->
State 9
Implementation
Technique: Programmed Logic Arrays
Fig. PLA...
· Given numbers of FSM, can determine next
state as function of inputs,
including
current state
Turn these into Boolean equations for each bit of the next state
lines
Can implement easily using PLA
What if many more states, many more conditions?
What if need to add a state?