Why does the PC simply increment to the next instruction (PC+4)? "Because of alignment": every instruction is 4 bytes long.

ADD R1, R2, R3 (format: ADD Rd, Rs, Rt)

| OP | RS | RT | RD | SA | FUN |
|----|----|----|----|----|-----|
| 0 | 2 | 3 | 1 | 0 | 20 |
| 31-26 | 25-21 | 20-16 | 15-11 | 10-6 | 5-0 |

Where does the "ADD" control signal for the ALU come from? "From the instruction's function field (bits 5-0) and the opcode (bits 31-26)." (OP and FUN values are in hex.)

| Instruction | OP | FUN | ALUOP |
|-------------|----|-----|-------|
| ADD R1, R2, R3 | 0 | 20 | ADD |
| SUB R4, R5, R6 | 0 | 22 | SUB |

Can the above H/W perform ADDi instructions? What do we need?

ADDi R7, R8, 5 (format: ADDi rt, rs, imm)

We need to get the immediate data (bits 15-0, extended to 32 bits) to the ALU. The RT field now acts as the destination field, which is the role RD plays in the R format. So the ALU's second input needs a mux (Drt or immediate), and the register file's write address needs a mux (RD or RT).

Where do the CNT signals come from? "From the opcode field (bits 31-26)."

| Instruction | OP | FUN | ALUOP |
|-------------|----|-----|-------|
| ADD R1, R2, R3 | 0 | 20 | ADD |
| SUB R4, R5, R6 | 0 | 22 | SUB |
| ADDi R7, R8, 5 | 8 | X | ADD |

Register read addresses: Rs comes from bits 25-21 and Rt comes from bits 20-16. (Probably would not work due to fan-out.)

"Can the above H/W perform the OR instruction?" "Do we need more H/W?" It can also perform ORi without more H/W. This assumes the immediate extender can zero-extend, because MIPS ORI zero-extends its immediate.

So far we have no instructions that access memory, and we don't have a data memory unit yet.

LW R7, 8(R9) (format: LW rt, address)

| OP | RS | RT | IMM |
|----|----|----|-----|
| 23 | 9 | 7 | 8 |
| 31-26 | 25-21 | 20-16 | 15-0 |

How do we implement this, and what will the ALU operation be? One way is to use the ALU to compute the memory address. The ALU output (the computed address, R9+8) goes to the address input of the data memory, and the data read from memory is written to the register file (R7). The memory output must go back to the register file, and the ALU output must also go back to the register file. But this time the ALU output is a memory address while Dout is data. This is a collision. How can we solve this? => A multiplexor.

Where does CNTMUX come from? => If the instruction is LW, the mux should select the input from memory. => So it comes from the control signal.

Can the above H/W perform LB? What will the problems be? How can you modify the above H/W to perform LB?
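The field slicing and the OP/FUN-to-ALUOP lookup above can be sketched in Python as a behavioral model. It is not the lecture's hardware. The helper names (`encode_r`, `encode_i`, `fields`, `control`, `step`), the dictionary-based control table, and the dict-of-words data memory are assumptions for illustration. Only ADD, SUB, OR, ADDi, and LW are covered.

```python
MASK32 = 0xFFFFFFFF

def encode_r(rs, rt, rd, funct, shamt=0):
    """Pack an R-format word: op(31-26)=0, rs, rt, rd, shamt, funct."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(op, rs, rt, imm):
    """Pack an I-format word: op, rs, rt, 16-bit immediate."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def fields(instr):
    """Slice the instruction into the fields the datapath wires up."""
    return {
        "op": (instr >> 26) & 0x3F,
        "rs": (instr >> 21) & 0x1F,
        "rt": (instr >> 16) & 0x1F,
        "rd": (instr >> 11) & 0x1F,
        "funct": instr & 0x3F,
        "imm": instr & 0xFFFF,
    }

def sign_extend16(imm):
    """Extend a 16-bit immediate to a signed Python int."""
    return imm - 0x10000 if imm & 0x8000 else imm

# (OP, FUN) -> ALUOP; FUN is a don't-care (None) for non-R-format opcodes.
ALU_TABLE = {
    (0x00, 0x20): "ADD",
    (0x00, 0x22): "SUB",
    (0x00, 0x25): "OR",
    (0x08, None): "ADD",  # ADDi
    (0x23, None): "ADD",  # LW: address = Rs + offset
}

def control(instr):
    """Derive the mux selects and ALUOP from OP (and FUN for R-format)."""
    f = fields(instr)
    r_format = f["op"] == 0
    return {
        "RegDst": "rd" if r_format else "rt",    # write-address mux
        "ALUSrc": "reg" if r_format else "imm",  # ALU B-input mux
        "MemToReg": f["op"] == 0x23,             # write-data mux (CNTMUX)
        "ALUOp": ALU_TABLE[(f["op"], f["funct"] if r_format else None)],
    }

def step(instr, regs, mem):
    """Execute one instruction. regs is a 32-entry list; mem maps byte
    addresses to 32-bit words. Writes to R0 are not suppressed here."""
    f, c = fields(instr), control(instr)
    a = regs[f["rs"]]
    b = regs[f["rt"]] if c["ALUSrc"] == "reg" else sign_extend16(f["imm"])
    alu_out = {"ADD": a + b, "SUB": a - b, "OR": a | b}[c["ALUOp"]] & MASK32
    write_data = mem[alu_out] if c["MemToReg"] else alu_out
    regs[f["rd"] if c["RegDst"] == "rd" else f["rt"]] = write_data
```

For example, `encode_r(2, 3, 1, 0x20)` produces `0x00430820` for ADD R1, R2, R3, and `encode_i(0x23, 9, 7, 8)` produces `0x8D270008` for LW R7, 8(R9). The `MemToReg` entry plays the role of CNTMUX. Because it is derived from the opcode alone, it resolves the ALU-output/Dout collision.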
LB loads only one byte, while LW loads 4 bytes. So we either modify the memory unit (memory module) to return one byte at a time, or we keep the assumption from LW that the memory unit returns 4 bytes. In the second case, we have to truncate the extra 3 bytes of data. We also have to think about the memory address: if we want to access an arbitrary byte address, we need logic to do that.

LB R1, 2(R3) (format: LB rt, address)

| OP | RS | RT | OFFSET |
|----|----|----|--------|
| 20 | 3 | 1 | 2 |
| 31-26 | 25-21 | 20-16 | 15-0 |

We need logic to truncate the extra 3 bytes and memory-address logic to access an arbitrary address (the alignment problem). When we perform LBU, the truncation logic gives us the byte with 24 zeroes padded into the upper part. So we can generate TRUNC_CNT and MEM_CNT from the control signals. LBU can be done with the above H/W. But think about SB. Can SB be done with the above approach?

## TRUNCATE LOGIC

MUX select (from control signal):
- if LB => select 24 zero bits (strictly, this is LBU behavior; LB must sign-extend, filling the upper 24 bits with copies of bit 7)
- if LW => select the upper 24 bits from memory

But because of the alignment problem and accessing arbitrary addresses, the logic will be complicated. The above logic works when accessing address "0000", but what about "0001"?

The problem with SB is that we store only one byte out of 4 bytes, so we have to think about how to write data to the memory. For SW, if we assume we can store 4 bytes by giving an address, SW is easily implemented.

SW R10, 11(R12) (format: SW rt, address)

| OP | RS | RT | OFFSET |
|----|----|----|--------|
| 2b | 12 | 10 | 11 |
| 31-26 | 25-21 | 20-16 | 15-0 |

Can SB be done with the above H/W? Can we remove the new datapath? Can we just bypass through the ALU? When we cover Verilog, we implement the ALU unit. It can perform ADD, SUB, SLT, AND, OR, ByPassA, and ByPassB.
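The truncate logic can be sketched in Python for a memory that always returns an aligned 4-byte word. The notes do not state these assumptions: big-endian byte order as in classic MIPS (so address offset 0 selects bits 31-24), memory modeled as a Python dict from word-aligned byte addresses to 32-bit words, and `load` as a hypothetical helper name. The low two address bits drive the byte-select mux. The control signal then picks zero-extension (LBU), sign-extension (LB), or the full word (LW).

```python
def load(mem, addr, kind):
    """Read an aligned word from mem, then select/extend as LW, LB, or LBU."""
    word = mem[addr & ~0x3]   # memory only sees the word-aligned address
    offset = addr & 0x3       # low 2 bits: which byte within the word
    if kind == "LW":
        if offset != 0:
            raise ValueError("unaligned LW")  # real MIPS raises an address error
        return word
    shift = (3 - offset) * 8  # big-endian: offset 0 is the most significant byte
    byte = (word >> shift) & 0xFF
    if kind == "LBU":
        return byte           # upper 24 bits are zeroes
    if kind == "LB":
        # upper 24 bits are copies of bit 7 (sign extension)
        return (byte | 0xFFFFFF00) if byte & 0x80 else byte
    raise ValueError(f"unsupported load: {kind}")
```

With `mem = {0x100: 0x11223384}`, a byte load from 0x101 yields 0x22. Address 0x103 gives 0x84 with LBU and 0xFFFFFF84 with LB. This is the "0000" vs "0001" case from the notes: the mux that picks the byte lane depends on the address, not only on the opcode.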
So, could we bypass the Drt value through the ALU and remove the new datapath? The answer is NO. The ALU is performing the address calculation (Rs + offset), so we cannot bypass through it this time. We still need the new datapath.

Again, even though we can perform SW, doing SB is not OK yet. For LB, we could use LW and then truncate 24 bits. But for SB, we cannot do that. One way is to perform LW, modify the byte, and then SW, but this brings an extra LW. So we may assume that the memory can store a single byte for SB and 4 bytes for SW. Then we need a control circuit between Drt and Din.

SB R10, 11(R12) (format: SB rt, address)

| OP | RS | RT | OFFSET |
|----|----|----|--------|
| 28 | 12 | 10 | 11 |
| 31-26 | 25-21 | 20-16 | 15-0 |

New control logic:

| Instruction | OP | FUN | ALUOP |
|-------------|----|-----|-------|
| ADD | 0 | 20 | ADD |
| SUB | 0 | 22 | SUB |
| ADDi | 8 | X | ADD |
| OR | 0 | 25 | OR |
| LW | 23 | X | ADD |
| LB | 20 | X | ADD |
| SW | 2b | X | ADD |
| SB | 28 | X | ADD |

Do we need an extra datapath to perform SLT? Do we need extra H/W to perform SLT? Can the above H/W perform SLTI?

0x1000: BEQ R10, R11, 12

What will the target address be? How do we compute it?

BEQ rs, rt, label

| OP | RS | RT | OFFSET |
|----|----|----|--------|
| 4 | 10 | 11 | 12 |
| 31-26 | 25-21 | 20-16 | 15-0 |

What kind of actions do we need?
- a) compute the target address
- b) check the branch condition (rs = rt)

Action a) needs an addition. Action b) needs a subtraction.

## Target Address

- Target address computation logic: added
- Branch condition logic: added
- Multiplexor for PC+4 and the target address: added

What's wrong with this? (What if an instruction performs a subtraction and the result is zero? What will the next address be?)
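One way to realize "the memory can store a single byte" is with per-byte write enables. The control circuit between Drt and Din replicates the low byte of Drt onto all four byte lanes. A 4-bit enable derived from the low two address bits then lets only one lane be written. This is a sketch under assumptions: the byte-enable interface, the big-endian lane order, and the names `byte_enables` and `store` are hypothetical. The dict-based memory performs the lane merge internally, so the datapath itself needs no extra LW.

```python
def byte_enables(addr, kind):
    """4-bit write mask; bit i enables word bits [8i+7:8i].
    Big-endian (assumed): address offset 0 maps to lane 3 (bits 31-24)."""
    if kind == "SW":
        if addr & 0x3:
            raise ValueError("unaligned SW")
        return 0b1111
    if kind == "SB":
        return 1 << (3 - (addr & 0x3))
    raise ValueError(f"unsupported store: {kind}")

def store(mem, addr, rt_value, kind):
    """Write Drt to memory under the byte-enable mask."""
    aligned = addr & ~0x3
    if kind == "SB":
        din = (rt_value & 0xFF) * 0x01010101  # replicate byte onto all 4 lanes
    else:
        din = rt_value & 0xFFFFFFFF
    enables = byte_enables(addr, kind)
    word = mem.get(aligned, 0)                # lane merge happens inside memory
    for lane in range(4):
        if enables & (1 << lane):
            mask = 0xFF << (8 * lane)
            word = (word & ~mask) | (din & mask)
    mem[aligned] = word
```

For SB R10, 11(R12) with R12 = 0x200, the address is 0x20B. That is word 0x208 at offset 3, so only lane 0 (bits 7-0) is enabled. The replication means the Drt-to-Din circuit does not depend on the address. Only the enable does.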
| Instruction | OP | FUN | ALUOP |
|-------------|----|-----|-------|
| ADD R1, R2, R3 | 0 | 20 | ADD |
| SUB R4, R5, R6 | 0 | 22 | SUB |
| ADDi R7, R8, 5 | 8 | X | ADD |
| OR R9, R10, R11 | 0 | 25 | OR |
| LW R7, 8(R9) | 23 | X | ADD |
| LB R1, 2(R3) | 20 | X | ADD |
| SW R10, 11(R12) | 2b | X | ADD |
| SB R10, 11(R12) | 28 | X | ADD |
| SLT R1, R2, R3 | 0 | 2a | SLT |
| SLTi R4, R5, 6 | a | X | SLT |
| BEQ R10, R11, 12 | 4 | X | SUB |

Can the H/W on the previous page perform BNE?
- a) computing the target address
- b) computing the branch condition

If the branch condition is true, the next instruction to be executed is the branch target instead of PC+4. For BNE the condition is rs ≠ rt, the inverse of BEQ's, so we need to change the H/W.

BNE R10, R11, 12 (format: BNE Rs, Rt, label)

| OP | RS | RT | OFFSET |
|----|----|----|--------|
| 5 | 10 | 11 | 12 |
| 31-26 | 25-21 | 20-16 | 15-0 |

JUMP

J target: j 0x2000

| OP | TARGET |
|----|--------|
| 2 | 2000 |
| 31-26 | 25-0 |

How do we compute the target address?
a) Address computation logic is needed.

Example: at PC = 0x10000000, j 0x2000 jumps to 0x10008000 (target address). This is the upper 4 bits of PC+4 concatenated with the 26-bit target field shifted left by 2.

[Figure: jump datapath; labels: Sign Extension, Control Signal, JUMP]

"Can the above H/W perform jr R10?"

jr R10 (format: jr Rs): the target address is in R10, so we need a datapath from D/rs to NPC. The control comes from bits 31-26 and bits 5-0, because jr is R-format and the opcode alone does not identify it.

If we want to perform jal, what more do we need?
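The next-PC selection for the branch and jump cases so far can be sketched as a function. Assumptions: no branch delay slot, a Python list for the register file, and the hypothetical name `next_pc`. Branch targets are (PC+4) + (sign-extended offset << 2). The J target keeps the upper 4 bits of PC+4 and appends the 26-bit field shifted left by 2. The Zero flag selects the branch target only when the opcode is BEQ or BNE. That is why an R-format SUB whose result is zero still falls through to PC+4.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm):
    """Extend a 16-bit immediate to a signed Python int."""
    return imm - 0x10000 if imm & 0x8000 else imm

def next_pc(pc, instr, regs):
    """Choose among PC+4, the branch target, the J target, and Drs (jr)."""
    op = (instr >> 26) & 0x3F
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F
    funct = instr & 0x3F
    pc4 = (pc + 4) & MASK32
    if op in (0x04, 0x05):                                # BEQ / BNE
        zero = ((regs[rs] - regs[rt]) & MASK32) == 0      # ALU does SUB
        take = zero if op == 0x04 else not zero           # BNE inverts Zero
        offset = sign_extend16(instr & 0xFFFF) << 2
        return (pc4 + offset) & MASK32 if take else pc4
    if op == 0x02:                                        # J
        return (pc4 & 0xF0000000) | ((instr & 0x03FFFFFF) << 2)
    if op == 0x00 and funct == 0x08:                      # jr: Drs -> NPC
        return regs[rs]
    return pc4                                            # everything else, incl. SUB
```

For the example in the notes, BEQ R10, R11, 12 at 0x1000 goes to 0x1004 + 48 = 0x1034 when R10 equals R11, and to 0x1004 otherwise.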
| Instruction | OP | FUN | ALUOP |
|-------------|----|-----|-------|
| ADD R1, R2, R3 | 0 | 20 | ADD |
| SUB R4, R5, R6 | 0 | 22 | SUB |
| ADDi R7, R8, 5 | 8 | X | ADD |
| OR R9, R10, R11 | 0 | 25 | OR |
| LW R7, 8(R9) | 23 | X | ADD |
| LB R1, 2(R3) | 20 | X | ADD |
| SW R10, 11(R12) | 2b | X | ADD |
| SB R10, 11(R12) | 28 | X | ADD |
| SLT R1, R2, R3 | 0 | 2a | SLT |
| SLTI R4, R5, 6 | a | X | SLT |
| BEQ R10, R11, 12 | 4 | X | SUB |
| BNE R10, R11, 12 | 5 | X | SUB |
| J 0x2000 | 2 | X | X |
| JR R10 | 0 | 8 | X |

For the jal instruction, everything is the same as the j instruction, except that we have to save PC+8 (or NPC+4) to R31.

jal 0x2000

| OP | TARGET |
|----|--------|
| 3 | 2000 |
| 31-26 | 25-0 |

So we need a datapath from NPC to D/In and a datapath to Awrite that points to ra (R31). This way, we can save PC+8 to ra. The control and CNT signals can come from the control logic.

[Figure: "jal 0x2000", added H/W for saving R31]

The "jalr Rs, Rd" instruction: everything is the same as the jr rs instruction, except that we have to save the return address to rd. (These notes write the operands as Rs, Rd; standard MIPS assembly writes jalr rd, rs.)

jalr R10, R11

| OP | RS | RT | RD | SA | FUN |
|----|----|----|----|----|-----|
| 0 | 10 | 0 | 11 | 0 | 9 |
| 31-26 | 25-21 | 20-16 | 15-11 | 10-6 | 5-0 |

Do we need a new datapath? No. For jalr R10, R11 we don't need a new datapath: the D/rs-to-NPC path from jr, the NPC-to-D/In path from jal, and the RD write address already exist.
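The link step added for jal and jalr can be sketched as follows. Assumptions: the notes' convention that the saved return address is PC+8 (= NPC+4), the notes' operand order for jalr (Rs, Rd), a Python list as the register file, and the hypothetical name `link_and_jump`. The write-address mux chooses R31 for jal and rd for jalr. The write data is the link value taken from NPC.

```python
MASK32 = 0xFFFFFFFF

def link_and_jump(pc, instr, regs, link_offset=8):
    """Return the next PC for jal/jalr and write the return address."""
    op = (instr >> 26) & 0x3F
    rs = (instr >> 21) & 0x1F
    rd = (instr >> 11) & 0x1F
    funct = instr & 0x3F
    link = (pc + link_offset) & MASK32  # PC+8 per these notes; 4 without a delay slot
    if op == 0x03:                       # jal: same target as j, write R31 (ra)
        target = ((pc + 4) & 0xF0000000) | ((instr & 0x03FFFFFF) << 2)
        regs[31] = link
        return target
    if op == 0x00 and funct == 0x09:     # jalr: same target as jr, write rd
        target = regs[rs]                # read Rs before writing Rd
        regs[rd] = link
        return target
    raise ValueError("not a jal/jalr instruction")
```

Reading `regs[rs]` before writing `regs[rd]` keeps the sketch correct even if the two fields name the same register. In a single-cycle datapath, this matches reading the register file early in the cycle and writing it at the end.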