├── .gitignore
├── LICENSE
├── README.md
├── asm
    └── mips1.asm
├── img
    ├── branch_front_forwarding.png
    ├── branch_front_stall.png
    ├── branch_load_forwarding.png
    ├── branch_load_stall.png
    ├── design.png
    ├── inst_num.png
    ├── loaduse_forwarding.png
    ├── loaduse_stall.png
    ├── max_delay.png
    ├── resource.png
    ├── result_18.jpg
    ├── result_3.jpg
    ├── schematic.png
    ├── simulation_matching.png
    ├── simulation_v0.png
    ├── summary.png
    └── timing.png
├── src
    ├── ALU.v
    ├── ALUControl.v
    ├── ALUForwarding.v
    ├── BranchForwarding.v
    ├── BranchJudge.v
    ├── CLK.v
    ├── Control.v
    ├── DataMemory.v
    ├── EX_MEM.v
    ├── ID_EX.v
    ├── IF_ID.v
    ├── ImmProcess.v
    ├── InstructionMemory.v
    ├── MEM_WB.v
    ├── PC.v
    ├── PipelineCPU.v
    ├── RegisterFile.v
    ├── constrain.xdc
    └── test_pipeline.v
└── 实验报告.md

/.gitignore:
--------------------------------------------------------------------------------
1 | *.pdf
2 | *.rar
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2022 TCL
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
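IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Five-Stage Pipelined MIPS CPU
2 |
3 | This repository implements a five-stage pipelined MIPS CPU in `verilog`.
4 |
5 | ## Design
6 |
7 | ### Basic framework
8 |
9 | The CPU is a five-stage pipeline with `forwarding` circuitry. I also made `Branch` instructions resolve early, in the `ID` stage, and made a series of adjustments (stalls and additional forwarding paths) so that hazards are handled correctly and the CPU runs faster.
10 |
11 | The memory organization is a Harvard architecture: the data memory and the instruction memory are separate.
12 |
13 | ### Implemented instruction set
14 |
15 | The pipelined CPU implements most MIPS instructions. Beyond the instructions already implemented on the single-cycle and multi-cycle CPUs during the spring semester, it adds the following: `lb`, `bne`, `blez`, `bgtz`, `bltz`, `jal`, `jr`, `jalr`, and others.
16 |
17 | ### Block diagram
18 |
19 | The block diagram of the design is shown below:
20 |
21 | ![design](img/design.png)
22 |
23 | ## Principles and selected code
24 |
25 | ### Control signals
26 |
27 | In my code, control signals are decoded by `Control.v`. From the instruction's `OpCode` and `Funct` fields it generates the following control signals: `Branch`, `RegWrite`, `RegDst`, `MemRead`, `MemWrite`, `MemtoReg`, `ALUSrc1`, `ALUSrc2`, `ExtOp`, `LuOp`, `JOp`, `LoadByte`.
28 |
29 | Compared with the multi-cycle CPU, the newly added control signals are `JOp` and `LoadByte`. The former indicates whether the instruction is a jump, which the CPU uses to jump and to stall; the latter indicates whether the instruction is `lb`, so that the CPU can extract a byte from memory directly.
30 |
31 | ### Five-stage pipeline
32 |
33 | Instruction execution is divided into five stages: instruction fetch (IF), instruction decode (ID), execute (EX), memory access (MEM), and register write-back (WB). Between every two adjacent stages there is a pipeline register that stores the control signals the instruction will need in the following stages.
34 |
35 | Four groups of registers are needed to pass information between the five stages; I named them `IF_ID`, `ID_EX`, `EX_MEM`, and `MEM_WB`. The `IF_ID` register has `flush` and `hold` inputs, used to clear and to hold its contents; the `ID_EX` register has a `flush` input, used to clear its contents. Their exact use is described in the stall sections below.
36 |
37 | ### Stall: principle and implementation
38 |
39 | #### Stall after a branch or jump instruction
40 |
41 | I designed both branch and jump instructions to complete the jump in the `ID` stage, so only one `stall` cycle is needed after either of them. The `stall` works as follows: if the `Branch` signal or the `JOp` signal is true in the `ID` stage, the `flush` signal of the `IF_ID` register is set so that `IF_ID` is cleared in the next cycle, and the next-cycle `PC` is set to the jump target (if the `Branch` condition evaluates to `False`, the `PC` still becomes `PC+4`).
42 |
43 | The code that sets `flush_IFID` is:
44 |
45 | ~~~verilog
46 | assign flush_IFID = Branch_ID || JOp_ID;
47 | ~~~
48 |
49 | The code that sets the next-cycle `PC` is:
50 |
51 | ~~~verilog
52 | assign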
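PC_new = (RegWrite_EX && Branch_ID && (Rw_EX == rs_ID || Rw_EX == rt_ID) && Load_EX) ? PC_now - 4 :
53 |                 hold_IFID ? PC_now :
54 |                 PCSrc_ID == 1 ? {PC_ID[31:28], rs_ID, rt_ID, rd_ID, Shamt_ID, Funct_ID, 2'b00} :
55 |                 PCSrc_ID == 2 ? dataA_ID + 4:
56 |                 Branch_ID ? PC_Branch :
57 |                 PC_now + 4;
58 | ~~~
59 |
60 | Here, the third line of the expression handles `j`-type jumps, the fourth line handles `jr`-type jumps, and the fifth line handles `Branch` instructions. A `Branch` instruction is already resolved in the `ID` stage, so `PC_Branch` is also computed in `ID` and the jump takes effect without problems.
61 |
62 | `PC_Branch` is computed as follows:
63 |
64 | ~~~verilog
65 | assign PC_Branch = Branch_ID && Zero ? PC_ID + 4 + ImmExtShift_ID : PC_ID + 4;
66 | ~~~
67 |
68 | The `Zero` signal is generated according to the kind of `Branch` instruction: for `beq` it indicates whether the two inputs are equal, and for `bne` whether they differ.
69 |
70 | #### Stall before a branch instruction
71 |
72 | Because branch instructions are resolved early, in the `ID` stage, data hazards can arise here, so a `stall` may also be needed before a branch instruction.
73 |
74 | In detail, there are two cases:
75 |
76 | ##### Case 1: the branch is preceded by an `R`-type instruction or a computational `I`-type instruction
77 |
78 | If the instruction immediately before the `Branch` is an `R`-type instruction or a computational `I`-type instruction, and the register it writes back is the `rs` or `rt` register that the branch compares, a data hazard occurs.
79 |
80 | ![branch_front_stall](img/branch_front_stall.png)
81 |
82 | As the figure shows, when the `Branch` is preceded by an `R`-type instruction or a computational `I`-type instruction and a data hazard exists, the `ALU` result is not available until after the `Branch` instruction's `ID` stage has ended, so `forwarding` alone can no longer make the `Branch` execute correctly. The `Branch` must `stall` for one cycle, after which the previous instruction's `ALUOut` is forwarded to the `Branch` instruction's `ID` stage, as shown below:
83 |
84 | ![branch_front_forwarding](img/branch_front_forwarding.png)
85 |
86 | The forwarding itself is covered in the forwarding-unit section below; this section first describes how the `stall` is implemented.
87 |
88 | To `stall` the `Branch` for one cycle, it is enough to hold the `IF_ID` register and flush the `ID_EX` register.
89 |
90 | The `PC` value may still change during the `stall`, but because the `Branch` instruction assigns the `PC` a new value once it completes `ID` in any case, the `stall` does not need to track the `PC`.
91 |
92 | ##### Case 2: the branch is preceded by an `lb` or `lw` instruction
93 |
94 | If the instruction before the branch is `lb` or `lw` and the loaded data is used by the `Branch`, a data hazard also occurs. Unlike Case 1, the data first becomes available in the Load instruction's `MEM` stage, so the `Branch` must `stall` for two cycles.
95 |
96 | The data hazard is shown below:
97 |
98 | ![branch_load_stall](img/branch_load_stall.png)
99 |
100 | After a two-cycle `stall`, forwarding becomes possible, as illustrated below:
101 |
102 | ![branch_load_forwarding](img/branch_load_forwarding.png)
103 |
104 | The `stall` here is slightly more involved than in Case 1.
105 |
106 | The procedure is: first `flush` the `IF_ID` register and the `ID_EX` register, then set the `PC` to `PC-4`. This is because merely applying `hold` to the `IF_ID` register yields only a one-cycle `stall`; a two-cycle `stall` is guaranteed only by applying `flush` to the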
`IF_ID` register and, at the same time, resetting the current `PC` (that is, the `PC` fetched in the `IF` stage while the `Branch` is in its `ID` stage) to `PC-4`.
107 |
108 | Setting `PC-4` is always correct here, because the previously executed instruction is known to be a `Load`, not a jump or branch instruction.
109 |
110 | ##### Code details for Case 1 and Case 2
111 |
112 | The logic of the control signals `flush_IFID`, `hold_IFID`, and `flush_IDEX` is:
113 |
114 | ~~~verilog
115 | assign flush_IFID = Branch_ID || JOp_ID;
116 | assign hold_IFID = ((RegWrite_EX && Branch_ID && (Rw_EX == rs_ID || Rw_EX == rt_ID)) && Load_EX == 0) ||
117 |                    (MemRead_EX && (rt_EX == rs_ID || rt_EX == rt_ID) && Load_EX); // next inst is branch && !Load, stall || load use hazard
118 | assign flush_IDEX = (RegWrite_EX && Branch_ID && (Rw_EX == rs_ID || Rw_EX == rt_ID)) ||
119 |                     (MemRead_EX && (rt_EX == rs_ID || rt_EX == rt_ID) && Load_EX);
120 | ~~~
121 |
122 | Here the second term of `hold_IFID` and of `flush_IDEX` is the `Load-Use` hazard detection; only the first term concerns branch instructions.
123 |
124 | `flush_IFID` and `hold_IFID` both control the `IF_ID` register, and their relative priority depends on the situation. The implementation is:
125 |
126 | ~~~verilog
127 | always @(posedge clk or posedge reset) begin
128 |     if(reset || (flush_IFID && Load_EX)) begin
129 |         // flush
130 |         // ...
131 |     end
132 |     else if (hold_IFID) begin
133 |         // hold
134 |         // ...
135 |     end
136 |     else if (flush_IFID) begin
137 |         // flush
138 |         // ...
139 |     end
140 |     else begin
141 |         // decode
142 |         OpCode <= Instruction[31:26];
143 |         rs <= Instruction[25:21];
144 |         rt <= Instruction[20:16];
145 |         rd <= Instruction[15:11];
146 |         Shamt <= Instruction[10:6];
147 |         Funct <= Instruction[5:0];
148 |         PC_ID <= PC_IF;
149 |     end
150 | end
151 | ~~~
152 |
153 | When the instruction currently in `EX` is a `Load`, `flush_IFID` takes priority over `hold_IFID`, because a two-cycle `stall` is needed; when the instruction in `EX` is not a `Load`, `hold_IFID` takes priority over `flush_IFID`.
154 |
155 | The code that sets `PC-4` is:
156 |
157 | ~~~verilog
158 | assign PC_new = (RegWrite_EX && Branch_ID && (Rw_EX == rs_ID || Rw_EX == rt_ID) && Load_EX) ? PC_now - 4 :
159 |                 hold_IFID ? PC_now :
160 |                 PCSrc_ID == 1 ? {PC_ID[31:28], rs_ID, rt_ID, rd_ID, Shamt_ID, Funct_ID, 2'b00} :
161 |                 PCSrc_ID == 2 ? dataA_ID + 4:
162 |                 Branch_ID ?
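PC_Branch :
163 |                 PC_now + 4;
164 | ~~~
165 |
166 | The first line is the code that sets `PC-4`. The logic is: if the instruction in `EX` is a `Load`, the next instruction is a `Branch`, and the register the `Load` writes back is one the `Branch` uses, then the next-cycle `PC` is set to `PC-4`.
167 |
168 | #### Load-Use hazard detection and stall
169 |
170 | When an instruction is `lb` or `lw`, the next instruction is an `R`-type instruction or a computational `I`-type instruction, and the register the `Load` writes is used by that next instruction, a data hazard occurs. A one-cycle `stall` is then needed after the `Load`. The principle is illustrated below:
171 |
172 | ![loaduse_stall](img/loaduse_stall.png)
173 |
174 | The loaded data is first available after the `MEM` stage, whereas the `Use` instruction already needs it in the `EX` stage. One `stall` cycle is therefore inserted after the `Load`, and `LoadData` is forwarded, as shown below:
175 |
176 | ![loaduse_forwarding](img/loaduse_forwarding.png)
177 |
178 | Concretely: when the `Load` reaches its `EX` stage, it can be determined whether the next instruction is a `Use` and whether a data hazard exists. If so, in the next cycle the `IF_ID` register holding the `Use` instruction is held and the `ID_EX` register is cleared.
179 |
180 | In code:
181 |
182 | ~~~verilog
183 | assign hold_IFID = ((RegWrite_EX && Branch_ID && (Rw_EX == rs_ID || Rw_EX == rt_ID)) && Load_EX == 0) ||
184 |                    (MemRead_EX && (rt_EX == rs_ID || rt_EX == rt_ID) && Load_EX); // next inst is branch && !Load, stall || load use hazard
185 | assign flush_IDEX = (RegWrite_EX && Branch_ID && (Rw_EX == rs_ID || Rw_EX == rt_ID)) ||
186 |                     (MemRead_EX && (rt_EX == rs_ID || rt_EX == rt_ID) && Load_EX);
187 | ~~~
188 |
189 | In the code above, the second term of `hold_IFID` and of `flush_IDEX` is the `Load-Use` hazard detection.
190 |
191 | ### Forwarding: principle and implementation
192 |
193 | #### Forwarding to the `ID` stage
194 |
195 | Because `Branch` instructions are resolved early, in the `ID` stage, in my design, forwarding into the `ID` stage is required to resolve the data hazards that involve `Branch` instructions.
196 |
197 | I defined two forwarding-unit control signals, `BrForwardingA` and `BrForwardingB`, which select the two inputs of the `Branch` comparison in the `ID` stage.
198 |
199 | The two inputs of the `Branch` comparison can come from three sources:
200 |
201 | - `WriteData_WB`: the data the preceding instruction read from `DataMem`; used when the branch is preceded by a `Load` instruction.
202 | - `ALUOut_MEM`: the preceding instruction's `ALU` output; used when the branch is preceded by an `R`-type instruction or a computational `I`-type instruction.
203 | - `dataA_ID` or `dataB_ID`: the data read directly from the register file using `rs` and `rt`; used when there is no data hazard.
204 |
205 | These three cases correspond to `BrForwarding` values 2, 1, and 0, respectively.
206 |
207 | My `Branch` forwarding unit is implemented as follows:
208 |
209 | ~~~verilog
210 | assign BrForwardingA = rs == Rw_WB && Load_WB ? 2 : rs == Rw_MEM && RegWrite_MEM ? 1 : 0;
211 | assign BrForwardingB = rt == Rw_WB && Load_WB ? 2 : rt == Rw_MEM && RegWrite_MEM ?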
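1 : 0;
212 | ~~~
213 |
214 | Taking `BrForwardingA` as an example:
215 |
216 | - If `rs == Rw_WB && Load_WB`, the preceding instruction is a `Load` (two `stall` cycles have already elapsed) and the register it writes back is `rs`, so `BrForwardingA` is set to 2.
217 | - If `rs == Rw_MEM && RegWrite_MEM`, the preceding instruction is an `R`-type instruction or a computational `I`-type instruction (one `stall` cycle has already elapsed) and the register it writes back is `rs`, so `BrForwardingA` is set to 1.
218 | - When there is no data hazard, `BrForwardingA` defaults to 0.
219 |
220 | The inputs `BrJudger` of the `Branch` comparison in the `ID` stage are then selected by the `BrForwarding` signals:
221 |
222 | ~~~verilog
223 | assign BrJuderA = BrForwardingA == 1 ? ALUOut_MEM : BrForwardingA == 2 ? WriteData_WB : dataA_ID;
224 | assign BrJuderB = BrForwardingB == 1 ? ALUOut_MEM : BrForwardingB == 2 ? WriteData_WB : dataB_ID;
225 | ~~~
226 |
227 | #### Forwarding to the `EX` stage
228 |
229 | An `ALU` input in the `EX` stage can come from four sources:
230 |
231 | - `dataA_EX` or `dataB_EX`: the data read from the register file and carried down the pipeline to the `EX` stage.
232 | - The shift amount `Shamt` or the immediate `ImmExtOut`.
233 | - `ALUOut_MEM`: the `ALU` result of the previous instruction.
234 | - `WriteData_WB`: the `ALU` result of the instruction two before, or the result of a `Load` by the previous instruction.
235 |
236 | The forwarding select signals are `ALUChooseA` and `ALUChooseB`. The four cases above correspond to `ALUChoose` values 0, 1, 2, and 3, respectively.
237 |
238 | My forwarding unit code is:
239 |
240 | ~~~verilog
241 | assign ALUChooseA = ALUSrcA_EX == 1 ? 1 :
242 |                     (RegWrite_MEM && (Rw_MEM == rs_EX) && (Rw_MEM != 0)) ? 2 : // check the MEM stage first, i.e. the previous instruction
243 |                     (RegWrite_WB && (Rw_WB == rs_EX) && (Rw_WB != 0)) ? 3 : 0;
244 | assign ALUChooseB = ALUSrcB_EX == 1 ? 1 :
245 |                     (RegWrite_MEM && (Rw_MEM == rt_EX) && (Rw_MEM != 0)) ? 2 : // check the MEM stage first, i.e. the previous instruction
246 |                     (RegWrite_WB && (Rw_WB == rt_EX) && (Rw_WB != 0)) ? 3 : 0;
247 | ~~~
248 |
249 | Here, `ALUSrcA_EX` and `ALUSrcB_EX` are control signals produced by the instruction decode unit; they indicate whether the shift amount or the immediate is used.
250 |
251 | The conditions that follow them are the forwarding checks.
252 |
253 | The previous instruction is checked first; only if it does not meet the forwarding condition is the instruction before it checked.
254 |
255 | Taking `ALUChooseA` as an example, the logic is: if the previous instruction writes the register file, the register it writes is `rs`, and that register is not `$0`, then the previous instruction's `ALU` output is forwarded to the input of the instruction currently in `EX`. If the previous instruction does not meet the forwarding condition, the instruction before it is checked (this also covers the case where the previous instruction is a `Load`). If the instruction in `WB` writes the register file, the register it writes is `rs`, and that register is not `$0`, then the value about to be written back is forwarded to the `ALU` input. If none of these forwarding conditions holds, the value read from the register file is used directly.
256 |
257 | With the `ALUChoose` signals available, the `ALU` inputs are selected as follows:
258 |
259 | ~~~verilog
260 | assign ALUinA = ALUChooseA == 1 ?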
{27'h0000000, Shamt_EX} :
261 |                 ALUChooseA == 2 ? ALUOut_MEM :
262 |                 ALUChooseA == 3 ? WriteData_WB: dataA_EX;
263 | assign ALUinB = ALUChooseB == 1 ? ImmExtOut_EX :
264 |                 ALUChooseB == 2 ? ALUOut_MEM :
265 |                 ALUChooseB == 3 ? WriteData_WB: dataB_EX;
266 | ~~~
267 |
268 | ### Data memory
269 |
270 | I set the data memory size to `512` words, with byte addresses from `0x00000000` to `0x000007FF`.
271 |
272 | Byte address `0x4000000C` is mapped to the control information of the external `LEDs`; byte address `0x40000010` is mapped to the control information of the seven-segment display.
273 |
274 | ### Load Byte implementation
275 |
276 | `Load Byte` is largely the same as `Load Word`. I only added a separate `LoadByte` control signal, which selects between `LoadByte` and `LoadWord`.
277 |
278 | The idea is to first fetch a whole word with `LoadWord`, then select the corresponding `Byte` according to the low 2 bits of the address, sign-extend it, and return it.
279 |
280 | The code is:
281 |
282 | ~~~verilog
283 | assign ReadData_MEM = LoadByte_MEM == 0 ? ReadData_Temp :
284 |                       ALUOut_MEM[1:0] == 2'b00 ? {{24{ReadData_Temp[7]}}, ReadData_Temp[7:0]} :
285 |                       ALUOut_MEM[1:0] == 2'b01 ? {{24{ReadData_Temp[15]}}, ReadData_Temp[15:8]} :
286 |                       ALUOut_MEM[1:0] == 2'b10 ? {{24{ReadData_Temp[23]}}, ReadData_Temp[23:16]} :
287 |                       {{24{ReadData_Temp[31]}}, ReadData_Temp[31:24]};
288 | ~~~
289 |
290 | Here, `ReadData_Temp` is the word read from `DataMemory`.
291 |
292 |
--------------------------------------------------------------------------------
/asm/mips1.asm:
--------------------------------------------------------------------------------
1 | .text
2 | main:
3 | addi $a0 $zero 32 #len_str
4 | addi $a1 $zero 0 #*str
5 | addi $a2 $zero 4 #len_pattern
6 | addi $a3 $zero 400 #*pattern
7 |
8 | #brute_force:
9 | sub $t2 $a0 $a2 #calculate len_str - len_pattern
10 | addi $t0 $zero 0 #i = 0
11 | addi $v0 $zero 0 #cnt = 0
12 |
13 | loop1:
14 | slt $t3 $t2 $t0 #if(len_str - len_pattern) in2[31] ?
1 : 0) 35 | : lt_low31 36 | ); 37 | 38 | always @(*) begin 39 | case(ALUCtrl) 40 | ADD: out <= in1 + in2; 41 | SUB: out <= in1 - in2; 42 | AND: out <= in1 & in2; 43 | OR: out <= in1 | in2; 44 | XOR: out <= in1 ^ in2; 45 | NOR: out <= ~(in1 | in2); 46 | SLL: out <= in2 << in1[4:0]; 47 | SRL: out <= in2 >> in1[4:0]; 48 | SRA: out <= {{32{in2[31]}}, in2} >> in1[4:0]; 49 | SLT: out <= {31'h00000000, Sign ? lt_sign : in1 < in2}; 50 | default: out <= 0; 51 | endcase 52 | end 53 | 54 | endmodule -------------------------------------------------------------------------------- /src/ALUControl.v: -------------------------------------------------------------------------------- 1 | `timescale 1ns / 1ps 2 | module ALUControl ( 3 | OpCode, 4 | Funct, 5 | ALUCtrl, 6 | Sign 7 | ); 8 | 9 | parameter ADD = 0; 10 | parameter SUB = 1; 11 | parameter AND = 2; 12 | parameter OR = 3; 13 | parameter XOR = 4; 14 | parameter NOR = 5; 15 | parameter SLL = 6; 16 | parameter SRL = 7; 17 | parameter SRA = 8; 18 | parameter SLT = 9; 19 | 20 | input [5:0] OpCode; 21 | input [5:0] Funct; 22 | output [4:0] ALUCtrl; 23 | output Sign; 24 | 25 | reg [4:0] ALUCtrl; 26 | reg Sign; 27 | 28 | always @(OpCode, Funct) begin 29 | case(OpCode) 30 | 6'h23: ALUCtrl <= ADD; // lw 31 | 6'h20: ALUCtrl <= ADD; // lb 32 | 6'h2b: ALUCtrl <= ADD; // sw 33 | 6'h0f: ALUCtrl <= ADD; // lui 34 | 6'h08: begin 35 | ALUCtrl <= ADD; 36 | Sign <= 1; 37 | end 38 | 6'h09: begin 39 | ALUCtrl <= ADD; 40 | Sign <= 0; 41 | end 42 | 6'h0c: ALUCtrl <= AND; 43 | 6'h0a: begin 44 | ALUCtrl <= SLT; 45 | Sign <= 1; 46 | end 47 | 6'h0b: begin 48 | ALUCtrl <= SLT; 49 | Sign <= 0; 50 | end 51 | 6'h04: ALUCtrl <= SUB; // beq 52 | default: begin // Opcode = 0 53 | case(Funct) 54 | 6'h20: begin 55 | ALUCtrl <= ADD; 56 | Sign <= 1; 57 | end 58 | 6'h21: begin 59 | ALUCtrl <= ADD; 60 | Sign <= 0; 61 | end 62 | 6'h22: begin 63 | ALUCtrl <= SUB; 64 | Sign <= 1; 65 | end 66 | 6'h23: begin 67 | ALUCtrl <= SUB; 68 | Sign <= 0; 69 | end 70 | 6'h24: 
ALUCtrl <= AND; 71 | 6'h25: ALUCtrl <= OR; 72 | 6'h26: ALUCtrl <= XOR; 73 | 6'h27: ALUCtrl <= NOR; 74 | 6'h00: ALUCtrl <= SLL; 75 | 6'h02: begin 76 | ALUCtrl <= SRL; 77 | Sign <= 0; 78 | end 79 | 6'h03: begin 80 | ALUCtrl <= SRA; 81 | Sign <= 1; 82 | end 83 | 6'h2a: begin 84 | ALUCtrl <= SLT; 85 | Sign <= 1; 86 | end 87 | 6'h2b: begin 88 | ALUCtrl <= SLT; 89 | Sign <= 0; 90 | end 91 | 6'h08: ALUCtrl <= ADD; // jr 92 | 6'h09: ALUCtrl <= ADD; // jalr 93 | default: ALUCtrl <= ALUCtrl; 94 | endcase 95 | end 96 | endcase 97 | end 98 | 99 | endmodule -------------------------------------------------------------------------------- /src/ALUForwarding.v: -------------------------------------------------------------------------------- 1 | `timescale 1ns / 1ps 2 | module ALUForwarding( 3 | input wire [4:0] rs_EX, 4 | input wire [4:0] rt_EX, 5 | input wire [4:0] Rw_MEM, 6 | input wire [4:0] Rw_WB, 7 | input wire RegWrite_MEM, 8 | input wire RegWrite_WB, 9 | input wire ALUSrcA_EX, 10 | input wire ALUSrcB_EX, 11 | output wire [1:0] ALUChooseA, // A: 0-dataA 1-Shamt 2-WriteData_MEM 3-WriteData_WB 12 | output wire [1:0] ALUChooseB // B: 0-dataB 1-Imm 2-WriteData_MEM 3-WriteData_WB 13 | ); 14 | 15 | assign ALUChooseA = ALUSrcA_EX == 1 ? 1 : 16 | (RegWrite_MEM && (Rw_MEM == rs_EX) && (Rw_MEM != 0)) ? 2 : // 优先判断MEM阶段,即前一条指令 17 | (RegWrite_WB && (Rw_WB == rs_EX) && (Rw_WB != 0)) ? 3 : 0; 18 | assign ALUChooseB = ALUSrcB_EX == 1 ? 1 : 19 | (RegWrite_MEM && (Rw_MEM == rt_EX) && (Rw_MEM != 0)) ? 2 : // 优先判断MEM阶段,即前一条指令 20 | (RegWrite_WB && (Rw_WB == rt_EX) && (Rw_WB != 0)) ? 
3 : 0;
21 |
22 | endmodule
--------------------------------------------------------------------------------
/src/BranchForwarding.v:
--------------------------------------------------------------------------------
1 | `timescale 1ns / 1ps
2 | module BranchForwarding(
3 |     input wire [4:0] rs,
4 |     input wire [4:0] rt,
5 |     input wire [4:0] Rw_MEM,
6 |     input wire RegWrite_MEM,
7 |     input wire Load_WB,
8 |     input wire [4:0] Rw_WB,
9 |     output wire [1:0] BrForwardingA,
10 |     output wire [1:0] BrForwardingB
11 | );
12 |
13 | assign BrForwardingA = rs == Rw_WB && Load_WB ? 2 :
14 |                        rs == Rw_MEM && RegWrite_MEM ? 1 : 0;
15 | assign BrForwardingB = rt == Rw_WB && Load_WB ? 2 :
16 |                        rt == Rw_MEM && RegWrite_MEM ? 1 : 0;
17 |
18 | endmodule
--------------------------------------------------------------------------------
/src/BranchJudge.v:
--------------------------------------------------------------------------------
1 | `timescale 1ns / 1ps
2 | module BranchJudge(
3 |     input wire [5:0] OpCode,
4 |     input wire [31:0] data1_ID,
5 |     input wire [31:0] data2_ID,
6 |     input wire Branch,
7 |     output reg Zero
8 | );
9 |
10 | always@(*) begin // combinational logic: blocking assignments
11 |     if(Branch) begin
12 |         if(OpCode == 6'h04) begin // beq
13 |             Zero = data1_ID == data2_ID;
14 |         end
15 |         else if(OpCode == 6'h05) begin // bne
16 |             Zero = data1_ID != data2_ID;
17 |         end
18 |         else if(OpCode == 6'h06) begin // blez: operands are two's complement, compare as signed
19 |             Zero = ($signed(data1_ID) <= $signed(data2_ID));
20 |         end
21 |         else if(OpCode == 6'h07) begin // bgtz: signed compare
22 |             Zero = ($signed(data1_ID) > $signed(data2_ID));
23 |         end
24 |         else if(OpCode == 6'h01) begin // bltz: signed compare
25 |             Zero = ($signed(data1_ID) < $signed(data2_ID));
26 |         end
27 |         else begin
28 |             Zero = 1'b0; // defined default instead of Zero <= Zero, which infers a latch
29 |         end
30 |     end
31 |     else begin
32 |         Zero = 1'b0;
33 |     end
34 | end
35 |
36 | endmodule
--------------------------------------------------------------------------------
/src/CLK.v:
--------------------------------------------------------------------------------
1 | module CLK(
2 |     input wire sysclk,
3 |     input wire reset,
4 |     output reg clk
5 | );
6 |
7 | reg [31:0]
count; 8 | 9 | initial begin 10 | count <= 0; 11 | end 12 | 13 | always@(posedge sysclk or posedge reset) begin 14 | if(reset) begin 15 | count <= 0; 16 | clk <= 0; 17 | end 18 | else if(count >= 2) begin 19 | clk <= ~clk; 20 | count <= 0; 21 | end 22 | else begin 23 | count <= count + 1; 24 | end 25 | end 26 | 27 | endmodule -------------------------------------------------------------------------------- /src/Control.v: -------------------------------------------------------------------------------- 1 | `timescale 1ns / 1ps 2 | module Control(OpCode, Funct, 3 | PCSrc, Branch, RegWrite, RegDst, 4 | MemRead, MemWrite, MemtoReg, 5 | ALUSrc1, ALUSrc2, ExtOp, LuOp, JOp, LoadByte 6 | ); 7 | input wire [5:0] OpCode; 8 | input wire [5:0] Funct; 9 | output wire [1:0] PCSrc; 10 | output wire Branch; 11 | output wire RegWrite; 12 | output wire [1:0] RegDst; 13 | output wire MemRead; 14 | output wire MemWrite; 15 | output wire [1:0] MemtoReg; 16 | output wire ALUSrc1; 17 | output wire ALUSrc2; 18 | output wire ExtOp; 19 | output wire LuOp; 20 | output wire JOp; 21 | output wire LoadByte; 22 | 23 | // Your code below 24 | 25 | assign PCSrc = (OpCode == 6'h02 || OpCode == 6'h03) ? 1 : // j,jal: 1 -- jr,jalr: 2 -- 0 26 | (OpCode == 0 && (Funct == 6'h08 || Funct == 6'h09)) ? 2 : 0; 27 | 28 | assign Branch = OpCode == 6'h04 || OpCode == 6'h05 || OpCode == 6'h06 || OpCode == 6'h07 || OpCode == 6'h01; 29 | 30 | assign RegWrite = (OpCode == 6'h2b || OpCode == 6'h04 || OpCode == 6'h05 || OpCode == 6'h06 || OpCode == 6'h07 || OpCode == 6'h01 || 31 | OpCode == 6'h02 || (OpCode == 6'h00 && Funct == 6'h08)) ? 0 : 1; 32 | 33 | assign RegDst = (OpCode == 6'h23 || OpCode == 6'h20 || OpCode == 6'h0f || OpCode == 6'h08 || OpCode == 6'h09 || 34 | OpCode == 6'h0c || OpCode == 6'h0a || OpCode == 6'h0b) ? 0 : 35 | OpCode == 6'h03 ? 
2 : 1; // 0: rt; 1: rd; 2: ra 36 | 37 | assign MemRead = OpCode == 6'h23 || OpCode == 6'h20; 38 | 39 | assign MemWrite = OpCode == 6'h2b; 40 | 41 | assign MemtoReg = OpCode == 6'h23 || OpCode == 6'h20 ? 1 : 42 | (OpCode == 6'h03 || (OpCode == 6'h00 && Funct == 6'h09)) ? 2 : 0; 43 | 44 | assign ALUSrc1 = (OpCode == 6'h00 && (Funct == 6'h00 || Funct == 6'h02 || Funct == 6'h03)) ? 1 : 0; 45 | 46 | assign ALUSrc2 = (OpCode == 6'h23 || OpCode == 6'h20 || OpCode == 6'h2b || OpCode == 6'h0f || 47 | OpCode == 6'h08 || OpCode == 6'h09 || OpCode == 6'h0c || 48 | OpCode == 6'h0a || OpCode == 6'h0b) ? 1 : 0; 49 | 50 | assign ExtOp = OpCode == 6'h0c ? 0 : 1; 51 | 52 | assign LuOp = OpCode == 6'h0f; 53 | 54 | assign JOp = OpCode == 6'h02 || OpCode == 6'h03 || (OpCode == 6'h00 && (Funct == 6'h08 || Funct == 6'h09)); 55 | 56 | assign LoadByte = OpCode == 6'h20; 57 | // Your code above 58 | 59 | endmodule -------------------------------------------------------------------------------- /src/DataMemory.v: -------------------------------------------------------------------------------- 1 | `timescale 1ns / 1ps 2 | module DataMemory(clk, reset, Address, Write_data, Read_data, MemRead, MemWrite, LEDData, BCDData, an); 3 | input clk, reset; 4 | input [31:0] Address, Write_data; 5 | input MemRead, MemWrite; 6 | output [31:0] Read_data; 7 | output reg [7:0] LEDData; 8 | output wire [7:0] BCDData; 9 | output wire [3:0] an; 10 | reg [11:0] ANBCDData; 11 | 12 | parameter RAM_SIZE = 512; 13 | parameter RAM_SIZE_BIT = 9; 14 | 15 | reg [31:0] RAM_data[RAM_SIZE - 1: 0]; 16 | assign Read_data = MemRead ? 
RAM_data[Address[RAM_SIZE_BIT + 1:2]]: 32'h00000000; 17 | assign BCDData = ANBCDData[7:0]; 18 | assign an = ANBCDData[11:8]; 19 | 20 | integer j; 21 | initial begin 22 | LEDData <= 0; 23 | ANBCDData <= 0; 24 | 25 | // for(i = 0; i < RAM_SIZE; i = i + 1) begin 26 | // case(i) 27 | // 0: RAM_data[0] <= 32'h756e696c; 28 | // 1: RAM_data[1] <= 32'h6e736978; 29 | // 2: RAM_data[2] <= 32'h6e75746f; 30 | // 3: RAM_data[3] <= 32'h73697869; 31 | // 4: RAM_data[4] <= 32'h75746f6e; 32 | // 5: RAM_data[5] <= 32'h6978696e; 33 | // 6: RAM_data[6] <= 32'h746f6e73; 34 | // 7: RAM_data[7] <= 32'h78696e75; 35 | // 100: RAM_data[100] <= 32'h78696e75; 36 | // default: RAM_data[i] <= 32'h0; 37 | // endcase 38 | // end 39 | 40 | for(i = 0; i < RAM_SIZE; i = i + 1) begin 41 | case(i) 42 | 0: RAM_data[0] <= 32'h64636261; 43 | 1: RAM_data[1] <= 32'h63626178; 44 | 2: RAM_data[2] <= 32'h65737364; 45 | 3: RAM_data[3] <= 32'h64636261; 46 | 4: RAM_data[4] <= 32'h61786373; 47 | 5: RAM_data[5] <= 32'h66646362; 48 | 6: RAM_data[6] <= 32'h63626173; 49 | 7: RAM_data[7] <= 32'h62617664; 50 | 8: RAM_data[8] <= 32'h61616463; 51 | 9: RAM_data[9] <= 32'h62616463; 52 | 10: RAM_data[10] <= 32'h61636463; 53 | 11: RAM_data[11] <= 32'h61706362; 54 | 12: RAM_data[12] <= 32'h6e646362; 55 | 13: RAM_data[13] <= 32'h64636261; 56 | 14: RAM_data[14] <= 32'h72657771; 57 | 15: RAM_data[15] <= 32'h64636261; 58 | 16: RAM_data[16] <= 32'h6362616c; 59 | 17: RAM_data[17] <= 32'h61676764; 60 | 18: RAM_data[18] <= 32'h67646362; 61 | 19: RAM_data[19] <= 32'h64636261; 62 | 20: RAM_data[20] <= 32'h63626167; 63 | 21: RAM_data[21] <= 32'h65636564; 64 | 22: RAM_data[22] <= 32'h6d656161; 65 | 23: RAM_data[23] <= 32'h64636261; 66 | 24: RAM_data[24] <= 32'h72646b6c; 67 | 25: RAM_data[25] <= 32'h64636261; 68 | 26: RAM_data[26] <= 32'h62616262; 69 | 27: RAM_data[27] <= 32'h63636463; 70 | 28: RAM_data[28] <= 32'h64636261; 71 | 100: RAM_data[100] <= 32'h64636261; 72 | default: RAM_data[i] <= 32'h00000000; 73 | endcase 74 | end 75 | 76 | 
end 77 | 78 | integer i; 79 | always @(posedge reset or posedge clk) 80 | if (reset) begin 81 | LEDData <= 0; 82 | ANBCDData <= 0; 83 | // for(i = 0; i < RAM_SIZE; i = i + 1) begin 84 | // case(i) 85 | // 0: RAM_data[0] <= 32'h756e696c; 86 | // 1: RAM_data[1] <= 32'h6e736978; 87 | // 2: RAM_data[2] <= 32'h6e75746f; 88 | // 3: RAM_data[3] <= 32'h73697869; 89 | // 4: RAM_data[4] <= 32'h75746f6e; 90 | // 5: RAM_data[5] <= 32'h6978696e; 91 | // 6: RAM_data[6] <= 32'h746f6e73; 92 | // 7: RAM_data[7] <= 32'h78696e75; 93 | // 100: RAM_data[100] <= 32'h78696e75; 94 | // default: RAM_data[i] <= 32'h0; 95 | // endcase 96 | // end 97 | 98 | for(i = 0; i < RAM_SIZE; i = i + 1) begin 99 | case(i) 100 | 0: RAM_data[0] <= 32'h64636261; 101 | 1: RAM_data[1] <= 32'h63626178; 102 | 2: RAM_data[2] <= 32'h65737364; 103 | 3: RAM_data[3] <= 32'h64636261; 104 | 4: RAM_data[4] <= 32'h61786373; 105 | 5: RAM_data[5] <= 32'h66646362; 106 | 6: RAM_data[6] <= 32'h63626173; 107 | 7: RAM_data[7] <= 32'h62617664; 108 | 8: RAM_data[8] <= 32'h61616463; 109 | 9: RAM_data[9] <= 32'h62616463; 110 | 10: RAM_data[10] <= 32'h61636463; 111 | 11: RAM_data[11] <= 32'h61706362; 112 | 12: RAM_data[12] <= 32'h6e646362; 113 | 13: RAM_data[13] <= 32'h64636261; 114 | 14: RAM_data[14] <= 32'h72657771; 115 | 15: RAM_data[15] <= 32'h64636261; 116 | 16: RAM_data[16] <= 32'h6362616c; 117 | 17: RAM_data[17] <= 32'h61676764; 118 | 18: RAM_data[18] <= 32'h67646362; 119 | 19: RAM_data[19] <= 32'h64636261; 120 | 20: RAM_data[20] <= 32'h63626167; 121 | 21: RAM_data[21] <= 32'h65636564; 122 | 22: RAM_data[22] <= 32'h6d656161; 123 | 23: RAM_data[23] <= 32'h64636261; 124 | 24: RAM_data[24] <= 32'h72646b6c; 125 | 25: RAM_data[25] <= 32'h64636261; 126 | 26: RAM_data[26] <= 32'h62616262; 127 | 27: RAM_data[27] <= 32'h63636463; 128 | 28: RAM_data[28] <= 32'h64636261; 129 | 100: RAM_data[100] <= 32'h64636261; 130 | default: RAM_data[i] <= 32'h00000000; 131 | endcase 132 | end 133 | end 134 | else if (MemWrite) begin 135 | 
if(Address == 32'h4000000C) begin 136 | LEDData <= Write_data[7:0]; 137 | end 138 | else if(Address == 32'h40000010) begin 139 | ANBCDData <= Write_data[11:0]; 140 | end 141 | else begin 142 | RAM_data[Address[RAM_SIZE_BIT + 1:2]] <= Write_data; 143 | end 144 | end 145 | else begin 146 | ANBCDData <= ANBCDData; 147 | LEDData <= LEDData; 148 | end 149 | endmodule 150 | -------------------------------------------------------------------------------- /src/EX_MEM.v: -------------------------------------------------------------------------------- 1 | module EX_MEM( 2 | input wire clk, 3 | input wire reset, 4 | 5 | input wire MemRead_EX, 6 | input wire MemWrite_EX, 7 | input wire [31:0] ALUOut_EX, 8 | input wire [4:0] Rw_EX, 9 | input wire [1:0] MemtoReg_EX, 10 | input wire RegWrite_EX, 11 | input wire [31:0] rt_EX, 12 | input wire LoadByte_EX, 13 | input wire [31:0] PC_EX, 14 | input wire Load_EX, 15 | 16 | output reg MemRead_MEM, 17 | output reg MemWrite_MEM, 18 | output reg [31:0] ALUOut_MEM, 19 | output reg [4:0] Rw_MEM, 20 | output reg [1:0] MemtoReg_MEM, 21 | output reg RegWrite_MEM, 22 | output reg [31:0] rt_MEM, 23 | output reg LoadByte_MEM, 24 | output reg [31:0] PC_MEM, 25 | output reg Load_MEM 26 | ); 27 | 28 | initial begin 29 | MemRead_MEM <= 0; 30 | MemWrite_MEM <= 0; 31 | ALUOut_MEM <= 0; 32 | Rw_MEM <= 0; 33 | MemtoReg_MEM <= 0; 34 | RegWrite_MEM <= 0; 35 | rt_MEM <= 0; 36 | LoadByte_MEM <= 0; 37 | PC_MEM <= 0; 38 | Load_MEM <= 0; 39 | end 40 | 41 | always@(posedge clk or posedge reset) begin 42 | if(reset) begin 43 | MemRead_MEM <= 0; 44 | MemWrite_MEM <= 0; 45 | ALUOut_MEM <= 0; 46 | Rw_MEM <= 0; 47 | MemtoReg_MEM <= 0; 48 | RegWrite_MEM <= 0; 49 | rt_MEM <= 0; 50 | LoadByte_MEM <= 0; 51 | PC_MEM <= 0; 52 | Load_MEM <= 0; 53 | end 54 | else begin 55 | MemRead_MEM <= MemRead_EX; 56 | MemWrite_MEM <= MemWrite_EX; 57 | ALUOut_MEM <= ALUOut_EX; 58 | Rw_MEM <= Rw_EX; 59 | MemtoReg_MEM <= MemtoReg_EX; 60 | RegWrite_MEM <= RegWrite_EX; 61 | rt_MEM <= rt_EX; 62 
| LoadByte_MEM <= LoadByte_EX; 63 | PC_MEM <= PC_EX; 64 | Load_MEM <= Load_EX; 65 | end 66 | end 67 | 68 | endmodule -------------------------------------------------------------------------------- /src/ID_EX.v: -------------------------------------------------------------------------------- 1 | module ID_EX( 2 | input wire clk, 3 | input wire reset, 4 | input wire flush_IDEX, 5 | //input wire hold_IDEX, 6 | 7 | input wire RegWrite_ID, 8 | input wire Branch_ID, 9 | input wire MemRead_ID, 10 | input wire MemWrite_ID, 11 | input wire [1:0] MemtoReg_ID, 12 | input wire ALUSrcA_ID, 13 | input wire ALUSrcB_ID, 14 | input wire [4:0] ALUCtrl_ID, 15 | input wire [1:0] RegDst_ID, 16 | input wire [31:0] dataA_ID, 17 | input wire [31:0] dataB_ID, 18 | input wire [31:0] ImmExtOut_ID, 19 | input wire [4:0] Shamt_ID, 20 | input wire [4:0] rs_ID, 21 | input wire [4:0] rt_ID, 22 | input wire [4:0] rd_ID, 23 | input wire Sign_ID, 24 | input wire LoadByte_ID, 25 | input wire [31:0] PC_ID, 26 | input wire Load_ID, 27 | 28 | output reg RegWrite_EX, 29 | output reg Branch_EX, 30 | output reg MemRead_EX, 31 | output reg MemWrite_EX, 32 | output reg [1:0] MemtoReg_EX, 33 | output reg ALUSrcA_EX, 34 | output reg ALUSrcB_EX, 35 | output reg [4:0] ALUCtrl_EX, 36 | output reg [1:0] RegDst_EX, 37 | output reg [31:0] dataA_EX, 38 | output reg [31:0] dataB_EX, 39 | output reg [31:0] ImmExtOut_EX, 40 | output reg [4:0] Shamt_EX, 41 | output reg [4:0] rs_EX, 42 | output reg [4:0] rt_EX, 43 | output reg [4:0] rd_EX, 44 | output reg Sign_EX, 45 | output reg LoadByte_EX, 46 | output reg [31:0] PC_EX, 47 | output reg Load_EX 48 | ); 49 | 50 | initial begin 51 | RegWrite_EX <= 0; 52 | Branch_EX <= 0; 53 | MemRead_EX <= 0; 54 | MemWrite_EX <= 0; 55 | MemtoReg_EX <= 0; 56 | ALUSrcA_EX <= 0; 57 | ALUSrcB_EX <= 0; 58 | ALUCtrl_EX <= 0; 59 | RegDst_EX <= 0; 60 | dataA_EX <= 0; 61 | dataB_EX <= 0; 62 | ImmExtOut_EX <= 0; 63 | Shamt_EX <= 0; 64 | rs_EX <= 0; 65 | rt_EX <= 0; 66 | rd_EX <= 0; 67 | Sign_EX <= 
0; 68 | LoadByte_EX <= 0; 69 | PC_EX <= 0; 70 | Load_EX <= 0; 71 | end 72 | 73 | always@(posedge clk or posedge reset) begin 74 | if(reset || flush_IDEX) begin 75 | RegWrite_EX <= 0; 76 | Branch_EX <= 0; 77 | MemRead_EX <= 0; 78 | MemWrite_EX <= 0; 79 | MemtoReg_EX <= 0; 80 | ALUSrcA_EX <= 0; 81 | ALUSrcB_EX <= 0; 82 | ALUCtrl_EX <= 0; 83 | RegDst_EX <= 0; 84 | dataA_EX <= 0; 85 | dataB_EX <= 0; 86 | ImmExtOut_EX <= 0; 87 | Shamt_EX <= 0; 88 | rs_EX <= 0; 89 | rt_EX <= 0; 90 | rd_EX <= 0; 91 | Sign_EX <= 0; 92 | LoadByte_EX <= 0; 93 | PC_EX <= 0; 94 | Load_EX <= 0; 95 | end 96 | else begin 97 | RegWrite_EX <= RegWrite_ID; 98 | Branch_EX <= Branch_ID; 99 | MemRead_EX <= MemRead_ID; 100 | MemWrite_EX <= MemWrite_ID; 101 | MemtoReg_EX <= MemtoReg_ID; 102 | ALUSrcA_EX <= ALUSrcA_ID; 103 | ALUSrcB_EX <= ALUSrcB_ID; 104 | ALUCtrl_EX <= ALUCtrl_ID; 105 | RegDst_EX <= RegDst_ID; 106 | dataA_EX <= dataA_ID; 107 | dataB_EX <= dataB_ID; 108 | ImmExtOut_EX <= ImmExtOut_ID; 109 | Shamt_EX <= Shamt_ID; 110 | rs_EX <= rs_ID; 111 | rt_EX <= rt_ID; 112 | rd_EX <= rd_ID; 113 | Sign_EX <= Sign_ID; 114 | LoadByte_EX <= LoadByte_ID; 115 | PC_EX <= PC_ID; 116 | Load_EX <= Load_ID; 117 | end 118 | // else begin 119 | // RegWrite_EX <= RegWrite_EX; 120 | // Branch_EX <= Branch_EX; 121 | // MemRead_EX <= MemRead_EX; 122 | // MemWrite_EX <= MemWrite_EX; 123 | // MemtoReg_EX <= MemtoReg_EX; 124 | // ALUSrcA_EX <= ALUSrcA_EX; 125 | // ALUSrcB_EX <= ALUSrcB_EX; 126 | // ALUCtrl_EX <= ALUCtrl_EX; 127 | // RegDst_EX <= RegDst_EX; 128 | // dataA_EX <= dataA_EX; 129 | // dataB_EX <= dataB_EX; 130 | // ImmExtOut_EX <= ImmExtOut_EX; 131 | // Shamt_EX <= Shamt_EX; 132 | // rt_EX <= rt_EX; 133 | // rd_EX <= rd_EX; 134 | // Sign_EX <= Sign_EX; 135 | // LoadByte_EX <= LoadByte_EX; 136 | // end 137 | end 138 | 139 | endmodule -------------------------------------------------------------------------------- /src/IF_ID.v: -------------------------------------------------------------------------------- 1 | 
`timescale 1ns / 1ps 2 | module IF_ID( 3 | input wire clk, 4 | input wire reset, 5 | input wire flush_IFID, 6 | input wire hold, 7 | input wire Load_EX, // if Load, flush is more important than hold 8 | 9 | input wire [31:0] Instruction, 10 | input wire [31:0] PC_IF, 11 | output reg [5:0] OpCode, 12 | output reg [4:0] rs, 13 | output reg [4:0] rt, 14 | output reg [4:0] rd, 15 | output reg [4:0] Shamt, 16 | output reg [5:0] Funct, 17 | output reg [31:0] PC_ID 18 | ); 19 | 20 | initial begin 21 | OpCode <= 0; 22 | rs <= 0; 23 | rt <= 0; 24 | rd <= 0; 25 | Shamt <= 0; 26 | Funct <= 0; 27 | PC_ID <= 0; 28 | end 29 | 30 | always @(posedge clk or posedge reset) begin 31 | if(reset || (flush_IFID && Load_EX)) begin // || flush_IFID 32 | OpCode <= 0; 33 | rs <= 0; 34 | rt <= 0; 35 | rd <= 0; 36 | Shamt <= 0; 37 | Funct <= 0; 38 | PC_ID <= 0; 39 | end 40 | else if (hold) begin 41 | OpCode <= OpCode; 42 | rs <= rs; 43 | rt <= rt; 44 | rd <= rd; 45 | Shamt <= Shamt; 46 | Funct <= Funct; 47 | PC_ID <= PC_ID; 48 | end 49 | else if (flush_IFID) begin 50 | OpCode <= 0; 51 | rs <= 0; 52 | rt <= 0; 53 | rd <= 0; 54 | Shamt <= 0; 55 | Funct <= 0; 56 | PC_ID <= 0; 57 | end 58 | else begin 59 | OpCode <= Instruction[31:26]; 60 | rs <= Instruction[25:21]; 61 | rt <= Instruction[20:16]; 62 | rd <= Instruction[15:11]; 63 | Shamt <= Instruction[10:6]; 64 | Funct <= Instruction[5:0]; 65 | PC_ID <= PC_IF; 66 | end 67 | // else if(!hold) begin 68 | // if(flush_IFID) begin 69 | // OpCode <= 0; 70 | // rs <= 0; 71 | // rt <= 0; 72 | // rd <= 0; 73 | // Shamt <= 0; 74 | // Funct <= 0; 75 | // PC_ID <= 0; 76 | // end 77 | // else begin 78 | // OpCode <= Instruction[31:26]; 79 | // rs <= Instruction[25:21]; 80 | // rt <= Instruction[20:16]; 81 | // rd <= Instruction[15:11]; 82 | // Shamt <= Instruction[10:6]; 83 | // Funct <= Instruction[5:0]; 84 | // PC_ID <= PC_IF; 85 | // end 86 | // end 87 | // else begin 88 | // OpCode <= OpCode; 89 | // rs <= rs; 90 | // rt <= rt; 91 | // rd <= rd; 92 | // 
Shamt <= Shamt; 93 | // Funct <= Funct; 94 | // PC_ID <= PC_ID; 95 | // end 96 | 97 | end 98 | 99 | endmodule -------------------------------------------------------------------------------- /src/ImmProcess.v: -------------------------------------------------------------------------------- 1 | `timescale 1ns / 1ps 2 | 3 | module ImmProcess(ExtOp, LuiOp, Immediate, ImmExtOut, ImmExtShift); 4 | //Input Control Signals 5 | input ExtOp; //'0'-zero extension, '1'-signed extension 6 | input LuiOp; //for lui instruction 7 | //Input 8 | input [15:0] Immediate; 9 | //Output 10 | output [31:0] ImmExtOut; 11 | output [31:0] ImmExtShift; 12 | 13 | wire [31:0] ImmExt; 14 | 15 | assign ImmExt = {ExtOp? {16{Immediate[15]}}: 16'h0000, Immediate}; 16 | assign ImmExtShift = ImmExt << 2; 17 | assign ImmExtOut = LuiOp? {Immediate, 16'h0000}: ImmExt; 18 | 19 | endmodule 20 | -------------------------------------------------------------------------------- /src/InstructionMemory.v: -------------------------------------------------------------------------------- 1 | `timescale 1ns / 1ps 2 | module InstructionMemory(Address, Instruction); 3 | input wire [31:0] Address; 4 | output wire [31:0] Instruction; 5 | 6 | parameter MEM_SIZE = 512; 7 | reg [31:0] data [MEM_SIZE - 1:0]; 8 | 9 | assign Instruction = data[Address[10:2]]; 10 | 11 | integer i; 12 | initial begin 13 | //data[9'd0] <= 32'h20040020; 14 | data[9'd0] <= 32'h20040074; 15 | 16 | data[9'd1] <= 32'h20050000; 17 | data[9'd2] <= 32'h20060004; 18 | data[9'd3] <= 32'h20070190; 19 | data[9'd4] <= 32'h00865022; 20 | data[9'd5] <= 32'h20080000; 21 | data[9'd6] <= 32'h20020000; 22 | data[9'd7] <= 32'h0148582a; 23 | data[9'd8] <= 32'h1560000f; 24 | data[9'd9] <= 32'h20090000; 25 | data[9'd10] <= 32'h0126582a; 26 | data[9'd11] <= 32'h11600008; 27 | data[9'd12] <= 32'h01096020; 28 | data[9'd13] <= 32'h01856820; 29 | data[9'd14] <= 32'h81ae0000; 30 | data[9'd15] <= 32'h01276820; 31 | data[9'd16] <= 32'h81af0000; 32 | data[9'd17] <= 
32'h15cf0002; 33 | data[9'd18] <= 32'h21290001; 34 | data[9'd19] <= 32'h0810000a; 35 | data[9'd20] <= 32'h15260001; 36 | data[9'd21] <= 32'h20420001; 37 | data[9'd22] <= 32'h21080001; 38 | data[9'd23] <= 32'h08100007; 39 | data[9'd24] <= 32'h200c4000; 40 | data[9'd25] <= 32'h000c6400; 41 | data[9'd26] <= 32'h218c000c; 42 | data[9'd27] <= 32'had820000; 43 | data[9'd28] <= 32'h20100000; 44 | data[9'd29] <= 32'h00104302; 45 | data[9'd30] <= 32'h31080003; 46 | data[9'd31] <= 32'h20090000; 47 | data[9'd32] <= 32'h11090006; 48 | data[9'd33] <= 32'h21290001; 49 | data[9'd34] <= 32'h11090007; 50 | data[9'd35] <= 32'h21290001; 51 | data[9'd36] <= 32'h11090009; 52 | data[9'd37] <= 32'h21290001; 53 | data[9'd38] <= 32'h1109000b; 54 | data[9'd39] <= 32'h20110100; 55 | data[9'd40] <= 32'h304a000f; 56 | data[9'd41] <= 32'h08100035; 57 | data[9'd42] <= 32'h20110200; 58 | data[9'd43] <= 32'h304a00f0; 59 | data[9'd44] <= 32'h000a5102; 60 | data[9'd45] <= 32'h08100035; 61 | data[9'd46] <= 32'h20110400; 62 | data[9'd47] <= 32'h304a0f00; 63 | data[9'd48] <= 32'h000a5202; 64 | data[9'd49] <= 32'h08100035; 65 | data[9'd50] <= 32'h20110800; 66 | data[9'd51] <= 32'h304af000; 67 | data[9'd52] <= 32'h000a5042; 68 | data[9'd53] <= 32'h20090000; 69 | data[9'd54] <= 32'h1149001e; 70 | data[9'd55] <= 32'h20090001; 71 | data[9'd56] <= 32'h1149001e; 72 | data[9'd57] <= 32'h20090002; 73 | data[9'd58] <= 32'h1149001e; 74 | data[9'd59] <= 32'h20090003; 75 | data[9'd60] <= 32'h1149001e; 76 | data[9'd61] <= 32'h20090004; 77 | data[9'd62] <= 32'h1149001e; 78 | data[9'd63] <= 32'h20090005; 79 | data[9'd64] <= 32'h1149001e; 80 | data[9'd65] <= 32'h20090006; 81 | data[9'd66] <= 32'h1149001e; 82 | data[9'd67] <= 32'h20090007; 83 | data[9'd68] <= 32'h1149001e; 84 | data[9'd69] <= 32'h20090008; 85 | data[9'd70] <= 32'h1149001e; 86 | data[9'd71] <= 32'h20090009; 87 | data[9'd72] <= 32'h1149001e; 88 | data[9'd73] <= 32'h2009000a; 89 | data[9'd74] <= 32'h1149001e; 90 | data[9'd75] <= 32'h2009000b; 91 | 
data[9'd76] <= 32'h1149001e; 92 | data[9'd77] <= 32'h2009000c; 93 | data[9'd78] <= 32'h1149001e; 94 | data[9'd79] <= 32'h2009000d; 95 | data[9'd80] <= 32'h1149001e; 96 | data[9'd81] <= 32'h2009000e; 97 | data[9'd82] <= 32'h1149001e; 98 | data[9'd83] <= 32'h2009000f; 99 | data[9'd84] <= 32'h1149001e; 100 | data[9'd85] <= 32'h200b003f; 101 | data[9'd86] <= 32'h08100074; 102 | data[9'd87] <= 32'h200b0006; 103 | data[9'd88] <= 32'h08100074; 104 | data[9'd89] <= 32'h200b005b; 105 | data[9'd90] <= 32'h08100074; 106 | data[9'd91] <= 32'h200b004f; 107 | data[9'd92] <= 32'h08100074; 108 | data[9'd93] <= 32'h200b0066; 109 | data[9'd94] <= 32'h08100074; 110 | data[9'd95] <= 32'h200b006d; 111 | data[9'd96] <= 32'h08100074; 112 | data[9'd97] <= 32'h200b007d; 113 | data[9'd98] <= 32'h08100074; 114 | data[9'd99] <= 32'h200b0007; 115 | data[9'd100] <= 32'h08100074; 116 | data[9'd101] <= 32'h200b007f; 117 | data[9'd102] <= 32'h08100074; 118 | data[9'd103] <= 32'h200b006f; 119 | data[9'd104] <= 32'h08100074; 120 | data[9'd105] <= 32'h200b0077; 121 | data[9'd106] <= 32'h08100074; 122 | data[9'd107] <= 32'h200b007c; 123 | data[9'd108] <= 32'h08100074; 124 | data[9'd109] <= 32'h200b0039; 125 | data[9'd110] <= 32'h08100074; 126 | data[9'd111] <= 32'h200b005e; 127 | data[9'd112] <= 32'h08100074; 128 | data[9'd113] <= 32'h200b0079; 129 | data[9'd114] <= 32'h08100074; 130 | data[9'd115] <= 32'h200b0071; 131 | data[9'd116] <= 32'h022b9020; 132 | data[9'd117] <= 32'h200c4000; 133 | data[9'd118] <= 32'h000c6400; 134 | data[9'd119] <= 32'h218c0010; 135 | data[9'd120] <= 32'had920000; 136 | data[9'd121] <= 32'h22100001; 137 | data[9'd122] <= 32'h0810001d; 138 | for (i = 9'd123; i < MEM_SIZE; i = i + 1) begin 139 | data[i] <= 0; 140 | end 141 | end 142 | 143 | endmodule 144 | -------------------------------------------------------------------------------- /src/MEM_WB.v: -------------------------------------------------------------------------------- 1 | module MEM_WB( 2 | input wire clk, 3 | 
input wire reset, 4 | 5 | input wire RegWrite_MEM, 6 | input wire [1:0] MemtoReg_MEM, 7 | input wire [4:0] Rw_MEM, 8 | input wire [31:0] ReadData_MEM, 9 | input wire [31:0] ALUOut_MEM, 10 | input wire [31:0] PC_MEM, 11 | input wire Load_MEM, 12 | 13 | output reg RegWrite_WB, 14 | output reg [1:0] MemtoReg_WB, 15 | output reg [4:0] Rw_WB, 16 | output reg [31:0] ReadData_WB, 17 | output reg [31:0] ALUOut_WB, 18 | output reg [31:0] PC_WB, 19 | output reg Load_WB 20 | ); 21 | 22 | initial begin 23 | RegWrite_WB <= 0; 24 | MemtoReg_WB <= 0; 25 | Rw_WB <= 0; 26 | ReadData_WB <= 0; 27 | ALUOut_WB <= 0; 28 | PC_WB <= 0; 29 | Load_WB <= 0; 30 | end 31 | 32 | always @(posedge clk or posedge reset) begin 33 | if(reset) begin 34 | RegWrite_WB <= 0; 35 | MemtoReg_WB <= 0; 36 | Rw_WB <= 0; 37 | ReadData_WB <= 0; 38 | ALUOut_WB <= 0; 39 | PC_WB <= 0; 40 | Load_WB <= 0; 41 | end 42 | else begin 43 | RegWrite_WB <= RegWrite_MEM; 44 | MemtoReg_WB <= MemtoReg_MEM; 45 | Rw_WB <= Rw_MEM; 46 | ReadData_WB <= ReadData_MEM; 47 | ALUOut_WB <= ALUOut_MEM; 48 | PC_WB <= PC_MEM; 49 | Load_WB <= Load_MEM; 50 | end 51 | end 52 | 53 | endmodule -------------------------------------------------------------------------------- /src/PC.v: -------------------------------------------------------------------------------- 1 | `timescale 1ns / 1ps 2 | 3 | module PC(clk, reset, PC_i, PC_o); 4 | //Input Clock Signals 5 | input reset; 6 | input clk; 7 | //Input PC 8 | input [31:0] PC_i; 9 | //Output PC 10 | output reg [31:0] PC_o; 11 | 12 | initial begin 13 | PC_o <= 0; 14 | end 15 | 16 | always@(posedge reset or posedge clk) 17 | begin 18 | if(reset) begin 19 | PC_o <= 0; 20 | end else begin 21 | PC_o <= PC_i; 22 | end 23 | end 24 | endmodule -------------------------------------------------------------------------------- /src/PipelineCPU.v: -------------------------------------------------------------------------------- 1 | `timescale 1ns / 1ps 2 | module PipelineCPU( 3 | input wire sysclk, 4 | input wire 
reset, 5 | output wire [7:0] leds, 6 | output wire [7:0] bcd7, 7 | output wire [3:0] an 8 | ); 9 | 10 | wire clk; 11 | //CLK CLKController(sysclk, reset, clk); 12 | assign clk = sysclk; 13 | wire [31:0] Instruction; 14 | wire [31:0] PC_now; // PC_IF 15 | wire [31:0] PC_new; 16 | 17 | // IF 18 | InstructionMemory InstMemory(PC_now, Instruction); 19 | 20 | // ID 21 | wire [5:0] OpCode_ID; 22 | wire [4:0] rs_ID; 23 | wire [4:0] rt_ID; 24 | wire [4:0] rd_ID; 25 | wire [4:0] Shamt_ID; 26 | wire [5:0] Funct_ID; 27 | wire [31:0] PC_ID; 28 | wire flush_IFID; 29 | wire hold_IFID; 30 | 31 | wire Load_EX; 32 | IF_ID IFIDReg(clk, reset, flush_IFID, hold_IFID, Load_EX, Instruction, PC_now, OpCode_ID, rs_ID, rt_ID, rd_ID, Shamt_ID, Funct_ID, PC_ID); 33 | 34 | wire [1:0] PCSrc_ID; 35 | wire Branch_ID; 36 | wire RegWrite_ID; 37 | wire [1:0] RegDst_ID; 38 | wire MemRead_ID; 39 | wire MemWrite_ID; 40 | wire [1:0] MemtoReg_ID; 41 | wire ALUSrcA_ID; 42 | wire ALUSrcB_ID; 43 | wire ExtOp_ID; 44 | wire LuOp_ID; 45 | wire JOp_ID; 46 | wire LoadByte_ID; 47 | Control ControlDecoder(OpCode_ID, Funct_ID, PCSrc_ID, Branch_ID, RegWrite_ID, RegDst_ID, MemRead_ID, MemWrite_ID, MemtoReg_ID, ALUSrcA_ID, ALUSrcB_ID, ExtOp_ID, LuOp_ID, JOp_ID, LoadByte_ID); 48 | 49 | wire [4:0] ALUCtrl_ID; 50 | wire Sign_ID; 51 | ALUControl ALUController(OpCode_ID, Funct_ID, ALUCtrl_ID, Sign_ID); 52 | 53 | assign flush_IFID = Branch_ID || JOp_ID; 54 | 55 | wire [31:0] WriteData_WB; 56 | wire [4:0] Rw_WB; 57 | wire RegWrite_WB; 58 | wire [31:0] dataA_ID; 59 | wire [31:0] dataB_ID; 60 | RegisterFile RF(clk, reset, RegWrite_WB, rs_ID, rt_ID, Rw_WB, WriteData_WB, dataA_ID, dataB_ID); 61 | 62 | wire [31:0] ImmExtOut_ID; 63 | wire [31:0] ImmExtShift_ID; 64 | ImmProcess Imm1(ExtOp_ID, LuOp_ID, {rd_ID, Shamt_ID, Funct_ID}, ImmExtOut_ID, ImmExtShift_ID); 65 | 66 | wire Zero; 67 | wire [1:0] BrForwardingA; 68 | wire [1:0] BrForwardingB; 69 | wire [31:0] BrJuderA; 70 | wire [31:0] BrJuderB; 71 | 72 | wire [4:0] Rw_MEM; 73 | 
wire RegWrite_MEM; 74 | wire [31:0] ALUOut_MEM; 75 | wire Load_ID; 76 | wire Load_WB; 77 | assign Load_ID = OpCode_ID == 6'h23 || OpCode_ID == 6'h20; 78 | 79 | BranchForwarding BrForwarding(rs_ID, rt_ID, Rw_MEM, RegWrite_MEM, Load_WB, Rw_WB, BrForwardingA, BrForwardingB); 80 | assign BrJuderA = BrForwardingA == 1 ? ALUOut_MEM : 81 | BrForwardingA == 2 ? WriteData_WB : dataA_ID; 82 | assign BrJuderB = BrForwardingB == 1 ? ALUOut_MEM : 83 | BrForwardingB == 2 ? WriteData_WB : dataB_ID; 84 | BranchJudge BranchJudger(OpCode_ID, BrJuderA, BrJuderB, Branch_ID, Zero); 85 | 86 | wire [31:0] PC_Branch; 87 | assign PC_Branch = Branch_ID && Zero ? PC_ID + 4 + ImmExtShift_ID : PC_ID + 4; 88 | 89 | // EX 90 | wire flush_IDEX; 91 | wire [4:0] Rw_EX; 92 | //wire hold_IDEX; 93 | 94 | wire RegWrite_EX; 95 | wire Branch_EX; 96 | wire MemRead_EX; 97 | wire MemWrite_EX; 98 | wire [1:0] MemtoReg_EX; 99 | wire ALUSrcA_EX; 100 | wire ALUSrcB_EX; 101 | wire [4:0] ALUCtrl_EX; 102 | wire [1:0] RegDst_EX; 103 | wire [31:0] dataA_EX; 104 | wire [31:0] dataB_EX; 105 | wire [31:0] ImmExtOut_EX; 106 | wire [4:0] Shamt_EX; 107 | wire [4:0] rs_EX; 108 | wire [4:0] rt_EX; 109 | wire [4:0] rd_EX; 110 | wire Sign_EX; 111 | wire LoadByte_EX; 112 | wire [31:0] PC_EX; 113 | //wire Load_EX; 114 | ID_EX IDEXReg( 115 | clk, reset, flush_IDEX, RegWrite_ID, Branch_ID, MemRead_ID, MemWrite_ID, 116 | MemtoReg_ID, ALUSrcA_ID, ALUSrcB_ID, ALUCtrl_ID, RegDst_ID, dataA_ID, dataB_ID, 117 | ImmExtOut_ID, Shamt_ID, rs_ID, rt_ID, rd_ID, Sign_ID, LoadByte_ID, PC_ID, Load_ID, 118 | RegWrite_EX, Branch_EX, MemRead_EX, MemWrite_EX, 119 | MemtoReg_EX, ALUSrcA_EX, ALUSrcB_EX, ALUCtrl_EX, RegDst_EX, dataA_EX, dataB_EX, 120 | ImmExtOut_EX, Shamt_EX, rs_EX, rt_EX, rd_EX, Sign_EX, LoadByte_EX, PC_EX, Load_EX 121 | ); 122 | 123 | assign Rw_EX = RegDst_EX == 2'b00 ? rt_EX : RegDst_EX == 2'b01 ? 
rd_EX : 31; // 0: rt; 1: rd; 2: ra 124 | 125 | assign hold_IFID = ((RegWrite_EX && Branch_ID && (Rw_EX == rs_ID || Rw_EX == rt_ID)) && Load_EX == 0) || 126 | (MemRead_EX && (rt_EX == rs_ID || rt_EX == rt_ID) && Load_EX); // next inst is branch && !Load, stall || load use hazard 127 | assign flush_IDEX = (RegWrite_EX && Branch_ID && (Rw_EX == rs_ID || Rw_EX == rt_ID)) || 128 | (MemRead_EX && (rt_EX == rs_ID || rt_EX == rt_ID) && Load_EX); 129 | 130 | wire [1:0] ALUChooseA; 131 | wire [1:0] ALUChooseB; 132 | ALUForwarding ALUForward(rs_EX, rt_EX, Rw_MEM, Rw_WB, RegWrite_MEM, RegWrite_WB, ALUSrcA_EX, ALUSrcB_EX, ALUChooseA, ALUChooseB); 133 | 134 | wire [31:0] ALUinA; 135 | wire [31:0] ALUinB; 136 | assign ALUinA = ALUChooseA == 1 ? {27'h0000000, Shamt_EX} : 137 | ALUChooseA == 2 ? ALUOut_MEM : 138 | ALUChooseA == 3 ? WriteData_WB: dataA_EX; 139 | assign ALUinB = ALUChooseB == 1 ? ImmExtOut_EX : 140 | ALUChooseB == 2 ? ALUOut_MEM : 141 | ALUChooseB == 3 ? WriteData_WB: dataB_EX; 142 | 143 | wire [31:0] ALUOut_EX; 144 | ALU ALUCalculate(ALUCtrl_EX, Sign_EX, ALUinA, ALUinB, ALUOut_EX); 145 | 146 | // MEM 147 | wire MemRead_MEM; 148 | wire MemWrite_MEM; 149 | //wire [31:0] ALUOut_MEM; 150 | //wire [4:0] Rw_MEM; 151 | wire [1:0] MemtoReg_MEM; 152 | //wire RegWrite_MEM; 153 | wire [31:0] dataB_MEM; 154 | wire LoadByte_MEM; 155 | wire [31:0] PC_MEM; 156 | wire Load_MEM; 157 | EX_MEM EXMEMReg( 158 | clk, reset, MemRead_EX, MemWrite_EX, ALUOut_EX, Rw_EX, MemtoReg_EX, RegWrite_EX, dataB_EX, LoadByte_EX, PC_EX, Load_EX, 159 | MemRead_MEM, MemWrite_MEM, ALUOut_MEM, Rw_MEM, MemtoReg_MEM, RegWrite_MEM, dataB_MEM, LoadByte_MEM, PC_MEM, Load_MEM 160 | ); 161 | 162 | wire [31:0] ReadData_Temp; 163 | 164 | DataMemory DataMem(clk, reset, ALUOut_MEM, dataB_MEM, ReadData_Temp, MemRead_MEM, MemWrite_MEM, leds, bcd7, an); 165 | 166 | wire [31:0] ReadData_MEM; 167 | assign ReadData_MEM = LoadByte_MEM == 0 ? ReadData_Temp : 168 | ALUOut_MEM[1:0] == 2'b00 ? 
{{24{ReadData_Temp[7]}}, ReadData_Temp[7:0]} : 169 | ALUOut_MEM[1:0] == 2'b01 ? {{24{ReadData_Temp[15]}}, ReadData_Temp[15:8]} : 170 | ALUOut_MEM[1:0] == 2'b10 ? {{24{ReadData_Temp[23]}}, ReadData_Temp[23:16]} : 171 | {{24{ReadData_Temp[31]}}, ReadData_Temp[31:24]}; 172 | 173 | // WB 174 | //wire RegWrite_WB; 175 | wire [1:0] MemtoReg_WB; 176 | //wire [4:0] Rw_WB; 177 | wire [31:0] ReadData_WB; 178 | wire [31:0] ALUOut_WB; 179 | wire [31:0] PC_WB; 180 | //wire Load_WB; 181 | MEM_WB MEMWBReg( 182 | clk, reset, RegWrite_MEM, MemtoReg_MEM, Rw_MEM, ReadData_MEM, ALUOut_MEM, PC_MEM, Load_MEM, 183 | RegWrite_WB, MemtoReg_WB, Rw_WB, ReadData_WB, ALUOut_WB, PC_WB, Load_WB 184 | ); 185 | 186 | //wire [31:0] WriteData_WB; 187 | assign WriteData_WB = MemtoReg_WB == 1 ? ReadData_WB : 188 | MemtoReg_WB == 2 ? PC_WB : ALUOut_WB; 189 | 190 | // PC 191 | assign PC_new = (RegWrite_EX && Branch_ID && (Rw_EX == rs_ID || Rw_EX == rt_ID) && Load_EX) ? PC_now - 4 : 192 | hold_IFID ? PC_now : 193 | PCSrc_ID == 1 ? {PC_ID[31:28], rs_ID, rt_ID, rd_ID, Shamt_ID, Funct_ID, 2'b00} : 194 | PCSrc_ID == 2 ? dataA_ID + 4: 195 | // (Branch_ID && Zero) ? PC_now + ImmExtShift_ID : // ID stage Judge: PC_now has plused 4 196 | // Branch_ID ? PC_now : 197 | Branch_ID ? 
PC_Branch : 198 | PC_now + 4; 199 | PC PCConctroller(clk, reset, PC_new, PC_now); 200 | 201 | endmodule -------------------------------------------------------------------------------- /src/RegisterFile.v: -------------------------------------------------------------------------------- 1 | `timescale 1ns / 1ps 2 | module RegisterFile(clk, reset, RegWrite, Read_register1, Read_register2, Write_register, Write_data, Read_data1, Read_data2); 3 | input reset, clk; 4 | input RegWrite; 5 | input [4:0] Read_register1, Read_register2, Write_register; 6 | input [31:0] Write_data; 7 | output [31:0] Read_data1, Read_data2; 8 | 9 | reg [31:0] RF_data[31:0]; 10 | wire Writable; 11 | 12 | assign Writable = RegWrite && (Write_register != 5'b00000); 13 | 14 | assign Read_data1 = (Write_register == Read_register1 && Writable) ? Write_data : 15 | (Read_register1 == 5'b00000)? 32'h00000000: RF_data[Read_register1]; 16 | assign Read_data2 = (Write_register == Read_register2 && Writable) ? Write_data : 17 | (Read_register2 == 5'b00000)? 
32'h00000000: RF_data[Read_register2]; 18 | 19 | integer i; 20 | always @(posedge reset or posedge clk) 21 | if (reset) 22 | for (i = 1; i < 32; i = i + 1) 23 | RF_data[i] <= 32'h00000000; 24 | else if (Writable) 25 | RF_data[Write_register] <= Write_data; 26 | 27 | endmodule 28 | -------------------------------------------------------------------------------- /src/constrain.xdc: -------------------------------------------------------------------------------- 1 | set_property -dict {PACKAGE_PIN U4 IOSTANDARD LVCMOS33} [get_ports {reset}] 2 | set_property -dict {PACKAGE_PIN P17 IOSTANDARD LVCMOS33} [get_ports {sysclk}] 3 | 4 | set_property -dict {PACKAGE_PIN B4 IOSTANDARD LVCMOS33} [get_ports {bcd7[0]}] 5 | set_property -dict {PACKAGE_PIN A4 IOSTANDARD LVCMOS33} [get_ports {bcd7[1]}] 6 | set_property -dict {PACKAGE_PIN A3 IOSTANDARD LVCMOS33} [get_ports {bcd7[2]}] 7 | set_property -dict {PACKAGE_PIN B1 IOSTANDARD LVCMOS33} [get_ports {bcd7[3]}] 8 | set_property -dict {PACKAGE_PIN A1 IOSTANDARD LVCMOS33} [get_ports {bcd7[4]}] 9 | set_property -dict {PACKAGE_PIN B3 IOSTANDARD LVCMOS33} [get_ports {bcd7[5]}] 10 | set_property -dict {PACKAGE_PIN B2 IOSTANDARD LVCMOS33} [get_ports {bcd7[6]}] 11 | set_property -dict {PACKAGE_PIN D5 IOSTANDARD LVCMOS33} [get_ports {bcd7[7]}] 12 | 13 | set_property -dict {PACKAGE_PIN G2 IOSTANDARD LVCMOS33} [get_ports {an[3]}] 14 | set_property -dict {PACKAGE_PIN C2 IOSTANDARD LVCMOS33} [get_ports {an[2]}] 15 | set_property -dict {PACKAGE_PIN C1 IOSTANDARD LVCMOS33} [get_ports {an[1]}] 16 | set_property -dict {PACKAGE_PIN H1 IOSTANDARD LVCMOS33} [get_ports {an[0]}] 17 | 18 | set_property -dict {PACKAGE_PIN F6 IOSTANDARD LVCMOS33} [get_ports {leds[7]}] 19 | set_property -dict {PACKAGE_PIN G4 IOSTANDARD LVCMOS33} [get_ports {leds[6]}] 20 | set_property -dict {PACKAGE_PIN G3 IOSTANDARD LVCMOS33} [get_ports {leds[5]}] 21 | set_property -dict {PACKAGE_PIN J4 IOSTANDARD LVCMOS33} [get_ports {leds[4]}] 22 | set_property -dict {PACKAGE_PIN H4 
IOSTANDARD LVCMOS33} [get_ports {leds[3]}] 23 | set_property -dict {PACKAGE_PIN J3 IOSTANDARD LVCMOS33} [get_ports {leds[2]}] 24 | set_property -dict {PACKAGE_PIN J2 IOSTANDARD LVCMOS33} [get_ports {leds[1]}] 25 | set_property -dict {PACKAGE_PIN K2 IOSTANDARD LVCMOS33} [get_ports {leds[0]}] 26 | 27 | create_clock -period 20.000 -name CLK -waveform {0.000 10.000} [get_ports sysclk] -------------------------------------------------------------------------------- /src/test_pipeline.v: -------------------------------------------------------------------------------- 1 | `timescale 1ns/1ps 2 | module test_pipeline(); 3 | 4 | reg reset; 5 | reg sysclk; 6 | wire [7:0] leds; 7 | wire [7:0] bcd7; 8 | wire [3:0] an; 9 | 10 | PipelineCPU pipelineCpu(sysclk, reset, leds, bcd7, an); 11 | 12 | initial begin 13 | reset = 1; 14 | sysclk = 1; 15 | #10 reset = 0; 16 | end 17 | 18 | always #5 sysclk = ~sysclk; 19 | 20 | endmodule 21 | -------------------------------------------------------------------------------- /实验报告.md: -------------------------------------------------------------------------------- 1 | # Lab Report: Five-Stage Pipelined MIPS CPU 2 | 3 | ## Objective 4 | 5 | Design a five-stage pipelined CPU that supports the MIPS instruction set, and use it to run a string-search algorithm. 6 | 7 | ## Design 8 | 9 | ### Basic Framework 10 | 11 | A five-stage pipeline with `forwarding` circuitry. I also resolve `Branch` instructions early, in the `ID` stage, and made a series of adjustments so that the CPU runs safely and stably: hazards are resolved correctly, and the CPU runs faster. 12 | 13 | The memory organization follows the Harvard architecture: data memory and instruction memory are separate. 14 | 15 | ### Implemented Instruction Set 16 | 17 | The pipelined CPU supports most MIPS instructions. Beyond the instructions already implemented on the single-cycle and multi-cycle CPUs in the spring semester, it adds the following: `lb, bne, blez, bgtz, bltz, jal, jalr, jr`, and others. 18 | 19 | ### Block Diagram 20 | 21 | Starting from the block diagram in Prof. Ma Hongbing's (马洪兵) slides for a pipeline that resolves branches in the `ID` stage, I made further modifications and arrived at the following block diagram for my pipeline: 22 | 23 | ![design](img/design.png) 24 | 25 | ## Principles and Selected Code 26 | 27 | ### Control Signals 28 | 29 | In my code, control signals are decoded by `Control.v`. From the instruction's `OpCode` and `Funct`, it generates the following control signals: `Branch, RegWrite, RegDst, MemRead, MemWrite, MemtoReg, ALUSrc1, ALUSrc2, ExtOp, LuOp, JOp, LoadByte`. 30 | 31 | Compared with the multi-cycle CPU, the new control signals are `JOp` and `LoadByte`. The former indicates whether the instruction is a jump, which the CPU uses for jumping and stalling; the latter indicates whether the instruction is `lb`, which lets the CPU extract a single byte from memory. 32 | 33 | ### Five-Stage Pipeline 34 | 35 | 
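Instruction execution is divided into five stages: instruction fetch (IF), instruction decode (ID), execute (EX), memory access (MEM), and write-back to the register file (WB). Between every two adjacent stages there is a pipeline register that holds the control signals and data the instruction will need in the following stages. 36 | 37 | Four groups of registers are needed to pass information between the five stages; I named them `IF_ID, ID_EX, EX_MEM, MEM_WB`. The `IF_ID` register has `flush` and `hold` inputs, used to clear and to hold its contents; the `ID_EX` register has a `flush` input, used to clear its contents. Their use is described in detail in the stall sections below. 38 | 39 | ### Stall: Principle and Implementation 40 | 41 | #### Stall after a branch or jump 42 | 43 | Because I designed both branches and jumps to resolve in the `ID` stage, only one `stall` cycle is needed after either kind of instruction. The `stall` works as follows: if `Branch` or `JOp` is asserted in the `ID` stage, the `flush` signal of the `IF_ID` register is set so that `IF_ID` is cleared in the next cycle, and the next `PC` is set to the target address (if the `Branch` condition evaluates to false, the `PC` still becomes `PC+4`). 44 | 45 | The code that sets `flush_IFID`: 46 | 47 | ~~~verilog 48 | assign flush_IFID = Branch_ID || JOp_ID; 49 | ~~~ 50 | 51 | The code that sets the next `PC`: 52 | 53 | ~~~verilog 54 | assign PC_new = (RegWrite_EX && Branch_ID && (Rw_EX == rs_ID || Rw_EX == rt_ID) && Load_EX) ? PC_now - 4 : 55 | hold_IFID ? PC_now : 56 | PCSrc_ID == 1 ? {PC_ID[31:28], rs_ID, rt_ID, rd_ID, Shamt_ID, Funct_ID, 2'b00} : 57 | PCSrc_ID == 2 ? dataA_ID + 4: 58 | Branch_ID ? PC_Branch : 59 | PC_now + 4; 60 | ~~~ 61 | 62 | Here, line 3 of the expression handles the `j` instruction, line 4 handles `jr` and similar instructions, and line 5 handles `Branch` instructions. A `Branch` is already resolved in the `ID` stage, so `PC_Branch` is computed in `ID` as well, and the jump takes effect without problems. 63 | 64 | `PC_Branch` is computed as follows: 65 | 66 | ~~~verilog 67 | assign PC_Branch = Branch_ID && Zero ? 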
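PC_ID + 4 + ImmExtShift_ID : PC_ID + 4; 68 | ~~~ 69 | 70 | The `Zero` signal is generated according to the particular `Branch` instruction: for `beq` it indicates whether the two inputs are equal, and for `bne` whether they differ. 71 | 72 | #### Stall before a branch 73 | 74 | Because branches are resolved early, in the `ID` stage, data hazards can arise here, so a `stall` may also be needed before a branch. 75 | 76 | In detail, there are two cases: 77 | 78 | ##### Case 1: the branch is preceded by an `R`-type or arithmetic `I`-type instruction 79 | 80 | If the instruction before the `Branch` is an `R`-type or arithmetic `I`-type instruction, and the register it writes back is the `rs` or `rt` register that the branch compares, a data hazard occurs. 81 | 82 | ![branch_front_stall](img/branch_front_stall.png) 83 | 84 | As the figure shows, in this situation the `ALU` result only becomes available after the `Branch` instruction's `ID` stage has ended, so `forwarding` alone can no longer make the `Branch` execute correctly. The `Branch` must `stall` for one cycle, after which the previous instruction's `ALUOut` is forwarded to the `Branch` instruction's `ID` stage, as shown below: 85 | 86 | ![branch_front_forwarding](img/branch_front_forwarding.png) 87 | 88 | The forwarding itself is described in the forwarding-unit section below; here I first describe how the `stall` is implemented. 89 | 90 | To `stall` the `Branch` for one cycle, it suffices to hold the `IF_ID` register and flush the `ID_EX` register. 91 | 92 | Although the value of `PC` may still change during the `stall`, the `Branch` always assigns `PC` a new value once it completes `ID`, so this `stall` does not need to track `PC`. 93 | 94 | ##### Case 2: the branch is preceded by `lb` or `lw` 95 | 96 | If the instruction before the branch is `lb` or `lw`, and the loaded data is used by the `Branch`, a data hazard also occurs. Unlike Case 1, the data becomes available at the earliest in the load's `MEM` stage, so the `Branch` must `stall` for two cycles. 97 | 98 | The data hazard is shown below: 99 | 100 | ![branch_load_stall](img/branch_load_stall.png) 101 | 102 | After a two-cycle `stall`, forwarding becomes possible: 103 | 104 | ![branch_load_forwarding](img/branch_load_forwarding.png) 105 | 106 | This `stall` is slightly more complicated to carry out than the one in Case 1. 107 | 108 | Concretely: first `flush` both the `IF_ID` and `ID_EX` registers, then set the `PC` to `PC-4`. Merely holding the `IF_ID` register yields only a one-cycle `stall`; a two-cycle `stall` is guaranteed only by flushing `IF_ID` while resetting the current `PC` (the `PC` fetched in `IF` once the `Branch` has reached `ID`) to `PC-4`. 109 | 110 | Setting `PC-4` is always correct here, because the previously executed instruction is known to be a `Load`, not a jump or branch. 111 | 112 | ##### Code details for Case 1 and Case 2 113 | 114 | The logic of the control signals `flush_IFID`, `hold_IFID`, and `flush_IDEX`: 115 | 116 | ~~~verilog 117 | assign flush_IFID = Branch_ID || JOp_ID; 118 | assign hold_IFID = ((RegWrite_EX && Branch_ID && (Rw_EX == rs_ID || Rw_EX == rt_ID)) && Load_EX == 0) || 119 | (MemRead_EX && (rt_EX == rs_ID || rt_EX == rt_ID) && Load_EX); // next inst is branch && !Load, stall || load use hazard 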
120 | assign flush_IDEX = (RegWrite_EX && Branch_ID && (Rw_EX == rs_ID || Rw_EX == rt_ID)) || 121 | (MemRead_EX && (rt_EX == rs_ID || rt_EX == rt_ID) && Load_EX); 122 | ~~~ 123 | 124 | In `hold_IFID` and `flush_IDEX`, the second term is the `Load-Use` hazard detection; only the first term concerns branches. 125 | 126 | Both `flush_IFID` and `hold_IFID` control the `IF_ID` register, and their relative priority depends on the situation. The implementation: 127 | 128 | ~~~verilog 129 | always @(posedge clk or posedge reset) begin 130 | if(reset || (flush_IFID && Load_EX)) begin 131 | // flush 132 | // ... 133 | end 134 | else if (hold_IFID) begin 135 | // hold 136 | // ... 137 | end 138 | else if (flush_IFID) begin 139 | // flush 140 | // ... 141 | end 142 | else begin 143 | // decode 144 | OpCode <= Instruction[31:26]; 145 | rs <= Instruction[25:21]; 146 | rt <= Instruction[20:16]; 147 | rd <= Instruction[15:11]; 148 | Shamt <= Instruction[10:6]; 149 | Funct <= Instruction[5:0]; 150 | PC_ID <= PC_IF; 151 | end 152 | end 153 | ~~~ 154 | 155 | When the instruction currently in `EX` is a `Load`, `flush_IFID` takes priority over `hold_IFID`, because a two-cycle `stall` is needed; when the instruction in `EX` is not a `Load`, `hold_IFID` takes priority over `flush_IFID`. 156 | 157 | The code that sets `PC-4`: 158 | 159 | ~~~verilog 160 | assign PC_new = (RegWrite_EX && Branch_ID && (Rw_EX == rs_ID || Rw_EX == rt_ID) && Load_EX) ? PC_now - 4 : 161 | hold_IFID ? PC_now : 162 | PCSrc_ID == 1 ? {PC_ID[31:28], rs_ID, rt_ID, rd_ID, Shamt_ID, Funct_ID, 2'b00} : 163 | PCSrc_ID == 2 ? dataA_ID + 4: 164 | Branch_ID ? 
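PC_Branch : 165 | PC_now + 4; 166 | ~~~ 167 | 168 | Line 1 of the expression is the code that sets `PC-4`. The logic: if the instruction in `EX` is a `Load`, the next instruction is a `Branch`, and the register the `Load` writes back is one the `Branch` uses, then the next `PC` is set to `PC-4`. 169 | 170 | #### Load-Use hazard detection and stall 171 | 172 | When the previous instruction is `lb` or `lw`, the next instruction is an `R`-type or arithmetic `I`-type instruction, and the register the `Load` writes is used by that next instruction, a data hazard occurs. A one-cycle `stall` is then needed after the `Load`. The principle is shown below: 173 | 174 | ![loaduse_stall](img/loaduse_stall.png) 175 | 176 | The loaded data is available at the earliest after the `MEM` stage, whereas the `Use` already needs it in the `EX` stage, so the pipeline must `stall` for one cycle after the `Load` and then forward the `LoadData`, as shown below: 177 | 178 | ![loaduse_forwarding](img/loaduse_forwarding.png) 179 | 180 | Implementation: when the `Load` is in its `EX` stage, it can be determined whether the next instruction is a `Use` with a data hazard. If so, in the next cycle the `IF_ID` register containing the `Use` instruction is held and the `ID_EX` register is cleared. 181 | 182 | In code: 183 | 184 | ~~~verilog 185 | assign hold_IFID = ((RegWrite_EX && Branch_ID && (Rw_EX == rs_ID || Rw_EX == rt_ID)) && Load_EX == 0) || 186 | (MemRead_EX && (rt_EX == rs_ID || rt_EX == rt_ID) && Load_EX); // next inst is branch && !Load, stall || load use hazard 187 | assign flush_IDEX = (RegWrite_EX && Branch_ID && (Rw_EX == rs_ID || Rw_EX == rt_ID)) || 188 | (MemRead_EX && (rt_EX == rs_ID || rt_EX == rt_ID) && Load_EX); 189 | ~~~ 190 | 191 | In the code above, the second terms of `hold_IFID` and `flush_IDEX` are the `Load-Use` hazard detection. 192 | 193 | ### Forwarding: Principle and Implementation 194 | 195 | #### Forwarding to the `ID` stage 196 | 197 | Because `Branch` instructions are resolved early in the `ID` stage in my design, I need forwarding into the `ID` stage to resolve the data hazards of `Branch` instructions. 198 | 199 | I introduced two forwarding-unit control signals, `BrForwardingA` and `BrForwardingB`, which select the two inputs of the `Branch` comparison in the `ID` stage. 200 | 201 | The two inputs of the `Branch` comparison can come from three sources: 202 | 203 | - `WriteData_WB`: the data the previous instruction loaded from `DataMem`; applies when the branch is preceded by a `Load`. 204 | - `ALUOut_MEM`: the previous instruction's `ALU` output; applies when the branch is preceded by an `R`-type or arithmetic `I`-type instruction. 205 | - `dataA_ID` or `dataB_ID`: data read directly from the register file using `rs` and `rt`; applies when there is no data hazard. 206 | 207 | These three cases correspond to `BrForwarding` values of 2, 1, and 0 respectively. 208 | 209 | My `Branch` forwarding unit is implemented as follows: 210 | 211 | ~~~verilog 212 | assign BrForwardingA = rs == Rw_WB && Load_WB ? 2 : rs == Rw_MEM && RegWrite_MEM ? 1 : 0; 213 | assign BrForwardingB = rt == Rw_WB && Load_WB ? 2 : rt == Rw_MEM && RegWrite_MEM ? 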
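1 : 0; 214 | ~~~ 215 | 216 | Taking `BrForwardingA` as an example: 217 | 218 | - If `rs == Rw_WB && Load_WB`, the previous instruction is a `Load` (two `stall` cycles have already elapsed) and it writes back to the same register as `rs`, so `BrForwardingA` is set to 2. 219 | - If `rs == Rw_MEM && RegWrite_MEM`, the previous instruction is an `R`-type or arithmetic `I`-type instruction (one `stall` cycle has already elapsed) and it writes back to the same register as `rs`, so `BrForwardingA` is set to 1. 220 | - With no data hazard, `BrForwardingA` defaults to 0. 221 | 222 | The `Branch` comparison inputs in the `ID` stage, `BrJuderA` and `BrJuderB`, are then selected according to the `BrForwarding` signals: 223 | 224 | ~~~verilog 225 | assign BrJuderA = BrForwardingA == 1 ? ALUOut_MEM : BrForwardingA == 2 ? WriteData_WB : dataA_ID; 226 | assign BrJuderB = BrForwardingB == 1 ? ALUOut_MEM : BrForwardingB == 2 ? WriteData_WB : dataB_ID; 227 | ~~~ 228 | 229 | #### Forwarding to the `EX` stage 230 | 231 | The `ALU` inputs in the `EX` stage can come from four sources: 232 | 233 | - `dataA_EX` or `dataB_EX`: data read from the register file and carried down the pipeline to the `EX` stage. 234 | - The shift amount `Shamt` or the immediate `ImmExtOut`. 235 | - `ALUOut_MEM`: the previous instruction's `ALU` result. 236 | - `WriteData_WB`: the `ALU` result of the instruction two back, or the result of a preceding `Load`. 237 | 238 | The forwarding select signals are `ALUChooseA` and `ALUChooseB`. The four sources above correspond to `ALUChoose` values of 0, 1, 2, and 3 respectively. 239 | 240 | My forwarding unit: 241 | 242 | ~~~verilog 243 | assign ALUChooseA = ALUSrcA_EX == 1 ? 1 : 244 | (RegWrite_MEM && (Rw_MEM == rs_EX) && (Rw_MEM != 0)) ? 2 : // check the MEM stage first, i.e. the previous instruction 245 | (RegWrite_WB && (Rw_WB == rs_EX) && (Rw_WB != 0)) ? 3 : 0; 246 | assign ALUChooseB = ALUSrcB_EX == 1 ? 1 : 247 | (RegWrite_MEM && (Rw_MEM == rt_EX) && (Rw_MEM != 0)) ? 2 : // check the MEM stage first, i.e. the previous instruction 248 | (RegWrite_WB && (Rw_WB == rt_EX) && (Rw_WB != 0)) ? 3 : 0; 249 | ~~~ 250 | 251 | Here, `ALUSrcA_EX` and `ALUSrcB_EX` are control signals produced by the instruction decoder; they indicate whether the shift amount or the immediate should be used. 252 | 253 | The remaining conditions are the forwarding checks. 254 | 255 | The previous instruction is checked first; only if it does not qualify for forwarding is the instruction before it checked. 256 | 257 | Taking `ALUChooseA` as an example, the logic is: if the previous instruction writes back to the register file, the destination register is `rs`, and that register is not `$0`, then the previous instruction's `ALU` output is forwarded to the input of the instruction currently in `EX`. If the previous instruction does not qualify, the instruction two back is considered (this also covers the case where the previous instruction is a `Load`): if the instruction in `WB` writes back to the register file, its destination register is `rs`, and that register is not `$0`, then the value about to be written back is forwarded to the `ALU` input. If neither condition holds, the value read from the register file is used directly. 258 | 259 | With the `ALUChoose` signals available, the `ALU` inputs can be selected: 260 | 261 | ~~~verilog 262 | assign ALUinA = ALUChooseA == 1 ? 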
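{27'h0000000, Shamt_EX} : 263 | ALUChooseA == 2 ? ALUOut_MEM : 264 | ALUChooseA == 3 ? WriteData_WB: dataA_EX; 265 | assign ALUinB = ALUChooseB == 1 ? ImmExtOut_EX : 266 | ALUChooseB == 2 ? ALUOut_MEM : 267 | ALUChooseB == 3 ? WriteData_WB: dataB_EX; 268 | ~~~ 269 | 270 | ### Data Memory 271 | 272 | I set the data memory size to `512` words, with byte addresses from `0x00000000` to `0x000007FF`. 273 | 274 | Byte address `0x4000000C` is mapped to the control word of the external `LEDs`, and byte address `0x40000010` is mapped to the control word of the seven-segment display. 275 | 276 | ### Implementing Load Byte 277 | 278 | `Load Byte` is largely the same as `Load Word`. I only added a separate `LoadByte` control signal, which selects between `LoadByte` and `LoadWord`. 279 | 280 | The idea is to first read a whole word as `LoadWord` would, then select the `Byte` indicated by the low 2 bits of the address and return it sign-extended. 281 | 282 | The code: 283 | 284 | ~~~verilog 285 | assign ReadData_MEM = LoadByte_MEM == 0 ? ReadData_Temp : 286 | ALUOut_MEM[1:0] == 2'b00 ? {{24{ReadData_Temp[7]}}, ReadData_Temp[7:0]} : 287 | ALUOut_MEM[1:0] == 2'b01 ? {{24{ReadData_Temp[15]}}, ReadData_Temp[15:8]} : 288 | ALUOut_MEM[1:0] == 2'b10 ? {{24{ReadData_Temp[23]}}, ReadData_Temp[23:16]} : 289 | {{24{ReadData_Temp[31]}}, ReadData_Temp[31:24]}; 290 | ~~~ 291 | 292 | Here, `ReadData_Temp` is the word read from `DataMemory`. 293 | 294 | ## Simulation Results and Analysis 295 | 296 | ### Analysis of the Computation 297 | 298 | The simulation runs the code in `./asm/mips1.asm`. To reproduce the results, in `DataMemory.v` select, in both the `initial` and `reset` sections, the upper block that initializes only entries 0-7, and in `InstructionMemory.v` use `data[9'd0] <= 32'h20040020;` as instruction 0. 299 | 300 | The code uses the `Brute-Force` algorithm; its execution is analyzed below. 301 | 302 | The string to be searched is the example from the multi-cycle project, `linuxisnotunixisnotunixisnotunix`, and the pattern is `unix`. 303 | 304 | The main string is stored in `DataMemory` starting at address 0, and the pattern starting at address 100. The byte fetched by `lb` from the main string goes into register `$t6` (register 14), and the byte fetched by `lb` from the pattern goes into register `$t7` (register 15). The matching process can be observed as follows: 305 | 306 | ![simulation_matching](img/simulation_matching.png) 307 | 308 | The registers to watch are 14 and 15, which hold the bytes loaded from the main string and from the pattern respectively. For simplicity I display both as signed decimal values. 309 | 310 | As the waveform shows, the first byte loaded from the pattern is 117, which is compared against the main string. The first 3 bytes of the main string do not match, so the main-string position advances; at the 4th byte the main string is also 117, so one byte matches. Both positions then advance to compare the pattern's 2nd byte. The pattern's 2nd byte is 110, which does not match the main string's 5th byte, 120, so the match fails again: the pattern position returns to its 1st byte and the main-string position keeps advancing. This agrees with the theoretical analysis. 311 | 312 | The final result is stored in register `$v0`, whose value is shown below: 313 | 314 | 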
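![simulation_v0](img/simulation_v0.png)

`$v0` is register 2, and it ends up holding the correct value, 3.

### CPI

Next, I counted the executed instructions in MARS. From the start of the `Brute-Force` search until the correct result 3 is produced, 526 instructions are executed, as shown below:

![inst_num](img/inst_num.png)

The simulation takes 8580 ns. With a 10 ns clock period, that is 858 cycles in total.

The CPI is therefore

$$
CPI=\frac{858}{526}\approx 1.63
$$

The CPI is relatively high mainly because the program repeatedly executes `lb` followed by `bne`, and the data hazard between them always occurs, which forces the pipeline to stall for two cycles at each such `bne`.

## Resource usage and timing performance

### Resource usage

#### Pipeline resource usage

The resource usage of the pipelined CPU is shown below:

![resource](img/resource.png)

![summary](img/summary.png)

In total it uses 6672 LUTs, 18148 registers, and 22 IO ports.

#### Schematic

The design schematic is shown below:

![schematic](img/schematic.png)

#### Comparison with the single-cycle and multi-cycle CPUs

The single-cycle CPU I implemented last semester uses 3398 LUTs and 8389 registers.

The multi-cycle CPU I implemented last semester uses 4304 LUTs and 9434 registers.

The pipelined CPU is clearly resource-hungry: it uses about 1.5 to 2 times as many LUTs and about twice as many registers as the single-cycle and multi-cycle implementations.

### Timing performance

#### Virtual clock

For synthesis I used a virtual clock with a 20 ns period.

#### Timing results

The timing results of the pipelined CPU are shown below:

![timing](img/timing.png)

A total of 35602 endpoints were checked. WNS is 10.762 ns and WHS is 0.134 ns.

The delay of the critical path is shown below:

![max_delay](img/max_delay.png)

The maximum delay is 9.087 ns.

From this, the maximum frequency of my pipelined CPU is

$$
f_{max}=\frac{1}{9.087\,\text{ns}}\approx 110.05\,\text{MHz}
$$

This is above the 100 MHz provided by the FPGA, so when running on the board I connected the system clock directly as the CPU clock. The results computed this way are also correct, which is consistent with the maximum frequency being above 100 MHz.

#### Comparison with the single-cycle and multi-cycle CPUs

The single-cycle CPU I designed last semester reaches at most 72.61 MHz, and the multi-cycle CPU reaches at most 102.86 MHz. The pipelined CPU reaches up to 110.05 MHz, so in clock frequency it outperforms both earlier designs.

## Hardware debugging

When I first ran the finished code on the board, the simulation results were correct but the results on hardware were wrong. This bug was very hard to find, and for a while I believed it was a clock period problem. I kept lowering the clock frequency, but even after dividing the original clock by 100 the result was still wrong. At that point I was sure the clock period was not the cause.

I displayed the PC on the LEDs, set the clock period to 1 s, and watched the PC change on the board. The PC behaved normally and stalled correctly on `Branch` and `J` instructions, so nothing looked wrong there.

I then analyzed the wrong result computed by the FPGA and found that the program was counting the number of times the substring moved rather than the number of successful matches. This puzzled me. After careful debugging, I narrowed the problem down to the `Load` instruction failing.
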
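After a series of experiments, I noticed that my `DataMemory` was initialized in an `initial` block, and that this initialization might fail. I therefore added the same initialization code to `reset`, and the result came out correct. The problem is somewhat mysterious, but my reasonable suspicion is that the `initial` block failed to initialize the memory.

## Final results

Since the clock frequency after synthesis is above 100 MHz, I use the FPGA system clock directly as the clock of the pipelined CPU.

After programming the FPGA (the code already writes the result to the seven-segment BCD display and the LEDs), the result is as follows:

![result_3](img/result_3.jpg)

The pipelined CPU computes the correct result.

Because the original example string is short and may be too easy to search, I designed a longer string search problem as a second test case for the pipelined CPU. It is located in `DataMemory.v`, just below the original example. To use it, select the code that initializes entries 0-28 in both `initial` and `reset`, and set instruction 0 in `InstructionMemory.v` to `data[9'd0] <= 32'h20040074`.

Main string: `abcdxabcdsseabcdscxabcdfsabcdvabcdaacdabcdcabcpabcdnabcdqwerabcdlabcdggabcdgabcdgabcdeceaaemabcdlkdrabcdbbabcdccabcd`

Substring: `abcd`

The main string has length 116, and the substring occurs in it 18 times. This can be verified with the following Python code:

~~~python
st = "abcdxabcdsseabcdscxabcdfsabcdvabcdaacdabcdcabcpabcdnabcdqwerabcdlabcdggabcdgabcdgabcdeceaaemabcdlkdrabcdbbabcdccabcd"
sub = "abcd"

if __name__ == "__main__":
    length = len(st)
    print(length)  # length of the main string
    cnt = 0
    # Slide a window of len(sub) characters over the main string
    for i in range(length - len(sub) + 1):
        if st[i:i + len(sub)] == sub:
            cnt += 1
    print(cnt)  # number of occurrences of the substring
~~~

After programming the modified code onto the FPGA, the result is as follows:

![result_18](img/result_18.jpg)

The result is 0x12, which is 18 in decimal, so the computation is correct.

## Summary

Writing a five-stage pipelined MIPS CPU is a large project, and an important step in putting what I learned in the digital logic theory course into practice.

While designing the pipelined CPU, I wanted to challenge myself by resolving `Branch` early, in the `ID` stage. Early branch resolution in `ID` was only briefly mentioned in the course, with little detail on the design. How to avoid the hazards caused by early branching and how to design the control signals were therefore the questions I thought about most carefully. After many attempts I settled on the current design. It achieves a good clock frequency, and the results are correct both in simulation and on the board.

## File list

The `src` directory contains all the `verilog` files: `ALU.v、ALUControl.v、ALUForwarding.v、BranchForwarding.v、BranchJudge.v、CLK.v、Control.v、DataMemory.v、EX_MEM.v、ID_EX.v、IF_ID.v、ImmProcess.v、InstructionMemory.v、MEM_WB.v、PC.v、PipelineCPU.v、RegisterFile.v、test_pipeline.v`. `CLK.v` is a clock divider; since the pipeline already runs above 100 MHz, this module is not actually used in the design programmed onto the board. `test_pipeline.v` is the testbench. The `src` directory also contains the `constrain.xdc` file.
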
In the `asm` directory, `mips1.asm` is the assembly source; the initialization in the pipelined CPU's `InstructionMemory.v` is translated from this file.
--------------------------------------------------------------------------------