├── LICENSE ├── README.md └── hw ├── pkg.sv ├── vector_addsub.sv ├── vector_alu.sv ├── vector_logic.sv ├── vector_mult.sv └── vector_processor_pkg.sv /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 Martin Riis 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # RISC-V-Vector-Processor 2 | 256-bit vector processor based on the RISC-V vector (V) extension 3 | Currently, only the execution units are published, however, the registers, fetch and decode units are in development. 4 | **THIS PROJECT IS IN ACTIVE DEVELOPMENT AND SHOULD NOT BE CONSIDERED BUG FREE** 5 | 6 | ## 1. Background 7 | 8 | ## 2. 
RISC-V Vector Extension Terminology 9 | ### 2.1 Standard Element Width (SEW) 10 | The SEW defines the width of each word in the vector. For example, a 256-bit vector could contain eight 32-bit words; in this case, the SEW would be 32 bits. SEW is encoded as a 3-bit value as follows: 11 | | `sew`/`vsew` | SEW (bits) | 12 | |---|---| 13 | | `000` | 8 | 14 | | `001` | 16 | 15 | | `010` | 32 | 16 | | `011` | 64 | 17 | | `100` | 128 | 18 | | `101` | 256 | 19 | | `110` | 512 | 20 | | `111` | 1024 | 21 | 22 | Although some execution units can operate on any SEW up to 256 bits, most support only up to 64 bits. 23 | 24 | ### 2.2 Vector Length (VLEN) 25 | VLEN describes the total length (usually in bits) of the vectors operated upon by the processor. In this design, VLEN is fixed at 256 bits. In the future, some simple elements, such as the ALU, may be extended to support arbitrary vector lengths. 26 | 27 | ## 3. Features 28 | ### 3.1 Integer Addition/Subtraction 29 | Addition and subtraction are performed using the `addsub_256bit` module (`vector_addsub.sv`). This can be used on its own or combined with the logic unit as the ALU module, `vector_alu` (`vector_alu.sv`). 30 | | Port | Direction | Width | Description | 31 | |---|---|---|---| 32 | | `vaddsub_en_i` | in | 1 | Active high enable | 33 | | `a_i` | in | 256 | Vector input A | 34 | | `b_i` | in | 256 | Vector input B | 35 | | `sew_i` | in | 3 | Standard element width (8, 16, 32, 64, 128, 256) | 36 | | `carry_ext_i` | in | 32 | External carry/borrow in | 37 | | `op_i` | in | 1 | Operation: 0 = subtract, 1 = add | 38 | | `out_o` | out | 256 | Vector output | 39 | | `cout_o` | out | 32 | Carry/borrow out | 40 | 41 | ### 3.2 Logic 42 | Logic and shift operations are performed by the `logic_256bit` module (`vector_logic.sv`). This can be used on its own or combined with the addsub unit as the ALU module, `vector_alu` (`vector_alu.sv`). 
43 | | Port | Direction | Width | Description | 44 | |---|---|---|---| 45 | | `a_i` | in | 256 | Vector input A | 46 | | `b_i` | in | 256 | Vector input B | 47 | | `sew_i` | in | 3 | Standard element width (8, 16, 32, 64, 128, 256) | 48 | | `opcode_i` | in | 6 | Logic/ALU opcodes | 49 | | `carry_ext_i` | in | 32 | External carry/borrow in | 50 | 51 | ### 3.3 Integer Multiplication 52 | Integer multiplication is performed by the `mult_256bit` module (`vector_mult.sv`). 53 | As the design targets Xilinx FPGAs, the `(* use_dsp48 = "true" *)` directive is added to force the tool to infer DSP48 blocks to perform the multiplication. This can be removed when targeting simulation or a non-Xilinx FPGA. 54 | | Port | Direction | Width | Description | 55 | |---|---|---|---| 56 | | `a_i` | in | 256 | Vector input A | 57 | | `b_i` | in | 256 | Vector input B | 58 | | `sew_i` | in | 3 | Standard element width (8, 16, 32, 64) | 59 | | `out_o` | out | 256 | Vector output | 60 | 61 | ### 3.4 Shift 62 | 63 | ### 3.5 Integer Compare 64 | Currently in development. 65 | 66 | ### 3.6 Vector Masking 67 | Vector masking allows certain elements of an input vector to be ignored by an execution unit. Vector register `v0` is used as the vector mask register; each element in a vector is allocated a single bit in the mask register. Element i is masked by bit i in the mask register. 68 | Currently, only the ALU (module `vector_alu`) supports vector masking. 69 | 70 | ## 4. Processor Opcodes 71 | Note that these opcodes specify the operation of each execution unit and are not identical to the opcodes used by the RISC-V vector extension standard. The decode module (not yet published) is responsible for this conversion. 
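The `alu_opcodes` enum in `hw/vector_processor_pkg.sv` assigns no explicit values, so SystemVerilog numbers its members from 0 in declaration order, and the binary codes for the ALU follow from that order. As a minimal sketch (the module `print_alu_opcodes` is hypothetical and not part of the published sources; it assumes `hw/vector_processor_pkg.sv` is compiled alongside it), the codes can be printed in simulation:

```systemverilog
// Hypothetical helper, not part of the published sources.
// Assumes hw/vector_processor_pkg.sv is compiled together with this file.
module print_alu_opcodes;
  import vector_processor_pkg::*;

  initial begin
    alu_opcodes op;
    // first(), next(), last() and name() are the built-in enum methods
    op = op.first();
    forever begin
      // Print the 6-bit encoding followed by the member name
      $display("%b  %s", op, op.name());
      if (op == op.last()) break;
      op = op.next();
    end
    $finish;
  end
endmodule
```

Running this in any simulator with SystemVerilog enum support gives one line per opcode, which can be compared against the documented binary codes whenever the enum is reordered or extended.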
72 | ### 4.1 ALU 73 | | Binary Code | Opcode | Operation | 74 | |---|---|---| 75 | | `000000` | `ALU_VAND` | `A & B` | 76 | | `000001` | `ALU_VNAND` | `¬(A & B)` | 77 | | `000010` | `ALU_VANDNOT` | `A & ¬B` | 78 | | `000011` | `ALU_VOR` | `A \| B` | 79 | | `000100` | `ALU_VNOR` | `¬(A \| B)` | 80 | | `000101` | `ALU_VXOR` | `A ⊕ B` | 81 | | `000110` | `ALU_VXNOR` | `¬(A ⊕ B)` | 82 | | `000111` | `ALU_VNOT` | `¬A` | 83 | | `001000` | `ALU_VSLL` | `A << B` | 84 | | `001001` | `ALU_VSRL` | `A >> B` | 85 | | `001010` | `ALU_VSRA` | `A >>> B` | 86 | | `001011` | `ALU_VADD` | `A + B` | 87 | | `001100` | `ALU_VSUB` | `A - B` | 88 | | `001101` | `ALU_VMIN` | `min(A, B)` | 89 | | `001110` | `ALU_VMAX` | `max(A, B)` | 90 | | `001111` | `ALU_VADC` | `A + B + Cin` | 91 | | `010000` | `ALU_VSBC` | `A - B - Cin` | 92 | -------------------------------------------------------------------------------- /hw/pkg.sv: -------------------------------------------------------------------------------- 1 | //////////////////////////////////////// 2 | ////////// RISC-V CPU Package ////////// 3 | ////////// Martin Riis, 2020 /////////// 4 | //////////////////////////////////////// 5 | 6 | package riscv_pkg; 7 | 8 | typedef union packed { 9 | logic[255:0] i256; 10 | logic[1:0][127:0] i128; 11 | logic[3:0][63:0] i64; 12 | logic[7:0][31:0] i32; 13 | logic[15:0][15:0] i16; 14 | logic[31:0][7:0] i8; 15 | logic[63:0][3:0] i4; 16 | } vector_t; 17 | 18 | typedef enum logic[2:0] { 19 | SEW8 = 3'b000, 20 | SEW16 = 3'b001, 21 | SEW32 = 3'b010, 22 | SEW64 = 3'b011 23 | // SEW128 - SEW1024 are reserved as of v1.0 24 | //SEW128 = 3'b100, 25 | //SEW256 = 3'b101, 26 | //SEW512 = 3'b110, 27 | //SEW1024 = 3'b111 28 | } sew_t; 29 | 30 | 31 | /////////////////////////////// 32 | // Standard Instruction Formats 33 | /////////////////////////////// 34 | typedef struct packed { 35 | logic[6:0] funct7; 36 | logic[4:0] rs2; 37 | logic[4:0] rs1; 38 | logic[2:0] funct3; 39 | logic[4:0] rd; 40 | logic[6:0] opcode; 41 | } 
rtype_t; 42 | 43 | typedef struct packed { 44 | logic[4:0] rs3; 45 | logic[1:0] fmt; 46 | logic[4:0] rs2; 47 | logic[4:0] rs1; 48 | logic[2:0] rm; 49 | logic[4:0] rd; 50 | logic[6:0] opcode; 51 | } r4type_t; 52 | 53 | typedef struct packed { 54 | logic[11:0] imm; 55 | logic[4:0] rs1; 56 | logic[2:0] funct3; 57 | logic[4:0] rd; 58 | logic[6:0] opcode; 59 | } itype_t; 60 | 61 | typedef struct packed { 62 | logic[6:0] imm1; 63 | logic[4:0] rs2; 64 | logic[4:0] rs1; 65 | logic[2:0] funct3; 66 | logic[4:0] imm0; 67 | logic[6:0] opcode; 68 | } stype_t; 69 | 70 | typedef struct packed { 71 | logic imm3; 72 | logic[5:0] imm1; 73 | logic[4:0] rs2; 74 | logic[4:0] rs1; 75 | logic[2:0] funct3; 76 | logic[3:0] imm0; 77 | logic imm2; 78 | logic[6:0] opcode; 79 | } btype_t; 80 | 81 | typedef struct packed { 82 | logic[19:0] imm; 83 | logic[4:0] rd; 84 | logic[6:0] opcode; 85 | } utype_t; 86 | 87 | typedef struct packed { 88 | logic imm3; 89 | logic[9:0] imm0; 90 | logic imm1; 91 | logic[7:0] imm2; 92 | logic[4:0] rd; 93 | logic[6:0] opcode; 94 | } jtype_t; 95 | 96 | typedef struct packed { 97 | logic[4:0] funct5; 98 | logic aq; 99 | logic rl; 100 | logic[4:0] rs2; 101 | logic[4:0] rs1; 102 | logic[2:0] funct3; 103 | logic[4:0] rd; 104 | logic[6:0] opcode; 105 | } atype_t; // For AMO 106 | 107 | typedef enum logic[6:0] { 108 | OPCODE_AMO = 7'b0101111, 109 | OPCODE_C0 = 7'bXXXXX00, 110 | OPCODE_C1 = 7'bXXXXX01, 111 | OPCODE_C2 = 7'bXXXXX10, 112 | OPCODE_LD_FP = 7'b0000111, 113 | OPCODE_ST_FP = 7'b0100111, 114 | OPCODE_OP_FP = 7'b0111011, 115 | OPCODE_FMADD = 7'b1000011, 116 | OPCODE_FMSUB = 7'b1000111, 117 | OPCODE_FNMSUB = 7'b1001011, 118 | OPCODE_FNMADD = 7'b1001111, 119 | OPCODE_OP_IMM = 7'b0010011, 120 | OPCODE_LUI = 7'b0110111, 121 | OPCODE_AUIPC = 7'b0010111, 122 | OPCODE_OP = 7'b0110011, 123 | OPCODE_JAL = 7'b1101111, 124 | OPCODE_JALR = 7'b1100111, 125 | OPCODE_BRANCH = 7'b1100011, 126 | OPCODE_LOAD = 7'b0000011, 127 | OPCODE_STORE = 7'b0100011, 128 | OPCODE_MISC_MEM = 
7'b0001111, 129 | OPCODE_OPV = 7'b1010111 130 | } rv_opcodes; 131 | 132 | typedef enum logic[2:0] { 133 | OPIVV = 3'b000, 134 | OPFVV = 3'b001, 135 | OPMVV = 3'b010, 136 | OPIVI = 3'b011, 137 | OPIVX = 3'b100, 138 | OPFVF = 3'b101, 139 | OPMVX = 3'b110, 140 | OPCFG = 3'b111 141 | } vfunct3_t; 142 | 143 | typedef enum logic[5:0] { 144 | VALU_NOP = 6'b000000, 145 | VALU_ADDU, 146 | VALU_ADDS, 147 | VALU_SUBU, 148 | VALU_SUBS, 149 | VALU_RSUB, // Should be more efficient than larger input muxes 150 | VALU_MINS, 151 | VALU_MINU, 152 | VALU_MAXS, 153 | VALU_MAXU, 154 | VALU_EQ, 155 | VALU_NE, 156 | VALU_LTU, 157 | VALU_LTS, 158 | VALU_LEU, 159 | VALU_LES, 160 | VALU_GTU, 161 | VALU_GTS, 162 | VALU_AND, 163 | VALU_OR, 164 | VALU_XOR, 165 | VALU_SLL, 166 | VALU_SRL, 167 | VALU_SRA, 168 | VALU_ADC, 169 | VALU_SBC, 170 | VALU_AADDS, 171 | VALU_AADDU, 172 | VALU_ASUBS, 173 | VALU_ASUBU, 174 | VALU_ZEX2, 175 | VALU_ZEX4, 176 | VALU_ZEX8, 177 | VALU_SEX2, 178 | VALU_SEX4, 179 | VALU_SEX8 180 | } valu_opcodes; 181 | 182 | typedef enum logic[4:0] { 183 | VMD_NOP = 5'b00000, 184 | VMD_MULUU, 185 | VMD_MULUS, 186 | VMD_MULSS, 187 | VMD_DIVU, 188 | VMD_DIVS, 189 | VMD_REMU, 190 | VMD_REMS 191 | } vmuldiv_opcodes; 192 | 193 | typedef enum logic[5:0] { 194 | VFPU_NOP = 6'b000000, 195 | VFPU_ADD, 196 | VFPU_SUB, 197 | VFPU_MIN, 198 | VFPU_MAX, 199 | VFPU_SQRT, 200 | VFPU_SQRT_EST, 201 | VFPU_REC_EST, 202 | VFPU_EQ, 203 | VFPU_LE, 204 | VFPU_LT, 205 | VFPU_NE, 206 | VFPU_GT, 207 | VFPU_GE, 208 | VFPU_DIV, 209 | VFPU_RDIV, 210 | VFPU_MUL, 211 | VFPU_RSUB, 212 | VFPU_SGNJ, 213 | VFPU_SGNJN, 214 | VFPU_SGNJX, 215 | VFPU_F2U, 216 | VFPU_F2S, 217 | VFPU_U2F, 218 | VFPU_S2F, 219 | VFPU_F2D, 220 | VFPU_D2F, 221 | VFPU_CLASS 222 | } vfpu_opcodes; 223 | 224 | typedef enum logic[1:0] { 225 | CVT_NONE, 226 | CVT_WIDE, 227 | CVT_NARROW 228 | } width_cvt_t; 229 | 230 | typedef enum logic[2:0] { 231 | FPU_RNE = 3'b000, 232 | FPU_RTZ = 3'b001, 233 | FPU_RDN = 3'b010, 234 | FPU_RUP = 3'b011, 235 | 
FPU_RMM = 3'b100, 236 | FPU_REV = 3'b101, // Not part of standard F/D extensions, only used for vfpu 237 | FPU_ROD = 3'b110, // Not part of standard F/D extensions, only used for vfpu 238 | FPU_INST = 3'b111 239 | } fpu_round_t; 240 | 241 | typedef struct packed { 242 | logic valid; 243 | logic[20:0] tag; 244 | logic [31:0] data; 245 | } cache_struct; 246 | 247 | typedef struct packed { 248 | logic[20:0] tag; 249 | logic[10:0] index; 250 | } addr_struct; 251 | 252 | endpackage 253 | -------------------------------------------------------------------------------- /hw/vector_addsub.sv: -------------------------------------------------------------------------------- 1 | import vector_processor_pkg::*; 2 | 3 | // *** 8-BIT ADDER/SUBTRACTOR WITH CARRY IN AND OUT *** // 4 | module addsub_8bit ( 5 | input wire addsub8_en_i, // Active HIGH enable 6 | input wire [7:0] a_i, b_i, 7 | input wire cin_i, // Carry in 8 | input wire op_i, // Operation, 0 = subtract, 1 = add 9 | output reg [7:0] out_o, 10 | output reg cout_o // Carry out 11 | ); 12 | 13 | reg [8:0] full_out; // 9-bit output 14 | 15 | assign out_o = addsub8_en_i ? full_out[7:0] : 8'b0; 16 | assign cout_o = addsub8_en_i ? 
full_out[8] : 1'b0; 17 | 18 | always_comb begin 19 | // Outputs are gated by addsub8_en_i, so compute unconditionally (avoids a latch) 20 | if (op_i) // Add 21 | full_out = a_i + b_i + cin_i; 22 | else // Subtract 23 | full_out = a_i - b_i - cin_i; 24 | 25 | end 26 | endmodule 27 | 28 | // *** 8-BIT FLIP-FLOP *** // 29 | module flipflop_8bit ( 30 | input wire clk_i, 31 | input wire [7:0] d_i, 32 | output reg [7:0] q_o 33 | ); 34 | 35 | always_ff @ (posedge clk_i) begin 36 | q_o <= d_i; 37 | end 38 | endmodule 39 | 40 | // *** 8-BIT DELAY BLOCK *** // 41 | // Allows for custom delay using flip-flops 42 | module delay_8bit # ( 43 | parameter delay = 8 44 | ) 45 | ( 46 | input wire clk_i, 47 | input wire [7:0] d_i, 48 | output wire [7:0] q_o 49 | ); 50 | 51 | wire [7:0] sig_int[delay]; 52 | 53 | genvar d; 54 | for (d = 0; d < delay; d++) begin 55 | // Single delay cycle - single flip-flop 56 | if (delay == 1) begin 57 | flipflop_8bit ff8_inst ( 58 | .clk_i (clk_i), 59 | .d_i (d_i), 60 | .q_o (q_o) 61 | ); 62 | end 63 | // Two cycle delay - two flip-flops 64 | else if (delay == 2) begin 65 | if (d == 0) begin 66 | flipflop_8bit ff8_inst ( 67 | .clk_i (clk_i), 68 | .d_i (d_i), 69 | .q_o (sig_int[d]) 70 | ); 71 | end 72 | else begin 73 | flipflop_8bit ff8_inst ( 74 | .clk_i (clk_i), 75 | .d_i (sig_int[d-1]), 76 | .q_o (q_o) 77 | ); 78 | end 79 | end 80 | // Multi-cycle delay 81 | else begin 82 | if (d == 0) begin 83 | flipflop_8bit ff8_inst ( 84 | .clk_i (clk_i), 85 | .d_i (d_i), 86 | .q_o (sig_int[d]) 87 | ); 88 | end 89 | else if (d == delay - 1) begin 90 | flipflop_8bit ff8_inst ( 91 | .clk_i (clk_i), 92 | .d_i (sig_int[d-1]), 93 | .q_o (q_o) 94 | ); 95 | end 96 | else begin 97 | flipflop_8bit ff8_inst ( 98 | .clk_i (clk_i), 99 | .d_i (sig_int[d-1]), 100 | .q_o (sig_int[d]) 101 | ); 102 | end 103 | end 104 | end 105 | endmodule 106 | 107 | 108 | // *** 256-BIT VECTOR ADDER WITH VECTOR CARRY IN AND OUT *** // 109 | // Supports: 110 | // - 1x 256-bit word 111 | // - 2x 128-bit words 112 | // - 4x 64-bit words 113 | // - 
8x 32-bit words 114 | // - 16x 16-bit words 115 | // - 32x 8-bit words 116 | 117 | // Word length selected by sew_i input 118 | module addsub_256bit ( 119 | input wire clk_i, 120 | input wire vaddsub_en_i, // Active HIGH enable 121 | input wire [255:0] a_i, b_i, 122 | input wire [2:0] sew_i, 123 | input wire [31:0] carry_ext_i, // External carry in 124 | input wire op_i, // Operation, 0 = subtract, 1 = add 125 | output reg [255:0] out_o, 126 | output wire [32:0] cout_o // Carry out 127 | ); 128 | 129 | reg [32:0] cin_int = 33'b0; 130 | wire [32:0] cout_int; 131 | 132 | // Mask internal carry values depending on the current SEW 133 | always_comb begin 134 | case (sew_i) 135 | 3'b000 : cin_int <= 32'h0 & cout_int; // 8-bit words 136 | 3'b001 : cin_int <= 32'haaaaaaaa & cout_int; // 16-bit words 137 | 3'b010 : cin_int <= 32'heeeeeeee & cout_int; // 32-bit words 138 | 3'b011 : cin_int <= 32'hfefefefe & cout_int; // 64-bit words 139 | 3'b100 : cin_int <= 32'hfffefffe & cout_int; // 128-bit words 140 | 3'b101 : cin_int <= 32'hfffffffe & cout_int; // 256-bit words 141 | endcase 142 | end 143 | 144 | wire [255:0] a_int, b_int; 145 | 146 | assign a_int[7:0] = a_i[7:0]; 147 | assign b_int[7:0] = b_i[7:0]; 148 | 149 | genvar p; 150 | for (p = 1; p < 32; p++) begin 151 | delay_8bit # ( 152 | .delay(p) 153 | ) 154 | reg_pipeline_a ( 155 | .clk_i (clk_i), 156 | .d_i (a_i[p*8+7:p*8]), 157 | .q_o (a_int[p*8+7:p*8]) 158 | ); 159 | 160 | delay_8bit # ( 161 | .delay(p) 162 | ) 163 | reg_pipeline_b ( 164 | .clk_i (clk_i), 165 | .d_i (b_i[p*8+7:p*8]), 166 | .q_o (b_int[p*8+7:p*8]) 167 | ); 168 | end 169 | 170 | // Generate 32 8-bit addsubs (256-bits total) 171 | genvar i; 172 | for (i = 0; i < 32; i++) begin 173 | if (i == 0) begin 174 | addsub_8bit addsub_8bit_inst ( 175 | .addsub8_en_i (vaddsub_en_i), 176 | .a_i (a_int[8*(i+1)-1:8*(i+1)-8]), 177 | .b_i (b_int[8*(i+1)-1:8*(i+1)-8]), 178 | .out_o (out_o[8*(i+1)-1:8*(i+1)-8]), 179 | .op_i (op_i), 180 | .cin_i (carry_ext_i[i]), 181 | 
.cout_o (cout_int[i+1]) 182 | ); 183 | end 184 | else begin 185 | addsub_8bit addsub_8bit_inst ( 186 | .addsub8_en_i (vaddsub_en_i), 187 | .a_i (a_int[8*(i+1)-1:8*(i+1)-8]), 188 | .b_i (b_int[8*(i+1)-1:8*(i+1)-8]), 189 | .out_o (out_o[8*(i+1)-1:8*(i+1)-8]), 190 | .op_i (op_i), 191 | .cin_i (cin_int[i] | carry_ext_i[i]), 192 | .cout_o (cout_int[i+1]) 193 | ); 194 | end 195 | end 196 | endmodule 197 | -------------------------------------------------------------------------------- /hw/vector_alu.sv: -------------------------------------------------------------------------------- 1 | ////////////////////////////////// 2 | // Vector Arithmetic Logic Unit // 3 | /////// Martin Riis, 2020 //////// 4 | ////////////////////////////////// 5 | 6 | import vector_processor_pkg::*; 7 | import riscv_pkg::*; 8 | 9 | module vector_alu ( 10 | input logic clk_i, 11 | input logic rst_i, 12 | 13 | input vector_t vs1_i, 14 | input vector_t vs2_i, 15 | input logic[XLEN-1:0] rs1_i, 16 | input vector_t v0_i, // Mask and carry in register 17 | output vector_t vd_o, 18 | input sew_t sew_i, 19 | 20 | input logic signed_i, // 0 = unsigned, 1 = signed 21 | input logic use_mask_i, // 0 = don't mask, 1 = use mask 22 | input logic use_carry_i, // 0 = don't use carry/borrow, 1 = use carry/borrow 23 | input logic produce_carry_i, // Produce carry/borrow out in mask register format 24 | input logic saturate_i, // 0 = overflow, 1 = saturate + set VXSAT bit 25 | 26 | input valu_opcodes valu_op_i 27 | ); 28 | 29 | // Internal VS1 and VS2 signals 30 | vector_t vs1; 31 | vector_t vs2; 32 | 33 | // VECTOR MASKING OPERATION // 34 | /* 35 | * If the mask input is '1', the input vectors are masked, else, 36 | * no mask is applied. 37 | * The mask source is vector v0 (input v0_i) which is shared 38 | * with the carry input. Therefore, a mask and carry cannot be 39 | * used together. 
40 | */ 41 | always_comb begin 42 | if (use_mask_i) begin 43 | case (sew_i) 44 | SEW8 : begin // 8-bit input 45 | for (int i = 0; i < 32; i++) begin // Element i is masked by bit i of v0 46 | vs1.i8[i] = vs1_i.i8[i] & ~{8{v0_i.i256[i]}}; 47 | vs2.i8[i] = vs2_i.i8[i] & ~{8{v0_i.i256[i]}}; 48 | end 49 | end 50 | SEW16 : begin // 16-bit input 51 | for (int i = 0; i < 16; i++) begin 52 | vs1.i16[i] = vs1_i.i16[i] & ~{16{v0_i.i256[i]}}; 53 | vs2.i16[i] = vs2_i.i16[i] & ~{16{v0_i.i256[i]}}; 54 | end 55 | end 56 | SEW32 : begin // 32-bit input 57 | for (int i = 0; i < 8; i++) begin 58 | vs1.i32[i] = vs1_i.i32[i] & ~{32{v0_i.i256[i]}}; 59 | vs2.i32[i] = vs2_i.i32[i] & ~{32{v0_i.i256[i]}}; 60 | end 61 | end 62 | SEW64 : begin // 64-bit input 63 | for (int i = 0; i < 4; i++) begin 64 | vs1.i64[i] = vs1_i.i64[i] & ~{64{v0_i.i256[i]}}; 65 | vs2.i64[i] = vs2_i.i64[i] & ~{64{v0_i.i256[i]}}; 66 | end 67 | end 68 | endcase 69 | end 70 | else begin 71 | vs1 = vs1_i; 72 | vs2 = vs2_i; 73 | end 74 | end 75 | 76 | // Internal carry signals 77 | logic[31:0] carry_out; 78 | logic[31:0] carry_in; 79 | // Set external carry in 80 | /* 81 | External carry input uses the mask register (v0). 82 | use_carry_i == 1 : carry input is used and sourced from mask register 83 | use_carry_i == 0 : carry input is not used, carry_in = 0 84 | */ 85 | assign carry_in = use_carry_i ? 
v0_i.i32[0] : 32'b0; 86 | 87 | addsub_256bit addsub_256bit_inst ( 88 | .vaddsub_en_i (addsub_en_int), 89 | .a_i (vs1_int), 90 | .b_i (vs2_int), 91 | .sew_i (sew_i), 92 | .carry_ext_i (carry_ext_int), 93 | .op_i (addsub_op_i), 94 | .out_o (vd_o), 95 | .cout_o (carry_out_int) 96 | ); 97 | 98 | logic_256bit logic_256bit_inst ( 99 | .logic_en_i (logic_en_int), 100 | .a_i (vs1_int), 101 | .b_i (vs2_int), 102 | .sew_i (sew_i), 103 | .opcode_i (dc_alu_opcode_i), 104 | .out_o (vd_o) 105 | ); 106 | 107 | endmodule 108 | -------------------------------------------------------------------------------- /hw/vector_logic.sv: -------------------------------------------------------------------------------- 1 | import vector_processor_pkg::*; 2 | 3 | // *** 256-BIT LOGIC UNIT *** // 4 | module logic_256bit ( 5 | input wire [255:0] a_i, b_i, 6 | input wire [2:0] sew_i, 7 | input vector_processor_pkg::alu_opcodes opcode_i, 8 | output reg [255:0] out_o 9 | ); 10 | 11 | reg [255:0] result_logic; 12 | 13 | // LOGIC OPERATIONS // 14 | always_comb begin 15 | result_logic = '0; // Default for non-logic opcodes 16 | case (opcode_i) 17 | ALU_VAND : result_logic = a_i & b_i; 18 | ALU_VNAND : result_logic = ~(a_i & b_i); 19 | ALU_VANDNOT : result_logic = a_i & ~b_i; 20 | ALU_VOR : result_logic = a_i | b_i; 21 | ALU_VNOR : result_logic = ~(a_i | b_i); 22 | ALU_VXOR : result_logic = a_i ^ b_i; 23 | ALU_VXNOR : result_logic = ~(a_i ^ b_i); 24 | ALU_VNOT : result_logic = ~a_i; 25 | endcase 26 | end 27 | 28 | // SHIFT OPERATIONS // 29 | // Can likely be optimised to use fewer resources 30 | wire [255:0] result_shift_8bit, 31 | result_shift_16bit, 32 | result_shift_32bit, 33 | result_shift_64bit, 34 | result_shift_128bit, 35 | result_shift_256bit; 36 | reg [255:0] result_shift; 37 | 38 | // Instantiates 32 8-bit shifters 39 | genvar i8; 40 | for (i8 = 0; i8 < 255; i8 = i8 + 8) begin 41 | shifter # (.width (8)) shift_8bit ( 42 | .a_i (a_i[i8+7:i8]), 43 | .b_i (b_i[i8+7:i8]), 44 | .opcode_i (opcode_i), 45 | .out_o 
(result_shift_8bit[i8+7:i8]) 46 | ); 47 | end 48 | 49 | // Instantiates 16 16-bit shifters 50 | genvar i16; 51 | for (i16 = 0; i16 < 255; i16 = i16 + 16) begin 52 | shifter # (.width (16)) shift_16bit ( 53 | .a_i (a_i[i16+15:i16]), 54 | .b_i (b_i[i16+15:i16]), 55 | .opcode_i (opcode_i), 56 | .out_o (result_shift_16bit[i16+15:i16]) 57 | ); 58 | end 59 | 60 | // Instantiates 8 32-bit shifters 61 | genvar i32; 62 | for (i32 = 0; i32 < 255; i32 = i32 + 32) begin 63 | shifter # (.width (32)) shift_32bit ( 64 | .a_i (a_i[i32+31:i32]), 65 | .b_i (b_i[i32+31:i32]), 66 | .opcode_i (opcode_i), 67 | .out_o (result_shift_32bit[i32+31:i32]) 68 | ); 69 | end 70 | 71 | // Instantiates 4 64-bit shifters 72 | genvar i64; 73 | for (i64 = 0; i64 < 255; i64 = i64 + 64) begin 74 | shifter # (.width (64)) shift_64bit ( 75 | .a_i (a_i[i64+63:i64]), 76 | .b_i (b_i[i64+63:i64]), 77 | .opcode_i (opcode_i), 78 | .out_o (result_shift_64bit[i64+63:i64]) 79 | ); 80 | end 81 | 82 | // Instantiates 2 128-bit shifters 83 | genvar i128; 84 | for (i128 = 0; i128 < 255; i128 = i128 + 128) begin 85 | shifter # (.width (128)) shift_128bit ( 86 | .a_i (a_i[i128+127:i128]), 87 | .b_i (b_i[i128+127:i128]), 88 | .opcode_i (opcode_i), 89 | .out_o (result_shift_128bit[i128+127:i128]) 90 | ); 91 | end 92 | 93 | // Instantiates 1 256-bit shifter 94 | genvar i256; 95 | for (i256 = 0; i256 < 255; i256 = i256 + 256) begin 96 | shifter # (.width (256)) shift_256bit ( 97 | .a_i (a_i[i256+255:i256]), 98 | .b_i (b_i[i256+255:i256]), 99 | .opcode_i (opcode_i), 100 | .out_o (result_shift_256bit[i256+255:i256]) 101 | ); 102 | end 103 | 104 | // SHIFTER OUTPUT CONTROL // 105 | // Selects shift output based on SEW 106 | always_comb begin 107 | case (sew_i) 108 | 3'b000 : result_shift <= result_shift_8bit; 109 | 3'b001 : result_shift <= result_shift_16bit; 110 | 3'b010 : result_shift <= result_shift_32bit; 111 | 3'b011 : result_shift <= result_shift_64bit; 112 | 3'b100 : result_shift <= result_shift_128bit; 113 | 3'b101 : 
result_shift <= result_shift_256bit; 114 | endcase 115 | end 116 | 117 | // RESULT OUTPUT // 118 | // Selects between logic block result and shifter result based on the opcode 119 | always_comb begin 120 | out_o = '0; 121 | case (opcode_i) 122 | ALU_VAND, ALU_VNAND, ALU_VANDNOT, ALU_VOR, 123 | ALU_VNOR, ALU_VXOR, ALU_VXNOR, ALU_VNOT : out_o = result_logic; 124 | ALU_VSLL, ALU_VSRL, ALU_VSRA : out_o = result_shift; 125 | endcase 126 | end 127 | endmodule 128 | 129 | 130 | // *** N-BIT SHIFTER *** // 131 | // Shifts a_i by b_i bits 132 | module shifter # ( 133 | parameter width = 8 134 | ) 135 | ( 136 | input wire [width-1:0] a_i, b_i, 137 | input vector_processor_pkg::alu_opcodes opcode_i, 138 | output reg [width-1:0] out_o 139 | ); 140 | 141 | always_comb begin 142 | case (opcode_i) 143 | ALU_VSLL : out_o = a_i << b_i; // Left logical shift 144 | ALU_VSRL : out_o = a_i >> b_i; // Right logical shift 145 | ALU_VSRA : out_o = $signed(a_i) >>> b_i; // Right arithmetic shift (preserves sign; operand must be signed) 146 | default : out_o = '0; // Non-shift opcodes: drive zero to avoid a latch 147 | endcase 148 | end 149 | endmodule -------------------------------------------------------------------------------- /hw/vector_mult.sv: -------------------------------------------------------------------------------- 1 | // Forces use of Xilinx DSP48 blocks for MAC operation 2 | (* use_dsp48 = "true" *) 3 | 4 | // 16-bit multiply-accumulate unit 5 | module mult_16bit ( 6 | input wire [15:0] a_i, b_i, c_i, 7 | output reg [31:0] fout_o, // Full 32-bit output 8 | output reg [15:0] out_o // Truncated 16-bit output 9 | ); 10 | 11 | assign fout_o = a_i * b_i + c_i; 12 | assign out_o = fout_o[15:0]; 13 | 14 | endmodule 15 | 16 | // 256-BIT VECTOR MULTIPLIER 17 | // Supports: 18 | // - 4x 64-bit words 19 | // - 8x 32-bit words 20 | // - 16x 16-bit words 21 | // - 32x 8-bit words 22 | 23 | // Word length selected by sew_i input 24 | module mult_256bit ( 25 | input wire [255:0] a_i, b_i, 26 | input wire [2:0] sew_i, 27 | output reg [255:0] out_o 28 | ); 29 | 30 | genvar i; 31 | for (i = 0; i 
< 4; i++) begin 32 | mult_64bit mult_64bit_inst ( 33 | .a_i (a_i[i*64+63:i*64]), 34 | .b_i (b_i[i*64+63:i*64]), 35 | .sew_i (sew_i), 36 | .out_o (out_o[i*64+63:i*64]) 37 | ); 38 | end 39 | endmodule 40 | 41 | 42 | // 64-bit multiplier from 16-bit multipliers 43 | module mult_64bit ( 44 | input wire [63:0] a_i, b_i, 45 | input wire [7:0] sew_i, 46 | output reg [63:0] out_o 47 | ); 48 | 49 | reg [15:0] a_int[10], b_int[10]; 50 | reg [15:0] out_int[10]; 51 | reg [31:0] fout_int [18]; 52 | 53 | always_comb begin 54 | case (sew_i) 55 | 3'b000 : begin // 8-bit words 56 | a_int[0] <= {8'b0, a_i[7:0]}; 57 | b_int[0] <= {8'b0, b_i[7:0]}; 58 | 59 | a_int[1] <= {8'b0, a_i[15:8]}; 60 | b_int[1] <= {8'b0, b_i[15:8]}; 61 | 62 | a_int[2] <= {8'b0, a_i[23:16]}; 63 | b_int[2] <= {8'b0, b_i[23:16]}; 64 | 65 | a_int[3] <= {8'b0, a_i[31:24]}; 66 | b_int[3] <= {8'b0, b_i[31:24]}; 67 | 68 | a_int[4] <= {8'b0, a_i[39:32]}; 69 | b_int[4] <= {8'b0, b_i[39:32]}; 70 | 71 | a_int[5] <= {8'b0, a_i[47:40]}; 72 | b_int[5] <= {8'b0, b_i[47:40]}; 73 | 74 | a_int[6] <= {8'b0, a_i[55:48]}; 75 | b_int[6] <= {8'b0, b_i[55:48]}; 76 | 77 | a_int[7] <= {8'b0, a_i[63:56]}; 78 | b_int[7] <= {8'b0, b_i[63:56]}; 79 | 80 | a_int[8] <= 16'b0; 81 | b_int[8] <= 16'b0; 82 | a_int[9] <= 16'b0; 83 | b_int[9] <= 16'b0; 84 | 85 | // Outputs 86 | out_o[7:0] <= out_int[0]; 87 | out_o[15:8] <= out_int[1]; 88 | out_o[23:16] <= out_int[2]; 89 | out_o[31:24] <= out_int[3]; 90 | out_o[39:32] <= out_int[4]; 91 | out_o[47:40] <= out_int[5]; 92 | out_o[55:48] <= out_int[6]; 93 | out_o[63:56] <= out_int[7]; 94 | 95 | // Intermediate full output 96 | fout_int[1] <= 32'b0; // Stage 0-1 97 | fout_int[3] <= 32'b0; // Stage 1-2 98 | fout_int[5] <= 32'b0; // Stage 2-3 99 | fout_int[7] <= 32'b0; // Stage 3-4 100 | fout_int[9] <= 32'b0; // Stage 4-5 101 | fout_int[11] <= 32'b0; // Stage 5-6 102 | fout_int[13] <= 32'b0; // Stages 6-7 103 | fout_int[15] <= 32'b0; // Stage 7-8 104 | fout_int[17] <= 32'b0; // Stage 8-9 105 | end 106 | 
3'b001 : begin // 16-bit words 107 | a_int[0] <= a_i[15:0]; 108 | b_int[0] <= b_i[15:0]; 109 | 110 | a_int[1] <= a_i[31:16]; 111 | b_int[1] <= b_i[31:16]; 112 | 113 | a_int[2] <= a_i[47:32]; 114 | b_int[2] <= b_i[47:32]; 115 | 116 | a_int[3] <= a_i[63:48]; 117 | b_int[3] <= b_i[63:48]; 118 | 119 | a_int[4] <= 16'b0; 120 | b_int[4] <= 16'b0; 121 | a_int[5] <= 16'b0; 122 | b_int[5] <= 16'b0; 123 | a_int[6] <= 16'b0; 124 | b_int[6] <= 16'b0; 125 | a_int[7] <= 16'b0; 126 | b_int[7] <= 16'b0; 127 | a_int[8] <= 16'b0; 128 | b_int[8] <= 16'b0; 129 | a_int[9] <= 16'b0; 130 | b_int[9] <= 16'b0; 131 | 132 | // Outputs 133 | out_o[15:0] <= out_int[0]; 134 | out_o[31:16] <= out_int[1]; 135 | out_o[47:32] <= out_int[2]; 136 | out_o[63:48] <= out_int[3]; 137 | 138 | // Intermediate full output 139 | fout_int[1] <= 32'b0; // Stage 0-1 140 | fout_int[3] <= 32'b0; // Stage 1-2 141 | fout_int[5] <= 32'b0; // Stage 2-3 142 | fout_int[7] <= 32'b0; // Stage 3-4 143 | fout_int[9] <= 32'b0; // Stage 4-5 144 | fout_int[11] <= 32'b0; // Stage 5-6 145 | fout_int[13] <= 32'b0; // Stage 6-7 146 | fout_int[15] <= 32'b0; // Stage 7-8 147 | fout_int[17] <= 32'b0; // Stage 8-9 148 | end 149 | 3'b010 : begin // 32-bit words 150 | a_int[0] <= a_i[15:0]; 151 | b_int[0] <= b_i[15:0]; 152 | 153 | a_int[1] <= a_i[15:0]; 154 | b_int[1] <= b_i[31:16]; 155 | a_int[2] <= a_i[31:16]; 156 | b_int[2] <= b_i[15:0]; 157 | 158 | a_int[3] <= a_i[47:32]; 159 | b_int[3] <= b_i[47:32]; 160 | 161 | a_int[4] <= a_i[47:32]; 162 | b_int[4] <= b_i[63:48]; 163 | a_int[5] <= a_i[63:48]; 164 | b_int[5] <= b_i[47:32]; 165 | 166 | a_int[6] <= 16'b0; 167 | b_int[6] <= 16'b0; 168 | a_int[7] <= 16'b0; 169 | b_int[7] <= 16'b0; 170 | a_int[8] <= 16'b0; 171 | b_int[8] <= 16'b0; 172 | a_int[9] <= 16'b0; 173 | b_int[9] <= 16'b0; 174 | 175 | // Outputs 176 | out_o[15:0] <= out_int[0]; 177 | out_o[31:16] <= out_int[2]; 178 | out_o[47:32] <= out_int[3]; 179 | out_o[63:48] <= out_int[5]; 180 | 181 | // Intermediate full output 182 | 
fout_int[1] <= fout_int[0] >> 16; // Stage 0-1 183 | fout_int[3] <= fout_int[2]; // Stage 1-2 184 | fout_int[5] <= 32'b0; // Stage 2-3 185 | fout_int[7] <= fout_int[6] >> 16; // Stage 3-4 186 | fout_int[9] <= fout_int[8]; // Stage 4-5 187 | fout_int[11] <= 32'b0; // Stage 5-6 188 | fout_int[13] <= 32'b0; // Stages 6-7 189 | fout_int[15] <= 32'b0; // Stage 7-8 190 | fout_int[17] <= 32'b0; // Stage 8-9 191 | end 192 | 3'b011 : begin // 64-bit words 193 | a_int[0] <= a_i[15:0]; 194 | b_int[0] <= b_i[15:0]; 195 | 196 | a_int[1] <= a_i[15:0]; 197 | b_int[1] <= b_i[31:16]; 198 | a_int[2] <= a_i[31:16]; 199 | b_int[2] <= b_i[15:0]; 200 | 201 | a_int[3] <= a_i[15:0]; 202 | b_int[3] <= b_i[47:32]; 203 | a_int[4] <= a_i[31:16]; 204 | b_int[4] <= b_i[31:16]; 205 | a_int[5] <= a_i[47:32]; 206 | b_int[5] <= b_i[15:0]; 207 | 208 | a_int[6] <= a_i[15:0]; 209 | b_int[6] <= b_i[63:48]; 210 | a_int[7] <= a_i[31:16]; 211 | b_int[7] <= b_i[47:32]; 212 | a_int[8] <= a_i[47:32]; 213 | b_int[8] <= b_i[31:16]; 214 | a_int[9] <= a_i[63:48]; 215 | b_int[9] <= b_i[15:0]; 216 | 217 | // Outputs 218 | out_o[15:0] <= out_int[0]; 219 | out_o[31:16] <= out_int[2]; 220 | out_o[47:32] <= out_int[5]; 221 | out_o[63:48] <= out_int[9]; 222 | 223 | // Intermediate full output 224 | fout_int[1] <= fout_int[0] >> 16; // Stage 0-1 225 | fout_int[3] <= fout_int[2]; // Stage 1-2 226 | fout_int[5] <= fout_int[4] >> 16; // Stage 2-3 227 | fout_int[7] <= fout_int[6]; // Stage 3-4 228 | fout_int[9] <= fout_int[8]; // Stage 4-5 229 | fout_int[11] <= fout_int[10] >> 16; // Stage 5-6 230 | fout_int[13] <= fout_int[12]; // Stages 6-7 231 | fout_int[15] <= fout_int[14]; // Stage 7-8 232 | fout_int[17] <= fout_int[16]; // Stage 8-9 233 | end 234 | endcase 235 | end 236 | 237 | // INSTANTATES TEN 16-BIT MULTIPLIERS // 238 | // 15:0 239 | mult_16bit mult16_0 ( 240 | .a_i (a_int[0]), 241 | .b_i (b_int[0]), 242 | .c_i (0), 243 | .fout_o (fout_int[0]), 244 | .out_o (out_int[0]) 245 | ); 246 | 247 | // 31:16 248 | 
mult_16bit mult16_1 ( 249 | .a_i (a_int[1]), 250 | .b_i (b_int[1]), 251 | .c_i (fout_int[1]), 252 | .fout_o (fout_int[2]), 253 | .out_o (out_int[1]) 254 | ); 255 | 256 | mult_16bit mult16_2 ( 257 | .a_i (a_int[2]), 258 | .b_i (b_int[2]), 259 | .c_i (fout_int[3]), 260 | .fout_o (fout_int[4]), 261 | .out_o (out_int[2]) 262 | ); 263 | 264 | // 47:32 265 | mult_16bit mult16_3 ( 266 | .a_i (a_int[3]), 267 | .b_i (b_int[3]), 268 | .c_i (fout_int[5]), 269 | .fout_o (fout_int[6]), 270 | .out_o (out_int[3]) 271 | ); 272 | 273 | mult_16bit mult16_4 ( 274 | .a_i (a_int[4]), 275 | .b_i (b_int[4]), 276 | .c_i (fout_int[7]), 277 | .fout_o (fout_int[8]), 278 | .out_o (out_int[4]) 279 | ); 280 | 281 | mult_16bit mult16_5 ( 282 | .a_i (a_int[5]), 283 | .b_i (b_int[5]), 284 | .c_i (fout_int[9]), 285 | .fout_o (fout_int[10]), 286 | .out_o (out_int[5]) 287 | ); 288 | 289 | // 63:48 290 | mult_16bit mult16_6 ( 291 | .a_i (a_int[6]), 292 | .b_i (b_int[6]), 293 | .c_i (fout_int[11]), 294 | .fout_o (fout_int[12]), 295 | .out_o (out_int[6]) 296 | ); 297 | 298 | mult_16bit mult16_7 ( 299 | .a_i (a_int[7]), 300 | .b_i (b_int[7]), 301 | .c_i (fout_int[13]), 302 | .fout_o (fout_int[14]), 303 | .out_o (out_int[7]) 304 | ); 305 | 306 | mult_16bit mult16_8 ( 307 | .a_i (a_int[8]), 308 | .b_i (b_int[8]), 309 | .c_i (fout_int[15]), 310 | .fout_o (fout_int[16]), 311 | .out_o (out_int[8]) 312 | ); 313 | 314 | mult_16bit mult16_9 ( 315 | .a_i (a_int[9]), 316 | .b_i (b_int[9]), 317 | .c_i (fout_int[17]), 318 | .fout_o (), 319 | .out_o (out_int[9]) 320 | ); 321 | 322 | endmodule -------------------------------------------------------------------------------- /hw/vector_processor_pkg.sv: -------------------------------------------------------------------------------- 1 | package vector_processor_pkg; 2 | 3 | parameter XLEN = 32; 4 | parameter VLEN = 256; 5 | 6 | // Control and Status Registers 7 | typedef enum logic [11:0] { 8 | CSR_VSTART = 12'h008, 9 | CSR_VXSAT = 12'h009, 10 | CSR_VXRM = 12'h00A, 11 | 
CSR_VCSR = 12'h00F, 12 | CSR_VL = 12'hC20, 13 | CSR_VTYPE = 12'hC21, 14 | CSR_VLENB = 12'hC22 15 | } csr_regs; 16 | 17 | // OPCODEs 18 | typedef enum logic [6:0] { 19 | OPCODE_LOAD_FP = 7'h07, // Floating point load 20 | OPCODE_STORE_FP = 7'h27, // Floating point store 21 | OPCODE_AMO = 7'h2F, // Atomic memory operation 22 | OPCODE_OP_V = 7'h57 // Vector arithmetic 23 | } opcodes; 24 | 25 | // Memory Addressing Mode 26 | typedef enum logic [2:0] { 27 | MOP_ZEU = 3'h0, // Zero extended unit stride 28 | MOP_ZES = 3'h2, // Zero extended strided 29 | MOP_ZEI = 3'h3, // Zero extended indexed 30 | MOP_SEU = 3'h4, // Sign extended unit stride 31 | MOP_SES = 3'h6, // Sign extended strided 32 | MOP_SEI = 3'h7 // Sign extended indexed 33 | } mops; 34 | 35 | // LSU Widths 36 | typedef enum logic [2:0] { 37 | S16 = 3'h1, // Scalar FP 16-bit 38 | S32 = 3'h2, // Scalar FP 32-bit 39 | S64 = 3'h3, // Scalar FP 64-bit 40 | S128 = 3'h4, // Scalar FP 128-bit 41 | VB = 3'h0, // Vector byte 42 | VH = 3'h5, // Vector halfword 43 | VW = 3'h6, // Vector word 44 | VE = 3'h7 // Vector element 45 | } lsu_widths; 46 | 47 | // Vector ALU OPCODEs 48 | typedef enum logic[5:0] { 49 | // LOGIC 50 | ALU_VAND, 51 | ALU_VNAND, 52 | ALU_VANDNOT, 53 | ALU_VOR, 54 | ALU_VNOR, 55 | ALU_VXOR, 56 | ALU_VXNOR, 57 | ALU_VNOT, 58 | // SHIFT 59 | ALU_VSLL, 60 | ALU_VSRL, 61 | ALU_VSRA, 62 | // ARITHMETIC 63 | ALU_VADD, 64 | ALU_VSUB, 65 | ALU_VMIN, 66 | ALU_VMAX, 67 | ALU_VADC, 68 | ALU_VSBC, 69 | // OTHER 70 | ALU_RGATHER 71 | } alu_opcodes; 72 | 73 | // Vector Compare OPCODEs 74 | typedef enum logic[3:0] { 75 | COMP_EQ, 76 | COMP_NEQ, 77 | COMP_LT, 78 | COMP_LTU, 79 | COMP_LE, 80 | COMP_LEU, 81 | COMP_GT, 82 | COMP_GTU, 83 | COMP_MIN, 84 | COMP_MINU, 85 | COMP_MAX, 86 | COMP_MAXU 87 | } comp_opcodes; 88 | 89 | endpackage 90 | --------------------------------------------------------------------------------