├── LICENSE
├── README.md
└── hw
    ├── pkg.sv
    ├── vector_addsub.sv
    ├── vector_alu.sv
    ├── vector_logic.sv
    ├── vector_mult.sv
    └── vector_processor_pkg.sv


/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2020 Martin Riis
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # RISC-V-Vector-Processor
 2 | 256-bit vector processor based on the RISC-V vector (V) extension.
 3 | Currently, only the execution units are published; the registers, fetch and decode units are still in development.
 4 | **THIS PROJECT IS IN ACTIVE DEVELOPMENT AND SHOULD NOT BE CONSIDERED BUG FREE**
 5 | 
 6 | ## 1. Background
 7 | 
 8 | ## 2. RISC-V Vector Extension Terminology
 9 | ### 2.1 Standard Element Width (SEW)
10 | The SEW defines the width of each word in the vector. For example, a 256-bit vector could contain eight 32-bit words, in which case the SEW is 32 bits. SEW is encoded as a 3-bit value as follows:
11 | | `sew`/`vsew` | SEW |
12 | |---|---|
13 | | `000` | 8 |
14 | | `001` | 16 |
15 | | `010` | 32 |
16 | | `011` | 64 |
17 | | `100` | 128 |
18 | | `101` | 256 |
19 | | `110` | 512 |
20 | | `111` | 1024 |
21 | 
22 | Although some execution units can operate on any value of SEW up to 256 bits, most only support up to 64 bits.
23 | 
24 | ### 2.2 Vector Length (VLEN)
25 | VLEN describes the total length (usually in bits) of the vectors operated upon by the processor. In this design, VLEN is fixed at 256 bits. In the future, some simple elements, such as the ALU, may be extended to support arbitrary vector lengths.
26 | 
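The encoding table in section 2.1 reduces to `SEW = 8 << vsew`, and with VLEN fixed at 256 bits the element count follows directly. The sketch below is a small Python reference model for checking these numbers, not part of the repository; the names `decode_sew` and `elements_per_vector` are made up for this example.

```python
VLEN = 256  # fixed vector length of this design, in bits


def decode_sew(vsew: int) -> int:
    """Return the element width in bits for a 3-bit sew/vsew code."""
    if not 0 <= vsew <= 0b111:
        raise ValueError("vsew must be a 3-bit value")
    return 8 << vsew


def elements_per_vector(vsew: int, vlen: int = VLEN) -> int:
    """Return how many SEW-wide elements fit in one vlen-bit vector."""
    sew = decode_sew(vsew)
    if sew > vlen:
        raise ValueError(f"SEW={sew} does not fit in VLEN={vlen}")
    return vlen // sew


if __name__ == "__main__":
    for code in range(0b110):  # codes 000..101 fit in a 256-bit vector
        print(f"{code:03b}: SEW={decode_sew(code)}, elements={elements_per_vector(code)}")
```

Codes `110` and `111` decode to widths larger than VLEN, so the helper rejects them rather than returning zero elements.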
27 | ## 3. Features 
28 | ### 3.1 Integer Addition/Subtraction
29 | Addition and subtraction are performed using the `addsub_256bit` module (`vector_addsub.sv`). This can be used on its own or combined with the logic unit as the ALU module, `vector_alu` (`vector_alu.sv`).
30 | | Port | Direction | Width | Description |
31 | |---|---|---|---|
32 | | `vaddsub_en_i` | in | 1 | Active high enable |
33 | | `a_i` | in | 256 | Vector input A |
34 | | `b_i` | in | 256 | Vector input B |
35 | | `sew_i` | in | 3 | Standard element width (8, 16, 32, 64, 128, 256) |
36 | | `carry_ext_i` | in | 32 | External carry/borrow in |
37 | | `op_i` | in | 1 | Operation: 0 = subtract, 1 = add |
38 | | `out_o` | out | 256 | Vector output |
39 | | `cout_o` | out | 32 | Carry/borrow out |
40 | 
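To make the intended element-wise behaviour concrete, here is a hedged Python reference model of a SEW-segmented add/subtract. It is a sketch under stated assumptions, not the RTL: it ignores the pipeline registers in `vector_addsub.sv`, takes one carry/borrow bit per element instead of the 32-bit `carry_ext_i` port, and the function name `vector_addsub` is hypothetical.

```python
VLEN = 256  # vector length in bits


def vector_addsub(a, b, vsew, op, carry_in=None):
    """Element-wise add (op=1) or subtract (op=0) on two VLEN-bit integers.

    carry_in is an optional list with one carry/borrow bit per element
    (an assumption of this model). Returns (result, carry_out), where
    carry_out holds one carry/borrow bit per element.
    """
    sew = 8 << vsew
    if sew > VLEN:
        raise ValueError("SEW does not fit in VLEN")
    count = VLEN // sew
    elem_mask = (1 << sew) - 1
    if carry_in is None:
        carry_in = [0] * count

    result = 0
    carry_out = []
    for i in range(count):
        x = (a >> (i * sew)) & elem_mask
        y = (b >> (i * sew)) & elem_mask
        c = carry_in[i] & 1
        full = x + y + c if op else x - y - c
        result |= (full & elem_mask) << (i * sew)
        # Bit SEW of the two's complement result is the carry or borrow out
        carry_out.append((full >> sew) & 1)
    return result, carry_out
```

For example, adding `0x01FF` and `0x0101` with 8-bit elements wraps the low byte and reports a carry for element 0, while the same inputs with 16-bit elements let that carry ripple into the upper byte.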
41 | ### 3.2 Logic
42 | Logic and shift operations are performed by the `logic_256bit` module (`vector_logic.sv`). This can be used on its own or combined with the addsub unit as the ALU module, `vector_alu` (`vector_alu.sv`).
43 | | Port | Direction | Width | Description |
44 | |---|---|---|---|
45 | | `a_i` | in | 256 | Vector input A |
46 | | `b_i` | in | 256 | Vector input B |
47 | | `sew_i` | in | 3 | Standard element width (8, 16, 32, 64, 128, 256) |
48 | | `opcode_i` | in | 6 | Logic/ALU opcodes |
49 | | `out_o` | out | 256 | Vector output |
50 | 
51 | ### 3.3 Integer Multiplication
52 | Integer multiplication is performed by the `mult_256bit` module (`vector_mult.sv`). 
53 | As the design targets Xilinx FPGAs, the `(* use_dsp48 = "true" *)` attribute is added so that the tool infers DSP48 blocks for the multiplication. It can be removed when targeting simulation or a non-Xilinx FPGA.
54 | | Port | Direction | Width | Description |
55 | |---|---|---|---|
56 | | `a_i` | in | 256 | Vector input A |
57 | | `b_i` | in | 256 | Vector input B |
58 | | `sew_i` | in | 3 | Standard element width (8, 16, 32, 64) |
59 | | `out_o` | out | 256 | Vector output |
60 | 
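The source comments in `vector_mult.sv` describe the 64-bit multiplier as built from 16-bit multipliers. The Python sketch below illustrates that idea with schoolbook partial products. It assumes unsigned operands and that each element keeps only the low SEW bits of its product; the names `mul_from_limbs` and `vector_mul` are hypothetical, and the code is a reference model rather than a description of the RTL datapath.

```python
VLEN = 256  # vector length in bits
LIMB = 16   # width of the basic multiplier block


def mul_from_limbs(x, y, sew):
    """Multiply two sew-bit unsigned values using LIMB x LIMB products only,
    keeping the low sew bits of the result."""
    limbs = max(1, sew // LIMB)
    limb_mask = (1 << LIMB) - 1
    acc = 0
    for i in range(limbs):
        # Products with i + j >= limbs lie entirely above bit sew, so skip them
        for j in range(limbs - i):
            xi = (x >> (i * LIMB)) & limb_mask
            yj = (y >> (j * LIMB)) & limb_mask
            acc += (xi * yj) << ((i + j) * LIMB)
    return acc & ((1 << sew) - 1)


def vector_mul(a, b, vsew):
    """Element-wise truncated multiply of two VLEN-bit integers."""
    sew = 8 << vsew
    if sew > 64:
        raise ValueError("this model supports SEW up to 64 bits")
    elem_mask = (1 << sew) - 1
    result = 0
    for i in range(VLEN // sew):
        x = (a >> (i * sew)) & elem_mask
        y = (b >> (i * sew)) & elem_mask
        result |= mul_from_limbs(x, y, sew) << (i * sew)
    return result
```

A 64-bit element needs ten 16-bit partial products in this scheme (those with `i + j <= 3`); the remaining six only affect the discarded upper half of the product.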
61 | ### 3.4 Shift
62 | 
63 | ### 3.5 Integer Compare
64 | Currently in development
65 | 
66 | ### 3.6 Vector Masking
67 | Vector masking allows certain elements of an input vector to be ignored by an execution unit. Vector register `v0` is used as the vector mask register; each element in a vector is allocated a single bit in the mask register, so element i is masked by bit i.
68 | Currently, only the ALU (module `vector_alu`) supports vector masking.
69 | 
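The masking rule can be modelled in a few lines of Python. This sketch assumes that a set mask bit zeroes the corresponding input element, which is how `vector_alu.sv` combines its inputs with the inverted mask; the function name `apply_mask` is made up for illustration.

```python
VLEN = 256  # vector length in bits


def apply_mask(vec, v0, vsew):
    """Return vec with every element zeroed whose mask bit in v0 is set.

    Element i is controlled by bit i of v0 (assumed polarity: 1 = masked).
    """
    sew = 8 << vsew
    if sew > VLEN:
        raise ValueError("SEW does not fit in VLEN")
    elem_mask = (1 << sew) - 1
    out = 0
    for i in range(VLEN // sew):
        if not (v0 >> i) & 1:  # bit i clear: element i passes through
            out |= vec & (elem_mask << (i * sew))
    return out
```

With 8-bit elements, `apply_mask(0x44332211, 0b0101, 0)` zeroes elements 0 and 2 and leaves `0x44002200`.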
70 | ## 4. Processor Opcodes
71 | Please note, these are the opcodes used for specifying the operation of each execution unit and are not identical to the opcodes used by the RISC-V vector extension standard. The decode module (not yet published) is responsible for this conversion. 
72 | ### 4.1 ALU
73 | | Binary Code | Opcode | Operation|
74 | |---|---|---|
75 | | `000000` | `ALU_VAND` | `A & B` |
76 | | `000001` | `ALU_VNAND`| `¬(A & B)`  |
77 | | `000010` | `ALU_VANDNOT` | `A & ¬B` |
78 | | `000011` | `ALU_VOR` | `A \| B` |
79 | | `000100` | `ALU_VNOR` | `¬(A \| B)` |
80 | | `000101` | `ALU_VXOR` | `A ⊕ B`|
81 | | `000110` | `ALU_VXNOR` | `¬(A ⊕ B)` |
82 | | `000111` | `ALU_VNOT` | `¬A` |
83 | | `001000` | `ALU_VSLL` | `A << B` |
84 | | `001001` | `ALU_VSRL`| `A >> B`  |
85 | | `001010` | `ALU_VSRA` | `A >>> B` |
86 | | `001011` | `ALU_VADD` | `A + B` |
87 | | `001100` | `ALU_VSUB` | `A - B` |
88 | | `001101` | `ALU_VMIN` | `min(A, B)`|
89 | | `001110` | `ALU_VMAX` | `max(A, B)` |
90 | | `001111` | `ALU_VADC` | `A + B + Cin` |
91 | | `010000` | `ALU_VSBC` | `A - B - Cin` |
92 | 
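As a cross-check of the bitwise rows in the table above, the following Python sketch evaluates opcodes `000000` to `000111` on 256-bit integers. It is a reference model only; the `AluOp` enum and `alu_logic` function are hypothetical names, and the shift and arithmetic opcodes are left out because they depend on SEW.

```python
from enum import IntEnum

VLEN = 256
FULL = (1 << VLEN) - 1  # all-ones VLEN-bit vector


class AluOp(IntEnum):
    VAND = 0b000000
    VNAND = 0b000001
    VANDNOT = 0b000010
    VOR = 0b000011
    VNOR = 0b000100
    VXOR = 0b000101
    VXNOR = 0b000110
    VNOT = 0b000111


def alu_logic(op, a, b):
    """Evaluate one bitwise opcode on two VLEN-bit unsigned integers."""
    results = {
        AluOp.VAND: a & b,
        AluOp.VNAND: ~(a & b),
        AluOp.VANDNOT: a & ~b,
        AluOp.VOR: a | b,
        AluOp.VNOR: ~(a | b),
        AluOp.VXOR: a ^ b,
        AluOp.VXNOR: ~(a ^ b),
        AluOp.VNOT: ~a,
    }
    # Python integers are unbounded, so clip inverted results to VLEN bits
    return results[AluOp(op)] & FULL
```

Because these operations are purely bitwise, the result does not depend on SEW, which is why the model takes no element-width argument.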


--------------------------------------------------------------------------------
/hw/pkg.sv:
--------------------------------------------------------------------------------
  1 | ////////////////////////////////////////
  2 | ////////// RISC-V CPU Package //////////
  3 | ////////// Martin Riis, 2020 ///////////
  4 | ////////////////////////////////////////
  5 | 
  6 | package riscv_pkg;
  7 | 
  8 | typedef union packed {
  9 |     logic[255:0] i256;
 10 |     logic[1:0][127:0] i128;
 11 |     logic[3:0][63:0] i64;
 12 |     logic[7:0][31:0] i32;
 13 |     logic[15:0][15:0] i16;
 14 |     logic[31:0][7:0] i8;
 15 |     logic[63:0][3:0] i4;
 16 | } vector_t;
 17 | 
 18 | typedef enum logic[2:0] {
 19 |     SEW8    = 3'b000,
 20 |     SEW16   = 3'b001,
 21 |     SEW32   = 3'b010,
 22 |     SEW64   = 3'b011
 23 |     // SEW128 - SEW1024 are reserved as of v1.0
 24 |     //SEW128  = 3'b100,
 25 |     //SEW256  = 3'b101,
 26 |     //SEW512  = 3'b110,
 27 |     //SEW1024 = 3'b111
 28 | } sew_t;
 29 | 
 30 | 
 31 | ///////////////////////////////
 32 | // Standard Instruction Formats
 33 | ///////////////////////////////
 34 | typedef struct packed {
 35 |     logic[6:0] funct7;
 36 |     logic[4:0] rs2;
 37 |     logic[4:0] rs1;
 38 |     logic[2:0] funct3;
 39 |     logic[4:0] rd;
 40 |     logic[6:0] opcode;
 41 | } rtype_t;
 42 | 
 43 | typedef struct packed {
 44 |     logic[4:0] rs3;
 45 |     logic[1:0] fmt;
 46 |     logic[4:0] rs2;
 47 |     logic[4:0] rs1;
 48 |     logic[2:0] rm;
 49 |     logic[4:0] rd;
 50 |     logic[6:0] opcode;
 51 | } r4type_t;
 52 | 
 53 | typedef struct packed {
 54 |     logic[11:0] imm;
 55 |     logic[4:0] rs1;
 56 |     logic[2:0] funct3;
 57 |     logic[4:0] rd;
 58 |     logic[6:0] opcode;
 59 | } itype_t;
 60 | 
 61 | typedef struct packed {
 62 |     logic[6:0] imm1;
 63 |     logic[4:0] rs2;
 64 |     logic[4:0] rs1;
 65 |     logic[2:0] funct3;
 66 |     logic[4:0] imm0;
 67 |     logic[6:0] opcode;
 68 | } stype_t;
 69 | 
 70 | typedef struct packed {
 71 |     logic imm3;
 72 |     logic[5:0] imm1;
 73 |     logic[4:0] rs2;
 74 |     logic[4:0] rs1;
 75 |     logic[2:0] funct3;
 76 |     logic[3:0] imm0;
 77 |     logic imm2;
 78 |     logic[6:0] opcode;
 79 | } btype_t;
 80 | 
 81 | typedef struct packed {
 82 |     logic[19:0] imm;
 83 |     logic[4:0] rd;
 84 |     logic[6:0] opcode;
 85 | } utype_t;
 86 | 
 87 | typedef struct packed {
 88 |     logic imm3;
 89 |     logic[9:0] imm0;
 90 |     logic imm1;
 91 |     logic[7:0] imm2;
 92 |     logic[4:0] rd;
 93 |     logic[6:0] opcode;
 94 | } jtype_t;
 95 | 
 96 | typedef struct packed {
 97 |     logic[4:0] funct5;
 98 |     logic aq;
 99 |     logic rl;
100 |     logic[4:0] rs2;
101 |     logic[4:0] rs1;
102 |     logic[2:0] funct3;
103 |     logic[4:0] rd;
104 |     logic[6:0] opcode;
105 | } atype_t; // For AMO
106 | 
107 | typedef enum logic[6:0] {
108 |     OPCODE_AMO        = 7'b0101111,
109 |     OPCODE_C0         = 7'bXXXXX00,
110 |     OPCODE_C1         = 7'bXXXXX01,
111 |     OPCODE_C2         = 7'bXXXXX10,
112 |     OPCODE_LD_FP      = 7'b0000111,
113 |     OPCODE_ST_FP      = 7'b0100111,
114 |     OPCODE_OP_FP      = 7'b1010011,
115 |     OPCODE_FMADD      = 7'b1000011,
116 |     OPCODE_FMSUB      = 7'b1000111,
117 |     OPCODE_FNMSUB     = 7'b1001011,
118 |     OPCODE_FNMADD     = 7'b1001111,
119 |     OPCODE_OP_IMM     = 7'b0010011,
120 |     OPCODE_LUI        = 7'b0110111,
121 |     OPCODE_AUIPC      = 7'b0010111,
122 |     OPCODE_OP         = 7'b0110011,
123 |     OPCODE_JAL        = 7'b1101111,
124 |     OPCODE_JALR       = 7'b1100111,
125 |     OPCODE_BRANCH     = 7'b1100011,
126 |     OPCODE_LOAD       = 7'b0000011,
127 |     OPCODE_STORE      = 7'b0100011,
128 |     OPCODE_MISC_MEM   = 7'b0001111,
129 |     OPCODE_OPV        = 7'b1010111
130 | } rv_opcodes;
131 | 
132 | typedef enum logic[2:0] {
133 |     OPIVV   = 3'b000,
134 |     OPFVV   = 3'b001,
135 |     OPMVV   = 3'b010,
136 |     OPIVI   = 3'b011,
137 |     OPIVX   = 3'b100,
138 |     OPFVF   = 3'b101,
139 |     OPMVX   = 3'b110,
140 |     OPCFG   = 3'b111
141 | } vfunct3_t;
142 | 
143 | typedef enum logic[5:0] {
144 |     VALU_NOP = 6'b000000,
145 |     VALU_ADDU,
146 |     VALU_ADDS,
147 |     VALU_SUBU,
148 |     VALU_SUBS,
149 |     VALU_RSUB, // Should be more efficient than larger input muxes
150 |     VALU_MINS,
151 |     VALU_MINU,
152 |     VALU_MAXS,
153 |     VALU_MAXU,
154 |     VALU_EQ,
155 |     VALU_NE,
156 |     VALU_LTU,
157 |     VALU_LTS,
158 |     VALU_LEU,
159 |     VALU_LES,
160 |     VALU_GTU,
161 |     VALU_GTS,
162 |     VALU_AND,
163 |     VALU_OR,
164 |     VALU_XOR,
165 |     VALU_SLL,
166 |     VALU_SRL,
167 |     VALU_SRA,    
168 |     VALU_ADC,
169 |     VALU_SBC,
170 |     VALU_AADDS,
171 |     VALU_AADDU,
172 |     VALU_ASUBS,
173 |     VALU_ASUBU,
174 |     VALU_ZEX2,
175 |     VALU_ZEX4,
176 |     VALU_ZEX8,
177 |     VALU_SEX2,
178 |     VALU_SEX4,
179 |     VALU_SEX8
180 | } valu_opcodes;
181 | 
182 | typedef enum logic[4:0] {
183 |     VMD_NOP = 5'b00000,
184 |     VMD_MULUU,
185 |     VMD_MULUS,
186 |     VMD_MULSS,
187 |     VMD_DIVU,
188 |     VMD_DIVS,
189 |     VMD_REMU,
190 |     VMD_REMS
191 | } vmuldiv_opcodes;
192 | 
193 | typedef enum logic[5:0] {
194 |     VFPU_NOP = 6'b000000,
195 |     VFPU_ADD,
196 |     VFPU_SUB,
197 |     VFPU_MIN,
198 |     VFPU_MAX,
199 |     VFPU_SQRT,
200 |     VFPU_SQRT_EST,
201 |     VFPU_REC_EST,
202 |     VFPU_EQ,
203 |     VFPU_LE,
204 |     VFPU_LT,
205 |     VFPU_NE,
206 |     VFPU_GT,
207 |     VFPU_GE,
208 |     VFPU_DIV,
209 |     VFPU_RDIV,
210 |     VFPU_MUL,
211 |     VFPU_RSUB,
212 |     VFPU_SGNJ,
213 |     VFPU_SGNJN,
214 |     VFPU_SGNJX,
215 |     VFPU_F2U,
216 |     VFPU_F2S,
217 |     VFPU_U2F,
218 |     VFPU_S2F,
219 |     VFPU_F2D,
220 |     VFPU_D2F,
221 |     VFPU_CLASS
222 | } vfpu_opcodes;
223 | 
224 | typedef enum logic[1:0] {
225 |     CVT_NONE,
226 |     CVT_WIDE,
227 |     CVT_NARROW
228 | } width_cvt_t;
229 | 
230 | typedef enum logic[2:0] {
231 |     FPU_RNE  = 3'b000,
232 |     FPU_RTZ  = 3'b001,
233 |     FPU_RDN  = 3'b010,
234 |     FPU_RUP  = 3'b011,
235 |     FPU_RMM  = 3'b100,
236 |     FPU_REV  = 3'b101, // Not part of standard F/D extensions, only used for vfpu
237 |     FPU_ROD  = 3'b110, // Not part of standard F/D extensions, only used for vfpu
238 |     FPU_INST = 3'b111
239 | } fpu_round_t;
240 | 
241 | typedef struct packed {
242 |     logic valid;
243 |     logic[20:0] tag;
244 |     logic [31:0] data;
245 | } cache_struct;
246 | 
247 | typedef struct packed {
248 |     logic[20:0] tag;
249 |     logic[10:0] index;
250 | } addr_struct;
251 | 
252 | endpackage
253 | 


--------------------------------------------------------------------------------
/hw/vector_addsub.sv:
--------------------------------------------------------------------------------
  1 | import vector_processor_pkg::*;
  2 | 
  3 | // *** 8-BIT ADDER/SUBTRACTOR WITH CARRY IN AND OUT *** //
  4 | module addsub_8bit (
  5 |     input wire addsub8_en_i, // Active HIGH enable
  6 |     input wire [7:0] a_i, b_i,
  7 |     input wire cin_i, // Carry in
  8 |     input wire op_i, // Operation, 0 = subtract, 1 = add
  9 |     output reg [7:0] out_o,
 10 |     output reg cout_o // Carry out
 11 | );
 12 | 
 13 | reg [8:0] full_out; // 9-bit output
 14 | 
 15 | assign out_o = addsub8_en_i ? full_out[7:0] : 8'b0;
 16 | assign cout_o = addsub8_en_i ? full_out[8] : 1'b0;
 17 | 
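 18 | always_comb begin
 19 |     full_out = 9'b0; // Default value avoids an inferred latch when disabled
 20 |     if (addsub8_en_i) begin
 21 |         // op_i selects the operation: 1 = add, 0 = subtract
 22 |         if (op_i) full_out = a_i + b_i + cin_i;
 23 |         else      full_out = a_i - b_i - cin_i;
 24 |     end
 25 | end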
 26 | endmodule
 27 | 
 28 | // *** 8-BIT FLIP-FLOP *** //
 29 | module flipflop_8bit (
 30 |     input wire clk_i,
 31 |     input wire [7:0] d_i,
 32 |     output reg [7:0] q_o 
 33 | );
 34 | 
 35 | always_ff @ (posedge clk_i) begin
 36 |     q_o <= d_i;
 37 | end
 38 | endmodule
 39 | 
 40 | // *** 8-BIT DELAY BLOCK *** //
 41 | // Allows for custom delay using flip-flops
 42 | module delay_8bit # (
 43 |     parameter delay = 8
 44 | )
 45 | (
 46 |     input wire clk_i,
 47 |     input wire [7:0] d_i,
 48 |     output wire [7:0] q_o
 49 | );
 50 | 
 51 | wire [7:0] sig_int[delay];
 52 | 
 53 | genvar d;
 54 | for (d = 0; d < delay; d++) begin
 55 |     // Single delay cycle - single flip-flop
 56 |     if (delay == 1) begin
 57 |         flipflop_8bit ff8_inst (
 58 |             .clk_i (clk_i),
 59 |             .d_i (d_i),
 60 |             .q_o (q_o)
 61 |         );
 62 |     end
 63 |     // Two cycle delay - two flip-flops
 64 |     else if (delay == 2) begin
 65 |         if (d == 0) begin
 66 |             flipflop_8bit ff8_inst (
 67 |                 .clk_i (clk_i),
 68 |                 .d_i (d_i),
 69 |                 .q_o (sig_int[d])
 70 |             );
 71 |         end
 72 |         else begin
 73 |             flipflop_8bit ff8_inst (
 74 |                 .clk_i (clk_i),
 75 |                 .d_i (sig_int[d-1]),
 76 |                 .q_o (q_o)
 77 |             );
 78 |         end
 79 |     end
 80 |     // Multi-cycle delay
 81 |     else begin
 82 |         if (d == 0) begin
 83 |             flipflop_8bit ff8_inst (
 84 |                 .clk_i (clk_i),
 85 |                 .d_i (d_i),
 86 |                 .q_o (sig_int[d])
 87 |             );
 88 |         end
 89 |         else if (d == delay - 1) begin
 90 |             flipflop_8bit ff8_inst (
 91 |                 .clk_i (clk_i),
 92 |                 .d_i (sig_int[d-1]),
 93 |                 .q_o (q_o)
 94 |             );
 95 |         end
 96 |         else begin
 97 |             flipflop_8bit ff8_inst (
 98 |                 .clk_i (clk_i),
 99 |                 .d_i (sig_int[d-1]),
100 |                 .q_o (sig_int[d])
101 |             );
102 |         end
103 |     end
104 | end
105 | endmodule
106 | 
107 | 
108 | // *** 256-BIT VECTOR ADDER WITH VECTOR CARRY IN AND OUT *** //
109 | // Supports:
110 | //  - 1x 256-bit word
111 | //  - 2x 128-bit words
112 | //  - 4x 64-bit words
113 | //  - 8x 32-bit words
114 | //  - 16x 16-bit words
115 | //  - 32x 8-bit words
116 | 
117 | // Word length selected by sew_i input
118 | module addsub_256bit (
119 |     input wire clk_i,
120 |     input wire vaddsub_en_i, // Active HIGH enable
121 |     input wire [255:0] a_i, b_i,
122 |     input wire [2:0] sew_i,
123 |     input wire [31:0] carry_ext_i, // External carry in
124 |     input wire op_i, // Operation, 0 = subtract, 1 = add
125 |     output reg [255:0] out_o,
126 |     output wire [32:0] cout_o // Carry out
127 | );
128 | 
129 | reg [32:0] cin_int; // Driven only by the always_comb block below
130 | wire [32:0] cout_int;
131 | 
132 | // Mask internal carry values depending on the current SEW (bit i set: byte i takes the carry out of byte i-1)
133 | always_comb begin
134 |     case (sew_i)
135 |         3'b001  : cin_int = 32'haaaaaaaa & cout_int; // 16-bit words
136 |         3'b010  : cin_int = 32'heeeeeeee & cout_int; // 32-bit words
137 |         3'b011  : cin_int = 32'hfefefefe & cout_int; // 64-bit words
138 |         3'b100  : cin_int = 32'hfffefffe & cout_int; // 128-bit words
139 |         3'b101  : cin_int = 32'hfffffffe & cout_int; // 256-bit words
140 |         default : cin_int = 33'b0; // 8-bit words (3'b000) and unsupported SEWs: no carry chaining
141 |     endcase
142 | end
143 | 
144 | wire [255:0] a_int, b_int;
145 | 
146 | assign a_int[7:0] = a_i[7:0];
147 | assign b_int[7:0] = b_i[7:0];
148 | 
149 | genvar p;
150 | for (p = 1; p < 32; p++) begin
151 |     delay_8bit # (
152 |         .delay(p)
153 |     )
154 |     reg_pipeline_a (
155 |         .clk_i (clk_i),
156 |         .d_i (a_i[p*8+7:p*8]),
157 |         .q_o (a_int[p*8+7:p*8])
158 |     );
159 |     
160 |     delay_8bit # (
161 |         .delay(p)
162 |     )
163 |     reg_pipeline_b (
164 |         .clk_i (clk_i),
165 |         .d_i (b_i[p*8+7:p*8]),
166 |         .q_o (b_int[p*8+7:p*8])
167 |     );
168 | end
169 | 
170 | // Generate 32 8-bit addsubs (256-bits total)
171 | genvar i;
172 | for (i = 0; i < 32; i++) begin
173 |     if (i == 0) begin
174 |         addsub_8bit addsub_8bit_inst (
175 |             .addsub8_en_i (vaddsub_en_i),
176 |             .a_i (a_int[8*(i+1)-1:8*(i+1)-8]),
177 |             .b_i (b_int[8*(i+1)-1:8*(i+1)-8]),
178 |             .out_o (out_o[8*(i+1)-1:8*(i+1)-8]),
179 |             .op_i (op_i),
180 |             .cin_i (carry_ext_i[i]),
181 |             .cout_o (cout_int[i+1])
182 |         );
183 |     end
184 |     else begin
185 |         addsub_8bit addsub_8bit_inst (
186 |             .addsub8_en_i (vaddsub_en_i),
187 |             .a_i (a_int[8*(i+1)-1:8*(i+1)-8]),
188 |             .b_i (b_int[8*(i+1)-1:8*(i+1)-8]),
189 |             .out_o (out_o[8*(i+1)-1:8*(i+1)-8]),
190 |             .op_i (op_i),
191 |             .cin_i (cin_int[i] | carry_ext_i[i]),
192 |             .cout_o (cout_int[i+1])
193 |         );
194 |     end
195 | end
196 | endmodule
197 | 


--------------------------------------------------------------------------------
/hw/vector_alu.sv:
--------------------------------------------------------------------------------
  1 | //////////////////////////////////
  2 | // Vector Arithmetic Logic Unit //
  3 | /////// Martin Riis, 2020 ////////
  4 | //////////////////////////////////
  5 | 
  6 | import vector_processor_pkg::*;
  7 | import riscv_pkg::*;
  8 | 
  9 | module vector_alu (
 10 |     input logic clk_i,
 11 |     input logic rst_i,
 12 |     
 13 |     input vector_t vs1_i,
 14 |     input vector_t vs2_i,
 15 |     input logic[XLEN-1:0] rs1_i,
 16 |     input vector_t v0_i, // Mask and carry in register
 17 |     output vector_t vd_o,
 18 |     input sew_t sew_i,
 19 |     
 20 |     input logic signed_i, // 0 = unsigned, 1 = signed
 21 |     input logic use_mask_i, // 0 = don't mask, 1 = use mask
 22 |     input logic use_carry_i, // 0 = don't use carry/borrow, 1 = use carry/borrow
 23 |     input logic produce_carry_i, // Produce carry/borrow out in mask register format
 24 |     input logic saturate_i, // 0 = overflow, 1 = saturate + set VXSAT bit
 25 |     
 26 |     input valu_opcodes valu_op_i
 27 | );
 28 | 
 29 | // Internal VS1 and VS2 signals
 30 | vector_t vs1;
 31 | vector_t vs2;
 32 | 
 33 | // VECTOR MASKING OPERATION //
 34 | /*
 35 | * If the mask input is '1', the input vectors are masked, else,
 36 | * no mask is applied. 
 37 | * The mask source is vector v0 (input v0_i) which is shared
 38 | * with the carry input. Therefore, a mask and carry cannot be
 39 | * used together.
 40 | */
 41 | always_comb begin
 42 |     // Default: inputs pass through unmasked (also avoids an inferred latch).
 43 |     // When masking is enabled, element i is zeroed if bit i of v0 is set.
 44 |     vs1 = vs1_i;
 45 |     vs2 = vs2_i;
 46 |     if (use_mask_i) begin
 47 |         case (sew_i)
 48 |             SEW8 : begin // 8-bit input
 49 |                 for (int i = 0; i < 32; i++) begin
 50 |                     vs1.i8[i] = vs1_i.i8[i] & ~{8{v0_i.i256[i]}};
 51 |                     vs2.i8[i] = vs2_i.i8[i] & ~{8{v0_i.i256[i]}};
 52 |                 end
 53 |             end
 54 |             SEW16 : begin // 16-bit input
 55 |                 for (int i = 0; i < 16; i++) begin
 56 |                     vs1.i16[i] = vs1_i.i16[i] & ~{16{v0_i.i256[i]}};
 57 |                     vs2.i16[i] = vs2_i.i16[i] & ~{16{v0_i.i256[i]}};
 58 |                 end
 59 |             end
 60 |             SEW32 : begin // 32-bit input
 61 |                 for (int i = 0; i < 8; i++) begin
 62 |                     vs1.i32[i] = vs1_i.i32[i] & ~{32{v0_i.i256[i]}};
 63 |                     vs2.i32[i] = vs2_i.i32[i] & ~{32{v0_i.i256[i]}};
 64 |                 end
 65 |             end
 66 |             SEW64 : begin // 64-bit input
 67 |                 for (int i = 0; i < 4; i++) begin
 68 |                     vs1.i64[i] = vs1_i.i64[i] & ~{64{v0_i.i256[i]}};
 69 |                     vs2.i64[i] = vs2_i.i64[i] & ~{64{v0_i.i256[i]}};
 70 |                 end
 71 |             end
 72 |         endcase
 73 |     end
 74 | end
 75 | 
 76 | // Internal carry signals
 77 | logic[31:0] carry_out;
 78 | logic[31:0] carry_in;
 79 | // Set external carry in
 80 | /*
 81 | External carry input uses the mask register (v0).
 82 | use_carry_i == 1 : carry input is used and sourced from mask register
 83 | use_carry_i == 0 : carry input is not used, carry_in = 0
 84 | */
 85 | assign carry_in = use_carry_i ? v0_i.i32[0] : 32'b0;
 86 | 
 87 | addsub_256bit addsub_256bit_inst (
 88 |     .vaddsub_en_i (addsub_en_int),
 89 |     .a_i (vs1_int),
 90 |     .b_i (vs2_int),
 91 |     .sew_i (sew_i),
 92 |     .carry_ext_i (carry_ext_int),
 93 |     .op_i (addsub_op_i),
 94 |     .out_o (vd_o),
 95 |     .cout_o (carry_out_int)
 96 | );
 97 | 
 98 | logic_256bit logic_256bit_inst (
 99 |     .logic_en_i (logic_en_int),
100 |     .a_i (vs1_int),
101 |     .b_i (vs2_int),
102 |     .sew_i (sew_i),
103 |     .opcode_i (dc_alu_opcode_i),
104 |     .out_o (vd_o)
105 | );
106 | 
107 | endmodule
108 | 


--------------------------------------------------------------------------------
/hw/vector_logic.sv:
--------------------------------------------------------------------------------
  1 | import vector_processor_pkg::*;
  2 | 
  3 | // *** 256-BIT LOGIC UNIT *** //
  4 | module logic_256bit (
  5 |     input wire [255:0] a_i, b_i,
  6 |     input wire [2:0] sew_i,
  7 |     input vector_processor_pkg::alu_opcodes opcode_i,
  8 |     output reg [255:0] out_o
  9 | );
 10 | 
 11 | reg [255:0] result_logic;
 12 | 
 13 | // LOGIC OPERATIONS //
 14 | always_comb begin
 15 |     result_logic = '0;
 16 |     case (opcode_i)
 17 |         ALU_VAND    : result_logic = a_i & b_i;
 18 |         ALU_VNAND   : result_logic = ~(a_i & b_i);
 19 |         ALU_VANDNOT : result_logic = a_i & ~b_i;
 20 |         ALU_VOR     : result_logic = a_i | b_i;
 21 |         ALU_VNOR    : result_logic = ~(a_i | b_i);
 22 |         ALU_VXOR    : result_logic = a_i ^ b_i;
 23 |         ALU_VXNOR   : result_logic = ~(a_i ^ b_i);
 24 |         ALU_VNOT    : result_logic = ~a_i;
 25 |     endcase
 26 | end
 27 | 
 28 | // SHIFT OPERATIONS //
 29 | // Can likely be optimised to use fewer resources
 30 | wire [255:0] result_shift_8bit, 
 31 |              result_shift_16bit,
 32 |              result_shift_32bit,
 33 |              result_shift_64bit,
 34 |              result_shift_128bit,
 35 |              result_shift_256bit;
 36 | reg [255:0]  result_shift;
 37 | 
 38 | // Instantiates 32 8-bit shifters
 39 | genvar i8;
 40 | for (i8 = 0; i8 < 255; i8 = i8 + 8) begin
 41 |     shifter # (.width (8)) shift_8bit (
 42 |         .a_i (a_i[i8+7:i8]),
 43 |         .b_i (b_i[i8+7:i8]),
 44 |         .opcode_i (opcode_i),
 45 |         .out_o (result_shift_8bit[i8+7:i8])
 46 |     );
 47 | end
 48 | 
 49 | // Instantiates 16 16-bit shifters
 50 | genvar i16;
 51 | for (i16 = 0; i16 < 255; i16 = i16 + 16) begin
 52 |     shifter # (.width (16)) shift_16bit (
 53 |         .a_i (a_i[i16+15:i16]),
 54 |         .b_i (b_i[i16+15:i16]),
 55 |         .opcode_i (opcode_i),
 56 |         .out_o (result_shift_16bit[i16+15:i16])
 57 |     );
 58 | end
 59 | 
 60 | // Instantiates 8 32-bit shifters
 61 | genvar i32;
 62 | for (i32 = 0; i32 < 255; i32 = i32 + 32) begin
 63 |     shifter # (.width (32)) shift_32bit (
 64 |         .a_i (a_i[i32+31:i32]),
 65 |         .b_i (b_i[i32+31:i32]),
 66 |         .opcode_i (opcode_i),
 67 |         .out_o (result_shift_32bit[i32+31:i32])
 68 |     );
 69 | end
 70 | 
 71 | // Instantiates 4 64-bit shifters
 72 | genvar i64;
 73 | for (i64 = 0; i64 < 255; i64 = i64 + 64) begin
 74 |     shifter # (.width (64)) shift_64bit (
 75 |         .a_i (a_i[i64+63:i64]),
 76 |         .b_i (b_i[i64+63:i64]),
 77 |         .opcode_i (opcode_i),
 78 |         .out_o (result_shift_64bit[i64+63:i64])
 79 |     );
 80 | end
 81 | 
 82 | // Instantiates 2 128-bit shifters
 83 | genvar i128;
 84 | for (i128 = 0; i128 < 255; i128 = i128 + 128) begin
 85 |     shifter # (.width (128)) shift_128bit (
 86 |         .a_i (a_i[i128+127:i128]),
 87 |         .b_i (b_i[i128+127:i128]),
 88 |         .opcode_i (opcode_i),
 89 |         .out_o (result_shift_128bit[i128+127:i128])
 90 |     );
 91 | end
 92 | 
 93 | // Instantiates 1 256-bit shifter 
 94 | genvar i256;
 95 | for (i256 = 0; i256 < 255; i256 = i256 + 256) begin
 96 |     shifter # (.width (256)) shift_256bit (
 97 |         .a_i (a_i[i256+255:i256]),
 98 |         .b_i (b_i[i256+255:i256]),
 99 |         .opcode_i (opcode_i),
100 |         .out_o (result_shift_256bit[i256+255:i256])
101 |     );
102 | end
103 | 
104 | // SHIFTER OUTPUT CONTROL: selects shift output based on SEW //
105 | always_comb begin
106 |     result_shift = '0; // Default for unsupported SEW values, avoids an inferred latch
107 |     case (sew_i)
108 |         3'b000  : result_shift = result_shift_8bit;
109 |         3'b001  : result_shift = result_shift_16bit;
110 |         3'b010  : result_shift = result_shift_32bit;
111 |         3'b011  : result_shift = result_shift_64bit;
112 |         3'b100  : result_shift = result_shift_128bit;
113 |         3'b101  : result_shift = result_shift_256bit;
114 |     endcase
115 | end
116 | 
117 | // RESULT OUTPUT //
118 | // Selects between logic block result and shifter result based on the opcode
119 | always_comb begin
120 |     out_o = '0;
121 |     case (opcode_i)
122 |         ALU_VAND, ALU_VNAND, ALU_VANDNOT, ALU_VOR,
123 |         ALU_VNOR, ALU_VXOR, ALU_VXNOR, ALU_VNOT : out_o = result_logic;
124 |         ALU_VSLL, ALU_VSRL, ALU_VSRA : out_o = result_shift;
125 |     endcase
126 | end
127 | endmodule
128 | 
129 | 
130 | // *** N-BIT SHIFTER *** //
131 | // Shifts a_i by b_i bits
132 | module shifter # (
133 |     parameter width = 8
134 | )
135 | (
136 |     input wire [width-1:0] a_i, b_i,
137 |     input vector_processor_pkg::alu_opcodes opcode_i,
138 |     output reg [width-1:0] out_o
139 | );
140 | always_comb begin
141 |     out_o = '0; // Default for non-shift opcodes, avoids an inferred latch
142 |     case (opcode_i)
143 |         ALU_VSLL : out_o = a_i << b_i; // Left logical shift
144 |         ALU_VSRL : out_o = a_i >> b_i; // Right logical shift
145 |         ALU_VSRA : out_o = $signed(a_i) >>> b_i; // Right arithmetic shift; the signed cast is needed to preserve the sign
146 |     endcase
147 | end
148 | endmodule


--------------------------------------------------------------------------------
/hw/vector_mult.sv:
--------------------------------------------------------------------------------
  1 | // Forces use of Xilinx DSP48 blocks for MAC operation
  2 | (* use_dsp48 = "true" *)
  3 | 
  4 | // 16-bit multiply-accumulate unit
  5 | module mult_16bit (
  6 |     input wire [15:0] a_i, b_i, c_i,
  7 |     output reg [31:0] fout_o, // Full 32-bit output
  8 |     output reg [15:0] out_o // Truncated 16-bit output
  9 | );
 10 | 
 11 | assign fout_o = a_i * b_i + c_i;
 12 | assign out_o = fout_o[15:0];
 13 | 
 14 | endmodule
 15 | 
 16 | // 256-BIT VECTOR MULTIPLIER
 17 | // Supports:
 18 | //  - 4x 64-bit words
 19 | //  - 8x 32-bit words
 20 | //  - 16x 16-bit words
 21 | //  - 32x 8-bit words
 22 | 
 23 | // Word length selected by sew_i input
 24 | module mult_256bit (
 25 |     input wire [255:0] a_i, b_i,
 26 |     input wire [2:0] sew_i,
 27 |     output reg [255:0] out_o
 28 | );
 29 | 
 30 | genvar i;
 31 | for (i = 0; i < 4; i++) begin
 32 |     mult_64bit mult_64bit_inst (
 33 |         .a_i (a_i[i*64+63:i*64]),
 34 |         .b_i (b_i[i*64+63:i*64]),
 35 |         .sew_i (sew_i),
 36 |         .out_o (out_o[i*64+63:i*64])
 37 |     );
 38 | end
 39 | endmodule
 40 | 
 41 | 
 42 | // 64-bit multiplier from 16-bit multipliers
 43 | module mult_64bit (
 44 |     input wire [63:0] a_i, b_i,
 45 |     input wire [2:0] sew_i,
 46 |     output reg [63:0] out_o
 47 | );
 48 | 
 49 | reg [15:0] a_int[10], b_int[10];
 50 | reg [15:0] out_int[10];
 51 | reg [31:0] fout_int [18];
 52 | 
 53 | always_comb begin
 54 |     case (sew_i)
 55 |         3'b000 : begin // 8-bit words
 56 |             a_int[0] = {8'b0, a_i[7:0]};
 57 |             b_int[0] = {8'b0, b_i[7:0]};
 58 |             
 59 |             a_int[1] = {8'b0, a_i[15:8]};
 60 |             b_int[1] = {8'b0, b_i[15:8]};
 61 |             
 62 |             a_int[2] = {8'b0, a_i[23:16]};
 63 |             b_int[2] = {8'b0, b_i[23:16]};
 64 |             
 65 |             a_int[3] = {8'b0, a_i[31:24]};
 66 |             b_int[3] = {8'b0, b_i[31:24]};
 67 |             
 68 |             a_int[4] = {8'b0, a_i[39:32]};
 69 |             b_int[4] = {8'b0, b_i[39:32]};
 70 |             
 71 |             a_int[5] = {8'b0, a_i[47:40]};
 72 |             b_int[5] = {8'b0, b_i[47:40]};
 73 |             
 74 |             a_int[6] = {8'b0, a_i[55:48]};
 75 |             b_int[6] = {8'b0, b_i[55:48]};
 76 |             
 77 |             a_int[7] = {8'b0, a_i[63:56]};
 78 |             b_int[7] = {8'b0, b_i[63:56]};
 79 |             
 80 |             a_int[8] = 16'b0;
 81 |             b_int[8] = 16'b0;
 82 |             a_int[9] = 16'b0;
 83 |             b_int[9] = 16'b0;
 84 |             
 85 |             // Outputs
 86 |             out_o[7:0] = out_int[0];
 87 |             out_o[15:8] = out_int[1];
 88 |             out_o[23:16] = out_int[2];
 89 |             out_o[31:24] = out_int[3];
 90 |             out_o[39:32] = out_int[4];
 91 |             out_o[47:40] = out_int[5];
 92 |             out_o[55:48] = out_int[6];
 93 |             out_o[63:56] = out_int[7];
 94 |             
 95 |             // Intermediate full output
 96 |             fout_int[1] = 32'b0; // Stage 0-1
 97 |             fout_int[3] = 32'b0; // Stage 1-2
 98 |             fout_int[5] = 32'b0; // Stage 2-3
 99 |             fout_int[7] = 32'b0; // Stage 3-4
100 |             fout_int[9] = 32'b0; // Stage 4-5
101 |             fout_int[11] = 32'b0; // Stage 5-6
102 |             fout_int[13] = 32'b0; // Stage 6-7
103 |             fout_int[15] = 32'b0; // Stage 7-8
104 |             fout_int[17] = 32'b0; // Stage 8-9
105 |         end
106 |         3'b001 : begin // 16-bit words
107 |             a_int[0] = a_i[15:0];
108 |             b_int[0] = b_i[15:0];
109 |             
110 |             a_int[1] = a_i[31:16];
111 |             b_int[1] = b_i[31:16];
112 |             
113 |             a_int[2] = a_i[47:32];
114 |             b_int[2] = b_i[47:32];
115 |             
116 |             a_int[3] = a_i[63:48];
117 |             b_int[3] = b_i[63:48];
118 |             
119 |             a_int[4] = 16'b0;
120 |             b_int[4] = 16'b0;
121 |             a_int[5] = 16'b0;
122 |             b_int[5] = 16'b0;
123 |             a_int[6] = 16'b0;
124 |             b_int[6] = 16'b0;
125 |             a_int[7] = 16'b0;
126 |             b_int[7] = 16'b0;
127 |             a_int[8] = 16'b0;
128 |             b_int[8] = 16'b0;
129 |             a_int[9] = 16'b0;
130 |             b_int[9] = 16'b0;
131 |             
132 |             // Outputs
133 |             out_o[15:0] = out_int[0];
134 |             out_o[31:16] = out_int[1];
135 |             out_o[47:32] = out_int[2];
136 |             out_o[63:48] = out_int[3];
137 |             
138 |             // Intermediate full output
139 |             fout_int[1] = 32'b0; // Stage 0-1
140 |             fout_int[3] = 32'b0; // Stage 1-2
141 |             fout_int[5] = 32'b0; // Stage 2-3
142 |             fout_int[7] = 32'b0; // Stage 3-4
143 |             fout_int[9] = 32'b0; // Stage 4-5
144 |             fout_int[11] = 32'b0; // Stage 5-6
145 |             fout_int[13] = 32'b0; // Stage 6-7
146 |             fout_int[15] = 32'b0; // Stage 7-8
147 |             fout_int[17] = 32'b0; // Stage 8-9
148 |         end
149 |         3'b010 : begin // 32-bit words
150 |             a_int[0] = a_i[15:0];
151 |             b_int[0] = b_i[15:0];
152 |             
153 |             a_int[1] = a_i[15:0];
154 |             b_int[1] = b_i[31:16];
155 |             a_int[2] = a_i[31:16];
156 |             b_int[2] = b_i[15:0];
157 |             
158 |             a_int[3] = a_i[47:32];
159 |             b_int[3] = b_i[47:32];
160 |             
161 |             a_int[4] = a_i[47:32];
162 |             b_int[4] = b_i[63:48];
163 |             a_int[5] = a_i[63:48];
164 |             b_int[5] = b_i[47:32];
165 |             
166 |             a_int[6] = 16'b0;
167 |             b_int[6] = 16'b0;
168 |             a_int[7] = 16'b0;
169 |             b_int[7] = 16'b0;
170 |             a_int[8] = 16'b0;
171 |             b_int[8] = 16'b0;
172 |             a_int[9] = 16'b0;
173 |             b_int[9] = 16'b0;
174 |             
175 |             // Outputs
176 |             out_o[15:0] = out_int[0];
177 |             out_o[31:16] = out_int[2];
178 |             out_o[47:32] = out_int[3];
179 |             out_o[63:48] = out_int[5];
180 |             
181 |             // Intermediate full output
182 |             fout_int[1] = fout_int[0] >> 16; // Stage 0-1
183 |             fout_int[3] = fout_int[2]; // Stage 1-2
184 |             fout_int[5] = 32'b0; // Stage 2-3
185 |             fout_int[7] = fout_int[6] >> 16; // Stage 3-4
186 |             fout_int[9] = fout_int[8]; // Stage 4-5
187 |             fout_int[11] = 32'b0; // Stage 5-6
188 |             fout_int[13] = 32'b0; // Stage 6-7
189 |             fout_int[15] = 32'b0; // Stage 7-8
190 |             fout_int[17] = 32'b0; // Stage 8-9
191 |         end
192 |         3'b011 : begin // 64-bit words
193 |             a_int[0] = a_i[15:0];
194 |             b_int[0] = b_i[15:0];
195 |             
196 |             a_int[1] = a_i[15:0];
197 |             b_int[1] = b_i[31:16];
198 |             a_int[2] = a_i[31:16];
199 |             b_int[2] = b_i[15:0];
200 |             
201 |             a_int[3] = a_i[15:0];
202 |             b_int[3] = b_i[47:32];
203 |             a_int[4] = a_i[31:16];
204 |             b_int[4] = b_i[31:16];
205 |             a_int[5] = a_i[47:32];
206 |             b_int[5] = b_i[15:0];
207 |             
208 |             a_int[6] = a_i[15:0];
209 |             b_int[6] = b_i[63:48];
210 |             a_int[7] = a_i[31:16];
211 |             b_int[7] = b_i[47:32];
212 |             a_int[8] = a_i[47:32];
213 |             b_int[8] = b_i[31:16];
214 |             a_int[9] = a_i[63:48];
215 |             b_int[9] = b_i[15:0];
216 |             
217 |             // Outputs
218 |             out_o[15:0] = out_int[0];
219 |             out_o[31:16] = out_int[2];
220 |             out_o[47:32] = out_int[5];
221 |             out_o[63:48] = out_int[9];
222 |             
223 |             // Intermediate full output
224 |             fout_int[1] = fout_int[0] >> 16; // Stage 0-1
225 |             fout_int[3] = fout_int[2]; // Stage 1-2
226 |             fout_int[5] = fout_int[4] >> 16; // Stage 2-3
227 |             fout_int[7] = fout_int[6]; // Stage 3-4
228 |             fout_int[9] = fout_int[8]; // Stage 4-5
229 |             fout_int[11] = fout_int[10] >> 16; // Stage 5-6
230 |             fout_int[13] = fout_int[12]; // Stage 6-7
231 |             fout_int[15] = fout_int[14]; // Stage 7-8
232 |             fout_int[17] = fout_int[16]; // Stage 8-9
233 |         end
234 |     endcase
235 | end
236 | 
237 | // INSTANTIATES TEN 16-BIT MULTIPLIERS //
238 | // 15:0
239 | mult_16bit mult16_0 (
240 |     .a_i (a_int[0]),
241 |     .b_i (b_int[0]),
242 |     .c_i (32'b0),
243 |     .fout_o (fout_int[0]),
244 |     .out_o (out_int[0])
245 | );
246 | 
247 | // 31:16
248 | mult_16bit mult16_1 (
249 |     .a_i (a_int[1]),
250 |     .b_i (b_int[1]),
251 |     .c_i (fout_int[1]),
252 |     .fout_o (fout_int[2]),
253 |     .out_o (out_int[1])
254 | );
255 | 
256 | mult_16bit mult16_2 (
257 |     .a_i (a_int[2]),
258 |     .b_i (b_int[2]),
259 |     .c_i (fout_int[3]),
260 |     .fout_o (fout_int[4]),
261 |     .out_o (out_int[2])
262 | );
263 | 
264 | // 47:32
265 | mult_16bit mult16_3 (
266 |     .a_i (a_int[3]),
267 |     .b_i (b_int[3]),
268 |     .c_i (fout_int[5]),
269 |     .fout_o (fout_int[6]),
270 |     .out_o (out_int[3])
271 | );
272 | 
273 | mult_16bit mult16_4 (
274 |     .a_i (a_int[4]),
275 |     .b_i (b_int[4]),
276 |     .c_i (fout_int[7]),
277 |     .fout_o (fout_int[8]),
278 |     .out_o (out_int[4])
279 | );
280 | 
281 | mult_16bit mult16_5 (
282 |     .a_i (a_int[5]),
283 |     .b_i (b_int[5]),
284 |     .c_i (fout_int[9]),
285 |     .fout_o (fout_int[10]),
286 |     .out_o (out_int[5])
287 | );
288 | 
289 | // 63:48
290 | mult_16bit mult16_6 (
291 |     .a_i (a_int[6]),
292 |     .b_i (b_int[6]),
293 |     .c_i (fout_int[11]),
294 |     .fout_o (fout_int[12]),
295 |     .out_o (out_int[6])
296 | );
297 | 
298 | mult_16bit mult16_7 (
299 |     .a_i (a_int[7]),
300 |     .b_i (b_int[7]),
301 |     .c_i (fout_int[13]),
302 |     .fout_o (fout_int[14]),
303 |     .out_o (out_int[7])
304 | );
305 | 
306 | mult_16bit mult16_8 (
307 |     .a_i (a_int[8]),
308 |     .b_i (b_int[8]),
309 |     .c_i (fout_int[15]),
310 |     .fout_o (fout_int[16]),
311 |     .out_o (out_int[8])
312 | );
313 | 
314 | mult_16bit mult16_9 (
315 |     .a_i (a_int[9]),
316 |     .b_i (b_int[9]),
317 |     .c_i (fout_int[17]),
318 |     .fout_o (),
319 |     .out_o (out_int[9])
320 | );
321 | 
322 | endmodule


--------------------------------------------------------------------------------
/hw/vector_processor_pkg.sv:
--------------------------------------------------------------------------------
 1 | package vector_processor_pkg;
 2 | 
 3 | parameter XLEN = 32;
 4 | parameter VLEN = 256;
 5 | 
 6 | // Control and Status Registers
 7 | typedef enum logic [11:0] {
 8 |     CSR_VSTART      = 12'h008,
 9 |     CSR_VXSAT       = 12'h009,
10 |     CSR_VXRM        = 12'h00A,
11 |     CSR_VCSR        = 12'h00F,
12 |     CSR_VL          = 12'hC20,
13 |     CSR_VTYPE       = 12'hC21,
14 |     CSR_VLENB       = 12'hC22
15 | } csr_regs;
16 | 
17 | // OPCODEs
18 | typedef enum logic [6:0] {
19 |     OPCODE_LOAD_FP  = 7'h07, // Floating point load
20 |     OPCODE_STORE_FP = 7'h27, // Floating point store
21 |     OPCODE_AMO      = 7'h2F, // Atomic memory operation
22 |     OPCODE_OP_V     = 7'h57  // Vector arithmetic
23 | } opcodes;
24 | 
25 | // Memory Addressing Mode
26 | typedef enum logic [2:0] {
27 |     MOP_ZEU = 3'h0, // Zero extended unit stride
28 |     MOP_ZES = 3'h2, // Zero extended strided
29 |     MOP_ZEI = 3'h3, // Zero extended indexed
30 |     MOP_SEU = 3'h4, // Sign extended unit stride
31 |     MOP_SES = 3'h6, // Sign extended strided
32 |     MOP_SEI = 3'h7  // Sign extended indexed          
33 | } mops;
34 | 
35 | // LSU Widths
36 | typedef enum logic [2:0] {
37 |     S16     = 3'h1, // Scalar FP 16-bit
38 |     S32     = 3'h2, // Scalar FP 32-bit
39 |     S64     = 3'h3, // Scalar FP 64-bit
40 |     S128    = 3'h4, // Scalar FP 128-bit
41 |     VB      = 3'h0, // Vector byte
42 |     VH      = 3'h5, // Vector halfword
43 |     VW      = 3'h6, // Vector word
44 |     VE      = 3'h7  // Vector element  
45 | } lsu_widths;
46 | 
47 | // Vector ALU OPCODEs
48 | typedef enum logic[5:0] {
49 |     // LOGIC
50 |     ALU_VAND,
51 |     ALU_VNAND,
52 |     ALU_VANDNOT,
53 |     ALU_VOR,
54 |     ALU_VNOR,
55 |     ALU_VXOR,
56 |     ALU_VXNOR,
57 |     ALU_VNOT,
58 |     // SHIFT
59 |     ALU_VSLL,
60 |     ALU_VSRL,
61 |     ALU_VSRA,
62 |     // ARITHMETIC
63 |     ALU_VADD,
64 |     ALU_VSUB,
65 |     ALU_VMIN,
66 |     ALU_VMAX,
67 |     ALU_VADC,
68 |     ALU_VSBC,
69 |     // OTHER
70 |     ALU_RGATHER
71 | } alu_opcodes;
72 | 
73 | // Vector Compare OPCODEs
74 | typedef enum logic[3:0] {
75 |     COMP_EQ,
76 |     COMP_NEQ,
77 |     COMP_LT,
78 |     COMP_LTU,
79 |     COMP_LE,
80 |     COMP_LEU,
81 |     COMP_GT,
82 |     COMP_GTU,
83 |     COMP_MIN,
84 |     COMP_MINU,
85 |     COMP_MAX,
86 |     COMP_MAXU
87 | } comp_opcodes;
88 | 
89 | endpackage
90 | 


--------------------------------------------------------------------------------