├── .gitignore ├── LICENSE ├── README.md └── exercises.md /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | pip-wheel-metadata/ 24 | share/python-wheels/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | MANIFEST 29 | 30 | # PyInstaller 31 | # Usually these files are written by a python script from a template 32 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 33 | *.manifest 34 | *.spec 35 | 36 | # Installer logs 37 | pip-log.txt 38 | pip-delete-this-directory.txt 39 | 40 | # Unit test / coverage reports 41 | htmlcov/ 42 | .tox/ 43 | .nox/ 44 | .coverage 45 | .coverage.* 46 | .cache 47 | nosetests.xml 48 | coverage.xml 49 | *.cover 50 | *.py,cover 51 | .hypothesis/ 52 | .pytest_cache/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | target/ 76 | 77 | # Jupyter Notebook 78 | .ipynb_checkpoints 79 | 80 | # IPython 81 | profile_default/ 82 | ipython_config.py 83 | 84 | # pyenv 85 | .python-version 86 | 87 | # pipenv 88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 91 | # install all needed dependencies. 92 | #Pipfile.lock 93 | 94 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow 95 | __pypackages__/ 96 | 97 | # Celery stuff 98 | celerybeat-schedule 99 | celerybeat.pid 100 | 101 | # SageMath parsed files 102 | *.sage.py 103 | 104 | # Environments 105 | .env 106 | .venv 107 | env/ 108 | venv/ 109 | ENV/ 110 | env.bak/ 111 | venv.bak/ 112 | 113 | # Spyder project settings 114 | .spyderproject 115 | .spyproject 116 | 117 | # Rope project settings 118 | .ropeproject 119 | 120 | # mkdocs documentation 121 | /site 122 | 123 | # mypy 124 | .mypy_cache/ 125 | .dmypy.json 126 | dmypy.json 127 | 128 | # Pyre type checker 129 | .pyre/ 130 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2022 Hugh Perkins 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # cpu-tutorial 2 | 3 | Exercises to guide you through building your own CPU, from scratch, in verilog 4 | 5 | ## pre-requisites 6 | 7 | - you should already have a grounding in verilog 8 | - one way to do this is to work your way through the questions at https://hdlbits.01xz.net/wiki/Main_Page 9 | - I didn't finish these, but I did many of them; you can see the ones I did personally at https://hdlbits.01xz.net/wiki/Special:VlgStats/3D7115FE8A440C29 10 | - note: I'm not affiliated, I just found it was quite useful for me 11 | - make sure you complete the `game of life` problem, since you are going to be building tons of finite state machines, and this problem is good practice 12 | - to learn verilog, hdlbits isn't enough: this just provides short-term validation, practice, and dopamine-stimulation that you are in fact learning 13 | - one resource that is quite useful for learning is https://www.doulos.com/knowhow/verilog/ (note: I'm not affiliated, I just found it worked well for me) 14 | - you can google around the web for other resources as you go 15 | - you'll need to install some simulators and synthesizers. I use: 16 | - [iverilog](http://iverilog.icarus.com/) 17 | - [verilator](https://www.veripool.org/verilator/) 18 | - [yosys](https://github.com/YosysHQ/yosys) 19 | - you'll need a text editor, or development environment. I use the following: 20 | - [Visual Studio Code](https://code.visualstudio.com/) 21 | - know at least one programming language you can use to write host-side code, such as assemblers 22 | - python or C++ are both ok. I used python initially 23 | 24 | Note that *all* the above resources can be used without needing to buy licenses or similar. 
25 | 26 | ## Exercises 27 | 28 | - [exercises.md](/exercises.md) 29 | -------------------------------------------------------------------------------- /exercises.md: -------------------------------------------------------------------------------- 1 | 1. build simulatable verilog, that will read some arbitrary 16-bit hexadecimal numbers from a text file, and output them onto the screen (using $display for output, you can use $readmemh to load the file) 2 | 2. create a module 'proc' that will output the numbers 1 to 10, changing the number output on each clock positive edge. Create a driver module, which will provide a clock and reset to the module; and will use $display to print the numbers output by the module. 3 | 3. modify the modules from 2. so that: 4 | - the driver module loads the hexadecimal file from 1., into a memory, and provides this memory to `proc` 5 | - `proc` iterates over the memory contents, giving each value to the driver module, one at a time 6 | - the driver module uses $display to print out each value 7 | - it's possible to create the memory in driver module, and then send the entire memory into the proc, using iverilog (this won't synthesize with yosys, but we can think about that later) 8 | 4. 
split the hexadecimal file into two sections: 9 | - the first set of numbers contains instructions, which we will talk about in a second 10 | - the second set of numbers contains the values we want to print out, just as before 11 | - for now, create a single 16-bit instruction, which we'll denote as `OUTLOC`, which will contain a memory location, and will send the contents of that memory location to the driver module 12 | - you'll have to find a way to encode the memory location inside the instruction 13 | - for example, the last 8 bits of the instruction could represent the instruction type, for example you could use `1` to mean `OUTLOC` 14 | - and the first 8 bits of the instruction could represent the location in memory to output 15 | - run the driver, and check the outputs are ok 16 | 5. since creating the instructions is kind of tedious, create a python script (or C++, or whatever language you like), that will take a text file with assembly code, and convert it into the hexadecimal instructions. The assembly code can look something like the following: 17 | ``` 18 | outloc 64 19 | outloc 68 20 | outloc 72 21 | outloc 76 22 | 23 | location 64: 24 | abcd 25 | 1234 26 | dead 27 | beef 28 | ``` 29 | - check that you can run your assembler to produce machine code, then run your verilog simulation on that machine code, and check that the outputs look correct 30 | 6. __registers__ 31 | - Modify your proc module so that it has a memory to store 32 registers, which we will denote x0 to x31. 32 | - add a new instruction `LI`, which will load a number into a register 33 | - for example `li x1, 123` will load the number 123 into register x1 34 | - add an instruction `outr` which will print out the value of a register, eg `outr x1` will print out the value of x1 35 | - create some assembly code to test these two instructions, assemble it, and run it 36 | - check the output looks ok 37 | 7. __memory__ 38 | - move the memory out of proc/driver modules, into a new file, e.g. 
`mem.sv` 39 | - you will need to design an appropriate protocol for `mem.sv`, to allow reads and writes by `proc.sv` 40 | - you will need to design an appropriate protocol so that the driver module can write the initial hexadecimal instructions and data into the memory 41 | - easiest way might be to create a second 'write port' into `mem.sv`, that only the driver module uses 42 | - assemble and run your assembly programs, and check they continue to run ok 43 | - you will need to somehow handle the fact that reading instructions from memory will now take more than a single clock cycle 44 | - at this point, you will probably want to start making proc.sv be a finite state machine 45 | - (if you've no idea how to start with this, you could try some of the FSM problems in hdlbits, if you haven't already; and make sure you completed the 'game of life' problem, if you didn't already) 46 | 8. add `load` and `store` instructions, that will load a value from a memory location into a register, or store the value of a register into a memory location, respectively. e.g. `load x1 64`, will load the value from memory location 64 into register x1; and `store x2 68` will store the value from register `x2` into memory location `68`. 47 | - you can add a new instruction `outloc`, if you wish, that outputs the value at a particular memory location, e.g. `outloc 64` will send the value at location 64 to the driver for output 48 | - both load and store will likely need multiple clock cycles, so you will need to continue working with the finite state machine you created in step 7 49 | 9. 
__RISCV__ 50 | - migrate your instructions to be RISC-V compliant 51 | - don't modify `out` or `outloc` or `outr` for now, since these are not RISC-V instructions 52 | - you'll need to change your instructions to be 32-bit 53 | - you'll need to implement `li` as a pseudoinstruction, that does something like `addi x5, x0, 123`, which takes the value of register `x0` (always 0), adds 123, and places the result into register x5 54 | - you'll need to look up the binary representations used in RISC-V in the RISC-V volume 1, unprivileged spec, https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf 55 | 10. add arithmetic, like `add`, `sub`, `mul`, using the simple verilog operators `+` and `-`, `*` 56 | - write assembly to test this 57 | - run, and check output is ok 58 | 11. build an assembly program to add the numbers 1 to 5 59 | - you'll need to add at least one branch instruction to do this 60 | - this means you'll need to add the ability to create labels to your assembler 61 | - eg something like `somelabel:` is a label called `somelabel` 62 | - for now, you can simply allow backwards jumps only, which simplifies your assembler, since you only need a single pass 63 | 12. __delay propagation__ at some point you need to start synthesizing your design, and doing things like: 64 | - gate-level simulation 65 | - measuring propagation delay 66 | - measuring die area 67 | - let's start with measuring propagation delay 68 | - there are various repos around which say they can do this (e.g. 
[OpenTimer](https://github.com/OpenTimer/OpenTimer)), however I didn't have much/any success in using them, so I built my own script, [https://github.com/hughperkins/VeriGPU/blob/main/verigpu/timing.py](https://github.com/hughperkins/VeriGPU/blob/main/verigpu/timing.py) 69 | - you are free to measure propagation delay however you want, but I feel you do need to start measuring it :) 70 | - obviously, my own opinion is that using my own script is the easiest way, but your mileage may vary :) 71 | - the way my script works is: 72 | - first use yosys to synthesize your verilog down to gate-level cells, using the OSU018 tech, [https://github.com/hughperkins/VeriGPU/tree/main/tech/osu018](https://github.com/hughperkins/VeriGPU/tree/main/tech/osu018) 73 | - assign a relative propagation delay, relative to that of a single NAND gate, to each node in the tree, based loosely on the propagation delays in [https://web.engr.oregonstate.edu/~traylor/ece474/reading/SAED_Cell_Lib_Rev1_4_20_1.pdf](https://web.engr.oregonstate.edu/~traylor/ece474/reading/SAED_Cell_Lib_Rev1_4_20_1.pdf) 74 | - walk the graph, finding the longest path between flip-flops, outputs and inputs, as a sum of the propagation delays of the walked nodes 75 | - in any case, measure somehow the propagation delay of your `proc.sv` module 76 | 13. __div__ create a division module, that will divide two integers, returning the result and the remainder 77 | - note that using the verilog `/` operator will result in the division running in a single cycle 78 | - this is *very* slow: high propagation delay. you can measure using whatever approach you settled on in 12 79 | - so you will likely want to make the division run over multiple cycles somehow (e.g. 32 cycles, one for each bit, for example; whatever it takes to keep the propagation delay short) 80 | 14. __divu__, __remu__ : use the module from the previous step to implement `divu` and `remu` instructions in `proc.sv` 81 | 15. 
create an assembler program that outputs the first few prime numbers 82 | - you can do this two ways: sieve of Eratosthenes (I can never spell this...), or iterating over each integer, and checking for factors 83 | - the first way doesn't need either `divu` or `remu`, so try the second way, even if you do the sieve of Eratosthenes too 84 | 16. add float support. I used `zfinx`, i.e. using the same registers for both floats and integers, but you could use the more standard `F` extension of RISC-V 85 | - ensure that at least the following work for now: 86 | - `li x1, 1.23` 87 | - `outr.s x1` output x1 as a float 88 | - load and store for floats (either using `lw` and `sw` if using zfinx, or using `flw` and `fsw` if using `F`) 89 | 17. add addition for floats 90 | 18. add subtraction for floats 91 | 19. add multiplication for floats 92 | 20. add division for floats 93 | 21. write a matrix multiplication assembler program 94 | - this gets pretty fiddly 95 | 22. since writing assembler for the matrix multiplication was getting fiddly, let's migrate to use a compiler to be able to convert e.g. C/C++ programs into assembler, which we can then assemble and run 96 | - you can use the `clang++` compiler provided with llvm 97 | - if you use the options `-S -target riscv32`, then `clang++` will convert your C++ into RISC-V assembler 98 | - a few complications: 99 | - what you'll need to write in the C or C++ is a function 100 | - you'll need to add additional 'header' assembler in front of the resulting assembler to jump into this function 101 | - then halt afterwards 102 | - you'll need to therefore implement jump instructions (and possibly a `halt` instruction, if you don't already have one) 103 | - you'll need to be able to handle forward jumps, so you'll need to make your assembler handle this somehow (e.g. 
using two passes) 104 | - you'll need to load any parameters into the registers `a0`, `a1`, etc 105 | - you'll need to make `a0`, `a1` etc be aliases to the appropriate RISC-V `x` registers (`a0` is `x10`, `a1` is `x11`, and so on) 106 | - you'll also need to add in other aliases, such as `sp` (which is `x2`) 107 | - you'll need to point `sp` towards the top of some unused memory before jumping into the function 108 | - if you want to output anything, you'll need to make your code call a function `void out(int value);` 109 | - and you'll need to implement this in assembler, as part of the 'header' assembler 110 | 23. at this point, you could also consider migrating from using your own assembler, to using the clang/llvm assembler, e.g. `llvm-mc` 111 | 24. once you've got to this point, you can already run fairly complex programs 112 | 25. your next steps will be things like: 113 | - adding instruction caching, and memory caching in general 114 | - adding parallel instruction execution 115 | 116 | (Note: you can see my own processor at https://github.com/hughperkins/VeriGPU , which I basically wrote by doing approximately what I've outlined above :) (I did a few GPU-specific things too; but I'm making this a CPU tutorial, not a GPU tutorial; happy to extend this for GPU if there is interest)) 117 | --------------------------------------------------------------------------------
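For exercise 5, a minimal sketch of such an assembler in Python might look like the following. It is only one possible approach, under some assumptions that are not stated in the exercises: the encoding follows the suggestion in exercise 4 (the low 8 bits hold the instruction type, with `1` meaning `OUTLOC`, and the high 8 bits hold the memory location), and memory is assumed to be addressed in 16-bit words, so `location N:` pads the output with zeros up to word `N`. The names `assemble` and `OPCODES` are purely illustrative.

```python
# Hypothetical opcode table: the low 8 bits of each instruction word
# hold the instruction type (assumption taken from exercise 4's suggestion).
OPCODES = {"outloc": 0x01}


def assemble(source: str) -> list[str]:
    """Convert assembly text into one 4-digit hex string per 16-bit word.

    Assumes word-addressed memory: word N of the output is memory location N.
    """
    words: list[int] = []
    in_data = False  # becomes True after the first `location N:` directive
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        tokens = line.split()
        if tokens[0] == "location":
            # `location N:` means the following hex values start at word N.
            target = int(tokens[1].rstrip(":"))
            if target < len(words):
                raise ValueError(f"location {target} overlaps earlier words")
            words.extend([0] * (target - len(words)))  # zero padding
            in_data = True
        elif in_data:
            # Data lines are raw 16-bit hexadecimal values.
            words.append(int(line, 16) & 0xFFFF)
        else:
            # Instruction lines look like `outloc 64`.
            opcode = OPCODES[tokens[0].lower()]
            operand = int(tokens[1])
            if not 0 <= operand <= 0xFF:
                raise ValueError(f"operand {operand} does not fit in 8 bits")
            words.append((operand << 8) | opcode)
    return [f"{word:04x}" for word in words]


if __name__ == "__main__":
    program = "outloc 4\noutloc 5\n\nlocation 4:\nabcd\n1234\n"
    # One hex word per line, which is a layout `$readmemh` can load.
    print("\n".join(assemble(program)))
```

Writing the returned strings to a text file, one per line, gives something the driver module can load with `$readmemh`. If your memory is byte-addressed, or you pick a different bit layout for the instruction, the padding and the shift in `assemble` would need to change accordingly.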