├── .gitignore ├── lab2 └── figs │ ├── dve.png │ ├── fir.png │ ├── q1_a.png │ ├── q1_b_1.png │ ├── q1_b_2.png │ ├── q1_c.png │ ├── new_wave.png │ ├── vlsi_flow.png │ └── display_wave.png ├── lab6 ├── figs │ ├── drc.png │ ├── dp_div.png │ ├── layout.png │ └── lvs_smiley.png └── spec.md ├── project ├── figs │ ├── csrw.png │ ├── RV32I_Base_Instruction_Set.pdf │ └── RV32I_Base_Instruction_Set.png ├── README.md ├── final.md ├── checkpoint2.md ├── checkpoint3.md ├── checkpoint4.md ├── overview.md └── checkpoint1.md ├── lab1 ├── figs │ └── x2gomacos.png └── spec.md ├── lab4 ├── figs │ ├── view_icons.png │ ├── timing_debug.png │ ├── clock_tree_nets.png │ ├── gcd_coprocessor.pdf │ ├── gcd_coprocessor.png │ ├── innovus_window.png │ ├── clock_tree_debugger.png │ ├── sky130 │ │ ├── innovus_window.png │ │ ├── timing_debug.png │ │ ├── clock_tree_nets.png │ │ ├── clock_tree_debugger.png │ │ └── critical_path_highlight.png │ └── critical_path_highlight.png └── spec_sky130.md ├── lab3 ├── figs │ ├── block-diagram.pdf │ └── block-diagram.png ├── spec.md └── spec_sky130.md ├── README.md └── lab5 ├── spec_sky130.md └── spec.md /.gitignore: -------------------------------------------------------------------------------- 1 | *.DS_Store -------------------------------------------------------------------------------- /lab2/figs/dve.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab2/figs/dve.png -------------------------------------------------------------------------------- /lab2/figs/fir.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab2/figs/fir.png -------------------------------------------------------------------------------- /lab6/figs/drc.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab6/figs/drc.png -------------------------------------------------------------------------------- /lab2/figs/q1_a.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab2/figs/q1_a.png -------------------------------------------------------------------------------- /lab2/figs/q1_b_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab2/figs/q1_b_1.png -------------------------------------------------------------------------------- /lab2/figs/q1_b_2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab2/figs/q1_b_2.png -------------------------------------------------------------------------------- /lab2/figs/q1_c.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab2/figs/q1_c.png -------------------------------------------------------------------------------- /lab6/figs/dp_div.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab6/figs/dp_div.png -------------------------------------------------------------------------------- /lab6/figs/layout.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab6/figs/layout.png -------------------------------------------------------------------------------- /lab2/figs/new_wave.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab2/figs/new_wave.png 
-------------------------------------------------------------------------------- /project/figs/csrw.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/project/figs/csrw.png -------------------------------------------------------------------------------- /lab1/figs/x2gomacos.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab1/figs/x2gomacos.png -------------------------------------------------------------------------------- /lab2/figs/vlsi_flow.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab2/figs/vlsi_flow.png -------------------------------------------------------------------------------- /lab4/figs/view_icons.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab4/figs/view_icons.png -------------------------------------------------------------------------------- /lab6/figs/lvs_smiley.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab6/figs/lvs_smiley.png -------------------------------------------------------------------------------- /lab2/figs/display_wave.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab2/figs/display_wave.png -------------------------------------------------------------------------------- /lab3/figs/block-diagram.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab3/figs/block-diagram.pdf 
-------------------------------------------------------------------------------- /lab3/figs/block-diagram.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab3/figs/block-diagram.png -------------------------------------------------------------------------------- /lab4/figs/timing_debug.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab4/figs/timing_debug.png -------------------------------------------------------------------------------- /lab4/figs/clock_tree_nets.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab4/figs/clock_tree_nets.png -------------------------------------------------------------------------------- /lab4/figs/gcd_coprocessor.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab4/figs/gcd_coprocessor.pdf -------------------------------------------------------------------------------- /lab4/figs/gcd_coprocessor.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab4/figs/gcd_coprocessor.png -------------------------------------------------------------------------------- /lab4/figs/innovus_window.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab4/figs/innovus_window.png -------------------------------------------------------------------------------- /lab4/figs/clock_tree_debugger.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab4/figs/clock_tree_debugger.png 
-------------------------------------------------------------------------------- /lab4/figs/sky130/innovus_window.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab4/figs/sky130/innovus_window.png -------------------------------------------------------------------------------- /lab4/figs/sky130/timing_debug.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab4/figs/sky130/timing_debug.png -------------------------------------------------------------------------------- /lab4/figs/critical_path_highlight.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab4/figs/critical_path_highlight.png -------------------------------------------------------------------------------- /lab4/figs/sky130/clock_tree_nets.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab4/figs/sky130/clock_tree_nets.png -------------------------------------------------------------------------------- /lab4/figs/sky130/clock_tree_debugger.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab4/figs/sky130/clock_tree_debugger.png -------------------------------------------------------------------------------- /lab4/figs/sky130/critical_path_highlight.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab4/figs/sky130/critical_path_highlight.png -------------------------------------------------------------------------------- /project/figs/RV32I_Base_Instruction_Set.pdf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/project/figs/RV32I_Base_Instruction_Set.pdf -------------------------------------------------------------------------------- /project/figs/RV32I_Base_Instruction_Set.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/project/figs/RV32I_Base_Instruction_Set.png -------------------------------------------------------------------------------- /project/README.md: -------------------------------------------------------------------------------- 1 | # EECS 151/251A ASIC Project Specification: RISC-V Processor Design 2 | 3 | 4 | - [Project Overview](overview.md) : Introduction, Project setup and Grading 5 | - [Checkpoint 1](checkpoint1.md) : ALU design and Pipeline diagram 6 | - Apr 1 (Friday), 2022 7 | - [Checkpoint 2](checkpoint2.md) : Fully functioning core 8 | - Apr 15 (Friday), 2022 9 | - [Checkpoint 3](checkpoint3.md) : Cache 10 | - Apr 22 (Friday), 2022 11 | - [Checkpoint 4](checkpoint4.md) : Synthesis, PAR & Power 12 | - Apr 29 (Friday), 2022 13 | - [Final Deliverables](final.md): 14 | - Final Interview/Checkoff: May 6, 2022 15 | - Report: May 9, 2022 16 | # Resources: 17 | [RISC-V Instruction Set Manual](https://riscv.org/technical/specifications/) (Volume 1, Unprivileged Spec) 18 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # EECS 151/251A ASIC Labs Fall 21 2 | 3 | This lab course consists of 6 labs and a final project. The labs go through the ASIC design flow, from RTL through GDS. 
4 | These labs are now available in two process technologies, 5 | the [ASAP7 7nm Predictive PDK](http://asap.asu.edu/asap/) (a non-implementable finFET technology developed for educational purposes) 6 | and the [Skywater 130nm PDK](https://skywater-pdk.readthedocs.io/en/latest/) (a real open-source 130nm CMOS process developed by Google and Skywater foundries). 7 | 8 | ## ASAP7 Labs 9 | - [Lab 1: Getting Around the Compute Environment](lab1/spec.md) 10 | - [Lab 2: Simulation](lab2/spec.md) 11 | - [Lab 3: Logic Synthesis](lab3/spec.md) 12 | - [Lab 4: Floorplanning, Placement, Power, and CTS](lab4/spec.md) 13 | - [Lab 5: Parallelization and Routing](lab5/spec.md) 14 | - [Lab 6: SRAM Integration, DRC, LVS](lab6/spec.md) 15 | 16 | ## ASIC Final Project 17 | This project guides students through writing their own CPU core and cache, and pushing this design through the ASIC flow to achieve a physical design. 18 | - [Project Overview](project/overview.md) : Introduction, Project setup and Grading 19 | - [Checkpoint 1](project/checkpoint1.md) : ALU design and Pipeline diagram 20 | - [Checkpoint 2](project/checkpoint2.md) : Fully functioning core 21 | - [Checkpoint 3](project/checkpoint3.md) : Cache 22 | - [Checkpoint 4](project/checkpoint4.md) : Synthesis, PAR & Power 23 | 24 | ## Sky130 Labs 25 | Alternate versions of the ASAP7 labs above use the Skywater 130nm PDK instead. Lab 6 is omitted because (1) the Sky130 SRAMs are currently not mature enough to be used for educational purposes, and (2) for DRC/LVS, the Sky130 Calibre decks are still under NDA, and while the open-source decks are available (for use with Magic and Netgen), our ASIC design flow does not currently support these open-source EDA tools. To learn about SRAMs, DRC, and LVS, please follow the ASAP7 version of Lab 6 above. 
26 | - [Lab 1: Getting Around the Compute Environment](lab1/spec.md) 27 | - [Lab 2: Simulation](lab2/spec_sky130.md) 28 | - [Lab 3: Logic Synthesis](lab3/spec_sky130.md) 29 | - [Lab 4: Floorplanning, Placement, Power, and CTS](lab4/spec_sky130.md) 30 | - [Lab 5: Parallelization and Routing](lab5/spec_sky130.md) 31 | -------------------------------------------------------------------------------- /project/final.md: -------------------------------------------------------------------------------- 1 | # EECS 151/251A ASIC Project Specification: Final Deliverables 2 |
3 | Prof. Sophia Shao 4 |
5 |6 | TAs (ASIC): Dima Nikiforov 7 |
8 |9 | Department of Electrical Engineering and Computer Science 10 |
11 |12 | College of Engineering, University of California, Berkeley 13 |
14 |
15 | ---
16 |
17 | ## Final Project Deliverables
18 |
19 | By now you should have designed a fully-functional processor from scratch! Your design should pass all assembly tests at your reported maximum frequency. Your
20 | design should also pass all of the benchmark tests at your reported maximum frequency, and you
21 | should report the cycle count for each of those tests. By the due date (Monday, May 9, 2022), each
22 | team needs to push their final commits to their team’s git repository. Only the final commit before the
23 | due date will be graded, so be very, very careful that you have submitted everything required. To be
24 | graded you must submit the following items:
25 | * `src/*.v`
26 | * `build/syn-rundir/reports/*`
27 | * `build/par-rundir/timingReports/*`
28 | * `build/par-rundir/innovus.log*`
29 |
30 | These files will be used to check processor functionality and will show us your critical path, maximum operating frequency, and area. During the interview session (Friday, May 6, 2022), the
31 | professors and the GSI will interview each team to gauge understanding of various concepts
32 | learned in the project, understand more about each team’s design process, and provide feedback. Your
33 | final report needs to answer the following questions:
34 |
35 | 1. Show the final pipeline diagram, and explain the functionality of different submodules in your design, how control signals are generated, memory structure, etc.
36 |
37 | 2. What is the post-synthesis critical path length? What sections of the processor does the critical
38 | path pass through? Why is this the critical path?
39 |
40 | 3. Show a screenshot of the final floorplan. Also include a screenshot of the clock tree debugger results. Discuss your floorplanning strategy and the quality of your clock tree results.
41 |
42 | 4. What is the post-place-and-route critical path length? What sections of the processor does the
43 | critical path pass through?
Why is this the critical path? If it is different from the post-synthesis 44 | critical path, why? 45 | 46 | 5. What is the area utilization of the final design? Also include the total core area you used in PnR and the density. 47 | 48 | 6. What is the Innovus-estimated power consumption of the final design? 49 | 50 | 7. What is the number of cycles that your design takes to run the benchmarks? What changes/optimizations 51 | have you done to try and optimize for these tests? 52 | 53 | 8. What is the post-place-and-route runtime (in seconds) of each benchmark? 54 | *Use the number of cycles from RTL simulation, and minimum clock period to meet timing for place-and-route (design doesn't have to pass post-PAR simulations with this clock period).* 55 | 56 | 9. If there are bugs in your design still, explain what is working and what isn't. What was your debugging process? Where are the bugs localized? 57 | 58 | 10. Explain any other optimizations you made for your design. 59 | 60 | 11. Is there anything you would like to tell the staff before we grade your project? 61 | 62 | 63 | If you worked with a partner you do not need separate reports. If you are having issues with your 64 | partner please contact the GSI privately as soon as possible. 
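Question 8 in the report list above boils down to one multiplication: runtime equals the RTL cycle count times the minimum post-place-and-route clock period. The following is a minimal sketch of that arithmetic, not part of the project infrastructure; the function names and the example numbers are hypothetical placeholders, so substitute your own cycle counts and clock period.

```python
def runtime_seconds(cycles: int, clock_period_ns: float) -> float:
    """Runtime of one benchmark: cycle count times clock period (ns -> s)."""
    return cycles * clock_period_ns * 1e-9


def max_frequency_mhz(clock_period_ns: float) -> float:
    """Maximum operating frequency implied by a clock period given in ns."""
    return 1e3 / clock_period_ns


if __name__ == "__main__":
    # Hypothetical values for illustration only: take the cycle count from
    # RTL simulation and the period from your post-PAR timing reports.
    cycles = 1_000_000
    period_ns = 20.0
    print(f"runtime: {runtime_seconds(cycles, period_ns):.6f} s")
    print(f"fmax:    {max_frequency_mhz(period_ns):.1f} MHz")
```

Keeping the clock period in nanoseconds matches the units the timing reports typically use, so the only conversion happens in one place.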
65 | 66 | 67 | 68 | ## Acknowledgement 69 | 70 | This project is the result of the work of many EECS151/251 GSIs over the years including: 71 | Written By: 72 | - Nathan Narevsky (2014, 2017) 73 | - Brian Zimmer (2014) 74 | Modified By: 75 | - John Wright (2015,2016) 76 | - Ali Moin (2018) 77 | - Arya Reais-Parsi (2019) 78 | - Cem Yalcin (2019) 79 | - Tan Nguyen (2020) 80 | - Harrison Liew (2020) 81 | - Sean Huang (2021) 82 | - Daniel Grubb, Nayiri Krzysztofowicz, Zhaokai Liu (2021) 83 | - Dima Nikiforov (2022) 84 | -------------------------------------------------------------------------------- /project/checkpoint2.md: -------------------------------------------------------------------------------- 1 | # EECS 151/251A ASIC Project Specification: Checkpoint 2 2 |3 | Prof. Sophia Shao 4 |
5 |6 | TAs (ASIC): Dima Nikiforov 7 |
8 |9 | Department of Electrical Engineering and Computer Science 10 |
11 |12 | College of Engineering, University of California, Berkeley 13 |
14 |
15 | ---
16 | ## Fully functioning core
17 |
18 | ### 1. Additional Instructions
19 | #### 1.1 Control and Status Register (CSR)
20 | To run the testbenches, you need to add a few new instructions that help with
21 | debugging and creating testbenches. Read through Chapter 9 in the RISC-V specification. A CSR (control
22 | and status register) is state that is stored independently of the register file and the memory.
23 | While there are 2^12 possible CSR addresses, you will only use one of them (`tohost = 0x51E`). The
24 | `tohost` register is monitored by the test harness, and simulation ends when a value is written to this
25 | register. A value of 1 indicates success; a value greater than 1 gives clues as to the location of the failure.
26 | There are 2 CSR-related instructions that you will need to implement:
27 | 1. `csrw tohost,t2` (short for `csrrw x0,csr,rs1` where `csr = 0x51E`)
28 | 2. `csrwi tohost,1` (short for `csrrwi x0,csr,zimm` where `csr = 0x51E`)
29 |
30 | `csrw` writes the value held in register rs1 to the addressed CSR. `csrwi` writes the zero-extended 5-bit immediate (encoded in the rs1 field of the instruction) to
31 | the addressed CSR. Note that you do not need to write to rd (writing to x0 does nothing).
32 |
33 |
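To make the two CSR instructions above concrete, here is a small sketch that packs and unpacks their 32-bit encodings using the standard RISC-V I-type CSR layout (`csr[31:20]`, `rs1/zimm[19:15]`, `funct3[14:12]`, `rd[11:7]`, `opcode[6:0]`). The helper names `encode_csr` and `decode_csr` are made up for this illustration and are not part of the project skeleton; only the field layout, the SYSTEM opcode, and the `tohost` address from the text are relied on.

```python
OPCODE_SYSTEM = 0b1110011   # SYSTEM opcode shared by all CSR instructions
FUNCT3_CSRRW = 0b001        # csrrw  (register source)
FUNCT3_CSRRWI = 0b101       # csrrwi (5-bit immediate source)
CSR_TOHOST = 0x51E          # the only CSR address used in this project


def encode_csr(csr: int, rs1_or_zimm: int, funct3: int, rd: int = 0) -> int:
    """Pack the fields of a CSR instruction into a 32-bit word."""
    assert 0 <= csr < (1 << 12) and 0 <= rs1_or_zimm < 32 and 0 <= rd < 32
    return (csr << 20) | (rs1_or_zimm << 15) | (funct3 << 12) | (rd << 7) | OPCODE_SYSTEM


def decode_csr(instr: int) -> dict:
    """Split a 32-bit instruction word back into its CSR-format fields."""
    return {
        "csr": (instr >> 20) & 0xFFF,
        "rs1_or_zimm": (instr >> 15) & 0x1F,
        "funct3": (instr >> 12) & 0x7,
        "rd": (instr >> 7) & 0x1F,
        "opcode": instr & 0x7F,
    }


if __name__ == "__main__":
    csrw_t2 = encode_csr(CSR_TOHOST, 7, FUNCT3_CSRRW)    # t2 is register x7
    csrwi_1 = encode_csr(CSR_TOHOST, 1, FUNCT3_CSRRWI)   # zimm = 1
    print(f"csrw  tohost,t2 -> 0x{csrw_t2:08X}")
    print(f"csrwi tohost,1  -> 0x{csrwi_1:08X}")
    print(decode_csr(csrwi_1))
```

In a decoder, the useful observation is that bit 14 of `funct3` distinguishes the immediate form from the register form, so the same field `[19:15]` is either a register index or the immediate itself.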
34 |
35 |
3 | Prof. Sophia Shao 4 |
5 |6 | TAs (ASIC): Dima Nikiforov 7 |
8 |9 | Department of Electrical Engineering and Computer Science 10 |
11 |12 | College of Engineering, University of California, Berkeley 13 |
14 | 15 | --- 16 | 17 | ## Cache 18 | A processor operates on data in memory. Memory can hold billions of bits, which can either be instructions or data. In a VLSI design, it is a very bad idea to store this many bits close to the processor. The 19 | chip area required would be huge - consider how many DRAM chips your PC has, and that DRAM cells 20 | are much smaller than SRAM cells (which can actually be implemented in the same CMOS process). 21 | Moreover, the entire processor would have to slow down to accommodate delays in the large memory 22 | array. Instead, caches are used to create the illusion of a large memory with low latency. 23 | 24 | Your task is to implement a (relatively) simple cache for your RISC-V processor, based on some 25 | predefined SRAM macros (memory arrays) and the interface specified below. 26 | 27 | ### 1 Cache overview 28 | When you request data at a given address, the cache will see if it is stored locally. If it is (cache hit), it 29 | is returned immediately. Otherwise if it is not found (cache miss), the cache fetches the bits from the 30 | main memory. 31 | Caches store data in “ways.” A way is a logical element which contains valid bits, tag bits, and data. 32 | The simplest type of cache is direct-mapped (a 1-way cache). A cache stores data in larger units (lines) 33 | than single words. In each way, a given address may only occupy a single location, determined by the 34 | lowest bits of the cache line address. The remaining address bits are called the “tag” and are stored so 35 | that we can check if a given cache line belongs to a given address. The valid bit indicates which lines 36 | contain valid data. 37 | Multi-way caches allow more flexibility in what data is stored in the cache, since there are multiple 38 | locations for a line to occupy (the number of ways). For this reason, a ”replacement policy” is needed. 39 | This is used to decide which way’s data to evict when fetching new data. 
For this project you may use
40 | any policy you wish, but pseudo-random is recommended.
41 |
42 | ### 2 Guidelines and requirements
43 | You have been given the interface of a cache (`Cache.v`) and your next task is to implement the cache.
44 | EECS151 students should build a direct-mapped cache, and EECS251 students are required to implement a cache that either:
45 |
46 | 1. is configurable to be either direct-mapped or at least 2-way set associative; or
47 |
48 | 2. is set-associative with configurable associativity.
49 |
50 | You are welcome to implement a more performant cache if you desire.
51 | Your cache should be at least 512 bytes; if you wish to increase the size, implement the 512-byte
52 | cache first and upgrade later.
53 | Use the SRAMs that are available in
54 |
55 | `/home/ff/eecs151/hammer/src/hammer-vlsi/technology/asap7/sram_compiler/memories/behavioral/sram_behav_models.v`
56 |
57 | for your data and tag arrays.
58 |
59 |
60 | The pin descriptions for these SRAMs are as follows:
61 |
62 | | Pin         | Description                                     |
63 | |-------------|-------------------------------------------------|
64 | | `A`         | address                                         |
65 | | `CE`        | clock edge                                      |
66 | | `OEB`       | output enable bar (tie this to 0)               |
67 | | `WEB`       | write enable bar (1 is a read, 0 is a write)    |
68 | | `CSB`       | chip select bar (tie this to 0)                 |
69 | | `BYTEMASK`  | write byte mask                                 |
70 | | `I`         | write data                                      |
71 | | `O`         | read data                                       |
72 |
73 | You should use cache lines that are 512 bits (16 words) for this project. The memory interface is
74 | 128 bits wide, meaning that each cache line transfer to or from main memory takes multiple (4) cycles.
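The sizes quoted above fix how a CPU address is split into tag, index, and line offset. The sketch below works through that arithmetic for the minimum 512-byte cache with 512-bit lines and a 128-bit memory interface. It assumes 32-bit byte addresses (RV32), and the function names are hypothetical; it is a back-of-the-envelope model, not a description of how `Cache.v` must be organized.

```python
ADDR_BITS = 32  # assumption: 32-bit byte addresses, as on an RV32 core


def cache_geometry(cache_bytes=512, line_bits=512, ways=1, mem_bits=128):
    """Derive the address field widths and refill beat count for a cache."""
    line_bytes = line_bits // 8                     # bytes per cache line
    sets = cache_bytes // (line_bytes * ways)       # number of sets
    offset_bits = line_bytes.bit_length() - 1       # log2(line_bytes)
    index_bits = sets.bit_length() - 1              # log2(sets)
    tag_bits = ADDR_BITS - index_bits - offset_bits
    beats = line_bits // mem_bits                   # memory transfers per line
    return {
        "line_bytes": line_bytes, "sets": sets, "offset_bits": offset_bits,
        "index_bits": index_bits, "tag_bits": tag_bits, "beats": beats,
    }


def split_address(addr, offset_bits, index_bits):
    """Return (tag, index, offset) for a byte address."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    g = cache_geometry()
    print(g)
    print(split_address(0x1234, g["offset_bits"], g["index_bits"]))
```

Calling `cache_geometry(ways=2)` shows how adding associativity at a fixed capacity halves the number of sets and moves one bit from the index into the tag.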
75 | Below is a description of each signal in `Cache.v`:
76 |
77 | | Signal               | Description                            |
78 | |----------------------|----------------------------------------|
79 | | `clk`                | clock |
80 | | `reset`              | reset |
81 | | `cpu_req_valid`      | The CPU is requesting a memory transaction |
82 | | `cpu_req_rdy`        | The cache is ready for a CPU memory transaction |
83 | | `cpu_req_addr`       | The address of the CPU memory transaction |
84 | | `cpu_req_data`       | The write data for a CPU memory write (ignored on reads) |
85 | | `cpu_req_write`      | The 4-bit write mask for a CPU memory transaction (each bit corresponds to one byte within the word). `4'b0000` indicates a read. |
86 | | `cpu_resp_val`       | The cache has output valid data to the CPU after a memory read |
87 | | `cpu_resp_data`      | The data requested by the CPU |
88 | | `mem_req_val`        | The cache is requesting a memory transaction to main memory |
89 | | `mem_req_rdy`        | Main memory is ready for the cache to provide a memory address |
90 | | `mem_req_addr`       | The address of the main memory transaction from the cache. Note that this address is narrower than the CPU byte address since main memory has wider data. |
91 | | `mem_req_rw`         | 1 if the main memory transaction is a write; 0 for a read. |
92 | | `mem_req_data_valid` | The cache is providing write data to main memory. |
93 | | `mem_req_data_ready` | Main memory is ready for the cache to provide write data. |
94 | | `mem_req_data_bits`  | Data to write to main memory from the cache (128 bits/4 words). |
95 | | `mem_req_data_mask`  | Byte-level write mask to main memory. May be `16'hFFFF` for a full write. |
96 | | `mem_resp_val`       | The main memory response data is valid. |
97 | | `mem_resp_data`      | Main memory response data to the cache (128 bits/4 words). |
98 |
99 |
100 | To design your cache, start by outlining where the SRAMs should go. You should include an SRAM
101 | per way for data, and a separate SRAM per way for the tags.
Depending on your implementation, you
102 | may want to implement the valid bits in flip-flops or as part of the tag SRAM.
103 |
104 | Next you should develop a state machine that covers all the events that your cache needs to handle
105 | for both hits and misses. You can do it without an explicit state machine, but you will suffer. Keep in
106 | mind you will need to write any valid data back to main memory before you start refilling the cache (you
107 | can use a write-back or a write-through policy). Both of these transactions will take multiple cycles.
108 |
109 | ### 3 Changes to the flow for this checkpoint
110 | You should now be able to pass the `bmark` test. The test suite includes many C programs that do
111 | various things to test your processor and cache implementation. You can observe the number of cycles
112 | that each bmark test takes to run by opening `bmark_output/*.out` and taking note of the number
113 | on the last line. The `make sim-rtl test_bmark=all` target will also print this number for you.
114 | To run a specific benchmark (e.g., cachetest), run
115 | ```
116 | make sim-rtl test_bmark=cachetest.out
117 | ```
118 | After completing your cache, run the tests with both the cache included and with the fake memory
119 | (`no_cache_mem`) included. To use `no_cache_mem`, be sure to have `+define+no_cache_mem` in the
120 | simOptions variable in the `sim-rtl.yml` file. To use your cache, comment out `+define+no_cache_mem`.
121 | Take note of the cycle counts for both; you should see the cycle counts increase when you use the cache.
122 |
123 | ### 4 Checkpoint 3 Deliverables
124 | *Checkoff due: Apr 29 (Friday), 2022*
125 |
126 | 1. Show that all of the assembly tests and final pass using the cache
127 |
128 | 2. Show the block diagram of your cache
129 |
130 | 3. What was the difference in the cycle count for the `bmark` test with the perfect memory and the
131 | cache?
132 |
133 | 4.
Show your final pipeline diagram, updated to match the code 134 | 135 | --- 136 | 137 | 138 | ## Acknowledgement 139 | 140 | This project is the result of the work of many EECS151/251 GSIs over the years including: 141 | Written By: 142 | - Nathan Narevsky (2014, 2017) 143 | - Brian Zimmer (2014) 144 | Modified By: 145 | - John Wright (2015,2016) 146 | - Ali Moin (2018) 147 | - Arya Reais-Parsi (2019) 148 | - Cem Yalcin (2019) 149 | - Tan Nguyen (2020) 150 | - Harrison Liew (2020) 151 | - Sean Huang (2021) 152 | - Daniel Grubb, Nayiri Krzysztofowicz, Zhaokai Liu (2021) 153 | - Dima Nikiforov (2022) 154 | -------------------------------------------------------------------------------- /lab5/spec_sky130.md: -------------------------------------------------------------------------------- 1 | # EECS 151/251A ASIC Lab 5: Parallelization and Routing 2 | 3 |4 | Prof. Bora Nikolic 5 |
6 |7 | TAs: Daniel Grubb, Nayiri Krzysztofowicz, Zhaokai Liu 8 |
9 |10 | Department of Electrical Engineering and Computer Science 11 |
12 |13 | College of Engineering, University of California, Berkeley 14 |
15 |
16 | ## Overview
17 |
18 | Like last week, this lab has two parts. For the first part, we will continue to develop our GCD
19 | coprocessor by improving its performance. After that, we will continue the physical design flow by
20 | performing routing.
21 |
22 | To begin this lab, get the project files and set up your environment by typing the following commands:
23 |
24 | ```shell
25 | git clone /home/ff/eecs151/fa21/sky130/lab5-sky130.git
26 | cd lab5-sky130
27 | ```
28 |
29 | ## Design
30 |
31 | One way we can improve the performance of our GCD coprocessor is by parallelizing the compute.
32 | We can do this by including multiple GCD units in our design, and routing traffic to them as they
33 | become available.
34 |
35 | You will find that the solution to last week’s lab (`fifo.v` and `gcd_coprocessor.v`) is included. The
36 | test has been modified to check the total number of cycles taken by the coprocessor to complete the
37 | tests. Run `make sim-rtl` to run the new testbench on the solution code. Take note of the number
38 | of cycles that the tests take without modification, as you will need it to calculate your speedup.
39 |
40 | Your task is to edit `gcd_coprocessor.v` so that the tests complete in fewer than 225 cycles. We will do
41 | this by using two instances of GCD.
42 |
43 | You will find RTL that connects the datapath and controller into one module in `gcd_unit.v`. You
44 | may find this useful when refactoring the `gcd_coprocessor`, since you will need fewer wires to place
45 | both GCD instances.
46 |
47 | You will also find stub code for an arbiter, which you should complete. We will use the arbiter
48 | to route traffic to GCD units and preserve the response ordering. Most of your design can be
49 | implemented with combinational logic, but you will need some state to remember which GCD
50 | block contains the earliest data to preserve ordering.
51 |
52 | ---
53 | ## Question 1: Design
54 |
55 | **a.)
Submit your code (`gcd_coprocessor.v` and `gcd_arbiter.v`) with your lab assignment.** 56 | 57 | **b.) How many cycles did your simulation take? What was the % speedup?** 58 | 59 | --- 60 | 61 | ## Automated Flow 62 | 63 | In the last lab, we only focused on the PAR flow through CTS. In this lab, we will go through the full flow. 64 | Routing is the next major flow step. Prior to the actual routing step, Innovus uses a 65 | basic routing engine with errors and shorts, but ignores these errors and simply tries to 66 | get an estimate of delays and parasitics. Once post-CTS 67 | optimization is done, it switches to a different tool that actually legalizes routing and tries to eliminate 68 | shorts while meeting timing. Routing is one of the most 69 | computationally heavy tasks of digital IC design and can take days to complete for complicated designs. 70 | This will be reflected in the runtime in this lab. 71 | 72 | After routing is complete, a post-Route optimization is run to ensure no timing violations 73 | remain. Post-Route optimization typically has little freedom to move cells around, and it tries to 74 | meet the timing constraints mostly by tweaking the length of the routings. You may see some DRC 75 | (Design Rule Check) errors caused by the 7nm technology library, after routing. 76 | 77 | First, synthesize the design: 78 | 79 | ```shell 80 | make syn 81 | ``` 82 | 83 | Then, simulate the synthesized design to make sure it still works: 84 | 85 | ```shell 86 | make sim-gl-syn 87 | ``` 88 | 89 | Once your synthesized design passes the test, you can start the PAR flow: 90 | 91 | ```shell 92 | make par 93 | ``` 94 | 95 | The PAR command will take a long time to complete, as it runs through all stages of PAR. 96 | Check out the iterations that Innovus runs through during optimization. You can see some of the metrics that Innovus is using. 97 | Once it completes, take a look at the build directory as in the previous labs. 
You might see additional files 98 | compared to the `syn-rundir`, and that’s because the PAR flow incorporates the RC and parasitic delays, in addition to the cell delays. Open `build/par-rundir/gcd_coprocessor.setup.par.spef` 99 | and search for the first occurrence of `D_NET`. What does it say about the first net? You may find 100 | [this wiki page](https://en.wikipedia.org/wiki/Standard_Parasitic_Exchange_Format#Parasitics) helpful. *(thought experiment #1: get a sense of the units at the top and orders of magnitude of the RC parasitics in the SPEF file. If we used a 5nm technology library, do you expect the resistance to generally increase or decrease? How about the capacitance?)* 101 | 102 | --- 103 | ## Question 2: Automated Flow 104 | 105 | a.) Check the post-Synthesis timing report 106 | (`syn-rundir/reports/final_time_ss_100C_1v60.setup_view.rpt`) and post-PAR timing report (`par-rundir/timingReports/gcd_coprocessor_postRoute_all.tarpt`). 107 | **What are the critical paths of your post-PAR and post-Synthesis designs?** 108 | **Are they the same path?** 109 | **How does this critical path compare to your single-unit critical path?** 110 | 111 | b.) Iterate on your design by modifying `design.yml` to find a rough estimate (no need to be too 112 | precise) of the shortest clock period you can reach before you start running into setup errors. 113 | **Given the number of cycles it takes to complete the testbench, what is the shortest time in which your design can finish the computation?** 114 | 115 | c.) Open the post-CTS timing report (`par-rundir/hammer_cts_debug/hammer_cts_all.tarpt`) and the post-PAR 116 | timing report (`par-rundir/timingReports/gcd_coprocessor_postRoute_all.tarpt`). 117 | **Find a common path (same start and end sequential elements). What differences do you notice within the paths?** 118 | 119 | --- 120 | 121 | ## Innovus Commands 122 | 123 | As in the previous lab, we will look at the contents of `par.tcl` that Hammer generates and follow 124 | along using Innovus.
125 | *(thought experiment #2 : open the `par.tcl` and search for the command `set_db add_fillers_cells`. Based on the names of the cells specified by this command, what do you think is the function of the filler cells?)* 126 | 127 | Navigate to the directory `build/par-rundir` and type: 128 | 129 | ```shell 130 | innovus -common_ui 131 | ``` 132 | 133 | This will open the Innovus shell. Next, type `read_db gcd_coprocessor_FINAL` to load the current design 134 | database from the latest PAR flow. This will help us to avoid re-running the entire flow. To see 135 | all the reporting commands, type `help report*` in the Innovus shell and read through the options 136 | available to you. 137 | 138 | --- 139 | ## Question 3: Innovus Reports 140 | 141 | **a.) What is the area consumed by your design?** 142 | **What percentage of the total area does the arbiter occupy?** 143 | 144 | **b.) Submit a screenshot of your setup slack histogram.** 145 | **Compared with the histogram you obtained in Lab 4, does your new slack distribution support the observed performance improvements you obtained in your coprocessor?** 146 | 147 | --- 148 | 149 | After you are done with the flow, it is time to simulate our newly printed post-PAR netlist. Type 150 | the following command: 151 | 152 | ```shell 153 | make sim-gl-par 154 | ``` 155 | 156 | This will use the same testbench, but will now use the post-PAR netlist of your design, backannotated with delays and parasitics from PAR. Make sure to adjust the `CLOCK_PERIOD` variable in `sim-gl-par.yml` to match the clock period you obtained from PAR. Note, however, that the exact 157 | clock period may not work and you may need to relax it slightly. 158 | 159 | After running `make sim-gl-par` you can run power analysis using: 160 | 161 | ```shell 162 | make power-par 163 | ``` 164 | 165 | Navigate to `power-rundir/activePowerReports` and open `ss_100C_1v60.setup_view.rpt`. Do 166 | the power estimation numbers match your expectation? 
167 | 168 | --- 169 | ## Question 4: Trade-offs 170 | 171 | **a.)** Re-run the flow using your old design. 172 | To prevent your `build` directory from being overwritten, set the `OBJ_DIR` Make variable to a different name (e.g. `make par OBJ_DIR=build2`). 173 | Using the area and power values from Innovus, 174 | **how does the performance improvement from the dual-unit design compare to the increase in area and power consumption relative to your old design?** 175 | 176 | **b.)** Modify your `gcd_coprocessor.v` to take a parameter giving the number of clock cycles we 177 | want our design to meet (`parameter TARGET_NUMBER_OF_CYCLES`) for this given testbench. Your 178 | code should generate a low-area, low-power design if the number is greater than what your simple 179 | GCD coprocessor can achieve, and it should generate the dual-unit design if it is lower. 180 | **Submit your code.** 181 | 182 | c.) (Optional) Using a rough estimate of target number of cycles versus number of units in the design, 183 | write code that will generate 1-8 cores depending on the performance demand. Do NOT do this 184 | by writing out every possible case explicitly. You can limit the number of units to powers of two 185 | (1, 2, 4, 8) if it makes your life easier.
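The comparison asked for in part a.) comes down to a few ratios. The sketch below is one way to tabulate them in the shell. The helper names `ratio` and `pct_change` and every number in it are hypothetical placeholders, not values from this lab, and speedup is assumed here to mean the old cycle count divided by the new one.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the Question 4a comparison (names are assumptions).

# ratio NUM DEN: print NUM / DEN with two decimal places.
ratio() {
    awk -v num="$1" -v den="$2" 'BEGIN { printf "%.2f", num / den }'
}

# pct_change OLD NEW: print the change from OLD to NEW as a percentage of OLD.
pct_change() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f", (new - old) / old * 100 }'
}

# Placeholder numbers only; substitute the values from your own reports.
old_cycles=300;  new_cycles=200     # cycle counts reported by the testbench
old_area=1000;   new_area=1500      # total area of each design
old_power=2.0;   new_power=3.0      # total power of each design

echo "speedup (old/new cycles): $(ratio "$old_cycles" "$new_cycles")x"
echo "area change:  $(pct_change "$old_area" "$new_area")%"
echo "power change: $(pct_change "$old_power" "$new_power")%"
```

If the two designs close timing at different clock periods, it may be fairer to multiply each cycle count by its clock period before taking the ratio.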
186 | 187 | --- 188 | 189 | ## Lab Deliverables 190 | 191 | ### Lab Due 11:59 PM, Friday Oct 15th, 2021 192 | 193 | - Submit a written report with all 4 questions answered to Gradescope 194 | - Checkoff with an ASIC lab TA 195 | 196 | ## Acknowledgements 197 | 198 | This lab is the result of the work of many EECS151/251 GSIs over the years including: 199 | 200 | - Nathan Narevsky (2014, 2017) 201 | - Brian Zimmer (2014) 202 | - Cem Yalcin (2019) 203 | 204 | Modified By: 205 | - John Wright (2015,2016) 206 | - Ali Moin (2018) 207 | - Arya Reais-Parsi (2019) 208 | - Cem Yalcin (2019) 209 | - Tan Nguyen (2020) 210 | - Harrison Liew (2020) 211 | - Sean Huang (2021) 212 | - Daniel Grubb, Nayiri Krzysztofowicz, Zhaokai Liu (2021) 213 | -------------------------------------------------------------------------------- /lab5/spec.md: -------------------------------------------------------------------------------- 1 | # EECS 151/251A ASIC Lab 5: Parallelization and Routing 2 |3 | Prof. Sophia Shao 4 |
5 |6 | TAs (ASIC): Erik Anderson, Roger Hsiao, Hansung Kim, Richard Yan 7 |
8 |9 | Department of Electrical Engineering and Computer Science 10 |
11 |12 | College of Engineering, University of California, Berkeley 13 |
14 | 15 | ## Overview 16 | 17 | Like last week, this lab has two parts. For the first part, we will continue to develop our GCD 18 | coprocessor by improving its performance. After that, we will continue the physical design flow by 19 | performing routing. 20 | 21 | To begin this lab, get the project files and set up your environment by typing the following command and sourcing the `eecs151.bashrc` file, as usual: 22 | 23 | ```shell 24 | source /home/ff/eecs151/asic/eecs151.bashrc 25 | ``` 26 | 27 | ```shell 28 | git clone /home/ff/eecs151/labs/lab5.git 29 | cd lab5 30 | ``` 31 | 32 | ## Design 33 | 34 | One way we can improve the performance of our GCD coprocessor is by parallelizing the compute. 35 | We can do this by including multiple GCD units in our design, and routing traffic to them as they 36 | become available. 37 | 38 | You will find that the solution to last week’s lab (`fifo.v` and `gcd_coprocessor.v`) is included. The 39 | test has been modified to check the total number of cycles taken by the coprocessor to complete the 40 | tests. Run `make sim-rtl` to run the new testbench on the solution code. Take note of the number 41 | of cycles that the tests take without modification, as you will need it to calculate your speedup. 42 | 43 | Your task is to edit `gcd_coprocessor.v` to improve the performance below 225 cycles. We will do 44 | this by using two instances of GCD. 45 | 46 | You will find RTL that connects the datapath and controller into one module in `gcd_unit.v`. You 47 | may find this useful when refactoring the `gcd_coprocessor`, since you will need fewer wires to place 48 | both GCD instances. 49 | 50 | You will also find stub code for an arbiter, which you should complete. We will use the arbiter 51 | to route traffic to GCD units and preserve the response ordering. 
Most of your design can be 52 | implemented with combinational logic, but you will need some state to remember which GCD 53 | block contains the earliest data to preserve ordering. 54 | 55 | --- 56 | ## Question 1: Design 57 | 58 | **a.) Submit your code (`gcd_coprocessor.v` and `gcd_arbiter.v`) with your lab assignment.** 59 | 60 | **b.) How many cycles did your simulation take? What was the % speedup?** 61 | 62 | ## Checkoff 1: Design 63 | Demonstrate your simulation's functionality and explain your design/approach. 64 | 65 | --- 66 | 67 | ## Automated Flow 68 | 69 | In the last lab, we only focused on the PAR flow through CTS. In this lab, we will go through the full flow. 70 | Routing is the next major flow step. Prior to the actual routing step, Innovus uses a 71 | basic routing engine with errors and shorts, but ignores these errors and simply tries to 72 | get an estimate of delays and parasitics. Once post-CTS 73 | optimization is done, it switches to a different tool that actually legalizes routing and tries to eliminate 74 | shorts while meeting timing. Routing is one of the most 75 | computationally heavy tasks of digital IC design and can take days to complete for complicated designs. 76 | This will be reflected in the runtime in this lab. 77 | 78 | After routing is complete, a post-Route optimization is run to ensure no timing violations 79 | remain. Post-Route optimization typically has little freedom to move cells around, and it tries to 80 | meet the timing constraints mostly by tweaking the length of the routings. You may see some DRC 81 | (Design Rule Check) errors caused by the 7nm technology library, after routing. 
82 | 83 | First, synthesize the design: 84 | 85 | ```shell 86 | make syn 87 | ``` 88 | 89 | Then, simulate the synthesized design to make sure it still works: 90 | 91 | ```shell 92 | make sim-gl-syn 93 | ``` 94 | 95 | Once your synthesized design passes the test, you can start the PAR flow: 96 | 97 | ```shell 98 | make par 99 | ``` 100 | 101 | The PAR command will take a long time to complete, as it runs through all stages of PAR. 102 | Check out the iterations that Innovus runs through during optimization. You can see some of the metrics that Innovus is using. 103 | Once it completes, take a look at the build directory as in the previous labs. You might see additional files 104 | compared to the `syn-rundir`, and that’s because the PAR flow incorporates the RC and parasitic delays, in addition to the cell delays. Open `build/par-rundir/gcd_coprocessor.PVT_0P63V_100C.par.spef` 105 | and search for the first occurrence of `D_NET`. What does it say about the first net? You may find 106 | [this wiki page](https://en.wikipedia.org/wiki/Standard_Parasitic_Exchange_Format#Parasitics) helpful. *(thought experiment #1: get a sense of the units at the top and orders of magnitude of the RC parasitics in the SPEF file. If we used a 5nm technology library, do you expect the resistance to generally increase or decrease? How about the capacitance?)* 107 | 108 | ## Question 2: Automated Flow 109 | 110 | a.) Check the post-Synthesis timing report 111 | (`syn-rundir/reports/final_time_PVT_0P63V_100C.setup_view.rpt`) and post-PAR timing report (`par-rundir/timingReports/gcd_coprocessor_postRoute_all.tarpt`). 112 | **What are the critical paths of your post-PAR and post-Synthesis designs?** 113 | **Are they the same path?** 114 | **How does this critical path compare to your single-unit critical path?** 115 | 116 | b.) 
Iterate on your design by modifying `design.yml` to find a rough estimate (no need to be too 117 | precise) for the clock period until you start running into setup errors. 118 | **Given the number of cycles it takes to complete the testbench, what is the shortest time your design can finish the computation?** 119 | 120 | c.) Open the post-CTS timing report(`par-rundir/hammer_cts_debug/hammer_cts_all.tarpt`) and the post-PAR 121 | timing report(`par-rundir/timingReports/gcd_coprocessor_postRoute_all.tarpt`). 122 | **Find a common path (same start and end sequential elements). What differences do you notice within the paths?** 123 | 124 | --- 125 | 126 | ## Innovus Commands 127 | 128 | As in the previous lab, we will look at the contents of `par.tcl` that Hammer generates and follow 129 | along using Innovus. 130 | *(thought experiment #2 : open the `par.tcl` and search for the command `set_db add_fillers_cells`. Based on the names of the cells specified by this command, what do you think is the function of the filler cells?)* 131 | 132 | Navigate to the directory `build/par-rundir` and type: 133 | 134 | ```shell 135 | innovus -common_ui 136 | ``` 137 | 138 | This will open the Innovus shell. Next, type `read_db gcd_coprocessor_FINAL` to load the current design 139 | database from the latest PAR flow. This will help us to avoid re-running the entire flow. To see 140 | all the reporting commands, type `help report*` in the Innovus shell and read through the options 141 | available to you. 142 | 143 | ## Checkoff 2: Innovus Commands 144 | Explain the PAR flow, or ask some questions about any steps you don't understand. 145 | 146 | --- 147 | ## Question 3: Innovus Reports 148 | 149 | **a.) What is the area consumed by your design?** 150 | **What percentage of the total area does the arbiter occupy?** 151 | 152 | **b.) 
Submit a screenshot of your setup slack histogram.** 153 | **Compared with the histogram you obtained in Lab 4, does your new slack distribution support the observed performance improvements you obtained in your coprocessor?** 154 | 155 | --- 156 | 157 | After you are done with the flow, it is time to simulate our newly printed post-PAR netlist. Type 158 | the following command: 159 | 160 | ```shell 161 | make sim-gl-par 162 | ``` 163 | 164 | This will use the same testbench, but will now use the post-PAR netlist of your design, backannotated with delays and parasitics from PAR. Make sure to adjust the `CLOCK_PERIOD` variable in `sim-gl-par.yml` to match the clock period you obtained from PAR. Note, however, that the exact 165 | clock period may not work and you may need to relax it slightly. 166 | 167 | After running `make sim-gl-par` you can run power analysis using: 168 | 169 | ```shell 170 | make power-par 171 | ``` 172 | 173 | Navigate to `build/power-rundir/activePowerReports.PVT_0P63V_100C.setup_view/` and open `power.rpt`. Do 174 | the power estimation numbers match your expectation? 175 | 176 | --- 177 | ## Question 4: Trade-offs 178 | 179 | **a.)** Re-run the flow using your old design. 180 | To prevent your `build` directory from being overwritten, set the `OBJ_DIR` Make variable to a different name (i.e. `make par OBJ_DIR=build2`). 181 | Using the area and power values from Innovus, 182 | **how does the performance improvement from the dual-unit design compare to area occupation and power consumption increase compared to your old design?** 183 | 184 | **b.)** Modify your `gcd_coprocessor.v` to take an input parameter in terms of number of clock cycles we 185 | want our design to meet (`parameter TARGET_NUMBER_OF_CYCLES`) for this given testbench. Your 186 | code should generate a low area, low power design if the number is greater than that your simple 187 | gcd coprocessor can achieve, and it should generate the dual-unit design if it is lower. 
188 | **Submit your code.** 189 | 190 | Hint: Use the `verilog` `generate` syntax for choosing between designs. See [here](https://www.chipverify.com/verilog/verilog-generate-block) for documentation on how to use the `generate` syntax. 191 | 192 | c.) (Optional) Using a rough estimate of target number of cycles versus number of units in the design, 193 | write a code that will generate 1-8 cores depending on the performance demand. Do NOT do this 194 | by writing out every possible case explicitly. You can limit the number of units to powers of two 195 | (1,2,4,8) if it makes your life easier. 196 | 197 | --- 198 | 199 | ## Lab Deliverables 200 | 201 | ### Lab Due 11:59 PM, 2 weeks after your registered lab section. (Oct. 17 for lab section 1) 202 | 203 | - Submit a written report with all 4 questions answered to Gradescope 204 | - Checkoff with an ASIC lab TA 205 | 206 | ## Acknowledgements 207 | 208 | This lab is the result of the work of many EECS151/251 GSIs over the years including: 209 | 210 | - Nathan Narevsky (2014, 2017) 211 | - Brian Zimmer (2014) 212 | - Cem Yalcin (2019) 213 | 214 | Modified By: 215 | - John Wright (2015,2016) 216 | - Ali Moin (2018) 217 | - Arya Reais-Parsi (2019) 218 | - Cem Yalcin (2019) 219 | - Tan Nguyen (2020) 220 | - Harrison Liew (2020) 221 | - Sean Huang (2021) 222 | - Daniel Grubb, Nayiri Krzysztofowicz, Zhaokai Liu (2021) 223 | - Dima Nikiforov (2022) 224 | - Roger Hsiao (2022) 225 | -------------------------------------------------------------------------------- /project/checkpoint4.md: -------------------------------------------------------------------------------- 1 | # EECS 151/251A ASIC Project Specification: Checkpoint 4 2 |3 | Prof. Sophia Shao 4 |
5 |6 | TAs (ASIC): Dima Nikiforov 7 |
8 |9 | Department of Electrical Engineering and Computer Science 10 |
11 |12 | College of Engineering, University of California, Berkeley 13 |
14 | 15 | --- 16 | 17 | ## 1 Synthesis, PAR, & Power 18 | 19 | ### 1.1 Performing Synthesis and PAR 20 | Make sure your design is backed up at this point. 21 | 22 | The setup for Synthesis and PAR is similar to what we have used in the labs during the class, 23 | with some formatting differences. In `par.yml`, there is extra guidance 24 | on how to write placement constraints. Based on how you implemented the caches from the previous 25 | checkpoint, you will need to modify these constraints to match the master SRAM cell used as well as 26 | the path. Now you should be ready to proceed to Synthesis and PAR. As in the previous labs, execute 27 | the following: 28 | ```shell 29 | export HAMMER_HOME=$PWD/hammer 30 | source hammer/sourceme.sh 31 | ``` 32 | Before simulating, first build the SRAM libraries with the command: 33 | ``` 34 | make srams 35 | ``` 36 | If you want to make sure the RTL has been pointed to correctly, you can try running the asm tests 37 | from this environment. To do so, type the following command: 38 | ``` 39 | make sim-rtl test_asm=all 40 | ``` 41 | The command generates the `simv` file, which is the simulation executable, then iterates through all 42 | the asm tests using the root `Makefile`. If everything looks fine, you can proceed to Synthesis and 43 | PAR: 44 | ``` 45 | make clean 46 | make srams 47 | make syn 48 | make par 49 | ``` 50 | If everything went smoothly, you should now have a circuit laid out. To view the layout, go to the 51 | `build/par-rundir/` directory and type: 52 | ``` 53 | ./generated_scripts/open_chip 54 | ``` 55 | You are expected to record and document your area, power, and clock frequency performance (as 56 | determined by your critical path). 
To verify that your design works after PAR, use the following command: 57 | ``` 58 | make sim-gl-par test_asm=all 59 | ``` 60 | Some final notes: 61 | * You may also want to make sure that the post-synthesis netlist passes the tests before moving on to post-PAR simulation, because the latter is slower and any PAR-related failures you may have (e.g. incomplete wiring of signal or clock nets 62 | due to a bad floorplan) will complicate your debugging. 63 | * There is a new constraint added to `syn.tcl` under the key `vlsi.inputs.delays`. 64 | The external memory model in `riscv_test_harness.v` generates a delayed version of the signals 65 | going into your CPU (see the parameter `INPUT_DELAY`). Annotating these delays for synthesis/PAR 66 | is necessary in order to capture this effect when the tools perform timing analysis. If you are 67 | curious, this gets translated into `build/syn-rundir/pin_constraints_fragment.sdc` 68 | as input to synthesis. After synthesis, the relevant pin delays are encoded in 69 | `build/syn-rundir/riscv_top.mapped.sdc`. These are Synopsys Design Constraints 70 | format files. Do not touch this delay constraint except to update the value to your clock period 71 | divided by 5. 72 | * As described in Lab 6, the ASAP7 dummy SRAMs do not have complete timing information. 73 | This is most apparent in gate-level simulations because the SRAMs do not provide any SDF 74 | timing annotation. You may find that, despite meeting timing in synthesis and PAR, you 75 | need to increase the gate-level simulation clock period for the benchmarks to pass. 76 | 77 | ### 2 Checkpoint 4 Deliverables 78 | *Checkoff due: May 6 (Friday), 2022* 79 | 80 | 81 | 1. Show that all of the assembly tests and the final benchmark pass using the cache in a post-PAR simulation 82 | 83 | 2. Show your layout, and explain your design considerations when creating the floorplan 84 | 85 | 3. 
Show your final pipeline diagram, updated to match the code 86 | 87 | --- 88 | 89 | ## 3 Beyond Checkpoint 4: CPU Optimization 90 | 91 | ### 3.1 Optimizing for frequency 92 | Beyond functionality, your final project grade will be determined by the maximum operating frequency 93 | of your processor, determined by the critical path. You will also want to optimize for the number of 94 | cycles that your processor takes to execute certain programs, more on that later. The critical path will 95 | be dependent on how aggressively you ask the tools to optimize the design, by changing the target clock 96 | period in the `syn.yml` file. 97 | 98 | When Innovus is finished, look at the timing report for the critical path. In some cases, it is possible 99 | to modify your Verilog to improve the critical path by moving pipeline stage registers. However in other 100 | cases, timing can only be improved by tweaking settings in `syn.yml` and `par.yml`. 101 | 102 | Be sure to backup (meaning check in or branch) your working design before attempting to move 103 | logic, because functionality is worth much more of your grade than maximum frequency. 104 | 105 | You are allowed to add additional pipeline stages, but remember that you will need to deal with the additional hazards that accompany them. 106 | Be careful that adding additional stages does not increase the overall execution. 107 | Your final performance metric is not only based on the clock speed at which your design will run, so keep 108 | that in mind before heavily modifying your design. 109 | 110 | Note for bonus grading: due to the SRAM timing issue described above, the maximum frequency 111 | you achieved in PAR (not gate-level simulation) is most accurate and should be what you report for 112 | frequency. 113 | 114 | ### 3.2 Optimizing for number of cycles 115 | We are providing you tests that are the output of example C programs to run for your processor. 
They 116 | are meant to be representative examples of different types of programs, each of which may take 117 | extra cycles to execute for different reasons, including but not limited to cache 118 | misses and branch/jump stalls. A more complicated cache structure may be able to reduce some of the 119 | time spent waiting for memory accesses, but it may not be optimal for all cases. If you implement a 120 | configurable cache, you are allowed to set the cache settings differently on a per-test basis, but you will need 121 | to add those pins to the top-level Riscv151 file as well as to the testbench, with compile flags for VCS. In 122 | terms of dealing with branching and jumping, you can implement any type of branch predictor that you 123 | want to. A branch predictor in its simplest form will always choose to take (or not take) the branch and 124 | then figure out if it was incorrect, and if so, go back to where the instruction memory should have gone, 125 | making sure that any additional instructions that were started do not change the state of the CPU. This 126 | means that there should be no writes to memory or any registers for those instructions. 127 | 128 | The list of final tests is contained within the Makefile under the variable `bmark_tests`, which 129 | includes a few tests that are meant to actually test the performance of your design. These tests are longer 130 | C programs that are meant to test different aspects of your design and how you handle different types 131 | of hazards. To run these longer tests, you can run the following command, as in checkpoint #3: 132 | ``` 133 | make sim-rtl test_bmark=all 134 | ``` 135 | You may need to increase the timeout (in number of cycles) for some of the longer tests (like sum, 136 | replace and cachetest) to pass.
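Sections 3.1 and 3.2 pull in different directions: an extra pipeline stage can shorten the clock period while adding stall cycles. A rough way to compare two design points is to multiply the cycle count by the clock period. The sketch below assumes integer clock periods in picoseconds; the function name `exec_time_ps` and the example numbers are hypothetical, not measurements from this project.

```shell
#!/usr/bin/env bash
# exec_time_ps CYCLES PERIOD_PS: total execution time in picoseconds.
# (Hypothetical helper; integer arithmetic only.)
exec_time_ps() {
    echo $(( $1 * $2 ))
}

# Hypothetical design points (placeholders, not real results):
base=$(exec_time_ps 10000 1000)   # baseline: 10000 cycles at a 1000 ps period
deep=$(exec_time_ps 11500 900)    # extra stage: more stall cycles, faster clock

echo "baseline: ${base} ps, deeper pipeline: ${deep} ps"
if [ "$deep" -lt "$base" ]; then
    echo "the deeper pipeline finishes sooner"
else
    echo "the extra stalls outweigh the faster clock"
fi
```

With these placeholder numbers the faster clock does not make up for the added cycles, which is the situation the warning in section 3.1 describes.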
137 | 138 | ### 3.3 Optimizing for power 139 | **DISCLAIMER:** The infrastructure to do power analysis in this project is very different from the gate-level 140 | simulation and power analysis you have done so far. Doing this optimization is *purely optional* and should only be 141 | tried after you can pass the benchmarks normally! **Proceed at your own risk!** 142 | 143 | You can also find out the power consumption of your processor for the various provided benchmarks. 144 | The value of this is to figure out whether the way you wrote your logic is efficient 145 | and avoids extra switching activity. Simplifying instruction decode logic, forwarding paths, etc. can result 146 | in lower power consumption! 147 | 148 | Near the bottom of `sim-gl-par.yml`, you will see a few lines: 149 | ``` 150 | execute_sim: false 151 | # Below is for power analysis. See the spec for instructions! 152 | # execution_flags_append: 153 | # - "+loadmem=../../tests/asm/addi.hex" 154 | # - "+max-cycles=10000" 155 | ``` 156 | If you reverse the comments (i.e. comment out `execute_sim: false` and uncomment the 157 | rest), this tells Hammer to run the `simv` executable with the addi test, instead of having the `Makefile` 158 | in the root folder run the executable. This is currently the only way that we can get Hammer 159 | to generate the SAIF file with our benchmark hex files. To proceed with the simulation of addi in this 160 | case: 161 | ``` 162 | make sim-gl-par test_asm=addi.out 163 | ``` 164 | You will find that it runs the simulation twice due to how the root `Makefile` is configured. The 165 | first one should pass (after printing every cycle number), while the second one should also pass 166 | like you have seen so far; ignore this second simulation. You should now see a `ucli.saif` file in 167 | `build/sim-rundir`. 
Then, as in previous labs, run Voltus: 168 | ``` 169 | make power-par 170 | ``` 171 | And you should get static and dynamic power reports in `build/power-rundir`. 172 | 173 | Some closing recommendations: 174 | 175 | * This infrastructure only allows us to run one benchmark at a time. To run a different benchmark, 176 | replace the hex file in the `execution_flags_append` list, and alter the `max-cycles` value 177 | as necessary (see the `*_timeout_cycles_variables` in the root Makefile for the numbers). 178 | * Due to the ASAP7 PDK’s dummy SRAMs, we can’t measure SRAM power, and thus can’t find 179 | out how power-efficient our caching is. Therefore, the best benchmarks to run would be an 180 | arithmetic-heavy one that relies heavily on the register file (but the provided benchmarks require 181 | memory accesses). If you have lots of time on your hand, we encourage you to find power 182 | numbers for the **final** benchmark, but **you will not be graded on power performance.** 183 | 184 | --- 185 | 186 | 187 | ## Acknowledgement 188 | 189 | This project is the result of the work of many EECS151/251 GSIs over the years including: 190 | Written By: 191 | - Nathan Narevsky (2014, 2017) 192 | - Brian Zimmer (2014) 193 | Modified By: 194 | - John Wright (2015,2016) 195 | - Ali Moin (2018) 196 | - Arya Reais-Parsi (2019) 197 | - Cem Yalcin (2019) 198 | - Tan Nguyen (2020) 199 | - Harrison Liew (2020) 200 | - Sean Huang (2021) 201 | - Daniel Grubb, Nayiri Krzysztofowicz, Zhaokai Liu (2021) 202 | - Dima Nikiforov (2022) 203 | -------------------------------------------------------------------------------- /project/overview.md: -------------------------------------------------------------------------------- 1 | # EECS 151/251A ASIC Project Specification RISC-V Processor Design: Overview 2 |3 | Prof. Sophia Shao 4 |
5 |6 | TAs (ASIC): Dima Nikiforov 7 |
8 |9 | Department of Electrical Engineering and Computer Science 10 |
11 |12 | College of Engineering, University of California, Berkeley 13 |
14 | ## 1. Introduction 15 | 16 | The primary goal of this project is to familiarize students with the methods and tools of digital design. In order to make the project both interesting and useful, we will guide you through the implementation of a CPU that is intended to be integrated on a modern SoC. Working alone or in teams of 2, you will be designing a simple 3-stage CPU that implements the RISC-V ISA, developed here at UC Berkeley. If you work in a team, you must both have a complete understanding of your entire project code, and you will both receive the same grade. 17 | 18 | Your first and most important goal is to write a functional implementation of your processor. To better expose you to real design decisions, you will also be tasked with improving the performance of your processor. You will be required to meet a minimum performance to be specified later in the project. 19 | 20 | You will use Verilog HDL to implement this system. You will be provided with some testbenches to verify your design, but you will be responsible for creating additional testbenches to exercise your entire design. Your target implementation technology will be the ASAP7 7nm Educational PDK, a predictive model technology used for instruction. The project will give you experience designing synthesizable RTL (Register Transfer Level) code, resolving hazards in a simple pipeline, building interfaces, and approaching system-level optimization. 21 | 22 | Your first step will be to map our high-level specification to a design which can be translated into a hardware implementation. You will then generate and debug that implementation in Verilog. These steps may take significant time if you do not put effort into your system architecture before attempting implementation. After you have built a working design, you will be optimizing it for speed in the 7nm technology that we have been using this semester.
23 | 24 | 25 | ### 1.1 RISC-V 26 | The final project for this class will be a VLSI implementation of a RISC-V (pronounced risk-five) CPU. RISC-V is an instruction set architecture (ISA) developed here at UC Berkeley. It was originally developed for computer architecture research and education purposes, but recently there has been a push towards commercialization and industry adoption. For the purposes of this lab, you don’t need to delve too deeply into the details of RISC-V. However, it may be good to familiarize yourself with it, as this will be at the core of your final project. Check out the official [RISC-V Instruction Set Manual](https://riscv.org/technical/specifications/) (Volume 1, Unprivileged Spec) and explore http://riscv.org for more information. 27 | - Read sections 2.2 and 2.3 to understand how the different types of instructions are encoded. 28 | - Read sections 2.4, 2.5, 2.6, and 9.1 and think about how each of the instructions will use the ALU 29 | 30 | ### 1.2 Project phases 31 | Your project will consist of two different phases: front-end and back-end. Within each phase, you will have multiple checkpoints that will ensure you are making consistent progress. These checkpoints will contribute (although not significantly) to your final grade. You are free to make design changes after they have been checked off. 32 | 33 | In the first phase (front-end), you will design and implement a 3-stage RISC-V processor in Verilog, and run simulations to test for functionality. At this point, you will only have a functional description of your processor that is independent of technology (there are no standard cells yet). You are highly encouraged to finish each checkpoint early, and each checkpoint will be released before the due date of the ongoing one. Everything will take much longer than you expect, and finishing early gives you more time to improve your QoR (Quality of Results, e.g. clock period). 
34 | 35 | In the second phase (back-end), you will implement your front-end design in the ASAP7 7nm kit using the VLSI tools you used in lab. When you have finished phase 2, you will have a design that could move onto fabrication if this were a real technology process. You will have about 2 weeks to complete the second phase after its release. 36 | 37 | ### 1.3 Philosophy 38 | This document is meant to describe a high-level specification for the project and its associated support hardware. You can also use it to help lay out a plan for completing the project. As with any design you will encounter in the professional world, we are merely providing a framework within which your project must fit. 39 | 40 | You should consider the GSI(s) a source of direction and clarification, but it is up to you to produce a fully-functional design and its physical implementation. Ultimately the responsibility of designing and debugging your solution lies on you. 41 | 42 | ### 1.4 General Project Tips 43 | Be sure to use top-down design methodologies in this project. We began by taking the problem of designing a basic computer system, modularizing it into distinct parts, and then refining those parts into manageable checkpoints. You should take this scheme one step further; we have given you each checkpoint, so break each into smaller, manageable pieces. 44 | 45 | As with many engineering disciplines, digital design has a normal development cycle. In the norm, after modularizing your design, your strategy should roughly resemble the following steps: 46 | 47 | - **Design** your modules well, make sure you understand what you want before you begin to code. 48 | 49 | - **Code** exactly what you designed; do not try to add features without redesigning. 50 | 51 | - **Simulate** thoroughly; writing a good testbench is as much a part of creating a module as actually coding it. 52 | - **Debug** completely; anything which can go wrong with your implementation will. 
53 | 54 | Some general tips when designing complex RTL modules: 55 | 56 | * Document your project thoroughly as you go 57 | * comment your Verilog 58 | * before making any RTL changes, **modify your pipeline diagram first to visualize this change**, doing this: 59 | * may reveal the change is actually infeasible 60 | * ensures that you and your partner have the same view of your processor's operation 61 | * Split the module operation into data/control paths and design each separately 62 | * Start with the simplest possible implementation 63 | * Make changes incrementally and always test your module after each change, no matter how small 64 | * Finish the required features first before attempting any extra features 65 | * Use github version control features like commits, branches, etc. 66 | * Save your work often and rely on redundancy (e.g. copy files from `/scratch` to your home directory often to ensure they're backed up) 67 | * Parallelize work as much as possible (e.g. start writing CPU RTL as you finish your diagram, work on CPU and Cache in parallel, start physical design as you finish your cache) 68 | 69 | 70 | This project is divided into checkpoints. Each checkpoint will be due 1 to 2 weeks after its release, but the next checkpoint will be released early. Use this to your advantage- try to get ahead so that you have additional time to debug. Your TA will clarify the specific timeline for your semester. 71 | 72 | The most important goal is to design a functional processor- this alone is 50-60% of the final grade, and you must have it **working completely** to receive any credit for performance. 73 | 74 | --- 75 | 76 | ## 2. Front-end design (Phase 1) 77 | 78 | The first phase in this project is designed to guide the development of a three-stage pipelined RISC-V CPU that will be used as a base system for your back-end implementation. 79 | Phase 1 will last for 5 weeks and has weekly checkpoints. 
80 | 81 | - Checkpoint 1: ALU design and pipeline diagram 82 | - Checkpoint 2: Core implementation 83 | - Checkpoint 3: Core + memory system implementation 84 | - Checkpoint 4: Physical Design 85 | 86 | 87 | ### 2.1 Adding SSH Key 88 | First you must add an SSH key to your Github account, to allow you to push to your project repo from the instructional machines without entering your Github password each time. You may run these commands in any location on any instructional machine (the SSH key will be stored in your home directory and thus work on all machines). 89 | ```shell 90 | ssh-keygen -t ed25519 -C "your_email@example.com" 91 | # hit Enter to each prompt (leave response blank) 92 | cat ~/.ssh/id_ed25519.pub 93 | # Then select and copy the contents of the id_ed25519.pub file 94 | # displayed in the terminal to your clipboard 95 | ``` 96 | 97 | In your browser, navigate to [https://github.com/settings/ssh/new](https://github.com/settings/ssh/new) (log into your Github account if needed). You should see the `SSH Keys / Add New` page. Enter the following values: 98 | * Title: `something descriptive (ex. eecs151)` 99 | * Key: `paste the contents of the id_ed25519.pub file` 100 | 101 | Then click the green `Add SSH key` button. 102 | 103 | ### 2.2 Project Git Repo 104 | The skeleton files for the project will be delivered as a git repository provided by the staff. You should clone this repository as follows. It is highly recommended to familiarize yourself with git and use it to manage your development. 105 | 106 | 107 | ```shell 108 | git clone /home/ff/eecs151/labs/project_skeleton /path/to/my/project 109 | ``` 110 | 111 | To get a team repo, fill out the google form via the link on Piazza with your team information. Please do this even if you are working alone, as these git repos will be used for version control and as part of the final checkoff. 
You will receive an email with an invite link to your project repo, which you should click to join before following the directions below. 112 | 113 | An example workflow for pulling from the skeleton and for pushing to and pulling from your team repository is shown below: 114 | 115 | 116 | ```shell 117 | cd /path/to/my/project 118 | git remote add myOrigin git@github.com:EECS150/fa21_asic_teamXX 119 | ``` 120 | 121 | Then, to pull changes from the skeleton, you would type: 122 | ```shell 123 | git pull origin master 124 | ``` 125 | 126 | To pull changes from your team repository, you would type: 127 | ```shell 128 | git pull myOrigin master 129 | ``` 130 | 131 | To push changes to your team repository (please do not attempt to push to the skeleton repository), you would usually pull first (as above) and then type: 132 | ```shell 133 | git push myOrigin master 134 | ``` 135 | 136 | --- 137 | 138 | ## 3. Grading 139 | 140 | ### EECS 151: 141 | | | | 142 | |-------------------|---------| 143 | | **70%** | Functionality at project due date: Your design will be subjected to a comprehensive test suite and your score will reflect how many of the tests your implementation passes. 144 | | **25%** | Final Report and Final Interview: If your design is not 100% functional, this is your opportunity to explain your bugs and recoup points. 145 | | **5%** | Checkpoints: Each check-off is worth 1.25%. If you complete all of your checkpoints on time, you will receive full credit in this category. 146 | | **Bonus 5%** | Performance at project due date: You must have a fully working design to score points in this section. You will receive up to 5 bonus points as your performance improves relative to your peers.
Performance will be calculated using the Iron Law: IPC * F 147 | 148 | ### EECS 251A: 149 | | | | 150 | |-----------------|---------| 151 | | **60%** | Functionality at project due date: Your design will be subjected to a comprehensive test suite and your score will reflect how many of the tests your implementation passes. 152 | | **10%** | Set-Associative Cache: Implementation and performance of the configurable set-associative cache. 153 | | **25%** | Final Report and Final Interview: If your design is not 100% functional, this is your opportunity to explain your bugs and recoup points. 154 | | **5%** | Checkpoints: Each check-off is worth 1.25%. If you complete all of your checkpoints on time, you will receive full credit in this category. 155 | | **Bonus 5%** | Performance at project due date: You must have a fully working design to score points in this section. You will receive up to 5 bonus points as your performance improves relative to your peers. Performance will be calculated using the Iron Law: IPC * F 156 | 157 | ## Acknowledgement 158 | 159 | This project is the result of the work of many EECS151/251 GSIs over the years, including: 160 | Written By: 161 | - Nathan Narevsky (2014, 2017) 162 | - Brian Zimmer (2014) 163 | Modified By: 164 | - John Wright (2015, 2016) 165 | - Ali Moin (2018) 166 | - Arya Reais-Parsi (2019) 167 | - Cem Yalcin (2019) 168 | - Tan Nguyen (2020) 169 | - Harrison Liew (2020) 170 | - Sean Huang (2021) 171 | - Daniel Grubb, Nayiri Krzysztofowicz, Zhaokai Liu (2021) 172 | - Dima Nikiforov (2022) 173 | -------------------------------------------------------------------------------- /project/checkpoint1.md: -------------------------------------------------------------------------------- 1 | # EECS 151/251A ASIC Project Specification: Checkpoint 1 2 |3 | Prof. Sophia Shao 4 |
5 |6 | TAs (ASIC): Dima Nikiforov 7 |
8 |9 | Department of Electrical Engineering and Computer Science 10 |
11 |12 | College of Engineering, University of California, Berkeley 13 |
14 | 15 | --- 16 | 17 | ## ALU design and pipeline diagram 18 | The ALU that we will implement in this lab is for a RISC-V instruction set architecture. Pay close attention to the design patterns and how the ALU is intended to function in the context of the RISC-V processor. In particular it is important to note the separation of the datapath and control used in this system which we will explore more here. 19 | 20 | The specific instructions that your ALU must support are shown in the tables below. The branch condition should not be calculated in the ALU. Depending on your CPU implementation, your ALU may or may not need to do anything for branch, jump, load, and store instructions (i.e., it can just output 0). 21 | 22 | --- 23 | 24 | ### 1. Making a pipeline diagram 25 | 26 | 27 | The first step in this project is to make a pipeline diagram of your processor, as described in lecture. You only need to make a diagram of the datapath (not the control). Each stage should be clearly separated with a vertical line, and flip-flops will form the boundary between stages. It is a good idea to name signals depending on what stage they are in (eg. `s1_killf`, `s2_rd0`). Also, it is a good idea to separately name the input/output (D/Q) of a flip flop (eg. `s0_next_pc`, `s1_pc`). Draw your diagram in a drawing program (Inkscape, Google Drawings, draw.io or any program you want), because you will need to keep it up-to-date as you build your processor. As such, we recommend you leave plenty of space between diagram elements to make it easier to insert changes as your project evolves. 28 | It helps to print out scratch copies while you are debugging your processor and to keep your drawings revision-controlled with git. Once you have finished your initial datapath design, you will implement the main building block in the datapath—the ALU. 29 | 30 | --- 31 | 32 | ### 2. 
ALU functional specification 33 | Given specifications about what the ALU should do, you will create an ALU in Verilog and write a test harness to test the ALU. 34 | 35 | The encoding of each instruction is shown in the table below. There is a detailed functional description of each of the instructions in Section 2.4 of the [RISC-V Instruction Set Manual](https://riscv.org/technical/specifications/) (Volume 1, Unprivileged Spec). Pay close attention to the functional description of each instruction as there are some subtleties. 36 | 37 |
38 |
39 |
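Two of the subtleties mentioned above are easy to get wrong: `SRA` versus `SRL`, and `SLT` versus `SLTU`. The sketch below is a small software reference model that could be used to produce expected values for an ALU testbench. It is not part of the provided skeleton, the function names are hypothetical, and it assumes a shell whose arithmetic is at least 64 bits wide. Operands are treated as 32-bit values, and only the low 5 bits of the shift amount are used, as the RISC-V spec describes.

```shell
# Hypothetical reference model for a few RV32I ALU operations.
# Operands are treated as 32-bit values held in the shell's wider integers.

mask32() { echo $(( $1 & 0xFFFFFFFF )); }

# Interpret the low 32 bits of $1 as a two's-complement signed number.
to_signed32() (
  v=$(( $1 & 0xFFFFFFFF ))
  if [ "$v" -ge 2147483648 ]; then
    echo $(( v - 4294967296 ))
  else
    echo "$v"
  fi
)

# SRL: logical right shift, zeros are shifted in from the left.
srl() { echo $(( ($1 & 0xFFFFFFFF) >> ($2 & 31) )); }

# SRA: arithmetic right shift, the sign bit is replicated from the left.
sra() (
  v=$(( $1 & 0xFFFFFFFF ))
  n=$(( $2 & 31 ))
  r=$(( v >> n ))
  if [ "$v" -ge 2147483648 ]; then
    # Set the top n bits that the logical shift left as zero.
    r=$(( r | (0xFFFFFFFF ^ (0xFFFFFFFF >> n)) ))
  fi
  echo "$r"
)

# SLT compares as signed, SLTU as unsigned; both produce 1 or 0.
slt() ( [ "$(to_signed32 "$1")" -lt "$(to_signed32 "$2")" ] && echo 1 || echo 0 )
sltu() ( [ "$(mask32 "$1")" -lt "$(mask32 "$2")" ] && echo 1 || echo 0 )

printf 'srl=0x%08x sra=0x%08x\n' "$(srl 0x80000000 4)" "$(sra 0x80000000 4)"
echo "slt=$(slt 0xFFFFFFFF 1) sltu=$(sltu 0xFFFFFFFF 1)"
```

The first line of output shows the two right shifts diverging on an operand whose sign bit is set. The second shows the same bit pattern comparing below 1 when read as signed and above it when read as unsigned.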
4 | Prof. Sophia Shao 5 |
6 |7 | TAs (ASIC): Dima Nikiforov 8 |
9 |10 | Department of Electrical Engineering and Computer Science 11 |
12 |13 | College of Engineering, University of California, Berkeley 14 |
15 | 16 | ## Overview 17 | 18 | The process of VLSI design is different than developing software, designing analog circuits, and even FPGA-based design. Instead of using a single graphical user interface (GUI) or environment (eg. Eclipse, Cadence Virtuoso, or Xilinx Vivado), VLSI design is done using dozens of command line interface tools on a Linux machine. These tools primarily use text files as their inputs and outputs, and include GUIs mainly for only visualization, rather than design. Therefore, familiarity with Linux, text manipulation, and scripting is required to successfully complete the labs this semester. 19 | 20 | The goal of this lab is to introduce some basic techniques needed to use the computer aided design (CAD) tools that are taught in this class. Mastering the topics in this lab will help you save hours of time in later labs and make you a much more efficient chip designer. While you go through this lab, focus on how these techniques will allow you to automate tasks and improve your efficiency. Chip design requires plenty of iteration, so being able to perform trials and identify errors quickly is key to success. 21 | 22 | ## Administrative Info 23 | 24 | This lab, like all labs will be turned in electronically using Gradescope. Please upload a pdf document with the answers to the six questions in the lab. 25 | 26 | ### Getting an Instructional Account 27 | 28 | You are required to get an EECS instructional account to login to the workstations in the lab, since you will be doing all your work on these machines (whether you're working remotely or in-person). This can be done by using WebAcct here: http://inst.eecs.berkeley.edu/webacct. 29 | 30 | Once you login using your CalNet ID, you can click on 'Get a new account' in the eecs151 row. Once the account has been created, you can email your class account form to yourself to have a record of your account information. 
You can follow the instructions on the emailed form to change your Linux password by running `ssh update.eecs.berkeley.edu` and following the prompts. 31 | 32 | ## Logging into the Classroom Servers 33 | 34 | The servers used for this class are primarily `eda-[1-11].eecs.berkeley.edu`. You may also use the `c111-[1-17].eecs.berkeley.edu` machines 35 | (which are physically located in Cory 111/117), although those will be shared with the FPGA lab. You can access all of these machines remotely through SSH. 36 | 37 | ### Remote Access 38 | 39 | It is important that you can remotely access the instructional servers. There are two convenient ways to remotely access our 40 | lab machines: SSH (Secure SHell) and X2Go. 41 | First, select a machine. The accessible machines are `eda-X`, where X is a number from 1 to 11, 42 | and `c111-X`, where X is a number from 1 to 17. The fully qualified DNS name (FQDN) of 43 | your machine is then `eda-X.eecs.berkeley.edu` or `c111-X.eecs.berkeley.edu`. For example, 44 | if you select machine `eda-8`, the FQDN would be `eda-8.eecs.berkeley.edu`. 45 | You can use any lab machine, but our lab machines aren’t very powerful; if everyone 46 | uses the same one, everyone will find that their jobs perform poorly. ASIC design tools are resource 47 | intensive and will not run well when there are too many simultaneous users on these machines. We 48 | recommend that, every time you want to log into a machine, you check its load on https://hivemind.eecs.berkeley.edu/ 49 | for the `eda-X` machines, or with `top` once you are logged in. If it is heavily loaded, consider 50 | using a different machine. If you notice other `eecs151` users with jobs consuming excessive 51 | resources, feel free to reach out to the GSIs about it. 52 | Next, note your instructional class account name, the one that looks like `eecs151-YYY`, for example 53 | `eecs151-abc`. This is the account you created at the start of this lab.
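
The machine-selection steps above can be sketched as a small helper. This is only an illustration: `fqdn_for` is a hypothetical function name, and `eda-8` and `eecs151-abc` are the example values used in this section.

```shell
# Hypothetical helper: build the FQDN for a lab machine name such as eda-8 or c111-3.
fqdn_for() { echo "$1.eecs.berkeley.edu"; }

machine="eda-8"          # pick a lightly loaded machine
account="eecs151-abc"    # your instructional account name
echo "login target: ${account}@$(fqdn_for "$machine")"

# Once logged in, uptime prints the 1-, 5-, and 15-minute load averages;
# top gives the per-process view mentioned above.
if command -v uptime >/dev/null 2>&1; then
  uptime
fi
```

If the load averages are well above the number of cores on the machine, it is probably worth picking a different one.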
54 | 55 | 56 | #### SSH: Linux, BSD, macOS 57 | 58 | SSH is the de facto remote terminal tool for Linux and BSD systems (which include macOS). It 59 | lets you log in to a text console from anywhere (as long as you have network connectivity). SSH 60 | also comes as a standard utility in almost all Linux and BSD systems. 61 | If you’re using Linux or BSD, you should be able to access your workstation through SSH by running: 62 | 63 | ```shell 64 | ssh eecs151-YYY@eda-X.eecs.berkeley.edu 65 | ``` 66 | 67 | In our examples, this would be: 68 | 69 | ```shell 70 | ssh eecs151-abc@eda-8.eecs.berkeley.edu 71 | ``` 72 | 73 | The SSH protocol also enables file transfer between your local and lab machines via the `sftp` and 74 | `scp` utilities. **WARNING: please only transfer files needed for your reports and nothing else. In particular, never transfer files relating to CAD tool commands or process technologies!!!** 75 | 76 | 77 | #### SSH: Windows 78 | 79 | The classic and most lightweight way to use SSH on Windows is PuTTY (https://www.putty.org/). Download it and log in with the FQDN above as the Host and your instructional account 80 | username. You can also use WinSCP (winscp.net) for file transfer over SSH. 81 | Advanced users may wish to install Windows Subsystem for Linux (https://docs.microsoft.com/en-us/windows/wsl/install-win10, Windows 10 build 16215 or later) or Cygwin (cygwin.com) and use SSH, SFTP, and SCP through there. 82 | 83 | 84 | #### SSH Session Management 85 | 86 | Because all your work will be done remotely, we recommend that you use SSH session management tools and that all terminal-based work be done over SSH. This allows your remote terminal sessions to remain active even if your SSH session disconnects, intentionally or not. 87 | The two most common session managers are tmux and screen. These run persistently on the 88 | remote workstation, are highly customizable, and can greatly improve your productivity.
89 | Here are some good tmux and screen tutorials: 90 | * https://www.hamvocke.com/blog/a-quick-and-easy-guide-to-tmux/ 91 | * https://www.rackaid.com/blog/linux-screen-tutorial-and-how-to/ 92 | 93 | 94 | #### X2Go 95 | 96 | For situations in which you need a graphical interface (waveform debugging, layout viewing, etc.), 97 | you should use X2Go. This is a faster and more reliable alternative to traditional X forwarding over SSH. X2Go is also recommended because it connects to a persistent graphical 98 | desktop environment, which continues running even if your internet connection drops. 99 | Download the X2Go client for your platform from the website: https://wiki.x2go.org/doku.php/download:start. 100 | 101 | Note: macOS sometimes blocks the X2Go download/install. If it does, follow the directions here: https://support.apple.com/en-us/HT202491. 102 | 103 | To use X2Go, you need to create a new session (look under the Session menu). Give the session any 104 | name (it doesn’t matter), but set the Host field to the FQDN of your lab machine and the User field 105 | to your instructional account username. For “Session type”, select “GNOME”. Here’s an example from macOS: 106 | 107 |
108 |
109 |
3 | Prof. Sophia Shao 4 |
5 |6 | TAs (ASIC): Dima Nikiforov 7 |
8 |9 | Department of Electrical Engineering and Computer Science 10 |
11 |12 | College of Engineering, University of California, Berkeley 13 |
14 | 15 | 16 | 17 | ## Overview 18 | In this lab, you will learn how to translate RTL code into a gate-level netlist in a process called 19 | synthesis. In order to successfully synthesize your design, you will need to understand how to 20 | constrain your design, learn how the tools optimize logic and estimate timing, analyze the critical 21 | path of your design, and simulate the gate-level netlist. 22 | To begin this lab, get the project files by typing the following commands: 23 | 24 | ```shell 25 | git clone /home/ff/eecs151/labs/lab3.git 26 | cd lab3 27 | ``` 28 | 29 | You should add the following lines to the `.bashrc` file in your home folder 30 | (for more information about what `.bashrc` does, see https://www.tldp.org/LDP/abs/html/sample-bashrc.html) 31 | so that every time 32 | you open a new terminal the paths for the tools are set up properly. 33 | 34 | ```shell 35 | source /home/ff/eecs151/tutorials/eecs151.bashrc 36 | export HAMMER_HOME=/home/ff/eecs151/hammer 37 | source ${HAMMER_HOME}/sourceme.sh 38 | ``` 39 | 40 | Type 41 | 42 | ```shell 43 | which genus 44 | ``` 45 | 46 | to see if the shell prints out the path to the Cadence Genus Synthesis program (which we will be 47 | using for this lab). If it does not, add the lines to your `.bash_profile` in your home folder 48 | as well, then open a new terminal to see if it works. The file `eecs151.bashrc` sets various 49 | environment variables in your system, such as where to find the CAD programs or license servers. 50 | 51 | 52 | ## Synthesis Environment 53 | To perform synthesis, we will be using Cadence Genus. However, we will not be interfacing with 54 | Genus directly; instead, we will use Hammer. Just like in lab 2, we have set up the basic Hammer 55 | flow for your lab exercises using a Makefile. 56 | 57 | In this lab repository, you will see two sets of input files for Hammer. The first set of files is 58 | the source code for our design, which you will explore in the next section.
The second set of files consists of 59 | some YAML files (`inst-env.yml`, `asap7.yml`, `design.yml`, `sim-rtl.yml`, `sim-gl-syn.yml`) that 60 | configure the Hammer flow. Of these YAML files, you should only need to modify `design.yml`, 61 | `sim-rtl.yml` and `sim-gl-syn.yml` in order to configure the synthesis and simulation for your 62 | design. 63 | 64 | 65 | Hammer is already set up at `/home/ff/eecs151/hammer` with all the required plugins for Cadence 66 | Synthesis (Genus) and Place-and-Route (Innovus), Synopsys Simulator (VCS), Mentor Graphics 67 | DRC and LVS (Calibre). You should not need to install it in your own home directory. **These 68 | Hammer plugins are under NDA. They are provided to us for educational purposes. 69 | They should never be copied outside of instructional machines under any circumstances, or else we risk losing access to the tools in the future!!!** 70 | 71 | Let us take a look at some parts of the `design.yml` file: 72 | 73 | ```yaml 74 | gcd.clockPeriod: &CLK_PERIOD "1ns" 75 | ``` 76 | 77 | This option sets the target clock speed for our design. A more stringent target (a shorter clock 78 | period) will make the tool work harder and use higher-power gates to meet the 79 | constraints. A more relaxed timing target allows the tool to focus on reducing area and/or power. 80 | In `sim-rtl.yml`: 81 | 82 | ```yaml 83 | defines: 84 | - "CLOCK_PERIOD=1.00" 85 | ``` 86 | 87 | This option sets the clock period used during simulation. It is generally useful to keep the two separate, as 88 | you might want to see how the circuit performs under different clock frequencies without changing 89 | the design constraints.
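
As a quick sanity check when choosing a clock period, it can help to convert it to a frequency. The helper below is a hypothetical convenience function, not part of Hammer, and it assumes the period is given in nanoseconds as a plain number.

```shell
# Hypothetical helper: convert a clock period in ns to a frequency in MHz.
# f [MHz] = 1000 / T [ns]
period_ns_to_mhz() {
  awk -v t="$1" 'BEGIN { printf "%.1f\n", 1000 / t }'
}

period_ns_to_mhz 1.00   # the 1 ns target used in design.yml
period_ns_to_mhz 2.50   # a more relaxed target
```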
Continuing from `design.yml`: 90 | 91 | ```yaml 92 | gcd.verilogSrc: &VERILOG_SRC 93 | - "src/gcd.v" 94 | - "src/gcd_datapath.v" 95 | - "src/gcd_control.v" 96 | ``` 97 | 98 | and in `sim-rtl.yml`: 99 | 100 | ```yaml 101 | sim.inputs: 102 | input_files: 103 | - "src/gcd.v" 104 | - "src/gcd_datapath.v" 105 | - "src/gcd_control.v" 106 | - "src/gcd_testbench.v" 107 | ``` 108 | 109 | These specify the files for synthesis and simulation. Moving on, we have: 110 | 111 | ```yaml 112 | vlsi.inputs.clocks: [ 113 | {name: "clk", period: *CLK_PERIOD, uncertainty: "0.1ns"} 114 | ] 115 | ``` 116 | 117 | This is where we specify to Hammer that we intend on using the `CLK_PERIOD` we defined earlier 118 | as the constraint for our design. We will see more detailed constraints in later labs. 119 | 120 | ## Understanding the example design 121 | We have provided a circuit described in Verilog that computes the greatest common divisor (GCD) 122 | of two numbers. Unlike the FIR filter from the last lab, in which the testbench constantly provided 123 | stimuli, the GCD algorithm takes a variable number of cycles, so the testbench needs to know when 124 | the circuit is done to check the output. This is accomplished through a “ready/valid” handshake 125 | protocol. This protocol shows up in many places in digital circuit design. 126 | Look [here](https://inst.eecs.berkeley.edu/~eecs151/fa21/files/verilog/ready_valid_interface.pdf) at information on the course website for more background. 127 | The GCD top level is shown in the figure below. 128 | 129 |
130 |
131 |
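
To see why the number of cycles varies, it can help to run the algorithm in software. The sketch below is a subtract-and-swap form of Euclid's algorithm written as a hypothetical shell function. It is only a reference model for non-negative integer inputs and is not claimed to match the provided `gcd_datapath.v` step for step. It prints the result followed by the number of loop iterations.

```shell
# Hypothetical software reference for GCD (subtract-and-swap Euclid).
# Prints "<gcd> <iterations>" for two non-negative integers.
gcd_steps() (
  a=$1
  b=$2
  steps=0
  while [ "$b" -ne 0 ]; do
    if [ "$a" -lt "$b" ]; then
      # Swap so that the larger operand is always in a.
      t=$a
      a=$b
      b=$t
    else
      a=$(( a - b ))
    fi
    steps=$(( steps + 1 ))
  done
  echo "$a $steps"
)

gcd_steps 27 15
gcd_steps 4 4
```

Different operand pairs finish after different numbers of iterations, which is why the testbench cannot sample the output after a fixed delay and instead relies on the ready/valid handshake.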
3 | Prof. Sophia Shao 4 |
5 |6 | TAs (ASIC): Dima Nikiforov 7 |
8 |9 | Department of Electrical Engineering and Computer Science 10 |
11 |12 | College of Engineering, University of California, Berkeley 13 |
14 | 15 | ## Overview 16 | In this lab, we will go over two very important concepts. First we will look at the basics of using 17 | circuits beyond standard cells in VLSI designs. The most common example of this is SRAM, 18 | which is a dense addressable memory block used in most VLSI designs. You will learn about SRAM 19 | in more detail later in the lectures, but the [Wikipedia article on SRAM](https://en.wikipedia.org/wiki/Static_random-access_memory) provides a good starting 20 | point. SRAM is treated as a hard macro block in VLSI flow. It is created separately from the 21 | standard cell libraries. The process for adding other custom, analog, or mixed signal circuits will 22 | be similar to what we use for SRAMs. In your project, you will use the SRAMs extensively for 23 | data caching. It is important to know how to design a digital circuit and run a CAD flow with 24 | those hard macro blocks. The lab exercises will help you get familiar with SRAM interfacing. We 25 | will use an example design of computing a dot product of two vectors to walk you through how to 26 | use the SRAM blocks. 27 | 28 | Next, we will take a cursory glance at part of the ”signoff” flow: design rule checking (DRC) and 29 | layout-versus-schematic (LVS). DRC checks all the geometry in the post-PAR’d layout to see that 30 | they meet all the design rules for the process technology. LVS checks for discrepancies between 31 | the actual layout and the netlist that the PAR tool thinks it laid out. 32 | In a purely standard-cell based design, LVS will almost never be wrong. However, once you start 33 | integrating hard macros like SRAMs and custom analog cells, LVS can reveal unconnected pins, 34 | unintended shorts between power/ground/signals, and more that would prevent the circuit from 35 | working. Often, these stem from improper abstraction of the macro cells for the PAR tool. 
36 | 37 | To begin this lab, get the project files and set up your environment by typing the following command and sourcing the `eecs151.bashrc` file, as usual: 38 | 39 | ```shell 40 | git clone /home/ff/eecs151/labs/lab6.git 41 | ``` 42 | 43 | You should also clean up the build directories generated in the previous labs to save some disk space. 44 | 45 | For this lab, there are many Make targets that will be run, some of which you have explored in 46 | previous labs. The following list summarizes what each one does for future reference, but **do not run them right now!** 47 | 48 | ```shell 49 | # This command gets all the relevant SRAM configurations (file pointers) for the ASAP7 library 50 | make srams 51 | 52 | # This command runs RTL simulation 53 | make sim-rtl 54 | 55 | # This command runs Synthesis using Cadence Genus tool 56 | make syn 57 | 58 | # This command runs Post-Synthesis gate-level simulation 59 | make sim-gl-syn 60 | 61 | # This command runs Placement-and-Routing using Cadence Innovus tool 62 | make par 63 | 64 | # This command runs Post-PAR gate-level simulation 65 | make sim-gl-par 66 | 67 | # This command runs Post-PAR power estimation 68 | make power-par 69 | 70 | # This command runs DRC using Mentor Calibre tool 71 | make drc 72 | 73 | # This command runs LVS using Mentor Calibre tool 74 | make lvs 75 | ``` 76 | 77 | The configuration files (`*.yml` files) are intended to give you more flexibility when you have a 78 | large design project and want to test the modules separately before final integration. You can 79 | simply set the top-level module to the one you care about in these configuration files. Don’t hesitate 80 | to make changes to those files whenever you want to test out your new modules. This structure 81 | will also be used in the final project, so please treat the exercises in this lab as a final practice run 82 | with the CAD flow so that you will be more productive when 83 | working on your project.
At the very least, you should be aware of which files to change for the tasks 84 | that you want to carry out. We will run through small to moderate designs to get a sense of the 85 | entire flow. Please let the TAs know if you have any feedback or suggestions on how to improve 86 | the tool flow, or if you encounter any tooling issues. 87 | 88 | ## SRAM Modeling and Abstraction 89 | Open the file `src/dot_product.v`. This Verilog module computes the dot product of two 90 | vectors of unsigned integers, `a` and `b`. The module first reads elements of the vectors one-by-one via 91 | the ready/valid interfaces and stores them to two SRAMs, one for each vector. 92 | 93 | Note: You will see some `REGISTER_R_CE` blocks in `dot_product.v`. These are used by some 94 | iterations of this lab to remove the `reg` ambiguity that exists in Verilog. You may refer to 95 | `/home/ff/eecs151/verilog_lib/EECS151.v` to see their definition, but in essence they are structural descriptions of registers that are unambiguously translated to flip-flops when written in this 96 | fashion. You may use these constructs or normal Verilog syntax. 97 | 98 | Let’s look at one particular SRAM module instantiation to understand its interface. The functions 99 | of selected ports are annotated here: 100 | 101 | ```v 102 | SRAM2RW16x16 sram ( 103 | .CE1(), // clock edge (clock signal) 104 | .CE2(), 105 | 106 | .WEB1(), // Write Enable Bar (HIGH: Read, LOW: Write) 107 | .WEB2(), 108 | .OEB1(), // Output Enable Bar (always tie to LOW) 109 | .OEB2(), 110 | .CSB1(), // Chip Select Bar (always tie to LOW) 111 | .CSB2(), 112 | 113 | .A1(), // Address pin 114 | .A2(), 115 | .I1(), // Input Data pin 116 | .I2(), 117 | .O1(), // Output Data pin 118 | .O2() 119 | ); 120 | ``` 121 | 122 | This `SRAM2RW16x16` is a dual-port Read/Write memory block of sixteen 16-bit entries. This means 123 | a 4-bit address selects among those sixteen entries.
The SRAM can be clocked with two 124 | independent clock signals. To write to an SRAM, we need to set the `WEBi` signal to LOW. The 125 | signals `OEBi` and `CSBi` should be set to LOW. SRAMs are synchronous-write and synchronous-read; 126 | the read data is only available at the next rising edge, and the write data is only written at the 127 | next rising edge. 128 | 129 | Where are those SRAMs coming from? Because SRAMs are not made out of standard cells, and 130 | are instead built using different units that do not conform to our PAR flow, they are pre-compiled 131 | and stored in separate databases. These cells are then instantiated by Innovus as black boxes, 132 | and are connected to the rest of the circuit as specified in your Verilog. In order to generate the 133 | database that Innovus will use, type the following command: 134 | 135 | ```shell 136 | make srams 137 | ``` 138 | 139 | For simulation purposes, a Verilog behavioral model for the SRAMs from the HAMMER repository 140 | is used. This is automatically set up in `build/sram_generator-output.json` and points to `/home/ff/eecs151/hammer/src/hammer-vlsi/technology/asap7/sram_compiler/memories/behavioral/sram_behav_models.v`. 141 | 142 | This file includes models for various types of SRAMs. You can find SRAMs that have only a single port for Read and Write, or SRAMs with different address widths and data widths. For your final 143 | project, you need to select the appropriate SRAM models to meet the specification. The SRAM 144 | models in this file are intended only for simulation; **do not include this file in your project configuration for Synthesis or PAR**, otherwise it will mess up your post-Synthesis or 145 | post-PAR netlist. 146 | 147 | For Synthesis and PAR, the SRAMs must be abstracted away from the tools, because the only 148 | things that the flow is concerned about at these stages are the timing characteristics and the outer 149 | layout geometry of the SRAM macros.
The ASAP7 PDK does not come with SRAMs by default, 150 | so a graduate student (Sean Huang) graciously created some dummy models for us to use. They 151 | are located at: 152 | 153 | ```shell 154 | # Liberty Timing File -- delay information 155 | /home/ff/eecs151/hammer/src/hammer-vlsi/technology/asap7/sram_compiler/memories/lib/ 156 | 157 | # Library Exchange Format -- placement information 158 | /home/ff/eecs151/hammer/src/hammer-vlsi/technology/asap7/sram_compiler/memories/lef/ 159 | 160 | # Graphical Database System -- final layout information 161 | /home/ff/eecs151/hammer/src/hammer-vlsi/technology/asap7/sram_compiler/memories/gds/ 162 | 163 | ``` 164 | 165 | #### Liberty Timing Files (*.lib) 166 | [Liberty files](http://web.engr.uky.edu/~elias/lectures/LibertyFileIntroduction.pdf) must be generated for macros at every relevant process, voltage, and temperature 167 | (PVT) corner that you are using for setup and hold timing analysis. Detailed models contain 168 | descriptions of what each pin does, tables giving the delays as a function of load, and power 169 | information. There are also 3 types of Liberty files: [CCS, ECSM, and NLDM](https://chitlesh.ch/wordpress/liberty-ccs-ecsm-or-ndlm/), which trade off 170 | accuracy against tool runtime. 171 | If you open one of the files for the 172 | SRAMs we are using, you will see that it is very basic, because these are fake timing models. 173 | Note that your post-synthesis and post-PAR 174 | 175 | 176 | 177 | 178 | timing reports will differ from gate-level simulation due to these inaccuracies. 179 | 180 | #### Library Exchange Format (*.lef) 181 | [LEF files](http://web.engr.uky.edu/~elias/lectures/LibertyFileIntroduction.pdf) must be generated for macros in order to denote 182 | where pins are located and encode any obstructions (places where the PAR tool cannot place other 183 | cells or routing). The quality of the LEFs is very important for getting clean layouts.
Again, our SRAM 184 | LEFs are fake, so they may present some issues with routing and DRC. 185 | 186 | 187 | #### Graphical Database System (*.gds) 188 | [GDS files](https://www.artwork.com/gdsii/gdsii/) must be generated 189 | for macros to encode the entire detailed layout, and get merged with the PAR’d layout before 190 | running DRC, LVS, and sending the design off to the fabrication house. 191 | 192 | --- 193 | 194 | ### Question 1: Understanding SRAMs 195 | **a)** Open the file `sram_behav_models.v` (located in the HAMMER repository). 196 | **What are the different SRAM sizes available?** 197 | **What is the difference between the `SRAM1RW*` and `SRAM2RW*` variants?** 198 | Hint: take some time to look at the Verilog implementation to understand what it does. You will need to use this SRAM model in the final project. 199 | 200 | **b)** In the same file, select an SRAM instance that has a BYTEMASK pin. 201 | **What is the SRAM model (in terms of number of Read/Write ports, address width, data/word width)?** 202 | **Briefly describe the purpose of the BYTEMASK pin. In which situation do you think it is useful?** 203 | 204 | *c) (Ungraded thought experiment #1) SRAM libraries in real process technologies are much larger than the list you see in `sram_behav_models.v`. What features do you think are important for real SRAM libraries? Think in terms of number of ports, masking, improving yield, or anything else you can think of. What would these features do to the size of the SRAM macros?* 205 | 206 | *d) (Ungraded thought experiment #2) SRAMs should be integrated very densely in a circuit’s layout. To build large SRAM arrays, many SRAM macros are often tiled together, abutted on one or more sides. Knowing this, take a guess at how SRAMs are laid out.* 207 | 208 | *i) In ASAP7, there are 9 metal layers, but realistically only 7 layers to route on in order to leave the top 2 layers for robust power distribution, as you saw in Lab 4.
How many layers should a well-designed SRAM macro use (i.e. block off from PAR routing), at maximum?* 209 | 210 | *ii) Where should the pins on SRAMs be located, if you want to maximize the ability for them to abut together?* 211 | 212 | --- 213 | 214 | ## A Vector Dot Product with SRAMs 215 | Take a moment to read through the file `src/dot_product.v` to understand the control logic for 216 | writing to and reading from SRAMs. The two SRAMs are first filled with vector data up to a size 217 | of `vector_size`; after that, they are read for the dot product computation. 218 | To run RTL simulation, type the following command: 219 | 220 | ```shell 221 | make sim-rtl 222 | ``` 223 | 224 | To inspect the RTL simulation waveform, type the following commands: 225 | 226 | ```shell 227 | cd build/sim-rundir 228 | dve -vpd vcdplus.vpd 229 | ``` 230 | 231 | The simulation takes 35 cycles to complete, which makes sense: it spends the first 16 cycles 232 | reading data for vectors `a` and `b`, then performs the dot product computation in 16 cycles, plus 233 | a few extra cycles for various state transitions. The goal is not to build the most efficient dot product 234 | implementation, but rather to give you an introductory design that shows how to interface with 235 | SRAMs. 236 | 237 | Next, we will perform PAR on the circuit. 238 | 239 | ```shell 240 | make par 241 | ``` 242 | 243 | This command will invoke Synthesis as well, if it has not been run already (however, make sure to re-run synthesis if you updated your `design.yml` file). After PAR finishes, 244 | you can open the floorplan of the design with: 245 | 246 | ```shell 247 | cd build/par-rundir 248 | ./generated-scripts/open_chip 249 | ``` 250 | 251 | This will launch the Cadence Innovus GUI and load your final design database. You should expect to 252 | see the floorplan as in the following image. Don’t forget to disable M8, V8, M9, V9 on the right 253 | pane to see the unobstructed floorplan. 254 | 255 |
3 | Prof. Bora Nikolic 4 |
5 |6 | TAs: Daniel Grubb, Nayiri Krzysztofowicz, Zhaokai Liu 7 |
8 |9 | Department of Electrical Engineering and Computer Science 10 |
11 |12 | College of Engineering, University of California, Berkeley 13 |
14 | 15 | 16 | 17 | ## Overview 18 | For this lab, you will learn how to translate RTL code into a gate-level netlist in a process called 19 | synthesis. In order to successfully synthesize your design, you will need to understand how to 20 | constrain your design, learn how the tools optimize logic and estimate timing, analyze the critical 21 | path of your design, and simulate the gate-level netlist. 22 | To begin this lab, get the project files by typing the following commands: 23 | 24 | ```shell 25 | git clone /home/ff/eecs151/labs/lab3.git 26 | cd lab3 27 | ``` 28 | 29 | You should add the following lines to the `.bashrc` file in your home folder 30 | (for more information about what `.bashrc` does, see https://www.tldp.org/LDP/abs/html/sample-bashrc.html) 31 | so that every time 32 | you open a new terminal the paths for the tools are set up properly. 33 | 34 | ```shell 35 | source /home/ff/eecs151/tutorials/eecs151.bashrc 36 | export HAMMER_HOME=/home/ff/eecs151/hammer 37 | source ${HAMMER_HOME}/sourceme.sh 38 | ``` 39 | 40 | Type 41 | 42 | ```shell 43 | which genus 44 | ``` 45 | 46 | to see if the shell prints out the path to the Cadence Genus Synthesis program (which we will be 47 | using for this lab). If it does not work, add the lines to your `.bash_profile` in your home folder 48 | as well. Try logging in again or opening a new terminal to see if it works. The file `eecs151.bashrc` sets various 49 | environment variables in your system, such as where to find the CAD programs or license servers. 50 | 51 | 52 | ## Synthesis Environment 53 | To perform synthesis, we will be using Cadence Genus. However, we will not be interfacing with 54 | Genus directly; instead, we will use HAMMER. Just like in lab 2, we have set up the basic HAMMER 55 | flow for your lab exercises using a Makefile. 56 | 57 | In this lab repository, you will see two sets of input files for HAMMER.
The first set of files is 58 | the source code for our design that you will explore in the next section. The second set of files is 59 | a group of YAML files (`inst-env.yml`, `sky130.yml`, `design-sky130.yml`, `sim-rtl.yml`, `sim-gl-syn.yml`) that 60 | configure the HAMMER flow. Of these YAML files, you should only need to modify `design.yml`, 61 | `sim-rtl.yml` and `sim-gl-syn.yml` in order to configure synthesis and simulation for your 62 | design. 63 | 64 | 65 | HAMMER is already set up at `/home/ff/eecs151/hammer` with all the required plugins for Cadence 66 | Synthesis (Genus) and Place-and-Route (Innovus), Synopsys Simulator (VCS), Mentor Graphics 67 | DRC and LVS (Calibre). You should not need to install it in your own home directory. **These 68 | HAMMER plugins are under NDA. They are provided to us for educational purposes. 69 | They should never be copied outside of instructional machines under any circumstances or else we are at risk of losing access to the tools in the future!!!** 70 | 71 | Let us take a look at some parts of the `design.yml` file: 72 | 73 | ```yaml 74 | gcd.clockPeriod: &CLK_PERIOD "1ns" 75 | ``` 76 | 77 | This option sets the target clock speed for our design. A more stringent target (a lower clock 78 | period) will make the tool work harder and use higher-power gates to meet the clock 79 | period. A less stringent target (a higher clock period) lets the tool focus on reducing area and/or power. 80 | In `sim-rtl.yml`: 81 | 82 | ```yaml 83 | defines: 84 | - "CLOCK_PERIOD=1.00" 85 | ``` 86 | 87 | This option sets the clock period used during simulation. It is generally useful to separate the two, as 88 | you might want to see how the circuit performs under different clock frequencies without changing 89 | the design constraints.
Continuing from `design.yml`: 90 | 91 | ```yaml 92 | gcd.verilogSrc: &VERILOG_SRC 93 | - "src/gcd.v" 94 | - "src/gcd_datapath.v" 95 | - "src/gcd_control.v" 96 | ``` 97 | 98 | and in `sim-rtl.yml`: 99 | 100 | ```yaml 101 | sim.inputs: 102 | input_files: 103 | - "src/gcd.v" 104 | - "src/gcd_datapath.v" 105 | - "src/gcd_control.v" 106 | - "src/gcd_testbench.v" 107 | ``` 108 | 109 | These specify the files for synthesis and simulation. Moving on, we have: 110 | 111 | ```yaml 112 | vlsi.inputs.clocks: [ 113 | {name: "clk", period: *CLK_PERIOD, uncertainty: "0.1ns"} 114 | ] 115 | ``` 116 | 117 | This is where we tell HAMMER that we intend to use the `CLK_PERIOD` we defined earlier 118 | as the constraint for our design. We will see more detailed constraints in later labs. 119 | 120 | ## Understanding the example design 121 | We have provided a circuit described in Verilog that computes the greatest common divisor (GCD) 122 | of two numbers. Unlike the FIR filter from the last lab, where the testbench constantly provided 123 | stimuli, the GCD algorithm takes a variable number of cycles, so the testbench needs to know when 124 | the circuit is done to check the output. This is accomplished through a “ready/valid” handshake 125 | protocol. This protocol is ubiquitous, and a flavor of it will appear both in the class project 126 | and later on in other blocks you will encounter throughout your career. The block diagram is shown 127 | in the figure below. 128 | 129 |
130 |
131 |
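
The transfer rule behind the ready/valid handshake described above can be sketched in a few lines of simulation-only Verilog. This is a minimal illustration, not the lab's testbench: the module name and the signals `val`, `rdy`, `data`, and `fire` are hypothetical and are not taken from `gcd.v`. The point is that data moves only in a cycle where the producer asserts valid and the consumer asserts ready at the same clock edge.

```verilog
// Simulation-only sketch of a ready/valid transfer.
// All names here are hypothetical and chosen for illustration.
module ready_valid_sketch;
  reg       clk  = 1'b0;
  reg       val  = 1'b0;   // producer: data is valid
  reg       rdy  = 1'b0;   // consumer: ready to accept data
  reg [7:0] data = 8'd0;

  // A transfer happens only when both sides agree in the same cycle.
  wire fire = val & rdy;

  always #5 clk = ~clk;

  always @(posedge clk) begin
    if (fire) $display("t=%0t: transferred data=%0d", $time, data);
  end

  initial begin
    // Producer offers data, but the consumer is not ready: no transfer.
    @(negedge clk) val = 1'b1; data = 8'd42;
    // Consumer becomes ready: the transfer fires at the next rising edge.
    @(negedge clk) rdy = 1'b1;
    // Both sides drop their signals after the transfer.
    @(negedge clk) val = 1'b0; rdy = 1'b0;
    @(negedge clk) $finish;
  end
endmodule
```

Because the producer holds `val` high until `rdy` is seen, the number of cycles the consumer takes does not matter, which is what lets a testbench wait on a variable-latency block such as the GCD unit.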
4 | Prof. Bora Nikolic 5 |
6 |7 | TAs: Daniel Grubb, Nayiri Krzysztofowicz, Zhaokai Liu 8 |
9 |10 | Department of Electrical Engineering and Computer Science 11 |
12 |13 | College of Engineering, University of California, Berkeley 14 |
15 | 16 | ## Overview 17 | This lab consists of three parts. For the first part, you will be writing a GCD coprocessor that could be included alongside a general-purpose CPU (like your final project). You will then learn how the tools can create a floorplan, route power straps, place standard cells, perform timing optimizations, and generate a clock tree for your design. Finally, you will get a slight head start on your project by writing part of the ALU. 18 | 19 | To begin this lab, get the project files and set up your environment by typing the following command and sourcing the `eecs151.bashrc` file, as usual: 20 | 21 | ```shell 22 | git clone /home/ff/eecs151/fa21/sky130/lab4-sky130.git 23 | ``` 24 | 25 | You should also clean up the build directory generated from the previous labs to save some disk space. 26 | 27 | ## Writing Your Coprocessor 28 | 29 | Take a look at the `gcd_coprocessor.v` file in the src folder. You will see the following empty Verilog module. 30 | 31 | ```verilog 32 | module gcd_coprocessor #( parameter W = 32 )( 33 | input clk, 34 | input reset, 35 | input operands_val, 36 | input [W-1:0] operands_bits_A, 37 | input [W-1:0] operands_bits_B, 38 | output operands_rdy, 39 | output result_val, 40 | output [W-1:0] result_bits, 41 | input result_rdy 42 | ); 43 | 44 | // You should be able to build this with only structural verilog! 45 | // Define wires 46 | // Instantiate gcd_datapath 47 | // Instantiate gcd_control 48 | // Instantiate request FIFO 49 | // Instantiate response FIFO 50 | 51 | endmodule 52 | 53 | ``` 54 | First notice the parameter `W`. `W` is the data width of your coprocessor; the input data and output data will all be this bitwidth. Be sure to pass this parameter on to any submodules that may use it! You should implement a coprocessor that can handle 4 outstanding requests at a time. For now, you will use a FIFO (First-In, First-Out) block to store requests (operands) and responses (results). 
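
Passing `W` on to a submodule, as mentioned above, uses Verilog's named parameter override. The following is a minimal sketch under stated assumptions: `width_buffer` and `width_parent` are hypothetical modules invented for illustration and are not part of the lab sources.

```verilog
// Hypothetical child module with its own default width.
module width_buffer #( parameter W = 8 )(
  input  [W-1:0] d,
  output [W-1:0] q
);
  assign q = d;
endmodule

// Hypothetical parent module: forwards its own W to the child so that
// both agree on the data width, whatever W the parent is built with.
module width_parent #( parameter W = 32 )(
  input  [W-1:0] in,
  output [W-1:0] out
);
  // Named parameter override: without #(.W(W)), the child would
  // silently fall back to its default of 8 bits.
  width_buffer #(.W(W)) u_buf (
    .d(in),
    .q(out)
  );
endmodule
```

If the override is forgotten, most tools only warn about a port width mismatch, so it is worth checking synthesis and simulation logs for such warnings.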
55 | 56 | A FIFO is a sequential logic element which accepts (enqueues) valid data and outputs (dequeues) it in the same order when the next block is ready to accept. This is useful for buffering between the producer of data and its consumer. When the input data is valid (`enq_val`) and the FIFO is ready for data (`enq_rdy`), the input data is enqueued into the FIFO. There are similar signals for the output data. This interface is called a “decoupled” interface, and if implemented correctly it makes modular design easy (although sometimes with performance penalties). 57 | 58 | This FIFO is implemented with a 2-dimensional array of data called `buffer`. There are two pointers: a read pointer `rptr` and a write pointer `wptr`. When data is enqueued, the write pointer is incremented. When data is dequeued, the read pointer is incremented. Because the FIFO depth is a power of 2, we can leverage the fact that addition rolls over and the FIFO will continue to work. However, once the read and write pointers are the same, we don’t know if the FIFO is full or empty. We fix this by writing to the `full` register when they are the same and we just enqueued, and clearing the `full` register otherwise. 59 | 60 | A partially written FIFO has been provided for you in `fifo.v`. Using the information above, complete the FIFO implementation so that it behaves as expected. 61 | 62 | 63 | Then, finish the coprocessor implementation in `gcd_coprocessor.v`, so that the GCD unit and FIFOs are connected like in the following diagram. Note the connection between the `gcd_datapath` and `gcd_control` should be very similar to that in Lab 3’s `gcd.v` and that clock and reset are omitted from the diagram. You will need to think about how to manage a ready/valid decoupled interface with 2 FIFOs in parallel. 64 | 65 | 66 |
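
The pointer rollover and the full/empty ambiguity described above can be seen in a small simulation-only sketch. It is not the contents of `fifo.v` and is not synthesizable RTL; the module name and the `DEPTH` and `PTR_W` parameters are assumptions chosen for illustration, while `rptr`, `wptr`, and `full` follow the names used in the text.

```verilog
// Simulation-only sketch of FIFO pointer behavior for a depth of 4.
// Procedural code in an initial block, for illustration only.
module fifo_pointer_sketch;
  localparam DEPTH = 4;
  localparam PTR_W = 2;               // log2(DEPTH)

  reg [PTR_W-1:0] rptr = 0;
  reg [PTR_W-1:0] wptr = 0;
  reg             full = 1'b0;
  integer i;

  initial begin
    // Enqueue DEPTH items without dequeuing anything.
    for (i = 0; i < DEPTH; i = i + 1) begin
      wptr = wptr + 1'b1;             // 3 + 1 rolls over to 0 in 2 bits
      full = (wptr == rptr);          // equal right after an enqueue: full
      $display("enq: wptr=%0d rptr=%0d full=%0d", wptr, rptr, full);
    end

    // Dequeue everything. The pointers end up equal again,
    // but this time the FIFO is empty, so full must be cleared.
    for (i = 0; i < DEPTH; i = i + 1) begin
      rptr = rptr + 1'b1;
      full = 1'b0;
      $display("deq: wptr=%0d rptr=%0d full=%0d", wptr, rptr, full);
    end
    $finish;
  end
endmodule
```

In both end states `wptr == rptr`, and only the `full` flag tells them apart. In real RTL, the same decisions would be made in clocked logic, and a simultaneous enqueue and dequeue would need its own case.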