├── .gitignore ├── lab2 └── figs │ ├── dve.png │ ├── fir.png │ ├── q1_a.png │ ├── q1_b_1.png │ ├── q1_b_2.png │ ├── q1_c.png │ ├── new_wave.png │ ├── vlsi_flow.png │ └── display_wave.png ├── lab6 ├── figs │ ├── drc.png │ ├── dp_div.png │ ├── layout.png │ └── lvs_smiley.png └── spec.md ├── project ├── figs │ ├── csrw.png │ ├── RV32I_Base_Instruction_Set.pdf │ └── RV32I_Base_Instruction_Set.png ├── README.md ├── final.md ├── checkpoint2.md ├── checkpoint3.md ├── checkpoint4.md ├── overview.md └── checkpoint1.md ├── lab1 ├── figs │ └── x2gomacos.png └── spec.md ├── lab4 ├── figs │ ├── view_icons.png │ ├── timing_debug.png │ ├── clock_tree_nets.png │ ├── gcd_coprocessor.pdf │ ├── gcd_coprocessor.png │ ├── innovus_window.png │ ├── clock_tree_debugger.png │ ├── sky130 │ │ ├── innovus_window.png │ │ ├── timing_debug.png │ │ ├── clock_tree_nets.png │ │ ├── clock_tree_debugger.png │ │ └── critical_path_highlight.png │ └── critical_path_highlight.png └── spec_sky130.md ├── lab3 ├── figs │ ├── block-diagram.pdf │ └── block-diagram.png ├── spec.md └── spec_sky130.md ├── README.md └── lab5 ├── spec_sky130.md └── spec.md /.gitignore: -------------------------------------------------------------------------------- 1 | *.DS_Store -------------------------------------------------------------------------------- /lab2/figs/dve.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab2/figs/dve.png -------------------------------------------------------------------------------- /lab2/figs/fir.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab2/figs/fir.png -------------------------------------------------------------------------------- /lab6/figs/drc.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab6/figs/drc.png -------------------------------------------------------------------------------- /lab2/figs/q1_a.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab2/figs/q1_a.png -------------------------------------------------------------------------------- /lab2/figs/q1_b_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab2/figs/q1_b_1.png -------------------------------------------------------------------------------- /lab2/figs/q1_b_2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab2/figs/q1_b_2.png -------------------------------------------------------------------------------- /lab2/figs/q1_c.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab2/figs/q1_c.png -------------------------------------------------------------------------------- /lab6/figs/dp_div.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab6/figs/dp_div.png -------------------------------------------------------------------------------- /lab6/figs/layout.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab6/figs/layout.png -------------------------------------------------------------------------------- /lab2/figs/new_wave.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab2/figs/new_wave.png 
-------------------------------------------------------------------------------- /project/figs/csrw.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/project/figs/csrw.png -------------------------------------------------------------------------------- /lab1/figs/x2gomacos.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab1/figs/x2gomacos.png -------------------------------------------------------------------------------- /lab2/figs/vlsi_flow.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab2/figs/vlsi_flow.png -------------------------------------------------------------------------------- /lab4/figs/view_icons.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab4/figs/view_icons.png -------------------------------------------------------------------------------- /lab6/figs/lvs_smiley.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab6/figs/lvs_smiley.png -------------------------------------------------------------------------------- /lab2/figs/display_wave.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab2/figs/display_wave.png -------------------------------------------------------------------------------- /lab3/figs/block-diagram.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab3/figs/block-diagram.pdf 
-------------------------------------------------------------------------------- /lab3/figs/block-diagram.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab3/figs/block-diagram.png -------------------------------------------------------------------------------- /lab4/figs/timing_debug.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab4/figs/timing_debug.png -------------------------------------------------------------------------------- /lab4/figs/clock_tree_nets.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab4/figs/clock_tree_nets.png -------------------------------------------------------------------------------- /lab4/figs/gcd_coprocessor.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab4/figs/gcd_coprocessor.pdf -------------------------------------------------------------------------------- /lab4/figs/gcd_coprocessor.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab4/figs/gcd_coprocessor.png -------------------------------------------------------------------------------- /lab4/figs/innovus_window.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab4/figs/innovus_window.png -------------------------------------------------------------------------------- /lab4/figs/clock_tree_debugger.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab4/figs/clock_tree_debugger.png 
-------------------------------------------------------------------------------- /lab4/figs/sky130/innovus_window.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab4/figs/sky130/innovus_window.png -------------------------------------------------------------------------------- /lab4/figs/sky130/timing_debug.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab4/figs/sky130/timing_debug.png -------------------------------------------------------------------------------- /lab4/figs/critical_path_highlight.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab4/figs/critical_path_highlight.png -------------------------------------------------------------------------------- /lab4/figs/sky130/clock_tree_nets.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab4/figs/sky130/clock_tree_nets.png -------------------------------------------------------------------------------- /lab4/figs/sky130/clock_tree_debugger.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab4/figs/sky130/clock_tree_debugger.png -------------------------------------------------------------------------------- /lab4/figs/sky130/critical_path_highlight.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/lab4/figs/sky130/critical_path_highlight.png -------------------------------------------------------------------------------- /project/figs/RV32I_Base_Instruction_Set.pdf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/project/figs/RV32I_Base_Instruction_Set.pdf -------------------------------------------------------------------------------- /project/figs/RV32I_Base_Instruction_Set.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EECS150/asic_labs_sp22/HEAD/project/figs/RV32I_Base_Instruction_Set.png -------------------------------------------------------------------------------- /project/README.md: -------------------------------------------------------------------------------- 1 | # EECS 151/251A ASIC Project Specification: RISC-V Processor Design 2 | 3 | 4 | - [Project Overview](overview.md) : Introduction, Project setup and Grading 5 | - [Checkpoint 1](checkpoint1.md) : ALU design and Pipeline diagram 6 | - Apr 1 (Friday), 2022 7 | - [Checkpoint 2](checkpoint2.md) : Fully functioning core 8 | - Apr 15 (Friday), 2022 9 | - [Checkpoint 3](checkpoint3.md) : Cache 10 | - Apr 22 (Friday), 2022 11 | - [Checkpoint 4](checkpoint4.md) : Synthesis, PAR & Power 12 | - Apr 29 (Friday), 2022 13 | - [Final Deliverables](final.md): 14 | - Final Interview/Checkoff: May 6, 2022 15 | - Report: May 9, 2022 16 | # Resources: 17 | [RISC-V Instruction Set Manual](https://riscv.org/technical/specifications/) (Volume 1, Unprivileged Spec) 18 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # EECS 151/251A ASIC Labs Fall 21 2 | 3 | This lab course consists of 6 labs and a final project. The labs go through the ASIC design flow, from RTL through GDS. 
4 | These labs are now available in two process technologies, 5 | the [ASAP7 7nm Predictive PDK](http://asap.asu.edu/asap/) (a non-implementable finFET technology developed for educational purposes) 6 | and the [Skywater 130nm PDK](https://skywater-pdk.readthedocs.io/en/latest/) (a real open-source 130nm CMOS process developed by Google and Skywater foundries). 7 | 8 | ## ASAP7 Labs 9 | - [Lab 1: Getting Around the Compute Environment](lab1/spec.md) 10 | - [Lab 2: Simulation](lab2/spec.md) 11 | - [Lab 3: Logic Synthesis](lab3/spec.md) 12 | - [Lab 4: Floorplanning, Placement, Power, and CTS](lab4/spec.md) 13 | - [Lab 5: Parallelization and Routing](lab5/spec.md) 14 | - [Lab 6: SRAM Integration, DRC, LVS](lab6/spec.md) 15 | 16 | ## ASIC Final Project 17 | This project guides students through writing their own CPU core and cache, and pushing this design through the ASIC flow to achieve a physical design. 18 | - [Project Overview](project/overview.md) : Introduction, Project setup and Grading 19 | - [Checkpoint 1](project/checkpoint1.md) : ALU design and Pipeline diagram 20 | - [Checkpoint 2](project/checkpoint2.md) : Fully functioning core 21 | - [Checkpoint 3](project/checkpoint3.md) : Cache 22 | - [Checkpoint 4](project/checkpoint4.md) : Synthesis, PAR & Power 23 | 24 | ## Sky130 Labs 25 | Alternate versions of the ASAP7 labs above use the Skywater 130nm PDK instead. Lab 6 is omitted because (1) the Sky130 SRAMs are currently not mature enough to be used for educational purposes, and (2) for DRC/LVS, the Sky130 Calibre decks are still under NDA, and while the open-source decks are available (for use with Magic and Netgen), our ASIC design flow does not currently support these open-source EDA tools. To learn about SRAMs, DRC, and LVS, please follow the ASAP7 version of Lab 6 above. 
26 | - [Lab 1: Getting Around the Compute Environment](lab1/spec.md) 27 | - [Lab 2: Simulation](lab2/spec_sky130.md) 28 | - [Lab 3: Logic Synthesis](lab3/spec_sky130.md) 29 | - [Lab 4: Floorplanning, Placement, Power, and CTS](lab4/spec_sky130.md) 30 | - [Lab 5: Parallelization and Routing](lab5/spec_sky130.md) 31 | -------------------------------------------------------------------------------- /project/final.md: -------------------------------------------------------------------------------- 1 | # EECS 151/251A ASIC Project Specification: Final Deliverables 2 |

3 | Prof. Sophia Shao 4 |

5 |

6 | TAs (ASIC): Dima Nikiforov 7 |

8 |

9 | Department of Electrical Engineering and Computer Science 10 |

11 |

12 | College of Engineering, University of California, Berkeley 13 |

 14 | 15 | --- 16 | 17 | ## Final Project Deliverables 18 | 19 | By now you should have designed a fully-functional processor from scratch! Your design should pass all assembly tests at your reported maximum frequency. Your 20 | design should also pass all of the benchmark tests at your reported maximum frequency, and you 21 | should report the cycle count for each of those tests. By the due date (Monday, May 9, 2022), each 22 | team needs to push their final commits to their team’s git repository. Only the final commit before the 23 | due date will be graded, so be very, very careful that you have submitted everything required. To be 24 | graded you must submit the following items: 25 | * `src/*.v` 26 | * `build/syn-rundir/reports/*` 27 | * `build/par-rundir/timingReports/*` 28 | * `build/par-rundir/innovus.log*` 29 | 30 | These files will be used to check processor functionality and will show us your critical path, maximum operating frequency and area. During the interview session (Friday, May 6, 2022), the 31 | professors and the GSI will be interviewing each team to gauge understanding of various concepts 32 | learned in the project, understand more about each team’s design process, and provide feedback. Your 33 | final report needs to answer the following questions: 34 | 35 | 1. Show the final pipeline diagram, and explain the functionality of different submodules in your design, how control signals are generated, memory structure, etc. 36 | 37 | 2. What is the post-synthesis critical path length? What sections of the processor does the critical 38 | path pass through? Why is this the critical path? 39 | 40 | 3. Show a screenshot of the final floorplan. Also include a screenshot of the clock tree debugger results. Discuss your floorplanning strategy and the quality of your clock tree results. 41 | 42 | 4. What is the post-place-and-route critical path length? What sections of the processor does the 43 | critical path pass through? 
Why is this the critical path? If it is different from the post-synthesis 44 | critical path, why? 45 | 46 | 5. What is the area utilization of the final design? Also include the total core area you used in PnR and the density. 47 | 48 | 6. What is the Innovus-estimated power consumption of the final design? 49 | 50 | 7. What is the number of cycles that your design takes to run the benchmarks? What changes/optimizations 51 | have you done to try and optimize for these tests? 52 | 53 | 8. What is the post-place-and-route runtime (in seconds) of each benchmark? 54 | *Use the number of cycles from RTL simulation, and minimum clock period to meet timing for place-and-route (design doesn't have to pass post-PAR simulations with this clock period).* 55 | 56 | 9. If there are bugs in your design still, explain what is working and what isn't. What was your debugging process? Where are the bugs localized? 57 | 58 | 10. Explain any other optimizations you made for your design. 59 | 60 | 11. Is there anything you would like to tell the staff before we grade your project? 61 | 62 | 63 | If you worked with a partner you do not need separate reports. If you are having issues with your 64 | partner please contact the GSI privately as soon as possible. 
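The runtime asked for in question 8 above is the RTL cycle count multiplied by the minimum post-place-and-route clock period. A minimal sketch of that arithmetic follows; the function names and the numbers in the example are hypothetical placeholders, not measured results.

```python
def runtime_seconds(cycles: int, clock_period_ns: float) -> float:
    """Benchmark runtime in seconds: RTL cycle count times the clock period."""
    return cycles * clock_period_ns * 1e-9


def max_frequency_mhz(clock_period_ns: float) -> float:
    """Maximum operating frequency in MHz for a clock period given in ns."""
    return 1000.0 / clock_period_ns


# Placeholder inputs: substitute the cycle count from your bmark .out file
# and the smallest clock period that meets timing after place-and-route.
example_cycles = 1_000_000
example_period_ns = 2.0
example_runtime = runtime_seconds(example_cycles, example_period_ns)
example_fmax = max_frequency_mhz(example_period_ns)
```

Reporting both the period and the resulting runtime makes it easy to see whether an optimization that lowers the cycle count also lengthened the critical path.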
65 | 66 | 67 | 68 | ## Acknowledgement 69 | 70 | This project is the result of the work of many EECS151/251 GSIs over the years including: 71 | Written By: 72 | - Nathan Narevsky (2014, 2017) 73 | - Brian Zimmer (2014) 74 | Modified By: 75 | - John Wright (2015,2016) 76 | - Ali Moin (2018) 77 | - Arya Reais-Parsi (2019) 78 | - Cem Yalcin (2019) 79 | - Tan Nguyen (2020) 80 | - Harrison Liew (2020) 81 | - Sean Huang (2021) 82 | - Daniel Grubb, Nayiri Krzysztofowicz, Zhaokai Liu (2021) 83 | - Dima Nikiforov (2022) 84 | -------------------------------------------------------------------------------- /project/checkpoint2.md: -------------------------------------------------------------------------------- 1 | # EECS 151/251A ASIC Project Specification: Checkpoint 2 2 |

3 | Prof. Sophia Shao 4 |

5 |

6 | TAs (ASIC): Dima Nikiforov 7 |

8 |

9 | Department of Electrical Engineering and Computer Science 10 |

11 |

12 | College of Engineering, University of California, Berkeley 13 |

 14 | 15 | --- 16 | ## Fully functioning core 17 | 18 | ### 1. Additional Instructions 19 | #### 1.1 Control and Status Register (CSR) 20 | In order to run the testbenches, there are a few new instructions that need to be added for help in 21 | debugging/creating testbenches. Read through Chapter 9 in the RISC-V specification. A CSR (or 22 | control and status register) is some state that is stored independent of the register file and the memory. 23 | While there are 2^12 possible CSR addresses, you will only use one of them (`tohost = 0x51E`). The 24 | `tohost` register is monitored by the test harness, and simulation ends when a value is written to this 25 | register. A value of 1 indicates success; a value greater than 1 gives clues as to the location of the failure. 26 | There are 2 CSR-related instructions that you will need to implement: 27 | 1. `csrw tohost,t2` (short for `csrrw x0,csr,rs1` where `csr = 0x51E`) 28 | 2. `csrwi tohost,1` (short for `csrrwi x0,csr,zimm` where `csr = 0x51E`) 29 | 30 | `csrw` will write the value of the register specified by rs1 to the addressed CSR. `csrwi` will write the 5-bit immediate (stored in the rs1 field of the instruction) to 31 | the addressed CSR. Note that you do not need to write to rd (writing to x0 does nothing). 32 | 33 | 
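The two CSR instructions above share the standard RISC-V I-type layout: the CSR address sits in bits 31:20, the rs1 or zimm field in bits 19:15, funct3 in bits 14:12, rd in bits 11:7, and the SYSTEM opcode `0x73` in bits 6:0. The following is a small Python reference model of that decode, intended only as a sketch for checking your Verilog against; the function names and the list-based register file are assumptions made for illustration.

```python
SYSTEM_OPCODE = 0x73   # opcode shared by the CSR instructions
FUNCT3_CSRRW = 0b001   # csrrw: source operand is register rs1
FUNCT3_CSRRWI = 0b101  # csrrwi: source operand is the 5-bit zimm field
CSR_TOHOST = 0x51E     # the only CSR address used in this project


def encode_csr(funct3, csr, rs1_or_zimm, rd=0):
    """Assemble a CSR instruction word (rd defaults to x0)."""
    return (csr << 20) | (rs1_or_zimm << 15) | (funct3 << 12) | (rd << 7) | SYSTEM_OPCODE


def decode_csr(inst):
    """Split a 32-bit instruction word into its I-type CSR fields."""
    return {
        "opcode": inst & 0x7F,
        "rd": (inst >> 7) & 0x1F,
        "funct3": (inst >> 12) & 0x7,
        "rs1_or_zimm": (inst >> 15) & 0x1F,
        "csr": (inst >> 20) & 0xFFF,
    }


def tohost_write_value(inst, regfile):
    """Return the value written to tohost, or None if inst is not such a write.

    regfile is a hypothetical 32-entry list standing in for the register file.
    """
    f = decode_csr(inst)
    if f["opcode"] != SYSTEM_OPCODE or f["csr"] != CSR_TOHOST:
        return None
    if f["funct3"] == FUNCT3_CSRRW:
        return regfile[f["rs1_or_zimm"]]   # value read from register rs1
    if f["funct3"] == FUNCT3_CSRRWI:
        return f["rs1_or_zimm"]            # zero-extended 5-bit immediate
    return None


# csrw tohost,t2 (t2 is x7) and csrwi tohost,1
CSRW_TOHOST_T2 = encode_csr(FUNCT3_CSRRW, CSR_TOHOST, 7)
CSRWI_TOHOST_1 = encode_csr(FUNCT3_CSRRWI, CSR_TOHOST, 1)
```

Comparing the encoded words against the entries in a test's `.dump` file is a quick way to confirm that your decoder is looking at the right bit ranges.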

34 | 35 |

 36 | 37 | ### 2. Details 38 | Your job is to implement the core of the 3-stage RISC-V CPU. 39 | 40 | #### 2.1 Reset 41 | Your CPU will have an input reset signal that the testbench toggles. Once out of reset, 42 | your CPU should start at PC address `0x2000` (defined as `PC_RESET` in `src/const.vh`) 43 | and begin executing instructions. 44 | 45 | #### 2.2 Misaligned Addresses 46 | According to the RISC-V ISA spec, reads and writes to memory addresses not aligned to a 32-bit word boundary (or 16-bit for halfword) should cause an exception. In this project, for the purpose of simplicity, we ignore the misaligned bits (i.e. set them to zero), which is done in `Memory151.v` by only using address bits `31:2`. 47 | 48 | 49 | ### 3. File Structure 50 | Implement the datapath and control logic for your RISC-V processor in the file `Riscv151.v`. Make 51 | sure that the inputs and outputs remain the same, since this module connects to the memory system 52 | for system-level testing. If you look at `riscv_test_harness.v` you can see a testbench that 53 | is provided. Target this testbench in your `sim-rtl.yml` file by changing the `tb_name` key to 54 | `rocketTestHarness`. 55 | 56 | ### 4. Running the Test 57 | This testbench will load a program into the instruction memory, and will then run until the exit code 58 | register has been set. 59 | There is also a timeout to make sure that the simulation does not run forever. 60 | You should only be running this test 61 | suite after you have eliminated some of the bugs using single instruction tests, as described below. 62 | 63 | ### 5. Running assembly tests 64 | We have provided a suite of assembly tests to help you debug all of the instructions you need to implement. 65 | To run all of them: 66 | ``` 67 | make sim-rtl test_asm=all 68 | ``` 69 | This will generate .out files in the `asm_output/` directory, and summarize which tests passed and 70 | failed. 
You can also run a single asm test with the following command: 71 | ``` 72 | make sim-rtl test_asm=simple.out 73 | ``` 74 | If you would like to generate waveforms for a single test: 75 | ``` 76 | make sim-rtl test_asm=simple.vpd 77 | ``` 78 | ’simple’ may be replaced with any of the available tests defined in the `Makefile`. 79 | 80 | You can read the assembly code of the programs by looking at the dump file. Comments in the code 81 | will help you understand what is happening. 82 | ``` 83 | cd tests/asm/ 84 | vim addi.dump 85 | ``` 86 | Last, you can see the hex code that is loaded directly into the memory by looking at the hex file. 87 | ``` 88 | cd tests/asm/ 89 | vim addi.hex 90 | ``` 91 | 92 | 93 | ### 6. Checkpoint 2 Deliverables 94 | *Checkoff due: Apr 15 (Friday), 2022* 95 | 96 | Congratulations! You’ve started the design of your datapath by implementing your pipeline diagram, and written and thoroughly tested a key component in your processor and should now be well-versed in testing Verilog modules. Please answer the following questions to be checked off by a TA. 97 | 98 | 1. Show that all of the assembly tests pass 99 | 100 | 2. Show your final pipeline diagram, updated to match the code. 101 | 102 | 3. Push your implementation and updated pipeline diagram to your repo. 
103 | 104 | --- 105 | 106 | 107 | ## Acknowledgement 108 | 109 | This project is the result of the work of many EECS151/251 GSIs over the years including: 110 | Written By: 111 | - Nathan Narevsky (2014, 2017) 112 | - Brian Zimmer (2014) 113 | Modified By: 114 | - John Wright (2015,2016) 115 | - Ali Moin (2018) 116 | - Arya Reais-Parsi (2019) 117 | - Cem Yalcin (2019) 118 | - Tan Nguyen (2020) 119 | - Harrison Liew (2020) 120 | - Sean Huang (2021) 121 | - Daniel Grubb, Nayiri Krzysztofowicz, Zhaokai Liu (2021) 122 | - Dima Nikiforov (2022) 123 | -------------------------------------------------------------------------------- /project/checkpoint3.md: -------------------------------------------------------------------------------- 1 | # EECS 151/251A ASIC Project Specification: Checkpoint 3 2 |

3 | Prof. Sophia Shao 4 |

5 |

6 | TAs (ASIC): Dima Nikiforov 7 |

8 |

9 | Department of Electrical Engineering and Computer Science 10 |

11 |

12 | College of Engineering, University of California, Berkeley 13 |

14 | 15 | --- 16 | 17 | ## Cache 18 | A processor operates on data in memory. Memory can hold billions of bits, which can either be instructions or data. In a VLSI design, it is a very bad idea to store this many bits close to the processor. The 19 | chip area required would be huge - consider how many DRAM chips your PC has, and that DRAM cells 20 | are much smaller than SRAM cells (which can actually be implemented in the same CMOS process). 21 | Moreover, the entire processor would have to slow down to accommodate delays in the large memory 22 | array. Instead, caches are used to create the illusion of a large memory with low latency. 23 | 24 | Your task is to implement a (relatively) simple cache for your RISC-V processor, based on some 25 | predefined SRAM macros (memory arrays) and the interface specified below. 26 | 27 | ### 1 Cache overview 28 | When you request data at a given address, the cache will see if it is stored locally. If it is (cache hit), it 29 | is returned immediately. Otherwise if it is not found (cache miss), the cache fetches the bits from the 30 | main memory. 31 | Caches store data in “ways.” A way is a logical element which contains valid bits, tag bits, and data. 32 | The simplest type of cache is direct-mapped (a 1-way cache). A cache stores data in larger units (lines) 33 | than single words. In each way, a given address may only occupy a single location, determined by the 34 | lowest bits of the cache line address. The remaining address bits are called the “tag” and are stored so 35 | that we can check if a given cache line belongs to a given address. The valid bit indicates which lines 36 | contain valid data. 37 | Multi-way caches allow more flexibility in what data is stored in the cache, since there are multiple 38 | locations for a line to occupy (the number of ways). For this reason, a ”replacement policy” is needed. 39 | This is used to decide which way’s data to evict when fetching new data. 
For this project you may use 40 | any policy you wish, but pseudo-random is recommended. 41 | 42 | ### 2 Guidelines and requirements 43 | You have been given the interface of a cache (`Cache.v`) and your next task is to implement the cache. 44 | EECS151 students should build a direct-mapped cache, and EECS251 students are required to implement a cache that either: 45 | 46 | 1. is configurable to be either direct-mapped or at least 2-way set associative; or 47 | 48 | 2. is set-associative with configurable associativity. 49 | 50 | You are welcome to implement a more performant cache if you desire. 51 | Your cache should be at least 512 bytes; if you wish to increase the size, implement the 512 bytes 52 | cache first and upgrade later. 53 | Use the SRAMs that are available in 54 | 55 | `/home/ff/eecs151/hammer/src/hammer-vlsi/technology/asap7/sram_compiler/memories/behavioral/sram_behav_models.v` 56 | 57 | for your data and tag arrays. 58 | 59 | 60 | The pin descriptions for these SRAMs are as follows: 61 | 62 | | | | 63 | |-------------|-------------------------------------------------| 64 | | `A` | Address | 65 | | `CE` | clock edge | 66 | | `OEB` | output enable bar (tie this to 0) | 67 | | `WEB` | write enable bar (1 is a read, 0 is a write) | 68 | | `CSB` | chip select bar (tie this to 0) | 69 | | `BYTEMASK` | write byte mask | 70 | | `I` | write data | 71 | | `O` | read data | 72 | 73 | You should use cache lines that are 512 bits (16 words) for this project. The memory interface is 74 | 128 bits, meaning that you will require multiple (4) cycles to perform memory transactions. 
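The line size and memory width above fix how a CPU byte address splits into tag, index, and offset. The sketch below works through that arithmetic for the minimum configuration; the 32-bit address width, the direct-mapped default, and all function names are assumptions for illustration, and your own cache may slice the address differently.

```python
LINE_BITS = 512   # cache line size from the spec (16 words)
MEM_BITS = 128    # width of the main memory interface
WORD_BYTES = 4    # one 32-bit word


def cache_geometry(cache_bytes=512, ways=1, addr_bits=32):
    """Derive field widths, assuming power-of-two sizes and byte addresses."""
    line_bytes = LINE_BITS // 8                      # 64 bytes per line
    num_sets = cache_bytes // (line_bytes * ways)    # lines per way
    offset_bits = (line_bytes - 1).bit_length()      # selects a byte in the line
    index_bits = (num_sets - 1).bit_length()         # selects a set
    return {
        "line_bytes": line_bytes,
        "num_sets": num_sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": addr_bits - offset_bits - index_bits,
        "beats_per_line": LINE_BITS // MEM_BITS,     # memory transfers per line
    }


def split_address(addr, cache_bytes=512, ways=1):
    """Split a byte address into (tag, index, word-within-line, byte-within-word)."""
    g = cache_geometry(cache_bytes, ways)
    offset = addr & (g["line_bytes"] - 1)
    index = (addr >> g["offset_bits"]) & (g["num_sets"] - 1)
    tag = addr >> (g["offset_bits"] + g["index_bits"])
    return tag, index, offset // WORD_BYTES, offset % WORD_BYTES


# Minimum configuration: 512-byte direct-mapped cache.
geometry = cache_geometry()
reset_pc_fields = split_address(0x2000)
```

With these defaults the cache holds 8 lines, so an address uses 6 offset bits, 3 index bits, and 23 tag bits, and each refill or write-back of a line takes 4 transfers on the 128-bit memory interface.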
 75 | Below is a description of each signal in `Cache.v`: 76 | 77 | | Signal | Description | 78 | |----------------------|----------------------------------------| 79 | | `clk` | clock 80 | | `reset` | reset 81 | | `cpu_req_valid` | The CPU is requesting a memory transaction 82 | | `cpu_req_rdy` | The cache is ready for a CPU memory transaction 83 | | `cpu_req_addr` | The address of the CPU memory transaction 84 | | `cpu_req_data` | The write data for a CPU memory write (ignored on reads) 85 | | `cpu_req_write` | The 4-bit write mask for a CPU memory transaction (each bit corresponds to the byte address within the word). `4'b0000` indicates a read. 86 | | `cpu_resp_val` | The cache has output valid data to the CPU after a memory read 87 | | `cpu_resp_data` | The data requested by the CPU 88 | | `mem_req_val` | The cache is requesting a memory transaction to main memory 89 | | `mem_req_rdy` | Main memory is ready for the cache to provide a memory address 90 | | `mem_req_addr` | The address of the main memory transaction from the cache. Note that this address is narrower than the CPU byte address since main memory has wider data. 91 | | `mem_req_rw` | 1 if the main memory transaction is a write; 0 for a read. 92 | | `mem_req_data_valid` | The cache is providing write data to main memory. 93 | | `mem_req_data_ready` | Main memory is ready for the cache to provide write data. 94 | | `mem_req_data_bits` | Data to write to main memory from the cache (128 bits/4 words). 95 | | `mem_req_data_mask` | Byte-level write mask to main memory. May be `16'hFFFF` for a full write. 96 | | `mem_resp_val` | The main memory response data is valid. 97 | | `mem_resp_data` | Main memory response data to the cache (128 bits/4 words). 98 | | | 99 | 100 | To design your cache, start by outlining where the SRAMs should go. You should include an SRAM 101 | per way for data, and a separate SRAM per way for the tags. 
Depending on your implementation, you 102 | may want to implement the valid bits in flip flops or as part of the tag SRAM. 103 | 104 | Next you should develop a state machine that covers all the events that your cache needs to handle 105 | for both hits and misses. You can do it without an explicit state machine, but you will suffer. Keep in 106 | mind you will need to write any valid data back to main memory before you start refilling the cache (you 107 | can use a write-back or a write-through policy). Both of these transactions will take multiple cycles. 108 | 109 | ### 3 Changes to the flow for this checkpoint 110 | You should now be able to pass the `bmark` test. The test suite includes many C programs that do 111 | various things to test your processor and cache implementation. You can observe the number of cycles 112 | that each bmark test takes to run by opening `bmark_output/*.out` and taking note of the number 113 | on the last line. The `make sim-rtl test_bmark=all` target will also print this number for you. 114 | To run a specific benchmark (e.g., cachetest), run 115 | ``` 116 | make sim-rtl test_bmark=cachetest.out 117 | ``` 118 | After completing your cache, run the tests with both the cache included and with the fake memory 119 | (`no_cache_mem`) included. To use `no_cache_mem`, be sure to have `+define+no_cache_mem` in the 120 | simOptions variable in the `sim-rtl.yml` file. To use your cache, comment out `+define+no_cache_mem`. 121 | Take note of the cycle counts for both; you should see the cycle counts increase when you use the cache. 122 | 123 | ### 4 Checkpoint 3 Deliverables 124 | *Checkoff due: Apr 29 (Friday), 2022* 125 | 126 | 1. Show that all of the assembly tests and final pass using the cache 127 | 128 | 2. Show the block diagram of your cache 129 | 130 | 3. What was the difference in the cycle count for the `bmark` test with the perfect memory and the 131 | cache? 132 | 133 | 4. 
Show your final pipeline diagram, updated to match the code 134 | 135 | --- 136 | 137 | 138 | ## Acknowledgement 139 | 140 | This project is the result of the work of many EECS151/251 GSIs over the years including: 141 | Written By: 142 | - Nathan Narevsky (2014, 2017) 143 | - Brian Zimmer (2014) 144 | Modified By: 145 | - John Wright (2015,2016) 146 | - Ali Moin (2018) 147 | - Arya Reais-Parsi (2019) 148 | - Cem Yalcin (2019) 149 | - Tan Nguyen (2020) 150 | - Harrison Liew (2020) 151 | - Sean Huang (2021) 152 | - Daniel Grubb, Nayiri Krzysztofowicz, Zhaokai Liu (2021) 153 | - Dima Nikiforov (2022) 154 | -------------------------------------------------------------------------------- /lab5/spec_sky130.md: -------------------------------------------------------------------------------- 1 | # EECS 151/251A ASIC Lab 5: Parallelization and Routing 2 | 3 |

4 | Prof. Bora Nikolic 5 |

6 |

7 | TAs: Daniel Grubb, Nayiri Krzysztofowicz, Zhaokai Liu 8 |

9 |

10 | Department of Electrical Engineering and Computer Science 11 |

12 |

13 | College of Engineering, University of California, Berkeley 14 |

15 | 16 | ## Overview 17 | 18 | Like last week, this lab has two parts. For the first part, we will continue to develop our GCD 19 | coprocessor by improving its performance. After that, we will continue the physical design flow by 20 | performing routing. 21 | 22 | To begin this lab, get the project files and set up your environment by typing the following commands: 23 | 24 | ```shell 25 | git clone /home/ff/eecs151/fa21/sky130/lab5-sky130.git 26 | cd lab5-sky130 27 | ``` 28 | 29 | ## Design 30 | 31 | One way we can improve the performance of our GCD coprocessor is by parallelizing the computation. 32 | We can do this by including multiple GCD units in our design, and routing traffic to them as they 33 | become available. 34 | 35 | You will find that the solution to last week’s lab (`fifo.v` and `gcd_coprocessor.v`) is included. The 36 | test has been modified to check the total number of cycles taken by the coprocessor to complete the 37 | tests. Run `make sim-rtl` to run the new testbench on the solution code. Take note of the number 38 | of cycles that the tests take without modification, as you will need it to calculate your speedup. 39 | 40 | Your task is to edit `gcd_coprocessor.v` to bring the total cycle count below 225 cycles. We will do 41 | this by using two instances of GCD. 42 | 43 | You will find RTL that connects the datapath and controller into one module in `gcd_unit.v`. You 44 | may find this useful when refactoring the `gcd_coprocessor`, since you will need fewer wires to place 45 | both GCD instances. 46 | 47 | You will also find stub code for an arbiter, which you should complete. We will use the arbiter 48 | to route traffic to GCD units and preserve the response ordering. Most of your design can be 49 | implemented with combinational logic, but you will need some state to remember which GCD 50 | block contains the earliest data to preserve ordering. 51 | 52 | --- 53 | ## Question 1: Design 54 | 55 | **a.)
Submit your code (`gcd_coprocessor.v` and `gcd_arbiter.v`) with your lab assignment.** 56 | 57 | **b.) How many cycles did your simulation take? What was the % speedup?** 58 | 59 | --- 60 | 61 | ## Automated Flow 62 | 63 | In the last lab, we only focused on the PAR flow through CTS. In this lab, we will go through the full flow. 64 | Routing is the next major flow step. Prior to the actual routing step, Innovus uses a 65 | basic routing engine with errors and shorts, but ignores these errors and simply tries to 66 | get an estimate of delays and parasitics. Once post-CTS 67 | optimization is done, it switches to a different tool that actually legalizes routing and tries to eliminate 68 | shorts while meeting timing. Routing is one of the most 69 | computationally heavy tasks of digital IC design and can take days to complete for complicated designs. 70 | This will be reflected in the runtime in this lab. 71 | 72 | After routing is complete, a post-Route optimization is run to ensure no timing violations 73 | remain. Post-Route optimization typically has little freedom to move cells around, and it tries to 74 | meet the timing constraints mostly by tweaking the length of the routings. You may see some DRC 75 | (Design Rule Check) errors caused by the 7nm technology library, after routing. 76 | 77 | First, synthesize the design: 78 | 79 | ```shell 80 | make syn 81 | ``` 82 | 83 | Then, simulate the synthesized design to make sure it still works: 84 | 85 | ```shell 86 | make sim-gl-syn 87 | ``` 88 | 89 | Once your synthesized design passes the test, you can start the PAR flow: 90 | 91 | ```shell 92 | make par 93 | ``` 94 | 95 | The PAR command will take a long time to complete, as it runs through all stages of PAR. 96 | Check out the iterations that Innovus runs through during optimization. You can see some of the metrics that Innovus is using. 97 | Once it completes, take a look at the build directory as in the previous labs. 
You might see additional files 98 | compare to the `syn-rundir`, and that’s because the PAR flow incorporates the RC and parasitic delays, in addition to the cell delays. Open `build/par-rundir/gcd_coprocessor.setup.par.spef` 99 | and search for the first occurrence of `D_NET`. What does it say about the first net? You may find 100 | [this wiki page](https://en.wikipedia.org/wiki/Standard_Parasitic_Exchange_Format#Parasitics) helpful. *(thought experiment #1 : get a sense of the units at the top and orders of magnitude of the RC parasitics in the SPEF file. If we used a 5nm technology library, do you expect the resistance to generally increase or decrease? How about the capacitance?)* 101 | 102 | --- 103 | ## Question 2: Automated Flow 104 | 105 | a.) Check the post-Synthesis timing report 106 | (`syn-rundir/reports/final_time_ss_100C_1v60.setup_view.rpt`) and post-PAR timing report (`par-rundir/timingReports/gcd_coprocessor_postRoute_all.tarpt`). 107 | **What are the critical paths of your post-PAR and post-Synthesis designs?** 108 | **Are they the same path?** 109 | **How does this critical path compare to your single-unit critical path?** 110 | 111 | b.) Iterate on your design by modifying `design.yml` to find a rough estimate (no need to be too 112 | precise) for the clock period until you start running into setup errors. 113 | **Given the number of cycles it takes to complete the testbench, what is the shortest time your design can finish the computation?** 114 | 115 | c.) Open the post-CTS timing report(`par-rundir/hammer_cts_debug/hammer_cts_all.tarpt`) and the post-PAR 116 | timing report(`par-rundir/timingReports/gcd_coprocessor_postRoute_all.tarpt`). 117 | **Find a common path (same start and end sequential elements). What differences do you notice within the paths?** 118 | 119 | --- 120 | 121 | ## Innovus Commands 122 | 123 | As in the previous lab, we will look at the contents of `par.tcl` that Hammer generates and follow 124 | along using Innovus. 
125 | *(thought experiment #2 : open the `par.tcl` and search for the command `set_db add_fillers_cells`. Based on the names of the cells specified by this command, what do you think is the function of the filler cells?)* 126 | 127 | Navigate to the directory `build/par-rundir` and type: 128 | 129 | ```shell 130 | innovus -common_ui 131 | ``` 132 | 133 | This will open the Innovus shell. Next, type `read_db gcd_coprocessor_FINAL` to load the current design 134 | database from the latest PAR flow. This will help us to avoid re-running the entire flow. To see 135 | all the reporting commands, type `help report*` in the Innovus shell and read through the options 136 | available to you. 137 | 138 | --- 139 | ## Question 3: Innovus Reports 140 | 141 | **a.) What is the area consumed by your design?** 142 | **What percentage of the total area does the arbiter occupy?** 143 | 144 | **b.) Submit a screenshot of your setup slack histogram.** 145 | **Compared with the histogram you obtained in Lab 4, does your new slack distribution support the observed performance improvements you obtained in your coprocessor?** 146 | 147 | --- 148 | 149 | After you are done with the flow, it is time to simulate our newly printed post-PAR netlist. Type 150 | the following command: 151 | 152 | ```shell 153 | make sim-gl-par 154 | ``` 155 | 156 | This will use the same testbench, but will now use the post-PAR netlist of your design, backannotated with delays and parasitics from PAR. Make sure to adjust the `CLOCK_PERIOD` variable in `sim-gl-par.yml` to match the clock period you obtained from PAR. Note, however, that the exact 157 | clock period may not work and you may need to relax it slightly. 158 | 159 | After running `make sim-gl-par` you can run power analysis using: 160 | 161 | ```shell 162 | make power-par 163 | ``` 164 | 165 | Navigate to `power-rundir/activePowerReports` and open `ss_100C_1v60.setup_view.rpt`. Do 166 | the power estimation numbers match your expectation? 
167 | 168 | --- 169 | ## Question 4: Trade-offs 170 | 171 | **a.)** Re-run the flow using your old design. 172 | To prevent your `build` directory from being overwritten, set the `OBJ_DIR` Make variable to a different name (e.g. `make par OBJ_DIR=build2`). 173 | Using the area and power values from Innovus, 174 | **how does the performance improvement from the dual-unit design compare to the increase in area and power consumption relative to your old design?** 175 | 176 | **b.)** Modify your `gcd_coprocessor.v` to take a parameter specifying the number of clock cycles we 177 | want our design to meet (`parameter TARGET_NUMBER_OF_CYCLES`) for this given testbench. Your 178 | code should generate a low-area, low-power design if the number is greater than what your simple 179 | GCD coprocessor can achieve, and it should generate the dual-unit design if it is lower. 180 | **Submit your code.** 181 | 182 | c.) (Optional) Using a rough estimate of target number of cycles versus number of units in the design, 183 | write code that generates 1-8 cores depending on the performance demand. Do NOT do this 184 | by writing out every possible case explicitly. You can limit the number of units to powers of two 185 | (1, 2, 4, 8) if it makes your life easier.
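
The "rough estimate" in part c.) can be prototyped outside of Verilog before committing to RTL. The sketch below is only an illustration in Python, under the loudly stated assumption that N units finish the testbench in roughly 1/N of the single-unit cycle count; the function name `units_for_target`, the `max_units` cap, and the example cycle counts are hypothetical and do not come from the lab files. In the RTL, a comparable comparison could be evaluated at elaboration time from `TARGET_NUMBER_OF_CYCLES`.

```python
def units_for_target(target_cycles: int, single_unit_cycles: int, max_units: int = 8) -> int:
    """Return the smallest power-of-two unit count estimated to meet the target.

    Assumption (not from the lab): N units need about single_unit_cycles / N cycles.
    """
    if target_cycles <= 0 or single_unit_cycles <= 0:
        raise ValueError("cycle counts must be positive")
    units = 1
    # Double the number of units until the estimate meets the target or the cap is hit.
    while units < max_units and single_unit_cycles / units > target_cycles:
        units *= 2
    return units


if __name__ == "__main__":
    # Hypothetical single-unit cycle count; measure your own with `make sim-rtl`.
    for target in (400, 225, 100, 10):
        print(target, units_for_target(target, single_unit_cycles=300))
```

The linear-scaling model is optimistic because arbitration and response ordering add overhead, so measured cycle counts should replace the estimate once they are available.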
186 | 187 | --- 188 | 189 | ## Lab Deliverables 190 | 191 | ### Lab Due 11:59 PM, Friday Oct 15th, 2021 192 | 193 | - Submit a written report with all 4 questions answered to Gradescope 194 | - Checkoff with an ASIC lab TA 195 | 196 | ## Acknowledgements 197 | 198 | This lab is the result of the work of many EECS151/251 GSIs over the years including: 199 | 200 | - Nathan Narevsky (2014, 2017) 201 | - Brian Zimmer (2014) 202 | - Cem Yalcin (2019) 203 | 204 | Modified By: 205 | - John Wright (2015,2016) 206 | - Ali Moin (2018) 207 | - Arya Reais-Parsi (2019) 208 | - Cem Yalcin (2019) 209 | - Tan Nguyen (2020) 210 | - Harrison Liew (2020) 211 | - Sean Huang (2021) 212 | - Daniel Grubb, Nayiri Krzysztofowicz, Zhaokai Liu (2021) 213 | -------------------------------------------------------------------------------- /lab5/spec.md: -------------------------------------------------------------------------------- 1 | # EECS 151/251A ASIC Lab 5: Parallelization and Routing 2 |

3 | Prof. Sophia Shao 4 |

5 |

6 | TAs (ASIC): Erik Anderson, Roger Hsiao, Hansung Kim, Richard Yan 7 |

8 |

9 | Department of Electrical Engineering and Computer Science 10 |

11 |

12 | College of Engineering, University of California, Berkeley 13 |

14 | 15 | ## Overview 16 | 17 | Like last week, this lab has two parts. For the first part, we will continue to develop our GCD 18 | coprocessor by improving its performance. After that, we will continue the physical design flow by 19 | performing routing. 20 | 21 | To begin this lab, get the project files and set up your environment by typing the following command and sourcing the `eecs151.bashrc` file, as usual: 22 | 23 | ```shell 24 | source /home/ff/eecs151/asic/eecs151.bashrc 25 | ``` 26 | 27 | ```shell 28 | git clone /home/ff/eecs151/labs/lab5.git 29 | cd lab5 30 | ``` 31 | 32 | ## Design 33 | 34 | One way we can improve the performance of our GCD coprocessor is by parallelizing the compute. 35 | We can do this by including multiple GCD units in our design, and routing traffic to them as they 36 | become available. 37 | 38 | You will find that the solution to last week’s lab (`fifo.v` and `gcd_coprocessor.v`) is included. The 39 | test has been modified to check the total number of cycles taken by the coprocessor to complete the 40 | tests. Run `make sim-rtl` to run the new testbench on the solution code. Take note of the number 41 | of cycles that the tests take without modification, as you will need it to calculate your speedup. 42 | 43 | Your task is to edit `gcd_coprocessor.v` to improve the performance below 225 cycles. We will do 44 | this by using two instances of GCD. 45 | 46 | You will find RTL that connects the datapath and controller into one module in `gcd_unit.v`. You 47 | may find this useful when refactoring the `gcd_coprocessor`, since you will need fewer wires to place 48 | both GCD instances. 49 | 50 | You will also find stub code for an arbiter, which you should complete. We will use the arbiter 51 | to route traffic to GCD units and preserve the response ordering. 
Most of your design can be 52 | implemented with combinational logic, but you will need some state to remember which GCD 53 | block contains the earliest data to preserve ordering. 54 | 55 | --- 56 | ## Question 1: Design 57 | 58 | **a.) Submit your code (`gcd_coprocessor.v` and `gcd_arbiter.v`) with your lab assignment.** 59 | 60 | **b.) How many cycles did your simulation take? What was the % speedup?** 61 | 62 | ## Checkoff 1: Design 63 | Demonstrate your simulation's functionality and explain your design/approach. 64 | 65 | --- 66 | 67 | ## Automated Flow 68 | 69 | In the last lab, we only focused on the PAR flow through CTS. In this lab, we will go through the full flow. 70 | Routing is the next major flow step. Prior to the actual routing step, Innovus uses a 71 | basic routing engine with errors and shorts, but ignores these errors and simply tries to 72 | get an estimate of delays and parasitics. Once post-CTS 73 | optimization is done, it switches to a different tool that actually legalizes routing and tries to eliminate 74 | shorts while meeting timing. Routing is one of the most 75 | computationally heavy tasks of digital IC design and can take days to complete for complicated designs. 76 | This will be reflected in the runtime in this lab. 77 | 78 | After routing is complete, a post-Route optimization is run to ensure no timing violations 79 | remain. Post-Route optimization typically has little freedom to move cells around, and it tries to 80 | meet the timing constraints mostly by tweaking the length of the routings. You may see some DRC 81 | (Design Rule Check) errors caused by the 7nm technology library, after routing. 
82 | 83 | First, synthesize the design: 84 | 85 | ```shell 86 | make syn 87 | ``` 88 | 89 | Then, simulate the synthesized design to make sure it still works: 90 | 91 | ```shell 92 | make sim-gl-syn 93 | ``` 94 | 95 | Once your synthesized design passes the test, you can start the PAR flow: 96 | 97 | ```shell 98 | make par 99 | ``` 100 | 101 | The PAR command will take a long time to complete, as it runs through all stages of PAR. 102 | Check out the iterations that Innovus runs through during optimization. You can see some of the metrics that Innovus is using. 103 | Once it completes, take a look at the build directory as in the previous labs. You might see additional files 104 | compared to the `syn-rundir`; that is because the PAR flow incorporates the RC and parasitic delays, in addition to the cell delays. Open `build/par-rundir/gcd_coprocessor.PVT_0P63V_100C.par.spef` 105 | and search for the first occurrence of `D_NET`. What does it say about the first net? You may find 106 | [this wiki page](https://en.wikipedia.org/wiki/Standard_Parasitic_Exchange_Format#Parasitics) helpful. *(thought experiment #1: get a sense of the units at the top and orders of magnitude of the RC parasitics in the SPEF file. If we used a 5nm technology library, do you expect the resistance to generally increase or decrease? How about the capacitance?)* 107 | 108 | ## Question 2: Automated Flow 109 | 110 | a.) Check the post-Synthesis timing report 111 | (`syn-rundir/reports/final_time_PVT_0P63V_100C.setup_view.rpt`) and post-PAR timing report (`par-rundir/timingReports/gcd_coprocessor_postRoute_all.tarpt`). 112 | **What are the critical paths of your post-PAR and post-Synthesis designs?** 113 | **Are they the same path?** 114 | **How does this critical path compare to your single-unit critical path?** 115 | 116 | b.)
Iterate on your design by modifying `design.yml` to find a rough estimate (no need to be too 117 | precise) for the clock period until you start running into setup errors. 118 | **Given the number of cycles it takes to complete the testbench, what is the shortest time your design can finish the computation?** 119 | 120 | c.) Open the post-CTS timing report(`par-rundir/hammer_cts_debug/hammer_cts_all.tarpt`) and the post-PAR 121 | timing report(`par-rundir/timingReports/gcd_coprocessor_postRoute_all.tarpt`). 122 | **Find a common path (same start and end sequential elements). What differences do you notice within the paths?** 123 | 124 | --- 125 | 126 | ## Innovus Commands 127 | 128 | As in the previous lab, we will look at the contents of `par.tcl` that Hammer generates and follow 129 | along using Innovus. 130 | *(thought experiment #2 : open the `par.tcl` and search for the command `set_db add_fillers_cells`. Based on the names of the cells specified by this command, what do you think is the function of the filler cells?)* 131 | 132 | Navigate to the directory `build/par-rundir` and type: 133 | 134 | ```shell 135 | innovus -common_ui 136 | ``` 137 | 138 | This will open the Innovus shell. Next, type `read_db gcd_coprocessor_FINAL` to load the current design 139 | database from the latest PAR flow. This will help us to avoid re-running the entire flow. To see 140 | all the reporting commands, type `help report*` in the Innovus shell and read through the options 141 | available to you. 142 | 143 | ## Checkoff 2: Innovus Commands 144 | Explain the PAR flow, or ask some questions about any steps you don't understand. 145 | 146 | --- 147 | ## Question 3: Innovus Reports 148 | 149 | **a.) What is the area consumed by your design?** 150 | **What percentage of the total area does the arbiter occupy?** 151 | 152 | **b.) 
Submit a screenshot of your setup slack histogram.** 153 | **Compared with the histogram you obtained in Lab 4, does your new slack distribution support the observed performance improvements you obtained in your coprocessor?** 154 | 155 | --- 156 | 157 | After you are done with the flow, it is time to simulate our newly printed post-PAR netlist. Type 158 | the following command: 159 | 160 | ```shell 161 | make sim-gl-par 162 | ``` 163 | 164 | This will use the same testbench, but will now use the post-PAR netlist of your design, backannotated with delays and parasitics from PAR. Make sure to adjust the `CLOCK_PERIOD` variable in `sim-gl-par.yml` to match the clock period you obtained from PAR. Note, however, that the exact 165 | clock period may not work and you may need to relax it slightly. 166 | 167 | After running `make sim-gl-par` you can run power analysis using: 168 | 169 | ```shell 170 | make power-par 171 | ``` 172 | 173 | Navigate to `build/power-rundir/activePowerReports.PVT_0P63V_100C.setup_view/` and open `power.rpt`. Do 174 | the power estimation numbers match your expectation? 175 | 176 | --- 177 | ## Question 4: Trade-offs 178 | 179 | **a.)** Re-run the flow using your old design. 180 | To prevent your `build` directory from being overwritten, set the `OBJ_DIR` Make variable to a different name (i.e. `make par OBJ_DIR=build2`). 181 | Using the area and power values from Innovus, 182 | **how does the performance improvement from the dual-unit design compare to area occupation and power consumption increase compared to your old design?** 183 | 184 | **b.)** Modify your `gcd_coprocessor.v` to take an input parameter in terms of number of clock cycles we 185 | want our design to meet (`parameter TARGET_NUMBER_OF_CYCLES`) for this given testbench. Your 186 | code should generate a low area, low power design if the number is greater than that your simple 187 | gcd coprocessor can achieve, and it should generate the dual-unit design if it is lower. 
188 | **Submit your code.** 189 | 190 | Hint: Use the `verilog` `generate` syntax for choosing between designs. See [here](https://www.chipverify.com/verilog/verilog-generate-block) for documentation on how to use the `generate` syntax. 191 | 192 | c.) (Optional) Using a rough estimate of target number of cycles versus number of units in the design, 193 | write a code that will generate 1-8 cores depending on the performance demand. Do NOT do this 194 | by writing out every possible case explicitly. You can limit the number of units to powers of two 195 | (1,2,4,8) if it makes your life easier. 196 | 197 | --- 198 | 199 | ## Lab Deliverables 200 | 201 | ### Lab Due 11:59 PM, 2 weeks after your registered lab section. (Oct. 17 for lab section 1) 202 | 203 | - Submit a written report with all 4 questions answered to Gradescope 204 | - Checkoff with an ASIC lab TA 205 | 206 | ## Acknowledgements 207 | 208 | This lab is the result of the work of many EECS151/251 GSIs over the years including: 209 | 210 | - Nathan Narevsky (2014, 2017) 211 | - Brian Zimmer (2014) 212 | - Cem Yalcin (2019) 213 | 214 | Modified By: 215 | - John Wright (2015,2016) 216 | - Ali Moin (2018) 217 | - Arya Reais-Parsi (2019) 218 | - Cem Yalcin (2019) 219 | - Tan Nguyen (2020) 220 | - Harrison Liew (2020) 221 | - Sean Huang (2021) 222 | - Daniel Grubb, Nayiri Krzysztofowicz, Zhaokai Liu (2021) 223 | - Dima Nikiforov (2022) 224 | - Roger Hsiao (2022) 225 | -------------------------------------------------------------------------------- /project/checkpoint4.md: -------------------------------------------------------------------------------- 1 | # EECS 151/251A ASIC Project Specification: Checkpoint 4 2 |

3 | Prof. Sophia Shao 4 |

5 |

6 | TAs (ASIC): Dima Nikiforov 7 |

8 |

9 | Department of Electrical Engineering and Computer Science 10 |

11 |

12 | College of Engineering, University of California, Berkeley 13 |

14 | 15 | --- 16 | 17 | ## 1 Synthesis, PAR, & Power 18 | 19 | ### 1.1 Performing Synthesis and PAR 20 | Make sure your design is backed up at this point. 21 | 22 | The setup for Synthesis and PAR is similar to what we have used in the labs during the class, 23 | with some formatting differences. In `par.yml`, there is extra guidance 24 | for how to do placement constraints. Based on how you implemented the caches from the previous 25 | checkpoint, you will need to modify these constraints to match the master SRAM cell used as well as 26 | the path. Now you should be ready to proceed to Synthesis and PAR. As in the previous labs, execute 27 | the following: 28 | ```shell 29 | export HAMMER_HOME=$PWD/hammer 30 | source hammer/sourceme.sh 31 | ``` 32 | The first thing you should do before simulating is to make the SRAM libraries with the command: 33 | ``` 34 | make srams 35 | ``` 36 | If you want to make sure the RTL has been pointed to correctly, you can try running the asm tests 37 | from this environment. To do so, type the following command: 38 | ``` 39 | make sim-rtl test_asm=all 40 | ``` 41 | The command generates the `simv` file, which is the simulation executable, then iterates through all 42 | the asm tests using the root `Makefile`. If everything looks fine, you can proceed to Synthesis and 43 | PAR: 44 | ``` 45 | make clean 46 | make srams 47 | make syn 48 | make par 49 | ``` 50 | If everything went smoothly, you should now have a circuit laid out. To view the layout, go to the 51 | `build/par-rundir/` directory and type 52 | ``` 53 | ./generated_scripts/open_chip 54 | ``` 55 | You are expected to record and document your area, power and clock frequency performance (as 56 | determined by your critical path).
To verify that your design works after PAR, use the following command: 57 | ``` 58 | make sim-gl-par test_asm=all 59 | ``` 60 | Some final notes: 61 | * You may also want to make sure that the post-synthesis netlist passes tests before moving on to post-PAR simulation, because the latter can be slower and will complicate your debugging with any PAR-related failures you may have (e.g. incomplete wiring of signal or clock nets 62 | due to a bad floorplan). 63 | * There is a new constraint added to `syn.tcl` under the key `vlsi.inputs.delays`. 64 | The external memory model in `riscv_test_harness.v` generates a delayed version of the signals 65 | going into your CPU (see the parameter `INPUT_DELAY`). Annotating these delays for synthesis/PAR 66 | is necessary in order to capture this effect when the tools perform timing analysis. If you are 67 | curious, this gets translated into `build/syn-rundir/pin_constraints_fragment.sdc` 68 | as input to synthesis. After synthesis, the relevant pin delays are encoded in 69 | `build/syn-rundir/riscv_top.mapped.sdc`. These are Synopsys Design Constraint 70 | format files. Do not touch this delay constraint except to update the value to your clock period 71 | divided by 5. 72 | * As described in Lab 6, the ASAP7 dummy SRAMs do not have complete timing information. 73 | This is most apparent in gate-level simulations because the SRAMs do not provide any SDF 74 | timing annotation. You may find that despite meeting timing in synthesis and PAR, you will 75 | likely need to increase the gate-level simulation clock period for the benchmarks to pass. 76 | 77 | ### 2 Checkpoint 4 Deliverables 78 | *Checkoff due: May 6 (Friday), 2022* 79 | 80 | 81 | 1. Show that all of the assembly tests and the final benchmark pass using the cache in a post-PAR simulation 82 | 83 | 2. Show your layout, and explain your design considerations when creating the floorplan 84 | 85 | 3.
Show your final pipeline diagram, updated to match the code 86 | 87 | --- 88 | 89 | ## 3 Beyond Checkpoint 4: CPU Optimization 90 | 91 | ### 3.1 Optimizing for frequency 92 | Beyond functionality, your final project grade will be determined by the maximum operating frequency 93 | of your processor, determined by the critical path. You will also want to optimize for the number of 94 | cycles that your processor takes to execute certain programs; more on that later. The critical path will 95 | be dependent on how aggressively you ask the tools to optimize the design, by changing the target clock 96 | period in the `syn.yml` file. 97 | 98 | When Innovus is finished, look at the timing report for the critical path. In some cases, it is possible 99 | to modify your Verilog to improve the critical path by moving pipeline stage registers. However, in other 100 | cases, timing can only be improved by tweaking settings in `syn.yml` and `par.yml`. 101 | 102 | Be sure to back up (meaning check in or branch) your working design before attempting to move 103 | logic, because functionality is worth much more of your grade than maximum frequency. 104 | 105 | You are allowed to add additional pipeline stages, but remember that you will need to deal with the additional hazards that accompany them. 106 | Be careful that adding additional stages does not increase the overall execution time. 107 | Your final performance metric is not only based on the clock speed at which your design will run, so keep 108 | that in mind before heavily modifying your design. 109 | 110 | Note for bonus grading: due to the SRAM timing issue described above, the maximum frequency 111 | you achieved in PAR (not gate-level simulation) is most accurate and should be what you report for 112 | frequency. 113 | 114 | ### 3.2 Optimizing for number of cycles 115 | We are providing you with tests that are the output of example C programs to run for your processor.
They 116 | are meant to be representative examples of different types of programs, each of which may take extra 117 | cycles to execute for different reasons, including but not limited to cache 118 | misses and branch/jump stalls. A more complicated cache structure may be able to reduce some of the 119 | time spent waiting for memory accesses, but it may not be optimal for all cases. If you implement a 120 | configurable cache, you are allowed to set the cache settings differently on a per-test basis, but you will need 121 | to add those pins to the top level Riscv151 file as well as the testbench with compile flags for VCS. In 122 | terms of dealing with branching and jumping, you can implement any type of branch predictor that you 123 | want to. A branch predictor in its simplest form will always choose to take (or not take) the branch and 124 | then figure out if it was incorrect, and if so go back to where the instruction memory should have gone, 125 | making sure that any additional instructions that were started do not change the state of the CPU. This 126 | means that there should be no writes to memory or any registers for those instructions. 127 | 128 | The list of final tests is contained within the Makefile under the variable `bmark_tests`, which 129 | includes a few tests that are meant to actually test the performance of your design. These tests are longer 130 | C programs that are meant to test different aspects of your design and how you handle different types 131 | of hazards. To run these longer tests you can run the following command, as in checkpoint #3: 132 | ``` 133 | make sim-rtl test_bmark=all 134 | ``` 135 | You may need to increase the number of cycles for timeout for some of the longer tests (like sum, 136 | replace and cachetest) to pass.
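
Since the performance metric combines clock speed and cycle count, it can help to compare candidate designs by total execution time rather than by either number alone. The following is a minimal Python sketch of that arithmetic; the function name `execution_time_ps` and all of the cycle counts and clock periods are made-up illustrative values, not measurements from this project.

```python
def execution_time_ps(cycles: int, clock_period_ps: int) -> int:
    """Total run time in picoseconds: cycle count multiplied by clock period."""
    return cycles * clock_period_ps


# Hypothetical design points (assumed numbers for illustration only).
# Baseline: 3-stage pipeline at a 1000 ps clock period.
baseline_ps = execution_time_ps(cycles=100_000, clock_period_ps=1000)
# Deeper pipeline: 10% shorter period, but 15% more cycles from extra stalls.
deeper_ps = execution_time_ps(cycles=115_000, clock_period_ps=900)

if deeper_ps < baseline_ps:
    print("deeper pipeline is faster overall")
else:
    print("deeper pipeline is slower overall despite the higher frequency")
```

With these assumed numbers the deeper pipeline loses, because the extra stall cycles outweigh the shorter clock period; substitute the cycle counts from `make sim-rtl test_bmark=all` and the period achieved in PAR to evaluate a real change.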
137 | 138 | ### 3.3 Optimizing for power 139 | **DISCLAIMER:** The infrastructure to do power analysis in this project is very different from gate-level 140 | simulation and power analysis so far. Doing this optimization is *purely optional* and should only be 141 | tried after you can pass the benchmarks normally! **Proceed at your own risk!** 142 | 143 | You can also find out the power consumption of your processor for the various provided benchmarks. 144 | The value of this is to figure out whether the way you wrote your logic is efficient 145 | and avoids extra switching activity. Simplifying instruction decode logic, forwarding paths, etc. can result 146 | in lower power consumption! 147 | 148 | Near the bottom of `sim-gl-par.yml`, you will see a few lines: 149 | ``` 150 | execute_sim: false 151 | # Below is for power analysis. See the spec for instructions! 152 | # execution_flags_append: 153 | # - "+loadmem=../../tests/asm/addi.hex" 154 | # - "+max-cycles=10000" 155 | ``` 156 | If you reverse the comments (i.e. comment out `execute_sim: false` and uncomment the 157 | rest), this tells Hammer to run the `simv` executable with the addi test, instead of having the `Makefile` 158 | in the root folder run the executable. This is currently the only way that we can get Hammer 159 | to generate the SAIF file with our benchmark hex files. To proceed with the simulation of addi in this 160 | case: 161 | ``` 162 | make sim-gl-par test_asm=addi.out 163 | ``` 164 | You will find that it will do a simulation twice due to how the root `Makefile` is configured. The 165 | first one should pass (after a lot of printing each cycle number), while the second one should also pass 166 | like you have seen so far; ignore this second simulation. You should now see a `ucli.saif` file in 167 | `build/sim-rundir`.
Then, as in previous labs, run Voltus: 168 | ``` 169 | make power-par 170 | ``` 171 | And you should get static and dynamic power reports in `build/power-rundir`. 172 | 173 | Some closing recommendations: 174 | 175 | * This infrastructure only allows us to run one benchmark at a time. To run a different benchmark, 176 | replace the hex file in the `execution_flags_append` list, and alter the `max-cycles` value 177 | as necessary (see the `*_timeout_cycles_variables` in the root Makefile for the numbers). 178 | * Due to the ASAP7 PDK’s dummy SRAMs, we can’t measure SRAM power, and thus can’t find 179 | out how power-efficient our caching is. Therefore, the best benchmarks to run would be an 180 | arithmetic-heavy one that relies heavily on the register file (but the provided benchmarks require 181 | memory accesses). If you have lots of time on your hand, we encourage you to find power 182 | numbers for the **final** benchmark, but **you will not be graded on power performance.** 183 | 184 | --- 185 | 186 | 187 | ## Acknowledgement 188 | 189 | This project is the result of the work of many EECS151/251 GSIs over the years including: 190 | Written By: 191 | - Nathan Narevsky (2014, 2017) 192 | - Brian Zimmer (2014) 193 | Modified By: 194 | - John Wright (2015,2016) 195 | - Ali Moin (2018) 196 | - Arya Reais-Parsi (2019) 197 | - Cem Yalcin (2019) 198 | - Tan Nguyen (2020) 199 | - Harrison Liew (2020) 200 | - Sean Huang (2021) 201 | - Daniel Grubb, Nayiri Krzysztofowicz, Zhaokai Liu (2021) 202 | - Dima Nikiforov (2022) 203 | -------------------------------------------------------------------------------- /project/overview.md: -------------------------------------------------------------------------------- 1 | # EECS 151/251A ASIC Project Specification RISC-V Processor Design: Overview 2 |

3 | Prof. Sophia Shao 4 |

5 |

6 | TAs (ASIC): Dima Nikiforov 7 |

8 |

9 | Department of Electrical Engineering and Computer Science 10 |

11 |

12 | College of Engineering, University of California, Berkeley 13 |

14 | ## 1. Introduction 15 | 16 | The primary goal of this project is to familiarize students with the methods and tools of digital design. In order to make the project both interesting and useful, we will guide you through the implementation of a CPU that is intended to be integrated on a modern SoC. Working alone or in teams of 2, you will be designing a simple 3-stage CPU that implements the RISC-V ISA, developed here at UC Berkeley. If you work in a team, you must both have a complete understanding of your entire project code, and you will both receive the same grade. 17 | 18 | Your first and most important goal is to write a functional implementation of your processor. To better expose you to real design decisions, you will also be tasked with improving the performance of your processor. You will be required to meet a minimum performance target, to be specified later in the project. 19 | 20 | You will use Verilog HDL to implement this system. You will be provided with some testbenches to verify your design, but you will be responsible for creating additional testbenches to exercise your entire design. Your target implementation technology will be the ASAP7 7nm Educational PDK, a predictive model technology used for instruction. The project will give you experience designing synthesizable RTL (Register Transfer Level) code, resolving hazards in a simple pipeline, building interfaces, and approaching system-level optimization. 21 | 22 | Your first step will be to map our high level specification to a design which can be translated into a hardware implementation. You will then generate and debug that implementation in Verilog. These steps may take significant time if you do not put effort into your system architecture before attempting implementation. After you have built a working design, you will be optimizing it for speed in the 7nm technology that we have been using this semester.
23 | 24 | 25 | ### 1.1 RISC-V 26 | The final project for this class will be a VLSI implementation of a RISC-V (pronounced risk-five) CPU. RISC-V is an instruction set architecture (ISA) developed here at UC Berkeley. It was originally developed for computer architecture research and education purposes, but recently there has been a push towards commercialization and industry adoption. For the purposes of this lab, you don’t need to delve too deeply into the details of RISC-V. However, it may be good to familiarize yourself with it, as this will be at the core of your final project. Check out the official [RISC-V Instruction Set Manual](https://riscv.org/technical/specifications/) (Volume 1, Unprivileged Spec) and explore http://riscv.org for more information. 27 | - Read sections 2.2 and 2.3 to understand how the different types of instructions are encoded. 28 | - Read sections 2.4, 2.5, 2.6, and 9.1 and think about how each of the instructions will use the ALU 29 | 30 | ### 1.2 Project phases 31 | Your project will consist of two different phases: front-end and back-end. Within each phase, you will have multiple checkpoints that will ensure you are making consistent progress. These checkpoints will contribute (although not significantly) to your final grade. You are free to make design changes after they have been checked off. 32 | 33 | In the first phase (front-end), you will design and implement a 3-stage RISC-V processor in Verilog, and run simulations to test for functionality. At this point, you will only have a functional description of your processor that is independent of technology (there are no standard cells yet). You are highly encouraged to finish each checkpoint early, and each checkpoint will be released before the due date of the ongoing one. Everything will take much longer than you expect, and finishing early gives you more time to improve your QoR (Quality of Results, e.g. clock period). 
34 | 35 | In the second phase (back-end), you will implement your front-end design in the ASAP7 7nm kit using the VLSI tools you used in lab. When you have finished phase 2, you will have a design that could move onto fabrication if this were a real technology process. You will have about 2 weeks to complete the second phase after its release. 36 | 37 | ### 1.3 Philosophy 38 | This document is meant to describe a high-level specification for the project and its associated support hardware. You can also use it to help lay out a plan for completing the project. As with any design you will encounter in the professional world, we are merely providing a framework within which your project must fit. 39 | 40 | You should consider the GSI(s) a source of direction and clarification, but it is up to you to produce a fully-functional design and its physical implementation. Ultimately the responsibility of designing and debugging your solution lies on you. 41 | 42 | ### 1.4 General Project Tips 43 | Be sure to use top-down design methodologies in this project. We began by taking the problem of designing a basic computer system, modularizing it into distinct parts, and then refining those parts into manageable checkpoints. You should take this scheme one step further; we have given you each checkpoint, so break each into smaller, manageable pieces. 44 | 45 | As with many engineering disciplines, digital design has a normal development cycle. In the norm, after modularizing your design, your strategy should roughly resemble the following steps: 46 | 47 | - **Design** your modules well, make sure you understand what you want before you begin to code. 48 | 49 | - **Code** exactly what you designed; do not try to add features without redesigning. 50 | 51 | - **Simulate** thoroughly; writing a good testbench is as much a part of creating a module as actually coding it. 52 | - **Debug** completely; anything which can go wrong with your implementation will. 
53 | 54 | Some general tips when designing complex RTL modules: 55 | 56 | * Document your project thoroughly as you go 57 | * comment your Verilog 58 | * before making any RTL changes, **modify your pipeline diagram first to visualize this change**, doing this: 59 | * may reveal the change is actually infeasible 60 | * ensures that you and your partner have the same view of your processor's operation 61 | * Split the module operation into data/control paths and design each separately 62 | * Start with the simplest possible implementation 63 | * Make changes incrementally and always test your module after each change, no matter how small 64 | * Finish the required features first before attempting any extra features 65 | * Use github version control features like commits, branches, etc. 66 | * Save your work often and rely on redundancy (e.g. copy files from `/scratch` to your home directory often to ensure they're backed up) 67 | * Parallelize work as much as possible (e.g. start writing CPU RTL as you finish your diagram, work on CPU and Cache in parallel, start physical design as you finish your cache) 68 | 69 | 70 | This project is divided into checkpoints. Each checkpoint will be due 1 to 2 weeks after its release, but the next checkpoint will be released early. Use this to your advantage- try to get ahead so that you have additional time to debug. Your TA will clarify the specific timeline for your semester. 71 | 72 | The most important goal is to design a functional processor- this alone is 50-60% of the final grade, and you must have it **working completely** to receive any credit for performance. 73 | 74 | --- 75 | 76 | ## 2. Front-end design (Phase 1) 77 | 78 | The first phase in this project is designed to guide the development of a three-stage pipelined RISC-V CPU that will be used as a base system for your back-end implementation. 79 | Phase 1 will last for 5 weeks and has weekly checkpoints. 
80 | 81 | - Checkpoint 1: ALU design and pipeline diagram 82 | - Checkpoint 2: Core implementation 83 | - Checkpoint 3: Core + memory system implementation 84 | - Checkpoint 4: Physical Design 85 | 86 | 87 | ### 2.1 Adding SSH Key 88 | First you must add an SSH key to your Github account, to allow you to push to your project repo from the instructional machines without entering your Github password each time. You may run these commands in any location on any instructional machine (the SSH key will be stored in your home directory and thus work on all machines). 89 | ```shell 90 | ssh-keygen -t ed25519 -C "your_email@example.com" 91 | # hit Enter to each prompt (leave response blank) 92 | cat ~/.ssh/id_ed25519.pub 93 | # Then select and copy the contents of the id_ed25519.pub file 94 | # displayed in the terminal to your clipboard 95 | ``` 96 | 97 | In your browser, navigate to [https://github.com/settings/ssh/new](https://github.com/settings/ssh/new) (log into your Github account if needed). You should see the `SSH Keys / Add New` page. Enter the following values: 98 | * Title: `something descriptive (ex. eecs151)` 99 | * Key: `paste the contents of the id_ed25519.pub file` 100 | 101 | Then click the green `Add SSH key` button. 102 | 103 | ### 2.2 Project Git Repo 104 | The skeleton files for the project will be delivered as a git repository provided by the staff. You should clone this repository as follows. It is highly recommended to familiarize yourself with git and use it to manage your development. 105 | 106 | 107 | ```shell 108 | git clone /home/ff/eecs151/labs/project_skeleton /path/to/my/project 109 | ``` 110 | 111 | To get a team repo, fill out the google form via the link on Piazza with your team information. Please do this even if you are working alone, as these git repos will be used for version control and as part of the final checkoff. 
You will receive an email with an invite link to your project repo, which you should click to join before following the directions below. 112 | 113 | An example workflow for pulling from the skeleton and pushing/pulling with your team repository is shown below: 114 | 115 | 116 | ```shell 117 | cd /path/to/my/project 118 | git remote add myOrigin git@github.com:EECS150/fa21_asic_teamXX 119 | ``` 120 | 121 | Then to pull changes from the skeleton, you would need to type: 122 | ```shell 123 | git pull origin master 124 | ``` 125 | 126 | To pull changes from your team repository you would type: 127 | ```shell 128 | git pull myOrigin master 129 | ``` 130 | 131 | And to push changes to your team repository (please do not attempt to push to the skeleton repository), you would usually want to pull first (above) and then type: 132 | ```shell 133 | git push myOrigin master 134 | ``` 135 | 136 | --- 137 | 138 | ## 3. Grading 139 | 140 | ### EECS 151: 141 | | | | 142 | |-------------------|---------| 143 | | **70%** | Functionality at project due date: Your design will be subjected to a comprehensive test suite and your score will reflect how many of the tests your implementation passes. 144 | | **25%** | Final Report and Final Interview: If your design is not 100% functional, this is your opportunity to explain your bugs and recoup points. 145 | | **5%** | Checkpoints: Each check-off is worth 1.25%. If you complete all of your checkpoints on time, you will receive full credit in this category. 146 | | **Bonus 5%** | Performance at project due date: You must have a fully working design to score points in this section. You will receive up to 5 bonus points as your performance improves relative to your peers.
Performance will be calculated using the Iron Law: IPC * F (instructions per cycle times clock frequency) 147 | 148 | ### EECS 251A: 149 | | | | 150 | |-----------------|---------| 151 | | **60%** | Functionality at project due date: Your design will be subjected to a comprehensive test suite and your score will reflect how many of the tests your implementation passes. 152 | | **10%** | Set-Associative Cache: Implementation and performance of the configurable set-associative cache. 153 | | **25%** | Final Report and Final Interview: If your design is not 100% functional, this is your opportunity to explain your bugs and recoup points. 154 | | **5%** | Checkpoints: Each check-off is worth 1.25%. If you complete all of your checkpoints on time, you will receive full credit in this category. 155 | | **Bonus 5%** | Performance at project due date: You must have a fully working design to score points in this section. You will receive up to 5 bonus points as your performance improves relative to your peers. Performance will be calculated using the Iron Law: IPC * F (instructions per cycle times clock frequency) 156 | 157 | ## Acknowledgement 158 | 159 | This project is the result of the work of many EECS151/251 GSIs over the years including: 160 | Written By: 161 | - Nathan Narevsky (2014, 2017) 162 | - Brian Zimmer (2014) 163 | Modified By: 164 | - John Wright (2015,2016) 165 | - Ali Moin (2018) 166 | - Arya Reais-Parsi (2019) 167 | - Cem Yalcin (2019) 168 | - Tan Nguyen (2020) 169 | - Harrison Liew (2020) 170 | - Sean Huang (2021) 171 | - Daniel Grubb, Nayiri Krzysztofowicz, Zhaokai Liu (2021) 172 | - Dima Nikiforov (2022) 173 | -------------------------------------------------------------------------------- /project/checkpoint1.md: -------------------------------------------------------------------------------- 1 | # EECS 151/251A ASIC Project Specification: Checkpoint 1 2 |

3 | Prof. Sophia Shao 4 |

5 |

6 | TAs (ASIC): Dima Nikiforov 7 |

8 |

9 | Department of Electrical Engineering and Computer Science 10 |

11 |

12 | College of Engineering, University of California, Berkeley 13 |

14 | 15 | --- 16 | 17 | ## ALU design and pipeline diagram 18 | The ALU that we will implement in this lab is for a RISC-V instruction set architecture. Pay close attention to the design patterns and how the ALU is intended to function in the context of the RISC-V processor. In particular it is important to note the separation of the datapath and control used in this system which we will explore more here. 19 | 20 | The specific instructions that your ALU must support are shown in the tables below. The branch condition should not be calculated in the ALU. Depending on your CPU implementation, your ALU may or may not need to do anything for branch, jump, load, and store instructions (i.e., it can just output 0). 21 | 22 | --- 23 | 24 | ### 1. Making a pipeline diagram 25 | 26 | 27 | The first step in this project is to make a pipeline diagram of your processor, as described in lecture. You only need to make a diagram of the datapath (not the control). Each stage should be clearly separated with a vertical line, and flip-flops will form the boundary between stages. It is a good idea to name signals depending on what stage they are in (eg. `s1_killf`, `s2_rd0`). Also, it is a good idea to separately name the input/output (D/Q) of a flip flop (eg. `s0_next_pc`, `s1_pc`). Draw your diagram in a drawing program (Inkscape, Google Drawings, draw.io or any program you want), because you will need to keep it up-to-date as you build your processor. As such, we recommend you leave plenty of space between diagram elements to make it easier to insert changes as your project evolves. 28 | It helps to print out scratch copies while you are debugging your processor and to keep your drawings revision-controlled with git. Once you have finished your initial datapath design, you will implement the main building block in the datapath—the ALU. 29 | 30 | --- 31 | 32 | ### 2. 
ALU functional specification 33 | Given specifications about what the ALU should do, you will create an ALU in Verilog and write a test harness to test the ALU. 34 | 35 | The encoding of each instruction is shown in the table below. There is a detailed functional description of each of the instructions in Section 2.4 of the [RISC-V Instruction Set Manual](https://riscv.org/technical/specifications/) (Volume 1, Unprivileged Spec). Pay close attention to the functional description of each instruction as there are some subtleties. 36 | 37 |

38 | 39 |

40 | 41 | --- 42 | 43 | ### 3. Project Files 44 | We have provided a skeleton directory structure to help you get started. 45 | 46 | Inside, you should see a `src` folder, as well as a `tests` folder. The `src` folder contains all of 47 | the Verilog modules for this phase, and the `tests` folder contains some RISC-V test binaries for your processor. 48 | 49 | --- 50 | 51 | ### 4. Testing the Design 52 | Before writing any of the modules, you will first write the tests so that once you’ve written the modules you’ll be able to test them immediately. This is effectively Test-driven Development (TDD). Writing tests first is good practice: it forces you to write thorough tests, and ensures that tests will exist when you need to rapidly iterate through module design tweaks. Thorough understanding of the expected functionality is key to writing good tests (or RTL). You will be expected to write unit tests for every module that you design and implement, as well as integration tests. Unit tests will verify the functionality of individual modules against your specification. Integration tests verify that all the modules work as a system once you connect them together. 53 | 54 | #### 4.1 Verilog Testbench 55 | One way of testing Verilog code is with testbench Verilog files. The outline of a test bench file has been provided for you in ``ALUTestbench.v``. There are several key components to this file: 56 | - `` `timescale 1ns / 1ps`` - This specifies, in order, the reference time unit and the precision. This example sets the unit delay in the simulation to 1ns (i.e. `#1` = 1ns) and the precision to 1ps (i.e. the finest delay you can set is `#0.001` = 1ps). 57 | - The clock is generated by the code below. Since the ALU is only combinational logic, this is not necessary, but it will be a helpful reference once you have sequential elements. 58 | - The ``initial`` block sets the clock to 0 at the beginning of the simulation.
You should be sure to only change your stimulus when the clock is falling, since the data is captured on the rising edge. Otherwise, it will not only be difficult to debug your design, but it will also cause hold time violations when you run gate-level simulation. 59 | - You must use an always block without a sensitivity list (the `@` part of an always statement) to cause the clock to run automatically. 60 | ```verilog 61 | parameter Halfcycle = 5; //half period is 5ns 62 | localparam Cycle = 2*Halfcycle; 63 | reg Clock; 64 | // Clock Signal generation: 65 | initial Clock = 0; 66 | always #(Halfcycle) Clock = ~Clock; 67 | ``` 68 | - ``task checkOutput;`` - this task contains Verilog code that you would otherwise have to copy and paste many times. Note that it is not the same thing as a function (as Verilog also has functions). 69 | - ``{$random} & 31'h7FFFFFFF `` - $random generates a pseudorandom 32-bit integer. A bitwise AND will mask the result for smaller bit widths. 70 | 71 | For these two modules, the inputs and outputs that you care about are ``opcode``, ``funct``, ``add_rshift_type``, ``A``, ``B`` and ``Out``. To test your design thoroughly, you should work through every possible `opcode`, `funct`, and ``add_rshift_type`` that you care about, and verify that the correct ``Out`` is generated from the ``A`` and ``B`` that you pass in. 72 | 73 | The test bench generates random values for ``A`` and ``B`` and computes ``REFout = A + B``. It also contains calls to ``checkOutput`` for load and store instructions, for which the ALU should perform addition. It will be up to you to write tests for the remaining combinations of ``opcode``, ``funct``, and ``add_rshift_type`` to test your other instructions. 74 | 75 | Remember to restrict ``A`` and ``B`` to reasonable values (e.g. masking them, or making sure that they are not zero) if necessary to guarantee that a function is sufficiently tested. Please also write tests where the inputs and the output are hard-coded.
These should be corner cases that you want to be certain are stressed during testing. 76 | 77 | --- 78 | 79 | #### 4.2 Test Vector Testbench 80 | An alternative way of testing is to use a test vector, which is a series of bit arrays that map to the inputs and outputs of your module. The inputs can be all applied at once if you are testing a combinational logic block or applied over time for a sequential logic block (e.g. an FSM). 81 | 82 | You will write a Verilog testbench that takes the parts of the bit array that correspond to the inputs of the module, feeds those to the module, and compares the output of the module with the output bits of the bit array. The bit vector should be formatted as follows: 83 | ```verilog 84 | [106:100] = opcode 85 | [99:97] = funct 86 | [96] = add_rshift_type 87 | [95:64] = A 88 | [63:32] = B 89 | [31:0] = REFout 90 | ``` 91 | Open up the skeleton provided to you in ``ALUTestVectorTestbench.v``. You need to complete the module by making use of ``$readmemb`` to read in the test vector file (named testvectors.input), writing some assign statements to assign the parts of the test vectors to registers, and writing a for loop to iterate over the test vectors. 92 | 93 | The syntax for a for loop can be found in ``ALUTestbench.v``. ``$readmemb`` takes as its arguments a filename and a reg vector, e.g.: 94 | 95 | ```verilog 96 | reg [5:0] bar [0:20]; 97 | $readmemb("foo.input", bar); 98 | ``` 99 | 100 | #### 4.3 Writing Test Vectors 101 | Additionally, you will also have to generate actual test vectors to use in your test bench. A test vector can either be generated in Verilog (like how we generated ``A``, ``B`` using the random number generator and iterated over the possible opcodes and functs), or using a scripting language like Python. Since we have already written a Verilog test bench for our ALU and decoder, we will tackle writing a few test vectors by hand, then use a script to generate test vectors more quickly. 
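
As a rough illustration of the bit layout described above, the following Python sketch packs one test vector into the 107-character binary string that ``$readmemb`` expects. The helper name `pack_vector` is hypothetical and is not taken from ``ALUTestGen.py``; the example values are the ``SRA`` case that appears in the waveform-debugging section of this spec.

```python
# Minimal sketch, assuming the field order given in this spec:
# [106:100] opcode, [99:97] funct, [96] add_rshift_type,
# [95:64] A, [63:32] B, [31:0] REFout.
# pack_vector is a hypothetical helper, not part of ALUTestGen.py.
def pack_vector(opcode, funct, add_rshift_type, a, b, refout):
    """Return one line of '0'/'1' characters, most significant bit first."""
    fields = [
        (opcode, 7),
        (funct, 3),
        (add_rshift_type, 1),
        (a, 32),
        (b, 32),
        (refout, 32),
    ]
    bits = ""
    for value, width in fields:
        # Reject values that do not fit in the field instead of truncating.
        if not 0 <= value < (1 << width):
            raise ValueError("value %#x does not fit in %d bits" % (value, width))
        bits += format(value, "0{}b".format(width))
    return bits


# SRA example: opcode 0110011, funct 101, add_rshift_type 1.
line = pack_vector(0b0110011, 0b101, 1, 0x92153524, 0xFFFFDE81, 0xC90A9A92)
print(len(line))
print(line)
```

Each string produced this way would be one line of ``testvectors.input``; whether your own vectors use the same operand conventions depends on how your ALU is wired.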
102 | 103 | Test vectors are of the format specified above, with the 7 opcode bits occupying the left-most bits. In the tests folder, create the file testvectors.input and add test vectors for the following instructions to the end (i.e. manually type the 107 zeros and ones required for each test vector): ``SLT``, ``SLTU``, ``SRA``, and ``SRL``. 104 | 105 | In the same directory, we’ve also provided a test vector generator ``ALUTestGen.py`` written in Python. We used this generator to generate the test vectors provided to you. If you’re curious, you can read the next paragraph and poke around in the file. If not, feel free to skip ahead to the next section. 106 | 107 | The script ``ALUTestGen.py`` is located in ``tests``. Run it so that it generates a test vector file in the tests folder. Keep in mind that this script makes a couple assumptions that aren’t necessary and may differ from your implementation: 108 | 109 | - Jump, branch, load and store instructions will use the ALU to compute the target address. 110 | 111 | - For all shift instructions, `A` is shifted by `B`. In other words, `B` is the shift amount. 112 | 113 | - For the `LUI` instruction, the value to load into the register is fed in through the `B` input. 114 | 115 | You can either match these assumptions or modify the script to fit with your implementation. All the methods to generate test vectors are located in the two Python dictionaries ``opcodes`` and ``functs``. The lambda methods contained (separated by commas) are respectively: the function that the operation should perform, a function to restrict the ``A`` input to a particular range, and a function to restrict the ``B ``input to a particular range. 116 | 117 | **If you modify the Python script**, run the generator to make new test vectors. This will overwrite the ``testvectors.input`` file, so if you want to save your handwritten test vectors, rename the file before running the script, then append them once the file has been generated. 
118 | ```shell 119 | python ALUTestGen.py 120 | ``` 121 | This will write the test vector into the file ``testvectors.input``. Use this file as the target test vector 122 | file when loading the test vectors with ``$readmemb``. 123 | 124 | --- 125 | 126 | ### 5. Writing Verilog Modules 127 | For this exercise, we’ve provided the module interfaces for you. They are logically divided into a control (``ALUdec.v``) and a datapath (``ALU.v``). The datapath contains the functional units while control contains the necessary logic to drive the datapath. You will be responsible for implementing these two modules. Descriptions of the inputs and outputs of the modules can be found in the first few lines of each file. The ALU should take an ``ALUop`` and its two inputs ``A`` and ``B``, and provide an output dependent on the ``ALUop``. The operations that it needs to support are outlined in the Functional Specification. Don’t worry about sign extensions–they should take place outside of the ALU. The ALU decoder uses the ``opcode``, ``funct``, and ``add_rshift_type`` to determine the ``ALUop`` that the ALU should execute. The ``funct`` input corresponds to the ``funct3`` field from the ISA encoding table. The ``add_rshift_type`` input is used to distinguish between ``ADD/SUB``, ``SRA/SRL``, and ``SRAI/SRLI``; you will notice that each of these pairs has the same ``opcode`` and ``funct3``, but differ in the ``funct7 `` field. 128 | 129 | You will find the case statement useful, which has the following syntax: 130 | ```verilog 131 | always@(*) begin 132 | case(foo) 133 | 3'b000: // something happens here 134 | 3'b001: // something else happens here 135 | 3'b010, 3'b011: // you can have more than 136 | // one case do the same thing 137 | default: // everything else 138 | endcase end 139 | ``` 140 | 141 | To make your job easier, we have provided two Verilog header files: ``Opcode.vh`` and ``ALUop.vh``. 
They provide, respectively, macros for the opcodes and functs in the ISA and macros for the different ALU operations. You should feel free to change ``ALUop.vh`` to optimize the ``ALUop`` encoding, but if you change ``Opcode.vh``, you will break the test bench skeleton provided to you. You can use these macros by placing a backtick in front of the macro name, e.g.: 142 | 143 | ```verilog 144 | case(opcode) 145 | `OPC_STORE: 146 | ``` 147 | is the equivalent of: 148 | ```verilog 149 | case(opcode) 150 | 7'b0100011: 151 | ``` 152 | 153 | --- 154 | 155 | ### 6. Running the Simulation 156 | 157 | Open the file ``sim-rtl.yml`` and set the testbench’s name to ``ALUTestbench``. 158 | 159 | ```yaml 160 | tb_name: &TB_NAME "ALUTestbench" 161 | ``` 162 | 163 | By typing ```make sim-rtl``` you will run the ALU simulation. You may change the testbench’s name to ```ALUTestVectorTestbench``` to use the test vector testbench. 164 | 165 | Once you have a working design, you should see the following output when you run either of the given testbenches: 166 | ```shell 167 | # ALL TESTS PASSED! 168 | ``` 169 | 170 | To clear files left over from previous simulations out of the simulation directory, type ``make clean``. 171 | 172 | 173 | --- 174 | 175 | ### 7. Viewing Waveforms 176 | 177 | As in the previous labs, you should use DVE to view waveforms. 178 | 1. List of the modules involved in the test bench. You can select one of these to have its signals show up in the object window. 179 | 2. Object window - this lists all the wires and regs in your module. You can add signals to the waveform view by selecting them, right-clicking, and choosing Add To Wave. 180 | 3. Waveform viewer - The signals that you add from the object window show up here. You can navigate the waves by searching for specific values, or going forward or backward one transition at a time.
181 | As an example of how to use the waveform viewer, suppose you get the following output when you run 182 | 183 | ```shell 184 | ALUTestbench: 185 | # FAIL: Incorrect result for opcode 0110011, funct: 101:, add_rshift_type: 1 186 | # A: 0x92153524, B: 0xffffde81, DUTout: 0x490a9a92, REFout: 0xc90a9a92 187 | ``` 188 | 189 | The ``$display()`` statement actually already tells you everything you need to know to fix your bug, but you’ll find that this is not always the case. For example, if you have an FSM and you need to look at multiple time steps, the waveform viewer presents the data in a much neater format. If your design had more than one clock domain, it would also be nearly impossible to tell what was going on with only ``$display()`` statements. 190 | 191 | Add all the signals from ``ALUTestbench`` to the waveform viewer and you will see the following window. The two highlighted boxes contain the tools for navigation and zoom. You can hover over the icons to find out more about what each of them does. You can find the location (time) in the waveform viewer where the test bench failed by searching for the value of ``DUTout`` output by the ``$display()`` statement above (in this case, ``0x490a9a92``): 192 | 193 | 1. Selecting DUTout 194 | 2. Clicking Edit > Wave Signal Search > Search for Signal Value > ``0x490a9a92`` 195 | 196 | Now you can examine all the other signal values at this time. Compare the ```DUTout``` and ```REFout``` values at this time, and you should see that they are similar but not quite the same. From the ``opcode``, ``funct``, and ```add_rshift_type```, you know that this is supposed to be an ``SRA`` instruction, but it looks like your ALU performed an ``SRL`` instead. However, you wrote 197 | ```verilog 198 | Out = A >>> B[4:0]; 199 | ``` 200 | That looks like it should work, but it doesn’t! On an unsigned operand, ``>>>`` shifts in zeros just like ``>>``. It turns out you need to tell Verilog to treat A as a signed 201 | number for SRA to work as you wish.
You change the line to say: 202 | ```verilog 203 | Out = $signed(A) >>> B[4:0]; 204 | ``` 205 | After making this change, you run the tests again and cross your fingers. Hopefully, you will see the line: 206 | ```shell 207 | # ALL TESTS PASSED! 208 | ``` 209 | If not, you will need to debug your module until all tests from the test vector file and the hard-coded test cases pass. 210 | 211 | --- 212 | 213 | ### 8. Checkpoint #1: Simple test program 214 | *Checkoff due: Apr 1 (Friday), 2022* 215 | 216 | 217 | Congratulations! You’ve started the design of your datapath by drawing a pipeline diagram, and written and thoroughly tested a key component in your processor. You should now be well-versed in testing Verilog modules. Please answer the following questions to be checked off by a TA: 218 | 1. Show your pipeline diagram, and explain when writes and reads occur in the register file and memory relative to the pipeline stages. 219 | 2. Show your working ALU test bench files to your TA and explain your hard-coded cases. You should also be able to show that the tests for the test vectors generated by the Python script and your hard-coded test vectors both work. 220 | 3. In ALUTestbench, the inputs to the ALU were generated randomly. When would it be preferable to perform an exhaustive test rather than a random test? 221 | 4. What bugs, if any, did your test bench help you catch? 222 | 5. For one of your bugs, come up with a short assembly program that would have failed had you not caught the bug. In the event that you had no bugs and wrote perfect code the first time, come up with an assembly program to stress the SRA bug mentioned in the above section.
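
For reference when writing hard-coded test cases, the difference between ``SRL`` and ``SRA`` from the waveform example in Section 7 can be modeled in a few lines of Python. This is only a sketch of a software reference model; the function names `srl` and `sra` are illustrative and do not come from the provided skeleton or from ``ALUTestGen.py``.

```python
# Minimal sketch of 32-bit shift reference models (names are illustrative).
MASK32 = 0xFFFFFFFF


def srl(a, b):
    """Logical right shift: zeros are shifted in from the left."""
    return (a & MASK32) >> (b & 0x1F)


def sra(a, b):
    """Arithmetic right shift: the sign bit (bit 31) is replicated."""
    a &= MASK32
    shamt = b & 0x1F          # only the low 5 bits of B are the shift amount
    if a & 0x80000000:
        a -= 1 << 32          # reinterpret as a negative two's-complement value
    return (a >> shamt) & MASK32


a, b = 0x92153524, 0xFFFFDE81
print(hex(srl(a, b)))  # 0x490a9a92, the buggy DUTout from the log above
print(hex(sra(a, b)))  # 0xc90a9a92, the expected REFout
```

The two results differ only when bit 31 of ``A`` is set and the shift amount is nonzero, which is why random tests with small positive operands can miss this bug.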

## Acknowledgement

This project is the result of the work of many EECS151/251 GSIs over the years including:

Written By:
- Nathan Narevsky (2014, 2017)
- Brian Zimmer (2014)

Modified By:
- John Wright (2015,2016)
- Ali Moin (2018)
- Arya Reais-Parsi (2019)
- Cem Yalcin (2019)
- Tan Nguyen (2020)
- Harrison Liew (2020)
- Sean Huang (2021)
- Daniel Grubb, Nayiri Krzysztofowicz, Zhaokai Liu (2021)
- Dima Nikiforov (2022)

--------------------------------------------------------------------------------
/lab1/spec.md:
--------------------------------------------------------------------------------

# EECS 151/251A ASIC Lab 1: Getting Around the Compute Environment

Prof. Sophia Shao

TAs (ASIC): Dima Nikiforov

Department of Electrical Engineering and Computer Science

College of Engineering, University of California, Berkeley

## Overview

The process of VLSI design is different from developing software, designing analog circuits, and even FPGA-based design. Instead of using a single graphical user interface (GUI) or environment (e.g. Eclipse, Cadence Virtuoso, or Xilinx Vivado), VLSI design is done using dozens of command line interface tools on a Linux machine. These tools primarily use text files as their inputs and outputs, and include GUIs mainly for visualization rather than design. Therefore, familiarity with Linux, text manipulation, and scripting is required to successfully complete the labs this semester.

The goal of this lab is to introduce some basic techniques needed to use the computer aided design (CAD) tools that are taught in this class. Mastering the topics in this lab will help you save hours of time in later labs and make you a much more efficient chip designer. While you go through this lab, focus on how these techniques will allow you to automate tasks and improve your efficiency. Chip design requires plenty of iteration, so being able to perform trials and identify errors quickly is key to success.

## Administrative Info

This lab, like all labs, will be turned in electronically using Gradescope. Please upload a PDF document with the answers to the six questions in the lab.

### Getting an Instructional Account

You are required to get an EECS instructional account to log in to the workstations in the lab, since you will be doing all your work on these machines (whether you're working remotely or in-person). This can be done by using WebAcct here: http://inst.eecs.berkeley.edu/webacct.

Once you log in using your CalNet ID, you can click on 'Get a new account' in the eecs151 row. Once the account has been created, you can email your class account form to yourself to have a record of your account information.
You can follow the instructions on the emailed form to change your Linux password by running `ssh update.eecs.berkeley.edu` and following the prompts.

## Logging into the Classroom Servers

The servers used for this class are primarily `eda-[1-11].eecs.berkeley.edu`. You may also use the `c111-[1-17].eecs.berkeley.edu` machines
(which are physically located in Cory 111/117), although those will be shared with the FPGA lab. You can access all of these machines remotely through SSH.

### Remote Access

It is important that you can remotely access the instructional servers. There are two convenient ways to remotely access our
lab machines: SSH (Secure SHell) and X2Go.
First, select a machine. The accessible machines are `eda-X`, where X is a number from 1 to 11,
and `c111-X`, where X is a number from 1 to 17. The fully qualified DNS name (FQDN) of
your machine is then `eda-X.eecs.berkeley.edu` or `c111-X.eecs.berkeley.edu`. For example,
if you select machine `eda-8`, the FQDN would be `eda-8.eecs.berkeley.edu`.
You can use any lab machine, but our lab machines aren’t very powerful; if everyone
uses the same one, everyone will find that their jobs perform poorly. ASIC design tools are resource
intensive and will not run well when there are too many simultaneous users on these machines. We
recommend that every time you want to log into a machine, you examine its load on https://hivemind.eecs.berkeley.edu/
for the `eda-X` machines, or using `top` when you log in. If it is heavily loaded, consider
using a different machine. If you notice other `eecs151` users with jobs consuming excessive
resources, feel free to reach out to the GSIs about it.
Next, note your instructional class account name - the one that looks like `eecs151-YYY`, for example
`eecs151-abc`. This is the account you created at the start of this lab.
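
The load check recommended above can also be scripted once you are logged in. The sketch below is a minimal example rather than a course-provided tool: the helper name `has_headroom` is hypothetical, it reads the Linux-specific `/proc/loadavg` file, and comparing the 1-minute load average to the core count is only a common rule of thumb.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit status 0 when the load average ($1) is below
# the number of CPU cores ($2), non-zero otherwise.
has_headroom() {
  awk -v load="$1" -v cores="$2" 'BEGIN { exit !(load < cores) }'
}

# /proc/loadavg exists on Linux only; skip the check elsewhere.
if [ -r /proc/loadavg ]; then
  load=$(cut -d ' ' -f 1 /proc/loadavg)   # 1-minute load average
  cores=$(nproc)                          # number of available cores
  if has_headroom "$load" "$cores"; then
    echo "$(uname -n): load $load on $cores cores, looks usable"
  else
    echo "$(uname -n): load $load on $cores cores, consider another machine"
  fi
fi
```

For a per-user breakdown of what is consuming the machine, `top` remains the better tool; this check only gives a quick yes/no answer.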


#### SSH: Linux, BSD, MacOS

SSH is the de facto remote terminal tool for Linux and BSD systems (which includes macOS). It
lets you log in to a text console from anywhere (as long as you have network connectivity). SSH
also comes as a standard utility in almost all Linux and BSD systems.
If you’re using Linux or BSD, you should be able to access your workstation through SSH by running:

```shell
ssh eecs151-YYY@eda-X.eecs.berkeley.edu
```

In our examples, this would be:

```shell
ssh eecs151-abc@eda-8.eecs.berkeley.edu
```

The SSH protocol also enables file transfer between your local and lab machines via the `sftp` and
`scp` utilities. **WARNING: please only transfer files needed for your reports and nothing else, particularly files relating to CAD tool commands or process technologies!!!**


#### SSH: Windows

The classic and most lightweight way to use SSH on Windows is PuTTY (https://www.putty.org/). Download it and log in with the FQDN above as the Host and your instructional account
username. You can also use WinSCP (winscp.net) for file transfer over SSH.
Advanced users may wish to install Windows Subsystem for Linux (https://docs.microsoft.com/en-us/windows/wsl/install-win10, Windows 10 build 16215 or later) or Cygwin (cygwin.com) and use SSH, SFTP, and SCP through there.


#### SSH Session Management

Because all your work will be done remotely, we recommend that you utilize SSH session management tools and that all terminal-based work be done over SSH. This would allow your remote terminal sessions to remain active even if your SSH session disconnects, intentionally or not.
The two most common session managers are tmux and screen. These run persistently on the
remote workstation, are highly customizable, and can greatly improve your productivity.
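
As a rough illustration of the persistent-session workflow described above, the sketch below creates a detached tmux session and lists what is running. The session name `lab1` is an arbitrary, hypothetical label, and the commands are skipped when tmux is not installed.

```shell
#!/usr/bin/env bash
session="lab1"   # hypothetical session name; pick any label you like

if command -v tmux >/dev/null 2>&1; then
  # Create the session in the background unless it already exists.
  tmux has-session -t "$session" 2>/dev/null \
    || tmux new-session -d -s "$session" \
    || true
  # Show the sessions currently running on this machine.
  tmux list-sessions || true
else
  echo "tmux is not installed on this machine"
fi

# To work inside the session interactively:
#   tmux attach -t lab1    (reattach, e.g. after a dropped SSH connection)
#   Ctrl-b then d          (detach, leaving everything running)
```

Anything started inside the session keeps running on the remote machine after you detach or lose your connection, which is the property that makes long tool runs safe over SSH.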
Here are some good tmux and screen tutorials:
* https://www.hamvocke.com/blog/a-quick-and-easy-guide-to-tmux/
* https://www.rackaid.com/blog/linux-screen-tutorial-and-how-to/


#### X2Go

For situations in which you need a graphical interface (waveform debugging, layout viewing, etc.),
you should use X2Go. This is a faster and more reliable alternative to traditional X forwarding over SSH. X2Go is also recommended because it connects to a persistent graphical
desktop environment, which continues running even if your internet connection drops.
Download the X2Go client for your platform from the website: https://wiki.x2go.org/doku.php/download:start.

Note: macOS sometimes blocks the X2Go download/install; if it does, follow the directions here: https://support.apple.com/en-us/HT202491.

To use X2Go, you need to create a new session (look under the Session menu). Give the session any
name you like, but set the Host field to the FQDN of your lab machine and the User field
to your instructional account username. For “Session type”, select “GNOME”. Here’s an example from macOS:

[Figure: example X2Go session settings on macOS (`figs/x2gomacos.png`)]



### Getting Started

After you log in to one of these servers, you are ready to start the lab. You have a limited amount of space in your home directory, so we recommend completing work in the `/scratch/` directory, and then copying any important results to your home directory.

To begin, get the lab files by typing the following commands:

```shell
mkdir /scratch/
cd /scratch/
git clone /home/ff/eecs151/labs/lab1
cd lab1
```



## Linux Basics

You will need to learn how to use Linux so that you can understand what programs are running
on the server, manipulate files, launch programs, and debug problems. Please read through the
tutorial here: http://linuxcommand.org/lc3_learning_the_shell.php

To use the CAD tools in this class, you will need to load the class environment. All of the tools
are already installed on the network filesystem, but by default users do not have the tools in their
path. Try locating a program that is already installed (vim) and another which is not (innovus)
by default:

```shell
which vim
which innovus
```

The vim program has been installed in `/usr/bin/vim`. If you show the contents of `/usr/bin`,
you will notice that you can launch any of the programs there by typing its filename. This is because
`/usr/bin` is in the environment variable `$PATH`, which contains the directories to search as a
colon-separated list.

```shell
echo $PATH
```

To be able to access the CAD tools, you will need to append their location to the `$PATH` variable:

```shell
source /home/ff/eecs151/tutorials/eecs151.bashrc
echo $PATH
which innovus
```


#### Question 1: Common terminal tasks

For 1-6 below, submit the command/keystrokes needed to generate the desired result. For 1-4, try generating only the desired result (no extraneous info).

1. List the 5 most recently modified items in `/usr/bin`
2. What directory is `git` installed in?
3. Show the hidden files in your lab directory (the one you cloned from `/home/ff/eecs151/labs/lab1`)
4. What version of Vim is installed? Describe how you figured this out.
5. Copy the files in this lab to `/scratch` and then delete the copy.
6. Run `ping www.google.com`, suspend it, then kill the process. Then run it in the background, report its PID, then kill the process.
7. Run `top` and report the average CPU load, the highest CPU job, and the amount of memory used (just report the results for this question; you don't need to supply the command/how you got it).


There are a few miscellaneous commands to analyze disk usage on the servers.

```shell
du -ch --max-depth=1 .
df -H
```

Finally, your instructional accounts have disk usage quotas. Find out how much you are allocated
and how much you are using:

```shell
quota -s
```

By default, you should be using the Bash shell (these labs are designed for Bash, not Csh). The
Bash Guide (guide.bash.academy) is a great resource for users at all levels of Bash proficiency.


## Using Text Editors

Much of the time you will spend designing chips will be writing scripts in a text editor.
Therefore becoming proficient at editing text is a vital skill.
Unlike Java or C programming, there
is no integrated development environment (IDE) for writing these scripts. However, many of the
advantages of IDEs can be obtained by using the proper editor. In this class, we will be using
either Vim or Emacs. Editors such as gedit or nano are not allowed.

If you have never used Vim, please follow the tutorial here: http://www.openvim.com/tutorial.html (If you would prefer to learn Emacs, you can read http://www.gnu.org/software/emacs/tour/ and run the Emacs built-in tutorial with Ctrl-h followed by t). Feel free to search for other
resources online to learn more.

#### Question 2: Common editor tasks

For each task below, describe the keys you need to press to accomplish the action in the file `force_regs.ucli`.

1. Delete 5 lines
2. Search for the text `clock`
3. Replace the text `dut` with `device_under_test`
4. Jump to the end of the file
5. Go to line 42
6. Reload the file (in case it was modified in another window)
7. Save and exit

#### Alternative Editors

While Vim is a powerful editor and ubiquitous in Linux environments, there are alternatives that might be more suitable for different use cases. A modern graphical text editor is Visual Studio Code, which supports editing text files through an SSH session. Because Visual Studio Code renders text on the client machine, it can be useful for students with high-latency or irregular internet connections, where X2Go or Vim over SSH can feel unresponsive. To set up Visual Studio Code for remote development, please follow the tutorial here: https://code.visualstudio.com/docs/remote/ssh-tutorial

## Regular Expressions

Regular expressions allow you to perform complex ‘Search’ or ‘Search and Replace’ operations.
Please work through the tutorial here: http://regexone.com

Regular expressions can be used from many different programs: Vim, Emacs, grep, sed, Python,
etc. From the command line, use grep to search, and sed to search and replace.

Unfortunately, deciding which characters need to be escaped can be somewhat confusing. For
example, to find all instances of `dcdc_unit_cell_x`, where `x` is a single digit number, using grep:

```shell
grep "unit_cell_[0-9]\{1\}\." force_regs.ucli
```

And you can do the same search in Vim:

```vim
vim force_regs.ucli
/unit_cell_[0-9]\{1\}\.
```

Notice how you need to be careful about which characters get escaped (the `[` is not escaped but `{` is). Now
imagine we want to add a leading 0 to all of the single digit numbers. The match string in sed
could be:

```shell
sed -e 's/\(unit_cell_\)\([0-9]\{1\}\.\)/\10\2/' force_regs.ucli
```

sed, Vim, and grep all use “Basic Regular Expressions” by default. For regular expressions heavy
with special characters, sometimes it makes more sense to assume most characters except `a-zA-Z0-9`
have special meanings (and they get escaped with `\` only to match them literally). This is called
“Extended Regular Expressions”, and `?+{}()` no longer need to be escaped. A great resource
for learning more is http://en.wikipedia.org/wiki/Regular_expression#POSIX_basic_and_extended. In Vim, you can do this with `\v`:

```shell
:%s/\v(unit_cell_)([0-9]{1}\.)/\10\2/
```

And in sed, you can use the -r flag:

```shell
sed -r -e 's/(unit_cell_)([0-9]{1}\.)/\10\2/' force_regs.ucli
```

And in grep, you can use the -E flag:

```shell
grep -E "unit_cell_[0-9]{1}\." force_regs.ucli
```

sed and grep can be used for many purposes beyond text search and replace. For example, to find
all files in the current directory with filenames that contain a specific text string:

```shell
find . | grep "\.ucli"
```

Or to delete all lines in a file that contain a string:

```shell
sed -e '/reset/d' force_regs.ucli
```

#### Question 3: Fun with Regular Expressions

For each regular expression, provide an answer for both basic and extended mode (`sed` and `sed -r`).
You are allowed to use multiple commands to perform each task. Operate on the `force_regs.ucli` file.

1. Change all x's surrounding numbers to angle brackets. For example, `regx15xx79x` becomes `reg<15><79>`. Hint: remember to enable global substitution.
2. Make every number in the file be exactly 3 digits with padded leading zeros (except the last 0 on each line). E.g., lines 120 and 121 should read:

```
force -deposit rocketTestHarness.dut.Raven003Top_withoutPads.TileWrap.
... .io_tilelink_release_data.sync_w002r.rq002_wptr_regx000x.Q 0
force -deposit rocketTestHarness.dut.Raven003Top_withoutPads.TileWrap.
... .io_tilelink_release_data.fifomem.mem_regx015xx098x.Q 0
```


## File Permissions

A tutorial about file permissions can be found here: http://www.tutorialspoint.com/unix/unix-file-permission.htm

#### Question 4: Understanding File Permissions

For each task below, please provide the commands that result in the correct permissions being set. Make no assumptions about the file's existing permissions. Operate on the `run_always.sh` script.

1. Change the script to be executable by you and no one else.
2. Add permissions for everyone in your group to be able to execute the same script
3. Make the script writable by you and everyone in your group, but unreadable by others
4. Change the owner of the file to be `eecs151` (Note: you will not be able to execute this command, so just provide the command itself)


## Using Makefiles

Makefiles are a simple way to string together a bunch of different shell tasks in an intelligent
manner. They let you automate repetitive tasks and save time, because make only rebuilds the
targets whose dependencies have changed. Please read
through the following tutorial here: http://www.cs.colby.edu/maxwell/courses/tutorials/maketutor/ (optional). Further documentation on make can be found here: http://www.gnu.org/software/make/manual/make.html.

Let’s look at a simple makefile to explain a few things about how they work - this is not meant to
be anything more than a very brief overview of what a makefile is and how it works. If you look at
the Makefile in the provided folder in your favorite text editor, you can see the following lines:

```makefile
output_name = force_regs.random.ucli

$(output_name): force_regs.ucli
	awk 'BEGIN{srand();}{if ($$1 != "") { print $$1,$$2,$$3,int(rand()*2)}}' $< > $@

clean:
	rm -f $(output_name)
```

While this may look like a lot of random characters, let us walk through each part of it to see that
it really is not that complicated.

Makefiles are generally composed of rules, which tell Make how to execute a set of commands to
build a set of targets from a set of dependencies. A rule typically has this structure:

```makefile
targets: dependencies
	commands
```

**It is very important that commands in Makefiles are indented with tabs, not spaces.**
The two rules in the above Makefile have the targets `clean` and `output_name`.
Here,
`output_name` is the name of a variable within the Makefile, which means that it can be overridden
from the command line. This can be done with the following command:

```shell
make output_name=foo.txt
```

This will result in the output being written to `foo.txt` instead of `force_regs.random.ucli`.
Generally, a rule will run every time its dependencies have been updated more recently than
its own targets, so by editing/updating the `force_regs.ucli` file (including via the touch command), you can regenerate the `output_name` target. This is different from a bash script, as you can see in `run_always.sh`, which will always generate `force_regs.random.ucli` regardless of whether
`force_regs.ucli` is updated or not.

Inside the `output_name` rule, the `awk` command contains doubled `$` characters. In plain
`awk` the field variables are `$1`, `$2`, and so on, but Make would try to expand a single `$` itself,
so each one has to be escaped. In Make, the character to do that is `$`, which is why they appear as `$$1`, `$$2`, and `$$3`.

The other characters after the awk script are also special characters to make. The `$<` is the first
dependency of that target, the `>` simply redirects the output of awk, and the `$@` is the name of the
target itself. This allows users to create makefiles that can be reusable, since you are operating on
a dependency and outputting the result into the name of your own target.

#### Question 5: Makefile Targets

1. Add a new make rule that will create a file called `foo.txt`. Make it also run the `output_name` rule.
2. Name at least two ways that you could have the makefile regenerate the `output_name` target after its rule has been run.


## Comparing Files

Comparing text files is another useful skill.
The tools generally behave as black
boxes, so comparing output files to prior output files is an important debugging technique.

From the command line, you can use `diff` to compare files:

```shell
diff force_regs.ucli force_regs.random.ucli
```

You can also compare the contents of directories (the `-q` flag will summarize the results to only
show the names of the files that differ, and the `-r` flag will recurse through subdirectories).
For Vim users, there is a useful built-in `diff` tool:

```shell
vimdiff force_regs.ucli force_regs.random.ucli
```


## Version Control with Git

Version control systems help track how files change over time and make it easier for collaborators
to work on the same files and share their changes. We use git to distribute the lab files so that
bug fixes can easily be incorporated into your files. Please go through the following tutorial:
https://try.github.io

#### Question 6: Checking Git Understanding

Submit the command required to perform the following tasks:

1. What is the difference between your current Makefile and the file you started with?
2. How do you make a new branch?
3. What is the SHA of the version you checked out?

## Customization

Many of the commands and tools you will use on a daily basis can be customized. This can
dramatically improve your productivity. Some tools (e.g. vim and bash) are customized using “dotfiles,” which are hidden files in your home directory (e.g. `.bashrc` and `.vimrc`) that contain a series of commands which set variables, create aliases, or change settings. Try adding the following lines to your `.bashrc` and restart your session or source
`~/.bashrc`. Now when you change directories, you no longer need to type `ls` to show the directory contents.

```shell
function cd {
    builtin cd "$@" && ls -F
}
```

The following links are useful for learning how to make some common customizations. You should
read these but are not required to turn in anything for this section.
* https://www.digitalocean.com/community/tutorials/an-introduction-to-useful-bash-aliases-and-functions
* http://statico.github.io/vim.html


## Lab Deliverables

### Lab Due: 11 AM, Friday January 28th, 2022

- Submit a written report with all 6 questions answered to Gradescope

## Acknowledgement

This lab is the result of the work of many EECS151/251 GSIs over the years including:

Written By:
- Nathan Narevsky (2014, 2017)
- Brian Zimmer (2014)

Modified By:
- John Wright (2015,2016)
- Ali Moin (2018)
- Arya Reais-Parsi (2019)
- Cem Yalcin (2019)
- Tan Nguyen (2020)
- Harrison Liew (2020)
- Sean Huang (2021)
- Daniel Grubb, Nayiri Krzysztofowicz, Zhaokai Liu (2021)
- Dima Nikiforov (2022)

--------------------------------------------------------------------------------
/lab3/spec.md:
--------------------------------------------------------------------------------

# EECS 151/251A ASIC Lab 3: Logic Synthesis

Prof. Sophia Shao

TAs (ASIC): Dima Nikiforov

Department of Electrical Engineering and Computer Science

College of Engineering, University of California, Berkeley



## Overview
For this lab, you will learn how to translate RTL code into a gate-level netlist in a process called
synthesis. In order to successfully synthesize your design, you will need to understand how to
constrain your design, learn how the tools optimize logic and estimate timing, analyze the critical
path of your design, and simulate the gate-level netlist.
To begin this lab, get the project files by typing the following commands:

```shell
git clone /home/ff/eecs151/labs/lab3.git
cd lab3
```

You should add the following lines to the `.bashrc` file in your home folder
(for more information about what `.bashrc` does, see https://www.tldp.org/LDP/abs/html/sample-bashrc.html)
so that every time
you open a new terminal you have the paths for the tools set up properly.

```shell
source /home/ff/eecs151/tutorials/eecs151.bashrc
export HAMMER_HOME=/home/ff/eecs151/hammer
source ${HAMMER_HOME}/sourceme.sh
```

Type

```shell
which genus
```

to see if the shell prints out the path to the Cadence Genus Synthesis program (which we will be
using for this lab). If it does not work, add the lines to your `.bash_profile` in your home folder
as well. Try to open a new terminal to see if it works. The file `eecs151.bashrc` sets various
environment variables in your system such as where to find the CAD programs or license servers.


## Synthesis Environment
To perform synthesis, we will be using Cadence Genus. However, we will not be interfacing with
Genus directly; instead, we will use Hammer. Just like in lab 2, we have set up the basic Hammer
flow for your lab exercises using a Makefile.

In this lab repository, you will see two sets of input files for Hammer. The first set of files is
the source code for our design that you will explore in the next section.
The second set of files is
a group of YAML files (`inst-env.yml`, `asap7.yml`, `design.yml`, `sim-rtl.yml`, `sim-gl-syn.yml`) that
configure the Hammer flow. Of these YAML files, you should only need to modify `design.yml`,
`sim-rtl.yml` and `sim-gl-syn.yml` in order to configure the synthesis and simulation for your
design.


Hammer is already set up at `/home/ff/eecs151/hammer` with all the required plugins for Cadence
Synthesis (Genus) and Place-and-Route (Innovus), Synopsys Simulator (VCS), Mentor Graphics
DRC and LVS (Calibre). You should not need to install it in your own home directory. **These
Hammer plugins are under NDA. They are provided to us for educational purposes.
They should never be copied outside of instructional machines under any circumstances, or else we risk losing access to the tools in the future!!!**

Let us take a look at some parts of the `design.yml` file:

```yaml
gcd.clockPeriod: &CLK_PERIOD "1ns"
```

This option sets the target clock speed for our design. A more stringent target (a shorter clock
period) will make the tool work harder and use higher-power gates to meet the
constraints. A more relaxed timing target allows the tool to focus on reducing area and/or power.
In `sim-rtl.yml`:

```yaml
defines:
  - "CLOCK_PERIOD=1.00"
```

This option sets the clock period used during simulation. It is generally useful to separate the two as
you might want to see how the circuit performs under different clock frequencies without changing
the design constraints.
Continuing from `design.yml`:

```yaml
gcd.verilogSrc: &VERILOG_SRC
  - "src/gcd.v"
  - "src/gcd_datapath.v"
  - "src/gcd_control.v"
```

and in `sim-rtl.yml`:

```yaml
sim.inputs:
  input_files:
    - "src/gcd.v"
    - "src/gcd_datapath.v"
    - "src/gcd_control.v"
    - "src/gcd_testbench.v"
```

These specify the files for synthesis and simulation. Moving on, we have:

```yaml
vlsi.inputs.clocks: [
  {name: "clk", period: *CLK_PERIOD, uncertainty: "0.1ns"}
]
```

This is where we specify to Hammer that we intend to use the `CLK_PERIOD` we defined earlier
as the constraint for our design. We will see more detailed constraints in later labs.

## Understanding the example design
We have provided a circuit described in Verilog that computes the greatest common divisor (GCD)
of two numbers. Unlike the FIR filter from the last lab, in which the testbench constantly provided
stimuli, the GCD algorithm takes a variable number of cycles, so the testbench needs to know when
the circuit is done to check the output. This is accomplished through a “ready/valid” handshake
protocol. This protocol shows up in many places in digital circuit design.
See [this note](https://inst.eecs.berkeley.edu/~eecs151/fa21/files/verilog/ready_valid_interface.pdf) on the course website for more background.
The GCD top level is shown in the figure below.

[Figure: GCD top-level block diagram (`figs/block-diagram.png`)]


The GCD module declaration is as follows:

```v
module gcd#( parameter W = 16 )
(
  input clk, reset,
  input [W-1:0] operands_bits_A,    // Operand A
  input [W-1:0] operands_bits_B,    // Operand B
  input operands_val,               // Are operands valid?
  output operands_rdy,              // ready to take operands

  output [W-1:0] result_bits_data,  // GCD
  output result_val,                // Is the result valid?
  input result_rdy                  // ready to take the result
);
```

On the `operands` boundary, nothing will happen until GCD is ready to receive data (`operands_rdy`).
When this happens, the testbench will place data on the operands (`operands_bits_A` and `operands_bits_B`),
but GCD will not start until the testbench declares that these operands are valid (`operands_val`).
Then GCD will start.

The testbench needs to know that GCD is not done. This will be true as long as `result_val` is 0
(the results are not valid). Also, even if GCD is finished, it will hold the result until the testbench is
prepared to receive the data (`result_rdy`). The testbench will check the data when GCD declares
the results are valid by setting `result_val` to 1.

The contract is that if the interface declares it is ready while the other side declares it is valid, the
information must be transferred.

Open `src/gcd.v`. This is the top-level of GCD and just instantiates `gcd_control` and `gcd_datapath`.
Separating files into control and datapath is generally a good idea. Open `src/gcd_datapath.v`.
This file stores the operands, and contains the logic necessary to implement the algorithm (subtraction and comparison). Open `src/gcd_control.v`. This file contains a state machine that handles
the ready-valid interface and controls the mux selects in the datapath. Open `src/gcd_testbench.v`.
This file sends different operands to GCD, and checks to see if the correct GCD was found. Make
sure you understand how this file works. Note that the inputs are changed on the negative edge
of the clock. This will prevent hold time violations for gate-level simulation, because once a clock
tree has been added, the input flops will register data at a time later than the testbench’s rising
edge of the clock.

Now simulate the design by running `make sim-rtl`. The waveform is located under `build/sim-rundir/`.
Open the waveform in DVE (you may need to scroll down in DVE to find the testbench) and try
to understand how the code works by comparing the waveforms with the Verilog code. It might
help to sketch out a state machine diagram and draw the datapath.

---
### Question 1: Understanding the algorithm

By reading the provided Verilog code and/or viewing the RTL level simulations, demonstrate that
you understand the provided code:

**a) Draw a table with 5 columns (cycle number, value of `A_reg`, value of `B_reg`, `A_next`, `B_next`) and fill in all of the rows for the first test vector (GCD of 27 and 15)**

**b) In `src/gcd_testbench.v`, the inputs are changed on the negative edge of the clock to prevent hold time violations. Is the output checked on the positive edge of the clock or the negative edge of the clock? Why?**

**c) In `src/gcd_testbench.v`, what will happen if you change `result_rdy = 1;` to `result_rdy = 0;`? What state will the `gcd_control.v` state machine be in?**

---
### Question 2: Testbenches
**a) Modify `src/gcd_testbench.v` so that intermediate steps are displayed in the format below.**
**Include a copy of the code you wrote in your writeup (this should be approximately 3-4 lines).**

```shell
 0: [ ...... ] Test ( x ), [ x == x ] (decimal)
 1: [ ...... ] Test ( x ), [ x == 0 ] (decimal)
 2: [ ...... ] Test ( x ), [ x == 0 ] (decimal)
 3: [ ...... ] Test ( x ), [ x == 0 ] (decimal)
 4: [ ...... ] Test ( x ), [ x == 0 ] (decimal)
 5: [ ...... ] Test ( x ), [ x == 0 ] (decimal)
 6: [ ...... ] Test ( 0 ), [ 3 == 0 ] (decimal)
 7: [ ...... ] Test ( 0 ), [ 3 == 0 ] (decimal)
 8: [ ...... ] Test ( 0 ), [ 3 == 27 ] (decimal)
 9: [ ...... ] Test ( 0 ), [ 3 == 12 ] (decimal)
10: [ ...... ] Test ( 0 ), [ 3 == 15 ] (decimal)
11: [ ...... ] Test ( 0 ), [ 3 == 3 ] (decimal)
12: [ ...... ] Test ( 0 ), [ 3 == 12 ] (decimal)
13: [ ...... ] Test ( 0 ), [ 3 == 9 ] (decimal)
14: [ ...... ] Test ( 0 ), [ 3 == 6 ] (decimal)
15: [ ...... ] Test ( 0 ), [ 3 == 3 ] (decimal)
16: [ ...... ] Test ( 0 ), [ 3 == 0 ] (decimal)
17: [ ...... ] Test ( 0 ), [ 3 == 3 ] (decimal)
18: [ passed ] Test ( 0 ), [ 3 == 3 ] (decimal)
19: [ ...... ] Test ( 1 ), [ 7 == 3 ] (decimal)
```

---
## Synthesis
Synthesis is the process of converting your Verilog RTL description into technology (or platform, in the case of
FPGAs) specific gate-level Verilog. These gates are different from the “and”, “or”, “xor” etc. primitives in Verilog. While the logic primitives correspond to gate-level operations, they do not have
a physical representation outside of their symbol. A synthesized gate-level Verilog netlist only contains
cells with corresponding physical aspects: they have a transistor-level schematic with transistor
sizes provided, a physical layout containing information necessary for fabrication, timing libraries
providing performance specifications, etc. Some synthesis tools also output assign statements that
refer to pass-through interfaces, but no logic operation is performed in these assignments (not even
simple inversion!).
228 | 229 | 230 | Open the Makefile to see the available targets that you can run. You don’t have to know all of 231 | these for now. The Makefile provides shorthands to various Hammer commands for synthesis, 232 | placement-and-routing, or simulation. Read [Hammer-Flow](https://hammer-vlsi.readthedocs.io/en/latest/Hammer-Flow/index.html) if you want to get more detail. 233 | 234 | The first step is to have Hammer generate the necessary supplementary Makefile (`build/hammer.d`). To do so, type the 235 | following command in the lab directory: 236 | 237 | make buildfile 238 | 239 | This generates a file with make targets specific to the constraints we have provided inside the YAML 240 | files. If you have not run `make clean` after simulating, this file should already be generated. `make buildfile` also copies and extracts a tarball of the ASAP7 PDK to your local workspace. It will 241 | take a while to finish the first time you run this command. The extracted PDK is not deleted when 242 | you do `make clean` to avoid unnecessarily rebuilding the PDK. To explicitly remove it, you need to 243 | remove the build folder (and you should do it once you finish the lab to save your allocated disk 244 | space since the PDK is huge). To synthesize the GCD, use the following command: 245 | 246 | make syn 247 | 248 | This runs through all the steps of synthesis. 249 | By default, Hammer puts the generated objects under the `build` directory. Go to `build/syn-rundir/reports`. 250 | There are five text files here that contain very useful information about 251 | the synthesized design that we just generated. Go through these files and familiarize yourself with 252 | these reports. One report of particular note is `final_time_PVT_0P63V_100C.setup.view.rpt`. The 253 | name of this file indicates that it is a timing report, with the Process Voltage Temperature corner 254 | of 0.63 V and 100 degrees C, and that it contains the setup timing checks. 
Another important file 255 | is `build/syn-rundir/gcd.mapped.v`. This is your synthesized gate-level Verilog. Go through it 256 | to see what the RTL design has become to represent it in terms of technology-specific gates. Try 257 | to follow an input through these gates to see the path it takes until the output. 258 | These files are useful for debugging and evaluating your design. 259 | 260 | Now open the `final_time_PVT_0P63V_100C.setup.view.rpt` file and look at the first block of text 261 | you see. It should look similar to this: 262 | 263 | ```text 264 | Path 1: MET (474 ps) Setup Check with Pin GCDdpath0/A_reg_reg[15]/CLK->D 265 | View: PVT_0P63V_100C.setup_view 266 | Group: clk 267 | Startpoint: (R) GCDdpath0/B_reg_reg[5]/CLK 268 | Clock: (R) clk 269 | Endpoint: (F) GCDdpath0/A_reg_reg[15]/D 270 | Clock: (R) clk 271 | Capture Launch 272 | Clock Edge:+ 1000 0 273 | Src Latency:+ 0 0 274 | Net Latency:+ 0 (I) 0 (I) 275 | Arrival:= 1000 0 276 | Setup:- 25 277 | Uncertainty:- 0 278 | Required Time:= 975 279 | Launch Clock:- 0 280 | Data Path:- 501 281 | Slack:= 474 282 | #--------------------------------------------------------------------------------------------------------------------- 283 | # Timing Point Flags Arc Edge Cell Fanout Load Trans Delay Arrival Instance 284 | # (fF) (ps) (ps) (ps) Location 285 | #--------------------------------------------------------------------------------------------------------------------- 286 | GCDdpath0/B_reg_reg[5]/CLK - - R (arrival) 16 - 0 - 0 (-,-) 287 | GCDdpath0/B_reg_reg[5]/QN - CLK->QN R ASYNC_DFFHx1_ASAP7_75t_SL 5 3.3 42 48 48 (-,-) 288 | GCDdpath0/g1181/Y - A->Y F INVx1_ASAP7_75t_SL 2 1.2 20 10 58 (-,-) 289 | GCDdpath0/g1162__8246/Y - A->Y F OR2x2_ASAP7_75t_SL 2 1.3 12 17 76 (-,-) 290 | GCDdpath0/g1152__6260/Y - A1->Y F AO32x1_ASAP7_75t_SL 1 0.7 13 19 95 (-,-) 291 | GCDdpath0/g1144__2883/Y - C1->Y R AOI322xp5_ASAP7_75t_SL 1 0.7 47 19 114 (-,-) 292 | GCDdpath0/g1138__5115/Y - B2->Y F AOI221xp5_ASAP7_75t_SL 1 0.7 
37 14 128 (-,-) 293 | GCDdpath0/g1137__1881/Y - A2->Y R O2A1O1Ixp33_ASAP7_75t_SL 3 2.2 72 36 164 (-,-) 294 | GCDctrl0/g446__5526/Y - B->Y F NAND2xp5_ASAP7_75t_SL 2 1.3 36 17 182 (-,-) 295 | GCDctrl0/g444/Y - A->Y R INVx1_ASAP7_75t_SL 18 10.0 102 52 234 (-,-) 296 | GCDdpath0/g1265/Y - A->Y F INVx1_ASAP7_75t_L 17 9.4 91 63 297 (-,-) 297 | GCDdpath0/g1232__9945/Y - B->Y R NOR2xp33_ASAP7_75t_L 16 9.0 304 154 451 (-,-) 298 | GCDdpath0/g1193__6417/Y - C1->Y F AOI222xp33_ASAP7_75t_SL 1 0.7 124 51 501 (-,-) 299 | GCDdpath0/A_reg_reg[15]/D - - F ASYNC_DFFHx1_ASAP7_75t_SL 1 - - 0 501 (-,-) 300 | #--------------------------------------------------------------------------------------------------------------------- 301 | ``` 302 | 303 | This is one of the most common ways to assess the critical paths in your circuit. 304 | The setup timing report lists each timing path's **slack**, which is the extra delay the signal can have before a setup 305 | violation occurs, in ascending order. The first block indicates the critical path of the design. 306 | Each row represents a timing path from a gate to the next, and the whole block is the **timing 307 | arc** between two flip-flops (or in some cases between latches). The `MET` at the top of the block 308 | indicates that the timing requirements have been met and there is no violation. If there was, this 309 | indicator would have read `VIOLATED`. Since our critical path meets the timing requirements with 310 | a 474 ps of slack, this means we can run this synthesized design with a period equal to clock period 311 | (1000 ps) minus the critical path slack (474 ps), which is 526 ps. 312 | 313 | --- 314 | 315 | ### Question 3: Reporting Questions 316 | **a) Which report would you look at to find the total number of each different standard cell that the design contains?** 317 | 318 | **b) Which report contains area breakdown by modules in the design?** 319 | 320 | **c) What is the cell used for `A_reg_reg[7]`? 
How much leakage power does `A_reg_reg[7]` contribute? How did you find this?** 321 | 322 | --- 323 | 324 | ### Question 4: Synthesis Questions 325 | **a) Looking at the total number of instances of sequential cells synthesized and the number of `reg` definitions in the Verilog files, are they consistent? If not, why?** 326 | 327 | **b) Modify the clock period in the `design.yml` file to make the design go faster. What is the highest clock frequency this design can operate at in this technology?** 328 | 329 | --- 330 | 331 | ### Synthesis: Step-by-step 332 | 333 | Typically, we will be roughly following the above section’s flow, but it is also 334 | useful to know what is going on underneath. In this section, 335 | we will look at the steps Hammer takes to get from RTL Verilog to all the outputs we saw in the 336 | last section. 337 | 338 | First, type `make clean` to clean the environment of previous build’s files. Then, use `make buildfile` 339 | to generate the supplementary Makefile as before. Now, we will modify the `make syn` command to 340 | only run the steps we want. Go through the following commands in the given order: 341 | 342 | make redo-syn HAMMER_EXTRA_ARGS="--stop_after_step init_environment" 343 | 344 | In this step, Hammer invokes Genus to read the technology libraries and the RTL Verilog files, as well as the constraints we 345 | provided in the `design.yml` file. 346 | Hammer will exit with an error, which is expected as Hammer looks for the final synthesis output 347 | files to gauge its success. We have not yet generated the gate-level Verilog, so we know Hammer will display an error after every step except the last one. 348 | 349 | make redo-syn HAMMER_EXTRA_ARGS="--stop_after_step syn_generic" 350 | 351 | This step is the **generic synthesis** step. In this step, Genus converts our RTL read 352 | in the previous step into an intermediate format, made up of technology-independent generic gates. 
These 353 | gates are purely for gate-level functional representation of the RTL we have coded, and are going 354 | to be used as an input to the next step. This step also performs logical optimizations on our design 355 | to eliminate any redundant/unused operations. 356 | 357 | make redo-syn HAMMER_EXTRA_ARGS="--stop_after_step syn_map" 358 | 359 | This step is the **mapping** step. Genus takes its own generic gate-level output and converts it to 360 | our ASAP7-specific gates. This step further optimizes the design given the gates in our technology. 361 | That being said, this step can also increase the number of gates from the previous step as not 362 | all gates in the generic gate-level Verilog may be available for our use and they may need to be 363 | constructed using several, simpler gates. 364 | 365 | make redo-syn HAMMER_EXTRA_ARGS="--stop_after_step add_tieoffs" 366 | 367 | In some designs, the pins in certain cells are hardwired to 0 or 1, which requires a tie-off cell. 368 | 369 | make redo-syn HAMMER_EXTRA_ARGS="--stop_after_step write_regs" 370 | 371 | This step is purely for the benefit of the designer. For some designs, we may need to have a list 372 | of all the registers in our design. In this lab, the list of regs is used in post-synthesis simulation to 373 | generate the `force_regs.ucli`, which sets initial states of registers. 374 | 375 | make redo-syn HAMMER_EXTRA_ARGS="--stop_after_step generate_reports" 376 | 377 | The reports we have seen in the previous section are generated during this step. 378 | 379 | make redo-syn HAMMER_EXTRA_ARGS="--stop_after_step write_outputs" 380 | 381 | This step writes the outputs of the synthesis flow. This includes the gate-level `.v` file we looked at 382 | earlier in the lab. Other outputs include the design constraints (such as clock frequencies, output 383 | loads etc., in `.sdc` format) and delays between cells (in `.sdf` format). 
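
The step-by-step commands above can be collected into a small loop. The sketch below is only a convenience wrapper and is not part of the provided Makefile: the step names are the ones listed in this section, while the `run_syn_steps` function and the `DRY_RUN` variable are made up for illustration. By default it only prints each command. Because Hammer is expected to report an error after every step except the last one, the loop deliberately keeps going when a step fails.

```shell
#!/usr/bin/env bash
# Hypothetical helper: walk through the synthesis steps one at a time.
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to run them.
SYN_STEPS="init_environment syn_generic syn_map add_tieoffs write_regs generate_reports write_outputs"

run_syn_steps() {
  for step in $SYN_STEPS; do
    cmd="make redo-syn HAMMER_EXTRA_ARGS=\"--stop_after_step $step\""
    echo "$cmd"
    if [ "${DRY_RUN:-1}" = "0" ]; then
      # Intermediate steps are expected to end with a Hammer error,
      # so the exit status is ignored here.
      make redo-syn HAMMER_EXTRA_ARGS="--stop_after_step $step" || true
    fi
  done
}

run_syn_steps
```

Running the steps by hand is still worthwhile the first time, since it lets you inspect the Genus output after each stage.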
384 | 385 | ## Post-Synthesis Simulation 386 | From the root folder, type the following commands: 387 | 388 | make sim-gl-syn 389 | 390 | This will run a post-synthesis simulation using annotated delays from the `gcd.mapped.sdf` file. 391 | 392 | --- 393 | 394 | ### Checkoff 1: Synthesis Understanding 395 | Demonstrate that your synthesis flow works correctly, and be prepared to explain the synthesis steps at a high level. 396 | 397 | --- 398 | 399 | ### Question 5: Delay Questions 400 | Check the waveforms in DVE. 401 | 402 | **a) Report the clk-q delay of `state[0]` in `GCDctrl0` at 17.5 ns and submit a screenshot of the waveforms showing how you found this delay.** 403 | 404 | **b) Which line in the sdf file specifies this delay and what is the delay?** 405 | 406 | **c) Is the delay from the waveform the same as from the sdf file? Why or why not?** 407 | 408 | --- 409 | 410 | ## Build Your Divider 411 | In this section, you will build a parameterized divider of unsigned integers. Some initial code has 412 | been provided to help you get started. To keep the control logic simple, the divider module uses an input 413 | signal `start` to begin the computation at the next clock cycle, and asserts an output signal `done` to 414 | HIGH when the division result is valid. The input `dividend` and `divisor` should be registered 415 | when `start` is HIGH. You are not required to handle corner cases such as dividing by 0. You are 416 | free to modify the skeleton code to implement a ready/valid interface instead, but it is not required. 417 | 418 | It is suggested that you implement the divide algorithm described [here](http://bwrcs.eecs.berkeley.edu/Classes/icdesign/ee141_s04/Project/Divider%20Background.pdf). Use the **Divide Algorithm Version 2** (slide 9). 419 | A simple testbench skeleton is also provided to you. You should change it to add more test vectors, 420 | or test your divider with different bitwidths. 
You need to change the file `sim-rtl.yml` to use your 421 | divider instead of the GCD module when testing. 422 | 423 | --- 424 | 425 | ### Question 6: Synthesize your divider 426 | **a) Push your 4-bit divider design through the synthesis tool, and determine its critical path, cell area, and maximum operating frequency from the reports. You might need to re-run synthesis multiple times to determine the maximum achievable frequency.** 427 | 428 | **b) Change the bitwidth of your divider to 32-bit, what is the critical path, area, and maximum operating frequency now?** 429 | 430 | **c) Submit your divider code and testbench to the report. Add comments to explain your testbench and why it provides sufficient coverage for your divider module.** 431 | 432 | --- 433 | 434 | ## Lab Deliverables 435 | 436 | ### Lab Due: 11:59 PM, Friday February 18th, 2021 437 | 438 | - Submit a written report with all 6 questions answered to Gradescope 439 | - Checkoff with an ASIC lab TA 440 | 441 | ## Acknowledgement 442 | 443 | This lab is the result of the work of many EECS151/251 GSIs over the years including: 444 | Written By: 445 | - Nathan Narevsky (2014, 2017) 446 | - Brian Zimmer (2014) 447 | Modified By: 448 | - John Wright (2015,2016) 449 | - Ali Moin (2018) 450 | - Arya Reais-Parsi (2019) 451 | - Cem Yalcin (2019) 452 | - Tan Nguyen (2020) 453 | - Harrison Liew (2020) 454 | - Sean Huang (2021) 455 | - Daniel Grubb, Nayiri Krzysztofowicz, Zhaokai Liu (2021) 456 | - Dima Nikiforov (2022) 457 | -------------------------------------------------------------------------------- /lab6/spec.md: -------------------------------------------------------------------------------- 1 | # EECS 151/251A ASIC Lab 6: SRAM Integration, DRC, LVS 2 |

3 | Prof. Sophia Shao 4 |

5 |

6 | TAs (ASIC): Dima Nikiforov 7 |

8 |

9 | Department of Electrical Engineering and Computer Science 10 |

11 |

12 | College of Engineering, University of California, Berkeley 13 |

14 | 15 | ## Overview 16 | In this lab, we will go over two very important concepts. First we will look at the basics of using 17 | circuits beyond standard cells in VLSI designs. The most common example of this is SRAM, 18 | which is a dense addressable memory block used in most VLSI designs. You will learn about SRAM 19 | in more detail later in the lectures, but the [Wikipedia article on SRAM](https://en.wikipedia.org/wiki/Static_random-access_memory) provides a good starting 20 | point. SRAM is treated as a hard macro block in VLSI flow. It is created separately from the 21 | standard cell libraries. The process for adding other custom, analog, or mixed signal circuits will 22 | be similar to what we use for SRAMs. In your project, you will use the SRAMs extensively for 23 | data caching. It is important to know how to design a digital circuit and run a CAD flow with 24 | those hard macro blocks. The lab exercises will help you get familiar with SRAM interfacing. We 25 | will use an example design of computing a dot product of two vectors to walk you through how to 26 | use the SRAM blocks. 27 | 28 | Next, we will take a cursory glance at part of the ”signoff” flow: design rule checking (DRC) and 29 | layout-versus-schematic (LVS). DRC checks all the geometry in the post-PAR’d layout to see that 30 | they meet all the design rules for the process technology. LVS checks for discrepancies between 31 | the actual layout and the netlist that the PAR tool thinks it laid out. 32 | In a purely standard-cell based design, LVS will almost never be wrong. However, once you start 33 | integrating hard macros like SRAMs and custom analog cells, LVS can reveal unconnected pins, 34 | unintended shorts between power/ground/signals, and more that would prevent the circuit from 35 | working. Often, these stem from improper abstraction of the macro cells for the PAR tool. 
36 | 37 | To begin this lab, get the project files and set up your environment by typing the following command and sourcing the `eecs151.bashrc` file, as usual: 38 | 39 | ```shell 40 | git clone /home/ff/eecs151/labs/lab6.git 41 | ``` 42 | 43 | You should also clean up the build directory generated from the previous labs to save some disk space. 44 | 45 | For this lab, there are many Make targets that will be run, some of which you have explored in 46 | previous labs. The following list describes what each one does for future reference, but **do not run them right now!** 47 | 48 | ```shell 49 | # This command gets all the relevant SRAM configurations (file pointers) for the ASAP7 library 50 | make srams 51 | 52 | # This command runs RTL simulation 53 | make sim-rtl 54 | 55 | # This command runs Synthesis using Cadence Genus tool 56 | make syn 57 | 58 | # This command runs Post-Synthesis gate-level simulation 59 | make sim-gl-syn 60 | 61 | # This command runs Placement-and-Routing using Cadence Innovus tool 62 | make par 63 | 64 | # This command runs Post-PAR gate-level simulation 65 | make sim-gl-par 66 | 67 | # This command runs Post-PAR power estimation 68 | make power-par 69 | 70 | # This command runs DRC using Mentor Calibre tool 71 | make drc 72 | 73 | # This command runs LVS using Mentor Calibre tool 74 | make lvs 75 | ``` 76 | 77 | The configuration files (`*.yml` files) are intended to provide you more flexibility when you have a 78 | large design project, and you want to test the modules separately before final integration. You can 79 | simply set the top-level module to the one you care about in these configuration files. Don’t hesitate 80 | to make changes to those files whenever you want to test out your new modules. This structure 81 | will also be used in the final project, so please take the exercises in this lab as a final practice run 82 | with the CAD flow so that you will become more productive when 83 | working on your project. 
At the very least, you should be aware of which files to change for the tasks 84 | that you want to carry out. We will run through small to moderate designs to get a sense of the 85 | entire flow. Please let the TAs know if you have any feedback or suggestions on how to improve 86 | the tool flow, or if you encounter some tooling issues. 87 | 88 | ## SRAM Modeling and Abstraction 89 | Open the file `src/dot_product.v`. This Verilog module implements a vector dot product of two 90 | vectors of unsigned integers `a` and `b`. The module first reads elements of the vectors one-by-one via 91 | the ready/valid interfaces and stores them to two SRAMs, one for each vector. 92 | 93 | Note: You will see some `REGISTER_R_CE` blocks in `dot_product.v`. These are used by some 94 | iterations of this lab to remove the `reg` ambiguity that exists in Verilog. You may refer to 95 | `/home/ff/eecs151/verilog_lib/EECS151.v` to see their definition, but in essence they are structural descriptions of registers that are unambiguously translated to flip-flops when written in this 96 | fashion. You may use these constructs or normal Verilog syntax. 97 | 98 | Let’s look at one particular SRAM module instantiation to understand its interface. The functions 99 | of select ports are annotated here: 100 | 101 | ```v 102 | SRAM2RW16x16 sram ( 103 | .CE1(), // clock edge (clock signal) 104 | .CE2(), 105 | 106 | .WEB1(), // Write Enable Bar (HIGH: Read, LOW: Write) 107 | .WEB2(), 108 | .OEB1(), // Output Enable Bar (always tie to LOW) 109 | .OEB2(), 110 | .CSB1(), // Chip Select Bar (always tie to LOW) 111 | .CSB2(), 112 | 113 | .A1(), // Address pin 114 | .A2(), 115 | .I1(), // Input Data pin 116 | .I2(), 117 | .O1(), // Output Data pin 118 | .O2() 119 | ); 120 | ``` 121 | 122 | This `SRAM2RW16x16` is a dual-port Read/Write memory block of sixteen 16-bit entries. This means 123 | there is a 4-bit address for selecting among those sixteen entries. 
The SRAM can be clocked with two 124 | independent clock signals. Also, to write to an SRAM, we need to set the `WEBi` signal to LOW. The 125 | signals `OEBi` and `CSBi` should be set to LOW. SRAMs are synchronous-write and synchronous-read; 126 | the read data is only available at the next rising edge, and the write data is only written at the 127 | next rising edge. 128 | 129 | Where are those SRAMs coming from? Because SRAMs are not made out of standard cells, and 130 | are rather built using different units that do not conform to our PAR flow, they are pre-compiled 131 | and stored in separate databases. These cells are then instantiated by Innovus as black boxes, 132 | and are connected to the rest of the circuit as specified in your Verilog. In order to generate the 133 | database that Innovus will use, type the following command: 134 | 135 | ```shell 136 | make srams 137 | ``` 138 | 139 | For simulation purposes, a Verilog behavioral model for the SRAMs from the HAMMER repository 140 | is used. This is automatically set up in `build/sram_generator-output.json` and points to `/home/ff/eecs151/hammer/src/hammer-vlsi/technology/asap7/sram_compiler/memories/behavioral/sram_behav_models.v`. 141 | 142 | This file includes models for various types of SRAMs. You can find SRAMs that have only a single port for Read and Write, or SRAMs with different address widths and data widths. For your final 143 | project, you need to select the appropriate SRAM models to meet the specification. The SRAM 144 | models in this file are only intended for simulation; **do not include this file in your project configuration for Synthesis or PAR**, otherwise it will corrupt your post-Synthesis or 145 | post-PAR netlist. 146 | 147 | For Synthesis and PAR, the SRAMs must be abstracted away from the tools, because the only 148 | things that the flow is concerned about at these stages are the timing characteristics and the outer 149 | layout geometry of the SRAM macros. 
The ASAP7 PDK does not come with SRAMs by default, 150 | so a graduate student (Sean Huang) graciously created some dummy models for us to use. They 151 | are located at: 152 | 153 | ```shell 154 | # Liberty Timing File -- delay information 155 | /home/ff/eecs151/hammer/src/hammer-vlsi/technology/asap7/sram_compiler/memories/lib/ 156 | 157 | # Library Exchange Format -- placement information 158 | /home/ff/eecs151/hammer/src/hammer-vlsi/technology/asap7/sram_compiler/memories/lef/ 159 | 160 | # Graphical Database System -- final layout information 161 | /home/ff/eecs151/hammer/src/hammer-vlsi/technology/asap7/sram_compiler/memories/gds/ 162 | 163 | ``` 164 | 165 | #### Liberty Timing Files (*.lib) 166 | [Liberty files](http://web.engr.uky.edu/~elias/lectures/LibertyFileIntroduction.pdf) must be generated for macros at every relevant process, voltage, and temperature 167 | (PVT) corner that you are using for setup and hold timing analysis. Detailed models contain 168 | descriptions of what each pin does, the delays depending on the load given in tables, and power 169 | information. There are also 3 types of Liberty files: [CCS, ECSM, and NLDM](https://chitlesh.ch/wordpress/liberty-ccs-ecsm-or-ndlm/), which trade off 170 | accuracy with tool runtime. 171 | If you open up a file for the 172 | SRAMs we are using, you will see that they are very basic because these are fake timing models. 173 | Note that your post-synthesis and post-PAR 178 | timing reports will differ from gate-level simulation due to these inaccuracies. 179 | 180 | #### Library Exchange Format (*.lef) 181 | [LEF files](http://web.engr.uky.edu/~elias/lectures/LibertyFileIntroduction.pdf) must be generated for macros in order to denote 182 | where pins are located and encode any obstructions (places where the PAR tool cannot place other 183 | cells or routing). The quality of LEFs is very important to get clean layouts. 
Again, our SRAM 184 | LEFs are fake, so they may present some issues with routing and DRC. 185 | 186 | 187 | #### Graphical Database System (*.gds) 188 | [GDS files](https://www.artwork.com/gdsii/gdsii/) must be generated 189 | for macros to encode the entire detailed layout, and get merged with the PAR’d layout before 190 | running DRC, LVS, and sending the design off to the fabrication house. 191 | 192 | --- 193 | 194 | ### Question 1: Understanding SRAMs 195 | **a)** Open the file `sram_behav_models.v` (located in HAMMER repository). 196 | **What are different SRAM-sizes available?** 197 | **What is the difference between the `SRAM1RW*` and `SRAM2RW*` variants?** 198 | Hint: take some time to look at the Verilog implementation to understand what it does. You will need to use this SRAM model in the final project. 199 | 200 | **b)** In the same file, select an SRAM instance that has a BYTEMASK pin. 201 | **What is the SRAM model (in terms of number of Read/Write ports, address width, data/word width)?** 202 | **Briefly describe the purpose the BYTEMASK. In which situation do you think it is useful?** 203 | 204 | *c) (Ungraded thought experiment #1) SRAM libraries in real process technologies are much larger than the list you see in `sram_behav_models.v`. What features do you think are important for real SRAM libraries? Think in terms of number of ports, masking, improving yield, or anything else you can think of. What would these features do to the size of the SRAM macros?* 205 | 206 | *d) (Ungraded thought experiment #2) SRAMs should be integrated very densely in a circuit’s layout. To build large SRAM arrays, often times many SRAM macros are tiled together, abutted on one or more sides. Knowing this, take a guess at how SRAMs are laid out.* 207 | 208 | *i) In ASAP7, there are 9 metal layers, but realistically only 7 layers to route on in order to leave the top 2 layers for robust power distribution, as you saw in Lab 4. 
How many layers should a well-designed SRAM macro use (i.e. block off from PAR routing), at maximum?* 209 | 210 | *ii) Where should the pins on SRAMs be located, if you want to maximize the ability for them to abut together?* 211 | 212 | --- 213 | 214 | ## A Vector Dot Product with SRAMs 215 | Take a moment to read through the file `src/dot_product.v` to understand the control logic of 216 | writing and reading from SRAMs. The two SRAMs are first filled with vector data up to 217 | the vector size; after that, they are read for the dot product computation. 218 | To run RTL simulation, type the following command: 219 | 220 | ```shell 221 | make sim-rtl 222 | ``` 223 | 224 | To inspect the RTL simulation waveform, type the following commands: 225 | 226 | ```shell 227 | cd build/sim-rundir 228 | dve -vpd vcdplus.vpd 229 | ``` 230 | 231 | The simulation takes 35 cycles to complete, which makes sense since it spends the first 16 cycles 232 | reading data for vectors `a` and `b`, then performs the dot product computation in 16 cycles, plus 233 | a few extra cycles for various state transitions. The goal is not building the most efficient dot product 234 | implementation, but rather providing you an introductory design that shows how you would interface with 235 | SRAMs. 236 | 237 | Next, we will perform PAR on the circuit. 238 | 239 | ```shell 240 | make par 241 | ``` 242 | 243 | This command will invoke Synthesis as well, if it has not been run already (however, make sure to re-run synthesis if you updated your `design.yml` file). After PAR finishes, 244 | you can open the floorplan of the design by running: 245 | 246 | ```shell 247 | cd build/par-rundir 248 | ./generated-scripts/open_chip 249 | ``` 250 | 251 | This will launch Cadence Innovus GUI and load your final design database. You should expect to 252 | see the floorplan as in the following image. Don’t forget to disable M8, V8, M9, V9 on the right 253 | pane to see the unobstructed floorplan. 254 | 255 |

256 | 257 |

258 | 259 | This floorplan has two SRAM instances: `sram_a` and `sram_b`. The placement constraints of those 260 | SRAMs were given in the file `design.yml` in this block below. You can look at `build/par-rundir/floorplan.tcl` 261 | to see how HAMMER translated these constraints into Innovus floorplanning commands. Note that 262 | you should: 263 | 264 | - Always generate a placement constraint for hard macros like SRAMs, because Innovus is not 265 | able to auto-place them in a valid location most of the time. 266 | - Ensure that the hierarchical path to the macro instance is specified correctly, otherwise Innovus will not know what to place. 267 | - Pre-calculate valid locations for the macros. This will involve: 268 | - Looking at the LEF file to find out its width and height (e.g. 12.384um × 77.184um for 269 | `SRAM2RW16x16`) to make sure it fits within the core boundary/desired area. 270 | - Legalizing the x and y coordinates. These generally need to be a multiple of a technology 271 | grid to avoid layout rule violations. The most conservative rule of thumb is a multiple 272 | of the site height (height of a standard cell row, which is 1.08um in this technology). 273 | - Ensuring that the macros receive power. You can see that the SRAMs in the picture 274 | above are placed beneath the M5 power straps. This is because the SRAM’s power pins 275 | are on M4. 276 | 277 | ```yaml 278 | - path: "dot_product/sram_a" 279 | type: hardmacro 280 | x: 35.64 281 | y: 10.8 282 | width: 12.384 283 | height: 77.184 284 | orientation: r0 285 | top_layer: M4 286 | 287 | - path: "dot_product/sram_b" 288 | type: hardmacro 289 | x: 71.28 290 | y: 10.8 291 | width: 12.384 292 | height: 77.184 293 | orientation: r0 294 | top_layer: M4 295 | ``` 296 | 297 | You can play around with those constraints to change the SRAM placement to a geometry you like. 
298 | If you only change the placement constraints in `design.yml` and only want to redo PAR (skipping 299 | synthesis), you can do: 300 | 301 | ```shell 302 | make redo-par HAMMER_EXTRA_ARGS='-p build/sram_generator-output.json -p design.yml' 303 | ``` 304 | 305 | Finally, we will perform post-PAR gate-level simulation and power estimation. 306 | 307 | ```shell 308 | make sim-gl-par 309 | make power-par 310 | ``` 311 | 312 | Theoretically, if you don’t have any setup/hold time violation, your post-PAR gate-level simulation 313 | should pass. However, as mentioned above, when you are pushing the timing constraints, 314 | the gate-level simulation may not pass because of the incomplete SRAM timing libraries. One manifestation 315 | of this is the PAR tool trying to use very large clock buffers (x16 size) in the presence of SRAMs, 316 | which sometimes cannot be placed in the floorplan because they are too wide. At the bottom of 317 | `design.yml`, these buffers are marked “don’t use” for PAR. 318 | 319 | --- 320 | ### Question 2: Using a different SRAM 321 | **a)** Modify the dot product design to use only one instantiation of a *dual-port, 5-bit address width, and 16-bit data width SRAM*. In this SRAM, you want to store vector `a` to the first 16 entries of the SRAM, and store vector `b` to the remaining entries of the SRAM. You can use the dot product code given to you as a starting point, but please implement your design in `src/dot_product_1SRAM.v`. 322 | **Include a screenshot of the code you added when modifying `dot_product` to be `dot_product_1SRAM`.** 323 | 324 | **b)** Run PAR (remember to update your SRAM placement constraints) and find the post-PAR critical path in your design: with a step size of 0.1 ns, reduce the PAR clock period until your design has a setup violation. 325 | **Describe that path based on your Verilog source (roughly).** 326 | **Can you give a strategy to improve the timing based on the path that you find?** 327 | You don’t have to implement it. 
Just provide a brief description of how you should fix it. 328 | 329 | **c) What is the final performance (latency – in terms of nanoseconds) of your single-SRAM vector dot product design (post PAR)?** 330 | Remember that Latency (ns) = Number of Post-PAR simulation cycles × Lowest Post-PAR clock period. Make sure to run Post-PAR simulation with that clock period when you finish the PAR process. 331 | 332 | **d) Screenshot the final floorplan of your single-SRAM dot product design to the report, as well as the power report, timing report, and area report.** 333 | The SRAMs will have 0 power due to incomplete LIBs–show where this shows up in the power reports. 334 | 335 | ### Checkoff 1: Modified Design 336 | Explain the updated vector unit design, and show the updated layout. 337 | 338 | --- 339 | ### [Optional, Extra Credit] Question 3: Divide Your Vector Dot Products 340 | **Note: this question is extra credit. You will be awarded up to 20% extra credit on this lab report.** 341 | 342 | Imagine we would like to compute the division of two dot products of vectors of unsigned integers. Open the file `src/dp_div.v`, connect two single-SRAM vector dot product modules with the divider you implemented in Lab 4 (the divider should have Ready/Valid interfaces for input and output) via FIFOs. If you implement a correct Ready/Valid mechanism for each block, connecting those blocks is simply a matter of wiring relevant signals at the interfaces. One dot product produces dividend input, and the other provides divisor input to your Divider. Then write a testbench for your new `dp_div` module based on `dot_product_tb.v`, where the test cases are simple yet non-trivial (don't worry about covering edge cases with these). Refer to the figure below for the high-level overview of the design. 
343 | 344 | **What is the number of cycles it takes to run a design of 16-element vectors with 16-bit datapath (for both dot product modules and divider module)?** 345 | **Screenshot the floorplan, collect the power report, timing report, and area report at a clock period that your design can meet (i.e., you don’t have to find the maximum achievable frequency).** 346 | **Zip your code and power, timing, area reports and submit it to the separate code assignment on Gradescope instead of pasting them into your lab PDF.** 347 | Start early, since the tools take a long time! 348 | 349 | 350 | To receive full credit, you should make sure that your final implementation has no latches (one way 351 | to check is to open the Genus log file and search for “latch”). Also, your post-PAR gate-level simulation 352 | should pass the test in the testbench code. 353 | 354 |

355 | 356 |

357 | 358 | 359 | --- 360 | 361 | ## DRC and LVS 362 | [DRC](https://en.wikipedia.org/wiki/Design_rule_checking) and [LVS](https://en.wikipedia.org/wiki/Layout_Versus_Schematic) are two of the most important “signoff” checks. DRC checks that all of the geometries 363 | in the layout conform to process fabrication rules. Without a DRC “clean” design, the fabrication 364 | house will not accept your design! LVS checks that the PAR tool’s conception of the circuit is actually 365 | matched by the generated layout. LVS extracts a connectivity netlist from your physical layout by 366 | tracing wires to/from transistors and pins and then tries to match up transistors and nets between 367 | the netlist reported by the PAR tool and its layout-extracted netlist. DRC and LVS are run in our 368 | environment using an industry-standard tool, Mentor Graphics Calibre. This section is intended 369 | only as a brief introduction to these steps of the flow; you will not need to run them for your final 370 | project. 371 | 372 | To run DRC and view the results: 373 | 374 | ```shell 375 | make drc 376 | cd build/drc-rundir 377 | ./generated-scripts/view_drc 378 | ``` 379 | 380 | Your layout will open in Calibre DESIGNrev (or CalibreDRV for short), followed by a window listing 381 | the results. Together, they look like this (using the dual-SRAM design, sorting the violations from 382 | most common to least): 383 | 384 | 385 |

386 | 387 |

388 | 389 | 390 | We can see that our design is not clean. The rule-checking decks (Calibre script files) are incomplete 391 | for this PDK, so this is expected. The design rule manual (DRM) for this technology is extracted 392 | to your working directory under `build/tech-asap7-cache/extracted/ASAP7_PDKandLIB.tar/ASAP7_PDKandLIB_v1p5/asap7PDK_r1p5.tar.bz2/asap7PDK_r1p5/docs/asap7_drm.pdf`. 393 | 394 | In a design without SRAMs (i.e. not this lab or your project, but you can try it on previous labs), 395 | we can run LVS and view the results similarly as follows: 396 | 397 | 398 | ```shell 399 | make lvs 400 | cd build/lvs-rundir 401 | ./generated-scripts/view_lvs 402 | ``` 403 | 404 | Again, there are some issues with this PDK that preclude generating LVS-clean results. 405 | However, LVS is especially useful to run early on, after you get a first PAR database, to catch 406 | shorts and similar problems that tend to pop up when integrating hard macros like SRAMs. After fixing 407 | everything, you will see this at the end of a long design process, which is always super satisfying! 408 | 409 | 410 |

411 | 412 |

413 | 414 | --- 415 | ### Question 4: DRC and LVS 416 | a) Scroll to the bottom of the DRC result summary report in `build/drc-rundir/drc_results.rpt`. 417 | **For the cell `dot_product` (or whatever you named your single-SRAM vector dot product), how many total violation results do you have? How many rules did you violate?** 418 | Note: the result count is in the format `hierarchical_count` (`flat_count`); the two counts disagree if you have many 419 | instances of a submodule in the design. 420 | **Please report the hierarchical count.** 421 | 422 | b) Skim through Chapter 1.2 of the DRM (`build/tech-asap7-cache/extracted/ASAP7_PDKandLIB.tar/ASAP7_PDKandLIB_v1p5/asap7PDK_r1p5.tar.bz2/asap7PDK_r1p5/docs/asap7_drm.pdf`). 423 | **For the violated rule with the highest number of occurrences less than 1000, provide a brief description of what the rule requires based on the naming convention and descriptions in Table 1.2.1 of the DRM.** 424 | 425 | *c) (Ungraded thought experiment #3) If the DRC rule decks are perfect, the way you floorplan your design has a large impact on whether your design can be DRC clean. What things do you think can cause violations? What about other things that are constrained in PAR other than the floorplan?* 426 | 427 | *d) (Ungraded thought experiment #4) At first, it may seem odd that the netlist that the PAR tool thinks the layout corresponds to could be different from the netlist extracted from the actual layout. What reasons can you think of that could cause mismatches? Which of these causes might make the LVS tool slow down dramatically as it tries to extract/compare? Would you be able to catch any of these discrepancies if doing a post-PAR gate-level simulation in lieu of LVS, and why?* 428 | 429 | ### Checkoff 2: DRC and LVS Demo 430 | Show the DRC and LVS results, and explain the meaning of what you see.
431 | 432 | 433 | --- 434 | 435 | ## Lab Deliverables 436 | 437 | ### Lab Due: 11:59 PM, Friday April 1st, 2022 438 | 439 | - Submit a written report with all 3 questions (4 if doing extra credit) answered to Gradescope 440 | - Checkoff with an ASIC lab TA 441 | 442 | ## Acknowledgement 443 | 444 | This lab is the result of the work of many EECS151/251 GSIs over the years including: 445 | 446 | Written By: 447 | - Nathan Narevsky (2014, 2017) 448 | - Brian Zimmer (2014) 449 | 450 | Modified By: 451 | - John Wright (2015,2016) 452 | - Ali Moin (2018) 453 | - Arya Reais-Parsi (2019) 454 | - Cem Yalcin (2019) 455 | - Tan Nguyen (2020) 456 | - Harrison Liew (2020) 457 | - Sean Huang (2021) 458 | - Daniel Grubb, Nayiri Krzysztofowicz, Zhaokai Liu (2021) 459 | - Dima Nikiforov (2022) 460 | -------------------------------------------------------------------------------- /lab3/spec_sky130.md: -------------------------------------------------------------------------------- 1 | # EECS 151/251A ASIC Lab 3: Logic Synthesis 2 |

3 | Prof. Bora Nikolic 4 |

5 |

6 | TAs: Daniel Grubb, Nayiri Krzysztofowicz, Zhaokai Liu 7 |

8 |

9 | Department of Electrical Engineering and Computer Science 10 |

11 |

12 | College of Engineering, University of California, Berkeley 13 |

14 | 15 | 16 | 17 | ## Overview 18 | For this lab, you will learn how to translate RTL code into a gate-level netlist in a process called 19 | synthesis. In order to successfully synthesize your design, you will need to understand how to 20 | constrain your design, learn how the tools optimize logic and estimate timing, analyze the critical 21 | path of your design, and simulate the gate-level netlist. 22 | To begin this lab, get the project files by typing the following commands: 23 | 24 | ```shell 25 | git clone /home/ff/eecs151/labs/lab3.git 26 | cd lab3 27 | ``` 28 | 29 | You should add the following lines to the `.bashrc` file in your home folder 30 | (for more information about what `.bashrc` does, see https://www.tldp.org/LDP/abs/html/sample-bashrc.html) 31 | so that every time 32 | you open a new terminal the paths for the tools are set up properly. 33 | 34 | ```shell 35 | source /home/ff/eecs151/tutorials/eecs151.bashrc 36 | export HAMMER_HOME=/home/ff/eecs151/hammer 37 | source ${HAMMER_HOME}/sourceme.sh 38 | ``` 39 | 40 | Type 41 | 42 | ```shell 43 | which genus 44 | ``` 45 | 46 | to see if the shell prints out the path to the Cadence Genus Synthesis program (which we will be 47 | using for this lab). If it does not work, add the lines to your `.bash_profile` in your home folder 48 | as well. Then log in again or open a new terminal to see if it works. The file `eecs151.bashrc` sets various 49 | environment variables in your system such as where to find the CAD programs or license servers. 50 | 51 | 52 | ## Synthesis Environment 53 | To perform synthesis, we will be using Cadence Genus. However, we will not interface with 54 | Genus directly; instead, we will use HAMMER. Just like in lab 2, we have set up the basic HAMMER 55 | flow for your lab exercises using a Makefile. 56 | 57 | In this lab repository, you will see two sets of input files for HAMMER.
The first set of files is 58 | the source code for our design that you will explore in the next section. The second set of files are 59 | some YAML files (`inst-env.yml`, `sky130.yml`, `design-sky130.yml`, `sim-rtl.yml`, `sim-gl-syn.yml`) that 60 | configure the HAMMER flow. Of these YAML files, you should only need to modify `design.yml`, 61 | `sim-rtl.yml` and `sim-gl-syn.yml` in order to configure the synthesis and simulation for your 62 | design. 63 | 64 | 65 | HAMMER is already set up at `/home/ff/eecs151/hammer` with all the required plugins for Cadence 66 | Synthesis (Genus) and Place-and-Route (Innovus), Synopsys Simulator (VCS), Mentor Graphics 67 | DRC and LVS (Calibre). You should not need to install it in your own home directory. **These 68 | HAMMER plugins are under NDA. They are provided to us for educational purposes. 69 | They should never be copied outside of instructional machines under any circumstances or else we are at risk of losing access to the tools in the future!!!** 70 | 71 | Let us take a look at some parts of the `design.yml` file: 72 | 73 | ```yaml 74 | gcd.clockPeriod: &CLK_PERIOD "1ns" 75 | ``` 76 | 77 | This option sets the target clock speed for our design. A more stringent target (a shorter clock 78 | period) will make the tool work harder and use higher-power gates to meet the clock 79 | period. A more relaxed target (a longer clock period) lets the tool focus on reducing area and/or power. 80 | In `sim-rtl.yml`: 81 | 82 | ```yaml 83 | defines: 84 | - "CLOCK_PERIOD=1.00" 85 | ``` 86 | 87 | This option sets the clock period used during simulation. It is generally useful to separate the two as 88 | you might want to see how the circuit performs under different clock frequencies without changing 89 | the design constraints.
Continuing from `design.yml`: 90 | 91 | ```yaml 92 | gcd.verilogSrc: &VERILOG_SRC 93 | - "src/gcd.v" 94 | - "src/gcd_datapath.v" 95 | - "src/gcd_control.v" 96 | ``` 97 | 98 | and in `sim-rtl.yml`: 99 | 100 | ```yaml 101 | sim.inputs: 102 | input_files: 103 | - "src/gcd.v" 104 | - "src/gcd_datapath.v" 105 | - "src/gcd_control.v" 106 | - "src/gcd_testbench.v" 107 | ``` 108 | 109 | These specify the files for synthesis and simulation. Moving on, we have: 110 | 111 | ```yaml 112 | vlsi.inputs.clocks: [ 113 | {name: "clk", period: *CLK_PERIOD, uncertainty: "0.1ns"} 114 | ] 115 | ``` 116 | 117 | This is where we specify to HAMMER that we intend on using the `CLK_PERIOD` we defined earlier 118 | as the constraint for our design. We will see more detailed constraints in the later labs. 119 | 120 | ## Understanding the example design 121 | We have provided a circuit described in Verilog that computes the greatest common divisor (GCD) 122 | of two numbers. Unlike the FIR filter from the last lab where the testbench constantly provided 123 | stimuli, the GCD algorithm takes a variable number of cycles, so the testbench needs to know when 124 | the circuit is done to check the output. This is accomplished through a “ready/valid” handshake 125 | protocol. This protocol is very ubiquitous and a flavor of it will appear both in the class project 126 | and later on in other blocks you will encounter throughout your career. The block diagram is shown 127 | in the figure below. 128 | 129 |

130 | 131 |

132 | 133 | The GCD module declaration is as follows: 134 | 135 | ```v 136 | module gcd#( parameter W = 16 ) 137 | ( 138 | input clk, reset, 139 | input [W-1:0] operands_bits_A, // Operand A 140 | input [W-1:0] operands_bits_B, // Operand B 141 | input operands_val, // Are operands valid? 142 | output operands_rdy, // ready to take operands 143 | 144 | output [W-1:0] result_bits_data, // GCD 145 | output result_val, // Is the result valid? 146 | input result_rdy // ready to take the result 147 | ); 148 | ``` 149 | 150 | On the `operands` boundary, nothing will happen until GCD is ready to receive data (`operands_rdy`). 151 | When this happens, the testbench will place data on the operands (`operands_bits_A` and `operands_bits_B`), 152 | but GCD will not start until the testbench declares that these operands are valid (`operands_val`). 153 | Then GCD will start. 154 | 155 | The testbench needs to know that GCD is not done. This will be true as long as `result_val` is 0 156 | (the results are not valid). Also, even if GCD is finished, it will hold the result until the testbench is 157 | prepared to receive the data (`result_rdy`). The testbench will check the data when GCD declares 158 | the results are valid by setting `result_val` to 1. 159 | 160 | The main contract is that if one side of the interface declares it is ready and the other side declares valid in the same cycle, the 161 | information must be transferred. 162 | 163 | Open `src/gcd.v`. This is the top level of GCD and just instantiates `gcd_control` and `gcd_datapath`. 164 | Separating files into control and datapath is generally a good idea. Open `src/gcd_datapath.v`. 165 | This file stores the operands, and contains the logic necessary to implement the algorithm (subtraction and comparison). Open `src/gcd_control.v`. This file contains a state machine that handles 166 | the ready-valid interface and controls the mux selects in the datapath. Open `src/gcd_testbench.v`.
167 | This file sends different operands to GCD, and checks to see if the correct GCD was found. Make 168 | sure you understand how this file works. Note that the inputs are changed on the negative edge 169 | of the clock. This will prevent hold time violations for gate-level simulation, because once a clock 170 | tree has been added, the input flops will register data at a time later than the testbench’s rising 171 | edge of the clock. 172 | 173 | Now simulate the design by running `make sim-rtl`. The waveform is located under `build/sim-rundir/`. 174 | Open the waveform in DVE (you may need to scroll down in DVE to find the testbench) and try 175 | to understand how the code works by comparing the waveforms with the Verilog code. It might 176 | help to sketch out a state machine diagram and draw the datapath. 177 | 178 | --- 179 | 180 | ### Question 1: Understanding the algorithm 181 | 182 | By reading the provided Verilog code and/or viewing the RTL level simulations, demonstrate that 183 | you understand the provided code: 184 | 185 | **a.) Draw a table with 5 columns (cycle number, value of `A_reg`, value of `B_reg`, next value of `A_reg`, next value of `B_reg`) and fill in all of the rows for the first test vector (GCD of 27 and 15)** 186 | 187 | **b) In `src/gcd_testbench.v`, the inputs are changed on the negative edge of the clock to prevent hold time violations. Is the output checked on the positive edge of the clock or the negative edge of the clock? Why?** 188 | 189 | **c) In `src/gcd_testbench.v`, what will happen if you change `result_rdy = 1;` to `result_rdy = 0;`? What state will `gcd_control.v` state machine be in?** 190 | 191 | --- 192 | ### Question 2: Testbenches 193 | **a) Modify `src/gcd_testbench.v` so that intermediate steps are displayed in the format below. Include a copy of the code you wrote in your writeup (this should be approximately 3-4 lines).** 194 | 195 | ```shell 196 | 0: [ ...... 
] Test ( x ), [ x == x ] (decimal) 197 | 1: [ ...... ] Test ( x ), [ x == 0 ] (decimal) 198 | 2: [ ...... ] Test ( x ), [ x == 0 ] (decimal) 199 | 3: [ ...... ] Test ( x ), [ x == 0 ] (decimal) 200 | 4: [ ...... ] Test ( x ), [ x == 0 ] (decimal) 201 | 5: [ ...... ] Test ( x ), [ x == 0 ] (decimal) 202 | 6: [ ...... ] Test ( 0 ), [ 3 == 0 ] (decimal) 203 | 7: [ ...... ] Test ( 0 ), [ 3 == 0 ] (decimal) 204 | 8: [ ...... ] Test ( 0 ), [ 3 == 27 ] (decimal) 205 | 9: [ ...... ] Test ( 0 ), [ 3 == 12 ] (decimal) 206 | 10: [ ...... ] Test ( 0 ), [ 3 == 15 ] (decimal) 207 | 11: [ ...... ] Test ( 0 ), [ 3 == 3 ] (decimal) 208 | 12: [ ...... ] Test ( 0 ), [ 3 == 12 ] (decimal) 209 | 13: [ ...... ] Test ( 0 ), [ 3 == 9 ] (decimal) 210 | 14: [ ...... ] Test ( 0 ), [ 3 == 6 ] (decimal) 211 | 15: [ ...... ] Test ( 0 ), [ 3 == 3 ] (decimal) 212 | 16: [ ...... ] Test ( 0 ), [ 3 == 0 ] (decimal) 213 | 17: [ ...... ] Test ( 0 ), [ 3 == 3 ] (decimal) 214 | 18: [ passed ] Test ( 0 ), [ 3 == 3 ] (decimal) 215 | 19: [ ...... ] Test ( 1 ), [ 7 == 3 ] (decimal) 216 | ``` 217 | --- 218 | 219 | ## Synthesis 220 | Synthesis is the process of converting RTL Verilog files into technology (or platform, in the case of 221 | FPGAs) specific gate-level Verilog. These gates are different from the “and”, “or”, “xor” etc. primitives in Verilog. While the logic primitives correspond to gate-level operations, they do not have 222 | a physical representation outside of their symbol. A synthesized gate-level Verilog only contains 223 | cells with corresponding physical aspects: they have a transistor-level schematic with transistor 224 | sizes provided, a physical layout containing information necessary for fabrication, timing libraries 225 | providing performance specifications etc. Some synthesis tools also output assign statements that 226 | refer to pass-through interfaces, but no logic operation is performed in these assignments (not even 227 | simple inversion!). 
228 | 229 | 230 | Open the Makefile to see the available targets that you can run. You don’t have to know all of 231 | these for now. The Makefile provides shorthands to various HAMMER commands for synthesis, 232 | placement-and-routing, or simulation. Read [Hammer-Flow](https://hammer-vlsi.readthedocs.io/en/latest/Hammer-Flow/index.html) if you want to get more detail. 233 | 234 | To start the synthesis process of the GCD module you just analyzed, the first step is to make 235 | HAMMER generate the necessary supplement Makefile (`build/hammer.d`). To do so, type the 236 | following command in the lab directory: 237 | 238 | make buildfile 239 | 240 | This generates a file with make targets specific to the constraints we have provided inside the YAML 241 | files. If you have not run `make clean` after simulating, this file should already be generated. `make buildfile` 242 | also modifies a few files from the Sky130 PDK and stores them to your local workspace. 243 | The extracted PDK is not deleted when 244 | you do `make clean` to avoid unnecessarily rebuilding the PDK. To explicitly remove it, you need to 245 | remove the build folder (and you should do it once you finish the lab to save your allocated disk 246 | space since the PDK is huge). To synthesize the GCD, use the following command: 247 | 248 | make syn 249 | 250 | This runs through all the steps necessary to generate the gate-level Verilog. The final lines of output 251 | you will see is a list of all the registers in the design. There should be all the bits of `A_reg_reg`, 252 | `B_reg_reg` and state registers. 253 | 254 | By default, HAMMER puts the generated objects under the directory build. Go to `build/syn-rundir/reports`. 255 | There are five text files here that contain very useful information about 256 | the synthesized design that we just generated. Go through these files and familiarize yourself with 257 | these reports. One report of particular note is `final_time_ss_100C_1v60.setup_view.rpt`. 
The 258 | name of this file represents that it is a timing report, with the Process Voltage Temperature corner 259 | of 1.6 V and 100 degrees C, and that it contains the setup timing checks. Another important file 260 | is `build/syn-rundir/gcd.mapped.v`. This is your synthesized gate-level Verilog. Go through it 261 | to see what the RTL design has become to represent it in terms of technology-specific gates. Try 262 | to follow an input through these gates to see the path it takes until the output. While these files 263 | are rarely ever read by humans, you may sometimes find yourself going through these during the 264 | process of debugging. 265 | 266 | Now open the `final_time_ss_100C_1v60.setup_view.rpt` file and look at the first block of text 267 | you see. It should look similar to this: 268 | 269 | ```text 270 | Path 1: MET (212 ps) Setup Check with Pin GCDdpath0/A_reg_reg[15]/CLK->D 271 | View: ss_100C_1v60.setup_view 272 | Group: clk 273 | Startpoint: (R) GCDdpath0/A_reg_reg[1]/CLK 274 | Clock: (R) clk 275 | Endpoint: (F) GCDdpath0/A_reg_reg[15]/D 276 | Clock: (R) clk 277 | 278 | Capture Launch 279 | Clock Edge:+ 5000 0 280 | Src Latency:+ 0 0 281 | Net Latency:+ 0 (I) 0 (I) 282 | Arrival:= 5000 0 283 | 284 | Setup:- 293 285 | Uncertainty:- 500 286 | Required Time:= 4207 287 | Launch Clock:- 0 288 | Data Path:- 3995 289 | Slack:= 212 290 | 291 | #-------------------------------------------------------------------------------------------------------------------------- 292 | # Timing Point Flags Arc Edge Cell Fanout Load Trans Delay Arrival Instance 293 | # (fF) (ps) (ps) (ps) Location 294 | #-------------------------------------------------------------------------------------------------------------------------- 295 | GCDdpath0/A_reg_reg[1]/CLK - - R (arrival) 16 - 0 0 0 (-,-) 296 | GCDdpath0/A_reg_reg[1]/Q - CLK->Q F sky130_fd_sc_hd__dfrtp_1 2 8.4 128 756 756 (-,-) 297 | GCDdpath0/g815/Y - A->Y R sky130_fd_sc_hd__inv_2 2 11.1 99 135 891 (-,-) 298 | 
GCDdpath0/g812/Y - A->Y F sky130_fd_sc_hd__inv_2 2 5.5 37 75 966 (-,-) 299 | GCDdpath0/sub_45_24_g546__2346/Y - A_N->Y F sky130_fd_sc_hd__nand2b_1 2 6.4 145 322 1287 (-,-) 300 | GCDdpath0/sub_45_24_g482__9315/Y - A->Y R sky130_fd_sc_hd__nand2_1 1 5.8 122 155 1442 (-,-) 301 | GCDdpath0/sub_45_24_g480__6161/Y - A->Y F sky130_fd_sc_hd__nand2_2 3 11.5 120 151 1593 (-,-) 302 | GCDdpath0/sub_45_24_g468__3680/Y - A->Y R sky130_fd_sc_hd__nand3_1 1 3.7 115 136 1729 (-,-) 303 | GCDdpath0/sub_45_24_g467__6783/Y - A->Y F sky130_fd_sc_hd__nand2_1 4 14.4 250 253 1982 (-,-) 304 | GCDdpath0/sub_45_24_g465__8428/Y - A->Y R sky130_fd_sc_hd__nand2_1 2 7.5 145 218 2200 (-,-) 305 | GCDdpath0/sub_45_24_g464/Y - A->Y F sky130_fd_sc_hd__clkinv_1 1 3.6 78 137 2337 (-,-) 306 | GCDdpath0/sub_45_24_g459__5477/X - A1->X F sky130_fd_sc_hd__a21o_2 7 23.1 146 447 2784 (-,-) 307 | GCDdpath0/sub_45_24_g455__2346/Y - A->Y R sky130_fd_sc_hd__nand2_1 2 6.9 130 166 2950 (-,-) 308 | GCDdpath0/sub_45_24_g447__1881/Y - A2->Y F sky130_fd_sc_hd__o21ai_1 1 5.7 139 169 3119 (-,-) 309 | GCDdpath0/sub_45_24_g440__1617/Y - B->Y F sky130_fd_sc_hd__xnor2_1 1 3.6 111 244 3363 (-,-) 310 | GCDdpath0/g1627__5122/X - B1->X F sky130_fd_sc_hd__a22o_1 1 3.6 82 350 3714 (-,-) 311 | GCDdpath0/g1596__1666/X - B1->X F sky130_fd_sc_hd__a21o_1 1 3.1 64 282 3995 (-,-) 312 | GCDdpath0/A_reg_reg[15]/D - - F sky130_fd_sc_hd__dfrtp_1 1 - - 0 3995 (-,-) 313 | #-------------------------------------------------------------------------------------------------------------------------- 314 | 315 | ``` 316 | 317 | This is one of the most common ways to assess the critical paths in your circuit. 318 | The setup timing report lists each timing path's **slack**, which is the extra delay the signal can have before a setup 319 | violation occurs, in ascending order. So the first block indicates the critical path of the design. 
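The arithmetic behind the header of the report can be reproduced by hand. The sketch below is only an illustration, not something HAMMER or Genus provides: it plugs the picosecond values from the "Path 1" block above into the usual setup check (required time = capture edge − setup − uncertainty, slack = required time − data path delay). The variable names are made up for this example, and the last value assumes the path delay would stay the same if the design were re-synthesized at a different period, which is not guaranteed.

```shell
#!/usr/bin/env bash
# Values in picoseconds, copied from the "Path 1" block of the report above.
clock_edge=5000    # capture clock edge (the launch edge is at 0)
setup=293          # setup time of the endpoint flip-flop
uncertainty=500    # clock uncertainty
data_path=3995     # delay from the launching CLK pin to the endpoint D pin

# Required time: latest arrival at the endpoint that still meets setup.
required=$(( clock_edge - setup - uncertainty ))
# Slack: margin between the required time and the actual data arrival.
slack=$(( required - data_path ))
# Period that would leave exactly zero slack if no delays changed.
zero_slack_period=$(( clock_edge - slack ))

echo "required=${required}ps slack=${slack}ps zero_slack_period=${zero_slack_period}ps"
```

Running it prints `required=4207ps slack=212ps zero_slack_period=4788ps`, matching the `Required Time` and `Slack` rows of the report.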
320 | Each row represents a **timing arc** from one gate to the next, and the whole block is the timing 321 | path between two flip-flops (or in some cases between latches). The `MET` at the top of the block 322 | indicates that the timing requirements have been met and there is no violation. If there were a violation, this 323 | indicator would read `VIOLATED`. Since our critical path meets the timing requirements with 324 | 212 ps of slack, we can run this synthesized design with a period equal to the clock period 325 | (5000 ps) minus the critical path slack (212 ps), which is 4788 ps. 326 | 327 | --- 328 | ### Question 3: Reporting Questions 329 | **a) Which report would you look at to find the total number of each different standard cell that the design contains?** 330 | 331 | **b) Which report contains area breakdown by modules in the design?** 332 | 333 | **c) What is the cell used for `A_reg_reg[7]`? How much leakage power does this contribute? How did you find this?** 334 | 335 | --- 336 | 337 | ### Question 4: Synthesis Questions 338 | **a) Looking at the total number of sequential cells synthesized and the number of `reg` definitions in the Verilog files, are they consistent? If not, why?** 339 | 340 | **b) Modify the clock period in the `design.yml` file to make the design go faster. What is the highest clock frequency this design can operate at in this technology?** 341 | 342 | --- 343 | 344 | ### Synthesis: Step-by-step 345 | 346 | While for the remainder of the semester we will be roughly following the above section’s flow, it is 347 | useful as a digital IC design engineer to know what is going on during the process. In this section, 348 | we will look at the steps HAMMER takes to get from RTL Verilog to all the outputs we saw in the 349 | last section. 350 | 351 | First, type `make clean` to clean the environment of the previous build’s files. Then, use `make buildfile` 352 | to generate the supplementary Makefile as before. Now, we will modify the `make syn` command to
Now, we will modify the `make syn` command to 353 | only run the steps we want. Go through the following commands in the given order: 354 | 355 | make redo-syn HAMMER_EXTRA_ARGS="--stop_after_step init_environment" 356 | 357 | HAMMER flow will exit with an error. This is expected, as HAMMER looks for the final output 358 | files to gauge its success. We have not yet generated the gate-level Verilog, so we know beforehand 359 | that every step except the last one is going to end with an error. In this step, HAMMER invokes 360 | Genus to read the technology libraries and the RTL Verilog files, as well as the constraints we 361 | provided in the `design.yml` file. 362 | 363 | make redo-syn HAMMER_EXTRA_ARGS="--stop_after_step syn_generic" 364 | 365 | This step is the **generic synthesis** step. In this step, Genus converts our RTL Verilog files read 366 | in the previous step to an intermediate format, using technology-independent generic gates. These 367 | gates are purely for gate-level functional representation of the RTL we have coded, and are going 368 | to be used as an input to the next step. This step also performs logical optimizations on our design 369 | to eliminate any redundant/unused operations. 370 | 371 | make redo-syn HAMMER_EXTRA_ARGS="--stop_after_step syn_map" 372 | 373 | This step is the **mapping** step. Genus takes its own generic gate-level output and converts it to 374 | our Sky130-specific gates. This step further optimizes the design given the gates in our technology. 375 | That being said, this step can also increase the number of gates from the previous step as not 376 | all gates in the generic gate-level Verilog may be available for our use and they may need to be 377 | constructed using several, simpler gates. 378 | 379 | make redo-syn HAMMER_EXTRA_ARGS="--stop_after_step add_tieoffs" 380 | 381 | In some designs, the pins in certain cells are hardwired to 0 or 1. 
Since modern technology does 382 | not directly connect cells to Vdd or ground, the tie-off cells are added in this step. 383 | 384 | make redo-syn HAMMER_EXTRA_ARGS="--stop_after_step write_regs" 385 | 386 | This step is purely for the benefit of the designer. For some designs, we may need to have a list 387 | of all the registers in our design. In this lab, the list of regs is used in post-synthesis simulation to 388 | generate the `force_regs.ucli` file, which sets the initial states of the registers. 389 | 390 | make redo-syn HAMMER_EXTRA_ARGS="--stop_after_step generate_reports" 391 | 392 | The reports we have seen in the previous section are generated during this step. 393 | 394 | make redo-syn HAMMER_EXTRA_ARGS="--stop_after_step write_outputs" 395 | 396 | This step writes the outputs of the synthesis flow. This includes the gate-level `.v` file we looked at 397 | earlier in the lab. Other outputs include the design constraints (such as clock frequencies, output 398 | loads etc., in `.sdc` format) and delays between cells (in `.sdf` format). 399 | 400 | ## Post-Synthesis Simulation 401 | From the root folder, type the following command: 402 | 403 | make sim-gl-syn 404 | 405 | This will run a post-synthesis simulation using annotated delays from the `gcd.mapped.sdf` file. 406 | 407 | --- 408 | ### Question 5: Delay Questions 409 | **a) Check the waveforms in DVE. Submit a screenshot and report the clk-q delay of `state[0]` in `GCDctrl0` at 17.5 ns. Which line in the sdf file specifies this delay?** 410 | 411 | --- 412 | 413 | ## Build Your Divider 414 | Now that you understand how to use the tools to synthesize and simulate the GCD implementation, 415 | you will build a parameterized divider of unsigned integers in this section. Some initial code has 416 | been provided to you to get started.
To keep the control logic simple, the divider module uses input 417 | signal `start` to begin the computation at the next clock cycle, and asserts output signal `done` to 418 | HIGH when the division result is valid. The input `dividend` and `divisor` should be registered 419 | when `start` is HIGH. You are not required to handle corner cases such as dividing by 0. You are 420 | free to modify the skeleton code to adopt ready/valid instead, but it is not required. 421 | 422 | It is suggested that you implement the divide algorithm described [here](http://bwrcs.eecs.berkeley.edu/Classes/icdesign/ee141_s04/Project/Divider%20Background.pdf). Use the **Divide Algorithm Version 2** (slide 9). 423 | A simple testbench skeleton is also provided to you. You should change it to add more test vectors, 424 | or test your divider with different bitwidths. You need to change the file `sim-rtl.yml` to use your 425 | divider instead of the GCD module when testing. 426 | 427 | --- 428 | ### Question 6: HAMMER your divider 429 | **1. Push your 4-bit divider design through the tools, and determine its critical path, cell area, and maximum operating frequency from the reports. You might need to rerun synthesis multiple times to determine the maximum achievable frequency.** 430 | 431 | **2. Change the bitwidth of your divider to 32-bit, what is the critical path, area, and maximum operating frequency now?** 432 | 433 | **3. Submit your divider code and testbench to the report. 
Add comments to explain your testbench and why it provides sufficient coverage for your divider module.**

---
## Lab Deliverables

### Lab Due: 11:59 PM, Friday September 24th, 2021

- Submit a written report with all 6 questions answered to Gradescope
- Checkoff with an ASIC lab TA

## Acknowledgement

This lab is the result of the work of many EECS151/251 GSIs over the years including:
Written By:
- Nathan Narevsky (2014, 2017)
- Brian Zimmer (2014)
Modified By:
- John Wright (2015,2016)
- Ali Moin (2018)
- Arya Reais-Parsi (2019)
- Cem Yalcin (2019)
- Tan Nguyen (2020)
- Harrison Liew (2020)
- Sean Huang (2021)
- Daniel Grubb, Nayiri Krzysztofowicz, Zhaokai Liu (2021)

--------------------------------------------------------------------------------
/lab4/spec_sky130.md:
--------------------------------------------------------------------------------
# EECS 151/251A ASIC Lab 4: Floorplanning, Placement, Power, and CTS

Prof. Bora Nikolic

TAs: Daniel Grubb, Nayiri Krzysztofowicz, Zhaokai Liu

Department of Electrical Engineering and Computer Science

College of Engineering, University of California, Berkeley

## Overview
This lab consists of three parts. In the first part, you will write a GCD coprocessor that could be included alongside a general-purpose CPU (like your final project). You will then learn how the tools create a floorplan, route power straps, place standard cells, perform timing optimizations, and generate a clock tree for your design. Finally, you will get a slight head start on your project by writing part of the ALU.

To begin this lab, get the project files and set up your environment by typing the following command and sourcing the `eecs151.bashrc` file, as usual:

```shell
git clone /home/ff/eecs151/fa21/sky130/lab4-sky130.git
```

You should also clean up the build directory generated from the previous labs to save some disk space.

## Writing Your Coprocessor

Take a look at the `gcd_coprocessor.v` file in the src folder. You will see the following empty Verilog module.

```verilog
module gcd_coprocessor #( parameter W = 32 )(
  input clk,
  input reset,
  input operands_val,
  input [W-1:0] operands_bits_A,
  input [W-1:0] operands_bits_B,
  output operands_rdy,
  output result_val,
  output [W-1:0] result_bits,
  input result_rdy
);

// You should be able to build this with only structural verilog!
// Define wires
// Instantiate gcd_datapath
// Instantiate gcd_control
// Instantiate request FIFO
// Instantiate response FIFO

endmodule

```
First notice the parameter `W`. `W` is the data width of your coprocessor; the input data and output data will all be this bitwidth. Be sure to pass this parameter on to any submodules that may use it! You should implement a coprocessor that can handle 4 outstanding requests at a time. For now, you will use a FIFO (First-In, First-Out) block to store requests (operands) and responses (results).

A FIFO is a sequential logic element that accepts (enqueues) valid data and outputs (dequeues) it in the same order when the next block is ready to accept it. This is useful for buffering between the producer of data and its consumer. When the input data is valid (`enq_val`) and the FIFO is ready for data (`enq_rdy`), the input data is enqueued into the FIFO. There are similar signals for the output data. This interface is called a “decoupled” interface, and if implemented correctly it makes modular design easy (although sometimes with performance penalties).

This FIFO is implemented with a 2-dimensional array of data called `buffer`. There are two pointers: a read pointer `rptr` and a write pointer `wptr`. When data is enqueued, the write pointer is incremented. When data is dequeued, the read pointer is incremented. Because the FIFO depth is a power of 2, we can leverage the fact that addition rolls over and the FIFO will continue to work. However, once the read and write pointers are the same, we don’t know if the FIFO is full or empty. We resolve this by setting the `full` register when the pointers are the same and we just enqueued, and clearing the `full` register otherwise.

A partially written FIFO has been provided for you in `fifo.v`. Using the information above, complete the FIFO implementation so that it behaves as expected.


Then, finish the coprocessor implementation in `gcd_coprocessor.v`, so that the GCD unit and FIFOs are connected as in the following diagram. Note that the connection between the `gcd_datapath` and `gcd_control` should be very similar to that in Lab 3’s `gcd.v`, and that clock and reset are omitted from the diagram. You will need to think about how to manage a ready/valid decoupled interface with 2 FIFOs in parallel.


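The pointer and `full`-flag scheme described above can be tried out in software before writing any RTL. The following is a minimal behavioral sketch in Python, not the provided `fifo.v`: the class name `FifoModel`, the `step` method, and the `deq_val`/`deq_rdy` names are assumptions made for illustration, and the sketch assumes the `full` flag holds its value on cycles where nothing is enqueued or dequeued.

```python
class FifoModel:
    """Behavioral model of a power-of-2-depth FIFO (hypothetical helper, not fifo.v)."""

    def __init__(self, depth=4):
        # Power-of-2 depth lets the pointers wrap around naturally.
        assert depth > 0 and depth & (depth - 1) == 0, "depth must be a power of 2"
        self.depth = depth
        self.buffer = [0] * depth
        self.rptr = 0      # read pointer
        self.wptr = 0      # write pointer
        self.full = False  # disambiguates rptr == wptr (full vs. empty)

    @property
    def enq_rdy(self):
        # Ready to accept data whenever the FIFO is not full.
        return not self.full

    @property
    def deq_val(self):
        # Output is valid whenever the FIFO is not empty.
        return self.full or self.rptr != self.wptr

    def step(self, enq_val=False, enq_bits=0, deq_rdy=False):
        """Advance one clock cycle. Returns the dequeued value, or None."""
        do_enq = enq_val and self.enq_rdy
        do_deq = deq_rdy and self.deq_val
        out = self.buffer[self.rptr] if do_deq else None
        if do_enq:
            self.buffer[self.wptr] = enq_bits
            self.wptr = (self.wptr + 1) % self.depth
        if do_deq:
            self.rptr = (self.rptr + 1) % self.depth
        if do_enq and not do_deq:
            # Pointers meeting right after an enqueue means the FIFO is full.
            self.full = self.rptr == self.wptr
        elif do_deq and not do_enq:
            # Any dequeue frees a slot.
            self.full = False
        # Otherwise (idle, or enqueue and dequeue together) occupancy is unchanged.
        return out


if __name__ == "__main__":
    fifo = FifoModel(depth=4)
    for value in (10, 20, 30, 40):
        fifo.step(enq_val=True, enq_bits=value)
    print("full:", fifo.full)
    print("dequeued:", [fifo.step(deq_rdy=True) for _ in range(4)])
```

A model like this can also serve as a source of expected values when you extend the provided testbench.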


A testbench has been provided for you (`gcd_coprocessor_testbench.v`). You can run the testbench to test your code by typing `make sim-rtl` in the root directory as before.

---
### Question 1: Design

**a) Submit your code (`gcd_coprocessor.v` and `fifo.v`) and show that your code works (VCS output is fine).**

---

## Introducing Place and Route

In this lab, you will begin to implement your GCD coprocessor in physical layout, the next step toward making it a real integrated circuit. Place & Route (P&R or PAR) itself is a much longer process than synthesis, so for this lab we will look at the first few (and arguably most important) steps: floorplanning, placement, power straps, and clock tree synthesis (CTS). The rest will be introduced in the next lab.

### Setting up for P&R

We will first bring our design to the point where we stopped in Lab 3. Synthesize your design:


```shell
make syn
```

Before proceeding, make sure your design is working correctly. It should meet timing at the default 10 ns clock period in the setup corner with plenty of slack.

### Floorplanning & Placement
Floorplanning is the process of allocating area to the design as well as putting constraints on how this area is utilized. Floorplanning is often the most important factor in determining a physical circuit’s performance, because intelligent floorplanning can assist the tool in minimizing the delays in the design, especially if the total area is highly constrained.

Floorplan constraints can be “hard” or “soft”. “Hard” constraints generally involve pre-placement of “macros”, which can be anything from memory elements (SRAM arrays, in an upcoming lab) to analog black boxes (like PLLs or LDOs). “Soft” constraints are generally guided placements of hierarchical modules in the design (e.g.
the datapath, controller, and FIFOs in your coprocessor) toward certain regions of the floorplan. Generally, the P&R tool does a good job of placing hierarchical modules optimally, but sometimes a little human assistance is necessary to eke out the last bit of performance.

In this lab, we will just look at allocating a custom-sized area to our design, specified in the `design-sky130.yml` file. Open up this file and locate the following text block:

```yaml
# Placement Constraints
vlsi.inputs.placement_constraints:
  - path: "gcd_coprocessor"
    type: "toplevel"
    x: 0
    y: 0
    width: 150
    height: 150
    margins:
      left: 10
      right: 10
      top: 10
      bottom: 10
  - path: "gcd_coprocessor/GCDpath0"
    type: "placement"
    x: 50
    y: 50
    width: 50
    height: 50

# Pin placement constraints
vlsi.inputs.pin_mode: generated
vlsi.inputs.pin.generate_mode: semi_auto
vlsi.inputs.pin.assignments: [
  {pins: "*", layers: ["met2", "met4"], side: "bottom"}
]
```

The `vlsi.inputs.placement_constraints` block specifies two floorplan constraints. The first one denotes the origin `(x, y)`, size `(width, height)`, and border margins of the top-level block `gcd_coprocessor`. The second one denotes a soft placement constraint that puts the GCD datapath roughly in the center of the floorplan. For complicated designs, floorplans of major modules are often defined separately, and then assembled together hierarchically.

Pin constraints are also shown here. All that we need to see is that all pins are located at the bottom boundary of the design, on the metal 2 and metal 4 layers. Pin placement becomes very important in a hierarchical design if modules need to abut each other.

Placement is the process of placing the synthesized design (structural connection of standard cells) onto the specified floorplan.
While there is placement of minor cells (such as bulk connection cells, antenna-effect prevention cells, and I/O buffers) that takes place separately and in between various stages of design, “placement” usually refers to the initial placement of the standard cells.

After the cells are placed, they are not “locked”: they can be moved around by the tool during subsequent optimization steps. However, initial placement tries its best to place the cells optimally, obeying the floorplan constraints and using complex heuristics to minimize the parasitic delay caused by the connecting wires between cells and the timing skew between synchronous elements (e.g. flip-flops, memories). Poor placement (as well as a poor floorplan aspect ratio) can result in wire congestion later on in the design, which may prevent successful routing.

### Power


In the middle of the `sky130.yml` file, you will see this block, which contains parameters for HAMMER’s power strap auto-calculation API:

```yaml
# Power Straps
par.power_straps_mode: generate
par.generate_power_straps_method: by_tracks
par.blockage_spacing: 2.0
par.generate_power_straps_options:
  by_tracks:
    strap_layers:
      - met2
      - met3
      - met4
      - met5
    pin_layers:
      - met5
    track_width: 6
    track_width_met5: 2
    track_spacing: 1
    track_start: 10
    power_utilization: 0.25
    power_utilization_met5: 1
```

Power must be delivered to the cells from the topmost metal layers all the way down to the transistors, in a fashion that minimizes the overall resistance of the power wires without eating up all the resources that are needed for wiring the cells together. You will learn about power distribution briefly at the end of this course’s lectures, but the preferred method is to place interconnected grids of fat wires on every metal layer.
There are tools to check the quality of the power distribution network, which, like the post-P&R simulations you did in Lab 2, calculate how the current being drawn by the circuit is transiently distributed across the power grid.

You should not need to touch this block of YAML, because the parameters are tuned for meeting design rules in this technology. However, the important parameter is `power_utilization`, which specifies that approximately 25% of the available routing space on each metal layer should be reserved for power, with the exception of metal 5, which should have 100% coverage.

### Clock Tree Synthesis (CTS): Overview

Clock Tree Synthesis (CTS) is arguably the next most important step in P&R behind floorplanning. Recall that up until this point, we have not talked about the clock that triggers all the sequential logic in our design. This is because the clock signal is assumed to arrive at every sequential element in our design at the same time. The synthesis tool makes this assumption and so does the initial cell placement algorithm. In reality, the sequential elements have to be placed wherever makes the most sense (e.g. to minimize delays between them). As a result, there is a different amount of delay to every element from the top-level clock pin that must be “balanced” to maintain the timing results from synthesis. We shall now explore the steps the P&R tool takes to solve this problem and why it is called Clock Tree Synthesis.

### Pre-CTS Optimization

Pre-CTS optimization is the first round of Static Timing Analysis (STA) and optimization performed on the design. It is performed after the initial cell placement, and it has a large amount of freedom to move cells around so that your design meets setup checks. Hold errors are not checked during pre-CTS optimization.
Because we do not have a clock tree in place yet, we do not know when the clock will arrive at each sequential element, hence we don’t know if there are hold violations. The tool therefore assumes that every sequential element receives the clock ideally at the same time, and tries to balance out the delays in data paths to ensure no setup violations occur. In the end, it generates a timing report, very similar to the ones we saw in the last lab.

### Clock Tree Clustering and Balancing
Most of CTS is accomplished after initial optimization. The CTS algorithm first clusters groups of sequential elements together, mostly based on their position in the design relative to the top-level clock pin and common clock gating logic. The number of elements in each cluster is selected so that it does not present too large a load to a driving cell. These clusters of sequential elements are the “leaves” of the clock tree attached to branches.

Next, the CTS algorithm tries to ensure that the delay from the top-level clock pin to every leaf is the same. It accomplishes this by adding and sizing clock buffers between the top-level pin and the leaves. There may be multiple stages of clock buffering, depending on how physically large the design is. Each clock buffer that drives multiple loads is a branching point in the clock tree, and strings of clock buffers in a row are essentially the “trunks”. Finally, the top-level clock pin is considered the “root” of the clock tree.

The CTS algorithm may go through many iterations of clustering and balancing. It will try to minimize the depth of the tree (called *insertion delay*, i.e. the delay from the root to the leaves) while simultaneously minimizing the *skew* (difference in insertion delay) between each leaf in the tree. The deeper the tree, the harder it is to meet both setup and hold timing (*thought experiment #1*: why is this?).
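To make the terms *insertion delay*, *skew*, and slack concrete, here is a minimal Python sketch of the textbook setup and hold checks for a single flip-flop-to-flip-flop path. It is an illustration under stated assumptions, not what Innovus computes internally: the function names, the leaf names, and all of the numbers are hypothetical, insertion delay is taken to be the latest clock arrival among the leaves, and effects such as on-chip variation are ignored.

```python
def clock_tree_metrics(arrival_ps):
    """Summarize a clock tree from per-leaf clock arrival times (ps from the root)."""
    latest = max(arrival_ps.values())
    earliest = min(arrival_ps.values())
    # Assumption: report the worst-case (latest) arrival as the insertion delay.
    return {"insertion_delay_ps": latest, "skew_ps": latest - earliest}


def setup_slack_ps(period_ps, launch_clk_ps, capture_clk_ps,
                   clk_to_q_ps, logic_max_ps, t_setup_ps, uncertainty_ps=0):
    """Setup check: data must arrive before the next capture edge minus setup time."""
    required = period_ps + capture_clk_ps - t_setup_ps - uncertainty_ps
    arrival = launch_clk_ps + clk_to_q_ps + logic_max_ps
    return required - arrival


def hold_slack_ps(launch_clk_ps, capture_clk_ps,
                  clk_to_q_ps, logic_min_ps, t_hold_ps, uncertainty_ps=0):
    """Hold check: data must not change until hold time after the same capture edge."""
    arrival = launch_clk_ps + clk_to_q_ps + logic_min_ps
    required = capture_clk_ps + t_hold_ps + uncertainty_ps
    return arrival - required


if __name__ == "__main__":
    # Hypothetical clock arrival times at three flip-flops, in ps.
    leaves = {"ff_a": 300, "ff_b": 350, "ff_c": 320}
    print(clock_tree_metrics(leaves))
    # Path from ff_a (launch) to ff_b (capture) with a 10 ns clock period.
    print("setup slack:", setup_slack_ps(10000, 300, 350, 200, 9000, 100))
    print("hold slack:", hold_slack_ps(300, 350, 200, 50, 80))
```

Changing the launch and capture arrival times in this sketch, or adding a clock uncertainty, is a quick way to see how each check responds to the clock tree.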

### Post-CTS Optimization
Post-CTS optimization is then performed, where the clock is now a real signal that is being distributed unequally to different parts of the design. In this step, the tool fixes setup and hold time violations simultaneously. Oftentimes, fixing one error may introduce one or more new errors (*thought experiment #2*: why is this?), so this process is iterated until it converges (which may or may not meet your timing constraints!). Fixing these violations involves resizing, adding/deleting, and even moving the logic and clock cells.

After this stage of optimization, the clock tree and clock routing are fixed. In the next lab, you will finish the P&R flow, which finalizes the rest of the routing, but it is usually the case that if your design is unable to meet timing after CTS, there’s no point continuing!

## Compiling the Design with HAMMER

Now that we have gone over the flow (at least at a high level), it is time to actually perform these steps. Type the following commands to perform the operations described above:

```shell
make syn-to-par
make redo-par HAMMER_EXTRA_ARGS="--stop_after_step clock_tree"
```


The first command here translates the outputs of the synthesis tool to conform to the inputs expected by the P&R tool. The second command is similar to the partial synthesis commands we used in the last lab. It tells HAMMER to run the PAR flow until it finishes CTS, then stop. Under the hood, for this lab, HAMMER uses Cadence Innovus as the back-end tool to perform P&R. HAMMER waits until Innovus is done with the P&R steps through post-CTS optimization, then exits. You will see that HAMMER again gives you an error. Similar to the last lab, when HAMMER expected a synthesized output, this time HAMMER expects the full flow to be completed and gives an error whenever it can’t find some collateral expected of P&R.

Once done, look into the `build/par-rundir` folder. Similar to how all the synthesis files were placed under the `build/syn-rundir` folder in the previous lab, this folder holds all the P&R files. Go ahead and open the `par.tcl` file in a text editor. HAMMER generated this file for Innovus to consume in batch mode, and inside are Innovus Common UI commands as a TCL script.

While we will be looking through some of these commands in a bit, first take a look at `timingReports`. You should only see the pre-CTS timing reports. `gcd_coprocessor_preCTS_all.tarpt.gz` contains the report in a gzipped archive. The remaining files also contain useful information regarding capacitances, wire lengths, etc. You may view these directly using Vim, unzip them using `gzip`, or navigate through them with Caja, the file browser.

Going back a level, in `par-rundir`, the folder `hammer_cts_debug` has the post-CTS timing reports. The two important archives are `hammer_cts_all.tarpt.gz` and `hammer_cts_all_hold.tarpt.gz`. These contain the setup and hold timing analysis results after post-CTS optimization. Look into the hold report (you may actually see some violations!). However, any violation should be small (<1 ps), and because we have a lot of margin during design (namely, the `design.yml` file has “clock uncertainty” set to 100 ps), these small violations are not of concern, but they should still be investigated in a real design.

## Visualizing the Results

From the `build/par-rundir` folder, execute the following in a terminal with graphics (X2Go highly recommended for low latency):

```shell
./generated-scripts/open_chip
```
The Innovus GUI will pop up with your layout and your terminal is now the Innovus shell. After the window opens, click anywhere inside the black window at the center of the GUI and press “F” to zoom-to-fit.
You should see your entire design, which should look roughly similar to the one below once you disable the via4 and met5 layers (recall that the power straps here were set to 100% coverage) by unchecking their respective boxes under the “V” column in the right panel:





Take a moment to familiarize yourself with the Innovus GUI. You should also toggle between the floorplan, amoeba, and placement views using the view buttons in the top right corner of the screen, and examine how the actual placement of the GCD datapath in amoeba view doesn’t follow our soft placement guidance in floorplan view. This is because our soft placement guidance clearly places the datapath farther away from the pins and would result in a worse clock tree!

Now, let’s take a look at the clock tree in a couple of different ways. In the right panel, under the “Net” category, hide from view all the types of nets except “Clock”. Your design should now look approximately like this, which shows the clock tree routing:





We can also see the clock tree in its “tree” form by going to the menu Clock → CCOpt Clock Tree Debugger and pressing OK in the popup dialog. A window should pop up looking approximately like this:





The red dots are the “leaves”, the green triangles are the clock buffers, the blue dots are clock gates (they are used to save power), and the green pin on top is the clock pin or the clock “root”. The numbers on the left side denote the insertion delay in ps.

Now, let’s visualize our critical path. Go to the menu Timing → Debug Timing and press OK in the popup dialog. A window will pop up that looks approximately like this:




Examine the histogram. This shows the number of paths for every amount of slack (on the x-axis), and you always want to see a green histogram! The shape of the histogram is a good indicator of how good your design is and how hard the tool is working to meet your timing constraints (*thought experiment #3:* how so, and what would be the ideal histogram shape?).

Now right-click on Path 1 in this window (the critical path), select Show Timing Analyzer and Highlight Path, and select a color. A window will pop up, which is a graphical representation of the timing reports you saw in the `hammer_cts_debug` folder. Poke around the tabs to see all the different representations of this critical path. Back in the main Innovus window, the critical path will be highlighted, showing the chain of cells along the path and the approximate routing it takes to get there, which may look something like this (with all Layers disabled):




---

### Question 2: Interpreting P&R Timing Reports
**a) What is the critical path of your design pre- and post-CTS? Is it the same as the post-synthesis critical path?**

**b) Look in the post-CTS text timing report (`hammer_cts_debug/hammer_cts.all.tarpt`). Find a path inside which the same cell is used more than once. Identify the delay of those instances of that common cell. Can you explain why they are different?**

**c) What is the skew between the clock that arrives at the flip-flops at the beginning and end of the post-CTS critical path? Does this skew help or hurt the timing margin calculation?**

d) (UNGRADED thought experiment #1) Why is it harder to meet both setup and hold timing constraints if the clock tree has large insertion delay?

e) (UNGRADED thought experiment #2) Why does fixing one setup or hold error introduce one or more new errors? Is it more likely to produce an error of the same type or of a different type, and why?

f) (UNGRADED thought experiment #3) P&R tools have a goal to minimize power while ensuring that all paths have >0 ps of slack. What might a timing path histogram look like in a design that has maximized the frequency it can run at while meeting this goal? Given the histogram obtained here, does it look like we can increase our performance? What might we need to improve/change?

---

When you are done, you may exit Innovus by closing the GUI window.
## Under the Hood: Innovus
While HAMMER hides a lot of the tool-based commands from the end user, most IC companies interface with Innovus directly, and it is useful to know what tool-specific commands you are running in case you need to debug your circuit step-by-step. Therefore, we will now look into `par.tcl` and follow along using Innovus.
Make sure you are in the directory `build/par-rundir` and type:

```shell
innovus -common_ui
```
Now, follow `par.tcl` command-by-command, copying and pasting the commands to the Innovus shell and looking at the GUI for any changes. You may skip the `puts` commands, as they just tell the tool to print out what it's doing, and the `write_db` commands, which write a checkpoint database between each step of the P&R flow. The steps after which you will see significant changes are listed below. As you progress through the steps, feel free to zoom in to investigate what is going on with the design, look at the extra TCL files that are sourced, and cross-reference the commands with the command reference manual at `/home/ff/eecs151/labs/manuals/TCRcom.pdf`.

1. After the command sourcing `floorplan.tcl`
2. After the command sourcing `power_straps.tcl`
3. After the command `edit_pin`
4. After the command `place_opt_design`

After the `ccopt_design` command is run, you may see a bunch of white X markers on your design. These are Design Rule Violations (DRVs), indicating that Innovus didn’t quite comply with the technology’s requirements. Ignore these for the purposes of this lab.

---

### Question 3: Understanding P&R Steps

**a) Submit a snapshot of your design for each of the four steps described above (use whichever Innovus view you deem most appropriate to show the changes of each step). Make sure the via4 and met5 layers are not visible, and your design is zoomed-to-fit. Describe how the design layout changes for each major step in the respective figure captions.**

**b) Examine the power straps on met1, in relation to the cells. You will need to zoom in far enough to see the net label on the straps.
What does their pattern tell you about how digital standard cells are constructed?**

**c) Take note of the orientations of the power straps and routing metals. If you were to place pins on the right side of this block instead of the bottom, what metal layers could they be on?**

---

Now zoom in to one of the cells and click the box next to “Cell” on the right panel of the GUI. This will show you the internal routing of the standard cells. While this is off by default, it may prove useful when investigating DRVs in a design. You can now exit the application by closing the GUI window.

## Project Preparation
---
### Question 4: ALU
In this question, you will be designing and testing an ALU for use later in the semester. A header file containing define statements for operations (`ALUop.vh`) is provided inside the `src` directory of this lab. This file has already been included in an ALU template given to you in the same folder (`ALU.v`), but you may need to modify the include statement to match the correct path of the header file. Compare the ALUop input of your ALU against the define statements inside the header file to select the function the ALU performs.
The definitions of the functions are given below:

| Op Code | Definition                                                |
|:-------:|:---------------------------------------------------------:|
| ADD     | Add A and B                                               |
| SUB     | Subtract B from A                                         |
| AND     | Bitwise `and` A and B                                     |
| OR      | Bitwise `or` A and B                                      |
| XOR     | Bitwise `xor` A and B                                     |
| SLT     | Perform a signed comparison, Out = 1 if A < B             |
| SLTU    | Perform an unsigned comparison, Out = 1 if A < B          |
| SLL     | Logical shift left A by an amount indicated by B[4:0]     |
| SRA     | Arithmetic shift right A by an amount indicated by B[4:0] |
| SRL     | Logical shift right A by an amount indicated by B[4:0]    |
| COPY_B  | Output is equal to B                                      |
| XXX     | Output is 0                                               |

Given these definitions, complete `ALU.v` and write a testbench `tb_ALU.v` that checks all these operations with random inputs at least 100 times per operation and outputs a PASS/FAIL indicator. For this lab, we will only check for effort and not correctness, but you will need it to work later!

---

## Lab Deliverables

### Lab Due: 11:59 PM, Friday October 1st, 2021

- Submit a written report with all 4 questions answered to Gradescope
- Checkoff with an ASIC lab TA

## Acknowledgement

This lab is the result of the work of many EECS151/251 GSIs over the years including:

Written By:
- Nathan Narevsky (2014, 2017)
- Brian Zimmer (2014)

Modified By:
- John Wright (2015,2016)
- Ali Moin (2018)
- Arya Reais-Parsi (2019)
- Cem Yalcin (2019)
- Tan Nguyen (2020)
- Harrison Liew (2020)
- Sean Huang (2021)
- Daniel Grubb, Nayiri Krzysztofowicz, Zhaokai Liu (2021)

--------------------------------------------------------------------------------