├── README.md
├── LICENSE
└── notes.md

/README.md:
--------------------------------------------------------------------------------
1 | # Next-Generation FPGA Place-and-Route
2 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright (c) 2018, Robert Ou
2 | All rights reserved.
3 |
4 | Redistribution and use in source and binary forms, with or without
5 | modification, are permitted provided that the following conditions are met:
6 |
7 | 1. Redistributions of source code must retain the above copyright notice,
8 | this list of conditions and the following disclaimer.
9 | 2. Redistributions in binary form must reproduce the above copyright notice,
10 | this list of conditions and the following disclaimer in the documentation
11 | and/or other materials provided with the distribution.
12 |
13 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
14 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
15 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
16 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
17 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
18 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
19 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
20 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
21 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
22 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
23 |
--------------------------------------------------------------------------------
/notes.md:
--------------------------------------------------------------------------------
1 | The overall approach will probably use a hybrid of the [RippleFPGA](https://pdfs.semanticscholar.org/c1c9/43c047e9d834c8ac487ec6d5292485546744.pdf)
2 | and [UTPlaceF](http://wuxili.net/pdf/TCAD17_UTPlaceF_Li.pdf) algorithms for combined packing and placing. Early experiments
3 | with VPR demonstrated to me that an artificial barrier between these stages is not helpful. Both algorithms use
4 | quadratic programming and are fairly similar to each other.
5 |
6 | For now, routing will just use whatever algorithm is "typical" (Pathfinder?). I may end up with egg on my face, but it
7 | seems that this is "just" an informed search problem. Some ideas that _may_ end up enhancing this stage:
8 | * A* or other heuristics (XXX can we use inadmissible heuristics?)
9 | * Beam search
10 |
11 | Miscellaneous papers on how we "got to" analytic placers from SA placers:
12 | * http://janders.eecg.toronto.edu/1387/readings/marcelfpl12.pdf
13 | * http://www.ece.umich.edu/cse/awards/pdfs/iccad10-simpl.pdf
14 |
15 | Proposed overall flow for KinglerPAR:
16 | * Read yosys netlist
17 | * Cell munging (general purpose and arch-specific)
18 | * Types: IO, LUT, FF, RAM/DSP/large-in-fabric-movable-blocks, fixed specials (blocks like PLLs, flash, ADC), movable specials?
19 | * Arch-specific logic propagates around some constraints now (e.g. for special blocks or for promoting globals (XXX should this be generic?))
20 | * XXX should we have some logic for handling clock domains? Is this ever useful?
21 | * Arch-specific greedy packing (e.g. for RAM/DSP/special)
22 | * Arch-specific greedy placement (e.g. for specials)
23 | * Build HeAP-style superblocks for carry chains
24 | * Warn if carry chains are huge?
25 | * XXX can we auto-split them? This is hard; involves injecting feedthrough cells, which affects legalization
26 | * XXX UNTESTED IDEA: Can we twiddle net weights to encourage packing of carry chains and BLEs and other similar inside-CLB features? Do we need to?
27 | * XXX can QP work with _zero_ constraints? If not, assign non-user-constrained IOs uniformly so as to avoid getting a giant blob of 100% overlapped cells (XXX what about clocks and stuff?).
28 | * Distribute all cells uniformly in fabric (required to seed B2B).
29 | * Run some iterations of HPWL-driven QP (3 iterations? just to get something that vaguely looks right. We need this for the next step)
30 | * Optionally run RippleFPGA partitioning (e.g. this doesn't work on MAX V because it's way too small. It's unclear for iCE40 which doesn't have as much directional routing bias.)
31 | * Run HPWL-driven QP to a convergence criterion.
32 | * QP legalization will probably be UTPlaceF/POLAR-like for the spreading phase
33 | * Arch-specific hook point (e.g. for clocks). This hook point has enough information to e.g. assign quadrants but is also early enough that there is room to "recover from" constraints such as quadrants.
34 | * Run some iterations of congestion-driven QP.
35 | * Fully legalize RAM/DSP now, but don't lock it in yet (UTPlaceF)
36 | * Will probably use RippleFPGA-style area inflating and congestion estimation. Definitely won't import NTUPlace.
37 | * Finish congestion-driven global placement (skip RippleFPGA soft BLE packing -- intuition tells me this probably doesn't do much)
38 | * CLB packing with full legalization (XXX if it fails do we run QP again?)
39 | * Arch-specific hook point (e.g. for clocks). This is an "extra" hook point that I'm not currently sure how to use. It is after CLB packing, so that might be useful.
40 | * Save a design checkpoint now; possibly rewrite data structures ((somewhat) packed netlist)
41 | * Detail placement (RippleFPGA-style with alignment)
42 | * Final slot assignment
43 | * Save a design checkpoint now; possibly rewrite data structures (packed+placed netlist)
44 | * Arch-specific hook point (e.g. for clocks). This is intended for code to actually implement clock routing.
45 | * XXX research how to implement routing algorithms
46 | * It seems we should be able to reenter pack+place and adjust cell density if we save enough state
47 | * Save a design checkpoint now; possibly rewrite data structures (fully-PARed result)
48 | * XXX we probably want a way to back-annotate this
49 | * Convert to arch-specific data structures and write bitstream
50 |
51 | Need to further research (directly-relevant algorithm details):
52 | * What does the RippleFPGA deferred slot assignment actually gain us?
53 | * Simplicity, but it seems we need to do per-arch manual simplifying of legalization rules to take advantage of this
54 | * Can MAX V or iCE40 or other "simple LUT4" FPGAs skip complexity in CLB packing?
55 | * See above
56 | * How do UTPlaceF/RippleFPGA CLB packing differ?
57 |
58 | Need to further research (FPGA architectures):
59 | * Investigate global net structures in real FPGAs
60 | * How will other "special" blocks affect our PAR algorithm? (e.g. PLLs, ADCs, user flash, etc.)
61 | * Does any FPGA have "fracturable" RAM/DSP blocks? (Answer: YES) What to do about those? (Probably just treat them similarly and apply legalization etc. except with fixing their locations earlier)
62 |
63 | Need to further research (general abstract CS topics):
64 | * Efficient cache-optimized data structures, especially for multithreading
65 | * General research on "modern" approaches to multithreaded algorithms
66 | * Brief research on numerical stability (can we just use integers/fixed?)
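
Side note on the "Distribute all cells uniformly in fabric (required to seed B2B)" step of the flow: in the bound-to-bound (B2B) net model, each weight is inversely proportional to the current pin distance, so a fully overlapped start gives undefined weights. The sketch below is a minimal one-dimensional illustration and not KinglerPAR code. The function names, the `eps` clamp and the use of Python are all assumptions made for the example. It uses the common B2B weighting, where a boundary pin of a P-pin net connects to every other pin with weight 2 / ((P - 1) * distance), so that half the weighted sum of squared distances reproduces the net's HPWL at the current positions.

```python
from typing import Dict, List, Tuple


def hpwl_1d(coords: List[float]) -> float:
    """Half-perimeter wirelength of one net along a single axis."""
    return max(coords) - min(coords)


def hpwl(pins: List[Tuple[float, float]]) -> float:
    """HPWL of one net: bounding-box width plus bounding-box height."""
    return hpwl_1d([x for x, _ in pins]) + hpwl_1d([y for _, y in pins])


def b2b_weights_1d(coords: List[float], eps: float = 1e-6) -> Dict[Tuple[int, int], float]:
    """Bound-to-bound weights for one net along one axis.

    Returns {(i, j): weight} with i < j. `eps` (an assumption of this
    sketch) clamps the distance so overlapping pins do not divide by zero.
    """
    p = len(coords)
    if p < 2:
        return {}
    lo = min(range(p), key=lambda i: coords[i])
    hi = max(range(p), key=lambda i: coords[i])
    if lo == hi:
        # All pins overlap: pick any second pin as the other "boundary".
        hi = (lo + 1) % p

    weights: Dict[Tuple[int, int], float] = {}

    def connect(i: int, j: int) -> None:
        dist = max(abs(coords[i] - coords[j]), eps)
        weights[(min(i, j), max(i, j))] = 2.0 / ((p - 1) * dist)

    connect(lo, hi)
    for k in range(p):
        if k != lo and k != hi:
            # Inner pins connect to both boundary pins, not to each other.
            connect(k, lo)
            connect(k, hi)
    return weights


def quadratic_cost_1d(coords: List[float], weights: Dict[Tuple[int, int], float]) -> float:
    """Quadratic objective 0.5 * sum(w_ij * (x_i - x_j)^2) for one net."""
    return 0.5 * sum(w * (coords[i] - coords[j]) ** 2 for (i, j), w in weights.items())
```

For a net with pins at 0, 3, 1 and 7, `b2b_weights_1d` produces five connections, and `quadratic_cost_1d` evaluated at those same positions matches `hpwl_1d`. When every pin sits at the same coordinate, all distances hit the `eps` clamp and the weights carry no information, which is why some initial spreading is needed.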
67 |
68 | Features for minimum viable product:
69 | * Quadratic programming core engine, HPWL only
70 | * CLB packing
71 | * Detail placement
72 | * Final assignment
73 | * MVP will target iCE40 + MAX V
74 |
75 | Post-MVP top priorities:
76 | * Carry chains (can probably ignore for MVP as long as code gets plumbed properly for them)
77 | * RAM/DSP (iCE40 has, MAX V doesn't)
78 | * LUT6/ALM (question: can we understand X-ray by then? If not then we can consider working on S6 or Cyc10GX)
79 | * Congestion-driven placement (does this need to be higher priority?)
80 | * Partitioning
81 |
82 | XXX additional notes:
83 | * Need to ensure we don't pessimize LUT6/ALM architectures (everything I know well are "simple" LUT4)
84 | * VPR XML packing descriptions for AAPack are ridiculously overengineered -- KinglerPAR will just use code instead of declarative data
85 | * How to make this code sufficiently reusable? MAX V is probably simpler than iCE40 is simpler than ECP5 is simpler than 7
86 | * G-cell congestion estimates seem to be designed around "Xilinx-style" big switch box architectures, what happens on Altera-style architectures?
87 | * Can we "dump" data back into VPR for routing initially?
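
Side note on the "Quadratic programming core engine" MVP item and the earlier "can QP work with _zero_ constraints?" question: the sketch below is a toy one-dimensional quadratic placer, assuming two-pin nets with fixed weights and plain Gauss-Seidel sweeps. None of it comes from KinglerPAR. The name `solve_qp_1d`, its parameters and the cell names are hypothetical, and a real engine would more likely hand a sparse linear system to something like conjugate gradient. It only shows the core idea: each movable cell's optimal coordinate is the weighted average of its neighbours, and fixed cells (e.g. IOs) are what pull the solution apart.

```python
from typing import Dict, List, Tuple

# A two-pin connection: (cell_a, cell_b, weight). Hypothetical type for this sketch.
Net = Tuple[str, str, float]


def solve_qp_1d(
    nets: List[Net],
    fixed: Dict[str, float],
    movable: List[str],
    iterations: int = 200,
) -> Dict[str, float]:
    """Minimise sum(w * (x_a - x_b)^2) over the movable cells along one axis.

    Uses Gauss-Seidel sweeps: setting the derivative for one cell to zero
    gives the weighted average of its neighbours' current positions.
    """
    pos: Dict[str, float] = dict(fixed)
    for name in movable:
        pos[name] = 0.0  # every movable cell starts at the same point

    # Adjacency list restricted to movable cells.
    neighbours: Dict[str, List[Tuple[str, float]]] = {name: [] for name in movable}
    for a, b, w in nets:
        if a in neighbours:
            neighbours[a].append((b, w))
        if b in neighbours:
            neighbours[b].append((a, w))

    for _ in range(iterations):
        for name in movable:
            total_w = sum(w for _, w in neighbours[name])
            if total_w == 0.0:
                continue  # unconnected cell: leave it where it is
            pos[name] = sum(w * pos[other] for other, w in neighbours[name]) / total_w

    return {name: pos[name] for name in movable}
```

With pads fixed at 0 and 3 and a unit-weight chain `pad_l - a - b - pad_r`, the sweeps converge to `a = 1`, `b = 2`. With no fixed cells at all, every movable cell stays at the shared starting coordinate, which is the "giant blob of 100% overlapped cells" case the flow tries to avoid by pre-assigning IOs.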
88 |
89 | Architectures:
90 | * MAX V -- "Altera-style", LUT4
91 | * iCE40 -- "Altera-style", LUT4, RAM
92 | * 7 -- "Xilinx-style", LUT6/ALM, RAM+DSP
93 | * ECP5 -- "Xilinx-style", LUT4, RAM+DSP
94 | * MAX10 (future) -- "Altera-style", LUT4, RAM+mult
95 | * Cyc10LP (future) -- "Altera-style", LUT4, RAM+mult
96 | * Cyc10GX (future) -- "Altera-style", LUT6/ALM, RAM+DSP
97 |
98 | Unsorted references:
99 | * http://www.cse.cuhk.edu.hk/~byu/papers/C54-ICCAD2016-RippleFPGA-slides.pdf
100 | * https://chengengjie.github.io/papers/C2-ICCAD16-RippleFPGA.pdf
101 | * https://pdfs.semanticscholar.org/c1c9/43c047e9d834c8ac487ec6d5292485546744.pdf
102 | * http://wuxili.net/pdf/TCAD17_UTPlaceF_Li.pdf
103 | * http://sci-hub.tw/https://ieeexplore.ieee.org/document/6691143/
104 | * http://sci-hub.tw/https://ieeexplore.ieee.org/document/1560039/
105 | * http://www.cse.cuhk.edu.hk/~fyyoung/paper/iccad11_ripple.pdf
106 | * http://appsrv.cse.cuhk.edu.hk/~jkuang/pdf/ripple2.pdf
107 | * http://janders.eecg.toronto.edu/1387/readings/marcelfpl12.pdf
108 | * http://www.ece.umich.edu/cse/awards/pdfs/iccad10-simpl.pdf
109 | * https://atrium.lib.uoguelph.ca/xmlui/bitstream/handle/10214/12985/Abuowaimer_Ziad_201805_PhD.pdf?sequence=5&isAllowed=y
110 |
--------------------------------------------------------------------------------