├── README.md
├── LICENSE
└── notes.md


/README.md:
--------------------------------------------------------------------------------
1 | # Next-Generation FPGA Place-and-Route
2 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | Copyright (c) 2018, Robert Ou <rqou@robertou.com>
 2 | All rights reserved.
 3 | 
 4 | Redistribution and use in source and binary forms, with or without
 5 | modification, are permitted provided that the following conditions are met:
 6 | 
 7 | 1. Redistributions of source code must retain the above copyright notice,
 8 |    this list of conditions and the following disclaimer.
 9 | 2. Redistributions in binary form must reproduce the above copyright notice,
10 |    this list of conditions and the following disclaimer in the documentation
11 |    and/or other materials provided with the distribution.
12 | 
13 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
14 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
15 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
16 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
17 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
18 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
19 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
20 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
21 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
22 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
23 | 


--------------------------------------------------------------------------------
/notes.md:
--------------------------------------------------------------------------------
  1 | The overall approach will probably use a hybrid of the [RippleFPGA](https://pdfs.semanticscholar.org/c1c9/43c047e9d834c8ac487ec6d5292485546744.pdf)
  2 | and [UTPlaceF](http://wuxili.net/pdf/TCAD17_UTPlaceF_Li.pdf) algorithms for combined packing and placing. Early experiments
  3 | with VPR demonstrated to me that an artificial barrier between these stages is not helpful. Both algorithms use
  4 | quadratic programming and are fairly similar to each other.
  5 | 
  6 | For now, routing will just use whatever algorithm is "typical" (Pathfinder?). I may end up with egg on my face, but it
  7 | seems that this is "just" an informed search problem. Some ideas that _may_ end up enhancing this stage:
  8 | * A* or other heuristics (XXX can we use inadmissible heuristics?)
  9 | * Beam search
 10 | 
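As a rough illustration of the "informed search" framing above, here is a minimal weighted A* sketch in Python. It is not Pathfinder and not a proposed KinglerPAR API: the function names, the dict-based graph, and the toy grid are all assumptions made for the example. Setting `weight` above 1 makes the heuristic inadmissible; the usual result for weighted A* is that, with an admissible base heuristic, the returned cost stays within a factor of `weight` of optimal.

```python
import heapq


def weighted_astar(graph, start, goal, heuristic, weight=1.0):
    """Return (cost, path) from start to goal, or None if unreachable.

    graph: dict mapping node -> list of (neighbor, edge_cost) pairs (hypothetical layout).
    heuristic: function node -> estimated remaining cost.
    weight: multiplier on the heuristic; values > 1 make it inadmissible.
    """
    best = {start: 0.0}          # cheapest known cost to reach each node
    parent = {start: None}       # back-pointers for path reconstruction
    frontier = [(weight * heuristic(start), start)]
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return best[goal], path[::-1]
        for neighbor, edge_cost in graph.get(node, ()):
            cost = best[node] + edge_cost
            if cost < best.get(neighbor, float("inf")):
                best[neighbor] = cost
                parent[neighbor] = node
                priority = cost + weight * heuristic(neighbor)
                heapq.heappush(frontier, (priority, neighbor))
    return None


def grid_graph(width, height):
    """Toy stand-in for a routing graph: a 4-connected grid with unit edge costs."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    graph = {}
    for x in range(width):
        for y in range(height):
            graph[(x, y)] = [
                ((x + dx, y + dy), 1.0)
                for dx, dy in steps
                if 0 <= x + dx < width and 0 <= y + dy < height
            ]
    return graph


goal = (3, 3)
manhattan = lambda n: abs(n[0] - goal[0]) + abs(n[1] - goal[1])
graph = grid_graph(4, 4)
exact = weighted_astar(graph, (0, 0), goal, manhattan, weight=1.0)
greedy = weighted_astar(graph, (0, 0), goal, manhattan, weight=2.0)
```

A real router would also need congestion-aware edge costs and rip-up/reroute, which this sketch leaves out entirely.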
 11 | Miscellaneous papers on how we "got to" analytic placers from SA placers:
 12 | * http://janders.eecg.toronto.edu/1387/readings/marcelfpl12.pdf
 13 | * http://www.ece.umich.edu/cse/awards/pdfs/iccad10-simpl.pdf
 14 | 
 15 | Proposed overall flow for KinglerPAR:
 16 | * Read yosys netlist
 17 | * Cell munging (general purpose and arch-specific)
 18 |     * Types: IO, LUT, FF, RAM/DSP/large-in-fabric-movable-blocks, fixed specials (blocks like PLLs, flash, ADC), movable specials?
 19 | * Arch-specific logic propagates around some constraints now (e.g. for special blocks or for promoting globals (XXX should this be generic?))
 20 |     * XXX should we have some logic for handling clock domains? Is this ever useful?
 21 | * Arch-specific greedy packing (e.g. for RAM/DSP/special)
 22 | * Arch-specific greedy placement (e.g. for specials)
 23 | * Build HeAP-style superblocks for carry chains
 24 |     * Warn if carry chains are huge?
 25 |     * XXX can we auto-split them? This is hard; involves injecting feedthrough cells, which affects legalization
 26 |     * XXX UNTESTED IDEA: Can we twiddle net weights to encourage packing of carry chains and BLEs and other similar inside-CLB features? Do we need to?
 27 | * XXX can QP work with _zero_ constraints? If not, assign non-user-constrained IOs uniformly so as to avoid getting a giant blob of 100% overlapped cells (XXX what about clocks and stuff?).
 28 | * Distribute all cells uniformly in fabric (required to seed B2B).
 29 | * Run some iterations of HPWL-driven QP (3 iterations? just to get something that vaguely looks right. We need this for the next step)
 30 | * Optionally run RippleFPGA partitioning (e.g. this doesn't work on MAX V because it's way too small. It's unclear for iCE40 which doesn't have as much directional routing bias.)
 31 | * Run HPWL-driven QP to a convergence criterion.
 32 |     * QP legalization will probably be UTPlaceF/POLAR-like for the spreading phase
 33 | * Arch-specific hook point (e.g. for clocks). This hook point has enough information to e.g. assign quadrants but is also early enough that there is room to "recover from" constraints such as quadrants.
 34 | * Run some iterations of congestion-driven QP.
 35 |     * Fully legalize RAM/DSP now, but don't lock it in yet (UTPlaceF)
 36 |     * Will probably use RippleFPGA-style area inflating and congestion estimation. Definitely won't import NTUPlace.
 37 | * Finish congestion-driven global placement (skip RippleFPGA soft BLE packing -- intuition tells me this probably doesn't do much)
 38 | * CLB packing with full legalization (XXX if it fails do we run QP again?)
 39 | * Arch-specific hook point (e.g. for clocks). This is an "extra" hook point that I'm not currently sure how to use. It is after CLB packing, so that might be useful.
 40 | * Save a design checkpoint now; possibly rewrite data structures ((somewhat) packed netlist)
 41 | * Detail placement (RippleFPGA-style with alignment)
 42 | * Final slot assignment
 43 | * Save a design checkpoint now; possibly rewrite data structures (packed+placed netlist)
 44 | * Arch-specific hook point (e.g. for clocks). This is intended for code to actually implement clock routing.
 45 | * XXX research how to implement routing algorithms
 46 |     * It seems we should be able to reenter pack+place and adjust cell density if we save enough state
 47 | * Save a design checkpoint now; possibly rewrite data structures (fully-PARed result)
 48 |     * XXX we probably want a way to back-annotate this
 49 | * Convert to arch-specific data structures and write bitstream
 50 | 
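To make the HPWL and B2B steps in the flow above a little more concrete, here is a small Python sketch. The data layout (nets as lists of cell names, positions as a dict) and all function names are assumptions for illustration only. The B2B weight used is the commonly cited form `2 / ((p - 1) * |x_i - x_j|)` for a net with `p` pins; with it, half the weighted sum of squared distances reproduces the net's span along that axis at the current placement. The `1 / |x_i - x_j|` term is one way to see why a fully overlapped starting placement is a problem: the weights are undefined (here, clamped by `eps`) when pins coincide.

```python
def hpwl(nets, pos):
    """Half-perimeter wirelength: per net, bounding-box width plus height."""
    total = 0.0
    for cells in nets:
        xs = [pos[c][0] for c in cells]
        ys = [pos[c][1] for c in cells]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def b2b_edges(coords, eps=1e-6):
    """Bound-to-bound two-pin edges for one net along one axis.

    coords: dict cell -> coordinate on that axis (hypothetical layout).
    Returns a list of (cell_a, cell_b, weight) tuples.
    """
    p = len(coords)
    if p < 2:
        return []
    order = sorted(coords, key=coords.get)
    lo, hi = order[0], order[-1]

    def weight(a, b):
        # eps clamps the weight when two pins sit at the same coordinate.
        return 2.0 / ((p - 1) * max(abs(coords[a] - coords[b]), eps))

    edges = [(lo, hi, weight(lo, hi))]
    for cell in order[1:-1]:
        # Every inner pin connects to both boundary pins.
        edges.append((cell, lo, weight(cell, lo)))
        edges.append((cell, hi, weight(cell, hi)))
    return edges


def b2b_quadratic_cost(coords, eps=1e-6):
    """0.5 * sum of w * (x_a - x_b)^2 over the B2B edges of one net."""
    return 0.5 * sum(
        w * (coords[a] - coords[b]) ** 2 for a, b, w in b2b_edges(coords, eps)
    )


nets = [["a", "b", "c"], ["b", "d"]]
pos = {"a": (0.0, 0.0), "b": (1.0, 2.0), "c": (4.0, 1.0), "d": (1.0, 5.0)}
total_hpwl = hpwl(nets, pos)
net0_x = {c: pos[c][0] for c in nets[0]}
net0_x_cost = b2b_quadratic_cost(net0_x)
```

In an actual QP iteration these weights would be assembled into a sparse linear system and re-derived after each solve; that part is not shown here.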
 51 | Need to further research (directly-relevant algorithm details):
 52 | * What does the RippleFPGA deferred slot assignment actually gain us?
 53 |     * Simplicity, but it seems we need to do per-arch manual simplifying of legalization rules to take advantage of this
 54 | * Can MAX V or iCE40 or other "simple LUT4" FPGAs skip complexity in CLB packing?
 55 |     * See above
 56 | * How do UTPlaceF/RippleFPGA CLB packing differ?
 57 | 
 58 | Need to further research (FPGA architectures):
 59 | * Investigate global net structures in real FPGAs
 60 | * How will other "special" blocks affect our PAR algorithm? (e.g. PLLs, ADCs, user flash, etc.)
 61 | * Does any FPGA have "fracturable" RAM/DSP blocks? (Answer: YES) What to do about those? (Probably just treat them similarly and apply legalization etc. except with fixing their locations earlier)
 62 | 
 63 | Need to further research (general abstract CS topics):
 64 | * Efficient cache-optimized data structures, especially for multithreading
 65 | * General research on "modern" approaches to multithreaded algorithms
 66 | * Brief research on numerical stability (can we just use integers/fixed?)
 67 | 
 68 | Features for minimum viable product:
 69 | * Quadratic programming core engine, HPWL only
 70 | * CLB packing
 71 | * Detail placement
 72 | * Final assignment
 73 | * MVP will target iCE40 + MAX V
 74 | 
 75 | Post-MVP top priorities:
 76 | * Carry chains (can probably ignore for MVP as long as code gets plumbed properly for them)
 77 | * RAM/DSP (iCE40 has, MAX V doesn't)
 78 | * LUT6/ALM (question: can we understand X-ray by then? If not then we can consider working on S6 or Cyc10GX)
 79 | * Congestion-driven placement (does this need to be higher priority?)
 80 | * Partitioning
 81 | 
 82 | XXX additional notes:
 83 | * Need to ensure we don't pessimize LUT6/ALM architectures (everything I know well is "simple" LUT4)
 84 | * VPR XML packing descriptions for AAPack are ridiculously overengineered -- KinglerPAR will just use code instead of declarative data
 85 |     * How to make this code sufficiently reusable? MAX V is probably simpler than iCE40, which is simpler than ECP5, which is simpler than 7
 86 | * G-cell congestion estimates seem to be designed around "Xilinx-style" big switch box architectures, what happens on Altera-style architectures?
 87 | * Can we "dump" data back into VPR for routing initially?
 88 | 
 89 | Architectures:
 90 | * MAX V -- "Altera-style", LUT4
 91 | * iCE40 -- "Altera-style", LUT4, RAM
 92 | * 7 -- "Xilinx-style", LUT6/ALM, RAM+DSP
 93 | * ECP5 -- "Xilinx-style", LUT4, RAM+DSP
 94 | * MAX10 (future) -- "Altera-style", LUT4, RAM+mult
 95 | * Cyc10LP (future) -- "Altera-style", LUT4, RAM+mult
 96 | * Cyc10GX (future) -- "Altera-style", LUT6/ALM, RAM+DSP
 97 | 
 98 | Unsorted references:
 99 | * http://www.cse.cuhk.edu.hk/~byu/papers/C54-ICCAD2016-RippleFPGA-slides.pdf
100 | * https://chengengjie.github.io/papers/C2-ICCAD16-RippleFPGA.pdf
101 | * https://pdfs.semanticscholar.org/c1c9/43c047e9d834c8ac487ec6d5292485546744.pdf
102 | * http://wuxili.net/pdf/TCAD17_UTPlaceF_Li.pdf
103 | * http://sci-hub.tw/https://ieeexplore.ieee.org/document/6691143/
104 | * http://sci-hub.tw/https://ieeexplore.ieee.org/document/1560039/
105 | * http://www.cse.cuhk.edu.hk/~fyyoung/paper/iccad11_ripple.pdf
106 | * http://appsrv.cse.cuhk.edu.hk/~jkuang/pdf/ripple2.pdf
107 | * http://janders.eecg.toronto.edu/1387/readings/marcelfpl12.pdf
108 | * http://www.ece.umich.edu/cse/awards/pdfs/iccad10-simpl.pdf
109 | * https://atrium.lib.uoguelph.ca/xmlui/bitstream/handle/10214/12985/Abuowaimer_Ziad_201805_PhD.pdf?sequence=5&isAllowed=y
110 | 


--------------------------------------------------------------------------------