Program Analysis and Transformation Survey and Links
====================================================

Table of Contents:

* [Glossary](#glossary)
* [Names](#names)
* [Intermediate Representation Forms/Types](#intermediate-representation-formstypes)
* [SSA Form](#ssa-form)
* [Classification of SSA types](#classification-of-ssa-types)
* [History](#history)
* [Construction Algorithms](#construction-algorithms)
* [Deconstruction Algorithms](#deconstruction-algorithms)
* [Control Flow Analysis](#control-flow-analysis)
* [Alias Analysis](#alias-analysis)
* [Register Allocation](#register-allocation)
* [Projects](#projects)

Glossary
========

* Critical edge (of a graph) - An edge from a vertex with more than one
  successor to a vertex with more than one predecessor.
  [(wikipedia)](https://en.wikipedia.org/wiki/Control-flow_graph#Special_edges)
* DCE - Dead Code Elimination [(wikipedia)](https://en.wikipedia.org/wiki/Dead_code_elimination)
* Graph - [(wikipedia)](https://en.wikipedia.org/wiki/Graph_(discrete_mathematics))
* LOLSSA - *[this entry is a joke! due to
  [lolspeak](https://www.researchgate.net/profile/Jill_Vaughan/publication/323620982_I_can_haz_language_play_The_construction_of_language_and_identity_in_LOLspeak/links/5aa089e8aca272d448b178b6/I-can-haz-language-play-The-construction-of-language-and-identity-in-LOLspeak.pdf)]*
  a) SSA as defined in some early papers on the matter, especially
  in the part about out-of-SSA conversion (see the epigraph to the
  [SSA Deconstruction](#deconstruction-algorithms) section below);
  b) a similar version of SSA used in some (oftentimes amateur) projects
  decades later.
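To make the critical-edge definition concrete, here is a minimal sketch of edge splitting (the successor-map CFG representation and all names are our own illustration, not taken from any particular compiler):

```python
# Minimal illustrative sketch: split critical edges in a CFG given as a
# successor map (block name -> list of successor names). A critical edge
# runs from a block with more than one successor to a block with more than
# one predecessor; splitting inserts a fresh, empty block on that edge.

def split_critical_edges(succs):
    # Compute predecessor lists from the successor map.
    preds = {b: [] for b in succs}
    for b, ss in succs.items():
        for s in ss:
            preds[s].append(b)

    new_succs = {b: list(ss) for b, ss in succs.items()}
    counter = 0
    for b, ss in succs.items():
        if len(ss) < 2:
            continue  # the source must have several successors
        for i, s in enumerate(ss):
            if len(preds[s]) > 1:  # the target has several predecessors
                mid = f"split{counter}"  # fresh empty block on the edge
                counter += 1
                new_succs[b][i] = mid
                new_succs[mid] = [s]
    return new_succs

# In the diamond-with-shortcut below, A -> C is the only critical edge:
# A has successors {B, C} and C has predecessors {A, B}.
print(split_critical_edges({"A": ["B", "C"], "B": ["C"], "C": []}))
# -> {'A': ['B', 'split0'], 'B': ['C'], 'C': [], 'split0': ['C']}
```

Splitting this edge is what makes it possible to place copies "on the edge" during out-of-SSA conversion without affecting the other paths into the target block.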

Names
=====

As a dedication:

We study Program Analysis because it is an objective and complex phenomenon
of nature, devoid of the subjectivities of mankind. But then, we cannot
separate it from the work of the great human minds who laid the paths in
this area, whose steps we now follow.

These are people who contributed to the Program Analysis field of study
(with apologies to the many more who are not listed here). The emphasis
here is on the prominence and public availability of their works:

* Gregory Chaitin
* Jeffrey Ullman
* Alfred Aho
* [Keith Cooper](http://keith.rice.edu/)
    * thesis: 1983 "Interprocedural Data Flow Analysis in a Programming Environment"
* [Andrew Appel](https://www.cs.princeton.edu/~appel/)
    * thesis: 1985 "Compile-time Evaluation and Code Generation in Semantics-Directed Compilers"
    * book: 1998 "Modern Compiler Implementation in ML/Java/C"
    * 2000: [Optimal Register Coalescing Challenge](https://www.cs.princeton.edu/~appel/coalesce/)
* [Preston Briggs](https://genealogy.math.ndsu.nodak.edu/id.php?id=94538)
    * thesis: 1992 "Register Allocation via Graph Coloring"
* [Clifford Click](http://cliffc.org/blog/sample-page/) [@cliffclick](https://github.com/cliffclick)
    * thesis: 1995 "Combining Analyses, Combining Optimizations"
* [John Aycock](https://pages.cpsc.ucalgary.ca/~aycock/)
    * thesis: 2001 "Practical Earley Parsing and the SPARK Toolkit"
    * Hacked on Python compilation: [[1]](https://pages.cpsc.ucalgary.ca/~aycock/papers/ucpy.pdf), [[2]](https://prism.ucalgary.ca/handle/1880/45370), [[3]](https://pages.cpsc.ucalgary.ca/~aycock/papers/ipc7-211.pdf), [[4]](https://pages.cpsc.ucalgary.ca/~aycock/papers/ipc8.pdf)
    * Now hacks on retrogaming: [[1]](https://www.amazon.com/Retrogame-Archeology-Exploring-Computer-Games/dp/3319300024), [[2]](https://www.youtube.com/watch?v=WV259iLon1M)
* [Sebastian
Hack](http://compilers.cs.uni-saarland.de/people/hack/)
    * thesis: 2006 "Register Allocation for Programs in SSA Form"
* [Matthias Braun](https://pp.ipd.kit.edu/personhp/matthias_braun.php) [@MatzeB](https://github.com/MatzeB)
    * thesis: 2006 "Heuristisches Auslagern in einem SSA-basierten Registerzuteiler" (in German: "Heuristic spilling in an SSA-based register allocator")
* [Sebastian Buchwald](https://pp.ipd.kit.edu/personhp/sebastian_buchwald.php)
    * thesis: 2008 "Befehlsauswahl auf expliziten Abhängigkeitsgraphen" (in German: "Instruction selection on explicit dependency graphs")
* [Florent Bouchez](http://florent.bouchez.free.fr/)
    * thesis: 2009 "A Study of Spilling and Coalescing in Register Allocation as Two Separate Phases"
* [Benoit Boissinot](https://bboissin.appspot.com/)
    * thesis: 2010 "Towards an SSA based compiler back-end: some interesting properties of SSA and its extensions"
* Quentin Colombet
    * thesis: 2012 "Decoupled (SSA-based) Register Allocators: from Theory to Practice, Coping with Just-In-Time Compilation and Embedded Processors Constraints"

Intermediate Representation Forms/Types
=======================================

* Imperative
* Functional
* Static Single-Assignment (SSA) - As argued by Appel, SSA is a functional representation.
* (Lambda-)Functional
* Continuation-passing Style (CPS)

SSA Form
========

Put simply, in an SSA program each variable is (statically, syntactically)
assigned only once.

Wikipedia: https://en.wikipedia.org/wiki/Static_single_assignment_form

General reference: the "SSA Book", aka "Static Single Assignment Book", aka
"SSA-based Compiler Design", is an open, collaborative effort of many SSA
researchers to write a definitive reference for all things SSA.
* Direct download (new versions are published):
  http://ssabook.gforge.inria.fr/latest/book-full.pdf
* Download directory:
  http://ssabook.gforge.inria.fr/latest/
* GForge project: https://gforge.inria.fr/projects/ssabook/
* Subversion repository:
  https://gforge.inria.fr/scm/browser.php?group_id=1950

Classification of SSA types
---------------------------

* Axis 1: Minimality. There are two poles: fully minimal vs fully maximal SSA
  form. Between them, there is a continuum of intermediate cases.
    * Fully maximal
        * Defined e.g. by Appel:

          > A really crude approach is to split every variable at every basic-block
          > boundary, and put φ-functions for every variable in every block.

          The maximal form is the most intuitive form for construction, and gives
          the simplest algorithms for both the phi insertion and variable renaming
          phases of the construction.
    * Optimized maximal
        * An obvious optimization is to avoid placing phi functions in blocks
          with a single predecessor, as they are never needed there. While this
          cuts the number of phi functions, it makes the renaming algorithm a bit
          more complex: whereas for the maximal form renaming can process blocks
          in arbitrary order (because each of the program's variables has a local
          definition in every basic block), the optimized maximal form requires
          processing the predecessor first for each such single-predecessor
          block.
    * Minimal for reducible CFGs
        * Some algorithms (e.g. those optimized for simplicity) naturally produce
          the minimal form only for reducible CFGs. Applied to non-reducible
          CFGs, they may generate extra phi functions. There are usually
          extensions to such algorithms to generate the minimal form for
          non-reducible CFGs too (but such extensions may add noticeable
          complexity to an otherwise "simple" algorithm). An example of such an
          algorithm is 2013 Braun et al.
    * Fully minimal
        * This is usually what is sought from SSA form: there are no superfluous
          phi functions, based only on graph properties of the CFG (without
          consulting the semantics of the underlying program).
* Axis 2: Prunedness. As argued (implied) by 2013 Braun et al., prunedness is
  a separate trait from minimality. E.g., their algorithm constructs a not
  fully minimal, yet pruned form. Between the pruned and non-pruned forms,
  there are again intermediate types.
    * Pruned
        * The minimal form can still have dead phi functions, i.e. phi functions
          which reference variables that are not actually used in the rest of
          the program. Note that such references are problematic, as they
          artificially extend the live ranges of the referenced variables.
          Likewise, a dead phi defines a new variable which is not really live.
          The pruned SSA form is devoid of dead phi functions. There are two
          obvious ways to achieve this: a) perform live variable analysis prior
          to SSA construction and use it to avoid placing dead phi functions;
          b) run a dead code elimination (DCE) pass after the construction
          (which requires live variable analysis first, this time on the SSA
          form of the program). Due to these additional passes, pruned SSA
          construction is more expensive than just the minimal form. Note that
          if we intend to run a DCE pass on the program anyway, which often
          happens, we do not really need to be concerned with *constructing*
          the pruned form, as we will get it after the DCE pass "for free".
          Except, of course, that the minimal and especially the maximal form
          require more space to store and more time to go through during DCE.
    * Semi-pruned
        * Sometimes called the "Briggs-Minimal" form. A compromise between the
          fully pruned and minimal forms.
          From Wikipedia:

          > Semi-pruned SSA form is an attempt to reduce the number of Φ
          > functions without incurring the relatively high cost of computing
          > live variable information. It is based on the following observation:
          > if a variable is never live upon entry into a basic block, it never
          > needs a Φ function. During SSA construction, Φ functions for any
          > "block-local" variables are omitted.
    * Not pruned
* Axis 3: Conventional vs Transformed SSA
    * Conventional
        * Allows for an easy deconstruction algorithm (literally, just drop the
          SSA variable subscripts and remove the phi functions). Usually, after
          construction, SSA is in the conventional form (provided that no
          additional optimizations were performed during construction).
    * Transformed
        * Some optimizations applied to an SSA program make the simple
          deconstruction algorithm outlined above no longer possible (it would
          not produce correct results). This is known as "transformed SSA".
          There are algorithms to convert transformed SSA into the conventional
          form.
* Axis 4: Strict vs non-strict SSA
    * Non-strict SSA allows some variables to be undefined on some paths
      (just like conventional imperative programs).
    * The strict form requires each use to be dominated by a definition. This
      in turn means that every variable must be explicitly initialized. A
      non-strict program can be trivially converted into the strict form by
      initializing variables with special values, like "undef" for truly
      undefined values, "param" for function parameters, etc. Most SSA
      algorithms require/assume the strict SSA form, so the non-strict form is
      not considered further.


Discussion: There is one true SSA type - the maximal one. It has a
straightforward, easy to understand construction algorithm which does
not depend on any other special algorithms.
Running a generic
DCE algorithm on it will remove any redundancies of the maximal form
(oftentimes together with other dead code). All other types are
optimizations of the maximal form, allowing fewer Phi functions to be
generated, so that fewer need to be removed later. Optimizations are useful,
but the usual warning about premature optimization applies.

History
-------

* 1969

  According to Aycock/Horspool:

  > The genesis of SSA form was in the 1960s with the work of Shapiro and
  > Saint [23,19]. Their conversion algorithm was based upon finding
  > equivalence classes of variables by walking the control-flow graph.

  R. M. Shapiro and H. Saint. The Representation of Algorithms. Rome Air
  Development Center TR-69-313, Volume II, September 1969.

  > Given the possibility of concurrent operation, we might also wish to
  > question the automatic one-one mapping of variable names to equipment
  > locations. Two uses of the same variable name might be entirely unrelated
  > in terms of data dependency and thus potentially concurrent if mapped to
  > different equipment locations.

  Continues on p.31 of the paper (p.39 of the PDF) under the title:

  > VI. Variable-Names and Data Dependency Relations

* 1988

  Then, following Wikipedia, "SSA was proposed by Barry K. Rosen, Mark N.
  Wegman, and F. Kenneth Zadeck in 1988."
  Barry Rosen; Mark N. Wegman; F. Kenneth Zadeck (1988). "Global value
  numbers and redundant computations"

Construction Algorithms
-----------------------

Based on excerpts from "Simple Generation of Static Single-Assignment Form",
Aycock/Horspool

### For Reducible CFGs (i.e. special case)

* 1986 R. Cytron, A. Lowry, K. Zadeck. Code Motion of Control Structures in
  High-Level Languages.
  Proceedings of the Thirteenth Annual ACM Symposium on Principles
  of Programming Languages, 1986, pp. 70–85.

  > Cytron, Lowry, and Zadeck [11] predate the use of φ-functions, and employ
  > a heuristic placement policy based on the interval structure of the
  > control-flow graph, similar to that of Rosen, Wegman, and Zadeck [22].
  > The latter work is interesting because they look for the same patterns as
  > our algorithm does during our minimization phase. However, they do so
  > after generating SSA form, and then only to correct ‘second order
  > effects’ created during redundancy elimination.

* 1994 Single-Pass Generation of Static Single-Assignment Form for Structured
  Languages, Brandis and Mössenböck

  > Brandis and Mössenböck [5] generate SSA form in one pass for structured
  > control-flow graphs, a subset of reducible control-flow graphs, by
  > delicate placement of φ-functions. They describe how to extend their
  > method to reducible control-flow graphs, but require the dominator tree
  > to do so.

* 2000 Simple Generation of Static Single-Assignment Form, John Aycock and
  Nigel Horspool

  > In this paper we present a new, simple method for converting to SSA
  > form, which produces correct solutions for nonreducible control-flow
  > graphs, and produces minimal solutions for reducible ones.

### For Non-Reducible CFGs (i.e. general case)

* 1991 R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck.
  Efficiently Computing Static Single-Assignment Form and the Control
  Dependence Graph. ACM TOPLAS 13, 4 (October 1991), pp. 451–490.

  > "Canonical" SSA construction algorithm.

* 1995 R. K. Cytron and J. Ferrante. Efficiently Computing φ-Nodes
  On-The-Fly. ACM TOPLAS 17, 3 (May 1995), pp. 487–506.

  > Cytron and Ferrante [9] later refined their method so that it runs in
  > almost-linear time.

* 1994 R. Johnson, D. Pearson, and K. Pingali. The Program Structure Tree:
  Computing Control Regions in Linear Time. ACM PLDI ’94, pp. 171–185.

  > Johnson, Pearson, and Pingali [16] demonstrate conversion to SSA form as
  > an application of their “program structure tree,” a decomposition of the
  > control-flow graph into single-entry, single-exit regions. They claim
  > that using this graph representation allows them to avoid areas in the
  > control-flow graph that do not contribute to a solution.

* 1995 V. C. Sreedhar and G. R. Gao. A Linear Time Algorithm for Placing
  φ-Nodes. Proceedings of the Twenty-Second Annual ACM Symposium on
  Principles of Programming Languages, 1995, pp. 62–73.

  > Sreedhar and Gao [24] devised a linear-time algorithm for φ-function
  > placement using DJ-graphs, a data structure which combines the dominator
  > tree with information about where data flow in the program merges.

* 2013 M. Braun, S. Buchwald, S. Hack, R. Leißa, C. Mallon, and
  A. Zwinkau. Simple and efficient construction of static single assignment
  form. In R. Jhala and K. Bosschere, editors, Compiler Construction,
  volume 7791 of Lecture Notes in Computer Science, pp.
  102–122. Springer, 2013. doi: 10.1007/978-3-642-37051-9_6.

  > Braun et al. present a simple SSA construction algorithm, which allows
  > direct translation from an abstract syntax tree or bytecode into an
  > SSA-based intermediate representation. The algorithm requires no prior
  > analysis and ensures that even during construction the intermediate
  > representation is in SSA form. This allows the application of SSA-based
  > optimizations during construction. After completion, the intermediate
  > representation is in minimal and pruned SSA form. In spite of its
  > simplicity, the runtime of the algorithm is on par with Cytron et al.’s
  > algorithm.

* 2016 Verified Construction of Static Single Assignment Form, Sebastian
  Buchwald, Denis Lohner, Sebastian Ullrich

Deconstruction Algorithms
-------------------------

---
Epigraph (due to [Boissinot](https://bboissin.appspot.com/static/upload/bboissin-outssa-cgo09-slides.pdf), slide 20):

*Naively, a k-input Phi-function at entrance to a node X can
be replaced by k ordinary assignments, one at the end of
each control flow predecessor of X. This is always correct...*

-- *Cytron, Ferrante, Rosen, Wegman, Zadeck (1991)
Efficiently computing static single assignment form and the control
dependence graph.*

> Cytron et al. (1991): Copies in predecessor basic blocks.

Incorrect!
* Bad understanding of parallel copies
* Bad understanding of critical edges and interference

> Briggs et al. (1998)

Both problems identified. General correctness unclear.

> Sreedhar et al. (1999)

Correct but:
* handling of complex branching instructions unclear
* interplay with coalescing unclear
* "virtualization" hard to implement

Many SSA optimizations turned off in gcc and Jikes.

---

TBD. Some papers in the "Construction Algorithms" section also include
information/algorithms on deconstruction.

Converting out of SSA is effectively the elimination (lowering) of Phi
functions. (Note that Phi functions may appear in a program which is
not (purely) SSA, so Phi elimination is formally a more general process
than conversion out of SSA.)

There are two general ways to eliminate Phi functions:

1. **Requires splitting critical edges, but doesn't introduce new variables
   and extra copies**:
   Treat Phi functions as parallel copies on the incoming edges. This
   requires splitting critical edges.
   Afterwards, the parallel copies are sequentialized.
2. **Does not require splitting critical edges, but introduces new
   variables and extra copies to them, which then need to be
   coalesced**:
   For Conventional SSA (CSSA), the result and arguments of a Phi can
   simply be renamed to the same name (throughout the program), and the Phi
   removed. This is because the arguments and result do not interfere among
   themselves (CSSA is produced by normal SSA construction algorithms, which
   don't perform copy propagation and value numbering during construction).
   Arbitrary SSA (or Transformed SSA, TSSA) can be converted to CSSA by
   splitting the live ranges of Phi arguments and results: rename them to
   new variables, then insert a parallel copy of the old argument variables
   to the new ones at the end of each predecessor, and a parallel copy of
   all Phi results after all the Phi functions of the current basic block.
   These parallel copies can (usually) be sequentialized trivially (so
   oftentimes they are not even treated as parallel in the literature). This
   method does not require splitting critical edges, but introduces many
   unnecessary copies (intuitively, for non-interfering Phi variables),
   which then need to be optimized away by coalescing (or alternatively,
   the unneeded copies should not be introduced in the first place).

Control Flow Analysis
=====================

According to 1997 Muchnick:

* Analysis on "raw" graphs, using dominators, the iterative dataflow
  algorithms
* Interval Analysis, which then allows the use of ad hoc optimized dataflow
  analysis.
  Variants, in order of increasing sophistication:
    * The simplest form is T1-T2 reduction
    * Maximal intervals analysis
    * Minimal intervals analysis
    * Structural analysis


Alias Analysis
==============

* 1994 [Program Analysis and Specialization for the C Programming Language](http://www.cs.cornell.edu/courses/cs711/2005fa/papers/andersen-thesis94.pdf),
  Lars Ole Andersen


Register Allocation
===================

Wikipedia: https://en.wikipedia.org/wiki/Register_allocation

Terms:

* Decoupled allocator - In classic register allocation algorithms, the
  assignment of variables to registers and the spilling of non-assignable
  variables are tightly coupled, interleaved phases of a single algorithm.
  In a decoupled allocator, these phases are well separated, with the
  spilling algorithm first selecting and rewriting the variables to be
  spilled, and the assignment algorithm then dealing with the remaining
  variables. Most decoupled register allocators are SSA-based, though recent
  developments also include decoupled allocators for standard imperative
  programs.
* Chordal graph - A type of graph with the property that it can be colored
  in polynomial time (whereas coloring of general graphs is NP-complete).
  Interference graphs of SSA programs are chordal. (Note that arbitrary
  pre-coloring and/or register aliasing support for chordal graphs, as
  required for real-world register allocation, may push the complexity back
  into NP territory.)
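To illustrate why chordality matters, here is a hedged sketch (the graph, names, and example program are our own illustration, not from any real allocator): for chordal graphs, greedily coloring vertices in a maximum cardinality search order is optimal, i.e. it uses exactly as many colors as the largest clique.

```python
# Illustrative sketch: optimal coloring of a chordal interference graph.
# Greedy coloring in "maximum cardinality search" (MCS) order is optimal
# for chordal graphs, which is what makes register assignment (though not
# the whole allocation problem) on SSA form polynomial-time.

def mcs_order(graph):
    """Maximum cardinality search: repeatedly pick the vertex with the
    most already-ordered neighbors."""
    order, weight = [], {v: 0 for v in graph}
    while weight:
        v = max(weight, key=lambda x: weight[x])
        order.append(v)
        del weight[v]
        for n in graph[v]:
            if n in weight:
                weight[n] += 1
    return order

def greedy_color(graph, order):
    """Give each vertex the lowest "register" unused by its neighbors."""
    color = {}
    for v in order:
        taken = {color[n] for n in graph[v] if n in color}
        color[v] = next(c for c in range(len(graph)) if c not in taken)
    return color

# Interference graph of a tiny hypothetical SSA program: a triangle a-b-c
# (a clique of 3) plus d interfering only with c. The graph is chordal,
# so exactly 3 registers suffice.
ig = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
colors = greedy_color(ig, mcs_order(ig))
assert max(colors.values()) + 1 == 3
```

As the note above says, once arbitrary pre-colored nodes or register aliasing enter the picture, this simple greedy scheme no longer suffices.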

Conventional Register Allocation
--------------------------------

TBD

Register Allocation on SSA Form
-------------------------------

* [University of Saarland page on SSA register allocation](http://compilers.cs.uni-saarland.de/projects/ssara/)
  gives a good overview of the Register Allocation area, and of how SSA form
  makes some matters easier.


Projects
========

Academic projects
-----------------

* [SUIF1](https://suif.stanford.edu/suif/suif1/index.html) - 1994, Stanford University
    * "The SUIF (Stanford University Intermediate Format) 1.x compiler,
      developed by the Stanford Compiler Group, is a free infrastructure
      designed to support collaborative research in optimizing and
      parallelizing compilers."
* [SUIF2](https://suif.stanford.edu/suif/suif2/index.html) - 1999, Stanford University
    * "A new version of the SUIF compiler system, a free infrastructure
      designed to support collaborative research in optimizing and
      parallelizing compilers. It is currently in the beta test stage
      of development."
* Machine SUIF aka machsuif - "Fork" of SUIF1/SUIF2, Harvard University
    * https://web.archive.org/web/20090630022924/http://www.eecs.harvard.edu/machsuif/software/software.html (via archive.org)
    * http://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15745-s02/public/doc/overview.html
* [NCI (National Compiler Infrastructure)](https://suif.stanford.edu/suif/NCI/) ([archive.org](https://web.archive.org/web/20090901021758/http://www.cs.virginia.edu/nci/)) - 1998-200x?
  Collaborative project among US universities
    * "the National Compiler Infrastructure project has two components:"
        * SUIF
        * [Zephyr](https://web.archive.org/web/20090310184351/http://www.cs.virginia.edu/zephyr/)
            * Zephyr ASDL now lives at http://asdl.sourceforge.net/
            * Zephyr ASDL description used in Python: https://github.com/python/cpython/blob/master/Parser/Python.asdl
            * Oilshell blog post dedicated to Zephyr ASDL: https://www.oilshell.org/blog/2016/12/11.html

---
This list is compiled and maintained by Paul Sokolovsky, and released under
the [Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0)](https://creativecommons.org/licenses/by-sa/4.0/).