├── LICENSE
├── README.md
└── Slides
    ├── 12.png
    ├── 13.png
    ├── 14.png
    ├── 15.png
    ├── 16.png
    ├── 17.png
    ├── 18.png
    ├── 19.png
    ├── 21.png
    ├── 22.png
    ├── 23.png
    ├── 24.png
    ├── 4.png
    ├── 5.png
    ├── 6.png
    ├── 7.png
    └── 8.png
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2020 Nimrod Partush
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Program Analysis Glossary
2 |
3 | [I](https://nimrodpar.github.io/) created this glossary due to a recent program analysis exploration task where I found myself struggling to keep up with the jargon and all the (many) different kinds of analyses.
4 |
5 |
8 |
9 | A transcribed version of the slides in GitHub Markdown format follows.
10 |
11 | ---
12 |
13 | # Intro/Disclaimer
14 |
15 | * The goal of this glossary is to convey, by example, the differences between the various types of program analyses.
16 | * The notions and examples shown in the slides are example-driven and informal.
17 | * [I](https://nimrodpar.github.io/) have some background in program analysis, but the slides may still contain errors or inaccuracies.
18 | * Please suggest corrections/additions via GitHub issues.
19 |
20 | **Some of the Sources:**
21 | * https://www.seas.harvard.edu/courses/cs252/2011sp/slides/Lec02-Dataflow.pdf
22 | * https://en.wikipedia.org/wiki/Data-flow_analysis
23 | * https://stackoverflow.com/questions/13397180/what-exactly-does-context-mean-in-context-insensitive-analysis
24 | * http://pages.cs.wisc.edu/~fischer/cs701.f14/popl95.pdf
25 |
26 |
27 | # Basic Definitions
28 | _These apply to (almost) all analyses._
29 |
30 | ## Definition: A Program Analysis
31 |
32 | * A mapping from each location in the program to an abstract state describing some property of the program variables.
33 | * Example properties: the possible range of values (a.k.a. intervals), which variables may still be used later in the program (a.k.a. live variables), which computed expressions are still valid (a.k.a. available expressions), etc.
34 |
35 |
36 |
37 |
38 |
39 | _Speaker Notes:_
40 |
41 | For the sake of simplicity, we define a very basic notion of a program analysis. The example we show here and in the next 3 slides is an interval analysis, which simply tracks the range of possible values for each variable.
42 |
43 | `x = nothing` means that `x` is not initialized, so there is no valid possible value for it. Another notation for this is `x = ⊥` (called ‘bottom’).
44 |
45 | Note: some analyses treat uninitialized variables as having *all possible values* (since this may be the case in some languages, like C), which is usually denoted as `x = ⊤` (called “top”).
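A minimal Python sketch of the interval domain described in these notes may help. It assumes bottom (⊥) is encoded as `None` and top (⊤) as the interval (-∞, ∞); the names `const`, `add`, and `uninit` are hypothetical and not taken from any particular tool.

```python
import math
from typing import Optional, Tuple

# Hypothetical encoding of the interval domain:
#   None          -> bottom (⊥): no possible value yet
#   (lo, hi)      -> any integer between lo and hi, inclusive
#   (-inf, +inf)  -> top (⊤): any possible value
Interval = Optional[Tuple[float, float]]

BOTTOM: Interval = None
TOP: Interval = (-math.inf, math.inf)


def const(n: int) -> Interval:
    """Abstract value of the constant n: the single-point interval [n, n]."""
    return (n, n)


def add(a: Interval, b: Interval) -> Interval:
    """Abstract transformer for `a + b`: add lower bounds and upper bounds."""
    if a is None or b is None:
        return BOTTOM  # one operand has no value, so neither does the sum
    return (a[0] + b[0], a[1] + b[1])


def uninit(c_like: bool) -> Interval:
    """Initial value of an uninitialized variable under the two conventions above."""
    return TOP if c_like else BOTTOM


print(add(const(1), const(2)))  # (3, 3)
print(add(TOP, const(1)))       # (-inf, inf)
```

Adding 1 to ⊤ stays ⊤, which is the same effect the context-insensitive example later relies on.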
46 |
47 |
48 |
49 |
50 |
51 | _Speaker Notes:_
52 |
53 | The abstract states here do not seem very abstract, right? The values are concrete and well defined.
54 |
55 | This is because of:
56 | 1. the very simple program we are analysing
57 | 1. the “abstract domain” we are using.
58 |
59 | An abstract domain simply keeps track of certain properties of the (variables in the) program. In this example the properties are integer values, but they can also be things like “is the variable null or not?” or “is the value of the variable odd or even?”.
60 |
61 | Abstract domains in program analysis must obey many rules to guarantee correctness, and they are a big and deep field of study. We won’t elaborate any further; just remember that their purpose is to keep track of (properties of) variable values.
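As an illustration of a non-numeric domain, here is a hedged Python sketch of the "odd or even" (parity) domain mentioned above. The `Parity` enum, the abstraction function `alpha`, and the `add`/`join` operations are illustrative names, not any particular library's API. The final assertion checks, on a few sample values, that abstract addition agrees with concrete addition.

```python
from enum import Enum


class Parity(Enum):
    """A tiny abstract domain that tracks only whether an integer is odd or even."""
    BOTTOM = "⊥"  # no value
    EVEN = "even"
    ODD = "odd"
    TOP = "⊤"     # may be either odd or even


def alpha(n: int) -> Parity:
    """Abstraction function: map a concrete integer to its parity."""
    return Parity.EVEN if n % 2 == 0 else Parity.ODD


def add(a: Parity, b: Parity) -> Parity:
    """Abstract transformer for `a + b` over parities."""
    if Parity.BOTTOM in (a, b):
        return Parity.BOTTOM
    if Parity.TOP in (a, b):
        return Parity.TOP
    return Parity.EVEN if a == b else Parity.ODD  # even+even and odd+odd are even


def join(a: Parity, b: Parity) -> Parity:
    """Least upper bound: if the two parities disagree, we only know 'either'."""
    if a == Parity.BOTTOM:
        return b
    if b == Parity.BOTTOM:
        return a
    return a if a == b else Parity.TOP


# Abstracting then adding gives the same answer as adding then abstracting.
assert all(add(alpha(m), alpha(n)) == alpha(m + n)
           for m in range(-5, 6) for n in range(-5, 6))
```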
62 |
63 | ## Definition: The Join Operation ⨆
64 |
65 | * An operation for...joining abstract states flowing from multiple paths.
66 | * An analysis that merges states with a join is called a "May Analysis".
67 |
68 |
69 |
70 |
71 |
72 | _Speaker Notes:_
73 |
74 | So, what do we do in the case of the pos() function? What is the abstract state at the `L3` location (i.e., what are all possible values for the variable `x`)?
75 |
76 | Since we want to cover all possible values of variable `x` (this is also called “over-approximating”), we need to account for the values flowing into `L3`. The values are determined by the two branches of the if statement. So, we somehow need to account for the states that we have for `L1` and `L2`.
77 |
78 |
79 |
80 |
81 |
82 | _Speaker Notes:_
83 |
84 | The operation for accounting for all states flowing into a program location is called a Join operation (denoted ⨆).
85 |
86 | Basically the operation means “the value for `x` in `L3` can be either the value of `x` in `L1` or the value of `x` in `L2`”.
87 |
88 | Here, we join by simply ORing all the possible states flowing in. But this can be done differently. For instance, instead of two separate values for `x` in two sub-states, we can have `{ x = [0,42] }`, i.e., `x` can be any value between 0 and 42. Since we are over-approximating, this result is still sound, but it is much less precise: it also admits every value in between, which `x` never actually holds.
89 |
90 | This sort of compromise (also called abstraction) is needed for efficiency. One of the major goals of program analysis is to create efficient and scalable analyses that are still accurate enough to be useful.
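The two ways of joining can be compared in a short Python sketch. It assumes, as the `[0,42]` interval in the text suggests, that the two branches leave `x = 0` and `x = 42`. The function names are hypothetical.

```python
from typing import FrozenSet, Tuple


def join_sets(a: FrozenSet[int], b: FrozenSet[int]) -> FrozenSet[int]:
    """Precise join: keep every value that may flow in from either branch ("ORing")."""
    return a | b


def join_intervals(a: Tuple[int, int], b: Tuple[int, int]) -> Tuple[int, int]:
    """Interval join: the smallest interval that covers both inputs."""
    return (min(a[0], b[0]), max(a[1], b[1]))


x_at_L1 = frozenset({0})   # value of x flowing in from one branch
x_at_L2 = frozenset({42})  # value of x flowing in from the other branch

precise = join_sets(x_at_L1, x_at_L2)
coarse = join_intervals((0, 0), (42, 42))

print(sorted(precise))  # [0, 42]
print(coarse)           # (0, 42)
# The interval is sound (it contains 0 and 42) but imprecise: it also contains 17.
assert 17 not in precise and coarse[0] <= 17 <= coarse[1]
```

The interval join stores two numbers no matter how many branches merge. The set join can grow without bound, which is the efficiency trade-off described above.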
91 |
92 | ## Definition: The Meet Operation ⊓
93 |
94 | * Merging abstract states flowing from multiple paths while keeping facts that are *true on all paths*.
95 | * An analysis that merges states with a meet is called a "Must Analysis".
96 |
97 |
98 |
99 |
100 |
101 | _Speaker Notes:_
102 |
103 | For some analyses, we do not want to account for all possible states on all paths.
104 |
105 | That’s the case for the available expressions analysis shown in the slide. This is an analysis used by compilers to determine which already computed expressions can be re-used and need not be recalculated.
106 |
107 | In the example, the expressions `2*y` and `3*x` are used multiple times, so it would be wise to re-use their values if possible. The analysis tracks the computed expressions by keeping tabs on the inputs to each expression and checking whether any of them has changed.
108 |
109 | For instance, `2*y` is computed at `L0` and is therefore added to the state at that location (the state is the set of available expressions). At `L1`, `2*y` is no longer available because `y` changes, so it is removed from the state (but it is kept at `L2`). `3*x` is added to the state at `L1` and `L2` since it was computed in the if condition and `x` does not change in either branch.
110 |
111 | Arriving at `L3`, to correctly know which expressions are available and need not be recomputed, we must only consider the expressions that are available on all paths leading to `L3`. This means we must meet the states from `L1` and `L2`, keeping only the expression shared by both: `3*x`.
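The walkthrough can be replayed in a small Python sketch where the meet is set intersection. The slide's exact statements are not reproduced. The transfers below are hypothetical stand-ins that follow the description: `2*y` is computed at `L0`, the if condition computes `3*x`, and only the `L1` branch modifies `y`.

```python
from typing import FrozenSet, Optional, Tuple

# An expression is represented as (its text, the variables it reads).
Expr = Tuple[str, FrozenSet[str]]
TWO_Y: Expr = ("2*y", frozenset({"y"}))
THREE_X: Expr = ("3*x", frozenset({"x"}))


def transfer(avail: FrozenSet[Expr], computed: FrozenSet[Expr],
             assigned: Optional[str]) -> FrozenSet[Expr]:
    """Add newly computed expressions, then kill any expression that reads `assigned`."""
    candidates = avail | computed
    return frozenset(e for e in candidates if assigned is None or assigned not in e[1])


def meet(a: FrozenSet[Expr], b: FrozenSet[Expr]) -> FrozenSet[Expr]:
    """Must-analysis merge: available only if available on every incoming path."""
    return a & b


after_L0 = transfer(frozenset(), frozenset({TWO_Y}), None)   # 2*y is computed
after_cond = transfer(after_L0, frozenset({THREE_X}), None)  # the if condition computes 3*x
at_L1 = transfer(after_cond, frozenset(), "y")               # this branch modifies y
at_L2 = transfer(after_cond, frozenset(), None)              # this branch modifies neither x nor y
at_L3 = meet(at_L1, at_L2)

print(sorted(text for text, _ in at_L3))  # ['3*x']
```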
112 |
113 | # Analyses Types
114 |
115 | ## May Analysis
116 |
117 | * (as seen before) an over-approximating analysis that uses the join ⨆ operation to merge states flowing from multiple paths.
118 |
119 |
120 |
121 |
122 |
123 |
124 | ## Must Analysis
125 |
126 | * (as seen before) an under-approximating analysis that uses the meet ⊓ operation to merge states flowing from multiple paths.
127 |
128 |
129 |
130 |
131 |
132 | ## Forward Analysis
133 |
134 | * An analysis where the abstract state flows forward.
135 | * The state at each program point is derived from the states of the **preceding** program points.
136 |
137 | 
138 |
139 | _Speaker Notes:_
140 |
141 | You may not have noticed, but the state at each location was computed by taking the states from the previous lines and accounting for what happens at the current line (also called applying a “transformer”).
142 |
143 | This sort of analysis, where you use the states of the (direct) preceding lines, is called a forward analysis. Most analyses are forward ones, including all of those we have seen so far.
144 |
145 |
146 |
147 |
148 |
149 | _Speaker Notes:_
150 |
151 | So the state from `L1` is taken as the initial state for `L2` (i.e., `in(L2) = out(L1)`), the operation `x = 1` at `L2` is accounted for (the semantics of `L2`, i.e., [| `x = 1` |]), and we get the resulting state for `L2` (i.e., `out(L2) = {x = 1, y = nothing}`).
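A minimal Python sketch of a forward analysis over straight-line code follows. Each line has a transformer, and the output state of one line becomes the input state of the next. The program (`L1` declares `x` and `y`, `L2` is `x = 1`, `L3` is `y = 2`) is a hypothetical stand-in for the slide. `None` plays the role of "nothing".

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, Optional[int]]      # None plays the role of "nothing" (⊥)
Transformer = Callable[[State], State]


def assign(var: str, value: int) -> Transformer:
    """Transformer [| var = value |]: copy the incoming state and overwrite `var`."""
    def transformer(state: State) -> State:
        out = dict(state)
        out[var] = value
        return out
    return transformer


def skip(state: State) -> State:
    """Transformer for a statement that changes no variable (e.g. a declaration)."""
    return dict(state)


def forward(initial: State, program: List[Transformer]) -> List[State]:
    """Straight-line forward analysis: in(Li) = out(Li-1), out(Li) = [| Li |](in(Li))."""
    outs: List[State] = []
    current = initial
    for transformer in program:
        current = transformer(current)  # apply this line's transformer
        outs.append(current)
    return outs


outs = forward({"x": None, "y": None}, [skip, assign("x", 1), assign("y", 2)])
print(outs[1])  # {'x': 1, 'y': None}
```

Programs with branches would also need a join (or meet) wherever control-flow paths merge. This sketch leaves that out.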
152 |
153 | ## Backward Analysis
154 | * An analysis where the abstract state flows backwards.
155 | * The state at each program point is derived from the states of the **succeeding** program points.
156 | * Example: Live Variables analysis.
157 | * An analysis used by compilers to determine which variables are no longer needed and can be freed.
158 |
159 |
160 |
161 |
162 |
163 | _Speaker Notes:_
164 |
165 | This analysis starts at the end of the program. It identifies that `L4` uses `y`, so `y` is live at `L4`.
166 |
167 |
168 |
169 |
170 |
171 | _Speaker Notes:_
172 |
173 | To determine which variables are live (i.e., required and can’t be freed) at `L3`, the state from `L4` is taken, and the statement in `L3` is examined. The analysis identifies that `x` is needed at `L3`, so it is added to the state.
174 |
175 | Note that we already know that `y` can be freed after `L3`.
176 |
177 |
178 |
179 |
180 |
181 | _Speaker Notes:_
182 |
183 | The final result of the analysis.
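The backward direction can be sketched in Python for straight-line code with the standard live-variables equation `live_in(s) = uses(s) ∪ (live_out(s) − defs(s))`. The four-statement program is hypothetical and not the one on the slide. Each statement is described only by the variables it defines and uses.

```python
from typing import Dict, FrozenSet, List, Tuple

# A statement is (label, variables it defines, variables it uses).
Stmt = Tuple[str, FrozenSet[str], FrozenSet[str]]


def live_variables(program: List[Stmt]) -> Dict[str, FrozenSet[str]]:
    """Backward analysis: live_in(s) = uses(s) | (live_out(s) - defs(s))."""
    live_out: FrozenSet[str] = frozenset()  # nothing is live after the last statement
    live_in: Dict[str, FrozenSet[str]] = {}
    for label, defs, uses in reversed(program):  # walk from the end of the program
        live_in[label] = uses | (live_out - defs)
        live_out = live_in[label]  # straight-line code: out(Li-1) = in(Li)
    return live_in


# Hypothetical straight-line program:
program: List[Stmt] = [
    ("L1", frozenset({"x"}), frozenset()),       # x = input()
    ("L2", frozenset({"y"}), frozenset({"x"})),  # y = x * 2
    ("L3", frozenset(), frozenset({"x"})),       # print(x)
    ("L4", frozenset(), frozenset({"y"})),       # print(y)
]
result = live_variables(program)
```

In this example only `y` is live on entry to `L4`, so `x` could be freed after `L3`.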
184 |
185 | ## Flow-Sensitive Analysis
186 |
187 | * An analysis that takes the order of instructions into account.
188 |
189 | 
190 |
191 | _Speaker Notes:_
192 |
193 | This (concise) slide shows the state for the exit point of f() (a.k.a. post-condition).
194 |
195 | On the left we have the result of a flow-insensitive analysis, where instruction ordering is ignored. In this sort of analysis, basically everything that happens anywhere in the program (w.r.t. each variable) is tracked, and nothing is overwritten. Therefore the state on the left tracks both assignments to `x`.
196 |
197 | On the right we have a flow-sensitive analysis, which tracks instruction ordering. The assignment `x = 2` therefore overwrites the previous assignment in the state, resulting in the more precise `{x=2}` state at program exit.
198 |
199 | Why would anyone use a flow-insensitive algorithm? They are simpler to specify, and can be faster. Therefore they are useful in some types of analysis where the precision gap may be small (e.g., https://www.cs.colorado.edu/~bec/papers/sas11-ptaprecision.pdf).
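The difference can be shown with a small Python sketch over a list of constant assignments. It assumes, hypothetically, that the body of `f()` is `x = 1` followed by `x = 2`; only the final `x = 2` is given in the text.

```python
from typing import Dict, List, Set, Tuple

Assignment = Tuple[str, int]  # (variable, constant value assigned to it)


def flow_insensitive(body: List[Assignment]) -> Dict[str, Set[int]]:
    """Ignore ordering: keep every value ever assigned to each variable."""
    state: Dict[str, Set[int]] = {}
    for var, value in body:
        state.setdefault(var, set()).add(value)
    return state


def flow_sensitive(body: List[Assignment]) -> Dict[str, Set[int]]:
    """Respect ordering: a later assignment overwrites the earlier one."""
    state: Dict[str, Set[int]] = {}
    for var, value in body:
        state[var] = {value}
    return state


body_of_f = [("x", 1), ("x", 2)]
```

Reversing `body_of_f` leaves the flow-insensitive result unchanged. That order independence is why such an analysis can be computed without following control flow at all.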
200 |
201 | ## Path-Sensitive Analysis
202 |
203 | * An analysis that takes branch conditions into account.
204 |
205 | 
206 |
207 | _Speaker Notes:_
208 |
209 | As before, the state on the left shows the result of a path-insensitive analysis at the exit point of `f()`. The `x > 0` branch condition is unaccounted for.
210 |
211 | The path-sensitive state on the right shows the result at the exit point of `f()`, where path conditions, i.e., the branch condition and its negation, are taken into account. This results in a bigger and more informative state (which is also more expensive to compute).
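The slide's `f()` is not reproduced here. The Python sketch below uses a hypothetical stand-in with the same `x > 0` condition to show the two exit states. The path-insensitive analysis keeps one merged state. The path-sensitive one keeps one state per path, guarded by its path condition.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for f():
#   if x > 0: y = 1
#   else:     y = -1
# Each path is (path condition, value assigned to y on that path).
PATHS: List[Tuple[str, int]] = [("x > 0", 1), ("!(x > 0)", -1)]


def path_insensitive(paths: List[Tuple[str, int]]) -> Dict[str, Set[int]]:
    """Join all paths and drop their conditions: a single state at the exit point."""
    return {"y": {value for _, value in paths}}


def path_sensitive(paths: List[Tuple[str, int]]) -> List[Tuple[str, Dict[str, Set[int]]]]:
    """Keep one state per path, guarded by that path's condition (a disjunction)."""
    return [(cond, {"y": {value}}) for cond, value in paths]
```

Only the path-sensitive result can tell that `y` is positive whenever `x > 0`. The price is that the number of disjuncts can grow with every branch.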
212 |
213 | ## Context-Sensitive / Interprocedural Analysis
214 |
215 | * An analysis that takes the calling context into account.
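216 | * Context sensitivity applies to inter-procedural analysis (analysis across function calls), which is why the two terms often appear together.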
217 |
218 | 
219 |
220 | _Speaker Notes:_
221 |
222 | In the context-insensitive analysis we analyze each function once, independently of the calling context (we don’t care where the function was called from or what the state was at the point of invocation). Thus `f()` will be analyzed once, with no context, forcing the analysis to assume that the input variable `x` can hold any possible value (a.k.a. top, `T`). Incrementing `T` by 1 results in `T`, which will be the return value of `f()`. This will be assigned to `y` at `L1`, resulting in the very imprecise abstract state on the left.
223 |
224 | A context-sensitive analysis maintains the calling context, i.e., the values that are possible for the argument at the callsite to `f()`. The analysis will carry the `{ y = 0 }` context into `f()` and plug it into the input argument `x`, resulting in a `{ x = 1 }` state for `f()`, which in turn will be assigned to `y` at `L1`, resulting in the state on the right. This may remind you of inlining performed by compilers, and indeed it operates in a similar way.
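A hedged Python sketch of the two behaviors follows, using intervals and a hypothetical `f(x)` that returns `x + 1`. The insensitive version ignores the caller's argument and analyzes `f()` with ⊤. The sensitive version carries the caller's `{ y = 0 }` into the callee.

```python
import math
from typing import Tuple

Interval = Tuple[float, float]
TOP: Interval = (-math.inf, math.inf)


def analyze_f(x: Interval) -> Interval:
    """Abstract semantics of a hypothetical f(x) that returns x + 1."""
    return (x[0] + 1, x[1] + 1)


def call_context_insensitive(arg: Interval) -> Interval:
    """f() is summarized once for all callers, so the caller's argument is ignored."""
    return analyze_f(TOP)


def call_context_sensitive(arg: Interval) -> Interval:
    """The caller's abstract argument is carried into f(), much like inlining."""
    return analyze_f(arg)


y: Interval = (0, 0)                # { y = 0 } just before the call at L1
print(call_context_insensitive(y))  # (-inf, inf)
print(call_context_sensitive(y))    # (1, 1)
```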
225 |
226 | ## k-CFA
227 |
228 | * k-CFA is a property of context-sensitive analyses.
229 | * CFA stands for Control Flow Analysis.
230 | * It basically describes the size of the context maintained throughout the analysis.
231 | * For interprocedural callsite-based analysis, a 1-CFA analysis maintains the context of the caller when starting the analysis of a function (the callee).
232 | * The context is the state of the caller, at the point where the function was invoked.
233 | * 2-CFA includes the state of the caller, and the caller of that caller.
234 | * Etc.
235 |
236 | Note: CFA is a misnomer (since control relates to branching in general and not necessarily function calls), but it stuck.
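A minimal Python sketch of a k-limited call string: on each call the call site is appended and only the k most recent sites are kept. The labels such as `"goo:L3"` are hypothetical. They mirror the `goo()` → `foo()` → `hey()` chain in the call-site sensitivity example. `push` is an illustrative name, not an API of any tool.

```python
from typing import Tuple

CallString = Tuple[str, ...]  # call sites, most recent last


def push(context: CallString, call_site: str, k: int) -> CallString:
    """Enter a callee from call_site, keeping only the k most recent call sites."""
    if k == 0:
        return ()  # 0-CFA: no calling context at all (context-insensitive)
    return (context + (call_site,))[-k:]


for k in (0, 1, 2):
    ctx_foo = push((), "goo:L3", k)       # goo() calls foo() at L3
    ctx_hey = push(ctx_foo, "foo:L2", k)  # foo() calls hey() at L2
    print(k, ctx_hey)
# 0 ()
# 1 ('foo:L2',)
# 2 ('goo:L3', 'foo:L2')
```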
237 |
238 | ## Call Site Sensitive Analysis
239 |
240 | * A kind of context-sensitive analysis that retains context by call sites.
241 |
242 |
243 |
244 |
245 |
246 | _Speaker Notes:_
247 |
248 | This is one kind of context-sensitive analysis.
249 |
250 | The analysis remembers the (chain of) call site(s) through which the currently analyzed function was reached, and carries context, in the form of program state, from that call location to the entry point of the called function.
251 |
252 | The program state captioned as “1-CFA” is the result of a callsite-sensitive analysis with a callsite chain of length 1. The analysis knows that it arrived at `L1` in `hey()` from `L2` in `foo()`, and therefore uses what is known about the state of `foo()` at `L2` for the input variable `i`. Since not much is known (`x` is an argument to `foo()` and is passed directly to `hey()`), the state maintains that the value of result is the value of `x` from `L2`, plus 1.
253 |
254 | The state captioned as “2-CFA” is the result of a 2-callsite-length chain analysis. This means that the analysis tracks the state of the calling function, as well as the state of the caller to the calling function. In the example, the state shown is for `hey()` at location `L1`, that was called from `L2` (in `foo()`), which in turn was called from `L3` (in `goo()`).
255 |
256 | As you may imagine, the longer the call chain for the k-CFA analysis, the (exponentially) more expensive it is. One of the defining papers on efficient interprocedural analysis is *Precise interprocedural dataflow analysis via graph reachability*.
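To see where the exponential cost comes from, this hypothetical Python sketch enumerates all call strings of length at most k that can reach a function. It uses a small call graph where every function is called from two call sites. Under that assumption the number of contexts doubles with each extra level of k, until the chain reaches `main()`, which has no callers.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical call graph: callee -> list of (caller, call-site label).
CALLERS: Dict[str, List[Tuple[str, str]]] = {
    "leaf": [("mid", "mid:1"), ("mid", "mid:2")],
    "mid": [("top", "top:1"), ("top", "top:2")],
    "top": [("main", "main:1"), ("main", "main:2")],
    "main": [],
}


def contexts(fn: str, k: int) -> Set[Tuple[str, ...]]:
    """All call strings of length <= k (most recent call site last) that reach fn."""
    if k == 0 or not CALLERS[fn]:
        return {()}
    result: Set[Tuple[str, ...]] = set()
    for caller, site in CALLERS[fn]:
        for prefix in contexts(caller, k - 1):
            result.add(prefix + (site,))
    return result


print([len(contexts("leaf", k)) for k in range(5)])  # [1, 2, 4, 8, 8]
```

A k-CFA analysis may need to analyze `leaf()` once per context, so its work grows with these counts.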
257 |
258 | ## Object Sensitive Analysis
259 |
260 | * A kind of context-sensitive analysis that retains context by objects/allocation sites.
261 |
262 |
263 |
264 |
265 |
266 | _Speaker Notes:_
267 |
268 | A different approach for maintaining context in object-oriented code is object sensitivity. Instead of maintaining a chain of states for callsites, we maintain a chain of calling (receiver) objects and their states.
269 |
270 | Objects are almost always identified by their Allocation Sites (abbreviated as AS).
271 |
272 | In the example we have two objects of type `A`, identified by their two allocation sites `AS1` and `AS2`. The state shown on the left is the exit state for `foo()`. You can see that the state is composed of two disjuncts: one for when the object is `a1`, allocated at `AS1`, and one for `a2`, allocated at `AS2`.
273 |
274 | A 2-CFA-style object-sensitive analysis would maintain a chain of two calling objects (for instance, if we had a class `C` that allocated `B` objects and invoked `bar()`).
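A hedged Python sketch of the idea follows. It uses a hypothetical class `A` with a field `v` and a method `get()` returning `this.v`, and two objects allocated at `AS1` and `AS2`. The object-insensitive analysis analyzes `get()` once for any receiver. The object-sensitive one uses the receiver's allocation site as the context.

```python
from typing import Dict, Set

# Hypothetical abstract heap: allocation site -> possible values of field `v`.
HEAP: Dict[str, Set[int]] = {
    "AS1": {1},  # a1 = new A(1)
    "AS2": {2},  # a2 = new A(2)
}


def analyze_get(receivers: Set[str]) -> Set[int]:
    """Abstract semantics of a hypothetical method A.get() { return this.v; }."""
    values: Set[int] = set()
    for site in receivers:
        values |= HEAP[site]
    return values


def object_insensitive() -> Set[int]:
    """get() is analyzed once, with `this` possibly being any A object."""
    return analyze_get(set(HEAP))


def object_sensitive() -> Dict[str, Set[int]]:
    """get() is analyzed once per receiver allocation site (the context)."""
    return {site: analyze_get({site}) for site in HEAP}
```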
275 |
276 | ## Hybrid Callsite-Object Sensitive Analysis
277 |
278 | * Combines both worlds: more precise, but also more expensive.
279 |
280 |
281 |
282 |
283 |
284 | _Speaker Notes:_
285 |
286 | Note: Object sensitivity may be expressed via callsite sensitivity by treating the object as an argument for the invoked call.
287 |
288 | ## Field-Sensitive Analysis
289 |
290 | * An analysis that distinguishes different fields of an object.
291 |
292 | 
293 |
294 | _Speaker Notes:_
295 |
296 | It may be hard to imagine, but some analyses are able to scale better (be faster) by treating all the fields of an object as the same field. This is called field-insensitivity.
297 |
298 | The state on the left is the result of a flow-insensitive, field-insensitive analysis. All the fields of the objects are indistinguishable and marked as `*`. The sub-state pertaining to `AS1` comes from the `a.x = 1` assignment, and the `AS2` substate comes from `a.y = 2`.
299 |
300 | The state on the right is both flow and field sensitive.
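A minimal Python sketch of the two heap abstractions, assuming hypothetical writes `a.x = 1; a.y = 2` to a single object allocated at `AS1`. Both versions accumulate values and are flow-insensitive. They differ only in whether the field name is part of the heap key.

```python
from typing import Dict, List, Set, Tuple

Write = Tuple[str, str, int]            # (allocation site, field name, value)
Heap = Dict[Tuple[str, str], Set[int]]  # (allocation site, field) -> possible values

# Hypothetical writes: a = new A() at AS1; a.x = 1; a.y = 2
WRITES: List[Write] = [("AS1", "x", 1), ("AS1", "y", 2)]


def field_insensitive(writes: List[Write]) -> Heap:
    """Collapse all fields of an object into one summary field named '*'."""
    heap: Heap = {}
    for site, _field, value in writes:
        heap.setdefault((site, "*"), set()).add(value)
    return heap


def field_sensitive(writes: List[Write]) -> Heap:
    """Keep a separate entry for every (object, field) pair."""
    heap: Heap = {}
    for site, field, value in writes:
        heap.setdefault((site, field), set()).add(value)
    return heap
```

The field-insensitive heap has one entry per object instead of one per field. This is where the scalability gain comes from, at the cost of reporting that `a.x` might be 2.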
301 |
302 |
303 |
304 |
305 |
306 |
307 |
308 |
309 |
310 |
311 |
312 |
313 |
--------------------------------------------------------------------------------
/Slides/12.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nimrodpar/ProgramAnalysisGlossary/e0895a3cd46c1fea9380f12300bde63edf42fe31/Slides/12.png
--------------------------------------------------------------------------------
/Slides/13.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nimrodpar/ProgramAnalysisGlossary/e0895a3cd46c1fea9380f12300bde63edf42fe31/Slides/13.png
--------------------------------------------------------------------------------
/Slides/14.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nimrodpar/ProgramAnalysisGlossary/e0895a3cd46c1fea9380f12300bde63edf42fe31/Slides/14.png
--------------------------------------------------------------------------------
/Slides/15.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nimrodpar/ProgramAnalysisGlossary/e0895a3cd46c1fea9380f12300bde63edf42fe31/Slides/15.png
--------------------------------------------------------------------------------
/Slides/16.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nimrodpar/ProgramAnalysisGlossary/e0895a3cd46c1fea9380f12300bde63edf42fe31/Slides/16.png
--------------------------------------------------------------------------------
/Slides/17.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nimrodpar/ProgramAnalysisGlossary/e0895a3cd46c1fea9380f12300bde63edf42fe31/Slides/17.png
--------------------------------------------------------------------------------
/Slides/18.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nimrodpar/ProgramAnalysisGlossary/e0895a3cd46c1fea9380f12300bde63edf42fe31/Slides/18.png
--------------------------------------------------------------------------------
/Slides/19.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nimrodpar/ProgramAnalysisGlossary/e0895a3cd46c1fea9380f12300bde63edf42fe31/Slides/19.png
--------------------------------------------------------------------------------
/Slides/21.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nimrodpar/ProgramAnalysisGlossary/e0895a3cd46c1fea9380f12300bde63edf42fe31/Slides/21.png
--------------------------------------------------------------------------------
/Slides/22.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nimrodpar/ProgramAnalysisGlossary/e0895a3cd46c1fea9380f12300bde63edf42fe31/Slides/22.png
--------------------------------------------------------------------------------
/Slides/23.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nimrodpar/ProgramAnalysisGlossary/e0895a3cd46c1fea9380f12300bde63edf42fe31/Slides/23.png
--------------------------------------------------------------------------------
/Slides/24.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nimrodpar/ProgramAnalysisGlossary/e0895a3cd46c1fea9380f12300bde63edf42fe31/Slides/24.png
--------------------------------------------------------------------------------
/Slides/4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nimrodpar/ProgramAnalysisGlossary/e0895a3cd46c1fea9380f12300bde63edf42fe31/Slides/4.png
--------------------------------------------------------------------------------
/Slides/5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nimrodpar/ProgramAnalysisGlossary/e0895a3cd46c1fea9380f12300bde63edf42fe31/Slides/5.png
--------------------------------------------------------------------------------
/Slides/6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nimrodpar/ProgramAnalysisGlossary/e0895a3cd46c1fea9380f12300bde63edf42fe31/Slides/6.png
--------------------------------------------------------------------------------
/Slides/7.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nimrodpar/ProgramAnalysisGlossary/e0895a3cd46c1fea9380f12300bde63edf42fe31/Slides/7.png
--------------------------------------------------------------------------------
/Slides/8.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nimrodpar/ProgramAnalysisGlossary/e0895a3cd46c1fea9380f12300bde63edf42fe31/Slides/8.png
--------------------------------------------------------------------------------