# C-compiler-optimizations

## Branch Optimization

### If optimization

```c
void f (int *p)
{
    if (p) g(1);
    if (p) g(2);
    return;
}
```

Because `p` is a local parameter whose address is never taken, `g()` cannot change it. Both tests therefore always give the same result, and the code can be replaced by

```c
void f (int *p)
{
    if (p) {
        g(1);
        g(2);
    }
    return;
}
```

### Value Range Optimization

```c
for (int i = 1; i < 100; i++) {
    if (i)
        g();
}
```

The ```if``` test can be eliminated. `i` only takes the values 1 through 99, so it is never zero and `if (i)` is always true.

```c
for (int i = 1; i < 100; i++) {
    g();
}
```

### Branch elimination

```c
goto L1;      // jumps to a label that only jumps again
// do something

L1:
goto L2;
```

The code at `L1` does nothing except jump on to `L2`, so the first branch can be retargeted to `goto L2;` directly. This removes one unconditional jump.

### Unswitching

A loop may test a condition that does not change from one iteration to the next (here `x`). In that case the ```if``` can be moved out of the loop and the loop duplicated in each branch, so the condition is tested once instead of on every iteration.

```c
for (int i = 0; i < N; i++)
    if (x)
        a[i] = 0;
    else
        b[i] = 0;
```

```c
if (x)
    for (int i = 0; i < N; i++)
        a[i] = 0;
else
    for (int i = 0; i < N; i++)
        b[i] = 0;
```

### Tail Recursion

A tail-recursive call is a recursive call whose result is returned directly. It can be replaced by a ```goto``` back to the start of the function, which reuses the current stack frame instead of pushing a new one for every call.
Let's take a simple recursive function:

```c
int f (int i)
{
    if (i > 0) {
        g(i);
        return f(i - 1);
    }
    else
        return 0;
}
```

Now let's optimize it:

```c
int f (int i)
{
entry:
    if (i > 0) {
        g(i);
        i--;
        goto entry;
    }
    else
        return 0;
}
```

### Try/Catch block optimization

Try/catch blocks whose body can never throw an exception can be optimized, because the handler is unreachable. (C has no exceptions, so this example is written in Java.)

```java
try {
    int i = 1;
} catch (Exception e) {
    // some code
}
```

This can be turned into ```int i = 1;```

## Loop Optimizations

### Loop unrolling

Loop unrolling replicates the loop body. Fewer iterations are then executed, which means fewer compare-and-branch steps. Given

```c
for (int i = 0; i < 100; i++) {
    g();
}
```

the loop can be unrolled by a factor of 2. This is exact here because 100 is divisible by 2:

```c
for (int i = 0; i < 100; i += 2) {
    g();
    g();
}
```

Larger unroll factors are also possible. When the trip count is not a multiple of the factor, the leftover iterations are handled separately.

### Loop Collapsing

```c
int a[100][300];
int i, j;

for (i = 0; i < 300; i++)
    for (j = 0; j < 100; j++)
        a[j][i] = 0;
```

Nested loops can be collapsed into a single loop. Its index runs from 0 to the product of the trip counts, here 100 * 300 = 30000. Every iteration simply stores 0 into a distinct element, so the elements can also be visited in memory order:

```c
int a[100][300];
int *p = &a[0][0];
int i;

for (i = 0; i < 30000; i++)
    *p++ = 0;
```

### Loop fusion

Two adjacent loops over the same iteration range can be fused into a single loop, which halves the loop-control overhead:

```c
for (i = 0; i < 300; i++)
    a[i] = a[i] + 3;

for (i = 0; i < 300; i++)
    b[i] = b[i] + 4;
```

```c
for (i = 0; i < 300; i++) {
    a[i] = a[i] + 3;
    b[i] = b[i] + 4;
}
```

### Forward store

A store to a global variable inside a loop can be moved out of the loop. The running value is kept in a local variable, ideally a register, and written to the global once after the loop finishes:

```c
int sum;
int a[100];

void f (void)
{
    int i;

    sum = 0;
    for (i = 0; i < 100; i++)
        sum += a[i];
}
```

```c
int sum;
int a[100];

void f (void)
{
    int i;
    register int t;

    t = 0;
    for (i = 0; i < 100; i++)
        t += a[i];
    sum = t;
}
```

## Access pattern optimization

### Volatile Optimization

The ```volatile``` keyword declares objects whose value may change in ways the compiler cannot see. Examples include a memory-mapped hardware register and a variable written by a signal handler. Every read and write of such an object must actually be performed.

```c
#define SIZE 100

volatile int x, y;
int a[SIZE];

void f (void)
{
    int i;
    for (i = 0; i < SIZE; i++)
        a[i] = x + y;
}
```

You might think that the computation of ```x + y``` could be hoisted out of the loop:

```c
#define SIZE 100

volatile int x, y;
int a[SIZE];

void f (void)
{
    int i;
    int temp = x + y;
    for (i = 0; i < SIZE; i++)
        a[i] = temp;
}
```

However, ```x``` and ```y``` are volatile. Their values may differ on every iteration, and each read must really happen. This transformation is therefore incorrect, and compilers will not perform it.

### Quick Optimization

A field that is read inside a loop can be cached in a temporary variable when nothing in the loop can change it. In the fragment below, `obj.i` is loaded on every iteration even though its value does not change inside the loop:

```java
{
    for (i = 0; i < 10; i++)
        arr[i] = obj.i + volatile_var;
}
```

Below is the code fragment after Quick Optimization. `obj.i` is read once, while `volatile_var` is still read on every iteration.
```java
{
    t = obj.i;
    for (i = 0; i < 10; i++)
        arr[i] = t + volatile_var;
}
```

### printf Optimization

A call to ```printf()``` whose format string is just ```"%s"``` does no real formatting work. It only copies the string to standard output.

```c
#include <stdio.h>

void f (char *s)
{
    printf ("%s", s);
}
```

The compiler can therefore replace it with a call to the simpler ```fputs()```, which writes the string without parsing a format string at run time:

```c
#include <stdio.h>

void f (char *s)
{
    fputs (s, stdout);
}
```

### Dead code elimination

Code whose result is never used is removed:

```c
int f (int x)
{
    int i = 1;   // i is never read, so this line is removed
    return x;
}
```

### Constant Propagation/Constant folding

```c
int x = 3;
int y = 4 + x;    // replaced by y = 7

return (x + y);   // replaced by return 10
```

### Instruction combining

Below is a simple case. Loop unrolling often exposes more opportunities for instruction combining.

```c
i++;
i++;

// combined into
i += 2;
```

### Narrowing

Sometimes the range of a narrow type already determines the result of an expression. In that case the expression can be replaced by a constant. Assuming a 16-bit ```unsigned short``` and a 32-bit ```int```:

```c
unsigned short int s;

(s >> 20)       /* all 16 bits have been shifted out, thus 0 */
(s > 0x10000)   /* a 16-bit value is at most 0xFFFF, so it can't exceed 0x10000, thus 0 */
(s == -1)       /* s is promoted to a non-negative int, so it can't equal -1, thus 0 */
```

### Integer Multiply

This is a well-known one: multiplication by a constant power of two can be replaced by a left shift. Given

```c
unsigned f (unsigned i)
{
    return i * 8;
}
```

the multiplication can be replaced by a left shift of log2(8) = 3 bits:

```c
unsigned f (unsigned i)
{
    return i << 3;
}
```

The example uses ```unsigned``` because in C, left-shifting a negative signed value is undefined. The compiler performs the same rewrite on the machine-level multiply, where this restriction does not apply.

### Integer mod optimization

This is another well-known one. Integer division, and therefore the ```%``` operator, is expensive on most hardware.
```c
int f (int x)
{
    return x % 8;
}
```

Hacker's Delight is a wonderful book that is encyclopedic in its treatment of bit tricks such as the one below. When the divisor is a constant power of two (here 8), the remainder can be computed with a mask. The extra test for negative ```x``` preserves C's rule that the remainder takes the sign of the dividend (assuming two's complement):

```c
int f (int x)
{
    int temp = x & (8 - 1);
    return (x < 0) ? ((temp == 0) ? 0 : (temp | ~(8 - 1))) : temp;
}
```

### Block Merging

Suppose you had the following code fragment:

```c
int a;
int b;

void f (int x, int y)
{
    goto L1;        /* basic block 1 */

L2:                 /* basic block 2 */
    b = x + y;
    goto L3;

L1:                 /* basic block 3 */
    a = x + y;
    goto L2;

L3:                 /* basic block 4 */
    return;
}
```

Each block is entered only through a single unconditional jump from the block before it. The jumps can therefore be removed and the blocks merged into one:

```c
int a;
int b;

void f (int x, int y)
{
    a = x + y;      /* basic block 1 */
    b = x + y;
    return;
}
```

### Common SubExpression

In the merged code above, ```x + y``` is computed twice. It can be computed once and the result reused:

```c
int tmp = x + y;
a = tmp;
b = tmp;
return;
```

### Function inlining

Replacing a function call with the body of the called function removes the call overhead, and it often exposes further optimizations.

Suppose we wish to implement a subtraction function given a working addition function:

```c
int add (int x, int y)
{
    return x + y;
}

int sub (int x, int y)
{
    return add (x, -y);
}
```

Expanding ```add()``` at the call site in ```sub()``` yields:

```c
int sub (int x, int y)
{
    return x + -y;
}
```

which can be further optimized to:

```c
int sub (int x, int y)
{
    return x - y;
}
```
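Inlining pays off most when the arguments are constants, because the inlined body can then be folded at compile time. The sketch below writes out by hand the steps a compiler might take. The helper ```square()``` and the function names are hypothetical and chosen only for illustration. Real compilers perform these steps on their internal representation, not on C source.

```c
#include <assert.h>

/* Hypothetical helper, called below with a constant argument. */
static int square (int x)
{
    return x * x;
}

/* Original: the call hides the fact that the result is a constant. */
int f (void)
{
    return square (3) + 1;
}

/* Step 1, inlining: the body of square() replaces the call, with x = 3. */
int f_inlined (void)
{
    return 3 * 3 + 1;
}

/* Step 2, constant folding: 3 * 3 + 1 is evaluated at compile time. */
int f_folded (void)
{
    return 10;
}

/* Checks that every step preserves the result. */
void check_inlining (void)
{
    assert (f () == 10);
    assert (f_inlined () == f ());
    assert (f_folded () == f ());
}
```

Calling ```check_inlining()``` from ```main()``` returns normally, since all three versions compute 10.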