├── README.md
├── tr.md
└── zen-circle.jpg


/README.md:
--------------------------------------------------------------------------------
## Code Optimization Methods

*A summary of various code optimization methods*

### Translations

* [Turkish](/tr.md) (by [@umutphp](https://github.com/umutphp))


### Contents

* [General Principles](#general-principles)
* [Low-level](#low-level)
* [Language-dependent optimization](#language-dependent-optimization)
* [Language-independent optimization](#language-independent-optimization)
* [Databases](#databases)
* [Web](#web)
* [References](#references)


__A note on the relation between Computational Complexity of Algorithms and Code Optimization Techniques__

Both computational complexity theory ^{[2](#r2 "Computational complexity theory, wikipedia")} and code optimization techniques ^{[1](#r1 "Code optimization, wikipedia")} share the common goal of solving problems efficiently. While they are related and share some concepts, they differ in what each one emphasizes.

Computational complexity theory studies performance with respect to input data size. It tries to design algorithmic solutions whose cost grows as slowly as possible with data size, regardless of the underlying architecture. Code optimization techniques, on the other hand, focus on the architecture and on the specific constants which enter those computational complexity estimations.

Systems that operate in real time are examples where both factors can be critical (eg. [Real-time Image Processing in JavaScript](https://github.com/foo123/FILTER.js), yeah I know I am the author :) ).


### General Principles


* __Keep it `DRY` and Cache__ : The general concept of caching involves avoiding re-computation/re-loading of a result unless necessary. This can be seen as a variation of the Don't Repeat Yourself principle ^{[3](#r3 "DRY principle, wikipedia")} . Even dynamic programming can be seen as a variation of caching, in the sense that it stores intermediate results, saving re-computation time and resources.

* __`KISS` it, simpler can be faster__ : Keeping it simple ^{[4](#r4 "KISS principle, wikipedia")} makes various other techniques easier to apply and modify. (A plethora of Software Engineering methods can help with this ^{[34](#r34 "Software development philosophies, wikipedia"), [35](#r35 "97 Things every programmer should know"), [55](#r55 "Stream processing, wikipedia"), [56](#r56 "Dataflow programming, wikipedia")} )

* __Sosi ample free orginizd, So simple if re-organized__ : Don't hesitate to re-organize if needed. Many times something can be re-organized or re-structured in a much simpler / faster way while retaining its original functionality (concept of isomorphism ^{[22](#r22 "Isomorphism, wikipedia")} , change of representation ^{[23](#r23 "Representation, wikipedia"), [24](#r24 "Symmetry, wikipedia"), [36](#r36 "Data structure, wikipedia")}), and provide other advantages as well. For example, the expression `(10+5*2)^2` is the simple constant `400`; another example is the transformation from `infix` expression notation to `prefix` (Polish) notation, which can be parsed faster in one pass.

* __Divide into subproblems and Conquer the solution__ : Subproblems (or smaller, trivial, special cases) can be easier/faster to solve and then combine into the global solution. Sorting algorithms are great examples of that ^{[5](#r5 "Divide and conquer algorithm, wikipedia"), [sorting algorithms](https://github.com/foo123/SortingAlgorithms)}

* __More helping hands are always welcome__ : Spread the workload, subdivide, share, parallelize if possible ^{[6](#r6 "Parallel computation, wikipedia"), [54](#r54 "Heterogeneous computing, wikipedia"), [68](#r68 "A Practical Wait-Free Simulation for Lock-Free Data Structures"), [69](#r69 "A Highly-Efficient Wait-Free Universal Construction"), [70](#r70 "A Methodology for Creating Fast Wait-Free Data Structures")} .

* __United we Stand and Deliver__ : Having data together in a contiguous chunk, instead of scattered around here and there, makes it faster to load and process as a single block, instead of (fetching and) accessing many smaller chunks (eg. cache memory, vector/pipeline machines, database queries) ^{[51](#r51 "Locality of reference, wikipedia"), [52](#r52 "Memory access pattern, wikipedia"), [53](#r53 "Memory hierarchy, wikipedia")}.

* __A little Laziness never hurt anyone__ : So true, each time a program is executed, only some of its data and functionality are used. Delaying the loading and initialization (being lazy) of all the data and functionality until needed can go a long way ^{[7](#r7 "Lazy load, wikipedia")} .


__Further Notes__

Before trying to optimize, one has to measure and identify what needs to be optimized, if anything. Blind "optimization" can be as good as no optimization at all, if not worse.

That being said, one should always try to optimize and produce efficient solutions. A non-efficient "solution" can be as good as no solution at all, if not worse.

**Pre-optimisation is perfectly valid given pre-knowledge**. For example, knowing that instantiating a whole `class` or `array` is slower than just returning an `integer` with the appropriate information.
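
As a rough illustration of the measure-first advice above, the sketch below times two candidate implementations before deciding which one (if any) deserves attention. The helper `timeIt` and the functions `sumLoop` and `sumFormula` are hypothetical names introduced only for this example, and the absolute timings depend on the engine and the machine.

```javascript
// Hypothetical helper: average wall-clock time (in ms) of fn over `runs` calls.
function timeIt(fn, runs)
{
    // use the high-resolution timer when available, else fall back to Date.now()
    const now = (typeof performance !== 'undefined' && typeof performance.now === 'function')
        ? () => performance.now()
        : () => Date.now();
    const start = now();
    for (let r = 0; r < runs; r++) fn();
    return (now() - start) / runs;
}

// Two hypothetical candidates computing the same result: 0 + 1 + ... + (n-1)
function sumLoop(n)
{
    let s = 0;
    for (let i = 0; i < n; i++) s += i;
    return s;
}
function sumFormula(n)
{
    return n * (n - 1) / 2; // closed form of the same sum
}

// measure first, then decide what is worth optimizing
const tLoop = timeIt(() => sumLoop(100000), 50);
const tFormula = timeIt(() => sumFormula(100000), 50);
console.log('loop: ' + tLoop + ' ms, formula: ' + tFormula + ' ms');
```

Averaging over several runs is a deliberate choice here, since a single run is easily distorted by warm-up and timer resolution.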

Some of the optimization techniques can be automated (eg in compilers), while others are better handled manually.

Sometimes there is a trade-off between space and time resources. Increasing speed might result in increased space/memory requirements (__caching__ is a classic example of that).

The `90-10` (or `80-20` or other variations) rule of thumb states that __`90` percent of the time__ is spent on __`10` percent of the code__ (eg a loop). Optimizing this part of the code can result in great benefits. (see for example Knuth ^{[8](#r8 "An empirical study of Fortran programs")} )

One optimization technique (eg simplification) can lead to the application of another optimization technique (eg constant substitution), and this in turn can lead back to the further application of the first optimization technique (or others). Doors can open.


__References:__ [9](#r9 "Compiler optimizations, wikipedia"), [11](#r11 "Compiler Design Theory"), [12](#r12 "The art of compiler design - Theory and Practice"), [46](#r46 "Optimisation techniques"), [47](#r47 "Notes on C Optimisation"), [48](#r48 "Optimising C++"), [49](#r49 "Programming Optimization"), [50](#r50 "CODE OPTIMIZATION - USER TECHNIQUES")


### Low-level


#### Generalities^{[44](#r44 "What Every Programmer Should Know About Floating-Point Arithmetic"), [45](#r45 "What Every Computer Scientist Should Know About Floating-Point Arithmetic")}

__Data Allocation__

* Disk access is slow (Network access is even slower)
* Main Memory access is faster than disk
* CPU Cache Memory (if it exists) is faster than main memory
* CPU Registers are fastest


__Binary Formats__

* Double precision arithmetic is slow
* Single precision floating point arithmetic is faster than double precision
* Long integer arithmetic is faster than floating-point
* Short Integer, fixed-point arithmetic is faster than long
arithmetic
* Bitwise arithmetic is fastest


__Arithmetic Operations__

* Exponentiation is slow
* Division is faster than exponentiation
* Multiplication is faster than division
* Addition/Subtraction is faster than multiplication
* Bitwise operations are fastest


#### Methods


* __Register allocation__ : Since registers are the fastest way to access heavily used data, it is desirable (eg in compilers, real-time systems) to allocate some data optimally in the cpu registers during a heavy-load operation. There are various algorithms (based on the graph coloring problem) which provide an automated way for this kind of optimization. Other times a programmer can explicitly declare a variable to be kept in the cpu registers during some part of an operation ^{[10](#r10 "Register allocation, wikipedia")}


* __Single Atom Optimizations__ : This involves various operations which optimize one cpu instruction (atom) at a time. For example, some operands in an instruction can be constants, so their values can be substituted for the variables. Another example is replacing exponentiation to the power of `2` with a multiplication, etc..


* __Optimizations over a group of Atoms__ : Similar to the previous one, this kind of optimization involves examining the control flow over a group of cpu instructions and re-arranging them so that the functionality is retained, while using simpler/fewer instructions. For example, a complex `IF THEN` logic, depending on parameters, can be simplified to a single `Jump` statement, and so on.


### Language-dependent optimization

* Check carefully the **documentation and manual** for the underlying mechanisms the language uses to implement specific features and operations, and use them to estimate the cost of a given piece of code and of the alternatives provided.



### Language-independent optimization


* __Re-arranging Expressions__ : More efficient code for the evaluation of an expression (or the computation of a process) can often be produced if the operations occurring in the expression are evaluated in a different order. This works because re-arranging the expression/operations changes what gets added or multiplied to what, including the relative number of additions and multiplications, and thus the (overall) relative (computational) cost of each operation. In fact, this is not restricted to arithmetic operations; it applies to any operations whatsoever, using symmetries of the process/operators (eg commutative laws, associative laws and distributive laws, when they indeed hold, are actually examples of arithmetic operator symmetries) to re-arrange and produce the same result while having other advantages. That is it, so simple. Classic examples are Horner's Rule ^{[13](#r13 "Horner rule, wikipedia")}, Karatsuba Multiplication ^{[14](#r14 "Karatsuba algorithm, wikipedia")}, fast complex multiplication ^{[15](#r15 "Fast multiplication of complex numbers")}, fast matrix multiplication ^{[18](#r18 "Strassen algorithm, wikipedia"), [19](#r19 "Coppersmith-Winograd algorithm, wikipedia")}, fast exponentiation ^{[16](#r16 "Exponentiation by squaring, wikipedia"), [17](#r17 "Fast Exponentiation")}, fast gcd computation ^{[78](#r78 "A Binary Recursive Gcd Algorithm")}, fast factorials/binomials ^{[20](#r20 "Comments on Factorial Programs"), [21](#r21 "Fast Factorial Functions")}, fast fourier transform ^{[57](#r57 "Fast Fourier transform, wikipedia")}, fast fibonacci numbers ^{[76](#r76 "Fast Fibonacci numbers")}, sorting by merging ^{[25](#r25 "Merge sort, wikipedia")}, sorting by powers ^{[26](#r26 "Radix sort, wikipedia")}.
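
As an illustration of re-arranging expressions, the sketch below evaluates a polynomial first term by term and then with Horner's Rule, which re-groups the same expression so that each coefficient costs one multiplication and one addition, with no exponentiation at all. The function names `polyNaive` and `polyHorner` and the coefficient layout (lowest degree first) are assumptions made for this example.

```javascript
// coefficients are stored lowest degree first: c[0] + c[1]*x + c[2]*x^2 + ...
function polyNaive(c, x)
{
    let sum = 0;
    for (let k = 0; k < c.length; k++)
    {
        sum += c[k] * Math.pow(x, k); // one exponentiation per term
    }
    return sum;
}

// Horner's Rule: c[0] + x*(c[1] + x*(c[2] + ...))
function polyHorner(c, x)
{
    let sum = 0;
    for (let k = c.length - 1; k >= 0; k--)
    {
        sum = sum * x + c[k]; // one multiplication and one addition per coefficient
    }
    return sum;
}

// 1 + 2x + 3x^2 evaluated at x = 2, both ways
console.log(polyNaive([1, 2, 3], 2), polyHorner([1, 2, 3], 2));
```

Both functions compute the same polynomial; only the order in which the operations are grouped differs.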


* __Constant Substitution/Propagation__ : Many times an expression evaluates, in all cases, to a single constant; the constant value can then replace the more complex and slower expression (sometimes compilers do that).


* __Inline Function/Routine Calls__ : Calling a function or routine involves many operations on the part of the cpu: it has to push the current program state onto the stack and branch to another location, and then do the reverse procedure. This can be slow when used inside heavy-load operations; inlining the function body can be much faster without all this overhead. Sometimes compilers do that, other times a programmer can declare or annotate a function as `inline` explicitly. ^{[27](#r27 "Function inlining, wikipedia")}


* __Combining Flow Transfers__ : `IF/THEN` instructions and logic are, in essence, cpu `branch` instructions. Branch instructions involve changing the program `pointer` and going to a new location. This can be slow if many `jump` instructions are used. However, re-arranging the `IF/THEN` statements (factorizing common code, using De Morgan's rules for logic simplification etc..) can result in *isomorphic* functionality with fewer and more efficient logic and, as a result, fewer and more efficient `branch` instructions.


* __Dead Code Elimination__ : Most times compilers can identify code that is never accessed and remove it from the compiled program. However, not all cases can be identified. Using the previous simplification schemes, the programmer can more easily identify "dead code" (never accessed) and remove it. An alternative approach to "dead-code elimination" is "live-code inclusion" or "tree-shaking" techniques.
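
A small sketch of how constant substitution can expose dead code: once the constant `DEBUG` is propagated, the condition is always false and the guarded branch can never run, so it can be removed. `DEBUG`, `hoursToMsBefore` and `hoursToMsAfter` are hypothetical names used only for this example; the second function is what the first reduces to when simplified by hand.

```javascript
const DEBUG = false; // hypothetical constant flag

// before: a constant sub-expression and a branch guarded by a constant
function hoursToMsBefore(hours)
{
    if (DEBUG)
    {
        console.log('converting', hours); // dead code: DEBUG is always false
    }
    return hours * (60 * 60 * 1000);
}

// after: 60 * 60 * 1000 replaced by its value 3600000, dead branch removed
function hoursToMsAfter(hours)
{
    return hours * 3600000;
}

console.log(hoursToMsBefore(2), hoursToMsAfter(2));
```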


* __Common Subexpressions__ : This optimization involves identifying subexpressions which are common to various parts of the code, evaluating them only once and using the value in all subsequent places (sometimes compilers do that).


* __Common Code Factorisation__ : Many times the same block of code is present in different branches, for example when the program has to do some common functionality and then something else depending on some parameter. This common code can be factored out of the branches, thus eliminating unneeded redundancy, latency and size.


* __Strength Reduction__ : This involves transforming an operation (eg an expression) into an equivalent one which is faster. Common cases involve replacing `exponentiation` with `multiplication` and `multiplication` with `addition` (eg inside a loop). This technique can result in great efficiency, stemming from the fact that simpler but equivalent operations are several cpu cycles faster (usually implemented in hardware) than their more complex equivalents (usually implemented in software) ^{[28](#r28 "Strength reduction, wikipedia")}


* __Handling Trivial/Special Cases__ : Sometimes a complex computation has some trivial or special cases which can be handled much more efficiently by a reduced/simplified version of the computation (eg computing `a^b` can handle the special cases `b=0,1,2` by a simpler method). Trivial cases occur with some frequency in applications, so simplified special case code can be quite useful. ^{[42](#r42 "Three optimization tips for C"), [43](#r43 "Three optimization tips for C, slides")} . Similar to this is the handling of common/frequent computations (depending on the application) with fine-tuned or faster code, or even by hardcoding results directly.
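
The `a^b` case mentioned above could be sketched as follows: the small exponents `0`, `1` and `2` are handled directly and everything else falls back to the general routine. The name `power` is hypothetical, and `Math.pow` merely stands in for the general (slower) computation.

```javascript
// Hypothetical power function with fast paths for trivial exponents
function power(a, b)
{
    if (b === 0) return 1;      // a^0 = 1
    if (b === 1) return a;      // a^1 = a
    if (b === 2) return a * a;  // a^2 as a single multiplication
    return Math.pow(a, b);      // general case
}

console.log(power(7, 0), power(7, 1), power(7, 2), power(2, 10));
```

Whether the extra comparisons pay off depends on how often the special cases actually occur, so this is worth measuring in the target application.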


* __Exploiting Mathematical Theorems/Relations__ : Sometimes a computation can be performed in an equivalent but more efficient way by using some mathematical theorem, transformation, symmetry ^{[24](#r24 "Symmetry, wikipedia")} or knowledge (eg. Gauss method of solving Systems of Linear equations ^{[58](#r58 "Gaussian elimination, wikipedia")}, Euclidean Algorithm ^{[71](#r71 "Euclidean Algorithm")}, or both ^{[72](#r72 "Gröbner basis")}, Fast Fourier Transforms ^{[57](#r57 "Fast Fourier transform, wikipedia")}, Fermat's Little Theorem ^{[59](#r59 "Fermat's little theorem, wikipedia")}, Taylor-Maclaurin Series Expansions, Trigonometric Identities ^{[60](#r60 "Trigonometric identities, wikipedia")}, Newton's Method ^{[73](#r73 "Newton's Method"),[74](#r74 "Fast Inverse Square Root")}, etc..^{[75](#r75 "Methods of computing square roots")}). This can go a long way. It is good to refresh your mathematical knowledge every now and then.


* __Using Efficient Data Structures__ : Data structures are the counterpart of algorithms (in the space domain); each efficient algorithm needs an associated efficient data structure for the specific task. In many cases using an appropriate data structure (representation) can make all the difference (eg. database designers and search engine developers know this very well) ^{[36](#r36 "Data structure, wikipedia"), [37](#r37 "List of data structures, wikipedia"), [23](#r23 "Representation, wikipedia"), [62](#r62 "Dancing links algorithm"), [63](#r63 "Data Interface + Algorithms = Efficient Programs"), [64](#r64 "Systems Should Automatically Specialize Code and Data"), [65](#r65 "New Paradigms in Data Structure Design: Word-Level Parallelism and Self-Adjustment"), [68](#r68 "A Practical Wait-Free Simulation for Lock-Free Data Structures"), [69](#r69 "A Highly-Efficient Wait-Free Universal Construction"), [70](#r70 "A Methodology for Creating Fast Wait-Free Data Structures"), [77](#r77 "Fast k-Nearest Neighbors (k-NN) algorithm")}



__Loop Optimizations__

Perhaps the most important code optimization techniques are the ones involving loops.


* __Code Motion / Loop Invariants__ : Sometimes code inside a loop is independent of the loop index and can be moved out of the loop and computed only once (it is a loop invariant). This results in the loop doing fewer operations (sometimes compilers do that) ^{[29](#r29 "Loop invariant, wikipedia"), [30](#r30 "Loop-invariant code motion, wikipedia")}

__example:__

```javascript

// a and b are assumed to be existing arrays

// this can be transformed
for (let i=0; i<1000; i++)
{
    const invariant = 100*b[0]+15; // this is a loop invariant, not depending on the loop index etc..
    a[i] = invariant+10*i;
}

// into this
const invariant = 100*b[0]+15; // now this is out of the loop
for (let i=0; i<1000; i++)
{
    a[i] = invariant+10*i; // loop executes fewer operations now
}

```


* __Loop Fusion__ : Sometimes two or more loops can be combined into a single loop, thus reducing the number of test and increment instructions executed.

__example:__

```javascript

// a and b are assumed to be existing arrays

// 2 loops here
for (let i=0; i<1000; i++)
{
    a[i] = i;
}
for (let i=0; i<1000; i++)
{
    b[i] = i+5;
}


// one fused loop here
for (let i=0; i<1000; i++)
{
    a[i] = i;
    b[i] = i+5;
}

```


* __Unswitching__ : Sometimes a loop can be split into two or more loops, of which only one needs to be executed at any time.

__example:__

```javascript

// a, b (arrays) and X, Y are assumed to be defined elsewhere

// only one of the two cases is executed in each iteration
for (let i=0; i<1000; i++)
{
    if (X>Y)  // this is executed every time inside the loop
        a[i] = i;
    else
        b[i] = i+10;
}

// loop split in two here
if (X>Y)  // now executed only once
{
    for (let i=0; i<1000; i++)
    {
        a[i] = i;
    }
}
else
{
    for (let i=0; i<1000; i++)
    {
        b[i] = i+10;
    }
}

```


* __Array Linearization__ : This involves handling a multi-dimensional array in a loop as if it were a (simpler) one-dimensional array. Most times multi-dimensional arrays (eg `2D` arrays `NxM`) use a linearization scheme when stored in memory. The same scheme can be used to access the array data as if it were one big `1`-dimensional array.
This results in using a single loop instead of multiple nested loops ^{[31](#r31 "Array linearisation, wikipedia"), [32](#r32 "Vectorization, wikipedia"), [61](#r61 "The NumPy array: a structure for efficient numerical computation")}

__example:__

```javascript

// a is assumed to be an existing array of size NxM

// nested loop
// N = M = 20
// total size = NxM = 400
for (let i=0; i<20; i+=1)
{
    for (let j=0; j<20; j+=1)
    {
        // element (i, j) usually lives at a[i + j*N] or some other equivalent indexing scheme,
        // in most cases linearization is straight-forward
        a[i + j*20] = 0;
    }
}

// array linearized single loop
for (let i=0; i<400; i++)
    a[i] = 0;  // equivalent to previous with just a single loop


```


* __Loop Unrolling__ : Loop unrolling involves reducing the number of executions of a loop by performing the computations corresponding to two (or more) loop iterations in a single loop iteration. This is partial loop unrolling; full loop unrolling involves eliminating the loop completely and doing all the iterations explicitly in the code (for example for small loops where the number of iterations is fixed). Loop unrolling results in the loop (and as a consequence all the overhead associated with each loop iteration) executing fewer times. In processors which allow pipelining or parallel computations, loop unrolling can have an additional benefit: the next unrolled iteration can start while the previous unrolled iteration is being computed or loaded, without waiting for it to finish.
Thus loop speed can increase even more ^{[33](#r33 "Loop unrolling, wikipedia")}

__example:__

```javascript

// a, b and c are assumed to be existing arrays

// "rolled" usual loop
for (let i=0; i<1000; i++)
{
    a[i] = b[i]*c[i];
}

// partially unrolled loop (half iterations)
for (let i=0; i<1000; i+=2)
{
    a[i] = b[i]*c[i];
    // unrolled the next iteration into the current one and increased the loop iteration step to 2
    a[i+1] = b[i+1]*c[i+1];
}

// sometimes special care is needed to handle cases
// where the number of iterations is NOT an exact multiple of the number of unrolled steps
// this can be solved by adding the remaining iterations explicitly in the code, after or before the main unrolled loop

```


### Databases

#### Generalities

Database access can be expensive; this means it is usually better to fetch the needed data using as few DB connections and calls as possible.


#### Methods


* __Lazy Load__ : Avoiding DB access unless necessary can be efficient, provided that during the application life-cycle there are frequent cases where the extra data are not needed or requested


* __Caching__ : Re-using previously fetched results for the same query, if the data are not critical and a slightly delayed update is tolerable


* __Using Efficient Queries__ : For Relational DBs, the most efficient queries use an index (or a set of indexes) by which the data are uniquely indexed in the DB ^{[66](#r66 "10 tips for optimising Mysql queries"), [67](#r67 "Mysql Optimisation")}.


* __Exploiting Redundancy__ : Adding more helping hands (DBs) to handle the load instead of just one.
In effect this means copying data (creating redundancy) to multiple places, which can subdivide the total load and handle it independently


### Web

* __Minimal Transactions__ : Data over the internet (and generally data over a network) take some time to be transmitted, more so if the data are large. Therefore it is best to transmit only the necessary data, and even these in a compact form. That is one reason why `JSON` replaced the verbose `XML` for encoding arbitrary data on the web.


* __Minimum Number of Requests__ : This can be seen as a variation of the previous principle. It means that not only should each request transmit only necessary data in a compact form, but also that the number of requests should be minimized. This can include combining and minifying `.css` files into one `.css` file (even embedding images if needed), combining and minifying `.js` files into one `.js` file, etc.. This can sometimes generate large data (files), however coupled with the next tip, it can result in better performance.


* __Cache, cache and then cache some more__ : This can include everything, from whole pages to `.css` files, `.js` files, images etc.. Cache in the server, cache in the client, cache in-between, cache everywhere..


* __Exploiting Redundancy__ : For web applications, this is usually implemented by exploiting some [cloud architecture](http://en.wikipedia.org/wiki/Cloud_computing) in order to store (static) files, which can be loaded (through the cloud) from more than one location. Other approaches include [Load balancing](http://en.wikipedia.org/wiki/Load_balancing_%28computing%29) (having redundancy not only for static files, but also for servers).


* __Make application code faster/lighter__ : This draws from the previous principles about code optimization in general. Efficient application code can save both server and user resources.
There is a reason why Facebook created `HipHop VM` ..


* __Minimalism is an art form__ : Having web pages and applications with tons of html, images (not to mention a ton of advertisements) etc.. is not necessarily better design, and of course makes page load time slower. Therefore having minimal pages and doing updates in small chunks using `AJAX` and `JSON` (that is what `web 2.0` was all about), instead of reloading a whole page each time, can go a long way. This is one reason why [Template Engines](http://en.wikipedia.org/wiki/Template_engine) and [MVC Frameworks](http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller) were created. Minimalism does not need to sacrifice the artistic dimension, [Minimalism](http://en.wikipedia.org/wiki/Minimalism) __IS__ an art form.

[το λακωνίζειν εστί φιλοσοφείν (i.e to be laconic in speech and deeds, simple and to the point, is the art of philosophy)](https://en.wikipedia.org/wiki/Laconic_phrase)


[![Zen Circle](/zen-circle.jpg)](http://en.wikipedia.org/wiki/Ens%C5%8D)



### References

1. Code optimization, wikipedia
2. Computational complexity theory, wikipedia
3. DRY principle, wikipedia
4. KISS principle, wikipedia
5. Divide and conquer algorithm, wikipedia
6. Parallel computation, wikipedia
7. Lazy load, wikipedia
8. An empirical study of Fortran programs
9. Compiler optimizations, wikipedia
10. Register allocation, wikipedia
11. Compiler Design Theory
12. The art of compiler design - Theory and Practice
13. Horner rule, wikipedia
14. Karatsuba algorithm, wikipedia
15. Fast multiplication of complex numbers
16. Exponentiation by squaring, wikipedia
17. Fast Exponentiation
18. Strassen algorithm, wikipedia
19. Coppersmith-Winograd algorithm, wikipedia
20. Comments on Factorial Programs
21. Fast Factorial Functions
22. Isomorphism, wikipedia
23. Representation, wikipedia
24. Symmetry, wikipedia
25. Merge sort, wikipedia
26. Radix sort, wikipedia
27. Function inlining, wikipedia
28. Strength reduction, wikipedia
29. Loop invariant, wikipedia
30. Loop-invariant code motion, wikipedia
31. Array linearisation, wikipedia
32. Vectorization, wikipedia
33. Loop unrolling, wikipedia
34. Software development philosophies, wikipedia
35. 97 Things every programmer should know
36. Data structure, wikipedia
37. List of data structures, wikipedia
38. Cloud computing, wikipedia
39. Load balancing, wikipedia
40. Model-view-controller, wikipedia
41. Template engine, wikipedia
42. Three optimization tips for C
43. Three optimization tips for C, slides
44. What Every Programmer Should Know About Floating-Point Arithmetic
45. What Every Computer Scientist Should Know About Floating-Point Arithmetic
46. Optimisation techniques
47. Notes on C Optimisation
48. Optimising C++
49. Programming Optimization
50. CODE OPTIMIZATION - USER TECHNIQUES
51. Locality of reference, wikipedia
52. Memory access pattern, wikipedia
53. Memory hierarchy, wikipedia
54. Heterogeneous computing, wikipedia
55. Stream processing, wikipedia
56. Dataflow programming, wikipedia
57. Fast Fourier transform, wikipedia
58. Gaussian elimination, wikipedia
59. Fermat's little theorem, wikipedia
60. Trigonometric identities, wikipedia
61. The NumPy array: a structure for efficient numerical computation
62. Dancing links algorithm
63. Data Interface + Algorithms = Efficient Programs
64. Systems Should Automatically Specialize Code and Data
65. New Paradigms in Data Structure Design: Word-Level Parallelism and Self-Adjustment
66. 10 tips for optimising Mysql queries
67. Mysql Optimisation
68. A Practical Wait-Free Simulation for Lock-Free Data Structures
69. A Highly-Efficient Wait-Free Universal Construction
70. A Methodology for Creating Fast Wait-Free Data Structures
71. Euclidean Algorithm
72. Gröbner basis
73. Newton's Method
74. Fast Inverse Square Root
75. Methods of computing square roots
76. Fast Fibonacci numbers
77. Fast k-Nearest Neighbors (k-NN) algorithm
78. A Binary Recursive Gcd Algorithm


--------------------------------------------------------------------------------
/tr.md:
--------------------------------------------------------------------------------
## Code Optimization Methods

*A compilation of various code optimization methods*

### Translations

* [Turkish](/tr.md) (by [@umutphp](https://github.com/umutphp))


### Contents

- [General Principles](#general-principles)
- [Low-level](#low-level)
- [Language-dependent optimization](#language-dependent-optimization)
- [Language-independent optimization](#language-independent-optimization)
- [Databases](#databases)
- [Web](#web)
- [References](#references)

**A note on the relation between the Computational Complexity Theory of Algorithms and Code Optimization Techniques**

Computational complexity theory ² and code optimization techniques ¹ share the common goal of solving problems efficiently. Although they have common ground and share the same context, the difference between them lies in what each one emphasizes.

Computational complexity theory studies performance with respect to input size. It tries to design the algorithmic solution that depends least on input size and is fastest, regardless of the underlying architecture. Code optimization techniques, on the other hand, focus on the architecture and on the specific constants that complexity theory uses in its estimates.

In systems that run in real time, both of these can be critical factors (for example, the [Real-time Image Processing in JavaScript](https://github.com/foo123/FILTER.js) library, yes, I wrote it :) ).

### General Principles

- **The `DRY` Principle and Caching** : The general concept of caching involves avoiding the re-computation / re-loading of a result when it is not necessary. This can also be seen as a variation of the DRY principle ³. Since storing intermediate results saves re-computation time and resources, dynamic programming can also be seen as a kind of caching.

- **`KISS`, simpler can be faster** : The 'Keep it simple' ⁴ principle makes many techniques easier to apply and modify. ( There are a great many software engineering techniques that can help with this ^{34, 35, 55, 56} )

- **Ify oureor ganize itcanbe easier, if you re-organize it can be easier** : Don't hesitate to re-organize whenever you need to. Most of the time something can be re-organized and re-structured into a faster and simpler form while preserving its existing functionality (concept of isomorphism ²² , change of representation ^{23, 24, 36}), and this brings other benefits as well. For example, the numeric expression `(10+5*2)^2` is actually equal to the constant `400`; another example is the transformation from `infix` expression notation to `prefix` (Polish) notation, which can be parsed faster.

- **Divide into subproblems and Conquer the solution** : Subproblems (small, trivial and special cases) can be solved more easily and quickly on the way to the overall solution. Sorting algorithms are the best examples of this ^{5, sorting algorithms}.

- **Don't hesitate to ask for help** : If possible, distribute the workload, split it into smaller pieces, share it and make it possible to do in parallel ^{6, 54, 68, 69, 70} .
39 | 40 | - **United we Stand and Deliver** : Taking data from a single contiguous place, rather than scattered across different places, makes loading and processing faster, because the data is handled as one batch instead of being read in small chunks (e.g. caches, vector/pipeline machines, database queries) ^{51, 52, 53}. 41 | 42 | - **A little laziness is not such a bad thing** : It is safe to say that on any given run a program uses only part of its data and functionality. Loading and initializing data and functionality only when they are actually needed (being lazy) can go a long way ⁷ . 43 | 44 | **Additional Notes** 45 | 46 | Before optimizing, measure and identify what actually needs to be optimized. No optimization at all is usually better than blind "optimization". 47 | 48 | That said, one should always try to optimize and produce efficient solutions. A non-efficient "solution" is at best as good as no solution, if not worse. 49 | 50 | **Optimizing in advance is perfectly valid when it rests on prior knowledge**. For example, it is known that instantiating a whole `class` or `array` is slower than just returning an `integer` carrying the same information. 51 | 52 | Some optimization techniques can be automated (for example in compilers), but most have to be applied by hand. 53 | 54 | Sometimes there is a trade-off between space and time resources. Increasing speed may increase memory/space requirements (**caching** is the classic example). 55 | 56 | According to the `90-10` rule (or `80-20` and other variations), **`90%`** of execution time is spent in **`10%`** of the code (for example a loop). Optimizing that part of the code has by far the largest effect. 
(See Knuth for an example ⁸ ) 57 | 58 | One optimization technique (for example simplification) can enable another (for example constant substitution), which in turn can enable the first one again (or yet another technique), in a chain. One door opens another. 59 | 60 | **References:** [9](#r9 "Compiler optimizations, wikipedia"), [11](#r11 "Compiler Design Theory"), [12](#r12 "The art of compiler design - Theory and Practice"), [46](#r46 "Optimisation techniques"), [47](#r47 "Notes on C Optimisation"), [48](#r48 "Optimising C++"), [49](#r49 "Programming Optimization"), [50](#r50 "CODE OPTIMIZATION - USER TECHNIQUES") 61 | 62 | ### Low-level 63 | 64 | #### Generalities ^{44, 45} 65 | 66 | **Data Allocation** 67 | 68 | - Disk access is slow (network access is even slower) 69 | - Main memory (RAM) access is faster than disk access 70 | - CPU cache memory (if present) is faster than main memory 71 | - CPU registers are the fastest 72 | 73 | **Binary Formats** 74 | 75 | - Double-precision floating-point arithmetic is slow 76 | - Single-precision floating-point arithmetic is faster than double-precision arithmetic 77 | - Long integer arithmetic is faster than floating-point arithmetic 78 | - Short integer and fixed-point arithmetic are faster than long integer arithmetic 79 | - Bitwise arithmetic is the fastest 80 | 81 | **Arithmetic Operations** 82 | 83 | - Exponentiation is slow 84 | - Division is faster than exponentiation 85 | - Multiplication is faster than division 86 | - Addition and subtraction are faster than multiplication 87 | - Bitwise operations are the fastest 88 | 89 | #### Methods 90 | 91 | - **Register allocation** : Since register memory is the fastest way to reach frequently used data, it makes sense to keep data in registers, in an optimal manner, during heavy computation (e.g. in compilers and real-time systems). 
There are various algorithms (based on the graph colouring problem) that provide such an optimal allocation. Alternatively, the programmer can explicitly declare variables that should be held in CPU registers during some part of the computation ¹⁰ 92 | 93 | - **Single Atom Optimizations** : This covers various techniques for optimizing a single CPU instruction, called an atom. For example, some operands of an atom can be turned into constants instead of variables. Another example is computing the square of a number with a multiplication rather than with an exponentiation, etc. 94 | 95 | - **Optimizations over a group of Atoms** : Similar to the previous item, this kind of optimization examines the control flow across a group of atoms and re-arranges it so that the same work is done with fewer instructions. For example, a complex `IF THEN` flow can sometimes be reduced to a single `Jump` instruction, etc. 96 | 97 | ### Language-dependent optimization 98 | 99 | - Read **the official documentation and manual** of the language carefully to understand the underlying mechanisms it uses to implement its features, and use that knowledge when writing code or choosing between alternatives. 100 | 101 | ### Language-independent optimization 102 | 103 | - **Re-arranging Expressions** : More efficient code for evaluating an expression (or computing a process) can often be produced by evaluating the operations in the expression in a different order. This works because re-arranging the expression changes what gets added or multiplied with what, including the relative number of additions and multiplications, and therefore the overall computational cost. In fact, this is not restricted to arithmetic operations but applies to any operation that has symmetries (e.g. 
commutative, associative and distributive laws, when they actually hold, are examples of symmetries of arithmetic operators): the operations are re-arranged to produce the same result while gaining other advantages. Despite the long description, the idea is really very simple. Classic examples are Horner's rule ¹³, Karatsuba multiplication ¹⁴, fast complex multiplication ¹⁵, fast matrix multiplication ¹⁸, fast exponentiation ^{16, 17}, fast gcd computation ⁷⁸, fast factorials/binomials ²⁰ , the fast Fourier transform ⁵⁷, fast Fibonacci numbers ⁷⁶, merge sort ²⁵, radix sort ²⁶. 104 | 105 | - **Constant Substitution/Propagation** : Very often an expression evaluates to the same constant in all cases; the constant can then replace the more complex and slower expression (compilers sometimes do this). 106 | 107 | - **Inline Function/Routine Calls** : Calling a function or routine requires many operations from the CPU, such as saving the program state on the stack and restoring it afterwards. In heavy computation this can be quite slow, and using the function body directly can be faster. Sometimes compilers do this; at other times the programmer can do it by using the function or routine body directly (`inline`) ²⁷. 108 | 109 | - **Combining Flow Transfers** : `IF/THEN` instructions and logic are, in essence, CPU `branch` instructions. A branch instruction changes the program `pointer` and jumps to a new location. This can be slow when many `jump` instructions are executed. However, re-arranging the `IF/THEN` statements (factoring out common code, applying De Morgan's rules for logic simplification, etc.) can produce *isomorphic* functionality with less logical complexity and, as a result, fewer and more efficient `branch` instructions. 
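**example:** A minimal sketch of re-arranging `IF/THEN` logic, assuming a hypothetical permission check (the names `canEditVerbose`, `canEdit`, `isAdmin`, `isOwner` and `isLocked` are illustrative and do not come from the original text). The nested branches collapse into one boolean expression by applying De Morgan's rule, `!(!A && !B)` being the same as `(A || B)`. Whether this actually yields fewer machine-level branches depends on the compiler or engine.

```javascript
// Hypothetical example: all names are illustrative assumptions.

// before: nested branches, with a negated conjunction
function canEditVerbose(isAdmin, isOwner, isLocked)
{
    if (isLocked)
    {
        return false;
    }
    else
    {
        if (!(!isAdmin && !isOwner))
            return true;
        else
            return false;
    }
}

// after: De Morgan turns !(!A && !B) into (A || B),
// and the nested IF/THEN becomes a single expression
function canEdit(isAdmin, isOwner, isLocked)
{
    return !isLocked && (isAdmin || isOwner);
}
```

Both functions return the same value for every combination of boolean inputs, so the shorter form can replace the longer one without changing behaviour.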
110 | 111 | - **Dead Code Elimination** : Compilers can often identify code that is never reached and remove it from the compiled program. Not every case can be detected, however. With the simplification schemes above, the programmer can more easily identify "dead code" (code that is never reached) and remove it. Alternative approaches to "dead-code elimination" are "live-code inclusion" and "tree-shaking". 112 | 113 | - **Common Subexpressions** : This optimization identifies subexpressions that are common to several parts of the code, evaluates them only once, and reuses the value in all subsequent places (compilers sometimes do this). 114 | 115 | - **Common Code Factorisation** : Often the same block of code is present in different branches, for example when the program has to do some common work and then something else depending on a parameter. This common code can be factored out of the branches, eliminating needless redundancy, latency and size. 116 | 117 | - **Strength Reduction** : This replaces an operation (for example an expression) with an equivalent one that is faster. The two most common cases are replacing `exponentiation` with `multiplication`, and `multiplication` with `addition` (for example inside a loop). The gain comes from the fact that the simpler but equivalent operations (usually implemented in hardware) take fewer CPU cycles than their more complex equivalents (often implemented in software) ²⁸. 118 | 119 | - **Handling Trivial/Special Cases** : Sometimes a complex computation has trivial or special cases that can be handled far more efficiently by a reduced/simplified version of the computation (for example, computing `a^b` can handle the special cases `a,b=0,1,2` with a simpler method). Trivial cases occur with some frequency in applications, so simplified special cases can be quite useful in code ^{42, 43} . 
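**example:** One possible sketch of special-casing `a^b`, assuming a hypothetical helper named `power` that falls back to the built-in `Math.pow` for the general case. The fast paths apply only when `b` is a non-negative integer, and the sign of zero is ignored in this sketch.

```javascript
// Hypothetical helper: the name `power` and the chosen fast paths are assumptions.
function power(a, b)
{
    if (Number.isInteger(b) && b >= 0)
    {
        if (b === 0) return 1;       // anything to the power 0 is 1
        if (b === 1) return a;       // a^1 = a
        if (b === 2) return a * a;   // one multiplication instead of a general power
        if (a === 0 || a === 1) return a; // 0^b = 0 and 1^b = 1 for b > 0
    }
    // general case: let the built-in routine handle it
    return Math.pow(a, b);
}
```

Whether such fast paths pay off depends on how often the trivial cases actually occur, so it is worth measuring before adding them.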
120 | 121 | - **Exploiting Mathematical Theorems/Relations** : Sometimes a computation can be performed correctly and more efficiently by using a mathematical theorem, transformation, symmetry ²⁴ or other knowledge (for example the Gauss method for solving systems of linear equations ⁵⁸, the Euclidean algorithm ⁷¹, or both ⁷², the fast Fourier transform ⁵⁷, Fermat's little theorem ⁵⁹, Taylor-Maclaurin series expansions, trigonometric identities ⁶⁰, Newton's method ^{73, 74}, etc. ⁷⁵). 122 | 123 | - **Using Efficient Data Structures** : Data structures are the counterpart of algorithms (in the space domain); each efficient algorithm needs an associated efficient data structure for its specific task. In many cases choosing an appropriate data structure (representation) makes all the difference (database designers and search engine developers know this very well) ^{36, 37, 23, 62, 63, 64, 65, 68, 69, 70, 77}. 124 | 125 | **Loop Optimizations** 126 | 127 | Perhaps the most important code optimization techniques are those that involve loops. 128 | 129 | - **Code Motion / Loop Invariants** : Sometimes a piece of code inside a loop does not depend on the loop index; it can be moved out of the loop and executed only once (it is a loop invariant). 
The loop then does less work per iteration (compilers sometimes do this) 130 | ^{29, 30} 131 | 132 | **example:** 133 | 134 | ```javascript 135 | 136 | // this loop can be improved 137 | for (i=0; i<1000; i++) 138 | { 139 | invariant = 100*b[0]+15; // this is a loop invariant: it does not depend on the loop index 140 | a[i] = invariant+10*i; 141 | } 142 | 143 | // a more efficient version of the loop 144 | invariant = 100*b[0]+15; // now moved out of the loop 145 | for (i=0; i<1000; i++) 146 | { 147 | a[i] = invariant+10*i; // the loop now does less work per iteration 148 | } 149 | 150 | ``` 151 | 152 | - **Loop Fusion** : Sometimes two or more loops can be combined into a single loop, reducing the number of test and increment instructions executed. 153 | 154 | **example:** 155 | 156 | ```javascript 157 | 158 | // 2 loops here 159 | for (i=0; i<1000; i++) 160 | { 161 | a[i] = i; 162 | } 163 | for (i=0; i<1000; i++) 164 | { 165 | b[i] = i+5; 166 | } 167 | 168 | 169 | // one fused loop here 170 | for (i=0; i<1000; i++) 171 | { 172 | a[i] = i; 173 | b[i] = i+5; 174 | } 175 | 176 | ``` 177 | 178 | - **Unswitching** : Sometimes a loop that contains a test can be split into two or more loops, only one of which is executed at any time. 179 | 180 | **example:** 181 | 182 | ```javascript 183 | 184 | // the condition is evaluated again on every iteration 185 | for (i=0; i<1000; i++) 186 | { 187 | if (X>Y) // evaluated on every iteration 188 | a[i] = i; 189 | else 190 | b[i] = i+10; 191 | } 192 | 193 | // the condition is moved out and evaluated only once 194 | if (X>Y) // evaluated only once 195 | { 196 | for (i=0; i<1000; i++) 197 | { 198 | a[i] = i; 199 | } 200 | } 201 | else 202 | { 203 | for (i=0; i<1000; i++) 204 | { 205 | b[i] = i+10; 206 | } 207 | } 208 | 209 | ``` 210 | 211 | - **Array Linearization** : This treats a multi-dimensional array that is being looped over as if it were a (simpler) one-dimensional array. 
Multi-dimensional arrays (for example `NxM` for `2D` arrays) are usually stored in memory using a linearization scheme. The same scheme can be used to access the array data as if it were a `one`-dimensional array. This allows a single loop to be used instead of nested loops 212 | ^{31, 32, 61}. 213 | 214 | **example:** 215 | 216 | ```javascript 217 | 218 | // nested loop 219 | // N = M = 20 220 | // total size = NxM = 400 221 | for (i=0; i<20; i+=1) 222 | { 223 | for (j=0; j<20; j+=1) 224 | { 225 | // a[i, j] denotes 2D indexing here; in memory it usually corresponds to a[i + j*N] 226 | // the one-dimensional access is usually faster 227 | a[i, j] = 0; 228 | } 229 | } 230 | 231 | // with the array linearized, a single loop is enough 232 | for (i=0; i<400; i++) 233 | a[i] = 0; // equivalent to the nested loops above, but one loop 234 | 235 | 236 | ``` 237 | 238 | - **Loop Unrolling** : Loop unrolling reduces the number of times a loop executes by doing the work of two (or more) iterations in a single iteration. This is partial loop unrolling; full loop unrolling removes the loop completely and writes out all iterations explicitly (for small loops with a fixed number of iterations). Loop unrolling makes the loop, and with it all the overhead associated with each iteration, execute fewer times. On processors that allow pipelining or parallel computation, unrolling can have an additional benefit: the next unrolled iteration can start while the previous one is still being computed or loaded, without waiting for it to finish. The loop can therefore speed up even more than expected 239 | ³³. 
240 | 241 | **example:** 242 | 243 | ```javascript 244 | 245 | // an ordinary loop 246 | for (i=0; i<1000; i++) 247 | { 248 | a[i] = b[i]*c[i]; 249 | } 250 | 251 | // partially unrolled loop 252 | for (i=0; i<1000; i+=2) 253 | { 254 | a[i] = b[i]*c[i]; 255 | // the next iteration is computed here as well, so the loop step is doubled 256 | a[i+1] = b[i+1]*c[i+1]; 257 | } 258 | 259 | // care is needed when unrolling: 260 | // the number of iterations may not be an exact multiple of the unrolling factor 261 | // in that case the leftover iterations are handled separately 262 | 263 | ``` 264 | 265 | ### Databases 266 | 267 | #### Generalities 268 | 269 | Database access can be expensive, so it is usually better to fetch the required data with as few DB connections and calls as possible. 270 | 271 | #### Methods 272 | 273 | - **Lazy Load** : Avoiding DB access unless it is necessary can be efficient, provided that over the application life-cycle there is a reasonable frequency of cases where the extra data is not needed or requested 274 | 275 | - **Caching** : Re-using previously fetched results for the same query, when the data is not critical and a slightly delayed update is acceptable 276 | 277 | - **Using Efficient Queries** : For relational DBs, the most efficient query uses an index (or a set of indexes) by which the data is uniquely indexed in the DB ^{66, 67}. 278 | 279 | - **Distributing the Load** : Add more helping hands (DBs) so that the load is not carried in a single place. In effect this means copying (creating redundancy of) data to multiple places, so that the total load can be divided and handled independently 280 | 281 | ### Web 282 | 283 | - **Smaller Data Streams** : Transferring data over the internet (and over a network in general) takes time, especially when the data is large, so it is best to transmit only the necessary data, and in a compact form. 
This is one reason why `JSON` has replaced `XML` for encoding arbitrary data on the web. 284 | 285 | - **Minimum Number of Requests** : This can be seen as a counterpart of the previous principle. Not only should each request transfer only the necessary data in a compact form, but the number of requests should also be minimized. This can include combining `.css` files into a single `.css` file (possibly with images embedded), combining `.js` files into a single `.js` file, and so on. This can sometimes produce large files, but combined with the next principle it can still give faster service. 286 | 287 | - **Cache, cache and then cache some more** : This means caching everything: whole pages, `.css` files, `.js` files, images and so on. Cache on the server, cache on the client, cache in between; in short, cache everywhere. 288 | 289 | - **Distributing the Load** : For web applications, this is usually implemented with some [cloud architecture](http://en.wikipedia.org/wiki/Cloud_computing) that stores (static) files so they can be loaded from multiple locations (through the cloud). Other approaches include [load balancing](http://en.wikipedia.org/wiki/Load_balancing_%28computing%29) (which distributes load not only for static files but also across servers). 290 | 291 | - **Make application code faster/lighter** : This follows from the earlier principles of code optimization in general. Efficient application code saves both server and user resources. There is a reason why Facebook created `HipHop VM`. 292 | 293 | - **Minimalism is an art form** : Building web pages out of many html and image files (not to mention many advertisement slots) is not the best design, and it certainly slows down page loading. 
It is therefore better to have minimal pages and perform operations with `AJAX` and `JSON` (which is what web 2.0 is all about) than to reload large pages every time. This is one reason why [template engines](http://en.wikipedia.org/wiki/Template_engine) and [MVC frameworks](http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller) were created. Minimalism does not require sacrificing artistry, because [Minimalism](http://en.wikipedia.org/wiki/Minimalism) **is** an art form too. 294 | 295 | [το λακωνίζειν εστί φιλοσοφείν (to be brief, simple and to the point in speech and action is the art of philosophy.)](https://en.wikipedia.org/wiki/Laconic_phrase) 296 | 297 | [![Zen Circle](/zen-circle.jpg)](http://en.wikipedia.org/wiki/Ens%C5%8D) 298 | 299 | ### References 300 | 301 | 1. Code optimization, wikipedia 302 | 2. Computational complexity theory, wikipedia 303 | 3. DRY principle, wikipedia 304 | 4. KISS principle, wikipedia 305 | 5. Divide and conquer algorithm, wikipedia 306 | 6. Parallel computation, wikipedia 307 | 7. Lazy load, wikipedia 308 | 8. An empirical study of Fortran programs 309 | 9. Compiler optimizations, wikipedia 310 | 10. Register allocation, wikipedia 311 | 11. Compiler Design Theory 312 | 12. The art of compiler design - Theory and Practice 313 | 13. Horner rule, wikipedia 314 | 14. Karatsuba algorithm, wikipedia 315 | 15. Fast multiplication of complex numbers 316 | 16. Exponentiation by squaring, wikipedia 317 | 17. Fast Exponentiation 318 | 18. Strassen algorithm, wikipedia 319 | 19. Coppersmith-Winograd algorithm, wikipedia 320 | 20. Comments on Factorial Programs 321 | 21. Fast Factorial Functions 322 | 22. Isomorphism, wikipedia 323 | 23. Representation, wikipedia 324 | 24. Symmetry, wikipedia 325 | 25. Merge sort, wikipedia 326 | 26. Radix sort, wikipedia 327 | 27. Function inlining, wikipedia 328 | 28. Strength reduction, wikipedia 329 | 29. Loop invariant, wikipedia 330 | 30. Loop-invariant code motion, wikipedia 331 | 31. 
Array linearisation, wikipedia 332 | 32. Vectorization, wikipedia 333 | 33. Loop unrolling, wikipedia 334 | 34. Software development philosophies, wikipedia 335 | 35. 97 Things every programmer should know 336 | 36. Data structure, wikipedia 337 | 37. List of data structures, wikipedia 338 | 38. Cloud computing, wikipedia 339 | 39. Load balancing, wikipedia 340 | 40. Model-view-controller, wikipedia 341 | 41. Template engine, wikipedia 342 | 42. Three optimization tips for C 343 | 43. Three optimization tips for C, slides 344 | 44. What Every Programmer Should Know About Floating-Point Arithmetic 345 | 45. What Every Computer Scientist Should Know About Floating-Point Arithmetic 346 | 46. Optimisation techniques 347 | 47. Notes on C Optimisation 348 | 48. Optimising C++ 349 | 49. Programming Optimization 350 | 50. CODE OPTIMIZATION - USER TECHNIQUES 351 | 51. Locality of reference, wikipedia 352 | 52. Memory access pattern, wikipedia 353 | 53. Memory hierarchy, wikipedia 354 | 54. Heterogeneous computing, wikipedia 355 | 55. Stream processing, wikipedia 356 | 56. Dataflow programming, wikipedia 357 | 57. Fast Fourier transform, wikipedia 358 | 58. Gaussian elimination, wikipedia 359 | 59. Fermat's little theorem, wikipedia 360 | 60. Trigonometric identities, wikipedia 361 | 61. The NumPy array: a structure for efficient numerical computation 362 | 62. Dancing links algorithm 363 | 63. Data Interface + Algorithms = Efficient Programs 364 | 64. Systems Should Automatically Specialize Code and Data 365 | 65. New Paradigms in Data Structure Design: Word-Level Parallelism and Self-Adjustment 366 | 66. 10 tips for optimising Mysql queries 367 | 67. Mysql Optimisation 368 | 68. A Practical Wait-Free Simulation for Lock-Free Data Structures 369 | 69. A Highly-Efficient Wait-Free Universal Construction 370 | 70. A Methodology for Creating Fast Wait-Free Data Structures 371 | 71. Euclidean Algorithm 372 | 72. Gröbner basis 373 | 73. Newton's Method 374 | 74. 
Fast Inverse Square Root 375 | 75. Methods of computing square roots 376 | 76. Fast Fibonacci numbers 377 | 77. Fast k-Nearest Neighbors (k-NN) algorithm 378 | 78. A Binary Recursive Gcd Algorithm 379 | -------------------------------------------------------------------------------- /zen-circle.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/foo123/code-optimization-methods/0acfafd0970b2fc965adee27bf5b1aca08383e28/zen-circle.jpg --------------------------------------------------------------------------------