├── .github └── FUNDING.yml ├── README.md ├── .gitignore ├── issue-3-for-loop-in-dart ├── snippet-1.md └── issue-3-for-loop-in-dart.md ├── issue-1-prefer-const-over-final └── issue-1-prefer-const-over-final.md ├── issue-5-recursive-functions-in-dart └── issue-5-recursive-functions-in-dart.md ├── issue-2-const-in-dart └── issue-2-const-in-dart.md └── issue-4-functions-in-dart └── issue-4-functions-in-dart.md /.github/FUNDING.yml: -------------------------------------------------------------------------------- 1 | # These are supported funding model platforms 2 | 3 | github: [vandadnp] 4 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Going Deep with Dart 2 | 3 | Going Deep with the [Dart language](https://dart.dev). 4 | 5 | * [Issue 1 - Prefer `const` over `final`](issue-1-prefer-const-over-final/issue-1-prefer-const-over-final.md) 6 | * [Issue 2 - `const` in Dart](issue-2-const-in-dart/issue-2-const-in-dart.md) 7 | * [Issue 3 - `for` loop in Dart](issue-3-for-loop-in-dart/issue-3-for-loop-in-dart.md) 8 | * [Issue 4 - Functions in Dart](issue-4-functions-in-dart/issue-4-functions-in-dart.md) 9 | * [Issue 5 - Recursive Function in Dart](issue-5-recursive-functions-in-dart/issue-5-recursive-functions-in-dart.md) 10 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # See https://www.dartlang.org/guides/libraries/private-files 2 | 3 | # Files and directories created by pub 4 | .dart_tool/ 5 | .packages 6 | build/ 7 | # If you're building an application, you may want to check-in your pubspec.lock 8 | pubspec.lock 9 | 10 | # Directory created by dartdoc 11 | # If you don't generate documentation locally you can remove this line. 
12 | doc/api/ 13 | 14 | # Avoid committing generated Javascript files: 15 | *.dart.js 16 | *.info.json # Produced by the --dump-info flag. 17 | *.js # When generated by dart2js. Don't specify *.js if your 18 | # project includes source files written in JavaScript. 19 | *.js_ 20 | *.js.deps 21 | *.js.map 22 | -------------------------------------------------------------------------------- /issue-3-for-loop-in-dart/snippet-1.md: -------------------------------------------------------------------------------- 1 | ```asm 2 | 000000000009a6a0 push rbp ; CODE XREF=Precompiled____main_main_1435+17 3 | 000000000009a6a1 mov rbp, rsp 4 | 000000000009a6a4 sub rsp, 0x18 5 | 000000000009a6a8 mov rdx, qword [r15+0x1e07] 6 | 000000000009a6af cmp rsp, qword [r14+0x40] 7 | 000000000009a6b3 jbe loc_9a7a6 8 | 9 | loc_9a6b9: 10 | 000000000009a6b9 mov rsi, qword [rdx+7] ; CODE XREF=Precompiled____main_1434+269 11 | 000000000009a6bd mov qword [rbp+var_10], rsi 12 | 000000000009a6c1 xor edi, edi 13 | 14 | loc_9a6c3: 15 | 000000000009a6c3 mov qword [rbp+var_8], rdi ; CODE XREF=Precompiled____main_1434+257 16 | 000000000009a6c7 cmp rsp, qword [r14+0x40] 17 | 000000000009a6cb jbe loc_9a7b2 18 | 19 | loc_9a6d1: 20 | 000000000009a6d1 cmp rdi, 0x2 ; CODE XREF=Precompiled____main_1434+281 21 | 000000000009a6d5 jl loc_9a6ec 22 | 23 | 000000000009a6db call Precompiled____exit_1023 ; Precompiled____exit_1023 24 | 000000000009a6e0 mov rax, qword [r14+0xc8] 25 | 000000000009a6e7 mov rsp, rbp 26 | 000000000009a6ea pop rbp 27 | 000000000009a6eb ret 28 | ; endp 29 | 30 | loc_9a6ec: 31 | 000000000009a6ec mov rax, rdi ; CODE XREF=Precompiled____main_1434+53 32 | 000000000009a6ef add rax, rax 33 | 000000000009a6f2 jno loc_9a701 34 | 35 | 000000000009a6f8 call Precompiled_Stub__iso_stub_AllocateMintSharedWithoutFPURegsStub ; Precompiled_Stub__iso_stub_AllocateMintSharedWithoutFPURegsStub 36 | 000000000009a6fd mov qword [rax+7], rdi 37 | 38 | loc_9a701: 39 | 000000000009a701 movzx rcx, word [rdx+1] ; CODE 
XREF=Precompiled____main_1434+82 40 | 000000000009a706 mov r11, qword [r15+0x1e07] 41 | 000000000009a70d push r11 42 | 000000000009a70f push rax 43 | 000000000009a710 mov rax, qword [r14+0x60] 44 | 000000000009a714 call qword [rax+rcx*8] 45 | 000000000009a717 pop r11 46 | 000000000009a719 pop r11 47 | 000000000009a71b mov rbx, rax 48 | 000000000009a71e mov rsi, qword [rbp+var_8] 49 | 000000000009a722 mov qword [rbp+var_18], rbx 50 | 000000000009a726 add rsi, 0x1 51 | 000000000009a72a mov qword [rbp+var_8], rsi 52 | 000000000009a72e cmp rbx, qword [r14+0xc8] 53 | 000000000009a735 jne loc_9a76b 54 | 55 | 000000000009a73b mov rax, rbx 56 | 000000000009a73e mov rdx, qword [rbp+var_10] 57 | 000000000009a742 mov rcx, qword [r14+0xc8] 58 | 000000000009a749 cmp rdx, qword [r14+0xc8] 59 | 000000000009a750 je loc_9a76b 60 | 61 | 000000000009a756 mov rsi, qword [rdx+0x27] 62 | 000000000009a75a mov rbx, qword [r15+0xb7] 63 | 000000000009a761 mov r9, qword [r15+0x1e0f] 64 | 000000000009a768 call qword [rsi+7] 65 | 66 | loc_9a76b: 67 | 000000000009a76b mov rax, qword [rbp+var_18] ; CODE XREF=Precompiled____main_1434+149, Precompiled____main_1434+176 68 | 000000000009a76f test al, 0x1 69 | 000000000009a771 mov ecx, 0x35 70 | 000000000009a776 je loc_9a77d 71 | 72 | 000000000009a778 movzx rcx, word [rax+1] 73 | 74 | loc_9a77d: 75 | 000000000009a77d push rax ; CODE XREF=Precompiled____main_1434+214 76 | 000000000009a77e mov rax, qword [r14+0x60] 77 | 000000000009a782 call qword [rax+rcx*8+0x58d8] 78 | 000000000009a789 pop r11 79 | 000000000009a78b push rax 80 | 000000000009a78c call Precompiled____printToConsole_149 ; Precompiled____printToConsole_149 81 | 000000000009a791 pop rcx 82 | 000000000009a792 mov rdi, qword [rbp+var_8] 83 | 000000000009a796 mov rsi, qword [rbp+var_10] 84 | 000000000009a79a mov rdx, qword [r15+0x1e07] 85 | 000000000009a7a1 jmp loc_9a6c3 86 | 87 | loc_9a7a6: 88 | 000000000009a7a6 call qword [r14+0x240] ; CODE XREF=Precompiled____main_1434+19 89 | 
000000000009a7ad jmp loc_9a6b9 90 | 91 | loc_9a7b2: 92 | 000000000009a7b2 call qword [r14+0x240] ; CODE XREF=Precompiled____main_1434+43 93 | 000000000009a7b9 jmp loc_9a6d1 94 | ``` -------------------------------------------------------------------------------- /issue-1-prefer-const-over-final/issue-1-prefer-const-over-final.md: -------------------------------------------------------------------------------- 1 | # Issue 1 - Prefer `const` over `final` 2 | 3 | Let's discuss the optimizations that the Dart compiler applies when you use constants instead of `final` values. 4 | 5 | - [Issue 1 - Prefer `const` over `final`](#issue-1---prefer-const-over-final) 6 | - [What's the difference between `const` and `final`?](#whats-the-difference-between-const-and-final) 7 | - [Diving into `const`](#diving-into-const) 8 | - [How about the `final` code?](#how-about-the-final-code) 9 | - [Conclusion](#conclusion) 10 | - [Support my work](#support-my-work) 11 | - [References](#references) 12 | 13 | ## What's the difference between `const` and `final`? 14 | 15 | A `const` in Dart is a compile-time constant, meaning that every value that makes up its final value must itself be a constant. For instance, the literal `123` is a constant, but the value `123` read from the console into a variable of type `int` is **not** a constant, since its value is **not** known at compile time. 16 | 17 | A `final` variable, on the other hand, cannot be assigned a new value after it has received its initial value. In Swift and Rust, this is similar to the `let` statement. The object that a `final` variable refers to can still change internally if it is mutable (a `List`, for example), but the variable itself cannot be reassigned to a new object.
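Here's a small sketch of the difference described above, assuming Dart 2.12 or later (null safety). The names `answer`, `startedAt`, `growFinalList`, and `constListIsUnmodifiable` are made up for illustration:

```dart
// Every operand is a compile-time constant, so this can be `const`.
const answer = 40 + 2;

// DateTime.now() is only known at runtime, so this can be `final` but not
// `const`: writing `const startedAt = DateTime.now();` is a compile-time error.
final startedAt = DateTime.now();

/// A `final` variable can't be reassigned, but a mutable object it refers
/// to can still change.
List<int> growFinalList() {
  final scores = <int>[1, 2, 3];
  // scores = <int>[]; // compile-time error: `scores` is final
  scores.add(4); // fine: we mutate the list, not the variable
  return scores;
}

/// A `const` list is itself unmodifiable, so mutating it fails at runtime.
bool constListIsUnmodifiable() {
  const fixed = <int>[1, 2, 3];
  try {
    fixed.add(4);
    return false;
  } on UnsupportedError {
    return true;
  }
}

void main() {
  print(answer); // 42
  print(growFinalList()); // [1, 2, 3, 4]
  print(constListIsUnmodifiable()); // true
  print('started at $startedAt');
}
```

Note the `const` list: unlike the `final` one, the object itself is unmodifiable, so `add` throws an `UnsupportedError` at runtime.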
18 | 19 | ## Diving into `const` 20 | 21 | With the following Dart code: 22 | 23 | ```dart 24 | import 'dart:io' show exit; 25 | 26 | const value1 = 0xDEADBEEF; 27 | const value2 = 0xFEEDFEED; 28 | 29 | void main(List args) { 30 | print(value1); 31 | print(value2); 32 | print(value1 + value2); 33 | exit(0); 34 | } 35 | ``` 36 | 37 | the code compiles to the following x86_64 AOT code: 38 | 39 | ```asm 40 | ; ================ B E G I N N I N G O F P R O C E D U R E ================ 41 | 42 | 43 | Precompiled____main_1558: 44 | 000000000005faec push rbp ; CODE XREF=Precompiled____main_main_1559+17 45 | 000000000005faed mov rbp, rsp 46 | 000000000005faf0 cmp rsp, qword [r14+0x40] 47 | 000000000005faf4 jbe loc_5fb34 48 | 49 | loc_5fafa: 50 | 000000000005fafa mov eax, 0xdeadbeef ; CODE XREF=Precompiled____main_1558+79 51 | 000000000005faff push rax 52 | 000000000005fb00 call Precompiled____print_813 ; Precompiled____print_813 53 | 000000000005fb05 pop rcx 54 | 000000000005fb06 mov eax, 0xfeedfeed 55 | 000000000005fb0b push rax 56 | 000000000005fb0c call Precompiled____print_813 ; Precompiled____print_813 57 | 000000000005fb11 pop rcx 58 | 000000000005fb12 movabs rax, 0x1dd9bbddc 59 | 000000000005fb1c push rax 60 | 000000000005fb1d call Precompiled____print_813 ; Precompiled____print_813 61 | 000000000005fb22 pop rcx 62 | 000000000005fb23 call Precompiled____exit_1070 ; Precompiled____exit_1070 63 | 000000000005fb28 mov rax, qword [r14+0xc8] 64 | 000000000005fb2f mov rsp, rbp 65 | 000000000005fb32 pop rbp 66 | 000000000005fb33 ret 67 | ; endp 68 | ``` 69 | 70 | I won't focus on the first few instructions: `push rbp` and `mov rbp, rsp` set up the stack frame for *main*, and the `cmp`/`jbe` pair is a stack-overflow check that branches to `loc_5fb34` (not shown in this listing) when `rsp` has dropped to the limit stored at `[r14+0x40]`. We are interested in `loc_5fafa`, which is the body of our `main` function.
71 | 72 | The following Dart code: 73 | 74 | ```dart 75 | print(value1); 76 | ``` 77 | 78 | was compiled into these x86_64 instructions: 79 | 80 | ```asm 81 | 000000000005fafa mov eax, 0xdeadbeef ; CODE XREF=Precompiled____main_1558+79 82 | 000000000005faff push rax 83 | 000000000005fb00 call Precompiled____print_813 ; Precompiled____print_813 84 | ``` 85 | 86 | First, the compiler moves `0xdeadbeef` into the 32-bit `eax` register. On x86_64, writing a 32-bit register zero-extends the result into the full 64-bit register, so `rax` ends up holding `0x00000000deadbeef`: the upper 32 bits are all zero and the lower 32 bits hold our value. It then pushes `rax` onto the stack and calls the `Precompiled____print_813` function, which sets up its own stack frame and reads its argument from the stack; we won't jump into those details. The `pop rcx` that follows the call in the full listing above is most likely just the caller cleaning up: it removes the argument it pushed before the call, so the stack pointer is back where it was, and the value that lands in `rcx` is never used. (In 64-bit mode `pop` works on 64-bit registers, which is why it is `pop rcx` and not `pop ecx`.) 87 | 88 | The assembly code for this: 89 | 90 | ```dart 91 | print(value2); 92 | ``` 93 | 94 | is the following, identical in shape to the previous print statement since `value2` is also a `const`: 95 | 96 | ```asm 97 | 000000000005fb06 mov eax, 0xfeedfeed 98 | 000000000005fb0b push rax 99 | 000000000005fb0c call Precompiled____print_813 ; Precompiled____print_813 100 | ``` 101 | 102 | so I won't explain it again, since we've already been through it!
103 | 104 | Then comes the plus operator: 105 | 106 | ```dart 107 | print(value1 + value2); 108 | ``` 109 | 110 | which gets compiled into the following assembly code: 111 | 112 | ```asm 113 | 000000000005fb12 movabs rax, 0x1dd9bbddc 114 | 000000000005fb1c push rax 115 | 000000000005fb1d call Precompiled____print_813 ; Precompiled____print_813 116 | 000000000005fb22 pop rcx 117 | ``` 118 | 119 | The compiler has already added `0xdeadbeef` and `0xfeedfeed` for us, and the result, `0x1dd9bbddc`, is moved into the 64-bit `rax` register using `movabs`. The result doesn't fit in 32 bits, so a full 64-bit immediate is needed; `movabs` is just the GAS-style mnemonic for that form of `mov`, so opcode-wise it is the same instruction as `mov rax, imm64`. 120 | 121 | The take-away here is how simple the code is: compile-time constants get added at compile time as well, so there is no `add` instruction at all, just a single `mov` of the precomputed sum. 122 | 123 | ## How about the `final` code? 124 | 125 | So let's make one small adjustment and turn `value2` into a `final` variable instead of a `const`: 126 | 127 | ```dart 128 | import 'dart:io' show exit; 129 | 130 | const value1 = 0xDEADBEEF; 131 | final value2 = 0xFEEDFEED; 132 | 133 | void main(List args) { 134 | print(value1); 135 | print(value2); 136 | print(value1 + value2); 137 | exit(0); 138 | } 139 | ``` 140 | 141 | The compiled code for this is painfully longer and more complicated.
let's have a look: 142 | 143 | ```asm 144 | 000000000005faf4 push rbp ; CODE XREF=Precompiled____main_main_1560+17 145 | 000000000005faf5 mov rbp, rsp 146 | 000000000005faf8 cmp rsp, qword [r14+0x40] 147 | 000000000005fafc jbe loc_5fb79 148 | 149 | loc_5fb02: 150 | 000000000005fb02 mov eax, 0xdeadbeef ; CODE XREF=Precompiled____main_1559+140 151 | 000000000005fb07 push rax 152 | 000000000005fb08 call Precompiled____print_813 ; Precompiled____print_813 153 | 000000000005fb0d pop rcx 154 | 000000000005fb0e mov rax, qword [r14+0x88] 155 | 000000000005fb15 mov rax, qword [rax+0x900] 156 | 000000000005fb1c sar rax, 0x1 157 | 000000000005fb1f jae loc_5fb29 158 | 159 | 000000000005fb21 mov rax, qword [0x8+rax*2] 160 | 161 | loc_5fb29: 162 | 000000000005fb29 push rax ; CODE XREF=Precompiled____main_1559+43 163 | 000000000005fb2a call Precompiled____print_813 ; Precompiled____print_813 164 | 000000000005fb2f pop rcx 165 | 000000000005fb30 mov rax, qword [r14+0x88] 166 | 000000000005fb37 mov rax, qword [rax+0x900] 167 | 000000000005fb3e cmp rax, qword [r14+0xc8] 168 | 000000000005fb45 je loc_5fb82 169 | 170 | 000000000005fb4b sar rax, 0x1 171 | 000000000005fb4e jae loc_5fb58 172 | 173 | 000000000005fb50 mov rax, qword [0x8+rax*2] 174 | 175 | loc_5fb58: 176 | 000000000005fb58 mov r11d, 0xdeadbeef ; CODE XREF=Precompiled____main_1559+90 177 | 000000000005fb5e add rax, r11 178 | 000000000005fb61 push rax 179 | 000000000005fb62 call Precompiled____print_813 ; Precompiled____print_813 180 | 000000000005fb67 pop rcx 181 | 000000000005fb68 call Precompiled____exit_1070 ; Precompiled____exit_1070 182 | 000000000005fb6d mov rax, qword [r14+0xc8] 183 | 000000000005fb74 mov rsp, rbp 184 | 000000000005fb77 pop rbp 185 | 000000000005fb78 ret 186 | ; endp 187 | 188 | loc_5fb79: 189 | 000000000005fb79 call qword [r14+0x240] ; CODE XREF=Precompiled____main_1559+8 190 | 000000000005fb80 jmp loc_5fb02 191 | 192 | loc_5fb82: 193 | 000000000005fb82 call 
Precompiled_Stub__iso_stub_NullErrorSharedWithoutFPURegsStub ; Precompiled_Stub__iso_stub_NullErrorSharedWithoutFPURegsStub, CODE XREF=Precompiled____main_1559+81 194 | 000000000005fb87 int3 195 | ; endp 196 | ``` 197 | 198 | Jesus Christ! That was a lot of code. I'm not going to go through all of it, since we've covered some of the basics and I try not to explain what every instruction does; Intel has documented that already! 199 | 200 | The code for printing `value1` is exactly the same as before, since it is still a `const`: 201 | 202 | ```asm 203 | 000000000005fb02 mov eax, 0xdeadbeef ; CODE XREF=Precompiled____main_1559+140 204 | 000000000005fb07 push rax 205 | 000000000005fb08 call Precompiled____print_813 ; Precompiled____print_813 206 | 000000000005fb0d pop rcx 207 | ``` 208 | 209 | How about the compiled code for this, though? 210 | 211 | ```dart 212 | print(value2); 213 | ``` 214 | 215 | Well, that's where things go south! Even though `value2` is `final` and is never reassigned, the compiled code doesn't treat it as a constant. As a top-level `final` variable, `value2` lives in a static field instead of being a compile-time constant, so the compiled code has to load it from memory every time it is used.
On top of that, the compiler has to be prepared for the `int` stored in that field to be represented in more than one way, hence the code becomes much longer: 216 | 217 | ```asm 218 | 000000000005fb0e mov rax, qword [r14+0x88] 219 | 000000000005fb15 mov rax, qword [rax+0x900] 220 | 000000000005fb1c sar rax, 0x1 221 | 000000000005fb1f jae loc_5fb29 222 | 223 | 000000000005fb21 mov rax, qword [0x8+rax*2] 224 | 225 | loc_5fb29: 226 | 000000000005fb29 push rax ; CODE XREF=Precompiled____main_1559+43 227 | 000000000005fb2a call Precompiled____print_813 ; Precompiled____print_813 228 | 000000000005fb2f pop rcx 229 | ``` 230 | 231 | The first two `mov` instructions most likely load the current value of `value2` from memory: in the Dart VM on x86_64, `r14` holds a pointer to the current thread, and `[rax+0x900]` appears to be the slot where `value2` is stored. I could be wrong about the exact layout, so if you know better, please let me know! Next comes `sar`, which stands for shift arithmetic right, followed by `jae`, which is the same as `jnc`: it jumps if the Carry Flag (CF) in EFLAGS is 0 (refer to Intel's manual for this!). My best explanation is the Dart VM's small-integer (Smi) tagging: a Smi is stored shifted left by one bit with a 0 in its lowest bit, while a pointer to a heap object has a 1 there. `sar rax, 0x1` shifts that tag bit into CF, so if CF is 0 the value was a Smi, `rax` now holds the plain integer, and we jump straight to the print call at `loc_5fb29`. Otherwise `rax` held a pointer to a boxed 64-bit integer, and `mov rax, qword [0x8+rax*2]` loads the integer out of that box. One thing is clear, though: the code is **not** treating `value2` as a constant, even though it is never reassigned!
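One plausible reading of the `sar rax, 0x1` / `jae` / `mov rax, qword [0x8+rax*2]` sequence above is the Dart VM's small-integer (Smi) tagging. The sketch below models that reading in plain Dart. The tagging scheme is an assumption, and `fakeHeap`, `tagSmi`, `box`, and `loadInt` are hypothetical helpers, not Dart VM APIs:

```dart
// Toy model of: sar rax, 0x1 ; jae <smi> ; mov rax, qword [0x8+rax*2]
// Assumptions: a Smi is stored as (value << 1) with tag bit 0; a boxed
// integer is referenced by (address + 1) and keeps its 64-bit payload
// 8 bytes after its untagged address.

/// Fake memory: untagged address -> 64-bit value stored there.
final fakeHeap = <int, int>{};

/// Tags a small integer the way we assume the VM does.
int tagSmi(int value) => value << 1;

/// "Boxes" [value] at the untagged [address] and returns the tagged pointer.
int box(int value, int address) {
  fakeHeap[address + 8] = value; // payload after an 8-byte header
  return address + 1; // heap pointers carry tag bit 1
}

/// Mirrors the three instructions from the listing.
int loadInt(int word) {
  final carry = word & 1; // the bit `sar rax, 0x1` shifts into CF
  final rax = word >> 1; // sar rax, 0x1 (Dart's >> on int is arithmetic)
  if (carry == 0) {
    return rax; // jae taken: it was a Smi, now untagged
  }
  // mov rax, qword [0x8+rax*2]: rax * 2 == word - 1 == the untagged address
  return fakeHeap[0x8 + rax * 2]!;
}

void main() {
  print(loadInt(tagSmi(0xFEEDFEED)).toRadixString(16)); // feedfeed
  print(loadInt(box(0xFEEDFEED, 0x1000)).toRadixString(16)); // feedfeed
}
```

Either way, the same integer comes out, which is why the compiled code has to be ready for both representations when it reads `value2` from memory.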
232 | 233 | The part that annoys me the most is the compiled code for this Dart code: 234 | 235 | ```dart 236 | print(value1 + value2); 237 | ``` 238 | 239 | It translates to this: 240 | 241 | ```asm 242 | 000000000005fb30 mov rax, qword [r14+0x88] 243 | 000000000005fb37 mov rax, qword [rax+0x900] 244 | 000000000005fb3e cmp rax, qword [r14+0xc8] 245 | 000000000005fb45 je loc_5fb82 246 | 247 | 000000000005fb4b sar rax, 0x1 248 | 000000000005fb4e jae loc_5fb58 249 | 250 | 000000000005fb50 mov rax, qword [0x8+rax*2] 251 | 252 | loc_5fb58: 253 | 000000000005fb58 mov r11d, 0xdeadbeef ; CODE XREF=Precompiled____main_1559+90 254 | 000000000005fb5e add rax, r11 255 | 000000000005fb61 push rax 256 | 000000000005fb62 call Precompiled____print_813 ; Precompiled____print_813 257 | ``` 258 | 259 | You can see the same two `mov` instructions here again: Dart is repeating itself instead of keeping `value2` in a register between the two uses. This time there is also a `cmp rax, qword [r14+0xc8]` followed by `je loc_5fb82`, which jumps to the `NullError` stub at the end of the full listing, so `value2` is checked against `null` before the addition. My best guess is that, since `value2` is read from memory rather than known at compile time, and registers are most likely not preserved across the `print()` call, the compiler simply loads, null-checks, and unboxes `value2` all over again. 260 | 261 | This adds up if you have a large number of these hidden `final` values that could just as well be `const` but aren't, most probably due to human error. 262 | 263 | The `value1` part of this addition is the most straightforward: it is just an immediate value, like before: 264 | 265 | ```asm 266 | 000000000005fb58 mov r11d, 0xdeadbeef ; CODE XREF=Precompiled____main_1559+90 267 | ``` 268 | 269 | ## Conclusion 270 | 271 | Try to keep your constants as constants, and don't make the mistake of defining them as `final` values just because it's your team's convention or for a similar reason. The Dart compiler doesn't fold a top-level `final` value into the code the way it does a `const`; it loads it from memory, and possibly null-checks and unboxes it, every time you use it!
So use `const` where you can, and only use `final` when you cannot use `const`. 272 | 273 | ## Support my work 274 | 275 | If you like what I do, please consider supporting me: https://www.buymeacoffee.com/vandad 276 | 277 | ## References 278 | 279 | * Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 1: Basic Architecture 280 | * Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2 (2A, 2B, 2C & 2D): Instruction Set Reference, A-Z 281 | -------------------------------------------------------------------------------- /issue-5-recursive-functions-in-dart/issue-5-recursive-functions-in-dart.md: -------------------------------------------------------------------------------- 1 | # Recursive Functions in Dart 2 | 3 | Let's see how recursion works at a low level by looking at some Dart AOT code! 4 | 5 | - [Recursive Functions in Dart](#recursive-functions-in-dart) 6 | - [What is a recursive function?](#what-is-a-recursive-function) 7 | - [Low-level anatomy of recursive functions](#low-level-anatomy-of-recursive-functions) 8 | - [Traditional factorial recursive function in Dart](#traditional-factorial-recursive-function-in-dart) 9 | - [Conclusions](#conclusions) 10 | - [References](#references) 11 | 12 | ## What is a recursive function? 13 | 14 | The best way to demonstrate what a recursive function is, is through an example.
Given the following Dart code: 15 | 16 | ```dart 17 | import 'dart:io' show exit; 18 | 19 | int incrementUntilValueIs100OrMore(int value) { 20 | if (value >= 100) { 21 | return value; 22 | } else { 23 | return incrementUntilValueIs100OrMore(value + 1); 24 | } 25 | } 26 | 27 | void main(List args) { 28 | final value = incrementUntilValueIs100OrMore(0); 29 | print(value); 30 | exit(0); 31 | } 32 | ``` 33 | 34 | we'll get the following AOT code for the `incrementUntilValueIs100OrMore` function: 35 | 36 | ```asm 37 | Precompiled____incrementUntilValueIs100OrMore_1436: 38 | 000000000009a738 push rbp ; CODE XREF=Precompiled____main_1435+17, Precompiled____incrementUntilValueIs100OrMore_1436+38 39 | 000000000009a739 mov rbp, rsp 40 | 000000000009a73c cmp rsp, qword [r14+0x40] 41 | 000000000009a740 jbe loc_9a769 42 | 43 | loc_9a746: 44 | 000000000009a746 mov rax, qword [rbp+arg_0] ; CODE XREF=Precompiled____incrementUntilValueIs100OrMore_1436+56 45 | 000000000009a74a cmp rax, 0x64 46 | 000000000009a74e jl loc_9a759 47 | 48 | 000000000009a754 mov rsp, rbp 49 | 000000000009a757 pop rbp 50 | 000000000009a758 ret 51 | ; endp 52 | 53 | loc_9a759: 54 | 000000000009a759 add rax, 0x1 ; CODE XREF=Precompiled____incrementUntilValueIs100OrMore_1436+22 55 | 000000000009a75d push rax 56 | 000000000009a75e call Precompiled____incrementUntilValueIs100OrMore_1436 ; Precompiled____incrementUntilValueIs100OrMore_1436 57 | 000000000009a763 pop rcx 58 | 000000000009a764 mov rsp, rbp 59 | 000000000009a767 pop rbp 60 | 000000000009a768 ret 61 | ; endp 62 | 63 | loc_9a769: 64 | 000000000009a769 call qword [r14+0x240] ; CODE XREF=Precompiled____incrementUntilValueIs100OrMore_1436+8 65 | 000000000009a770 jmp loc_9a746 66 | ``` 67 | 68 | Let's break it down. 69 | 70 | These two bits of code are related to each other and, as pointed out by Vyacheslav Egorov, they are stack overflow checks: 71 | 72 | ```asm 73 | Precompiled____incrementUntilValueIs100OrMore_1436: 74 | 000000000009a738 push rbp ; CODE
XREF=Precompiled____main_1435+17, Precompiled____incrementUntilValueIs100OrMore_1436+38 75 | 000000000009a739 mov rbp, rsp 76 | 000000000009a73c cmp rsp, qword [r14+0x40] 77 | 000000000009a740 jbe loc_9a769 78 | 79 | ... 80 | 81 | loc_9a769: 82 | 000000000009a769 call qword [r14+0x240] ; CODE XREF=Precompiled____incrementUntilValueIs100OrMore_1436+8 83 | 000000000009a770 jmp loc_9a746 84 | ``` 85 | 86 | I won't go through them in much detail now, since we briefly looked at them in issue 4. If `rsp` is still above the limit, the `jbe` is not taken and execution falls straight through to `loc_9a746`; otherwise the `call` at `loc_9a769` runs first and the `jmp` after it is a short jump back to the `loc_9a746` label, where the actual body of the procedure is laid out. `loc_9a746` starts like this: 87 | 88 | ```asm 89 | loc_9a746: 90 | 000000000009a746 mov rax, qword [rbp+arg_0] ; CODE XREF=Precompiled____incrementUntilValueIs100OrMore_1436+56 91 | 000000000009a74a cmp rax, 0x64 92 | 000000000009a74e jl loc_9a759 93 | ``` 94 | 95 | The `mov` instruction loads the `value` argument that was passed to this function into the 64-bit `rax` register. Then you can see `cmp rax, 0x64`, where `0x64` is 100 in base 16; this `cmp` is our `if` statement, comparing `rax` with 100. It is followed by `jl`, which is "Jump near if less (SF≠ OF)", so it jumps to `loc_9a759` if `rax` is less than 100. In our code we wrote `if (value >= 100)`, but Dart has inverted the condition to `if (value < 100)` and jumps to `loc_9a759` in that case.
If that's not the case, in other words if `rax` is greater than or equal to 100, this happens: 96 | 97 | ```asm 98 | 000000000009a754 mov rsp, rbp 99 | 000000000009a757 pop rbp 100 | 000000000009a758 ret 101 | ; endp 102 | ``` 103 | 104 | I asked Vyacheslav about Dart's calling convention and he said "it's a custom one". There is a small hint here, though, that the callee returns its result in `rax`: on this path `rax` still holds `value`, which is exactly what the function returns, and nothing else happens before `ret`. Since we're returning an `int`, it seems Dart uses `rax` for return values, so keep that in mind! 105 | 106 | However, if `rax` is less than 100, we jump to `loc_9a759`, which looks like this: 107 | 108 | ```asm 109 | loc_9a759: 110 | 000000000009a759 add rax, 0x1 ; CODE XREF=Precompiled____incrementUntilValueIs100OrMore_1436+22 111 | 000000000009a75d push rax 112 | 000000000009a75e call Precompiled____incrementUntilValueIs100OrMore_1436 ; Precompiled____incrementUntilValueIs100OrMore_1436 113 | 000000000009a763 pop rcx 114 | 000000000009a764 mov rsp, rbp 115 | 000000000009a767 pop rbp 116 | 000000000009a768 ret 117 | ; endp 118 | ``` 119 | 120 | As you can see, `rax` is incremented by 1 and pushed onto the stack as the argument, and Dart calls the `Precompiled____incrementUntilValueIs100OrMore_1436` procedure again while we are already inside that procedure. In this case `Precompiled____incrementUntilValueIs100OrMore_1436` is both the caller and the callee. Note that the recursive step is a real `call` rather than a jump back to the top of the function, so every level of recursion gets its own stack frame.
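Because the recursive step in the listing above is a real `call`, each level of recursion costs a stack frame. This sketch counts those invocations and shows the equivalent loop; `framesUsed` and `incrementUntilValueIs100OrMoreLoop` are hypothetical additions for illustration, not part of the original program:

```dart
// Hypothetical counter of nested invocations, added for illustration.
int framesUsed = 0;

// Same logic as in the article, plus the counter.
int incrementUntilValueIs100OrMore(int value) {
  framesUsed++;
  if (value >= 100) {
    return value;
  } else {
    return incrementUntilValueIs100OrMore(value + 1);
  }
}

// The equivalent loop: a single stack frame regardless of the input.
int incrementUntilValueIs100OrMoreLoop(int value) {
  var current = value;
  while (current < 100) {
    current++;
  }
  return current;
}

void main() {
  print(incrementUntilValueIs100OrMore(0)); // 100
  print(framesUsed); // 101 (value = 0, 1, ..., 100)
  print(incrementUntilValueIs100OrMoreLoop(0)); // 100
}
```

Both versions compute the same result; the difference is that the recursive one keeps 101 frames alive at its deepest point, while the loop needs only one.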
121 | 122 | Back in the `main` function, where we called our procedure from, you can see this: 123 | 124 | ```asm 125 | Precompiled____main_1435: 126 | 000000000009a700 push rbp ; CODE XREF=Precompiled____main_main_1437+17 127 | 000000000009a701 mov rbp, rsp 128 | 000000000009a704 xor eax, eax 129 | 000000000009a706 cmp rsp, qword [r14+0x40] 130 | 000000000009a70a jbe loc_9a72f 131 | 132 | loc_9a710: 133 | 000000000009a710 push rax ; CODE XREF=Precompiled____main_1435+54 134 | 000000000009a711 call Precompiled____incrementUntilValueIs100OrMore_1436 ; Precompiled____incrementUntilValueIs100OrMore_1436 135 | 000000000009a716 pop rcx 136 | 000000000009a717 push rax 137 | 000000000009a718 call Precompiled____print_911 ; Precompiled____print_911 138 | 000000000009a71d pop rcx 139 | 000000000009a71e call Precompiled____exit_1024 ; Precompiled____exit_1024 140 | 000000000009a723 mov rax, qword [r14+0xc8] 141 | 000000000009a72a mov rsp, rbp 142 | 000000000009a72d pop rbp 143 | 000000000009a72e ret 144 | ; endp 145 | 146 | loc_9a72f: 147 | 000000000009a72f call qword [r14+0x240] ; CODE XREF=Precompiled____main_1435+10 148 | 000000000009a736 jmp loc_9a710 149 | ``` 150 | 151 | This part is the most interesting to me (`xor eax, eax` above has already set `rax` to 0, the argument in `incrementUntilValueIs100OrMore(0)`): 152 | 153 | ```asm 154 | 000000000009a710 push rax ; CODE XREF=Precompiled____main_1435+54 155 | 000000000009a711 call Precompiled____incrementUntilValueIs100OrMore_1436 ; Precompiled____incrementUntilValueIs100OrMore_1436 156 | 000000000009a716 pop rcx 157 | 000000000009a717 push rax 158 | 000000000009a718 call Precompiled____print_911 ; Precompiled____print_911 159 | 000000000009a71d pop rcx 160 | ``` 161 | 162 | It seems the `Precompiled____incrementUntilValueIs100OrMore_1436` procedure leaves its return value in `rax` when it returns, since `main` pushes `rax` straight away as the argument to `print`. So it seems that in Dart the callee returns its value in `rax`.
And the `push` and `pop` around the call are there to balance the stack: the argument went onto the stack with `push rax`, so the caller is responsible for removing it afterwards, and that's what `pop rcx` seems to be doing. So we can conclude that every invocation of `Precompiled____incrementUntilValueIs100OrMore_1436` returns its result in `rax`: the innermost call, where `jl` is not taken because `value` has reached 100, returns with `rax` holding 100, and every outer call then just tears down its stack frame and returns with that same `rax`, all the way back to `main`! 163 | 164 | To better understand how recursive functions actually work in Dart, you need to know how `call`, `ret`, the stack pointer, the base pointer, etc. work at the assembly level, so let's dig into those now. 165 | 166 | ## Low-level anatomy of recursive functions 167 | 168 | Let's first talk about stacks. On x86, the stack historically lives in a segment: a region of memory with a defined limit, at most 4 GB in 32-bit mode. (In 64-bit mode segmentation is mostly disabled and the stack is simply a region of the flat address space, but the idea is the same.) The stack pointer starts at the **top** of that region. So let's say we have a stack that is 1 kilobyte, 1024 bytes in other words. Under normal conditions the runtime, Dart in this case, sets the stack up for you, and the stack pointer, or SP, is stored in `esp` on x86_32 and `rsp` on x86_64. Measured as an offset from the bottom of our 1024-byte stack, the stack pointer's value would start at 1024, pointing to the top of the stack. 169 | 170 | When you call a procedure, the compiler sets up its arguments according to a calling convention. A calling convention is an agreed-upon *way of* calling other procedures. For instance, a calling convention might say that the first parameter to a function is passed in `rax`, the second in `rcx`, and the rest are pushed onto the stack as 64-bit values. I'm making this particular one up, but the idea holds.
Dart, as I've understood from Vyacheslav Egorov, has a custom calling convention, meaning it isn't documented in one central place, and there's nothing strange about that. A calling convention only has to make sense to the people who write a specific compiler; you and I, who just use the compiler, won't even notice it working under the hood. Still, a good understanding of how the stack works will help you understand recursive functions. 171 | 172 | To continue, you also need to know about the base pointer: `ebp` in 32-bit mode or `rbp` in 64-bit mode. Depending on who or what compiled the code, creating and tearing down the stack frame of a procedure usually looks like this: 173 | 174 | ```asm 175 | push rbp 176 | mov rbp, rsp 177 | sub rsp, NNN 178 | ; ... now we have NNN bytes of space on the stack 179 | leave ; same as: mov rsp, rbp / pop rbp 180 | ret 181 | ``` 182 | 183 | When we enter a procedure, the stack pointer points to the top of the stack, its lowest used address, so the values at addresses above the stack pointer are the ones that have already been pushed. Depending on the calling convention, some arguments may be passed in GPRs (general-purpose registers), and since Dart's convention is custom I can't really document it here, so in this article we'll assume a fictitious calling convention where all arguments are passed on the stack! In that case, right after `call`, the stack pointer points to the return address that `call` pushed, and the last argument the caller pushed sits just above it. After `push rbp` and `mov rbp, rsp`, the base pointer gives the procedure a fixed reference point: with 8-byte slots, `[rbp+8]` holds the return address and `[rbp+16]` holds the last argument pushed by the *caller*, while the procedure's local variables live below `rbp`, in the space reserved by `sub rsp, NNN`. So if you see `rbp + NNN`, the procedure is probably reading arguments passed to it, while `rbp - NNN` (or `rsp + NNN` after a `sub rsp, NNN`) *may* mean it is working with its local variables.
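To tie the `push rbp` / `mov rbp, rsp` / `sub rsp, NNN` prologue and the `rbp` offsets together, here is a toy model of a downward-growing stack in Dart, using the fictitious all-arguments-on-the-stack convention from above. `ToyStack`, its methods, the 1024-byte size, and the return address `0xCAFE` are all hypothetical; the model only tracks addresses, 8 bytes per slot:

```dart
/// A toy downward-growing stack with 8-byte slots. Addresses are offsets
/// from the bottom of a hypothetical stack region of [size] bytes.
class ToyStack {
  ToyStack(this.size)
      : rsp = size,
        rbp = size;

  final int size;
  int rsp; // stack pointer: starts at the top
  int rbp; // base pointer
  final memory = <int, int>{}; // address -> 64-bit value

  void push(int value) {
    rsp -= 8; // decrement first...
    memory[rsp] = value; // ...then write at the new top of stack
  }

  int pop() {
    final value = memory[rsp]!; // read from the top of stack...
    rsp += 8; // ...then increment
    return value;
  }

  /// `call` pushes the return address, then the callee's prologue runs:
  /// push rbp; mov rbp, rsp; sub rsp, localBytes
  void callAndEnter(int returnAddress, int localBytes) {
    push(returnAddress);
    push(rbp);
    rbp = rsp;
    rsp -= localBytes;
  }

  /// The epilogue: mov rsp, rbp; pop rbp; ret (returns the return address).
  int leaveAndRet() {
    rsp = rbp;
    rbp = pop();
    return pop();
  }
}

void main() {
  final stack = ToyStack(1024);
  stack.push(42); // the caller pushes the argument
  stack.callAndEnter(0xCAFE, 16); // 16 bytes of locals
  print(stack.memory[stack.rbp + 16]); // 42: the argument
  print(stack.memory[stack.rbp + 8]!.toRadixString(16)); // cafe
  print(stack.rsp); // 984
  print(stack.leaveAndRet().toRadixString(16)); // cafe
  stack.pop(); // the caller removes its argument, like `pop rcx`
  print(stack.rsp); // 1024: balanced again
}
```

The argument ends up at `rbp + 16`, which matches the `arg_0: int, 16` that the disassembler reports for the factorial function later in this issue.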
184 | 185 | So, to summarize: calling conventions define how a caller and a callee communicate with each other, while the base pointer and the stack pointer coordinate access to both the arguments passed to a procedure and the procedure's own local stack space. 186 | 187 | Knowing that a Dart function sets up its stack frame using the stack pointer and the base pointer when it is called, and that every value pushed onto the stack decreases the stack pointer by the number of bytes needed to store it, you'll understand that the stack can actually run out of space if you recursively call a function with no proper exit. 188 | 189 | The thing to take away from all of this rant is that every time we enter a new procedure through `call`, the return address is pushed and the procedure sets up its own frame, so the stack pointer moves down, since we started at the top of the stack. If you keep calling procedures, recursively as in our case, without ever returning from the nested calls, the stack pointer will eventually get so low that the runtime has to stop you with a stack overflow. As far as I can tell, that's what the `cmp rsp, qword [r14+0x40]` / `jbe` pair at the start of every function we've looked at is for: it compares the stack pointer against a limit, and in Dart running past that limit ends in a `StackOverflowError`. 190 | 191 | Here is also a good little bit of information about the stack pointer from Intel: 192 | 193 | > Items are placed on the stack using the PUSH instruction and removed from the stack using the POP instruction. When an item is pushed onto the stack, the processor decrements the ESP register, then writes the item at the new top of stack. When an item is popped off the stack, the processor reads the item from the top of stack, then increments the ESP register. In this manner, the stack grows down in memory (towards lesser addresses) when items are pushed on the stack and shrinks up (towards greater addresses) when the items are popped from the stack.
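To see the stack overflow described above in action, here is a sketch of a recursive function with no exit. It assumes the native Dart VM runtime, where running out of stack surfaces as a `StackOverflowError`; catching that error is only for demonstration, and `recurseForever`, `overflows`, and `depth` are hypothetical names:

```dart
// Hypothetical counter of how deep the recursion got.
int depth = 0;

// No exit condition: every call adds another frame until the stack limit.
void recurseForever() {
  depth++;
  recurseForever();
}

/// Returns true if the runtime stopped the recursion with a StackOverflowError.
bool overflows() {
  depth = 0;
  try {
    recurseForever();
  } on StackOverflowError {
    return true;
  }
  return false;
}

void main() {
  print(overflows()); // true
  // The exact depth depends on the stack size and the size of each frame.
  print('gave up after $depth nested calls');
}
```

In real code you'd fix the missing exit condition rather than catch the error; `Error` subclasses like this one signal programming mistakes.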
194 | 195 | ## Traditional factorial recursive function in Dart 196 | 197 | i can't talk about recursive functions without paying tribute to the classical factorial function implementation that uses recursion, so let's have a look at that. the factorial of N is the *product* of all numbers from 1 up to and including N, so the factorial of 6 is `1*2*3*4*5*6 = 720`. a non-recursive (iterative) way of calculating the factorial of N would look like this: 198 | 199 | ```dart 200 | import 'dart:io' show exit; 201 | 202 | int factorial(int value) { 203 | var result = 1; 204 | for (var count = 1; count <= value; count++) { 205 | result *= count; 206 | } 207 | return result; 208 | } 209 | 210 | void main(List args) { 211 | print(factorial(6)); 212 | exit(0); 213 | } 214 | ``` 215 | 216 | this is quite straightforward, but it's not what I want to demonstrate in this issue. let's have a look at what this would look like if we used recursion: 217 | 218 | ```dart 219 | import 'dart:io' show exit; 220 | 221 | int factorial(int value) => value == 1 222 | ?
value 223 | : value * factorial(value - 1); 224 | 225 | void main(List args) { 226 | print(factorial(4)); 227 | exit(0); 228 | } 229 | ``` 230 | 231 | let's look at this function's AOT-compiled code: 232 | 233 | ```asm 234 | ; ================ B E G I N N I N G O F P R O C E D U R E ================ 235 | 236 | ; Variables: 237 | ; arg_0: int, 16 238 | 239 | 240 | Precompiled____factorial_1436: 241 | 000000000009a73c push rbp ; CODE XREF=Precompiled____main_1435+20, Precompiled____factorial_1436+65 242 | 000000000009a73d mov rbp, rsp 243 | 000000000009a740 cmp rsp, qword [r14+0x40] 244 | 000000000009a744 jbe loc_9a793 245 | 246 | loc_9a74a: 247 | 000000000009a74a mov rcx, qword [rbp+arg_0] ; CODE XREF=Precompiled____factorial_1436+94 248 | 000000000009a74e mov rax, rcx 249 | 000000000009a751 add rax, rax 250 | 000000000009a754 jno loc_9a763 251 | 252 | 000000000009a75a call Precompiled_Stub__iso_stub_AllocateMintSharedWithoutFPURegsStub ; Precompiled_Stub__iso_stub_AllocateMintSharedWithoutFPURegsStub 253 | 000000000009a75f mov qword [rax+7], rcx 254 | 255 | loc_9a763: 256 | 000000000009a763 cmp rax, 0x2 ; CODE XREF=Precompiled____factorial_1436+24 257 | 000000000009a767 jne loc_9a775 258 | 259 | 000000000009a76d mov rax, rcx 260 | 000000000009a770 jmp loc_9a78e 261 | 262 | loc_9a775: 263 | 000000000009a775 mov rax, rcx ; CODE XREF=Precompiled____factorial_1436+43 264 | 000000000009a778 sub rax, 0x1 265 | 000000000009a77c push rax 266 | 000000000009a77d call Precompiled____factorial_1436 ; Precompiled____factorial_1436 267 | 000000000009a782 pop rcx 268 | 000000000009a783 mov rcx, qword [rbp+arg_0] 269 | 000000000009a787 imul rcx, rax 270 | 000000000009a78b mov rax, rcx 271 | 272 | loc_9a78e: 273 | 000000000009a78e mov rsp, rbp ; CODE XREF=Precompiled____factorial_1436+52 274 | 000000000009a791 pop rbp 275 | 000000000009a792 ret 276 | ; endp 277 | 278 | loc_9a793: 279 | 000000000009a793 call qword [r14+0x240] ; CODE
XREF=Precompiled____factorial_1436+8 280 | ``` 281 | 282 | the first part of the procedure is just setting up the stack: 283 | 284 | ```asm 285 | Precompiled____factorial_1436: 286 | 000000000009a73c push rbp ; CODE XREF=Precompiled____main_1435+20, Precompiled____factorial_1436+65 287 | 000000000009a73d mov rbp, rsp 288 | 000000000009a740 cmp rsp, qword [r14+0x40] 289 | 000000000009a744 jbe loc_9a793 290 | ``` 291 | 292 | so I won't go through it again! after that we move on to `loc_9a74a`, which looks like this: 293 | 294 | ```asm 295 | loc_9a74a: 296 | 000000000009a74a mov rcx, qword [rbp+arg_0] ; CODE XREF=Precompiled____factorial_1436+94 297 | 000000000009a74e mov rax, rcx 298 | 000000000009a751 add rax, rax 299 | 000000000009a754 jno loc_9a763 300 | 301 | 000000000009a75a call Precompiled_Stub__iso_stub_AllocateMintSharedWithoutFPURegsStub ; Precompiled_Stub__iso_stub_AllocateMintSharedWithoutFPURegsStub 302 | 000000000009a75f mov qword [rax+7], rcx 303 | 304 | loc_9a763: 305 | 000000000009a763 cmp rax, 0x2 ; CODE XREF=Precompiled____factorial_1436+24 306 | 000000000009a767 jne loc_9a775 307 | 308 | 000000000009a76d mov rax, rcx 309 | 000000000009a770 jmp loc_9a78e 310 | ``` 311 | 312 | I was truly puzzled by all of this, so I asked Vyacheslav on Twitter what it all means and he answered with this: 313 | 314 | > it boxes an unboxed int64 value (either into a smi or a mint box if it does not fit into a smi) and then compares result with with a smi representation of 1 it's a bug - boxing should not be needed, will be fixed by https://t.co/PIKVq6kYFD 315 | 316 | Since this seems to be a known bug where the compiler does more work here than it should (the comparison could simply be done on the unboxed value in a gpr), I will skip explaining it in detail. in short, `add rax, rax` shifts the value left by one bit to turn it into a smi, `jno` skips the mint allocation if that didn't overflow, and `cmp rax, 0x2` compares the result against `2`, which is the smi representation of `1`. if you're interested in what mint and smi representations are in Dart, I suggest that you have a look at [this resource](https://dart.dev/articles/archive/numeric-computation).
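the instructions Vyacheslav describes can be modelled in plain Dart. this is a sketch of what the instructions compute, not of how the VM implements boxing, and it assumes 63-bit smis, i.e. a smi is the integer shifted left by one bit, which is exactly what `add rax, rax` does. `toSmiBits` and `fitsInSmi` are made-up names, and the code assumes native 64-bit Dart integers, so it won't behave the same when compiled to JavaScript:

```dart
/// What `add rax, rax` computes: the value shifted left by one bit,
/// i.e. its smi representation (assuming 63-bit smis).
int toSmiBits(int value) => value << 1;

/// True when `add rax, rax` does not overflow, i.e. when `jno` is taken
/// and no mint box has to be allocated.
bool fitsInSmi(int value) => (value << 1) >> 1 == value;

void main() {
  print(toSmiBits(1)); // 2, hence `cmp rax, 0x2` checks for `value == 1`
  print(fitsInSmi(4)); // true
  print(fitsInSmi(1 << 62)); // false, this value would need a mint box
}
```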
317 | 318 | we then get to the juiciest part of the procedure, `loc_9a775`, which in my opinion is this: 319 | 320 | ```asm 321 | loc_9a775: 322 | 000000000009a775 mov rax, rcx ; CODE XREF=Precompiled____factorial_1436+43 323 | 000000000009a778 sub rax, 0x1 324 | 000000000009a77c push rax 325 | 000000000009a77d call Precompiled____factorial_1436 ; Precompiled____factorial_1436 326 | 000000000009a782 pop rcx 327 | 000000000009a783 mov rcx, qword [rbp+arg_0] 328 | 000000000009a787 imul rcx, rax 329 | 000000000009a78b mov rax, rcx 330 | ``` 331 | 332 | here `rcx` holds the current `value`, which was loaded from `[rbp+arg_0]`, the argument that our caller pushed to the stack. the first three instructions compute `value - 1` (that's the `sub rax, 0x1` instruction) and push it to the stack as the argument for the next `factorial` call. when that call returns, `rax` holds the result of `factorial(value - 1)`: `pop rcx` just removes the argument we pushed, `mov rcx, qword [rbp+arg_0]` reloads *this* call's `value` from the stack, `imul rcx, rax` multiplies the two, and the product is moved into `rax`, which is where our own caller expects the return value. if you look at the original Dart code you may get confused by the ternary expression comparing `value` with 1 and then passing `value - 1` into the function itself, and wonder how `value` can both be `1` and also be the result of the `factorial` calculation. the thing to keep in mind is that every call gets its own `value`, stored on the stack as that call's argument, and the partial results travel back up through `rax`. so we essentially have two representations of the `value`! one is used as a counter that goes down by one on the way into the recursion, and the other as an accumulator that gets multiplied on the way back out, perfect for `rcx` and `rax` if you ask me! 333 | 334 | ## Conclusions 335 | 336 | - recursive functions do indeed call themselves, and this comes with the overhead of setting up a new stack frame and unwinding it on every pass through the function. you may not incur the same execution cost if you write your functions iteratively, with loops!
337 | - if the exit condition of a recursive function is not set properly, you will get a stack overflow, since you will run out of stack space after N nested calls, where N depends on the size of each frame and on how much stack space Dart has been given. I'm unsure how large the stack is allowed to grow in Dart; the closest thing to official information about this is a comment on the Dart SDK repo on GitHub from Ivan Posva, who wrote "Stack space for the main thread is not set by the VM, but by the OS on launch." 338 | 339 | ## References 340 | 341 | - [Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 1: Basic Architecture](https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-1-manual.pdf) 342 | - [Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2 (2A, 2B, 2C & 2D): Instruction Set Reference, A-Z](https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf) 343 | - [Dart: Numeric computation](https://dart.dev/articles/archive/numeric-computation) 344 | -------------------------------------------------------------------------------- /issue-3-for-loop-in-dart/issue-3-for-loop-in-dart.md: -------------------------------------------------------------------------------- 1 | # `for` loop in Dart 2 | 3 | Let's go deep into how `for` loops work in Dart.
4 | 5 | - [`for` loop in Dart](#for-loop-in-dart) 6 | - [What is a `for` loop?](#what-is-a-for-loop) 7 | - [`for` loop with index](#for-loop-with-index) 8 | - [The curious case of the unoptimized empty `for` loop](#the-curious-case-of-the-unoptimized-empty-for-loop) 9 | - [Non-entry `for` loops with `const` start/end values](#non-entry-for-loops-with-const-startend-values) 10 | - [`for` loops over variable iterables](#for-loops-over-variable-iterables) 11 | - [`for` loop over `const` iterables](#for-loop-over-const-iterables) 12 | - [Conclusions](#conclusions) 13 | - [References](#references) 14 | 15 | ## What is a `for` loop? 16 | 17 | There are two types of `for` loops in Dart: 18 | 19 | 1. `for ([final/var] x = N; x [<|<=|>|>=] y; x [+= y|++|--|-=])` 20 | 2. `for (final x in y)` where y is an `Iterable` 21 | 22 | we will start off by looking at the first for loop and see what we learn! 23 | 24 | ## `for` loop with index 25 | 26 | given the following Dart code: 27 | 28 | ```dart 29 | import 'dart:io' show exit; 30 | 31 | void main(List args) { 32 | for (var x = 0xDEADBEEF; x < 0xFEEDFEED; x++) { 33 | print(x); 34 | } 35 | exit(0); 36 | } 37 | ``` 38 | 39 | we will get the following AOT: 40 | 41 | ```asm 42 | 000000000009a6a0 push rbp ; CODE XREF=Precompiled____main_main_1435+17 43 | 000000000009a6a1 mov rbp, rsp 44 | 000000000009a6a4 sub rsp, 0x8 45 | 000000000009a6a8 cmp rsp, qword [r14+0x40] 46 | 000000000009a6ac jbe loc_9a72a 47 | 48 | loc_9a6b2: 49 | 000000000009a6b2 mov edx, 0xdeadbeef ; CODE XREF=Precompiled____main_1434+145 50 | 51 | loc_9a6b7: 52 | 000000000009a6b7 mov qword [rbp+var_8], rdx ; CODE XREF=Precompiled____main_1434+119 53 | 000000000009a6bb cmp rsp, qword [r14+0x40] 54 | 000000000009a6bf jbe loc_9a736 55 | 56 | loc_9a6c5: 57 | 000000000009a6c5 mov r11d, 0xfeedfeed ; CODE XREF=Precompiled____main_1434+157 58 | 000000000009a6cb cmp rdx, r11 59 | 000000000009a6ce jge loc_9a719 60 | 61 | 000000000009a6d4 mov rax, rdx 62 | 000000000009a6d7 add 
rax, rax 63 | 000000000009a6da jno loc_9a6e9 64 | 65 | 000000000009a6e0 call Precompiled_Stub__iso_stub_AllocateMintSharedWithoutFPURegsStub ; Precompiled_Stub__iso_stub_AllocateMintSharedWithoutFPURegsStub 66 | 000000000009a6e5 mov qword [rax+7], rdx 67 | 68 | loc_9a6e9: 69 | 000000000009a6e9 test al, 0x1 ; CODE XREF=Precompiled____main_1434+58 70 | 000000000009a6eb mov ecx, 0x35 71 | 000000000009a6f0 je loc_9a6f7 72 | 73 | 000000000009a6f2 movzx rcx, word [rax+1] 74 | 75 | loc_9a6f7: 76 | 000000000009a6f7 push rax ; CODE XREF=Precompiled____main_1434+80 77 | 000000000009a6f8 mov rax, qword [r14+0x60] 78 | 000000000009a6fc call qword [rax+rcx*8+0x58d8] 79 | 000000000009a703 pop r11 80 | 000000000009a705 push rax 81 | 000000000009a706 call Precompiled____printToConsole_149 ; Precompiled____printToConsole_149 82 | 000000000009a70b pop rcx 83 | 000000000009a70c mov rax, qword [rbp+var_8] 84 | 000000000009a710 add rax, 0x1 85 | 000000000009a714 mov rdx, rax 86 | 000000000009a717 jmp loc_9a6b7 87 | 88 | loc_9a719: 89 | 000000000009a719 call Precompiled____exit_1023 ; Precompiled____exit_1023, CODE XREF=Precompiled____main_1434+46 90 | 000000000009a71e mov rax, qword [r14+0xc8] 91 | 000000000009a725 mov rsp, rbp 92 | 000000000009a728 pop rbp 93 | 000000000009a729 ret 94 | ; endp 95 | 96 | loc_9a72a: 97 | 000000000009a72a call qword [r14+0x240] ; CODE XREF=Precompiled____main_1434+12 98 | 000000000009a731 jmp loc_9a6b2 99 | 100 | loc_9a736: 101 | 000000000009a736 call qword [r14+0x240] ; CODE XREF=Precompiled____main_1434+31 102 | 000000000009a73d jmp loc_9a6c5 103 | ``` 104 | 105 | let's break this down and see what's going on: 106 | 107 | ```asm 108 | 000000000009a6a0 push rbp ; CODE XREF=Precompiled____main_main_1435+17 109 | 000000000009a6a1 mov rbp, rsp 110 | 000000000009a6a4 sub rsp, 0x8 111 | 000000000009a6a8 cmp rsp, qword [r14+0x40] 112 | 000000000009a6ac jbe loc_9a72a 113 | ``` 114 | 115 | with the following pseudo-code: 116 | 117 | ```asm 118 | if (rsp <= 
*(r14 + 0x40)) { 119 | (*(r14 + 0x240))(); 120 | } 121 | ``` 122 | this all is setting up the stack for us so there is nothing fancy that we should dive into. the next part of the code is this cute little guy: 123 | 124 | ```asm 125 | loc_9a6b2: 126 | 000000000009a6b2 mov edx, 0xdeadbeef ; CODE XREF=Precompiled____main_1434+145 127 | ``` 128 | 129 | and this is moving the starting value of the `x` variable into the 32-bit register of `edx` since Dart realizes that the initial value of the `x` variable is indeed a constant so it places it inside a register immediately. I don't yet know why Dart doesn't use the 64-bit `rdx` register when it does these moves, but maybe the CPU does that internally! 130 | 131 | the next part is this: 132 | 133 | ```asm 134 | loc_9a6b7: 135 | 000000000009a6b7 mov qword [rbp+var_8], rdx ; CODE XREF=Precompiled____main_1434+119 136 | 000000000009a6bb cmp rsp, qword [r14+0x40] 137 | 000000000009a6bf jbe loc_9a736 138 | ``` 139 | 140 | if you look at the original assembly code and look for the `loc_9a6b7` label you'll see that there is a `loc_9a6f7` label at the end of which there is a `jmp` instruction that jumps to `loc_9a6b7`. do you know what this means? This is a typical `do { ... } while (true);` statement. So you can say that `for` loops with indices in Dart are created internally using a `while` loop! 141 | 142 | back to the assembly code, this was another way of writing the following pseudo-code: 143 | 144 | ```asm 145 | var_8 = rdx; 146 | if (rsp <= *(r14 + 0x40)) { 147 | (*(r14 + 0x240))(); 148 | } 149 | ``` 150 | 151 | remember the value of `0xdeadbeef` being stored in `edx`? well, `edx` are the lower 32-bits of the `rdx` register so by doing this `mov qword [rbp+var_8], rdx`, it seems like Dart is simply storing the initial value of our variable into the stack, since it is using `rbp` (64-bit base pointer). 
as for the `cmp` and `jbe` instructions, they look like the same check against `[r14+0x40]` that we saw in the prologue, but I cannot be sure right now about their exact purpose, so let's carry on and ignore them for now! 152 | 153 | then we have the next chunk of code like this: 154 | 155 | ```asm 156 | loc_9a6c5: 157 | 000000000009a6c5 mov r11d, 0xfeedfeed ; CODE XREF=Precompiled____main_1434+157 158 | 000000000009a6cb cmp rdx, r11 159 | 000000000009a6ce jge loc_9a719 160 | ``` 161 | 162 | remember how Dart placed `0xdeadbeef` inside the `edx` register? since that's the lower 32 bits of `rdx`, Dart is now comparing the full 64-bit `rdx` with `r11`, a quadword (fancy way of saying 64-bit) register whose lower 32 bits are the `r11d` dword register that was just loaded with `0xfeedfeed` (again zero-extended). don't worry if this all sounds weird for now; just know that Dart keeps both the current value and the upper bound of our loop variable in two CPU registers, `rdx` and `r11`! 163 | 164 | the next interesting part of the code for us is this: 165 | 166 | ```asm 167 | loc_9a6f7: 168 | 000000000009a6f7 push rax ; CODE XREF=Precompiled____main_1434+80 169 | 000000000009a6f8 mov rax, qword [r14+0x60] 170 | 000000000009a6fc call qword [rax+rcx*8+0x58d8] 171 | 000000000009a703 pop r11 172 | 000000000009a705 push rax 173 | 000000000009a706 call Precompiled____printToConsole_149 ; Precompiled____printToConsole_149 174 | 000000000009a70b pop rcx 175 | 000000000009a70c mov rax, qword [rbp+var_8] 176 | 000000000009a710 add rax, 0x1 177 | 000000000009a714 mov rdx, rax 178 | 000000000009a717 jmp loc_9a6b7 179 | ``` 180 | 181 | especially the first `push rax`, which pushes `x` (already boxed into a smi by the earlier `add rax, rax`) to the stack as an argument. `mov rax, qword [r14+0x60]` and `call qword [rax+rcx*8+0x58d8]` then look like a call through a table of functions indexed by the class id in `rcx` (`0x35` when `x` is a smi), most likely to `toString()`, and the `push rax` after it pushes the returned string to the stack to be used by the `Precompiled____printToConsole_149` function.
So all of this is quite normal, but the exciting part is this: 182 | 183 | ```asm 184 | 000000000009a710 add rax, 0x1 185 | 000000000009a714 mov rdx, rax 186 | 000000000009a717 jmp loc_9a6b7 187 | ``` 188 | 189 | recall how `rdx` (with `edx` as its lower 32-bit half) holds our `x` variable's current value? well, there you have it: Dart reloads `x` from `[rbp+var_8]`, does `add rax, 0x1` for the `x++` part and moves the result back into `rdx`, then jumps back to `loc_9a6b7`, stores the new value on the stack, and carries on until it hits `loc_9a6c5`, where `rdx` is compared to the upper bound of `x` stored in `r11`. if the value is greater than or equal to `0xfeedfeed` (`jge loc_9a719`), it jumps to `loc_9a719`, which calls `exit(0)`: 190 | 191 | ```asm 192 | loc_9a719: 193 | 000000000009a719 call Precompiled____exit_1023 ; Precompiled____exit_1023, CODE XREF=Precompiled____main_1434+46 194 | 000000000009a71e mov rax, qword [r14+0xc8] 195 | 000000000009a725 mov rsp, rbp 196 | 000000000009a728 pop rbp 197 | 000000000009a729 ret 198 | ; endp 199 | ``` 200 | here you can see the entire pseudo-code for this asm code: 201 | 202 | ```asm 203 | int Precompiled____main_1434(int arg0, int arg1, int arg2, int arg3, int arg4, int arg5) { 204 | r9 = arg5; 205 | r8 = arg4; 206 | rcx = arg3; 207 | rsi = arg1; 208 | rdi = arg0; 209 | if (rsp <= *(r14 + 0x40)) { 210 | (*(r14 + 0x240))(); 211 | } 212 | rdx = 0xdeadbeef; // mov edx, imm32 zero-extends into rdx 213 | do { 214 | var_8 = rdx; 215 | if (rsp <= *(r14 + 0x40)) { 216 | (*(r14 + 0x240))(); 217 | } 218 | if (rdx >= 0xfeedfeed) { 219 | break; 220 | } 221 | rax = rdx + rdx; 222 | if (OVERFLOW(rax)) { 223 | rax = Precompiled_Stub__iso_stub_AllocateMintSharedWithoutFPURegsStub(rdi, rsi, rdx, rcx, r8, r9, var_8, stack[-8], stack[0], stack[8], stack[16], stack[24], stack[32], stack[40], stack[48], stack[56], stack[64], stack[72], stack[80]); 224 | *(rax + 0x7) = rdx; 225 | } 226 | rcx = 0x35; 227 | if
((rax & 0x1) != 0x0) { 228 | rcx = *(int16_t *)(rax + 0x1) & 0xffff; 229 | } 230 | stack[-24] = (*(*(r14 + 0x60) + rcx * 0x8 + 0x58d8))(); 231 | Precompiled____printToConsole_149(rdi, rsi, rdx, rcx, r8, r9, stack[-24]); 232 | rcx = stack[-24]; 233 | rsp = ((rsp - 0x8) + 0x8 - 0x8) + 0x8; 234 | rdx = var_8 + 0x1; 235 | } while (true); 236 | Precompiled____exit_1023(); 237 | rax = *(r14 + 0xc8); 238 | return rax; 239 | } 240 | ``` 241 | 242 | ## The curious case of the unoptimized empty `for` loop 243 | 244 | for the given Dart code: 245 | 246 | ```dart 247 | import 'dart:io' show exit; 248 | 249 | void main(List args) { 250 | for (var x = 0xDEADBEEF; x < 0xFEEDFEED; x++) {} 251 | exit(0); 252 | } 253 | ``` 254 | 255 | we get the following AOT code 🤦🏻‍♂️: 256 | 257 | ```asm 258 | Precompiled____main_1433: 259 | 000000000009a644 push rbp ; CODE XREF=Precompiled____main_main_1434+17 260 | 000000000009a645 mov rbp, rsp 261 | 000000000009a648 cmp rsp, qword [r14+0x40] 262 | 000000000009a64c jbe loc_9a687 263 | 264 | loc_9a652: 265 | 000000000009a652 mov eax, 0xdeadbeef ; CODE XREF=Precompiled____main_1433+74 266 | 267 | loc_9a657: 268 | 000000000009a657 cmp rsp, qword [r14+0x40] ; CODE XREF=Precompiled____main_1433+48 269 | 000000000009a65b jbe loc_9a690 270 | 271 | loc_9a661: 272 | 000000000009a661 mov r11d, 0xfeedfeed ; CODE XREF=Precompiled____main_1433+83 273 | 000000000009a667 cmp rax, r11 274 | 000000000009a66a jge loc_9a676 275 | 276 | 000000000009a670 add rax, 0x1 277 | 000000000009a674 jmp loc_9a657 278 | 279 | loc_9a676: 280 | 000000000009a676 call Precompiled____exit_1015 ; Precompiled____exit_1015, CODE XREF=Precompiled____main_1433+38 281 | 000000000009a67b mov rax, qword [r14+0xc8] 282 | 000000000009a682 mov rsp, rbp 283 | 000000000009a685 pop rbp 284 | 000000000009a686 ret 285 | ; endp 286 | 287 | loc_9a687: 288 | 000000000009a687 call qword [r14+0x240] ; CODE XREF=Precompiled____main_1433+8 289 | 000000000009a68e jmp loc_9a652 290 | 291 | loc_9a690: 292 | 
000000000009a690 call qword [r14+0x240] ; CODE XREF=Precompiled____main_1433+23 293 | 000000000009a697 jmp loc_9a661 294 | ``` 295 | 296 | this is very similar, if not identical to the previous code we looked at, and the problem I have with this code is that the Dart compiler didn't understand that this was an empty loop, and really created the loop code for it! Is this a bug? It could be. If you're a person working on the Dart compiler maybe you could sort this out! 297 | 298 | For us Dart developers though this means that if you have a `for` loop somewhere with a variable, make sure it does something 😂 299 | 300 | ## Non-entry `for` loops with `const` start/end values 301 | 302 | for the given Dart code: 303 | 304 | ```dart 305 | import 'dart:io' show exit; 306 | 307 | void main(List args) { 308 | for (var x = 0xDEADBEEF; x < 0xDEADBEEF; x++) { 309 | print(x); 310 | } 311 | exit(0); 312 | } 313 | ``` 314 | 315 | we get the following AOT: 316 | 317 | ```asm 318 | Precompiled____main_1433: 319 | 000000000009a644 push rbp ; CODE XREF=Precompiled____main_main_1434+17 320 | 000000000009a645 mov rbp, rsp 321 | 000000000009a648 cmp rsp, qword [r14+0x40] 322 | 000000000009a64c jbe loc_9a66d 323 | 324 | loc_9a652: 325 | 000000000009a652 cmp rsp, qword [r14+0x40] ; CODE XREF=Precompiled____main_1433+48 326 | 000000000009a656 jbe loc_9a676 327 | 328 | loc_9a65c: 329 | 000000000009a65c call Precompiled____exit_1022 ; Precompiled____exit_1022, CODE XREF=Precompiled____main_1433+57 330 | 000000000009a661 mov rax, qword [r14+0xc8] 331 | 000000000009a668 mov rsp, rbp 332 | 000000000009a66b pop rbp 333 | 000000000009a66c ret 334 | ; endp 335 | 336 | loc_9a66d: 337 | 000000000009a66d call qword [r14+0x240] ; CODE XREF=Precompiled____main_1433+8 338 | 000000000009a674 jmp loc_9a652 339 | 340 | loc_9a676: 341 | 000000000009a676 call qword [r14+0x240] ; CODE XREF=Precompiled____main_1433+18 342 | 000000000009a67d jmp loc_9a65c 343 | ``` 344 | 345 | as you can see nowhere in this code 
can you find a reference to our magic numbers, nor a reference to the `print()` function, so it's great to know that as long as the start and end values of your `for` loops are known at compile time (constants), and the end value and your increments/decrements make it so that the loop can never produce any iterations, Dart is able to optimize out the whole loop! Good to know! 346 | 347 | ## `for` loops over variable iterables 348 | 349 | now let's imagine a scenario where you want to iterate over a list of strings, as shown here: 350 | 351 | ```dart 352 | import 'dart:io' show exit; 353 | 354 | void main(List args) { 355 | for (final value in args) { 356 | print(value); 357 | } 358 | exit(0); 359 | } 360 | ``` 361 | 362 | for this code we get a rather chunky AOT asm, so I'm not going to dump the whole thing here, but the gist of it is this part: 363 | 364 | ```asm 365 | 000000000009a6e6 mov rax, qword [rbp+var_8] 366 | 000000000009a6ea movzx rcx, word [rax+1] 367 | 000000000009a6ef push rax 368 | 000000000009a6f0 mov rax, qword [r14+0x60] 369 | 000000000009a6f4 call qword [rax+rcx*8+0x60] 370 | 000000000009a6f8 pop r11 371 | 000000000009a6fa push rax 372 | 000000000009a6fb call Precompiled____printToConsole_141 ; Precompiled____printToConsole_141 373 | 000000000009a700 pop rcx 374 | 000000000009a701 mov rax, qword [rbp+var_8] 375 | 000000000009a705 jmp loc_9a6c5 376 | ``` 377 | 378 | the part to pay close attention to is this call, which at first glance looks like indexing into an array: 379 | 380 | ```asm 381 | 000000000009a6f4 call qword [rax+rcx*8+0x60] 382 | ``` 383 | 384 | at first I thought `rcx` was holding the index of the current item in `args`, but look at the instructions before it: `rax` is loaded with the current item from `[rbp+var_8]`, `movzx rcx, word [rax+1]` reads a 16-bit field from that object's header, which looks like its class id, and then `rax` is overwritten with `qword [r14+0x60]` before the call. so rather than indexing into `args`, this seems to be a call through a table of functions indexed by the class id of the current item (scaled by 8, the size of a 64-bit pointer), with `0x60` selecting which method gets called, most likely the `toString()` that `print` needs. the actual walking over `args` must happen in the part of the code I didn't paste, which this snippet jumps back to with `jmp loc_9a6c5`. 385 | 386 | ## `for` loop over `const` iterables 387 | 388 | we can look at a simpler example now where the iterable is a compile-time constant: 389 | 390 | ```dart 391 | import 'dart:io' show exit; 392 | 393 | const values = [0xDEADBEEF, 0xFEEDFEED]; 394 | 395 | void main(List args) { 396 | for (final value in values) { 397 | print(value); 398 | } 399 | exit(0); 400 | } 401 | ``` 402 | 403 | we will get [the following AOT code](snippet-1.md). I've decided not to paste the whole thing here since it is too long. the first thing you will notice is how long the code is, and that Dart calls runtime stubs such as `AllocateMintSharedWithoutFPURegs`, which you can find references to in [Dart's actual source code](https://github.com/dart-lang/sdk/blob/e995cb5f7cd67d39c1ee4bdbe95c8241db36725f/runtime/vm/thread.h). judging by its name and by the `add rax, rax` / `jno` pattern right before it, this stub allocates a box for a 64-bit integer (a "mint") when a value doesn't fit into a smi. This is way above my head since I haven't had the time to dive deep into Dart's compiler code itself, but I can certainly see the generated AOT and realize that it is **not** smart. it would have been much smarter for Dart to realize that there are only two values in this iterable and simply turn it into a normal `for` loop.
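to see where the extra code could come from, it helps to know that the Dart language defines a `for`-`in` loop over an `Iterable` in terms of that iterable's `Iterator`: the loop asks for `values.iterator` and keeps calling `moveNext()` and reading `current` until `moveNext()` returns `false`. the sketch below writes the same loop both ways; the function names are made up, and whether the AOT compiler emits exactly this shape is an assumption on my part:

```dart
const values = [0xDEADBEEF, 0xFEEDFEED];

/// The loop as written in the article.
List<int> collectWithForIn() {
  final seen = <int>[];
  for (final value in values) {
    seen.add(value);
  }
  return seen;
}

/// The same loop spelled out with the Iterator protocol.
List<int> collectWithIterator() {
  final seen = <int>[];
  final iterator = values.iterator; // an extra object an index loop doesn't need
  while (iterator.moveNext()) {
    final value = iterator.current;
    seen.add(value);
  }
  return seen;
}

void main() {
  print(collectWithForIn());
  print(collectWithIterator());
}
```

an indexed loop over `values` skips the iterator object and its `moveNext()`/`current` calls entirely, which fits with the much shorter code shown next.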
404 | 405 | let's now convert this `for` with iterable into a traditional `for` loop and see what the AOT is: 406 | 407 | ```dart 408 | import 'dart:io' show exit; 409 | 410 | const values = [0xDEADBEEF, 0xFEEDFEED]; 411 | 412 | void main(List args) { 413 | for (var i = 0; i < values.length; i++) { 414 | print(values[i]); 415 | } 416 | exit(0); 417 | } 418 | ``` 419 | 420 | this generates the following AOT: 421 | 422 | ```asm 423 | Precompiled____main_1434: 424 | 000000000009a6a0 push rbp ; CODE XREF=Precompiled____main_main_1435+17 425 | 000000000009a6a1 mov rbp, rsp 426 | 000000000009a6a4 sub rsp, 0x8 427 | 000000000009a6a8 cmp rsp, qword [r14+0x40] 428 | 000000000009a6ac jbe loc_9a71d 429 | 430 | loc_9a6b2: 431 | 000000000009a6b2 xor edx, edx ; CODE XREF=Precompiled____main_1434+132 432 | 433 | loc_9a6b4: 434 | 000000000009a6b4 mov rax, qword [r15+0x1e07] ; CODE XREF=Precompiled____main_1434+106 435 | 000000000009a6bb mov qword [rbp+var_8], rdx 436 | 000000000009a6bf cmp rsp, qword [r14+0x40] 437 | 000000000009a6c3 jbe loc_9a726 438 | 439 | loc_9a6c9: 440 | 000000000009a6c9 cmp rdx, 0x2 ; CODE XREF=Precompiled____main_1434+141 441 | 000000000009a6cd jge loc_9a70c 442 | 443 | 000000000009a6d3 mov rcx, qword [rax+rdx*8+0x17] 444 | 000000000009a6d8 test cl, 0x1 445 | 000000000009a6db mov ebx, 0x35 446 | 000000000009a6e0 je loc_9a6e7 447 | 448 | 000000000009a6e2 movzx rbx, word [rcx+1] 449 | 450 | loc_9a6e7: 451 | 000000000009a6e7 push rcx ; CODE XREF=Precompiled____main_1434+64 452 | 000000000009a6e8 mov rcx, rbx 453 | 000000000009a6eb mov rax, qword [r14+0x60] 454 | 000000000009a6ef call qword [rax+rcx*8+0x58d8] 455 | 000000000009a6f6 pop r11 456 | 000000000009a6f8 push rax 457 | 000000000009a6f9 call Precompiled____printToConsole_149 ; Precompiled____printToConsole_149 458 | 000000000009a6fe pop rcx 459 | 000000000009a6ff mov rax, qword [rbp+var_8] 460 | 000000000009a703 add rax, 0x1 461 | 000000000009a707 mov rdx, rax 462 | 000000000009a70a jmp loc_9a6b4 463 | 
464 | loc_9a70c: 465 | 000000000009a70c call Precompiled____exit_1023 ; Precompiled____exit_1023, CODE XREF=Precompiled____main_1434+45 466 | 000000000009a711 mov rax, qword [r14+0xc8] 467 | 000000000009a718 mov rsp, rbp 468 | 000000000009a71b pop rbp 469 | 000000000009a71c ret 470 | ; endp 471 | 472 | loc_9a71d: 473 | 000000000009a71d call qword [r14+0x240] ; CODE XREF=Precompiled____main_1434+12 474 | 000000000009a724 jmp loc_9a6b2 475 | 476 | loc_9a726: 477 | 000000000009a726 call qword [r14+0x240] ; CODE XREF=Precompiled____main_1434+35 478 | ``` 479 | 480 | it's sad to say, but the traditional `for` variant in this case produces far less code, and is most likely faster, since there is no iterator for the compiler to create and drive. note that the iteration here is done with an index, so unlike in the previous example, no iterator object is involved. 481 | 482 | in this example we have a simple loop that is incremented one value at a time (`0x1`) like this: 483 | 484 | ```asm 485 | 000000000009a6ff mov rax, qword [rbp+var_8] 486 | 000000000009a703 add rax, 0x1 487 | 000000000009a707 mov rdx, rax 488 | 000000000009a70a jmp loc_9a6b4 489 | ``` 490 | 491 | and each `value` is retrieved from the `values` constant `List` using an effective address calculated with `rdx` holding the index (`i`): `rax` is loaded with the list from `[r15+0x1e07]`, `cmp rdx, 0x2` checks `i` against the length of the list (which the compiler already knows is 2), and the element is read from `rax + rdx*8 + 0x17`, i.e. one 8-byte (64-bit) slot per element, starting at what is presumably the offset of the first element from the list pointer: 492 | 493 | ```asm 494 | 000000000009a6b4 mov rax, qword [r15+0x1e07] ; CODE XREF=Precompiled____main_1434+106 495 | ; ... 496 | 000000000009a6c9 cmp rdx, 0x2 ; CODE XREF=Precompiled____main_1434+141 497 | 000000000009a6cd jge loc_9a70c 498 | 000000000009a6d3 mov rcx, qword [rax+rdx*8+0x17] 499 | ``` 500 | 501 | so there you have it, Dart seems to favor traditional `for` loops *without* iterables over the more modern `for` loop over an iterable 🤷🏻‍♂️ 502 | 503 | ## Conclusions 504 | 505 | - Dart's `for` loop with an index is internally a `do { ... } while (true);` statement under the hood!
506 | - Dart keeps, if possible, both the current value and the upper/lower bound of a `for` loop's variable inside CPU registers, speeding up calculations. 507 | - Dart doesn't seem to be able to optimize out empty `for` loops with indices! But hopefully you're not writing loops that don't do anything! 508 | - Dart optimizes out non-entry loops as long as conditions are compile-time constants! 509 | - Dart (at least my version of Dart, `2.14.0` on the `stable` channel) generates much leaner asm for a traditional indexed `for` loop than for the more modern `for` loop over an iterable! 510 | 511 | ## References 512 | 513 | - Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 1: Basic Architecture 514 | - Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2 (2A, 2B, 2C & 2D): Instruction Set Reference, A-Z 515 | -------------------------------------------------------------------------------- /issue-2-const-in-dart/issue-2-const-in-dart.md: -------------------------------------------------------------------------------- 1 | # `const` in Dart 2 | 3 | Let's go deep with `const` in Dart and see how it works under the hood. 4 | 5 | - [`const` in Dart](#const-in-dart) 6 | - [What is `const`?](#what-is-const) 7 | - [`const int`](#const-int) 8 | - [`const double`](#const-double) 9 | - [`const String`](#const-string) 10 | - [Curious case of mixing `const int` and `const double`](#curious-case-of-mixing-const-int-and-const-double) 11 | - [`const` custom classes](#const-custom-classes) 12 | - [Conclusion](#conclusion) 13 | - [References](#references) 14 | 15 | ## What is `const`? 16 | 17 | `const` denotes a variable whose value is known at compile time. The variable cannot be reassigned at runtime, nor can its value change internally.
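a quick way to see what "known at compile time" buys you is canonicalization: two identical `const` expressions evaluate to the very same object, and a `const` collection rejects any attempt to modify it at runtime. the function names in this sketch are made up for illustration:

```dart
const intConst = 0xDEADBEEF;

/// Identical const expressions are canonicalized into a single object.
bool constListsAreIdentical() => identical(const [1, 2], const [1, 2]);

/// Without const, each list literal creates a new object.
bool plainListsAreIdentical() => identical([1, 2], [1, 2]);

/// A const list is unmodifiable, so mutating it throws at runtime.
bool constListRejectsAdd() {
  const list = [1, 2];
  try {
    list.add(3);
    return false;
  } on UnsupportedError {
    return true;
  }
}

void main() {
  print(constListsAreIdentical()); // true
  print(plainListsAreIdentical()); // false
  print(constListRejectsAdd()); // true
  // intConst = 1; // would not compile: a constant cannot be assigned to
  print(intConst);
}
```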
18 | 19 | ## `const int` 20 | 21 | given the following Dart code: 22 | 23 | ```dart 24 | import 'dart:io' show exit; 25 | 26 | const intConst = 0xDEADBEEF; 27 | void main(List args) { 28 | print(intConst); 29 | exit(0); 30 | } 31 | ``` 32 | 33 | we will get the following AOT compilation: 34 | 35 | ```asm 36 | Precompiled____main_1558: 37 | 000000000005faec push rbp ; CODE XREF=Precompiled____main_main_1559+17 38 | 000000000005faed mov rbp, rsp 39 | 000000000005faf0 cmp rsp, qword [r14+0x40] 40 | 000000000005faf4 jbe loc_5fb17 41 | 42 | loc_5fafa: 43 | 000000000005fafa mov eax, 0xdeadbeef ; CODE XREF=Precompiled____main_1558+50 44 | 000000000005faff push rax 45 | 000000000005fb00 call Precompiled____print_813 ; Precompiled____print_813 46 | 000000000005fb05 pop rcx 47 | 000000000005fb06 call Precompiled____exit_1070 ; Precompiled____exit_1070 48 | 000000000005fb0b mov rax, qword [r14+0xc8] 49 | 000000000005fb12 mov rsp, rbp 50 | 000000000005fb15 pop rbp 51 | 000000000005fb16 ret 52 | ; endp 53 | 54 | loc_5fb17: 55 | 000000000005fb17 call qword [r14+0x240] ; CODE XREF=Precompiled____main_1558+8 56 | 000000000005fb1e jmp loc_5fafa 57 | ``` 58 | 59 | there are no surprises there: the const value is placed right inside the 32-bit `eax` register, pushed onto the stack using the `push` instruction, and then the `Precompiled____print_813` procedure is called to print it to the console. as for why `eax` is used here and not `rax`: in 64-bit mode, writing a 32-bit register zero-extends the value into the whole 64-bit register, and `mov eax, imm32` has a shorter encoding than a 64-bit `mov rax, imm64`, so for a value that fits in 32 bits the compiler most likely just picks the smaller instruction. The thing to note is how the value of that integer was placed on the stack (as opposed to the heap) and then printed immediately.
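here is a small Dart model of the size argument above. `fitsInZeroExtendedImm32` is a made-up helper that mirrors the x86-64 rule rather than anything in the Dart compiler: a 32-bit `mov` can only leave values from `0` to `0xFFFFFFFF` in the full 64-bit register, and `0xDEADBEEF` happens to be such a value:

```dart
/// Hypothetical helper: true when a 64-bit value can be produced by a
/// 32-bit `mov` whose result is zero-extended into the full register.
bool fitsInZeroExtendedImm32(int value) => value >= 0 && value <= 0xFFFFFFFF;

void main() {
  print(fitsInZeroExtendedImm32(0xDEADBEEF)); // true
  print(0xDEADBEEF.bitLength); // 32
  print(fitsInZeroExtendedImm32(0x1DEADBEEF)); // false
}
```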
60 | 61 | ## `const double` 62 | 63 | given the following Dart code: 64 | 65 | ```dart 66 | import 'dart:io' show exit; 67 | 68 | const doubleConst = 1.2; 69 | void main(List args) { 70 | print(doubleConst); 71 | exit(0); 72 | } 73 | ``` 74 | 75 | we'll get the following AOT compiled code: 76 | 77 | ```asm 78 | Precompiled____main_1558: 79 | 000000000005fac8 push rbp ; CODE XREF=Precompiled____main_main_1559+17 80 | 000000000005fac9 mov rbp, rsp 81 | 000000000005facc cmp rsp, qword [r14+0x40] 82 | 000000000005fad0 jbe loc_5faec 83 | 84 | loc_5fad6: 85 | 000000000005fad6 call Precompiled____print_813 ; Precompiled____print_813, CODE XREF=Precompiled____main_1558+43 86 | 000000000005fadb call Precompiled____exit_1070 ; Precompiled____exit_1070 87 | 000000000005fae0 mov rax, qword [r14+0xc8] 88 | 000000000005fae7 mov rsp, rbp 89 | 000000000005faea pop rbp 90 | 000000000005faeb ret 91 | ; endp 92 | 93 | loc_5faec: 94 | 000000000005faec call qword [r14+0x240] ; CODE XREF=Precompiled____main_1558+8 95 | 000000000005faf3 jmp loc_5fad6 96 | ``` 97 | 98 | this code is very different from the `const int` variant: nowhere in this code is the value `1.2` moved into a register of any sort. Instead, `main` calls `Precompiled____print_813` without pushing any argument at all, so the constant has to be materialized inside that function.
Let's go deep into that function to see what's going on: 99 | 100 | ```asm 101 | Precompiled____print_813: 102 | 0000000000037458 push rbp ; CODE XREF=Precompiled____main_1558+14 103 | 0000000000037459 mov rbp, rsp 104 | 000000000003745c cmp rsp, qword [r14+0x40] 105 | 0000000000037460 jbe loc_37488 106 | 107 | loc_37466: 108 | 0000000000037466 mov r11, qword [r15+0x20af] ; CODE XREF=Precompiled____print_813+55 109 | 000000000003746d push r11 110 | 000000000003746f call Precompiled__Double_0150898_toString_1175 ; Precompiled__Double_0150898_toString_1175 111 | 0000000000037474 pop rcx 112 | 0000000000037475 push rax 113 | 0000000000037476 call Precompiled____printToConsole_146 ; Precompiled____printToConsole_146 114 | 000000000003747b pop rcx 115 | 000000000003747c mov rax, qword [r14+0xc8] 116 | 0000000000037483 mov rsp, rbp 117 | 0000000000037486 pop rbp 118 | 0000000000037487 ret 119 | ; endp 120 | 121 | loc_37488: 122 | 0000000000037488 call qword [r14+0x240] ; CODE XREF=Precompiled____print_813+8 123 | 000000000003748f jmp loc_37466 124 | ``` 125 | 126 | the first part of the code, before the line break, sets up the local stack frame, so I won't talk about that. The interesting part is the `loc_37466` label, whose first instruction is a `mov`. I'm fairly sure this moves the pointer to the `doubleConst` constant into the 64-bit `r11` register: `r15` is the register Dart reserves for the object pool (see Issue 4), so `[r15+0x20af]` is most likely an object pool entry holding the `1.2` `double` object. So Dart is not hardcoding the `1.2` value as a 64-bit floating-point immediate in a register; instead, it loads it from an effective address into the `r11` register. I _think_ this _could_ be less efficient than putting the constant value of the `double` right onto the stack, but I could be wrong about this. If you know, let me know too! 127 | 128 | then Dart is calling the `Precompiled__Double_0150898_toString_1175` function, supposedly to convert the double to a string instance.
I won't go into the internals of that function, but we get what it's doing. Once that function is done, it puts its result (the pointer to the string) into the 64-bit `rax` register, and we then push `rax` onto the stack just before calling the `Precompiled____printToConsole_146` function, which reads that value from its stack via `rbp` and `rsp`. So to summarize, a `const double` is not materialized as an immediate value in a register the way the `const int` was; it is loaded from memory instead. I don't know why! It _could_ be, though, if it were just treated as a 64-bit value and placed right into a 64-bit register like `rax`! If you're a compiler engineer at Google or know why this is not the case, please do chime in. 129 | 130 | ## `const String` 131 | 132 | given the following Dart code: 133 | 134 | ```dart 135 | import 'dart:io' show exit; 136 | 137 | const stringConst = 'Hello, World!'; 138 | void main(List args) { 139 | print(stringConst); 140 | exit(0); 141 | } 142 | ``` 143 | 144 | we get the following AOT: 145 | 146 | ```asm 147 | Precompiled____main_1558: 148 | 000000000005fab8 push rbp ; CODE XREF=Precompiled____main_main_1559+17 149 | 000000000005fab9 mov rbp, rsp 150 | 000000000005fabc cmp rsp, qword [r14+0x40] 151 | 000000000005fac0 jbe loc_5fadc 152 | 153 | loc_5fac6: 154 | 000000000005fac6 call Precompiled____print_813 ; Precompiled____print_813, CODE XREF=Precompiled____main_1558+43 155 | 000000000005facb call Precompiled____exit_1070 ; Precompiled____exit_1070 156 | 000000000005fad0 mov rax, qword [r14+0xc8] 157 | 000000000005fad7 mov rsp, rbp 158 | 000000000005fada pop rbp 159 | 000000000005fadb ret 160 | ; endp 161 | 162 | loc_5fadc: 163 | 000000000005fadc call qword [r14+0x240] ; CODE XREF=Precompiled____main_1558+8 164 | 000000000005fae3 jmp loc_5fac6 165 | ``` 166 | 167 | this is very similar to the `const double` AOT if you look closely, so let's go deep into the `Precompiled____print_813` function and see what's happening there: 168 | 169 | ```asm 170 |
000000000003745c push rbp ; CODE XREF=Precompiled____main_1558+14 171 | 000000000003745d mov rbp, rsp 172 | 0000000000037460 cmp rsp, qword [r14+0x40] 173 | 0000000000037464 jbe loc_3747b 174 | 175 | loc_3746a: 176 | 000000000003746a call Precompiled____printToConsole_146 ; Precompiled____printToConsole_146, CODE XREF=Precompiled____print_813+38 177 | 000000000003746f mov rax, qword [r14+0xc8] 178 | 0000000000037476 mov rsp, rbp 179 | 0000000000037479 pop rbp 180 | 000000000003747a ret 181 | ; endp 182 | 183 | loc_3747b: 184 | 000000000003747b call qword [r14+0x240] ; CODE XREF=Precompiled____print_813+8 185 | 0000000000037482 jmp loc_3746a 186 | ``` 187 | 188 | okay, that was unexpected! `main` called this function, and all this function does is call another function, `Precompiled____printToConsole_146`; nowhere here can we find any reference to our string. I find this to be one layer too many, but I could be wrong here. Let's see what's happening inside the `Precompiled____printToConsole_146` function: 189 | 190 | ```asm 191 | 000000000000f2ec push rbp ; CODE XREF=Precompiled____print_813+14 192 | 000000000000f2ed mov rbp, rsp 193 | 000000000000f2f0 cmp rsp, qword [r14+0x40] 194 | 000000000000f2f4 jbe loc_f346 195 | 196 | loc_f2fa: 197 | 000000000000f2fa mov rax, qword [r14+0x88] ; CODE XREF=Precompiled____printToConsole_146+97 198 | 000000000000f301 mov rax, qword [rax+0x38] 199 | 000000000000f305 cmp rax, qword [r15+0x27] 200 | 000000000000f309 jne loc_f31b 201 | 202 | 000000000000f30f mov rax, qword [r15+0x1867] 203 | 000000000000f316 call Precompiled_Stub__iso_stub_InitStaticFieldStub ; Precompiled_Stub__iso_stub_InitStaticFieldStub 204 | 205 | loc_f31b: 206 | 000000000000f31b mov rcx, qword [rax+0x1f] ; CODE XREF=Precompiled____printToConsole_146+29 207 | 000000000000f31f push rax 208 | 000000000000f320 mov r11, qword [r15+0x20af] 209 | 000000000000f327 push r11 210 | 000000000000f329 mov rax, rcx 211 | 000000000000f32c mov
r10, qword [r15+0x7f] 212 | 000000000000f330 mov rcx, qword [rax+0xf] 213 | 000000000000f334 call rcx 214 | 000000000000f336 pop r11 215 | 000000000000f338 pop r11 216 | 000000000000f33a mov rax, qword [r14+0xc8] 217 | 000000000000f341 mov rsp, rbp 218 | 000000000000f344 pop rbp 219 | 000000000000f345 ret 220 | ; endp 221 | 222 | loc_f346: 223 | 000000000000f346 call qword [r14+0x240] ; CODE XREF=Precompiled____printToConsole_146+8 224 | 000000000000f34d jmp loc_f2fa 225 | ``` 226 | 227 | the part we're interested in is this: 228 | 229 | ```asm 230 | loc_f31b: 231 | 000000000000f31b mov rcx, qword [rax+0x1f] ; CODE XREF=Precompiled____printToConsole_146+29 232 | 000000000000f31f push rax 233 | 000000000000f320 mov r11, qword [r15+0x20af] 234 | 000000000000f327 push r11 235 | 000000000000f329 mov rax, rcx 236 | 000000000000f32c mov r10, qword [r15+0x7f] 237 | 000000000000f330 mov rcx, qword [rax+0xf] 238 | 000000000000f334 call rcx 239 | ``` 240 | 241 | it's interesting that in the other cases, the print function was called with a direct `call` to a label (the function name), but in this case the call target is computed at runtime. A pointer is loaded into the 64-bit register `rcx` (first line of code) and moved into `rax`. The actual code address is then loaded from `[rax+0xf]` into `rcx`, and that is finally called on the last line! The `CODE XREF=Precompiled____printToConsole_146+29` comment on the first line is easy to misread. It refers to offset 29 (`0x1d`) into the function, i.e. `0xf2ec + 0x1d = 0xf309`, which is the `000000000000f309 jne loc_f31b` instruction that jumps to this label (jump short if not equal, i.e. if the zero flag is 0; check EFLAGS in the Intel references). `[rax+0x1f]` itself is a load from whatever object `rax` points to at runtime. Also note the `mov r11, qword [r15+0x20af]` / `push r11` pair. Since `r15` is the object pool register (see Issue 4), this looks like our string being loaded from the object pool and passed as an argument, much like the `const double` case. This is way above my head and I don't really understand what `Precompiled_Stub__iso_stub_InitStaticFieldStub` does, to be honest. Look at its name, and at the `cmp`/`jne` pair right before it, which skips the stub unless the value loaded from `[rax+0x38]` equals the pool entry at `[r15+0x27]`. From those, it seems to lazily initialize a static field the first time it is used. But if you know better, please let me know too.
Here is where our string sits in the data section, in case you want to solve this mystery yourself: 242 | 243 | ```asm 244 | 0000000000084b30 db 0x48 ; 'H' 245 | 0000000000084b31 db 0x65 ; 'e' 246 | 0000000000084b32 db 0x6c ; 'l' 247 | 0000000000084b33 db 0x6c ; 'l' 248 | 0000000000084b34 db 0x6f ; 'o' 249 | 0000000000084b35 db 0x2c ; ',' 250 | 0000000000084b36 db 0x20 ; ' ' 251 | 0000000000084b37 db 0x57 ; 'W' 252 | 0000000000084b38 db 0x6f ; 'o' 253 | 0000000000084b39 db 0x72 ; 'r' 254 | 0000000000084b3a db 0x6c ; 'l' 255 | 0000000000084b3b db 0x64 ; 'd' 256 | 0000000000084b3c db 0x21 ; '!' 257 | ``` 258 | 259 | with the data section header reading as follows: 260 | 261 | ```asm 262 | ; Segment Segment 6 263 | ; Range: [0x62000; 0xb2840[ (329792 bytes) 264 | ; File offset : [401408; 731200[ (329792 bytes) 265 | ; Permissions: readable 266 | ; Flags: 0x4 267 | 268 | 269 | 270 | ; Section .rodata 271 | ; Range: [0x62000; 0xb26f0[ (329456 bytes) 272 | ; File offset : [401408; 730864[ (329456 bytes) 273 | ; Flags: 0x2 274 | ; SHT_PROGBITS 275 | ; SHF_ALLOC 276 | 277 | _kDartIsolateSnapshotData: 278 | 0000000000062000 db 0xf5 ; '.' 279 | 0000000000062001 db 0xf5 ; '.' 280 | 0000000000062002 db 0xdc ; '.' 281 | ``` 282 | 283 | ## Curious case of mixing `const int` and `const double` 284 | 285 | given the following Dart code: 286 | 287 | ```dart 288 | import 'dart:io' show exit; 289 | 290 | const intConst = 0xDEADBEEF; 291 | const doubleConst = 1.2; 292 | void main(List args) { 293 | print(intConst); 294 | print(doubleConst); 295 | exit(0); 296 | } 297 | ``` 298 | 299 | I would expect the `intConst` to be loaded into `eax`, as it was in the previous section where we talked about constant integers. But that's not the case!
let's look at the AOT: 300 | 301 | ```asm 302 | Precompiled____main_1558: 303 | 000000000005fad8 push rbp ; CODE XREF=Precompiled____main_main_1559+17 304 | 000000000005fad9 mov rbp, rsp 305 | 000000000005fadc cmp rsp, qword [r14+0x40] 306 | 000000000005fae0 jbe loc_5fb15 307 | 308 | loc_5fae6: 309 | 000000000005fae6 mov r11, qword [r15+0x207f] ; CODE XREF=Precompiled____main_1558+68 310 | 000000000005faed push r11 311 | 000000000005faef call Precompiled____print_813 ; Precompiled____print_813 312 | 000000000005faf4 pop rcx 313 | 000000000005faf5 mov r11, qword [r15+0x2087] 314 | 000000000005fafc push r11 315 | 000000000005fafe call Precompiled____print_813 ; Precompiled____print_813 316 | 000000000005fb03 pop rcx 317 | 000000000005fb04 call Precompiled____exit_1070 ; Precompiled____exit_1070 318 | 000000000005fb09 mov rax, qword [r14+0xc8] 319 | 000000000005fb10 mov rsp, rbp 320 | 000000000005fb13 pop rbp 321 | 000000000005fb14 ret 322 | ; endp 323 | 324 | loc_5fb15: 325 | 000000000005fb15 call qword [r14+0x240] ; CODE XREF=Precompiled____main_1558+8 326 | 000000000005fb1c jmp loc_5fae6 327 | ``` 328 | 329 | now all of a sudden the `intConst` is not placed in the `eax` register anymore, instead it is loaded like this: 330 | 331 | ```asm 332 | 000000000005fae6 mov r11, qword [r15+0x207f] ; CODE XREF=Precompiled____main_1558+68 333 | 000000000005faed push r11 334 | 000000000005faef call Precompiled____print_813 ; Precompiled____print_813 335 | ``` 336 | 337 | this is pretty much another way of saying: 338 | 339 | ```asm 340 | stack[-16] = *(r15 + 0x207f); 341 | Precompiled____print_813(rdi, rsi, rdx, rcx, r8, r9, stack[-16]); 342 | ``` 343 | 344 | so this way we are loading the pointer to the `intConst` into the stack and then calling the `Precompiled____print_813` function with that value placed in the stack. the `intConst` got demoted from a constant register value to a stack value for some reason. 
I think only a Dart compiler engineer at Google can answer why this demotion happened, to be honest. If you know the answer, please let me know. 345 | 346 | ## `const` custom classes 347 | 348 | let's have a look at a custom class with a `const` constructor: 349 | 350 | ```dart 351 | import 'dart:io' show exit; 352 | 353 | class Person { 354 | final int age; 355 | const Person(this.age); 356 | } 357 | void main(List args) { 358 | final foo = Person(0xDEADBEEF); 359 | print(foo.age); 360 | exit(0); 361 | } 362 | ``` 363 | 364 | that compiles to the following AOT: 365 | 366 | ```asm 367 | 000000000005faec push rbp ; CODE XREF=Precompiled____main_main_1559+17 368 | 000000000005faed mov rbp, rsp 369 | 000000000005faf0 cmp rsp, qword [r14+0x40] 370 | 000000000005faf4 jbe loc_5fb17 371 | 372 | loc_5fafa: 373 | 000000000005fafa mov eax, 0xdeadbeef ; CODE XREF=Precompiled____main_1558+50 374 | 000000000005faff push rax 375 | 000000000005fb00 call Precompiled____print_812 ; Precompiled____print_812 376 | 000000000005fb05 pop rcx 377 | 000000000005fb06 call Precompiled____exit_1066 ; Precompiled____exit_1066 378 | 000000000005fb0b mov rax, qword [r14+0xc8] 379 | 000000000005fb12 mov rsp, rbp 380 | 000000000005fb15 pop rbp 381 | 000000000005fb16 ret 382 | ; endp 383 | 384 | loc_5fb17: 385 | 000000000005fb17 call qword [r14+0x240] ; CODE XREF=Precompiled____main_1558+8 386 | 000000000005fb1e jmp loc_5fafa 387 | ``` 388 | 389 | this is great, and just what you'd expect. The 32-bit value `0xdeadbeef` that was assigned to the `age` of the `Person` instance is placed right into the 32-bit `eax` register and then printed to the screen. Nice: no instance of the `Person` class was even created.
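One detail worth keeping in mind: `final foo = Person(0xDEADBEEF);` is not a constant context. That makes `foo` an ordinary instance of a class that merely *has* a `const` constructor. Getting rid of the allocation here is presumably the AOT optimizer's doing rather than a consequence of `const`. A quick way to see the difference is `identical`, which tells you whether two expressions produce the very same object. The helper names below are made up for illustration:

```dart
class Person {
  final int age;
  const Person(this.age);
}

/// Without `const`, every constructor call creates a fresh instance.
bool plainCallsShareInstance() =>
    identical(Person(0xDEADBEEF), Person(0xDEADBEEF));

/// With `const`, equal constant expressions are canonicalized into a
/// single compile-time instance.
bool constCallsShareInstance() =>
    identical(const Person(0xDEADBEEF), const Person(0xDEADBEEF));

void main() {
  print(plainCallsShareInstance()); // false
  print(constCallsShareInstance()); // true
}
```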
390 | 391 | now let's make it more complicated and have some logic in the initializer of the `Person` class: 392 | 393 | ```dart 394 | import 'dart:io' show exit; 395 | 396 | class Person { 397 | final int age; 398 | const Person(int age) : this.age = age + 0xFEEDFEED; 399 | } 400 | void main(List args) { 401 | final foo = Person(0xDEADBEEF); 402 | print(foo.age); 403 | exit(0); 404 | } 405 | ``` 406 | 407 | and get the following AOT: 408 | 409 | ```asm 410 | 411 | Precompiled____main_1558: 412 | 000000000005faec push rbp ; CODE XREF=Precompiled____main_main_1559+17 413 | 000000000005faed mov rbp, rsp 414 | 000000000005faf0 cmp rsp, qword [r14+0x40] 415 | 000000000005faf4 jbe loc_5fb1c 416 | 417 | loc_5fafa: 418 | 000000000005fafa movabs rax, 0x1dd9bbddc ; CODE XREF=Precompiled____main_1558+55 419 | 000000000005fb04 push rax 420 | 000000000005fb05 call Precompiled____print_812 ; Precompiled____print_812 421 | 000000000005fb0a pop rcx 422 | 000000000005fb0b call Precompiled____exit_1066 ; Precompiled____exit_1066 423 | 000000000005fb10 mov rax, qword [r14+0xc8] 424 | 000000000005fb17 mov rsp, rbp 425 | 000000000005fb1a pop rbp 426 | 000000000005fb1b ret 427 | ; endp 428 | 429 | loc_5fb1c: 430 | 000000000005fb1c call qword [r14+0x240] ; CODE XREF=Precompiled____main_1558+8 431 | 000000000005fb23 jmp loc_5fafa 432 | ``` 433 | 434 | this is also really neat. Dart took `0xdeadbeef`, saw that the constructor of the `Person` class adds `0xfeedfeed` to it, and did that addition at compile time (`0xdeadbeef + 0xfeedfeed = 0x1dd9bbddc`). It then placed that value directly in the 64-bit `rax` register, using `movabs` since the result no longer fits in 32 bits, and passed it to the `print` call. Neat, huh?
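To double-check that arithmetic in Dart itself, here is a small sketch (the `foldedAge` name is mine). The top-level `const` makes the addition a compile-time constant expression, and `const Person(...)` runs the same initializer in a constant context. Both give `0x1dd9bbddc`. That value is larger than `0xFFFFFFFF`, so it can't be loaded with a 32-bit `mov`:

```dart
class Person {
  final int age;
  // Same initializer as above, without the redundant `this.`.
  const Person(int age) : age = age + 0xFEEDFEED;
}

// A constant expression: the compiler evaluates the sum itself.
const foldedAge = 0xDEADBEEF + 0xFEEDFEED;

void main() {
  const person = Person(0xDEADBEEF); // constant context
  print(foldedAge.toRadixString(16)); // 1dd9bbddc
  print(person.age == foldedAge); // true
  print(foldedAge > 0xFFFFFFFF); // true: needs more than 32 bits
}
```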
435 | 436 | let's make the `Person` class more complicated and exciting: 437 | 438 | ```dart 439 | import 'dart:io' show exit; 440 | 441 | class Person { 442 | final String firstName; 443 | final String lastName; 444 | final String fullName; 445 | 446 | const Person(this.firstName, this.lastName) : fullName = '$firstName $lastName'; 447 | } 448 | void main(List args) { 449 | final foo = Person('Foo', 'Bar'); 450 | print(foo); 451 | exit(0); 452 | } 453 | ``` 454 | 455 | to get the following AOT: 456 | 457 | ```asm 458 | Precompiled____main_1558: 459 | 000000000005fac0 push rbp ; CODE XREF=Precompiled____main_main_1560+17 460 | 000000000005fac1 mov rbp, rsp 461 | 000000000005fac4 cmp rsp, qword [r14+0x40] 462 | 000000000005fac8 jbe loc_5faeb 463 | 464 | loc_5face: 465 | 000000000005face call Precompiled_AllocationStub_Person_1559 ; Precompiled_AllocationStub_Person_1559, CODE XREF=Precompiled____main_1558+50 466 | 000000000005fad3 push rax 467 | 000000000005fad4 call Precompiled____print_812 ; Precompiled____print_812 468 | 000000000005fad9 pop rcx 469 | 000000000005fada call Precompiled____exit_1066 ; Precompiled____exit_1066 470 | 000000000005fadf mov rax, qword [r14+0xc8] 471 | 000000000005fae6 mov rsp, rbp 472 | 000000000005fae9 pop rbp 473 | 000000000005faea ret 474 | ; endp 475 | 476 | loc_5faeb: 477 | 000000000005faeb call qword [r14+0x240] ; CODE XREF=Precompiled____main_1558+8 478 | 000000000005faf2 jmp loc_5face 479 | ``` 480 | 481 | oh, spicy! now we got some juice out of the example. you can see how the first line of the `loc_5face` label is making a call to the `Precompiled_AllocationStub_Person_1559` procedure. 
In this case, our example got more complicated and involves two final `String` values to construct the eventual instance of the `Person` class, so Dart doesn't seem to be able to optimize the allocation away. `print(foo)` will internally call the `toString()` method of `Person`, which we haven't overridden. Dart therefore has to create an actual instance of the `Person` class (which is an `Object`) and call that method on it to print out the `toString()` result. Lots going on here, but one thing at a time: let's look at `Precompiled_AllocationStub_Person_1559` and see what's going on there: 482 | 483 | ```asm 484 | Precompiled_AllocationStub_Person_1559: 485 | 000000000005faf4 mov r8d, 0xc70104 ; CODE XREF=Precompiled____main_1558+14 486 | 000000000005fafa jmp qword [r14+0x228] 487 | ; endp 488 | ``` 489 | 490 | which roughly corresponds to the following pseudo-code: 491 | 492 | ```asm 493 | void Precompiled_AllocationStub_Person_1559() { 494 | (*(r14 + 0x228))(); 495 | return; 496 | } 497 | ``` 498 | 499 | since I'm not debugging the code while reading it, I can't be 100% sure what's lurking behind the `(*(r14 + 0x228))` pointer (`r14` holds the current thread; see the `COMPILE_ASSERT(THR == R14)` in the Dart SDK code quoted in Issue 4). It's certainly code, though, since it is jumped to with the `jmp` instruction. The 32-bit (double-word) register `r8d` is being set to `0xc70104`, so that could be the parameter to the function lurking behind the pointer.
This could all be Dart internals, OR I may just not know what I'm talking about, to be honest. To me, though, this looks like an allocation routine hidden from plain sight that creates a new instance of the `Person` class and leaves its pointer in the `rax` register. If you look again at the original asm code, you'll see this: 500 | 501 | ```asm 502 | 000000000005fad3 push rax 503 | 000000000005fad4 call Precompiled____print_812 ; Precompiled____print_812 504 | ``` 505 | 506 | so here `rax` is most likely the pointer to our `Person` instance, which then gets passed to `Precompiled____print_812`, which internally calls `toString()` on that instance. 507 | 508 | ## Conclusion 509 | 510 | the `const` syntax in Dart is, as in other languages, a hint to the compiler that makes both your life and the compiler's life easier. Some calculations might even be done at compile-time to make the code run faster. Our `Person` class above is an example: we only stored the person's age and then printed that age property directly to the screen, and no `Person` instance was created in that case. But when we made the `Person` class carry more information and have a computed property, things got more complicated. Here are a few takeaways: 511 | 512 | - constant `int` values are _sometimes_ placed directly inside a register (not even on the stack) and then worked with. As shown in this article, `int` constants can be demoted to stack variables under certain conditions, and I don't really know why!
513 | - constant `double` values are loaded from memory (not placed directly inside a register, unlike constant `int` values) and then used 514 | - constant `String` instances are loaded through 2 layers of function calls before being printed to the screen 515 | - constant custom class instances, if just placeholders for data, and depending on what you are doing with those instances, could simply be calculated at compile-time and placed inside CPU registers to be used. 516 | - constant custom class instances, if more complicated and doing calculations such as string concatenation, and depending on what you do with those instances, may need to be allocated on the heap at run-time, with pointers to them passed around to different functions to be processed. 517 | 518 | ## References 519 | 520 | - Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 1: Basic Architecture 521 | - Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2 (2A, 2B, 2C & 2D): Instruction Set Reference, A-Z 522 | -------------------------------------------------------------------------------- /issue-4-functions-in-dart/issue-4-functions-in-dart.md: -------------------------------------------------------------------------------- 1 | # Functions in Dart 2 | 3 | in this issue I want to explore how functions are compiled and how they are passed their arguments internally in Dart. I'm going to compile the code into x86_64 AOT with `dart compile` and then look at the instructions generated for the Dart code.
4 | 5 | - [Functions in Dart](#functions-in-dart) 6 | - [Global functions with 0 arguments](#global-functions-with-0-arguments) 7 | - [Global functions with 1 compile-time constant argument](#global-functions-with-1-compile-time-constant-argument) 8 | - [Global functions with 1 non-compile-time-constant argument](#global-functions-with-1-non-compile-time-constant-argument) 9 | - [One-liner optimized `static` functions](#one-liner-optimized-static-functions) 10 | - [More complex `static` functions](#more-complex-static-functions) 11 | - [Conclusions](#conclusions) 12 | - [References](#references) 13 | 14 | ## Global functions with 0 arguments 15 | 16 | given the following Dart code with a loose function with 0 arguments: 17 | 18 | ```dart 19 | import 'dart:io' show exit; 20 | 21 | void foo() { 22 | print(0xDEADBEEF); 23 | } 24 | 25 | void main(List args) { 26 | foo(); 27 | exit(0); 28 | } 29 | ``` 30 | 31 | we will get the following AOT: 32 | 33 | ```asm 34 | Precompiled____foo_1436: 35 | 000000000009a730 push rbp ; CODE XREF=Precompiled____main_1435+14 36 | 000000000009a731 mov rbp, rsp 37 | 000000000009a734 mov eax, 0xdeadbeef 38 | 000000000009a739 cmp rsp, qword [r14+0x40] 39 | 000000000009a73d jbe loc_9a756 40 | 41 | loc_9a743: 42 | 000000000009a743 push rax ; CODE XREF=Precompiled____foo_1436+45 43 | 000000000009a744 call Precompiled____print_911 ; Precompiled____print_911 44 | 000000000009a749 pop rcx 45 | 000000000009a74a mov rax, qword [r14+0xc8] 46 | 000000000009a751 mov rsp, rbp 47 | 000000000009a754 pop rbp 48 | 000000000009a755 ret 49 | ; endp 50 | 51 | loc_9a756: 52 | 000000000009a756 call qword [r14+0x240] ; CODE XREF=Precompiled____foo_1436+13 53 | 000000000009a75d jmp loc_9a743 54 | ``` 55 | 56 | let's dig into this beauty and see what happened! the first part of the code is a so-called StackOverflowCheck. I had no idea about this.
just to be clear, this is the code I'm talking about: 57 | 58 | ```asm 59 | Precompiled____foo_1436: 60 | 000000000009a730 push rbp ; CODE XREF=Precompiled____main_1435+14 61 | 000000000009a731 mov rbp, rsp 62 | 000000000009a734 mov eax, 0xdeadbeef 63 | 000000000009a739 cmp rsp, qword [r14+0x40] 64 | 000000000009a73d jbe loc_9a756 65 | ``` 66 | 67 | apart from the `mov eax, 0xdeadbeef`, which just loads the value to be printed later, the rest is the frame setup plus a stack overflow check (`cmp rsp, qword [r14+0x40]` followed by `jbe`). I asked on Twitter about this and got the following response from Vyacheslav Egorov: 68 | 69 | > it's called StackOverflowCheck - it serves two main purposes: 1) checking for stack overflow 2) checking for internal interrupts scheduled by VM (e.g. might be used by GC to pause the execution on a safepoint). 70 | 71 | Thanks to Vyacheslav, I now know that when this check trips, the overflow is handled in a file called `runtime_entry.cc` in the Dart SDK repo, where you will see this beauty: 72 | 73 | ```cpp 74 | DEFINE_RUNTIME_ENTRY(StackOverflow, 0) { 75 | #if defined(USING_SIMULATOR) 76 | uword stack_pos = Simulator::Current()->get_sp(); 77 | // If simulator was never called it may return 0 as a value of SPREG. 78 | if (stack_pos == 0) { 79 | // Use any reasonable value which would not be treated 80 | // as stack overflow. 81 | stack_pos = thread->saved_stack_limit(); 82 | } 83 | #else 84 | uword stack_pos = OSThread::GetCurrentStackPointer(); 85 | #endif 86 | // Always clear the stack overflow flags. They are meant for this 87 | // particular stack overflow runtime call and are not meant to 88 | // persist. 89 | uword stack_overflow_flags = thread->GetAndClearStackOverflowFlags(); 90 | 91 | // If an interrupt happens at the same time as a stack overflow, we 92 | // process the stack overflow now and leave the interrupt for next 93 | // time.
94 | if (!thread->os_thread()->HasStackHeadroom() || 95 | IsCalleeFrameOf(thread->saved_stack_limit(), stack_pos)) { 96 | if (FLAG_verbose_stack_overflow) { 97 | OS::PrintErr("Stack overflow\n"); 98 | OS::PrintErr(" Native SP = %" Px ", stack limit = %" Px "\n", stack_pos, 99 | thread->saved_stack_limit()); 100 | OS::PrintErr("Call stack:\n"); 101 | OS::PrintErr("size | frame\n"); 102 | StackFrameIterator frames(ValidationPolicy::kDontValidateFrames, thread, 103 | StackFrameIterator::kNoCrossThreadIteration); 104 | uword fp = stack_pos; 105 | StackFrame* frame = frames.NextFrame(); 106 | while (frame != NULL) { 107 | uword delta = (frame->fp() - fp); 108 | fp = frame->fp(); 109 | OS::PrintErr("%4" Pd " %s\n", delta, frame->ToCString()); 110 | frame = frames.NextFrame(); 111 | } 112 | } 113 | 114 | // Use the preallocated stack overflow exception to avoid calling 115 | // into dart code. 116 | const Instance& exception = 117 | Instance::Handle(isolate->group()->object_store()->stack_overflow()); 118 | Exceptions::Throw(thread, exception); 119 | UNREACHABLE(); 120 | } 121 | 122 | #if !defined(PRODUCT) && !defined(DART_PRECOMPILED_RUNTIME) 123 | HandleStackOverflowTestCases(thread); 124 | #endif // !defined(PRODUCT) && !defined(DART_PRECOMPILED_RUNTIME) 125 | ``` 126 | 127 | so, to paraphrase Vyacheslav, the generated code checks the current stack pointer against *a limit* to make sure it hasn't gone past it, and if it has, a runtime routine like this one catches the stack overflow and handles it internally! So basically we don't have to worry about that part of the code, is what I'm trying to say 🤠 128 | 129 | so until this point, before the `loc_9a63f` label, the 32-bit CPU register `eax` (the lower 32 bits of `rax`) is holding the value we are going to print to the console using the `print` function.
It's good to know, for me as well, that in 64-bit mode, a `mov` into a 32-bit register zeroes out the upper 32 bits of the corresponding 64-bit register. So `mov eax, foo` will zero out the upper 32 bits of `rax`, `mov ebx, foo` will zero out the upper 32 bits of `rbx`, and so on. The relevant rule can be found in section *3.4.1.1 General-Purpose Registers in 64-Bit Mode* in "Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 1: Basic Architecture", as pointed out by Vyacheslav, and it goes like this: 130 | 131 | > - 64-bit operands generate a 64-bit result in the destination general-purpose register. 132 | > - **32-bit operands generate a 32-bit result, zero-extended to a 64-bit result in the destination general-purpose register.** 133 | > - 8-bit and 16-bit operands generate an 8-bit or 16-bit result. The upper 56 bits or 48 bits (respectively) of the destination general-purpose register are not modified by the operation. If the result of an 8-bit or 16-bit operation is intended for 64-bit address calculation, explicitly sign-extend the register to the full 64-bits. 134 | 135 | ```asm 136 | loc_9a63f: 137 | 000000000009a63f push rax ; CODE XREF=Precompiled____foo_1430+45 138 | 000000000009a640 call Precompiled____print_911 ; Precompiled____print_911 139 | 000000000009a645 pop rcx 140 | 000000000009a646 mov rax, qword [r14+0xc8] 141 | 000000000009a64d mov rsp, rbp 142 | 000000000009a650 pop rbp 143 | 000000000009a651 ret 144 | ; endp 145 | ``` 146 | 147 | that pushes the value `0x00000000deadbeef` onto the stack as a 64-bit value, for use by the `Precompiled____print_911` procedure.
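To check my own understanding of those three rules, here is a small Dart model of what a register write does to the whole 64-bit register. `writeRegister` is a made-up helper, not an SDK API. Dart's 64-bit `int` (two's complement on the native VM) stands in for the register. The model ignores the legacy `ah`/`bh`-style high-byte registers:

```dart
/// Hypothetical helper for illustration only: models writing [value] into
/// the low [bits] of a 64-bit register that currently holds [old],
/// following the Intel SDM rules quoted above.
int writeRegister(int old, int value, int bits) {
  // 64-bit operands replace the whole register.
  if (bits == 64) return value;
  // 32-bit operands are zero-extended: the upper 32 bits become 0.
  if (bits == 32) return value & 0xFFFFFFFF;
  // 8/16-bit operands leave the upper 56/48 bits untouched.
  if (bits == 16 || bits == 8) {
    final mask = (1 << bits) - 1;
    return (old & ~mask) | (value & mask);
  }
  throw ArgumentError.value(bits, 'bits', 'must be 8, 16, 32 or 64');
}

void main() {
  const allOnes = -1; // 0xFFFFFFFFFFFFFFFF in two's complement
  // Like `mov eax, 0xdeadbeef`: rax ends up as 0x00000000deadbeef.
  print(writeRegister(allOnes, 0xDEADBEEF, 32).toRadixString(16)); // deadbeef
  // Like `mov ax, 0xbeef`: the upper 48 bits keep their old value.
  print(writeRegister(allOnes, 0xBEEF, 16) == (~0xFFFF | 0xBEEF)); // true
}
```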
148 | 149 | if you call this function multiple times like this: 150 | 151 | ```dart 152 | import 'dart:io' show exit; 153 | 154 | void foo() { 155 | print(0xDEADBEEF); 156 | } 157 | 158 | void main(List args) { 159 | foo(); 160 | foo(); 161 | foo(); 162 | exit(0); 163 | } 164 | ``` 165 | 166 | the output AOT would be: 167 | 168 | ```asm 169 | Precompiled____main_1435: 170 | 000000000009a700 push rbp ; CODE XREF=Precompiled____main_main_1437+17 171 | 000000000009a701 mov rbp, rsp 172 | 000000000009a704 cmp rsp, qword [r14+0x40] 173 | 000000000009a708 jbe loc_9a72e 174 | 175 | loc_9a70e: 176 | 000000000009a70e call Precompiled____foo_1436 ; Precompiled____foo_1436, CODE XREF=Precompiled____main_1435+53 177 | 000000000009a713 call Precompiled____foo_1436 ; Precompiled____foo_1436 178 | 000000000009a718 call Precompiled____foo_1436 ; Precompiled____foo_1436 179 | 000000000009a71d call Precompiled____exit_1024 ; Precompiled____exit_1024 180 | 000000000009a722 mov rax, qword [r14+0xc8] 181 | 000000000009a729 mov rsp, rbp 182 | 000000000009a72c pop rbp 183 | 000000000009a72d ret 184 | ; endp 185 | 186 | loc_9a72e: 187 | 000000000009a72e call qword [r14+0x240] ; CODE XREF=Precompiled____main_1435+8 188 | 000000000009a735 jmp loc_9a70e 189 | ``` 190 | 191 | you can see the compiler is literally calling that function 3 times in a row. I know for sure that the Dart compiler *can* in fact optimize simple functions so that instead of a function call being done here it would do it inline, or even at compile-time if the function works with constant values for instance, but in this case, the compiler has decided that the `foo` function is not optimizable such that it can be done inline. I'm not so sure about the internals at the Dart SDK level how optimizations are done to global functions like this but I am almost sure there is a reason behind this! 
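If you want to experiment with this yourself, the Dart VM recognizes the `@pragma('vm:prefer-inline')` and `@pragma('vm:never-inline')` annotations as inlining hints. The sketch below is the kind of program I'd use to compare the generated AOT for an inlined and a non-inlined version of a `foo`-like function. I haven't disassembled it here, so I can't say what the compiler actually emits for it. The pragmas are only hints and don't change what the program does:

```dart
// Hint to the VM's optimizing compiler: inline this wherever possible.
@pragma('vm:prefer-inline')
int inlinedValue() => 0xDEADBEEF;

// Hint to the VM's optimizing compiler: never inline this function.
@pragma('vm:never-inline')
int outOfLineValue() => 0xDEADBEEF;

void main() {
  // Both print 3735928559; only the generated machine code may differ.
  print(inlinedValue());
  print(outOfLineValue());
}
```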
192 | 193 | I think it's important to look at other kinds of global functions with 0 arguments, just to get a better understanding of how Dart optimizations work under the hood, because I feel like the above example is not a good representation of the Dart compiler's capabilities! 194 | 195 | given the following Dart code: 196 | 197 | ```dart 198 | import 'dart:io' show exit; 199 | 200 | int foo() => 0xDEADBEEF; 201 | 202 | void main(List args) { 203 | print(foo); 204 | exit(0); 205 | } 206 | ``` 207 | 208 | Note: we are not actually invoking the `foo` function, but rather printing its so-called tear-off; in Vyacheslav's words: 209 | 210 | > we print foo's tear-off - which is just a closure forwarding invocations to foo 211 | 212 | we get the following AOT: 213 | 214 | ```asm 215 | Precompiled____main_1435: 216 | 000000000009a6ec push rbp ; CODE XREF=Precompiled____main_main_1437+17 217 | 000000000009a6ed mov rbp, rsp 218 | 000000000009a6f0 cmp rsp, qword [r14+0x40] 219 | 000000000009a6f4 jbe loc_9a71a 220 | 221 | loc_9a6fa: 222 | 000000000009a6fa mov r11, qword [r15+0x1e07] ; CODE XREF=Precompiled____main_1435+53 223 | 000000000009a701 push r11 224 | 000000000009a703 call Precompiled____print_911 ; Precompiled____print_911 225 | 000000000009a708 pop rcx 226 | 000000000009a709 call Precompiled____exit_1024 ; Precompiled____exit_1024 227 | 000000000009a70e mov rax, qword [r14+0xc8] 228 | 000000000009a715 mov rsp, rbp 229 | 000000000009a718 pop rbp 230 | 000000000009a719 ret 231 | ; endp 232 | 233 | loc_9a71a: 234 | 000000000009a71a call qword [r14+0x240] ; CODE XREF=Precompiled____main_1435+8 235 | 000000000009a721 jmp loc_9a6fa 236 | ``` 237 | 238 | well this was interesting! there is no mention of the `foo` function anywhere in this code.
The interesting part of the code for me is this: 239 | 240 | ```asm 241 | 000000000009a6fa mov r11, qword [r15+0x1e07] ; CODE XREF=Precompiled____main_1435+53 242 | 000000000009a701 push r11 243 | 000000000009a703 call Precompiled____print_911 ; Precompiled____print_911 244 | ``` 245 | 246 | I can see that the `r11` 64-bit GPR is being loaded with the 64-bit value stored at `[r15+0x1e07]`. At first I was unsure what `r15` actually is here, so again I asked Vyacheslav Egorov, and in his words: 247 | 248 | > R15 is reserved as "object pool" register in Dart calling conventions, it contains a pointer to a pool object through which generated code accesses different constant and auxiliary objects in the isolate group's GC managed heap. 249 | 250 | In the original Issue 4, I made the mistake of tracing all of this back to the `DecodeLoadObjectFromPoolOrThread` function in the Dart SDK's source code, but here is what Vyacheslav had to say about that: 251 | 252 | > the function you have found is used in the internal disassembler built into VM for debugging purposes (e.g. you can tell VM to disassemble the code it generates), in which case the disassembler looks for some specific patterns in the code (e.g. loads from the pool) and decodes what objects they will load at runtime - so that it could print a small comment next to these instructions in the disassembly listing to make it more human readable. This code is pattern matching on the stream of x86_64 instructions - hence a lot of magic constants, these are just parts of Intel instruction encoding. I think a more appropriate place to look at is something like `Assembler::LoadObjectHelper` which is a function that _generates_ machine code for loading a reference to a specific object into a specific register. 253 | 254 | I decided to leave this comment exactly as it is since it contains a lot of useful information for me as well.
I won't paste the internals of the `Assembler::LoadObjectHelper` function here, since that would diverge from the goal of this issue. I will keep digging into the `DecodeLoadObjectFromPoolOrThread` function, though, because it does pretty much what I'm trying to do here: look at generated machine code and work out what it refers to. Here is the `DecodeLoadObjectFromPoolOrThread` function from the Dart SDK's `instructions_x64.cc` file, which I find helpful in understanding what the generated code is doing: 255 | 256 | ```cpp 257 | bool DecodeLoadObjectFromPoolOrThread(uword pc, const Code& code, Object* obj) { 258 | ASSERT(code.ContainsInstructionAt(pc)); 259 | 260 | uint8_t* bytes = reinterpret_cast<uint8_t*>(pc); 261 | 262 | COMPILE_ASSERT(THR == R14); 263 | if ((bytes[0] == 0x49) || (bytes[0] == 0x4d)) { 264 | if ((bytes[1] == 0x8b) || (bytes[1] == 0x3b)) { // movq, cmpq 265 | if ((bytes[2] & 0xc7) == (0x80 | (THR & 7))) { // [r14+disp32] 266 | int32_t offset = LoadUnaligned(reinterpret_cast<int32_t*>(pc + 3)); 267 | return Thread::ObjectAtOffset(offset, obj); 268 | } 269 | if ((bytes[2] & 0xc7) == (0x40 | (THR & 7))) { // [r14+disp8] 270 | uint8_t offset = *reinterpret_cast<uint8_t*>(pc + 3); 271 | return Thread::ObjectAtOffset(offset, obj); 272 | } 273 | } 274 | } 275 | 276 | if (((bytes[0] == 0x41) && (bytes[1] == 0xff) && (bytes[2] == 0x76))) { 277 | // push [r14+disp8] 278 | uint8_t offset = *reinterpret_cast<uint8_t*>(pc + 3); 279 | return Thread::ObjectAtOffset(offset, obj); 280 | } 281 | 282 | COMPILE_ASSERT(PP == R15); 283 | if ((bytes[0] == 0x49) || (bytes[0] == 0x4d)) { 284 | if ((bytes[1] == 0x8b) || (bytes[1] == 0x3b)) { // movq, cmpq 285 | if ((bytes[2] & 0xc7) == (0x80 | (PP & 7))) { // [r15+disp32] 286 | intptr_t index = IndexFromPPLoadDisp32(pc + 3); 287 | const ObjectPool& pool = ObjectPool::Handle(code.GetObjectPool()); 288 | if (!pool.IsNull() && (index < pool.Length()) && 289 | (pool.TypeAt(index) == ObjectPool::EntryType::kTaggedObject)) { 290 | *obj
= pool.ObjectAt(index); 291 | return true; 292 | } 293 | } 294 | if ((bytes[2] & 0xc7) == (0x40 | (PP & 7))) { // [r15+disp8] 295 | intptr_t index = IndexFromPPLoadDisp8(pc + 3); 296 | const ObjectPool& pool = ObjectPool::Handle(code.GetObjectPool()); 297 | if (!pool.IsNull() && (index < pool.Length()) && 298 | (pool.TypeAt(index) == ObjectPool::EntryType::kTaggedObject)) { 299 | *obj = pool.ObjectAt(index); 300 | return true; 301 | } 302 | } 303 | } 304 | } 305 | 306 | return false; 307 | } 308 | ``` 309 | 310 | The part I think we need to look at is everything after this point: 311 | 312 | ```cpp 313 | COMPILE_ASSERT(PP == R15); 314 | ... 315 | ``` 316 | 317 | What's interesting to me is the `[r15+disp32]` comment. It tells me that the instruction reads memory at the address in `r15` (the object pool) plus a 32-bit displacement, and that displacement identifies the object's slot in the pool. Looking at `intptr_t index = IndexFromPPLoadDisp32(pc + 3);`, I can see that the decoder skips the first 3 bytes at `pc` (the same address `bytes` points to) to get to that displacement. Following Vyacheslav's hint that these are just parts of the Intel instruction encoding, those 3 bytes are: the REX prefix (`0x49`, or `0x4d` when the register operand is one of `r8` to `r15`), the opcode (`0x8b` for `mov`, `0x3b` for `cmp`) and the ModRM byte. Masking the ModRM byte with `0xc7` keeps only its top two bits (`0x80` means a 32-bit displacement follows, `0x40` an 8-bit one) and its low three bits, which, together with the REX prefix, select the base register. `PP & 7` is 7 because `r15` is register 15. These are the checks in question: 318 | 319 | ```cpp 320 | if ((bytes[0] == 0x49) || (bytes[0] == 0x4d)) { 321 | if ((bytes[1] == 0x8b) || (bytes[1] == 0x3b)) { // movq, cmpq 322 | if ((bytes[2] & 0xc7) == (0x80 | (PP & 7))) { // [r15+disp32] 323 | ``` 324 | 325 | What I *do* know with almost certainty (again, I could be wrong about this!), given that our `mov r11, qword [r15+0x1e07]` should be encoded as `4d 8b 9f 07 1e 00 00` (7 bytes, which matches the next instruction starting at `0x9a701`),
is that we will end up in this code block: 326 | 327 | ```cpp 328 | intptr_t index = IndexFromPPLoadDisp32(pc + 3); 329 | const ObjectPool& pool = ObjectPool::Handle(code.GetObjectPool()); 330 | if (!pool.IsNull() && (index < pool.Length()) && 331 | (pool.TypeAt(index) == ObjectPool::EntryType::kTaggedObject)) { 332 | *obj = pool.ObjectAt(index); 333 | return true; 334 | } 335 | ``` 336 | 337 | In other words, to annotate this instruction, the disassembler turns the displacement into a pool index with `IndexFromPPLoadDisp32`, gets the object pool with `code.GetObjectPool()` and finally retrieves our object with `pool.ObjectAt(index)`. At runtime, the generated code does the equivalent with that single `mov` from `[r15+0x1e07]`. Since we printed `foo`'s tear-off, the object it loads should be that tear-off closure rather than the `foo` function itself. 338 | 339 | Now suppose we change the code so that we either print the return value of `foo` by writing `print(foo())`, or simply turn `foo` into a getter like so: 340 | 341 | ```dart 342 | import 'dart:io' show exit; 343 | 344 | int get foo => 0xDEADBEEF; 345 | 346 | void main(List<String> args) { 347 | print(foo); 348 | exit(0); 349 | } 350 | ``` 351 | 352 | We get this AOT: 353 | 354 | ```asm 355 | Precompiled____main_1435: 356 | 000000000009a700 push rbp ; CODE XREF=Precompiled____main_main_1436+17 357 | 000000000009a701 mov rbp, rsp 358 | 000000000009a704 mov eax, 0xdeadbeef 359 | 000000000009a709 cmp rsp, qword [r14+0x40] 360 | 000000000009a70d jbe loc_9a72b 361 | 362 | loc_9a713: 363 | 000000000009a713 push rax ; CODE XREF=Precompiled____main_1435+50 364 | 000000000009a714 call Precompiled____print_911 ; Precompiled____print_911 365 | 000000000009a719 pop rcx 366 | 000000000009a71a call Precompiled____exit_1024 ; Precompiled____exit_1024 367 | 000000000009a71f mov rax, qword [r14+0xc8] 368 | 000000000009a726 mov rsp, rbp 369 | 000000000009a729 pop rbp 370 | 000000000009a72a ret 371 | ; endp 372 | 373 | loc_9a72b: 374 | 000000000009a72b call qword [r14+0x240] ; CODE XREF=Precompiled____main_1435+13 375 | 000000000009a732 jmp loc_9a713 376 | ``` 377 | 378 | In this code I created a `foo` getter that simply returns the constant `0xDEADBEEF`, and
as you can see, at `000000000009a704` the value `0xdeadbeef` is placed in `eax` (which also sets the upper 32 bits of `rax` to 0, as seen earlier). It is then pushed onto the stack as the argument to the `Precompiled____print_911` procedure, with no call to `foo` at all. 379 | 380 | ## Global functions with 1 compile-time constant argument 381 | 382 | Given the following Dart code: 383 | 384 | ```dart 385 | import 'dart:io' show exit; 386 | 387 | int increment(int value) { 388 | return value + 1; 389 | } 390 | 391 | void main(List<String> args) { 392 | print(increment(increment(increment(1)))); 393 | exit(0); 394 | } 395 | ``` 396 | 397 | We get this AOT: 398 | 399 | ```asm 400 | 000000000009a700 push rbp ; CODE XREF=Precompiled____main_main_1436+17 401 | 000000000009a701 mov rbp, rsp 402 | 000000000009a704 mov eax, 0x4 403 | 000000000009a709 cmp rsp, qword [r14+0x40] 404 | 000000000009a70d jbe loc_9a72b 405 | 406 | loc_9a713: 407 | 000000000009a713 push rax ; CODE XREF=Precompiled____main_1435+50 408 | 000000000009a714 call Precompiled____print_911 ; Precompiled____print_911 409 | 000000000009a719 pop rcx 410 | 000000000009a71a call Precompiled____exit_1024 ; Precompiled____exit_1024 411 | 000000000009a71f mov rax, qword [r14+0xc8] 412 | 000000000009a726 mov rsp, rbp 413 | 000000000009a729 pop rbp 414 | 000000000009a72a ret 415 | ; endp 416 | 417 | loc_9a72b: 418 | 000000000009a72b call qword [r14+0x240] ; CODE XREF=Precompiled____main_1435+13 419 | 000000000009a732 jmp loc_9a713 420 | ``` 421 | 422 | You see that little `mov eax, 0x4` instruction up there? Well, that's the result to be printed to the screen using `Precompiled____print_911`.
The Dart compiler took the value 0x01 that we passed to the innermost call and worked out that the function adds 1 to it, so it becomes 0x02. It then did the same again to get 0x03, and finally 0x04 on the last pass. It didn't even compile the `increment()` function: looking at the symbols in the resulting AOT, I can see that `increment()` is not present in the binary at all. 423 | 424 | ## Global functions with 1 non-compile-time-constant argument 425 | 426 | Let's make this function a bit harder for the compiler to optimize, taking this as an example: 427 | 428 | ```dart 429 | import 'dart:io' show exit; 430 | 431 | int increment(int value) { 432 | return value + 1; 433 | } 434 | 435 | void main(List<String> args) { 436 | final number = int.tryParse( 437 | args.firstWhere( 438 | (_) => true, 439 | orElse: () => '', 440 | ), 441 | ) ?? 442 | 0; 443 | print(increment(increment(increment(number)))); 444 | exit(0); 445 | } 446 | ``` 447 | 448 | The `number` variable is set to the first argument passed to our program, parsed as an `int` (or 0 if there is no argument or it isn't a valid integer).
This is just a way for me to make sure the Dart compiler cannot guess the value inside `number`, so it has no choice but to compile it as a variable that is calculated at runtime and then passed to the `increment` function. Let's check out the AOT for the `main` function now. Note that I'm not going to paste the entire `main` function's AOT, since it includes a **lot** of code for the `tryParse()` call that isn't relevant to the point of this article. Here is just the relevant part: 449 | 450 | ```asm 451 | 000000000009f8d5 push rax 452 | 000000000009f8d6 call Precompiled_int_tryParse_559 ; Precompiled_int_tryParse_559 453 | 000000000009f8db pop rcx 454 | 000000000009f8dc cmp rax, qword [r14+0xc8] 455 | 000000000009f8e3 jne loc_9f8f0 456 | 457 | 000000000009f8e9 xor eax, eax 458 | 000000000009f8eb jmp loc_9f8fd 459 | 460 | loc_9f8f0: 461 | 000000000009f8f0 sar rax, 0x1 ; CODE XREF=Precompiled____main_1440+111 462 | 000000000009f8f3 jae loc_9f8fd 463 | 464 | 000000000009f8f5 mov rax, qword [0x8+rax*2] 465 | 466 | loc_9f8fd: 467 | 000000000009f8fd add rax, 0x1 ; CODE XREF=Precompiled____main_1440+119, Precompiled____main_1440+127 468 | 000000000009f901 add rax, 0x1 469 | 000000000009f905 add rax, 0x1 470 | 000000000009f909 push rax 471 | 000000000009f90a call Precompiled____print_845 ; Precompiled____print_845 472 | ``` 473 | 474 | Let's break it down one bit at a time. The `cmp` and `jne` instructions compare the result of `Precompiled_int_tryParse_559` with `null`, which is what `[r14+0xc8]` holds. (`jne` jumps if not equal, i.e. when `ZF=0`; refer to EFLAGS in the Intel manuals.) If the result was *not* `null` (`ZF=0`), execution jumps to `loc_9f8f0`. If the result *was* `null` (`ZF=1`), it falls through to this code: 475 | 476 | ```asm 477 | 000000000009f8e9 xor eax, eax 478 | 000000000009f8eb jmp loc_9f8fd 479 | ``` 480 | 481 | This is a pretty clever way of setting `rax` to 0 at this point: writing to `eax` also clears the upper 32 bits of `rax`, and 0 is exactly our `?? 0` fallback.
Dart is setting `rax` to 0 with `xor` instead of `mov rax, 0`. I remember from many years ago, when I programmed in Assembly, that `xor` could be faster than `mov r64, imm`, so this could very well be an optimization. It is certainly the standard x86_64 zeroing idiom, and it is also smaller: `xor eax, eax` takes only 2 bytes (note that the next instruction starts at `0x9f8eb`), compared to 5 bytes for `mov eax, 0` and 7 bytes for `mov rax, 0`. 482 | 483 | The other path, `loc_9f8f0`, untags the non-`null` result. As far as I can tell, `sar rax, 0x1` shifts the Smi (small integer) tag bit out into the carry flag. If that bit was 0 (`jae` is taken when `CF=0`), `rax` now holds the integer itself. If it was 1, `rax` was a pointer to a boxed 64-bit integer (a "mint"), and `mov rax, qword [0x8+rax*2]` loads the value out of that box. Now that `rax` is set to either 0 or the result of `tryParse()`, we get to `loc_9f8fd`, which is this code: 484 | 485 | ```asm 486 | loc_9f8fd: 487 | 000000000009f8fd add rax, 0x1 ; CODE XREF=Precompiled____main_1440+119, Precompiled____main_1440+127 488 | 000000000009f901 add rax, 0x1 489 | 000000000009f905 add rax, 0x1 490 | 000000000009f909 push rax 491 | 000000000009f90a call Precompiled____print_845 ; Precompiled____print_845 492 | ``` 493 | 494 | I would be lying if I said I didn't chuckle, but this doesn't seem like the best way to increment `rax` by 3 😂. It seems the Dart compiler inlined `increment()` and understood that each call adds 1, but it doesn't combine N consecutive `+ 1`s into a single `+ N`, so it just repeats `add rax, 0x1` 3 times. 495 | 496 | Here is what Vyacheslav had to say about this: 497 | 498 | > Yeah, this is a missing optimization opportunity here. Dart's compiler misses it because we don't do so called _reassociation_ (turning `((x + a) + b)` into `x + (a + b)`). 499 | 500 | So there you have the answer as to why Dart didn't optimize the above call further!
501 | 502 | ## One-liner optimized `static` functions 503 | 504 | Given the following Dart code: 505 | 506 | ```dart 507 | import 'dart:io' show exit; 508 | import 'dart:math' show Random; 509 | 510 | class Foo { 511 | static int increment(int value1, int value2) { 512 | return value1 + value2; 513 | } 514 | } 515 | 516 | void main(List<String> args) { 517 | final rnd = Random(); 518 | final value1 = rnd.nextInt(0xDEADBEEF); 519 | final value2 = rnd.nextInt(0xCAFEBABE); 520 | final result = Foo.increment(value1, value2); 521 | print(result); 522 | exit(0); 523 | } 524 | ``` 525 | 526 | We get the following AOT: 527 | 528 | ```asm 529 | Precompiled____main_1436: 530 | 000000000009a8fc push rbp ; CODE XREF=Precompiled____main_main_1437+17 531 | 000000000009a8fd mov rbp, rsp 532 | 000000000009a900 sub rsp, 0x10 533 | 000000000009a904 cmp rsp, qword [r14+0x40] 534 | 000000000009a908 jbe loc_9a960 535 | 536 | loc_9a90e: 537 | 000000000009a90e push qword [r14+0xc8] ; CODE XREF=Precompiled____main_1436+107 538 | 000000000009a915 call Precompiled_Random_Random__1165 ; Precompiled_Random_Random__1165 539 | 000000000009a91a pop rcx 540 | 000000000009a91b mov qword [rbp+var_8], rax 541 | 000000000009a91f push rax 542 | 000000000009a920 mov ecx, 0xdeadbeef 543 | 000000000009a925 push rcx 544 | 000000000009a926 call Precompiled__Random_11383281_nextInt_1164 ; Precompiled__Random_11383281_nextInt_1164 545 | 000000000009a92b pop rcx 546 | 000000000009a92c pop rcx 547 | 000000000009a92d mov qword [rbp+var_10], rax 548 | 000000000009a931 push qword [rbp+var_8] 549 | 000000000009a934 mov ecx, 0xcafebabe 550 | 000000000009a939 push rcx 551 | 000000000009a93a call Precompiled__Random_11383281_nextInt_1164 ; Precompiled__Random_11383281_nextInt_1164 552 | 000000000009a93f pop rcx 553 | 000000000009a940 pop rcx 554 | 000000000009a941 mov rcx, qword [rbp+var_10] 555 | 000000000009a945 add rcx, rax 556 | 000000000009a948 push rcx 557 | 000000000009a949 call Precompiled____print_911 ;
Precompiled____print_911 558 | 000000000009a94e pop rcx 559 | 000000000009a94f call Precompiled____exit_1024 ; Precompiled____exit_1024 560 | 000000000009a954 mov rax, qword [r14+0xc8] 561 | 000000000009a95b mov rsp, rbp 562 | 000000000009a95e pop rbp 563 | 000000000009a95f ret 564 | ; endp 565 | 566 | loc_9a960: 567 | 000000000009a960 call qword [r14+0x240] ; CODE XREF=Precompiled____main_1436+12 568 | 000000000009a967 jmp loc_9a90e 569 | 570 | 571 | ; ================ B E G I N N I N G O F P R O C E D U R E ================ 572 | 573 | 574 | sub_9a969: 575 | 000000000009a969 int3 576 | ``` 577 | 578 | The first part of the code is again setting up the stack and checking for stack overflows, as I explained before, so I won't go into that again. The juicy part is inside the `loc_9a90e` label. That block of code starts with the following AOT: 579 | 580 | ```asm 581 | 000000000009a90e push qword [r14+0xc8] ; CODE XREF=Precompiled____main_1436+107 582 | 000000000009a915 call Precompiled_Random_Random__1165 ; Precompiled_Random_Random__1165 583 | 000000000009a91a pop rcx 584 | ``` 585 | 586 | The `r14` 64-bit GPR here is not the object pool (that's `r15`). It is the thread register, called `THR` in the Dart SDK, as the `COMPILE_ASSERT(THR == R14)` below confirms. `[r14+0xc8]` is the same slot that `main` loads into `rax` right before returning, which is `null`. So, as far as I can tell, this `push` passes `null` as an argument to the `Random` factory, and the `pop rcx` after the call removes it from the stack again. The part of `DecodeLoadObjectFromPoolOrThread` in the `instructions_x64.cc` file that deals with `r14` is this: 587 | 588 | ```cpp 589 | bool DecodeLoadObjectFromPoolOrThread(uword pc, const Code& code, Object* obj) { 590 | ASSERT(code.ContainsInstructionAt(pc)); 591 | 592 | uint8_t* bytes = reinterpret_cast<uint8_t*>(pc); 593 | 594 | COMPILE_ASSERT(THR == R14); 595 | if ((bytes[0] == 0x49) || (bytes[0] == 0x4d)) { 596 | if ((bytes[1] == 0x8b) || (bytes[1] == 0x3b)) { // movq, cmpq 597 | if ((bytes[2] & 0xc7) == (0x80 | (THR & 7))) { // [r14+disp32] 598 | int32_t offset = LoadUnaligned(reinterpret_cast<int32_t*>(pc + 3)); 599 | return Thread::ObjectAtOffset(offset, obj); 600 | } 601 | if ((bytes[2] & 0xc7) == (0x40 | (THR & 7))) { // [r14+disp8] 602 | uint8_t offset = *reinterpret_cast<uint8_t*>(pc + 3); 603 | return Thread::ObjectAtOffset(offset,
obj); 604 | } 605 | } 606 | } 607 | ``` 608 | 609 | Before digging into this code, here is what Vyacheslav had to say about this function: 610 | 611 | > This particular function is just a helper used by disassembler and has nothing to do with how the code is generated or pool is populated or how the code is going to be executed later. 612 | 613 | Thanks to this, I now know that the code above is only used by the disassembler, so it can't tell us how the code was generated. I still find it useful, though, since it's doing what I'm doing in this issue of Going Deep with Dart: reading the generated code in reverse to see what we find! You see the `if ((bytes[2] & 0xc7) == (0x80 | (THR & 7))) { // [r14+disp32]` code? That's how the disassembler recognizes a `movq` or `cmpq` that reads a `[r14+disp32]` slot, such as the `mov rax, qword [r14+0xc8]` near the end of `main`. Our `push qword [r14+0xc8]` uses a different opcode (`0xff`) and a 32-bit displacement, so it matches neither this branch nor the `push [r14+disp8]` pattern in the full function shown earlier. Either way, the `push` only places an argument on the stack; it's the `Precompiled_Random_Random__1165` call that actually creates the `Random` instance for us. But let's not get side-tracked here, and focus on the static `increment()` function instead. The AOT for that function is shown here: 614 | 615 | ```asm 616 | 000000000009a941 mov rcx, qword [rbp+var_10] 617 | 000000000009a945 add rcx, rax 618 | 000000000009a948 push rcx 619 | 000000000009a949 call Precompiled____print_911 ; Precompiled____print_911 620 | ``` 621 | 622 | The `mov` instruction simply loads the value in `[rbp+var_10]` into `rcx`; `[rbp+var_10]` contains the result of `Precompiled__Random_11383281_nextInt_1164`, the first `nextInt()` call. Dart then does `add rcx, rax`, because `rax` holds the result of the second `nextInt()` call. So this `add` instruction is literally what we are doing inside the `increment()` function, placed directly in the body of the `main` function. In other words, the static function got inlined, which was quite a nice optimization by the Dart compiler.
623 | 624 | ## More complex `static` functions 625 | 626 | Let's make the previous example a bit more complex for the compiler: 627 | 628 | ```dart 629 | import 'dart:io' show exit; 630 | import 'dart:math' show Random; 631 | 632 | class Foo { 633 | static int increment(int value1, int value2) { 634 | print(value1); 635 | print(value2); 636 | return value1 + value2; 637 | } 638 | } 639 | 640 | void main(List<String> args) { 641 | final rnd = Random(); 642 | final value1 = rnd.nextInt(0xDEADBEEF); 643 | final value2 = rnd.nextInt(0xCAFEBABE); 644 | final result = Foo.increment(value1, value2); 645 | print(result); 646 | exit(0); 647 | } 648 | ``` 649 | 650 | For this we get the following AOT for the `main` function, which I have trimmed down to the few relevant lines of asm code, since the rest repeats what we have already seen: 651 | 652 | ```asm 653 | ... 654 | 000000000009a948 push rax 655 | 000000000009a949 push rcx 656 | 000000000009a94a call Precompiled_Foo_increment_1437 ; Precompiled_Foo_increment_1437 657 | 000000000009a94f pop rcx 658 | 000000000009a950 pop rcx 659 | ...
660 | ``` 661 | 662 | Not surprisingly, this is putting `value1` and `value2` onto the stack and then calling the `Precompiled_Foo_increment_1437` procedure, so let's have a look at the AOT for that: 663 | 664 | ```asm 665 | Precompiled_Foo_increment_1437: 666 | 000000000009a974 push rbp ; CODE XREF=Precompiled____main_1436+78 667 | 000000000009a975 mov rbp, rsp 668 | 000000000009a978 cmp rsp, qword [r14+0x40] 669 | 000000000009a97c jbe loc_9a9ab 670 | 671 | loc_9a982: 672 | 000000000009a982 mov rax, qword [rbp+arg_8] ; CODE XREF=Precompiled_Foo_increment_1437+62 673 | 000000000009a986 push rax 674 | 000000000009a987 call Precompiled____print_911 ; Precompiled____print_911 675 | 000000000009a98c pop rcx 676 | 000000000009a98d mov rax, qword [rbp+arg_0] 677 | 000000000009a991 push rax 678 | 000000000009a992 call Precompiled____print_911 ; Precompiled____print_911 679 | 000000000009a997 pop rcx 680 | 000000000009a998 mov rcx, qword [rbp+arg_0] 681 | 000000000009a99c mov rdx, qword [rbp+arg_8] 682 | 000000000009a9a0 add rdx, rcx 683 | 000000000009a9a3 mov rax, rdx 684 | 000000000009a9a6 mov rsp, rbp 685 | 000000000009a9a9 pop rbp 686 | 000000000009a9aa ret 687 | ; endp 688 | ``` 689 | 690 | Here you can see how `rbp` (the 64-bit base pointer) is used to reach the arguments that the caller pushed onto the stack before the call. `value1` was pushed first, so it sits at the higher address, `[rbp+arg_8]`, while `value2` sits at `[rbp+arg_0]`. Since we made the `increment()` function more complicated with `print()` invocations inside, the Dart compiler decided not to inline it! That's not a surprise though; almost nothing about this part of the AOT is a surprise, to be honest.
It's a simple function with simple arguments stored on the stack, which internally calls the `Precompiled____print_911` function and then uses these instructions to calculate `return value1 + value2;`: 691 | 692 | ```asm 693 | 000000000009a998 mov rcx, qword [rbp+arg_0] 694 | 000000000009a99c mov rdx, qword [rbp+arg_8] 695 | 000000000009a9a0 add rdx, rcx 696 | 000000000009a9a3 mov rax, rdx 697 | ``` 698 | 699 | So after this procedure is done, the `rax` 64-bit register contains the result of `value1 + value2`, and the caller reads it from there, as shown here: 700 | 701 | ```asm 702 | 000000000009a94a call Precompiled_Foo_increment_1437 ; Precompiled_Foo_increment_1437 703 | 000000000009a94f pop rcx 704 | 000000000009a950 pop rcx 705 | 000000000009a951 push rax 706 | 000000000009a952 call Precompiled____print_911 ; Precompiled____print_911 707 | ``` 708 | 709 | So there you have it! 710 | 711 | 712 | ## Conclusions 713 | 714 | - Some global functions with 0 arguments, even one-liners, may not get inlined; instead, they become procedures at the asm level that are invoked with the x86_64 `call` instruction. 715 | - One-liner getters, and calls to one-liner functions with a constant return value, are optimized the same way: the constant is usually placed directly in a relevant register (such as `rax`) to be consumed later. 716 | - Arguments to functions that the compiler doesn't inline are passed on the stack, using Dart's own calling convention. I haven't been able to find a single place in the Dart SDK source code where this calling convention is documented! 717 | - One-liner static functions, depending on their complexity, can be inlined just like any other global function.
718 | - Regardless of whether a function is static or not, the Dart compiler decides whether to inline it based on its complexity. Just because a function seems complicated doesn't necessarily mean it won't be optimized, and vice versa! 719 | 720 | ## References 721 | 722 | - Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 1: Basic Architecture 723 | - Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2 (2A, 2B, 2C & 2D): Instruction Set Reference, A-Z 724 | --------------------------------------------------------------------------------