├── 00. C strings & the proc filesystem ├── .gitignore ├── README.md ├── loop.c ├── main.c └── read_write_heap.py ├── 01. Python bytes ├── .gitignore ├── README.md ├── bytes.c ├── main.py ├── main_bytes.py ├── main_id.py ├── read_write_heap.py ├── read_write_stack.py └── rw_all.py ├── 02. What's where in the virtual memory ├── .gitignore ├── README.md ├── main-0.c ├── main-1.c ├── main-2.c ├── main-3.c ├── main-4.c ├── main-5.c ├── main-6.c └── main-7.c ├── 03. malloc, the heap and the program break ├── .gitignore ├── 0-main.c ├── 1-main.c ├── 10-main.c ├── 2-main.c ├── 3-main.c ├── 4-main.c ├── 5-main.c ├── 6-main.c ├── 7-main.c ├── 8-main.c ├── 9-main.c ├── README.md ├── naive_malloc.c └── version.c ├── 04. The Stack, registers and assembly code ├── 0-main.c ├── 1-main.c ├── 2-main.c ├── 3-main.c ├── 4-main.c └── README.md └── README.md /00. C strings & the proc filesystem/.gitignore: -------------------------------------------------------------------------------- 1 | holberton 2 | loop 3 | Makefile 4 | -------------------------------------------------------------------------------- /00. C strings & the proc filesystem/README.md: -------------------------------------------------------------------------------- 1 | 2 | ![hack the vm!](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/hack_the_vm_0.png) 3 | 4 | ## Intro 5 | 6 | ### Hack The Virtual Memory: Play with C strings & `/proc` 7 | 8 | This is the first in a series of small articles / tutorials based around virtual memory. The goal is to learn some CS basics, but in a different and more practical way. 9 | 10 | For this first piece, we'll use `/proc` to find and modify variables (in this example, an ASCII string) contained inside the virtual memory of a running process, and learn some cool things along the way. 
11 | 12 | ## Environment 13 | 14 | All scripts and programs have been tested on the following system: 15 | 16 | - Ubuntu 14.04 LTS 17 | - Linux ubuntu 4.4.0-31-generic #50~14.04.1-Ubuntu SMP Wed Jul 13 01:07:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux 18 | - gcc 19 | - gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4 20 | - Python 3: 21 | - Python 3.4.3 (default, Nov 17 2016, 01:08:31) 22 | - \[GCC 4.8.4\] on linux 23 | 24 | ## Prerequisites 25 | 26 | In order to fully understand this article, you need to know: 27 | 28 | - The basics of the C programming language 29 | - Some Python 30 | - The very basics of the Linux filesystem and the shell 31 | 32 | ## Virtual Memory 33 | 34 | In computing, virtual memory is a memory management technique that is implemented using both hardware and software. It maps memory addresses used by a program, called virtual addresses, into physical addresses in computer memory. Main storage (as seen by a process or task) appears as a contiguous address space, or collection of contiguous segments. The operating system manages virtual address spaces and the assignment of real memory to virtual memory. Address translation hardware in the CPU, often referred to as a memory management unit or MMU, automatically translates virtual addresses to physical addresses. Software within the operating system may extend these capabilities to provide a virtual address space that can exceed the capacity of real memory and thus reference more memory than is physically present in the computer. 35 | 36 | The primary benefits of virtual memory include freeing applications from having to manage a shared memory space, increased security due to memory isolation, and being able to conceptually use more memory than might be physically available, using the technique of paging. 37 | 38 | You can read more about the virtual memory on [Wikipedia](https://en.wikipedia.org/wiki/Virtual_memory). 
39 | 40 | In the next article, we'll go into more detail and do some fact checking on what lies inside the virtual memory and where. For now, here are some key points you should know before you read on: 41 | 42 | - Each process has its own virtual memory 43 | - The amount of virtual memory depends on your system's architecture 44 | - Each OS handles virtual memory differently, but for most modern operating systems, the virtual memory of a process looks like this: 45 | 46 | ![virtual memory](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/virtual_memory.png) 47 | 48 | In the high memory addresses you can find (this is a non-exhaustive list, there's much more to be found, but that's not today's topic): 49 | 50 | - The command line arguments and environment variables 51 | - The stack, growing "downwards". This may seem counter-intuitive, but this is the way the stack is implemented in virtual memory 52 | 53 | In the low memory addresses you can find: 54 | 55 | - Your executable (it's a little more complicated than that, but this is enough to understand the rest of this article) 56 | - The heap, growing "upwards" 57 | 58 | The heap is a portion of memory that is dynamically allocated (i.e. containing memory allocated using `malloc`). 59 | 60 | Also, keep in mind that **virtual memory is not the same as RAM**. 61 | 62 | ## C program 63 | 64 | Let's start with this simple C program: 65 | 66 | ```c 67 | #include <stdio.h> 68 | #include <stdlib.h> 69 | #include <string.h> 70 | 71 | /** 72 | * main - uses strdup to create a new string, and prints the 73 | * address of the new duplicated string 74 | * 75 | * Return: EXIT_FAILURE if malloc failed. 
Otherwise EXIT_SUCCESS 76 | */ 77 | int main(void) 78 | { 79 | char *s; 80 | 81 | s = strdup("Holberton"); 82 | if (s == NULL) 83 | { 84 | fprintf(stderr, "Can't allocate mem with malloc\n"); 85 | return (EXIT_FAILURE); 86 | } 87 | printf("%p\n", (void *)s); 88 | return (EXIT_SUCCESS); 89 | } 90 | ``` 91 | 92 | ### strdup 93 | 94 | _Take a moment to think before going further. How do you think `strdup` creates a copy of the string "Holberton"? How can you confirm that?_ 95 | 96 | . 97 | 98 | . 99 | 100 | . 101 | 102 | `strdup` has to create a new string, so it first has to reserve space for it. The function `strdup` is probably using `malloc`. A quick look at its man page can confirm: 103 | 104 | ```man 105 | DESCRIPTION 106 | The strdup() function returns a pointer to a new string which is a duplicate of the string s. 107 | Memory for the new string is obtained with malloc(3), and can be freed with free(3). 108 | ``` 109 | 110 | _Take a moment to think before going further. Based on what we said earlier about virtual memory, where do you think the duplicate string will be located? At a high or low memory address?_ 111 | 112 | . 113 | 114 | . 115 | 116 | . 117 | 118 | Probably in the lower addresses (in the heap). Let's compile and run our small C program to test our hypothesis: 119 | 120 | ```shell 121 | julien@holberton:~/holberton/w/hackthevm0$ gcc -Wall -Wextra -pedantic -Werror main.c -o holberton 122 | julien@holberton:~/holberton/w/hackthevm0$ ./holberton 123 | 0x1822010 124 | julien@holberton:~/holberton/w/hackthevm0$ 125 | ``` 126 | 127 | Our duplicated string is located at the address `0x1822010`. Great. But is this a low or a high memory address? 128 | 129 | ### How big is the virtual memory of a process 130 | 131 | The size of the virtual memory of a process depends on your system architecture. In this example I am using a 64-bit machine, so theoretically the size of each process' virtual memory is 2^64 bytes. 
In theory, the highest memory address possible is `0xffffffffffffffff` (1.8446744e+19), and the lowest is `0x0`. 132 | 133 | `0x1822010` is small compared to `0xffffffffffffffff`, so the duplicated string is probably located at a lower memory address. We will be able to confirm this when we look at the `proc` filesystem. 134 | 135 | ## The proc filesystem 136 | 137 | From `man proc`: 138 | 139 | ``` 140 | The proc filesystem is a pseudo-filesystem which provides an interface to kernel data structures. It is commonly mounted at `/proc`. Most of it is read-only, but some files allow kernel variables to be changed. 141 | ``` 142 | 143 | If you list the contents of your `/proc` directory, you will probably see a lot of files. We will focus on two of them: 144 | 145 | - `/proc/[pid]/mem` 146 | - `/proc/[pid]/maps` 147 | 148 | ### mem 149 | 150 | From `man proc`: 151 | 152 | ```man 153 | /proc/[pid]/mem 154 | This file can be used to access the pages of a process's memory 155 | through open(2), read(2), and lseek(2). 156 | ``` 157 | 158 | Awesome! So, can we access and modify the entire virtual memory of any process? 159 | 160 | ### maps 161 | 162 | From `man proc`: 163 | 164 | ```man 165 | /proc/[pid]/maps 166 | A file containing the currently mapped memory regions and their access permissions. 167 | See mmap(2) for some further information about memory mappings. 168 | 169 | The format of the file is: 170 | 171 | address perms offset dev inode pathname 172 | 00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/dbus-daemon 173 | 00651000-00652000 r--p 00051000 08:02 173521 /usr/bin/dbus-daemon 174 | 00652000-00655000 rw-p 00052000 08:02 173521 /usr/bin/dbus-daemon 175 | 00e03000-00e24000 rw-p 00000000 00:00 0 [heap] 176 | 00e24000-011f7000 rw-p 00000000 00:00 0 [heap] 177 | ... 
178 | 35b1800000-35b1820000 r-xp 00000000 08:02 135522 /usr/lib64/ld-2.15.so 179 | 35b1a1f000-35b1a20000 r--p 0001f000 08:02 135522 /usr/lib64/ld-2.15.so 180 | 35b1a20000-35b1a21000 rw-p 00020000 08:02 135522 /usr/lib64/ld-2.15.so 181 | 35b1a21000-35b1a22000 rw-p 00000000 00:00 0 182 | 35b1c00000-35b1dac000 r-xp 00000000 08:02 135870 /usr/lib64/libc-2.15.so 183 | 35b1dac000-35b1fac000 ---p 001ac000 08:02 135870 /usr/lib64/libc-2.15.so 184 | 35b1fac000-35b1fb0000 r--p 001ac000 08:02 135870 /usr/lib64/libc-2.15.so 185 | 35b1fb0000-35b1fb2000 rw-p 001b0000 08:02 135870 /usr/lib64/libc-2.15.so 186 | ... 187 | f2c6ff8c000-7f2c7078c000 rw-p 00000000 00:00 0 [stack:986] 188 | ... 189 | 7fffb2c0d000-7fffb2c2e000 rw-p 00000000 00:00 0 [stack] 190 | 7fffb2d48000-7fffb2d49000 r-xp 00000000 00:00 0 [vdso] 191 | 192 | The address field is the address space in the process that the mapping occupies. 193 | The perms field is a set of permissions: 194 | 195 | r = read 196 | w = write 197 | x = execute 198 | s = shared 199 | p = private (copy on write) 200 | 201 | The offset field is the offset into the file/whatever; 202 | dev is the device (major:minor); inode is the inode on that device. 0 indicates 203 | that no inode is associated with the memory region, 204 | as would be the case with BSS (uninitialized data). 205 | 206 | The pathname field will usually be the file that is backing the mapping. 207 | For ELF files, you can easily coordinate with the offset field 208 | by looking at the Offset field in the ELF program headers (readelf -l). 209 | 210 | There are additional helpful pseudo-paths: 211 | 212 | [stack] 213 | The initial process's (also known as the main thread's) stack. 214 | 215 | [stack:<tid>] (since Linux 3.4) 216 | A thread's stack (where the <tid> is a thread ID). 217 | It corresponds to the /proc/[pid]/task/[tid]/ path. 218 | 219 | [vdso] The virtual dynamically linked shared object. 220 | 221 | [heap] The process's heap. 
222 | 223 | If the pathname field is blank, this is an anonymous mapping as obtained via the mmap(2) function. 224 | There is no easy way to coordinate 225 | this back to a process's source, short of running it through gdb(1), strace(1), or similar. 226 | 227 | Under Linux 2.0 there is no field giving pathname. 228 | ``` 229 | 230 | This means that we can look at the `/proc/[pid]/maps` file to locate the heap of a running process. If we can read from the heap (through `/proc/[pid]/mem`), we can locate the string we want to modify. And if we can write to the heap, we can replace this string with whatever we want. 231 | 232 | ### pid 233 | 234 | A process is an instance of a program, with a unique process ID. This process ID (PID) is used by many functions and system calls to interact with and manipulate processes. 235 | 236 | We can use the program `ps` to get the PID of a running process (`man ps`). 237 | 238 | ## C program 239 | 240 | We now have everything we need to write a script or program that finds a string in the heap of a running process and then replaces it with another string (of the same length or shorter). We will work with the following simple program that loops forever and prints a "strduplicated" string. 241 | 242 | ```c 243 | #include <stdio.h> 244 | #include <stdlib.h> 245 | #include <string.h> 246 | #include <unistd.h> 247 | 248 | /** 249 | * main - uses strdup to create a new string, loops forever-ever 250 | * 251 | * Return: EXIT_FAILURE if malloc failed. Otherwise never returns 252 | */ 253 | int main(void) 254 | { 255 | char *s; 256 | unsigned long int i; 257 | 258 | s = strdup("Holberton"); 259 | if (s == NULL) 260 | { 261 | fprintf(stderr, "Can't allocate mem with malloc\n"); 262 | return (EXIT_FAILURE); 263 | } 264 | i = 0; 265 | while (s) 266 | { 267 | printf("[%lu] %s (%p)\n", i, s, (void *)s); 268 | sleep(1); 269 | i++; 270 | } 271 | return (EXIT_SUCCESS); 272 | } 273 | ``` 274 | 275 | Compiling and running the above source code should give you this output, and loop indefinitely until you kill the process. 
276 | 277 | ``` 278 | julien@holberton:~/holberton/w/hackthevm0$ gcc -Wall -Wextra -pedantic -Werror loop.c -o loop 279 | julien@holberton:~/holberton/w/hackthevm0$ ./loop 280 | [0] Holberton (0xfbd010) 281 | [1] Holberton (0xfbd010) 282 | [2] Holberton (0xfbd010) 283 | [3] Holberton (0xfbd010) 284 | [4] Holberton (0xfbd010) 285 | [5] Holberton (0xfbd010) 286 | [6] Holberton (0xfbd010) 287 | [7] Holberton (0xfbd010) 288 | ... 289 | ``` 290 | 291 | _If you would like, pause the reading now and try to write a script or program that finds a string in the heap of a running process before reading further._ 292 | 293 | . 294 | 295 | . 296 | 297 | . 298 | 299 | ### looking at /proc 300 | 301 | Let's run our `loop` program. 302 | 303 | ``` 304 | julien@holberton:~/holberton/w/hackthevm0$ ./loop 305 | [0] Holberton (0x10ff010) 306 | [1] Holberton (0x10ff010) 307 | [2] Holberton (0x10ff010) 308 | [3] Holberton (0x10ff010) 309 | ... 310 | ``` 311 | 312 | The first thing we need to find is the PID of the process. 313 | 314 | ```shell 315 | julien@holberton:~/holberton/w/hackthevm0$ ps aux | grep ./loop | grep -v grep 316 | julien 4618 0.0 0.0 4332 732 pts/14 S+ 17:06 0:00 ./loop 317 | ``` 318 | 319 | In the above example, the PID is 4618 (it will be different each time we run it, and it is probably a different number if you are trying this on your own computer). As a result, the `maps` and `mem` files we want to look at are located in the `/proc/4618` directory: 320 | 321 | - `/proc/4618/maps` 322 | - `/proc/4618/mem` 323 | 324 | A quick `ls -la` in the directory should give you something like this: 325 | 326 | ```shell 327 | julien@ubuntu:/proc/4618$ ls -la 328 | total 0 329 | dr-xr-xr-x 9 julien julien 0 Mar 15 17:07 . 330 | dr-xr-xr-x 257 root root 0 Mar 15 10:20 .. 
331 | dr-xr-xr-x 2 julien julien 0 Mar 15 17:11 attr 332 | -rw-r--r-- 1 julien julien 0 Mar 15 17:11 autogroup 333 | -r-------- 1 julien julien 0 Mar 15 17:11 auxv 334 | -r--r--r-- 1 julien julien 0 Mar 15 17:11 cgroup 335 | --w------- 1 julien julien 0 Mar 15 17:11 clear_refs 336 | -r--r--r-- 1 julien julien 0 Mar 15 17:07 cmdline 337 | -rw-r--r-- 1 julien julien 0 Mar 15 17:11 comm 338 | -rw-r--r-- 1 julien julien 0 Mar 15 17:11 coredump_filter 339 | -r--r--r-- 1 julien julien 0 Mar 15 17:11 cpuset 340 | lrwxrwxrwx 1 julien julien 0 Mar 15 17:11 cwd -> /home/julien/holberton/w/funwthevm 341 | -r-------- 1 julien julien 0 Mar 15 17:11 environ 342 | lrwxrwxrwx 1 julien julien 0 Mar 15 17:11 exe -> /home/julien/holberton/w/funwthevm/loop 343 | dr-x------ 2 julien julien 0 Mar 15 17:07 fd 344 | dr-x------ 2 julien julien 0 Mar 15 17:11 fdinfo 345 | -rw-r--r-- 1 julien julien 0 Mar 15 17:11 gid_map 346 | -r-------- 1 julien julien 0 Mar 15 17:11 io 347 | -r--r--r-- 1 julien julien 0 Mar 15 17:11 limits 348 | -rw-r--r-- 1 julien julien 0 Mar 15 17:11 loginuid 349 | dr-x------ 2 julien julien 0 Mar 15 17:11 map_files 350 | -r--r--r-- 1 julien julien 0 Mar 15 17:11 maps 351 | -rw------- 1 julien julien 0 Mar 15 17:11 mem 352 | -r--r--r-- 1 julien julien 0 Mar 15 17:11 mountinfo 353 | -r--r--r-- 1 julien julien 0 Mar 15 17:11 mounts 354 | -r-------- 1 julien julien 0 Mar 15 17:11 mountstats 355 | dr-xr-xr-x 5 julien julien 0 Mar 15 17:11 net 356 | dr-x--x--x 2 julien julien 0 Mar 15 17:11 ns 357 | -r--r--r-- 1 julien julien 0 Mar 15 17:11 numa_maps 358 | -rw-r--r-- 1 julien julien 0 Mar 15 17:11 oom_adj 359 | -r--r--r-- 1 julien julien 0 Mar 15 17:11 oom_score 360 | -rw-r--r-- 1 julien julien 0 Mar 15 17:11 oom_score_adj 361 | -r-------- 1 julien julien 0 Mar 15 17:11 pagemap 362 | -r-------- 1 julien julien 0 Mar 15 17:11 personality 363 | -rw-r--r-- 1 julien julien 0 Mar 15 17:11 projid_map 364 | lrwxrwxrwx 1 julien julien 0 Mar 15 17:11 root -> / 365 | -rw-r--r-- 1 
julien julien 0 Mar 15 17:11 sched 366 | -r--r--r-- 1 julien julien 0 Mar 15 17:11 schedstat 367 | -r--r--r-- 1 julien julien 0 Mar 15 17:11 sessionid 368 | -rw-r--r-- 1 julien julien 0 Mar 15 17:11 setgroups 369 | -r--r--r-- 1 julien julien 0 Mar 15 17:11 smaps 370 | -r-------- 1 julien julien 0 Mar 15 17:11 stack 371 | -r--r--r-- 1 julien julien 0 Mar 15 17:07 stat 372 | -r--r--r-- 1 julien julien 0 Mar 15 17:11 statm 373 | -r--r--r-- 1 julien julien 0 Mar 15 17:07 status 374 | -r-------- 1 julien julien 0 Mar 15 17:11 syscall 375 | dr-xr-xr-x 3 julien julien 0 Mar 15 17:11 task 376 | -r--r--r-- 1 julien julien 0 Mar 15 17:11 timers 377 | -rw-r--r-- 1 julien julien 0 Mar 15 17:11 uid_map 378 | -r--r--r-- 1 julien julien 0 Mar 15 17:11 wchan 379 | ``` 380 | 381 | ### /proc/pid/maps 382 | 383 | As we have seen earlier, the `/proc/pid/maps` file is a text file, so we can directly read it. The content of the `maps` file of our process looks like this: 384 | 385 | ```shell 386 | julien@ubuntu:/proc/4618$ cat maps 387 | 00400000-00401000 r-xp 00000000 08:01 1070052 /home/julien/holberton/w/funwthevm/loop 388 | 00600000-00601000 r--p 00000000 08:01 1070052 /home/julien/holberton/w/funwthevm/loop 389 | 00601000-00602000 rw-p 00001000 08:01 1070052 /home/julien/holberton/w/funwthevm/loop 390 | 010ff000-01120000 rw-p 00000000 00:00 0 [heap] 391 | 7f144c052000-7f144c20c000 r-xp 00000000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so 392 | 7f144c20c000-7f144c40c000 ---p 001ba000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so 393 | 7f144c40c000-7f144c410000 r--p 001ba000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so 394 | 7f144c410000-7f144c412000 rw-p 001be000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so 395 | 7f144c412000-7f144c417000 rw-p 00000000 00:00 0 396 | 7f144c417000-7f144c43a000 r-xp 00000000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so 397 | 7f144c61e000-7f144c621000 rw-p 00000000 00:00 0 398 | 7f144c636000-7f144c639000 rw-p 00000000 00:00 0 399 | 
7f144c639000-7f144c63a000 r--p 00022000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so 400 | 7f144c63a000-7f144c63b000 rw-p 00023000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so 401 | 7f144c63b000-7f144c63c000 rw-p 00000000 00:00 0 402 | 7ffc94272000-7ffc94293000 rw-p 00000000 00:00 0 [stack] 403 | 7ffc9435e000-7ffc94360000 r--p 00000000 00:00 0 [vvar] 404 | 7ffc94360000-7ffc94362000 r-xp 00000000 00:00 0 [vdso] 405 | ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] 406 | ``` 407 | 408 | Circling back to what we said earlier, we can see that the stack (`[stack]`) is located in high memory addresses and the heap (`[heap]`) in the lower memory addresses. 409 | 410 | ### [heap] 411 | 412 | Using the `maps` file, we can find all the information we need to locate our string: 413 | 414 | ``` 415 | 010ff000-01120000 rw-p 00000000 00:00 0 [heap] 416 | ``` 417 | 418 | The heap: 419 | 420 | - Starts at address `0x010ff000` in the virtual memory of the process 421 | - Ends at memory address: `0x01120000` 422 | - Is readable and writable (`rw`) 423 | 424 | A quick look back at our (still running) `loop` program: 425 | 426 | ``` 427 | ... 428 | [1024] Holberton (0x10ff010) 429 | ... 430 | ``` 431 | 432 | -> `0x010ff000` < `0x10ff010` < `0x01120000`. This confirms that our string is located in the heap. More precisely, it is located at index `0x10` of the heap. If we open the `/proc/pid/mem` file (in this example `/proc/4618/mem`) and seek to the memory address `0x10ff010`, we can write to the heap of the running process, overwriting the "Holberton" string! 433 | 434 | Let's write a script or program that does just that. Choose your favorite language and let's do it! 435 | 436 | _If you would like, stop reading now and try to write a script or program that finds a string in the heap of a running process, before reading further. The next paragraph will give away the source code of the answer!_ 437 | 438 | . 439 | 440 | . 441 | 442 | . 
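The address check above can be sketched in a few lines of Python. This is only an illustration: it hardcodes the `[heap]` line and the string address from this article's example run instead of reading `/proc`, and the variable names (`heap_line`, `string_addr`, and so on) are made up for the sketch.

```python
# Sketch: check that the string address lies inside the [heap] mapping
# and compute its index within the heap.
# Both values below are copied from the example run in this article.
heap_line = "010ff000-01120000 rw-p 00000000 00:00 0 [heap]"
string_addr = 0x10ff010  # address printed by ./loop

fields = heap_line.split()
start_hex, end_hex = fields[0].split("-")
heap_start = int(start_hex, 16)  # addresses in maps are hexadecimal
heap_end = int(end_hex, 16)
perms = fields[1]

in_heap = heap_start <= string_addr < heap_end
offset = string_addr - heap_start

print("in heap:", in_heap)
print("index in heap: 0x{:x}".format(offset))
print("writable:", "w" in perms)
```

With these values the string sits at index `0x10` of a writable mapping, which is what makes overwriting it through `/proc/4618/mem` possible.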
443 | 444 | ### Overwriting the string in the virtual memory 445 | 446 | We'll be using Python 3 for writing the script, but you could write this in any language. Here is the code: 447 | 448 | ```python 449 | #!/usr/bin/env python3 450 | ''' 451 | Locates and replaces the first occurrence of a string in the heap 452 | of a process 453 | 454 | Usage: ./read_write_heap.py PID search_string replace_by_string 455 | Where: 456 | - PID is the pid of the target process 457 | - search_string is the ASCII string you are looking to overwrite 458 | - replace_by_string is the ASCII string you want to replace 459 | search_string with 460 | ''' 461 | 462 | import sys 463 | 464 | def print_usage_and_exit(): 465 | print('Usage: {} pid search write'.format(sys.argv[0])) 466 | sys.exit(1) 467 | 468 | # check usage 469 | if len(sys.argv) != 4: 470 | print_usage_and_exit() 471 | 472 | # get the pid from args 473 | pid = int(sys.argv[1]) 474 | if pid <= 0: 475 | print_usage_and_exit() 476 | search_string = str(sys.argv[2]) 477 | if search_string == "": 478 | print_usage_and_exit() 479 | write_string = str(sys.argv[3]) 480 | if write_string == "": 481 | print_usage_and_exit() 482 | 483 | # open the maps and mem files of the process 484 | maps_filename = "/proc/{}/maps".format(pid) 485 | print("[*] maps: {}".format(maps_filename)) 486 | mem_filename = "/proc/{}/mem".format(pid) 487 | print("[*] mem: {}".format(mem_filename)) 488 | 489 | # try opening the maps file 490 | try: 491 | maps_file = open('/proc/{}/maps'.format(pid), 'r') 492 | except IOError as e: 493 | print("[ERROR] Can not open file {}:".format(maps_filename)) 494 | print(" I/O error({}): {}".format(e.errno, e.strerror)) 495 | sys.exit(1) 496 | 497 | for line in maps_file: 498 | sline = line.split(' ') 499 | # check if we found the heap 500 | if sline[-1][:-1] != "[heap]": 501 | continue 502 | print("[*] Found [heap]:") 503 | 504 | # parse line 505 | addr = sline[0] 506 | perm = sline[1] 507 | offset = sline[2] 508 | device 
= sline[3] 509 | inode = sline[4] 510 | pathname = sline[-1][:-1] 511 | print("\tpathname = {}".format(pathname)) 512 | print("\taddresses = {}".format(addr)) 513 | print("\tpermisions = {}".format(perm)) 514 | print("\toffset = {}".format(offset)) 515 | print("\tinode = {}".format(inode)) 516 | 517 | # check if there is read and write permission 518 | if perm[0] != 'r' or perm[1] != 'w': 519 | print("[*] {} does not have read/write permission".format(pathname)) 520 | maps_file.close() 521 | exit(0) 522 | 523 | # get start and end of the heap in the virtual memory 524 | addr = addr.split("-") 525 | if len(addr) != 2: # never trust anyone, not even your OS :) 526 | print("[*] Wrong addr format") 527 | maps_file.close() 528 | exit(1) 529 | addr_start = int(addr[0], 16) 530 | addr_end = int(addr[1], 16) 531 | print("\tAddr start [{:x}] | end [{:x}]".format(addr_start, addr_end)) 532 | 533 | # open and read mem 534 | try: 535 | mem_file = open(mem_filename, 'rb+') 536 | except IOError as e: 537 | print("[ERROR] Can not open file {}:".format(mem_filename)) 538 | print(" I/O error({}): {}".format(e.errno, e.strerror)) 539 | maps_file.close() 540 | exit(1) 541 | 542 | # read heap 543 | mem_file.seek(addr_start) 544 | heap = mem_file.read(addr_end - addr_start) 545 | 546 | # find string 547 | try: 548 | i = heap.index(bytes(search_string, "ASCII")) 549 | except Exception: 550 | print("Can't find '{}'".format(search_string)) 551 | maps_file.close() 552 | mem_file.close() 553 | exit(0) 554 | print("[*] Found '{}' at {:x}".format(search_string, i)) 555 | 556 | # write the new string 557 | print("[*] Writing '{}' at {:x}".format(write_string, addr_start + i)) 558 | mem_file.seek(addr_start + i) 559 | mem_file.write(bytes(write_string, "ASCII")) 560 | 561 | # close files 562 | maps_file.close() 563 | mem_file.close() 564 | 565 | # there is only one heap in our example 566 | break 567 | 568 | ``` 569 | 570 | Note: You will need to run this script as root, otherwise you won't be 
able to read or write to the `/proc/pid/mem` file, even if you are the owner of the process. 571 | 572 | #### Running the script 573 | 574 | ``` 575 | julien@holberton:~/holberton/w/hackthevm0$ sudo ./read_write_heap.py 4618 Holberton "Fun w vm!" 576 | [*] maps: /proc/4618/maps 577 | [*] mem: /proc/4618/mem 578 | [*] Found [heap]: 579 | pathname = [heap] 580 | addresses = 010ff000-01120000 581 | permisions = rw-p 582 | offset = 00000000 583 | inode = 0 584 | Addr start [10ff000] | end [1120000] 585 | [*] Found 'Holberton' at 10 586 | [*] Writing 'Fun w vm!' at 10ff010 587 | julien@holberton:~/holberton/w/hackthevm0$ 588 | ``` 589 | 590 | Note that this address corresponds to the one we found manually: 591 | 592 | - The heap lies from addresses `0x010ff000` to `0x01120000` in the virtual memory of the running process 593 | - Our string is at index `0x10` in the heap, so at the memory address `0x10ff010` 594 | 595 | If we go back to our `loop` program, it should now print "Fun w vm!" 596 | 597 | ``` 598 | ... 599 | [2676] Holberton (0x10ff010) 600 | [2677] Holberton (0x10ff010) 601 | [2678] Holberton (0x10ff010) 602 | [2679] Holberton (0x10ff010) 603 | [2680] Holberton (0x10ff010) 604 | [2681] Holberton (0x10ff010) 605 | [2682] Fun w vm! (0x10ff010) 606 | [2683] Fun w vm! (0x10ff010) 607 | [2684] Fun w vm! (0x10ff010) 608 | [2685] Fun w vm! (0x10ff010) 609 | ... 610 | ``` 611 | 612 | ![hack the virtual memory of a process: mind blowing !](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/blown-mind-explosion-gif.gif) 613 | 614 | ## Outro 615 | 616 | ### Questions? Feedback? 617 | 618 | If you have questions or feedback don't hesitate to ping us on Twitter at [@holbertonschool](https://twitter.com/holbertonschool) or [@julienbarbier42](https://twitter.com/julienbarbier42). 619 | _Haters, please send your comments to `/dev/null`._ 620 | 621 | Happy Hacking! 622 | 623 | ### Thank you for reading! 
624 | 625 | As always, no one is perfect (except [Chuck](http://codesqueeze.com/the-ultimate-top-25-chuck-norris-the-programmer-jokes/) of course), so don't hesitate to [contribute](https://github.com/holbertonschool/Hack-The-Virtual-Memory) or send me your comments. 626 | 627 | ### Files 628 | 629 | [This repo](https://github.com/holbertonschool/Hack-The-Virtual-Memory/tree/master/00.%20C%20strings%20%26%20the%20proc%20filesystem) contains the source code for all programs shown in this tutorial: 630 | 631 | - `main.c`: the first C program that prints the location of the string and exits 632 | - `loop.c`: the second C program that loops indefinitely 633 | - `read_write_heap.py`: the script used to modify the string in the running C program 634 | 635 | ### What's next? 636 | 637 | In the next piece we'll do almost the same thing, but instead we'll access the memory of a running Python 3 script. It won't be that straightforward. We'll take this as an excuse to look at some Python 3 internals. If you are curious, try to do it yourself, and find out why the above `read_write_heap.py` script won't work to modify a Python 3 ASCII string. 638 | 639 | See you next time and Happy Hacking! 640 | 641 | _Many thanks to [Kristine](https://twitter.com/codechick1), [Tim](https://twitter.com/wintermanc3r) for English proof-reading & [Guillaume](https://twitter.com/guillaumesalva) for PEP8 proof-reading :)_ 642 | -------------------------------------------------------------------------------- /00. C strings & the proc filesystem/loop.c: -------------------------------------------------------------------------------- 1 | #include <stdio.h> 2 | #include <stdlib.h> 3 | #include <string.h> 4 | #include <unistd.h> 5 | 6 | /** 7 | * main - uses strdup to create a new string, loops forever-ever 8 | * 9 | * Return: EXIT_FAILURE if malloc failed. 
Otherwise never returns 10 | */ 11 | int main(void) 12 | { 13 | char *s; 14 | unsigned long int i; 15 | 16 | s = strdup("Holberton"); 17 | if (s == NULL) 18 | { 19 | fprintf(stderr, "Can't allocate mem with malloc\n"); 20 | return (EXIT_FAILURE); 21 | } 22 | i = 0; 23 | while (s) 24 | { 25 | printf("[%lu] %s (%p)\n", i, s, (void *)s); 26 | sleep(1); 27 | i++; 28 | } 29 | return (EXIT_SUCCESS); 30 | } 31 | -------------------------------------------------------------------------------- /00. C strings & the proc filesystem/main.c: -------------------------------------------------------------------------------- 1 | #include <stdio.h> 2 | #include <stdlib.h> 3 | #include <string.h> 4 | 5 | /** 6 | * main - uses strdup to create a new string, and prints the 7 | * address of the new duplicated string 8 | * 9 | * Return: EXIT_FAILURE if malloc failed. Otherwise EXIT_SUCCESS 10 | */ 11 | int main(void) 12 | { 13 | char *s; 14 | 15 | s = strdup("Holberton"); 16 | if (s == NULL) 17 | { 18 | fprintf(stderr, "Can't allocate mem with malloc\n"); 19 | return (EXIT_FAILURE); 20 | } 21 | printf("%p\n", (void *)s); 22 | return (EXIT_SUCCESS); 23 | } 24 | -------------------------------------------------------------------------------- /00. 
C strings & the proc filesystem/read_write_heap.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | ''' 3 | Locates and replaces the first occurrence of a string in the heap 4 | of a process 5 | 6 | Usage: ./read_write_heap.py PID search_string replace_by_string 7 | Where: 8 | - PID is the pid of the target process 9 | - search_string is the ASCII string you are looking to overwrite 10 | - replace_by_string is the ASCII string you want to replace 11 | search_string with 12 | ''' 13 | 14 | import sys 15 | 16 | 17 | def print_usage_and_exit(): 18 | print('Usage: {} pid search write'.format(sys.argv[0])) 19 | sys.exit(1) 20 | 21 | # check usage 22 | if len(sys.argv) != 4: 23 | print_usage_and_exit() 24 | 25 | # get the pid from args 26 | pid = int(sys.argv[1]) 27 | if pid <= 0: 28 | print_usage_and_exit() 29 | search_string = str(sys.argv[2]) 30 | if search_string == "": 31 | print_usage_and_exit() 32 | write_string = str(sys.argv[3]) 33 | if write_string == "": 34 | print_usage_and_exit() 35 | 36 | # open the maps and mem files of the process 37 | maps_filename = "/proc/{}/maps".format(pid) 38 | print("[*] maps: {}".format(maps_filename)) 39 | mem_filename = "/proc/{}/mem".format(pid) 40 | print("[*] mem: {}".format(mem_filename)) 41 | 42 | # try opening the maps file 43 | try: 44 | maps_file = open('/proc/{}/maps'.format(pid), 'r') 45 | except IOError as e: 46 | print("[ERROR] Can not open file {}:".format(maps_filename)) 47 | print(" I/O error({}): {}".format(e.errno, e.strerror)) 48 | sys.exit(1) 49 | 50 | for line in maps_file: 51 | sline = line.split(' ') 52 | # check if we found the heap 53 | if sline[-1][:-1] != "[heap]": 54 | continue 55 | print("[*] Found [heap]:") 56 | 57 | # parse line 58 | addr = sline[0] 59 | perm = sline[1] 60 | offset = sline[2] 61 | device = sline[3] 62 | inode = sline[4] 63 | pathname = sline[-1][:-1] 64 | print("\tpathname = {}".format(pathname)) 65 | print("\taddresses 
= {}".format(addr)) 66 | print("\tpermisions = {}".format(perm)) 67 | print("\toffset = {}".format(offset)) 68 | print("\tinode = {}".format(inode)) 69 | 70 | # check if there is read and write permission 71 | if perm[0] != 'r' or perm[1] != 'w': 72 | print("[*] {} does not have read/write permission".format(pathname)) 73 | maps_file.close() 74 | exit(0) 75 | 76 | # get start and end of the heap in the virtual memory 77 | addr = addr.split("-") 78 | if len(addr) != 2: # never trust anyone, not even your OS :) 79 | print("[*] Wrong addr format") 80 | maps_file.close() 81 | exit(1) 82 | addr_start = int(addr[0], 16) 83 | addr_end = int(addr[1], 16) 84 | print("\tAddr start [{:x}] | end [{:x}]".format(addr_start, addr_end)) 85 | 86 | # open and read mem 87 | try: 88 | mem_file = open(mem_filename, 'rb+') 89 | except IOError as e: 90 | print("[ERROR] Can not open file {}:".format(mem_filename)) 91 | print(" I/O error({}): {}".format(e.errno, e.strerror)) 92 | maps_file.close() 93 | exit(1) 94 | 95 | # read heap 96 | mem_file.seek(addr_start) 97 | heap = mem_file.read(addr_end - addr_start) 98 | 99 | # find string 100 | try: 101 | i = heap.index(bytes(search_string, "ASCII")) 102 | except Exception: 103 | print("Can't find '{}'".format(search_string)) 104 | maps_file.close() 105 | mem_file.close() 106 | exit(0) 107 | print("[*] Found '{}' at {:x}".format(search_string, i)) 108 | 109 | # write the new string 110 | print("[*] Writing '{}' at {:x}".format(write_string, addr_start + i)) 111 | mem_file.seek(addr_start + i) 112 | mem_file.write(bytes(write_string, "ASCII")) 113 | 114 | # close files 115 | maps_file.close() 116 | mem_file.close() 117 | 118 | # there is only one heap in our example 119 | break 120 | -------------------------------------------------------------------------------- /01. 
Python bytes/.gitignore: -------------------------------------------------------------------------------- 1 | libPython.so 2 | Makefile 3 | -------------------------------------------------------------------------------- /01. Python bytes/README.md: -------------------------------------------------------------------------------- 1 | ![hack the vm!](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/hacke_the_vm_1.png) 2 | 3 | ## Hack The Virtual Memory, Chapter 1: Python bytes 4 | 5 | For this second chapter, we'll do almost the same thing as for [chapter 0: C strings & /proc](https://blog.holbertonschool.com/hack-the-virtual-memory-c-strings-proc/), but instead we'll access the virtual memory of a running Python 3 script. It won't be as straightforward. 6 | Let's take this as an excuse to look at some Python 3 internals! 7 | 8 | ## Prerequisites 9 | 10 | _This article is based on everything we learned in the previous chapter. Please read (and understand) [chapter 0: C strings & /proc](https://blog.holbertonschool.com/hack-the-virtual-memory-c-strings-proc/) before reading this one._ 11 | 12 | In order to fully understand this article, you need to know: 13 | 14 | - The basics of the C programming language 15 | - Some Python 16 | - The very basics of the Linux filesystem and the shell 17 | - The very basics of the `/proc` filesystem (see [chapter 0: C strings & /proc](https://blog.holbertonschool.com/hack-the-virtual-memory-c-strings-proc/) for an intro on this topic) 18 | 19 | ## Environment 20 | 21 | All scripts and programs have been tested on the following system: 22 | 23 | - Ubuntu 24 | - Linux ubuntu 4.4.0-31-generic #50~14.04.1-Ubuntu SMP Wed Jul 13 01:07:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux 25 | - gcc 26 | - gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4 27 | - Python 3: 28 | - Python 3.4.3 (default, Nov 17 2016, 01:08:31) 29 | - \[GCC 4.8.4\] on linux 30 | 31 | ## Python script 32 | 33 | We'll first use this script (`main.py`) and try to modify the 
"string" `Holberton` in the virtual memory of the process running it. 34 | 35 | ```python 36 | #!/usr/bin/env python3 37 | ''' 38 | Prints a b"string" (bytes object), reads a char from stdin 39 | and prints the same (or not :)) string again 40 | ''' 41 | 42 | import sys 43 | 44 | s = b"Holberton" 45 | print(s) 46 | sys.stdin.read(1) 47 | print(s) 48 | ``` 49 | 50 | ## About the bytes object 51 | 52 | ### bytes vs str 53 | 54 | As you can see, we are using a bytes object (we use a `b` in front of our string literal) to store our string. This type will store the characters of the string as bytes (vs potentially multibytes - you can read the `unicodeobject.h` to learn more about how Python 3 encodes strings). This ensures that the string will be a succession of ASCII-values in the virtual memory of the process running the script. 55 | 56 | _Technically `s` is not a Python string_ (but it doesn't matter in our context): 57 | 58 | ```shell 59 | julien@holberton:~/holberton/w/hackthevm1$ python3 60 | Python 3.4.3 (default, Nov 17 2016, 01:08:31) 61 | [GCC 4.8.4] on linux 62 | Type "help", "copyright", "credits" or "license" for more information. 63 | >>> s = "Betty" 64 | >>> type(s) 65 | 66 | >>> s = b"Betty" 67 | >>> type(s) 68 | 69 | >>> quit() 70 | ``` 71 | 72 | ### Everything is an object 73 | 74 | Everything in Python is an object: integers, strings, bytes, functions, everything. So the line `s = b"Holberton"` should create an object of type `bytes`, and store the string `b"Holberton` somewhere in memory. Probably in the heap since it has to reserve space for the object and the bytes referenced by or stored in the object (at this point we don't know about the exact implementation). 
75 | 76 | ## Running `read_write_heap.py` against the Python script 77 | 78 | _Note: `read_write_heap.py` is a script we wrote in the previous chapter, [chapter 0: C strings & /proc](https://blog.holbertonschool.com/hack-the-virtual-memory-c-strings-proc/)_ 79 | 80 | Let's run the above script and then run our `read_write_heap.py` script: 81 | 82 | ```shell 83 | julien@holberton:~/holberton/w/hackthevm1$ ./main.py 84 | b'Holberton' 85 | 86 | ``` 87 | 88 | At this point `main.py` is waiting for the user to hit `Enter`. That corresponds to the line `sys.stdin.read(1)` in our code. 89 | Let's run `read_write_heap.py`: 90 | 91 | ```shell 92 | julien@holberton:~/holberton/w/hackthevm1$ ps aux | grep main.py | grep -v grep 93 | julien 3929 0.0 0.7 31412 7848 pts/0 S+ 15:10 0:00 python3 ./main.py 94 | julien@holberton:~/holberton/w/hackthevm1$ sudo ./read_write_heap.py 3929 Holberton "~ Betty ~" 95 | [*] maps: /proc/3929/maps 96 | [*] mem: /proc/3929/mem 97 | [*] Found [heap]: 98 | pathname = [heap] 99 | addresses = 022dc000-023c6000 100 | permisions = rw-p 101 | offset = 00000000 102 | inode = 0 103 | Addr start [22dc000] | end [23c6000] 104 | [*] Found 'Holberton' at 8e192 105 | [*] Writing '~ Betty ~' at 236a192 106 | julien@holberton:~/holberton/w/hackthevm1$ 107 | ``` 108 | 109 | Easy! As expected, we found the string on the heap and replaced it. Now, when we hit `Enter` in the `main.py` script, it will print `b'~ Betty ~'`: 110 | 111 | ```shell 112 | 113 | b'Holberton' 114 | julien@holberton:~/holberton/w/hackthevm1$ 115 | ``` 116 | 117 | Wait, WAT?! 118 | 119 | ![WAT!](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/giphy-4.gif) 120 | 121 | We found the string "Holberton" and replaced it, but it was not the correct string? 122 | Before we go down the rabbit hole, we have one more thing to check. Our script stops when it finds the first occurrence of the string. 
Let's run it several times to see if there are more occurrences of the same string in the heap. 123 | 124 | ```shell 125 | julien@holberton:~/holberton/w/hackthevm1$ ./main.py 126 | b'Holberton' 127 | 128 | ``` 129 | 130 | ```shell 131 | 132 | julien@holberton:~/holberton/w/hackthevm1$ ps aux | grep main.py | grep -v grep 133 | julien 4051 0.1 0.7 31412 7832 pts/0 S+ 15:53 0:00 python3 ./main.py 134 | julien@holberton:~/holberton/w/hackthevm1$ sudo ./read_write_heap.py 4051 Holberton "~ Betty ~" 135 | [*] maps: /proc/4051/maps 136 | [*] mem: /proc/4051/mem 137 | [*] Found [heap]: 138 | pathname = [heap] 139 | addresses = 00bf4000-00cde000 140 | permisions = rw-p 141 | offset = 00000000 142 | inode = 0 143 | Addr start [bf4000] | end [cde000] 144 | [*] Found 'Holberton' at 8e162 145 | [*] Writing '~ Betty ~' at c82162 146 | julien@holberton:~/holberton/w/hackthevm1$ sudo ./read_write_heap.py 4051 Holberton "~ Betty ~" 147 | [*] maps: /proc/4051/maps 148 | [*] mem: /proc/4051/mem 149 | [*] Found [heap]: 150 | pathname = [heap] 151 | addresses = 00bf4000-00cde000 152 | permisions = rw-p 153 | offset = 00000000 154 | inode = 0 155 | Addr start [bf4000] | end [cde000] 156 | Can't find 'Holberton' 157 | julien@holberton:~/holberton/w/hackthevm1$ 158 | ``` 159 | 160 | Only one occurrence. So where is the string "Holberton" that is used by the script? Where is our Python bytes object in memory? Could it be in the stack? 
Let's replace "[heap]" with "[stack]"\* in our `read_write_heap.py` script to create `read_write_stack.py`: 161 | 162 | (*) _see [previous article](https://blog.holbertonschool.com/hack-the-virtual-memory-c-strings-proc/), the stack region is called "[stack]" in the `/proc/[pid]/maps` file_ 163 | 164 | ```python 165 | #!/usr/bin/env python3 166 | ''' 167 | Locates and replaces the first occurrence of a string in the stack 168 | of a process 169 | 170 | Usage: ./read_write_stack.py PID search_string replace_by_string 171 | Where: 172 | - PID is the pid of the target process 173 | - search_string is the ASCII string you are looking to overwrite 174 | - replace_by_string is the ASCII string you want to replace 175 | search_string with 176 | ''' 177 | 178 | import sys 179 | 180 | def print_usage_and_exit(): 181 | print('Usage: {} pid search write'.format(sys.argv[0])) 182 | sys.exit(1) 183 | 184 | # check usage 185 | if len(sys.argv) != 4: 186 | print_usage_and_exit() 187 | 188 | # get the pid from args 189 | pid = int(sys.argv[1]) 190 | if pid <= 0: 191 | print_usage_and_exit() 192 | search_string = str(sys.argv[2]) 193 | if search_string == "": 194 | print_usage_and_exit() 195 | write_string = str(sys.argv[3]) 196 | if write_string == "": 197 | print_usage_and_exit() 198 | 199 | # open the maps and mem files of the process 200 | maps_filename = "/proc/{}/maps".format(pid) 201 | print("[*] maps: {}".format(maps_filename)) 202 | mem_filename = "/proc/{}/mem".format(pid) 203 | print("[*] mem: {}".format(mem_filename)) 204 | 205 | # try opening the maps file 206 | try: 207 | maps_file = open('/proc/{}/maps'.format(pid), 'r') 208 | except IOError as e: 209 | print("[ERROR] Can not open file {}:".format(maps_filename)) 210 | print(" I/O error({}): {}".format(e.errno, e.strerror)) 211 | sys.exit(1) 212 | 213 | for line in maps_file: 214 | sline = line.split(' ') 215 | # check if we found the stack 216 | if sline[-1][:-1] != "[stack]": 217 | continue 218 | print("[*] 
Found [stack]:") 219 | 220 | # parse line 221 | addr = sline[0] 222 | perm = sline[1] 223 | offset = sline[2] 224 | device = sline[3] 225 | inode = sline[4] 226 | pathname = sline[-1][:-1] 227 | print("\tpathname = {}".format(pathname)) 228 | print("\taddresses = {}".format(addr)) 229 | print("\tpermisions = {}".format(perm)) 230 | print("\toffset = {}".format(offset)) 231 | print("\tinode = {}".format(inode)) 232 | 233 | # check if there is read and write permission 234 | if perm[0] != 'r' or perm[1] != 'w': 235 | print("[*] {} does not have read/write permission".format(pathname)) 236 | maps_file.close() 237 | exit(0) 238 | 239 | # get start and end of the stack in the virtual memory 240 | addr = addr.split("-") 241 | if len(addr) != 2: # never trust anyone, not even your OS :) 242 | print("[*] Wrong addr format") 243 | maps_file.close() 244 | exit(1) 245 | addr_start = int(addr[0], 16) 246 | addr_end = int(addr[1], 16) 247 | print("\tAddr start [{:x}] | end [{:x}]".format(addr_start, addr_end)) 248 | 249 | # open and read mem 250 | try: 251 | mem_file = open(mem_filename, 'rb+') 252 | except IOError as e: 253 | print("[ERROR] Can not open file {}:".format(mem_filename)) 254 | print(" I/O error({}): {}".format(e.errno, e.strerror)) 255 | maps_file.close() 256 | exit(1) 257 | 258 | # read stack 259 | mem_file.seek(addr_start) 260 | stack = mem_file.read(addr_end - addr_start) 261 | 262 | # find string 263 | try: 264 | i = stack.index(bytes(search_string, "ASCII")) 265 | except Exception: 266 | print("Can't find '{}'".format(search_string)) 267 | maps_file.close() 268 | mem_file.close() 269 | exit(0) 270 | print("[*] Found '{}' at {:x}".format(search_string, i)) 271 | 272 | print("[*] Writing '{}' at {:x}".format(write_string, addr_start + i)) # write the new string 273 | mem_file.seek(addr_start + i) 274 | mem_file.write(bytes(write_string, "ASCII")) 275 | 276 | maps_file.close() # close files 277 | mem_file.close() 278 | 279 | # there is only one stack in our 
example 280 | break 281 | ``` 282 | 283 | The above script (`read_write_stack.py`) does exactly the same thing as the previous one (`read_write_heap.py`), in the same way, except that it looks at the stack instead of the heap. Let's try to find our string in the stack: 284 | 285 | ```shell 286 | julien@holberton:~/holberton/w/hackthevm1$ ./main.py 287 | b'Holberton' 288 | 289 | ``` 290 | 291 | ```shell 292 | julien@holberton:~/holberton/w/hackthevm1$ ps aux | grep main.py | grep -v grep 293 | julien 4124 0.2 0.7 31412 7848 pts/0 S+ 16:10 0:00 python3 ./main.py 294 | julien@holberton:~/holberton/w/hackthevm1$ sudo ./read_write_stack.py 4124 Holberton "~ Betty ~" 295 | [sudo] password for julien: 296 | [*] maps: /proc/4124/maps 297 | [*] mem: /proc/4124/mem 298 | [*] Found [stack]: 299 | pathname = [stack] 300 | addresses = 7fff2997e000-7fff2999f000 301 | permisions = rw-p 302 | offset = 00000000 303 | inode = 0 304 | Addr start [7fff2997e000] | end [7fff2999f000] 305 | Can't find 'Holberton' 306 | julien@holberton:~/holberton/w/hackthevm1$ 307 | ``` 308 | 309 | So our string is not in the heap and not in the stack: Where is it? It's time to dig into Python 3 internals and locate the string using what we will learn. Brace yourselves, the fun will begin now :) 310 | 311 | ## Locating the string in the virtual memory 312 | 313 | _Note: It is important to note that there are many implementations of Python 3. In this article, we are using the original and most commonly used: CPython (coded in C). What we are about to say about Python 3 will only be true for this implementation_. 314 | 315 | ### id 316 | 317 | There is a simple way to know where the object (be careful, the object is **not** the string) is in the virtual memory. CPython has a specific implementation of the [id()](https://docs.python.org/3.4/library/functions.html#id) builtin: `id()` will return the address of the object in memory. 
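
One quick way to convince ourselves of this is sketched below. It assumes we are running CPython: on other implementations `id()` is not guaranteed to be an address, and this trick should not be used there. `ctypes` turns the integer returned by `id()` back into a reference to the very same object (the variable names are only illustrative):

```python
import ctypes

s = b"Holberton"
address = id(s)  # CPython only: the address of the object in memory

# Reinterpret the integer as a pointer to a Python object and dereference it.
recovered = ctypes.cast(address, ctypes.py_object).value

print(hex(address))
print(recovered is s)  # True: the very same object, not a copy
```

Passing an integer that is not the address of a live object would most likely crash the interpreter, so this is only a demonstration and not something to rely on.
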
318 | 319 | If we add a line to the Python script to print the id of our object, we should get its address (`main_id.py`): 320 | 321 | ```python 322 | #!/usr/bin/env python3 323 | ''' 324 | Prints: 325 | - the address of the bytes object 326 | - a b"string" (bytes object) 327 | reads a char from stdin 328 | and prints the same (or not :)) string again 329 | ''' 330 | 331 | import sys 332 | 333 | s = b"Holberton" 334 | print(hex(id(s))) 335 | print(s) 336 | sys.stdin.read(1) 337 | print(s) 338 | ``` 339 | 340 | ```shell 341 | julien@holberton:~/holberton/w/hackthevm1$ ./main_id.py 342 | 0x7f343f010210 343 | b'Holberton' 344 | 345 | ``` 346 | 347 | -> `0x7f343f010210`. Let's look at `/proc/` to understand where exactly our object is located. 348 | 349 | ```shell 350 | julien@holberton:/usr/include/python3.4$ ps aux | grep main_id.py | grep -v grep 351 | julien 4344 0.0 0.7 31412 7856 pts/0 S+ 16:53 0:00 python3 ./main_id.py 352 | julien@holberton:/usr/include/python3.4$ cat /proc/4344/maps 353 | 00400000-006fa000 r-xp 00000000 08:01 655561 /usr/bin/python3.4 354 | 008f9000-008fa000 r--p 002f9000 08:01 655561 /usr/bin/python3.4 355 | 008fa000-00986000 rw-p 002fa000 08:01 655561 /usr/bin/python3.4 356 | 00986000-009a2000 rw-p 00000000 00:00 0 357 | 021ba000-022a4000 rw-p 00000000 00:00 0 [heap] 358 | 7f343d797000-7f343de79000 r--p 00000000 08:01 663747 /usr/lib/locale/locale-archive 359 | 7f343de79000-7f343df7e000 r-xp 00000000 08:01 136303 /lib/x86_64-linux-gnu/libm-2.19.so 360 | 7f343df7e000-7f343e17d000 ---p 00105000 08:01 136303 /lib/x86_64-linux-gnu/libm-2.19.so 361 | 7f343e17d000-7f343e17e000 r--p 00104000 08:01 136303 /lib/x86_64-linux-gnu/libm-2.19.so 362 | 7f343e17e000-7f343e17f000 rw-p 00105000 08:01 136303 /lib/x86_64-linux-gnu/libm-2.19.so 363 | 7f343e17f000-7f343e197000 r-xp 00000000 08:01 136416 /lib/x86_64-linux-gnu/libz.so.1.2.8 364 | 7f343e197000-7f343e396000 ---p 00018000 08:01 136416 /lib/x86_64-linux-gnu/libz.so.1.2.8 365 | 7f343e396000-7f343e397000 
r--p 00017000 08:01 136416 /lib/x86_64-linux-gnu/libz.so.1.2.8 366 | 7f343e397000-7f343e398000 rw-p 00018000 08:01 136416 /lib/x86_64-linux-gnu/libz.so.1.2.8 367 | 7f343e398000-7f343e3bf000 r-xp 00000000 08:01 136275 /lib/x86_64-linux-gnu/libexpat.so.1.6.0 368 | 7f343e3bf000-7f343e5bf000 ---p 00027000 08:01 136275 /lib/x86_64-linux-gnu/libexpat.so.1.6.0 369 | 7f343e5bf000-7f343e5c1000 r--p 00027000 08:01 136275 /lib/x86_64-linux-gnu/libexpat.so.1.6.0 370 | 7f343e5c1000-7f343e5c2000 rw-p 00029000 08:01 136275 /lib/x86_64-linux-gnu/libexpat.so.1.6.0 371 | 7f343e5c2000-7f343e5c4000 r-xp 00000000 08:01 136408 /lib/x86_64-linux-gnu/libutil-2.19.so 372 | 7f343e5c4000-7f343e7c3000 ---p 00002000 08:01 136408 /lib/x86_64-linux-gnu/libutil-2.19.so 373 | 7f343e7c3000-7f343e7c4000 r--p 00001000 08:01 136408 /lib/x86_64-linux-gnu/libutil-2.19.so 374 | 7f343e7c4000-7f343e7c5000 rw-p 00002000 08:01 136408 /lib/x86_64-linux-gnu/libutil-2.19.so 375 | 7f343e7c5000-7f343e7c8000 r-xp 00000000 08:01 136270 /lib/x86_64-linux-gnu/libdl-2.19.so 376 | 7f343e7c8000-7f343e9c7000 ---p 00003000 08:01 136270 /lib/x86_64-linux-gnu/libdl-2.19.so 377 | 7f343e9c7000-7f343e9c8000 r--p 00002000 08:01 136270 /lib/x86_64-linux-gnu/libdl-2.19.so 378 | 7f343e9c8000-7f343e9c9000 rw-p 00003000 08:01 136270 /lib/x86_64-linux-gnu/libdl-2.19.so 379 | 7f343e9c9000-7f343eb83000 r-xp 00000000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so 380 | 7f343eb83000-7f343ed83000 ---p 001ba000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so 381 | 7f343ed83000-7f343ed87000 r--p 001ba000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so 382 | 7f343ed87000-7f343ed89000 rw-p 001be000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so 383 | 7f343ed89000-7f343ed8e000 rw-p 00000000 00:00 0 384 | 7f343ed8e000-7f343eda7000 r-xp 00000000 08:01 136373 /lib/x86_64-linux-gnu/libpthread-2.19.so 385 | 7f343eda7000-7f343efa6000 ---p 00019000 08:01 136373 /lib/x86_64-linux-gnu/libpthread-2.19.so 386 | 7f343efa6000-7f343efa7000 r--p 00018000 
08:01 136373 /lib/x86_64-linux-gnu/libpthread-2.19.so 387 | 7f343efa7000-7f343efa8000 rw-p 00019000 08:01 136373 /lib/x86_64-linux-gnu/libpthread-2.19.so 388 | 7f343efa8000-7f343efac000 rw-p 00000000 00:00 0 389 | 7f343efac000-7f343efcf000 r-xp 00000000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so 390 | 7f343f000000-7f343f1b6000 rw-p 00000000 00:00 0 391 | 7f343f1c5000-7f343f1cc000 r--s 00000000 08:01 918462 /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache 392 | 7f343f1cc000-7f343f1ce000 rw-p 00000000 00:00 0 393 | 7f343f1ce000-7f343f1cf000 r--p 00022000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so 394 | 7f343f1cf000-7f343f1d0000 rw-p 00023000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so 395 | 7f343f1d0000-7f343f1d1000 rw-p 00000000 00:00 0 396 | 7ffccf1fd000-7ffccf21e000 rw-p 00000000 00:00 0 [stack] 397 | 7ffccf23c000-7ffccf23e000 r--p 00000000 00:00 0 [vvar] 398 | 7ffccf23e000-7ffccf240000 r-xp 00000000 00:00 0 [vdso] 399 | ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] 400 | julien@holberton:/usr/include/python3.4$ 401 | ``` 402 | 403 | -> Our object is stored in the following memory region: `7f343f000000-7f343f1b6000 rw-p 00000000 00:00 0`, which is not the heap, and not the stack. That confirms what we saw earlier. But that doesn't mean that the string itself is stored in the same memory region. For instance, the `bytes` object could store a pointer to the string, and not a copy of a string. Of course, at this point we could search for our string in this memory region, but we want to understand and be positive we're looking in the right area, and not use "brute force" to find the solution. It's time to learn more about `bytes` objects. 404 | 405 | ### `bytesobject.h` 406 | 407 | We are using the C implementation of Python (CPython), so let's look at the header file for bytes objects. 
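
A side note before we open the header: the lookup we just did by eye in `/proc/[pid]/maps` is easy to script. The sketch below assumes the usual `start-end perms offset dev inode pathname` layout of a maps line; `find_region` and `sample` are hypothetical names, and the sample lines are copied from the output above.

```python
def find_region(maps_text, address):
    """Return the maps line whose address range contains `address`, or None."""
    for line in maps_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The first field looks like "7f343f000000-7f343f1b6000" (hexadecimal).
        start, end = (int(value, 16) for value in fields[0].split("-"))
        if start <= address < end:
            return line
    return None


sample = """\
021ba000-022a4000 rw-p 00000000 00:00 0 [heap]
7f343f000000-7f343f1b6000 rw-p 00000000 00:00 0
7ffccf1fd000-7ffccf21e000 rw-p 00000000 00:00 0 [stack]
"""

# 0x7f343f010210 is the value printed by main_id.py in the run above.
print(find_region(sample, 0x7f343f010210))
```

On a live process, the text of `/proc/[pid]/maps` can be passed in place of `sample`.
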
408 | 409 | _Note: If you don't have the Python 3 header files, you can use this command on `Ubuntu`: `sudo apt-get install python3-dev` to download them on your system. If you are using the exact same environment as me (see the "Environment" section above), you should then be able to see the Python 3 header files in the `/usr/include/python3.4/` directory._ 410 | 411 | From `bytesobject.h`: 412 | 413 | ```c 414 | typedef struct { 415 | PyObject_VAR_HEAD 416 | Py_hash_t ob_shash; 417 | char ob_sval[1]; 418 | 419 | /* Invariants: 420 | * ob_sval contains space for 'ob_size+1' elements. 421 | * ob_sval[ob_size] == 0. 422 | * ob_shash is the hash of the string or -1 if not computed yet. 423 | */ 424 | } PyBytesObject; 425 | ``` 426 | 427 | What does that tell us? 428 | 429 | - A Python 3 `bytes` object is represented internally using a variable of type `PyBytesObject` 430 | - `ob_sval` holds the entire string 431 | - The string ends with `0` 432 | - `ob_size` stores the length of the string (to find `ob_size`, look at the definition of the macro `PyObject_VAR_HEAD` in `object.h`. We'll look at it later) 433 | 434 | So in our example, if we were able to print the `bytes` object, we should see this: 435 | 436 | - `ob_sval`: "Holberton" -> Bytes values: `48` `6f` `6c` `62` `65` `72` `74` `6f` `6e` `00` 437 | - `ob_size`: 9 438 | 439 | Based on what we learned previously, this means that the string is "inside" the `bytes` object. So inside the same memory region \o/ 440 | 441 | What if we didn't know about the way `id` was implemented in CPython? There is actually another way that we can use to find where the string is: looking at the actual object in memory. 442 | 443 | ### Looking at the `bytes` object in memory 444 | 445 | If we want to look directly at the `PyBytesObject` variable, we will need to create a C function, and call this C function from Python. There are different ways to call a C function from Python. 
We will use the simplest one: using a dynamic library. 446 | 447 | #### Creating our C function 448 | 449 | So the idea is to create a C function that is called from Python with the object as a parameter, and then "explore" this object to get the exact address of the string (as well as other information about the object). 450 | 451 | The function prototype should be: `void print_python_bytes(PyObject *p);`, where `p` is a pointer to our object (so `p` stores the address of our object in the virtual memory). It doesn't have to return anything. 452 | 453 | ##### `object.h` 454 | 455 | You have probably noticed that we don't use a parameter of type `PyBytesObject`. To understand why, let's have a look at the `object.h` header file and see what we can learn from it: 456 | 457 | ```c 458 | /* Object and type object interface */ 459 | 460 | /* 461 | Objects are structures allocated on the heap. Special rules apply to 462 | the use of objects to ensure they are properly garbage-collected. 463 | Objects are never allocated statically or on the stack; they must be 464 | ... 465 | */ 466 | ``` 467 | 468 | - "Objects are never allocated statically or on the stack" -> ok, now we know why it was not on the stack. 469 | - "Objects are structures allocated on the heap" -> wait... WAT? We searched for the string in the heap and it was NOT there... I'm confused! We'll discuss this later, in another article :) 470 | 471 | What else can we read in this file? 472 | 473 | ```c 474 | /* 475 | ... 476 | Objects do not float around in memory; once allocated an object keeps 477 | the same size and address. Objects that must hold variable-size data 478 | ... 479 | */ 480 | ``` 481 | 482 | - "Objects do not float around in memory; once allocated an object keeps the same size and address". Good to know. That means that if we modify the correct string, it will always be modified, and the addresses will never change 483 | - "once allocated" -> allocation? but not using the heap? I'm confused! 
We'll discuss this later in another article :) 484 | 485 | ```c 486 | /* 487 | ... 488 | Objects are always accessed through pointers of the type 'PyObject *'. 489 | The type 'PyObject' is a structure that only contains the reference count 490 | and the type pointer. The actual memory allocated for an object 491 | contains other data that can only be accessed after casting the pointer 492 | to a pointer to a longer structure type. This longer type must start 493 | with the reference count and type fields; the macro PyObject_HEAD should be 494 | used for this (to accommodate for future changes). The implementation 495 | of a particular object type can cast the object pointer to the proper 496 | type and back. 497 | ... 498 | */ 499 | ``` 500 | 501 | - "Objects are always accessed through pointers of the type 'PyObject \*'" -> that is why we have to take a pointer of type `PyObject` (vs `PyBytesObject`) as the parameter of our function 502 | - "The actual memory allocated for an object contains other data that can only be accessed after casting the pointer to a pointer to a longer structure type." -> So we will have to cast our function parameter to `PyBytesObject *` in order to access all its information. This is possible because a `PyBytesObject` starts with a `PyVarObject`, which itself starts with a `PyObject`: 503 | 504 | ```c 505 | /* PyObject_VAR_HEAD defines the initial segment of all variable-size 506 | * container objects. These end with a declaration of an array with 1 507 | * element, but enough space is malloc'ed so that the array actually 508 | * has room for ob_size elements. Note that ob_size is an element count, 509 | * not necessarily a byte count. 510 | */ 511 | #define PyObject_VAR_HEAD PyVarObject ob_base; 512 | #define Py_INVALID_SIZE (Py_ssize_t)-1 513 | 514 | /* Nothing is actually declared to be a PyObject, but every pointer to 515 | * a Python object can be cast to a PyObject*. 
This is inheritance built 516 | * by hand. Similarly every pointer to a variable-size Python object can, 517 | * in addition, be cast to PyVarObject*. 518 | */ 519 | typedef struct _object { 520 | _PyObject_HEAD_EXTRA 521 | Py_ssize_t ob_refcnt; 522 | struct _typeobject *ob_type; 523 | } PyObject; 524 | 525 | typedef struct { 526 | PyObject ob_base; 527 | Py_ssize_t ob_size; /* Number of items in variable part */ 528 | } PyVarObject; 529 | ``` 530 | 531 | -> Here is the `ob_size` that `bytesobject.h` was mentioning. 532 | 533 | ##### The C function 534 | 535 | Based on everything we just learned, the C code is pretty straightforward (`bytes.c`): 536 | 537 | ```c 538 | #include "Python.h" 539 | 540 | /** 541 | * print_python_bytes - prints info about a Python 3 bytes object 542 | * @p: a pointer to a Python 3 bytes object 543 | * 544 | * Return: Nothing 545 | */ 546 | void print_python_bytes(PyObject *p) 547 | { 548 | /* The pointer with the correct type.*/ 549 | PyBytesObject *s; 550 | unsigned int i; 551 | 552 | printf("[.] bytes object info\n"); 553 | /* casting the PyObject pointer to a PyBytesObject pointer */ 554 | s = (PyBytesObject *)p; 555 | /* never trust anyone, check that this is actually 556 | a PyBytesObject object. */ 557 | if (s && PyBytes_Check(s)) 558 | { 559 | /* a pointer holds the memory address of the first byte 560 | of the data it points to */ 561 | printf(" address of the object: %p\n", (void *)s); 562 | /* ob_size is in the ob_base structure, of type PyVarObject. */ 563 | printf(" size: %ld\n", s->ob_base.ob_size); 564 | /* ob_sval is the array of bytes, ending with the value 0: 565 | ob_sval[ob_size] == 0 */ 566 | printf(" trying string: %s\n", s->ob_sval); 567 | printf(" address of the data: %p\n", (void *)(s->ob_sval)); 568 | printf(" bytes:"); 569 | /* printing each byte at a time, in case this is not 570 | a "string". bytes doesn't have to be strings. 571 | ob_sval contains space for 'ob_size+1' elements. 572 | ob_sval[ob_size] == 0. 
*/ 573 | for (i = 0; i < s->ob_base.ob_size + 1; i++) 574 | { 575 | printf(" %02x", s->ob_sval[i] & 0xff); 576 | } 577 | printf("\n"); 578 | } 579 | /* if this is not a PyBytesObject print an error message */ 580 | else 581 | { 582 | fprintf(stderr, " [ERROR] Invalid Bytes Object\n"); 583 | } 584 | } 585 | ``` 586 | 587 | ### Calling the C function from the python script 588 | 589 | #### Creating the dynamic library 590 | 591 | As we said earlier, we will use the "dynamic library method" to call our C function from Python 3. So we just need to compile our C file with this command: 592 | 593 | ``` 594 | gcc -Wall -Wextra -pedantic -Werror -std=c99 -shared -Wl,-soname,libPython.so -o libPython.so -fPIC -I/usr/include/python3.4 bytes.c 595 | ``` 596 | 597 | _Don't forget to include the Python 3 header files directory: `-I/usr/include/python3.4`_ 598 | 599 | Hopefully, this should have created a dynamic library called `libPython.so`. 600 | 601 | #### Using the dynamic library from Python 3 602 | 603 | In order to use our function we simply need to add these lines in the Python script: 604 | 605 | ```python 606 | import ctypes 607 | 608 | lib = ctypes.CDLL('./libPython.so') 609 | lib.print_python_bytes.argtypes = [ctypes.py_object] 610 | ``` 611 | 612 | and call our function this way: 613 | 614 | ```python 615 | lib.print_python_bytes(s) 616 | ``` 617 | 618 | ### The new Python script 619 | 620 | Here is the complete source code of the new Python 3 script (`main_bytes.py`): 621 | 622 | ```python 623 | #!/usr/bin/env python3 624 | ''' 625 | Prints: 626 | - the address of the bytes object 627 | - a b"string" (bytes object) 628 | - information about the bytes object 629 | And then: 630 | - reads a char from stdin 631 | - prints the same (or not :)) information again 632 | ''' 633 | 634 | import sys 635 | import ctypes 636 | 637 | lib = ctypes.CDLL('./libPython.so') 638 | lib.print_python_bytes.argtypes = [ctypes.py_object] 639 | 640 | s = b"Holberton" 641 | 
print(hex(id(s))) 642 | print(s) 643 | lib.print_python_bytes(s) 644 | 645 | sys.stdin.read(1) 646 | 647 | print(hex(id(s))) 648 | print(s) 649 | lib.print_python_bytes(s) 650 | ``` 651 | 652 | Let's run it! 653 | 654 | ```shell 655 | julien@holberton:~/holberton/w/hackthevm1$ ./main_bytes.py 656 | 0x7f04d721b210 657 | b'Holberton' 658 | [.] bytes object info 659 | address of the object: 0x7f04d721b210 660 | size: 9 661 | trying string: Holberton 662 | address of the data: 0x7f04d721b230 663 | bytes: 48 6f 6c 62 65 72 74 6f 6e 00 664 | 665 | ``` 666 | 667 | As expected: 668 | 669 | - `id()` returns the address of the object itself (`0x7f04d721b210`) 670 | - the size of the data of our object (`ob_size`) is `9` 671 | - the data of our object is "Holberton", `48` `6f` `6c` `62` `65` `72` `74` `6f` `6e` `00` (and it ends with `00` as specified in the header file `bytesobject.h`) 672 | 673 | Annnnnnd, we have found the exact address of our string: `0x7f04d721b230` \o/ 674 | 675 | ![monty python](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/tumblr_nomr17FFSt1tym3lfo1_400.gif) 676 | 677 | _Sorry I had to add at least one Monty Python reference :) ([why](https://docs.python.org/3.4/faq/general.html#why-is-it-called-python))_ 678 | 679 | ## `rw_all.py` 680 | 681 | Now that we understand a little bit more about what's happening, it's ok to "brute-force" the mapped memory regions. Let's update the script that replaces the string. Instead of looking only in the stack or the heap, let's look in all readable and writeable memory regions of the process. Here's the source code: 682 | 683 | ```python 684 | #!/usr/bin/env python3 685 | ''' 686 | Locates and replaces (if we have permission) all occurrences of 687 | an ASCII string in the entire virtual memory of a process. 
688 | 689 | Usage: ./rw_all.py PID search_string replace_by_string 690 | Where: 691 | - PID is the pid of the target process 692 | - search_string is the ASCII string you are looking to overwrite 693 | - replace_by_string is the ASCII string you want to replace 694 | search_string with 695 | ''' 696 | 697 | import sys 698 | 699 | def print_usage_and_exit(): 700 | print('Usage: {} pid search write'.format(sys.argv[0])) 701 | exit(1) 702 | 703 | # check usage 704 | if len(sys.argv) != 4: 705 | print_usage_and_exit() 706 | 707 | # get the pid from args 708 | pid = int(sys.argv[1]) 709 | if pid <= 0: 710 | print_usage_and_exit() 711 | search_string = str(sys.argv[2]) 712 | if search_string == "": 713 | print_usage_and_exit() 714 | write_string = str(sys.argv[3]) 715 | if write_string == "": 716 | print_usage_and_exit() 717 | 718 | # open the maps and mem files of the process 719 | maps_filename = "/proc/{}/maps".format(pid) 720 | print("[*] maps: {}".format(maps_filename)) 721 | mem_filename = "/proc/{}/mem".format(pid) 722 | print("[*] mem: {}".format(mem_filename)) 723 | 724 | # try opening the file 725 | try: 726 | maps_file = open('/proc/{}/maps'.format(pid), 'r') 727 | except IOError as e: 728 | print("[ERROR] Can not open file {}:".format(maps_filename)) 729 | print(" I/O error({}): {}".format(e.errno, e.strerror)) 730 | exit(1) 731 | 732 | for line in maps_file: 733 | # print the name of the memory region 734 | sline = line.split(' ') 735 | name = sline[-1][:-1] 736 | print("[*] Searching in {}:".format(name)) 737 | 738 | # parse line 739 | addr = sline[0] 740 | perm = sline[1] 741 | offset = sline[2] 742 | device = sline[3] 743 | inode = sline[4] 744 | pathname = sline[-1][:-1] 745 | 746 | # check if there are read and write permissions 747 | if perm[0] != 'r' or perm[1] != 'w': 748 | print("\t[\x1B[31m!\x1B[m] {} does not have read/write permissions ({})".format(pathname, perm)) 749 | continue 750 | 751 | print("\tpathname = {}".format(pathname)) 752 | 
print("\taddresses = {}".format(addr)) 753 | print("\tpermisions = {}".format(perm)) 754 | print("\toffset = {}".format(offset)) 755 | print("\tinode = {}".format(inode)) 756 | 757 | # get start and end of the memory region 758 | addr = addr.split("-") 759 | if len(addr) != 2: # never trust anyone 760 | print("[*] Wrong addr format") 761 | maps_file.close() 762 | exit(1) 763 | addr_start = int(addr[0], 16) 764 | addr_end = int(addr[1], 16) 765 | print("\tAddr start [{:x}] | end [{:x}]".format(addr_start, addr_end)) 766 | 767 | # open and read the memory region 768 | try: 769 | mem_file = open(mem_filename, 'rb+') 770 | except IOError as e: 771 | print("[ERROR] Can not open file {}:".format(mem_filename)) 772 | print(" I/O error({}): {}".format(e.errno, e.strerror)) 773 | maps_file.close() 774 | exit(1) 775 | # read the memory region 776 | mem_file.seek(addr_start) 777 | region = bytearray(mem_file.read(addr_end - addr_start)) 778 | 779 | # find string 780 | nb_found = 0 781 | try: 782 | i = region.index(bytes(search_string, "ASCII")) 783 | while True: 784 | print("\t[\x1B[32m:)\x1B[m] Found '{}' at {:x}".format(search_string, i)) 785 | nb_found = nb_found + 1 786 | # write the new string 787 | print("\t[:)] Writing '{}' at {:x}".format(write_string, addr_start + i)) 788 | mem_file.seek(addr_start + i) 789 | mem_file.write(bytes(write_string, "ASCII")) 790 | mem_file.flush() 791 | 792 | # update our buffer 793 | region[i:i + len(write_string)] = bytes(write_string, "ASCII") 794 | 795 | i = region.index(bytes(search_string, "ASCII"), i + 1) 796 | except Exception: 797 | if nb_found == 0: 798 | print("\t[\x1B[31m:(\x1B[m] Can't find '{}'".format(search_string)) 799 | mem_file.close() 800 | 801 | # close files 802 | maps_file.close() 803 | ``` 804 | 805 | Let's run it! 806 | 807 | ```shell 808 | julien@holberton:~/holberton/w/hackthevm1$ ./main_bytes.py 809 | 0x7f37f1e01210 810 | b'Holberton' 811 | [.]
bytes object info 812 | address of the object: 0x7f37f1e01210 813 | size: 9 814 | trying string: Holberton 815 | address of the data: 0x7f37f1e01230 816 | bytes: 48 6f 6c 62 65 72 74 6f 6e 00 817 | 818 | ``` 819 | 820 | ```shell 821 | julien@holberton:~/holberton/w/hackthevm1$ ps aux | grep main_bytes.py | grep -v grep 822 | julien 4713 0.0 0.8 37720 8208 pts/0 S+ 18:48 0:00 python3 ./main_bytes.py 823 | julien@holberton:~/holberton/w/hackthevm1$ sudo ./rw_all.py 4713 Holberton "~ Betty ~" 824 | [*] maps: /proc/4713/maps 825 | [*] mem: /proc/4713/mem 826 | [*] Searching in /usr/bin/python3.4: 827 | [!] /usr/bin/python3.4 does not have read/write permissions (r-xp) 828 | ... 829 | [*] Searching in [heap]: 830 | pathname = [heap] 831 | addresses = 00e26000-00f11000 832 | permisions = rw-p 833 | offset = 00000000 834 | inode = 0 835 | Addr start [e26000] | end [f11000] 836 | [:)] Found 'Holberton' at 8e422 837 | [:)] Writing '~ Betty ~' at eb4422 838 | ... 839 | [*] Searching in : 840 | pathname = 841 | addresses = 7f37f1df1000-7f37f1fa7000 842 | permisions = rw-p 843 | offset = 00000000 844 | inode = 0 845 | Addr start [7f37f1df1000] | end [7f37f1fa7000] 846 | [:)] Found 'Holberton' at 10230 847 | [:)] Writing '~ Betty ~' at 7f37f1e01230 848 | ... 849 | [*] Searching in [stack]: 850 | pathname = [stack] 851 | addresses = 7ffdc3d0c000-7ffdc3d2d000 852 | permisions = rw-p 853 | offset = 00000000 854 | inode = 0 855 | Addr start [7ffdc3d0c000] | end [7ffdc3d2d000] 856 | [:(] Can't find 'Holberton' 857 | ... 858 | julien@holberton:~/holberton/w/hackthevm1$ 859 | ``` 860 | 861 | And if we hit enter in the running `main_bytes.py`... 862 | 863 | ```shell 864 | julien@holberton:~/holberton/w/hackthevm1$ ./main_bytes.py 865 | 0x7f37f1e01210 866 | b'Holberton' 867 | [.] 
bytes object info 868 | address of the object: 0x7f37f1e01210 869 | size: 9 870 | trying string: Holberton 871 | address of the data: 0x7f37f1e01230 872 | bytes: 48 6f 6c 62 65 72 74 6f 6e 00 873 | 874 | 0x7f37f1e01210 875 | b'~ Betty ~' 876 | [.] bytes object info 877 | address of the object: 0x7f37f1e01210 878 | size: 9 879 | trying string: ~ Betty ~ 880 | address of the data: 0x7f37f1e01230 881 | bytes: 7e 20 42 65 74 74 79 20 7e 00 882 | julien@holberton:~/holberton/w/hackthevm1$ 883 | ``` 884 | 885 | BOOM! 886 | 887 | ![yeah!](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/giphy-3.gif) 888 | 889 | ## Outro 890 | 891 | We managed to modify the string used by our Python 3 script. Awesome! But we still have questions to answer: 892 | 893 | - What is the "Holberton" string that is in the `[heap]` memory region? 894 | - How does Python 3 allocate memory outside of the heap? 895 | - If Python 3 is not using the heap, what does it refer to when it says "Objects are structures allocated on the heap" in `object.h`? 896 | 897 | That will be for another time :) 898 | 899 | In the meantime, if you are too curious to wait for the next article, you can try to find out yourself. 900 | 901 | ### Questions? Feedback? 902 | 903 | If you have questions or feedback don't hesitate to ping us on Twitter at [@holbertonschool](https://twitter.com/holbertonschool) or [@julienbarbier42](https://twitter.com/julienbarbier42). 904 | _Haters, please send your comments to `/dev/null`._ 905 | 906 | Happy Hacking! 907 | 908 | ### Thank you for reading! 909 | 910 | As always, no-one is perfect (except [Chuck](http://codesqueeze.com/the-ultimate-top-25-chuck-norris-the-programmer-jokes/) of course), so don't hesitate to contribute or send me your comments. 
911 | 912 | ### Files 913 | 914 | [This repo](https://github.com/holbertonschool/Hack-The-Virtual-Memory/tree/master/01.%20Python%20bytes) contains the source code for all scripts and dynamic libraries created in this tutorial: 915 | 916 | - `main.py`: the first target 917 | - `main_id.py`: the second target, printing the id of the bytes object 918 | - `main_bytes.py`: the final target, also printing the information about the bytes object, using our dynamic library 919 | - `read_write_heap.py`: the "original" script to find and replace strings in the heap of a process 920 | - `read_write_stack.py`: same, but searches and replaces in the stack instead of the heap 921 | - `rw_all.py`: same, but in every memory region that is readable and writable 922 | - `bytes.c`: the C function to print info about a Python 3 bytes object 923 | 924 | _Many thanks to [Tim](https://twitter.com/wintermanc3r) for English proof-reading & [Guillaume](https://twitter.com/guillaumesalva) for PEP8 proof-reading :)_ 925 | -------------------------------------------------------------------------------- /01. Python bytes/bytes.c: -------------------------------------------------------------------------------- 1 | #include "Python.h" 2 | 3 | /** 4 | * print_python_bytes - prints info about a Python 3 bytes object 5 | * @p: a pointer to a Python 3 bytes object 6 | * 7 | * Return: Nothing 8 | */ 9 | void print_python_bytes(PyObject *p) 10 | { 11 | /* The pointer with the correct type. */ 12 | PyBytesObject *s; 13 | unsigned int i; 14 | 15 | printf("[.] bytes object info\n"); 16 | /* casting the PyObject pointer to a PyBytesObject pointer */ 17 | s = (PyBytesObject *)p; 18 | /* never trust anyone, check that this is actually 19 | a PyBytesObject object.
*/ 20 | if (s && PyBytes_Check(s)) 21 | { 22 | /* a pointer holds the memory address of the first byte 23 | of the data it points to */ 24 | printf(" address of the object: %p\n", (void *)s); 25 | /* ob_size is in the ob_base structure, of type PyVarObject. */ 26 | printf(" size: %ld\n", s->ob_base.ob_size); 27 | /* ob_sval is the array of bytes ending with the value 0: 28 | ob_sval[ob_size] == 0. */ 29 | printf(" trying string: %s\n", s->ob_sval); 30 | printf(" address of the data: %p\n", (void *)(s->ob_sval)); 31 | printf(" bytes:"); 32 | /* printing one byte at a time, in case this is not 33 | a "string". bytes objects don't have to be strings. 34 | ob_sval contains space for 'ob_size+1' elements. 35 | ob_sval[ob_size] == 0. */ 36 | for (i = 0; i < s->ob_base.ob_size + 1; i++) 37 | { 38 | printf(" %02x", s->ob_sval[i] & 0xff); 39 | } 40 | printf("\n"); 41 | } 42 | /* if this is not a PyBytesObject print an error message. */ 43 | else 44 | { 45 | fprintf(stderr, " [ERROR] Invalid Bytes Object\n"); 46 | } 47 | } 48 | -------------------------------------------------------------------------------- /01. Python bytes/main.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | ''' 3 | Prints a b"string" (bytes object), reads a char from stdin 4 | and prints the same (or not :)) string again 5 | ''' 6 | 7 | import sys 8 | 9 | s = b"Holberton" 10 | print(s) 11 | sys.stdin.read(1) 12 | print(s) 13 | -------------------------------------------------------------------------------- /01.
Python bytes/main_bytes.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | ''' 3 | Prints: 4 | - the address of the bytes object 5 | - a b"string" (bytes object) 6 | - information about the bytes object 7 | And then: 8 | - reads a char from stdin 9 | - prints the same (or not :)) information again 10 | ''' 11 | 12 | import sys 13 | import ctypes 14 | 15 | lib = ctypes.CDLL('./libPython.so') 16 | lib.print_python_bytes.argtypes = [ctypes.py_object] 17 | 18 | s = b"Holberton" 19 | print(hex(id(s))) 20 | print(s) 21 | lib.print_python_bytes(s) 22 | 23 | sys.stdin.read(1) 24 | 25 | print(hex(id(s))) 26 | print(s) 27 | lib.print_python_bytes(s) 28 | -------------------------------------------------------------------------------- /01. Python bytes/main_id.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | ''' 3 | Prints: 4 | - the address of the bytes object 5 | - a b"string" (bytes object) 6 | reads a char from stdin 7 | and prints the same (or not :)) string again 8 | ''' 9 | 10 | import sys 11 | 12 | s = b"Holberton" 13 | print(hex(id(s))) 14 | print(s) 15 | sys.stdin.read(1) 16 | print(s) 17 | -------------------------------------------------------------------------------- /01. 
Python bytes/read_write_heap.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | ''' 3 | Locates and replaces the first occurrence of a string in the heap 4 | of a process 5 | 6 | Usage: ./read_write_heap.py PID search_string replace_by_string 7 | Where: 8 | - PID is the pid of the target process 9 | - search_string is the ASCII string you are looking to overwrite 10 | - replace_by_string is the ASCII string you want to replace 11 | search_string with 12 | ''' 13 | 14 | import sys 15 | 16 | 17 | def print_usage_and_exit(): 18 | print('Usage: {} pid search write'.format(sys.argv[0])) 19 | sys.exit(1) 20 | 21 | # check usage 22 | if len(sys.argv) != 4: 23 | print_usage_and_exit() 24 | 25 | # get the pid from args 26 | pid = int(sys.argv[1]) 27 | if pid <= 0: 28 | print_usage_and_exit() 29 | search_string = str(sys.argv[2]) 30 | if search_string == "": 31 | print_usage_and_exit() 32 | write_string = str(sys.argv[3]) 33 | if write_string == "": 34 | print_usage_and_exit() 35 | 36 | # open the maps and mem files of the process 37 | maps_filename = "/proc/{}/maps".format(pid) 38 | print("[*] maps: {}".format(maps_filename)) 39 | mem_filename = "/proc/{}/mem".format(pid) 40 | print("[*] mem: {}".format(mem_filename)) 41 | 42 | # try opening the maps file 43 | try: 44 | maps_file = open('/proc/{}/maps'.format(pid), 'r') 45 | except IOError as e: 46 | print("[ERROR] Can not open file {}:".format(maps_filename)) 47 | print(" I/O error({}): {}".format(e.errno, e.strerror)) 48 | sys.exit(1) 49 | 50 | for line in maps_file: 51 | sline = line.split(' ') 52 | # check if we found the heap 53 | if sline[-1][:-1] != "[heap]": 54 | continue 55 | print("[*] Found [heap]:") 56 | 57 | # parse line 58 | addr = sline[0] 59 | perm = sline[1] 60 | offset = sline[2] 61 | device = sline[3] 62 | inode = sline[4] 63 | pathname = sline[-1][:-1] 64 | print("\tpathname = {}".format(pathname)) 65 | print("\taddresses = 
{}".format(addr)) 66 | print("\tpermisions = {}".format(perm)) 67 | print("\toffset = {}".format(offset)) 68 | print("\tinode = {}".format(inode)) 69 | 70 | # check if there is read and write permission 71 | if perm[0] != 'r' or perm[1] != 'w': 72 | print("[*] {} does not have read/write permission".format(pathname)) 73 | maps_file.close() 74 | exit(0) 75 | 76 | # get start and end of the heap in the virtual memory 77 | addr = addr.split("-") 78 | if len(addr) != 2: # never trust anyone, not even your OS :) 79 | print("[*] Wrong addr format") 80 | maps_file.close() 81 | exit(1) 82 | addr_start = int(addr[0], 16) 83 | addr_end = int(addr[1], 16) 84 | print("\tAddr start [{:x}] | end [{:x}]".format(addr_start, addr_end)) 85 | 86 | # open and read mem 87 | try: 88 | mem_file = open(mem_filename, 'rb+') 89 | except IOError as e: 90 | print("[ERROR] Can not open file {}:".format(mem_filename)) 91 | print(" I/O error({}): {}".format(e.errno, e.strerror)) 92 | maps_file.close() 93 | exit(1) 94 | 95 | # read heap 96 | mem_file.seek(addr_start) 97 | heap = mem_file.read(addr_end - addr_start) 98 | 99 | # find string 100 | try: 101 | i = heap.index(bytes(search_string, "ASCII")) 102 | except Exception: 103 | print("Can't find '{}'".format(search_string)) 104 | maps_file.close() 105 | mem_file.close() 106 | exit(0) 107 | print("[*] Found '{}' at {:x}".format(search_string, i)) 108 | 109 | # write the new string 110 | print("[*] Writing '{}' at {:x}".format(write_string, addr_start + i)) 111 | mem_file.seek(addr_start + i) 112 | mem_file.write(bytes(write_string, "ASCII")) 113 | 114 | # close files 115 | maps_file.close() 116 | mem_file.close() 117 | 118 | # there is only one heap in our example 119 | break 120 | -------------------------------------------------------------------------------- /01. 
Python bytes/read_write_stack.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | ''' 3 | Locates and replaces the first occurrence of a string in the stack 4 | of a process 5 | 6 | Usage: ./read_write_stack.py PID search_string replace_by_string 7 | Where: 8 | - PID is the pid of the target process 9 | - search_string is the ASCII string you are looking to overwrite 10 | - replace_by_string is the ASCII string you want to replace 11 | search_string with 12 | ''' 13 | 14 | import sys 15 | 16 | 17 | def print_usage_and_exit(): 18 | print('Usage: {} pid search write'.format(sys.argv[0])) 19 | sys.exit(1) 20 | 21 | # check usage 22 | if len(sys.argv) != 4: 23 | print_usage_and_exit() 24 | 25 | # get the pid from args 26 | pid = int(sys.argv[1]) 27 | if pid <= 0: 28 | print_usage_and_exit() 29 | search_string = str(sys.argv[2]) 30 | if search_string == "": 31 | print_usage_and_exit() 32 | write_string = str(sys.argv[3]) 33 | if write_string == "": 34 | print_usage_and_exit() 35 | 36 | # open the maps and mem files of the process 37 | maps_filename = "/proc/{}/maps".format(pid) 38 | print("[*] maps: {}".format(maps_filename)) 39 | mem_filename = "/proc/{}/mem".format(pid) 40 | print("[*] mem: {}".format(mem_filename)) 41 | 42 | # try opening the maps file 43 | try: 44 | maps_file = open('/proc/{}/maps'.format(pid), 'r') 45 | except IOError as e: 46 | print("[ERROR] Can not open file {}:".format(maps_filename)) 47 | print(" I/O error({}): {}".format(e.errno, e.strerror)) 48 | sys.exit(1) 49 | 50 | for line in maps_file: 51 | sline = line.split(' ') 52 | # check if we found the stack 53 | if sline[-1][:-1] != "[stack]": 54 | continue 55 | print("[*] Found [stack]:") 56 | 57 | # parse line 58 | addr = sline[0] 59 | perm = sline[1] 60 | offset = sline[2] 61 | device = sline[3] 62 | inode = sline[4] 63 | pathname = sline[-1][:-1] 64 | print("\tpathname = {}".format(pathname)) 65 | print("\taddresses = 
{}".format(addr)) 66 | print("\tpermisions = {}".format(perm)) 67 | print("\toffset = {}".format(offset)) 68 | print("\tinode = {}".format(inode)) 69 | 70 | # check if there is read and write permission 71 | if perm[0] != 'r' or perm[1] != 'w': 72 | print("[*] {} does not have read/write permission".format(pathname)) 73 | maps_file.close() 74 | exit(0) 75 | 76 | # get start and end of the stack in the virtual memory 77 | addr = addr.split("-") 78 | if len(addr) != 2: # never trust anyone, not even your OS :) 79 | print("[*] Wrong addr format") 80 | maps_file.close() 81 | exit(1) 82 | addr_start = int(addr[0], 16) 83 | addr_end = int(addr[1], 16) 84 | print("\tAddr start [{:x}] | end [{:x}]".format(addr_start, addr_end)) 85 | 86 | # open and read mem 87 | try: 88 | mem_file = open(mem_filename, 'rb+') 89 | except IOError as e: 90 | print("[ERROR] Can not open file {}:".format(mem_filename)) 91 | print(" I/O error({}): {}".format(e.errno, e.strerror)) 92 | maps_file.close() 93 | exit(1) 94 | 95 | # read stack 96 | mem_file.seek(addr_start) 97 | stack = mem_file.read(addr_end - addr_start) 98 | 99 | # find string 100 | try: 101 | i = stack.index(bytes(search_string, "ASCII")) 102 | except Exception: 103 | print("Can't find '{}'".format(search_string)) 104 | maps_file.close() 105 | mem_file.close() 106 | exit(0) 107 | print("[*] Found '{}' at {:x}".format(search_string, i)) 108 | 109 | # write the new string 110 | print("[*] Writing '{}' at {:x}".format(write_string, addr_start + i)) 111 | mem_file.seek(addr_start + i) 112 | mem_file.write(bytes(write_string, "ASCII")) 113 | 114 | # close files 115 | maps_file.close() 116 | mem_file.close() 117 | 118 | # there is only one stack in our example 119 | break 120 | -------------------------------------------------------------------------------- /01. 
Python bytes/rw_all.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | ''' 3 | Locates and replaces (if we have permission) all occurrences of 4 | an ASCII string in the entire virtual memory of a process. 5 | 6 | Usage: ./rw_all.py PID search_string replace_by_string 7 | Where: 8 | - PID is the pid of the target process 9 | - search_string is the ASCII string you are looking to overwrite 10 | - replace_by_string is the ASCII string you want to replace 11 | search_string with 12 | ''' 13 | 14 | import sys 15 | 16 | 17 | def print_usage_and_exit(): 18 | print('Usage: {} pid search write'.format(sys.argv[0])) 19 | exit(1) 20 | 21 | # check usage 22 | if len(sys.argv) != 4: 23 | print_usage_and_exit() 24 | 25 | # get the pid from args 26 | pid = int(sys.argv[1]) 27 | if pid <= 0: 28 | print_usage_and_exit() 29 | search_string = str(sys.argv[2]) 30 | if search_string == "": 31 | print_usage_and_exit() 32 | write_string = str(sys.argv[3]) 33 | if write_string == "": 34 | print_usage_and_exit() 35 | 36 | # open the maps and mem files of the process 37 | maps_filename = "/proc/{}/maps".format(pid) 38 | print("[*] maps: {}".format(maps_filename)) 39 | mem_filename = "/proc/{}/mem".format(pid) 40 | print("[*] mem: {}".format(mem_filename)) 41 | 42 | # try opening the file 43 | try: 44 | maps_file = open('/proc/{}/maps'.format(pid), 'r') 45 | except IOError as e: 46 | print("[ERROR] Can not open file {}:".format(maps_filename)) 47 | print(" I/O error({}): {}".format(e.errno, e.strerror)) 48 | exit(1) 49 | 50 | for line in maps_file: 51 | # print the name of the memory region 52 | sline = line.split(' ') 53 | name = sline[-1][:-1] 54 | print("[*] Searching in {}:".format(name)) 55 | 56 | # parse line 57 | addr = sline[0] 58 | perm = sline[1] 59 | offset = sline[2] 60 | device = sline[3] 61 | inode = sline[4] 62 | pathname = sline[-1][:-1] 63 | 64 | # check if there are read and write permissions 65 | if perm[0] 
!= 'r' or perm[1] != 'w': 66 | print("\t[\x1B[31m!\x1B[m] {} does not have read/write\ 67 | permissions ({})".format(pathname, perm)) 68 | continue 69 | 70 | print("\tpathname = {}".format(pathname)) 71 | print("\taddresses = {}".format(addr)) 72 | print("\tpermisions = {}".format(perm)) 73 | print("\toffset = {}".format(offset)) 74 | print("\tinode = {}".format(inode)) 75 | 76 | # get start and end of the memory region 77 | addr = addr.split("-") 78 | if len(addr) != 2: # never trust anyone 79 | print("[*] Wrong addr format") 80 | maps_file.close() 81 | exit(1) 82 | addr_start = int(addr[0], 16) 83 | addr_end = int(addr[1], 16) 84 | print("\tAddr start [{:x}] | end [{:x}]".format(addr_start, addr_end)) 85 | 86 | # open and read the memory region 87 | try: 88 | mem_file = open(mem_filename, 'rb+') 89 | except IOError as e: 90 | print("[ERROR] Can not open file {}:".format(mem_filename)) 91 | print(" I/O error({}): {}".format(e.errno, e.strerror)) 92 | maps_file.close() 93 | exit(1) 94 | # read the memory region 95 | mem_file.seek(addr_start) 96 | region = bytearray(mem_file.read(addr_end - addr_start)) 97 | 98 | # find string 99 | nb_found = 0 100 | try: 101 | i = region.index(bytes(search_string, "ASCII")) 102 | while True: 103 | print("\t[\x1B[32m:)\x1B[m] Found '{}' at\ 104 | {:x}".format(search_string, i)) 105 | nb_found = nb_found + 1 106 | # write the new string 107 | print("\t[:)] Writing '{}' at\ 108 | {:x}".format(write_string, addr_start + i)) 109 | mem_file.seek(addr_start + i) 110 | mem_file.write(bytes(write_string, "ASCII")) 111 | mem_file.flush() 112 | 113 | # update our buffer 114 | region[i:i + len(write_string)] = bytes(write_string, "ASCII") 115 | 116 | i = region.index(bytes(search_string, "ASCII"), i + 1) 117 | except Exception: 118 | if nb_found == 0: 119 | print("\t[\x1B[31m:(\x1B[m] Can't find\ 120 | '{}'".format(search_string)) 121 | mem_file.close() 122 | 123 | # close files 124 | maps_file.close() 125 | --------------------------------------------------------------------------------
/02. What's where in the virtual memory/.gitignore: -------------------------------------------------------------------------------- 1 | Makefile 2 | a.out 3 | 0 4 | 1 5 | 2 6 | 3 7 | 4 8 | 5 9 | 6 10 | 7 -------------------------------------------------------------------------------- /02. What's where in the virtual memory/README.md: -------------------------------------------------------------------------------- 1 | ![hack the virtual memory](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/htvn2.png) 2 | 3 | ## Hack The Virtual Memory, chapter 2: Drawing the VM diagram 4 | 5 | We previously talked about what you could find in the virtual memory of a process, and where you could find it. Today, we will try to "reconstruct" (part of) the following diagram by making our process print addresses of various elements of the program. 6 | 7 | ![the virtual memory](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/virtual_memory.png) 8 | 9 | ## Prerequisites 10 | 11 | In order to fully understand this article, you will need to know: 12 | 13 | - The basics of the C programming language 14 | - A little bit of assembly (but not required) 15 | - The very basics of the Linux filesystem and the shell 16 | - We will also use the `/proc/[pid]/maps` file (see `man proc` or read our first article [Hack The Virtual Memory, chapter 0: C strings & /proc](https://blog.holbertonschool.com/hack-the-virtual-memory-c-strings-proc/)) 17 | 18 | ## Environment 19 | 20 | All scripts and programs have been tested on the following system: 21 | 22 | - Ubuntu 23 | - Linux ubuntu 4.4.0-31-generic #50~14.04.1-Ubuntu SMP Wed Jul 13 01:07:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux 24 | - **Everything we will write will be true for this system, but may be different on another system** 25 | 26 | Tools used: 27 | 28 | - gcc 29 | - gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4 30 | - objdump 31 | - GNU objdump (GNU Binutils for Ubuntu) 2.24 32 | - udcli 33 | - udis86 1.7.2 34 | - bc 35 | - bc 
1.06.95 36 | 37 | ## The stack 38 | 39 | The first thing we want to locate in our diagram is the stack. We know that in C, local variables are located on the stack. So if we print the address of a local variable, it should give us an idea of where to find the stack in the virtual memory. Let's use this program (`main-0.c`) to find out: 40 | 41 | ```c 42 | #include <stdlib.h> 43 | #include <stdio.h> 44 | #include <string.h> 45 | 46 | /** 47 | * main - print locations of various elements 48 | * 49 | * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS 50 | */ 51 | int main(void) 52 | { 53 | int a; 54 | 55 | printf("Address of a: %p\n", (void *)&a); 56 | return (EXIT_SUCCESS); 57 | } 58 | ``` 59 | 60 | ```shell 61 | julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -pedantic -Werror main-0.c -o 0 62 | julien@holberton:~/holberton/w/hackthevm2$ ./0 63 | Address of a: 0x7ffd14b8bd9c 64 | julien@holberton:~/holberton/w/hackthevm2$ 65 | ``` 66 | 67 | This will be our first point of reference when we compare the addresses of other elements. 68 | 69 | ## The heap 70 | 71 | The heap is used when you malloc space for your variables. Let's add a line to use malloc and see where the memory address returned by `malloc` is located (`main-1.c`): 72 | 73 | ```c 74 | #include <stdlib.h> 75 | #include <stdio.h> 76 | #include <string.h> 77 | 78 | /** 79 | * main - print locations of various elements 80 | * 81 | * Return: EXIT_FAILURE if something failed.
Otherwise EXIT_SUCCESS 82 | */ 83 | int main(void) 84 | { 85 | int a; 86 | void *p; 87 | 88 | printf("Address of a: %p\n", (void *)&a); 89 | p = malloc(98); 90 | if (p == NULL) 91 | { 92 | fprintf(stderr, "Can't malloc\n"); 93 | return (EXIT_FAILURE); 94 | } 95 | printf("Allocated space in the heap: %p\n", p); 96 | return (EXIT_SUCCESS); 97 | } 98 | ``` 99 | 100 | ```shell 101 | julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -pedantic -Werror main-1.c -o 1 102 | julien@holberton:~/holberton/w/hackthevm2$ ./1 103 | Address of a: 0x7ffd4204c554 104 | Allocated space in the heap: 0x901010 105 | julien@holberton:~/holberton/w/hackthevm2$ 106 | ``` 107 | 108 | It's now clear that the heap (`0x901010`) is way below the stack (`0x7ffd4204c554`). At this point we can already draw this diagram: 109 | 110 | ![heap and stack](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/virtual_memory_stack_heap.png) 111 | 112 | ## The executable 113 | 114 | Your program is also in the virtual memory. If we print the address of the `main` function, we should have an idea of where the program is located compared to the stack and the heap. Let's see if we find it below the heap as expected (`main-2.c`): 115 | 116 | ```c 117 | #include <stdlib.h> 118 | #include <stdio.h> 119 | #include <string.h> 120 | 121 | /** 122 | * main - print locations of various elements 123 | * 124 | * Return: EXIT_FAILURE if something failed.
Otherwise EXIT_SUCCESS 125 | */ 126 | int main(void) 127 | { 128 | int a; 129 | void *p; 130 | 131 | printf("Address of a: %p\n", (void *)&a); 132 | p = malloc(98); 133 | if (p == NULL) 134 | { 135 | fprintf(stderr, "Can't malloc\n"); 136 | return (EXIT_FAILURE); 137 | } 138 | printf("Allocated space in the heap: %p\n", p); 139 | printf("Address of function main: %p\n", (void *)main); 140 | return (EXIT_SUCCESS); 141 | } 142 | ``` 143 | 144 | ```shell 145 | julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -Werror main-2.c -o 2 146 | julien@holberton:~/holberton/w/hackthevm2$ ./2 147 | Address of a: 0x7ffdced37d74 148 | Allocated space in the heap: 0x2199010 149 | Address of function main: 0x40060d 150 | julien@holberton:~/holberton/w/hackthevm2$ 151 | ``` 152 | 153 | It seems that our program (`0x40060d`) is located below the heap (`0x2199010`), just as expected. 154 | But let's make sure that this is the actual code of our program, and not some sort of pointer to another location. Let's disassemble our program `2` with [objdump](https://en.wikipedia.org/wiki/Objdump) to look at the "memory address" of the `main` function: 155 | 156 | ```shell 157 | julien@holberton:~/holberton/w/hackthevm2$ objdump -M intel -j .text -d 2 | grep '
<main>:' -A 5 158 | 000000000040060d <main>: 159 | 40060d: 55 push rbp 160 | 40060e: 48 89 e5 mov rbp,rsp 161 | 400611: 48 83 ec 10 sub rsp,0x10 162 | 400615: 48 8d 45 f4 lea rax,[rbp-0xc] 163 | 400619: 48 89 c6 mov rsi,rax 164 | ``` 165 | 166 | `000000000040060d <main>
` -> we find the exact same address (`0x40060d`). If you still have any doubts, you can print the first bytes located at this address, to make sure they match the output of `objdump` (`main-3.c`): 167 | 168 | ```c 169 | #include <stdlib.h> 170 | #include <stdio.h> 171 | #include <string.h> 172 | 173 | /** 174 | * main - print locations of various elements 175 | * 176 | * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS 177 | */ 178 | int main(void) 179 | { 180 | int a; 181 | void *p; 182 | unsigned int i; 183 | 184 | printf("Address of a: %p\n", (void *)&a); 185 | p = malloc(98); 186 | if (p == NULL) 187 | { 188 | fprintf(stderr, "Can't malloc\n"); 189 | return (EXIT_FAILURE); 190 | } 191 | printf("Allocated space in the heap: %p\n", p); 192 | printf("Address of function main: %p\n", (void *)main); 193 | printf("First bytes of the main function:\n\t"); 194 | for (i = 0; i < 15; i++) 195 | { 196 | printf("%02x ", ((unsigned char *)main)[i]); 197 | } 198 | printf("\n"); 199 | return (EXIT_SUCCESS); 200 | } 201 | ``` 202 | 203 | ```shell 204 | julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -Werror main-3.c -o 3 205 | julien@holberton:~/holberton/w/hackthevm2$ objdump -M intel -j .text -d 3 | grep '
<main>:' -A 5 206 | 000000000040064d <main>
: 207 | 40064d: 55 push rbp 208 | 40064e: 48 89 e5 mov rbp,rsp 209 | 400651: 48 83 ec 10 sub rsp,0x10 210 | 400655: 48 8d 45 f0 lea rax,[rbp-0x10] 211 | 400659: 48 89 c6 mov rsi,rax 212 | julien@holberton:~/holberton/w/hackthevm2$ ./3 213 | Address of a: 0x7ffeff0f13b0 214 | Allocated space in the heap: 0x8b3010 215 | Address of function main: 0x40064d 216 | First bytes of the main function: 217 | 55 48 89 e5 48 83 ec 10 48 8d 45 f0 48 89 c6 218 | julien@holberton:~/holberton/w/hackthevm2$ echo "55 48 89 e5 48 83 ec 10 48 8d 45 f0 48 89 c6" | udcli -64 -x -o 40064d 219 | 000000000040064d 55 push rbp 220 | 000000000040064e 4889e5 mov rbp, rsp 221 | 0000000000400651 4883ec10 sub rsp, 0x10 222 | 0000000000400655 488d45f0 lea rax, [rbp-0x10] 223 | 0000000000400659 4889c6 mov rsi, rax 224 | julien@holberton:~/holberton/w/hackthevm2$ 225 | ``` 226 | 227 | -> We can see that we print the same address and the same content. We are now triple sure this is our `main` function. 228 | 229 | _You can download the Udis86 Disassembler Library [here](http://udis86.sourceforge.net/)._ 230 | 231 | Here is the updated diagram, based on what we have learned: 232 | 233 | ![stack, heap and executable](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/virtual_memory_stack_heap_executable.png) 234 | 235 | ## Command line arguments and environment variables 236 | 237 | The `main` function can take arguments: 238 | 239 | - The command line arguments 240 | - the first argument of the `main` function (usually named `argc` or `ac`) is the number of command line arguments 241 | - the second argument of the `main` function (usually named `argv` or `av`) is an array of pointers to the arguments (C strings) 242 | - The environment variables 243 | - the third argument of the `main` function (usually named `env` or `envp`) is an array of pointers to the environment variables (C strings) 244 | 245 | Let's see where those elements stand in the virtual memory of our process (`main-4.c`): 246 |
247 | ```c 248 | #include <stdlib.h> 249 | #include <stdio.h> 250 | #include <string.h> 251 | 252 | /** 253 | * main - print locations of various elements 254 | * 255 | * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS 256 | */ 257 | int main(int ac, char **av, char **env) 258 | { 259 | int a; 260 | void *p; 261 | int i; 262 | 263 | printf("Address of a: %p\n", (void *)&a); 264 | p = malloc(98); 265 | if (p == NULL) 266 | { 267 | fprintf(stderr, "Can't malloc\n"); 268 | return (EXIT_FAILURE); 269 | } 270 | printf("Allocated space in the heap: %p\n", p); 271 | printf("Address of function main: %p\n", (void *)main); 272 | printf("First bytes of the main function:\n\t"); 273 | for (i = 0; i < 15; i++) 274 | { 275 | printf("%02x ", ((unsigned char *)main)[i]); 276 | } 277 | printf("\n"); 278 | printf("Address of the array of arguments: %p\n", (void *)av); 279 | printf("Addresses of the arguments:\n\t"); 280 | for (i = 0; i < ac; i++) 281 | { 282 | printf("[%s]:%p ", av[i], av[i]); 283 | } 284 | printf("\n"); 285 | printf("Address of the array of environment variables: %p\n", (void *)env); 286 | printf("Address of the first environment variable: %p\n", (void *)(env[0])); 287 | return (EXIT_SUCCESS); 288 | } 289 | ``` 290 | 291 | ```shell 292 | julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -Werror main-4.c -o 4 293 | julien@holberton:~/holberton/w/hackthevm2$ ./4 Hello Holberton School!
294 | Address of a: 0x7ffe7d6d8da0 295 | Allocated space in the heap: 0xc8c010 296 | Address of function main: 0x40069d 297 | First bytes of the main function: 298 | 55 48 89 e5 48 83 ec 30 89 7d ec 48 89 75 e0 299 | Address of the array of arguments: 0x7ffe7d6d8e98 300 | Addresses of the arguments: 301 | [./4]:0x7ffe7d6da373 [Hello]:0x7ffe7d6da377 [Holberton]:0x7ffe7d6da37d [School!]:0x7ffe7d6da387 302 | Address of the array of environment variables: 0x7ffe7d6d8ec0 303 | Address of the first environment variables: 304 | [0x7ffe7d6da38f]:"XDG_VTNR=7" 305 | [0x7ffe7d6da39a]:"XDG_SESSION_ID=c2" 306 | [0x7ffe7d6da3ac]:"CLUTTER_IM_MODULE=xim" 307 | julien@holberton:~/holberton/w/hackthevm2$ 308 | ``` 309 | 310 | These elements are above the stack as expected, but now we know the exact order: `stack` (`0x7ffe7d6d8da0`) < `argv` (`0x7ffe7d6d8e98`) < `env` (`0x7ffe7d6d8ec0`) < arguments (from `0x7ffe7d6da373` to `0x7ffe7d6da387` + `8` (`8` = length of the string `"School!"` (`7`) + `1` for the `'\0'` char)) < environment variables (starting at `0x7ffe7d6da38f`). 311 | 312 | We can also see that all the command line arguments are next to each other in memory, and right next to the environment variables. 313 | 314 | ### Are the `argv` and `env` arrays next to each other? 315 | 316 | The array `argv` is 5 elements long (there were 4 arguments from the command line + 1 `NULL` element at the end (`argv` always ends with `NULL` to mark the end of the array)). Each element is a pointer to a `char` and since we are on a 64-bit machine, a pointer is `8` bytes (if you want to make sure, you can use the C operator `sizeof()` to get the size of a pointer). As a result our `argv` array is of size `5 * 8` = `40`. `40` in base `10` is `0x28` in base `16`. If we add this value to the address of the beginning of the array `0x7ffe7d6d8e98`, we get... `0x7ffe7d6d8ec0` (the address of the `env` array)! So the two arrays are next to each other in memory. 
317 | 318 | ### Is the first command line argument stored right after the `env` array? 319 | 320 | In order to check this we need to know the size of the `env` array. We know that it ends with a `NULL` pointer, so in order to get the number of its elements we simply need to loop through it, checking if the "current" element is `NULL`. Here's the updated C code (`main-5.c`): 321 | 322 | ```c 323 | #include <stdlib.h> 324 | #include <stdio.h> 325 | #include <string.h> 326 | 327 | /** 328 | * main - print locations of various elements 329 | * 330 | * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS 331 | */ 332 | int main(int ac, char **av, char **env) 333 | { 334 | int a; 335 | void *p; 336 | int i; 337 | int size; 338 | 339 | printf("Address of a: %p\n", (void *)&a); 340 | p = malloc(98); 341 | if (p == NULL) 342 | { 343 | fprintf(stderr, "Can't malloc\n"); 344 | return (EXIT_FAILURE); 345 | } 346 | printf("Allocated space in the heap: %p\n", p); 347 | printf("Address of function main: %p\n", (void *)main); 348 | printf("First bytes of the main function:\n\t"); 349 | for (i = 0; i < 15; i++) 350 | { 351 | printf("%02x ", ((unsigned char *)main)[i]); 352 | } 353 | printf("\n"); 354 | printf("Address of the array of arguments: %p\n", (void *)av); 355 | printf("Addresses of the arguments:\n\t"); 356 | for (i = 0; i < ac; i++) 357 | { 358 | printf("[%s]:%p ", av[i], av[i]); 359 | } 360 | printf("\n"); 361 | printf("Address of the array of environment variables: %p\n", (void *)env); 362 | printf("Address of the first environment variables:\n"); 363 | for (i = 0; i < 3; i++) 364 | { 365 | printf("\t[%p]:\"%s\"\n", env[i], env[i]); 366 | } 367 | /* size of the env array */ 368 | i = 0; 369 | while (env[i] != NULL) 370 | { 371 | i++; 372 | } 373 | i++; /* the NULL pointer */ 374 | size = i * sizeof(char *); 375 | printf("Size of the array env: %d elements -> %d bytes (0x%x)\n", i, size, size); 376 | return (EXIT_SUCCESS); 377 | } 378 | ``` 379 | 380 | ```shell 381 | 
julien@holberton:~/holberton/w/hackthevm2$ ./5 Hello Betty Holberton! 382 | Address of a: 0x7ffc77598acc 383 | Allocated space in the heap: 0x2216010 384 | Address of function main: 0x40069d 385 | First bytes of the main function: 386 | 55 48 89 e5 48 83 ec 40 89 7d dc 48 89 75 d0 387 | Address of the array of arguments: 0x7ffc77598bc8 388 | Addresses of the arguments: 389 | [./5]:0x7ffc7759a374 [Hello]:0x7ffc7759a378 [Betty]:0x7ffc7759a37e [Holberton!]:0x7ffc7759a384 390 | Address of the array of environment variables: 0x7ffc77598bf0 391 | Address of the first environment variables: 392 | [0x7ffc7759a38f]:"XDG_VTNR=7" 393 | [0x7ffc7759a39a]:"XDG_SESSION_ID=c2" 394 | [0x7ffc7759a3ac]:"CLUTTER_IM_MODULE=xim" 395 | Size of the array env: 62 elements -> 496 bytes (0x1f0) 396 | julien@holberton:~/holberton/w/hackthevm2$ bc 397 | bc 1.06.95 398 | Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc. 399 | This is free software with ABSOLUTELY NO WARRANTY. 400 | For details type `warranty'. 401 | obase=16 402 | ibase=16 403 | 1F0+7FFC77598BF0 404 | 7FFC77598DE0 405 | quit 406 | julien@holberton:~/holberton/w/hackthevm2$ 407 | ``` 408 | 409 | -> `7FFC77598DE0` != (but still <) `0x7ffc7759a374`. So the answer is no :) 410 | 411 | ### Wrapping up 412 | 413 | Let's update our diagram with what we learned. 414 | 415 | ![virtual memory with command line arguments and environment variables](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/virtual_memory_args_env.png) 416 | 417 | ## Is the stack really growing downwards? 418 | 419 | Let's call a function and figure this out! If this is true, then the variables of the calling function will be higher in memory than those from the called function (`main-6.c`). 
420 | 421 | ```c 422 | #include <stdlib.h> 423 | #include <stdio.h> 424 | #include <string.h> 425 | 426 | /** 427 | * f - print locations of various elements 428 | * 429 | * Returns: nothing 430 | */ 431 | void f(void) 432 | { 433 | int a; 434 | int b; 435 | int c; 436 | 437 | a = 98; 438 | b = 1024; 439 | c = a * b; 440 | printf("[f] a = %d, b = %d, c = a * b = %d\n", a, b, c); 441 | printf("[f] Adresses of a: %p, b = %p, c = %p\n", (void *)&a, (void *)&b, (void *)&c); 442 | } 443 | 444 | /** 445 | * main - print locations of various elements 446 | * 447 | * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS 448 | */ 449 | int main(int ac, char **av, char **env) 450 | { 451 | int a; 452 | void *p; 453 | int i; 454 | int size; 455 | 456 | printf("Address of a: %p\n", (void *)&a); 457 | p = malloc(98); 458 | if (p == NULL) 459 | { 460 | fprintf(stderr, "Can't malloc\n"); 461 | return (EXIT_FAILURE); 462 | } 463 | printf("Allocated space in the heap: %p\n", p); 464 | printf("Address of function main: %p\n", (void *)main); 465 | printf("First bytes of the main function:\n\t"); 466 | for (i = 0; i < 15; i++) 467 | { 468 | printf("%02x ", ((unsigned char *)main)[i]); 469 | } 470 | printf("\n"); 471 | printf("Address of the array of arguments: %p\n", (void *)av); 472 | printf("Addresses of the arguments:\n\t"); 473 | for (i = 0; i < ac; i++) 474 | { 475 | printf("[%s]:%p ", av[i], av[i]); 476 | } 477 | printf("\n"); 478 | printf("Address of the array of environment variables: %p\n", (void *)env); 479 | printf("Address of the first environment variables:\n"); 480 | for (i = 0; i < 3; i++) 481 | { 482 | printf("\t[%p]:\"%s\"\n", env[i], env[i]); 483 | } 484 | /* size of the env array */ 485 | i = 0; 486 | while (env[i] != NULL) 487 | { 488 | i++; 489 | } 490 | i++; /* the NULL pointer */ 491 | size = i * sizeof(char *); 492 | printf("Size of the array env: %d elements -> %d bytes (0x%x)\n", i, size, size); 493 | f(); 494 | return (EXIT_SUCCESS); 495 | } 496 | ``` 497 | 498 | ```shell 499 | 
julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -Werror main-6.c -o 6 500 | julien@holberton:~/holberton/w/hackthevm2$ ./6 501 | Address of a: 0x7ffdae53ea4c 502 | Allocated space in the heap: 0xf32010 503 | Address of function main: 0x4006f9 504 | First bytes of the main function: 505 | 55 48 89 e5 48 83 ec 40 89 7d dc 48 89 75 d0 506 | Address of the array of arguments: 0x7ffdae53eb48 507 | Addresses of the arguments: 508 | [./6]:0x7ffdae54038b 509 | Address of the array of environment variables: 0x7ffdae53eb58 510 | Address of the first environment variables: 511 | [0x7ffdae54038f]:"XDG_VTNR=7" 512 | [0x7ffdae54039a]:"XDG_SESSION_ID=c2" 513 | [0x7ffdae5403ac]:"CLUTTER_IM_MODULE=xim" 514 | Size of the array env: 62 elements -> 496 bytes (0x1f0) 515 | [f] a = 98, b = 1024, c = a * b = 100352 516 | [f] Adresses of a: 0x7ffdae53ea04, b = 0x7ffdae53ea08, c = 0x7ffdae53ea0c 517 | julien@holberton:~/holberton/w/hackthevm2$ 518 | ``` 519 | 520 | -> True! (address of var `a` in function `f`) `0x7ffdae53ea04` < `0x7ffdae53ea4c` (address of var `a` in function `main`) 521 | 522 | We now update our diagram: 523 | 524 | ![stack is growing downwards](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/virtual_memory_stack.png) 525 | 526 | ## `/proc` 527 | 528 | Let's double check everything we found so far with `/proc/[pid]/maps` (`man proc` or refer to the first article in this series to learn about the `proc` filesystem if you don't know what it is). 
529 | 530 | Let's add a `getchar()` to our program so that we can look at its "`/proc`" (`main-7.c`): 531 | 532 | ```c 533 | #include <stdlib.h> 534 | #include <stdio.h> 535 | #include <string.h> 536 | 537 | /** 538 | * f - print locations of various elements 539 | * 540 | * Returns: nothing 541 | */ 542 | void f(void) 543 | { 544 | int a; 545 | int b; 546 | int c; 547 | 548 | a = 98; 549 | b = 1024; 550 | c = a * b; 551 | printf("[f] a = %d, b = %d, c = a * b = %d\n", a, b, c); 552 | printf("[f] Adresses of a: %p, b = %p, c = %p\n", (void *)&a, (void *)&b, (void *)&c); 553 | } 554 | 555 | /** 556 | * main - print locations of various elements 557 | * 558 | * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS 559 | */ 560 | int main(int ac, char **av, char **env) 561 | { 562 | int a; 563 | void *p; 564 | int i; 565 | int size; 566 | 567 | printf("Address of a: %p\n", (void *)&a); 568 | p = malloc(98); 569 | if (p == NULL) 570 | { 571 | fprintf(stderr, "Can't malloc\n"); 572 | return (EXIT_FAILURE); 573 | } 574 | printf("Allocated space in the heap: %p\n", p); 575 | printf("Address of function main: %p\n", (void *)main); 576 | printf("First bytes of the main function:\n\t"); 577 | for (i = 0; i < 15; i++) 578 | { 579 | printf("%02x ", ((unsigned char *)main)[i]); 580 | } 581 | printf("\n"); 582 | printf("Address of the array of arguments: %p\n", (void *)av); 583 | printf("Addresses of the arguments:\n\t"); 584 | for (i = 0; i < ac; i++) 585 | { 586 | printf("[%s]:%p ", av[i], av[i]); 587 | } 588 | printf("\n"); 589 | printf("Address of the array of environment variables: %p\n", (void *)env); 590 | printf("Address of the first environment variables:\n"); 591 | for (i = 0; i < 3; i++) 592 | { 593 | printf("\t[%p]:\"%s\"\n", env[i], env[i]); 594 | } 595 | /* size of the env array */ 596 | i = 0; 597 | while (env[i] != NULL) 598 | { 599 | i++; 600 | } 601 | i++; /* the NULL pointer */ 602 | size = i * sizeof(char *); 603 | printf("Size of the array env: %d elements -> %d bytes (0x%x)\n", 
i, size, size); 604 | f(); 605 | getchar(); 606 | return (EXIT_SUCCESS); 607 | } 608 | ``` 609 | 610 | ```shell 611 | julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -Werror main-7.c -o 7 612 | julien@holberton:~/holberton/w/hackthevm2$ ./7 Rona is a Legend SRE 613 | Address of a: 0x7fff16c8146c 614 | Allocated space in the heap: 0x2050010 615 | Address of function main: 0x400739 616 | First bytes of the main function: 617 | 55 48 89 e5 48 83 ec 40 89 7d dc 48 89 75 d0 618 | Address of the array of arguments: 0x7fff16c81568 619 | Addresses of the arguments: 620 | [./7]:0x7fff16c82376 [Rona]:0x7fff16c8237a [is]:0x7fff16c8237f [a]:0x7fff16c82382 [Legend]:0x7fff16c82384 [SRE]:0x7fff16c8238b 621 | Address of the array of environment variables: 0x7fff16c815a0 622 | Address of the first environment variables: 623 | [0x7fff16c8238f]:"XDG_VTNR=7" 624 | [0x7fff16c8239a]:"XDG_SESSION_ID=c2" 625 | [0x7fff16c823ac]:"CLUTTER_IM_MODULE=xim" 626 | Size of the array env: 62 elements -> 496 bytes (0x1f0) 627 | [f] a = 98, b = 1024, c = a * b = 100352 628 | [f] Adresses of a: 0x7fff16c81424, b = 0x7fff16c81428, c = 0x7fff16c8142c 629 | 630 | ``` 631 | 632 | ```shell 633 | julien@holberton:~$ ps aux | grep "./7" | grep -v grep 634 | julien 5788 0.0 0.0 4336 628 pts/8 S+ 18:04 0:00 ./7 Rona is a Legend SRE 635 | julien@holberton:~$ cat /proc/5788/maps 636 | 00400000-00401000 r-xp 00000000 08:01 171828 /home/julien/holberton/w/hackthevm2/7 637 | 00600000-00601000 r--p 00000000 08:01 171828 /home/julien/holberton/w/hackthevm2/7 638 | 00601000-00602000 rw-p 00001000 08:01 171828 /home/julien/holberton/w/hackthevm2/7 639 | 02050000-02071000 rw-p 00000000 00:00 0 [heap] 640 | 7f68caa1c000-7f68cabd6000 r-xp 00000000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so 641 | 7f68cabd6000-7f68cadd6000 ---p 001ba000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so 642 | 7f68cadd6000-7f68cadda000 r--p 001ba000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so 643 | 
7f68cadda000-7f68caddc000 rw-p 001be000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so 644 | 7f68caddc000-7f68cade1000 rw-p 00000000 00:00 0 645 | 7f68cade1000-7f68cae04000 r-xp 00000000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so 646 | 7f68cafe8000-7f68cafeb000 rw-p 00000000 00:00 0 647 | 7f68cafff000-7f68cb003000 rw-p 00000000 00:00 0 648 | 7f68cb003000-7f68cb004000 r--p 00022000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so 649 | 7f68cb004000-7f68cb005000 rw-p 00023000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so 650 | 7f68cb005000-7f68cb006000 rw-p 00000000 00:00 0 651 | 7fff16c62000-7fff16c83000 rw-p 00000000 00:00 0 [stack] 652 | 7fff16d07000-7fff16d09000 r--p 00000000 00:00 0 [vvar] 653 | 7fff16d09000-7fff16d0b000 r-xp 00000000 00:00 0 [vdso] 654 | ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] 655 | julien@holberton:~$ 656 | ``` 657 | 658 | Let's check a few things: 659 | 660 | - The stack starts at `7fff16c62000` and ends at `7fff16c83000`. Our variables are all inside this region (`0x7fff16c8146c`, `0x7fff16c81424`, `0x7fff16c81428`, `0x7fff16c8142c`) 661 | - The heap starts at `02050000` and ends at `02071000`. Our allocated memory is in there (`0x2050010`) 662 | - Our code (the `main` function) is located at address `0x400739`, so in the following region: 663 | `00400000-00401000 r-xp 00000000 08:01 171828 /home/julien/holberton/w/hackthevm2/7` 664 | It comes from the file `/home/julien/holberton/w/hackthevm2/7` (our executable) and this region has execution permissions, which also makes sense. 665 | - The arguments and environment variables (from `0x7fff16c81568` to `0x7fff16c8238f` + `0x1f0`) are located in the region starting at `7fff16c62000` and ending at `7fff16c83000`... the stack! :) So they are IN the stack, not outside the stack. 666 | 667 | This also brings up more questions: 668 | 669 | - Why is our executable "divided" into three different regions with different permissions? What is inside these two regions? 
670 | - `00600000-00601000 r--p 00000000 08:01 171828 /home/julien/holberton/w/hackthevm2/7` 671 | - `00601000-00602000 rw-p 00001000 08:01 171828 /home/julien/holberton/w/hackthevm2/7` 672 | - What are all those other regions? 673 | - Why does our allocated memory not start at the very beginning of the heap (`0x2050010` vs `02050000`)? What are those first 16 bytes used for? 674 | 675 | There is also another fact that we haven't checked: Is the heap actually growing upwards? 676 | 677 | We'll find out another day! But before we end this chapter, let's update our diagram with everything we've learned: 678 | 679 | ![the virtual memory](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/virtual_memory_args_stack.png) 680 | 681 | ## Outro 682 | 683 | We have learned a ton of things by simply printing information from our executables! But we still have open questions that we will explore in a future chapter to complete our diagram of the virtual memory. In the meantime, you should try to find the answers yourself. 684 | 685 | ### Questions? Feedback? 686 | 687 | If you have questions or feedback, don't hesitate to ping us on Twitter at [@holbertonschool](https://twitter.com/holbertonschool) or [@julienbarbier42](https://twitter.com/julienbarbier42). 688 | _Haters, please send your comments to `/dev/null`._ 689 | 690 | Happy Hacking! 691 | 692 | ### Thank you for reading! 693 | 694 | As always, no-one is perfect (except [Chuck](http://codesqueeze.com/the-ultimate-top-25-chuck-norris-the-programmer-jokes/) of course), so don't hesitate to [contribute](https://github.com/holbertonschool/Hack-The-Virtual-Memory/tree/master/02.%20What's%20where%20in%20the%20virtual%20memory) or send me your comments if you find anything I missed. 695 | 696 | ### Files 697 | 698 | This repo contains the source code (`main-X.c` files) for programs created in this tutorial. 
699 | 700 | ### Read more about the virtual memory 701 | 702 | Follow [@holbertonschool](https://twitter.com/holbertonschool) or [@julienbarbier42](https://twitter.com/julienbarbier42) on Twitter to get the next chapters! This was the third chapter in our series on the virtual memory. If you missed the previous ones, here are the links to them: 703 | 704 | - Chapter 0: [Hack The Virtual Memory: C strings & /proc](https://blog.holbertonschool.com/hack-the-virtual-memory-c-strings-proc/) 705 | - Chapter 1: [Hack The Virtual Memory: Python bytes](https://blog.holbertonschool.com/hack-the-virtual-memory-python-bytes/) 706 | 707 | _Many thanks to [Tim](https://twitter.com/wintermanc3r) for English proof-reading!_ :) 708 | -------------------------------------------------------------------------------- /02. What's where in the virtual memory/main-0.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | /** 6 | * main - print locations of various elements 7 | * 8 | * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS 9 | */ 10 | int main(void) 11 | { 12 | int a; 13 | 14 | printf("Address of a: %p\n", (void *)&a); 15 | return (EXIT_SUCCESS); 16 | } 17 | -------------------------------------------------------------------------------- /02. What's where in the virtual memory/main-1.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | /** 6 | * main - print locations of various elements 7 | * 8 | * Return: EXIT_FAILURE if something failed. 
Otherwise EXIT_SUCCESS 9 | */ 10 | int main(void) 11 | { 12 | int a; 13 | void *p; 14 | 15 | printf("Address of a: %p\n", (void *)&a); 16 | p = malloc(98); 17 | if (p == NULL) 18 | { 19 | fprintf(stderr, "Can't malloc\n"); 20 | return (EXIT_FAILURE); 21 | } 22 | printf("Allocated space in the heap: %p\n", p); 23 | return (EXIT_SUCCESS); 24 | } 25 | -------------------------------------------------------------------------------- /02. What's where in the virtual memory/main-2.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | /** 6 | * main - print locations of various elements 7 | * 8 | * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS 9 | */ 10 | int main(void) 11 | { 12 | int a; 13 | void *p; 14 | 15 | printf("Address of a: %p\n", (void *)&a); 16 | p = malloc(98); 17 | if (p == NULL) 18 | { 19 | fprintf(stderr, "Can't malloc\n"); 20 | return (EXIT_FAILURE); 21 | } 22 | printf("Allocated space in the heap: %p\n", p); 23 | printf("Address of function main: %p\n", (void *)main); 24 | return (EXIT_SUCCESS); 25 | } 26 | -------------------------------------------------------------------------------- /02. What's where in the virtual memory/main-3.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | /** 6 | * main - print locations of various elements 7 | * 8 | * Return: EXIT_FAILURE if something failed. 
Otherwise EXIT_SUCCESS 9 | */ 10 | int main(void) 11 | { 12 | int a; 13 | void *p; 14 | unsigned int i; 15 | 16 | printf("Address of a: %p\n", (void *)&a); 17 | p = malloc(98); 18 | if (p == NULL) 19 | { 20 | fprintf(stderr, "Can't malloc\n"); 21 | return (EXIT_FAILURE); 22 | } 23 | printf("Allocated space in the heap: %p\n", p); 24 | printf("Address of function main: %p\n", (void *)main); 25 | printf("First bytes of the main function:\n\t"); 26 | for (i = 0; i < 15; i++) 27 | { 28 | printf("%02x ", ((unsigned char *)main)[i]); 29 | } 30 | printf("\n"); 31 | return (EXIT_SUCCESS); 32 | } 33 | -------------------------------------------------------------------------------- /02. What's where in the virtual memory/main-4.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | /** 6 | * main - print locations of various elements 7 | * 8 | * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS 9 | */ 10 | int main(int ac, char **av, char **env) 11 | { 12 | int a; 13 | void *p; 14 | int i; 15 | 16 | printf("Address of a: %p\n", (void *)&a); 17 | p = malloc(98); 18 | if (p == NULL) 19 | { 20 | fprintf(stderr, "Can't malloc\n"); 21 | return (EXIT_FAILURE); 22 | } 23 | printf("Allocated space in the heap: %p\n", p); 24 | printf("Address of function main: %p\n", (void *)main); 25 | printf("First bytes of the main function:\n\t"); 26 | for (i = 0; i < 15; i++) 27 | { 28 | printf("%02x ", ((unsigned char *)main)[i]); 29 | } 30 | printf("\n"); 31 | printf("Address of the array of arguments: %p\n", (void *)av); 32 | printf("Addresses of the arguments:\n\t"); 33 | for (i = 0; i < ac; i++) 34 | { 35 | printf("[%s]:%p ", av[i], av[i]); 36 | } 37 | printf("\n"); 38 | printf("Address of the array of environment variables: %p\n", (void *)env); 39 | printf("Address of the first environment variables:\n"); 40 | for (i = 0; i < 3; i++) 41 | { 42 | printf("\t[%p]:\"%s\"\n", env[i], env[i]); 43 | } 44 | 
return (EXIT_SUCCESS); 45 | } 46 | -------------------------------------------------------------------------------- /02. What's where in the virtual memory/main-5.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | /** 6 | * main - print locations of various elements 7 | * 8 | * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS 9 | */ 10 | int main(int ac, char **av, char **env) 11 | { 12 | int a; 13 | void *p; 14 | int i; 15 | int size; 16 | 17 | printf("Address of a: %p\n", (void *)&a); 18 | p = malloc(98); 19 | if (p == NULL) 20 | { 21 | fprintf(stderr, "Can't malloc\n"); 22 | return (EXIT_FAILURE); 23 | } 24 | printf("Allocated space in the heap: %p\n", p); 25 | printf("Address of function main: %p\n", (void *)main); 26 | printf("First bytes of the main function:\n\t"); 27 | for (i = 0; i < 15; i++) 28 | { 29 | printf("%02x ", ((unsigned char *)main)[i]); 30 | } 31 | printf("\n"); 32 | printf("Address of the array of arguments: %p\n", (void *)av); 33 | printf("Addresses of the arguments:\n\t"); 34 | for (i = 0; i < ac; i++) 35 | { 36 | printf("[%s]:%p ", av[i], av[i]); 37 | } 38 | printf("\n"); 39 | printf("Address of the array of environment variables: %p\n", (void *)env); 40 | printf("Address of the first environment variables:\n"); 41 | for (i = 0; i < 3; i++) 42 | { 43 | printf("\t[%p]:\"%s\"\n", env[i], env[i]); 44 | } 45 | /* size of the env array */ 46 | i = 0; 47 | while (env[i] != NULL) 48 | { 49 | i++; 50 | } 51 | i++; /* the NULL pointer */ 52 | size = i * sizeof(char *); 53 | printf("Size of the array env: %d elements -> %d bytes (0x%x)\n", i, size, size); 54 | return (EXIT_SUCCESS); 55 | } 56 | -------------------------------------------------------------------------------- /02. 
What's where in the virtual memory/main-6.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | /** 6 | * f - print locations of various elements 7 | * 8 | * Returns: nothing 9 | */ 10 | void f(void) 11 | { 12 | int a; 13 | int b; 14 | int c; 15 | 16 | a = 98; 17 | b = 1024; 18 | c = a * b; 19 | printf("[f] a = %d, b = %d, c = a * b = %d\n", a, b, c); 20 | printf("[f] Adresses of a: %p, b = %p, c = %p\n", (void *)&a, (void *)&b, (void *)&c); 21 | } 22 | 23 | /** 24 | * main - print locations of various elements 25 | * 26 | * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS 27 | */ 28 | int main(int ac, char **av, char **env) 29 | { 30 | int a; 31 | void *p; 32 | int i; 33 | int size; 34 | 35 | printf("Address of a: %p\n", (void *)&a); 36 | p = malloc(98); 37 | if (p == NULL) 38 | { 39 | fprintf(stderr, "Can't malloc\n"); 40 | return (EXIT_FAILURE); 41 | } 42 | printf("Allocated space in the heap: %p\n", p); 43 | printf("Address of function main: %p\n", (void *)main); 44 | printf("First bytes of the main function:\n\t"); 45 | for (i = 0; i < 15; i++) 46 | { 47 | printf("%02x ", ((unsigned char *)main)[i]); 48 | } 49 | printf("\n"); 50 | printf("Address of the array of arguments: %p\n", (void *)av); 51 | printf("Addresses of the arguments:\n\t"); 52 | for (i = 0; i < ac; i++) 53 | { 54 | printf("[%s]:%p ", av[i], av[i]); 55 | } 56 | printf("\n"); 57 | printf("Address of the array of environment variables: %p\n", (void *)env); 58 | printf("Address of the first environment variables:\n"); 59 | for (i = 0; i < 3; i++) 60 | { 61 | printf("\t[%p]:\"%s\"\n", env[i], env[i]); 62 | } 63 | /* size of the env array */ 64 | i = 0; 65 | while (env[i] != NULL) 66 | { 67 | i++; 68 | } 69 | i++; /* the NULL pointer */ 70 | size = i * sizeof(char *); 71 | printf("Size of the array env: %d elements -> %d bytes (0x%x)\n", i, size, size); 72 | f(); 73 | return (EXIT_SUCCESS); 74 | } 75 | 
-------------------------------------------------------------------------------- /02. What's where in the virtual memory/main-7.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | /** 6 | * f - print locations of various elements 7 | * 8 | * Returns: nothing 9 | */ 10 | void f(void) 11 | { 12 | int a; 13 | int b; 14 | int c; 15 | 16 | a = 98; 17 | b = 1024; 18 | c = a * b; 19 | printf("[f] a = %d, b = %d, c = a * b = %d\n", a, b, c); 20 | printf("[f] Adresses of a: %p, b = %p, c = %p\n", (void *)&a, (void *)&b, (void *)&c); 21 | } 22 | 23 | /** 24 | * main - print locations of various elements 25 | * 26 | * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS 27 | */ 28 | int main(int ac, char **av, char **env) 29 | { 30 | int a; 31 | void *p; 32 | int i; 33 | int size; 34 | 35 | printf("Address of a: %p\n", (void *)&a); 36 | p = malloc(98); 37 | if (p == NULL) 38 | { 39 | fprintf(stderr, "Can't malloc\n"); 40 | return (EXIT_FAILURE); 41 | } 42 | printf("Allocated space in the heap: %p\n", p); 43 | printf("Address of function main: %p\n", (void *)main); 44 | printf("First bytes of the main function:\n\t"); 45 | for (i = 0; i < 15; i++) 46 | { 47 | printf("%02x ", ((unsigned char *)main)[i]); 48 | } 49 | printf("\n"); 50 | printf("Address of the array of arguments: %p\n", (void *)av); 51 | printf("Addresses of the arguments:\n\t"); 52 | for (i = 0; i < ac; i++) 53 | { 54 | printf("[%s]:%p ", av[i], av[i]); 55 | } 56 | printf("\n"); 57 | printf("Address of the array of environment variables: %p\n", (void *)env); 58 | printf("Address of the first environment variables:\n"); 59 | for (i = 0; i < 3; i++) 60 | { 61 | printf("\t[%p]:\"%s\"\n", env[i], env[i]); 62 | } 63 | /* size of the env array */ 64 | i = 0; 65 | while (env[i] != NULL) 66 | { 67 | i++; 68 | } 69 | i++; /* the NULL pointer */ 70 | size = i * sizeof(char *); 71 | printf("Size of the array env: %d elements -> %d 
bytes (0x%x)\n", i, size, size); 72 | f(); 73 | getchar(); 74 | return (EXIT_SUCCESS); 75 | } 76 | -------------------------------------------------------------------------------- /03. malloc, the heap and the program break/.gitignore: -------------------------------------------------------------------------------- 1 | *~ 2 | \#*\# 3 | .\#* 4 | *.swp 5 | *.swo 6 | a.out 7 | 1 8 | 2 9 | 3 10 | 4 11 | 5 12 | 6 13 | 7 14 | 8 15 | 9 16 | 0 17 | 10 18 | Makefile 19 | -------------------------------------------------------------------------------- /03. malloc, the heap and the program break/0-main.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | /** 5 | * main - do nothing 6 | * 7 | * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS 8 | */ 9 | int main(void) 10 | { 11 | getchar(); 12 | return (EXIT_SUCCESS); 13 | } 14 | -------------------------------------------------------------------------------- /03. malloc, the heap and the program break/1-main.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | /** 5 | * main - 1 call to malloc 6 | * 7 | * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS 8 | */ 9 | int main(void) 10 | { 11 | malloc(1); 12 | getchar(); 13 | return (EXIT_SUCCESS); 14 | } 15 | -------------------------------------------------------------------------------- /03. 
malloc, the heap and the program break/10-main.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | /** 6 | * pmem - print mem 7 | * @p: memory address to start printing from 8 | * @bytes: number of bytes to print 9 | * 10 | * Return: nothing 11 | */ 12 | void pmem(void *p, unsigned int bytes) 13 | { 14 | unsigned char *ptr; 15 | unsigned int i; 16 | 17 | ptr = (unsigned char *)p; 18 | for (i = 0; i < bytes; i++) 19 | { 20 | if (i != 0) 21 | { 22 | printf(" "); 23 | } 24 | printf("%02x", *(ptr + i)); 25 | } 26 | printf("\n"); 27 | } 28 | 29 | /** 30 | * main - moving the program break 31 | * 32 | * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS 33 | */ 34 | int main(void) 35 | { 36 | void *p; 37 | size_t size_of_the_chunk; 38 | char prev_used; 39 | 40 | p = malloc(0); 41 | printf("%p\n", p); 42 | pmem((char *)p - 0x10, 0x10); 43 | size_of_the_chunk = *((size_t *)((char *)p - 8)); 44 | prev_used = size_of_the_chunk & 1; 45 | size_of_the_chunk -= prev_used; 46 | printf("chunk size = %li bytes\n", size_of_the_chunk); 47 | return (EXIT_SUCCESS); 48 | } 49 | -------------------------------------------------------------------------------- /03. malloc, the heap and the program break/2-main.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | /** 5 | * main - prints the malloc returned address 6 | * 7 | * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS 8 | */ 9 | int main(void) 10 | { 11 | void *p; 12 | 13 | p = malloc(1); 14 | printf("%p\n", p); 15 | getchar(); 16 | return (EXIT_SUCCESS); 17 | } 18 | -------------------------------------------------------------------------------- /03. 
malloc, the heap and the program break/3-main.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | /** 6 | * main - let's find out which syscall malloc is using 7 | * 8 | * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS 9 | */ 10 | int main(void) 11 | { 12 | void *p; 13 | 14 | write(1, "BEFORE MALLOC\n", 14); 15 | p = malloc(1); 16 | write(1, "AFTER MALLOC\n", 13); 17 | printf("%p\n", p); 18 | getchar(); 19 | return (EXIT_SUCCESS); 20 | } 21 | -------------------------------------------------------------------------------- /03. malloc, the heap and the program break/4-main.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | /** 6 | * main - many calls to malloc 7 | * 8 | * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS 9 | */ 10 | int main(void) 11 | { 12 | void *p; 13 | 14 | write(1, "BEFORE MALLOC #0\n", 17); 15 | p = malloc(1024); 16 | write(1, "AFTER MALLOC #0\n", 16); 17 | printf("%p\n", p); 18 | 19 | write(1, "BEFORE MALLOC #1\n", 17); 20 | p = malloc(1024); 21 | write(1, "AFTER MALLOC #1\n", 16); 22 | printf("%p\n", p); 23 | 24 | write(1, "BEFORE MALLOC #2\n", 17); 25 | p = malloc(1024); 26 | write(1, "AFTER MALLOC #2\n", 16); 27 | printf("%p\n", p); 28 | 29 | write(1, "BEFORE MALLOC #3\n", 17); 30 | p = malloc(1024); 31 | write(1, "AFTER MALLOC #3\n", 16); 32 | printf("%p\n", p); 33 | 34 | getchar(); 35 | return (EXIT_SUCCESS); 36 | } 37 | -------------------------------------------------------------------------------- /03. 
malloc, the heap and the program break/5-main.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | /** 6 | * pmem - print mem 7 | * @p: memory address to start printing from 8 | * @bytes: number of bytes to print 9 | * 10 | * Return: nothing 11 | */ 12 | void pmem(void *p, unsigned int bytes) 13 | { 14 | unsigned char *ptr; 15 | unsigned int i; 16 | 17 | ptr = (unsigned char *)p; 18 | for (i = 0; i < bytes; i++) 19 | { 20 | if (i != 0) 21 | { 22 | printf(" "); 23 | } 24 | printf("%02x", *(ptr + i)); 25 | } 26 | printf("\n"); 27 | } 28 | 29 | /** 30 | * main - the 0x10 lost bytes 31 | * 32 | * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS 33 | */ 34 | int main(void) 35 | { 36 | void *p; 37 | int i; 38 | 39 | for (i = 0; i < 10; i++) 40 | { 41 | p = malloc(1024 * (i + 1)); 42 | printf("%p\n", p); 43 | printf("bytes at %p:\n", (void *)((char *)p - 0x10)); 44 | pmem((char *)p - 0x10, 0x10); 45 | } 46 | return (EXIT_SUCCESS); 47 | } 48 | -------------------------------------------------------------------------------- /03. malloc, the heap and the program break/6-main.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | /** 6 | * pmem - print mem 7 | * @p: memory address to start printing from 8 | * @bytes: number of bytes to print 9 | * 10 | * Return: nothing 11 | */ 12 | void pmem(void *p, unsigned int bytes) 13 | { 14 | unsigned char *ptr; 15 | unsigned int i; 16 | 17 | ptr = (unsigned char *)p; 18 | for (i = 0; i < bytes; i++) 19 | { 20 | if (i != 0) 21 | { 22 | printf(" "); 23 | } 24 | printf("%02x", *(ptr + i)); 25 | } 26 | printf("\n"); 27 | } 28 | 29 | /** 30 | * main - using the 0x10 bytes to jump to next malloc'ed chunks 31 | * 32 | * Return: EXIT_FAILURE if something failed. 
Otherwise EXIT_SUCCESS 33 | */ 34 | int main(void) 35 | { 36 | void *p; 37 | int i; 38 | void *heap_start; 39 | size_t size_of_the_block; 40 | 41 | heap_start = sbrk(0); 42 | write(1, "START\n", 6); 43 | for (i = 0; i < 10; i++) 44 | { 45 | p = malloc(1024 * (i + 1)); 46 | *((int *)p) = i; 47 | printf("%p: [%i]\n", p, i); 48 | } 49 | p = heap_start; 50 | for (i = 0; i < 10; i++) 51 | { 52 | pmem(p, 0x10); 53 | size_of_the_block = *((size_t *)((char *)p + 8)) - 1; 54 | printf("%p: [%i] - size = %li\n", 55 | (void *)((char *)p + 0x10), 56 | *((int *)((char *)p + 0x10)), 57 | size_of_the_block); 58 | p = (void *)((char *)p + size_of_the_block); 59 | } 60 | write(1, "END\n", 4); 61 | return (EXIT_SUCCESS); 62 | } 63 | -------------------------------------------------------------------------------- /03. malloc, the heap and the program break/7-main.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | /** 6 | * pmem - print mem 7 | * @p: memory address to start printing from 8 | * @bytes: number of bytes to print 9 | * 10 | * Return: nothing 11 | */ 12 | void pmem(void *p, unsigned int bytes) 13 | { 14 | unsigned char *ptr; 15 | unsigned int i; 16 | 17 | ptr = (unsigned char *)p; 18 | for (i = 0; i < bytes; i++) 19 | { 20 | if (i != 0) 21 | { 22 | printf(" "); 23 | } 24 | printf("%02x", *(ptr + i)); 25 | } 26 | printf("\n"); 27 | } 28 | 29 | /** 30 | * main - confirm the source code 31 | * 32 | * Return: EXIT_FAILURE if something failed. 
Otherwise EXIT_SUCCESS 33 | */ 34 | int main(void) 35 | { 36 | void *p; 37 | int i; 38 | size_t size_of_the_chunk; 39 | size_t size_of_the_previous_chunk; 40 | void *chunks[10]; 41 | 42 | for (i = 0; i < 10; i++) 43 | { 44 | p = malloc(1024 * (i + 1)); 45 | chunks[i] = (void *)((char *)p - 0x10); 46 | printf("%p\n", p); 47 | } 48 | free((char *)(chunks[3]) + 0x10); 49 | free((char *)(chunks[7]) + 0x10); 50 | for (i = 0; i < 10; i++) 51 | { 52 | p = chunks[i]; 53 | printf("chunks[%d]: ", i); 54 | pmem(p, 0x10); 55 | size_of_the_chunk = *((size_t *)((char *)p + 8)) - 1; 56 | size_of_the_previous_chunk = *((size_t *)((char *)p)); 57 | printf("chunks[%d]: %p, size = %li, prev = %li\n", 58 | i, p, size_of_the_chunk, size_of_the_previous_chunk); 59 | } 60 | return (EXIT_SUCCESS); 61 | } 62 | -------------------------------------------------------------------------------- /03. malloc, the heap and the program break/8-main.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | /** 6 | * pmem - print mem 7 | * @p: memory address to start printing from 8 | * @bytes: number of bytes to print 9 | * 10 | * Return: nothing 11 | */ 12 | void pmem(void *p, unsigned int bytes) 13 | { 14 | unsigned char *ptr; 15 | unsigned int i; 16 | 17 | ptr = (unsigned char *)p; 18 | for (i = 0; i < bytes; i++) 19 | { 20 | if (i != 0) 21 | { 22 | printf(" "); 23 | } 24 | printf("%02x", *(ptr + i)); 25 | } 26 | printf("\n"); 27 | } 28 | 29 | /** 30 | * main - updating with correct checks 31 | * 32 | * Return: EXIT_FAILURE if something failed. 
Otherwise EXIT_SUCCESS 33 | */ 34 | int main(void) 35 | { 36 | void *p; 37 | int i; 38 | size_t size_of_the_chunk; 39 | size_t size_of_the_previous_chunk; 40 | void *chunks[10]; 41 | char prev_used; 42 | 43 | for (i = 0; i < 10; i++) 44 | { 45 | p = malloc(1024 * (i + 1)); 46 | chunks[i] = (void *)((char *)p - 0x10); 47 | } 48 | free((char *)(chunks[3]) + 0x10); 49 | free((char *)(chunks[7]) + 0x10); 50 | for (i = 0; i < 10; i++) 51 | { 52 | p = chunks[i]; 53 | printf("chunks[%d]: ", i); 54 | pmem(p, 0x10); 55 | size_of_the_chunk = *((size_t *)((char *)p + 8)); 56 | prev_used = size_of_the_chunk & 1; 57 | size_of_the_chunk -= prev_used; 58 | size_of_the_previous_chunk = *((size_t *)((char *)p)); 59 | printf("chunks[%d]: %p, size = %li, prev (%s) = %li\n", 60 | i, p, size_of_the_chunk, 61 | (prev_used? "allocated": "unallocated"), size_of_the_previous_chunk); 62 | } 63 | return (EXIT_SUCCESS); 64 | } 65 | -------------------------------------------------------------------------------- /03. malloc, the heap and the program break/9-main.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | /** 6 | * main - moving the program break 7 | * 8 | * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS 9 | */ 10 | int main(void) 11 | { 12 | int i; 13 | 14 | write(1, "START\n", 6); 15 | malloc(1); 16 | getchar(); 17 | write(1, "LOOP\n", 5); 18 | for (i = 0; i < 0x25000 / 1024; i++) 19 | { 20 | malloc(1024); 21 | } 22 | write(1, "END\n", 4); 23 | getchar(); 24 | return (EXIT_SUCCESS); 25 | } 26 | -------------------------------------------------------------------------------- /03. 
malloc, the heap and the program break/README.md: -------------------------------------------------------------------------------- 1 | ![Hack the VM!](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/htvm3.png) 2 | 3 | # Hack The Virtual Memory, chapter 3: malloc, the heap and the program break 4 | 5 | This is the fourth chapter in a series about virtual memory. The goal of this series is to learn some CS basics, but in a different and more practical way. 6 | 7 | If you missed the previous chapters, you should probably start there: 8 | 9 | - Chapter 0: [Hack The Virtual Memory: C strings & /proc](https://blog.holbertonschool.com/hack-the-virtual-memory-c-strings-proc/) 10 | - Chapter 1: [Hack The Virtual Memory: Python bytes](https://blog.holbertonschool.com/hack-the-virtual-memory-python-bytes/) 11 | - Chapter 2: [Hack The Virtual Memory: Drawing the VM diagram](https://blog.holbertonschool.com/hack-the-virtual-memory-drawing-the-vm-diagram/) 12 | 13 | ## The heap 14 | 15 | In this chapter we will look at the heap and `malloc` in order to answer some of the questions left open at the end of the [previous chapter](https://blog.holbertonschool.com/hack-the-virtual-memory-drawing-the-vm-diagram/): 16 | 17 | - Why doesn't our allocated memory start at the very beginning of the heap (`0x2050010` vs `0x2050000`)? What are those first 16 bytes used for? 18 | - Is the heap actually growing upwards? 
19 | 20 | ## Prerequisites 21 | 22 | In order to fully understand this article, you will need to know: 23 | 24 | - The basics of the C programming language (especially pointers) 25 | - The very basics of the Linux filesystem and the shell 26 | - We will also use the `/proc/[pid]/maps` file (see `man proc` or read our first article [Hack The Virtual Memory, chapter 0: C strings & /proc](https://blog.holbertonschool.com/hack-the-virtual-memory-c-strings-proc/)) 27 | 28 | ## Environment 29 | 30 | All scripts and programs have been tested on the following system: 31 | 32 | - Ubuntu 33 | - Linux ubuntu 4.4.0-31-generic #50~14.04.1-Ubuntu SMP Wed Jul 13 01:07:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux 34 | 35 | Tools used: 36 | 37 | - gcc 38 | - gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4 39 | - glibc 2.19 (see [version.c](https://github.com/holbertonschool/Hack-The-Virtual-Memory/blob/master/03.%20malloc%2C%20the%20heap%20and%20the%20program%20break/version.c) if you need to check your glibc version) 40 | - strace 41 | - strace -- version 4.8 42 | 43 | **Everything we will write will be true for this system/environment, but may be different on another system** 44 | 45 | We will also go through the Linux source code. If you are on Ubuntu, you can download the sources of your current kernel by running this command: 46 | 47 | ``` 48 | apt-get source linux-image-$(uname -r) 49 | ``` 50 | 51 | ## `malloc` 52 | 53 | `malloc` is the common function used to dynamically allocate memory. This memory is allocated on the "heap". 54 | _Note: `malloc` is not a system call._ 55 | 56 | From `man malloc`: 57 | 58 | ``` 59 | [...] allocate dynamic memory[...] 60 | void *malloc(size_t size); 61 | [...] 62 | The malloc() function allocates size bytes and returns a pointer to the allocated memory. 63 | ``` 64 | 65 | ### No malloc, no [heap] 66 | 67 | Let's look at memory regions of a process that does not call `malloc` (`0-main.c`). 
68 | 69 | ```C 70 | #include 71 | #include 72 | 73 | /** 74 | * main - do nothing 75 | * 76 | * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS 77 | */ 78 | int main(void) 79 | { 80 | getchar(); 81 | return (EXIT_SUCCESS); 82 | } 83 | 84 | ``` 85 | 86 | ``` 87 | julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 0-main.c -o 0 88 | julien@holberton:~/holberton/w/hackthevm3$ ./0 89 | 90 | ``` 91 | 92 | _Quick reminder (1/3): the memory regions of a process are listed in the `/proc/[pid]/maps` file. As a result, we first need to know the PID of the process. That is done using the `ps` command; the second column of `ps aux` output will give us the PID of the process. Please read [chapter 0](https://blog.holbertonschool.com/hack-the-virtual-memory-c-strings-proc/) to learn more._ 93 | 94 | ``` 95 | julien@holberton:/tmp$ ps aux | grep \ \./0$ 96 | julien 3638 0.0 0.0 4200 648 pts/9 S+ 12:01 0:00 ./0 97 | ``` 98 | 99 | _Quick reminder (2/3): from the above output, we can see that the PID of the process we want to look at is `3638`. As a result, the `maps` file will be found in the directory `/proc/3638`._ 100 | 101 | ``` 102 | julien@holberton:/tmp$ cd /proc/3638 103 | ``` 104 | 105 | _Quick reminder (3/3): The `maps` file contains the memory regions of the process. The format of each line in this file is: 106 | address perms offset dev inode pathname_ 107 | 108 | ``` 109 | julien@holberton:/proc/3638$ cat maps 110 | 00400000-00401000 r-xp 00000000 08:01 174583 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/0 111 | 00600000-00601000 r--p 00000000 08:01 174583 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/0 112 | 00601000-00602000 rw-p 00001000 08:01 174583 /home/julien/holberton/w/hack_the_virtual_memory/03. 
The Heap/0 113 | 7f38f87d7000-7f38f8991000 r-xp 00000000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so 114 | 7f38f8991000-7f38f8b91000 ---p 001ba000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so 115 | 7f38f8b91000-7f38f8b95000 r--p 001ba000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so 116 | 7f38f8b95000-7f38f8b97000 rw-p 001be000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so 117 | 7f38f8b97000-7f38f8b9c000 rw-p 00000000 00:00 0 118 | 7f38f8b9c000-7f38f8bbf000 r-xp 00000000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so 119 | 7f38f8da3000-7f38f8da6000 rw-p 00000000 00:00 0 120 | 7f38f8dbb000-7f38f8dbe000 rw-p 00000000 00:00 0 121 | 7f38f8dbe000-7f38f8dbf000 r--p 00022000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so 122 | 7f38f8dbf000-7f38f8dc0000 rw-p 00023000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so 123 | 7f38f8dc0000-7f38f8dc1000 rw-p 00000000 00:00 0 124 | 7ffdd85c5000-7ffdd85e6000 rw-p 00000000 00:00 0 [stack] 125 | 7ffdd85f2000-7ffdd85f4000 r--p 00000000 00:00 0 [vvar] 126 | 7ffdd85f4000-7ffdd85f6000 r-xp 00000000 00:00 0 [vdso] 127 | ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] 128 | julien@holberton:/proc/3638$ 129 | ``` 130 | 131 | _Note: `hackthevm3` is a symbolic link to `hack_the_virtual_memory/03. The Heap/`_ 132 | 133 | -> As we can see from the above maps file, there’s no [heap] region allocated. 134 | 135 | ### `malloc(x)` 136 | 137 | Let's do the same but with a program that calls `malloc` (`1-main.c`): 138 | 139 | ```C 140 | #include 141 | #include 142 | 143 | /** 144 | * main - 1 call to malloc 145 | * 146 | * Return: EXIT_FAILURE if something failed. 
Otherwise EXIT_SUCCESS 147 | */ 148 | int main(void) 149 | { 150 | malloc(1); 151 | getchar(); 152 | return (EXIT_SUCCESS); 153 | } 154 | ``` 155 | 156 | ``` 157 | julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 1-main.c -o 1 158 | julien@holberton:~/holberton/w/hackthevm3$ ./1 159 | 160 | ``` 161 | 162 | ``` 163 | julien@holberton:/proc/3638$ ps aux | grep \ \./1$ 164 | julien 3718 0.0 0.0 4332 660 pts/9 S+ 12:09 0:00 ./1 165 | julien@holberton:/proc/3638$ cd /proc/3718 166 | julien@holberton:/proc/3718$ cat maps 167 | 00400000-00401000 r-xp 00000000 08:01 176964 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/1 168 | 00600000-00601000 r--p 00000000 08:01 176964 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/1 169 | 00601000-00602000 rw-p 00001000 08:01 176964 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/1 170 | 01195000-011b6000 rw-p 00000000 00:00 0 [heap] 171 | ... 172 | julien@holberton:/proc/3718$ 173 | ``` 174 | 175 | -> the [heap] is here. 176 | 177 | Let's check the return value of `malloc` to make sure the returned address is in the heap region (`2-main.c`): 178 | 179 | ```C 180 | #include 181 | #include 182 | 183 | /** 184 | * main - prints the malloc returned address 185 | * 186 | * Return: EXIT_FAILURE if something failed. 
Otherwise EXIT_SUCCESS 187 | */ 188 | int main(void) 189 | { 190 | void *p; 191 | 192 | p = malloc(1); 193 | printf("%p\n", p); 194 | getchar(); 195 | return (EXIT_SUCCESS); 196 | } 197 | ``` 198 | 199 | ``` 200 | julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 2-main.c -o 2 201 | julien@holberton:~/holberton/w/hackthevm3$ ./2 202 | 0x24d6010 203 | 204 | ``` 205 | 206 | ``` 207 | julien@holberton:/proc/3718$ ps aux | grep \ \./2$ 208 | julien 3834 0.0 0.0 4336 676 pts/9 S+ 12:48 0:00 ./2 209 | julien@holberton:/proc/3718$ cd /proc/3834 210 | julien@holberton:/proc/3834$ cat maps 211 | 00400000-00401000 r-xp 00000000 08:01 176966 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/2 212 | 00600000-00601000 r--p 00000000 08:01 176966 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/2 213 | 00601000-00602000 rw-p 00001000 08:01 176966 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/2 214 | 024d6000-024f7000 rw-p 00000000 00:00 0 [heap] 215 | ... 216 | julien@holberton:/proc/3834$ 217 | ``` 218 | 219 | -> `024d6000` <`0x24d6010` < `024f7000` 220 | 221 | The returned address is inside the heap region. And as we have seen in the [previous chapter](https://blog.holbertonschool.com/hack-the-virtual-memory-drawing-the-vm-diagram/), the returned address does not start exactly at the beginning of the region; we'll see why later. 222 | 223 | ## `strace`, `brk` and `sbrk` 224 | 225 | `malloc` is a "regular" function (as opposed to a system call), so it must call some kind of syscall in order to manipulate the heap. Let's use `strace` to find out. 226 | 227 | `strace` is a program used to trace system calls and signals. Any program will always use a few syscalls before your `main` function is executed. In order to know which syscalls are used by `malloc`, we will add a `write` syscall before and after the call to `malloc`(`3-main.c`). 
228 | 229 | ``` 230 | #include 231 | #include 232 | #include 233 | 234 | /** 235 | * main - let's find out which syscall malloc is using 236 | * 237 | * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS 238 | */ 239 | int main(void) 240 | { 241 | void *p; 242 | 243 | write(1, "BEFORE MALLOC\n", 14); 244 | p = malloc(1); 245 | write(1, "AFTER MALLOC\n", 13); 246 | printf("%p\n", p); 247 | getchar(); 248 | return (EXIT_SUCCESS); 249 | } 250 | ``` 251 | 252 | ``` 253 | julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 3-main.c -o 3 254 | julien@holberton:~/holberton/w/hackthevm3$ strace ./3 255 | execve("./3", ["./3"], [/* 61 vars */]) = 0 256 | ... 257 | write(1, "BEFORE MALLOC\n", 14BEFORE MALLOC 258 | ) = 14 259 | brk(0) = 0xe70000 260 | brk(0xe91000) = 0xe91000 261 | write(1, "AFTER MALLOC\n", 13AFTER MALLOC 262 | ) = 13 263 | ... 264 | read(0, 265 | ``` 266 | 267 | From the above listing we can focus on this: 268 | 269 | ``` 270 | brk(0) = 0xe70000 271 | brk(0xe91000) = 0xe91000 272 | ``` 273 | 274 | -> `malloc` is using the `brk` system call in order to manipulate the heap. From `brk` man page (`man brk`), we can see what this system call is doing: 275 | 276 | ``` 277 | ... 278 | int brk(void *addr); 279 | void *sbrk(intptr_t increment); 280 | ... 281 | DESCRIPTION 282 | brk() and sbrk() change the location of the program break, which defines 283 | the end of the process's data segment (i.e., the program break is the first 284 | location after the end of the uninitialized data segment). Increasing the 285 | program break has the effect of allocating memory to the process; decreas‐ 286 | ing the break deallocates memory. 287 | 288 | brk() sets the end of the data segment to the value specified by addr, when 289 | that value is reasonable, the system has enough memory, and the process 290 | does not exceed its maximum data size (see setrlimit(2)). 291 | 292 | sbrk() increments the program's data space by increment bytes. 
Calling 293 | sbrk() with an increment of 0 can be used to find the current location of 294 | the program break. 295 | ``` 296 | 297 | The program break is the address of the first location beyond the current end of the data region of the program in the virtual memory. 298 | 299 | ![program break before the call to malloc / brk](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/program-break-before.png) 300 | 301 | By increasing the value of the program break, via `brk` or `sbrk`, the function `malloc` creates a new space that can then be used by the process to dynamically allocate memory (using `malloc`). 302 | 303 | ![program break after the malloc / brk call](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/program-break-after.png) 304 | 305 | So the heap is actually an extension of the data segment of the program. 306 | 307 | The first call to `brk` (`brk(0)`) returns the current address of the program break to `malloc`. The second call is the one that actually creates new memory (since `0xe91000` > `0xe70000`) by increasing the value of the program break. In the above example, the heap now starts at `0xe70000` and ends at `0xe91000`. Let's double check with the `/proc/[PID]/maps` file: 308 | 309 | ``` 310 | julien@holberton:/proc/3855$ ps aux | grep \ \./3$ 311 | julien 4011 0.0 0.0 4748 708 pts/9 S+ 13:04 0:00 strace ./3 312 | julien 4014 0.0 0.0 4336 644 pts/9 S+ 13:04 0:00 ./3 313 | julien@holberton:/proc/3855$ cd /proc/4014 314 | julien@holberton:/proc/4014$ cat maps 315 | 00400000-00401000 r-xp 00000000 08:01 176967 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/3 316 | 00600000-00601000 r--p 00000000 08:01 176967 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/3 317 | 00601000-00602000 rw-p 00001000 08:01 176967 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/3 318 | 00e70000-00e91000 rw-p 00000000 00:00 0 [heap] 319 | ... 
320 | julien@holberton:/proc/4014$ 321 | ``` 322 | 323 | -> `00e70000-00e91000 rw-p 00000000 00:00 0 [heap]` matches the pointers returned to `malloc` by `brk`. 324 | 325 | That's great, but wait, why did `malloc` increment the heap by `00e91000` - `00e70000` = `0x21000` or `135168` bytes, when we asked for only 1 byte? 326 | 327 | ## Many mallocs 328 | 329 | What will happen if we call `malloc` several times? (`4-main.c`) 330 | 331 | ```C 332 | #include <stdio.h> 333 | #include <stdlib.h> 334 | #include <unistd.h> 335 | 336 | /** 337 | * main - many calls to malloc 338 | * 339 | * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS 340 | */ 341 | int main(void) 342 | { 343 | void *p; 344 | 345 | write(1, "BEFORE MALLOC #0\n", 17); 346 | p = malloc(1024); 347 | write(1, "AFTER MALLOC #0\n", 16); 348 | printf("%p\n", p); 349 | 350 | write(1, "BEFORE MALLOC #1\n", 17); 351 | p = malloc(1024); 352 | write(1, "AFTER MALLOC #1\n", 16); 353 | printf("%p\n", p); 354 | 355 | write(1, "BEFORE MALLOC #2\n", 17); 356 | p = malloc(1024); 357 | write(1, "AFTER MALLOC #2\n", 16); 358 | printf("%p\n", p); 359 | 360 | write(1, "BEFORE MALLOC #3\n", 17); 361 | p = malloc(1024); 362 | write(1, "AFTER MALLOC #3\n", 16); 363 | printf("%p\n", p); 364 | 365 | getchar(); 366 | return (EXIT_SUCCESS); 367 | } 368 | ``` 369 | 370 | ``` 371 | julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 4-main.c -o 4 372 | julien@holberton:~/holberton/w/hackthevm3$ strace ./4 373 | execve("./4", ["./4"], [/* 61 vars */]) = 0 374 | ... 375 | write(1, "BEFORE MALLOC #0\n", 17BEFORE MALLOC #0 376 | ) = 17 377 | brk(0) = 0x1314000 378 | brk(0x1335000) = 0x1335000 379 | write(1, "AFTER MALLOC #0\n", 16AFTER MALLOC #0 380 | ) = 16 381 | ... 
382 | write(1, "0x1314010\n", 100x1314010 383 | ) = 10 384 | write(1, "BEFORE MALLOC #1\n", 17BEFORE MALLOC #1 385 | ) = 17 386 | write(1, "AFTER MALLOC #1\n", 16AFTER MALLOC #1 387 | ) = 16 388 | write(1, "0x1314420\n", 100x1314420 389 | ) = 10 390 | write(1, "BEFORE MALLOC #2\n", 17BEFORE MALLOC #2 391 | ) = 17 392 | write(1, "AFTER MALLOC #2\n", 16AFTER MALLOC #2 393 | ) = 16 394 | write(1, "0x1314830\n", 100x1314830 395 | ) = 10 396 | write(1, "BEFORE MALLOC #3\n", 17BEFORE MALLOC #3 397 | ) = 17 398 | write(1, "AFTER MALLOC #3\n", 16AFTER MALLOC #3 399 | ) = 16 400 | write(1, "0x1314c40\n", 100x1314c40 401 | ) = 10 402 | ... 403 | read(0, 404 | ``` 405 | 406 | -> `malloc` is NOT calling `brk` each time we call it. 407 | 408 | The first time, `malloc` creates a new space (the heap) for the program (by increasing the program break location). The following times, `malloc` uses the same space to give our program "new" chunks of memory. Those "new" chunks of memory are part of the memory previously allocated using `brk`. This way, `malloc` doesn't have to use syscalls (`brk`) every time we call it, and thus it makes `malloc` - and our programs using `malloc` - faster. It also allows `malloc` and `free` to optimize the usage of the memory. 409 | 410 | Let's double check that we have only one heap, allocated by the first call to `brk`: 411 | 412 | ``` 413 | julien@holberton:/proc/4014$ ps aux | grep \ \./4$ 414 | julien 4169 0.0 0.0 4748 688 pts/9 S+ 13:33 0:00 strace ./4 415 | julien 4172 0.0 0.0 4336 656 pts/9 S+ 13:33 0:00 ./4 416 | julien@holberton:/proc/4014$ cd /proc/4172 417 | julien@holberton:/proc/4172$ cat maps 418 | 00400000-00401000 r-xp 00000000 08:01 176973 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/4 419 | 00600000-00601000 r--p 00000000 08:01 176973 /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/4 420 | 00601000-00602000 rw-p 00001000 08:01 176973 /home/julien/holberton/w/hack_the_virtual_memory/03. 
The Heap/4 421 | 01314000-01335000 rw-p 00000000 00:00 0 [heap] 422 | 7f4a3f2c4000-7f4a3f47e000 r-xp 00000000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so 423 | 7f4a3f47e000-7f4a3f67e000 ---p 001ba000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so 424 | 7f4a3f67e000-7f4a3f682000 r--p 001ba000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so 425 | 7f4a3f682000-7f4a3f684000 rw-p 001be000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so 426 | 7f4a3f684000-7f4a3f689000 rw-p 00000000 00:00 0 427 | 7f4a3f689000-7f4a3f6ac000 r-xp 00000000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so 428 | 7f4a3f890000-7f4a3f893000 rw-p 00000000 00:00 0 429 | 7f4a3f8a7000-7f4a3f8ab000 rw-p 00000000 00:00 0 430 | 7f4a3f8ab000-7f4a3f8ac000 r--p 00022000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so 431 | 7f4a3f8ac000-7f4a3f8ad000 rw-p 00023000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so 432 | 7f4a3f8ad000-7f4a3f8ae000 rw-p 00000000 00:00 0 433 | 7ffd1ba73000-7ffd1ba94000 rw-p 00000000 00:00 0 [stack] 434 | 7ffd1bbed000-7ffd1bbef000 r--p 00000000 00:00 0 [vvar] 435 | 7ffd1bbef000-7ffd1bbf1000 r-xp 00000000 00:00 0 [vdso] 436 | ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] 437 | julien@holberton:/proc/4172$ 438 | ``` 439 | 440 | -> We have only one [heap] and the addresses match those returned by `sbrk`: `0x1314000` & `0x1335000` 441 | 442 | ## Naive malloc 443 | 444 | Based on the above, and assuming we won't ever need to free anything, we can now write our own (naive) version of `malloc`, that would move the program break each time it is called. 
445 | 446 | ```C 447 | #include <stdlib.h> 448 | #include <unistd.h> 449 | 450 | /** 451 | * malloc - naive version of malloc: dynamically allocates memory on the heap using sbrk 452 | * @size: number of bytes to allocate 453 | * 454 | * Return: the memory address newly allocated, or NULL on error 455 | * 456 | * Note: don't do this at home :) 457 | */ 458 | void *malloc(size_t size) 459 | { 460 | void *previous_break; 461 | 462 | previous_break = sbrk(size); 463 | /* check for error */ 464 | if (previous_break == (void *) -1) 465 | { 466 | /* on error malloc returns NULL */ 467 | return (NULL); 468 | } 469 | return (previous_break); 470 | } 471 | ``` 472 | 473 | ## The 0x10 lost bytes 474 | 475 | If we look at the output of the previous program (`4-main.c`), we can see that the first memory address returned by `malloc` doesn't start at the beginning of the heap, but `0x10` bytes after: `0x1314010` vs `0x1314000`. Also, when we call `malloc(1024)` a second time, the address should be `0x1314010` (the returned value of the first call to `malloc`) + `1024` (or `0x400` in hexadecimal, since the first call to `malloc` was asking for `1024` bytes) = `0x1314410`. But the return value of the second call to `malloc` is `0x1314420`. We have lost `0x10` bytes again! Same goes for the subsequent calls. 
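
The arithmetic above can be checked with a small helper. This is only a sketch: `lost_bytes` is a hypothetical name that does not appear in the repository, and it simply subtracts the requested size from the distance between two consecutive addresses returned by `malloc`.

```C
#include <stddef.h>
#include <stdint.h>

/**
 * lost_bytes - bytes between two consecutive chunks that were not requested
 * @first: address returned by one call to malloc
 * @second: address returned by the following call to malloc
 * @requested: size passed to the first of those two calls
 *
 * Note: hypothetical helper, for illustration only
 *
 * Return: the distance between the two addresses, minus the requested size
 */
size_t lost_bytes(uintptr_t first, uintptr_t second, size_t requested)
{
	return ((size_t)(second - first) - requested);
}
```

With the addresses printed by `4-main.c`, `lost_bytes(0x1314010, 0x1314420, 1024)` is `0x1314420` - `0x1314010` - `0x400` = `0x10`, and the next pair (`0x1314420`, `0x1314830`) gives the same value.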
476 | 477 | Let's look at what we can find inside those "lost" `0x10`-byte memory spaces (`5-main.c`) and whether the memory loss stays constant: 478 | 479 | ```C 480 | #include 481 | #include 482 | #include 483 | 484 | /** 485 | * pmem - print mem 486 | * @p: memory address to start printing from 487 | * @bytes: number of bytes to print 488 | * 489 | * Return: nothing 490 | */ 491 | void pmem(void *p, unsigned int bytes) 492 | { 493 | unsigned char *ptr; 494 | unsigned int i; 495 | 496 | ptr = (unsigned char *)p; 497 | for (i = 0; i < bytes; i++) 498 | { 499 | if (i != 0) 500 | { 501 | printf(" "); 502 | } 503 | printf("%02x", *(ptr + i)); 504 | } 505 | printf("\n"); 506 | } 507 | 508 | /** 509 | * main - the 0x10 lost bytes 510 | * 511 | * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS 512 | */ 513 | int main(void) 514 | { 515 | void *p; 516 | int i; 517 | 518 | for (i = 0; i < 10; i++) 519 | { 520 | p = malloc(1024 * (i + 1)); 521 | printf("%p\n", p); 522 | printf("bytes at %p:\n", (void *)((char *)p - 0x10)); 523 | pmem((char *)p - 0x10, 0x10); 524 | } 525 | return (EXIT_SUCCESS); 526 | } 527 | ``` 528 | 529 | ``` 530 | julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 5-main.c -o 5 531 | julien@holberton:~/holberton/w/hackthevm3$ ./5 532 | 0x1fa8010 533 | bytes at 0x1fa8000: 534 | 00 00 00 00 00 00 00 00 11 04 00 00 00 00 00 00 535 | 0x1fa8420 536 | bytes at 0x1fa8410: 537 | 00 00 00 00 00 00 00 00 11 08 00 00 00 00 00 00 538 | 0x1fa8c30 539 | bytes at 0x1fa8c20: 540 | 00 00 00 00 00 00 00 00 11 0c 00 00 00 00 00 00 541 | 0x1fa9840 542 | bytes at 0x1fa9830: 543 | 00 00 00 00 00 00 00 00 11 10 00 00 00 00 00 00 544 | 0x1faa850 545 | bytes at 0x1faa840: 546 | 00 00 00 00 00 00 00 00 11 14 00 00 00 00 00 00 547 | 0x1fabc60 548 | bytes at 0x1fabc50: 549 | 00 00 00 00 00 00 00 00 11 18 00 00 00 00 00 00 550 | 0x1fad470 551 | bytes at 0x1fad460: 552 | 00 00 00 00 00 00 00 00 11 1c 00 00 00 00 00 00 553 | 0x1faf080 554 | 
bytes at 0x1faf070: 555 | 00 00 00 00 00 00 00 00 11 20 00 00 00 00 00 00 556 | 0x1fb1090 557 | bytes at 0x1fb1080: 558 | 00 00 00 00 00 00 00 00 11 24 00 00 00 00 00 00 559 | 0x1fb34a0 560 | bytes at 0x1fb3490: 561 | 00 00 00 00 00 00 00 00 11 28 00 00 00 00 00 00 562 | julien@holberton:~/holberton/w/hackthevm3$ 563 | ``` 564 | 565 | There is one clear pattern: the size of the malloc'ed memory chunk is always found in the preceding 0x10 bytes. For instance, the first `malloc` call is malloc'ing `1024` (`0x0400`) bytes and we can find `11 04 00 00 00 00 00 00` in the preceding `0x10` bytes. Read as a little-endian 8-byte integer, those last bytes represent the number `0x 00 00 00 00 00 00 04 11` = `0x400` (1024) + `0x10` (the size of the block preceding those `1024` bytes) + `1` (we'll talk about this "+1" later in this chapter). If we look at each `0x10` bytes preceding the addresses returned by `malloc`, they all contain the size of the chunk of memory requested from `malloc` + `0x10` + `1`. 566 | 567 | At this point, given what we said and saw earlier, we can probably guess that those 0x10 bytes are a sort of data structure used by `malloc` (and `free`) to deal with the heap. 
And indeed, even though we don't understand everything yet, we can already use this data structure to go from one malloc'ed chunk of memory to the other (`6-main.c`) as long as we have the address of the beginning of the heap (_and as long as we have never called `free`_): 568 | 569 | ```C 570 | #include 571 | #include 572 | #include 573 | 574 | /** 575 | * pmem - print mem 576 | * @p: memory address to start printing from 577 | * @bytes: number of bytes to print 578 | * 579 | * Return: nothing 580 | */ 581 | void pmem(void *p, unsigned int bytes) 582 | { 583 | unsigned char *ptr; 584 | unsigned int i; 585 | 586 | ptr = (unsigned char *)p; 587 | for (i = 0; i < bytes; i++) 588 | { 589 | if (i != 0) 590 | { 591 | printf(" "); 592 | } 593 | printf("%02x", *(ptr + i)); 594 | } 595 | printf("\n"); 596 | } 597 | 598 | /** 599 | * main - using the 0x10 bytes to jump to next malloc'ed chunks 600 | * 601 | * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS 602 | */ 603 | int main(void) 604 | { 605 | void *p; 606 | int i; 607 | void *heap_start; 608 | size_t size_of_the_block; 609 | 610 | heap_start = sbrk(0); 611 | write(1, "START\n", 6); 612 | for (i = 0; i < 10; i++) 613 | { 614 | p = malloc(1024 * (i + 1)); 615 | *((int *)p) = i; 616 | printf("%p: [%i]\n", p, i); 617 | } 618 | p = heap_start; 619 | for (i = 0; i < 10; i++) 620 | { 621 | pmem(p, 0x10); 622 | size_of_the_block = *((size_t *)((char *)p + 8)) - 1; 623 | printf("%p: [%i] - size = %lu\n", 624 | (void *)((char *)p + 0x10), 625 | *((int *)((char *)p + 0x10)), 626 | size_of_the_block); 627 | p = (void *)((char *)p + size_of_the_block); 628 | } 629 | write(1, "END\n", 4); 630 | return (EXIT_SUCCESS); 631 | } 632 | ``` 633 | 634 | ``` 635 | julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 6-main.c -o 6 636 | julien@holberton:~/holberton/w/hackthevm3$ ./6 637 | START 638 | 0x9e6010: [0] 639 | 0x9e6420: [1] 640 | 0x9e6c30: [2] 641 | 0x9e7840: [3] 642 | 0x9e8850: [4] 643 
| 0x9e9c60: [5] 644 | 0x9eb470: [6] 645 | 0x9ed080: [7] 646 | 0x9ef090: [8] 647 | 0x9f14a0: [9] 648 | 00 00 00 00 00 00 00 00 11 04 00 00 00 00 00 00 649 | 0x9e6010: [0] - size = 1040 650 | 00 00 00 00 00 00 00 00 11 08 00 00 00 00 00 00 651 | 0x9e6420: [1] - size = 2064 652 | 00 00 00 00 00 00 00 00 11 0c 00 00 00 00 00 00 653 | 0x9e6c30: [2] - size = 3088 654 | 00 00 00 00 00 00 00 00 11 10 00 00 00 00 00 00 655 | 0x9e7840: [3] - size = 4112 656 | 00 00 00 00 00 00 00 00 11 14 00 00 00 00 00 00 657 | 0x9e8850: [4] - size = 5136 658 | 00 00 00 00 00 00 00 00 11 18 00 00 00 00 00 00 659 | 0x9e9c60: [5] - size = 6160 660 | 00 00 00 00 00 00 00 00 11 1c 00 00 00 00 00 00 661 | 0x9eb470: [6] - size = 7184 662 | 00 00 00 00 00 00 00 00 11 20 00 00 00 00 00 00 663 | 0x9ed080: [7] - size = 8208 664 | 00 00 00 00 00 00 00 00 11 24 00 00 00 00 00 00 665 | 0x9ef090: [8] - size = 9232 666 | 00 00 00 00 00 00 00 00 11 28 00 00 00 00 00 00 667 | 0x9f14a0: [9] - size = 10256 668 | END 669 | julien@holberton:~/holberton/w/hackthevm3$ 670 | ``` 671 | 672 | One of our open questions from the previous chapter is now answered: `malloc` is using `0x10` additional bytes for each malloc'ed memory block to store the size of the block. 673 | 674 | ![0x10 bytes preceeding malloc](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/0x10-malloc.png) 675 | 676 | This data will actually be used by `free` to save it to a list of available blocks for future calls to `malloc`. 677 | 678 | But our study also raises a new question: what are the first 8 bytes of the 16 (`0x10` in hexadecimal) bytes used for? It seems to always be zero. Is it just padding? 679 | 680 | ### RTFSC 681 | 682 | At this stage, we probably want to check the source code of `malloc` to confirm what we just found (`malloc.c` from the glibc). 683 | 684 | ``` 685 | 1055 /* 686 | 1056 malloc_chunk details: 687 | 1057 688 | 1058 (The following includes lightly edited explanations by Colin Plumb.) 
1059
1060     Chunks of memory are maintained using a `boundary tag' method as
1061     described in e.g., Knuth or Standish.  (See the paper by Paul
1062     Wilson ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps for a
1063     survey of such techniques.)  Sizes of free chunks are stored both
1064     in the front of each chunk and at the end.  This makes
1065     consolidating fragmented chunks into bigger chunks very fast.  The
1066     size fields also hold bits representing whether chunks are free or
1067     in use.
1068
1069     An allocated chunk looks like this:
1070
1071
1072     chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1073             |             Size of previous chunk, if unallocated (P clear)  |
1074             +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1075             |             Size of chunk, in bytes                     |A|M|P|
1076       mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1077             |             User data starts here...                          .
1078             .                                                               .
1079             .             (malloc_usable_size() bytes)                      .
1080             .                                                               |
1081 nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1082             |             (size of chunk, but used for application data)    |
1083             +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1084             |             Size of next chunk, in bytes                |A|0|1|
1085             +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1086
1087     Where "chunk" is the front of the chunk for the purpose of most of
1088     the malloc code, but "mem" is the pointer that is returned to the
1089     user.  "Nextchunk" is the beginning of the next contiguous chunk.
```

-> We were correct \o/.
Right before the address returned by `malloc` to the user, we have two variables: 723 | 724 | - Size of previous chunk, if unallocated: we never free'd any chunks so that is why it was always 0 725 | - Size of chunk, in bytes 726 | 727 | Let's free some chunks to confirm that the first 8 bytes are used the way the source code describes it (`7-main.c`): 728 | 729 | ```C 730 | #include 731 | #include 732 | #include 733 | 734 | /** 735 | * pmem - print mem 736 | * @p: memory address to start printing from 737 | * @bytes: number of bytes to print 738 | * 739 | * Return: nothing 740 | */ 741 | void pmem(void *p, unsigned int bytes) 742 | { 743 | unsigned char *ptr; 744 | unsigned int i; 745 | 746 | ptr = (unsigned char *)p; 747 | for (i = 0; i < bytes; i++) 748 | { 749 | if (i != 0) 750 | { 751 | printf(" "); 752 | } 753 | printf("%02x", *(ptr + i)); 754 | } 755 | printf("\n"); 756 | } 757 | 758 | /** 759 | * main - confirm the source code 760 | * 761 | * Return: EXIT_FAILURE if something failed. 
Otherwise EXIT_SUCCESS 762 | */ 763 | int main(void) 764 | { 765 | void *p; 766 | int i; 767 | size_t size_of_the_chunk; 768 | size_t size_of_the_previous_chunk; 769 | void *chunks[10]; 770 | 771 | for (i = 0; i < 10; i++) 772 | { 773 | p = malloc(1024 * (i + 1)); 774 | chunks[i] = (void *)((char *)p - 0x10); 775 | printf("%p\n", p); 776 | } 777 | free((char *)(chunks[3]) + 0x10); 778 | free((char *)(chunks[7]) + 0x10); 779 | for (i = 0; i < 10; i++) 780 | { 781 | p = chunks[i]; 782 | printf("chunks[%d]: ", i); 783 | pmem(p, 0x10); 784 | size_of_the_chunk = *((size_t *)((char *)p + 8)) - 1; 785 | size_of_the_previous_chunk = *((size_t *)((char *)p)); 786 | printf("chunks[%d]: %p, size = %li, prev = %li\n", 787 | i, p, size_of_the_chunk, size_of_the_previous_chunk); 788 | } 789 | return (EXIT_SUCCESS); 790 | } 791 | ``` 792 | 793 | ``` 794 | julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 7-main.c -o 7 795 | julien@holberton:~/holberton/w/hackthevm3$ ./7 796 | 0x1536010 797 | 0x1536420 798 | 0x1536c30 799 | 0x1537840 800 | 0x1538850 801 | 0x1539c60 802 | 0x153b470 803 | 0x153d080 804 | 0x153f090 805 | 0x15414a0 806 | chunks[0]: 00 00 00 00 00 00 00 00 11 04 00 00 00 00 00 00 807 | chunks[0]: 0x1536000, size = 1040, prev = 0 808 | chunks[1]: 00 00 00 00 00 00 00 00 11 08 00 00 00 00 00 00 809 | chunks[1]: 0x1536410, size = 2064, prev = 0 810 | chunks[2]: 00 00 00 00 00 00 00 00 11 0c 00 00 00 00 00 00 811 | chunks[2]: 0x1536c20, size = 3088, prev = 0 812 | chunks[3]: 00 00 00 00 00 00 00 00 11 10 00 00 00 00 00 00 813 | chunks[3]: 0x1537830, size = 4112, prev = 0 814 | chunks[4]: 10 10 00 00 00 00 00 00 10 14 00 00 00 00 00 00 815 | chunks[4]: 0x1538840, size = 5135, prev = 4112 816 | chunks[5]: 00 00 00 00 00 00 00 00 11 18 00 00 00 00 00 00 817 | chunks[5]: 0x1539c50, size = 6160, prev = 0 818 | chunks[6]: 00 00 00 00 00 00 00 00 11 1c 00 00 00 00 00 00 819 | chunks[6]: 0x153b460, size = 7184, prev = 0 820 | chunks[7]: 00 00 00 00 00 
00 00 00 11 20 00 00 00 00 00 00 821 | chunks[7]: 0x153d070, size = 8208, prev = 0 822 | chunks[8]: 10 20 00 00 00 00 00 00 10 24 00 00 00 00 00 00 823 | chunks[8]: 0x153f080, size = 9231, prev = 8208 824 | chunks[9]: 00 00 00 00 00 00 00 00 11 28 00 00 00 00 00 00 825 | chunks[9]: 0x1541490, size = 10256, prev = 0 826 | julien@holberton:~/holberton/w/hackthevm3$ 827 | ``` 828 | 829 | As we can see from the above listing, when the previous chunk has been free'd, the malloc chunk's first 8 bytes contain the size of the previous unallocated chunk. So the correct representation of a malloc chunk is the following: 830 | 831 | ![malloc chunk](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/malloc-chunk.png) 832 | 833 | Also, it seems that the first bit of the next 8 bytes (containing the size of the current chunk) serves as a flag to check if the previous chunk is used (`1`) or not (`0`). So the correct updated version of our program should be written this way (`8-main.c`): 834 | 835 | ```C 836 | #include 837 | #include 838 | #include 839 | 840 | /** 841 | * pmem - print mem 842 | * @p: memory address to start printing from 843 | * @bytes: number of bytes to print 844 | * 845 | * Return: nothing 846 | */ 847 | void pmem(void *p, unsigned int bytes) 848 | { 849 | unsigned char *ptr; 850 | unsigned int i; 851 | 852 | ptr = (unsigned char *)p; 853 | for (i = 0; i < bytes; i++) 854 | { 855 | if (i != 0) 856 | { 857 | printf(" "); 858 | } 859 | printf("%02x", *(ptr + i)); 860 | } 861 | printf("\n"); 862 | } 863 | 864 | /** 865 | * main - updating with correct checks 866 | * 867 | * Return: EXIT_FAILURE if something failed. 
Otherwise EXIT_SUCCESS 868 | */ 869 | int main(void) 870 | { 871 | void *p; 872 | int i; 873 | size_t size_of_the_chunk; 874 | size_t size_of_the_previous_chunk; 875 | void *chunks[10]; 876 | char prev_used; 877 | 878 | for (i = 0; i < 10; i++) 879 | { 880 | p = malloc(1024 * (i + 1)); 881 | chunks[i] = (void *)((char *)p - 0x10); 882 | } 883 | free((char *)(chunks[3]) + 0x10); 884 | free((char *)(chunks[7]) + 0x10); 885 | for (i = 0; i < 10; i++) 886 | { 887 | p = chunks[i]; 888 | printf("chunks[%d]: ", i); 889 | pmem(p, 0x10); 890 | size_of_the_chunk = *((size_t *)((char *)p + 8)); 891 | prev_used = size_of_the_chunk & 1; 892 | size_of_the_chunk -= prev_used; 893 | size_of_the_previous_chunk = *((size_t *)((char *)p)); 894 | printf("chunks[%d]: %p, size = %li, prev (%s) = %li\n", 895 | i, p, size_of_the_chunk, 896 | (prev_used? "allocated": "unallocated"), size_of_the_previous_chunk); 897 | } 898 | return (EXIT_SUCCESS); 899 | } 900 | ``` 901 | 902 | ``` 903 | julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 8-main.c -o 8 904 | julien@holberton:~/holberton/w/hackthevm3$ ./8 905 | chunks[0]: 00 00 00 00 00 00 00 00 11 04 00 00 00 00 00 00 906 | chunks[0]: 0x1031000, size = 1040, prev (allocated) = 0 907 | chunks[1]: 00 00 00 00 00 00 00 00 11 08 00 00 00 00 00 00 908 | chunks[1]: 0x1031410, size = 2064, prev (allocated) = 0 909 | chunks[2]: 00 00 00 00 00 00 00 00 11 0c 00 00 00 00 00 00 910 | chunks[2]: 0x1031c20, size = 3088, prev (allocated) = 0 911 | chunks[3]: 00 00 00 00 00 00 00 00 11 10 00 00 00 00 00 00 912 | chunks[3]: 0x1032830, size = 4112, prev (allocated) = 0 913 | chunks[4]: 10 10 00 00 00 00 00 00 10 14 00 00 00 00 00 00 914 | chunks[4]: 0x1033840, size = 5136, prev (unallocated) = 4112 915 | chunks[5]: 00 00 00 00 00 00 00 00 11 18 00 00 00 00 00 00 916 | chunks[5]: 0x1034c50, size = 6160, prev (allocated) = 0 917 | chunks[6]: 00 00 00 00 00 00 00 00 11 1c 00 00 00 00 00 00 918 | chunks[6]: 0x1036460, size = 7184, 
prev (allocated) = 0 919 | chunks[7]: 00 00 00 00 00 00 00 00 11 20 00 00 00 00 00 00 920 | chunks[7]: 0x1038070, size = 8208, prev (allocated) = 0 921 | chunks[8]: 10 20 00 00 00 00 00 00 10 24 00 00 00 00 00 00 922 | chunks[8]: 0x103a080, size = 9232, prev (unallocated) = 8208 923 | chunks[9]: 00 00 00 00 00 00 00 00 11 28 00 00 00 00 00 00 924 | chunks[9]: 0x103c490, size = 10256, prev (allocated) = 0 925 | julien@holberton:~/holberton/w/hackthevm3$ 926 | ``` 927 | 928 | ## Is the heap actually growing upwards? 929 | 930 | The last question left unanswered is: "Is the heap actually growing upwards?". From the `brk` man page, it seems so: 931 | 932 | ``` 933 | DESCRIPTION 934 | brk() and sbrk() change the location of the program break, which defines the end of the 935 | process's data segment (i.e., the program break is the first location after the end of 936 | the uninitialized data segment). Increasing the program break has the effect of allocat‐ 937 | ing memory to the process; decreasing the break deallocates memory. 938 | ``` 939 | 940 | Let's check! (`9-main.c`) 941 | 942 | ```C 943 | #include 944 | #include 945 | #include 946 | 947 | /** 948 | * main - moving the program break 949 | * 950 | * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS 951 | */ 952 | int main(void) 953 | { 954 | int i; 955 | 956 | write(1, "START\n", 6); 957 | malloc(1); 958 | getchar(); 959 | write(1, "LOOP\n", 5); 960 | for (i = 0; i < 0x25000 / 1024; i++) 961 | { 962 | malloc(1024); 963 | } 964 | write(1, "END\n", 4); 965 | getchar(); 966 | return (EXIT_SUCCESS); 967 | } 968 | ``` 969 | 970 | Now let’s confirm this assumption with strace: 971 | 972 | ``` 973 | julien@holberton:~/holberton/w/hackthevm3$ strace ./9 974 | execve("./9", ["./9"], [/* 61 vars */]) = 0 975 | ... 976 | write(1, "START\n", 6START 977 | ) = 6 978 | brk(0) = 0x1fd8000 979 | brk(0x1ff9000) = 0x1ff9000 980 | ... 
write(1, "LOOP\n", 5LOOP
) = 5
brk(0x201a000) = 0x201a000
write(1, "END\n", 4END
) = 4
...
julien@holberton:~/holberton/w/hackthevm3$
```

Clearly, `malloc` made only two calls to `brk` to increase the allocated space on the heap, and the second call uses a higher memory address as its argument (`0x201a000` > `0x1ff9000`). The second syscall was triggered when the space on the heap became too small to host all the `malloc` calls.

Let's double check with `/proc`.

```
julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 9-main.c -o 9
julien@holberton:~/holberton/w/hackthevm3$ ./9
START

```

```
julien@holberton:/proc/7855$ ps aux | grep \ \./9$
julien     7972  0.0  0.0   4332   684 pts/9    S+   19:08   0:00 ./9
julien@holberton:/proc/7855$ cd /proc/7972
julien@holberton:/proc/7972$ cat maps
...
00901000-00922000 rw-p 00000000 00:00 0                                  [heap]
...
julien@holberton:/proc/7972$
```

-> `00901000-00922000 rw-p 00000000 00:00 0 [heap]`
Let's hit Enter and look at the [heap] again:

```
LOOP
END

```

```
julien@holberton:/proc/7972$ cat maps
...
00901000-00943000 rw-p 00000000 00:00 0                                  [heap]
...
julien@holberton:/proc/7972$
```

-> `00901000-00943000 rw-p 00000000 00:00 0 [heap]`
The beginning of the heap is still the same, but the heap has grown upwards: its end moved from `00922000` to `00943000`.

## The Address Space Layout Randomisation (ASLR)

You may have noticed something "strange" in the `/proc/pid/maps` listings above that is worth studying:

The program break is the address of the first location beyond the current end of the data region - so the address of the first location beyond the executable in the virtual memory. As a consequence, the heap should start right after the end of the executable in memory. As you can see in all the listings above, this is NOT the case. The only thing that is true is that the heap is always the next memory region after the executable, which makes sense since the heap is actually part of the data segment of the executable itself. Also, if we look even closer, the size of the memory gap between the executable and the heap is never the same:

_Format of the following lines: [PID of the above `maps` listings]: address of the beginning of the [heap] - address of the end of the executable = memory gap size_

- [3718]: 01195000 - 00602000 = b93000
- [3834]: 024d6000 - 00602000 = 1ed4000
- [4014]: 00e70000 - 00602000 = 86e000
- [4172]: 01314000 - 00602000 = d12000
- [7972]: 00901000 - 00602000 = 2ff000

It seems that this gap size is random, and indeed, it is. If we look at the ELF binary loader source code (`fs/binfmt_elf.c`) we can find this:

```C
if ((current->flags & PF_RANDOMIZE) && (randomize_va_space > 1)) {
    current->mm->brk = current->mm->start_brk =
        arch_randomize_brk(current->mm);
#ifdef compat_brk_randomized
    current->brk_randomized = 1;
#endif
}
```

where `current->mm->brk` is the address of the program break.
The `arch_randomize_brk` function can be found in the `arch/x86/kernel/process.c` file:

```C
unsigned long arch_randomize_brk(struct mm_struct *mm)
{
    unsigned long range_end = mm->brk + 0x02000000;
    return randomize_range(mm->brk, range_end, 0) ? : mm->brk;
}
```

The `randomize_range` function returns a start address such that:

```C
    [...... <range> .....]
  start                  end
```

Source code of the `randomize_range` function (`drivers/char/random.c`):

```C
/*
 * randomize_range() returns a start address such that
 *
 *    [...... <range> .....]
 *  start                  end
 *
 * a <range> with size "len" starting at the return value is inside in the
 * area defined by [start, end], but is otherwise randomized.
 */
unsigned long
randomize_range(unsigned long start, unsigned long end, unsigned long len)
{
    unsigned long range = end - len - start;

    if (end <= start + len)
        return 0;
    return PAGE_ALIGN(get_random_int() % range + start);
}
```

As a result, the offset between the end of the executable's data section and the initial position of the program break, when the process starts, can be anywhere between `0` and `0x02000000`. This randomization is part of what is known as Address Space Layout Randomisation (ASLR). ASLR is a computer security technique that helps prevent the exploitation of memory corruption vulnerabilities. In order to prevent an attacker from jumping to, for example, a particular exploited function in memory, ASLR randomly arranges the address space positions of key data areas of a process, including the positions of the heap and the stack.

## The updated VM diagram

With all the above in mind, we can now update our VM diagram:

![Virtual memory diagram](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/virtual_memory_diagram_v2.png)

## `malloc(0)`

Have you ever wondered what happens when we call `malloc` with a size of `0`? Let's check! (`10-main.c`)

```C
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/**
 * pmem - print mem
 * @p: memory address to start printing from
 * @bytes: number of bytes to print
 *
 * Return: nothing
 */
void pmem(void *p, unsigned int bytes)
{
    unsigned char *ptr;
    unsigned int i;

    ptr = (unsigned char *)p;
    for (i = 0; i < bytes; i++)
    {
        if (i != 0)
        {
            printf(" ");
        }
        printf("%02x", *(ptr + i));
    }
    printf("\n");
}

/**
 * main - looking at the chunk returned by malloc(0)
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    void *p;
    size_t size_of_the_chunk;
    char prev_used;

    p = malloc(0);
    printf("%p\n", p);
    if (p == NULL)
    {
        /* malloc(0) is allowed to return NULL: there is no header to read */
        return (EXIT_SUCCESS);
    }
    pmem((char *)p - 0x10, 0x10);
    /* the size of the chunk is stored in the 8 bytes preceding p */
    size_of_the_chunk = *((size_t *)((char *)p - 8));
    prev_used = size_of_the_chunk & 1;
    size_of_the_chunk -= prev_used;
    printf("chunk size = %li bytes\n", size_of_the_chunk);
    return (EXIT_SUCCESS);
}
```

```
julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 10-main.c -o 10
julien@holberton:~/holberton/w/hackthevm3$ ./10
0xd08010
00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00
chunk size = 32 bytes
julien@holberton:~/holberton/w/hackthevm3$
```

-> `malloc(0)` is actually using 32 bytes, including the first `0x10` bytes.

Again, note that this will not always be the case. From the man page (`man malloc`):

```
NULL may also be returned by a successful call to malloc() with a size of zero
```

## Outro

We have learned a couple of things about `malloc` and the heap. But there is actually more than `brk` and `sbrk`. You can try malloc'ing a big chunk of memory, `strace` it, and look at `/proc` to learn more before we cover it in a later chapter :)

Also, studying how `free` works in coordination with `malloc` is something we haven't covered yet. If you want to look at it, you will find part of the answer to why the minimum chunk size is `32` (when we ask `malloc` for `0` bytes) vs `16` (`0x10` in hexadecimal) or `0`.

As usual, to be continued! Let me know if you have something you would like me to cover in the next chapter.

### Questions? Feedback?

If you have questions or feedback don't hesitate to ping us on Twitter at [@holbertonschool](https://twitter.com/holbertonschool) or [@julienbarbier42](https://twitter.com/julienbarbier42).
_Haters, please send your comments to `/dev/null`._

Happy Hacking!

### Thank you for reading!

As always, no-one is perfect (except [Chuck](http://codesqueeze.com/the-ultimate-top-25-chuck-norris-the-programmer-jokes/) of course), so don't hesitate to [contribute](https://github.com/holbertonschool/Hack-The-Virtual-Memory/blob/master/03.%20malloc,%20the%20heap%20and%20the%20program%20break/) or send me your comments if you find anything I missed.

### Files

[This repo](https://github.com/holbertonschool/Hack-The-Virtual-Memory/tree/master/03.%20malloc%2C%20the%20heap%20and%20the%20program%20break) contains the source code (`naive_malloc.c`, `version.c` & `X-main.c` files) for programs created in this tutorial.
1200 | 1201 | ### Read more about the virtual memory 1202 | 1203 | Follow [@holbertonschool](https://twitter.com/holbertonschool) or [@julienbarbier42](https://twitter.com/julienbarbier42) on Twitter to get the next chapters! This was the fourth chapter in our series on the virtual memory. If you missed the previous ones, here are the links to them: 1204 | 1205 | - Chapter 0: [Hack The Virtual Memory: C strings & /proc](https://blog.holbertonschool.com/hack-the-virtual-memory-c-strings-proc/) 1206 | - Chapter 1: [Hack The Virtual Memory: Python bytes](https://blog.holbertonschool.com/hack-the-virtual-memory-python-bytes/) 1207 | - Chapter 2: [Hack The Virtual Memory: Drawing the VM diagram](https://blog.holbertonschool.com/hack-the-virtual-memory-drawing-the-vm-diagram/) 1208 | 1209 | _Many thanks to [Tim](https://twitter.com/wintermanc3r), [Anne](https://twitter.com/1million40) and [Ian](https://www.linkedin.com/in/iancugniere/) for proof-reading!_ :) 1210 | -------------------------------------------------------------------------------- /03. malloc, the heap and the program break/naive_malloc.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | /** 5 | * naive_malloc - dynamically allocates memory on the heap using sbrk 6 | * @size: number of bytes to allocate 7 | * 8 | * Return: the memory address newly allocated, or NULL on error 9 | * 10 | * Note: don't do this at home :) 11 | */ 12 | void *naive_malloc(size_t size) 13 | { 14 | void *previous_break; 15 | 16 | previous_break = sbrk(size); 17 | /* check for error */ 18 | if (previous_break == (void *) -1) 19 | { 20 | /* on error malloc returns NULL */ 21 | return (NULL); 22 | } 23 | return (previous_break); 24 | } 25 | -------------------------------------------------------------------------------- /03. 
malloc, the heap and the program break/version.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | /** 5 | * main - prints the version of the glibc 6 | * 7 | * Return: 0 8 | */ 9 | int main (void) 10 | { 11 | puts(gnu_get_libc_version()); 12 | return (0); 13 | } 14 | -------------------------------------------------------------------------------- /04. The Stack, registers and assembly code/0-main.c: -------------------------------------------------------------------------------- 1 | #include 2 | 3 | int main(void) 4 | { 5 | int a; 6 | 7 | a = 972; 8 | printf("a = %d\n", a); 9 | return (0); 10 | } 11 | -------------------------------------------------------------------------------- /04. The Stack, registers and assembly code/1-main.c: -------------------------------------------------------------------------------- 1 | #include 2 | 3 | void func1(void) 4 | { 5 | int a; 6 | int b; 7 | int c; 8 | 9 | a = 98; 10 | b = 972; 11 | c = a + b; 12 | printf("a = %d, b = %d, c = %d\n", a, b, c); 13 | } 14 | 15 | void func2(void) 16 | { 17 | int a; 18 | int b; 19 | int c; 20 | 21 | printf("a = %d, b = %d, c = %d\n", a, b, c); 22 | } 23 | 24 | int main(void) 25 | { 26 | func1(); 27 | func2(); 28 | return (0); 29 | } 30 | -------------------------------------------------------------------------------- /04. 
The Stack, registers and assembly code/2-main.c: -------------------------------------------------------------------------------- 1 | #include 2 | 3 | void func1(void) 4 | { 5 | int a; 6 | int b; 7 | int c; 8 | register long rsp asm ("rsp"); 9 | register long rbp asm ("rbp"); 10 | 11 | a = 98; 12 | b = 972; 13 | c = a + b; 14 | printf("a = %d, b = %d, c = %d\n", a, b, c); 15 | printf("func1, rpb = %lx\n", rbp); 16 | printf("func1, rsp = %lx\n", rsp); 17 | printf("func1, a = %d\n", *(int *)(((char *)rbp) - 0xc) ); 18 | printf("func1, b = %d\n", *(int *)(((char *)rbp) - 0x8) ); 19 | printf("func1, c = %d\n", *(int *)(((char *)rbp) - 0x4) ); 20 | printf("func1, previous rbp value = %lx\n", *(unsigned long int *)rbp ); 21 | printf("func1, return address value = %lx\n", *(unsigned long int *)((char *)rbp + 8) ); 22 | } 23 | 24 | void func2(void) 25 | { 26 | int a; 27 | int b; 28 | int c; 29 | register long rsp asm ("rsp"); 30 | register long rbp asm ("rbp"); 31 | 32 | printf("func2, a = %d, b = %d, c = %d\n", a, b, c); 33 | printf("func2, rpb = %lx\n", rbp); 34 | printf("func2, rsp = %lx\n", rsp); 35 | } 36 | 37 | int main(void) 38 | { 39 | register long rsp asm ("rsp"); 40 | register long rbp asm ("rbp"); 41 | 42 | printf("main, rpb = %lx\n", rbp); 43 | printf("main, rsp = %lx\n", rsp); 44 | func1(); 45 | func2(); 46 | return (0); 47 | } 48 | 49 | -------------------------------------------------------------------------------- /04. 
The Stack, registers and assembly code/3-main.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | void bye(void) 5 | { 6 | printf("[x] I am in the function bye!\n"); 7 | exit(98); 8 | } 9 | 10 | void func1(void) 11 | { 12 | int a; 13 | int b; 14 | int c; 15 | register long rsp asm ("rsp"); 16 | register long rbp asm ("rbp"); 17 | 18 | a = 98; 19 | b = 972; 20 | c = a + b; 21 | printf("a = %d, b = %d, c = %d\n", a, b, c); 22 | printf("func1, rpb = %lx\n", rbp); 23 | printf("func1, rsp = %lx\n", rsp); 24 | printf("func1, a = %d\n", *(int *)(((char *)rbp) - 0xc) ); 25 | printf("func1, b = %d\n", *(int *)(((char *)rbp) - 0x8) ); 26 | printf("func1, c = %d\n", *(int *)(((char *)rbp) - 0x4) ); 27 | printf("func1, previous rbp value = %lx\n", *(unsigned long int *)rbp ); 28 | printf("func1, return address value = %lx\n", *(unsigned long int *)((char *)rbp + 8) ); 29 | } 30 | 31 | void func2(void) 32 | { 33 | int a; 34 | int b; 35 | int c; 36 | register long rsp asm ("rsp"); 37 | register long rbp asm ("rbp"); 38 | 39 | printf("func2, a = %d, b = %d, c = %d\n", a, b, c); 40 | printf("func2, rpb = %lx\n", rbp); 41 | printf("func2, rsp = %lx\n", rsp); 42 | } 43 | 44 | int main(void) 45 | { 46 | register long rsp asm ("rsp"); 47 | register long rbp asm ("rbp"); 48 | 49 | printf("main, rpb = %lx\n", rbp); 50 | printf("main, rsp = %lx\n", rsp); 51 | func1(); 52 | func2(); 53 | return (0); 54 | } 55 | -------------------------------------------------------------------------------- /04. 
The Stack, registers and assembly code/4-main.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | void bye(void) 5 | { 6 | printf("[x] I am in the function bye!\n"); 7 | exit(98); 8 | } 9 | 10 | void func1(void) 11 | { 12 | int a; 13 | int b; 14 | int c; 15 | register long rsp asm ("rsp"); 16 | register long rbp asm ("rbp"); 17 | 18 | a = 98; 19 | b = 972; 20 | c = a + b; 21 | printf("a = %d, b = %d, c = %d\n", a, b, c); 22 | printf("func1, rpb = %lx\n", rbp); 23 | printf("func1, rsp = %lx\n", rsp); 24 | printf("func1, a = %d\n", *(int *)(((char *)rbp) - 0xc) ); 25 | printf("func1, b = %d\n", *(int *)(((char *)rbp) - 0x8) ); 26 | printf("func1, c = %d\n", *(int *)(((char *)rbp) - 0x4) ); 27 | printf("func1, previous rbp value = %lx\n", *(unsigned long int *)rbp ); 28 | printf("func1, return address value = %lx\n", *(unsigned long int *)((char *)rbp + 8) ); 29 | /* hack the stack! */ 30 | *(unsigned long int *)((char *)rbp + 8) = 0x4005bd; 31 | } 32 | 33 | void func2(void) 34 | { 35 | int a; 36 | int b; 37 | int c; 38 | register long rsp asm ("rsp"); 39 | register long rbp asm ("rbp"); 40 | 41 | printf("func2, a = %d, b = %d, c = %d\n", a, b, c); 42 | printf("func2, rpb = %lx\n", rbp); 43 | printf("func2, rsp = %lx\n", rsp); 44 | } 45 | 46 | int main(void) 47 | { 48 | register long rsp asm ("rsp"); 49 | register long rbp asm ("rbp"); 50 | 51 | printf("main, rpb = %lx\n", rbp); 52 | printf("main, rsp = %lx\n", rsp); 53 | func1(); 54 | func2(); 55 | return (0); 56 | } 57 | -------------------------------------------------------------------------------- /04. 
The Stack, registers and assembly code/README.md:
--------------------------------------------------------------------------------
![hack the virtual memory, the stack, registers and assembly code](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/hack-the-virtual-memory-the-stack-rsp-rbp2.png)

## Hack the virtual memory, chapter 4: the stack, registers and assembly code

This is the fifth chapter in a series about virtual memory. The goal is to learn some CS basics in a different and more practical way.

If you missed the previous chapters, you should probably start there:

* Chapter 0: [Hack The Virtual Memory: C strings & /proc](https://blog.holbertonschool.com/hack-the-virtual-memory-c-strings-proc/)
* Chapter 1: [Hack The Virtual Memory: Python bytes](https://blog.holbertonschool.com/hack-the-virtual-memory-python-bytes/)
* Chapter 2: [Hack The Virtual Memory: Drawing the VM diagram](https://blog.holbertonschool.com/hack-the-virtual-memory-drawing-the-vm-diagram/)
* Chapter 3: [Hack the Virtual Memory: malloc, the heap & the program break](https://blog.holbertonschool.com/hack-the-virtual-memory-malloc-the-heap-the-program-break/)

## The Stack

As we have seen in [chapter 2](https://blog.holbertonschool.com/hack-the-virtual-memory-drawing-the-vm-diagram/), the stack resides at the high end of memory and grows downward. But how does it work exactly? How does it translate into assembly code? Which registers are used? In this chapter we will have a closer look at how the stack works, and how the program automatically allocates and de-allocates local variables.

Once we understand this, we will be able to play a bit with it, and hijack the flow of our program. Ready? Let's start!
19 | 20 | _Note: We will talk only about the user stack, as opposed to the kernel stack_ 21 | 22 | ## Prerequisites 23 | 24 | In order to fully understand this article, you will need to know: 25 | 26 | * The basics of the C programming language (especially pointers) 27 | 28 | ## Environment 29 | 30 | All scripts and programs have been tested on the following system: 31 | 32 | * Ubuntu 33 | * Linux ubuntu 4.4.0-31-generic #50~14.04.1-Ubuntu SMP Wed Jul 13 01:07:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux 34 | * Tools used: 35 | * gcc 36 | * gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4 37 | * objdump 38 | * GNU objdump (GNU Binutils for Ubuntu) 2.2 39 | 40 | **Everything we cover will be true for this system/environment, but may be different on another system** 41 | 42 | ## Automatic allocation 43 | 44 | Let's first look at a very simple program that has one function that uses one variable (`0-main.c`): 45 | 46 | ```c 47 | #include <stdio.h> 48 | 49 | int main(void) 50 | { 51 | int a; 52 | 53 | a = 972; 54 | printf("a = %d\n", a); 55 | return (0); 56 | } 57 | ``` 58 | 59 | Let's compile this program and disassemble it using `objdump`: 60 | 61 | ```bash 62 | holberton$ gcc 0-main.c 63 | holberton$ objdump -d -j .text -M intel 64 | ``` 65 | 66 | The assembly code produced for our `main` function is the following: 67 | 68 | ```asm 69 | 000000000040052d <main>: 70 | 40052d: 55 push rbp 71 | 40052e: 48 89 e5 mov rbp,rsp 72 | 400531: 48 83 ec 10 sub rsp,0x10 73 | 400535: c7 45 fc cc 03 00 00 mov DWORD PTR [rbp-0x4],0x3cc 74 | 40053c: 8b 45 fc mov eax,DWORD PTR [rbp-0x4] 75 | 40053f: 89 c6 mov esi,eax 76 | 400541: bf e4 05 40 00 mov edi,0x4005e4 77 | 400546: b8 00 00 00 00 mov eax,0x0 78 | 40054b: e8 c0 fe ff ff call 400410 79 | 400550: b8 00 00 00 00 mov eax,0x0 80 | 400555: c9 leave 81 | 400556: c3 ret 82 | 400557: 66 0f 1f 84 00 00 00 nop WORD PTR [rax+rax*1+0x0] 83 | 40055e: 00 00 84 | ``` 85 | 86 | Let's focus on the first three lines for now: 87 | 88 | ```asm 89 | 000000000040052d <main>: 90 | 40052d: 55 push rbp 91 | 40052e: 48 89 e5 mov rbp,rsp 92 | 400531: 48 83 ec 10 sub rsp,0x10 93 | ``` 94 | 95 | The first lines of the function `main` refer to `rbp` and `rsp`; these are special purpose registers. `rbp` is the base pointer, which points to the base of the current stack frame, and `rsp` is the stack pointer, which points to the top of the current stack frame. 96 | 97 | Let's decompose step by step what is happening here. This is the state of the stack when we enter the function `main` before the first instruction is run: 98 | 99 | ![the stack](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/stack-step-1.png) 100 | 101 | * `push rbp` pushes the value of the register `rbp` onto the stack. Because it "pushes" onto the stack, the value of `rsp` is now the memory address of the new top of the stack. The stack and the registers now look like this: 102 | 103 | ![the stack](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/stack-step-2.png) 104 | 105 | * `mov rbp, rsp` copies the value of the stack pointer `rsp` to the base pointer `rbp`, so `rbp` and `rsp` now both point to the top of the stack 106 | 107 | ![the stack](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/stack-step-3.png) 108 | 109 | * `sub rsp, 0x10` creates a space to store the values of local variables. The space between `rbp` and `rsp` is this space. Note that this space is large enough to store our variable of type `int` 110 | 111 | ![the stack](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/stack-step-4.png) 112 | 113 | We have just created a space in memory - on the stack - for our local variables. This space is called a stack frame. Every function that has local variables will use a stack frame to store those variables. 
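You may wonder why `sub rsp,0x10` reserves 16 bytes when a single `int` needs only 4. The listing itself does not say. A likely explanation, stated here as an assumption, is that the x86-64 System V ABI requires the stack pointer to be 16-byte aligned when a function is called, so the compiler reserves space in multiples of 16. The sketch below shows that rounding with a made-up helper; it is a simplification, not gcc's actual frame layout algorithm.

```c
#include <stddef.h>

/* Assumed alignment: the x86-64 System V ABI keeps rsp 16-byte aligned at calls. */
#define STACK_ALIGN 16

/*
 * Hypothetical helper: smallest multiple of STACK_ALIGN that can hold
 * locals_size bytes of local variables.
 */
size_t frame_reservation(size_t locals_size)
{
    return ((locals_size + STACK_ALIGN - 1) / STACK_ALIGN * STACK_ALIGN);
}
```

With this rule, `frame_reservation(4)` gives 16, which is the `0x10` we see in `sub rsp,0x10`. Three `int` variables (12 bytes) would also fit in the same `0x10` bytes.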
114 | 115 | ## Using local variables 116 | 117 | The fourth line of assembly code of our `main` function is the following: 118 | 119 | ```asm 120 | 400535: c7 45 fc cc 03 00 00 mov DWORD PTR [rbp-0x4],0x3cc 121 | ``` 122 | 123 | `0x3cc` is actually the value `972` in hexadecimal. This line corresponds to our C-code line: 124 | 125 | ```c 126 | a = 972; 127 | ``` 128 | 129 | `mov DWORD PTR [rbp-0x4],0x3cc` is setting the memory at address `rbp - 4` to `972`. `[rbp - 4]` IS our local variable `a`. The computer doesn't actually know the name of the variable we use in our code, it simply refers to memory addresses on the stack. 130 | 131 | This is the state of the stack and the registers after this operation: 132 | 133 | ![the stack](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/stack-variable.png) 134 | 135 | ## `leave`, Automatic de-allocation 136 | 137 | If we look now at the end of the function, we will find this: 138 | 139 | ``` 140 | 400555: c9 leave 141 | ``` 142 | 143 | The instruction `leave` sets `rsp` to `rbp`, and then pops the top of the stack into `rbp`. 144 | 145 | ![the stack](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/stack-leave-1.png) 146 | 147 | ![the stack](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/stack-leave-2.png) 148 | 149 | Because we pushed the previous value of `rbp` onto the stack when we entered the function, `rbp` is now set to the previous value of `rbp`. This is how: 150 | 151 | * The local variables are "de-allocated", and 152 | * the stack frame of the previous function is restored before we leave the current function. 153 | 154 | The state of the stack and the registers `rbp` and `rsp` are restored to the same state as when we entered our `main` function. 155 | 156 | ## Playing with the stack 157 | 158 | When the variables are automatically de-allocated from the stack, they are not completely "destroyed". 
Their values are still in memory, and this space will potentially be used by other functions. 159 | 160 | This is why it is important to initialize your variables when you write your code, because otherwise, they will take whatever value there is on the stack at the moment when the program is running. 161 | 162 | Let's consider the following C code (`1-main.c`): 163 | 164 | ```c 165 | #include <stdio.h> 166 | 167 | void func1(void) 168 | { 169 | int a; 170 | int b; 171 | int c; 172 | 173 | a = 98; 174 | b = 972; 175 | c = a + b; 176 | printf("a = %d, b = %d, c = %d\n", a, b, c); 177 | } 178 | 179 | void func2(void) 180 | { 181 | int a; 182 | int b; 183 | int c; 184 | 185 | printf("a = %d, b = %d, c = %d\n", a, b, c); 186 | } 187 | 188 | int main(void) 189 | { 190 | func1(); 191 | func2(); 192 | return (0); 193 | } 194 | ``` 195 | 196 | As you can see, `func2` does not set the values of its local variables `a`, `b` and `c`, yet if we compile and run this program it will print... 197 | 198 | ```bash 199 | holberton$ gcc 1-main.c && ./a.out 200 | a = 98, b = 972, c = 1070 201 | a = 98, b = 972, c = 1070 202 | holberton$ 203 | ``` 204 | 205 | ... the same values as the variables of `func1`! This is because of how the stack works. The two functions declare the same number of variables, with the same type, in the same order. Their stack frames are exactly the same. When `func1` ends, the memory where the values of its local variables reside is not cleared; only `rsp` is incremented. 206 | As a consequence, when we call `func2` its stack frame sits at exactly the same place as the previous `func1` stack frame, and the local variables of `func2` have the same values as the local variables of `func1` when we left `func1`. 
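Reading uninitialized variables is undefined behavior in C, so the real program only "works" by accident of the stack layout. To see the mechanism without relying on that accident, here is a toy model: a byte array plays the role of the stack, and two integers play the roles of `rsp` and `rbp`. Everything in it (the struct, the function names, the 512-byte size) is invented for this sketch, and it leaves out `call`/`ret` because both calls push and pop a return address symmetrically.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 512             /* arbitrary size for the toy stack */

struct toy_cpu
{
    unsigned char mem[STACK_BYTES]; /* modelled stack memory */
    size_t rsp;                     /* offset of the top of the stack */
    size_t rbp;                     /* offset of the current frame base */
};

static void push64(struct toy_cpu *cpu, uint64_t value)
{
    cpu->rsp -= 8;                  /* the stack grows downward */
    memcpy(cpu->mem + cpu->rsp, &value, 8);
}

static uint64_t pop64(struct toy_cpu *cpu)
{
    uint64_t value;

    memcpy(&value, cpu->mem + cpu->rsp, 8);
    cpu->rsp += 8;                  /* the bytes are left untouched */
    return (value);
}

/* Models: push rbp; mov rbp,rsp; sub rsp,size */
static void toy_enter(struct toy_cpu *cpu, size_t size)
{
    push64(cpu, cpu->rbp);
    cpu->rbp = cpu->rsp;
    cpu->rsp -= size;
}

/* Models: leave (mov rsp,rbp; pop rbp) */
static void toy_leave(struct toy_cpu *cpu)
{
    cpu->rsp = cpu->rbp;
    cpu->rbp = (size_t)pop64(cpu);
}

/* Models: mov DWORD PTR [rbp-below], value */
static void toy_store32(struct toy_cpu *cpu, size_t below, int32_t value)
{
    memcpy(cpu->mem + cpu->rbp - below, &value, 4);
}

/* Models: mov reg, DWORD PTR [rbp-below] */
static int32_t toy_load32(struct toy_cpu *cpu, size_t below)
{
    int32_t value;

    memcpy(&value, cpu->mem + cpu->rbp - below, 4);
    return (value);
}

/* Replays func1 then func2 and returns what "func2" finds in c's slot. */
int32_t toy_stale_c(void)
{
    struct toy_cpu cpu;
    int32_t stale;

    memset(cpu.mem, 0, sizeof(cpu.mem));
    cpu.rsp = STACK_BYTES;
    cpu.rbp = STACK_BYTES;

    toy_enter(&cpu, 0x10);              /* "func1" */
    toy_store32(&cpu, 0xc, 98);         /* a */
    toy_store32(&cpu, 0x8, 972);        /* b */
    toy_store32(&cpu, 0x4, 98 + 972);   /* c */
    toy_leave(&cpu);                    /* frame released, bytes kept */

    toy_enter(&cpu, 0x10);              /* "func2": same size, same place */
    stale = toy_load32(&cpu, 0x4);      /* never written by "func2" */
    toy_leave(&cpu);
    return (stale);
}
```

`toy_stale_c()` returns 1070: the second frame lands on the same bytes as the first one, and `toy_leave` only moved the two offsets without clearing anything.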
207 | 208 | Let's examine the assembly code to prove it: 209 | 210 | ```bash 211 | holberton$ objdump -d -j .text -M intel 212 | ``` 213 | 214 | ```asm 215 | 000000000040052d <func1>: 216 | 40052d: 55 push rbp 217 | 40052e: 48 89 e5 mov rbp,rsp 218 | 400531: 48 83 ec 10 sub rsp,0x10 219 | 400535: c7 45 f4 62 00 00 00 mov DWORD PTR [rbp-0xc],0x62 220 | 40053c: c7 45 f8 cc 03 00 00 mov DWORD PTR [rbp-0x8],0x3cc 221 | 400543: 8b 45 f8 mov eax,DWORD PTR [rbp-0x8] 222 | 400546: 8b 55 f4 mov edx,DWORD PTR [rbp-0xc] 223 | 400549: 01 d0 add eax,edx 224 | 40054b: 89 45 fc mov DWORD PTR [rbp-0x4],eax 225 | 40054e: 8b 4d fc mov ecx,DWORD PTR [rbp-0x4] 226 | 400551: 8b 55 f8 mov edx,DWORD PTR [rbp-0x8] 227 | 400554: 8b 45 f4 mov eax,DWORD PTR [rbp-0xc] 228 | 400557: 89 c6 mov esi,eax 229 | 400559: bf 34 06 40 00 mov edi,0x400634 230 | 40055e: b8 00 00 00 00 mov eax,0x0 231 | 400563: e8 a8 fe ff ff call 400410 232 | 400568: c9 leave 233 | 400569: c3 ret 234 | 235 | 000000000040056a <func2>: 236 | 40056a: 55 push rbp 237 | 40056b: 48 89 e5 mov rbp,rsp 238 | 40056e: 48 83 ec 10 sub rsp,0x10 239 | 400572: 8b 4d fc mov ecx,DWORD PTR [rbp-0x4] 240 | 400575: 8b 55 f8 mov edx,DWORD PTR [rbp-0x8] 241 | 400578: 8b 45 f4 mov eax,DWORD PTR [rbp-0xc] 242 | 40057b: 89 c6 mov esi,eax 243 | 40057d: bf 34 06 40 00 mov edi,0x400634 244 | 400582: b8 00 00 00 00 mov eax,0x0 245 | 400587: e8 84 fe ff ff call 400410 246 | 40058c: c9 leave 247 | 40058d: c3 ret 248 | 249 | 000000000040058e <main>: 250 | 40058e: 55 push rbp 251 | 40058f: 48 89 e5 mov rbp,rsp 252 | 400592: e8 96 ff ff ff call 40052d <func1> 253 | 400597: e8 ce ff ff ff call 40056a <func2> 254 | 40059c: b8 00 00 00 00 mov eax,0x0 255 | 4005a1: 5d pop rbp 256 | 4005a2: c3 ret 257 | 4005a3: 66 2e 0f 1f 84 00 00 nop WORD PTR cs:[rax+rax*1+0x0] 258 | 4005aa: 00 00 00 259 | 4005ad: 0f 1f 00 nop DWORD PTR [rax] 260 | ``` 261 | 262 | As you can see, the way the stack frame is formed is always consistent. In our two functions, the size of the stack frame is the same since the local variables are the same. 263 | 264 | ```asm 265 | push rbp 266 | mov rbp,rsp 267 | sub rsp,0x10 268 | ``` 269 | 270 | And both functions end with the `leave` instruction. 271 | 272 | The variables `a`, `b` and `c` are referenced the same way in the two functions: 273 | 274 | * `a` lies at memory address `rbp - 0xc` 275 | * `b` lies at memory address `rbp - 0x8` 276 | * `c` lies at memory address `rbp - 0x4` 277 | 278 | Note that the order of those variables on the stack is not the same as the order of those variables in our code. The compiler orders them as it wants, so you should never assume the order of your local variables in the stack. 279 | 280 | So, this is the state of the stack and the registers `rbp` and `rsp` before we leave `func1`: 281 | 282 | ![the stack](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/stack-func1-1.png) 283 | 284 | When we leave the function `func1`, we hit the instruction `leave`; as previously explained, this is the state of the stack, `rbp` and `rsp` right before returning to the function `main`: 285 | 286 | ![the stack](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/stack-func1-2.png) 287 | 288 | So when we enter `func2`, the local variables are set to whatever sits in memory on the stack, and that is why their values are the same as the local variables of the function `func1`. 
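Since the compiler is free to order local variables as it wants, it can be handy to check where they ended up without reading the disassembly. The sketch below is one way to do that, under the assumption that you compile with GCC or Clang: `__builtin_frame_address(0)` is a compiler builtin (not standard C) that returns the frame address of the current function, which is the value of `rbp` on x86-64 when a frame pointer is used. The function name `local_offset` is made up for this example.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns the distance in bytes between one of the
 * locals ('a', 'b' or anything else for c) and the frame address.
 * A negative result means the variable sits below the frame address.
 */
__attribute__((noinline))
long local_offset(char name)
{
    volatile int a = 98;
    volatile int b = 972;
    volatile int c = 1070;
    intptr_t frame = (intptr_t)__builtin_frame_address(0);

    if (name == 'a')
        return ((long)((intptr_t)&a - frame));
    if (name == 'b')
        return ((long)((intptr_t)&b - frame));
    return ((long)((intptr_t)&c - frame));
}
```

Print `local_offset('a')`, `local_offset('b')` and `local_offset('c')` with `%ld` and compare them with the `[rbp-0x…]` operands in your own disassembly. The numbers depend on the compiler, its version and its flags, which is exactly why you should not rely on them.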
289 | 290 | ![the stack](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/stack-func2-1.png) 291 | 292 | ## ret 293 | 294 | You might have noticed that all our example functions end with the instruction `ret`. `ret` pops the return address from the stack and jumps there. When a function is called, the program uses the instruction `call` to push the return address before it jumps to the first instruction of the called function. 295 | This is how the program is able to call a function and then return from it to the calling function to execute its next instruction. 296 | 297 | So this means that there are more than just variables on the stack, there are also memory addresses of instructions. Let's revisit our `1-main.c` code. 298 | 299 | When the `main` function calls `func1`, 300 | 301 | ```asm 302 | 400592: e8 96 ff ff ff call 40052d <func1> 303 | ``` 304 | 305 | it pushes the memory address of the next instruction onto the stack, and then jumps to `func1`. 306 | As a consequence, before executing any instructions in `func1`, the top of the stack contains this address, so `rsp` points to this value. 307 | 308 | ![the stack](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/stack-call.png) 309 | 310 | After the stack frame of `func1` is formed, the stack looks like this: 311 | 312 | ![the stack](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/stack-func1-3.png) 313 | 314 | ## Wrapping everything up 315 | 316 | Given what we just learned, we can use `rbp` to directly access all our local variables (without using the C variables!), as well as the saved `rbp` value on the stack and the return address values of our functions. 
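As a quick cross-check of that layout: `call` pushes the return address, then `push rbp` stores the saved `rbp` just below it, so after `mov rbp,rsp` the return address should sit 8 bytes above `rbp`. The sketch below compares those 8 bytes with what the compiler itself reports. It assumes GCC or Clang, since `__builtin_return_address(0)` and `__builtin_frame_address(0)` are compiler builtins and not standard C, and the function name is made up for this example.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper. Returns 1 if the 8 bytes at [frame + 8] equal the
 * return address reported by the compiler, 0 if they differ, and -1 when
 * the program is not built for x86-64.
 */
__attribute__((noinline))
int return_slot_matches(void)
{
#if defined(__x86_64__)
    uintptr_t from_builtin = (uintptr_t)__builtin_return_address(0);
    char *frame = __builtin_frame_address(0);
    uintptr_t from_stack;

    /* copy the 8 bytes sitting right above the saved rbp */
    memcpy(&from_stack, frame + 8, sizeof(from_stack));
    return (from_builtin == from_stack);
#else
    return (-1);
#endif
}
```

On the x86-64 Linux setup used in this chapter, `return_slot_matches()` should return 1. The builtins are convenient, but they hide the mechanics, so the rest of this chapter reads `rbp` and `rsp` directly instead.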
317 | 318 | To do so in C, we can use: 319 | 320 | ```c 321 | register long rsp asm ("rsp"); 322 | register long rbp asm ("rbp"); 323 | ``` 324 | 325 | Here is the listing of the program `2-main.c`: 326 | 327 | ```c 328 | #include <stdio.h> 329 | 330 | void func1(void) 331 | { 332 | int a; 333 | int b; 334 | int c; 335 | register long rsp asm ("rsp"); 336 | register long rbp asm ("rbp"); 337 | 338 | a = 98; 339 | b = 972; 340 | c = a + b; 341 | printf("a = %d, b = %d, c = %d\n", a, b, c); 342 | printf("func1, rpb = %lx\n", rbp); 343 | printf("func1, rsp = %lx\n", rsp); 344 | printf("func1, a = %d\n", *(int *)(((char *)rbp) - 0xc) ); 345 | printf("func1, b = %d\n", *(int *)(((char *)rbp) - 0x8) ); 346 | printf("func1, c = %d\n", *(int *)(((char *)rbp) - 0x4) ); 347 | printf("func1, previous rbp value = %lx\n", *(unsigned long int *)rbp ); 348 | printf("func1, return address value = %lx\n", *(unsigned long int *)((char *)rbp + 8) ); 349 | } 350 | 351 | void func2(void) 352 | { 353 | int a; 354 | int b; 355 | int c; 356 | register long rsp asm ("rsp"); 357 | register long rbp asm ("rbp"); 358 | 359 | printf("func2, a = %d, b = %d, c = %d\n", a, b, c); 360 | printf("func2, rpb = %lx\n", rbp); 361 | printf("func2, rsp = %lx\n", rsp); 362 | } 363 | 364 | int main(void) 365 | { 366 | register long rsp asm ("rsp"); 367 | register long rbp asm ("rbp"); 368 | 369 | printf("main, rpb = %lx\n", rbp); 370 | printf("main, rsp = %lx\n", rsp); 371 | func1(); 372 | func2(); 373 | return (0); 374 | } 375 | ``` 376 | 377 | ### Getting the values of the variables 378 | 379 | ![the stack](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/stack-func1-3.png) 380 | 381 | From our previous discoveries, we know that our variables are referenced via `rbp` - 0xX: 382 | 383 | * `a` is at `rbp - 0xc` 384 | * `b` is at `rbp - 0x8` 385 | * `c` is at `rbp - 0x4` 386 | 387 | So in order to get the values of those variables, we need to dereference an address computed from `rbp`. 
For the variable `a`: 388 | 389 | * cast our variable `rbp` to a `char *`: `(char *)rbp` 390 | * subtract the correct amount of bytes to get the address of where the variable is in memory: `((char *)rbp) - 0xc` 391 | * cast it again to a pointer pointing to an `int` since `a` is of type `int`: `(int *)(((char *)rbp) - 0xc)` 392 | * and dereference it to get the value sitting at this address: `*(int *)(((char *)rbp) - 0xc)` 393 | 394 | ### The saved `rbp` value 395 | 396 | ![the stack](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/stack-func1-3.png) 397 | 398 | Looking at the above diagram, the current `rbp` directly points to the saved `rbp`, so we simply have to cast our variable `rbp` to a pointer to an `unsigned long int` and dereference it: `*(unsigned long int *)rbp`. 399 | 400 | ### The return address value 401 | 402 | ![the stack](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/stack-func1-3.png) 403 | 404 | The return address was pushed right before the saved `rbp`, so it sits just above it on the stack. The saved `rbp` is 8 bytes long, so we simply need to add 8 to the current value of `rbp` to get the address where this return address is on the stack. 
This is how we do it: 405 | 406 | * cast our variable `rbp` to a `char *`: `(char *)rbp` 407 | * add 8 to this value: `((char *)rbp + 8)` 408 | * cast it to point to an `unsigned long int`: `(unsigned long int *)((char *)rbp + 8)` 409 | * dereference it to get the value at this address: `*(unsigned long int *)((char *)rbp + 8)` 410 | 411 | ### The output of our program 412 | 413 | ```bash 414 | holberton$ gcc 2-main.c && ./a.out 415 | main, rpb = 7ffc78e71b70 416 | main, rsp = 7ffc78e71b70 417 | a = 98, b = 972, c = 1070 418 | func1, rpb = 7ffc78e71b60 419 | func1, rsp = 7ffc78e71b50 420 | func1, a = 98 421 | func1, b = 972 422 | func1, c = 1070 423 | func1, previous rbp value = 7ffc78e71b70 424 | func1, return address value = 400697 425 | func2, a = 98, b = 972, c = 1070 426 | func2, rpb = 7ffc78e71b60 427 | func2, rsp = 7ffc78e71b50 428 | holberton$ 429 | ``` 430 | 431 | We can see that: 432 | 433 | * from `func1` we can access all our variables correctly via `rbp` 434 | * from `func1` we can get the `rbp` of the function `main` 435 | * we confirm that `func1` and `func2` do have the same `rbp` and `rsp` values 436 | * the difference between `rsp` and `rbp` is 0x10, as seen in the assembly code (`sub rsp,0x10`) 437 | * in the `main` function, `rsp` == `rbp` because there are no local variables 438 | 439 | The return address from `func1` is `0x400697`. Let's double check this assumption by disassembling the program. If we are correct, this should be the address of the instruction right after the call of `func1` in the `main` function. 440 | 441 | ```bash 442 | holberton$ objdump -d -j .text -M intel | less 443 | ``` 444 | 445 | ```asm 446 | 0000000000400664 <main>: 447 | 400664: 55 push rbp 448 | 400665: 48 89 e5 mov rbp,rsp 449 | 400668: 48 89 e8 mov rax,rbp 450 | 40066b: 48 89 c6 mov rsi,rax 451 | 40066e: bf 3b 08 40 00 mov edi,0x40083b 452 | 400673: b8 00 00 00 00 mov eax,0x0 453 | 400678: e8 93 fd ff ff call 400410 454 | 40067d: 48 89 e0 mov rax,rsp 455 | 400680: 48 89 c6 mov rsi,rax 456 | 400683: bf 4c 08 40 00 mov edi,0x40084c 457 | 400688: b8 00 00 00 00 mov eax,0x0 458 | 40068d: e8 7e fd ff ff call 400410 459 | 400692: e8 96 fe ff ff call 40052d <func1> 460 | 400697: e8 7a ff ff ff call 400616 <func2> 461 | 40069c: b8 00 00 00 00 mov eax,0x0 462 | 4006a1: 5d pop rbp 463 | 4006a2: c3 ret 464 | 4006a3: 66 2e 0f 1f 84 00 00 nop WORD PTR cs:[rax+rax*1+0x0] 465 | 4006aa: 00 00 00 466 | 4006ad: 0f 1f 00 nop DWORD PTR [rax] 467 | ``` 468 | 469 | And yes! \o/ 470 | 471 | ## Hack the stack! 472 | 473 | Now that we know where to find the return address on the stack, what if we were to modify this value? Could we alter the flow of a program and make `func1` return to somewhere else? 
Let's add a new function, called `bye`, to our program (`3-main.c`): 474 | 475 | ```c 476 | #include <stdio.h> 477 | #include <stdlib.h> 478 | 479 | void bye(void) 480 | { 481 | printf("[x] I am in the function bye!\n"); 482 | exit(98); 483 | } 484 | 485 | void func1(void) 486 | { 487 | int a; 488 | int b; 489 | int c; 490 | register long rsp asm ("rsp"); 491 | register long rbp asm ("rbp"); 492 | 493 | a = 98; 494 | b = 972; 495 | c = a + b; 496 | printf("a = %d, b = %d, c = %d\n", a, b, c); 497 | printf("func1, rpb = %lx\n", rbp); 498 | printf("func1, rsp = %lx\n", rsp); 499 | printf("func1, a = %d\n", *(int *)(((char *)rbp) - 0xc) ); 500 | printf("func1, b = %d\n", *(int *)(((char *)rbp) - 0x8) ); 501 | printf("func1, c = %d\n", *(int *)(((char *)rbp) - 0x4) ); 502 | printf("func1, previous rbp value = %lx\n", *(unsigned long int *)rbp ); 503 | printf("func1, return address value = %lx\n", *(unsigned long int *)((char *)rbp + 8) ); 504 | } 505 | 506 | void func2(void) 507 | { 508 | int a; 509 | int b; 510 | int c; 511 | register long rsp asm ("rsp"); 512 | register long rbp asm ("rbp"); 513 | 514 | printf("func2, a = %d, b = %d, c = %d\n", a, b, c); 515 | printf("func2, rpb = %lx\n", rbp); 516 | printf("func2, rsp = %lx\n", rsp); 517 | } 518 | 519 | int main(void) 520 | { 521 | register long rsp asm ("rsp"); 522 | register long rbp asm ("rbp"); 523 | 524 | printf("main, rpb = %lx\n", rbp); 525 | printf("main, rsp = %lx\n", rsp); 526 | func1(); 527 | func2(); 528 | return (0); 529 | } 530 | ``` 531 | 532 | Let's see at which address the code of this function starts: 533 | 534 | ```bash 535 | holberton$ gcc 3-main.c && objdump -d -j .text -M intel | less 536 | ``` 537 | 538 | ```asm 539 | 00000000004005bd <bye>: 540 | 4005bd: 55 push rbp 541 | 4005be: 48 89 e5 mov rbp,rsp 542 | 4005c1: bf d8 07 40 00 mov edi,0x4007d8 543 | 4005c6: e8 b5 fe ff ff call 400480 544 | 4005cb: bf 62 00 00 00 mov edi,0x62 545 | 4005d0: e8 eb fe ff ff call 4004c0 546 | ``` 547 | 548 | Now let's replace the return 
address on the stack from the `func1` function with the address of the beginning of the function `bye`, `4005bd` (`4-main.c`): 549 | 550 | ```c 551 | #include <stdio.h> 552 | #include <stdlib.h> 553 | 554 | void bye(void) 555 | { 556 | printf("[x] I am in the function bye!\n"); 557 | exit(98); 558 | } 559 | 560 | void func1(void) 561 | { 562 | int a; 563 | int b; 564 | int c; 565 | register long rsp asm ("rsp"); 566 | register long rbp asm ("rbp"); 567 | 568 | a = 98; 569 | b = 972; 570 | c = a + b; 571 | printf("a = %d, b = %d, c = %d\n", a, b, c); 572 | printf("func1, rpb = %lx\n", rbp); 573 | printf("func1, rsp = %lx\n", rsp); 574 | printf("func1, a = %d\n", *(int *)(((char *)rbp) - 0xc) ); 575 | printf("func1, b = %d\n", *(int *)(((char *)rbp) - 0x8) ); 576 | printf("func1, c = %d\n", *(int *)(((char *)rbp) - 0x4) ); 577 | printf("func1, previous rbp value = %lx\n", *(unsigned long int *)rbp ); 578 | printf("func1, return address value = %lx\n", *(unsigned long int *)((char *)rbp + 8) ); 579 | /* hack the stack! 
*/ 580 | *(unsigned long int *)((char *)rbp + 8) = 0x4005bd; 581 | } 582 | 583 | void func2(void) 584 | { 585 | int a; 586 | int b; 587 | int c; 588 | register long rsp asm ("rsp"); 589 | register long rbp asm ("rbp"); 590 | 591 | printf("func2, a = %d, b = %d, c = %d\n", a, b, c); 592 | printf("func2, rpb = %lx\n", rbp); 593 | printf("func2, rsp = %lx\n", rsp); 594 | } 595 | 596 | int main(void) 597 | { 598 | register long rsp asm ("rsp"); 599 | register long rbp asm ("rbp"); 600 | 601 | printf("main, rpb = %lx\n", rbp); 602 | printf("main, rsp = %lx\n", rsp); 603 | func1(); 604 | func2(); 605 | return (0); 606 | } 607 | ``` 608 | 609 | ```bash 610 | holberton$ gcc 4-main.c && ./a.out 611 | main, rpb = 7fff62ef1b60 612 | main, rsp = 7fff62ef1b60 613 | a = 98, b = 972, c = 1070 614 | func1, rpb = 7fff62ef1b50 615 | func1, rsp = 7fff62ef1b40 616 | func1, a = 98 617 | func1, b = 972 618 | func1, c = 1070 619 | func1, previous rbp value = 7fff62ef1b60 620 | func1, return address value = 40074d 621 | [x] I am in the function bye! 622 | holberton$ echo $? 623 | 98 624 | holberton$ 625 | ``` 626 | 627 | We have called the function `bye`, without calling it! :) 628 | 629 | ## Outro 630 | 631 | I hope that you enjoyed this and learned a couple of things about the stack. As usual, this will be continued! Let me know if you have anything you would like me to cover in the next chapter. 632 | 633 | ### Questions? Feedback? 634 | 635 | If you have questions or feedback don't hesitate to ping us on Twitter at [@holbertonschool](https://twitter.com/holbertonschool) or [@julienbarbier42](https://twitter.com/julienbarbier42). 636 | _Haters, please send your comments to `/dev/null`._ 637 | 638 | Happy Hacking! 639 | 640 | ### Thank you for reading! 641 | 642 | As always, no one is perfect (except [Chuck](http://codesqueeze.com/the-ultimate-top-25-chuck-norris-the-programmer-jokes/) of course), so don't hesitate to contribute or send me your comments if you find anything I missed. 
643 | 644 | ### Files 645 | 646 | [This repo](https://github.com/holbertonschool/Hack-The-Virtual-Memory/tree/master/04.%20The%20Stack%2C%20registers%20and%20assembly%20code) contains the source code (`X-main.c` files) for programs created in this tutorial. 647 | 648 | ### Read more about the virtual memory 649 | 650 | Follow [@holbertonschool](https://twitter.com/holbertonschool) or [@julienbarbier42](https://twitter.com/julienbarbier42) on Twitter to get the next chapters! This was the fifth chapter in our series on the virtual memory. If you missed the previous ones, here are the links to them: 651 | 652 | - Chapter 0: [Hack The Virtual Memory: C strings & /proc](https://blog.holbertonschool.com/hack-the-virtual-memory-c-strings-proc/) 653 | - Chapter 1: [Hack The Virtual Memory: Python bytes](https://blog.holbertonschool.com/hack-the-virtual-memory-python-bytes/) 654 | - Chapter 2: [Hack The Virtual Memory: Drawing the VM diagram](https://blog.holbertonschool.com/hack-the-virtual-memory-drawing-the-vm-diagram/) 655 | - Chapter 3: [Hack the Virtual Memory: malloc, the heap & the program break](https://blog.holbertonschool.com/hack-the-virtual-memory-malloc-the-heap-the-program-break/) 656 | 657 | _Many thanks to [Naomi](https://twitter.com/NamoDawn) for proof-reading!_ :) 658 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Hack the Virtual Memory 2 | 3 | ![hack the virtual memory](https://s3-us-west-1.amazonaws.com/holbertonschool/medias/hack_the_vm_0.png) 4 | 5 | This is a series of small articles / tutorials based around virtual memory. The goal is to learn some CS basics, but in a different and more practical way. 6 | 7 | ## TOC 8 | 9 | ### 00. 
C strings & `/proc` 10 | 11 | For this first piece, we'll use `/proc` to find and modify variables (in this example, an ASCII string) contained inside the virtual memory of a running process, and learn some cool things along the way. 12 | 13 | Status: _Published_ 14 | 15 | ### 01. Python bytes 16 | 17 | For this second piece, we'll do almost the same thing, but instead we will access the virtual memory of a running Python 3 script. It is not as straightforward. Let's take this as an excuse to look at some Python 3 internals! 18 | 19 | Status: _Published_ 20 | 21 | ### 02. What's where in the virtual memory 22 | 23 | Let's try to guess where things are in the virtual memory. 24 | 25 | Status: _Published_ 26 | 27 | ### 03. `malloc`, the heap and the program break 28 | 29 | In this fourth chapter we will look at the heap and how malloc works in order to answer some of the questions we ended with at the end of the previous chapter. 30 | 31 | Status: _Published_ 32 | 33 | ### 04. The stack, registers and assembly code 34 | 35 | Status: _In progress_ 36 | --------------------------------------------------------------------------------