├── .gitignore ├── src ├── 6_security │ ├── cia.md │ ├── dos.md │ ├── io.md │ ├── bruteforcing.md │ ├── security.md │ └── enc_and_hash.md ├── 5_processes │ ├── gdb.md │ ├── introduction.md │ ├── executable_types.md │ ├── indroduction.md │ ├── interaction.md │ └── proc_and_debug.md ├── whats_next.md ├── 4_programming_languages │ ├── debugging.md │ ├── write.md │ ├── my_setup.png │ ├── c_workflow.jpeg │ ├── interp.md │ ├── python.md │ ├── debugging_py.md │ ├── programming_languages.md │ ├── c.md │ ├── write_py.md │ ├── debugging_c.md │ ├── compilation.md │ ├── write_c.md │ └── introduction.md ├── 3_computer_organization │ ├── pointers.md │ ├── mc_in_ram.jpeg │ ├── mem_model_1.jpeg │ ├── mem_model_2.jpeg │ ├── memory_in_use.png │ ├── minecraft_chat.png │ ├── render_distance.png │ ├── minecraft_username.jpg │ ├── memory.md │ ├── computer_org.md │ ├── hacker_practice.md │ ├── introduction.md │ ├── assembly.md │ ├── registers.md │ ├── programs_in_mem.md │ ├── asm_memory.md │ ├── addressing.md │ ├── math_and_counting.md │ ├── bits_and_logic.md │ ├── control_structures.md │ ├── memory_segments.md │ └── instructions.md ├── .DS_Store ├── 1_introduction │ ├── blank.png │ ├── ike-method.png │ ├── nightmare-method.png │ ├── inspiration.md │ ├── introduction.md │ ├── overview.md │ └── background.md ├── 2_operating_systems │ ├── vm.png │ ├── .DS_Store │ ├── container.png │ ├── minecraft.jpeg │ ├── computer_stack.jpeg │ ├── ms_container_v_vm.png │ ├── linux_os.md │ ├── operating_systems.md │ ├── virtualization.md │ ├── permissions.md │ ├── virtual_machines.md │ ├── terminal.md │ ├── hacker_practice.md │ ├── introduction.md │ └── containers.md └── SUMMARY.md ├── .DS_Store ├── TODO ├── book.toml ├── .vscode └── settings.json ├── push_book.sh ├── .github └── workflows │ └── gh-pages.yml └── README.md /.gitignore: -------------------------------------------------------------------------------- 1 | book 2 | .DS_Store 3 | 
-------------------------------------------------------------------------------- /src/6_security/cia.md: -------------------------------------------------------------------------------- 1 | # CIA 2 | -------------------------------------------------------------------------------- /src/6_security/dos.md: -------------------------------------------------------------------------------- 1 | # DOS 2 | -------------------------------------------------------------------------------- /src/5_processes/gdb.md: -------------------------------------------------------------------------------- 1 | # Using GDB 2 | -------------------------------------------------------------------------------- /src/6_security/io.md: -------------------------------------------------------------------------------- 1 | # I/O Vectors 2 | -------------------------------------------------------------------------------- /src/whats_next.md: -------------------------------------------------------------------------------- 1 | # What's Next 2 | -------------------------------------------------------------------------------- /src/5_processes/introduction.md: -------------------------------------------------------------------------------- 1 | # Introduction 2 | -------------------------------------------------------------------------------- /src/6_security/bruteforcing.md: -------------------------------------------------------------------------------- 1 | # Bruteforcing 2 | -------------------------------------------------------------------------------- /src/6_security/security.md: -------------------------------------------------------------------------------- 1 | # Security Concepts 2 | -------------------------------------------------------------------------------- /src/4_programming_languages/debugging.md: -------------------------------------------------------------------------------- 1 | # Debugging 2 | -------------------------------------------------------------------------------- 
/src/4_programming_languages/write.md: -------------------------------------------------------------------------------- 1 | # Writing Code 2 | -------------------------------------------------------------------------------- /src/5_processes/executable_types.md: -------------------------------------------------------------------------------- 1 | # Executable Types 2 | -------------------------------------------------------------------------------- /src/6_security/enc_and_hash.md: -------------------------------------------------------------------------------- 1 | # Encryption & Hashing 2 | -------------------------------------------------------------------------------- /src/3_computer_organization/pointers.md: -------------------------------------------------------------------------------- 1 | # C-Pointers Review 2 | -------------------------------------------------------------------------------- /src/5_processes/indroduction.md: -------------------------------------------------------------------------------- 1 | # Processes and Debugging 2 | -------------------------------------------------------------------------------- /src/5_processes/interaction.md: -------------------------------------------------------------------------------- 1 | # Interacting with Python 2 | -------------------------------------------------------------------------------- /.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mahaloz/ike/HEAD/.DS_Store -------------------------------------------------------------------------------- /src/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mahaloz/ike/HEAD/src/.DS_Store -------------------------------------------------------------------------------- /src/1_introduction/blank.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/mahaloz/ike/HEAD/src/1_introduction/blank.png -------------------------------------------------------------------------------- /TODO: -------------------------------------------------------------------------------- 1 | Fix these links: 2 | [] operating_systems > introduction 3 | [] programming_langauges > introduction -------------------------------------------------------------------------------- /src/2_operating_systems/vm.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mahaloz/ike/HEAD/src/2_operating_systems/vm.png -------------------------------------------------------------------------------- /src/1_introduction/ike-method.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mahaloz/ike/HEAD/src/1_introduction/ike-method.png -------------------------------------------------------------------------------- /src/2_operating_systems/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mahaloz/ike/HEAD/src/2_operating_systems/.DS_Store -------------------------------------------------------------------------------- /src/2_operating_systems/container.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mahaloz/ike/HEAD/src/2_operating_systems/container.png -------------------------------------------------------------------------------- /src/2_operating_systems/minecraft.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mahaloz/ike/HEAD/src/2_operating_systems/minecraft.jpeg -------------------------------------------------------------------------------- /book.toml: -------------------------------------------------------------------------------- 1 | [book] 2 | authors = ["mahaloz"] 3 | 
language = "en" 4 | multilingual = false 5 | src = "src" 6 | title = "'Ike" 7 | -------------------------------------------------------------------------------- /src/1_introduction/nightmare-method.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mahaloz/ike/HEAD/src/1_introduction/nightmare-method.png -------------------------------------------------------------------------------- /src/4_programming_languages/my_setup.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mahaloz/ike/HEAD/src/4_programming_languages/my_setup.png -------------------------------------------------------------------------------- /src/2_operating_systems/computer_stack.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mahaloz/ike/HEAD/src/2_operating_systems/computer_stack.jpeg -------------------------------------------------------------------------------- /src/3_computer_organization/mc_in_ram.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mahaloz/ike/HEAD/src/3_computer_organization/mc_in_ram.jpeg -------------------------------------------------------------------------------- /src/4_programming_languages/c_workflow.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mahaloz/ike/HEAD/src/4_programming_languages/c_workflow.jpeg -------------------------------------------------------------------------------- /src/2_operating_systems/ms_container_v_vm.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mahaloz/ike/HEAD/src/2_operating_systems/ms_container_v_vm.png -------------------------------------------------------------------------------- 
/src/3_computer_organization/mem_model_1.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mahaloz/ike/HEAD/src/3_computer_organization/mem_model_1.jpeg -------------------------------------------------------------------------------- /src/3_computer_organization/mem_model_2.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mahaloz/ike/HEAD/src/3_computer_organization/mem_model_2.jpeg -------------------------------------------------------------------------------- /src/3_computer_organization/memory_in_use.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mahaloz/ike/HEAD/src/3_computer_organization/memory_in_use.png -------------------------------------------------------------------------------- /.vscode/settings.json: -------------------------------------------------------------------------------- 1 | { 2 | "spellchecker.ignoreWordsList": [ 3 | "volatile—meaning", 4 | "either—since" 5 | ] 6 | } -------------------------------------------------------------------------------- /src/3_computer_organization/minecraft_chat.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mahaloz/ike/HEAD/src/3_computer_organization/minecraft_chat.png -------------------------------------------------------------------------------- /src/3_computer_organization/render_distance.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mahaloz/ike/HEAD/src/3_computer_organization/render_distance.png -------------------------------------------------------------------------------- /src/3_computer_organization/minecraft_username.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/mahaloz/ike/HEAD/src/3_computer_organization/minecraft_username.jpg -------------------------------------------------------------------------------- /src/4_programming_languages/interp.md: -------------------------------------------------------------------------------- 1 | # Interpretation & Execution 2 | 3 | This section has yet to be expanded on. For now, you can run programs with `python3 program_name`. -------------------------------------------------------------------------------- /src/4_programming_languages/python.md: -------------------------------------------------------------------------------- 1 | # Python 2 | 3 | ## Learning to use Python 4 | Just as in the [C](./c.md) section, you will need to at least know the general syntax of Python. Knowing this language really helps with hacking fast. -------------------------------------------------------------------------------- /push_book.sh: -------------------------------------------------------------------------------- 1 | git worktree add -f /tmp/book gh-pages 2 | mdbook build 3 | rm -rf /tmp/book/* 4 | cp -rp book/* /tmp/book/ 5 | cd /tmp/book 6 | git pull 7 | echo "ike.mahaloz.re" > CNAME 8 | git add -A 9 | git commit -m 'book update' 10 | git push origin gh-pages 11 | cd - -------------------------------------------------------------------------------- /src/4_programming_languages/debugging_py.md: -------------------------------------------------------------------------------- 1 | # Debugging 2 | 3 | This section has yet to be expanded on. For now, you can do this: 4 | 5 | Put `import ipdb; ipdb.set_trace()` in your Python file wherever you want to break, the same way you would set a breakpoint in gdb. This will open an `ipdb` prompt that works much like gdb. 
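
To make the idea above concrete, here is a minimal sketch of a breakpoint you can switch on and off. It uses the built-in `breakpoint()` (Python 3.7+) rather than calling `ipdb` directly, so it runs even if `ipdb` is not installed. The function `average` and the environment variable `IKE_DEBUG` are made up for this example; they are not part of any library.

```python
import os


def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    total = 0
    for value in values:
        total += value
    # IKE_DEBUG is a hypothetical environment variable used only in this sketch.
    # When it is set, execution pauses here so you can inspect `total`.
    if os.environ.get("IKE_DEBUG"):
        breakpoint()  # built-in since Python 3.7; opens pdb by default
    return total / len(values)


if __name__ == "__main__":
    print(average([1, 2, 3]))
```

Run it normally and it just prints the result. Run it with `IKE_DEBUG=1` set and you land in a debugger prompt, where `p total` prints a variable, `n` steps to the next line, and `c` continues. If you have `ipdb` installed, setting `PYTHONBREAKPOINT=ipdb.set_trace` as well should make `breakpoint()` open `ipdb` instead of the default `pdb`.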
-------------------------------------------------------------------------------- /src/2_operating_systems/linux_os.md: -------------------------------------------------------------------------------- 1 | # Linux OS 2 | 3 | Instead of explaining how the Linux Operating System works from the ground up, we will 4 | instead attempt to learn how to use it as quickly as possible. To do this, we will cover: 5 | 6 | - Using the terminal 7 | - Permissions and how they work 8 | - SSHing 9 | - Puzzle solving in the shell -------------------------------------------------------------------------------- /src/4_programming_languages/programming_languages.md: -------------------------------------------------------------------------------- 1 | # Programming Languages 2 | 3 | A chapter on how programming languages differ, using C, using Python, and preparing 4 | ourselves for a deeper dive into how the computer works using compiled languages. 5 | Even if you have written C or Python before, I recommend glancing over the sections. -------------------------------------------------------------------------------- /src/5_processes/proc_and_debug.md: -------------------------------------------------------------------------------- 1 | # Processes and Debugging 2 | 3 | This section has yet to be completed. For now just go straight to the challenge in pwn.college. 4 | 5 | First you will want to do the [interaction](https://dojo.pwn.college/challenges/interaction) module, then go on to the [debugging](https://dojo.pwn.college/challenges/gdb) module. -------------------------------------------------------------------------------- /src/2_operating_systems/operating_systems.md: -------------------------------------------------------------------------------- 1 | # Operating Systems 2 | A speed introduction to operating systems and the fundamentals needed to be a hacker 3 | when working with Linux. 
If you are already familiar with Linux, skip to the last 4 | sub-section of this chapter: [hacker practice](./hacker_practice.md) and do the 5 | shell practice section. 6 | -------------------------------------------------------------------------------- /src/3_computer_organization/memory.md: -------------------------------------------------------------------------------- 1 | # Memory 2 | 3 | Returning from our discussion on [counting](./math_and_counting.md), let's talk about RAM, which we now refer to as computer memory. We will first talk about how the Von Neumann architecture affects how we run programs; then, we will talk about how bits dictate just how large programs can be. -------------------------------------------------------------------------------- /src/3_computer_organization/computer_org.md: -------------------------------------------------------------------------------- 1 | # Computer Organization 2 | 3 | In the last chapter, you learned how to use other people's programs (`ls`, `sh`, `find`), how to virtualize things, and how the operating system 4 | interacts with programs. In this chapter, we learn how to write our own programs, interface directly with the kernel, and understand how the computer manages high-level ideas for our low-level solutions. 5 | -------------------------------------------------------------------------------- /src/2_operating_systems/virtualization.md: -------------------------------------------------------------------------------- 1 | # Virtualization 2 | 3 | Virtualization is the cornerstone of modern operating systems. To virtualize something is 4 | to create a virtual version of it that does not require its own dedicated hardware. As an example, it's like 5 | running an Android phone on your laptop. The Android operating system is meant to run only 6 | on phone hardware (for the most part), yet we can run it on a laptop with 7 | an [emulator](https://developer.android.com/studio/run/emulator). 
8 | -------------------------------------------------------------------------------- /.github/workflows/gh-pages.yml: -------------------------------------------------------------------------------- 1 | name: github pages 2 | 3 | on: 4 | push: 5 | branches: 6 | - main 7 | 8 | jobs: 9 | deploy: 10 | runs-on: ubuntu-18.04 11 | steps: 12 | - uses: actions/checkout@v2 13 | 14 | - name: Setup mdBook 15 | uses: peaceiris/actions-mdbook@v1 16 | with: 17 | mdbook-version: '0.4.6' 18 | # mdbook-version: 'latest' 19 | 20 | - run: mdbook build 21 | 22 | - name: Deploy 23 | uses: peaceiris/actions-gh-pages@v3 24 | with: 25 | github_token: ${{ secrets.GITHUB_TOKEN }} 26 | publish_dir: ./book 27 | -------------------------------------------------------------------------------- /src/2_operating_systems/permissions.md: -------------------------------------------------------------------------------- 1 | # Permissions 2 | 3 | The permission to do something is a major component of hacking. Often, we are trying to escalate 4 | our privileges in whatever scenario we are in. This can mean becoming the admin of a computer you don't own, 5 | or simply being able to access restricted things in a video game. In the same spirit, files 6 | in the Linux operating system have permissions. 7 | 8 | To learn how they work, read [this](https://linuxcommand.org/lc3_lts0090.php). It should be easier 9 | and faster than the earlier sections, considering you may have already encountered permissions in 10 | your own system. -------------------------------------------------------------------------------- /src/1_introduction/inspiration.md: -------------------------------------------------------------------------------- 1 | # Inspiration 2 | 3 | While coming up with the idea and writing this, I took a lot of inspiration from blogs and posts as well 4 | as entire sites dedicated to teaching CTF knowledge. 
A few stick out in my mind: 5 | 6 | - [Nightmare Book](https://guyinatuxedo.github.io/index.html): theme, idea, use of mdbooks 7 | - [CTF Wiki](https://ctf-wiki.org/): layout, reversing & pwning content 8 | - [pwn.college](https://pwn.college/): content 9 | - [picoCTF](https://picoctf.org/): educating noobs in CTF 10 | 11 | There are many others that I did not list here, but I will try to link them in as we get through some 12 | of the content that cites them. 13 | -------------------------------------------------------------------------------- /src/2_operating_systems/virtual_machines.md: -------------------------------------------------------------------------------- 1 | # Virtual Machines 2 | 3 | ## What is a Virtual Machine? 4 | 5 | A virtual machine (VM) is a virtualized operating system. Read this 6 | [article](https://www.redhat.com/en/topics/virtualization/what-is-a-virtual-machine) before 7 | continuing. If you are very lazy, you can just watch this [video](https://www.youtube.com/watch?v=yIVXjl4SwVo). 8 | 9 | \* \* \* 10 | 11 | We use VMs, as said in the article, because a VM is isolated from the host system and 12 | it allows us to keep our nice comfortable OS on our computer. This means you will 13 | have both your host OS (Windows/MacOS), and this VM of Linux for hacking. 14 | 15 | ## Setting up a VM 16 | 17 | If you are an ASU student, use VMWare. The school has given access 18 | to it for free. Get it [here](https://ets.engineering.asu.edu/vmware/). 19 | If you are not, [Virtualbox](https://www.virtualbox.org/wiki/Downloads) works as well. 20 | 21 | \* \* \* 22 | 23 | Now that you have downloaded the software to use a VM, let's actually make an Ubuntu VM. 
24 | Follow the tutorial below for the software you are using: 25 | - [VMWare Tutorial](https://linuxhint.com/install_ubuntu_vmware_workstation/) 26 | - [Virtual Box Tutorial](https://itsfoss.com/install-linux-in-virtualbox/) 27 | 28 | After following the tutorials above, you should have an Ubuntu machine ready to go with a 29 | login. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # 'Ike 2 | 'Ike: A binary exploitation and reversing handbook 3 | 4 | ## About 5 | 6 | Welcome to `'Ike`! This handbook is intended to take those with 0 system hacking experience (that's right 0), and get them to an entry level point within the [pwn.college](https://pwn.college) ecosystem. By following these steps, you can establish a strong base upon which to build your future hacking knowledge. This process will essentially provide you with your white belt in hacking (at a system level) which will only be enhanced within the subsequent pwn.college program. While this handbook leans into the CTF atmosphere commonly seen on sites like [CTFTime](https://ctftime.org/ctf-wtf/), the content should prove insightful to those curious individuals with no CTF experience (if you are looking for web skills, look somewhere else :D). If you 7 | get some usage out of this handbook, I ask you to hmu with a star on the [github repo](https://github.com/mahaloz/ike), so I know people actually use 8 | this, lol. 9 | 10 | The full hosted book is at [ike.mahaloz.re](https://ike.mahaloz.re); check it out. 11 | 12 | ## Contributing 13 | 14 | Since I am bad with GitHub actions, currently I make changes to the markdown, then build it with `mdbook build` and push the html to the `gh-pages` 15 | branch. I do all of this with the script `push_book.sh`. If you have changes, simply **make a pull request to the master branch only**. 
16 | I will then push the html changes to the `gh-pages` branch. 17 | 18 | -------------------------------------------------------------------------------- /src/3_computer_organization/hacker_practice.md: -------------------------------------------------------------------------------- 1 | # Hacker Practice 2 | 3 | ## EmbryoASM 4 | 5 | For `'ike` we developed a module to teach and test x86 with the intention of serving it to people who have never programmed before. The hacker practice for this chapter will involve you solving all the challenges on `EmbryoASM`, which will cover every subsection of Assembly in this chapter and test your knowledge of logic, memory, and how an architecture works. 6 | 7 | For now, we have `EmbryoASM` deployed on `pwn.college` as the `Assembly Refresher` module. To play the levels, first register an account on [dojo.pwn.college](https://dojo.pwn.college/register). After that, skip all other modules and go right to [ASM Crash Course](https://dojo.pwn.college/fundamentals/assembly-crash-course). 8 | 9 | ## How to use pwn.college 10 | 11 | To play a level, first click the `start` button. Next, you have two options: 12 | 1. You can play in browser by now clicking the `Workspace` tab which will open a VS code instance in your browser with an embedded terminal 13 | 2. (**Recommended**) You can `ssh` onto the box after hitting play. It will start a Docker container ready for you to connect at `dojo.pwn.college` as user `hacker` 14 | 15 | To do option `2`, you must first upload an ssh key in the [settings](https://dojo.pwn.college/settings#key) tab of your profile. You will use this same key to ssh onto the pwn.college instance. 
After that you can connect like so: 16 | ```bash 17 | ssh -i /path/to/key hacker@dojo.pwn.college 18 | ``` 19 | -------------------------------------------------------------------------------- /src/4_programming_languages/c.md: -------------------------------------------------------------------------------- 1 | # C 2 | 3 | Our compiled language of choice will be `C` because it is the most fundamental 4 | and widely used compiled language that exists. All languages build on C's 5 | success, so we will start with it. 6 | 7 | ## Learning to program in C 8 | 9 | This is where the handbook does less justice. It's a lengthy process to teach 10 | someone how to program in another language, so I will defer this process to 11 | an external location. 12 | 13 | First, I recommend reading from the **beginning to page 20** in the book 14 | `Hacking: The Art of Exploitation, 2nd Edition`. This will give you a 15 | high-level overview of the structures we use in programming. Next, you 16 | will need to actually learn the general syntax of C. I am not asking you 17 | to become a master, but you must learn the fundamentals. 18 | 19 | Here are some tutorials I recommend, either will take at least a week or more to complete, 20 | but you got this. You are a hacker now. 21 | 22 | Tutorials, choose one: 23 | - [Learn-C](https://www.learn-c.org/) 24 | - [Programiz](https://www.programiz.com/c-programming) 25 | 26 | Before starting on the actual tutorial, it may be helpful to read the other 27 | three sub-sections associated with this section, i.e.: 28 | - [Writing Code](./write_c.md) 29 | - [Compiling & Running Code](./compilation.md) 30 | - [Debugging Code](./debugging_c.md) 31 | 32 | They are only there to give you suggestions on good ways to write, compile, and debug 33 | from my personal experience. These will help you write code faster while you learn, but 34 | may not be necessary since many tutorials have you program in the browser now. 
35 | 36 | **OK**, start the tutorial and move on to the Python section when you are done. -------------------------------------------------------------------------------- /src/4_programming_languages/write_py.md: -------------------------------------------------------------------------------- 1 | # Writing Code 2 | 3 | Writing code in Python is *very* similar to writing code in 4 | any other language, but has the subtle difference of being able 5 | to test ideas quickly in an interactive interpreter. 6 | 7 | First, go and follow everything done in the [Writing C](./write_c.md) 8 | section. This will get you set up with Vim and tmux. 9 | 10 | ## Testing ideas as you go 11 | 12 | Python is very fast to write and even faster to test. For this reason, 13 | I recommend almost always writing code in the terminal using vim, 14 | tmux, and a secret new tool called `IPython`. 15 | 16 | First install it: 17 | ```bash 18 | sudo apt-get install ipython3 19 | ``` 20 | 21 | What we just installed is an upgraded interactive python interpreter. 22 | Run it: 23 | 24 | ```bash 25 | ipython3 26 | ``` 27 | 28 | Now you are in a nice interpreter: 29 | 30 | ``` 31 | ▶ ipython3 32 | Python 3.6.9 (default, Oct 8 2020, 12:12:24) 33 | Type "copyright", "credits" or "license" for more information. 34 | 35 | IPython 5.5.0 -- An enhanced Interactive Python. 36 | ? -> Introduction and overview of IPython's features. 37 | %quickref -> Quick reference. 38 | help -> Python's own help system. 39 | object? -> Details about 'object', use 'object??' for extra details. 40 | 41 | In [1]: 42 | ``` 43 | 44 | Now you can test ideas quickly and instantly see the output. I recommend 45 | always having this open as a pane when you are writing code. It will 46 | allow you to test ideas very quickly and do other very interesting things. 
47 | 48 | Assume we have a python file called `my_program.py`, and it looks like this: 49 | ```python 50 | def reverse(lst): 51 | return lst[::-1] 52 | ``` 53 | 54 | Yes it's just a function. Now open up `ipython3` and do this: 55 | 56 | ```python 57 | In [1]: from my_program import reverse 58 | 59 | In [2]: my_lst = [1,2,3,4] 60 | 61 | In [3]: reverse(my_lst) 62 | Out[3]: [4, 3, 2, 1] 63 | ``` 64 | 65 | Yup, we can directly execute the function inside our file. It makes for good writing. -------------------------------------------------------------------------------- /src/3_computer_organization/introduction.md: -------------------------------------------------------------------------------- 1 | # Introduction 2 | 3 | To cover computer organization is to cover what it means to *be a computer*. The way we organize the hardware dictates how our computer works. The organization of hardware in a particular form is called a [computer architecture](https://en.wikipedia.org/wiki/Computer_architecture). 4 | 5 | ## The Von Neumann Architecture 6 | 7 | The von Neumann architecture is by far the most famous and well-known abstract computer architecture in computer science. Its principal belief is that our computer is a "pyramid" of memory units that get faster but smaller as we go up the stack. You put things you need faster in the higher level memories, and things you need slower and in larger quantities in the lower level memories: 8 | 9 | ![](./mem_model_1.jpeg) 10 | 11 | Abstractly, this is done because faster memory units are more expensive to make, and usually, we don't need faster memory for long. So what the hell is a faster memory unit? Let's revise the picture above: 12 | 13 | ![](./mem_model_2.jpeg) 14 | 15 | Two commonly known memory units are your hard drive and your RAM. As you likely know, the RAM is much smaller than your hard drive, but somehow much more expensive. In a parallel fashion, your RAM is also much faster than your hard drive. 
From now on, let's refer to your RAM as [computer memory](https://www.computerhope.com/jargon/m/memory.htm). You probably have 8 Gigabytes of memory. Why is it so small? What is your computer memory actually used for? Why does it exist alongside your hard drive? The Von Neumann architecture dictates that it exists to hold the things we need more frequently or faster. The special thing about your computer memory, though, is that it is volatile—meaning anything put in it is deleted when your computer restarts. 16 | 17 | Often the things we need in memory are programs, because we want those to run as fast as possible. Before we talk about putting programs in memory, we need to briefly talk about how we count in computers (yes, strange idea). Move on to the next reading. 18 | -------------------------------------------------------------------------------- /src/1_introduction/introduction.md: -------------------------------------------------------------------------------- 1 | # 'Ike: The Systems Hacking Handbook 2 | 3 | ## Introduction 4 | 5 | Welcome to `'Ike` (pronounced Eeh-Kay). This handbook is intended to take those with 0 system hacking experience (that's right 0), and get them to an entry-level point within the [pwn.college](https://pwn.college) ecosystem. By following these steps, you can establish a strong base upon which to build your future hacking knowledge. This process will essentially provide you with your white belt in hacking (at a system level), which will only be enhanced within the subsequent pwn.college program. While this handbook leans into the CTF atmosphere commonly seen on sites like [CTFTime](https://ctftime.org/ctf-wtf/), the content should prove insightful to those curious individuals with no CTF experience (if you are looking for web skills, look somewhere else :D). If you 6 | get some usage out of this handbook, I ask you to hmu with a star on the [github repo](https://github.com/mahaloz/ike), so I know people actually use this, lol. 
7 | 8 | Through this `Introduction` section, you will find meta-data about the handbook, why it was written, 9 | and what I hope to accomplish with it as well as the target audience. 10 | 11 | ## About 12 | As [president](https://zionbasque.com) of the [ASU Hacking Club](http://asuhacking.club), I have seen a common trend with regard to incoming club recruits. They tend to be driven Freshmen or Sophomores who have yet to take the necessary classwork or gain the necessary experience to easily transition into the pwn.college infrastructure. While I currently occupy a leadership position in ASU-HC and Shellphish, I was once in the same position as many of our struggling recruits. Ideally, this perspective should allow me to offer the necessary resources when getting started in one's “hacking” career. I would hope that the following material also helps those occupying an even earlier academic background. If you find any spelling errors or have suggestions, feel free to PR me on [github](https://github.com/mahaloz/ike), or just contact me directly on discord: `mahaloz#1337`. 13 | 14 | ## Contributors 15 | 16 | - [Zion Leonahenahe Basque](https://zionbasque.com): main author 17 | - Scott Weston: editor -------------------------------------------------------------------------------- /src/2_operating_systems/terminal.md: -------------------------------------------------------------------------------- 1 | # The Terminal 2 | 3 | You got a little taste of using the terminal in the last section, but now it's time we 4 | got literate. Expect this section to take a few hours since we will be out-sourcing 5 | work to a few different problems/tutorials across the web. 6 | 7 | For the majority of getting you good with the beginning parts of Linux, we will be using 8 | the amazing class [The Missing Semester](https://missing.csail.mit.edu/) by MIT. 9 | We will not be doing the entire class, only the parts that matter the most.
10 | 11 | ## The Shell 12 | 13 | You can think of the shell as the terminal you used earlier. The shell is your 14 | text interface into the computer. It allows you to do things quickly and powerfully. 15 | To become a good hacker, you must first learn how to navigate and use a shell. 16 | 17 | To do this, we will now jump to the earlier mentioned MIT course on hacking 18 | tools. Read **and do the exercises** of this lesson: [Topic 1: The Shell](https://missing.csail.mit.edu/2020/course-shell/#topic-1-the-shell). 19 | If you do not like reading, you can also find the lecture at the top of that page. 20 | This will take some time, so come back here when you are done. 21 | 22 | *take the time to do the exercises before moving on, it will help* 23 | 24 | ## Shell Scripting 25 | 26 | At this point, it should start becoming clearer that each shell has its own 27 | scripting language that makes automation very easy. Hacking often involves 28 | repetitive tasks and doing things on a massive level from a shell. Let's 29 | learn how to use *scripting* (using the language) in a powerful way. 30 | 31 | Like last time, read **and do the exercises** of this lesson: [Topic 2: Shell Tools](https://missing.csail.mit.edu/2020/shell-tools/). 32 | This will take some time, so come back here when you are done. 33 | ## Using VIM 34 | 35 | Often, after scripting and finding your way to files in your filesystem you 36 | need to edit them. Currently, you should have no idea how to edit files in your terminal, but we 37 | are going to change that now. Hackers need to work fast, and that means writing code directly 38 | from the same interface you use to run it. The terminal is this place. To edit things 39 | fast, we use a quaint program called [VIM](https://www.vim.org/) -- it looks old and feels old, 40 | but it is very powerful.
41 | 42 | Like earlier, Read **and do the exercises** of this lesson: [Topic 3: Editors](https://missing.csail.mit.edu/2020/editors/) 43 | This will take some time, so come back here when you are done. 44 | 45 | After this, you will be quite a powerful user. -------------------------------------------------------------------------------- /src/SUMMARY.md: -------------------------------------------------------------------------------- 1 | # Summary 2 | 3 | - [Introduction](./1_introduction/introduction.md) 4 | - [Background](./1_introduction/background.md) 5 | - [Inspiration](./1_introduction/inspiration.md) 6 | - [Overview & Schedule](./1_introduction/overview.md) 7 | 8 | - [Operating Systems](./2_operating_systems/operating_systems.md) 9 | - [Introduction](./2_operating_systems/introduction.md) 10 | - [Virtualization](./2_operating_systems/virtualization.md) 11 | - [Virtual Machines](./2_operating_systems/virtual_machines.md) 12 | - [Containers](./2_operating_systems/containers.md) 13 | - [Linux OS](./2_operating_systems/linux_os.md) 14 | - [The Terminal](./2_operating_systems/terminal.md) 15 | - [Permissions](./2_operating_systems/permissions.md) 16 | - [Hacker Practice](./2_operating_systems/hacker_practice.md) 17 | 18 | - [Computer Organization](./3_computer_organization/computer_org.md) 19 | - [Introduction](./3_computer_organization/introduction.md) 20 | - [Math & Counting](./3_computer_organization/math_and_counting.md) 21 | - [Bits & Logic](./3_computer_organization/bits_and_logic.md) 22 | - [Memory](./3_computer_organization/memory.md) 23 | - [Programs in Memory](./3_computer_organization/programs_in_mem.md) 24 | - [Addressing](./3_computer_organization/addressing.md) 25 | - [Memory Segments](./3_computer_organization/memory_segments.md) 26 | - [Assembly](./3_computer_organization/assembly.md) 27 | - [Registers](./3_computer_organization/registers.md) 28 | - [Instructions](./3_computer_organization/instructions.md) 29 | - [ASM 
Memory](./3_computer_organization/asm_memory.md) 30 | - [Control Structures](./3_computer_organization/control_structures.md) 31 | - [Hacker Practice](./3_computer_organization/hacker_practice.md) 32 | 33 | - [Programming Languages](./4_programming_languages/programming_languages.md) 34 | - [Introduction](./4_programming_languages/introduction.md) 35 | - [C](./4_programming_languages/c.md) 36 | - [Writing Code](./4_programming_languages/write_c.md) 37 | - [Compilation & Execution](./4_programming_languages/compilation.md) 38 | - [Debugging](./4_programming_languages/debugging_c.md) 39 | - [Python](./4_programming_languages/python.md) 40 | - [Writing Code](./4_programming_languages/write_py.md) 41 | - [Interpretation & Execution](./4_programming_languages/interp.md) 42 | - [Debugging](./4_programming_languages/debugging_py.md) 43 | 44 | 45 | - [Processes and Debugging](./5_processes/proc_and_debug.md) 46 | - [Introduction](./5_processes/introduction.md) 47 | - [Executable Types](./5_processes/executable_types.md) 48 | - [Using GDB](./5_processes/gdb.md) 49 | - [Interacting with Python](./5_processes/interaction.md) 50 | 51 | - [Security Concepts](./6_security/security.md) 52 | - [CIA](./6_security/cia.md) 53 | - [Encryption & Hashing](./6_security/enc_and_hash.md) 54 | - [Bruteforcing](./6_security/bruteforcing.md) 55 | - [DOSing](./6_security/dos.md) 56 | - [I/O Vectors](./6_security/io.md) 57 | 58 | - [What's Next](./whats_next.md) 59 | -------------------------------------------------------------------------------- /src/3_computer_organization/assembly.md: -------------------------------------------------------------------------------- 1 | # Assembly 2 | 3 | Following our talk on the section on bits, we can now address the computer 4 | looking section of our chapter, **Assembly**. 5 | 6 | An assembly language, better referred to as an [ISA](https://en.wikipedia.org/wiki/Instruction_set_architecture), is the lowest level of instructions that run on a computer. 
These instructions, or operations on data, are predefined by the hardware you use! At the lowest level, each instruction in an ISA is a combination of logic gates that you have already learned. Before continuing, watch this 15-minute video on what an architecture is by Yan [here](https://www.youtube.com/watch?v=9jc0eSnrzF4). 7 | 8 | \* \* \* 9 | 10 | As far as ISAs go, there are two types: 11 | - [RISC](https://en.wikipedia.org/wiki/Reduced_instruction_set_computer): Reduced Instruction Set Computer 12 | - [CISC](https://en.wikipedia.org/wiki/Complex_instruction_set_computer): Complex Instruction Set Computer 13 | 14 | RISC and CISC are competitors and mostly differ in the side-effects their instructions have on memory. If you are interested in this difference, then you can read what [Stanford has to say on the matter](https://cs.stanford.edu/people/eroberts/courses/soco/projects/risc/risccisc/), but it's not required. For this handbook we will only be using and referring to assembly for [Intel x86_64](https://en.wikipedia.org/wiki/X86-64). 15 | 16 | ## Intel x86_64 Assembly 17 | 18 | The assembly language we will be studying in this handbook is by no means the best assembly language to start with. Ideally, we would've started with a RISC architecture because they are easier to learn, but the world had different ideas. 19 | 20 | There is an extremely high probability that you are reading this text on an Intel x86_64 machine. Dell, Lenovo, Apple; they all run x86_64. A notable exception is the recent Mac M1 processor (which is ARM-based), but for the most part, the world's computers run on the Intel ISA. With that knowledge, I thought it was most practical to teach you the most common architecture. 21 | 22 | ## How to read the rest of this section 23 | 24 | It's time we dive into x86_64, which we will shorthand to _x86_ from now on. All the subsections in this section should be read in an _overview_ manner.
I wrote these modules to assist and act as a reference for the challenges associated with this section. 25 | 26 | The challenges you will complete as part of this section are the EmbryoASM challenges hosted on pwn.college. @redgate and I wrote these challenges to teach you assembly rather than just test your skill. You will find the challenges on the [dojo](https://pwn.college/computing-101/assembly-crash-course/). Start them after reading the sections below. 27 | 28 | 29 | ## Quick Reference Links 30 | - Instructions 31 | - [Encoding](https://defuse.ca/online-x86-assembler.htm) 32 | - [Reference/Description](https://www.felixcloutier.com/x86/) 33 | 34 | - Syscalls 35 | - [Calling Convention](https://chromium.googlesource.com/chromiumos/docs/+/master/constants/syscalls.md) 36 | 37 | - Challenges 38 | - [EmbryoASM](https://pwn.college/computing-101/assembly-crash-course/) 39 | -------------------------------------------------------------------------------- /src/1_introduction/overview.md: -------------------------------------------------------------------------------- 1 | # Overview & Schedule 2 | 3 | ## Overview 4 | We will cover as few concepts as possible to make you dangerous enough to start down a systems 5 | hacking journey. I will usually use external tutorials embedded in this one to get ideas, 6 | skills, or setups done quickly. This handbook acts as duct tape for all those tutorials. 7 | 8 | As seen in the table of contents on the left, we go through 5 major topics: 9 | - Operating Systems 10 | - ASU Equiv: CSE330,365. Overviews what makes something an operating system, how we can 11 | virtualize them, and how we use them in hacking. Includes a set up for containers 12 | and VMs. 13 | - Programming Languages 14 | - ASU Equiv: CSE340,240. Overviews what makes something a programming language, then covers 15 | an interpreted and a compiled language that are commonly used for hacking. This includes 16 | how to use them efficiently.
17 | - Computer Organization 18 | - ASU Equiv: CSE230. Overviews the internals of computing and how we organize them, 19 | including memory layout, assembly (x86), and the executable file types 20 | you will find in the wild. 21 | - Processes & Debugging 22 | - ASU Equiv: CSE330,240. Overviews how processes work in Linux and generally. Includes 23 | learning how to trace, debug, and understand running processes. Also has a small intro 24 | to using Python to mess with running processes. 25 | - Security Concepts 26 | - ASU Equiv: CSE365. Does a small dive into some common security concepts that will 27 | help new hackers think about problems in different ways. Highly inspired by Adam 28 | Doupe's CSE365. 29 | 30 | ## How to use this handbook 31 | To efficiently use this handbook you should be prepared to follow various links across the web that 32 | I've curated over time. Most links I use are for things that are more copy-paste-and-follow like. 33 | As an example, in the next section, I'll have you install a Virtual Machine. In that section I use 34 | an external link to have you install it so I don't duplicate work that is already done well. Other things, 35 | like using more complex stuff, will usually be covered in the handbook. 36 | 37 | Another thing to note is the use of the Dinkus within this text: 38 | ``` 39 | * * * 40 | ``` 41 | Whenever you see this symbol it means there is a link you will need to follow between the text above 42 | and below it. That link will take more than 15 minutes to complete. As an example, in the next 43 | section's introduction I place a link to "What is Linux." I estimate that you will take around 44 | 15 minutes or more of time to read that link before progressing onto the next section. Take your 45 | time with the reading; it's the absolute minimum you need to read to get that chapter's topic.
46 | 47 | Lastly, at the end of each section, excluding this one, you can find a practice challenge-set that 48 | will test your knowledge on that chapter. It is highly recommended that you complete all the 49 | challenges the chapter asks for before you move on to the next chapter. Most of our challenges 50 | currently can be found on [dojo.pwn.college](https://dojo.pwn.college/challenges). 51 | 52 | ## Work in Progress 53 | 54 | The following sections are still being worked on: 55 | 56 | WIP: 57 | - Programming Languages (50% complete) 58 | - Processes & Debugging (25% complete) 59 | - Security Concepts (0% complete) 60 | 61 | -------------------------------------------------------------------------------- /src/3_computer_organization/registers.md: -------------------------------------------------------------------------------- 1 | # Registers 2 | 3 | ## Introduction 4 | Welcome to the land of x86. The first thing you need to learn is where _things_ are stored when you run instructions. What's in an instruction? What's a thing? Let's start with some simple math examples. 5 | 6 | In math, you often have variables where you store things. Often, those things are numbers. 7 | 8 | ``` 9 | x = 10 10 | x = x + 4 11 | x = x / 2 12 | x = x - 1 13 | ``` 14 | 15 | We can assign values, reassign values, and do general computation on them. The nice thing about math is that a variable has no size. When you think about assigning a value to `x`, you never wonder: will the value fit in x? As an example: 16 | 17 | ``` 18 | x = 18446744073709551616 19 | ``` 20 | 21 | > Note: From now on, 'x ** y' means x to the y power and 'x ^ y' means x _xored_ with y. 22 | 23 | In the last section, we talked about bits and hex. This value is actually `0xffffffffffffffff + 1`. It would take `65` bits to represent this number, because it is exactly `2 ** 64`, one more than the largest number that fits in 64 bits (`(2 ** 64) - 1`). In computer science, when we say: "what is the size of x", we are usually talking about the number of bits that value takes up.
To make things easier to say in a short sentence, we instead say the size in the number of bytes. 24 | 25 | > Recall: 8 bits == 1 byte 26 | 27 | In x86, and most assembly languages, you have registers which act as variables for doing computation. In x86 (the 64 bit version), registers are 64 bits large (8 bytes). As you may be guessing, in x86 32 bit, the registers are 32 bits large. 28 | 29 | Each register in x86 has a name. Here are their names: 30 | ``` 31 | rax 32 | rbx 33 | rcx 34 | rdx 35 | rbp 36 | rsp 37 | rsi 38 | rdi 39 | rip 40 | r8 41 | r9 42 | r10 43 | r11 44 | r12 45 | r13 46 | r14 47 | r15 48 | ``` 49 | 50 | For now, we just say that any of these registers can hold a number that is up to 64 bits large. In reality, each of these registers is used for different actions in x86. Here is a good [register use reference list](https://wiki.cdot.senecacollege.ca/wiki/X86_64_Register_and_Instruction_Quick_Start) for later. 51 | 52 | ## Using Registers 53 | 54 | Each register can be accessed in different ways. You don't always have to use all 64 bits of a register. Take this for example: say we want to set `rax` to `0xffffffffffffffff`, but we already know `rax` has `0xffffffffffff0000`: 55 | 56 | ```c 57 | // we know rax = 0xffffffffffff0000 58 | 59 | ax = ax | 0xffff 60 | 61 | // now rax = 0xffffffffffffffff 62 | ``` 63 | 64 | In this example we used a logical OR instruction covered in [bits-and-logic](./bits_and_logic.md) to OR only the bottom 16 bits of `rax`. The way you access the bottom 16 bits of `rax` is with `ax`; likewise, the bottom 32 bits are `eax`. One catch on real x86_64 hardware: writing to a 32-bit register like `eax` zeroes the upper 32 bits of `rax`, which is why this example uses `ax` instead. It just so happens that every register has splits like this.
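
To make the splits concrete, here is a minimal Python sketch (not x86 code) that treats a register as a plain integer and pulls out each named piece with masks and shifts. The helper name `sub_registers` and the sample value are made up for illustration; only the register names come from x86.

```python
# Model a 64-bit register as a plain Python integer and carve out the
# smaller named views with masks and shifts.
MASK64 = 0xFFFFFFFFFFFFFFFF


def sub_registers(rax):
    """Return the named views of a 64-bit value (hypothetical helper)."""
    rax &= MASK64                     # keep only 64 bits, like a real register
    return {
        "rax": rax,                   # all 64 bits
        "eax": rax & 0xFFFFFFFF,      # low 32 bits
        "ax": rax & 0xFFFF,           # low 16 bits
        "ah": (rax >> 8) & 0xFF,      # bits 8 through 15
        "al": rax & 0xFF,             # low 8 bits
    }


views = sub_registers(0x1122334455667788)
for name, value in views.items():
    print(f"{name} = {value:#x}")
```

Each smaller name is just a window onto the low end of the same 64 bits, which is why changing `al` also changes `ax`, `eax`, and `rax`.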
65 | Here is an illustration of all the bits and how you can access them: 66 | 67 | ``` 68 | MSB 32 16 8 0 69 | +----------------------------------------+ 70 | | rax | 71 | +--------------------+-------------------+ 72 | | eax | 73 | +---------+---------+ 74 | | ax | 75 | +----+----+ 76 | | ah | al | 77 | +----+----+ 78 | ``` 79 | 80 | MSB here stands for Most Significant Bit, or the high part we referred to earlier. As an example, you can access the lowest 8 bits of `rax` (bits 0 through 7) by reading from `al`. All registers have named access like this. See [this reference](https://wiki.cdot.senecacollege.ca/wiki/X86_64_Register_and_Instruction_Quick_Start) for more use cases. In most cases, you just change the first two letters to access different parts as shown above. 81 | 82 | ## Special Registers 83 | 84 | Some registers are special and will make more sense later. Here they are: 85 | 86 | - rbp: the stack base pointer (bp) 87 | - rip: the instruction pointer (ip) 88 | - rsp: the stack pointer (sp) -------------------------------------------------------------------------------- /src/3_computer_organization/programs_in_mem.md: -------------------------------------------------------------------------------- 1 | # Programs in Memory 2 | 3 | ## Things Exist in Memory 4 | 5 | Recall our earlier discussion on the Von Neumann architecture. We want things that we need faster to be kept in the RAM since it is faster than the disk. 6 | 7 | The things we usually need faster than anything else are our running **programs**. In addition, we don't need the space taken up by the running program to exist for a *long time*. Once we close our program, we want the running code that runs that program to be destroyed. To clarify, we don't want the data that the program creates destroyed, but the actual code that runs the program for that instance of it running. That's a lot of circular logic. Let's explain it out for our favorite game Minecraft.
8 | 9 | ### Minecraft as something living in Memory 10 | 11 | Let's say you wanted to start up Minecraft, open your favorite world you have been working on for three months, and place a single block from your inventory, then save and quit. Here's what happens: 12 | 13 | 1. You double click the Minecraft icon 14 | 2. Immediately, the OS places a copy of Minecraft's Code into RAM 15 | 3. The OS (CPU) starts **executing** at the location in memory Minecraft specified to start with 16 | 17 | Now Minecraft is running and you can interact with it. When you click buttons in the Minecraft game, it simply transitions to different locations in memory where those buttons' code exists. Now you open your world: 18 | 19 | 1. You click "open world" and you select your awesome world. 20 | 2. Immediately, Minecraft finds the world on your *hard drive* where it has been saved for months 21 | 3. Minecraft copies all the files that make it a world and places them in RAM 22 | 23 | Now you are playing Minecraft in your favorite world! When you place a block down, that block is placed in the "copied" world that is currently in RAM. That's right, all of the changes you make before saving are in RAM; that's why games always tell you "don't turn off while saving..." because your RAM contents are essentially deleted when your system turns off. 24 | 25 | So you place the block and save & quit. Then close Minecraft. 26 | 27 | 1. Your computer copies the content of the world in RAM and re-saves it over the world on your hard drive 28 | 2. Minecraft deletes the world copied into RAM 29 | 3. Minecraft sends a signal to the OS that it is done running 30 | 4. The OS gets that signal and destroys the copy of Minecraft in RAM. 31 | 32 | Here is a diagram to summarize everything that happened: 33 | 34 | ![](./mc_in_ram.jpeg) 35 | 36 | 37 | Notice how this whole time we were playing on code that was copied into RAM.
The entire time, the real Minecraft was sitting comfortably on your hard drive while a copy of it was doing all the work. As with everything you make changes to on your computer, copies are first placed into memory and then saved back to the hard drive after editing. 38 | 39 | As you can see, we care about the response time of Minecraft so we copy it into memory. As with all things we need fast, they go in memory. 40 | 41 | ## Observing things taking Memory 42 | 43 | Open up the fancy terminal again and install `htop`. 44 | 45 | ```bash 46 | sudo apt-get install htop 47 | ``` 48 | 49 | `htop` is an upgraded version of `top`. It allows you to see what is running on your computer and how much of each "thing" it is using. One of those things is memory. Run `htop` by typing the command `htop` in the terminal. You should see a big thing pop open, with what look like sound bars at the top of the screen. One of them is labeled `Mem` for Memory. It is of course your RAM usage. Right now, with nothing but a terminal open, you should be seeing no more than 1 Gig of memory being used. 50 | 51 | For me it looks like this: 52 | 53 | ![](./memory_in_use.png) 54 | 55 | Now go to your Ubuntu desktop and open `Firefox`. You should notice maybe half a gig of memory now being taken up. If you open more tabs and websites, then more RAM will be taken. This is memory in use! 56 | 57 | ## Summary 58 | 59 | Things exist in memory. When you run a program, it is put in memory. When you edit a file it is put in memory. When you run your Virtual Machine, it is put in memory. 60 | -------------------------------------------------------------------------------- /src/1_introduction/background.md: -------------------------------------------------------------------------------- 1 | # Background 2 | 3 | ## The Education Gap 4 | Over the course of my ever-evolving CTF career, I've seen various methods for getting started in hacking. There are the classic "getting started" guides (ex.
this blog), and then there are the more novel interactive frameworks (ex. pwn.college). Many of my peers tend to lean more towards interactive experiences via CTFs, but there have also been sizable portions of the team that have not. In either case, I think there is a small but impactful knowledge gap when following an official "curriculum" instead of creating one yourself. 5 | 6 | To help people cross the aforementioned gap, pwn.college was created. By investing time in pwn.college, one would ideally obtain their "yellow belt" in hacking, with the end result being a set of skills that will help you shape your own future learning curriculum. At this time, the Arizona State University class that uses the pwn.college framework (taught by [Yan](https://www.yancomm.net/)) for its curriculum is offered to those with an equivalent CS experience level of a "junior." In other words, the class expects you to know how to use Linux, understand C, binary properties ... and much more. If you are already lost, don't despair. I was in the same place as you when I started hacking. By reading this handbook and investing your time in growing your skillset, you should gain the necessary skills to take this "daunting" junior class. 7 | ## The Target Audience 8 | 9 | This handbook is targeted at anyone on or below the "junior in college" level of computer science, though even those individuals may find something useful from this handbook. This handbook is intended to give you all the material and direction you need to start pwn.college. After that, it is assumed you will continue your education with the pwn.college teaching platform. Learning from this handbook will have varying difficulty based on the 10 | reader. The people who will have the most efficient/easiest time learning from this handbook look like this: 11 | 1. Comp. Sci. College Juniors 12 | 2. Comp. Sci. College Freshmen/Sophomores 13 | 3.
High School Students 14 | 15 | This of course assumes that you are in a college program that touches on systems (which it should). Things like C, debugging, memory management, operating systems. These things will make this experience much easier, 16 | but this handbook was made to help people who have not yet had the chance to take those types of classes (or don't have access to them). Therefore, the individuals who will learn/benefit the most from this are: 17 | 1. High School Students 18 | 2. Comp. Sci. College Freshmen/Sophomores 19 | 3. Comp. Sci. College Juniors 20 | 21 | In the end, many people can benefit from this, so feel free to go to exact sections if you think you have nothing to gain from the others; however, I do encourage everyone to at least skim each section since you may 22 | find some surprising tutorial/article I reference. 23 | 24 | ## The Handbook's Novelty 25 | 26 | As I briefly mentioned [above](#the-education-gap), there exist many places to start practicing pwning and reversing, but not many that will explain the introductory concepts. As an example, take a look at the [Nightmare Book's](https://guyinatuxedo.github.io/index.html) style of teaching. It is laid out into chapters of exploitation techniques, which I like a lot. Each section is a writeup of how to solve a challenge that has the exploitation technique embedded in it. I think this method is extremely effective for learning for those with pre-established skills in Linux, C, and systems. I think this method fails for noobs who are just learning how C really works (on the memory level).
In my brain, *prepare for what animations look like in my brain*, the learning barriers look like this: 27 | 28 | **Nightmare Method**: 29 | 30 | ![](./nightmare-method.png) 31 | 32 | **'Ike Method**: 33 | 34 | 35 | ![](./ike-method.png) 36 | 37 | The subtle difference here is that I believe the underlying security concept should be taught and learned through explicit material first, then it should be reinforced with scripting and systems skills. This is not to say I don't like the Nightmare method, I love it, but others need a different one. This book is that different method. -------------------------------------------------------------------------------- /src/2_operating_systems/hacker_practice.md: -------------------------------------------------------------------------------- 1 | # Hacker Practice 2 | 3 | To practice, we are going to be using a hacking practice site called a Wargame site. These sites 4 | are similar to a CTF in that they offer challenging puzzles that require technical knowledge. 5 | They are different in the fact that they have no time limit to the competition. People from across 6 | the world can use the challenges on the site at any time. 7 | 8 | The site we are using is called Over The Wire. 9 | 10 | The wargame Over The Wire has a lot of different modules that you can practice on, but for the sake 11 | of speed in getting up to hacking white belt in a reasonable time, we will only be doing the 12 | [bandit](https://overthewire.org/wargames/bandit/) challenges. Before you start, let's talk about how you will play them. 13 | If you already know how to use SSH, skip to the [practice](#practice). 14 | 15 | ## SSH 16 | 17 | [SSH](https://en.wikipedia.org/wiki/SSH_(Secure_Shell)) stands for Secure Shell. It's a protocol 18 | (messaging language) used to get a remote shell on another machine. A remote shell is what it sounds 19 | like: an interactive shell you can use on a machine you don't have physical access to.
After using it a 20 | few times it becomes clearer what this means. 21 | 22 | Recall the Ubuntu container we used in the [containers](./containers.md) section. We ran our 23 | docker command and it gave us a shell into a different Ubuntu version. 24 | 25 | ```bash 26 | docker run -it ubuntu:16.04 27 | ``` 28 | 29 | It resulted in a shell. To get out of that shell, and subsequently the container, we ran: 30 | 31 | ```bash 32 | exit 33 | ``` 34 | 35 | That returned us to our machine. SSHing is very similar. Let's SSH into 36 | the first level of bandit, which is [here](https://overthewire.org/wargames/bandit/bandit0.html). 37 | 38 | ```bash 39 | ssh bandit0@bandit.labs.overthewire.org -p 2220 40 | ``` 41 | 42 | Input the password, `bandit0`, and now we are in a shell: 43 | 44 | ```bash 45 | bandit0@bandit:~$ 46 | ``` 47 | 48 | As you may have guessed, this is likely a different version of Ubuntu than what you are running. Use our 49 | earlier command from the containers section to check what that is. To confirm that this is 50 | a machine we don't own, let's check what the IP address is. An IP address is an address that is 51 | associated with a device on the internet. Ideally, this is a unique address that no other device 52 | should share, but nowadays this is not always true. For now, assume it is unique. 53 | 54 | Run: 55 | 56 | ```bash 57 | curl https://ipinfo.io/ip 58 | ``` 59 | 60 | Take a note of the address, then `exit` the machine just like before: 61 | 62 | ```bash 63 | exit 64 | ``` 65 | 66 | Now we are back on our host machine. Run the same `curl` command again to get the IP address of 67 | your machine: 68 | 69 | ```bash 70 | curl https://ipinfo.io/ip 71 | ``` 72 | 73 | Notice the numbers are fairly different? This confirms that the machine is at least not local 74 | (on our current network). If you want to take it further, you can even look up the location 75 | associated with the IP address of `Over The Wire`.
76 | 77 | ### SSH Semantics 78 | 79 | Lastly, let's talk about the semantics of the actual `SSH` command we ran: 80 | 81 | ```bash 82 | ssh bandit0@bandit.labs.overthewire.org -p 2220 83 | ``` 84 | 85 | Like logging into any machine, it requires a username. The first part of the ssh command is 86 | the username, which is `bandit0` in this case. Next is the `@` symbol to signify where the end 87 | of the username is and where the remote address begins. The address here is 88 | `bandit.labs.overthewire.org`. You may be confused here because it does not look like a normal 89 | IP address, which is just numbers. This is due to [DNS](https://www.cloudflare.com/learning/dns/what-is-dns/). 90 | DNS is outside the scope of this section, but just know it allows you to have fancy names point 91 | to normal looking IP addresses. 92 | 93 | So far we have learned SSH looks like: 94 | 95 | ```bash 96 | ssh <username>@<address> 97 | ``` 98 | 99 | The last thing we have to talk about is the `-p 2220` in the command. This is an option that specifies 100 | a [port](https://en.wikipedia.org/wiki/Port_(computer_networking)) to connect over. You can learn more 101 | about all the options of `ssh` by running: 102 | 103 | ```bash 104 | man ssh 105 | ``` 106 | 107 | ## Practice 108 | 109 | Now that you have this last tool, SSH, in your arsenal, you are ready to start some hacking practice. 110 | To show you are truly ready to progress to the next section, you must prove you are competent 111 | with the shell. 112 | 113 | On [OverTheWire: Bandit](https://overthewire.org/wargames/bandit/) do levels [0](https://overthewire.org/wargames/bandit/bandit0.html) 114 | through [15](https://overthewire.org/wargames/bandit/bandit15.html). These levels should take you a day or two 115 | to complete depending on how fast you get the later levels done. Good luck, and when you complete this 116 | head over to the next section!
117 | 118 | 119 | 120 | 121 | -------------------------------------------------------------------------------- /src/3_computer_organization/asm_memory.md: -------------------------------------------------------------------------------- 1 | # Memory 2 | 3 | ## Introduction 4 | You've already learned about [memory](./memory.md), how you can access it with addresses, and how programs often live in memory with other memory segments like the Heap and Stack. Surprise, surprise, the programs we can write with instructions live in memory as well. With our new knowledge of instructions, you can use this memory to store things that may be very large or of an unknown length. 5 | 6 | ## Memory and Lists 7 | Using instructions like `mov` you can access the data at some memory location. Say another part of the program provided you with the memory address of a writeable place in memory. You could write to it like so: 8 | ```c 9 | // rax = memory addr; "dword ptr" tells the assembler to write 4 bytes 10 | 11 | mov dword ptr [rax], 0x1337 12 | ``` 13 | 14 | Now, let's say we wanted to make a **list** of numbers. Say this list is 4 numbers long and looks like: 15 | ```python 16 | my_list = [2, 4, 8, 16] 17 | ``` 18 | 19 | Assuming the memory at the `my_list` label has enough space, we could set up the list like so: 20 | ```c 21 | // my_list is a label to some free data we can write to 22 | mov rax, my_list 23 | mov dword ptr [rax], 0x2 24 | mov dword ptr [rax+4], 0x4 25 | mov dword ptr [rax+8], 0x8 26 | mov dword ptr [rax+0xc], 0x10 27 | ``` 28 | 29 | In this example, we assumed that each number is at most 4 bytes large. This is important and changes the way we get the data back out of memory. Say we now wanted to use the data we stored in memory.
We would now need to use the 4 byte versions of our registers to ensure we get the right number (since it's only 4 bytes, not 8): 30 | ```c 31 | // my_list is a label to data with numbers of size 4 bytes 32 | mov rax, my_list 33 | mov edi, [rax] 34 | mov esi, [rax+4] 35 | mov edx, [rax+8] 36 | mov ecx, [rax+0xc] 37 | ``` 38 | 39 | ## Stack 40 | You may remember from the [memory-segments](./memory_segments.md) section that we have two special writeable locations in memory: the Stack and the Heap. For the purpose of simplicity, we don't go over how to access and use the Heap in this module since it requires using more complicated instructions. For now, we can just use writeable program memory as we would the Heap since we can consider the case where all we get is an address to a writeable location. 41 | 42 | The Stack is very similar to normal writeable locations. It has addresses and it can be directly dereferenced like a normal address. The Stack is special though because it works like a literal stack (think stacking pancakes), and it has a dedicated register (`rsp`) to tell you where the top of the stack currently is. 43 | 44 | ### Working with the Pancake Stack 45 | Say your mom places 3 pancakes on your plate: pancake 1, 2, and 3. 46 | ``` 47 | Pancake Stack: 48 | 49 | ####################### 50 | | pancake 1 | 51 | ####################### 52 | ~~~~~~~~~~~~~~~~~~~~~~~ 53 | ####################### 54 | | pancake 2 | 55 | ####################### 56 | ~~~~~~~~~~~~~~~~~~~~~~~ 57 | ####################### 58 | | pancake 3 | 59 | ####################### 60 | |=========================| 61 | ``` 62 | 63 | You can't just access pancake 3, that would destroy the stack (and make your mom mad). You need to access pancake 1 first, then 2, then 3. When you access the pancake on the top, we call it a `pop`. Yes, you literally `pop` the pancake into your mouth.
We represent that with the instruction: 64 | ```c 65 | pop mouth 66 | ``` 67 | 68 | Which results in the new pancake stack: 69 | ``` 70 | Pancake Stack: 71 | 72 | ####################### 73 | | pancake 2 | 74 | ####################### 75 | ~~~~~~~~~~~~~~~~~~~~~~~ 76 | ####################### 77 | | pancake 3 | 78 | ####################### 79 | |=========================| 80 | ``` 81 | 82 | Now, the top of the stack is `pancake 2`. We would say the `pancake stack pointer` is now pointing at the location of the second pancake. It was originally pointing at the location of pancake 1, but we popped the stack. 83 | 84 | So you `pop mouth` another pancake: 85 | ``` 86 | Pancake Stack: 87 | 88 | ####################### 89 | | pancake 3 | 90 | ####################### 91 | |=========================| 92 | ``` 93 | 94 | Before you can do another pop, your mom pushes a fresh new pancake on your plate with the instruction: 95 | ``` 96 | push pancake_4 97 | ``` 98 | 99 | Now the stack looks like: 100 | ``` 101 | Pancake Stack: 102 | 103 | ####################### 104 | | pancake 4 | 105 | ####################### 106 | ~~~~~~~~~~~~~~~~~~~~~~~ 107 | ####################### 108 | | pancake 3 | 109 | ####################### 110 | |=========================| 111 | ``` 112 | 113 | Now the top of the stack points to pancake 4. 114 | 115 | ### Working with the Real Stack 116 | Now you understand how the stack works. You can save stuff there temporarily with `push` and retrieve it with `pop`. The special register `rsp` points to the top of the stack. When you do a `push` it results in `rsp -= 8`. When you `pop` it results in `rsp += 8`. 117 | 118 | > Recall: the stack grows _down_ by making the stack address smaller as you need more space. If you need to expand by 8 bytes, you would subtract 8 from `rsp`.
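
To connect the pancake analogy to `rsp`, here is a minimal Python sketch of a stack that grows toward lower addresses. The `ToyStack` class, its starting address, and the dictionary standing in for memory are all assumptions made for illustration; a real CPU does this in hardware with the `push` and `pop` instructions.

```python
class ToyStack:
    """Toy model of a 64-bit stack that grows toward lower addresses."""

    SLOT = 8  # bytes moved by each push/pop in 64-bit mode

    def __init__(self, top=0x7FFFFFFFE000):  # hypothetical starting address
        self.rsp = top
        self.memory = {}  # address -> stored 8-byte value

    def push(self, value):
        self.rsp -= self.SLOT          # make room first: rsp -= 8
        self.memory[self.rsp] = value  # then store at the new top

    def pop(self):
        value = self.memory.pop(self.rsp)  # read the current top
        self.rsp += self.SLOT              # then release the slot: rsp += 8
        return value


stack = ToyStack()
start = stack.rsp
for pancake in (3, 2, 1):  # pancake 1 is pushed last, so it ends up on top
    stack.push(pancake)

print(hex(start - stack.rsp))  # 0x18: three pushes used 24 bytes
print(stack.pop())             # 1: the last pancake pushed is the first one out
```

Note that `rsp` only ever moves in steps of 8 here, which mirrors the `rsp -= 8` and `rsp += 8` behaviour described above.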
119 | -------------------------------------------------------------------------------- /src/2_operating_systems/introduction.md: -------------------------------------------------------------------------------- 1 | # Introduction 2 | 3 | ## Background 4 | To start your journey into binary reversing and exploitation, you first need to understand the platform 5 | on which you reverse and exploit. As you are likely aware, you are currently on an operating system (OS): 6 | likely Windows or MacOS. Generally, people know the difference between these OSes, but 7 | to be a good hacker you must understand these differences on a technical level. In future modules, 8 | we will talk about MacOS and Windows, but for now Linux will suffice. 9 | 10 | What is Linux, you may ask? Linux is the operating system of hackers! It is also the OS 11 | that most embedded devices and servers use. Linux is [open-source](https://opensource.com/resources/what-open-source), 12 | which means it is also an OS that is easy to develop on and learn (relative 13 | to the other ones). Right now it might not be clear what you would be developing that is related 14 | to the OS or how it is useful, but it will hopefully be clearer at the end 15 | of this section. 16 | 17 | ## Linux 18 | 19 | To get a better idea of what Linux is, read the first five sections of this 20 | [article](https://www.linux.com/what-is-linux/). 21 | 22 | \* \* \* 23 | 24 | Now that you know what a distribution of Linux is, you should know that we will be using the mainstream, 25 | Debian-based distribution [Ubuntu](https://ubuntu.com/tutorials). If you have found your way here 26 | through external hacking tutorials, you may be tempted to use [Kali Linux](https://www.kali.org/downloads/), a 27 | similarly Debian-based distribution. I'll make the argument that it will be easier to use Ubuntu than Kali, 28 | and for the basis of these tutorials, the up-to-date kernel of Ubuntu is better.
29 | 30 | TL;DR: We will use Ubuntu, almost none of the tools on Kali are needed for this. 31 | 32 | In the next sub-section, we get Ubuntu installed. 33 | 34 | ## The Kernel 35 | 36 | Before going straight into using an operating system, it's worth mentioning that there is a component of 37 | your computer that makes everything run: it's called the **kernel**. In a typical introduction to 38 | operating systems you would likely be introduced to this idea in-depth, but I will only briefly 39 | talk about it. If you are very interested, you should read [this](https://www.amazon.com/Understanding-Linux-Kernel-Third-Daniel/dp/0596005652) book after completing this handbook, 40 | as the content assumes you are already a systems hacker. 41 | 42 | The kernel is the lowest-level software, and it is always running when the computer is on. It's a fundamental 43 | part of the operating system that runs on your computer. If you imagined the computer as a stack of 44 | software and hardware, the kernel would be the line that divides the two: 45 | 46 | ![picture of computer stack](./computer_stack.jpeg) 47 | 48 | The kernel is responsible for: 49 | - making sure programs have somewhere to run (talked about later) 50 | - making hardware peripherals, like your physical mouse, actually do things 51 | - making sure your computer has enough memory to function 52 | 53 | And various other tasks that set your computer apart from your toaster oven. It is often true that each 54 | operating system will have its own kernel that was designed to work with it. 55 | 56 | Since this "program" called the kernel is always running and controls everything important, it's safe to 57 | assume that this program is at a higher level of existence than a normal program. Not only is it at a 58 | higher level of **privilege** than a normal program, but it is also at a higher level of privilege than 59 | an admin. The kernel is the god of this computing world, none come before it.
60 | 61 | Luckily, this god known as the kernel is benevolent and allows measly normal programs to use parts of 62 | it when needed. Let's take my favorite game [Minecraft](https://minecraft.net) as an example. 63 | 64 | ### Minecraft as an example 65 | 66 | When you use a normal program, like Minecraft, you are running the program in **user space**. The 67 | user space is the virtual space of your computer where normal programs run. As expected, there 68 | exists a space for kernel things called the **kernel space**. 69 | 70 | So you launch Minecraft, and load into your favorite world. You are in the user space. But now, 71 | you move your mouse so that your view in Minecraft rotates -- you have now briefly entered 72 | kernel space. Why? Because hardware caused something called an `interrupt`, which in the 73 | god analogy is equivalent to praying to god for a brief blessing of power. 74 | 75 | When you move your mouse this is what happens: 76 | 1. The mouse hardware detects a movement 77 | 2. The hardware causes an interrupt, which stops whatever else is running for a brief time 78 | 3. The computer enters kernel mode by setting a value that signifies that it is "privileged" 79 | 4. The kernel interprets the exact movement then sends the info back to user space 80 | 5. The mouse move is handled in Minecraft. 81 | 82 | You can visualize it like this (prepare for 3rd grade drawings): 83 | 84 | ![](./minecraft.jpeg) 85 | 86 | ### TL;DR 87 | 88 | The gist here is that there is a **super privileged** piece of software running on every computer called 89 | the kernel. The kernel controls everything at the lowest level, and compromising it means you 90 | compromised everything above it, meaning any software running on the computer. In our next section 91 | on virtualization, it's helpful to know this thing called the kernel exists.
92 | -------------------------------------------------------------------------------- /src/3_computer_organization/addressing.md: -------------------------------------------------------------------------------- 1 | # Addressing 2 | 3 | In the last section about [programs in memory](./programs_in_mem.md) we talked about loading Minecraft into memory (RAM). It was briefly mentioned that: 4 | > When you click buttons in the Minecraft game, it simply transitions to different locations in memory where those button's code exists. 5 | 6 | The way it transitions to different locations in memory is through addressing. Like in real life, addressing helps us find the places where things live, which in this case is in memory. In real life, someone would give you a unique string, their address, where you could find them. Something like `411 North Central Ave, Phoenix, AZ`. In memory, we follow these same rules, but in a more simplified manner. 7 | 8 | ## Linear Addresses 9 | 10 | The simplified manner is that memory has a very specific set of rules: 11 | 1. Each address is unique 12 | 2. Each address is a number 13 | 14 | The numbers start at zero and go up to the largest value that fits in the `bit` count of the computer. This fact should now help you realize what it means to have a `64bit` computer vs a `32bit` computer. In the old days, everything was `32bit`, but now nearly everything is `64bit`. If you find some old applications, you may notice the option to download either, since 32bit programs can usually still run on 64bit systems. 15 | 16 | Anyway, we will assume you are on a 64bit computer. That means the starting address, or the smallest address, is `0x0000000000000000` and the largest address is `0xFFFFFFFFFFFFFFFF`. You will notice each address is 8 bytes (64 bits) wide.
Here is how you can visualize the memory in your computer: 17 | 18 | 19 | ``` 20 | *----*----*----*----* 0x0000000000000000 21 | | | 22 | |-------------------| 23 | | | 24 | |-------------------| 25 | | | 26 | |-------------------| 27 | | | 28 | | . | 29 | | . | 30 | | . | 31 | | | 32 | |-------------------| 33 | | | 34 | *----*----*----*----* 0xFFFFFFFFFFFFFFFF 35 | ``` 36 | 37 | At each address is exactly 1 byte of data. 38 | However, to make things more concise, we often like to talk about addresses in terms of **8 bytes** (4 on 32bit machines). 39 | As such, we usually say each address points to an 8-byte slot of data (the 8 bytes starting at that exact address). 40 | This means one referenced address could store any one of `2 ^ 64` different values. 41 | 42 | ## Reading & Writing Data 43 | 44 | To simplify things, let's define three functions: 45 | ```python 46 | read(address) -> outputs 8 bytes of data 47 | write(address, content) -> writes 8 bytes of data to address 48 | execute(data) -> executes the 8 bytes it's given 49 | ``` 50 | 51 | You can consider that when Minecraft launches, the OS knows where the beginning and end of Minecraft are. It records them with **labels**: 52 | ``` 53 | minecraft_start = 0x0000000040000000 54 | minecraft_end = 0x0000000050000000 55 | ``` 56 | 57 | Minecraft will of course take up a large portion of space: 58 | 59 | ``` 60 | *----*----*----*----* 0x0000000000000000 61 | | | 62 | | | 63 | | | 64 | |-------------------| 0x0000000040000000 65 | | | 66 | | | 67 | | MINECRAFT | 68 | | | 69 | | | 70 | |-------------------| 0x0000000050000000 71 | | | 72 | *----*----*----*----* 0xFFFFFFFFFFFFFFFF 73 | ``` 74 | 75 | Notice that Minecraft neither starts at the beginning nor at the end of memory. Any program can be loaded at any random place in memory. In addition, other magical things happen to actually allow you to break up a program into multiple places, called [Virtual Addressing](https://whatis.techtarget.com/definition/virtual-address).
For now, consider things to be linear and continuous. 76 | 77 | The OS does not know the exact location of the Minecraft Quit Button code. Instead, it only knows an **offset** from `minecraft_start`. Something like: 78 | ``` 79 | mc_quit_btn_offset = 0x80 80 | ``` 81 | 82 | Note: assume small values like `0x80` are padded with the correct number of leading 0's. They are all still 64 bits. 83 | 84 | Now to run the Minecraft Quit Button, the OS simply reads the code at that offset, then executes it. 85 | ```python 86 | code = read(minecraft_start + mc_quit_btn_offset) 87 | execute(code) 88 | ``` 89 | 90 | ``` 91 | *----*----*----*----* 0x0000000000000000 92 | | | 93 | | | 94 | | | 95 | |-------------------| 0x0000000040000000 96 | | | <--------- mc_quit_btn_offset 97 | | | 98 | | MINECRAFT | 99 | | | 100 | | | 101 | |-------------------| 0x0000000050000000 102 | | | 103 | *----*----*----*----* 0xFFFFFFFFFFFFFFFF 104 | ``` 105 | 106 | Other things happen, but for now understand that each thing is accessed as an offset. The same goes for writing over the contents of the map: 107 | ```python 108 | write(minecraft_start + mc_map_offset, new_map_content) 109 | ``` 110 | 111 | ## Summary 112 | 113 | Things are laid out in memory in a linear format. Each location in memory can be referenced by an address (usually written in hex). At each address you can store data and read data. All in all, we have a large place that we can read from and write to using numbers as our addresses. We usually refer to locations as offsets from known labels. 114 | 115 | 116 | -------------------------------------------------------------------------------- /src/3_computer_organization/math_and_counting.md: -------------------------------------------------------------------------------- 1 | # Math & Counting 2 | 3 | ## Bits 4 | 5 | You've seen it before in movies, a series of ones and zeros. Now you have a name to go with it: Bits!
Here is an example of some Bits: 6 | ```python 7 | 0100010101001001011101001001001000000000011101101111010111010110 8 | ``` 9 | 10 | According to [Wikipedia](https://en.wikipedia.org/wiki/Bit): 11 | > "The bit is a basic unit of information in computing and digital communications ... The bit represents a logical state with one of two possible values. These values are most commonly represented as either "1" or "0", but other representations such as true/false, yes/no, +/−, or on/off are common." 12 | 13 | Let's take emphasis to the "represents a logical state with one of two possible values." So a bit `b`, can be either a `1` or a `0`. That means a single bit has two combinations (1 or 0). Let's make this a little more abstract: a single bit can represent two distinct things. 14 | 15 | As an example, let's say we own a light tower at a dock. If the light is on, aka `1`, then a boat can dock now. If the light is off, aka `0`, then the boat can't dock now. Simple, on or off. `1 = dock`; `0 = no dock`. But what if we need to tell the people in the boats more than just two things? Well, we can get more lights (aka more bits). If we have two bits, we now have `2 * 2 = 4` possible combinations, so we can represent 4 things in total now. 16 | 17 | Assume the boaters know which light is on the left and which is on the right. Now we can signal four different states of docking: 18 | ``` 19 | 00 = can't dock now 20 | 01 = can dock in 1 hour 21 | 10 = can dock in 2 hours 22 | 11 = can dock now 23 | ``` 24 | So if the left light is on, but the right is not, then you can dock in 2 hours. In this way, we just [encoded](https://techterms.com/definition/encoding) 4 different states of being. Pretty cool right? How in only two series of 1's and 0's we got that much information out. So how does it scale? 25 | 26 | ### Scaling Bits 27 | 28 | If 1 bit can encode `2 data states`. 2 bits can encode `4 data states`. 3 bits can encode `2 * 2 * 2 = 8 data states`. 
The pattern here is called the powers of two. Getting the number of states your bits can represent is simple: raise 2 to the power of the number of bits you have. Here is a fancy function for it: 29 | 30 | ``` 31 | states(b) = 2 ^ b 32 | ``` 33 | 34 | Where `b` is the number of bits you have. So if you have `8 bits`, then you have `2 ^ 8 = 256` different states you can represent... Yeah, that scales very fast. If you just had 8 flashlights, you could represent 256 different things to your friend across the street. Pretty cool. You may have noticed already, but it's an exponential ramp-up in the number of states you can represent, which is good for us computer scientists. 35 | 36 | ## Bits & Bytes 37 | 38 | Often, we need to use more than just a single bit. We call a set of 8 bits a **byte**. Using our earlier maths, a single byte can represent 256 different states. When we use bits to store human data, we usually need much more than 1 byte. This is where our table of byte units comes in: 39 | 40 | ``` 41 | 1 kilobyte (KB) = 1024 bytes = 8192 bits 42 | 1 megabyte (MB) = 1024 KB 43 | 1 gigabyte (GB) = 1024 MB 44 | 1 terabyte (TB) = 1024 GB 45 | ``` 46 | 47 | You probably have at least 256 GB of disk storage, right? That means your disk has `2199023255552` bits (256 * 1024 * 1024 * 1024 * 8), each ready to hold either a `1` or `0` in its place. That also means that your disk can represent `2 ^ 2199023255552` different states. That's insane. How they do that with hardware is out of the scope of this handbook, but know that they do it with a little electrical engineering magic. 48 | 49 | ## Hexadecimal 50 | 51 | We talk about bytes so much that it is often easier to refer to a binary number in a completely new counting system called [hexadecimal](https://simple.wikipedia.org/wiki/Hexadecimal) because it is more concise. Hexadecimal is one type of number system. Decimal, the one we usually count in, is another.
To understand these `bases` and how to look at hex, watch this [Khan Academy video](https://www.youtube.com/watch?v=4EJay-6Bioo) 52 | 53 | ... 54 | 55 | As a recap, Hex is converted to decimal and binary like so: 56 | ``` 57 | 0 = 0 (10) = 0000 58 | 1 = 1 (10) = 0001 59 | 2 = 2 (10) = 0010 60 | 3 = 3 (10) = 0011 61 | 4 = 4 (10) = 0100 62 | 5 = 5 (10) = 0101 63 | 6 = 6 (10) = 0110 64 | 7 = 7 (10) = 0111 65 | 8 = 8 (10) = 1000 66 | 9 = 9 (10) = 1001 67 | A = 10 (10) = 1010 68 | B = 11 (10) = 1011 69 | C = 12 (10) = 1100 70 | D = 13 (10) = 1101 71 | E = 14 (10) = 1110 72 | F = 15 (10) = 1111 73 | ``` 74 | 75 | To make it clear that we are writing in hex, and not decimal, we will always prepend a `0x` to the beginning of the number. So when we say `0x0F`, you know we mean `15` in decimal. 76 | 77 | To tie this all together, we go back to how many bits are in a byte. There are 8 bits in a byte which we usually write like so `0000 0000`. We write it in two groups of 4 because each group of 4 bits maps to exactly one hex digit, so the hex representation is `0x00`. Now we can refer to bigger bit numbers really easily. For instance, if we wanted to refer to `20` decimal we would just write `0x14`, which is `0001 0100` in binary. If you were confused about that conversion, re-watch the video above. 78 | 79 | Remember that hex bytes can also scale just like we did earlier with bits. We can represent **huge** numbers with hex that we would not normally talk about, like: 80 | 81 | ```python 82 | 0x7ffff7dd4090 83 | ``` 84 | 85 | Which represents the number `140737351860368` in decimal. Yup, that value took 6 bytes to represent. Aka `6*8` bits. 86 | 87 | Now that we understand the fundamentals of bits, let's move on to using them in logic.
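
If you want to check these conversions yourself, the short Python sketch below does so using only built-ins. The helper name `bytes_needed` is made up for this example; it is just one way of counting how many whole bytes a number occupies.

```python
value = 0x14  # a hex literal; Python stores it as a plain integer

print(value)                 # 20 -> the decimal view
print(format(value, "08b"))  # 00010100 -> the binary view, padded to one byte
print(hex(20))               # 0x14 -> and back to hex


def bytes_needed(n):
    """Smallest number of whole bytes that can hold the non-negative integer n."""
    return max(1, (n.bit_length() + 7) // 8)


big = 140737351860368  # the large decimal value mentioned above
print(hex(big))           # its hex form
print(bytes_needed(big))  # 6
```

Every hex digit covers 4 bits, so two hex digits always make one byte. That is why counting bytes from a hex string is as easy as halving the number of digits.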
-------------------------------------------------------------------------------- /src/4_programming_languages/debugging_c.md: -------------------------------------------------------------------------------- 1 | # Debugging 2 | 3 | ## Introduction 4 | A debugger is a tool you use to understand and run analysis on a binary. It allows you to step through instructions and view memory as it is running. Just like `gcc`, there exists a debugger called `gdb`. The `g` in all of these names stands for [GNU](https://www.gnu.org/home.en.html). 5 | 6 | ## Debugging Programs You Wrote 7 | When you have the source for a program, debugging it is very easy. Compile the binary we used in the last section again with the `-g` flag, which includes symbols and puts it in a ready form for `gdb`. 8 | 9 | ``` 10 | gcc ex.c -o ex -g 11 | ``` 12 | 13 | Now run the binary with 14 | ``` 15 | gdb ./ex 16 | ``` 17 | 18 | You will now see a bunch of text ending in: 19 | ``` 20 | Reading symbols from ./ex... 21 | (gdb) 22 | ``` 23 | 24 | You are in the `gdb` prompt now. To exit you can type `quit` and hit enter. For now we will stay and run a few commands. 25 | 26 | Since symbols exist, you can just break at any line in the source you like. Breaking is the stopping of code at a certain condition. When you set a breakpoint you tell the debugger to stop the program at a certain symbol or address. Run: 27 | 28 | ``` 29 | b main:1 30 | r 31 | ``` 32 | 33 | We said `break at line 1 in the main function and run`. Type `l` and now you will see where you are: 34 | 35 | ``` 36 | (gdb) l 37 | 1 int main() 38 | 2 { 39 | 3 for(int i = 0; i < 10000; i++) { 40 | 4 puts("hello world"); 41 | 5 } 42 | 6 } 43 | ``` 44 | 45 | Very cool. Now if we enter the command `n` (next) twice, we will be inside the loop.
We can print the value of `i` at each iteration: 46 | 47 | ``` 48 | (gdb) n 49 | 3 for(int i = 0; i < 10000; i++) { 50 | (gdb) n 51 | 4 puts("hello world"); 52 | (gdb) p i 53 | $1 = 0 54 | ``` 55 | 56 | We can even change the value of `i`: 57 | 58 | ``` 59 | (gdb) set variable i=10000 60 | (gdb) n 61 | hello world 62 | 3 for(int i = 0; i < 10000; i++) { 63 | (gdb) 64 | 6 } 65 | ``` 66 | 67 | Now we leave the loop since we set `i`. 68 | 69 | This was all possible because `-g` mapped every symbol that we see in the source to a corresponding address, stack variable, and region in the program's memory. You can find more tutorials on debugging programs you wrote [here](https://cs.baylor.edu/~donahoo/tools/gdb/tutorial.html). This is often where other manuals about debugging in gdb will stop... but not us. A real systems hacker needs to know how to debug programs they did not write and don't have the source to. 70 | 71 | ## Debugging Programs You Did Not Write 72 | Let's begin by making this realistic. In the real world, programs are compiled with `-O2` and without `-g`. Then to make matters worse, a special command called `strip` is run on the binary. The `strip` command is like the opposite of `-g`. It removes symbols to make the binary smaller. Importantly, it removes the names of functions. Let's compile a binary and strip it: 73 | 74 | ```bash 75 | gcc ex.c -o ex_real -O2 76 | strip ex_real 77 | ``` 78 | 79 | Now run the program in gdb and try to break at main: 80 | ``` 81 | (gdb) b main 82 | Function "main" not defined. 83 | ``` 84 | 85 | Yeah... there are no symbols. So how the hell do we stop at the start of the program now? Normally this means we have to look at the entry point address of the binary, break at that specific address, and then `s` (step) our way to main. Luckily, us hackers have come a long way and we have made upgrades to gdb to make debugging binaries easier.
Instead of telling you to use vanilla gdb, let's use a slightly enhanced version you can run on most systems. 86 | 87 | A few gdb extensions exist, but my favorite is [GEF](https://github.com/hugsy/gef) (GDB Enhanced Features). Let's quickly install it and run this binary again. 88 | 89 | ``` 90 | bash -c "$(wget http://gef.blah.cat/sh -O -)" 91 | ``` 92 | 93 | Now run gdb on the binary `ex_real` again and run the command `entry-break`: 94 | ``` 95 | gef➤ entry-break 96 | Stopped due to shared library event (no libraries added or removed) 97 | [*] PIC binary detected, retrieving text base address 98 | [+] Breaking at entry-point: 0x5555555550a0 99 | ... 100 | ``` 101 | 102 | This will now make your screen have a bunch of info on it that you may recognize from the assembly section of this handbook. 103 | You should also be noticing by now that we can't see the source code anymore. You can't see the source if the symbols are not mapped in the binary, and we have no way of retrieving it. From now on, we will have to use two commands: 104 | - `si`: Step Instruction, steps an instruction (steps into a call) 105 | - `ni`: Next Instruction, to step over instructions (skips stepping into a call) 106 | 107 | Now run: 108 | ``` 109 | si 10 110 | ``` 111 | 112 | This will step 10 instructions. You should now see something like: 113 | ``` 114 | call QWORD PTR [rip+0x2f12] # 0x555555557fe0 115 | ``` 116 | 117 | This call is the function `__libc_start_main`, which sets things up and then calls the `main` function we know and love. It's always found at the entry of the program and its first argument is the address of `main`. Let's get that address and break at it.
118 | 119 | > Recall: the first argument in System V 64 bit is `rdi` 120 | 121 | ``` 122 | gef➤ p $rdi 123 | $1 = 0x555555555060 124 | gef➤ b *$rdi 125 | Breakpoint 1 at 0x555555555060 126 | ``` 127 | 128 | Now if we use continue we go to main: 129 | ``` 130 | c 131 | ``` 132 | 133 | Use `ni` until you see an instruction like 134 | ``` 135 | call 0x555555555050 136 | ``` 137 | 138 | It's the `puts` in the loop. You should also recognize the structure of this assembly as being a loop with the `cmp ebx, 0x2710` being right after the `puts` if you hit `ni`. Yes `i` is now the register `ebx`. Debugging a binary is much harder. But it is not impossible. We will go indepth on this concept in [processes and debugging](../5_processes/gdb.md) section and with the [EmbryoGDB](https://dojo.pwn.college/challenges/gdb) challenges from pwn.college. -------------------------------------------------------------------------------- /src/4_programming_languages/compilation.md: -------------------------------------------------------------------------------- 1 | # Compilation & Execution 2 | 3 | ## Introduction 4 | As you know, compilation is the conversion of your high-level C code into assembly and then into machine code that the computer can understand. Lucky for you, nearly every Linux machine comes with the most popular compiler `gcc`. I did say the _most_ popular compiler though, there are actually [a few that people use](https://clang.llvm.org/). For simplicity, we will only refer to gcc as our compiler. 
5 | 6 | Let's compile a simple program: 7 | ``` 8 | int main() 9 | { 10 | for(int i = 0; i < 10000; i++) { 11 | puts("hello world"); 12 | } 13 | } 14 | ``` 15 | 16 | > Recall: you can place text in a file using "vim filename" 17 | 18 | ```bash 19 | $ cat ex.c 20 | int main() 21 | { 22 | for(int i = 0; i < 10000; i++) { 23 | puts("hello world"); 24 | } 25 | } 26 | ``` 27 | 28 | Compile it by passing its name to gcc: 29 | 30 | ``` 31 | gcc ex.c 32 | ``` 33 | 34 | Now you will have a file named `a.out` in the same directory. You can execute it by running `./a.out`. 35 | 36 | ```bash 37 | $ ./a.out 38 | hello world 39 | hello world 40 | hello world 41 | [truncated] 42 | hello world 43 | ``` 44 | 45 | Yay, you can compile and execute now! Let's talk about common compiler flags. 46 | 47 | ## Common Compiler Flags 48 | ### Change output name 49 | First up is `-o`: 50 | 51 | ``` 52 | gcc ex.c -o ex 53 | ``` 54 | 55 | This outputs a file named `ex`. Yes, `-o` lets you choose the output file name. 56 | 57 | ### Change optimizations 58 | 59 | Next is `-ON`, where `N` is a number from 1 to 3. The `-O` option lets you choose the optimization level of the code, which you should read more about [from arm docs](https://developer.arm.com/documentation/den0013/d/Optimizing-Code-to-Run-on-ARM-Processors/Compiler-optimizations/GCC-optimization-options). 60 | 61 | Let's do a quick test. Often when you need faster code, you want to use a higher level of optimization. `-O3` is often considered riskier since its aggressive optimizations can more easily expose bugs in your code, so instead we use `-O2`. Let's see the difference. 62 | 63 | ``` 64 | gcc ex.c -o ex2 -O2 65 | ``` 66 | 67 | You'll notice the program takes about the same time to run as the original. This is because the code is so small and simple. If you want to see larger results, compile larger binaries like [coreutils binaries](https://github.com/coreutils/coreutils).
Speaking of coreutils binaries, your system usually has coreutils compiled with `-O2`. As an example, `ls` is a coreutils binary compiled with `-O2`. 68 | 69 | ### Change linking 70 | 71 | We talked about speed, but what about portability? 72 | 73 | > Recall: in the [computer organization memory-segments section](../3_computer_organization/memory_segments.md) we talked about a program being mapped into memory with multiple other programs, one such program was libc. This is because when we compiled the binary it was linked. 74 | 75 | When compiling a binary, gcc automatically [links](https://medium.com/@dkwok94/the-linking-process-exposed-static-vs-dynamic-libraries-977e92139b5f) the binary, which means code from other programs gets mapped into its process memory space when running. As an example run this command: 76 | 77 | ```bash 78 | ldd ./ex 79 | ``` 80 | 81 | This will show something like: 82 | 83 | ```bash 84 | linux-vdso.so.1 (0x00007ffc0f3d5000) 85 | libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5b46d4c000) 86 | /lib64/ld-linux-x86-64.so.2 (0x00007f5b46f59000) 87 | ``` 88 | 89 | All those programs will be mapped into memory when this program runs. This makes sense, since I did not write the `puts` function shown in our code. That `puts` is from `libc` which is shown above. 90 | 91 | So what happens if you give a friend this binary and they don't have libc on their system? Or, what if they don't have the same version? Well then they may not be able to run your binary. Luckily, there is a way to include ALL the external code you use in one binary. This is called [static](https://kb.iu.edu/d/akqn) linking. Let's try it: 92 | 93 | ```bash 94 | gcc ex.c -o ex_static -static 95 | ``` 96 | 97 | This binary runs just like the other binary, but now when we check it with ldd: 98 | ```bash 99 | $ ldd ./ex_static 100 | not a dynamic executable 101 | ``` 102 | 103 | Yup, it's not dynamic, so no other programs are mapped. So what is the tradeoff?
Run this command to see the file sizes: 104 | ```bash 105 | ls -lah ex* 106 | ``` 107 | 108 | You will see something like this, but with your user: 109 | ```bash 110 | -rwxrwxr-x 1 parallels parallels 17K Feb 7 10:00 ex 111 | -rwxrwxr-x 1 parallels parallels 17K Feb 7 09:48 ex2 112 | -rw-rw-r-- 1 parallels parallels 83 Feb 7 09:57 ex.c 113 | -rwxrwxr-x 1 parallels parallels 852K Feb 7 09:57 ex_static 114 | ``` 115 | 116 | Notice, `ex_static` is `852K`! That is 50x bigger than `ex`, for a program so small. What if this program was already big? Well, then it would be even bigger. This is of course because all the external code it uses, like libc, is now inside this program. 117 | 118 | If you want, you can also prove to yourself that no external libraries are mapped into memory when you run the binary with the command below (read the warning below): 119 | ``` 120 | while true; do ./ex_static fake_arg & cat /proc/"$!"/maps >> mymaps; done 121 | ``` 122 | 123 | Warning: you will need to `Ctrl+C` (kill) this after 10 seconds yourself, since it is a little hard to collect the maps of a short-lived process. The output will be in the mymaps file. 124 | 125 | 126 | ### Debug options 127 | If you compile with the `-g` flag, you include debug symbols in the binary. For now, just know this makes the binary bigger, but easier to inspect with a debugger. We will cover this more in [debugging-c](./debugging_c.md). 128 | 129 | 130 | ### Developer options 131 | 132 | The last thing we would like to mention is that if you see a `-f` at the beginning of an option, like `-fno-pie`, that means the binary is being compiled with a special developer option. These options often do dangerous or interesting things. For example, `-fno-pie` turns off an important security mechanic of the program you are compiling. Options like this can also be used if you need to remove specific optimizations from compiling. Most people will not need to touch them.
If you see it being used for a program you are examining, recall this paragraph about wacky things happening with `-f`. 133 | 134 | ## Makefiles 135 | We should also mention that nearly every large source base uses something called a `Makefile`, which, when used with the `make` command, compiles the binary (and can do other things). You can consider a Makefile as a series of shell commands. Let's make a simple one for compiling our binary with optimization 2 and static linking: 136 | 137 | ```Makefile 138 | all: 139 | gcc ex.c -o ex_final -O2 -static 140 | ``` 141 | 142 | The file you put this in should be named `Makefile`, and the `gcc` line must be indented with a tab, not spaces. Now that you have made the Makefile, run it by running `make`. This will compile the binary `ex_final`. For more usage, see this [tutorial](https://www.cs.colby.edu/maxwell/courses/tutorials/maketutor/). 143 | 144 | 145 | -------------------------------------------------------------------------------- /src/3_computer_organization/bits_and_logic.md: -------------------------------------------------------------------------------- 1 | # Bits & Logic 2 | 3 | Continuing our discussion on bits, let's talk more about how we can mess with the states system we described earlier. Before that, let's clarify some notation: 4 | 5 | ## Logic 6 | 7 | You've definitely heard the term often, but how deeply have you understood the fundamentals of it? You can find an elongated definition on Wikipedia for [Mathematical Logic](https://en.wikipedia.org/wiki/Mathematical_logic), but let's define it simply as reasoning about truths and the conclusions we can draw from them. 8 | 9 | --- 10 | ### Notation 11 | 12 | - `x = y` means y is assigned to x, or, x is now what y is. 13 | - `x == y` is a statement about equivalence "x is the same as y". 14 | 15 | The second one is a question of whether x is the same as y or not. As an example, say `x = 7; y = 3`. If we now say: 16 | ``` 17 | x == y 18 | ``` 19 | The answer to the question would be `False`.
We could also write it as: 20 | ``` 21 | (x == y) -> False 22 | ``` 23 | This could also be said as `x == y` implies `False`. 24 | 25 | --- 26 | 27 | To make logic easy to write in concise ways, we define abstract things as variables. Something like: 28 | 29 | - S: "Today it's sunny" 30 | - R: "Today it's rainy" 31 | 32 | Just having truths assigned to variables would not be very useful on its own; they only become useful when we apply **operations** to them. These operations are called **logic operations**. There are 4 fundamental logic operators: 33 | 34 | - AND 35 | - OR 36 | - NOT 37 | - XOR 38 | 39 | Let's talk about each one. 40 | 41 | ### AND 42 | 43 | `AND` works logically like how you use it in English. It's useful for understanding the truth of two things. As an example: 44 | 45 | ``` 46 | S: "Today it's sunny" 47 | R: "Today it's rainy" 48 | 49 | S = True 50 | R = False 51 | 52 | (S AND R) -> False 53 | ``` 54 | 55 | Let's decode the above. First, we defined S and R as shorthand notation for an abstract thing like the state of the day (being rainy or sunny). Next we described the truth of the states we defined. "Today it's sunny" is True; then we said "Today it's rainy" is False. Lastly, we evaluated the truth value of: 56 | 57 | `(S AND R)` 58 | 59 | or the statement: 60 | 61 | `Today it is sunny AND it is rainy`, which is `False`. 62 | 63 | It's False because we had earlier said that it was not rainy today. In this way, you can treat `AND` like a function that takes two arguments `AND(x, y)`. The input is two truth variables (which could be true or false), and the output is `True` or `False`. Let's shorten the state `True` to T and `False` to F. With all your knowledge we can easily define all the possible outputs of `AND`, known as a [truth table](https://en.wikipedia.org/wiki/Truth_table).
64 | 65 | | X | Y | X AND Y | 66 | | ----| ----|---------| 67 | | F | F | F | 68 | | T | F | F | 69 | | T | T | T | 70 | | F | T | F | 71 | 72 | As you can see, the output is only ever True if both X and Y are true. After all of this, it will be much easier to define the other operators. 73 | 74 | ### OR 75 | 76 | `OR` is very similar to `AND`. It takes two truth values and outputs `True` if at least one of the two is `True`. Here is the truth table: 77 | 78 | | X | Y | X OR Y | 79 | | ----| ----|--------| 80 | | F | F | F | 81 | | T | F | T | 82 | | T | T | T | 83 | | F | T | T | 84 | 85 | ### NOT 86 | 87 | `NOT` is special because it only takes a single truth value. All `NOT` does is reverse the truth of its argument. Here is the truth table: 88 | 89 | | X | NOT X | 90 | | ----|-------| 91 | | F | T | 92 | | T | F | 93 | 94 | It's useful now to point out that you can compound logical operators: 95 | 96 | | X | Y | X OR Y | NOT( X OR Y ) | 97 | | ----| ----|--------|--------| 98 | | F | F | F | T | 99 | | T | F | T | F | 100 | | T | T | T | F | 101 | | F | T | T | F | 102 | 103 | In addition to that, NOT also has a special reversing mechanic on `AND` and `OR`. For instance: 104 | 105 | ``` 106 | (NOT(X OR Y)) == (NOT(X) AND NOT(Y)) 107 | ``` 108 | 109 | You can test the statement above by making your own truth table. Notice how you can distribute the `NOT` to each variable and the operator, which flips the `OR` to an `AND`. The same is true in the reverse: `NOT(X AND Y)` is the same as `NOT(X) OR NOT(Y)`. 110 | 111 | ### XOR 112 | 113 | `XOR` takes two truth values like the others, but is less used in normal English. It's short for `Exclusive OR`, which means the output is only true when the inputs differ: 114 | 115 | | X | Y | X XOR Y | 116 | | ----| ----|--------| 117 | | F | F | F | 118 | | T | F | T | 119 | | T | T | F | 120 | | F | T | T | 121 | 122 | Notice how it is only True when the things going in are different from each other? It's an interesting mechanic and will be used more later.
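
The truth tables in this section can also be generated mechanically. Below is a minimal Python sketch (the helper name `truth_table` is our own invention, not something from a library) that prints the table for each two-input operator and checks the `NOT` distribution rule on every row. Note that it lists the rows in a slightly different order than the tables above.

```python
from itertools import product

def truth_table(op):
    """Return rows (x, y, op(x, y)) for every combination of two truth values."""
    return [(x, y, op(x, y)) for x, y in product([False, True], repeat=2)]

# The two-input operators from this section, written with Python's own keywords.
AND = lambda x, y: x and y
OR = lambda x, y: x or y
XOR = lambda x, y: x != y  # true only when the inputs differ

for name, op in [("AND", AND), ("OR", OR), ("XOR", XOR)]:
    print(name)
    for x, y, out in truth_table(op):
        print(f"  {x!s:5} {y!s:5} -> {out}")

# NOT(X OR Y) should match NOT(X) AND NOT(Y) on every row.
de_morgan_holds = all(
    (not (x or y)) == ((not x) and (not y))
    for x, y in product([False, True], repeat=2)
)
print("NOT distributes over OR:", de_morgan_holds)
```

Because there are only four possible inputs, checking every row like this is a complete proof of the rule, not just a spot check.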
123 | 124 | Now that we have a high-level understanding of logic, we can relate it to bits. 125 | 126 | 127 | ## Bit Logic 128 | 129 | Logic with bits works exactly the same as logic in general. `True` is `1`; `False` is `0`. 130 | 131 | ### Notation 132 | Here is our new notation that is generally used for bit logic: 133 | ``` 134 | AND: & 135 | OR: | 136 | NOT: ~ (often written ! for a single truth value) 137 | XOR: ^ 138 | ``` 139 | 140 | All of them still work the same, but now if I want to say `x AND y` I would actually say `x & y`. 141 | 142 | ### Multi-Bit Logic Operations 143 | 144 | Operations on variables with bits work even on the **byte** level: 145 | 146 | ``` 147 | x = 11110101 148 | y = 00101101 149 | 150 | (x & y) == 00100101 151 | ``` 152 | 153 | How did the above work? If you look closely, the logic operation was applied to each pair of bits that line up with each other. 154 | 155 | Now, recall that bits can also be represented as hex! This means we can do logic operations on things that _look_ like numbers (but remember they are bits under the hood): 156 | 157 | ``` 158 | x = 0x13 (00010011) 159 | y = 0x32 (00110010) 160 | 161 | (x & y) == 0x12 (00010010) 162 | ``` 163 | 164 | This entire time we have been using **bytes**, but to keep with the earlier theme, why don't we assume that we can represent things in 64 bits. For conciseness, we don't write leading 0's in a hex number: 165 | 166 | ``` 167 | x = 0xcafe (64 bits) 168 | y = 0xbabe (64 bits) 169 | 170 | (x ^ y) == 0x0000000000007040 171 | ``` 172 | 173 | The zeros are shown in the result just to clarify once again that we are in 64 bits, but all the operations we have done before still work. Remember you can always compound logic statements on other logic statements (and store them in another variable if you like). 174 | 175 | ### Logic Gates 176 | 177 | All these bit operations are actually mechanics of the real-world hardware that things run on. Since electricity is like a 1 or a 0, it makes sense that these logic gates are what we first implemented in hardware.
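
Python happens to use the same `&`, `|`, and `^` symbols, so the multi-bit examples from this section can be double-checked in a few lines. This is only a sketch: the `MASK64` constant is an assumption we add ourselves, because Python integers have no fixed width and we want to pretend everything is 64 bits.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # assumption: treat every value as 64 bits wide

x = 0b11110101
y = 0b00101101
# AND is applied to each pair of lined-up bits.
print(format(x & y, "08b"))

# The same idea with the values written in hex.
print(hex(0x13 & 0x32))

# XOR, padded with leading zeros to 16 hex digits (64 bits).
print(format(0xcafe ^ 0xbabe, "#018x"))

# Bitwise NOT flips every bit; the mask keeps the result inside 64 bits.
print(format(~0xcafe & MASK64, "#018x"))
```

The last line shows why the width matters: flipping every bit of `0xcafe` also flips all of the leading zeros, so the result depends on how many bits we say the value has.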
178 | 179 | Circuit engineers draw these gates as shown [here](https://en.wikipedia.org/wiki/Logic_gate#Symbols). Generally speaking, all things on computers first start with these fundamental logic gates that are implemented in hardware. 180 | 181 | Now that we understand how to truly utilize the power of bits and logic, we can move on to understand a computer at its lowest level. 182 | 183 | -------------------------------------------------------------------------------- /src/2_operating_systems/containers.md: -------------------------------------------------------------------------------- 1 | # Containers 2 | 3 | Containers are another level of virtualization that allows for isolated spaces in the kernel 4 | to be created and destroyed without affecting each other. 5 | 6 | ## What is a container? 7 | 8 | To get more specific, read what Docker has to say on what a Container is: [here](https://www.docker.com/resources/what-container). 9 | Now you may be wondering, how does this differ from a VM? As an end-user, there is no difference. But as 10 | far as technicals go, Microsoft's diagram is a good comparison: 11 | 12 | 13 | **VMs vs Containers** 14 | ![](./ms_container_v_vm.png) 15 | 16 | In simple terms, a VM is much heavier since we need to initialize an entirely new kernel for each VM we make. 17 | In a container, we share the already existing kernel and, using some technical tricks, create isolated sections 18 | of the kernel that we then use to create containers (which act like VMs). 19 | 20 | **TL;DR**: container light, VM heavy. 21 | 22 | ### Why use a container? 23 | 24 | You may be asking yourself, "why use a container?" We already are in a VM, so as far as affecting our host machine goes, we are 25 | already in the clear. Containers become the most useful when trying to run applications that have dependencies that 26 | we might not have access to in our current version of the VM.
For instance, wanting to run an `Ubuntu 18.04` application 27 | while in `Ubuntu 20.04`. You can either pray to the computer gods and hope your dependencies still work in the new version, 28 | or you can use a docker container that creates an `Ubuntu 18.04` environment for it to run in. The latter is easier. In addition, 29 | containers are not persistent, meaning that when you are done running the container, everything you created inside of it is 30 | destroyed. 31 | 32 | ## Docker: the modern container 33 | 34 | Now that you know what a container is, it's time we set you up to be able to use containers. There are many implementations 35 | of containers, but the one we will use is [Docker](https://www.docker.com/why-docker). Docker is very mainstream and 36 | has a lot of use across the computer science industry. Take for instance [Wordpress](https://wordpress.com/), which you 37 | have likely heard of or used. It's a platform for building websites for free with a GUI. It also has a lot of 38 | dependencies and things you need to set up before using it. Wordpress uses Docker, and has made a docker container 39 | that does it all for you. Check it out: [here](https://hub.docker.com/_/wordpress). Essentially you can run the entire 40 | wordpress hosting software with this simple command in a terminal: 41 | ```docker 42 | docker run wordpress 43 | ``` 44 | 45 | Which is a great segue into installing Docker, since it requires use of the `command line`. 46 | 47 | ## Setting up docker 48 | 49 | We will now install `Docker` on your `Ubuntu VM`, so from now on all instructions assume you are inside 50 | your VM. First log in to the VM, then open up the terminal. You can do this by either searching applications or 51 | simply right-clicking the desktop and clicking `"Open terminal"`. This is where the fun begins. Pro-tip, 52 | you can click the copy button in the top left of the code snippets to copy and paste to your terminal.
53 | 54 | It's alright if you don't understand all the commands in this section, we will cover using the terminal later. 55 | 56 | First, we need to update the list of installable applications. Run: 57 | ```bash 58 | sudo apt-get update 59 | ``` 60 | 61 | After that, we need to install the dependencies of docker: 62 | ```bash 63 | sudo apt-get install \ 64 | apt-transport-https \ 65 | ca-certificates \ 66 | curl \ 67 | gnupg-agent \ 68 | software-properties-common 69 | ``` 70 | 71 | Next, let's add Docker's GPG key so we can get their software: 72 | ```bash 73 | curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add - 74 | ``` 75 | 76 | Add the repository: 77 | ```bash 78 | sudo add-apt-repository \ 79 | "deb [arch=amd64] https://download.docker.com/linux/ubuntu \ 80 | $(lsb_release -cs) \ 81 | stable" 82 | ``` 83 | 84 | Finally, update once more and install: 85 | ```bash 86 | sudo apt-get update 87 | ``` 88 | ```bash 89 | sudo apt-get install docker-ce docker-ce-cli containerd.io 90 | ``` 91 | 92 | Congratulations, you just installed your first command line tool! And quite a powerful one. 93 | Let's verify it works. 94 | 95 | ## Using docker 96 | 97 | ### Verification 98 | 99 | To verify you actually have docker running, simply run: 100 | ```bash 101 | sudo docker run hello-world 102 | ``` 103 | 104 | The output should look something like: 105 | ``` 106 | Hello from Docker! 107 | This message shows that your installation appears to be working correctly. 108 | 109 | To generate this message, Docker took the following steps: 110 | 1. The Docker client contacted the Docker daemon. 111 | 2. The Docker daemon pulled the "hello-world" image from the Docker Hub. 112 | (amd64) 113 | 3. The Docker daemon created a new container from that image which runs the 114 | executable that produces the output you are currently reading. 115 | 4. The Docker daemon streamed that output to the Docker client, which sent it 116 | to your terminal.
117 | 118 | To try something more ambitious, you can run an Ubuntu container with: 119 | $ docker run -it ubuntu bash 120 | 121 | Share images, automate workflows, and more with a free Docker ID: 122 | https://hub.docker.com/ 123 | 124 | For more examples and ideas, visit: 125 | https://docs.docker.com/get-started/ 126 | ``` 127 | 128 | ### Having some fun 129 | 130 | To make things a little more interesting, let's launch an older Ubuntu version **inside** your 131 | current Ubuntu version. Virtualception. Let's first see what version we are on in our machine: 132 | ```bash 133 | grep '^VERSION' /etc/os-release 134 | ``` 135 | I'm on `Ubuntu 18.04`, so my command output this: 136 | ```bash 137 | VERSION="18.04.5 LTS (Bionic Beaver)" 138 | VERSION_ID="18.04" 139 | VERSION_CODENAME=bionic 140 | ``` 141 | You may be on `20.04` so it will look slightly different. Take note of the version you are on. 142 | 143 | 144 | Now let's start an `Ubuntu 16.04` machine. As with the verification step, we run docker with `sudo`: 145 | ```bash 146 | sudo docker run -it ubuntu:16.04 147 | ``` 148 | Since it's the first run it will take a few minutes for it to pull down the Ubuntu image data. 149 | Once it's done, you should see a waiting prompt that looks like: 150 | ```bash 151 | root@cadd99990677:/# 152 | ``` 153 | You have now opened a terminal in a virtual Ubuntu. Congratz. Run the same command from earlier in this prompt: 154 | ```bash 155 | grep '^VERSION' /etc/os-release 156 | ``` 157 | The output should be: 158 | ```bash 159 | VERSION="16.04.7 LTS (Xenial Xerus)" 160 | VERSION_ID="16.04" 161 | VERSION_CODENAME=xenial 162 | ``` 163 | Pretty cool right? We are inside a virtual machine in a virtual machine (technically a container, but you get the point). 164 | Now type: 165 | ```bash 166 | exit 167 | ``` 168 | and hit enter. You will be back to your normal terminal. Amazing! 169 | 170 | We will use Docker often.
If you are excited about this, you can check out this tutorial: [here](https://docs.docker.com/get-started/02_our_app/), 171 | though it is not mandatory. Now it's time to learn how to use `Bash`, the terminal you typed on earlier. -------------------------------------------------------------------------------- /src/4_programming_languages/write_c.md: -------------------------------------------------------------------------------- 1 | # Writing Code 2 | 3 | Considering you have done the [Terminal](../2_operating_systems/terminal.md) section 4 | in the Operating Systems chapter, you already know about vim: the universal 5 | terminal based text editor. I stand by this notion that `vim` is the supreme editor 6 | when it comes to its awesome key-bindings, but vim lacks features that other 7 | editors have made normal. For this very reason, you need to use plugins 8 | with vim to make it actually usable. In this section, I show you how to edit 9 | C code faster and better. 10 | 11 | ## Using plugins with Vim 12 | 13 | Get `Vundle` installed so we can add plugins. Go to their repo [here](https://github.com/VundleVim/Vundle.vim#quick-start). 14 | After you have it installed, you will have a default looking `.vimrc` that should look like this: 15 | 16 | ```vim 17 | set nocompatible " be iMproved, required 18 | filetype off " required 19 | 20 | " set the runtime path to include Vundle and initialize 21 | set rtp+=~/.vim/bundle/Vundle.vim 22 | call vundle#begin() 23 | " alternatively, pass a path where Vundle should install plugins 24 | "call vundle#begin('~/some/path/here') 25 | 26 | " let Vundle manage Vundle, required 27 | Plugin 'VundleVim/Vundle.vim' 28 | 29 | " The following are examples of different formats supported. 30 | " Keep Plugin commands between vundle#begin/end.
31 | " plugin on GitHub repo 32 | Plugin 'tpope/vim-fugitive' 33 | " plugin from http://vim-scripts.org/vim/scripts.html 34 | " Plugin 'L9' 35 | " Git plugin not hosted on GitHub 36 | Plugin 'git://git.wincent.com/command-t.git' 37 | " git repos on your local machine (i.e. when working on your own plugin) 38 | Plugin 'file:///home/gmarik/path/to/plugin' 39 | " The sparkup vim script is in a subdirectory of this repo called vim. 40 | " Pass the path to set the runtimepath properly. 41 | Plugin 'rstacruz/sparkup', {'rtp': 'vim/'} 42 | " Install L9 and avoid a Naming conflict if you've already installed a 43 | " different version somewhere else. 44 | " Plugin 'ascenator/L9', {'name': 'newL9'} 45 | 46 | " All of your Plugins must be added before the following line 47 | call vundle#end() " required 48 | filetype plugin indent on " required 49 | " To ignore plugin indent changes, instead use: 50 | "filetype plugin on 51 | " 52 | " Brief help 53 | " :PluginList - lists configured plugins 54 | " :PluginInstall - installs plugins; append `!` to update or just :PluginUpdate 55 | " :PluginSearch foo - searches for foo; append `!` to refresh local cache 56 | " :PluginClean - confirms removal of unused plugins; append `!` to auto-approve removal 57 | " 58 | " see :h vundle for more details or wiki for FAQ 59 | " Put your non-Plugin stuff after this line 60 | ``` 61 | 62 | Before the line: `" All of your Plugins must be added before the following line` 63 | Add the following: 64 | 65 | ``` 66 | Plugin 'Valloric/YouCompleteMe' 67 | Plugin 'airblade/vim-gitgutter' 68 | Plugin 'editorconfig/editorconfig-vim' 69 | Plugin 'itchyny/lightline.vim' 70 | Plugin 'junegunn/fzf' 71 | Plugin 'junegunn/fzf.vim' 72 | Plugin 'mattn/emmet-vim' 73 | Plugin 'scrooloose/nerdtree' 74 | Plugin 'scrooloose/syntastic' 75 | Plugin 'scrooloose/nerdcommenter' 76 | ``` 77 | 78 | Yeah, it's a series of plugins that make vim usable. 
The only thing you need to 79 | do in addition to this is finish installing `YouCompleteMe` so that autocompletion 80 | actually works. Follow their [install guide](https://github.com/ycm-core/YouCompleteMe#linux-64-bit). 81 | 82 | After that is all in, feel free to explore what the hell you just installed ;). 83 | For one, you now have autocompleting, a directory view, and a nice lightline. 84 | Check it all out. 85 | 86 | ## Using Vim in another Editor 87 | 88 | Sometimes I don't like to use the command line to do my editing of code. Maybe you feel 89 | the same. When I don't want to use the terminal (usually for bigger or longer-lasting 90 | projects), I use [VS Code](https://code.visualstudio.com/download). 91 | Don't worry; it is not the regular Visual Studio. This is a pretty minimal open-source editor 92 | that works on all platforms, especially Linux. It's very good and I recommend using it. 93 | 94 | But to make it **ACTUALLY** usable, it must have vim embedded in it. That's right, vim 95 | key bindings. Go ahead and install this [vim plugin](https://marketplace.visualstudio.com/items?itemName=vscodevim.vim) 96 | into your VS Code once you have it installed and set up. 97 | 98 | The list of features in this editor is endless. One of my favorite features 99 | is that you can edit files on a remote machine over ssh in VS Code. It's super seamless 100 | and makes it feel like you are just editing a local file. 101 | [Check it out](https://code.visualstudio.com/docs/remote/ssh). 102 | 103 | ## Using a multiplexer (tmux) 104 | 105 | Lastly, you are going to want to use some sort of multiplexer when you are editing 106 | code. It helps so you don't need to switch between tabs and such. My setup often 107 | looks like this: 108 | 109 | ![my_setup](./my_setup.png) 110 | 111 | On the left I have the source; on the right I have it split to run commands and 112 | see in/output and also have a `man` page up for commands I don't understand.
113 | It makes coding in the terminal supa-hot-fire. 114 | 115 | This is all made possible with [tmux](https://github.com/tmux/tmux/wiki), a terminal 116 | multiplexer. It's very similar to having a terminal that splits this for you. Many 117 | people like to use Terminator for this. I recommend using tmux. It will make you a 118 | better hacker and is usable on SSH connections. 119 | 120 | Install tmux and xclip: 121 | 122 | ```bash 123 | sudo apt-get install tmux xclip 124 | ``` 125 | 126 | Then use the config I have specially made to emulate `Terminator`: 127 | ```bash 128 | vim ~/.tmux.conf 129 | ``` 130 | 131 | ```conf 132 | # Set tmux to split and move like Terminator 133 | bind-key -n C-E split-window -h 134 | bind-key -n C-O split-window -v 135 | bind-key -n 'M-Up' select-pane -U 136 | bind-key -n 'M-Left' select-pane -L 137 | bind-key -n 'M-Right' select-pane -R 138 | bind-key -n 'M-Down' select-pane -D 139 | 140 | # Make sure the mouse is usable 141 | set -g mouse on 142 | 143 | # Turn the status bar off 144 | set -g status off 145 | # statusbar 146 | set -g status-position bottom 147 | set -g status-justify left 148 | set -g status-left '' 149 | set -g status-right-length 50 150 | set -g status-left-length 20 151 | setw -g window-status-bell-style 'fg=colour255 bg=colour1 bold' 152 | 153 | 154 | # Make the colors good 155 | set -g default-terminal "screen-256color" 156 | 157 | # Set copy and paste 158 | set -g mouse on 159 | bind -n WheelUpPane if-shell -F -t = "#{mouse_any_flag}" "send-keys -M" "if -Ft= '#{pane_in_mode}' 'send-keys -M' 'select-pane -t=; copy-mode -e; send-keys -M'" 160 | bind -n WheelDownPane select-pane -t= \; send-keys -M 161 | bind -n C-WheelUpPane select-pane -t= \; copy-mode -e \; send-keys -M 162 | bind -T copy-mode-vi C-WheelUpPane send-keys -X halfpage-up 163 | bind -T copy-mode-vi C-WheelDownPane send-keys -X halfpage-down 164 | bind -T copy-mode-emacs C-WheelUpPane send-keys -X halfpage-up 165 | bind -T copy-mode-emacs
C-WheelDownPane send-keys -X halfpage-down 166 | 167 | # To copy, left click and drag to highlight text in yellow, 168 | # once you release left click yellow text will disappear and will automatically be available in clipboard 169 | # # Use vim keybindings in copy mode 170 | setw -g mode-keys vi 171 | # Update default binding of `Enter` to also use copy-pipe 172 | unbind -T copy-mode-vi Enter 173 | bind-key -T copy-mode-vi Enter send-keys -X copy-pipe-and-cancel "xclip -selection c" 174 | bind-key -T copy-mode-vi MouseDragEnd1Pane send-keys -X copy-pipe-and-cancel "xclip -in -selection clipboard" 175 | ``` 176 | 177 | Usage: 178 | - Split Vertically: `Ctrl+Shift+e` 179 | - Split Horizontally: `Ctrl+Shift+o` 180 | - Destroy Pane: `Ctrl+d` 181 | - Move between panes with `Alt+arrow_key` 182 | - Example: move left: `Alt+left_arrow_key` 183 | 184 | Lastly, you can now just copy things in your terminal by selecting them with your mouse. 185 | The selection is copied automatically while in a `tmux` session. If you destroy every pane, 186 | it will exit out of the `tmux` session. Feel free to rebind everything. 187 | 188 | 189 | 190 | -------------------------------------------------------------------------------- /src/3_computer_organization/control_structures.md: -------------------------------------------------------------------------------- 1 | # Control Structures 2 | 3 | ## Introduction 4 | Control structures are patterns of assembly code that create a more abstract way of controlling the flow of a program. As an example, the `if` statement we used earlier is a control structure. Control structures are used to make our code do interesting and complex things like make decisions in a loop, conditionally run code, and easily make code reusable and understandable. 5 | 6 | The first up of these control structures is one that makes code easy to reuse: functions!
7 | 8 | ## Functions 9 | 10 | A function is a piece of code that you can reuse more than once that can take some arguments and return some values. It's essentially just like a normal math function. Take for instance the classic `f` in math: 11 | ``` 12 | f(x) = y 13 | ``` 14 | 15 | The function above takes `x` and outputs `y`. You may not know how it translates `x` -> `y`, but you know you get `y` from inputting `x`. You can make the same type of functions in x86 and most assembly languages. Earlier in [call-instructions](./instructions.md#call-instructions), we actually provided you with a function. Here it is again: 16 | ```c 17 | // args in rdi, output in rax 18 | make_even: 19 | mov rax, rdi // dividend goes in rax 20 | cqo // sign-extend rax into rdx for idiv 21 | mov rcx, 2 22 | idiv rcx // remainder lands in rdx 23 | cmp rdx, 0 24 | je make_even_ret 25 | add rdi, 1 26 | make_even_ret: 27 | mov rax, rdi 28 | ret 29 | ``` 30 | 31 | A very simple function to take whatever number it is given and make it even. If you are not sure how it does that, review [instructions](./instructions.md) and how even and odd numbers work in math. 32 | 33 | ### Calling Convention 34 | 35 | In this example, the first argument to the function (the input), is passed in rdi. The output is passed in rax. This passing of arguments and returns actually has a name: a _calling convention_. A calling convention is the way in which you pass arguments to a callable _thing_. The _thing_ in this case is functions in x86. Just like flavors of syntax, there are [many different calling conventions](https://riptutorial.com/x86/topic/3261/calling-conventions). The most widely used calling convention, and the one you are most likely to see in the wild, is the [64-bit System V](https://riptutorial.com/x86/example/11197/64-bit-system-v) calling convention. 36 | 37 | In System V (**the one we use here**), this is how args are passed: 38 | 39 | | Argument | 1 | 2 | 3 | 4 | 5 | 6 | 7+ | return | 40 | |----------|-----|-----|-----|-----|----|----|-------|--------| 41 | | Location | rdi | rsi | rdx | rcx | r8 | r9 | stack | rax |
42 | 43 | 44 | Arguments past the 6th are all passed on the stack. Just before the `call` instruction runs, argument 7 is at `[rsp]`, argument 8 at `[rsp + 8]`, and so forth (inside the called function they sit 8 bytes higher, since `call` pushes the return address). System V is the calling convention we will be using for the rest of the handbook and the EmbryoASM modules we will have you do at the end of this section. The return value is always one thing and it's passed in rax. 45 | 46 | Though we only use System V in the handbook, we felt it was worth it to mention that the 32 bit version of x86 uses [cdecl](https://riptutorial.com/x86/example/11196/32-bit-cdecl) (commonly said as "C-deck-ul"). The cdecl calling convention passes all the arguments on the stack just like System V does for arguments 7 and above. 47 | 48 | Now back to our concrete examples. Say we have a function called `sum4` that returns the sum of four numbers. If we have some assembly code and we wanted to call that function with the values `2, 4, 8, 16`, this is how we would do it: 49 | 50 | ```c 51 | // some earlier code... 52 | 53 | mov rdi, 2 54 | mov rsi, 4 55 | mov rdx, 8 56 | mov rcx, 16 57 | call sum4 58 | 59 | // some code after... 60 | ``` 61 | 62 | Cool right? Note we can easily reuse `sum4` as many times as we like. Just as a refresher, every function ends with a `ret` so that you can reliably use `call` on it as in the example above. 63 | 64 | ### Functions and the Stack 65 | It's important to know that functions often use the stack to save arguments right at the beginning of the function. This is called the [function prologue](https://en.wikipedia.org/wiki/Function_prologue_and_epilogue).
The reason we save things on the stack is that we might need to reuse the original argument registers: 66 | 67 | ```c 68 | // takes 3 args 69 | my_func: 70 | call some_other_func 71 | ret 72 | ``` 73 | 74 | Here `some_other_func` is free to overwrite `rdi`, `rsi`, and `rdx`, so our three arguments may be gone once it returns. The way to fix this is by saving things on the stack: 75 | 76 | ```c 77 | //takes 3 args 78 | my_func: 79 | push rdi 80 | push rsi 81 | push rdx 82 | call some_other_func 83 | pop rdx 84 | pop rsi 85 | pop rdi 86 | ret 87 | ``` 88 | 89 | This is very easy to do with pops and pushes, but it is not exactly what you will usually find. In real functions, you will see use of `rbp` as well. Here is a snippet of code we used in the [instructions](./instructions.md) section: 90 | 91 | ```c 92 | 000000000000112d
: 93 | 112d: push rbp 94 | 112e: mov rbp,rsp 95 | 1131: mov [rbp-0x8], 0x0 96 | 1138: mov [rbp-0x4], 0x4 97 | 113f: mov eax, [rbp-0x4] 98 | 1142: add eax, 0x5 99 | 1145: mov [rbp-0x8], eax 100 | 1148: mov eax, [rbp-0x8] 101 | 114b: imul eax, [rbp-0x4] 102 | 114f: mov [rbp-0x8], eax 103 | ``` 104 | 105 | This code is a very accurate representation of what you will see in the real world. We use the special register rbp to save the original place the stack was at the start of the function. `bp` in rbp stands for Base Pointer. It's the base pointer of the stack, or where the stack was when this function started. 106 | 107 | To explain the above code more: 108 | 1. the current base pointer is saved (to be popped at the end by a leave; ret;) 109 | 2. the stack pointer becomes the base pointer 110 | 3. the base pointer is used as if it were the stack pointer (sp) 111 | 112 | This allows us to modify the sp as we like, then when the function is done, it gets fixed up. This idea will be expanded more in the EmbryoASM challenges. 113 | 114 | 115 | ## Conditionals 116 | Conditionals run the world. Below you will find the most common structures translated into assembly, originally shown in Python-like code.
117 | ### if statements 118 | High-level: 119 | ```python 120 | if x > 0: 121 | y = 1 122 | else: 123 | y = 0 124 | ``` 125 | 126 | ASM: 127 | ```c 128 | // rdi = x; rax = y 129 | cmp rdi, 0 130 | jle else_label 131 | mov rbx, 1 132 | jmp end_label 133 | 134 | else_label: 135 | mov rbx, 0 136 | 137 | end_label: 138 | mov rax, rbx 139 | ``` 140 | 141 | ### else-if statements 142 | ```python 143 | if x == 0: 144 | y = 1 145 | elif x < 0: 146 | y = -1 147 | else: 148 | y = 0 149 | ``` 150 | 151 | ASM: 152 | ```c 153 | // rdi = x; rax = y 154 | cmp rdi, 0 155 | je if_label 156 | jl else_if_label 157 | mov rbx, 0 158 | jmp end_label 159 | 160 | if_label: 161 | mov rbx, 1 162 | jmp end_label 163 | 164 | else_if_label: 165 | mov rbx, -1 166 | 167 | end_label: 168 | mov rax, rbx 169 | ``` 170 | 171 | ## Loops 172 | Loops allow you to do something many times. Like: "walk forward 18 times" actually translates to "walk forward"*18. 173 | Here are two types of loops you can use: 174 | 175 | ### For-loop 176 | When you know how many times you want to iterate, like the example above, you use a for-loop: 177 | High-Level: 178 | ```python 179 | for i in range(18): 180 | walk_forward() 181 | ``` 182 | 183 | ASM: 184 | ```c 185 | mov rcx, 0 186 | loop_head: 187 | cmp rcx, 18 188 | jge loop_end 189 | call walk_forward 190 | add rcx, 1 // i += 1 (assumes walk_forward preserves rcx) 191 | jmp loop_head 192 | loop_end: 193 | // any code after loop 194 | mov rax, 0 195 | ``` 196 | 197 | ### while loop 198 | When you don't know how many times you want to iterate, or your stopping condition is something special, you use a while loop: 199 | 200 | High-Level: 201 | ```python 202 | x = 80 203 | y = 0 204 | while x != 0: 205 | x = x - 2 206 | y += 1 207 | ``` 208 | 209 | ASM: 210 | ```c 211 | // rdi = x, rax = y 212 | mov rdi, 80 213 | mov rbx, 0 214 | 215 | loop_head: 216 | cmp rdi, 0 217 | je loop_end 218 | sub rdi, 2 219 | add rbx, 1 220 | jmp loop_head 221 | 222 | loop_end: 223 | mov rax, rbx 224 | //any code after loop 225 | ``` 226 | 227 | ## 
Conclusion 228 | With the general knowledge of these structures, you should be ready to start making some simple programs in x86. -------------------------------------------------------------------------------- /src/3_computer_organization/memory_segments.md: -------------------------------------------------------------------------------- 1 | # Memory Segments 2 | 3 | ## Introduction 4 | In the last section, you learned that programs exist in memory. What you may not realize, though, is that memory is split into segments, or divisions, when a program is running. We call the addressing of these different memory segments a [memory map](https://en.wikipedia.org/wiki/Memory_map). Let's print the memory map of a process. Run the command below: 5 | 6 | ```bash 7 | sleep 5 & cat "/proc/$!/maps" 8 | ``` 9 | 10 | This will run the `sleep` command (which sleeps for 5 seconds) in the background and, while it is running, use `cat` to print out the memory map of the process (which is the running memory instance of sleep). `$!` expands to the process ID of that background `sleep`, and the Linux kernel exposes each process's memory map at `/proc/<pid>/maps`, so no extra tools need to be installed. 
Here is what the result should look similar to on your system: 11 | 12 | ``` 13 | [1] 243862 14 | 55ce021ae000-55ce021b0000 r--p 00000000 08:05 3540380 /usr/bin/sleep 15 | 55ce021b0000-55ce021b4000 r-xp 00002000 08:05 3540380 /usr/bin/sleep 16 | 55ce021b4000-55ce021b6000 r--p 00006000 08:05 3540380 /usr/bin/sleep 17 | 55ce021b7000-55ce021b8000 r--p 00008000 08:05 3540380 /usr/bin/sleep 18 | 55ce021b8000-55ce021b9000 rw-p 00009000 08:05 3540380 /usr/bin/sleep 19 | 55ce03d2d000-55ce03d4e000 rw-p 00000000 00:00 0 [heap] 20 | 7f4055792000-7f4055d02000 r--p 00000000 08:05 3545607 /usr/lib/locale/locale-archive 21 | 7f4055d02000-7f4055d27000 r--p 00000000 08:05 3546309 /usr/lib/x86_64-linux-gnu/libc-2.31.so 22 | 7f4055d27000-7f4055e9f000 r-xp 00025000 08:05 3546309 /usr/lib/x86_64-linux-gnu/libc-2.31.so 23 | 7f4055e9f000-7f4055ee9000 r--p 0019d000 08:05 3546309 /usr/lib/x86_64-linux-gnu/libc-2.31.so 24 | 7f4055ee9000-7f4055eea000 ---p 001e7000 08:05 3546309 /usr/lib/x86_64-linux-gnu/libc-2.31.so 25 | 7f4055eea000-7f4055eed000 r--p 001e7000 08:05 3546309 /usr/lib/x86_64-linux-gnu/libc-2.31.so 26 | 7f4055eed000-7f4055ef0000 rw-p 001ea000 08:05 3546309 /usr/lib/x86_64-linux-gnu/libc-2.31.so 27 | 7f4055ef0000-7f4055ef6000 rw-p 00000000 00:00 0 28 | 7f4055f0a000-7f4055f0b000 r--p 00000000 08:05 3546044 /usr/lib/x86_64-linux-gnu/ld-2.31.so 29 | 7f4055f0b000-7f4055f2e000 r-xp 00001000 08:05 3546044 /usr/lib/x86_64-linux-gnu/ld-2.31.so 30 | 7f4055f2e000-7f4055f36000 r--p 00024000 08:05 3546044 /usr/lib/x86_64-linux-gnu/ld-2.31.so 31 | 7f4055f37000-7f4055f38000 r--p 0002c000 08:05 3546044 /usr/lib/x86_64-linux-gnu/ld-2.31.so 32 | 7f4055f38000-7f4055f39000 rw-p 0002d000 08:05 3546044 /usr/lib/x86_64-linux-gnu/ld-2.31.so 33 | 7f4055f39000-7f4055f3a000 rw-p 00000000 00:00 0 34 | 7ffd0bc85000-7ffd0bca6000 rw-p 00000000 00:00 0 [stack] 35 | 7ffd0bd80000-7ffd0bd83000 r--p 00000000 00:00 0 [vvar] 36 | 7ffd0bd83000-7ffd0bd84000 r-xp 00000000 00:00 0 [vdso] 37 | 
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall] 38 | ``` 39 | 40 | It's a lot of stuff, so let's break it down. The first column is the start address and end address of that memory region; the size of the region is the difference between the two. The second column is the permissions (`r` for read, `w` for write, `x` for execute; the trailing `p` means the mapping is private to this process). The sixth column is the name of that mapping, which is not required. 41 | 42 | > Note: it is common in computer science that if something does not have a name it is called "anonymous." Sometimes that name will be shortened to just "anon." 43 | 44 | I won't talk about the other columns (the file offset, device, and inode) because they aren't important for our example. 45 | 46 | ## Important Maps 47 | Using the output from the command above, we can make a little reduced table of the output: 48 | 49 | | Address Range | Permissions | Name | 50 | |---------------------------|-------------|----------------| 51 | | 55ce021ae000-55ce021b0000 | r--p | /usr/bin/sleep | 52 | | 55ce021b8000-55ce021b9000 | rw-p | /usr/bin/sleep | 53 | | 55ce021b0000-55ce021b4000 | r-xp | /usr/bin/sleep | 54 | | 55ce03d2d000-55ce03d4e000 | rw-p | heap | 55 | | 7ffd0bc85000-7ffd0bca6000 | rw-p | stack | 56 | 57 | We will refer to these as maps 1 through 5. 58 | 59 | ## Program Memory 60 | Maps 1, 2, and 3 refer to the memory of the program. This is fundamentally different from the memory of the process that is running this program. We will talk about processes more in the [processes](../5_processes/indroduction.md) section, but for now you can consider a process to be a bunch of things mapped in memory along with the program. 61 | 62 | Back to our program mapped in memory. Maps 1, 2, and 3 are the memory we talked about in [programs-in-memory](./programs_in_mem.md) with the Minecraft example. You will notice that although all these mappings are for the same file, each split has different permissions. 63 | 64 | Map 1 is a read-only section of the program. 
You could consider this to be the place in the program where unchangeable non-code things are stored. Things like constant strings, png's of Minecraft blocks, and stuff you will not modify while the program runs. 65 | 66 | Map 2 is a read-write section of the program. You could consider this to be the place in the program where you can store and modify things. Things like names could be stored here. The username of your player in Minecraft may change while you play the game (you could change it at Mojang), which means this name is not constant and needs to be writeable. It could be stored here. Usually, these writeable sections in your program have a special name like `.data` or `.bss`. 67 | 68 | ![](./minecraft_username.jpg) 69 | 70 | Map 3 is a read-and-execute section of the program. In modern programs, this is the only mapping in the program that is executable. This is the place where the actual code of the program is stored. In Minecraft, this would be things like the logic for moving around your player, saving the world, placing blocks... everything. It's where the actual code that will be executed is stored. We will learn later that this is where instructions are stored. 71 | 72 | There are also more mappings in the program for other things, but you will notice that their permissions are all iterations of the maps we described above. 73 | 74 | ## Heap 75 | The Heap is a section of memory in every process (the stuff mapped with the program) that is dedicated to being a large writeable space. You might be asking yourself: "Why does this need to exist? Why can't we just write everything in the writeable section of the program?" Let's continue to use Minecraft as our running example to answer this question. 76 | 77 | In Minecraft, as you roam around the world you may notice things coming in and out of view. The map, with all its AI and moving objects, is a large piece of data that is changing in memory all the time as you move. 
The amount that will be loaded in memory at any given time is not known ahead of time. You could, for instance, walk into a piece of the map where you built a 10,000-chicken prison. That many chickens would absolutely destroy the program's memory if it could not _expand_. You could also turn up or down the render distance (the view distance), which would increase or decrease the amount of the map loaded at a given time. 78 | 79 | ![](./render_distance.png) 80 | 81 | To make things easy, we made a section of memory that is both large and expandable: the Heap! We put large things in the Heap because the Heap can get bigger or smaller as we need it. As a side note, when you need more space in the heap, the end address of its mapping gets bigger (it expands up). 82 | 83 | ## Stack 84 | The Stack is another section of memory in every process that is dedicated to being a medium-sized writeable space that is **very** fast. How fast? In some benchmarks, allocating on it is around [100x](https://publicwork.wordpress.com/2019/06/27/stack-allocation-vs-heap-allocation-performance-benchmark/) faster than using the heap, but it is also far less secure and has far fewer rules. For now, we will just ask you to believe us when we say the stack is much easier to corrupt than the heap, since you will learn that **very** technically if you finish this handbook and do the [memory errors](https://dojo.pwn.college/challenges/memory) module from pwn.college. 85 | 86 | So we have this faster, less secure section of memory. We can't put everything there because it's insecure, but we can put small things that change often. In Minecraft, your username does not change _that frequently_, so it should not go here. When we say frequently, we mean a few dozen times in a second. 87 | 88 | Something that changes that frequently in Minecraft, but is pretty small, is the space in memory where we store the letters you type in the Multiplayer chat. 
89 | 90 | ![](./minecraft_chat.png) 91 | 92 | The chat constantly gets new letters that exist, then don't exist. For the chat history, we save that in the Heap. But for a single sentence you are about to send to the chat, we put it in the Stack. It changes fast and needs that speed for players to get angry at their respective toasters. 93 | 94 | 95 | ## Conclusion 96 | There is a pattern among the usage of these writeable sections in memory: 97 | 98 | | Name | Data changes | Data is Large | Data changes Often | 99 | |-------------------------|--------------|---------------|--------------------| 100 | | Writeable program space | X | | | 101 | | Heap | X | X | | 102 | | Stack | X | | X | 103 | 104 | You use a program's writeable space when the data is small and does not change much. Heap when it's large. Stack when it's small and rapidly changing. 105 | 106 | With everything you know now, this image of a process's memory should make sense: 107 | ``` 108 | *----*----*----*----*----* 0x0000000000000000 109 | | | 110 | | MINECRAFT | 111 | |------------------------| 112 | | | 113 | | HEAP | 114 | | | 115 | |------------------------| 116 | | | | 117 | | v | 118 | | | 119 | | | 120 | | ^ | 121 | | | | 122 | |------------------------| 123 | | STACK | 124 | *----*----*----*----*----* 0xFFFFFFFFFFFFFFFF 125 | ``` 126 | 127 | When `Minecraft` is running, its program memory stays stationary. The Heap that it uses to store large changing things grows up (toward higher addresses) towards the Stack. The Stack that it uses to store small changing things grows down (toward lower addresses) towards the Heap. All the other things you can see stored in memory in the initial maps we got are other pieces of code and data that assist the main program while it runs. This can be things like external libraries (code others have written that you reuse). 
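To make the memory map from the start of this section a bit more concrete, here is a minimal Python sketch that splits a line of `/proc/<pid>/maps` into its address range, permissions, and name, then computes the size of the region. The helper names (`Mapping`, `parse_maps_line`) are invented for this example, and the sample lines are copied from the `sleep` output shown earlier.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int   # first address of the region
    end: int     # address just past the end of the region
    perms: str   # e.g. "rw-p"
    name: str    # "" for anonymous mappings

    @property
    def size(self):
        return self.end - self.start


def parse_maps_line(line):
    """Split one maps line into the columns discussed in this section."""
    fields = line.split(maxsplit=5)
    start_hex, end_hex = fields[0].split("-")
    name = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(int(start_hex, 16), int(end_hex, 16), fields[1], name)


sample = [
    "55ce021b0000-55ce021b4000 r-xp 00002000 08:05 3540380 /usr/bin/sleep",
    "55ce03d2d000-55ce03d4e000 rw-p 00000000 00:00 0 [heap]",
    "7f4055ef0000-7f4055ef6000 rw-p 00000000 00:00 0",
    "7ffd0bc85000-7ffd0bca6000 rw-p 00000000 00:00 0 [stack]",
]

for line in sample:
    m = parse_maps_line(line)
    print(f"{m.name or 'anon':16} {m.perms} {m.size // 1024:5d} KiB")
```

On a Linux machine you could feed it the lines of `open("/proc/self/maps")` instead of the hard-coded sample to look at the Python interpreter's own memory map.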
-------------------------------------------------------------------------------- /src/4_programming_languages/introduction.md: -------------------------------------------------------------------------------- 1 | # Introduction 2 | 3 | If you are a blossoming computer scientist, it is likely you have heard about the language 4 | [Python](https://www.python.org/about/). With the knowledge of a single language, you might 5 | realize that there must be many different languages to talk to a computer with. What may 6 | be more surprising is that there are distinct groups of languages we call 7 | [programming paradigms](https://en.wikipedia.org/wiki/Programming_paradigm). There are 8 | several programming paradigms, but many are too difficult to understand this early in 9 | your coding career, so we will only focus on [Imperative Programming](https://en.wikipedia.org/wiki/Imperative_programming). 10 | 11 | ## Imperative Programming 12 | 13 | Imperative programming is what many refer to as normal programming. You think in variables 14 | like in math. A simple example is when you want to compute the speed of something. 15 | You are an advanced enough mammal that you can create a formula (a series of symbols) for 16 | this idea: 17 | 18 | ``` 19 | speed = distance / time 20 | ``` 21 | 22 | Now when you want to compute the speed, you simply assign values to the variables 23 | `distance` and `time` that you specify in the formula. These are parameters: 24 | 25 | ``` 26 | compute_speed(distance, time): 27 | speed = distance / time 28 | return speed 29 | ``` 30 | 31 | Finally, you just assign values: 32 | 33 | ``` 34 | compute_speed(10, 1) 35 | ``` 36 | 37 | Which outputs `10`. This is imperative. Variables are assigned, functions return values, 38 | and things work in this linear fashion. All is good. 
If you care to know what else exists, 39 | feel free to read up on [declarative languages](https://en.wikipedia.org/wiki/Declarative_programming), but 40 | it's optional. Now let's talk about the division in imperative languages. 41 | 42 | Two kinds exist: 43 | - Compiled Languages 44 | - Interpreted Languages 45 | 46 | ## Compiled Languages 47 | 48 | It's likely you've also heard of the language [C](https://en.wikipedia.org/wiki/C_(programming_language)). 49 | You must be wondering, how is `C` any different from `Python`? Why is Python more popular now? Which should 50 | I use? These are all valid questions. Let's start by showing you what `C` is, then compare it to Python later. 51 | 52 | `C` is a compiled language. This means the process of compilation must be run on your code after you write 53 | it. According to [ComputerHope](https://www.computerhope.com/jargon/c/compilat.htm): 54 | 55 | > **Compilation** is the process the computer takes to convert a high-level programming language into a machine language that the computer can understand. The software which performs this conversion is called a compiler. 56 | 57 | In many ways, this makes sense. You can't just say English things to the computer. It needs to be translated 58 | into something the computer understands, i.e., binary like `100101011110101...`. You may think this is stupid, 59 | since you already need to convert your high-level ideas into a specific language, then that language gets 60 | translated again. The reason this two-step approach is still more efficient is that it's far easier to write 61 | `C` than it is to write `binary`. 62 | 63 | This means that our code can only work (run) on a computer after it has been compiled into a program. 
64 | Assuming we can convert our high-level idea into C code, our workflow now looks like this: 65 | 66 | ![c_workflow](./c_workflow.jpeg) 67 | 68 | `C Code` -> `Compiler` -> `Binary Code` 69 | 70 | ### DIY C 71 | 72 | Let's quickly do a simple example, even though you may not know how to code yet. Take the code 73 | below and write it into a file with vim: 74 | 75 | ```bash 76 | vim example.c 77 | ``` 78 | 79 | write: 80 | 81 | ```c 82 | #include <stdio.h> 83 | 84 | int main() 85 | { 86 | printf("Hello World, from C!\n"); 87 | } 88 | ``` 89 | 90 | What is the type of the file we just created? Let's take a look with the `file` command: 91 | (don't mind the `▶` shown, that is my custom shell symbol). 92 | 93 | ```bash 94 | ▶ file example.c 95 | example.c: C source, ASCII text 96 | ``` 97 | 98 | It's an [ASCII](https://en.wikipedia.org/wiki/ASCII) text file that the `file` command identified as a C source file. 99 | If you are not familiar with ASCII, it's a way to encode letters as numbers and vice versa. 100 | For example, `65` translates to `A` and the other way around as well. For now, know that this is just a file 101 | full of readable text. 102 | 103 | Now let's **compile** it! [gcc](https://gcc.gnu.org) is the C compiler from the GNU project (the collection of free software tools that most Linux systems ship with). Let's use it: 104 | 105 | ```bash 106 | gcc example.c -o example 107 | ``` 108 | 109 | We used `gcc` and told it to name the output `example`. You should now see it in the same directory: 110 | 111 | ```bash 112 | ▶ ls 113 | example example.c 114 | ``` 115 | 116 | Now what is the type of `example`? Let's check: 117 | 118 | ```bash 119 | ▶ file example 120 | example: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=aeac59bbc04b9845665af0406044181d241f31f6, not stripped 121 | ``` 122 | 123 | Whoa. That is a lot of information. For now, let's talk about the first segment of text: 124 | `example: ELF 64-bit LSB shared object`. 
This means `example` is a 64-bit ELF. What is an ELF? 125 | Not a magical creature, but an executable program! ELF stands for Executable and Linkable Format, and anything that is an `ELF` is a compiled 126 | program that is already in the computer's language and ready to run. Let's run it! 127 | 128 | ```bash 129 | ▶ ./example 130 | Hello World, from C! 131 | ``` 132 | 133 | Amazing, it executed as expected. 134 | What else is an ELF? Almost everything we use in the shell. Take for example `ls`: 135 | 136 | ```bash 137 | ▶ file /bin/ls 138 | /bin/ls: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=9567f9a28e66f4d7ec4baf31cfbf68d0410f0ae6, stripped 139 | ``` 140 | 141 | Yup, it's an ELF. We will go deeper into ELFs in the [computer organization](../3_computer_organization/introduction.md) section. 142 | For now, know it's a compiled program. There are many compiled languages out there. C and C++ are currently 143 | the most used. Here are some other popular ones: 144 | 145 | - [C++](https://www.w3schools.com/cpp/cpp_intro.asp) 146 | - [Rust](https://www.rust-lang.org/) 147 | - [Go](https://golang.org/) 148 | - [Swift](https://developer.apple.com/swift/) 149 | 150 | ## Interpreted Languages 151 | 152 | Unlike `C`, Python is not compiled (or at least not on the surface). Python is interpreted line-by-line. 153 | This is really nice when you are working fast, because when you compile, if there is one mistake in your 154 | code, the entire program will not compile, which means you now have nothing but a C file. In Python, 155 | many mistakes (such as a misspelled variable name) go unnoticed if execution never reaches that section of the code. In addition, you can 156 | execute entire lines of code independently. This is what makes an interpreted language different from 157 | a compiled one. 158 | 159 | With interpreted languages, the source code is fed into an interpreter, which then executes the code. 
160 | So similarly to compiled languages, we have a middle man that does some kind of translation; except 161 | now it's on a line-by-line basis. Let's look at it closer, on the DIY scale. 162 | 163 | ### DIY Python 164 | Like last time put this in a file: 165 | 166 | ```bash 167 | vim example.py 168 | ``` 169 | 170 | ```python 171 | #!/usr/bin/env python3 172 | print("Hello World, from Python!") 173 | ``` 174 | 175 | As you can probably tell already, Python is much more concise. This is usually true when comparing 176 | an interpreted language to a compiled one. Now let's see what the type of this file is: 177 | 178 | ```bash 179 | ▶ file example.py 180 | example.py: Python script, ASCII text executable 181 | ``` 182 | 183 | Yup, like last time it's an ASCII file identified as Python code. Now to make this executable we need 184 | to give it the executable permission: 185 | 186 | ```bash 187 | chmod +x example.py 188 | ``` 189 | 190 | Then we can finally execute it like we did with the earlier example: 191 | 192 | ```bash 193 | ▶ ./example.py 194 | Hello World, from Python! 195 | ``` 196 | 197 | Now, you must be wondering, how the hell did we just run a file full of readable text? 198 | Based on what we just learned, that should be impossible. It should have to go through some 199 | sort of middle-man that converts the code into a machine language. Well some magic happened 200 | that we did not see: 201 | 202 | ```python 203 | #!/usr/bin/env python3 204 | ``` 205 | 206 | That first line we wrote told the shell to execute this file using the `python3` command. 207 | This is equivalent to running the program like this: 208 | 209 | ```bash 210 | ▶ python3 example.py 211 | Hello World, from Python! 212 | ``` 213 | 214 | And what the hell is `python3`? I think you know... 
215 | Note: on this system, `python3` is a symbolic link to `python3.6` 216 | 217 | ``` 218 | ▶ file /usr/bin/python3.6 219 | /usr/bin/python3.6: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=fc614aa299a924960da33b875fb9cfaa641ea5bc, stripped 220 | ``` 221 | 222 | Yup, it's an `ELF`. This means that the interpreter is indeed a compiled program written in 223 | a compiled language (C in this case). Things make sense again. Since we know the 224 | interpreter is magical, let's take this one step further. We can actually execute 225 | the interpreter without any initial code: 226 | 227 | ```bash 228 | ▶ python3 229 | Python 3.6.9 (default, Oct 8 2020, 12:12:24) 230 | [GCC 8.4.0] on linux 231 | Type "help", "copyright", "credits" or "license" for more information. 232 | >>> 233 | ``` 234 | 235 | Executing `python3` alone should put you in some kind of prompt now. It's ready 236 | to execute things line by line. Try it: 237 | 238 | ```python 239 | >>> print("hello world") 240 | hello world 241 | >>> print("wow so cool") 242 | wow so cool 243 | >>> print(4 + 5) 244 | 9 245 | ``` 246 | 247 | Pretty cool, right? It executes things line by line and can even use variables you define. 248 | As you can see, an interpreted language is ultimately run by a compiled program. At the end 249 | of the day, the computer only understands machine language. Remember this. 250 | 251 | Here are some popular interpreted languages: 252 | - [Python](https://www.python.org/about/) 253 | - [JavaScript](https://www.javascript.com/) 254 | - [Ruby](https://www.ruby-lang.org/en/) 255 | - [Bash](https://www.gnu.org/software/bash/manual/html_node/What-is-Bash_003f.html) 256 | 257 | Yes, the shell you have been using is indeed an interpreted language. 
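The idea that Python "does not care" about broken code it never reaches can be tried directly. Below is a small sketch (the function name `greet` and the misspelled variable `mesage` are made up for the demonstration) where one branch uses a name that does not exist. Python only complains when that branch actually runs.

```python
def greet(use_broken_branch):
    if use_broken_branch:
        # `mesage` is a deliberate typo; Python only notices when this line runs
        return mesage
    return "Hello from the working branch"


# The broken branch is never reached, so this works fine.
print(greet(False))

# Now force execution into the broken branch.
try:
    greet(True)
except NameError as err:
    print("only failed once reached:", err)
```

One caveat: this applies to mistakes that are found while running, like a misspelled name. A syntax error (say, a missing parenthesis) is still reported before any line of the file executes, because Python has to parse the whole file first.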
258 | 259 | ## Speed Comparison 260 | 261 | The last thing to mention is that, because interpreted programs are executed line-by-line, 262 | compiled languages are often **much** faster than interpreted languages. For instance, 263 | let's compare two programs that do the exact same thing in two languages. 264 | 265 | You know the deal: 266 | 267 | ```bash 268 | vim test.c 269 | ``` 270 | 271 | ```c 272 | #include <stdio.h> 273 | 274 | int main() 275 | { 276 | int temp = 0; 277 | for(int i = 0; i < 1000000000; i++) 278 | temp += 1; 279 | return temp; 280 | } 281 | ``` 282 | 283 | ```bash 284 | gcc test.c -o test 285 | ``` 286 | 287 | Now let's write the Python version: 288 | ```bash 289 | vim test.py 290 | ``` 291 | 292 | ```python 293 | #!/usr/bin/env python3 294 | def main(): 295 | temp = 0 296 | for i in range(1000000000): 297 | temp += 1 298 | return temp 299 | 300 | main() 301 | ``` 302 | 303 | ```bash 304 | chmod +x test.py 305 | ``` 306 | 307 | Now we have two versions, `test.py` and `test` -- a compiled vs interpreted race. 308 | Both of these programs run the action `"increment temp"` **1 billion times**. 309 | For the C program, this is nothing, but for Python... well. Let's time it: 310 | 311 | ```bash 312 | ▶ time ./test 313 | ./test 2.28s user 0.00s system 99% cpu 2.285 total 314 | ``` 315 | 316 | The C program took `2.28` seconds. Let's time the Python one (wait for it!): 317 | 318 | ```bash 319 | ▶ time ./test.py 320 | ./test.py 37.60s user 0.01s system 99% cpu 37.620 total 321 | ``` 322 | 323 | The Python program took `37.60` seconds. Wow. Your times will probably be different, 324 | but the gap should still be large. In this case, `C` was about **16 times 325 | faster**. You can see how this gets bad on a larger scale, but I digress. 326 | 327 | Let's move on to using these languages. 
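If you would rather not wait for a billion iterations, the same measurement can be sketched from inside Python with a smaller loop count. This is only an illustration: the function names `count_to` and `time_count` are invented here, and one million iterations is an arbitrary choice that finishes quickly.

```python
import time


def count_to(n):
    """Same work as test.py: increment temp n times."""
    temp = 0
    for _ in range(n):
        temp += 1
    return temp


def time_count(n):
    """Return the loop result and how many seconds it took."""
    start = time.perf_counter()
    result = count_to(n)
    elapsed = time.perf_counter() - start
    return result, elapsed


result, elapsed = time_count(1_000_000)
print(f"{result} increments took {elapsed:.3f}s")

# The speedup quoted in this section comes straight from the two timings:
ratio = 37.60 / 2.28
print(f"C was about {ratio:.1f}x faster in that run")
```

Multiplying the printed time by 1000 gives a rough estimate of how long the full one-billion version would take on your machine.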
328 | 329 | 330 | 331 | 332 | 333 | -------------------------------------------------------------------------------- /src/3_computer_organization/instructions.md: -------------------------------------------------------------------------------- 1 | # Instructions 2 | 3 | ## Introduction 4 | So you've learned where to store numbers in assembly: [registers](./registers.md). Your next logical question would be: **how** do I store values in those registers? Good question. You store them using _instructions_. 5 | 6 | Instructions are _atomic_ pieces of logic that run in order on the CPU. That sounds complicated, so let's break it down. Here is an example of something we want to do: 7 | 8 | ```c 9 | x = 10 10 | ``` 11 | 12 | In assembly, where `x` is `rax`, this translates to: 13 | 14 | ```c 15 | mov rax, 10 16 | ``` 17 | 18 | > It's important to note that this format of x86 is in the flavor of Intel. There are two flavors: [Intel and AT&T](https://imada.sdu.dk/~kslarsen/dm546/Material/IntelnATT.htm). For all the challenges and this handbook, we will be using Intel format. 19 | 20 | This instruction is atomic because nothing can interrupt it and it is the lowest level of logic on a computer. In contrast, we could show this operation: 21 | 22 | ```c 23 | x = x + 10 24 | ``` 25 | 26 | The thing we are asking to do above is not atomic. It is actually composed of 3 parts (normally): 27 | 28 | ```c 29 | mov rbx, rax // make a temp for x 30 | add rbx, 10 // add 10 to that temp 31 | mov rax, rbx // move the temp back into x 32 | ``` 33 | 34 | You will also notice that these instructions execute one after another. They are linearly executed. 35 | 36 | ## Instruction Syntax 37 | 38 | This syntax may be confusing, but most instructions follow the same format: 39 | ``` 40 | <operation> <destination>, <source> 41 | ``` 42 | 43 | So saying: `mov rax, rbx` means `move rbx to rax`. 44 | 45 | There are some other more subtle things in this syntax, like the use of `[]` in instructions. 
46 | 47 | > Recall: when we say [0x400000] this refers to the data at the address 0x400000. Review [memory](./memory.md) for a recap. 48 | 49 | For instance: 50 | ``` 51 | mov rax, [rbx] 52 | ``` 53 | 54 | This means move **the value at the address stored in rbx** to rax. Other blogs and write-ups will usually refer to this process as dereferencing rbx. 55 | 56 | 57 | ## Instruction Execution 58 | 59 | In the [registers](./registers.md) section we talked a little about special registers. Now it's time to talk about the most important of those special registers: **rip** (also referred to as ip). 60 | 61 | IP in assembly land refers to the Instruction Pointer register. Nearly every architecture has an equivalent register (it is often called the program counter). This register is responsible for storing the address of the instruction we are supposed to be executing right now. Normally, the instructions that make up a program are laid out one after another in memory. Here is how a typical memory layout full of instructions could look: 62 | 63 | ```c 64 | 000000000000112d
: 65 | 112d: push rbp 66 | 112e: mov rbp,rsp 67 | 1131: mov [rbp-0x8], 0x0 68 | 1138: mov [rbp-0x4], 0x4 69 | 113f: mov eax, [rbp-0x4] 70 | 1142: add eax, 0x5 71 | 1145: mov [rbp-0x8], eax 72 | 1148: mov eax, [rbp-0x8] 73 | 114b: imul eax, [rbp-0x4] 74 | 114f: mov [rbp-0x8], eax 75 | ``` 76 | 77 | There are some important things to note here. First, you can dereference registers while adding or subtracting an offset to it like in `[rbp - 0x4]`. Second, the address each instruction is associated with is not singly incremental. Notice how the difference in address between some instructions is `7`, while others are only `1`. You may have guessed it, but the difference in addresses for each instruction is based on that instruction's size. 78 | 79 | Each instruction is composed of bytes that encode it. Here is the same code from above, but printed with its encoding: 80 | 81 | ```c 82 | 000000000000112d
: 83 | 112d: 55 push rbp 84 | 112e: 48 89 e5 mov rbp,rsp 85 | 1131: c7 45 f8 00 00 00 00 mov [rbp-0x8], 0x0 86 | 1138: c7 45 fc 04 00 00 00 mov [rbp-0x4], 0x4 87 | 113f: 8b 45 fc mov eax, [rbp-0x4] 88 | 1142: 83 c0 05 add eax, 0x5 89 | 1145: 89 45 f8 mov [rbp-0x8], eax 90 | 1148: 8b 45 f8 mov eax, [rbp-0x8] 91 | 114b: 0f af 45 fc imul eax, [rbp-0x4] 92 | 114f: 89 45 f8 mov [rbp-0x8], eax 93 | ``` 94 | 95 | There are a lot of different rules for encoding instructions, such as their type and operation, but I won't be talking about how you can encode instructions by hand in this handbook. If you are interested, check [this out](https://wiki.osdev.org/X86-64_Instruction_Encoding#Opcode). 96 | 97 | If you are curious about how an instruction encodes into its bytes (or the other way around), use [this site](https://defuse.ca/online-x86-assembler.htm) to encode and decode x86 instructions as you like. I use it often for CTFs since it's so easy to use. 98 | 99 | Now back to our earlier discussion, the instruction pointer. Execution of instructions follows the [fetch-and-execute](https://en.wikipedia.org/wiki/Instruction_cycle) cycle: 100 | 1. Get the instruction at the address of the ip 101 | 2. Decode it 102 | 3. Execute it 103 | 4. Add the size of the current instruction to the ip 104 | 5. Repeat 105 | 106 | So if in our previous example we are about to execute `mov [rbp-0x8], 0x0`, that means that `rip = 0x1131`. This also means that `[0x1131]` holds the bytes of the instruction `mov [rbp-0x8], 0x0`. 107 | 108 | The last thing to know about `rip`, and `ip` in general, is that you are not allowed to modify this register yourself. Obviously, simply executing instructions changes `rip`, but you are not allowed to write to it directly with something like: 109 | ``` 110 | mov rip, 0x1138 111 | ``` 112 | 113 | That is an illegal instruction. 114 | 115 | ## Common Instructions 116 | So you know instructions can do things, but what kind of instructions exist? 
Here are the most common instructions you will use/see in the wild: 117 | 118 | > Note: when you see `<x or y>` it means that thing could be an `x` or a `y`; `C` means a constant, like `10`; `stack` means the stack (a region of memory) and is represented by a list. 119 | 120 | ### Math Operations 121 | 122 | | Mnemonic | Arguments | Description | Python Equiv | | 123 | |----------|---------------------------|------------------------------------------------------|--------------|---| 124 | | add | r1, `<r2 or C>` | Adds r2 to r1. | r1 += r2 | | 125 | | sub | r1, `<r2 or C>` | Subtracts r2 from r1. | r1 -= r2 | | 126 | | idiv | r1 (dividend in rdx:rax) | Divides the value in rdx:rax by r1. Quotient in rax, remainder in rdx. | rax = n // r1; rdx = n % r1 (n = rdx:rax) | | 127 | | imul | r1, `<r2 or C>` | Multiplies r1 by r2. | r1 *= r2 | | 128 | 129 | > Note: modulo can be accomplished with idiv and reading the value in rdx. 130 | 131 | The math operations above can be done in both a [signed and an unsigned](https://math.libretexts.org/Bookshelves/PreAlgebra/Book%3A_Fundamentals_of_Mathematics_(Burzynski_and_Ellis)/10%3A_Signed_Numbers/10.06%3A_Multiplication_and_Division_of_Signed_Numbers) way. This means a value can be treated as possibly negative (signed) or as always non-negative (unsigned), which changes how the result should be interpreted. `imul` and `idiv` are the signed versions; `mul` and `div` are their unsigned counterparts. 132 | 133 | > Recall: in [bits-and-logic](./bits_and_logic.md) we saw that we can represent negative values in x86 using [Two's Complement](https://en.wikipedia.org/wiki/Two%27s_complement), which sets the most significant bit of a value to `1` when it is negative. This affects how instructions output values. 
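To see how the same 64 bits can be read as either a signed or an unsigned number, here is a short Python sketch. The helper names (`to_signed`, `to_unsigned`, `idiv_model`) are invented for this example. `idiv_model` is only a rough model of the arithmetic `idiv` performs and assumes the dividend fits in a plain Python integer rather than the real `rdx:rax` pair.

```python
def to_signed(value, bits=64):
    """Read a raw register value as a two's complement signed number."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):          # top bit set means negative
        return value - (1 << bits)
    return value


def to_unsigned(value, bits=64):
    """Read a (possibly negative) Python int as the raw bits in a register."""
    return value & ((1 << bits) - 1)


def idiv_model(dividend, divisor):
    """Signed division that truncates toward zero, the way idiv does.

    Note that Python's own // rounds toward negative infinity instead.
    """
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - quotient * divisor
    return quotient, remainder


print(hex(to_unsigned(-1)))              # all 64 bits set
print(to_signed(0xFFFFFFFFFFFFFFFF))     # the same bits read as signed
print(idiv_model(-7, 2), (-7 // 2, -7 % 2))
```

The last line prints the `idiv`-style result next to Python's own `//` and `%`, which differ for negative inputs.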
134 | 135 | ### Logic Operations 136 | 137 | | Mnemonic | Arguments | Description | Python Equiv | 138 | |----------|---------------|-------------------------------------|--------------| 139 | | and | r1, `<r2 \| C>` | Bitwise ANDs r1 with r2 | r1 &= r2 | 140 | | or | r1, `<r2 \| C>` | Bitwise ORs r1 with r2 | r1 \|= r2 | 141 | | xor | r1, `<r2 \| C>` | Bitwise XORs r1 with r2 | r1 ^= r2 | 142 | | not | r1 | Flips every bit of r1 | r1 = ~r1 | 143 | 144 | ### Storage Operations 145 | 146 | | Mnemonic | Arguments | Description | Python Equiv | 147 | |----------|---------------|---------------------------------------------|------------------| 148 | | mov | r1, `<r2 \| C>` | Copies value in r2 and stores it in r1 | r1 = r2 | 149 | | lea | r1, [r2 + C] | Stores computed address of r2+C in r1 | r1 = r2 + C | 150 | | push | `<r1 \| C>` | Places r1 on the top of the stack | stack += [r1] | 151 | | pop | r1 | Removes value on top of stack, places in r1 | r1 = stack.pop() | 152 | 153 | You don't know what the stack is yet, but we will get to it in the [asm-memory](./asm_memory.md) section. 154 | 155 | --- 156 | 157 | There is actually one more set of instructions we need to cover, and that's _control flow operations_, or operations that change the execution of the program (alter ip). They are so important that they get their own header. 158 | 159 | ## Control Flow Instructions 160 | 161 | You understand how to do things linearly, but that's boring. You don't always want to do things so linearly. You often want conditions! Something like: 162 | 163 | ```python 164 | if x % 2 == 0: 165 | y = x + 1 166 | else: 167 | y = x 168 | ``` 169 | 170 | In x86 you represent these types of things with conditional jumps. 171 | 172 | ### Jump Instructions 173 | 174 | All jump instructions start with a `j`, go figure. Normal jumps, called unconditional jumps, look like this: 175 | ``` 176 | jmp 0xdeadbeef 177 | ``` 178 | 179 | Where you replace 0xdeadbeef with an address of some sort.
This also works with registers: `jmp rax` jumps to the address held in `rax`, and `jmp [rax]` jumps to an address read from the memory that `rax` points to, which is a very valid thing too and introduces its own complexities. Conditional jumps start with a `j` and end with some mnemonic to signify what they are dependent on. As an example, you have _jump if less than or equal_: 180 | ``` 181 | jle 0xdeadbeef 182 | ``` 183 | 184 | `jle` is dependent on something called the [flags register](https://en.wikipedia.org/wiki/FLAGS_register), which is altered by instructions that perform comparisons and tests. The two most common comparisons are: 185 | ``` 186 | cmp r1, <r2 | C> 187 | test r1, r2 188 | ``` 189 | 190 | `cmp` will subtract `r2` from `r1` to tell the difference. `test` will `AND` the two to tell the difference. Neither one keeps the result; only facts about it (such as whether it was zero or negative) are stored in the flags register. The most common format of their use is like so: 191 | ``` 192 | cmp rax, rdx 193 | jle addr2 194 | addr1: 195 | mov rbx, 1 196 | jmp addr3 197 | addr2: 198 | mov rbx, 0 199 | addr3: 200 | mov rax, rbx 201 | ``` 202 | 203 | Names like `addr1` here are labels. You can place labels anywhere in assembly and use them in jump instructions later. These labels will be converted into relative jumps at the time the code is assembled (before it is run). 204 | 205 | **RECAP**: 206 | - Unconditional Jump: `jmp address` 207 | - List of all conditional jumps: [here](http://unixwiz.net/techtips/x86-jumps.html) 208 | - Register flags: [here](https://riptutorial.com/x86/example/6976/flags-register) 209 | - cmp instruction: [here](https://www.aldeid.com/wiki/X86-assembly/Instructions/cmp) 210 | - test instruction: [here](https://en.wikipedia.org/wiki/TEST_(x86_instruction)) 211 | 212 | 213 | ### Call Instructions 214 | 215 | Finally, we have the last subset of instructions, and that's call-related instructions. They are a subset of control flow altering instructions and they work very much like jumps. There are two instructions: 216 | 1. [call](https://www.felixcloutier.com/x86/call) 217 | 2.
[ret](https://www.felixcloutier.com/x86/ret) 218 | 219 | Call works like this: 220 | ``` 221 | call <addr | label | [reg]> 222 | ``` 223 | 224 | So you can call an address, label (like in the jumps), or a dereferenced register (`[rax]`). When you call something, once the instruction is decoded it actually does two things, a push and a jump: 225 | ``` 226 | call addr: 227 | 1. decode instruction 228 | 2. push (rip + current_instruction_size) 229 | 3. jmp addr 230 | ``` 231 | 232 | You still don't know what the stack is, but know that it's somewhere you can save stuff, just like normal memory. If you push something on the stack, it is now saved on the stack until a corresponding pop. So, from this, we can extrapolate that a call instruction does a jump while saving the original next address on the stack. This save is for the corresponding instruction `ret`. The `ret` instruction takes optional args, but for now we will consider it takes nothing: 233 | ``` 234 | ret 235 | ``` 236 | 237 | The ret instruction does the following: 238 | ``` 239 | ret: 240 | 1. pop rip 241 | ``` 242 | 243 | So it directly modifies `rip` by taking whatever is on top of the stack and putting it into `rip`. So, in normal code, you can make a region of code you can reuse many times, called a _function_: 244 | ```c 245 | // args in rdi, output in rax 246 | make_even: 247 | mov rax, rdi // dividend goes in rax 248 | cqo // sign-extend rax into rdx:rax 249 | mov rcx, 2 // divisor 250 | idiv rcx // rax = quotient, rdx = remainder 251 | cmp rdx, 0 // remainder 0 means rdi is already even 252 | je make_even_ret 253 | add rdi, 1 254 | make_even_ret: 255 | mov rax, rdi 256 | ret 257 | 258 | _start: 259 | mov rdi, 10 260 | call make_even 261 | mov rdi, rax 262 | ... 263 | ``` 264 | 265 | This code above shows off the power of the `call`, `ret` combo, allowing you to return to execution after you do some action with registers and values. It also shows how to make a function, which we will cover more in [control-structures](./control_structures.md). 266 | 267 | 268 | ## Conclusion 269 | 270 | There are many instructions that make up the x86-64 architecture. It's actually one of the largest instruction sets.
You can find all instructions at the [felixcloutier site](https://www.felixcloutier.com/x86/), which I often use for references of hard-to-remember instructions. --------------------------------------------------------------------------------