# Objective-C ABI Knowledge Base

_**2023 Update:** I am not actively working on projects related to
Objective-C anymore, and this document has not been updated in quite some
time. Some of the terminology and other details here have always been wrong;
other parts have aged out since it was written. Some portions of this document
remain accurate, but in general, it would be wise to verify any claims made
below._

---

This repository is meant to serve as a continuously-growing knowledge base
regarding the Objective-C ABI. It is a **work in progress** and is currently
just my personal observations and notes. It is likely that there are errors
and/or misunderstandings. Pull requests with corrections or new knowledge are
encouraged.

NOTE: In examples where structures/types are shown, some names have been
modified for clarity. Structure and member names shown do not always match
Objective-C's source code or their canonical names.

## Pointer types

The Objective-C ABI uses several different encoding techniques for pointers.
Which technique is used depends on the part of the ABI in question, the
architecture of the binary, and the OS version the binary was compiled for.
Objective-C's internal structures make heavy use of (wacky) pointers, so
understanding the different pointer types is important before looking at
structures in detail.

The following table is a **rough outline** of how pointers are typically
encoded, given an OS version and architecture:

| OS        | Architecture | Tagged | Type           |
|-----------|--------------|--------|----------------|
| macOS     | x86_64       | No     | Absolute       |
| macOS 11  | arm64        | Yes    | Absolute       |
| macOS 12+ | arm64e       | Yes    | Image-relative |

The table above is **not** all-encompassing and ignores certain scenarios; use
it for a general understanding, not for writing tools.

### Absolute (C-style) pointers

Absolute (C-style) pointers are sometimes used, but are becoming less common.
They look like "normal" pointers (shown below) and don't have any quirks. As of
2022, absolute pointers are mostly used only on x86_64.

```
0x1000311a0
0x100031240
```

### Image-relative pointers

**Image-relative pointers** are really offsets relative to the image base.
Assuming a standard Mach-O image base of 4 GB (`0x100000000`), the
image-relative pointer `0x25207` is equivalent to `0x100025207`. Image-relative
pointers can be combined with some of the other pointer types detailed below,
e.g. a tagged pointer.

### Tagged pointers

**Tagged pointers**, used in arm64(e) binaries, are pointers that carry
metadata with them in their upper bits. They look like the following:

```
0x800d6ae1000331c8
0x800000002c1d8
```

The metadata portion of the pointer can be removed by performing a bitwise AND
against `0x7ffffffff`, which keeps only the low 35 bits.

```
0x800d6ae1000331c8 & 0x7ffffffff = 0x1000331c8
0x800000002c1d8 & 0x7ffffffff = 0x2c1d8
```

This illustrates an important point: even after removing the metadata, the
resulting pointer can be **absolute** (the first example) or
**image-relative** (the second example).
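
The two steps described above (stripping the tag, then deciding whether the
result is absolute or image-relative) can be sketched in C. This is only a
rough illustration: the helper names `strip_tag` and `resolve_pointer` are
made up for this example, and the rule "anything below the image base is
image-relative" is an assumption of the sketch, not something the ABI
guarantees.

```c
#include <stdint.h>

/* Mask from the text above: keeps the low 35 bits and drops the tag. */
#define TAG_STRIP_MASK 0x7ffffffffULL

/* Standard 4 GB Mach-O image base used in the examples above. */
#define DEFAULT_IMAGE_BASE 0x100000000ULL

/* Remove the metadata carried in the upper bits of a tagged pointer. */
static uint64_t strip_tag(uint64_t raw)
{
    return raw & TAG_STRIP_MASK;
}

/*
 * Hypothetical helper: strip the tag, then rebase the value if it looks
 * image-relative. ASSUMPTION: a stripped value smaller than the image base
 * is an offset from that base; larger values are already absolute.
 */
static uint64_t resolve_pointer(uint64_t raw, uint64_t image_base)
{
    uint64_t value = strip_tag(raw);
    return value < image_base ? image_base + value : value;
}
```

With the values from this section, `resolve_pointer(0x800000002c1d8,
DEFAULT_IMAGE_BASE)` yields `0x10002c1d8`, while `0x800d6ae1000331c8` is
already absolute after stripping and stays `0x1000331c8`.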

### Fast pointers

_WARNING: This section is likely incomplete._

**Fast pointers** are a type of pointer often used by Swift's ABI, which
overlaps Objective-C's ABI. Fast pointers store metadata in the two least
significant bits of the pointer. These two bits should be removed before
attempting to dereference the pointer.

Here is an example of a fast pointer, taken from the arm64e slice of the main
binary of macOS 12's "Console.app":

```
0x20000000096d62
```

The pointer is fast, tagged, and relative. After removing the tag and adding
the image base, the following pointer is produced:

```
0x100096d62
```

However, if we dereferenced this pointer as is, we would be (incorrectly)
pointing into the middle of the following structure:

```c
100096d60  struct class_ro_t ro__TtC7Console21ReportsViewController = {
100096d60      uint32_t flags = 0x184
100096d64      uint32_t start = 0x48
100096d68      uint32_t size = 0x68
100096d6c      uint32_t reserved = 0x0
100096d70      void* ivar_layout = NULL
100096d78      void* name = nm__TtC7Console21ReportsViewController
100096d80      void* methods = ml__TtC7Console21ReportsViewController
100096d88      void* protocols = NULL
100096d90      void* vars = 0x10008c068
100096d98      void* weak_ivar_layout = NULL
100096da0      void* properties = 0x10008c0f0
100096da8  }
```

> The definition of this structure isn't really important here; it is only used
> to illustrate the example.

After removing the flags in the two least significant bits, we get the correct
pointer, which points to the structure's base:

```
0x100096d62 & (~0b11) = 0x100096d60
```

## Types and structures

This section details structures used by the Objective-C ABI.
The structure layouts shown in the following subsections represent how
structures are laid out _at rest_, i.e. in a binary, not necessarily how they
are laid out in memory at runtime.

### Classes

**Class** structures are defined in the `__objc_data` section and are the basis
for how classes are stored inside the binary. Class structures have the
following layout:

```c
struct class_t {
    const void* isa;
    const struct class_t* super;   /* Superclass' `class_t` structure */
    void* cache;                   /* Commonly `NULL` */
    void* vtable;                  /* Commonly `NULL` */
    const struct class_ro_t* data; /* Associated class RO structure */
};
```

> Any of the members which are pointers may be tagged or image-relative. The
> `data` member may be a fast pointer.

### Class RO

**Class RO** (read-only) structures are defined inside of the `__objc_const`
section. Most of the "interesting" information about classes (such as the name,
methods, and instance variables) is stored in class RO structures. The layout
of class RO structures is as follows:

```c
struct class_ro_t {
    uint32_t flags;                      /* Flags */
    uint32_t start;
    uint32_t size;
    uint32_t reserved;                   /* Reserved for future use */
    const void* ivar_layout;
    const char* name;                    /* Class name */
    const struct method_list_t* methods; /* Base method list */
    const void* protocols;
    const void* vars;
    const void* weak_ivar_layout;
    const void* properties;
};
```

> Any of the members which are pointers may be tagged or image-relative.

### Method lists

**Method lists** are found in the `__objc_const` section on x86_64, or in the
`__objc_methlist` section on arm64(e). Method lists describe all of the base
methods associated with a class.
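
A class's method list is reached by following `class_t.data` to the class RO
structure and then reading its `methods` member. The first hop is where the
pointer encodings from "Pointer types" come into play, since `data` may be
fast, tagged, and image-relative at once. The following is a minimal sketch of
that decoding, not a definitive implementation: the helper names are invented
for this example, and treating any value below the image base as
image-relative is an assumption of the sketch.

```c
#include <stdint.h>

/* Tag mask from the "Tagged pointers" section (keeps the low 35 bits). */
#define CLASS_DATA_TAG_MASK 0x7ffffffffULL

/* Fast pointers keep flags in the two least significant bits. */
#define FAST_FLAG_MASK 0x3ULL

/* Hypothetical helper: return the fast-pointer flag bits of a raw value. */
static uint64_t class_data_fast_flags(uint64_t raw)
{
    return raw & FAST_FLAG_MASK;
}

/*
 * Hypothetical helper: turn a raw `class_t.data` value into the address of
 * the class RO structure.
 */
static uint64_t decode_class_data(uint64_t raw, uint64_t image_base)
{
    uint64_t value = raw & CLASS_DATA_TAG_MASK; /* drop upper-bit metadata */
    value &= ~FAST_FLAG_MASK;                   /* drop fast-pointer flags */
    if (value < image_base)                     /* ASSUMPTION: image-relative */
        value += image_base;
    return value;
}
```

Applied to the Console.app example, `decode_class_data(0x20000000096d62,
0x100000000)` gives `0x100096d60`, the base of the `class_ro_t` structure
shown under "Fast pointers".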

#### Header

A method list begins with a **method list header**, which has the following
format:

```c
struct method_list_t {
    uint32_t size_and_flags; /* Entry size and flags */
    uint32_t count;          /* Number of entries */
};
```

The `size_and_flags` field gives the size of each entry, and optionally carries
flags in its high bits. Flags can be isolated by performing a bitwise AND of the
`size_and_flags` field with `0xffff0000`. The following flags may be present in
the `size_and_flags` field:

| Flag                   | Value        |
|------------------------|--------------|
| `HAS_RELATIVE_OFFSETS` | `0x80000000` |
| `HAS_DIRECT_SELECTORS` | `0x40000000` |

The `HAS_RELATIVE_OFFSETS` flag indicates whether the pointers in the method
list's entries should be treated as absolute pointers or relative offsets
(explained in more detail below).

The `HAS_DIRECT_SELECTORS` flag indicates what the name/selector field in the
method list's entries points to. If the flag is set, the field points directly
to a string; if the flag is unset, it points to a selector reference.

#### Entries

Immediately following the method list's header come one or more **entries**.
Entries may have either of the following layouts:

```c
struct method_t {
    const char* name;  /* Pointer to name (or selector reference?) */
    const char* types; /* Pointer to type info */
    void* imp;         /* Pointer to implementation (code) */
};

struct method_entry_t {
    int32_t name;  /* Relative offset to name or selector reference */
    int32_t types; /* Relative offset to type info */
    int32_t imp;   /* Relative offset to implementation (code) */
};
```

The former entry layout (hereafter the "legacy" format) is older and is
primarily used on x86_64.
The latter format (hereafter the "modern" format) is newer and is used on
arm64(e).

The modern format always uses **relative offsets** (not to be confused with
image-relative pointers) to point to its associated data. Each offset is
relative to the address of the structure member that holds it, not to the start
of the entry or of the image. For example, a `name` field located at
`0x10001f3b8` with the value `0x12498` refers to `0x10001f3b8 + 0x12498 =
0x100031850`.

To illustrate the difference between the two formats, have a look at the method
list for the `BitFieldBox` class in macOS 12's Calculator.app. Below is an
excerpt of the method list from the x86_64 slice, which uses the legacy format:

```c
100028198  struct method_list_t ml_BitFieldBox = {
100028198      uint32_t size_and_flags = 0x18  /* No flags; 24-byte entries */
10002819c      uint32_t count = 0x4            /* 4 methods (only 1 shown here) */
1000281a0  }
1000281a0  struct method_t mt_initWithFrame_ = {
1000281a0      const char* name = 0x10001c6e8   /* &"initWithFrame:" */
1000281a8      const char* types = 0x1000214c5  /* &"@48@0:8{CGRect={CGPoint=dd}{CGSize=dd}}16" */
1000281b0      void* imp = 0x100006829          /* [BitFieldBox initWithFrame:] */
1000281b8  }
```

In comparison, here is an excerpt of the same method list in the arm64e slice of
the binary:

```c
10001f3b0  struct method_list_t ml_BitFieldBox = {
10001f3b0      uint32_t size_and_flags = 0x8000000c  /* HAS_RELATIVE_OFFSETS; 12-byte entries */
10001f3b4      uint32_t count = 0x4                  /* 4 entries (only 1 shown here) */
10001f3b8  }
10001f3b8  struct method_entry_t mt_initWithFrame_ = {
10001f3b8      int32_t name = 0x12498   /* 0x100031850 = &&"initWithFrame:" */
10001f3bc      int32_t types = 0x6165   /* 0x100025521 = &"@48@0:8{CGRect={CGPoint=dd}{CGSize=dd}}16" */
10001f3c0      int32_t imp = -0x16d34   /* 0x10000868c = [BitFieldBox initWithFrame:] */
10001f3c4  }
```

## Further reading

Below are some resources and projects
related to the Objective-C ABI that may be useful references.

**Resources**

- https://developpaper.com/in-depth-analysis-of-the-structure-of-the-method-in-objc/
- https://www.fortinet.com/blog/threat-research/rewriting-idapython-script-objc2-xrefs-helper-py-for-hopper

**Projects**

- https://github.com/cxnder/ktool
- https://github.com/blacktop/ipsw
- https://github.com/jonpalmisc/ObjectiveNinja