├── .gitignore ├── README.md ├── benchmark.py ├── images ├── 1.jpg ├── 2.jpg ├── 3.jpg └── 4.jpeg ├── main.py ├── pyproject.toml └── submodule ├── .gitignore ├── build.sh ├── main.c ├── objdump_output.txt └── setup.py /.gitignore: -------------------------------------------------------------------------------- 1 | .idea/** 2 | __pycache__/ -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## Beating Bisect With Branchless Binary Search 2 | 3 | I recently read this article: https://probablydance.com/2023/04/27/beautiful-branchless-binary-search/, and I was inspired. 4 | First I implemented the same algorithm in pure Python: 5 | 6 | https://github.com/juliusgeo/branchless_bisect/blob/81fa8050c92d69e147a829bfd065b6f0bcee0bcf/main.py#L1-L15 7 | 8 | And then I compared it against `sortedcontainers`'s implementation of bisect_left across a large range of array sizes: 9 | 10 | ![image](/images/1.jpg "Figure 1") 11 | 12 | Pretty handily beats it! However, most people using Python would probably be using `bisect_left` from the `bisect` library in the stdlib. 13 | To try to beat that implementation, though, I will have to descend into C-land. Here's my implementation as a Python C-extension: 14 | 15 | https://github.com/juliusgeo/branchless_bisect/blob/81fa8050c92d69e147a829bfd065b6f0bcee0bcf/submodule/main.c#L14-L36 16 | 17 | ![image](/images/2.jpg "Figure 2") 18 | 19 | That beats it as well! Admittedly this is only for arrays of size up to `2**29`, but still pretty cool. 20 | 21 | Now, you might be asking, how does it perform on non-powers-of-two? 
I made a graph using the following parameters: 22 | ```python 23 | sizes = [i for i in range(0, 2**15)] 24 | ``` 25 | ![image](/images/4.jpeg "Figure 4") 26 | 27 | 28 | I also checked whether it successfully compiled with a `CMOVE` instruction: 29 | 30 | https://github.com/juliusgeo/branchless_bisect/blob/81fa8050c92d69e147a829bfd065b6f0bcee0bcf/submodule/objdump_output.txt#L67-L71 31 | 32 | It does! You can see the full compiled dump in `objdump_output.txt`. 33 | Now, here are all of them combined (what you will get if you run `main.py`): 34 | 35 | ![image](/images/3.jpg "Figure 3") 36 | 37 | This repo contains the submodule and benchmarking code I used to obtain these results. 38 | -------------------------------------------------------------------------------- /benchmark.py: -------------------------------------------------------------------------------- 1 | from sortedcontainers import SortedList 2 | from bisect import bisect_left as bisect_left_c, bisect_right as bisect_right_c 3 | from bl_bl import bl_bisect_left, bl_bisect_right 4 | bisect_left = lambda arr, value: arr.bisect_left(value) 5 | from main import branchless_bisect_left 6 | from timeit import timeit 7 | import matplotlib.pyplot as plt 8 | from random import randint 9 | times = [] 10 | sizes = [i for i in range(2**20, 2**20+1000)] 11 | times_me = [] 12 | times_c = [] 13 | times_bl_c = [] 14 | # bisect_left 15 | num_trials = 20 16 | for size in sizes: 17 | test_arr = list(range(size)) 18 | value=randint(1, size-1) 19 | times_me.append(timeit(lambda:branchless_bisect_left(test_arr, value), number=num_trials)) 20 | test_arr2 = SortedList(test_arr) 21 | times.append(timeit(lambda:bisect_left(test_arr2, value), number=num_trials)) 22 | times_c.append(timeit(lambda:bisect_left_c(test_arr, value), number=num_trials)) 23 | times_bl_c.append(timeit(lambda:bl_bisect_left(test_arr, value), number=num_trials)) 24 | assert len(set([i(test_arr2 if i==bisect_left else test_arr, value) for i in [bisect_left, 
branchless_bisect_left, bisect_left_c, 25 | bl_bisect_left]])) ==1 26 | plt.plot(sizes, times, label="bisect_left") 27 | plt.plot(sizes, times_c, label="bisect_left_c") 28 | plt.plot(sizes, times_me, label="branchless_bisect_left") 29 | plt.plot(sizes, times_bl_c, label="branchless_bisect_left_c") 30 | plt.xlabel('Input Size') 31 | plt.ylabel('Execution Time (s)') 32 | plt.title('Performance vs Input Size') 33 | plt.legend() 34 | 35 | plt.show() 36 | 37 | # bisect right 38 | from bisect import bisect_right as bisect_right_c 39 | from main import branchless_bisect_right 40 | from bl_bl import bl_bisect_right 41 | bisect_right = lambda arr, value: arr.bisect_right(value) 42 | times = [] 43 | times_me = [] 44 | times_c = [] 45 | times_bl_c = [] 46 | 47 | num_trials = 20 48 | for size in sizes: 49 | test_arr = list(range(size)) 50 | value=randint(1, size-1) 51 | times_me.append(timeit(lambda:branchless_bisect_right(test_arr, value), number=num_trials)) 52 | test_arr2 = SortedList(test_arr) 53 | times.append(timeit(lambda:bisect_right(test_arr2, value), number=num_trials)) 54 | times_c.append(timeit(lambda:bisect_right_c(test_arr, value), number=num_trials)) 55 | times_bl_c.append(timeit(lambda:bl_bisect_right(test_arr, value), number=num_trials)) 56 | assert len(set([i(test_arr2 if i==bisect_right else test_arr, value) for i in [bisect_right, branchless_bisect_right, bisect_right_c, bl_bisect_right]])) ==1 57 | 58 | plt.plot(sizes, times, label="bisect_right") 59 | plt.plot(sizes, times_c, label="bisect_right_c") 60 | plt.plot(sizes, times_me, label="branchless_bisect_right") 61 | plt.plot(sizes, times_bl_c, label="branchless_bisect_right_c") 62 | plt.xlabel('Input Size') 63 | plt.ylabel('Execution Time (s)') 64 | plt.title('Performance vs Input Size') 65 | plt.legend() 66 | 67 | plt.show() -------------------------------------------------------------------------------- /images/1.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/juliusgeo/branchless_bisect/dff6e6241cfae000c813194ae6d050c9a0852ffc/images/1.jpg -------------------------------------------------------------------------------- /images/2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/juliusgeo/branchless_bisect/dff6e6241cfae000c813194ae6d050c9a0852ffc/images/2.jpg -------------------------------------------------------------------------------- /images/3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/juliusgeo/branchless_bisect/dff6e6241cfae000c813194ae6d050c9a0852ffc/images/3.jpg -------------------------------------------------------------------------------- /images/4.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/juliusgeo/branchless_bisect/dff6e6241cfae000c813194ae6d050c9a0852ffc/images/4.jpeg -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | from operator import __le__, __lt__ 2 | def bisect(arr, value, compare): 3 | begin,end=0,len(arr) 4 | length=end-begin 5 | if length==0: 6 | return end 7 | step=1<<(length.bit_length()-1) 8 | if step!=length and compare(arr[step], value): 9 | length-=step+1 10 | if length==0: 11 | return end 12 | begin=end-step 13 | step>>=1 14 | for s in (step:=step>>1 for _ in range(step.bit_length())): 15 | begin+=s*compare(arr[s+begin], value) 16 | return begin+int(compare(arr[begin], value)) 17 | def branchless_bisect_left(arr, value): 18 | return bisect(arr, value, __lt__) 19 | 20 | def branchless_bisect_right(arr, value): 21 | return bisect(arr, value, __le__) 22 | -------------------------------------------------------------------------------- /pyproject.toml: 
-------------------------------------------------------------------------------- 1 | [build-system] 2 | requires = ["sortedcontainers", "setuptools"] -------------------------------------------------------------------------------- /submodule/.gitignore: -------------------------------------------------------------------------------- 1 | build 2 | /venv/** 3 | .idea 4 | **/dist/ 5 | *.so 6 | *.egg-info -------------------------------------------------------------------------------- /submodule/build.sh: -------------------------------------------------------------------------------- 1 | set -ex 2 | find . | grep -E "(__pycache__|\.pyc|\.egg-info|\.pyo$)" | xargs rm -rf 3 | rm -rf build 4 | python -m pip install --ignore-installed -e . -------------------------------------------------------------------------------- /submodule/main.c: -------------------------------------------------------------------------------- 1 | #define PY_SSIZE_T_CLEAN 2 | #include "Python.h" 3 | #include 4 | #include 5 | 6 | #if defined(__OPTIMIZE__) 7 | 8 | /* LIKELY(), UNLIKELY() definition */ 9 | /* Checks taken from 10 | https://github.com/python/cpython/blob/main/Objects/obmalloc.c */ 11 | #if defined(__GNUC__) && (__GNUC__ > 2) 12 | # define UNLIKELY(value) __builtin_expect((value), 0) 13 | # define LIKELY(value) __builtin_expect((value), 1) 14 | #else 15 | # define UNLIKELY(value) (value) 16 | # define LIKELY(value) (value) 17 | #endif 18 | 19 | /* ASSUME() definition */ 20 | #ifdef __clang__ 21 | # define ASSUME(value) (void)__builtin_assume(value) 22 | #elif defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5)) 23 | /* https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79469 24 | `__builtin_object_size(( (void)(value), "" ), 2)` checks whether the expression is constant. */ 25 | # define ASSUME(value) \ 26 | (__builtin_object_size(( (void)(value), "" ), 2) \ 27 | ? ((value) ? 
(void)0 : __builtin_unreachable()) \ 28 | : (void)0 \ 29 | ) 30 | #elif defined(_MSC_VER) 31 | /* We don't have the "just to make sure it's constant" check here. */ 32 | # define ASSUME(value) (void)__assume(value) 33 | #else 34 | # define ASSUME(value) ((void)0) 35 | #endif 36 | 37 | #endif /* __OPTIMIZE__ */ 38 | 39 | static inline Py_ALWAYS_INLINE PyObject * 40 | bisect(PyObject *list_obj, PyObject *value, int compare) 41 | { 42 | int res, retres; 43 | PyObject *resitem; 44 | Py_ssize_t size, step, begin = 0; 45 | 46 | size = PyList_GET_SIZE(list_obj); 47 | 48 | if (UNLIKELY(size == 0)) { 49 | return PyLong_FromLong(0L); 50 | } 51 | 52 | #ifdef __GNUC__ 53 | step = (Py_ssize_t)1 << (63 - __builtin_clzll(size)); 54 | #elif defined(_MSC_VER) 55 | { 56 | unsigned long msb; 57 | _BitScanReverse64(&msb, (unsigned __int64)size); /* size != 0 here, so a set bit is always found */ 58 | step = (Py_ssize_t)1 << msb; 59 | } 60 | #else 61 | { 62 | step = 1; 63 | Py_ssize_t length = size; 64 | while ((length >>= 1)) { 65 | step <<= 1; 66 | } 67 | } 68 | #endif 69 | 70 | if (LIKELY(step != size)) { 71 | res = PyObject_RichCompareBool(PyList_GET_ITEM(list_obj, begin + step - 1), value, compare); 72 | if (UNLIKELY(res == -1)) { 73 | return NULL; 74 | } 75 | ASSUME(res == 0 || res == 1); 76 | begin += res * step; 77 | } 78 | 79 | resitem = PyList_GET_ITEM(list_obj, begin); 80 | retres = -1; 81 | 82 | Py_ssize_t next = 0; 83 | for (step >>= 1; step != 0; step >>= 1) { 84 | if (LIKELY((next = begin + step) < size)) { 85 | PyObject *item = PyList_GET_ITEM(list_obj, next); 86 | res = PyObject_RichCompareBool(item, value, compare); 87 | if (UNLIKELY(res == -1)) { 88 | return NULL; 89 | } 90 | ASSUME(res == 0 || res == 1); 91 | if (res) { 92 | begin = next; 93 | resitem = item; 94 | retres = res; 95 | } 96 | } 97 | } 98 | 99 | if (UNLIKELY(retres == -1)) { 100 | retres = PyObject_RichCompareBool(resitem, value, compare); 101 | if (UNLIKELY(retres == -1)) { 102 | return NULL; 103 | } 104 | ASSUME(retres == 0 || retres == 1); 105 | } 106 | return 
PyLong_FromSsize_t(begin + (begin < size && retres)); 107 | } 108 | 109 | static PyObject * 110 | bl_bisect_left(PyObject *self, 111 | PyObject *const *args, 112 | Py_ssize_t nargs) 113 | { 114 | PyObject *list_obj; 115 | PyObject *value; 116 | 117 | if (UNLIKELY(nargs != 2)) { 118 | PyErr_Format(PyExc_TypeError, 119 | "bisect_left() expected 2 arguments, got %zd", 120 | nargs); 121 | return NULL; 122 | } 123 | 124 | list_obj = args[0]; 125 | if (UNLIKELY(!PyList_CheckExact(list_obj))) { 126 | PyErr_Format(PyExc_TypeError, 127 | "bisect_left() expected 'list' (argument 0), got '%.200s'", 128 | Py_TYPE(list_obj)->tp_name); 129 | return NULL; 130 | } 131 | 132 | value = args[1]; 133 | 134 | return bisect(list_obj, value, Py_LT); 135 | } 136 | 137 | static PyObject * 138 | bl_bisect_right(PyObject *self, 139 | PyObject *const *args, 140 | Py_ssize_t nargs) 141 | { 142 | PyObject *list_obj; 143 | PyObject *value; 144 | 145 | if (UNLIKELY(nargs != 2)) { 146 | PyErr_Format(PyExc_TypeError, 147 | "bisect_right() expected 2 arguments, got %zd", 148 | nargs); 149 | return NULL; 150 | } 151 | 152 | list_obj = args[0]; 153 | if (UNLIKELY(!PyList_CheckExact(list_obj))) { 154 | PyErr_Format(PyExc_TypeError, 155 | "bisect_right() expected 'list' (argument 0), got '%.200s'", 156 | Py_TYPE(list_obj)->tp_name); 157 | return NULL; 158 | } 159 | 160 | value = args[1]; 161 | 162 | return bisect(list_obj, value, Py_LE); 163 | } 164 | 165 | static PyMethodDef bl_bl_methods[] = { 166 | {"bl_bisect_left", (PyCFunction)(void(*)(void))bl_bisect_left, METH_FASTCALL, 167 | "Branchless bisect left"}, 168 | {"bl_bisect_right", (PyCFunction)(void(*)(void))bl_bisect_right, METH_FASTCALL, 169 | "Branchless bisect right"}, 170 | {NULL, NULL, 0, NULL} /* Sentinel */ 171 | }; 172 | 173 | static struct PyModuleDef bl_bl_module = { 174 | PyModuleDef_HEAD_INIT, 175 | "bl_bl", /* name of module */ 176 | NULL, /* module documentation, may be NULL */ 177 | 0, /* size of per-interpreter state of the module, 
178 | or -1 if the module keeps state in global variables. */ 179 | bl_bl_methods, 180 | }; 181 | 182 | PyMODINIT_FUNC PyInit_bl_bl(void) { 183 | return PyModule_Create(&bl_bl_module); 184 | } 185 | -------------------------------------------------------------------------------- /submodule/objdump_output.txt: -------------------------------------------------------------------------------- 1 | In archive build/temp.macosx-10.9-universal2-cpython-311/main.o: 2 | 3 | i386:x86-64: file format mach-o-x86-64 4 | 5 | 6 | Disassembly of section .text: 7 | 8 | 0000000000000000 <_PyInit_bl_bl>: 9 | 0: 55 push %rbp 10 | 1: 48 89 e5 mov %rsp,%rbp 11 | 4: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # b <_PyInit_bl_bl+0xb> 12 | b: be f5 03 00 00 mov $0x3f5,%esi 13 | 10: 5d pop %rbp 14 | 11: e9 00 00 00 00 jmp 16 <_PyInit_bl_bl+0x16> 15 | 16: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1) 16 | 1d: 00 00 00 17 | 18 | 0000000000000020 <_bl_bisect_left>: 19 | 20: 55 push %rbp 20 | 21: 48 89 e5 mov %rsp,%rbp 21 | 24: 41 57 push %r15 22 | 26: 41 56 push %r14 23 | 28: 53 push %rbx 24 | 29: 48 83 ec 18 sub $0x18,%rsp 25 | 2d: 48 89 f7 mov %rsi,%rdi 26 | 30: 48 8d 35 f5 01 00 00 lea 0x1f5(%rip),%rsi # 22c <_bl_bl_methods+0x6c> 27 | 37: 31 db xor %ebx,%ebx 28 | 39: 48 8d 55 e0 lea -0x20(%rbp),%rdx 29 | 3d: 48 8d 4d d8 lea -0x28(%rbp),%rcx 30 | 41: 31 c0 xor %eax,%eax 31 | 43: e8 00 00 00 00 call 48 <_bl_bisect_left+0x28> 32 | 48: 85 c0 test %eax,%eax 33 | 4a: 0f 84 f1 00 00 00 je 141 <_bl_bisect_left+0x121> 34 | 50: 48 8b 7d e0 mov -0x20(%rbp),%rdi 35 | 54: e8 00 00 00 00 call 59 <_bl_bisect_left+0x39> 36 | 59: 48 85 c0 test %rax,%rax 37 | 5c: 74 41 je 9f <_bl_bisect_left+0x7f> 38 | 5e: 49 89 c6 mov %rax,%r14 39 | 61: bb 01 00 00 00 mov $0x1,%ebx 40 | 66: 48 83 f8 02 cmp $0x2,%rax 41 | 6a: 7c 23 jl 8f <_bl_bisect_left+0x6f> 42 | 6c: bb 01 00 00 00 mov $0x1,%ebx 43 | 71: 4c 89 f1 mov %r14,%rcx 44 | 74: 4c 89 f0 mov %r14,%rax 45 | 77: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1) 46 | 7e: 00 00 47 
| 80: 48 d1 e8 shr %rax 48 | 83: 48 01 db add %rbx,%rbx 49 | 86: 48 83 f9 03 cmp $0x3,%rcx 50 | 8a: 48 89 c1 mov %rax,%rcx 51 | 8d: 77 f1 ja 80 <_bl_bisect_left+0x60> 52 | 8f: 4c 39 f3 cmp %r14,%rbx 53 | 92: 75 12 jne a6 <_bl_bisect_left+0x86> 54 | 94: 45 31 ff xor %r15d,%r15d 55 | 97: 48 83 fb 02 cmp $0x2,%rbx 56 | 9b: 73 69 jae 106 <_bl_bisect_left+0xe6> 57 | 9d: eb 32 jmp d1 <_bl_bisect_left+0xb1> 58 | 9f: 31 ff xor %edi,%edi 59 | a1: e9 93 00 00 00 jmp 139 <_bl_bisect_left+0x119> 60 | a6: 48 8b 7d e0 mov -0x20(%rbp),%rdi 61 | aa: 48 8d 73 ff lea -0x1(%rbx),%rsi 62 | ae: e8 00 00 00 00 call b3 <_bl_bisect_left+0x93> 63 | b3: 48 8b 75 d8 mov -0x28(%rbp),%rsi 64 | b7: 45 31 ff xor %r15d,%r15d 65 | ba: 48 89 c7 mov %rax,%rdi 66 | bd: 31 d2 xor %edx,%edx 67 | bf: e8 00 00 00 00 call c4 <_bl_bisect_left+0xa4> 68 | c4: 83 f8 01 cmp $0x1,%eax 69 | c7: 4c 0f 44 fb cmove %rbx,%r15 70 | cb: 48 83 fb 02 cmp $0x2,%rbx 71 | cf: 73 35 jae 106 <_bl_bisect_left+0xe6> 72 | d1: 4d 39 f7 cmp %r14,%r15 73 | d4: 7d 5e jge 134 <_bl_bisect_left+0x114> 74 | d6: 48 8b 7d e0 mov -0x20(%rbp),%rdi 75 | da: 4c 89 fe mov %r15,%rsi 76 | dd: e8 00 00 00 00 call e2 <_bl_bisect_left+0xc2> 77 | e2: 48 8b 75 d8 mov -0x28(%rbp),%rsi 78 | e6: 48 89 c7 mov %rax,%rdi 79 | e9: 31 d2 xor %edx,%edx 80 | eb: e8 00 00 00 00 call f0 <_bl_bisect_left+0xd0> 81 | f0: 31 ff xor %edi,%edi 82 | f2: 85 c0 test %eax,%eax 83 | f4: 40 0f 95 c7 setne %dil 84 | f8: eb 3c jmp 136 <_bl_bisect_left+0x116> 85 | fa: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) 86 | 100: 48 83 fb 01 cmp $0x1,%rbx 87 | 104: 76 cb jbe d1 <_bl_bisect_left+0xb1> 88 | 106: 48 d1 fb sar %rbx 89 | 109: 49 8d 34 1f lea (%r15,%rbx,1),%rsi 90 | 10d: 4c 39 f6 cmp %r14,%rsi 91 | 110: 7d ee jge 100 <_bl_bisect_left+0xe0> 92 | 112: 48 8b 7d e0 mov -0x20(%rbp),%rdi 93 | 116: e8 00 00 00 00 call 11b <_bl_bisect_left+0xfb> 94 | 11b: 48 8b 75 d8 mov -0x28(%rbp),%rsi 95 | 11f: 48 89 c7 mov %rax,%rdi 96 | 122: 31 d2 xor %edx,%edx 97 | 124: e8 00 00 00 00 call 129 
<_bl_bisect_left+0x109> 98 | 129: 48 98 cltq 99 | 12b: 48 0f af c3 imul %rbx,%rax 100 | 12f: 49 01 c7 add %rax,%r15 101 | 132: eb cc jmp 100 <_bl_bisect_left+0xe0> 102 | 134: 31 ff xor %edi,%edi 103 | 136: 4c 01 ff add %r15,%rdi 104 | 139: e8 00 00 00 00 call 13e <_bl_bisect_left+0x11e> 105 | 13e: 48 89 c3 mov %rax,%rbx 106 | 141: 48 89 d8 mov %rbx,%rax 107 | 144: 48 83 c4 18 add $0x18,%rsp 108 | 148: 5b pop %rbx 109 | 149: 41 5e pop %r14 110 | 14b: 41 5f pop %r15 111 | 14d: 5d pop %rbp 112 | 14e: c3 ret 113 | 114 | aarch64: file format mach-o-arm64 115 | 116 | 117 | Disassembly of section .text: 118 | 119 | 0000000000000000 <_PyInit_bl_bl>: 120 | 0: 90000000 adrp x0, 158 <_bl_bl_module> 121 | 4: 91000000 add x0, x0, #0x0 122 | 8: 52807ea1 mov w1, #0x3f5 // #1013 123 | c: 14000000 b 0 <_PyModule_Create2> 124 | 125 | 0000000000000010 <_bl_bisect_left>: 126 | 10: d10143ff sub sp, sp, #0x50 127 | 14: a90257f6 stp x22, x21, [sp, #32] 128 | 18: a9034ff4 stp x20, x19, [sp, #48] 129 | 1c: a9047bfd stp x29, x30, [sp, #64] 130 | 20: 910103fd add x29, sp, #0x40 131 | 24: aa0103e0 mov x0, x1 132 | 28: 910043e8 add x8, sp, #0x10 133 | 2c: 910063e9 add x9, sp, #0x18 134 | 30: a90023e9 stp x9, x8, [sp] 135 | 34: 90000001 adrp x1, 22c 136 | 38: 91000021 add x1, x1, #0x0 137 | 3c: 94000000 bl 0 <__PyArg_ParseTuple_SizeT> 138 | 40: 34000280 cbz w0, 90 <_bl_bisect_left+0x80> 139 | 44: f9400fe0 ldr x0, [sp, #24] 140 | 48: 94000000 bl 0 <_PyList_Size> 141 | 4c: b4000780 cbz x0, 13c <_bl_bisect_left+0x12c> 142 | 50: aa0003f3 mov x19, x0 143 | 54: 52800035 mov w21, #0x1 // #1 144 | 58: f100081f cmp x0, #0x2 145 | 5c: 540000eb b.lt 78 <_bl_bisect_left+0x68> // b.tstop 146 | 60: aa1303e8 mov x8, x19 147 | 64: d341fd09 lsr x9, x8, #1 148 | 68: d37ffab5 lsl x21, x21, #1 149 | 6c: f1000d1f cmp x8, #0x3 150 | 70: aa0903e8 mov x8, x9 151 | 74: 54ffff88 b.hi 64 <_bl_bisect_left+0x54> // b.pmore 152 | 78: eb1302bf cmp x21, x19 153 | 7c: 54000161 b.ne a8 <_bl_bisect_left+0x98> // b.any 154 | 80: 
d2800014 mov x20, #0x0 // #0 155 | 84: f1000abf cmp x21, #0x2 156 | 88: 540003e2 b.cs 104 <_bl_bisect_left+0xf4> // b.hs, b.nlast 157 | 8c: 14000011 b d0 <_bl_bisect_left+0xc0> 158 | 90: d2800000 mov x0, #0x0 // #0 159 | 94: a9447bfd ldp x29, x30, [sp, #64] 160 | 98: a9434ff4 ldp x20, x19, [sp, #48] 161 | 9c: a94257f6 ldp x22, x21, [sp, #32] 162 | a0: 910143ff add sp, sp, #0x50 163 | a4: d65f03c0 ret 164 | a8: f9400fe0 ldr x0, [sp, #24] 165 | ac: d10006a1 sub x1, x21, #0x1 166 | b0: 94000000 bl 0 <_PyList_GetItem> 167 | b4: f9400be1 ldr x1, [sp, #16] 168 | b8: 52800002 mov w2, #0x0 // #0 169 | bc: 94000000 bl 0 <_PyObject_RichCompareBool> 170 | c0: 7100041f cmp w0, #0x1 171 | c4: 9a9f02b4 csel x20, x21, xzr, eq // eq = none 172 | c8: f1000abf cmp x21, #0x2 173 | cc: 540001c2 b.cs 104 <_bl_bisect_left+0xf4> // b.hs, b.nlast 174 | d0: eb13029f cmp x20, x19 175 | d4: 5400030a b.ge 134 <_bl_bisect_left+0x124> // b.tcont 176 | d8: f9400fe0 ldr x0, [sp, #24] 177 | dc: aa1403e1 mov x1, x20 178 | e0: 94000000 bl 0 <_PyList_GetItem> 179 | e4: f9400be1 ldr x1, [sp, #16] 180 | e8: 52800002 mov w2, #0x0 // #0 181 | ec: 94000000 bl 0 <_PyObject_RichCompareBool> 182 | f0: 7100001f cmp w0, #0x0 183 | f4: 1a9f07e8 cset w8, ne // ne = any 184 | f8: 14000010 b 138 <_bl_bisect_left+0x128> 185 | fc: f10006bf cmp x21, #0x1 186 | 100: 54fffe89 b.ls d0 <_bl_bisect_left+0xc0> // b.plast 187 | 104: 9341feb5 asr x21, x21, #1 188 | 108: 8b150281 add x1, x20, x21 189 | 10c: eb13003f cmp x1, x19 190 | 110: 54ffff6a b.ge fc <_bl_bisect_left+0xec> // b.tcont 191 | 114: f9400fe0 ldr x0, [sp, #24] 192 | 118: 94000000 bl 0 <_PyList_GetItem> 193 | 11c: f9400be1 ldr x1, [sp, #16] 194 | 120: 52800002 mov w2, #0x0 // #0 195 | 124: 94000000 bl 0 <_PyObject_RichCompareBool> 196 | 128: 93407c08 sxtw x8, w0 197 | 12c: 9b0852b4 madd x20, x21, x8, x20 198 | 130: 17fffff3 b fc <_bl_bisect_left+0xec> 199 | 134: d2800008 mov x8, #0x0 // #0 200 | 138: 8b140100 add x0, x8, x20 201 | 13c: 94000000 bl 0 
<_PyLong_FromLong> 202 | 140: a9447bfd ldp x29, x30, [sp, #64] 203 | 144: a9434ff4 ldp x20, x19, [sp, #48] 204 | 148: a94257f6 ldp x22, x21, [sp, #32] 205 | 14c: 910143ff add sp, sp, #0x50 206 | 150: d65f03c0 ret 207 | -------------------------------------------------------------------------------- /submodule/setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup, Extension 2 | 3 | module1 = Extension('bl_bl', 4 | sources = ['main.c'], 5 | extra_compile_args=["-O3"]) 6 | 7 | setup (name = 'bl_bl', 8 | version = '1.0', 9 | description = 'This is a demo package', 10 | ext_modules = [module1]) --------------------------------------------------------------------------------
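The power-of-two stepping that the README and `main.c` rely on can be illustrated with a small standalone sketch. This is not the repo's `main.py` or `main.c` implementation: the name `lower_bound_sketch` is hypothetical, and the loop is a simplified variant that keeps a count of elements known to be smaller than `value` and adds halving power-of-two steps to it. Pure Python cannot be truly branchless, so the multiplication by a boolean only mirrors the shape of the idea.

```python
from bisect import bisect_left


def lower_bound_sketch(arr, value):
    """Return the leftmost insertion index of value in the sorted list arr.

    Hypothetical illustration only; assumes arr is sorted ascending.
    """
    n = len(arr)
    pos = 0  # number of leading elements known to be < value
    # Largest power of two that is <= n (0 for an empty list).
    step = 1 << (n.bit_length() - 1) if n else 0
    while step:
        nxt = pos + step
        # Multiply by the comparison result instead of branching on it;
        # the bounds check keeps nxt - 1 a valid index.
        pos += step * (nxt <= n and arr[nxt - 1] < value)
        step >>= 1
    return pos


if __name__ == "__main__":
    data = [1, 2, 2, 4, 7, 7, 9]
    for v in (0, 2, 5, 7, 10):
        # Both columns should agree for every probe value.
        print(v, lower_bound_sketch(data, v), bisect_left(data, v))
```

Because each step is a power of two and the steps sum to at least `len(arr)`, the loop always runs `len(arr).bit_length()` iterations regardless of where `value` falls, which is the property that lets a compiler replace the comparison with a conditional move in the C version.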