├── LICENSE
├── README.md
├── binary_search.c
├── binary_search.png
├── binary_search_small.png
├── graph3.png
└── monobound_bsearch.c


/LICENSE:
--------------------------------------------------------------------------------
Copyright (C) 2014-2022 Igor van den Hoven

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
The most commonly used binary search variant was first published by Hermann Bottenbruch in 1962 and hasn't notably changed since. Below I'll describe several novel variants with improved performance. The most notable variant, the monobound binary search, executes two to four times faster than the standard binary search on arrays smaller than 1 million 32 bit integers.

A source code implementation in C is available in the [binary_search.c](https://github.com/scandum/binary_search/blob/master/binary_search.c) file, which also contains a benchmarking routine. A graph with performance results is included at the bottom of this page. Keep in mind performance will vary depending on hardware and compiler optimizations.

I'll briefly describe each variant and notable optimizations below, followed by some performance graphs.

Deferred Detection of Equality
------------------------------

By skipping the detection of equality until the binary search has finished (which does not allow for early termination), each loop contains 1 key check, 1 integer check and 2 integer assignments. This is pretty much the standard algorithm that has been used since 1962.

Pointer Optimizations
---------------------

You can get another 10% performance boost by using pointer operations. I forgo such optimizations in the C implementation to keep things as readable as possible.

Unsigned Integer Optimization
-----------------------------

You can get a further performance boost by using unsigned instead of signed integers.

Stability
---------

All the implementations in binary_search.c should be stable. If you search an array containing the elements `[1][4][7][7][7][9]` for the number `7`, it should return the rightmost index. This is needed if you want to use a binary search in a stable sorting algorithm. The binary search being stable shouldn't notably slow down performance.

Zero length array
-----------------

All the implementations in binary_search.c should correctly handle the case where the search function is called with 0 as the array length.
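
The properties described so far (deferred detection of equality, stability, and zero length handling) can be illustrated with a minimal sketch. The function name `deferred_search` is hypothetical and is not part of binary_search.c; it only follows the textbook shape described above, with one key check per loop and a single equality check at the end.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of binary_search.c: a binary search that
   defers the equality check until the range has shrunk to one element. */
int deferred_search(const int *array, unsigned int array_size, int key)
{
	unsigned int bot = 0, top = array_size, mid;

	assert(array != NULL || array_size == 0);

	if (array_size == 0)
	{
		return -1; /* zero length: nothing to search */
	}

	/* invariant: if the key is present, its rightmost copy is in [bot, top) */
	while (top - bot > 1)
	{
		mid = bot + (top - bot) / 2;

		if (key >= array[mid])
		{
			bot = mid; /* ties move right, so the rightmost match wins */
		}
		else
		{
			top = mid;
		}
	}
	return key == array[bot] ? (int) bot : -1;
}
```

Called on the six element array `[1][4][7][7][7][9]` with key `7`, this sketch returns index 4, the rightmost `7`; called with an array length of 0 it returns -1 without touching the array.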

Compilation
-----------

For the monobound binary search variant to perform well the source code must be compiled with the -O1, -O2, or -O3 optimization flag.

Boundless Binary Search
-----------------------

The boundless binary search is faster than the standard binary search since the loop contains 1 key check, 1 integer check, and (on average) 1.5 integer assignments. The performance gain will vary depending on various factors, but should be around 20% when comparing 32 bit integers.

Doubletapped Binary Search
--------------------------

When you get to the end of a binary search and there are 2 elements left it takes exactly 2 if checks to finish. By doing two equality checks at the end you can finish up in either 1 or 2 if checks. Consequently, on average, the doubletapped binary search performs slightly fewer key checks.

Monobound Binary Search
-----------------------

The monobound binary search is similar to the boundless binary search but uses an extra variable to simplify calculations and performs slightly more key checks. It's up to 60% faster than the standard binary search when comparing 32 bit integers. On small arrays the performance difference is even greater.

The performance gain is due to dynamic loop unrolling, which the traditional binary search (by trying to minimize the number of key checks) does not allow. Loop unrolling in turn allows various other potential optimizations at the compiler and CPU level.

Tripletapped Binary Search
--------------------------

When you get to the end of a binary search and there are 3 elements left it takes on average 2.5 if checks to finish. The monobound binary search, however, takes 3 if checks. The tripletapped variant therefore performs 3 equality checks at the end with early termination, resulting in slightly fewer key checks and, if the data aligns properly, slightly improved performance.

Quaternary Binary Search
------------------------

The dynamic unrolling of loops is often limited to 16 iterations. By narrowing the search range down to a fourth each loop, instead of a half, it takes 16 iterations to search 4294967296 elements, instead of 65536. This optimization slows things down slightly for smaller arrays, but can give a notable gain on larger arrays.

Monobound Interpolated Binary Search
------------------------------------

When you have an even distribution you can make an educated guess as to the location of the index. Due to the expense of the initial check and exponential search, the interpolated binary search is unlikely to outperform other binary searches on arrays with fewer than 1000 elements. When the distribution is uneven performance will drop, but not significantly.

A practical application for an interpolated binary search would be looking up authorization keys.

Adaptive Binary Search
----------------------

The adaptive binary search is optimized for repeated binary searches on the same array. When it observes a pattern it switches from a binary search to an exponential search. Unlike the interpolated search the adaptive search works on uneven distributions as well.

A practical application for an adaptive binary search would be accessing a Unicode lookup table.

Small array benchmark graph
---------------------------
The following benchmark was run on WSL 2 with gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04). The source code was compiled using `gcc -O3 binary_search.c`. Each test was run 1,000 times, with the time (in seconds) of the best run reported.

The graph below shows the execution speed on arrays with 1, 2, 4, 8, 16, 32, 64, and 128 elements on an Intel i3 quad-core processor.

![binary search graph](/binary_search_small.png)

data table

| Name | Items | Hits | Misses | Checks | Time |
| ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
| linear_search | 1 | 806 | 9194 | 10000 | 0.000029 |
| standard_binary_search | 1 | 806 | 9194 | 10000 | 0.000031 |
| monobound_binary_search | 1 | 806 | 9194 | 10000 | 0.000033 |
| | | | | | |
| linear_search | 2 | 1034 | 8966 | 19495 | 0.000039 |
| standard_binary_search | 2 | 1034 | 8966 | 20000 | 0.000074 |
| monobound_binary_search | 2 | 1034 | 8966 | 20000 | 0.000036 |
| | | | | | |
| linear_search | 4 | 775 | 9225 | 38862 | 0.000046 |
| standard_binary_search | 4 | 775 | 9225 | 30000 | 0.000122 |
| monobound_binary_search | 4 | 775 | 9225 | 30000 | 0.000041 |
| | | | | | |
| linear_search | 8 | 822 | 9178 | 77133 | 0.000064 |
| standard_binary_search | 8 | 822 | 9178 | 40000 | 0.000177 |
| monobound_binary_search | 8 | 822 | 9178 | 40000 | 0.000050 |
| | | | | | |
| linear_search | 16 | 1141 | 8859 | 151154 | 0.000116 |
| standard_binary_search | 16 | 1141 | 8859 | 50000 | 0.000219 |
| monobound_binary_search | 16 | 1141 | 8859 | 50000 | 0.000064 |
| | | | | | |
| linear_search | 32 | 1145 | 8855 | 302324 | 0.000218 |
| standard_binary_search | 32 | 1145 | 8855 | 60000 | 0.000270 |
| monobound_binary_search | 32 | 1145 | 8855 | 60000 | 0.000074 |
| | | | | | |
| linear_search | 64 | 1096 | 8904 | 605248 | 0.000409 |
| standard_binary_search | 64 | 1096 | 8904 | 70000 | 0.000321 |
| monobound_binary_search | 64 | 1096 | 8904 | 70000 | 0.000084 |
| | | | | | |
| linear_search | 128 | 1046 | 8954 | 1214120 | 0.000749 |
| standard_binary_search | 128 | 1046 | 8954 | 80000 | 0.000386 |
| monobound_binary_search | 128 | 1046 | 8954 | 80000 | 0.000097 |
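
The Checks column for monobound_binary_search in the table above can be reproduced by hand: the loop shrinks `top` until it reaches 1, doing one key check per iteration, and one final equality check follows. The sketch below counts those checks for a given array size. The helper name `monobound_checks` is hypothetical and does not appear in binary_search.c; it only mirrors the loop structure of that function.

```c
/* Hypothetical helper: number of key checks monobound_binary_search
   performs for one lookup, mirroring its loop structure. */
unsigned int monobound_checks(unsigned int array_size)
{
	unsigned int top = array_size, count = 0;

	if (array_size == 0)
	{
		return 0; /* the search returns before any check */
	}

	while (top > 1)
	{
		top -= top / 2; /* same update as "mid = top / 2; top -= mid;" */
		++count;        /* one key check per loop iteration */
	}
	return count + 1;   /* plus the final equality check */
}
```

For 128 items this gives 8 checks per lookup, and with the 10,000 lookups per run shown in the Hits and Misses columns that is the 80000 reported in the table. The count does not depend on the key, which is what makes the loop predictable enough to unroll.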

Large array benchmark graph
---------------------------
The following benchmark was run on WSL 2 with gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04). The source code was compiled using `gcc -O3 binary_search.c`. Each test was run 10,000 times, with the time (in seconds) of the best run reported.

The graph below shows the execution speed on arrays with 10, 100, 1000, 10000, 100000, and 1000000 elements on an Intel i3 quad-core processor.

![binary search graph](/binary_search.png)

data table

| Name | Items | Hits | Misses | Checks | Time |
| ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
| standard_binary_search | 10 | 910 | 9090 | 43646 | 0.000182 |
| boundless_binary_search | 10 | 910 | 9090 | 43646 | 0.000156 |
| monobound_binary_search | 10 | 910 | 9090 | 50000 | 0.000060 |
| monobound_interpolated_search | 10 | 910 | 9090 | 64027 | 0.000203 |
| | | | | | |
| standard_binary_search | 100 | 1047 | 8953 | 77085 | 0.000361 |
| boundless_binary_search | 100 | 1047 | 8953 | 77085 | 0.000292 |
| monobound_binary_search | 100 | 1047 | 8953 | 80000 | 0.000096 |
| monobound_interpolated_search | 100 | 1047 | 8953 | 92421 | 0.000234 |
| | | | | | |
| standard_binary_search | 1000 | 1041 | 8959 | 109808 | 0.000610 |
| boundless_binary_search | 1000 | 1041 | 8959 | 109808 | 0.000489 |
| monobound_binary_search | 1000 | 1041 | 8959 | 110000 | 0.000137 |
| monobound_interpolated_search | 1000 | 1041 | 8959 | 108509 | 0.000147 |
| | | | | | |
| standard_binary_search | 10000 | 1024 | 8976 | 143580 | 0.000804 |
| boundless_binary_search | 10000 | 1024 | 8976 | 143580 | 0.000651 |
| monobound_binary_search | 10000 | 1024 | 8976 | 150000 | 0.000204 |
| monobound_interpolated_search | 10000 | 1024 | 8976 | 109353 | 0.000202 |
| | | | | | |
| standard_binary_search | 100000 | 1040 | 8960 | 176860 | 0.001087 |
| boundless_binary_search | 100000 | 1040 | 8960 | 176860 | 0.000903 |
| monobound_binary_search | 100000 | 1040 | 8960 | 180000 | 0.000360 |
| monobound_interpolated_search | 100000 | 1040 | 8960 | 123144 | 0.000290 |
| | | | | | |
| standard_binary_search | 1000000 | 993 | 9007 | 209529 | 0.001570 |
| boundless_binary_search | 1000000 | 993 | 9007 | 209529 | 0.001369 |
| monobound_binary_search | 1000000 | 993 | 9007 | 210000 | 0.000691 |
| monobound_interpolated_search | 1000000 | 993 | 9007 | 124870 | 0.000374 |
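
The interpolated rows above need fewer checks on the larger arrays because the search starts from an estimated position rather than from the middle. A minimal sketch of that estimate follows, modelled on the expression `bot *= (float) (key - min) / (max - min)` in binary_search.c. The helper name `interpolated_guess` is hypothetical, and the sketch uses `double` where the original casts to `float`.

```c
/* Hypothetical helper, for illustration only: estimates where `key` would
   sit in a sorted, evenly distributed array. Assumes array_size >= 2 and
   array[0] <= key < array[array_size - 1]; binary_search.c checks both
   bounds before it interpolates. */
unsigned int interpolated_guess(const int *array, unsigned int array_size, int key)
{
	unsigned int last = array_size - 1;
	double min = array[0];
	double max = array[last];

	/* fraction of the value range covered by key, scaled to an index */
	return (unsigned int) (last * ((key - min) / (max - min)));
}
```

On the ten values 0, 10, 20, up to 90, a key of 45 lands on index 4, which is where the search would then begin its exponential probing. On an uneven distribution the estimate is further off, so more probes are needed afterwards.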

monobound_bsearch() vs bsearch()
--------------------------------
The following benchmark was run on WSL 2 with gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04). The source code was compiled using `gcc -O3 monobound_bsearch.c`. Each test was run 1,000 times, with the time (in seconds) of the best run reported.

The graph below shows the execution speed on arrays with 10, 100, 1K, 10K, 100K, 1M, and 10M elements on an Intel i3 quad-core processor. The bsearch function is the one provided by stdlib.h.

![binary search graph](/graph3.png)

data table

| Name | Items | Hits | Misses | Checks | Time |
| ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
| monobound | 10 | 930 | 9070 | 48149 | 0.000136 |
| bsearch | 10 | 930 | 9070 | 34677 | 0.000202 |
| | | | | | |
| monobound | 100 | 1103 | 8897 | 77539 | 0.000189 |
| bsearch | 100 | 1103 | 8897 | 66470 | 0.000410 |
| | | | | | |
| monobound | 1000 | 1033 | 8967 | 107845 | 0.000265 |
| bsearch | 1000 | 1033 | 8967 | 98703 | 0.000623 |
| | | | | | |
| monobound | 10000 | 1033 | 8967 | 147232 | 0.000357 |
| bsearch | 10000 | 1033 | 8967 | 132342 | 0.000820 |
| | | | | | |
| monobound | 100000 | 1014 | 8986 | 177576 | 0.000539 |
| bsearch | 100000 | 1014 | 8986 | 165785 | 0.001111 |
| | | | | | |
| monobound | 1000000 | 998 | 9002 | 207938 | 0.001124 |
| bsearch | 1000000 | 998 | 9002 | 198443 | 0.001603 |
| | | | | | |
| monobound | 10000000 | 974 | 9026 | 247324 | 0.002641 |
| bsearch | 10000000 | 974 | 9026 | 232174 | 0.003784 |
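
Because monobound_bsearch() takes the same arguments as bsearch() from stdlib.h, calling code looks the same for both. The sketch below uses the standard bsearch() so that it compiles on its own; `struct entry`, `cmp_entry` and `find_entry` are hypothetical names chosen for illustration and are not part of this repository.

```c
#include <stdlib.h>

/* Hypothetical record type, used only for illustration. */
struct entry
{
	int id;
	const char *name;
};

/* Comparator with the signature bsearch() expects: the key is passed
   first, the array element second. */
static int cmp_entry(const void *key, const void *element)
{
	int a = ((const struct entry *) key)->id;
	int b = ((const struct entry *) element)->id;

	return (a > b) - (a < b); /* avoids the overflow risk of a - b */
}

/* Looks up an id in a table sorted by id; returns NULL on a miss. */
const struct entry *find_entry(const struct entry *table, size_t count, int id)
{
	struct entry key = { id, NULL };

	return bsearch(&key, table, count, sizeof(struct entry), cmp_entry);
}
```

Swapping in monobound_bsearch() should only require changing the function name, assuming its definition from monobound_bsearch.c is compiled in. Neither function promises which element is returned when several compare equal.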

--------------------------------------------------------------------------------
/binary_search.c:
--------------------------------------------------------------------------------
/*
	Copyright (C) 2014-2021 Igor van den Hoven ivdhoven@gmail.com
*/

/*
	Permission is hereby granted, free of charge, to any person obtaining
	a copy of this software and associated documentation files (the
	"Software"), to deal in the Software without restriction, including
	without limitation the rights to use, copy, modify, merge, publish,
	distribute, sublicense, and/or sell copies of the Software, and to
	permit persons to whom the Software is furnished to do so, subject to
	the following conditions:

	The above copyright notice and this permission notice shall be
	included in all copies or substantial portions of the Software.

	THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
	EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
	MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
	IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
	CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
	TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
	SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/

/*
	Binary Search v1.7

	Compile using: gcc -O3 binary_search.c
*/

#include <stdio.h>    // printf
#include <stdlib.h>   // malloc, free, rand, srand, atoi, qsort
#include <sys/time.h> // gettimeofday
#include <time.h>     // time

unsigned int checks;

// linear search, needs to run backwards so it's stable

int linear_search(int *array, unsigned int array_size, int key)
{
	unsigned int top = array_size;

	while (top--)
	{
		++checks;

		if (key == array[top])
		{
			return top;
		}
	}
	return -1;
}

// faster than linear on larger arrays

int breaking_linear_search(int *array, unsigned int array_size, int key)
{
	unsigned int top = array_size;

	if (array_size == 0)
	{
		return -1;
	}

	while (--top)
	{
		++checks;

		if (key >= array[top])
		{
			break;
		}
	}
	++checks;

	if (key == array[top])
	{
		return top;
	}
	return -1;
}

// the standard binary search from text books

int standard_binary_search(int *array, unsigned int array_size, int key)
{
	int bot, mid, top;

	if (array_size == 0)
	{
		return -1;
	}

	bot = 0;
	top = array_size - 1;

	while (bot < top)
	{
		mid = top - (top - bot) / 2;

		++checks;

		if (key < array[mid])
		{
			top = mid - 1;
		}
		else
		{
			bot = mid;
		}
	}

	++checks;

	if (key == array[top])
	{
		return top;
	}
	return -1;
}

// faster than the standard binary search, same number of checks

int boundless_binary_search(int *array, unsigned int array_size, int key)
{
	unsigned int mid, bot;

	if (array_size == 0)
	{
		return -1;
	}
	bot = 0;
	mid = array_size;

	while (mid > 1)
	{
		++checks;

		if (key >= array[bot + mid / 2])
		{
			bot += mid++ / 2;
		}
		mid /= 2;
	}

	++checks;

	if (key == array[bot])
	{
		return bot;
	}

	return -1;
}

// always double tap ⁍⁍

int doubletapped_binary_search(int *array, unsigned int array_size, int key)
{
	unsigned int mid, bot;

	bot = 0;
	mid = array_size;

	while (mid > 2)
	{
		++checks;

		if (key >= array[bot + mid / 2])
		{
			bot += mid++ / 2;
		}
		mid /= 2;
	}

	while (mid--)
	{
		++checks;

		if (key == array[bot + mid])
		{
			return bot + mid;
		}
	}

	return -1;
}

// faster than the boundless binary search, more checks

int monobound_binary_search(int *array, unsigned int array_size, int key)
{
	unsigned int bot, mid, top;

	if (array_size == 0)
	{
		return -1;
	}
	bot = 0;
	top = array_size;

	while (top > 1)
	{
		mid = top / 2;

		++checks;

		if (key >= array[bot + mid])
		{
			bot += mid;
		}
		top -= mid;
	}

	++checks;

	if (key == array[bot])
	{
		return bot;
	}
	return -1;
}

// heck, always triple tap ⁍⁍⁍

int tripletapped_binary_search(int *array, unsigned int array_size, int key)
{
	unsigned int bot, mid, top;

	bot = 0;
	top = array_size;

	while (top > 3)
	{
		mid = top / 2;

		++checks;

		if (key >= array[bot + mid])
		{
			bot += mid;
		}
		top -= mid;
	}

	while (top--)
	{
		++checks;

		if (key == array[bot + top])
		{
			return bot + top;
		}
	}
	return -1;
}

// better performance on large arrays

int monobound_quaternary_search(int *array, unsigned int array_size, int key)
{
	unsigned int bot, mid, top;

	if (array_size == 0)
	{
		return -1;
	}
	bot = 0;
	top = array_size;

	while (top >= 65536)
	{
		mid = top / 4;
		top -= mid * 3;

		++checks;
		if (key < array[bot + mid * 2])
		{
			++checks;
			if (key >= array[bot + mid])
			{
				bot += mid;
			}
		}
		else
		{
			bot += mid * 2;

			++checks;
			if (key >= array[bot + mid])
			{
				bot += mid;
			}
		}
	}

	while (top > 3)
	{
		mid = top / 2;

		++checks;

		if (key >= array[bot + mid])
		{
			bot += mid;
		}
		top -= mid;
	}

	while (top--)
	{
		++checks;

		if (key == array[bot + top])
		{
			return bot + top;
		}
	}
	return -1;
}

// requires an even distribution

int monobound_interpolated_search(int *array, unsigned int array_size, int key)
{
	unsigned int bot, mid, top;
	int min, max;

	if (array_size == 0)
	{
		return -1;
	}

	++checks;

	if (key < array[0])
	{
		return -1;
	}

	bot = array_size - 1;

	++checks;

	if (key >= array[bot])
	{
		return ++checks && array[bot] == key ? bot : -1;
	}

	min = array[0];
	max = array[bot];

	// initial guess, proportional to the key's position in the value range
	bot *= (float) (key - min) / (max - min);

	top = 64;

	++checks;

	// exponential search away from the guess to bracket the key
	if (key >= array[bot])
	{
		while (1)
		{
			if (bot + top >= array_size)
			{
				top = array_size - bot;
				break;
			}
			bot += top;

			++checks;

			if (key < array[bot])
			{
				bot -= top;
				break;
			}
			top *= 2;
		}
	}
	else
	{
		while (1)
		{
			if (bot < top)
			{
				top = bot;
				bot = 0;

				break;
			}
			bot -= top;

			++checks;

			if (key >= array[bot])
			{
				break;
			}
			top *= 2;
		}
	}

	while (top > 3)
	{
		mid = top / 2;

		++checks;

		if (key >= array[bot + mid])
		{
			bot += mid;
		}
		top -= mid;
	}

	while (top--)
	{
		++checks;

		if (key == array[bot + top])
		{
			return bot + top;
		}
	}

	return -1;
}

// requires in order sequential access

int adaptive_binary_search(int *array, unsigned int array_size, int key)
{
	static unsigned int i, balance;
	unsigned int bot, top, mid;

	if (balance >= 32 || array_size <= 64)
	{
		bot = 0;
		top = array_size;

		goto monobound;
	}
	bot = i;
	top = 32;

	++checks;

	if (key >= array[bot])
	{
		while (1)
		{
			if (bot + top >= array_size)
			{
				top = array_size - bot;
				break;
			}
			bot += top;

			++checks;

			if (key < array[bot])
			{
				bot -= top;
				break;
			}
			top *= 2;
		}
	}
	else
	{
		while (1)
		{
			if (bot < top)
			{
				top = bot;
				bot = 0;

				break;
			}
			bot -= top;

			++checks;

			if (key >= array[bot])
			{
				break;
			}
			top *= 2;
		}
	}

	monobound:

	while (top > 3)
	{
		mid = top / 2;

		++checks;

		if (key >= array[bot + mid])
		{
			bot += mid;
		}
		top -= mid;
	}
	balance = i > bot ? i - bot : bot - i;

	i = bot;

	while (top)
	{
		++checks;

		if (key == array[bot + --top])
		{
			return bot + top;
		}
	}
	return -1;
}

// benchmark

long long utime()
{
	struct timeval now_time;

	gettimeofday(&now_time, NULL);

	return now_time.tv_sec * 1000000LL + now_time.tv_usec;
}

int *o_array, *r_array;
int density, max, loop, top, rnd, runs, sequential;
long long start, end, best;

void execute(int (*algo_func)(int *, unsigned int, int), const char * algo_name)
{
	long long stable, value;
	unsigned int cnt, hit, miss;

	srand(rnd);

	best = 0;

	for (int run = runs ; run ; --run)
	{
		checks = 0;
		hit = 0;
		miss = 0;

		if (sequential)
		{
			stable = 0;

			start = utime();

			for (cnt = 0 ; cnt < loop ; cnt++)
			{
				value = algo_func(o_array, max, r_array[cnt]);

				stable += value;

				if (value >= 0)
				{
					hit++;
				}
				else
				{
					miss++;
				}
			}
		}
		else
		{
			start = utime();

			for (cnt = 0 ; cnt < loop ; cnt++)
			{
				if (algo_func(o_array, max, r_array[cnt]) >= 0)
				{
					hit++;
				}
				else
				{
					miss++;
				}
			}
		}
		end = utime();

		if (best == 0 || end - start < best)
		{
			best = end - start;
		}
	}

	if (sequential)
	{
		printf("| %30s | %10d | %10d | %10d | %10d | %10f | %10lld |\n", algo_name, max, hit, miss, checks, best / 1000000.0, stable);
	}
	else
	{
		printf("| %30s | %10d | %10d | %10d | %10d | %10f |\n", algo_name, max, hit, miss, checks, best / 1000000.0);
	}

}

#define run(algo) execute(&algo, #algo)

int cmp_int(const void * a, const void * b)
{
	return *(int *) a - *(int *) b;
}

int main(int argc, char **argv)
{
	int cnt, val;

	sequential = 0;
	max = 100000;
	loop = 10000;
	density = 10; // max * density should stay under 2 billion
	runs = 1000;

	rnd = time(NULL);

	if (argc > 1)
		max = atoi(argv[1]);

	if (argc > 2)
		runs = atoi(argv[2]);

	if (argc > 3)
		loop = atoi(argv[3]);

	if (argc > 4)
		rnd = atoi(argv[4]);

	o_array = (int *) malloc(max * sizeof(int));
	r_array = (int *) malloc(loop * sizeof(int));

	if ((long long) max * (long long) density > 2000000000)
	{
		density = 2;
	}

	for (cnt = 0, val = 0 ; cnt < max ; cnt++)
	{
		o_array[cnt] = (val += rand() % (density * 2));
	}

	top = o_array[max - 1] + density;

	srand(rnd);

	for (cnt = 0 ; cnt < loop ; cnt++)
	{
		r_array[cnt] = rand() % top;
	}

	printf("Benchmark: array size: %d, runs: %d, repetitions: %d, seed: %d, density: %d\n\n", max, runs, loop, rnd, density);

	printf("Even distribution with %d 32 bit integers, random access\n\n", max);

	printf("| %30s | %10s | %10s | %10s | %10s | %10s |\n", "Name", "Items", "Hits", "Misses", "Checks", "Time");
	printf("| %30s | %10s | %10s | %10s | %10s | %10s |\n", "----------", "----------", "----------", "----------", "----------", "----------");

	if (max <= 128 && max != 10 && max != 100)
	{
		run(linear_search);
		run(breaking_linear_search);
	}
	run(standard_binary_search);
	run(boundless_binary_search);
	run(doubletapped_binary_search);
	run(monobound_binary_search);
	run(tripletapped_binary_search);
	run(monobound_quaternary_search);
	run(monobound_interpolated_search);
	run(adaptive_binary_search);

	// uneven distribution

	for (cnt = 0 ; cnt < max / 2 ; cnt++)
	{
		o_array[cnt] = cnt - cnt % 2;
	}

	top = o_array[max - 1] + 2;

	printf("\n\nUneven distribution with %d 32 bit integers, random access\n\n", max);

	printf("| %30s | %10s | %10s | %10s | %10s | %10s |\n", "Name", "Items", "Hits", "Misses", "Checks", "Time");
	printf("| %30s | %10s | %10s | %10s | %10s | %10s |\n", "----------", "----------", "----------", "----------", "----------", "----------");

	run(monobound_binary_search);
	run(monobound_interpolated_search);
	run(adaptive_binary_search);

	// sequential access, check stability while at it

	sequential = 1;

	qsort(r_array, loop, sizeof(int), cmp_int);

	printf("\n\nUneven distribution with %d 32 bit integers, sequential access\n\n", max);

	printf("| %30s | %10s | %10s | %10s | %10s | %10s | %10s |\n", "Name", "Items", "Hits", "Misses", "Checks", "Time", "Stability");
	printf("| %30s | %10s | %10s | %10s | %10s | %10s | %10s |\n", "----------", "----------", "----------", "----------", "----------", "----------", "----------");

	if (max <= 128 && max != 10 && max != 100)
	{
		run(linear_search);
		run(breaking_linear_search);
	}
	run(standard_binary_search);
	run(boundless_binary_search);
	run(doubletapped_binary_search);
	run(monobound_binary_search);
	run(tripletapped_binary_search);
	run(monobound_quaternary_search);
	run(monobound_interpolated_search);
	run(adaptive_binary_search);

	free(o_array);
	free(r_array);

	return 0;
}

--------------------------------------------------------------------------------
/binary_search.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scandum/binary_search/ff7c12a4704018cd84b011bfcbe333257a941aa2/binary_search.png

--------------------------------------------------------------------------------
/binary_search_small.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scandum/binary_search/ff7c12a4704018cd84b011bfcbe333257a941aa2/binary_search_small.png

--------------------------------------------------------------------------------
/graph3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scandum/binary_search/ff7c12a4704018cd84b011bfcbe333257a941aa2/graph3.png

--------------------------------------------------------------------------------
/monobound_bsearch.c:
--------------------------------------------------------------------------------
/*
	Copyright (C) 2014-2021 Igor van den Hoven ivdhoven@gmail.com
*/

/*
	Permission is hereby granted, free of charge, to any person obtaining
	a copy of this software and associated documentation files (the
	"Software"), to deal in the Software without restriction, including
	without limitation the rights to use, copy, modify, merge, publish,
	distribute, sublicense, and/or sell copies of the Software, and to
	permit persons to whom the Software is furnished to do so, subject to
	the following conditions:

	The above copyright notice and this permission notice shall be
	included in all copies or substantial portions of the Software.

	THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
	EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
	MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
	IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
	CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
	TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
	SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/

/*
	v1.7

	Compile using: gcc -O3 monobound_bsearch.c
*/

#include <stdio.h>    // printf
#include <stdlib.h>   // bsearch, malloc, free, rand, srand, atoi
#include <sys/time.h> // gettimeofday
#include <time.h>     // time

int checks;

// Avoid inlining to guarantee the benchmark is fair.

__attribute__ ((noinline)) int cmp_int(const void * a, const void * b)
{
	int fa = *(int *)a;
	int fb = *(int *)b;

	checks++;

	return fa - fb;
}

// Monobound binary search using gcc's bsearch() interface. Since the
// comparison is slow there's no deferred detection of equality.

void *monobound_bsearch(const void *key, const void *array, size_t nmemb, size_t size, int (*cmp)(const void *, const void *))
{
	size_t mid, top;
	int val;
	char *piv, *base = (char *) array;

	mid = top = nmemb;

	while (mid)
	{
		mid = top / 2;

		piv = base + mid * size;

		val = cmp(key, piv);

		if (val == 0)
		{
			return piv;
		}
		if (val >= 0)
		{
			base = piv;
		}
		top -= mid;
	}
	return NULL;
}

// benchmark

long long utime()
{
	struct timeval now_time;

	gettimeofday(&now_time, NULL);

	return now_time.tv_sec * 1000000LL + now_time.tv_usec;
}

int *o_array, *r_array;
int density, max, loop, top, rnd, runs;
long long start, end, best;

void execute(void *(*algo_func)(const void *, const void *, size_t, size_t, int (*cmp)(const void *, const void *)), const char * algo_name)
{
	unsigned int cnt, hit, miss;

	srand(rnd);

	best = 0;

	for (int run = runs ; run ; --run)
	{
		hit = 0;
		miss = 0;
		checks = 0;

		start = utime();

		for (cnt = 0 ; cnt < loop ; cnt++)
		{
			if (algo_func(r_array + cnt, o_array, max, sizeof(int), cmp_int) != NULL)
			{
				hit++;
			}
			else
			{
				miss++;
			}
		}
		end = utime();

		if (best == 0 || end - start < best)
		{
			best = end - start;
		}
	}

	printf("| %30s | %10d | %10d | %10d | %10d | %10f |\n", algo_name, max, hit, miss, checks, best / 1000000.0);
}

#define run(algo) execute(&algo, #algo)

int main(int argc, char **argv)
{
	int cnt, val;

	max = 100000;
	loop = 10000;
	density = 10; // max * density should stay under 2 billion
	runs = 1000;

	rnd = time(NULL);

	if (argc > 1)
		max = atoi(argv[1]);

	if (argc > 2)
		runs = atoi(argv[2]);

	if (argc > 3)
		loop = atoi(argv[3]);

	if (argc > 4)
		rnd = atoi(argv[4]);

	o_array = (int *) malloc(max * sizeof(int));
	r_array = (int *) malloc(loop * sizeof(int));

	if ((long long) max * (long long) density > 2000000000)
	{
		density = 2;
	}

	for (cnt = 0, val = 0 ; cnt < max ; cnt++)
	{
		o_array[cnt] = (val += rand() % (density * 2));
	}

	top = o_array[max - 1] + density;

	srand(rnd);

	for (cnt = 0 ; cnt < loop ; cnt++)
	{
		r_array[cnt] = rand() % top;
	}

	printf("Benchmark: array size: %d, runs: %d, repetitions: %d, seed: %d, density: %d\n\n", max, runs, loop, rnd, density);

	printf("Even distribution with %d 32 bit integers, random access\n\n", max);

	printf("| %30s | %10s | %10s | %10s | %10s | %10s |\n", "Name", "Items", "Hits", "Misses", "Checks", "Time");
	printf("| %30s | %10s | %10s | %10s | %10s | %10s |\n", "----------", "----------", "----------", "----------", "----------", "----------");

	run(monobound_bsearch);
	run(bsearch);

	free(o_array);
	free(r_array);

	return 0;
}

--------------------------------------------------------------------------------