├── LICENSE
├── README.md
├── images
    ├── crumsort.gif
    ├── graph1.png
    └── graph2.png
└── src
    ├── bench.c
    ├── crumsort.c
    ├── crumsort.h
    ├── quadsort.c
    └── quadsort.h

/LICENSE:
--------------------------------------------------------------------------------
This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.

In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
Intro
-----
This document describes a hybrid quicksort / mergesort named crumsort. The sort is in-place, unstable, adaptive, branchless, and has exceptional performance.

Analyzer
--------
Crumsort starts out with an analyzer that sorts fully in-order or reverse-order arrays using n comparisons. It also obtains a measure of presortedness for 4 segments of the array and switches to [quadsort](https://github.com/scandum/quadsort) if a segment is more than 50% ordered.

Partitioning
------------
Partitioning is performed in a top-down manner similar to quicksort. Crumsort obtains the pseudomedian of 9 for partitions smaller than 2048 elements, and the median of 16 for partitions smaller than 65536. For larger partitions crumsort obtains the median of 128, 256, or 512 as an approximation of the cube root of the partition size. While the square root is optimal in theory, the law of diminishing returns appears to apply to increasing the number of pivot candidates. Hardware limitations need to be factored in as well.

For large partitions crumsort will swap 128-512 random elements to the start of the array, sort them with quadsort, and take the center right element. Using pseudomedian selection instead of median selection on large arrays is slower, likely due to cache pollution.

The median element obtained will be referred to as the pivot. Partitions that grow smaller than 24 elements are sorted with quadsort.

Fulcrum Partition
-----------------
After obtaining a pivot, the array is parsed from start to end using the fulcrum partitioning scheme. The scheme is similar to the original quicksort scheme known as the Hoare partition, with some notable differences. The differences are perhaps best explained with two code examples.
```c
// Hoare-style partition: the pivot stays at array[head] until the end
int hoare_partition(int array[], int head, int tail)
{
    int pivot = head++;
    int swap;

    while (1)
    {
        // test the bound first so array[head] is never read past tail
        while (head < tail && array[head] <= array[pivot])
        {
            head++;
        }

        while (array[tail] > array[pivot])
        {
            tail--;
        }

        if (head >= tail)
        {
            // move the pivot into its final position
            swap = array[pivot]; array[pivot] = array[tail]; array[tail] = swap;

            return tail;
        }
        // exchange the out-of-place pair, costing 3 assignments
        swap = array[head]; array[head] = array[tail]; array[tail] = swap;
    }
}
```

```c
// Fulcrum partition: the pivot value is kept in a local variable, which
// turns array[head] into a one element hole that moves back and forth
int fulcrum_partition(int array[], int head, int tail)
{
    int pivot = array[head];

    while (1)
    {
        // scan from the right for an element that belongs on the left
        if (array[tail] > pivot)
        {
            tail--;
            continue;
        }

        if (head >= tail)
        {
            array[head] = pivot;
            return head;
        }
        array[head++] = array[tail]; // fill the hole on the left, the hole moves to tail

        while (1)
        {
            if (head >= tail)
            {
                array[head] = pivot;
                return head;
            }

            // scan from the left for an element that belongs on the right
            if (array[head] <= pivot)
            {
                head++;
                continue;
            }
            array[tail--] = array[head]; // fill the hole on the right, the hole moves to head
            break;
        }
    }
}
```
Instead of using swaps, the fulcrum partition creates a 1 element swap space, with the pivot variable holding the original data. Doing so turns the 3 assignments of a swap into 2 assignments. Overall, the fulcrum partition gives a 10-20% performance improvement.

The swap space of the fulcrum partition can easily be increased from 1 to 32 elements, allowing it to perform 16 boundless comparisons (comparisons that need no bounds check) at a time, which in turn allows these comparisons to be performed in a branchless manner with little additional overhead.

Worst case handling
-------------------
To avoid runaway recursion, crumsort switches to quadsort for both partitions if one partition is less than 1/16th the size of the other partition.
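
The 1/16th rule amounts to a simple size comparison. The sketch below is an illustration only, not code taken from crumsort: the helper name `partition_is_unbalanced` and its arguments (the element counts of the two partitions) are assumptions.

```c
#include <stddef.h>

// Hypothetical helper (not part of crumsort's API): returns 1 when one
// partition holds less than 1/16th as many elements as the other, the
// condition under which both partitions are handed to quadsort.
static int partition_is_unbalanced(size_t left, size_t right)
{
    // Comparing left * 16 with right avoids the rounding that
    // integer division (right / 16) would introduce.
    return left * 16 < right || right * 16 < left;
}
```

For example, a 3 / 97 split is flagged (3 * 16 = 48 is less than 97), while a 10 / 90 split is not (10 * 16 = 160).
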
On a distribution of random unique values, the observed chance of a false positive is 1 in 1,336 for the pseudomedian of 9 and approximately 1 in 500,000 for the median of 16.

Combined with the analyzer that crumsort starts out with, this makes killer patterns unlikely: the most such a pattern is expected to cause is a 33-50% slowdown by prematurely triggering the use of quadsort.

Branchless optimizations
------------------------
Crumsort uses a branchless comparison optimization. Branchless quicksort partitioning was first described in "BlockQuicksort: How Branch Mispredictions don't affect Quicksort" by Stefan Edelkamp and Armin Weiss. Crumsort uses the fulcrum partitioning scheme, whereas BlockQuicksort uses a scheme resembling Hoare partitioning.

Median selection uses a branchless comparison technique that selects the pseudomedian of 9 using 12 comparisons, and the pseudomedian of 25 using 42 comparisons.

These optimizations do not work as well when the comparisons themselves are branched. The largest performance increase is on 32 and 64 bit integers.

Generic data optimizations
--------------------------
Crumsort uses a method that mimics dual-pivot quicksort to improve generic data handling. If, after a partition, all elements are smaller than or equal to the pivot, a reverse partition is performed that filters out all elements equal to the pivot, after which sorting carries on as usual. This typically only occurs when sorting tables with many identical values, like gender, age, etc. Crumsort has a small bias in its pivot selection to increase the odds of this happening. In addition, generic data performance is improved slightly by checking if the same pivot is chosen twice in a row, in which case it performs a reverse partition as well. Pivot retention was first introduced by [pdqsort](https://github.com/orlp/pdqsort).
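
The reverse partition idea can be illustrated with a short sketch. It assumes every element of the range is already known to be less than or equal to the pivot, and uses a simple Lomuto-style loop over `int`s for clarity. Crumsort's own implementation is organized differently and works through a comparison function, so the name `reverse_partition` and its signature are hypothetical.

```c
#include <stddef.h>

// Hypothetical sketch of a reverse partition over ints.
// Precondition: every element of array[0..n) is <= pivot.
// Elements strictly smaller than the pivot are gathered at the front;
// elements equal to the pivot end up at the back, which is already
// their final sorted position. Returns the number of smaller elements,
// i.e. the length of the prefix that still needs sorting.
static size_t reverse_partition(int *array, size_t n, int pivot)
{
    size_t smaller = 0; // array[0..smaller) < pivot, array[smaller..i) == pivot

    for (size_t i = 0; i < n; i++)
    {
        if (array[i] < pivot) // strictly smaller: equal elements stay behind
        {
            int tmp = array[smaller];
            array[smaller] = array[i];
            array[i] = tmp;
            smaller++;
        }
    }
    return smaller;
}
```

With `{3, 1, 3, 2, 3, 0}` and a pivot of 3, the function returns 3 and leaves `{1, 2, 0, 3, 3, 3}`, so only the first three elements need further partitioning.
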

Small array optimizations
-------------------------
Most modern quicksorts use insertion sort for partitions smaller than 24 elements. Crumsort uses quadsort, which has a dedicated small array sorting routine that outperforms insertion sort.

Data Types
----------
The C implementation of crumsort supports long doubles and 8, 16, 32, and 64 bit data types. By using pointers it's possible to sort any other data type, like strings.

Interface
---------
Crumsort uses the same interface as qsort, which is described in [man qsort](https://man7.org/linux/man-pages/man3/qsort.3p.html).

Crumsort comes with the `crumsort_prim(void *array, size_t nmemb, size_t size)` function to perform primitive comparisons on arrays of 32 and 64 bit integers. `nmemb` is the number of elements. `size` should be either `sizeof(int)` or `sizeof(long long)` for signed integers, and `sizeof(int) + 1` or `sizeof(long long) + 1` for unsigned integers. Support for additional primitive types, as well as custom types, can be added to crumsort.h and quadsort.h.

Porting
-------
People wanting to port crumsort might want to have a look at [fluxsort](https://github.com/scandum/fluxsort), which is a little bit simpler because it's stable and out of place. There's also [piposort](https://github.com/scandum/piposort), a simplified implementation of quadsort.

Memory
------
Crumsort uses 512 elements of stack memory, which is shared with quadsort. Recursion requires log n stack memory.

Crumsort can be configured to use sqrt(n) memory, with a minimum memory requirement of 32 elements.

Performance
-----------
Crumsort will begin to outperform fluxsort on random data right around 1,000,000 elements. Since it runs on 512 elements of auxiliary memory, sorting ordered data will be slower than with fluxsort for larger arrays.

Because crumsort is unstable, it scrambles pre-existing patterns, making it less adaptive than fluxsort, which switches to quadsort when it detects the emergence of ordered data during the partitioning phase.

Because of its partitioning scheme, crumsort is slower than pdqsort when sorting arrays of long doubles. Fixing this is on my todo list, and I've devised a scheme to do so. The main focus of crumsort is the sorting of tables, however, and crumsort will outperform pdqsort when long doubles are embedded within a table. In that case crumsort only has to move a 64 bit pointer instead of a 128 bit float.

To take full advantage of branchless operations, the `cmp` macro needs to be uncommented in bench.c, which roughly doubles performance on primitive types. The `crumsort_prim()` function can be used to access primitive comparisons directly. In the case of 64 bit integers, crumsort outperforms all radix sorts I've tested so far. Radix sorts still hold an advantage for 32 bit integers on arrays under 1 million elements.
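
The effect of the `cmp` macro can be illustrated with a small branchless routine. In its macro form the comparison is an inline `>` on the values rather than a call through a qsort-style function pointer, and its 0 or 1 result can be used as an array index instead of a jump. The helpers `sort_pair` and `sort_three` below are hypothetical examples, not functions from crumsort, and whether a compiler actually emits branch-free instructions depends on the compiler and target.

```c
#include <stddef.h>

// Same form as the commented-out macro in bench.c: 1 if *a > *b, else 0.
#define cmp(a, b) (*(a) > *(b))

// Hypothetical example: put array[0] and array[1] in ascending order
// without an if statement. The comparison result selects which element
// is read first, so there is no data-dependent branch to mispredict.
static void sort_pair(int *array)
{
    size_t x = cmp(&array[0], &array[1]); // 1 when the pair is out of order

    int lo = array[x];  // smaller element
    int hi = array[!x]; // larger element

    array[0] = lo;
    array[1] = hi;
}

// Three compare-exchange steps form a sorting network for 3 elements.
static void sort_three(int *array)
{
    sort_pair(array);     // order positions 0 and 1
    sort_pair(array + 1); // order positions 1 and 2, the maximum lands at 2
    sort_pair(array);     // order positions 0 and 1 again
}
```
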

Big O
-----
```
                 ┌───────────────────────┐┌───────────────────────┐
                 │comparisons            ││swap memory            │
┌───────────────┐├───────┬───────┬───────┤├───────┬───────┬───────┤┌──────┐┌─────────┐┌─────────┐
│name           ││min    │avg    │max    ││min    │avg    │max    ││stable││partition││adaptive │
├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤
│fluxsort       ││n      │n log n│n log n││1      │n      │n      ││yes   ││yes      ││yes      │
├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤
│quadsort       ││n      │n log n│n log n││1      │n      │n      ││yes   ││no       ││yes      │
├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤
│quicksort      ││n log n│n log n│n²     ││1      │1      │1      ││no    ││yes      ││no       │
├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤
│crumsort       ││n      │n log n│n log n││1      │1      │1      ││no    ││yes      ││yes      │
└───────────────┘└───────┴───────┴───────┘└───────┴───────┴───────┘└──────┘└─────────┘└─────────┘
```

Variants
--------
- [crumsort-rs](https://github.com/google/crumsort-rs) is a parallelized Rust port of crumsort with a focus on random data.

- [distcrum](https://github.com/mlochbaum/distcrum) is a crumsort / [rhsort](https://github.com/mlochbaum/rhsort) hybrid.

Visualization
-------------
In the visualization below two tests are performed on 512 elements.

1. Random order
2. Random % 10

The upper half shows the swap memory (32 elements) and the bottom half shows the main memory.
Colors are used to differentiate various operations.

[![crumsort benchmark](/images/crumsort.gif)](https://www.youtube.com/watch?v=NRREkZeNaC4)

Benchmarks
----------

The following benchmark was run on WSL with gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04) using the [wolfsort](https://github.com/scandum/wolfsort) benchmark.
The source code was compiled using `g++ -O3 -w -fpermissive bench.c`. Each test was run 100 times on 100,000 elements. A table with the best and average time in seconds can be uncollapsed below the bar graph. Comparisons for fluxsort, crumsort and pdqsort are inlined.

![fluxsort vs crumsort vs pdqsort](https://github.com/scandum/crumsort/blob/main/images/graph1.png)

data table

| Name | Items | Type | Best | Average | Compares | Samples | Distribution |
| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |
| pdqsort | 100000 | 128 | 0.005908 | 0.005975 | 0 | 100 | random order |
| crumsort | 100000 | 128 | 0.008262 | 0.008316 | 0 | 100 | random order |
| fluxsort | 100000 | 128 | 0.008567 | 0.008676 | 0 | 100 | random order |

| Name | Items | Type | Best | Average | Compares | Samples | Distribution |
| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |
| pdqsort | 100000 | 64 | 0.002645 | 0.002668 | 0 | 100 | random order |
| crumsort | 100000 | 64 | 0.001896 | 0.001927 | 0 | 100 | random order |
| fluxsort | 100000 | 64 | 0.001942 | 0.001965 | 0 | 100 | random order |

| Name | Items | Type | Best | Average | Loops | Samples | Distribution |
| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |
| pdqsort | 100000 | 32 | 0.002682 | 0.002705 | 0 | 100 | random order |
| crumsort | 100000 | 32 | 0.001797 | 0.001812 | 0 | 100 | random order |
| fluxsort | 100000 | 32 | 0.001834 | 0.001858 | 0 | 100 | random order |
| | | | | | | | |
| pdqsort | 100000 | 32 | 0.000801 | 0.000807 | 0 | 100 | random % 100 |
| crumsort | 100000 | 32 | 0.000560 | 0.000575 | 0 | 100 | random % 100 |
| fluxsort | 100000 | 32 | 0.000657 | 0.000670 | 0 | 100 | random % 100 |
| | | | | | | | |
| pdqsort | 100000 | 32 | 0.000098 | 0.000099 | 0 | 100 | ascending order |
| crumsort | 100000 | 32 | 0.000043 | 0.000046 | 0 | 100 | ascending order |
| fluxsort | 100000 | 32 | 0.000044 | 0.000044 | 0 | 100 | ascending order |
| | | | | | | | |
| pdqsort | 100000 | 32 | 0.003460 | 0.003483 | 0 | 100 | ascending saw |
| crumsort | 100000 | 32 | 0.000628 | 0.000638 | 0 | 100 | ascending saw |
| fluxsort | 100000 | 32 | 0.000328 | 0.000336 | 0 | 100 | ascending saw |
| | | | | | | | |
| pdqsort | 100000 | 32 | 0.002828 | 0.002850 | 0 | 100 | pipe organ |
| crumsort | 100000 | 32 | 0.000359 | 0.000363 | 0 | 100 | pipe organ |
| fluxsort | 100000 | 32 | 0.000215 | 0.000219 | 0 | 100 | pipe organ |
| | | | | | | | |
| pdqsort | 100000 | 32 | 0.000201 | 0.000203 | 0 | 100 | descending order |
| crumsort | 100000 | 32 | 0.000055 | 0.000055 | 0 | 100 | descending order |
| fluxsort | 100000 | 32 | 0.000055 | 0.000056 | 0 | 100 | descending order |
| | | | | | | | |
| pdqsort | 100000 | 32 | 0.003229 | 0.003260 | 0 | 100 | descending saw |
| crumsort | 100000 | 32 | 0.000637 | 0.000645 | 0 | 100 | descending saw |
| fluxsort | 100000 | 32 | 0.000328 | 0.000332 | 0 | 100 | descending saw |
| | | | | | | | |
| pdqsort | 100000 | 32 | 0.002558 | 0.002579 | 0 | 100 | random tail |
| crumsort | 100000 | 32 | 0.000879 | 0.000895 | 0 | 100 | random tail |
| fluxsort | 100000 | 32 | 0.000626 | 0.000631 | 0 | 100 | random tail |
| | | | | | | | |
| pdqsort | 100000 | 32 | 0.002660 | 0.002677 | 0 | 100 | random half |
| crumsort | 100000 | 32 | 0.001200 | 0.001207 | 0 | 100 | random half |
| fluxsort | 100000 | 32 | 0.001069 | 0.001084 | 0 | 100 | random half |
| | | | | | | | |
| pdqsort | 100000 | 32 | 0.002310 | 0.002328 | 0 | 100 | ascending tiles |
| crumsort | 100000 | 32 | 0.001520 | 0.001534 | 0 | 100 | ascending tiles |
| fluxsort | 100000 | 32 | 0.000294 | 0.000298 | 0 | 100 | ascending tiles |
| | | | | | | | |
| pdqsort | 100000 | 32 | 0.002659 | 0.002681 | 0 | 100 | bit reversal |
| crumsort | 100000 | 32 | 0.001787 | 0.001800 | 0 | 100 | bit reversal |
| fluxsort | 100000 | 32 | 0.001696 | 0.001721 | 0 | 100 | bit reversal |


![fluxsort vs crumsort vs pdqsort](https://github.com/scandum/crumsort/blob/main/images/graph2.png)

data table

| Name | Items | Type | Best | Average | Compares | Samples | Distribution |
| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |
| pdqsort | 10 | 32 | 0.087140 | 0.087436 | 0.0 | 10 | random 10 |
| crumsort | 10 | 32 | 0.049921 | 0.050132 | 0.0 | 10 | random 10 |
| fluxsort | 10 | 32 | 0.048499 | 0.048724 | 0.0 | 10 | random 10 |
| | | | | | | | |
| pdqsort | 100 | 32 | 0.169583 | 0.169940 | 0.0 | 10 | random 100 |
| crumsort | 100 | 32 | 0.113443 | 0.113882 | 0.0 | 10 | random 100 |
| fluxsort | 100 | 32 | 0.113426 | 0.113955 | 0.0 | 10 | random 100 |
| | | | | | | | |
| pdqsort | 1000 | 32 | 0.207200 | 0.207858 | 0.0 | 10 | random 1000 |
| crumsort | 1000 | 32 | 0.135912 | 0.136251 | 0.0 | 10 | random 1000 |
| fluxsort | 1000 | 32 | 0.137019 | 0.138586 | 0.0 | 10 | random 1000 |
| | | | | | | | |
| pdqsort | 10000 | 32 | 0.238297 | 0.239006 | 0.0 | 10 | random 10000 |
| crumsort | 10000 | 32 | 0.158249 | 0.158476 | 0.0 | 10 | random 10000 |
| fluxsort | 10000 | 32 | 0.158445 | 0.158694 | 0.0 | 10 | random 10000 |
| | | | | | | | |
| pdqsort | 100000 | 32 | 0.270447 | 0.270855 | 0.0 | 10 | random 100000 |
| crumsort | 100000 | 32 | 0.181770 | 0.183123 | 0.0 | 10 | random 100000 |
| fluxsort | 100000 | 32 | 0.185907 | 0.186829 | 0.0 | 10 | random 100000 |
| | | | | | | | |
| pdqsort | 1000000 | 32 | 0.303525 | 0.305467 | 0.0 | 10 | random 1000000 |
| crumsort | 1000000 | 32 | 0.206979 | 0.208153 | 0.0 | 10 | random 1000000 |
| fluxsort | 1000000 | 32 | 0.215098 | 0.216294 | 0.0 | 10 | random 1000000 |
| | | | | | | | |
| pdqsort | 10000000 | 32 | 0.338767 | 0.342580 | 0 | 10 | random 10000000 |
| crumsort | 10000000 | 32 | 0.234268 | 0.234664 | 0 | 10 | random 10000000 |
| fluxsort | 10000000 | 32 | 0.264988 | 0.267283 | 0 | 10 | random 10000000 |

--------------------------------------------------------------------------------
/images/crumsort.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scandum/crumsort/d21d7c3aa13212c234967a9aaf037690825916be/images/crumsort.gif
--------------------------------------------------------------------------------
/images/graph1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scandum/crumsort/d21d7c3aa13212c234967a9aaf037690825916be/images/graph1.png
--------------------------------------------------------------------------------
/images/graph2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scandum/crumsort/d21d7c3aa13212c234967a9aaf037690825916be/images/graph2.png
--------------------------------------------------------------------------------
/src/bench.c:
--------------------------------------------------------------------------------
/*
    To compile use either:

    gcc -O3 bench.c

    or

    clang -O3 bench.c

    or

    g++ -O3 bench.c
*/

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>
#include <errno.h>
#include <math.h>

//#define cmp(a,b) (*(a) > *(b)) // uncomment for faster primitive comparisons

const char *sorts[] = { "*", "qsort", "crumsort", "quadsort" };

//#define SKIP_STRINGS
//#define SKIP_DOUBLES
//#define SKIP_LONGS

#if __has_include("blitsort.h")
#include "blitsort.h" // curl "https://raw.githubusercontent.com/scandum/blitsort/master/src/blitsort.{c,h}" -o "blitsort.#1"
#endif
#if __has_include("crumsort.h")
#include "crumsort.h" // curl "https://raw.githubusercontent.com/scandum/crumsort/master/src/crumsort.{c,h}" -o "crumsort.#1"
#endif
#if __has_include("dripsort.h")
#include "dripsort.h"
#endif
#if __has_include("flowsort.h")
#include "flowsort.h"
#endif
#if __has_include("fluxsort.h")
#include "fluxsort.h" // curl "https://raw.githubusercontent.com/scandum/fluxsort/master/src/fluxsort.{c,h}" -o "fluxsort.#1"
#endif
#if __has_include("gridsort.h")
#include "gridsort.h" // curl "https://raw.githubusercontent.com/scandum/gridsort/master/src/gridsort.{c,h}" -o "gridsort.#1"
#endif
#if __has_include("octosort.h")
#include "octosort.h" // curl "https://raw.githubusercontent.com/scandum/octosort/master/src/octosort.{c,h}" -o "octosort.#1"
#endif
#if __has_include("piposort.h")
#include "piposort.h" // curl "https://raw.githubusercontent.com/scandum/piposort/master/src/piposort.{c,h}" -o "piposort.#1"
#endif
#if __has_include("quadsort.h")
#include "quadsort.h" // curl "https://raw.githubusercontent.com/scandum/quadsort/master/src/quadsort.{c,h}" -o "quadsort.#1"
#endif
#if __has_include("skipsort.h")
#include "skipsort.h" // curl "https://raw.githubusercontent.com/scandum/wolfsort/master/src/skipsort.{c,h}" -o "skipsort.#1"
#endif
#if __has_include("wolfsort.h")
#include "wolfsort.h" // curl "https://raw.githubusercontent.com/scandum/wolfsort/master/src/wolfsort.{c,h}" -o "wolfsort.#1"
#endif

#if __has_include("rhsort.c")
#define RHSORT_C
#include "rhsort.c" // curl https://raw.githubusercontent.com/mlochbaum/rhsort/master/rhsort.c > rhsort.c
#endif

#ifdef __GNUG__
#include <algorithm>
#if __has_include("pdqsort.h")
#include "pdqsort.h" // curl https://raw.githubusercontent.com/orlp/pdqsort/master/pdqsort.h > pdqsort.h
#endif
#if __has_include("ska_sort.hpp")
#define SKASORT_HPP
#include "ska_sort.hpp" // curl https://raw.githubusercontent.com/skarupke/ska_sort/master/ska_sort.hpp > ska_sort.hpp
#endif
#if __has_include("timsort.hpp")
#include "timsort.hpp" // curl https://raw.githubusercontent.com/timsort/cpp-TimSort/master/include/gfx/timsort.hpp > timsort.hpp
#endif
#endif

#if __has_include("antiqsort.c")
#include "antiqsort.c"
#endif

//typedef int CMPFUNC (const void *a, const void *b);

typedef void SRTFUNC(void *array, size_t nmemb, size_t size, CMPFUNC *cmpf);


// For full throttle, remove __attribute__ ((noinline)) and comment out
// comparisons++, like so: #define COMPARISON_PP //comparisons++

size_t comparisons;

#define COMPARISON_PP comparisons++

#define NO_INLINE __attribute__ ((noinline))

// primitive type comparison functions

NO_INLINE int cmp_int(const void * a, const void * b)
{
    COMPARISON_PP;

    return *(int *) a - *(int *) b;

//  const int l = *(const int *)a;
//  const int r = *(const int *)b;

//  return l - r;
//  return l > r;
//  return (l > r) - (l < r);
}

NO_INLINE int cmp_rev(const void * a, const void * b)
{
    int fa = *(int *)a;
    int fb = *(int *)b;

    COMPARISON_PP;

    return fb - fa;
}

NO_INLINE int cmp_stable(const void * a, const void * b)
{
    int fa = *(int *)a;
    int fb = *(int *)b;

    COMPARISON_PP;

    return fa / 100000 - fb / 100000;
}

NO_INLINE int cmp_long(const void * a, const void * b)
{
    const long long fa = *(const long long *) a;
    const long long fb = *(const long long *) b;

    COMPARISON_PP;

    return (fa > fb) - (fa < fb);
//  return (fa > fb);
}

NO_INLINE int cmp_float(const void * a, const void * b)
{
    const float fa = *(const float *) a;
    const float fb = *(const float *) b;

    COMPARISON_PP;

    // returning fa - fb would truncate differences smaller than 1 to 0
    return (fa > fb) - (fa < fb);
}

NO_INLINE int cmp_long_double(const void * a, const void * b)
{
    const long double fa = *(const long double *) a;
    const long double fb = *(const long double *) b;

    COMPARISON_PP;

    return (fa > fb) - (fa < fb);

/*  if (isnan(fa) || isnan(fb))
    {
        return isnan(fa) - isnan(fb);
    }

    return (fa > fb);
*/
}

// pointer comparison functions

NO_INLINE int cmp_str(const void * a, const void * b)
{
    COMPARISON_PP;

    return strcmp(*(const char **) a, *(const char **) b);
}

NO_INLINE int cmp_int_ptr(const void * a, const void * b)
{
    const int *fa = *(const int **) a;
    const int *fb = *(const int **) b;

    COMPARISON_PP;

    return (*fa > *fb) - (*fa < *fb);
}

NO_INLINE int cmp_long_ptr(const void * a, const void * b)
{
    const long long *fa = *(const long long **) a;
    const long long *fb = *(const long long **) b;

    COMPARISON_PP;

    return (*fa > *fb) - (*fa < *fb);
}

NO_INLINE int cmp_long_double_ptr(const void * a, const void * b)
{
    const long double *fa = *(const long double **) a;
    const long double *fb = *(const long double **) b;

    COMPARISON_PP;

    return (*fa > *fb) - (*fa < *fb);
}

// c++ comparison functions

#ifdef __GNUG__

NO_INLINE bool cpp_cmp_int(const int &a, const int &b)
{
    COMPARISON_PP;

    return a < b;
}

NO_INLINE bool cpp_cmp_str(char const* const a, char const* const b)
{
    COMPARISON_PP;

    return strcmp(a, b) < 0;
}

#endif

long long utime()
{
    struct timeval now_time;

    gettimeofday(&now_time, NULL);

    return now_time.tv_sec * 1000000LL + now_time.tv_usec;
}

void seed_rand(unsigned long long seed)
{
    srand(seed);
}

void test_sort(void *array, void *unsorted, void *valid, int minimum, int maximum, int samples, int repetitions, SRTFUNC *srt, const char *name, const char *desc, size_t size, CMPFUNC *cmpf)
{
    long long start, end, total, best, average_time, average_comp;
    char temp[100];
    static char compare = 0;
    long long *ptla = (long long *) array, *ptlv = (long long *) valid;
    long double *ptda = (long double *) array, *ptdv = (long double *) valid;
    int *pta = (int *) array, *ptv = (int *) valid, rep, sam, max, cnt, name32;

#ifdef SKASORT_HPP
    void *swap;
#endif

    if (*name == '*')
    {
        if (!strcmp(desc, "random order") || !strcmp(desc, "random 1-4") || !strcmp(desc, "random 4") || !strcmp(desc, "random string") || !strcmp(desc, "random 10"))
        {
            if (comparisons)
            {
                compare = 1;
                printf("%s\n", "| Name | Items | Type | Best | Average | Compares | Samples | Distribution |");
                printf("%s\n", "| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |");
            }
            else
            {
                printf("%s\n", "| Name | Items | Type | Best | Average | Loops | Samples | Distribution |");
                printf("%s\n", "| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |");
            }
        }
        else
        {
            printf("%s\n", "| | | | | | | | |");
        }
        return;
    }

    name32 = name[0] + (name[1] ? name[1] * 32 : 0) + (name[2] ? name[2] * 1024 : 0);

    best = average_time = average_comp = 0;

    if (minimum == 7 && maximum == 7)
    {
        pta = (int *) unsorted;
        printf("\e[1;32m%10d %10d %10d %10d %10d %10d %10d\e[0m\n", pta[0], pta[1], pta[2], pta[3], pta[4], pta[5], pta[6]);
        pta = (int *) array;
    }

    for (sam = 0 ; sam < samples ; sam++)
    {
        total = average_comp = 0;
        max = minimum;

        start = utime();

        for (rep = repetitions - 1 ; rep >= 0 ; rep--)
        {
            memcpy(array, (char *) unsorted + maximum * rep * size, max * size);

            comparisons = 0;

            // edit char *sorts to add / remove sorts

            switch (name32)
            {
#ifdef BLITSORT_H
                case 'b' + 'l' * 32 + 'i' * 1024: blitsort(array, max, size, cmpf); break;
#endif
#ifdef CRUMSORT_H
                case 'c' + 'r' * 32 + 'u' * 1024: crumsort(array, max, size, cmpf); break;
#endif
#ifdef DRIPSORT_H
                case 'd' + 'r' * 32 + 'i' * 1024: dripsort(array, max, size, cmpf); break;
#endif
#ifdef FLOWSORT_H
                case 'f' + 'l' * 32 + 'o' * 1024: flowsort(array, max, size, cmpf); break;
#endif
#ifdef FLUXSORT_H
                case 'f' + 'l' * 32 + 'u' * 1024: fluxsort(array, max, size, cmpf); break;
                case 's' + '_' * 32 + 'f' * 1024: fluxsort_size(array, max, size, cmpf); break;

#endif
#ifdef GRIDSORT_H
                case 'g' + 'r' * 32 + 'i' * 1024: gridsort(array, max, size, cmpf); break;
#endif
#ifdef OCTOSORT_H
                case 'o' + 'c' * 32 + 't' * 1024: octosort(array, max, size, cmpf); break;
#endif
#ifdef PIPOSORT_H
                case 'p' + 'i' * 32 + 'p' * 1024: piposort(array, max, size, cmpf); break;
#endif
#ifdef QUADSORT_H
                case 'q' + 'u' * 32 + 'a' * 1024: quadsort(array, max, size, cmpf); break;
                case 's' + '_' * 32 + 'q' * 1024: quadsort_size(array, max, size, cmpf); break;
#endif
#ifdef SKIPSORT_H
                case 's' + 'k' * 32 + 'i' * 1024: skipsort(array, max, size, cmpf); break;
#endif
#ifdef WOLFSORT_H
                case 'w' + 'o' * 32 + 'l' * 1024: wolfsort(array, max, size, cmpf); break;
#endif
                case 'q' + 's' * 32 + 'o' * 1024: qsort(array, max, size, cmpf); break;

#ifdef RHSORT_C
                case 'r' + 'h' * 32 + 's' * 1024: if (size == sizeof(int)) rhsort32(pta, max); else return; break;
#endif

#ifdef __GNUG__
                case 's' + 'o' * 32 + 'r' * 1024: if (size == sizeof(int)) std::sort(pta, pta + max); else if (size == sizeof(long long)) std::sort(ptla, ptla + max); else std::sort(ptda, ptda + max); break;
                case 's' + 't' * 32 + 'a' * 1024: if (size == sizeof(int)) std::stable_sort(pta, pta + max); else if (size == sizeof(long long)) std::stable_sort(ptla, ptla + max); else std::stable_sort(ptda, ptda + max); break;

  #ifdef PDQSORT_H
                case 'p' + 'd' * 32 + 'q' * 1024: if (size == sizeof(int)) pdqsort(pta, pta + max); else if (size == sizeof(long long)) pdqsort(ptla, ptla + max); else pdqsort(ptda, ptda + max); break;
  #endif
  #ifdef SKASORT_HPP
                case 's' + 'k' * 32 + 'a' * 1024: swap = malloc(max * size); if (size == sizeof(int)) ska_sort_copy(pta, pta + max, (int *) swap); else if (size == sizeof(long long)) ska_sort_copy(ptla, ptla + max, (long long *) swap); else repetitions = 0; free(swap); break;
  #endif
  #ifdef GFX_TIMSORT_HPP
                case 't' + 'i' * 32 + 'm' * 1024: if (size == sizeof(int)) gfx::timsort(pta, pta + max, cpp_cmp_int); else if (size == sizeof(long long)) gfx::timsort(ptla, ptla + max); else gfx::timsort(ptda, ptda + max); break;
  #endif
#endif
                default:
                    switch (name32)
                    {
                        case 's' + 'o' * 32 + 'r' * 1024:
                        case 's' + 't' * 32 + 'a' * 1024:
                        case 'p' + 'd' * 32 + 'q' * 1024:
                        case 'r' + 'h' * 32 + 's' * 1024:
                        case 's' + 'k' * 32 + 'a' * 1024:
                        case 't' + 'i' * 32 + 'm' * 1024:
                            printf("unknown sort: %s (compile with g++ instead of gcc?)\n", name);
                            return;
                        default:
                            printf("unknown sort: %s\n", name);
                            return;
                    }
            }
            average_comp += comparisons;

            if (minimum < maximum && ++max > maximum)
            {
                max = minimum;
            }
        }
        end = utime();

        total = end - start;

        if (!best || total < best)
        {
            best = total;
        }
        average_time += total;
    }

    if (minimum == 7 && maximum == 7)
    {
        printf("\e[1;32m%10d %10d %10d %10d %10d %10d %10d\e[0m\n", pta[0], pta[1], pta[2], pta[3], pta[4], pta[5], pta[6]);
    }

    if (repetitions == 0)
    {
        return;
    }

    average_time /= samples;

    if (cmpf == cmp_stable)
    {
        for (cnt = 1 ; cnt < maximum ; cnt++)
        {
            if (pta[cnt - 1] > pta[cnt])
            {
                sprintf(temp, "\e[1;31m%16s\e[0m", "unstable");
                desc = temp;
                break;
            }
        }
    }

    if (compare)
    {
        if (repetitions <= 1)
        {
            printf("|%10s |%9d | %4d |%9f |%9f |%10d | %7d | %16s |\e[0m\n", name, maximum, (int) size * 8, best / 1000000.0, average_time / 1000000.0, (int) comparisons, samples, desc);
        }
        else
        {
            printf("|%10s |%9d | %4d |%9f |%9f |%10.1f | %7d | %16s |\e[0m\n", name, maximum, (int) size * 8, best / 1000000.0, average_time / 1000000.0, (float) average_comp / repetitions, samples, desc);
        }
    }
    else
    {
        printf("|%10s | %8d | %4d | %f | %f | %9d | %7d | %16s |\e[0m\n", name, maximum, (int) size * 8, best / 1000000.0, average_time / 1000000.0, repetitions, samples, desc);
    }

    if (minimum != maximum || cmpf == cmp_stable)
    {
        return;
    }

    for (cnt = 1 ; cnt < maximum ; cnt++)
    {
        if (cmpf == cmp_str)
        {
            char **ptsa = (char **) array;
            if (strcmp((char *) ptsa[cnt - 1], (char *) ptsa[cnt]) > 0)
            {
                printf("%17s: not properly sorted at index %d. (%s vs %s\n", name, cnt, (char *) ptsa[cnt - 1], (char *) ptsa[cnt]);
                break;
            }
        }
        else if (size == sizeof(int *) && cmpf == cmp_long_double_ptr)
        {
            long double **pptda = (long double **) array;

            if (cmp_long_double_ptr(&pptda[cnt - 1], &pptda[cnt]) > 0)
            {
                printf("%17s: not properly sorted at index %d. (%Lf vs %Lf\n", name, cnt, *pptda[cnt - 1], *pptda[cnt]);
                break;
            }
        }
        else if (cmpf == cmp_long_ptr)
        {
            long long **pptla = (long long **) array;

            if (cmp_long_ptr(&pptla[cnt - 1], &pptla[cnt]) > 0)
            {
                printf("%17s: not properly sorted at index %d. (%lld vs %lld\n", name, cnt, *pptla[cnt - 1], *pptla[cnt]);
                break;
            }
        }
        else if (cmpf == cmp_int_ptr)
        {
            int **pptia = (int **) array;

            if (cmp_int_ptr(&pptia[cnt - 1], &pptia[cnt]) > 0)
            {
                printf("%17s: not properly sorted at index %d. (%d vs %d\n", name, cnt, *pptia[cnt - 1], *pptia[cnt]);
                break;
            }
        }
        else if (size == sizeof(int))
        {
            if (pta[cnt - 1] > pta[cnt])
            {
                printf("%17s: not properly sorted at index %d. (%d vs %d\n", name, cnt, pta[cnt - 1], pta[cnt]);
                break;
            }
            if (pta[cnt - 1] == pta[cnt])
            {
//              printf("%17s: Found a repeat value at index %d. (%d)\n", name, cnt, pta[cnt]);
            }
        }
        else if (size == sizeof(long long))
        {
            if (ptla[cnt - 1] > ptla[cnt])
            {
                printf("%17s: not properly sorted at index %d. (%lld vs %lld\n", name, cnt, ptla[cnt - 1], ptla[cnt]);
                break;
            }
        }
        else if (size == sizeof(long double))
        {
            if (cmp_long_double(&ptda[cnt - 1], &ptda[cnt]) > 0)
            {
                printf("%17s: not properly sorted at index %d.
(%Lf vs %Lf\n", name, cnt, ptda[cnt - 1], ptda[cnt]); 511 | break; 512 | } 513 | } 514 | } 515 | 516 | for (cnt = 1 ; cnt < maximum ; cnt++) 517 | { 518 | if (size == sizeof(int)) 519 | { 520 | if (pta[cnt] != ptv[cnt]) 521 | { 522 | printf(" validate: array[%d] != valid[%d]. (%d vs %d\n", cnt, cnt, pta[cnt], ptv[cnt]); 523 | break; 524 | } 525 | } 526 | else if (size == sizeof(long long)) 527 | { 528 | if (ptla[cnt] != ptlv[cnt]) 529 | { 530 | if (cmpf == cmp_str) 531 | { 532 | char **ptsa = (char **) array; 533 | char **ptsv = (char **) valid; 534 | 535 | printf(" validate: array[%d] != valid[%d]. (%s vs %s) %s\n", cnt, cnt, (char *) ptsa[cnt], (char *) ptsv[cnt], !strcmp((char *) ptsa[cnt], (char *) ptsv[cnt]) ? "\e[1;31munstable\e[0m" : ""); 536 | break; 537 | } 538 | if (cmpf == cmp_long_ptr) 539 | { 540 | long long **ptla = (long long **) array; 541 | long long **ptlv = (long long **) valid; 542 | 543 | printf(" validate: array[%d] != valid[%d]. (%lld vs %lld) %s\n", cnt, cnt, *ptla[cnt], *ptlv[cnt], (*ptla[cnt] == *ptlv[cnt]) ? "\e[1;31munstable\e[0m" : ""); 544 | break; 545 | } 546 | if (cmpf == cmp_int_ptr) 547 | { 548 | int **ptia = (int **) array; 549 | int **ptiv = (int **) valid; 550 | 551 | printf(" validate: array[%d] != valid[%d]. (%d vs %d) %s\n", cnt, cnt, *ptia[cnt], *ptiv[cnt], (*ptia[cnt] == *ptiv[cnt]) ? "\e[1;31munstable\e[0m" : ""); 552 | break; 553 | } 554 | 555 | printf(" validate: array[%d] != valid[%d]. (%lld vs %lld\n", cnt, cnt, ptla[cnt], ptlv[cnt]); 556 | break; 557 | } 558 | } 559 | else if (size == sizeof(long double)) 560 | { 561 | if (ptda[cnt] != ptdv[cnt]) 562 | { 563 | printf(" validate: array[%d] != valid[%d]. 
(%Lf vs %Lf\n", cnt, cnt, ptda[cnt], ptdv[cnt]); 564 | break; 565 | } 566 | } 567 | } 568 | } 569 | 570 | void validate() 571 | { 572 | int seed = time(NULL); 573 | int cnt, val, max = 1000; 574 | 575 | int *a_array, *r_array, *v_array; 576 | 577 | seed_rand(seed); 578 | 579 | a_array = (int *) malloc(max * sizeof(int)); 580 | r_array = (int *) malloc(max * sizeof(int)); 581 | v_array = (int *) malloc(max * sizeof(int)); 582 | 583 | for (cnt = 0 ; cnt < max ; cnt++) r_array[cnt] = rand(); 584 | 585 | for (cnt = 0 ; cnt < max ; cnt++) 586 | { 587 | memcpy(a_array, r_array, cnt * sizeof(int)); 588 | memcpy(v_array, r_array, cnt * sizeof(int)); 589 | 590 | quadsort_prim(a_array, cnt, sizeof(int)); 591 | qsort(v_array, cnt, sizeof(int), cmp_int); 592 | 593 | for (val = 0 ; val < cnt ; val++) 594 | { 595 | if (val && v_array[val - 1] > v_array[val]) {printf("\e[1;31mvalidate rand: seed %d: size: %d Not properly sorted at index %d.\n", seed, cnt, val); return;} 596 | if (a_array[val] != v_array[val]) {printf("\e[1;31mvalidate rand: seed %d: size: %d Not verified at index %d.\n", seed, cnt, val); return;} 597 | } 598 | } 599 | 600 | // ascending saw 601 | 602 | for (cnt = 0 ; cnt < max ; cnt++) r_array[cnt] = cnt % (max / 5); 603 | 604 | for (cnt = 0 ; cnt < max ; cnt += 7) 605 | { 606 | memcpy(a_array, r_array, cnt * sizeof(int)); 607 | memcpy(v_array, r_array, cnt * sizeof(int)); 608 | 609 | quadsort(a_array, cnt, sizeof(int), cmp_int); 610 | qsort(v_array, cnt, sizeof(int), cmp_int); 611 | 612 | for (val = 0 ; val < cnt ; val++) 613 | { 614 | if (val && v_array[val - 1] > v_array[val]) {printf("\e[1;31mvalidate ascending saw: seed %d: size: %d Not properly sorted at index %d.\n", seed, cnt, val); return;} 615 | if (a_array[val] != v_array[val]) {printf("\e[1;31mvalidate ascending saw: seed %d: size: %d Not verified at index %d.\n", seed, cnt, val); return;} 616 | } 617 | } 618 | 619 | // descending saw 620 | 621 | for (cnt = 0 ; cnt < max ; cnt++) 622 | { 623 | 
r_array[cnt] = (max - cnt + 1) % (max / 11); 624 | } 625 | 626 | for (cnt = 1 ; cnt < max ; cnt += 7) 627 | { 628 | memcpy(a_array, r_array, cnt * sizeof(int)); 629 | memcpy(v_array, r_array, cnt * sizeof(int)); 630 | 631 | quadsort(a_array, cnt, sizeof(int), cmp_int); 632 | qsort(v_array, cnt, sizeof(int), cmp_int); 633 | 634 | for (val = 0 ; val < cnt ; val++) 635 | { 636 | if (val && v_array[val - 1] > v_array[val]) {printf("\e[1;31mvalidate descending saw: seed %d: size: %d Not properly sorted at index %d.\n\n", seed, cnt, val); return;} 637 | if (a_array[val] != v_array[val]) {printf("\e[1;31mvalidate descending saw: seed %d: size: %d Not verified at index %d.\n\n", seed, cnt, val); return;} 638 | } 639 | } 640 | 641 | // random half 642 | 643 | for (cnt = 0 ; cnt < max ; cnt++) r_array[cnt] = (cnt < max / 2) ? cnt : rand(); 644 | 645 | for (cnt = 1 ; cnt < max ; cnt += 7) 646 | { 647 | memcpy(a_array, r_array, cnt * sizeof(int)); 648 | memcpy(v_array, r_array, cnt * sizeof(int)); 649 | 650 | quadsort(a_array, cnt, sizeof(int), cmp_int); 651 | qsort(v_array, cnt, sizeof(int), cmp_int); 652 | 653 | for (val = 0 ; val < cnt ; val++) 654 | { 655 | if (val && v_array[val - 1] > v_array[val]) {printf("\e[1;31mvalidate rand tail: seed %d: size: %d Not properly sorted at index %d.\n", seed, cnt, val); return;} 656 | if (a_array[val] != v_array[val]) {printf("\e[1;31mvalidate rand tail: seed %d: size: %d Not verified at index %d.\n", seed, cnt, val); return;} 657 | } 658 | } 659 | free(a_array); 660 | free(r_array); 661 | free(v_array); 662 | } 663 | 664 | unsigned int bit_reverse(unsigned int x) 665 | { 666 | x = (((x & 0xaaaaaaaa) >> 1) | ((x & 0x55555555) << 1)); 667 | x = (((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2)); 668 | x = (((x & 0xf0f0f0f0) >> 4) | ((x & 0x0f0f0f0f) << 4)); 669 | x = (((x & 0xff00ff00) >> 8) | ((x & 0x00ff00ff) << 8)); 670 | 671 | return((x >> 16) | (x << 16)); // final step: swap the two 16-bit halves 672 | } 673 | 674 | void run_test(void *a_array, void *r_array, void
*v_array, int minimum, int maximum, int samples, int repetitions, int copies, const char *desc, size_t size, CMPFUNC *cmpf) 675 | { 676 | int cnt, rep; 677 | 678 | memcpy(v_array, r_array, maximum * size); 679 | 680 | for (rep = 0 ; rep < copies ; rep++) 681 | { 682 | memcpy((char *) r_array + rep * maximum * size, v_array, maximum * size); 683 | } 684 | quadsort(v_array, maximum, size, cmpf); 685 | 686 | for (cnt = 0 ; (size_t) cnt < sizeof(sorts) / sizeof(char *) ; cnt++) 687 | { 688 | test_sort(a_array, r_array, v_array, minimum, maximum, samples, repetitions, qsort, sorts[cnt], desc, size, cmpf); 689 | } 690 | } 691 | 692 | void range_test(int max, int samples, int repetitions, int seed) 693 | { 694 | int cnt, last; 695 | int mem = max * 10 > 32768 * 64 ? max * 10 : 32768 * 64; 696 | char dist[40]; 697 | 698 | int *a_array = (int *) malloc(max * sizeof(int)); 699 | int *r_array = (int *) malloc(mem * sizeof(int)); 700 | int *v_array = (int *) malloc(max * sizeof(int)); 701 | 702 | srand(seed); 703 | 704 | for (cnt = 0 ; cnt < mem ; cnt++) 705 | { 706 | r_array[cnt] = rand(); 707 | } 708 | 709 | if (max <= 4096) 710 | { 711 | for (last = 1, samples = 32768*4, repetitions = 4 ; repetitions <= max ; repetitions *= 2, samples /= 2) 712 | { 713 | if (max >= repetitions) 714 | { 715 | sprintf(dist, "random %d-%d", last, repetitions); 716 | 717 | memcpy(v_array, r_array, repetitions * sizeof(int)); 718 | quadsort(v_array, repetitions, sizeof(int), cmp_int); 719 | 720 | for (cnt = 0 ; (size_t) cnt < sizeof(sorts) / sizeof(char *) ; cnt++) 721 | { 722 | test_sort(a_array, r_array, v_array, last, repetitions, 50, samples, qsort, sorts[cnt], dist, sizeof(int), cmp_int); 723 | } 724 | last = repetitions + 1; 725 | } 726 | } 727 | free(a_array); 728 | free(r_array); 729 | free(v_array); 730 | return; 731 | } 732 | 733 | if (max == 10000000) 734 | { 735 | repetitions = 10000000; 736 | 737 | for (max = 10 ; max <= 10000000 ; max *= 10) 738 | { 739 | repetitions /= 10; 740 | 
741 | memcpy(v_array, r_array, max * sizeof(int)); 742 | quadsort_prim(v_array, max, sizeof(int)); 743 | 744 | sprintf(dist, "random %d", max); 745 | 746 | for (cnt = 0 ; (size_t) cnt < sizeof(sorts) / sizeof(char *) ; cnt++) 747 | { 748 | test_sort(a_array, r_array, v_array, max, max, 10, repetitions, qsort, sorts[cnt], dist, sizeof(int), cmp_int); 749 | } 750 | } 751 | } 752 | else 753 | { 754 | for (samples = 32768*4, repetitions = 4 ; samples > 0 ; repetitions *= 2, samples /= 2) 755 | { 756 | if (max >= repetitions) 757 | { 758 | memcpy(v_array, r_array, repetitions * sizeof(int)); 759 | quadsort(v_array, repetitions, sizeof(int), cmp_int); 760 | 761 | sprintf(dist, "random %d", repetitions); 762 | 763 | for (cnt = 0 ; (size_t) cnt < sizeof(sorts) / sizeof(char *) ; cnt++) 764 | { 765 | test_sort(a_array, r_array, v_array, repetitions, repetitions, 100, samples, qsort, sorts[cnt], dist, sizeof(int), cmp_int); 766 | } 767 | } 768 | } 769 | } 770 | free(a_array); 771 | free(r_array); 772 | free(v_array); 773 | return; 774 | } 775 | 776 | #define VAR int 777 | 778 | int main(int argc, char **argv) 779 | { 780 | int max = 100000; 781 | int samples = 10; 782 | int repetitions = 1; 783 | int seed = 0; 784 | int cnt, mem; 785 | VAR *a_array, *r_array, *v_array, sum; 786 | 787 | if (argc >= 1 && argv[1] && *argv[1]) 788 | { 789 | max = atoi(argv[1]); 790 | } 791 | 792 | if (argc >= 2 && argv[2] && *argv[2]) 793 | { 794 | samples = atoi(argv[2]); 795 | } 796 | 797 | if (argc >= 3 && argv[3] && *argv[3]) 798 | { 799 | repetitions = atoi(argv[3]); 800 | } 801 | 802 | if (argc >= 4 && argv[4] && *argv[4]) 803 | { 804 | seed = atoi(argv[4]); 805 | } 806 | 807 | validate(); 808 | 809 | seed = seed ? 
seed : time(NULL); 810 | 811 | printf("Info: int = %zu, long long = %zu, long double = %zu\n\n", sizeof(int) * 8, sizeof(long long) * 8, sizeof(long double) * 8); 812 | 813 | printf("Benchmark: array size: %d, samples: %d, repetitions: %d, seed: %d\n\n", max, samples, repetitions, seed); 814 | 815 | if (repetitions == 0) 816 | { 817 | range_test(max, samples, repetitions, seed); 818 | return 0; 819 | } 820 | 821 | mem = max * repetitions; 822 | 823 | #ifndef SKIP_STRINGS 824 | #ifndef cmp 825 | 826 | // C string 827 | 828 | { 829 | char **sa_array = (char **) malloc(max * sizeof(char *)); 830 | char **sr_array = (char **) malloc(mem * sizeof(char *)); 831 | char **sv_array = (char **) malloc(max * sizeof(char *)); 832 | 833 | char *buffer = (char *) malloc(mem * 16); 834 | 835 | seed_rand(seed); 836 | 837 | for (cnt = 0 ; cnt < mem ; cnt++) 838 | { 839 | sprintf(buffer + cnt * 16, "%X", rand()); 840 | 841 | sr_array[cnt] = buffer + cnt * 16; 842 | } 843 | 844 | for (cnt = 0 ; cnt < mem ; cnt++) 845 | { 846 | int rnd = rand() % mem; 847 | char *tmp = sr_array[cnt]; 848 | sr_array[cnt] = sr_array[rnd]; sr_array[rnd] = tmp; // swap the array entries, not local copies of them 849 | } 850 | run_test(sa_array, sr_array, sv_array, max, max, samples, repetitions, 0, "random string", sizeof(char *), cmp_str); 851 | 852 | free(sa_array); 853 | free(sr_array); 854 | free(sv_array); 855 | 856 | free(buffer); 857 | } 858 | 859 | // long double table 860 | 861 | { 862 | long double **da_array = (long double **) malloc(max * sizeof(long double *)); 863 | long double **dr_array = (long double **) malloc(mem * sizeof(long double *)); 864 | long double **dv_array = (long double **) malloc(max * sizeof(long double *)); 865 | 866 | long double *buffer = (long double *) malloc(mem * sizeof(long double)); 867 | 868 | if (da_array == NULL || dr_array == NULL || dv_array == NULL) 869 | { 870 | printf("main(%d,%d,%d): malloc: %s\n", max, samples, repetitions, strerror(errno)); 871 | 872 | return 0; 873 | } 874 | 875 | seed_rand(seed);
876 | 877 | for (cnt = 0 ; cnt < mem ; cnt++) 878 | { 879 | buffer[cnt] = (long double) rand(); 880 | buffer[cnt] += (long double) ((unsigned long long) rand() << 32ULL); 881 | 882 | dr_array[cnt] = buffer + cnt; 883 | } 884 | run_test(da_array, dr_array, dv_array, max, max, samples, repetitions, 0, "random double", sizeof(long double *), cmp_long_double_ptr); 885 | 886 | free(da_array); 887 | free(dr_array); 888 | free(dv_array); 889 | 890 | free(buffer); 891 | } 892 | 893 | // long long table 894 | 895 | { 896 | long long **la_array = (long long **) malloc(max * sizeof(long long *)); 897 | long long **lr_array = (long long **) malloc(mem * sizeof(long long *)); 898 | long long **lv_array = (long long **) malloc(max * sizeof(long long *)); 899 | 900 | long long *buffer = (long long *) malloc(mem * sizeof(long long)); 901 | 902 | if (la_array == NULL || lr_array == NULL || lv_array == NULL) 903 | { 904 | printf("main(%d,%d,%d): malloc: %s\n", max, samples, repetitions, strerror(errno)); 905 | 906 | return 0; 907 | } 908 | 909 | seed_rand(seed); 910 | 911 | for (cnt = 0 ; cnt < mem ; cnt++) 912 | { 913 | buffer[cnt] = (long long) rand(); 914 | buffer[cnt] += (long long) ((unsigned long long) rand() << 32ULL); 915 | 916 | lr_array[cnt] = buffer + cnt; 917 | } 918 | run_test(la_array, lr_array, lv_array, max, max, samples, repetitions, 0, "random long", sizeof(long long *), cmp_long_ptr); 919 | 920 | 921 | free(la_array); 922 | free(lr_array); 923 | free(lv_array); 924 | 925 | free(buffer); 926 | } 927 | 928 | // int table 929 | 930 | { 931 | int **la_array = (int **) malloc(max * sizeof(int *)); 932 | int **lr_array = (int **) malloc(mem * sizeof(int *)); 933 | int **lv_array = (int **) malloc(max * sizeof(int *)); 934 | 935 | int *buffer = (int *) malloc(mem * sizeof(int)); 936 | 937 | if (la_array == NULL || lr_array == NULL || lv_array == NULL) 938 | { 939 | printf("main(%d,%d,%d): malloc: %s\n", max, samples, repetitions, strerror(errno)); 940 | 941 | return 0; 
942 | } 943 | 944 | seed_rand(seed); 945 | 946 | for (cnt = 0 ; cnt < mem ; cnt++) 947 | { 948 | buffer[cnt] = rand(); 949 | 950 | lr_array[cnt] = buffer + cnt; 951 | } 952 | run_test(la_array, lr_array, lv_array, max, max, samples, repetitions, 0, "random int", sizeof(int *), cmp_int_ptr); 953 | 954 | free(la_array); 955 | free(lr_array); 956 | free(lv_array); 957 | 958 | free(buffer); 959 | 960 | printf("\n"); 961 | } 962 | #endif 963 | #endif 964 | // 128 bit 965 | 966 | #ifndef SKIP_DOUBLES 967 | long double *da_array = (long double *) malloc(max * sizeof(long double)); 968 | long double *dr_array = (long double *) malloc(mem * sizeof(long double)); 969 | long double *dv_array = (long double *) malloc(max * sizeof(long double)); 970 | 971 | if (da_array == NULL || dr_array == NULL || dv_array == NULL) 972 | { 973 | printf("main(%d,%d,%d): malloc: %s\n", max, samples, repetitions, strerror(errno)); 974 | 975 | return 0; 976 | } 977 | 978 | seed_rand(seed); 979 | 980 | for (cnt = 0 ; cnt < mem ; cnt++) 981 | { 982 | dr_array[cnt] = (long double) rand(); 983 | dr_array[cnt] += (long double) ((unsigned long long) rand() << 32ULL); 984 | dr_array[cnt] += 1.0L / 3.0L; 985 | } 986 | 987 | memcpy(dv_array, dr_array, max * sizeof(long double)); 988 | quadsort(dv_array, max, sizeof(long double), cmp_long_double); 989 | 990 | for (cnt = 0 ; (size_t) cnt < sizeof(sorts) / sizeof(char *) ; cnt++) 991 | { 992 | test_sort(da_array, dr_array, dv_array, max, max, samples, repetitions, qsort, sorts[cnt], "random order", sizeof(long double), cmp_long_double); 993 | } 994 | #ifndef cmp 995 | #ifdef QUADSORT_H 996 | test_sort(da_array, dr_array, dv_array, max, max, samples, repetitions, qsort, "s_quadsort", "random order", sizeof(long double), cmp_long_double_ptr); 997 | #endif 998 | #endif 999 | free(da_array); 1000 | free(dr_array); 1001 | free(dv_array); 1002 | 1003 | printf("\n"); 1004 | #endif 1005 | // 64 bit 1006 | 1007 | #ifndef SKIP_LONGS 1008 | long long *la_array = (long 
long *) malloc(max * sizeof(long long)); 1009 | long long *lr_array = (long long *) malloc(mem * sizeof(long long)); 1010 | long long *lv_array = (long long *) malloc(max * sizeof(long long)); 1011 | 1012 | if (la_array == NULL || lr_array == NULL || lv_array == NULL) 1013 | { 1014 | printf("main(%d,%d,%d): malloc: %s\n", max, samples, repetitions, strerror(errno)); 1015 | 1016 | return 0; 1017 | } 1018 | 1019 | seed_rand(seed); 1020 | 1021 | for (cnt = 0 ; cnt < mem ; cnt++) 1022 | { 1023 | lr_array[cnt] = rand(); 1024 | lr_array[cnt] += (unsigned long long) rand() << 32ULL; 1025 | } 1026 | 1027 | memcpy(lv_array, lr_array, max * sizeof(long long)); 1028 | quadsort(lv_array, max, sizeof(long long), cmp_long); 1029 | 1030 | for (cnt = 0 ; (size_t) cnt < sizeof(sorts) / sizeof(char *) ; cnt++) 1031 | { 1032 | test_sort(la_array, lr_array, lv_array, max, max, samples, repetitions, qsort, sorts[cnt], "random order", sizeof(long long), cmp_long); 1033 | } 1034 | 1035 | free(la_array); 1036 | free(lr_array); 1037 | free(lv_array); 1038 | 1039 | printf("\n"); 1040 | #endif 1041 | // 32 bit 1042 | 1043 | a_array = (VAR *) malloc(max * sizeof(VAR)); 1044 | r_array = (VAR *) malloc(mem * sizeof(VAR)); 1045 | v_array = (VAR *) malloc(max * sizeof(VAR)); 1046 | 1047 | int quad0 = 0; 1048 | int nmemb = max; 1049 | int half1 = nmemb / 2; 1050 | int half2 = nmemb - half1; 1051 | int quad1 = half1 / 2; 1052 | int quad2 = half1 - quad1; 1053 | int quad3 = half2 / 2; 1054 | int quad4 = half2 - quad3; 1055 | 1056 | int span3 = quad1 + quad2 + quad3; 1057 | 1058 | // random 1059 | 1060 | seed_rand(seed); 1061 | 1062 | for (cnt = 0 ; cnt < mem ; cnt++) 1063 | { 1064 | r_array[cnt] = rand(); 1065 | } 1066 | run_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, "random order", sizeof(VAR), cmp_int); 1067 | 1068 | // random % 100 1069 | 1070 | for (cnt = 0 ; cnt < mem ; cnt++) 1071 | { 1072 | r_array[cnt] = rand() % 100; 1073 | } 1074 | run_test(a_array, r_array, 
v_array, max, max, samples, repetitions, 0, "random % 100", sizeof(VAR), cmp_int); 1075 | 1076 | // ascending 1077 | 1078 | for (cnt = sum = 0 ; cnt < mem ; cnt++) 1079 | { 1080 | r_array[cnt] = sum; sum += rand() % 5; 1081 | } 1082 | 1083 | run_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, "ascending order", sizeof(VAR), cmp_int); 1084 | 1085 | // ascending saw 1086 | 1087 | for (cnt = 0 ; cnt < max ; cnt++) 1088 | { 1089 | r_array[cnt] = rand(); 1090 | } 1091 | 1092 | quadsort(r_array + quad0, quad1, sizeof(VAR), cmp_int); 1093 | quadsort(r_array + quad1, quad2, sizeof(VAR), cmp_int); 1094 | quadsort(r_array + half1, quad3, sizeof(VAR), cmp_int); 1095 | quadsort(r_array + span3, quad4, sizeof(VAR), cmp_int); 1096 | 1097 | run_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, "ascending saw", sizeof(VAR), cmp_int); 1098 | 1099 | // pipe organ 1100 | 1101 | for (cnt = 0 ; cnt < max ; cnt++) 1102 | { 1103 | r_array[cnt] = rand(); 1104 | } 1105 | 1106 | quadsort(r_array + quad0, half1, sizeof(VAR), cmp_int); 1107 | qsort(r_array + half1, half2, sizeof(VAR), cmp_rev); 1108 | 1109 | for (cnt = half1 + 1 ; cnt < max ; cnt++) 1110 | { 1111 | if (r_array[cnt] >= r_array[cnt - 1]) 1112 | { 1113 | r_array[cnt] = r_array[cnt - 1] - 1; // guarantee the run is strictly descending 1114 | } 1115 | } 1116 | 1117 | run_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, "pipe organ", sizeof(VAR), cmp_int); 1118 | 1119 | // descending 1120 | 1121 | for (cnt = 0, sum = mem * 10 ; cnt < mem ; cnt++) 1122 | { 1123 | r_array[cnt] = sum; sum -= 1 + rand() % 5; 1124 | } 1125 | run_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, "descending order", sizeof(VAR), cmp_int); 1126 | 1127 | // descending saw 1128 | 1129 | for (cnt = 0 ; cnt < max ; cnt++) 1130 | { 1131 | r_array[cnt] = rand(); 1132 | } 1133 | 1134 | qsort(r_array + quad0, quad1, sizeof(VAR), cmp_rev); 1135 | qsort(r_array + quad1, quad2, 
sizeof(VAR), cmp_rev); 1136 | qsort(r_array + half1, quad3, sizeof(VAR), cmp_rev); 1137 | qsort(r_array + span3, quad4, sizeof(VAR), cmp_rev); 1138 | 1139 | for (cnt = 1 ; cnt < max ; cnt++) 1140 | { 1141 | if (cnt == quad1 || cnt == half1 || cnt == span3) continue; 1142 | 1143 | if (r_array[cnt] >= r_array[cnt - 1]) 1144 | { 1145 | r_array[cnt] = r_array[cnt - 1] - 1; // guarantee the run is strictly descending 1146 | } 1147 | } 1148 | 1149 | run_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, "descending saw", sizeof(VAR), cmp_int); 1150 | 1151 | 1152 | // random tail 25% 1153 | 1154 | for (cnt = 0 ; cnt < max ; cnt++) 1155 | { 1156 | r_array[cnt] = rand(); 1157 | } 1158 | quadsort(r_array, span3, sizeof(VAR), cmp_int); 1159 | 1160 | run_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, "random tail", sizeof(VAR), cmp_int); 1161 | 1162 | // random 50% 1163 | 1164 | for (cnt = 0 ; cnt < max ; cnt++) 1165 | { 1166 | r_array[cnt] = rand(); 1167 | } 1168 | quadsort(r_array, half1, sizeof(VAR), cmp_int); 1169 | 1170 | run_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, "random half", sizeof(VAR), cmp_int); 1171 | 1172 | // tiles 1173 | 1174 | for (cnt = 0 ; cnt < mem ; cnt++) 1175 | { 1176 | if (cnt % 2 == 0) 1177 | { 1178 | r_array[cnt] = 16777216 + cnt; 1179 | } 1180 | else 1181 | { 1182 | r_array[cnt] = 33554432 + cnt; 1183 | } 1184 | } 1185 | run_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, "ascending tiles", sizeof(VAR), cmp_int); 1186 | 1187 | // bit-reversal 1188 | 1189 | for (cnt = 0 ; cnt < mem ; cnt++) 1190 | { 1191 | r_array[cnt] = bit_reverse(cnt); 1192 | } 1193 | run_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, "bit reversal", sizeof(VAR), cmp_int); 1194 | 1195 | #ifndef cmp 1196 | #ifdef ANTIQSORT 1197 | test_antiqsort; 1198 | #endif 1199 | #endif 1200 | 1201 | #define QUAD_DEBUG 1202 | #if __has_include("extra_tests.c") 1203 | 
#include "extra_tests.c" 1204 | #endif 1205 | 1206 | free(a_array); 1207 | free(r_array); 1208 | free(v_array); 1209 | 1210 | return 0; 1211 | } 1212 | -------------------------------------------------------------------------------- /src/crumsort.c: -------------------------------------------------------------------------------- 1 | // crumsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com 2 | 3 | #define CRUM_AUX 512 4 | #define CRUM_OUT 96 5 | 6 | void FUNC(fulcrum_partition)(VAR *array, VAR *swap, VAR *max, size_t swap_size, size_t nmemb, CMPFUNC *cmp); 7 | 8 | void FUNC(crum_analyze)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp) 9 | { 10 | unsigned char loop, asum, bsum, csum, dsum; 11 | unsigned int astreaks, bstreaks, cstreaks, dstreaks; 12 | size_t quad1, quad2, quad3, quad4, half1, half2; 13 | size_t cnt, abalance, bbalance, cbalance, dbalance; 14 | VAR *pta, *ptb, *ptc, *ptd; 15 | 16 | half1 = nmemb / 2; 17 | quad1 = half1 / 2; 18 | quad2 = half1 - quad1; 19 | half2 = nmemb - half1; 20 | quad3 = half2 / 2; 21 | quad4 = half2 - quad3; 22 | 23 | pta = array; 24 | ptb = array + quad1; 25 | ptc = array + half1; 26 | ptd = array + half1 + quad3; 27 | 28 | astreaks = bstreaks = cstreaks = dstreaks = 0; 29 | abalance = bbalance = cbalance = dbalance = 0; 30 | 31 | for (cnt = nmemb ; cnt > 132 ; cnt -= 128) 32 | { 33 | for (asum = bsum = csum = dsum = 0, loop = 32 ; loop ; loop--) 34 | { 35 | asum += cmp(pta, pta + 1) > 0; pta++; 36 | bsum += cmp(ptb, ptb + 1) > 0; ptb++; 37 | csum += cmp(ptc, ptc + 1) > 0; ptc++; 38 | dsum += cmp(ptd, ptd + 1) > 0; ptd++; 39 | } 40 | abalance += asum; astreaks += asum = (asum == 0) | (asum == 32); 41 | bbalance += bsum; bstreaks += bsum = (bsum == 0) | (bsum == 32); 42 | cbalance += csum; cstreaks += csum = (csum == 0) | (csum == 32); 43 | dbalance += dsum; dstreaks += dsum = (dsum == 0) | (dsum == 32); 44 | 45 | if (cnt > 516 && asum + bsum + csum + dsum == 0) 46 | { 47 | abalance += 48; pta += 96; 48 | 
bbalance += 48; ptb += 96; 49 | cbalance += 48; ptc += 96; 50 | dbalance += 48; ptd += 96; 51 | cnt -= 384; 52 | } 53 | } 54 | 55 | for ( ; cnt > 7 ; cnt -= 4) 56 | { 57 | abalance += cmp(pta, pta + 1) > 0; pta++; 58 | bbalance += cmp(ptb, ptb + 1) > 0; ptb++; 59 | cbalance += cmp(ptc, ptc + 1) > 0; ptc++; 60 | dbalance += cmp(ptd, ptd + 1) > 0; ptd++; 61 | } 62 | 63 | if (quad1 < quad2) {bbalance += cmp(ptb, ptb + 1) > 0; ptb++;} 64 | if (quad1 < quad3) {cbalance += cmp(ptc, ptc + 1) > 0; ptc++;} 65 | if (quad1 < quad4) {dbalance += cmp(ptd, ptd + 1) > 0; ptd++;} 66 | 67 | cnt = abalance + bbalance + cbalance + dbalance; 68 | 69 | if (cnt == 0) 70 | { 71 | if (cmp(pta, pta + 1) <= 0 && cmp(ptb, ptb + 1) <= 0 && cmp(ptc, ptc + 1) <= 0) 72 | { 73 | return; 74 | } 75 | } 76 | 77 | asum = quad1 - abalance == 1; 78 | bsum = quad2 - bbalance == 1; 79 | csum = quad3 - cbalance == 1; 80 | dsum = quad4 - dbalance == 1; 81 | 82 | if (asum | bsum | csum | dsum) 83 | { 84 | unsigned char span1 = (asum && bsum) * (cmp(pta, pta + 1) > 0); 85 | unsigned char span2 = (bsum && csum) * (cmp(ptb, ptb + 1) > 0); 86 | unsigned char span3 = (csum && dsum) * (cmp(ptc, ptc + 1) > 0); 87 | 88 | switch (span1 | span2 * 2 | span3 * 4) 89 | { 90 | case 0: break; 91 | case 1: FUNC(quad_reversal)(array, ptb); abalance = bbalance = 0; break; 92 | case 2: FUNC(quad_reversal)(pta + 1, ptc); bbalance = cbalance = 0; break; 93 | case 3: FUNC(quad_reversal)(array, ptc); abalance = bbalance = cbalance = 0; break; 94 | case 4: FUNC(quad_reversal)(ptb + 1, ptd); cbalance = dbalance = 0; break; 95 | case 5: FUNC(quad_reversal)(array, ptb); 96 | FUNC(quad_reversal)(ptb + 1, ptd); abalance = bbalance = cbalance = dbalance = 0; break; 97 | case 6: FUNC(quad_reversal)(pta + 1, ptd); bbalance = cbalance = dbalance = 0; break; 98 | case 7: FUNC(quad_reversal)(array, ptd); return; 99 | } 100 | 101 | if (asum && abalance) {FUNC(quad_reversal)(array, pta); abalance = 0;} 102 | if (bsum && bbalance) 
{FUNC(quad_reversal)(pta + 1, ptb); bbalance = 0;} 103 | if (csum && cbalance) {FUNC(quad_reversal)(ptb + 1, ptc); cbalance = 0;} 104 | if (dsum && dbalance) {FUNC(quad_reversal)(ptc + 1, ptd); dbalance = 0;} 105 | } 106 | 107 | #ifdef cmp 108 | cnt = nmemb / 256; // switch to quadsort if at least 50% ordered 109 | #else 110 | cnt = nmemb / 512; // switch to quadsort if at least 25% ordered 111 | #endif 112 | asum = astreaks > cnt; 113 | bsum = bstreaks > cnt; 114 | csum = cstreaks > cnt; 115 | dsum = dstreaks > cnt; 116 | 117 | #ifndef cmp 118 | if (quad1 > QUAD_CACHE) 119 | { 120 | // asum = bsum = csum = dsum = 1; 121 | goto quad_cache; 122 | } 123 | #endif 124 | 125 | switch (asum + bsum * 2 + csum * 4 + dsum * 8) 126 | { 127 | case 0: 128 | FUNC(fulcrum_partition)(array, swap, NULL, swap_size, nmemb, cmp); 129 | return; 130 | case 1: 131 | if (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp); 132 | FUNC(fulcrum_partition)(pta + 1, swap, NULL, swap_size, quad2 + half2, cmp); 133 | break; 134 | case 2: 135 | FUNC(fulcrum_partition)(array, swap, NULL, swap_size, quad1, cmp); 136 | if (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size, quad2, cmp); 137 | FUNC(fulcrum_partition)(ptb + 1, swap, NULL, swap_size, half2, cmp); 138 | break; 139 | case 3: 140 | if (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp); 141 | if (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size, quad2, cmp); 142 | FUNC(fulcrum_partition)(ptb + 1, swap, NULL, swap_size, half2, cmp); 143 | break; 144 | case 4: 145 | FUNC(fulcrum_partition)(array, swap, NULL, swap_size, half1, cmp); 146 | if (cbalance) FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp); 147 | FUNC(fulcrum_partition)(ptc + 1, swap, NULL, swap_size, quad4, cmp); 148 | break; 149 | case 8: 150 | FUNC(fulcrum_partition)(array, swap, NULL, swap_size, half1 + quad3, cmp); 151 | if (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp); 152 | break; 153 | case 9: 154 | if 
(abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp); 155 | FUNC(fulcrum_partition)(pta + 1, swap, NULL, swap_size, quad2 + quad3, cmp); 156 | if (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp); 157 | break; 158 | case 12: 159 | FUNC(fulcrum_partition)(array, swap, NULL, swap_size, half1, cmp); 160 | if (cbalance) FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp); 161 | if (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp); 162 | break; 163 | case 5: 164 | case 6: 165 | case 7: 166 | case 10: 167 | case 11: 168 | case 13: 169 | case 14: 170 | case 15: 171 | #ifndef cmp 172 | quad_cache: 173 | #endif 174 | if (asum) 175 | { 176 | if (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp); 177 | } 178 | else FUNC(fulcrum_partition)(array, swap, NULL, swap_size, quad1, cmp); 179 | if (bsum) 180 | { 181 | if (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size, quad2, cmp); 182 | } 183 | else FUNC(fulcrum_partition)(pta + 1, swap, NULL, swap_size, quad2, cmp); 184 | if (csum) 185 | { 186 | if (cbalance) FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp); 187 | } 188 | else FUNC(fulcrum_partition)(ptb + 1, swap, NULL, swap_size, quad3, cmp); 189 | if (dsum) 190 | { 191 | if (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp); 192 | } 193 | else FUNC(fulcrum_partition)(ptc + 1, swap, NULL, swap_size, quad4, cmp); 194 | break; 195 | } 196 | 197 | if (cmp(pta, pta + 1) <= 0) 198 | { 199 | if (cmp(ptc, ptc + 1) <= 0) 200 | { 201 | if (cmp(ptb, ptb + 1) <= 0) 202 | { 203 | return; 204 | } 205 | } 206 | else 207 | { 208 | FUNC(rotate_merge_block)(array + half1, swap, swap_size, quad3, quad4, cmp); 209 | } 210 | } 211 | else 212 | { 213 | FUNC(rotate_merge_block)(array, swap, swap_size, quad1, quad2, cmp); 214 | 215 | if (cmp(ptc, ptc + 1) > 0) 216 | { 217 | FUNC(rotate_merge_block)(array + half1, swap, swap_size, quad3, quad4, cmp); 218 | } 219 | } 220 | 
FUNC(rotate_merge_block)(array, swap, swap_size, half1, half2, cmp); 221 | } 222 | 223 | // The next 4 functions are used for pivot selection 224 | 225 | VAR *FUNC(crum_binary_median)(VAR *pta, VAR *ptb, size_t len, CMPFUNC *cmp) 226 | { 227 | while (len /= 2) 228 | { 229 | if (cmp(pta + len, ptb + len) <= 0) pta += len; else ptb += len; 230 | } 231 | return cmp(pta, ptb) > 0 ? pta : ptb; 232 | } 233 | 234 | VAR *FUNC(crum_median_of_cbrt)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, int *generic, CMPFUNC *cmp) 235 | { 236 | VAR *pta, *piv; 237 | size_t cnt, cbrt, div; 238 | 239 | for (cbrt = 32 ; nmemb > cbrt * cbrt * cbrt && cbrt < swap_size ; cbrt *= 2) {} 240 | 241 | div = nmemb / cbrt; 242 | 243 | pta = array + nmemb - 1 - (size_t) &div / 64 % div; 244 | piv = array + cbrt; 245 | 246 | for (cnt = cbrt ; cnt ; cnt--) 247 | { 248 | swap[0] = *--piv; *piv = *pta; *pta = swap[0]; 249 | 250 | pta -= div; 251 | } 252 | 253 | cbrt /= 2; 254 | 255 | FUNC(quadsort_swap)(piv, swap, swap_size, cbrt, cmp); 256 | FUNC(quadsort_swap)(piv + cbrt, swap, swap_size, cbrt, cmp); 257 | 258 | *generic = (cmp(piv + cbrt * 2 - 1, piv) <= 0) & (cmp(piv + cbrt - 1, piv) <= 0); 259 | 260 | return FUNC(crum_binary_median)(piv, piv + cbrt, cbrt, cmp); 261 | } 262 | 263 | size_t FUNC(crum_median_of_three)(VAR *array, size_t v0, size_t v1, size_t v2, CMPFUNC *cmp) 264 | { 265 | size_t v[3] = {v0, v1, v2}; 266 | char x, y, z; 267 | 268 | x = cmp(array + v0, array + v1) > 0; 269 | y = cmp(array + v0, array + v2) > 0; 270 | z = cmp(array + v1, array + v2) > 0; 271 | 272 | return v[(x == y) + (y ^ z)]; 273 | } 274 | 275 | VAR *FUNC(crum_median_of_nine)(VAR *array, size_t nmemb, CMPFUNC *cmp) 276 | { 277 | size_t x, y, z, div = nmemb / 16; 278 | 279 | x = FUNC(crum_median_of_three)(array, div * 2, div * 1, div * 4, cmp); 280 | y = FUNC(crum_median_of_three)(array, div * 8, div * 6, div * 10, cmp); 281 | z = FUNC(crum_median_of_three)(array, div * 14, div * 12, div * 15, cmp); 282 | 283 
| return array + FUNC(crum_median_of_three)(array, x, y, z, cmp); 284 | } 285 | 286 | size_t FUNC(fulcrum_default_partition)(VAR *array, VAR *swap, VAR *ptx, VAR *piv, size_t swap_size, size_t nmemb, CMPFUNC *cmp) 287 | { 288 | size_t i, cnt, val, m = 0; 289 | VAR *ptl, *ptr, *pta, *tpa; 290 | 291 | memcpy(swap, array, 32 * sizeof(VAR)); 292 | memcpy(swap + 32, array + nmemb - 32, 32 * sizeof(VAR)); 293 | 294 | ptl = array; 295 | ptr = array + nmemb - 1; 296 | 297 | pta = array + 32; 298 | tpa = array + nmemb - 33; 299 | 300 | cnt = nmemb / 16 - 4; 301 | 302 | while (1) 303 | { 304 | if (pta - ptl - m <= 48) 305 | { 306 | if (cnt-- == 0) break; 307 | 308 | for (i = 16 ; i ; i--) 309 | { 310 | val = cmp(pta, piv) <= 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--; 311 | } 312 | } 313 | if (pta - ptl - m >= 16) 314 | { 315 | if (cnt-- == 0) break; 316 | 317 | for (i = 16 ; i ; i--) 318 | { 319 | val = cmp(tpa, piv) <= 0; ptl[m] = ptr[m] = *tpa--; m += val; ptr--; 320 | } 321 | } 322 | } 323 | 324 | if (pta - ptl - m <= 48) 325 | { 326 | for (cnt = nmemb % 16 ; cnt ; cnt--) 327 | { 328 | val = cmp(pta, piv) <= 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--; 329 | } 330 | } 331 | else 332 | { 333 | for (cnt = nmemb % 16 ; cnt ; cnt--) 334 | { 335 | val = cmp(tpa, piv) <= 0; ptl[m] = ptr[m] = *tpa--; m += val; ptr--; 336 | } 337 | } 338 | pta = swap; 339 | 340 | for (cnt = 16 ; cnt ; cnt--) 341 | { 342 | val = cmp(pta, piv) <= 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--; 343 | val = cmp(pta, piv) <= 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--; 344 | val = cmp(pta, piv) <= 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--; 345 | val = cmp(pta, piv) <= 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--; 346 | } 347 | return m; 348 | } 349 | 350 | // As per suggestion by Marshall Lochbaum to improve generic data handling by mimicking dual-pivot quicksort 351 | 352 | size_t FUNC(fulcrum_reverse_partition)(VAR *array, VAR *swap, VAR *ptx, VAR *piv, size_t swap_size, size_t nmemb, CMPFUNC 
*cmp) 353 | { 354 | size_t i, cnt, val, m = 0; 355 | VAR *ptl, *ptr, *pta, *tpa; 356 | 357 | memcpy(swap, array, 32 * sizeof(VAR)); 358 | memcpy(swap + 32, array + nmemb - 32, 32 * sizeof(VAR)); 359 | 360 | ptl = array; 361 | ptr = array + nmemb - 1; 362 | 363 | pta = array + 32; 364 | tpa = array + nmemb - 33; 365 | 366 | cnt = nmemb / 16 - 4; 367 | 368 | while (1) 369 | { 370 | if (pta - ptl - m <= 48) 371 | { 372 | if (cnt-- == 0) break; 373 | 374 | for (i = 16 ; i ; i--) 375 | { 376 | val = cmp(piv, pta) > 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--; 377 | } 378 | } 379 | if (pta - ptl - m >= 16) 380 | { 381 | if (cnt-- == 0) break; 382 | 383 | for (i = 16 ; i ; i--) 384 | { 385 | val = cmp(piv, tpa) > 0; ptl[m] = ptr[m] = *tpa--; m += val; ptr--; 386 | } 387 | } 388 | } 389 | 390 | if (pta - ptl - m <= 48) 391 | { 392 | for (cnt = nmemb % 16 ; cnt ; cnt--) 393 | { 394 | val = cmp(piv, pta) > 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--; 395 | } 396 | } 397 | else 398 | { 399 | for (cnt = nmemb % 16 ; cnt ; cnt--) 400 | { 401 | val = cmp(piv, tpa) > 0; ptl[m] = ptr[m] = *tpa--; m += val; ptr--; 402 | } 403 | } 404 | pta = swap; 405 | 406 | for (cnt = 16 ; cnt ; cnt--) 407 | { 408 | val = cmp(piv, pta) > 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--; 409 | val = cmp(piv, pta) > 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--; 410 | val = cmp(piv, pta) > 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--; 411 | val = cmp(piv, pta) > 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--; 412 | } 413 | return m; 414 | } 415 | 416 | void FUNC(fulcrum_partition)(VAR *array, VAR *swap, VAR *max, size_t swap_size, size_t nmemb, CMPFUNC *cmp) 417 | { 418 | size_t a_size, s_size; 419 | VAR *ptp, piv; 420 | int generic = 0; 421 | 422 | while (1) 423 | { 424 | if (nmemb <= 2048) 425 | { 426 | ptp = FUNC(crum_median_of_nine)(array, nmemb, cmp); 427 | } 428 | else 429 | { 430 | ptp = FUNC(crum_median_of_cbrt)(array, swap, swap_size, nmemb, &generic, cmp); 431 | 432 | if (generic) break; 433 | 
} 434 | piv = *ptp; 435 | 436 | if (max && cmp(max, &piv) <= 0) 437 | { 438 | a_size = FUNC(fulcrum_reverse_partition)(array, swap, array, &piv, swap_size, nmemb, cmp); 439 | s_size = nmemb - a_size; 440 | nmemb = a_size; 441 | 442 | if (s_size <= a_size / 32 || a_size <= CRUM_OUT) break; 443 | 444 | max = NULL; 445 | continue; 446 | } 447 | *ptp = array[--nmemb]; 448 | 449 | a_size = FUNC(fulcrum_default_partition)(array, swap, array, &piv, swap_size, nmemb, cmp); 450 | s_size = nmemb - a_size; 451 | 452 | ptp = array + a_size; array[nmemb] = *ptp; *ptp = piv; 453 | 454 | if (a_size <= s_size / 32 || s_size <= CRUM_OUT) 455 | { 456 | FUNC(quadsort_swap)(ptp + 1, swap, swap_size, s_size, cmp); 457 | } 458 | else 459 | { 460 | FUNC(fulcrum_partition)(ptp + 1, swap, max, swap_size, s_size, cmp); 461 | } 462 | nmemb = a_size; 463 | 464 | if (s_size <= a_size / 32 || a_size <= CRUM_OUT) 465 | { 466 | if (a_size <= CRUM_OUT) break; 467 | 468 | a_size = FUNC(fulcrum_reverse_partition)(array, swap, array, &piv, swap_size, nmemb, cmp); 469 | s_size = nmemb - a_size; 470 | nmemb = a_size; 471 | 472 | if (s_size <= a_size / 32 || a_size <= CRUM_OUT) break; 473 | 474 | max = NULL; 475 | continue; 476 | } 477 | max = ptp; 478 | } 479 | FUNC(quadsort_swap)(array, swap, swap_size, nmemb, cmp); 480 | } 481 | 482 | void FUNC(crumsort)(void *array, size_t nmemb, CMPFUNC *cmp) 483 | { 484 | if (nmemb <= 256) 485 | { 486 | VAR swap[nmemb]; 487 | 488 | FUNC(quadsort_swap)(array, swap, nmemb, nmemb, cmp); 489 | 490 | return; 491 | } 492 | VAR *pta = (VAR *) array; 493 | #if CRUM_AUX 494 | size_t swap_size = CRUM_AUX; 495 | #else 496 | size_t swap_size = 128; 497 | 498 | while (swap_size * swap_size <= nmemb) 499 | { 500 | swap_size *= 4; 501 | } 502 | #endif 503 | VAR swap[swap_size]; 504 | 505 | FUNC(crum_analyze)(pta, swap, swap_size, nmemb, cmp); 506 | } 507 | 508 | void FUNC(crumsort_swap)(void *array, void *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp) 509 | { 510 | if 
(nmemb <= 256) 511 | { 512 | FUNC(quadsort_swap)(array, swap, swap_size, nmemb, cmp); 513 | } 514 | else 515 | { 516 | VAR *pta = (VAR *) array; 517 | VAR *pts = (VAR *) swap; 518 | 519 | FUNC(crum_analyze)(pta, pts, swap_size, nmemb, cmp); 520 | } 521 | } 522 | -------------------------------------------------------------------------------- /src/crumsort.h: -------------------------------------------------------------------------------- 1 | // crumsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com 2 | 3 | #ifndef CRUMSORT_H 4 | #define CRUMSORT_H 5 | 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | #include 13 | 14 | typedef int CMPFUNC (const void *a, const void *b); 15 | 16 | //#define cmp(a,b) (*(a) > *(b)) 17 | 18 | #ifndef QUADSORT_H 19 | #include "quadsort.h" 20 | #endif 21 | 22 | // When sorting an array of pointers, like a string array, the QUAD_CACHE needs 23 | // to be set for proper performance when sorting large arrays. 24 | // crumsort_prim() can be used to sort arrays of 32 and 64 bit integers 25 | // without a comparison function or cache restrictions. 26 | 27 | // With a 6 MB L3 cache a value of 262144 works well. 
28 | 29 | #ifdef cmp 30 | #define QUAD_CACHE 4294967295 31 | #else 32 | //#define QUAD_CACHE 131072 33 | #define QUAD_CACHE 262144 34 | //#define QUAD_CACHE 524288 35 | //#define QUAD_CACHE 4294967295 36 | #endif 37 | 38 | ////////////////////////////////////////////////////////// 39 | // ┌───────────────────────────────────────────────────┐// 40 | // │ ██████┐ ██████┐ ██████┐ ██████┐████████┐ │// 41 | // │ └────██┐└────██┐ ██┌──██┐└─██┌─┘└──██┌──┘ │// 42 | // │ █████┌┘ █████┌┘ ██████┌┘ ██│ ██│ │// 43 | // │ └───██┐██┌───┘ ██┌──██┐ ██│ ██│ │// 44 | // │ ██████┌┘███████┐ ██████┌┘██████┐ ██│ │// 45 | // │ └─────┘ └──────┘ └─────┘ └─────┘ └─┘ │// 46 | // └───────────────────────────────────────────────────┘// 47 | ////////////////////////////////////////////////////////// 48 | 49 | #define VAR int 50 | #define FUNC(NAME) NAME##32 51 | 52 | #include "crumsort.c" 53 | 54 | #undef VAR 55 | #undef FUNC 56 | 57 | // crumsort_prim 58 | 59 | #define VAR int 60 | #define FUNC(NAME) NAME##_int32 61 | #ifndef cmp 62 | #define cmp(a,b) (*(a) > *(b)) 63 | #include "crumsort.c" 64 | #undef cmp 65 | #else 66 | #include "crumsort.c" 67 | #endif 68 | #undef VAR 69 | #undef FUNC 70 | 71 | #define VAR unsigned int 72 | #define FUNC(NAME) NAME##_uint32 73 | #ifndef cmp 74 | #define cmp(a,b) (*(a) > *(b)) 75 | #include "crumsort.c" 76 | #undef cmp 77 | #else 78 | #include "crumsort.c" 79 | #endif 80 | #undef VAR 81 | #undef FUNC 82 | 83 | ////////////////////////////////////////////////////////// 84 | // ┌───────────────────────────────────────────────────┐// 85 | // │ █████┐ ██┐ ██┐ ██████┐ ██████┐████████┐ │// 86 | // │ ██┌───┘ ██│ ██│ ██┌──██┐└─██┌─┘└──██┌──┘ │// 87 | // │ ██████┐ ███████│ ██████┌┘ ██│ ██│ │// 88 | // │ ██┌──██┐└────██│ ██┌──██┐ ██│ ██│ │// 89 | // │ └█████┌┘ ██│ ██████┌┘██████┐ ██│ │// 90 | // │ └────┘ └─┘ └─────┘ └─────┘ └─┘ │// 91 | // └───────────────────────────────────────────────────┘// 92 | ////////////////////////////////////////////////////////// 93 | 94 | 
#define VAR long long 95 | #define FUNC(NAME) NAME##64 96 | 97 | #include "crumsort.c" 98 | 99 | #undef VAR 100 | #undef FUNC 101 | 102 | // crumsort_prim 103 | 104 | #define VAR long long 105 | #define FUNC(NAME) NAME##_int64 106 | #ifndef cmp 107 | #define cmp(a,b) (*(a) > *(b)) 108 | #include "crumsort.c" 109 | #undef cmp 110 | #else 111 | #include "crumsort.c" 112 | #endif 113 | #undef VAR 114 | #undef FUNC 115 | 116 | #define VAR unsigned long long 117 | #define FUNC(NAME) NAME##_uint64 118 | #ifndef cmp 119 | #define cmp(a,b) (*(a) > *(b)) 120 | #include "crumsort.c" 121 | #undef cmp 122 | #else 123 | #include "crumsort.c" 124 | #endif 125 | #undef VAR 126 | #undef FUNC 127 | 128 | // This section is outside of 32/64 bit pointer territory, so no cache checks 129 | // necessary, unless sorting 32+ byte structures. 130 | 131 | #undef QUAD_CACHE 132 | #define QUAD_CACHE 4294967295 133 | 134 | ////////////////////////////////////////////////////////// 135 | //┌────────────────────────────────────────────────────┐// 136 | //│ █████┐ ██████┐ ██████┐████████┐ │// 137 | //│ ██┌──██┐ ██┌──██┐└─██┌─┘└──██┌──┘ │// 138 | //│ └█████┌┘ ██████┌┘ ██│ ██│ │// 139 | //│ ██┌──██┐ ██┌──██┐ ██│ ██│ │// 140 | //│ └█████┌┘ ██████┌┘██████┐ ██│ │// 141 | //│ └────┘ └─────┘ └─────┘ └─┘ │// 142 | //└────────────────────────────────────────────────────┘// 143 | ////////////////////////////////////////////////////////// 144 | 145 | #define VAR char 146 | #define FUNC(NAME) NAME##8 147 | 148 | #include "crumsort.c" 149 | 150 | #undef VAR 151 | #undef FUNC 152 | 153 | ////////////////////////////////////////////////////////// 154 | //┌────────────────────────────────────────────────────┐// 155 | //│ ▄██┐ █████┐ ██████┐ ██████┐████████┐│// 156 | //│ ████│ ██┌───┘ ██┌──██┐└─██┌─┘└──██┌──┘│// 157 | //│ └─██│ ██████┐ ██████┌┘ ██│ ██│ │// 158 | //│ ██│ ██┌──██┐ ██┌──██┐ ██│ ██│ │// 159 | //│ ██████┐└█████┌┘ ██████┌┘██████┐ ██│ │// 160 | //│ └─────┘ └────┘ └─────┘ └─────┘ └─┘ │// 161 | 
//└────────────────────────────────────────────────────┘// 162 | ////////////////////////////////////////////////////////// 163 | 164 | #define VAR short 165 | #define FUNC(NAME) NAME##16 166 | 167 | #include "crumsort.c" 168 | 169 | #undef VAR 170 | #undef FUNC 171 | 172 | ////////////////////////////////////////////////////////// 173 | //┌────────────────────────────────────────────────────┐// 174 | //│ ▄██┐ ██████┐ █████┐ ██████┐ ██████┐████████┐ │// 175 | //│ ████│ └────██┐██┌──██┐ ██┌──██┐└─██┌─┘└──██┌──┘ │// 176 | //│ └─██│ █████┌┘└█████┌┘ ██████┌┘ ██│ ██│ │// 177 | //│ ██│ ██┌───┘ ██┌──██┐ ██┌──██┐ ██│ ██│ │// 178 | //│ ██████┐███████┐└█████┌┘ ██████┌┘██████┐ ██│ │// 179 | //│ └─────┘└──────┘ └────┘ └─────┘ └─────┘ └─┘ │// 180 | //└────────────────────────────────────────────────────┘// 181 | ////////////////////////////////////////////////////////// 182 | 183 | // 128 reflects the name, though the actual size of a long double is 64, 80, 184 | // 96, or 128 bits, depending on platform. 
185 | 186 | #if (DBL_MANT_DIG < LDBL_MANT_DIG) 187 | #define VAR long double 188 | #define FUNC(NAME) NAME##128 189 | #include "crumsort.c" 190 | #undef VAR 191 | #undef FUNC 192 | #endif 193 | 194 | /////////////////////////////////////////////////////////// 195 | //┌─────────────────────────────────────────────────────┐// 196 | //│ ██████┐██┐ ██┐███████┐████████┐ ██████┐ ███┐ ███┐│// 197 | //│██┌────┘██│ ██│██┌────┘└──██┌──┘██┌───██┐████┐████││// 198 | //│██│ ██│ ██│███████┐ ██│ ██│ ██│██┌███┌██││// 199 | //│██│ ██│ ██│└────██│ ██│ ██│ ██│██│└█┌┘██││// 200 | //│└██████┐└██████┌┘███████│ ██│ └██████┌┘██│ └┘ ██││// 201 | //│ └─────┘ └─────┘ └──────┘ └─┘ └─────┘ └─┘ └─┘│// 202 | //└─────────────────────────────────────────────────────┘// 203 | /////////////////////////////////////////////////////////// 204 | 205 | /* 206 | typedef struct {char bytes[32];} struct256; 207 | #define VAR struct256 208 | #define FUNC(NAME) NAME##256 209 | 210 | #include "crumsort.c" 211 | 212 | #undef VAR 213 | #undef FUNC 214 | */ 215 | 216 | ////////////////////////////////////////////////////////////////////////// 217 | //┌─────────────────────────────────────────────────────────────────────┐// 218 | //│ ██████┐██████┐ ██┐ ██┐███┐ ███┐███████┐ ██████┐ ██████┐ ████████┐│// 219 | //│██┌────┘██┌──██┐██│ ██│████┐████│██┌────┘██┌───██┐██┌──██┐└──██┌──┘│// 220 | //│██│ ██████┌┘██│ ██│██┌███┌██│███████┐██│ ██│██████┌┘ ██│ │// 221 | //│██│ ██┌──██┐██│ ██│██│└█┌┘██│└────██│██│ ██│██┌──██┐ ██│ │// 222 | //│└██████┐██│ ██│└██████┌┘██│ └┘ ██│███████│└██████┌┘██│ ██│ ██│ │// 223 | //│ └─────┘└─┘ └─┘ └─────┘ └─┘ └─┘└──────┘ └─────┘ └─┘ └─┘ └─┘ │// 224 | //└─────────────────────────────────────────────────────────────────────┘// 225 | ////////////////////////////////////////////////////////////////////////// 226 | 227 | void crumsort(void *array, size_t nmemb, size_t size, CMPFUNC *cmp) 228 | { 229 | if (nmemb < 2) 230 | { 231 | return; 232 | } 233 | 234 | switch (size) 235 | { 236 | case 
sizeof(char): 237 | crumsort8(array, nmemb, cmp); 238 | return; 239 | 240 | case sizeof(short): 241 | crumsort16(array, nmemb, cmp); 242 | return; 243 | 244 | case sizeof(int): 245 | crumsort32(array, nmemb, cmp); 246 | return; 247 | 248 | case sizeof(long long): 249 | crumsort64(array, nmemb, cmp); 250 | return; 251 | #if (DBL_MANT_DIG < LDBL_MANT_DIG) 252 | case sizeof(long double): 253 | crumsort128(array, nmemb, cmp); 254 | return; 255 | #endif 256 | // case sizeof(struct256): 257 | // crumsort256(array, nmemb, cmp); 258 | return; 259 | 260 | default: 261 | #if (DBL_MANT_DIG < LDBL_MANT_DIG) 262 | assert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long) || size == sizeof(long double)); 263 | #else 264 | assert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long)); 265 | #endif 266 | // qsort(array, nmemb, size, cmp); 267 | } 268 | } 269 | 270 | // suggested size values for primitives: 271 | 272 | // case 0: unsigned char 273 | // case 1: signed char 274 | // case 2: signed short 275 | // case 3: unsigned short 276 | // case 4: signed int 277 | // case 5: unsigned int 278 | // case 6: float 279 | // case 7: double 280 | // case 8: signed long long 281 | // case 9: unsigned long long 282 | // case ?: long double, use sizeof(long double): 283 | 284 | void crumsort_prim(void *array, size_t nmemb, size_t size) 285 | { 286 | if (nmemb < 2) 287 | { 288 | return; 289 | } 290 | 291 | switch (size) 292 | { 293 | case 4: 294 | crumsort_int32(array, nmemb, NULL); 295 | return; 296 | case 5: 297 | crumsort_uint32(array, nmemb, NULL); 298 | return; 299 | case 8: 300 | crumsort_int64(array, nmemb, NULL); 301 | return; 302 | case 9: 303 | crumsort_uint64(array, nmemb, NULL); 304 | return; 305 | default: 306 | assert(size == sizeof(int) || size == sizeof(int) + 1 || size == sizeof(long long) || size == sizeof(long long) + 1); 307 | return; 308 | } 309 | } 310 | 311 | #undef QUAD_CACHE 
312 | 313 | #endif 314 | -------------------------------------------------------------------------------- /src/quadsort.c: -------------------------------------------------------------------------------- 1 | // quadsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com 2 | 3 | // the next seven functions are used for sorting 0 to 31 elements 4 | 5 | void FUNC(parity_swap_four)(VAR *array, CMPFUNC *cmp) 6 | { 7 | VAR tmp, *pta = array; 8 | size_t x; 9 | 10 | branchless_swap(pta, tmp, x, cmp); pta += 2; 11 | branchless_swap(pta, tmp, x, cmp); pta--; 12 | 13 | if (cmp(pta, pta + 1) > 0) 14 | { 15 | tmp = pta[0]; pta[0] = pta[1]; pta[1] = tmp; pta--; 16 | 17 | branchless_swap(pta, tmp, x, cmp); pta += 2; 18 | branchless_swap(pta, tmp, x, cmp); pta--; 19 | branchless_swap(pta, tmp, x, cmp); 20 | } 21 | } 22 | 23 | void FUNC(parity_swap_five)(VAR *array, CMPFUNC *cmp) 24 | { 25 | VAR tmp, *pta = array; 26 | size_t x, y; 27 | 28 | branchless_swap(pta, tmp, x, cmp); pta += 2; 29 | branchless_swap(pta, tmp, x, cmp); pta -= 1; 30 | branchless_swap(pta, tmp, x, cmp); pta += 2; 31 | branchless_swap(pta, tmp, y, cmp); pta = array; 32 | 33 | if (x + y) 34 | { 35 | branchless_swap(pta, tmp, x, cmp); pta += 2; 36 | branchless_swap(pta, tmp, x, cmp); pta -= 1; 37 | branchless_swap(pta, tmp, x, cmp); pta += 2; 38 | branchless_swap(pta, tmp, x, cmp); pta = array; 39 | branchless_swap(pta, tmp, x, cmp); pta += 2; 40 | branchless_swap(pta, tmp, x, cmp); pta -= 1; 41 | } 42 | } 43 | 44 | void FUNC(parity_swap_six)(VAR *array, VAR *swap, CMPFUNC *cmp) 45 | { 46 | VAR tmp, *pta = array, *ptl, *ptr; 47 | size_t x, y; 48 | 49 | branchless_swap(pta, tmp, x, cmp); pta++; 50 | branchless_swap(pta, tmp, x, cmp); pta += 3; 51 | branchless_swap(pta, tmp, x, cmp); pta--; 52 | branchless_swap(pta, tmp, x, cmp); pta = array; 53 | 54 | if (cmp(pta + 2, pta + 3) <= 0) 55 | { 56 | branchless_swap(pta, tmp, x, cmp); pta += 4; 57 | branchless_swap(pta, tmp, x, cmp); 58 | return; 59 | } 60 | x = cmp(pta, 
pta + 1) > 0; y = !x; swap[0] = pta[x]; swap[1] = pta[y]; swap[2] = pta[2]; pta += 4; 61 | x = cmp(pta, pta + 1) > 0; y = !x; swap[4] = pta[x]; swap[5] = pta[y]; swap[3] = pta[-1]; 62 | 63 | pta = array; ptl = swap; ptr = swap + 3; 64 | 65 | head_branchless_merge(pta, x, ptl, ptr, cmp); 66 | head_branchless_merge(pta, x, ptl, ptr, cmp); 67 | head_branchless_merge(pta, x, ptl, ptr, cmp); 68 | 69 | pta = array + 5; ptl = swap + 2; ptr = swap + 5; 70 | 71 | tail_branchless_merge(pta, y, ptl, ptr, cmp); 72 | tail_branchless_merge(pta, y, ptl, ptr, cmp); 73 | *pta = cmp(ptl, ptr) > 0 ? *ptl : *ptr; 74 | } 75 | 76 | void FUNC(parity_swap_seven)(VAR *array, VAR *swap, CMPFUNC *cmp) 77 | { 78 | VAR tmp, *pta = array, *ptl, *ptr; 79 | size_t x, y; 80 | 81 | branchless_swap(pta, tmp, x, cmp); pta += 2; 82 | branchless_swap(pta, tmp, x, cmp); pta += 2; 83 | branchless_swap(pta, tmp, x, cmp); pta -= 3; 84 | branchless_swap(pta, tmp, y, cmp); pta += 2; 85 | branchless_swap(pta, tmp, x, cmp); pta += 2; y += x; 86 | branchless_swap(pta, tmp, x, cmp); pta -= 1; y += x; 87 | 88 | if (y == 0) return; 89 | 90 | branchless_swap(pta, tmp, x, cmp); pta = array; 91 | 92 | x = cmp(pta, pta + 1) > 0; swap[0] = pta[x]; swap[1] = pta[!x]; swap[2] = pta[2]; pta += 3; 93 | x = cmp(pta, pta + 1) > 0; swap[3] = pta[x]; swap[4] = pta[!x]; pta += 2; 94 | x = cmp(pta, pta + 1) > 0; swap[5] = pta[x]; swap[6] = pta[!x]; 95 | 96 | pta = array; ptl = swap; ptr = swap + 3; 97 | 98 | head_branchless_merge(pta, x, ptl, ptr, cmp); 99 | head_branchless_merge(pta, x, ptl, ptr, cmp); 100 | head_branchless_merge(pta, x, ptl, ptr, cmp); 101 | 102 | pta = array + 6; ptl = swap + 2; ptr = swap + 6; 103 | 104 | tail_branchless_merge(pta, y, ptl, ptr, cmp); 105 | tail_branchless_merge(pta, y, ptl, ptr, cmp); 106 | tail_branchless_merge(pta, y, ptl, ptr, cmp); 107 | *pta = cmp(ptl, ptr) > 0 ? 
*ptl : *ptr; 108 | } 109 | 110 | void FUNC(tiny_sort)(VAR *array, VAR *swap, size_t nmemb, CMPFUNC *cmp) 111 | { 112 | VAR tmp; 113 | size_t x; 114 | 115 | switch (nmemb) 116 | { 117 | case 0: 118 | case 1: 119 | return; 120 | case 2: 121 | branchless_swap(array, tmp, x, cmp); 122 | return; 123 | case 3: 124 | branchless_swap(array, tmp, x, cmp); array++; 125 | branchless_swap(array, tmp, x, cmp); array--; 126 | branchless_swap(array, tmp, x, cmp); 127 | return; 128 | case 4: 129 | FUNC(parity_swap_four)(array, cmp); 130 | return; 131 | case 5: 132 | FUNC(parity_swap_five)(array, cmp); 133 | return; 134 | case 6: 135 | FUNC(parity_swap_six)(array, swap, cmp); 136 | return; 137 | case 7: 138 | FUNC(parity_swap_seven)(array, swap, cmp); 139 | return; 140 | } 141 | } 142 | 143 | // left must be equal to right or one smaller than right 144 | 145 | void FUNC(parity_merge)(VAR *dest, VAR *from, size_t left, size_t right, CMPFUNC *cmp) 146 | { 147 | VAR *ptl, *ptr, *tpl, *tpr, *tpd, *ptd; 148 | #if !defined __clang__ 149 | size_t x, y; 150 | #endif 151 | ptl = from; 152 | ptr = from + left; 153 | ptd = dest; 154 | tpl = ptr - 1; 155 | tpr = tpl + right; 156 | tpd = dest + left + right - 1; 157 | 158 | if (left < right) 159 | { 160 | *ptd++ = cmp(ptl, ptr) <= 0 ? *ptl++ : *ptr++; 161 | } 162 | *ptd++ = cmp(ptl, ptr) <= 0 ? *ptl++ : *ptr++; 163 | 164 | #if !defined cmp && !defined __clang__ // cache limit workaround for gcc 165 | if (left > QUAD_CACHE) 166 | { 167 | while (--left) 168 | { 169 | *ptd++ = cmp(ptl, ptr) <= 0 ? *ptl++ : *ptr++; 170 | *tpd-- = cmp(tpl, tpr) > 0 ? *tpl-- : *tpr--; 171 | } 172 | } 173 | else 174 | #endif 175 | { 176 | while (--left) 177 | { 178 | head_branchless_merge(ptd, x, ptl, ptr, cmp); 179 | tail_branchless_merge(tpd, y, tpl, tpr, cmp); 180 | } 181 | } 182 | *tpd = cmp(tpl, tpr) > 0 ?
*tpl : *tpr; 183 | } 184 | 185 | void FUNC(tail_swap)(VAR *array, VAR *swap, size_t nmemb, CMPFUNC *cmp) 186 | { 187 | if (nmemb < 8) 188 | { 189 | FUNC(tiny_sort)(array, swap, nmemb, cmp); 190 | return; 191 | } 192 | size_t quad1, quad2, quad3, quad4, half1, half2; 193 | 194 | half1 = nmemb / 2; 195 | quad1 = half1 / 2; 196 | quad2 = half1 - quad1; 197 | half2 = nmemb - half1; 198 | quad3 = half2 / 2; 199 | quad4 = half2 - quad3; 200 | 201 | VAR *pta = array; 202 | 203 | FUNC(tail_swap)(pta, swap, quad1, cmp); pta += quad1; 204 | FUNC(tail_swap)(pta, swap, quad2, cmp); pta += quad2; 205 | FUNC(tail_swap)(pta, swap, quad3, cmp); pta += quad3; 206 | FUNC(tail_swap)(pta, swap, quad4, cmp); 207 | 208 | if (cmp(array + quad1 - 1, array + quad1) <= 0 && cmp(array + half1 - 1, array + half1) <= 0 && cmp(pta - 1, pta) <= 0) 209 | { 210 | return; 211 | } 212 | FUNC(parity_merge)(swap, array, quad1, quad2, cmp); 213 | FUNC(parity_merge)(swap + half1, array + half1, quad3, quad4, cmp); 214 | FUNC(parity_merge)(array, swap, half1, half2, cmp); 215 | } 216 | 217 | // the next three functions create sorted blocks of 32 elements 218 | 219 | void FUNC(quad_reversal)(VAR *pta, VAR *ptz) 220 | { 221 | VAR *ptb, *pty, tmp1, tmp2; 222 | 223 | size_t loop = (ptz - pta) / 2; 224 | 225 | ptb = pta + loop; 226 | pty = ptz - loop; 227 | 228 | if (loop % 2 == 0) 229 | { 230 | tmp2 = *ptb; *ptb-- = *pty; *pty++ = tmp2; loop--; 231 | } 232 | 233 | loop /= 2; 234 | 235 | do 236 | { 237 | tmp1 = *pta; *pta++ = *ptz; *ptz-- = tmp1; 238 | tmp2 = *ptb; *ptb-- = *pty; *pty++ = tmp2; 239 | } 240 | while (loop--); 241 | } 242 | 243 | void FUNC(quad_swap_merge)(VAR *array, VAR *swap, CMPFUNC *cmp) 244 | { 245 | VAR *pts, *ptl, *ptr; 246 | #if !defined __clang__ 247 | size_t x; 248 | #endif 249 | parity_merge_two(array + 0, swap + 0, x, ptl, ptr, pts, cmp); 250 | parity_merge_two(array + 4, swap + 4, x, ptl, ptr, pts, cmp); 251 | 252 | parity_merge_four(swap, array, x, ptl, ptr, pts, cmp); 253 | } 254 
| 255 | void FUNC(tail_merge)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t block, CMPFUNC *cmp); 256 | 257 | size_t FUNC(quad_swap)(VAR *array, size_t nmemb, CMPFUNC *cmp) 258 | { 259 | VAR tmp, swap[32]; 260 | size_t count; 261 | VAR *pta, *pts; 262 | unsigned char v1, v2, v3, v4, x; 263 | pta = array; 264 | 265 | count = nmemb / 8; 266 | 267 | while (count--) 268 | { 269 | v1 = cmp(pta + 0, pta + 1) > 0; 270 | v2 = cmp(pta + 2, pta + 3) > 0; 271 | v3 = cmp(pta + 4, pta + 5) > 0; 272 | v4 = cmp(pta + 6, pta + 7) > 0; 273 | 274 | switch (v1 + v2 * 2 + v3 * 4 + v4 * 8) 275 | { 276 | case 0: 277 | if (cmp(pta + 1, pta + 2) <= 0 && cmp(pta + 3, pta + 4) <= 0 && cmp(pta + 5, pta + 6) <= 0) 278 | { 279 | goto ordered; 280 | } 281 | FUNC(quad_swap_merge)(pta, swap, cmp); 282 | break; 283 | 284 | case 15: 285 | if (cmp(pta + 1, pta + 2) > 0 && cmp(pta + 3, pta + 4) > 0 && cmp(pta + 5, pta + 6) > 0) 286 | { 287 | pts = pta; 288 | goto reversed; 289 | } 290 | 291 | default: 292 | not_ordered: 293 | x = !v1; tmp = pta[x]; pta[0] = pta[v1]; pta[1] = tmp; pta += 2; 294 | x = !v2; tmp = pta[x]; pta[0] = pta[v2]; pta[1] = tmp; pta += 2; 295 | x = !v3; tmp = pta[x]; pta[0] = pta[v3]; pta[1] = tmp; pta += 2; 296 | x = !v4; tmp = pta[x]; pta[0] = pta[v4]; pta[1] = tmp; pta -= 6; 297 | 298 | FUNC(quad_swap_merge)(pta, swap, cmp); 299 | } 300 | pta += 8; 301 | 302 | continue; 303 | 304 | ordered: 305 | 306 | pta += 8; 307 | 308 | if (count--) 309 | { 310 | if ((v1 = cmp(pta + 0, pta + 1) > 0) | (v2 = cmp(pta + 2, pta + 3) > 0) | (v3 = cmp(pta + 4, pta + 5) > 0) | (v4 = cmp(pta + 6, pta + 7) > 0)) 311 | { 312 | if (v1 + v2 + v3 + v4 == 4 && cmp(pta + 1, pta + 2) > 0 && cmp(pta + 3, pta + 4) > 0 && cmp(pta + 5, pta + 6) > 0) 313 | { 314 | pts = pta; 315 | goto reversed; 316 | } 317 | goto not_ordered; 318 | } 319 | if (cmp(pta + 1, pta + 2) <= 0 && cmp(pta + 3, pta + 4) <= 0 && cmp(pta + 5, pta + 6) <= 0) 320 | { 321 | goto ordered; 322 | } 323 | 
FUNC(quad_swap_merge)(pta, swap, cmp); 324 | pta += 8; 325 | continue; 326 | } 327 | break; 328 | 329 | reversed: 330 | 331 | pta += 8; 332 | 333 | if (count--) 334 | { 335 | if ((v1 = cmp(pta + 0, pta + 1) <= 0) | (v2 = cmp(pta + 2, pta + 3) <= 0) | (v3 = cmp(pta + 4, pta + 5) <= 0) | (v4 = cmp(pta + 6, pta + 7) <= 0)) 336 | { 337 | // not reversed 338 | } 339 | else 340 | { 341 | if (cmp(pta - 1, pta) > 0 && cmp(pta + 1, pta + 2) > 0 && cmp(pta + 3, pta + 4) > 0 && cmp(pta + 5, pta + 6) > 0) 342 | { 343 | goto reversed; 344 | } 345 | } 346 | FUNC(quad_reversal)(pts, pta - 1); 347 | 348 | if (v1 + v2 + v3 + v4 == 4 && cmp(pta + 1, pta + 2) <= 0 && cmp(pta + 3, pta + 4) <= 0 && cmp(pta + 5, pta + 6) <= 0) 349 | { 350 | goto ordered; 351 | } 352 | if (v1 + v2 + v3 + v4 == 0 && cmp(pta + 1, pta + 2) > 0 && cmp(pta + 3, pta + 4) > 0 && cmp(pta + 5, pta + 6) > 0) 353 | { 354 | pts = pta; 355 | goto reversed; 356 | } 357 | 358 | x = !v1; tmp = pta[v1]; pta[0] = pta[x]; pta[1] = tmp; pta += 2; 359 | x = !v2; tmp = pta[v2]; pta[0] = pta[x]; pta[1] = tmp; pta += 2; 360 | x = !v3; tmp = pta[v3]; pta[0] = pta[x]; pta[1] = tmp; pta += 2; 361 | x = !v4; tmp = pta[v4]; pta[0] = pta[x]; pta[1] = tmp; pta -= 6; 362 | 363 | if (cmp(pta + 1, pta + 2) > 0 || cmp(pta + 3, pta + 4) > 0 || cmp(pta + 5, pta + 6) > 0) 364 | { 365 | FUNC(quad_swap_merge)(pta, swap, cmp); 366 | } 367 | pta += 8; 368 | continue; 369 | } 370 | 371 | switch (nmemb % 8) 372 | { 373 | case 7: if (cmp(pta + 5, pta + 6) <= 0) break; 374 | case 6: if (cmp(pta + 4, pta + 5) <= 0) break; 375 | case 5: if (cmp(pta + 3, pta + 4) <= 0) break; 376 | case 4: if (cmp(pta + 2, pta + 3) <= 0) break; 377 | case 3: if (cmp(pta + 1, pta + 2) <= 0) break; 378 | case 2: if (cmp(pta + 0, pta + 1) <= 0) break; 379 | case 1: if (cmp(pta - 1, pta + 0) <= 0) break; 380 | case 0: 381 | FUNC(quad_reversal)(pts, pta + nmemb % 8 - 1); 382 | 383 | if (pts == array) 384 | { 385 | return 1; 386 | } 387 | goto reverse_end; 388 | } 389 | 
FUNC(quad_reversal)(pts, pta - 1); 390 | break; 391 | } 392 | FUNC(tail_swap)(pta, swap, nmemb % 8, cmp); 393 | 394 | reverse_end: 395 | 396 | pta = array; 397 | 398 | for (count = nmemb / 32 ; count-- ; pta += 32) 399 | { 400 | if (cmp(pta + 7, pta + 8) <= 0 && cmp(pta + 15, pta + 16) <= 0 && cmp(pta + 23, pta + 24) <= 0) 401 | { 402 | continue; 403 | } 404 | FUNC(parity_merge)(swap, pta, 8, 8, cmp); 405 | FUNC(parity_merge)(swap + 16, pta + 16, 8, 8, cmp); 406 | FUNC(parity_merge)(pta, swap, 16, 16, cmp); 407 | } 408 | 409 | if (nmemb % 32 > 8) 410 | { 411 | FUNC(tail_merge)(pta, swap, 32, nmemb % 32, 8, cmp); 412 | } 413 | return 0; 414 | } 415 | 416 | // The next six functions are quad merge support routines 417 | 418 | void FUNC(cross_merge)(VAR *dest, VAR *from, size_t left, size_t right, CMPFUNC *cmp) 419 | { 420 | VAR *ptl, *tpl, *ptr, *tpr, *ptd, *tpd; 421 | size_t loop; 422 | #if !defined __clang__ 423 | size_t x, y; 424 | #endif 425 | ptl = from; 426 | ptr = from + left; 427 | tpl = ptr - 1; 428 | tpr = tpl + right; 429 | 430 | if (left + 1 >= right && right >= left && left >= 32) 431 | { 432 | if (cmp(ptl + 15, ptr) > 0 && cmp(ptl, ptr + 15) <= 0 && cmp(tpl, tpr - 15) > 0 && cmp(tpl - 15, tpr) <= 0) 433 | { 434 | FUNC(parity_merge)(dest, from, left, right, cmp); 435 | return; 436 | } 437 | } 438 | ptd = dest; 439 | tpd = dest + left + right - 1; 440 | 441 | while (1) 442 | { 443 | if (tpl - ptl > 8) 444 | { 445 | ptl8_ptr: if (cmp(ptl + 7, ptr) <= 0) 446 | { 447 | memcpy(ptd, ptl, 8 * sizeof(VAR)); ptd += 8; ptl += 8; 448 | 449 | if (tpl - ptl > 8) {goto ptl8_ptr;} continue; 450 | } 451 | 452 | tpl8_tpr: if (cmp(tpl - 7, tpr) > 0) 453 | { 454 | tpd -= 7; tpl -= 7; memcpy(tpd--, tpl--, 8 * sizeof(VAR)); 455 | 456 | if (tpl - ptl > 8) {goto tpl8_tpr;} continue; 457 | } 458 | } 459 | 460 | if (tpr - ptr > 8) 461 | { 462 | ptl_ptr8: if (cmp(ptl, ptr + 7) > 0) 463 | { 464 | memcpy(ptd, ptr, 8 * sizeof(VAR)); ptd += 8; ptr += 8; 465 | 466 | if (tpr - ptr > 8) 
{goto ptl_ptr8;} continue; 467 | } 468 | 469 | tpl_tpr8: if (cmp(tpl, tpr - 7) <= 0) 470 | { 471 | tpd -= 7; tpr -= 7; memcpy(tpd--, tpr--, 8 * sizeof(VAR)); 472 | 473 | if (tpr - ptr > 8) {goto tpl_tpr8;} continue; 474 | } 475 | } 476 | 477 | if (tpd - ptd < 16) 478 | { 479 | break; 480 | } 481 | 482 | #if !defined cmp && !defined __clang__ 483 | if (left > QUAD_CACHE) 484 | { 485 | loop = 8; do 486 | { 487 | *ptd++ = cmp(ptl, ptr) <= 0 ? *ptl++ : *ptr++; 488 | *tpd-- = cmp(tpl, tpr) > 0 ? *tpl-- : *tpr--; 489 | } 490 | while (--loop); 491 | } 492 | else 493 | #endif 494 | { 495 | loop = 8; do 496 | { 497 | head_branchless_merge(ptd, x, ptl, ptr, cmp); 498 | tail_branchless_merge(tpd, y, tpl, tpr, cmp); 499 | } 500 | while (--loop); 501 | } 502 | } 503 | 504 | while (ptl <= tpl && ptr <= tpr) 505 | { 506 | *ptd++ = cmp(ptl, ptr) <= 0 ? *ptl++ : *ptr++; 507 | } 508 | while (ptl <= tpl) 509 | { 510 | *ptd++ = *ptl++; 511 | } 512 | while (ptr <= tpr) 513 | { 514 | *ptd++ = *ptr++; 515 | } 516 | } 517 | 518 | void FUNC(quad_merge_block)(VAR *array, VAR *swap, size_t block, CMPFUNC *cmp) 519 | { 520 | VAR *pt1, *pt2, *pt3; 521 | size_t block_x_2 = block * 2; 522 | 523 | pt1 = array + block; 524 | pt2 = pt1 + block; 525 | pt3 = pt2 + block; 526 | 527 | switch ((cmp(pt1 - 1, pt1) <= 0) | (cmp(pt3 - 1, pt3) <= 0) * 2) 528 | { 529 | case 0: 530 | FUNC(cross_merge)(swap, array, block, block, cmp); 531 | FUNC(cross_merge)(swap + block_x_2, pt2, block, block, cmp); 532 | break; 533 | case 1: 534 | memcpy(swap, array, block_x_2 * sizeof(VAR)); 535 | FUNC(cross_merge)(swap + block_x_2, pt2, block, block, cmp); 536 | break; 537 | case 2: 538 | FUNC(cross_merge)(swap, array, block, block, cmp); 539 | memcpy(swap + block_x_2, pt2, block_x_2 * sizeof(VAR)); 540 | break; 541 | case 3: 542 | if (cmp(pt2 - 1, pt2) <= 0) 543 | return; 544 | memcpy(swap, array, block_x_2 * 2 * sizeof(VAR)); 545 | } 546 | FUNC(cross_merge)(array, swap, block_x_2, block_x_2, cmp); 547 | } 548 | 549 | 
size_t FUNC(quad_merge)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t block, CMPFUNC *cmp) 550 | { 551 | VAR *pta, *pte; 552 | 553 | pte = array + nmemb; 554 | 555 | block *= 4; 556 | 557 | while (block <= nmemb && block <= swap_size) 558 | { 559 | pta = array; 560 | 561 | do 562 | { 563 | FUNC(quad_merge_block)(pta, swap, block / 4, cmp); 564 | 565 | pta += block; 566 | } 567 | while (pta + block <= pte); 568 | 569 | FUNC(tail_merge)(pta, swap, swap_size, pte - pta, block / 4, cmp); 570 | 571 | block *= 4; 572 | } 573 | 574 | FUNC(tail_merge)(array, swap, swap_size, nmemb, block / 4, cmp); 575 | 576 | return block / 2; 577 | } 578 | 579 | void FUNC(partial_forward_merge)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t block, CMPFUNC *cmp) 580 | { 581 | VAR *ptl, *ptr, *tpl, *tpr; 582 | size_t x; 583 | 584 | if (nmemb == block) 585 | { 586 | return; 587 | } 588 | 589 | ptr = array + block; 590 | tpr = array + nmemb - 1; 591 | 592 | if (cmp(ptr - 1, ptr) <= 0) 593 | { 594 | return; 595 | } 596 | 597 | memcpy(swap, array, block * sizeof(VAR)); 598 | 599 | ptl = swap; 600 | tpl = swap + block - 1; 601 | 602 | while (ptl < tpl - 1 && ptr < tpr - 1) 603 | { 604 | ptr2: if (cmp(ptl, ptr + 1) > 0) 605 | { 606 | *array++ = *ptr++; *array++ = *ptr++; 607 | 608 | if (ptr < tpr - 1) {goto ptr2;} break; 609 | } 610 | if (cmp(ptl + 1, ptr) <= 0) 611 | { 612 | *array++ = *ptl++; *array++ = *ptl++; 613 | 614 | if (ptl < tpl - 1) {goto ptl2;} break; 615 | } 616 | 617 | goto cross_swap; 618 | 619 | ptl2: if (cmp(ptl + 1, ptr) <= 0) 620 | { 621 | *array++ = *ptl++; *array++ = *ptl++; 622 | 623 | if (ptl < tpl - 1) {goto ptl2;} break; 624 | } 625 | 626 | if (cmp(ptl, ptr + 1) > 0) 627 | { 628 | *array++ = *ptr++; *array++ = *ptr++; 629 | 630 | if (ptr < tpr - 1) {goto ptr2;} break; 631 | } 632 | 633 | cross_swap: 634 | 635 | x = cmp(ptl, ptr) <= 0; array[x] = *ptr; ptr += 1; array[!x] = *ptl; ptl += 1; array += 2; 636 | head_branchless_merge(array, x, 
ptl, ptr, cmp); 637 | } 638 | 639 | while (ptl <= tpl && ptr <= tpr) 640 | { 641 | *array++ = cmp(ptl, ptr) <= 0 ? *ptl++ : *ptr++; 642 | } 643 | 644 | while (ptl <= tpl) 645 | { 646 | *array++ = *ptl++; 647 | } 648 | } 649 | 650 | void FUNC(partial_backward_merge)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t block, CMPFUNC *cmp) 651 | { 652 | VAR *tpl, *tpa, *tpr; 653 | size_t right, loop, x; 654 | 655 | if (nmemb == block) 656 | { 657 | return; 658 | } 659 | 660 | tpl = array + block - 1; 661 | tpa = array + nmemb - 1; 662 | 663 | if (cmp(tpl, tpl + 1) <= 0) 664 | { 665 | return; 666 | } 667 | 668 | right = nmemb - block; 669 | 670 | if (nmemb <= swap_size && right >= 64) 671 | { 672 | FUNC(cross_merge)(swap, array, block, right, cmp); 673 | 674 | memcpy(array, swap, nmemb * sizeof(VAR)); 675 | 676 | return; 677 | } 678 | 679 | memcpy(swap, array + block, right * sizeof(VAR)); 680 | 681 | tpr = swap + right - 1; 682 | 683 | while (tpl > array + 16 && tpr > swap + 16) 684 | { 685 | tpl_tpr16: if (cmp(tpl, tpr - 15) <= 0) 686 | { 687 | loop = 16; do *tpa-- = *tpr--; while (--loop); 688 | 689 | if (tpr > swap + 16) {goto tpl_tpr16;} break; 690 | } 691 | 692 | tpl16_tpr: if (cmp(tpl - 15, tpr) > 0) 693 | { 694 | loop = 16; do *tpa-- = *tpl--; while (--loop); 695 | 696 | if (tpl > array + 16) {goto tpl16_tpr;} break; 697 | } 698 | loop = 8; do 699 | { 700 | if (cmp(tpl, tpr - 1) <= 0) 701 | { 702 | *tpa-- = *tpr--; *tpa-- = *tpr--; 703 | } 704 | else if (cmp(tpl - 1, tpr) > 0) 705 | { 706 | *tpa-- = *tpl--; *tpa-- = *tpl--; 707 | } 708 | else 709 | { 710 | x = cmp(tpl, tpr) <= 0; tpa--; tpa[x] = *tpr; tpr -= 1; tpa[!x] = *tpl; tpl -= 1; tpa--; 711 | tail_branchless_merge(tpa, x, tpl, tpr, cmp); 712 | } 713 | } 714 | while (--loop); 715 | } 716 | 717 | while (tpr > swap + 1 && tpl > array + 1) 718 | { 719 | tpr2: if (cmp(tpl, tpr - 1) <= 0) 720 | { 721 | *tpa-- = *tpr--; *tpa-- = *tpr--; 722 | 723 | if (tpr > swap + 1) {goto tpr2;} break; 724 | } 725 | 
726 | if (cmp(tpl - 1, tpr) > 0) 727 | { 728 | *tpa-- = *tpl--; *tpa-- = *tpl--; 729 | 730 | if (tpl > array + 1) {goto tpl2;} break; 731 | } 732 | goto cross_swap; 733 | 734 | tpl2: if (cmp(tpl - 1, tpr) > 0) 735 | { 736 | *tpa-- = *tpl--; *tpa-- = *tpl--; 737 | 738 | if (tpl > array + 1) {goto tpl2;} break; 739 | } 740 | 741 | if (cmp(tpl, tpr - 1) <= 0) 742 | { 743 | *tpa-- = *tpr--; *tpa-- = *tpr--; 744 | 745 | if (tpr > swap + 1) {goto tpr2;} break; 746 | } 747 | cross_swap: 748 | 749 | x = cmp(tpl, tpr) <= 0; tpa--; tpa[x] = *tpr; tpr -= 1; tpa[!x] = *tpl; tpl -= 1; tpa--; 750 | tail_branchless_merge(tpa, x, tpl, tpr, cmp); 751 | } 752 | 753 | while (tpr >= swap && tpl >= array) 754 | { 755 | *tpa-- = cmp(tpl, tpr) > 0 ? *tpl-- : *tpr--; 756 | } 757 | 758 | while (tpr >= swap) 759 | { 760 | *tpa-- = *tpr--; 761 | } 762 | } 763 | 764 | void FUNC(tail_merge)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t block, CMPFUNC *cmp) 765 | { 766 | VAR *pta, *pte; 767 | 768 | pte = array + nmemb; 769 | 770 | while (block < nmemb && block <= swap_size) 771 | { 772 | for (pta = array ; pta + block < pte ; pta += block * 2) 773 | { 774 | if (pta + block * 2 < pte) 775 | { 776 | FUNC(partial_backward_merge)(pta, swap, swap_size, block * 2, block, cmp); 777 | 778 | continue; 779 | } 780 | FUNC(partial_backward_merge)(pta, swap, swap_size, pte - pta, block, cmp); 781 | 782 | break; 783 | } 784 | block *= 2; 785 | } 786 | } 787 | 788 | // the next four functions provide in-place rotate merge support 789 | 790 | void FUNC(trinity_rotation)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t left) 791 | { 792 | VAR temp; 793 | size_t bridge, right = nmemb - left; 794 | 795 | if (swap_size > 65536) 796 | { 797 | swap_size = 65536; 798 | } 799 | 800 | if (left < right) 801 | { 802 | if (left <= swap_size) 803 | { 804 | memcpy(swap, array, left * sizeof(VAR)); 805 | memmove(array, array + left, right * sizeof(VAR)); 806 | memcpy(array + right, swap, left * 
sizeof(VAR)); 807 | } 808 | else 809 | { 810 | VAR *pta, *ptb, *ptc, *ptd; 811 | 812 | pta = array; 813 | ptb = pta + left; 814 | 815 | bridge = right - left; 816 | 817 | if (bridge <= swap_size && bridge > 3) 818 | { 819 | ptc = pta + right; 820 | ptd = ptc + left; 821 | 822 | memcpy(swap, ptb, bridge * sizeof(VAR)); 823 | 824 | while (left--) 825 | { 826 | *--ptc = *--ptd; *ptd = *--ptb; 827 | } 828 | memcpy(pta, swap, bridge * sizeof(VAR)); 829 | } 830 | else 831 | { 832 | ptc = ptb; 833 | ptd = ptc + right; 834 | 835 | bridge = left / 2; 836 | 837 | while (bridge--) 838 | { 839 | temp = *--ptb; *ptb = *pta; *pta++ = *ptc; *ptc++ = *--ptd; *ptd = temp; 840 | } 841 | 842 | bridge = (ptd - ptc) / 2; 843 | 844 | while (bridge--) 845 | { 846 | temp = *ptc; *ptc++ = *--ptd; *ptd = *pta; *pta++ = temp; 847 | } 848 | 849 | bridge = (ptd - pta) / 2; 850 | 851 | while (bridge--) 852 | { 853 | temp = *pta; *pta++ = *--ptd; *ptd = temp; 854 | } 855 | } 856 | } 857 | } 858 | else if (right < left) 859 | { 860 | if (right <= swap_size) 861 | { 862 | memcpy(swap, array + left, right * sizeof(VAR)); 863 | memmove(array + right, array, left * sizeof(VAR)); 864 | memcpy(array, swap, right * sizeof(VAR)); 865 | } 866 | else 867 | { 868 | VAR *pta, *ptb, *ptc, *ptd; 869 | 870 | pta = array; 871 | ptb = pta + left; 872 | 873 | bridge = left - right; 874 | 875 | if (bridge <= swap_size && bridge > 3) 876 | { 877 | ptc = pta + right; 878 | ptd = ptc + left; 879 | 880 | memcpy(swap, ptc, bridge * sizeof(VAR)); 881 | 882 | while (right--) 883 | { 884 | *ptc++ = *pta; *pta++ = *ptb++; 885 | } 886 | memcpy(ptd - bridge, swap, bridge * sizeof(VAR)); 887 | } 888 | else 889 | { 890 | ptc = ptb; 891 | ptd = ptc + right; 892 | 893 | bridge = right / 2; 894 | 895 | while (bridge--) 896 | { 897 | temp = *--ptb; *ptb = *pta; *pta++ = *ptc; *ptc++ = *--ptd; *ptd = temp; 898 | } 899 | 900 | bridge = (ptb - pta) / 2; 901 | 902 | while (bridge--) 903 | { 904 | temp = *--ptb; *ptb = *pta; *pta++ = 
*--ptd; *ptd = temp; 905 | } 906 | 907 | bridge = (ptd - pta) / 2; 908 | 909 | while (bridge--) 910 | { 911 | temp = *pta; *pta++ = *--ptd; *ptd = temp; 912 | } 913 | } 914 | } 915 | } 916 | else 917 | { 918 | VAR *pta, *ptb; 919 | 920 | pta = array; 921 | ptb = pta + left; 922 | 923 | while (left--) 924 | { 925 | temp = *pta; *pta++ = *ptb; *ptb++ = temp; 926 | } 927 | } 928 | } 929 | 930 | size_t FUNC(monobound_binary_first)(VAR *array, VAR *value, size_t top, CMPFUNC *cmp) 931 | { 932 | VAR *end; 933 | size_t mid; 934 | 935 | end = array + top; 936 | 937 | while (top > 1) 938 | { 939 | mid = top / 2; 940 | 941 | if (cmp(value, end - mid) <= 0) 942 | { 943 | end -= mid; 944 | } 945 | top -= mid; 946 | } 947 | 948 | if (cmp(value, end - 1) <= 0) 949 | { 950 | end--; 951 | } 952 | return (end - array); 953 | } 954 | 955 | void FUNC(rotate_merge_block)(VAR *array, VAR *swap, size_t swap_size, size_t lblock, size_t right, CMPFUNC *cmp) 956 | { 957 | size_t left, rblock, unbalanced; 958 | 959 | if (cmp(array + lblock - 1, array + lblock) <= 0) 960 | { 961 | return; 962 | } 963 | 964 | rblock = lblock / 2; 965 | lblock -= rblock; 966 | 967 | left = FUNC(monobound_binary_first)(array + lblock + rblock, array + lblock, right, cmp); 968 | 969 | right -= left; 970 | 971 | // [ lblock ] [ rblock ] [ left ] [ right ] 972 | 973 | if (left) 974 | { 975 | if (lblock + left <= swap_size) 976 | { 977 | memcpy(swap, array, lblock * sizeof(VAR)); 978 | memcpy(swap + lblock, array + lblock + rblock, left * sizeof(VAR)); 979 | memmove(array + lblock + left, array + lblock, rblock * sizeof(VAR)); 980 | 981 | FUNC(cross_merge)(array, swap, lblock, left, cmp); 982 | } 983 | else 984 | { 985 | FUNC(trinity_rotation)(array + lblock, swap, swap_size, rblock + left, rblock); 986 | 987 | unbalanced = (left * 2 < lblock) | (lblock * 2 < left); 988 | 989 | if (unbalanced && left <= swap_size) 990 | { 991 | FUNC(partial_backward_merge)(array, swap, swap_size, lblock + left, lblock, cmp); 992 | 
} 993 | else if (unbalanced && lblock <= swap_size) 994 | { 995 | FUNC(partial_forward_merge)(array, swap, swap_size, lblock + left, lblock, cmp); 996 | } 997 | else 998 | { 999 | FUNC(rotate_merge_block)(array, swap, swap_size, lblock, left, cmp); 1000 | } 1001 | } 1002 | } 1003 | 1004 | if (right) 1005 | { 1006 | unbalanced = (right * 2 < rblock) | (rblock * 2 < right); 1007 | 1008 | if ((unbalanced && right <= swap_size) || right + rblock <= swap_size) 1009 | { 1010 | FUNC(partial_backward_merge)(array + lblock + left, swap, swap_size, rblock + right, rblock, cmp); 1011 | } 1012 | else if (unbalanced && rblock <= swap_size) 1013 | { 1014 | FUNC(partial_forward_merge)(array + lblock + left, swap, swap_size, rblock + right, rblock, cmp); 1015 | } 1016 | else 1017 | { 1018 | FUNC(rotate_merge_block)(array + lblock + left, swap, swap_size, rblock, right, cmp); 1019 | } 1020 | } 1021 | } 1022 | 1023 | void FUNC(rotate_merge)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t block, CMPFUNC *cmp) 1024 | { 1025 | VAR *pta, *pte; 1026 | 1027 | pte = array + nmemb; 1028 | 1029 | if (nmemb <= block * 2 && nmemb - block <= swap_size) 1030 | { 1031 | FUNC(partial_backward_merge)(array, swap, swap_size, nmemb, block, cmp); 1032 | 1033 | return; 1034 | } 1035 | 1036 | while (block < nmemb) 1037 | { 1038 | for (pta = array ; pta + block < pte ; pta += block * 2) 1039 | { 1040 | if (pta + block * 2 < pte) 1041 | { 1042 | FUNC(rotate_merge_block)(pta, swap, swap_size, block, block, cmp); 1043 | 1044 | continue; 1045 | } 1046 | FUNC(rotate_merge_block)(pta, swap, swap_size, block, pte - pta - block, cmp); 1047 | 1048 | break; 1049 | } 1050 | block *= 2; 1051 | } 1052 | } 1053 | 1054 | /////////////////////////////////////////////////////////////////////////////// 1055 | //┌─────────────────────────────────────────────────────────────────────────┐// 1056 | //│ ██████┐ ██┐ ██┐ █████┐ ██████┐ ███████┐ ██████┐ ██████┐ ████████┐ │// 1057 | //│ ██┌───██┐██│ 
██│██┌──██┐██┌──██┐██┌────┘██┌───██┐██┌──██┐└──██┌──┘ │// 1058 | //│ ██│ ██│██│ ██│███████│██│ ██│███████┐██│ ██│██████┌┘ ██│ │// 1059 | //│ ██│▄▄ ██│██│ ██│██┌──██│██│ ██│└────██│██│ ██│██┌──██┐ ██│ │// 1060 | //│ └██████┌┘└██████┌┘██│ ██│██████┌┘███████│└██████┌┘██│ ██│ ██│ │// 1061 | //│ └──▀▀─┘ └─────┘ └─┘ └─┘└─────┘ └──────┘ └─────┘ └─┘ └─┘ └─┘ │// 1062 | //└─────────────────────────────────────────────────────────────────────────┘// 1063 | /////////////////////////////////////////////////////////////////////////////// 1064 | 1065 | void FUNC(quadsort)(void *array, size_t nmemb, CMPFUNC *cmp) 1066 | { 1067 | VAR *pta = (VAR *) array; 1068 | 1069 | if (nmemb < 32) 1070 | { 1071 | VAR swap[nmemb]; 1072 | 1073 | FUNC(tail_swap)(pta, swap, nmemb, cmp); 1074 | } 1075 | else if (FUNC(quad_swap)(pta, nmemb, cmp) == 0) 1076 | { 1077 | VAR *swap = NULL; 1078 | size_t block, swap_size = nmemb; 1079 | 1080 | if (nmemb > 4194304) for (swap_size = 4194304 ; swap_size * 8 <= nmemb ; swap_size *= 4) {} 1081 | 1082 | swap = (VAR *) malloc(swap_size * sizeof(VAR)); 1083 | 1084 | if (swap == NULL) 1085 | { 1086 | VAR stack[512]; 1087 | 1088 | block = FUNC(quad_merge)(pta, stack, 512, nmemb, 32, cmp); 1089 | 1090 | FUNC(rotate_merge)(pta, stack, 512, nmemb, block, cmp); 1091 | 1092 | return; 1093 | } 1094 | block = FUNC(quad_merge)(pta, swap, swap_size, nmemb, 32, cmp); 1095 | 1096 | FUNC(rotate_merge)(pta, swap, swap_size, nmemb, block, cmp); 1097 | 1098 | free(swap); 1099 | } 1100 | } 1101 | 1102 | void FUNC(quadsort_swap)(void *array, void *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp) 1103 | { 1104 | VAR *pta = (VAR *) array; 1105 | VAR *pts = (VAR *) swap; 1106 | 1107 | if (nmemb <= 96) 1108 | { 1109 | FUNC(tail_swap)(pta, pts, nmemb, cmp); 1110 | } 1111 | else if (FUNC(quad_swap)(pta, nmemb, cmp) == 0) 1112 | { 1113 | size_t block = FUNC(quad_merge)(pta, pts, swap_size, nmemb, 32, cmp); 1114 | 1115 | FUNC(rotate_merge)(pta, pts, swap_size, nmemb, block, cmp); 1116 | 
} 1117 | } 1118 | -------------------------------------------------------------------------------- /src/quadsort.h: -------------------------------------------------------------------------------- 1 | // quadsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com 2 | 3 | #ifndef QUADSORT_H 4 | #define QUADSORT_H 5 | 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | 13 | //#include 14 | 15 | typedef int CMPFUNC (const void *a, const void *b); 16 | 17 | //#define cmp(a,b) (*(a) > *(b)) 18 | 19 | 20 | // When sorting an array of pointers, like a string array, the QUAD_CACHE needs 21 | // to be set for proper performance when sorting large arrays. 22 | // quadsort_prim() can be used to sort arrays of 32 and 64 bit integers 23 | // without a comparison function or cache restrictions. 24 | 25 | // With a 6 MB L3 cache a value of 262144 works well. 26 | 27 | #ifdef cmp 28 | #define QUAD_CACHE 4294967295 29 | #else 30 | //#define QUAD_CACHE 131072 31 | #define QUAD_CACHE 262144 32 | //#define QUAD_CACHE 524288 33 | //#define QUAD_CACHE 4294967295 34 | #endif 35 | 36 | // utilize branchless ternary operations in clang 37 | 38 | #if !defined __clang__ 39 | #define head_branchless_merge(ptd, x, ptl, ptr, cmp) \ 40 | x = cmp(ptl, ptr) <= 0; \ 41 | *ptd = *ptl; \ 42 | ptl += x; \ 43 | ptd[x] = *ptr; \ 44 | ptr += !x; \ 45 | ptd++; 46 | #else 47 | #define head_branchless_merge(ptd, x, ptl, ptr, cmp) \ 48 | *ptd++ = cmp(ptl, ptr) <= 0 ? *ptl++ : *ptr++; 49 | #endif 50 | 51 | #if !defined __clang__ 52 | #define tail_branchless_merge(tpd, y, tpl, tpr, cmp) \ 53 | y = cmp(tpl, tpr) <= 0; \ 54 | *tpd = *tpl; \ 55 | tpl -= !y; \ 56 | tpd--; \ 57 | tpd[y] = *tpr; \ 58 | tpr -= y; 59 | #else 60 | #define tail_branchless_merge(tpd, x, tpl, tpr, cmp) \ 61 | *tpd-- = cmp(tpl, tpr) > 0 ? 
*tpl-- : *tpr--; 62 | #endif 63 | 64 | // guarantee small parity merges are inlined with minimal overhead 65 | 66 | #define parity_merge_two(array, swap, x, ptl, ptr, pts, cmp) \ 67 | ptl = array; ptr = array + 2; pts = swap; \ 68 | head_branchless_merge(pts, x, ptl, ptr, cmp); \ 69 | *pts = cmp(ptl, ptr) <= 0 ? *ptl : *ptr; \ 70 | \ 71 | ptl = array + 1; ptr = array + 3; pts = swap + 3; \ 72 | tail_branchless_merge(pts, x, ptl, ptr, cmp); \ 73 | *pts = cmp(ptl, ptr) > 0 ? *ptl : *ptr; 74 | 75 | #define parity_merge_four(array, swap, x, ptl, ptr, pts, cmp) \ 76 | ptl = array + 0; ptr = array + 4; pts = swap; \ 77 | head_branchless_merge(pts, x, ptl, ptr, cmp); \ 78 | head_branchless_merge(pts, x, ptl, ptr, cmp); \ 79 | head_branchless_merge(pts, x, ptl, ptr, cmp); \ 80 | *pts = cmp(ptl, ptr) <= 0 ? *ptl : *ptr; \ 81 | \ 82 | ptl = array + 3; ptr = array + 7; pts = swap + 7; \ 83 | tail_branchless_merge(pts, x, ptl, ptr, cmp); \ 84 | tail_branchless_merge(pts, x, ptl, ptr, cmp); \ 85 | tail_branchless_merge(pts, x, ptl, ptr, cmp); \ 86 | *pts = cmp(ptl, ptr) > 0 ? *ptl : *ptr; 87 | 88 | 89 | #if !defined __clang__ 90 | #define branchless_swap(pta, swap, x, cmp) \ 91 | x = cmp(pta, pta + 1) > 0; \ 92 | swap = pta[!x]; \ 93 | pta[0] = pta[x]; \ 94 | pta[1] = swap; 95 | #else 96 | #define branchless_swap(pta, swap, x, cmp) \ 97 | x = 0; \ 98 | swap = cmp(pta, pta + 1) > 0 ? 
pta[x++] : pta[1]; \ 99 | pta[0] = pta[x]; \ 100 | pta[1] = swap; 101 | #endif 102 | 103 | #define swap_branchless(pta, swap, x, y, cmp) \ 104 | x = cmp(pta, pta + 1) > 0; \ 105 | y = !x; \ 106 | swap = pta[y]; \ 107 | pta[0] = pta[x]; \ 108 | pta[1] = swap; 109 | 110 | ////////////////////////////////////////////////////////// 111 | // ┌───────────────────────────────────────────────────┐// 112 | // │ ██████┐ ██████┐ ██████┐ ██████┐████████┐ │// 113 | // │ └────██┐└────██┐ ██┌──██┐└─██┌─┘└──██┌──┘ │// 114 | // │ █████┌┘ █████┌┘ ██████┌┘ ██│ ██│ │// 115 | // │ └───██┐██┌───┘ ██┌──██┐ ██│ ██│ │// 116 | // │ ██████┌┘███████┐ ██████┌┘██████┐ ██│ │// 117 | // │ └─────┘ └──────┘ └─────┘ └─────┘ └─┘ │// 118 | // └───────────────────────────────────────────────────┘// 119 | ////////////////////////////////////////////////////////// 120 | 121 | #define VAR int 122 | #define FUNC(NAME) NAME##32 123 | 124 | #include "quadsort.c" 125 | 126 | #undef VAR 127 | #undef FUNC 128 | 129 | // quadsort_prim 130 | 131 | #define VAR int 132 | #define FUNC(NAME) NAME##_int32 133 | #ifndef cmp 134 | #define cmp(a,b) (*(a) > *(b)) 135 | #include "quadsort.c" 136 | #undef cmp 137 | #else 138 | #include "quadsort.c" 139 | #endif 140 | #undef VAR 141 | #undef FUNC 142 | 143 | #define VAR unsigned int 144 | #define FUNC(NAME) NAME##_uint32 145 | #ifndef cmp 146 | #define cmp(a,b) (*(a) > *(b)) 147 | #include "quadsort.c" 148 | #undef cmp 149 | #else 150 | #include "quadsort.c" 151 | #endif 152 | #undef VAR 153 | #undef FUNC 154 | 155 | ////////////////////////////////////////////////////////// 156 | // ┌───────────────────────────────────────────────────┐// 157 | // │ █████┐ ██┐ ██┐ ██████┐ ██████┐████████┐ │// 158 | // │ ██┌───┘ ██│ ██│ ██┌──██┐└─██┌─┘└──██┌──┘ │// 159 | // │ ██████┐ ███████│ ██████┌┘ ██│ ██│ │// 160 | // │ ██┌──██┐└────██│ ██┌──██┐ ██│ ██│ │// 161 | // │ └█████┌┘ ██│ ██████┌┘██████┐ ██│ │// 162 | // │ └────┘ └─┘ └─────┘ └─────┘ └─┘ │// 163 | // 
└───────────────────────────────────────────────────┘// 164 | ////////////////////////////////////////////////////////// 165 | 166 | #define VAR long long 167 | #define FUNC(NAME) NAME##64 168 | 169 | #include "quadsort.c" 170 | 171 | #undef VAR 172 | #undef FUNC 173 | 174 | // quadsort_prim 175 | 176 | #define VAR long long 177 | #define FUNC(NAME) NAME##_int64 178 | #ifndef cmp 179 | #define cmp(a,b) (*(a) > *(b)) 180 | #include "quadsort.c" 181 | #undef cmp 182 | #else 183 | #include "quadsort.c" 184 | #endif 185 | #undef VAR 186 | #undef FUNC 187 | 188 | #define VAR unsigned long long 189 | #define FUNC(NAME) NAME##_uint64 190 | #ifndef cmp 191 | #define cmp(a,b) (*(a) > *(b)) 192 | #include "quadsort.c" 193 | #undef cmp 194 | #else 195 | #include "quadsort.c" 196 | #endif 197 | #undef VAR 198 | #undef FUNC 199 | 200 | // This section is outside of 32/64 bit pointer territory, so no cache checks 201 | // necessary, unless sorting 32+ byte structures. 202 | 203 | #undef QUAD_CACHE 204 | #define QUAD_CACHE 4294967295 205 | 206 | ////////////////////////////////////////////////////////// 207 | //┌────────────────────────────────────────────────────┐// 208 | //│ █████┐ ██████┐ ██████┐████████┐ │// 209 | //│ ██┌──██┐ ██┌──██┐└─██┌─┘└──██┌──┘ │// 210 | //│ └█████┌┘ ██████┌┘ ██│ ██│ │// 211 | //│ ██┌──██┐ ██┌──██┐ ██│ ██│ │// 212 | //│ └█████┌┘ ██████┌┘██████┐ ██│ │// 213 | //│ └────┘ └─────┘ └─────┘ └─┘ │// 214 | //└────────────────────────────────────────────────────┘// 215 | ////////////////////////////////////////////////////////// 216 | 217 | #define VAR char 218 | #define FUNC(NAME) NAME##8 219 | 220 | #include "quadsort.c" 221 | 222 | #undef VAR 223 | #undef FUNC 224 | 225 | ////////////////////////////////////////////////////////// 226 | //┌────────────────────────────────────────────────────┐// 227 | //│ ▄██┐ █████┐ ██████┐ ██████┐████████┐│// 228 | //│ ████│ ██┌───┘ ██┌──██┐└─██┌─┘└──██┌──┘│// 229 | //│ └─██│ ██████┐ ██████┌┘ ██│ ██│ │// 230 | //│ ██│ 
██┌──██┐ ██┌──██┐ ██│ ██│ │// 231 | //│ ██████┐└█████┌┘ ██████┌┘██████┐ ██│ │// 232 | //│ └─────┘ └────┘ └─────┘ └─────┘ └─┘ │// 233 | //└────────────────────────────────────────────────────┘// 234 | ////////////////////////////////////////////////////////// 235 | 236 | #define VAR short 237 | #define FUNC(NAME) NAME##16 238 | 239 | #include "quadsort.c" 240 | 241 | #undef VAR 242 | #undef FUNC 243 | 244 | ////////////////////////////////////////////////////////// 245 | //┌────────────────────────────────────────────────────┐// 246 | //│ ▄██┐ ██████┐ █████┐ ██████┐ ██████┐████████┐ │// 247 | //│ ████│ └────██┐██┌──██┐ ██┌──██┐└─██┌─┘└──██┌──┘ │// 248 | //│ └─██│ █████┌┘└█████┌┘ ██████┌┘ ██│ ██│ │// 249 | //│ ██│ ██┌───┘ ██┌──██┐ ██┌──██┐ ██│ ██│ │// 250 | //│ ██████┐███████┐└█████┌┘ ██████┌┘██████┐ ██│ │// 251 | //│ └─────┘└──────┘ └────┘ └─────┘ └─────┘ └─┘ │// 252 | //└────────────────────────────────────────────────────┘// 253 | ////////////////////////////////////////////////////////// 254 | 255 | // 128 reflects the name, though the actual size of a long double is 64, 80, 256 | // 96, or 128 bits, depending on platform. 
257 | 258 | #if (DBL_MANT_DIG < LDBL_MANT_DIG) 259 | #define VAR long double 260 | #define FUNC(NAME) NAME##128 261 | #include "quadsort.c" 262 | #undef VAR 263 | #undef FUNC 264 | #endif 265 | 266 | /////////////////////////////////////////////////////////// 267 | //┌─────────────────────────────────────────────────────┐// 268 | //│ ██████┐██┐ ██┐███████┐████████┐ ██████┐ ███┐ ███┐│// 269 | //│██┌────┘██│ ██│██┌────┘└──██┌──┘██┌───██┐████┐████││// 270 | //│██│ ██│ ██│███████┐ ██│ ██│ ██│██┌███┌██││// 271 | //│██│ ██│ ██│└────██│ ██│ ██│ ██│██│└█┌┘██││// 272 | //│└██████┐└██████┌┘███████│ ██│ └██████┌┘██│ └┘ ██││// 273 | //│ └─────┘ └─────┘ └──────┘ └─┘ └─────┘ └─┘ └─┘│// 274 | //└─────────────────────────────────────────────────────┘// 275 | /////////////////////////////////////////////////////////// 276 | 277 | /* 278 | typedef struct {char bytes[32];} struct256; 279 | #define VAR struct256 280 | #define FUNC(NAME) NAME##256 281 | 282 | #include "quadsort.c" 283 | 284 | #undef VAR 285 | #undef FUNC 286 | */ 287 | 288 | /////////////////////////////////////////////////////////////////////////////// 289 | //┌─────────────────────────────────────────────────────────────────────────┐// 290 | //│ ██████┐ ██┐ ██┐ █████┐ ██████┐ ███████┐ ██████┐ ██████┐ ████████┐ │// 291 | //│ ██┌───██┐██│ ██│██┌──██┐██┌──██┐██┌────┘██┌───██┐██┌──██┐└──██┌──┘ │// 292 | //│ ██│ ██│██│ ██│███████│██│ ██│███████┐██│ ██│██████┌┘ ██│ │// 293 | //│ ██│▄▄ ██│██│ ██│██┌──██│██│ ██│└────██│██│ ██│██┌──██┐ ██│ │// 294 | //│ └██████┌┘└██████┌┘██│ ██│██████┌┘███████│└██████┌┘██│ ██│ ██│ │// 295 | //│ └──▀▀─┘ └─────┘ └─┘ └─┘└─────┘ └──────┘ └─────┘ └─┘ └─┘ └─┘ │// 296 | //└─────────────────────────────────────────────────────────────────────────┘// 297 | /////////////////////////////////////////////////////////////////////////////// 298 | 299 | 300 | void quadsort(void *array, size_t nmemb, size_t size, CMPFUNC *cmp) 301 | { 302 | if (nmemb < 2) 303 | { 304 | return; 305 | } 306 | 307 | switch 
(size) 308 | { 309 | case sizeof(char): 310 | quadsort8(array, nmemb, cmp); 311 | return; 312 | 313 | case sizeof(short): 314 | quadsort16(array, nmemb, cmp); 315 | return; 316 | 317 | case sizeof(int): 318 | quadsort32(array, nmemb, cmp); 319 | return; 320 | 321 | case sizeof(long long): 322 | quadsort64(array, nmemb, cmp); 323 | return; 324 | #if (DBL_MANT_DIG < LDBL_MANT_DIG) 325 | case sizeof(long double): 326 | quadsort128(array, nmemb, cmp); 327 | return; 328 | #endif 329 | // case sizeof(struct256): 330 | // quadsort256(array, nmemb, cmp); 331 | // return; 332 | 333 | default: 334 | #if (DBL_MANT_DIG < LDBL_MANT_DIG) 335 | assert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long) || size == sizeof(long double)); 336 | #else 337 | assert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long)); 338 | #endif 339 | // qsort(array, nmemb, size, cmp); 340 | } 341 | } 342 | 343 | // suggested size values for primitives: 344 | 345 | // case 0: unsigned char 346 | // case 1: signed char 347 | // case 2: signed short 348 | // case 3: unsigned short 349 | // case 4: signed int 350 | // case 5: unsigned int 351 | // case 6: float 352 | // case 7: double 353 | // case 8: signed long long 354 | // case 9: unsigned long long 355 | // case ?: long double, use sizeof(long double): 356 | 357 | void quadsort_prim(void *array, size_t nmemb, size_t size) 358 | { 359 | if (nmemb < 2) 360 | { 361 | return; 362 | } 363 | 364 | switch (size) 365 | { 366 | case 4: 367 | quadsort_int32(array, nmemb, NULL); 368 | return; 369 | case 5: 370 | quadsort_uint32(array, nmemb, NULL); 371 | return; 372 | case 8: 373 | quadsort_int64(array, nmemb, NULL); 374 | return; 375 | case 9: 376 | quadsort_uint64(array, nmemb, NULL); 377 | return; 378 | default: 379 | assert(size == sizeof(int) || size == sizeof(int) + 1 || size == sizeof(long long) || size == sizeof(long long) + 1); 380 | return; 381 | } 382 | } 
383 | 384 | // Sort arrays of structures. The elements are sorted through a temporary array of pointers, so the comparison function receives pointers to element pointers and must dereference twice. 385 | 386 | void quadsort_size(void *array, size_t nmemb, size_t size, CMPFUNC *cmp) 387 | { 388 | char **pti, *pta, *pts; 389 | size_t index, offset; 390 | 391 | if (nmemb < 2) 392 | { 393 | return; 394 | } 395 | pta = (char *) array; 396 | pti = (char **) malloc(nmemb * sizeof(char *)); 397 | 398 | assert(pti != NULL); 399 | 400 | for (index = offset = 0 ; index < nmemb ; index++) 401 | { 402 | pti[index] = pta + offset; 403 | 404 | offset += size; 405 | } 406 | 407 | switch (sizeof(size_t)) 408 | { 409 | case 4: quadsort32(pti, nmemb, cmp); break; 410 | case 8: quadsort64(pti, nmemb, cmp); break; 411 | } 412 | 413 | pts = (char *) malloc(nmemb * size); 414 | 415 | assert(pts != NULL); 416 | 417 | for (index = 0 ; index < nmemb ; index++) 418 | { 419 | memcpy(pts, pti[index], size); 420 | 421 | pts += size; 422 | } 423 | pts -= nmemb * size; 424 | 425 | memcpy(array, pts, nmemb * size); 426 | 427 | free(pti); 428 | free(pts); 429 | } 430 | 431 | #undef QUAD_CACHE 432 | 433 | #endif 434 | --------------------------------------------------------------------------------
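In `quadsort.c`, `rotate_merge_block` merges two adjacent sorted runs in several steps. It splits the left run in two and uses `monobound_binary_first` to count how many elements of the right run sort strictly before the second half. It then rotates those elements in front of that half with `trinity_rotation` and recurses on the two smaller pairs. The following is a minimal sketch of that split-and-rotate idea for `int` arrays. It is not quadsort's code, and it makes several assumptions. The helper names (`lower_bound_int`, `reverse_int`, `rotate_left_int`, `rotate_merge_sketch`) are hypothetical. A plain triple-reversal rotation stands in for `trinity_rotation`. The swap-buffer shortcuts (`cross_merge`, `partial_forward_merge`, `partial_backward_merge`) are omitted entirely. The left run is split with the larger half second, so a one-element run still makes progress without a buffer.

```c
#include <stddef.h>

/* Hypothetical helper modeled on monobound_binary_first: returns how many
   of the sorted elements array[0..top) are strictly less than value. */
static size_t lower_bound_int(const int *array, int value, size_t top)
{
    const int *end = array + top;

    if (top == 0)
    {
        return 0; /* guard added for the sketch; the original assumes top > 0 */
    }
    while (top > 1)
    {
        size_t mid = top / 2;

        if (value <= *(end - mid))
        {
            end -= mid; /* the answer lies at or before end - mid */
        }
        top -= mid;
    }
    if (value <= *(end - 1))
    {
        end--;
    }
    return (size_t) (end - array);
}

/* Reverses array[0..nmemb) in place. */
static void reverse_int(int *array, size_t nmemb)
{
    for (size_t i = 0, j = nmemb ; i + 1 < j ; i++, j--)
    {
        int temp = array[i];

        array[i] = array[j - 1];
        array[j - 1] = temp;
    }
}

/* Turns [A][B] into [B][A], where A is the first `left` elements.
   Triple reversal stands in for trinity_rotation in this sketch. */
static void rotate_left_int(int *array, size_t nmemb, size_t left)
{
    reverse_int(array, left);
    reverse_int(array + left, nmemb - left);
    reverse_int(array, nmemb);
}

/* Merges the sorted runs array[0..lblock) and array[lblock..lblock+right)
   in place, without any swap buffer. */
static void rotate_merge_sketch(int *array, size_t lblock, size_t right)
{
    size_t half, rblock, left;

    if (lblock == 0 || right == 0 || array[lblock - 1] <= array[lblock])
    {
        return; /* an empty run, or the two runs are already in order */
    }
    half = lblock / 2;      /* part of the left run that stays in front */
    rblock = lblock - half; /* rounded up so a one-element run still moves */

    /* Elements of the right run strictly less than the first element of rblock. */
    left = lower_bound_int(array + lblock, array[half], right);

    /* [half][rblock][left][right - left] -> [half][left][rblock][right - left] */
    rotate_left_int(array + half, rblock + left, rblock);

    rotate_merge_sketch(array, half, left);
    rotate_merge_sketch(array + half + left, rblock, right - left);
}
```

Only elements strictly smaller than the first element of the second half are rotated ahead of it, so equal keys keep their relative order. The sketch trades extra element moves for zero auxiliary memory. That trade-off is presumably why quadsort prefers `cross_merge` or the partial merges whenever the pieces fit in `swap`.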