├── LICENSE
├── README.md
├── images
    ├── fluxsort_vs_glidesort.png
    ├── graph1.png
    ├── graph2.png
    ├── graph3.png
    ├── graph4.png
    ├── graph5.png
    ├── quadsort.gif
    └── quadswap.gif
└── src
    ├── bench.c
    ├── quadsort.c
    └── quadsort.h


/LICENSE:
--------------------------------------------------------------------------------
 1 | This is free and unencumbered software released into the public domain.
 2 | 
 3 | Anyone is free to copy, modify, publish, use, compile, sell, or
 4 | distribute this software, either in source code form or as a compiled
 5 | binary, for any purpose, commercial or non-commercial, and by any
 6 | means.
 7 | 
 8 | In jurisdictions that recognize copyright laws, the author or authors
 9 | of this software dedicate any and all copyright interest in the
10 | software to the public domain. We make this dedication for the benefit
11 | of the public at large and to the detriment of our heirs and
12 | successors. We intend this dedication to be an overt act of
13 | relinquishment in perpetuity of all present and future rights to this
14 | software under copyright law.
15 | 
16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
19 | IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
20 | OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
21 | ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
22 | OTHER DEALINGS IN THE SOFTWARE.
23 | 
24 | For more information, please refer to <http://unlicense.org>
25 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | Intro
  2 | -----
  3 | 
  4 | This document describes a stable bottom-up adaptive branchless merge sort named quadsort. A [visualisation](https://github.com/scandum/quadsort#visualization) and [benchmarks](https://github.com/scandum/quadsort#benchmark-quadsort-vs-stdstable_sort-vs-timsort) are available at the bottom.
  5 | 
  6 | The quad swap analyzer
  7 | ----------------------
  8 | Quadsort starts out with an analyzer that has the following tasks:
  9 | 
 10 | 1. Detect ordered data with minimal comparisons.
 11 | 2. Detect reverse order data with minimal comparisons.
 12 | 3. Do the above without impacting performance on random data.
 13 | 4. Exit the quad swap analyzer with sorted blocks of 8 elements.
 14 | 
 15 | Ordered data handling
 16 | ---------------------
 17 | Quadsort's analyzer examines the array 8 elements at a time. It performs 4
 18 | comparisons on elements (1,2), (3,4), (5,6), and (7,8), stores the results,
 19 | and combines them into a bitmask with a value between 0 and 15, one value
 20 | for each possible combination. A bitmask of 0 means all 4 pairs were
 21 | in order.
 22 | 
 23 | What remains is 3 more comparisons on elements (2,3), (4,5), and (6,7) to
 24 | determine if all 8 elements are in order. Traditional sorts would
 25 | do this with 7 branched individual comparisons, which should result in 3.2
 26 | branch mispredictions on random data on average. Using quadsort's method
 27 | should result in 0.2 branch mispredictions on random data on average.
 28 | 
 29 | If the data is in order quadsort moves on to the next 8 elements. If the data turns
 30 | out to be neither in order nor in reverse order, 4 branchless swaps are performed
 31 | using the stored comparison results, followed by a branchless parity merge. More on
 32 | that later.
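
The two-stage check described above can be sketched in C as follows. This is a minimal illustration, not the code from quadsort.c; the names `pair_mask` and `block_in_order` are hypothetical, and the bit layout of the mask is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: builds a 4-bit mask from the pairs (0,1), (2,3),
   (4,5), (6,7). A set bit means that pair is out of order. */
unsigned pair_mask(const int *a)
{
    unsigned v1, v2, v3, v4;

    assert(a != NULL);
    v1 = a[0] > a[1];
    v2 = a[2] > a[3];
    v3 = a[4] > a[5];
    v4 = a[6] > a[7];

    return v1 | (v2 << 1) | (v3 << 2) | (v4 << 3);
}

/* Returns 1 when all 8 elements are in order, 0 otherwise. */
int block_in_order(const int *a)
{
    if (pair_mask(a) != 0)
        return 0;

    /* The three remaining neighbours, combined with | rather than ||
       so that no extra branch is needed per comparison. */
    return !((a[1] > a[2]) | (a[3] > a[4]) | (a[5] > a[6]));
}
```

The stored mask can be reused afterwards to decide which pairs need swapping, so the first four comparisons are not wasted when the block turns out to be unordered.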
 33 | 
 34 | Reverse order handling
 35 | ----------------------
 36 | Reverse order data is typically moved using a simple reversal function, as follows.
 37 | ```c
 38 | void reverse(int array[], int start, int end)
 39 | {
 40 |     while (start < end)
 41 |     {
 42 |         int swap = array[start];
 43 |         array[start++] = array[end];
 44 |         array[end--] = swap;
 45 |     }
 46 | }
 47 | ```
 48 | While random data can only be sorted using **n log n** comparisons and
 49 | **n log n** moves, reverse-order data can be sorted using **n** comparisons
 50 | and **n** moves through run detection and a reversal routine. Without run
 51 | detection the best you can do is sort it in **n** comparisons and **n log n** moves. 
 52 | 
 53 | Run detection, as the name implies, comes with a detection cost. Thanks
 54 | to the laws of probability, however, a quad swap can cheat. On random data
 55 | the chance of 4 separate pairs of elements all being in reverse order is
 56 | 1 in 16, so there's only a 6.25% chance quadsort makes a wasteful run check.
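
A rough sketch of that cheap reverse-order test, under a few assumptions: the function names are hypothetical, the block size is fixed at 8, and the comparison is strict so that a reversal never reorders equal elements (which a stable sort has to avoid).

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when a[0] > a[1] > ... > a[7]. The strict comparison is an
   assumption of this sketch: reversing equal keys would break stability. */
int block_strictly_reversed(const int *a)
{
    unsigned pairs;

    assert(a != NULL);

    /* Stage 1: the four disjoint pairs. With distinct random values all
       four are reversed only 1 time in 16. */
    pairs = (a[0] > a[1]) & (a[2] > a[3]) & (a[4] > a[5]) & (a[6] > a[7]);

    if (!pairs)
        return 0;

    /* Stage 2: the three neighbours between the pairs. */
    return (a[1] > a[2]) & (a[3] > a[4]) & (a[5] > a[6]);
}

/* Reverses a[0..n) in place. */
void reverse_run(int *a, size_t n)
{
    size_t lo = 0, hi = n;

    while (lo + 1 < hi)
    {
        int tmp = a[lo];

        a[lo++] = a[--hi];
        a[hi] = tmp;
    }
}
```

Only blocks that pass stage 1 pay for the three extra comparisons, which is where the 1 in 16 figure comes from.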
 57 | 
 58 | What about run detection for in-order data? While we're turning
 59 | **n log n** moves into **n** moves with reverse order run detection, we'd be
 60 | turning **0** moves into **0** moves with forward run detection. So there's
 61 | no point in doing so.
 62 | 
 63 | The next optimization is to write the quad swap analyzer in such a way that
 64 | a simple check reveals whether the entire array was in reverse order. If so,
 65 | the sort is finished.
 66 | 
 67 | At the end of the loop the array has been turned into a series of ordered
 68 | blocks of 8 elements.
 69 | 
 70 | Ping-Pong Quad Merge
 71 | --------------------
 72 | Most textbook mergesort examples merge two blocks to swap memory, then copy
 73 | them back to main memory.
 74 | ```
 75 | main memory ┌────────┐┌────────┐
 76 |             └────────┘└────────┘
 77 |                   ↓ merge ↓
 78 | swap memory ┌──────────────────┐
 79 |             └──────────────────┘
 80 |                   ↓ copy ↓
 81 | main memory ┌──────────────────┐
 82 |             └──────────────────┘
 83 | ```
 84 | This doubles the number of moves. We can fix this by merging 4 blocks at once
 85 | using a quad merge / ping-pong merge like so:
 86 | ```
 87 | main memory ┌────────┐┌────────┐┌────────┐┌────────┐
 88 |             └────────┘└────────┘└────────┘└────────┘
 89 |                   ↓ merge ↓           ↓ merge ↓
 90 | swap memory ┌──────────────────┐┌──────────────────┐
 91 |             └──────────────────┘└──────────────────┘
 92 |                             ↓ merge ↓
 93 | main memory ┌──────────────────────────────────────┐
 94 |             └──────────────────────────────────────┘
 95 | ```
 96 | 
 97 | It is possible to interleave the two merges to swap memory for increased memory-level
 98 | parallelism, but this can both increase and decrease performance.
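
The second diagram can be sketched as three plain merges. This is a simplified illustration that assumes four sorted blocks of equal length and a swap buffer at least as large as the four blocks combined; `merge_into` and `ping_pong_merge` are hypothetical names, and the merge itself is a textbook one rather than quadsort's.

```c
#include <assert.h>
#include <stddef.h>

/* Stable two-way merge of src[0..ln) and src[ln..ln+rn) into dst. */
void merge_into(int *dst, const int *src, size_t ln, size_t rn)
{
    const int *l = src, *le = src + ln;
    const int *r = src + ln, *re = src + ln + rn;

    while (l < le && r < re)
        *dst++ = (*r < *l) ? *r++ : *l++;   /* ties take the left element */
    while (l < le)
        *dst++ = *l++;
    while (r < re)
        *dst++ = *r++;
}

/* Merges four adjacent sorted blocks of `block` elements each. */
void ping_pong_merge(int *array, int *swap, size_t block)
{
    assert(array != NULL && swap != NULL && block > 0);

    /* main -> swap: two merges, each producing 2 * block elements */
    merge_into(swap, array, block, block);
    merge_into(swap + 2 * block, array + 2 * block, block, block);

    /* swap -> main: one merge; the data lands back in place, so no
       separate copy step is needed */
    merge_into(array, swap, 2 * block, 2 * block);
}
```

Each element is moved twice for two levels of merging, where the textbook merge-then-copy approach would move it four times.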
 99 | 
100 | Skipping
101 | --------
102 | Just like with the quad swap it is beneficial to check whether the 4 blocks
103 | are in-order.
104 | 
105 | In the case of the 4 blocks being in-order the merge operation is skipped,
106 | as this would be pointless. Because reverse order data is handled in the
107 | quad swap there is no need to check for reverse order blocks.
108 | 
109 | This allows quadsort to sort in-order sequences using **n** comparisons instead
110 | of **n log n** comparisons.
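
As a sketch of the skip test: if each of the four blocks is already sorted, three boundary comparisons are enough to know whether the whole range is sorted. The function name and the equal block length are assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>

/* Precondition: a[] holds four adjacent sorted blocks of `block` elements.
   Returns 1 when the blocks are already in order, so the merge can be skipped. */
int four_blocks_in_order(const int *a, size_t block)
{
    assert(a != NULL && block > 0);

    /* Compare the last element of each block with the first of the next. */
    return (a[block - 1] <= a[block])
         & (a[2 * block - 1] <= a[2 * block])
         & (a[3 * block - 1] <= a[3 * block]);
}
```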
111 | 
112 | Parity merge
113 | ------------
114 | A parity merge takes advantage of the fact that if you have two n length arrays,
115 | you can fully merge the two arrays by performing n merge operations on the start
116 | of each array, and n merge operations on the end of each array. The arrays must
117 | be of equal length. Another way to describe the parity merge would be as
118 | a bidirectional unguarded merge.
119 | 
120 | The main advantage of a parity merge over a traditional merge is that the loop
121 | of a parity merge can be fully unrolled.
122 | 
123 | If the arrays are not of equal length a hybrid parity merge can be performed. One
124 | way to do so is using n parity merges where n is the size of the smaller array,
125 | before switching to a traditional merge.
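
A minimal sketch of a parity merge for two sorted arrays of equal length n, written as a loop for readability rather than unrolled. The function name and the index-based style are choices made for this example. No bounds checks are needed inside the loop: before step i, each head has advanced at most i < n positions and each tail has retreated at most i < n positions.

```c
#include <assert.h>
#include <stddef.h>

/* Merges sorted left[0..n) and right[0..n) into dst[0..2n). */
void parity_merge(int *dst, const int *left, const int *right, size_t n)
{
    size_t lh = 0, rh = 0;              /* head positions */
    size_t lt = n - 1, rt = n - 1;      /* tail positions */
    size_t dh = 0, dt = 2 * n - 1;      /* output head and tail */
    size_t i;

    assert(n > 0);

    for (i = 0; i < n; i++)
    {
        /* head: smallest remaining element, ties taken from left */
        dst[dh++] = (left[lh] <= right[rh]) ? left[lh++] : right[rh++];

        /* tail: largest remaining element, ties taken from right */
        dst[dt--] = (left[lt] > right[rt]) ? left[lt--] : right[rt--];
    }
}
```

The n head steps produce the lower half of the output and the n tail steps the upper half, which is why the loop count is known up front and the loop can be unrolled for fixed sizes.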
126 | 
127 | Branchless parity merge
128 | -----------------------
129 | Since the parity merge can be unrolled it's very suitable for branchless
130 | optimizations to speed up the sorting of random data. Another advantage
131 | is that two separate memory regions are accessed in the same loop, allowing
132 | memory-level parallelism. This makes the routine up to 2.5 times faster for
133 | random data on most hardware.
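
One way to express a parity merge without `if` statements is to turn each comparison result into an index and an increment. The sketch below is an assumption about how this could look, not the code in quadsort.c, and whether the compiler actually emits branch-free instructions depends on the compiler and its flags.

```c
#include <assert.h>
#include <stddef.h>

/* Merges sorted left[0..n) and right[0..n) into dst[0..2n). */
void parity_merge_branchless(int *dst, const int *left, const int *right, size_t n)
{
    size_t lh = 0, rh = 0, lt = n - 1, rt = n - 1;
    size_t dh = 0, dt = 2 * n - 1;
    size_t i;

    assert(n > 0);

    for (i = 0; i < n; i++)
    {
        size_t x = left[lh] <= right[rh];   /* 1 when the head comes from left */
        size_t y = left[lt] > right[rt];    /* 1 when the tail comes from left */
        const int *head[2];
        const int *tail[2];

        head[0] = right + rh;
        head[1] = left + lh;
        tail[0] = right + rt;
        tail[1] = left + lt;

        /* The head and tail of the output are two separate memory regions
           written in the same iteration. */
        dst[dh++] = *head[x];
        dst[dt--] = *tail[y];

        /* Advance by the comparison result instead of branching on it. */
        lh += x;
        rh += !x;
        lt -= y;
        rt -= !y;
    }
}
```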
134 | 
135 | Increasing the memory regions from two to four can result in both performance
136 | gains and performance losses.
137 | 
138 | The following is a visualization of an array with 256 random elements getting
139 | turned into sorted blocks of 32 elements using ping-pong parity merges.
140 | 
141 | ![quadsort visualization](/images/quadswap.gif)
142 | 
143 | Cross merge
144 | -----------
145 | While a branchless parity merge sorts random data faster, it sorts ordered data
146 | slower. One way to solve this problem is by using a method with a resemblance
147 | to the galloping merge concept first introduced by timsort.
148 | 
149 | The cross merge works in a similar way to the quad swap.
150 | Instead of merging two arrays two items at a time, it merges four items at a time.
151 | ```
152 | ┌───┐┌───┐┌───┐    ┌───┐┌───┐┌───┐            ╭───╮  ┌───┐┌───┐┌───┐
153 | │ A ││ B ││ C │    │ X ││ Y ││ Z │        ┌───│B<X├──┤ A ││ B ││C/X│
154 | └─┬─┘└─┬─┘└───┘    └─┬─┘└─┬─┘└───┘        │   ╰─┬─╯  └───┘└───┘└───┘
155 |   └────┴─────────────┴────┴───────────────┘     │  ╭───╮  ┌───┐┌───┐┌───┐
156 |                                                 └──│A>Y├──┤ X ││ Y ││A/Z│
157 |                                                    ╰─┬─╯  └───┘└───┘└───┘
158 |                                                      │    ┌───┐┌───┐┌───┐
159 |                                                      └────│A/X││X/A││B/Y│
160 |                                                           └───┘└───┘└───┘
161 | ```
162 | When merging ABC and XYZ it first checks if B is smaller than or equal to X. If
163 | that's the case A and B are copied to swap. If not, it checks if A is greater
164 | than Y. If that's the case X and Y are copied to swap.
165 | 
166 | If both checks are false it's known that the two remaining distributions
167 | are A X and X A. This allows performing an optimal branchless merge. Since
168 | it's known each block still has at least 1 item remaining (B and Y) and
169 | there is a high probability that the data is random, another (sub-optimal)
170 | branchless merge can be performed.
171 | 
172 | In C this looks as follows:
173 | ```c
174 | while (l < l_size - 1 && r < r_size - 1)
175 | {
176 |     if (left[l + 1] <= right[r])
177 |     {
178 |         swap[s++] = left[l++];
179 |         swap[s++] = left[l++];
180 |     }
181 |     else if (left[l] > right[r + 1])
182 |     {
183 |         swap[s++] = right[r++];
184 |         swap[s++] = right[r++];
185 |     }
186 |     else
187 |     {
188 |         a = left[l] > right[r];
189 |         x = !a;
190 |         swap[s + a] = left[l++];
191 |         swap[s + x] = right[r++];
192 |         s += 2;
193 |     }
194 | }
195 | ```
196 | Overall the cross merge gives a decent performance gain for both ordered
197 | and random data, particularly when the two arrays are of unequal length. When
198 | two arrays are of near equal length quadsort looks 8 elements ahead, and performs
199 | an 8 element parity merge if it can't skip ahead.
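
To try the loop above in isolation it needs declarations and a way to drain whatever is left once one side has fewer than two items. The wrapper below is a sketch under those assumptions: the function name, the `size_t` counters, and the plain tail merge are additions of this example, and the 8-element lookahead mentioned above is not included.

```c
#include <assert.h>
#include <stddef.h>

/* Merges sorted left[0..l_size) and right[0..r_size) into swap.
   Returns the number of elements written. */
size_t cross_merge(int *swap, const int *left, size_t l_size,
                   const int *right, size_t r_size)
{
    size_t l = 0, r = 0, s = 0;

    assert(swap != NULL);

    /* l + 1 < l_size avoids unsigned wrap-around when a size is 0 */
    while (l + 1 < l_size && r + 1 < r_size)
    {
        if (left[l + 1] <= right[r])
        {
            swap[s++] = left[l++];
            swap[s++] = left[l++];
        }
        else if (left[l] > right[r + 1])
        {
            swap[s++] = right[r++];
            swap[s++] = right[r++];
        }
        else
        {
            size_t a = left[l] > right[r];

            swap[s + a] = left[l++];
            swap[s + !a] = right[r++];
            s += 2;
        }
    }

    /* Plain stable merge for the remainder. */
    while (l < l_size && r < r_size)
        swap[s++] = (left[l] <= right[r]) ? left[l++] : right[r++];
    while (l < l_size)
        swap[s++] = left[l++];
    while (r < r_size)
        swap[s++] = right[r++];

    return s;
}
```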
200 | 
201 | Merge strategy
202 | --------------
203 | Quadsort will merge blocks of 8 into blocks of 32, which it will merge into
204 | blocks of 128, 512, 2048, 8192, etc.
205 | 
206 | For each ping-pong merge quadsort will perform two comparisons to see if it will be faster
207 | to use a parity merge or a cross merge, and pick the best option.
208 | 
209 | Tail merge
210 | ----------
211 | When sorting an array of 33 elements you end up with a sorted array of 32
212 | elements and a sorted array of 1 element in the end. If a program sorts in
213 | intervals it should pick an optimal array size (32, 128, 512, etc) to do so.
214 | 
215 | To minimize the impact the remainder of the array is sorted using a tail
216 | merge.
217 | 
218 | Big O
219 | -----
220 | ```
221 |                  ┌───────────────────────┐┌───────────────────────┐
222 |                  │comparisons            ││swap memory            │
223 | ┌───────────────┐├───────┬───────┬───────┤├───────┬───────┬───────┤┌──────┐┌─────────┐┌─────────┐
224 | │name           ││min    │avg    │max    ││min    │avg    │max    ││stable││partition││adaptive │
225 | ├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤
226 | │mergesort      ││n log n│n log n│n log n││n      │n      │n      ││yes   ││no       ││no       │
227 | ├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤
228 | │quadsort       ││n      │n log n│n log n││1      │n      │n      ││yes   ││no       ││yes      │
229 | ├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤
230 | │quicksort      ││n log n│n log n│n²     ││1      │1      │1      ││no    ││yes      ││no       │
231 | └───────────────┘└───────┴───────┴───────┘└───────┴───────┴───────┘└──────┘└─────────┘└─────────┘
232 | ```
233 | Quadsort makes n comparisons when the data is fully sorted or reverse sorted.
234 | 
235 | Data Types
236 | ----------
237 | The C implementation of quadsort supports long doubles and 8, 16, 32, and 64 bit data types. By using pointers it's possible to sort any other data type, like strings.
238 | 
239 | Interface
240 | ---------
241 | Quadsort uses the same interface as qsort, which is described in [man qsort](https://man7.org/linux/man-pages/man3/qsort.3p.html).
242 | 
243 | In addition to supporting `(l - r)` and `((l > r) - (l < r))` for the comparison function, `(l > r)` is valid as well. Special note should be taken that C++ sorts use `(l < r)` for the comparison function, which is incompatible with the C standard. When porting quadsort to C++ or Rust, switch `(l, r)` to `(r, l)` for every comparison.
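
As an illustration of the comparison forms mentioned above, here is a qsort-style comparator using `((l > r) - (l < r))`. The `(l - r)` form can overflow when the operands are far apart, which is why this sketch avoids it. The example calls the standard library's `qsort` as a stand-in; since quadsort is described as using the same interface, swapping in `quadsort` after including quadsort.h should work the same way, though that is not verified here. The names `cmp_int` and `sort_ints` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Three-way comparator: negative, zero, or positive, without overflow. */
int cmp_int(const void *a, const void *b)
{
    int l = *(const int *)a;
    int r = *(const int *)b;

    return (l > r) - (l < r);
}

/* Sorts nmemb ints in ascending order using the qsort interface. */
void sort_ints(int *array, size_t nmemb)
{
    assert(array != NULL || nmemb == 0);

    qsort(array, nmemb, sizeof(int), cmp_int);
}
```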
244 | 
245 | Quadsort comes with the `quadsort_prim(void *array, size_t nmemb, size_t size)` function to perform primitive comparisons on arrays of 32 and 64 bit integers. Nmemb is the number of elements, while size should be either `sizeof(int)` or `sizeof(long long)` for signed integers, and `sizeof(int) + 1` or `sizeof(long long) + 1` for unsigned integers. Support for the char, short, float, double, and long double types can be easily added in quadsort.h.
246 | 
247 | Quadsort comes with the `quadsort_size(void *array, size_t nmemb, size_t size, CMPFUNC *cmp)` function to sort elements of any given size. The comparison function needs to be by reference, instead of by value, as if you are sorting an array of pointers.
248 | 
249 | Memory
250 | ------
251 | By default quadsort uses n swap memory. If memory allocation fails quadsort will switch to sorting in-place through rotations. The minimum memory requirement is 32 elements of stack memory.
252 | 
253 | Rotations can be performed with minimal performance loss by using [branchless binary searches](https://github.com/scandum/binary_search) and [trinity / bridge rotations](https://github.com/scandum/rotate).
254 | 
255 | Sorting in-place through rotations will increase the number of moves from **n log n** to **n log² n**. The overall impact on performance is minor on array sizes below 1M elements.
256 | 
257 | Performance
258 | -----------
259 | Quadsort is one of the fastest merge sorts written to date. It is faster than quicksort for most data distributions, with the notable exception of generic data. Data type is important as well, and overall quadsort is faster for sorting referenced objects.
260 | 
261 | Compared to Timsort, Quadsort has similar overall adaptivity while being much faster on random data, even without branchless optimizations.
262 | 
263 | Quicksort has a slight advantage on random data as the array size increases beyond the L1 cache. For small arrays quadsort has a significant advantage due to quicksort's inability to cost effectively pick a reliable pivot. Consequently, the only way for quicksort to rival quadsort is to cheat and become a hybrid sort, by using branchless merges to sort small partitions.
264 | 
265 | When using the clang compiler quadsort can use branchless ternary comparisons. Since most programming languages only support ternary merges `? :` and not ternary partitions `: ?` this gives branchless mergesorts an additional advantage over branchless quicksorts. However, since the gcc compiler doesn't support branchless ternary merges, and the hack to perform branchless merges is less efficient than the hack to perform branchless partitions, branchless quicksorts have an advantage for gcc.
266 | 
267 | To take full advantage of branchless operations the cmp macro needs to be uncommented in bench.c, which will increase the performance by 30% on primitive types. The quadsort_prim function can be used to access primitive comparisons directly. 
268 | 
269 | Variants
270 | --------
271 | - [blitsort](https://github.com/scandum/blitsort) is a hybrid stable in-place rotate quicksort / quadsort.
272 | 
273 | - [crumsort](https://github.com/scandum/crumsort) is a hybrid unstable in-place quicksort / quadsort.
274 | 
275 | - [fluxsort](https://github.com/scandum/fluxsort) is a hybrid stable quicksort / quadsort.
276 | 
277 | - [gridsort](https://github.com/scandum/gridsort) is a hybrid stable cubesort / quadsort. Gridsort is an online sort and might be of interest to those interested in data structures and sorting very large arrays.
278 | 
279 | - [twinsort](https://github.com/scandum/twinsort) is a simplified quadsort with a
280 | much smaller code size. Twinsort might be of use to people who want to port or understand quadsort; it does not use
281 | pointers or gotos. It is a bit dated and isn't branchless.
282 | 
283 | - [piposort](https://github.com/scandum/piposort) is a simplified branchless quadsort with a much smaller code size and complexity while still being very fast. Piposort might be of use to people who want to port quadsort. This is a lot easier when you start out small.
284 | 
285 | - [wolfsort](https://github.com/scandum/wolfsort) is a hybrid stable radixsort / fluxsort with improved performance on random data. It's mostly a proof of concept that only works on unsigned 32 bit integers.
286 | 
287 | - [Robin Hood Sort](https://github.com/mlochbaum/rhsort) is a hybrid stable radixsort / dropsort with improved performance on random and generic data. It has a compilation option to use quadsort for its merging.
288 | 
289 | Credits
290 | -------
291 | I personally invented the quad swap analyzer, cross merge, parity merge, branchless parity merge,
292 | monobound binary search, bridge rotation, and trinity rotation.
293 | 
294 | The ping-pong quad merge had been independently implemented in wikisort prior to quadsort, and
295 | likely by others as well.
296 | 
297 | The monobound binary search has been independently implemented, often referred to as a branchless binary search. I published a working concept in 2014, which appears to pre-date others.
298 | 
299 | Special kudos to [Musicombo and Co](https://www.youtube.com/c/Musicombo) for getting me interested in rotations and branchless logic.
300 | 
301 | Visualization
302 | -------------
303 | In the visualization below nine tests are performed on 256 elements.
304 | 
305 | 1. Random order
306 | 2. Ascending order
307 | 3. Ascending Saw
308 | 4. Generic random order
309 | 5. Descending order
310 | 6. Descending Saw
311 | 7. Random tail
312 | 8. Random half
313 | 9. Ascending tiles
314 | 
315 | The upper half shows the swap memory and the bottom half shows the main memory.
316 | Colors are used to differentiate various operations. Quad swaps are in cyan, reversals in magenta, skips in green, parity merges in orange, bridge rotations in yellow, and trinity rotations are in violet.
317 | 
318 | [![quadsort benchmark](/images/quadsort.gif)](https://www.youtube.com/watch?v=GJjH_99BS70)
319 | 
320 | The [visualization is available on YouTube](https://www.youtube.com/watch?v=GJjH_99BS70) and there's also a [YouTube video of a Java port of quadsort](https://www.youtube.com/watch?v=drSeVadf05M) in [ArrayV](https://github.com/Gaming32/ArrayV-v4.0) on a wide variety of data distributions.
321 | 
322 | Benchmark: quadsort vs std::stable_sort vs timsort
323 | --------------------------------------------------
324 | The following benchmark was run on WSL 2 using gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
325 | and the [wolfsort benchmark](https://github.com/scandum/wolfsort).
326 | The source code was compiled using `g++ -O3 -w -fpermissive bench.c`. Stablesort is g++'s `std::stable_sort` function. Each test was run 100 times
327 | on 100,000 elements. A table with the best and average time in seconds can be uncollapsed below the bar graph.
328 | 
329 | ![Graph](/images/graph1.png)
330 | 
331 | <details><summary><b>data table</b></summary>
332 | 
333 | |      Name |    Items | Type |     Best |  Average |  Compares | Samples |     Distribution |
334 | | --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |
335 | |stablesort |   100000 |  128 | 0.010958 | 0.011215 |         0 |     100 |     random order |
336 | |  fluxsort |   100000 |  128 | 0.008589 | 0.008837 |         0 |     100 |     random order |
337 | |   timsort |   100000 |  128 | 0.012799 | 0.013185 |         0 |     100 |     random order |
338 | 
339 | |      Name |    Items | Type |     Best |  Average |  Compares | Samples |     Distribution |
340 | | --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |
341 | |stablesort |   100000 |   64 | 0.006134 | 0.006232 |         0 |     100 |     random order |
342 | |  fluxsort |   100000 |   64 | 0.001945 | 0.001994 |         0 |     100 |     random order |
343 | |   timsort |   100000 |   64 | 0.007824 | 0.008070 |         0 |     100 |     random order |
344 | 
345 | |      Name |    Items | Type |     Best |  Average |     Loops | Samples |     Distribution |
346 | | --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |
347 | |stablesort |   100000 |   32 | 0.005995 | 0.006068 |         0 |     100 |     random order |
348 | |  fluxsort |   100000 |   32 | 0.001841 | 0.001890 |         0 |     100 |     random order |
349 | |   timsort |   100000 |   32 | 0.007593 | 0.007773 |         0 |     100 |     random order |
350 | |           |          |      |          |          |           |         |                  |
351 | |stablesort |   100000 |   32 | 0.003815 | 0.003841 |         0 |     100 |     random % 100 |
352 | |  fluxsort |   100000 |   32 | 0.000655 | 0.000680 |         0 |     100 |     random % 100 |
353 | |   timsort |   100000 |   32 | 0.005608 | 0.005666 |         0 |     100 |     random % 100 |
354 | |           |          |      |          |          |           |         |                  |
355 | |stablesort |   100000 |   32 | 0.000672 | 0.000733 |         0 |     100 |  ascending order |
356 | |  fluxsort |   100000 |   32 | 0.000044 | 0.000045 |         0 |     100 |  ascending order |
357 | |   timsort |   100000 |   32 | 0.000045 | 0.000045 |         0 |     100 |  ascending order |
358 | |           |          |      |          |          |           |         |                  |
359 | |stablesort |   100000 |   32 | 0.001360 | 0.001410 |         0 |     100 |    ascending saw |
360 | |  fluxsort |   100000 |   32 | 0.000328 | 0.000330 |         0 |     100 |    ascending saw |
361 | |   timsort |   100000 |   32 | 0.000840 | 0.000852 |         0 |     100 |    ascending saw |
362 | |           |          |      |          |          |           |         |                  |
363 | |stablesort |   100000 |   32 | 0.001121 | 0.001154 |         0 |     100 |       pipe organ |
364 | |  fluxsort |   100000 |   32 | 0.000205 | 0.000207 |         0 |     100 |       pipe organ |
365 | |   timsort |   100000 |   32 | 0.000465 | 0.000469 |         0 |     100 |       pipe organ |
366 | |           |          |      |          |          |           |         |                  |
367 | |stablesort |   100000 |   32 | 0.000904 | 0.000920 |         0 |     100 | descending order |
368 | |  fluxsort |   100000 |   32 | 0.000055 | 0.000055 |         0 |     100 | descending order |
369 | |   timsort |   100000 |   32 | 0.000088 | 0.000092 |         0 |     100 | descending order |
370 | |           |          |      |          |          |           |         |                  |
371 | |stablesort |   100000 |   32 | 0.001603 | 0.001641 |         0 |     100 |   descending saw |
372 | |  fluxsort |   100000 |   32 | 0.000418 | 0.000427 |         0 |     100 |   descending saw |
373 | |   timsort |   100000 |   32 | 0.000788 | 0.000816 |         0 |     100 |   descending saw |
374 | |           |          |      |          |          |           |         |                  |
375 | |stablesort |   100000 |   32 | 0.002029 | 0.002095 |         0 |     100 |      random tail |
376 | |  fluxsort |   100000 |   32 | 0.000623 | 0.000627 |         0 |     100 |      random tail |
377 | |   timsort |   100000 |   32 | 0.001996 | 0.002041 |         0 |     100 |      random tail |
378 | |           |          |      |          |          |           |         |                  |
379 | |stablesort |   100000 |   32 | 0.003491 | 0.003539 |         0 |     100 |      random half |
380 | |  fluxsort |   100000 |   32 | 0.001071 | 0.001078 |         0 |     100 |      random half |
381 | |   timsort |   100000 |   32 | 0.004025 | 0.004056 |         0 |     100 |      random half |
382 | |           |          |      |          |          |           |         |                  |
383 | |stablesort |   100000 |   32 | 0.000918 | 0.000940 |         0 |     100 |  ascending tiles |
384 | |  fluxsort |   100000 |   32 | 0.000293 | 0.000296 |         0 |     100 |  ascending tiles |
385 | |   timsort |   100000 |   32 | 0.000850 | 0.000931 |         0 |     100 |  ascending tiles |
386 | |           |          |      |          |          |           |         |                  |
387 | |stablesort |   100000 |   32 | 0.001168 | 0.001431 |         0 |     100 |     bit reversal |
388 | |  fluxsort |   100000 |   32 | 0.001700 | 0.001731 |         0 |     100 |     bit reversal |
389 | |   timsort |   100000 |   32 | 0.002261 | 0.002940 |         0 |     100 |     bit reversal |
390 | 
391 | </details>
392 | 
393 | The following benchmark was run on WSL 2 using gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
394 | and the [wolfsort benchmark](https://github.com/scandum/wolfsort).
395 | The source code was compiled using `g++ -O3 -w -fpermissive bench.c`. It measures the performance on random data with array sizes
396 | ranging from 1 to 1024. It's generated by running the benchmark using 1024 0 0 as the argument. The benchmark is weighted, meaning the number of repetitions
397 | halves each time the number of items doubles. A table with the best and average time in seconds can be uncollapsed below the bar graph.
398 | 
399 | ![Graph](/images/graph2.png)
400 | 
401 | <details><summary><b>data table</b></summary>
402 | 
403 | |      Name |    Items | Type |     Best |  Average |  Compares | Samples |     Distribution |
404 | | --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |
405 | |stablesort |        4 |   32 | 0.005569 | 0.005899 |       0.0 |      50 |       random 1-4 |
406 | |  quadsort |        4 |   32 | 0.001144 | 0.001189 |       0.0 |      50 |       random 1-4 |
407 | |   timsort |        4 |   32 | 0.002301 | 0.002491 |       0.0 |      50 |       random 1-4 |
408 | |           |          |      |          |          |           |         |                  |
409 | |stablesort |        8 |   32 | 0.005731 | 0.005950 |       0.0 |      50 |       random 5-8 |
410 | |  quadsort |        8 |   32 | 0.002064 | 0.002200 |       0.0 |      50 |       random 5-8 |
411 | |   timsort |        8 |   32 | 0.004958 | 0.005165 |       0.0 |      50 |       random 5-8 |
412 | |           |          |      |          |          |           |         |                  |
413 | |stablesort |       16 |   32 | 0.006360 | 0.006415 |       0.0 |      50 |      random 9-16 |
414 | |  quadsort |       16 |   32 | 0.001862 | 0.001927 |       0.0 |      50 |      random 9-16 |
415 | |   timsort |       16 |   32 | 0.006578 | 0.006663 |       0.0 |      50 |      random 9-16 |
416 | |           |          |      |          |          |           |         |                  |
417 | |stablesort |       32 |   32 | 0.007809 | 0.007885 |       0.0 |      50 |     random 17-32 |
418 | |  quadsort |       32 |   32 | 0.003177 | 0.003258 |       0.0 |      50 |     random 17-32 |
419 | |   timsort |       32 |   32 | 0.008597 | 0.008698 |       0.0 |      50 |     random 17-32 |
420 | |           |          |      |          |          |           |         |                  |
421 | |stablesort |       64 |   32 | 0.008846 | 0.008918 |       0.0 |      50 |     random 33-64 |
422 | |  quadsort |       64 |   32 | 0.004144 | 0.004195 |       0.0 |      50 |     random 33-64 |
423 | |   timsort |       64 |   32 | 0.011459 | 0.011560 |       0.0 |      50 |     random 33-64 |
424 | |           |          |      |          |          |           |         |                  |
425 | |stablesort |      128 |   32 | 0.010065 | 0.010131 |       0.0 |      50 |    random 65-128 |
426 | |  quadsort |      128 |   32 | 0.005131 | 0.005184 |       0.0 |      50 |    random 65-128 |
427 | |   timsort |      128 |   32 | 0.013917 | 0.014022 |       0.0 |      50 |    random 65-128 |
428 | |           |          |      |          |          |           |         |                  |
429 | |stablesort |      256 |   32 | 0.011217 | 0.011305 |       0.0 |      50 |   random 129-256 |
430 | |  quadsort |      256 |   32 | 0.004937 | 0.005010 |       0.0 |      50 |   random 129-256 |
431 | |   timsort |      256 |   32 | 0.015785 | 0.015912 |       0.0 |      50 |   random 129-256 |
432 | |           |          |      |          |          |           |         |                  |
433 | |stablesort |      512 |   32 | 0.012544 | 0.012637 |       0.0 |      50 |   random 257-512 |
434 | |  quadsort |      512 |   32 | 0.005545 | 0.005618 |       0.0 |      50 |   random 257-512 |
435 | |   timsort |      512 |   32 | 0.017533 | 0.017652 |       0.0 |      50 |   random 257-512 |
436 | |           |          |      |          |          |           |         |                  |
437 | |stablesort |     1024 |   32 | 0.013871 | 0.013979 |       0.0 |      50 |  random 513-1024 |
438 | |  quadsort |     1024 |   32 | 0.005664 | 0.005755 |       0.0 |      50 |  random 513-1024 |
439 | |   timsort |     1024 |   32 | 0.019176 | 0.019270 |       0.0 |      50 |  random 513-1024 |
440 | |           |          |      |          |          |           |         |                  |
441 | |stablesort |     2048 |   32 | 0.010961 | 0.011018 |       0.0 |      50 | random 1025-2048 |
442 | |  quadsort |     2048 |   32 | 0.004527 | 0.004580 |       0.0 |      50 | random 1025-2048 |
443 | |   timsort |     2048 |   32 | 0.015289 | 0.015338 |       0.0 |      50 | random 1025-2048 |
444 | |           |          |      |          |          |           |         |                  |
445 | |stablesort |     4096 |   32 | 0.010854 | 0.010917 |       0.0 |      50 | random 2049-4096 |
446 | |  quadsort |     4096 |   32 | 0.003974 | 0.004018 |       0.0 |      50 | random 2049-4096 |
447 | |   timsort |     4096 |   32 | 0.015051 | 0.015132 |       0.0 |      50 | random 2049-4096 |
448 | 
449 | </details>
450 | 
451 | Benchmark: quadsort vs qsort (mergesort)
452 | ----------------------------------------
453 | The following benchmark was run on WSL 2 with gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04).
454 | The source code was compiled using `gcc -O3 bench.c`. Each test was run 100 times. The results were
455 | generated by running the benchmark with `100000 100 1` as the arguments. In the benchmark quadsort is
456 | compared against glibc qsort() using the same general purpose interface and without any known
457 | unfair advantage, like inlining. A table with the best and average time in seconds can be
458 | uncollapsed below the bar graph.
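
As a rough illustration of what "the same general purpose interface" and the Compares column involve, the sketch below counts comparisons through a non-inlined callback passed to the standard `qsort()`. The names `cmp_int_counted` and `count_qsort_compares` are made up for this example, and `__attribute__((noinline))` assumes gcc or clang. It is not the code in bench.c, only a minimal version of the same idea.

```c
#include <stddef.h>
#include <stdlib.h>

/* Global counter, similar in spirit to the `comparisons` variable in bench.c. */
static size_t comparisons;

/* noinline (gcc/clang attribute) stops the compiler from folding the
   callback into the caller, so every comparison is a real function call. */
__attribute__((noinline))
static int cmp_int_counted(const void *a, const void *b)
{
    const int l = *(const int *)a;
    const int r = *(const int *)b;

    comparisons++;

    /* Three-way result without the overflow risk of l - r. */
    return (l > r) - (l < r);
}

/* Hypothetical helper: sort n ints with qsort() and report how many
   comparisons the sort performed. */
static size_t count_qsort_compares(int *v, size_t n)
{
    comparisons = 0;
    qsort(v, n, sizeof *v, cmp_int_counted);
    return comparisons;
}
```

Judging by the `SRTFUNC` typedef in bench.c, quadsort takes the same four arguments, so with quadsort.h included the `qsort` call could presumably be swapped for `quadsort` to compare the two counts.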
459 | 
460 | ![Graph](/images/graph3.png)
461 | 
462 | <details><summary><b>data table</b></summary>
463 | 
464 | |      Name |    Items | Type |     Best |  Average |  Compares | Samples |     Distribution |
465 | | --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |
466 | |     qsort |   100000 |   64 | 0.016881 | 0.017052 |   1536381 |     100 |    random string |
467 | |  quadsort |   100000 |   64 | 0.010615 | 0.010756 |   1655772 |     100 |    random string |
468 | |           |          |      |          |          |           |         |                  |
469 | |     qsort |   100000 |   64 | 0.015387 | 0.015550 |   1536491 |     100 |    random double |
470 | |  quadsort |   100000 |   64 | 0.008648 | 0.008751 |   1655904 |     100 |    random double |
471 | |           |          |      |          |          |           |         |                  |
472 | |     qsort |   100000 |   64 | 0.011165 | 0.011375 |   1536491 |     100 |      random long |
473 | |  quadsort |   100000 |   64 | 0.006024 | 0.006099 |   1655904 |     100 |      random long |
474 | |           |          |      |          |          |           |         |                  |
475 | |     qsort |   100000 |   64 | 0.010775 | 0.010928 |   1536634 |     100 |       random int |
476 | |  quadsort |   100000 |   64 | 0.005313 | 0.005375 |   1655948 |     100 |       random int |
477 | 
478 | |      Name |    Items | Type |     Best |  Average |  Compares | Samples |     Distribution |
479 | | --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |
480 | |     qsort |   100000 |  128 | 0.018214 | 0.018843 |   1536491 |     100 |     random order |
481 | |  quadsort |   100000 |  128 | 0.011098 | 0.011185 |   1655904 |     100 |     random order |
482 | 
483 | |      Name |    Items | Type |     Best |  Average |  Compares | Samples |     Distribution |
484 | | --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |
485 | |     qsort |   100000 |   64 | 0.009522 | 0.009748 |   1536491 |     100 |     random order |
486 | |  quadsort |   100000 |   64 | 0.004073 | 0.004118 |   1655904 |     100 |     random order |
487 | 
488 | |      Name |    Items | Type |     Best |  Average |  Compares | Samples |     Distribution |
489 | | --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |
490 | |     qsort |   100000 |   32 | 0.008946 | 0.009149 |   1536634 |     100 |     random order |
491 | |  quadsort |   100000 |   32 | 0.003342 | 0.003391 |   1655948 |     100 |     random order |
492 | |           |          |      |          |          |           |         |                  |
493 | |     qsort |   100000 |   32 | 0.006868 | 0.007059 |   1532324 |     100 |     random % 100 |
494 | |  quadsort |   100000 |   32 | 0.002690 | 0.002740 |   1381730 |     100 |     random % 100 |
495 | |           |          |      |          |          |           |         |                  |
496 | |     qsort |   100000 |   32 | 0.002612 | 0.002845 |    815024 |     100 |  ascending order |
497 | |  quadsort |   100000 |   32 | 0.000160 | 0.000161 |     99999 |     100 |  ascending order |
498 | |           |          |      |          |          |           |         |                  |
499 | |     qsort |   100000 |   32 | 0.003396 | 0.003622 |    915020 |     100 |    ascending saw |
500 | |  quadsort |   100000 |   32 | 0.000904 | 0.000925 |    368457 |     100 |    ascending saw |
501 | |           |          |      |          |          |           |         |                  |
502 | |     qsort |   100000 |   32 | 0.002672 | 0.002803 |    884462 |     100 |       pipe organ |
503 | |  quadsort |   100000 |   32 | 0.000466 | 0.000469 |    277443 |     100 |       pipe organ |
504 | |           |          |      |          |          |           |         |                  |
505 | |     qsort |   100000 |   32 | 0.002469 | 0.002587 |    853904 |     100 | descending order |
506 | |  quadsort |   100000 |   32 | 0.000164 | 0.000165 |     99999 |     100 | descending order |
507 | |           |          |      |          |          |           |         |                  |
508 | |     qsort |   100000 |   32 | 0.003302 | 0.003453 |    953892 |     100 |   descending saw |
509 | |  quadsort |   100000 |   32 | 0.000929 | 0.000941 |    380548 |     100 |   descending saw |
510 | |           |          |      |          |          |           |         |                  |
511 | |     qsort |   100000 |   32 | 0.004250 | 0.004501 |   1012003 |     100 |      random tail |
512 | |  quadsort |   100000 |   32 | 0.001188 | 0.001208 |    564953 |     100 |      random tail |
513 | |           |          |      |          |          |           |         |                  |
514 | |     qsort |   100000 |   32 | 0.005960 | 0.006133 |   1200707 |     100 |      random half |
515 | |  quadsort |   100000 |   32 | 0.002047 | 0.002078 |    980778 |     100 |      random half |
516 | |           |          |      |          |          |           |         |                  |
517 | |     qsort |   100000 |   32 | 0.003903 | 0.004352 |   1209200 |     100 |  ascending tiles |
518 | |  quadsort |   100000 |   32 | 0.002072 | 0.002170 |    671191 |     100 |  ascending tiles |
519 | |           |          |      |          |          |           |         |                  |
520 | |     qsort |   100000 |   32 | 0.005165 | 0.006168 |   1553378 |     100 |     bit reversal |
521 | |  quadsort |   100000 |   32 | 0.003146 | 0.003197 |   1711215 |     100 |     bit reversal |
522 | 
523 | </details>
524 | 
525 | The following benchmark was run on WSL 2 with gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04).
526 | The source code was compiled using `gcc -O3 bench.c`. Each test was run 100 times. The results were generated by running the benchmark with
527 | `10000000 0 0` as the arguments. The benchmark is weighted, meaning the number of repetitions
528 | halves each time the number of items doubles. A table with the best and average time in seconds can be uncollapsed below the bar graph.
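
One way to read the weighting rule is that the repetition count is inversely proportional to the number of items, so every row sorts roughly the same total number of elements. The helper below is a hypothetical sketch of that reading. The name `weighted_repetitions` and the choice of a single repetition for the largest size are assumptions, not taken from bench.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of repetitions for an array of `items`
   elements, assuming the largest array (`max_items`) is sorted once.
   Doubling `items` halves the result. */
static size_t weighted_repetitions(size_t max_items, size_t items)
{
    assert(items > 0 && items <= max_items);

    return max_items / items;
}
```

Under this assumption a run of 10 items with a maximum of 10000000 would be repeated 1000000 times, while the 10000000 item run would be performed once.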
529 | 
530 | ![Graph](/images/graph5.png)
531 | 
532 | <details><summary><b>data table</b></summary>
533 | 
534 | |      Name |    Items | Type |     Best |  Average |  Compares | Samples |     Distribution |
535 | | --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |
536 | |     qsort |       10 |   32 | 0.218310 | 0.224505 |        22 |      10 |        random 10 |
537 | |  quadsort |       10 |   32 | 0.091750 | 0.092312 |        29 |      10 |        random 10 |
538 | |           |          |      |          |          |           |         |                  |
539 | |     qsort |      100 |   32 | 0.391962 | 0.396639 |       541 |      10 |       random 100 |
540 | |  quadsort |      100 |   32 | 0.173928 | 0.177794 |       646 |      10 |       random 100 |
541 | |           |          |      |          |          |           |         |                  |
542 | |     qsort |     1000 |   32 | 0.558055 | 0.566364 |      8707 |      10 |      random 1000 |
543 | |  quadsort |     1000 |   32 | 0.220395 | 0.222146 |      9817 |      10 |      random 1000 |
544 | |           |          |      |          |          |           |         |                  |
545 | |     qsort |    10000 |   32 | 0.735528 | 0.741353 |    120454 |      10 |     random 10000 |
546 | |  quadsort |    10000 |   32 | 0.267860 | 0.269924 |    131668 |      10 |     random 10000 |
547 | |           |          |      |          |          |           |         |                  |
548 | |     qsort |   100000 |   32 | 0.907161 | 0.910446 |   1536421 |      10 |    random 100000 |
549 | |  quadsort |   100000 |   32 | 0.339541 | 0.340942 |   1655703 |      10 |    random 100000 |
550 | |           |          |      |          |          |           |         |                  |
551 | |     qsort |  1000000 |   32 | 1.085275 | 1.089068 |  18674532 |      10 |   random 1000000 |
552 | |  quadsort |  1000000 |   32 | 0.401715 | 0.403860 |  19816270 |      10 |   random 1000000 |
553 | |           |          |      |          |          |           |         |                  |
554 | |     qsort | 10000000 |   32 | 1.313922 | 1.319911 | 220105921 |      10 |  random 10000000 |
555 | |  quadsort | 10000000 |   32 | 0.599393 | 0.601635 | 231513131 |      10 |  random 10000000 |
556 | 
557 | </details>
558 | 
559 | Benchmark: quadsort vs pdqsort vs fluxsort
560 | ------------------------------------------
561 | The following benchmark was run on WSL 2 with gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
562 | using the [wolfsort benchmark](https://github.com/scandum/wolfsort).
563 | The source code was compiled using `g++ -O3 -w -fpermissive bench.c`. Pdqsort is a branchless
564 | quicksort/heapsort/insertionsort hybrid. [Fluxsort](https://github.com/scandum/fluxsort) is a branchless quicksort/mergesort hybrid. Each test
565 | was run 100 times on 100,000 elements. Comparisons are fully inlined. A table with the best and
566 | average time in seconds can be uncollapsed below the bar graph.
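
To make "fully inlined" concrete, the sketch below contrasts a comparison made through a function pointer with one expanded from a macro, using a small insertion sort as a stand-in for a real sort. `CMP_INLINE` is modelled on the commented-out `#define cmp(a,b) (*(a) > *(b))` line in bench.c. The other names are hypothetical and this is not the wolfsort benchmark code.

```c
#include <stddef.h>

typedef int cmp_fn(const void *a, const void *b);

/* Hypothetical macro comparison: expands in place, so no call is made. */
#define CMP_INLINE(a, b) (*(a) > *(b))

/* Callback comparison in the qsort() style. */
static int cmp_int_fn(const void *a, const void *b)
{
    const int l = *(const int *)a;
    const int r = *(const int *)b;

    return (l > r) - (l < r);
}

/* Insertion sort that calls the comparison through a function pointer. */
static void isort_callback(int *v, size_t n, cmp_fn *cmp)
{
    for (size_t i = 1; i < n; i++) {
        int key = v[i];
        size_t j = i;

        while (j > 0 && cmp(&v[j - 1], &key) > 0) {
            v[j] = v[j - 1];
            j--;
        }
        v[j] = key;
    }
}

/* The same sort with the comparison expanded inline by the macro. */
static void isort_inline(int *v, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        int key = v[i];
        size_t j = i;

        while (j > 0 && CMP_INLINE(&v[j - 1], &key)) {
            v[j] = v[j - 1];
            j--;
        }
        v[j] = key;
    }
}
```

Both functions produce the same ordering. The difference is that the inline version gives the compiler a plain integer comparison to optimise, which is why the timings in this section are not directly comparable to the qsort() tables above.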
567 | 
568 | ![Graph](/images/graph4.png)
569 | 
570 | <details><summary><b>data table</b></summary>
571 | 
572 | |      Name |    Items | Type |     Best |  Average |  Compares | Samples |     Distribution |
573 | | --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |
574 | |   pdqsort |   100000 |  128 | 0.005773 | 0.005859 |         0 |     100 |     random order |
575 | |  quadsort |   100000 |  128 | 0.009813 | 0.009882 |         0 |     100 |     random order |
576 | |  fluxsort |   100000 |  128 | 0.008603 | 0.008704 |         0 |     100 |     random order |
577 | 
578 | |      Name |    Items | Type |     Best |  Average |  Compares | Samples |     Distribution |
579 | | --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |
580 | |   pdqsort |   100000 |   64 | 0.002671 | 0.002686 |         0 |     100 |     random order |
581 | |  quadsort |   100000 |   64 | 0.002516 | 0.002534 |         0 |     100 |     random order |
582 | |  fluxsort |   100000 |   64 | 0.001978 | 0.002003 |         0 |     100 |     random order |
583 | 
584 | |      Name |    Items | Type |     Best |  Average |     Loops | Samples |     Distribution |
585 | | --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |
586 | |   pdqsort |   100000 |   32 | 0.002589 | 0.002607 |         0 |     100 |     random order |
587 | |  quadsort |   100000 |   32 | 0.002447 | 0.002466 |         0 |     100 |     random order |
588 | |  fluxsort |   100000 |   32 | 0.001851 | 0.001873 |         0 |     100 |     random order |
589 | |           |          |      |          |          |           |         |                  |
590 | |   pdqsort |   100000 |   32 | 0.000780 | 0.000788 |         0 |     100 |     random % 100 |
591 | |  quadsort |   100000 |   32 | 0.001788 | 0.001812 |         0 |     100 |     random % 100 |
592 | |  fluxsort |   100000 |   32 | 0.000675 | 0.000688 |         0 |     100 |     random % 100 |
593 | |           |          |      |          |          |           |         |                  |
594 | |   pdqsort |   100000 |   32 | 0.000084 | 0.000085 |         0 |     100 |  ascending order |
595 | |  quadsort |   100000 |   32 | 0.000051 | 0.000051 |         0 |     100 |  ascending order |
596 | |  fluxsort |   100000 |   32 | 0.000042 | 0.000043 |         0 |     100 |  ascending order |
597 | |           |          |      |          |          |           |         |                  |
598 | |   pdqsort |   100000 |   32 | 0.003378 | 0.003402 |         0 |     100 |    ascending saw |
599 | |  quadsort |   100000 |   32 | 0.000615 | 0.000618 |         0 |     100 |    ascending saw |
600 | |  fluxsort |   100000 |   32 | 0.000327 | 0.000337 |         0 |     100 |    ascending saw |
601 | |           |          |      |          |          |           |         |                  |
602 | |   pdqsort |   100000 |   32 | 0.002772 | 0.002792 |         0 |     100 |       pipe organ |
603 | |  quadsort |   100000 |   32 | 0.000271 | 0.000271 |         0 |     100 |       pipe organ |
604 | |  fluxsort |   100000 |   32 | 0.000214 | 0.000215 |         0 |     100 |       pipe organ |
605 | |           |          |      |          |          |           |         |                  |
606 | |   pdqsort |   100000 |   32 | 0.000187 | 0.000192 |         0 |     100 | descending order |
607 | |  quadsort |   100000 |   32 | 0.000059 | 0.000059 |         0 |     100 | descending order |
608 | |  fluxsort |   100000 |   32 | 0.000053 | 0.000053 |         0 |     100 | descending order |
609 | |           |          |      |          |          |           |         |                  |
610 | |   pdqsort |   100000 |   32 | 0.003148 | 0.003165 |         0 |     100 |   descending saw |
611 | |  quadsort |   100000 |   32 | 0.000614 | 0.000626 |         0 |     100 |   descending saw |
612 | |  fluxsort |   100000 |   32 | 0.000327 | 0.000331 |         0 |     100 |   descending saw |
613 | |           |          |      |          |          |           |         |                  |
614 | |   pdqsort |   100000 |   32 | 0.002498 | 0.002520 |         0 |     100 |      random tail |
615 | |  quadsort |   100000 |   32 | 0.000813 | 0.000842 |         0 |     100 |      random tail |
616 | |  fluxsort |   100000 |   32 | 0.000624 | 0.000627 |         0 |     100 |      random tail |
617 | |           |          |      |          |          |           |         |                  |
618 | |   pdqsort |   100000 |   32 | 0.002573 | 0.002590 |         0 |     100 |      random half |
619 | |  quadsort |   100000 |   32 | 0.001451 | 0.001462 |         0 |     100 |      random half |
620 | |  fluxsort |   100000 |   32 | 0.001064 | 0.001075 |         0 |     100 |      random half |
621 | |           |          |      |          |          |           |         |                  |
622 | |   pdqsort |   100000 |   32 | 0.002256 | 0.002281 |         0 |     100 |  ascending tiles |
623 | |  quadsort |   100000 |   32 | 0.000815 | 0.000823 |         0 |     100 |  ascending tiles |
624 | |  fluxsort |   100000 |   32 | 0.000313 | 0.000315 |         0 |     100 |  ascending tiles |
625 | |           |          |      |          |          |           |         |                  |
626 | |   pdqsort |   100000 |   32 | 0.002570 | 0.002589 |         0 |     100 |     bit reversal |
627 | |  quadsort |   100000 |   32 | 0.002230 | 0.002259 |         0 |     100 |     bit reversal |
628 | |  fluxsort |   100000 |   32 | 0.001718 | 0.001744 |         0 |     100 |     bit reversal |
629 | 
630 | </details>
631 | 
632 | The following benchmark was run on WSL with clang version 10 (10.0.0-4ubuntu1~18.04.2) using [rhsort](https://github.com/mlochbaum/rhsort)'s wolfsort benchmark.
633 | The source code was compiled using `clang -O3`. The bar graph shows the best run out of 100 on 131,072 32-bit integers. Comparisons for quadsort, fluxsort and glidesort are inlined.
634 | 
635 | Some additional context is required for this benchmark. Glidesort is written and compiled in Rust, which supports branchless ternary operations. Consequently, fluxsort and quadsort are compiled using clang with branchless ternary operations in place for the merge and small-sort routines. Since fluxsort and quadsort are optimized for gcc, there is a performance penalty, with some of the routines running 2-3x slower than they do in gcc.
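
For readers unfamiliar with the term, the following is a minimal sketch of a merge step written so that the element choice is a ternary expression and the pointer advances are plain arithmetic instead of `if`/`else` branches. The function name `merge_branchless` is hypothetical and this is not quadsort's actual merge routine. Whether the compiler really emits branch-free code (for example a conditional move) depends on the compiler and flags.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: merge sorted runs a[0..na) and b[0..nb) into dst,
   which must have room for na + nb elements and must not overlap them. */
static void merge_branchless(int *dst, const int *a, size_t na,
                             const int *b, size_t nb)
{
    const int *a_end = a + na;
    const int *b_end = b + nb;

    while (a < a_end && b < b_end) {
        /* <= takes from the left run on ties, which keeps the merge stable. */
        int take_a = (*a <= *b);

        *dst++ = take_a ? *a : *b;  /* ternary select, no if/else */
        a += take_a;                /* advance exactly one of the inputs */
        b += !take_a;
    }

    /* At most one run still has elements; copy the remainder. */
    memcpy(dst, a, (size_t)(a_end - a) * sizeof *a);
    dst += a_end - a;
    memcpy(dst, b, (size_t)(b_end - b) * sizeof *b);
}
```

The loop still has one branch for its termination test. Only the data-dependent choice between the two runs, which is the hard one to predict on random input, is expressed without branching.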
636 | 
637 | ![fluxsort vs glidesort](/images/fluxsort_vs_glidesort.png)
638 | 
639 | <details><summary><b>data table</b></summary>
640 | 
641 | |      Name |    Items | Type |     Best |  Average |     Loops | Samples |     Distribution |
642 | | --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |
643 | |  quadsort |   131072 |   32 | 0.002174 | 0.002209 |         0 |     100 |     random order |
644 | |  fluxsort |   131072 |   32 | 0.002189 | 0.002205 |         0 |     100 |     random order |
645 | | glidesort |   131072 |   32 | 0.003065 | 0.003125 |         0 |     100 |     random order |
646 | |           |          |      |          |          |           |         |                  |
647 | |  quadsort |   131072 |   32 | 0.001623 | 0.001646 |         0 |     100 |     random % 100 |
648 | |  fluxsort |   131072 |   32 | 0.000837 | 0.000856 |         0 |     100 |     random % 100 |
649 | | glidesort |   131072 |   32 | 0.001031 | 0.001037 |         0 |     100 |     random % 100 |
650 | |           |          |      |          |          |           |         |                  |
651 | |  quadsort |   131072 |   32 | 0.000061 | 0.000063 |         0 |     100 |  ascending order |
652 | |  fluxsort |   131072 |   32 | 0.000058 | 0.000060 |         0 |     100 |  ascending order |
653 | | glidesort |   131072 |   32 | 0.000091 | 0.000093 |         0 |     100 |  ascending order |
654 | |           |          |      |          |          |           |         |                  |
655 | |  quadsort |   131072 |   32 | 0.000345 | 0.000353 |         0 |     100 |    ascending saw |
656 | |  fluxsort |   131072 |   32 | 0.000341 | 0.000349 |         0 |     100 |    ascending saw |
657 | | glidesort |   131072 |   32 | 0.000351 | 0.000358 |         0 |     100 |    ascending saw |
658 | |           |          |      |          |          |           |         |                  |
659 | |  quadsort |   131072 |   32 | 0.000231 | 0.000245 |         0 |     100 |       pipe organ |
660 | |  fluxsort |   131072 |   32 | 0.000222 | 0.000228 |         0 |     100 |       pipe organ |
661 | | glidesort |   131072 |   32 | 0.000228 | 0.000235 |         0 |     100 |       pipe organ |
662 | |           |          |      |          |          |           |         |                  |
663 | |  quadsort |   131072 |   32 | 0.000074 | 0.000076 |         0 |     100 | descending order |
664 | |  fluxsort |   131072 |   32 | 0.000073 | 0.000076 |         0 |     100 | descending order |
665 | | glidesort |   131072 |   32 | 0.000106 | 0.000110 |         0 |     100 | descending order |
666 | |           |          |      |          |          |           |         |                  |
667 | |  quadsort |   131072 |   32 | 0.000373 | 0.000380 |         0 |     100 |   descending saw |
668 | |  fluxsort |   131072 |   32 | 0.000355 | 0.000371 |         0 |     100 |   descending saw |
669 | | glidesort |   131072 |   32 | 0.000363 | 0.000369 |         0 |     100 |   descending saw |
670 | |           |          |      |          |          |           |         |                  |
671 | |  quadsort |   131072 |   32 | 0.000685 | 0.000697 |         0 |     100 |      random tail |
672 | |  fluxsort |   131072 |   32 | 0.000720 | 0.000726 |         0 |     100 |      random tail |
673 | | glidesort |   131072 |   32 | 0.000953 | 0.000966 |         0 |     100 |      random tail |
674 | |           |          |      |          |          |           |         |                  |
675 | |  quadsort |   131072 |   32 | 0.001192 | 0.001204 |         0 |     100 |      random half |
676 | |  fluxsort |   131072 |   32 | 0.001251 | 0.001266 |         0 |     100 |      random half |
677 | | glidesort |   131072 |   32 | 0.001650 | 0.001679 |         0 |     100 |      random half |
678 | |           |          |      |          |          |           |         |                  |
679 | |  quadsort |   131072 |   32 | 0.001472 | 0.001507 |         0 |     100 |  ascending tiles |
680 | |  fluxsort |   131072 |   32 | 0.000578 | 0.000589 |         0 |     100 |  ascending tiles |
681 | | glidesort |   131072 |   32 | 0.002559 | 0.002576 |         0 |     100 |  ascending tiles |
682 | |           |          |      |          |          |           |         |                  |
683 | |  quadsort |   131072 |   32 | 0.002210 | 0.002231 |         0 |     100 |     bit reversal |
684 | |  fluxsort |   131072 |   32 | 0.002042 | 0.002053 |         0 |     100 |     bit reversal |
685 | | glidesort |   131072 |   32 | 0.002787 | 0.002807 |         0 |     100 |     bit reversal |
686 | |           |          |      |          |          |           |         |                  |
687 | |  quadsort |   131072 |   32 | 0.001237 | 0.001278 |         0 |     100 |       random % 2 |
688 | |  fluxsort |   131072 |   32 | 0.000227 | 0.000233 |         0 |     100 |       random % 2 |
689 | | glidesort |   131072 |   32 | 0.000449 | 0.000455 |         0 |     100 |       random % 2 |
690 | |           |          |      |          |          |           |         |                  |
691 | |  quadsort |   131072 |   32 | 0.001123 | 0.001153 |         0 |     100 |           signal |
692 | |  fluxsort |   131072 |   32 | 0.001269 | 0.001285 |         0 |     100 |           signal |
693 | | glidesort |   131072 |   32 | 0.003760 | 0.003776 |         0 |     100 |           signal |
694 | |           |          |      |          |          |           |         |                  |
695 | |  quadsort |   131072 |   32 | 0.001911 | 0.001956 |         0 |     100 |      exponential |
696 | |  fluxsort |   131072 |   32 | 0.001134 | 0.001142 |         0 |     100 |      exponential |
697 | | glidesort |   131072 |   32 | 0.002355 | 0.002373 |         0 |     100 |      exponential |
698 | 
699 | </details>
700 | 


--------------------------------------------------------------------------------
/images/fluxsort_vs_glidesort.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scandum/quadsort/4c357224cd9b53382248affc64e295726697bd35/images/fluxsort_vs_glidesort.png


--------------------------------------------------------------------------------
/images/graph1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scandum/quadsort/4c357224cd9b53382248affc64e295726697bd35/images/graph1.png


--------------------------------------------------------------------------------
/images/graph2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scandum/quadsort/4c357224cd9b53382248affc64e295726697bd35/images/graph2.png


--------------------------------------------------------------------------------
/images/graph3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scandum/quadsort/4c357224cd9b53382248affc64e295726697bd35/images/graph3.png


--------------------------------------------------------------------------------
/images/graph4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scandum/quadsort/4c357224cd9b53382248affc64e295726697bd35/images/graph4.png


--------------------------------------------------------------------------------
/images/graph5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scandum/quadsort/4c357224cd9b53382248affc64e295726697bd35/images/graph5.png


--------------------------------------------------------------------------------
/images/quadsort.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scandum/quadsort/4c357224cd9b53382248affc64e295726697bd35/images/quadsort.gif


--------------------------------------------------------------------------------
/images/quadswap.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scandum/quadsort/4c357224cd9b53382248affc64e295726697bd35/images/quadswap.gif


--------------------------------------------------------------------------------
/src/bench.c:
--------------------------------------------------------------------------------
   1 | /*
   2 | 	To compile use either:
   3 | 
   4 | 	gcc -O3 bench.c
   5 | 
   6 | 	or
   7 | 
   8 | 	clang -O3 bench.c
   9 | 
  10 | 	or
  11 | 
  12 | 	g++ -O3 bench.c
  13 | */
  14 | 
  15 | #include <stdlib.h>
  16 | #include <stdio.h>
  17 | #include <string.h>
  18 | #include <sys/time.h>
  19 | #include <time.h>
  20 | #include <errno.h>
  21 | #include <math.h>
  22 | 
  23 | //#define cmp(a,b) (*(a) > *(b)) // uncomment for faster primitive comparisons
  24 | 
  25 | const char *sorts[] = { "*", "qsort", "quadsort" };
  26 | 
  27 | //#define SKIP_STRINGS
  28 | //#define SKIP_DOUBLES
  29 | //#define SKIP_LONGS
  30 | 
  31 | #if __has_include("blitsort.h")
  32 |   #include "blitsort.h" // curl "https://raw.githubusercontent.com/scandum/blitsort/master/src/blitsort.{c,h}" -o "blitsort.#1"
  33 | #endif
  34 | #if __has_include("crumsort.h")
  35 |   #include "crumsort.h" // curl "https://raw.githubusercontent.com/scandum/crumsort/master/src/crumsort.{c,h}" -o "crumsort.#1"
  36 | #endif
  37 | #if __has_include("dripsort.h")
  38 |   #include "dripsort.h"
  39 | #endif
  40 | #if __has_include("flowsort.h")
  41 |   #include "flowsort.h"
  42 | #endif
  43 | #if __has_include("fluxsort.h")
  44 |   #include "fluxsort.h" // curl "https://raw.githubusercontent.com/scandum/fluxsort/master/src/fluxsort.{c,h}" -o "fluxsort.#1"
  45 | #endif
  46 | #if __has_include("gridsort.h")
  47 |   #include "gridsort.h" // curl "https://raw.githubusercontent.com/scandum/gridsort/master/src/gridsort.{c,h}" -o "gridsort.#1"
  48 | #endif
  49 | #if __has_include("octosort.h")
  50 |   #include "octosort.h" // curl "https://raw.githubusercontent.com/scandum/octosort/master/src/octosort.{c,h}" -o "octosort.#1"
  51 | #endif
  52 | #if __has_include("piposort.h")
  53 |   #include "piposort.h" // curl "https://raw.githubusercontent.com/scandum/piposort/master/src/piposort.{c,h}" -o "piposort.#1"
  54 | #endif
  55 | #if __has_include("quadsort.h")
  56 |   #include "quadsort.h" // curl "https://raw.githubusercontent.com/scandum/quadsort/master/src/quadsort.{c,h}" -o "quadsort.#1"
  57 | #endif
  58 | #if __has_include("skipsort.h")
  59 |   #include "skipsort.h" // curl "https://raw.githubusercontent.com/scandum/wolfsort/master/src/skipsort.{c,h}" -o "skipsort.#1"
  60 | #endif
  61 | #if __has_include("wolfsort.h")
  62 |   #include "wolfsort.h" // curl "https://raw.githubusercontent.com/scandum/wolfsort/master/src/wolfsort.{c,h}" -o "wolfsort.#1"
  63 | #endif
  64 | 
  65 | #if __has_include("rhsort.c")
  66 |     #define RHSORT_C
  67 |     #include "rhsort.c" // curl https://raw.githubusercontent.com/mlochbaum/rhsort/master/rhsort.c > rhsort.c
  68 | #endif
  69 | 
  70 | #ifdef __GNUG__
  71 |   #include <algorithm>
  72 |   #if __has_include("pdqsort.h")
  73 |     #include "pdqsort.h" // curl https://raw.githubusercontent.com/orlp/pdqsort/master/pdqsort.h > pdqsort.h
  74 |   #endif
  75 |   #if __has_include("ska_sort.hpp")
  76 |     #define SKASORT_HPP
  77 |     #include "ska_sort.hpp" // curl https://raw.githubusercontent.com/skarupke/ska_sort/master/ska_sort.hpp > ska_sort.hpp
  78 |   #endif
  79 |   #if __has_include("timsort.hpp")
  80 |     #include "timsort.hpp" // curl https://raw.githubusercontent.com/timsort/cpp-TimSort/master/include/gfx/timsort.hpp > timsort.hpp
  81 |   #endif
  82 | #endif
  83 | 
  84 | #if __has_include("antiqsort.c")
  85 |   #include "antiqsort.c"
  86 | #endif
  87 | 
  88 | //typedef int CMPFUNC (const void *a, const void *b);
  89 | 
  90 | typedef void SRTFUNC(void *array, size_t nmemb, size_t size, CMPFUNC *cmpf);
  91 | 
  92 | 
  93 | // Comment out __attribute__ ((noinline)) and comparisons++ for full
  94 | // throttle. Like so: #define COMPARISON_PP //comparisons++
  95 | 
  96 | size_t comparisons;
  97 | 
  98 | #define COMPARISON_PP comparisons++
  99 | 
 100 | #define NO_INLINE __attribute__ ((noinline))
 101 | 
 102 | // primitive type comparison functions
 103 | 
 104 | NO_INLINE int cmp_int(const void * a, const void * b)
 105 | {
 106 | 	COMPARISON_PP;
 107 | 
 108 | 	return *(int *) a - *(int *) b;
 109 | 
 110 | //	const int l = *(const int *)a;
 111 | //	const int r = *(const int *)b;
 112 | 
 113 | //	return l - r;
 114 | //	return l > r;
 115 | //	return (l > r) - (l < r);
 116 | }
 117 | 
 118 | NO_INLINE int cmp_rev(const void * a, const void * b)
 119 | {
 120 | 	int fa = *(int *)a;
 121 | 	int fb = *(int *)b;
 122 | 
 123 | 	COMPARISON_PP;
 124 | 
 125 | 	return fb - fa;
 126 | }
 127 | 
 128 | NO_INLINE int cmp_stable(const void * a, const void * b)
 129 | {
 130 | 	int fa = *(int *)a;
 131 | 	int fb = *(int *)b;
 132 | 
 133 | 	COMPARISON_PP;
 134 | 
 135 | 	return fa / 100000 - fb / 100000;
 136 | }
 137 | 
 138 | NO_INLINE int cmp_long(const void * a, const void * b)
 139 | {
 140 | 	const long long fa = *(const long long *) a;
 141 | 	const long long fb = *(const long long *) b;
 142 | 
 143 | 	COMPARISON_PP;
 144 | 
 145 | 	return (fa > fb) - (fa < fb);
 146 | //	return (fa > fb);
 147 | }
 148 | 
 149 | NO_INLINE int cmp_float(const void * a, const void * b)
 150 | {
 151 | 	return (*(float *) a > *(float *) b) - (*(float *) a < *(float *) b);
 152 | }
 153 | 
 154 | NO_INLINE int cmp_long_double(const void * a, const void * b)
 155 | {
 156 | 	const long double fa = *(const long double *) a;
 157 | 	const long double fb = *(const long double *) b;
 158 | 
 159 | 	COMPARISON_PP;
 160 | 
 161 | 	return (fa > fb) - (fa < fb);
 162 | 
 163 | /*	if (isnan(fa) || isnan(fb))
 164 | 	{
 165 | 		return isnan(fa) - isnan(fb);
 166 | 	}
 167 | 
 168 | 	return (fa > fb);
 169 | */
 170 | }
 171 | 
 172 | // pointer comparison functions
 173 | 
 174 | NO_INLINE int cmp_str(const void * a, const void * b)
 175 | {
 176 | 	COMPARISON_PP;
 177 | 
 178 | 	return strcmp(*(const char **) a, *(const char **) b);
 179 | }
 180 | 
 181 | NO_INLINE int cmp_int_ptr(const void * a, const void * b)
 182 | {
 183 | 	const int *fa = *(const int **) a;
 184 | 	const int *fb = *(const int **) b;
 185 | 
 186 | 	COMPARISON_PP;
 187 | 
 188 | 	return (*fa > *fb) - (*fa < *fb);
 189 | }
 190 | 
 191 | NO_INLINE int cmp_long_ptr(const void * a, const void * b)
 192 | {
 193 | 	const long long *fa = *(const long long **) a;
 194 | 	const long long *fb = *(const long long **) b;
 195 | 
 196 | 	COMPARISON_PP;
 197 | 
 198 | 	return (*fa > *fb) - (*fa < *fb);
 199 | }
 200 | 
 201 | NO_INLINE int cmp_long_double_ptr(const void * a, const void * b)
 202 | {
 203 | 	const long double *fa = *(const long double **) a;
 204 | 	const long double *fb = *(const long double **) b;
 205 | 
 206 | 	COMPARISON_PP;
 207 | 
 208 | 	return (*fa > *fb) - (*fa < *fb);
 209 | }
 210 | 
 211 | // c++ comparison functions
 212 | 
 213 | #ifdef __GNUG__
 214 | 
 215 | NO_INLINE bool cpp_cmp_int(const int &a, const int &b)
 216 | {
 217 | 	COMPARISON_PP;
 218 | 
 219 | 	return a < b;
 220 | }
 221 | 
 222 | NO_INLINE bool cpp_cmp_str(char const* const a, char const* const b)
 223 | {
 224 | 	COMPARISON_PP;
 225 | 
 226 | 	return strcmp(a, b) < 0;
 227 | }
 228 | 
 229 | #endif
 230 | 
 231 | long long utime()
 232 | {
 233 | 	struct timeval now_time;
 234 | 
 235 | 	gettimeofday(&now_time, NULL);
 236 | 
 237 | 	return now_time.tv_sec * 1000000LL + now_time.tv_usec;
 238 | }
 239 | 
 240 | void seed_rand(unsigned long long seed)
 241 | {
 242 | 	srand(seed);
 243 | }
 244 | 
 245 | void test_sort(void *array, void *unsorted, void *valid, int minimum, int maximum, int samples, int repetitions, SRTFUNC *srt, const char *name, const char *desc, size_t size, CMPFUNC *cmpf)
 246 | {
 247 | 	long long start, end, total, best, average_time, average_comp;
 248 | 	char temp[100];
 249 | 	static char compare = 0;
 250 | 	long long *ptla = (long long *) array, *ptlv = (long long *) valid;
 251 | 	long double *ptda = (long double *) array, *ptdv = (long double *) valid;
 252 | 	int *pta = (int *) array, *ptv = (int *) valid, rep, sam, max, cnt, name32;
 253 | 
 254 | #ifdef SKASORT_HPP
 255 | 	void *swap;
 256 | #endif
 257 | 
 258 | 	if (*name == '*')
 259 | 	{
 260 | 		if (!strcmp(desc, "random order") || !strcmp(desc, "random 1-4") || !strcmp(desc, "random 4") || !strcmp(desc, "random string") || !strcmp(desc, "random 10"))
 261 | 		{
 262 | 			if (comparisons)
 263 | 			{
 264 | 				compare = 1;
 265 | 				printf("%s\n", "|      Name |    Items | Type |     Best |  Average |  Compares | Samples |     Distribution |");
 266 | 				printf("%s\n", "| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |");
 267 | 			}
 268 | 			else
 269 | 			{
 270 | 				printf("%s\n", "|      Name |    Items | Type |     Best |  Average |     Loops | Samples |     Distribution |");
 271 | 				printf("%s\n", "| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |");
 272 | 			}
 273 | 		}
 274 | 		else
 275 | 		{
 276 | 			printf("%s\n", "|           |          |      |          |          |           |         |                  |");
 277 | 		}
 278 | 		return;
 279 | 	}
 280 | 
 281 | 	name32 = name[0] + (name[1] ? name[1] * 32 : 0) + (name[2] ? name[2] * 1024 : 0); // switch key packed from the first three characters of the sort name
 282 | 
 283 | 	best = average_time = average_comp = 0;
 284 | 
 285 | 	if (minimum == 7 && maximum == 7)
 286 | 	{
 287 | 		pta = (int *) unsorted;
 288 | 		printf("\e[1;32m%10d %10d %10d %10d %10d %10d %10d\e[0m\n", pta[0], pta[1], pta[2], pta[3], pta[4], pta[5], pta[6]);
 289 | 		pta = (int *) array;
 290 | 	}
 291 | 
 292 | 	for (sam = 0 ; sam < samples ; sam++)
 293 | 	{
 294 | 		total = average_comp = 0;
 295 | 		max = minimum;
 296 | 
 297 | 		start = utime();
 298 | 
 299 | 		for (rep = repetitions - 1 ; rep >= 0 ; rep--)
 300 | 		{
 301 | 			memcpy(array, (char *) unsorted + maximum * rep * size, max * size);
 302 | 
 303 | 			comparisons = 0;
 304 | 
 305 | 			// edit char *sorts to add / remove sorts
 306 | 
 307 | 			switch (name32)
 308 | 			{
 309 | #ifdef BLITSORT_H
 310 | 				case 'b' + 'l' * 32 + 'i' * 1024: blitsort(array, max, size, cmpf); break;
 311 | #endif
 312 | #ifdef CRUMSORT_H
 313 | 				case 'c' + 'r' * 32 + 'u' * 1024: crumsort(array, max, size, cmpf); break;
 314 | #endif
 315 | #ifdef DRIPSORT_H
 316 | 				case 'd' + 'r' * 32 + 'i' * 1024: dripsort(array, max, size, cmpf); break;
 317 | #endif
 318 | #ifdef FLOWSORT_H
 319 | 				case 'f' + 'l' * 32 + 'o' * 1024: flowsort(array, max, size, cmpf); break;
 320 | #endif
 321 | #ifdef FLUXSORT_H
 322 | 				case 'f' + 'l' * 32 + 'u' * 1024: fluxsort(array, max, size, cmpf); break;
 323 | 				case 's' + '_' * 32 + 'f' * 1024: fluxsort_size(array, max, size, cmpf); break;
 324 | 
 325 | #endif
 326 | #ifdef GRIDSORT_H
 327 | 				case 'g' + 'r' * 32 + 'i' * 1024: gridsort(array, max, size, cmpf); break;
 328 | #endif
 329 | #ifdef OCTOSORT_H
 330 | 				case 'o' + 'c' * 32 + 't' * 1024: octosort(array, max, size, cmpf); break;
 331 | #endif
 332 | #ifdef PIPOSORT_H
 333 | 				case 'p' + 'i' * 32 + 'p' * 1024: piposort(array, max, size, cmpf); break;
 334 | #endif
 335 | #ifdef QUADSORT_H
 336 | 				case 'q' + 'u' * 32 + 'a' * 1024: quadsort(array, max, size, cmpf); break;
 337 | 				case 's' + '_' * 32 + 'q' * 1024: quadsort_size(array, max, size, cmpf); break;
 338 | #endif
 339 | #ifdef SKIPSORT_H
 340 | 				case 's' + 'k' * 32 + 'i' * 1024: skipsort(array, max, size, cmpf); break;
 341 | #endif
 342 | #ifdef WOLFSORT_H
 343 | 				case 'w' + 'o' * 32 + 'l' * 1024: wolfsort(array, max, size, cmpf); break;
 344 | #endif
 345 | 				case 'q' + 's' * 32 + 'o' * 1024: qsort(array, max, size, cmpf); break;
 346 | 
 347 | #ifdef RHSORT_C
 348 | 				case 'r' + 'h' * 32 + 's' * 1024: if (size == sizeof(int)) rhsort32(pta, max); else return; break;
 349 | #endif
 350 | 
 351 | #ifdef __GNUG__
 352 | 				case 's' + 'o' * 32 + 'r' * 1024: if (size == sizeof(int)) std::sort(pta, pta + max); else if (size == sizeof(long long)) std::sort(ptla, ptla + max); else std::sort(ptda, ptda + max); break;
 353 | 				case 's' + 't' * 32 + 'a' * 1024: if (size == sizeof(int)) std::stable_sort(pta, pta + max); else if (size == sizeof(long long)) std::stable_sort(ptla, ptla + max); else std::stable_sort(ptda, ptda + max); break;
 354 | 
 355 |   #ifdef PDQSORT_H
 356 | 				case 'p' + 'd' * 32 + 'q' * 1024: if (size == sizeof(int)) pdqsort(pta, pta + max); else if (size == sizeof(long long)) pdqsort(ptla, ptla + max); else pdqsort(ptda, ptda + max); break;
 357 |   #endif
 358 |   #ifdef SKASORT_HPP
 359 | 				case 's' + 'k' * 32 + 'a' * 1024: swap = malloc(max * size); if (size == sizeof(int)) ska_sort_copy(pta, pta + max, (int *) swap); else if (size == sizeof(long long)) ska_sort_copy(ptla, ptla + max, (long long *) swap); else repetitions = 0; free(swap); break;
 360 |   #endif
 361 |   #ifdef GFX_TIMSORT_HPP
 362 | 				case 't' + 'i' * 32 + 'm' * 1024: if (size == sizeof(int)) gfx::timsort(pta, pta + max, cpp_cmp_int); else if (size == sizeof(long long)) gfx::timsort(ptla, ptla + max); else gfx::timsort(ptda, ptda + max); break;
 363 |   #endif
 364 | #endif
 365 | 				default:
 366 | 					switch (name32)
 367 | 					{
 368 | 						case 's' + 'o' * 32 + 'r' * 1024:
 369 | 						case 's' + 't' * 32 + 'a' * 1024:
 370 | 						case 'p' + 'd' * 32 + 'q' * 1024: 
 371 | 						case 'r' + 'h' * 32 + 's' * 1024:
 372 | 						case 's' + 'k' * 32 + 'a' * 1024:
 373 | 						case 't' + 'i' * 32 + 'm' * 1024:
 374 | 							printf("unknown sort: %s (compile with g++ instead of gcc?)\n", name);
 375 | 							return;
 376 | 						default:
 377 | 							printf("unknown sort: %s\n", name);
 378 | 							return;
 379 | 					}
 380 | 			}
 381 | 			average_comp += comparisons;
 382 | 
 383 | 			if (minimum < maximum && ++max > maximum)
 384 | 			{
 385 | 				max = minimum;
 386 | 			}
 387 | 		}
 388 | 		end = utime();
 389 | 
 390 | 		total = end - start;
 391 | 
 392 | 		if (!best || total < best)
 393 | 		{
 394 | 			best = total;
 395 | 		}
 396 | 		average_time += total;
 397 | 	}
 398 | 
 399 | 	if (minimum == 7 && maximum == 7)
 400 | 	{
 401 | 		printf("\e[1;32m%10d %10d %10d %10d %10d %10d %10d\e[0m\n", pta[0], pta[1], pta[2], pta[3], pta[4], pta[5], pta[6]);
 402 | 	}
 403 | 
 404 | 	if (repetitions == 0)
 405 | 	{
 406 | 		return;
 407 | 	}
 408 | 
 409 | 	average_time /= samples;
 410 | 
 411 | 	if (cmpf == cmp_stable)
 412 | 	{
 413 | 		for (cnt = 1 ; cnt < maximum ; cnt++)
 414 | 		{
 415 | 			if (pta[cnt - 1] > pta[cnt])
 416 | 			{
 417 | 				sprintf(temp, "\e[1;31m%16s\e[0m", "unstable");
 418 | 				desc = temp;
 419 | 				break;
 420 | 			}
 421 | 		}
 422 | 	}
 423 | 
 424 | 	if (compare)
 425 | 	{
 426 | 		if (repetitions <= 1)
 427 | 		{
 428 | 			printf("|%10s |%9d | %4d |%9f |%9f |%10d | %7d | %16s |\e[0m\n", name, maximum, (int) size * 8, best / 1000000.0, average_time / 1000000.0, (int) comparisons, samples, desc);
 429 | 		}
 430 | 		else
 431 | 		{
 432 | 			printf("|%10s |%9d | %4d |%9f |%9f |%10.1f | %7d | %16s |\e[0m\n", name, maximum, (int) size * 8, best / 1000000.0, average_time / 1000000.0, (float) average_comp / repetitions, samples, desc);
 433 | 		}
 434 | 	}
 435 | 	else
 436 | 	{
 437 | 		printf("|%10s | %8d | %4d | %f | %f | %9d | %7d | %16s |\e[0m\n", name, maximum, (int) size * 8, best / 1000000.0, average_time / 1000000.0, repetitions, samples, desc);
 438 | 	}
 439 | 
 440 | 	if (minimum != maximum || cmpf == cmp_stable)
 441 | 	{
 442 | 		return;
 443 | 	}
 444 | 
 445 | 	for (cnt = 1 ; cnt < maximum ; cnt++)
 446 | 	{
 447 | 		if (cmpf == cmp_str)
 448 | 		{
 449 | 			char **ptsa = (char **) array;
 450 | 			if (strcmp((char *) ptsa[cnt - 1], (char *) ptsa[cnt]) > 0)
 451 | 			{
 452 | 				printf("%17s: not properly sorted at index %d. (%s vs %s)\n", name, cnt, (char *) ptsa[cnt - 1], (char *) ptsa[cnt]);
 453 | 				break;
 454 | 			}
 455 | 		}
 456 | 		else if (size == sizeof(int *) && cmpf == cmp_long_double_ptr)
 457 | 		{
 458 | 			long double **pptda = (long double **) array;
 459 | 
 460 | 			if (cmp_long_double_ptr(&pptda[cnt - 1], &pptda[cnt]) > 0)
 461 | 			{
 462 | 				printf("%17s: not properly sorted at index %d. (%Lf vs %Lf)\n", name, cnt, *pptda[cnt - 1], *pptda[cnt]);
 463 | 				break;
 464 | 			}
 465 | 		}
 466 | 		else if (cmpf == cmp_long_ptr)
 467 | 		{
 468 | 			long long **pptla = (long long **) array;
 469 | 
 470 | 			if (cmp_long_ptr(&pptla[cnt - 1], &pptla[cnt]) > 0)
 471 | 			{
 472 | 				printf("%17s: not properly sorted at index %d. (%lld vs %lld)\n", name, cnt, *pptla[cnt - 1], *pptla[cnt]);
 473 | 				break;
 474 | 			}
 475 | 		}
 476 | 		else if (cmpf == cmp_int_ptr)
 477 | 		{
 478 | 			int **pptia = (int **) array;
 479 | 
 480 | 			if (cmp_int_ptr(&pptia[cnt - 1], &pptia[cnt]) > 0)
 481 | 			{
 482 | 				printf("%17s: not properly sorted at index %d. (%d vs %d)\n", name, cnt, *pptia[cnt - 1], *pptia[cnt]);
 483 | 				break;
 484 | 			}
 485 | 		}
 486 | 		else if (size == sizeof(int))
 487 | 		{
 488 | 			if (pta[cnt - 1] > pta[cnt])
 489 | 			{
 490 | 				printf("%17s: not properly sorted at index %d. (%d vs %d)\n", name, cnt, pta[cnt - 1], pta[cnt]);
 491 | 				break;
 492 | 			}
 493 | 			if (pta[cnt - 1] == pta[cnt])
 494 | 			{
 495 | //				printf("%17s: Found a repeat value at index %d. (%d)\n", name, cnt, pta[cnt]);
 496 | 			}
 497 | 		}
 498 | 		else if (size == sizeof(long long))
 499 | 		{
 500 | 			if (ptla[cnt - 1] > ptla[cnt])
 501 | 			{
 502 | 				printf("%17s: not properly sorted at index %d. (%lld vs %lld)\n", name, cnt, ptla[cnt - 1], ptla[cnt]);
 503 | 				break;
 504 | 			}
 505 | 		}
 506 | 		else if (size == sizeof(long double))
 507 | 		{
 508 | 			if (cmp_long_double(&ptda[cnt - 1], &ptda[cnt]) > 0)
 509 | 			{
 510 | 				printf("%17s: not properly sorted at index %d. (%Lf vs %Lf)\n", name, cnt, ptda[cnt - 1], ptda[cnt]);
 511 | 				break;
 512 | 			}
 513 | 		}
 514 | 	}
 515 | 
 516 | 	for (cnt = 1 ; cnt < maximum ; cnt++)
 517 | 	{
 518 | 		if (size == sizeof(int))
 519 | 		{
 520 | 			if (pta[cnt] != ptv[cnt])
 521 | 			{
 522 | 				printf("         validate: array[%d] != valid[%d]. (%d vs %d)\n", cnt, cnt, pta[cnt], ptv[cnt]);
 523 | 				break;
 524 | 			}
 525 | 		}
 526 | 		else if (size == sizeof(long long))
 527 | 		{
 528 | 			if (ptla[cnt] != ptlv[cnt])
 529 | 			{
 530 | 				if (cmpf == cmp_str)
 531 | 				{
 532 | 					char **ptsa = (char **) array;
 533 | 					char **ptsv = (char **) valid;
 534 | 
 535 | 					printf("         validate: array[%d] != valid[%d]. (%s vs %s) %s\n", cnt, cnt, (char *) ptsa[cnt], (char *) ptsv[cnt], !strcmp((char *) ptsa[cnt], (char *) ptsv[cnt]) ? "\e[1;31munstable\e[0m" : "");
 536 | 					break;
 537 | 				}
 538 | 				if (cmpf == cmp_long_ptr)
 539 | 				{
 540 | 					long long **ptla = (long long **) array;
 541 | 					long long **ptlv = (long long **) valid;
 542 | 
 543 | 					printf("         validate: array[%d] != valid[%d]. (%lld vs %lld) %s\n", cnt, cnt, *ptla[cnt], *ptlv[cnt], (*ptla[cnt] == *ptlv[cnt]) ? "\e[1;31munstable\e[0m" : "");
 544 | 					break;
 545 | 				}
 546 | 				if (cmpf == cmp_int_ptr)
 547 | 				{
 548 | 					int **ptia = (int **) array;
 549 | 					int **ptiv = (int **) valid;
 550 | 
 551 | 					printf("         validate: array[%d] != valid[%d]. (%d vs %d) %s\n", cnt, cnt, *ptia[cnt], *ptiv[cnt], (*ptia[cnt] == *ptiv[cnt]) ? "\e[1;31munstable\e[0m" : "");
 552 | 					break;
 553 | 				}
 554 | 
 555 | 				printf("         validate: array[%d] != valid[%d]. (%lld vs %lld)\n", cnt, cnt, ptla[cnt], ptlv[cnt]);
 556 | 				break;
 557 | 			}
 558 | 		}
 559 | 		else if (size == sizeof(long double))
 560 | 		{
 561 | 			if (ptda[cnt] != ptdv[cnt])
 562 | 			{
 563 | 				printf("         validate: array[%d] != valid[%d]. (%Lf vs %Lf)\n", cnt, cnt, ptda[cnt], ptdv[cnt]);
 564 | 				break;
 565 | 			}
 566 | 		}
 567 | 	}
 568 | }
 569 | 
 570 | void validate()
 571 | {
 572 | 	int seed = time(NULL);
 573 | 	int cnt, val, max = 1000;
 574 | 
 575 | 	int *a_array, *r_array, *v_array;
 576 | 
 577 | 	seed_rand(seed);
 578 | 
 579 | 	a_array = (int *) malloc(max * sizeof(int));
 580 | 	r_array = (int *) malloc(max * sizeof(int));
 581 | 	v_array = (int *) malloc(max * sizeof(int));
 582 | 
 583 | 	for (cnt = 0 ; cnt < max ; cnt++) r_array[cnt] = rand();
 584 | 
 585 | 	for (cnt = 0 ; cnt < max ; cnt++)
 586 | 	{
 587 | 		memcpy(a_array, r_array, cnt * sizeof(int));
 588 | 		memcpy(v_array, r_array, cnt * sizeof(int));
 589 | 
 590 | 		quadsort_prim(a_array, cnt, sizeof(int));
 591 | 		qsort(v_array, cnt, sizeof(int), cmp_int);
 592 | 
 593 | 		for (val = 0 ; val < cnt ; val++)
 594 | 		{
 595 | 			if (val && v_array[val - 1] > v_array[val]) {printf("\e[1;31mvalidate rand: seed %d: size: %d Not properly sorted at index %d.\n", seed, cnt, val); return;}
 596 | 			if (a_array[val] != v_array[val])           {printf("\e[1;31mvalidate rand: seed %d: size: %d Not verified at index %d.\n", seed, cnt, val); return;}
 597 | 		}
 598 | 	}
 599 | 
 600 | 	// ascending saw
 601 | 
 602 | 	for (cnt = 0 ; cnt < max ; cnt++) r_array[cnt] = cnt % (max / 5);
 603 | 
 604 | 	for (cnt = 0 ; cnt < max ; cnt += 7)
 605 | 	{
 606 | 		memcpy(a_array, r_array, cnt * sizeof(int));
 607 | 		memcpy(v_array, r_array, cnt * sizeof(int));
 608 | 
 609 | 		quadsort(a_array, cnt, sizeof(int), cmp_int);
 610 | 		qsort(v_array, cnt, sizeof(int), cmp_int);
 611 | 
 612 | 		for (val = 0 ; val < cnt ; val++)
 613 | 		{
 614 | 			if (val && v_array[val - 1] > v_array[val]) {printf("\e[1;31mvalidate ascending saw: seed %d: size: %d Not properly sorted at index %d.\n", seed, cnt, val); return;}
 615 | 			if (a_array[val] != v_array[val])           {printf("\e[1;31mvalidate ascending saw: seed %d: size: %d Not verified at index %d.\n", seed, cnt, val); return;}
 616 | 		}
 617 | 	}
 618 | 
 619 | 	// descending saw
 620 | 
 621 | 	for (cnt = 0 ; cnt < max ; cnt++)
 622 | 	{
 623 | 		r_array[cnt] = (max - cnt + 1) % (max / 11);
 624 | 	}
 625 | 
 626 | 	for (cnt = 1 ; cnt < max ; cnt += 7)
 627 | 	{
 628 | 		memcpy(a_array, r_array, cnt * sizeof(int));
 629 | 		memcpy(v_array, r_array, cnt * sizeof(int));
 630 | 
 631 | 		quadsort(a_array, cnt, sizeof(int), cmp_int);
 632 | 		qsort(v_array, cnt, sizeof(int), cmp_int);
 633 | 
 634 | 		for (val = 0 ; val < cnt ; val++)
 635 | 		{
 636 | 			if (val && v_array[val - 1] > v_array[val]) {printf("\e[1;31mvalidate descending saw: seed %d: size: %d Not properly sorted at index %d.\n\n", seed, cnt, val); return;}
 637 | 			if (a_array[val] != v_array[val])           {printf("\e[1;31mvalidate descending saw: seed %d: size: %d Not verified at index %d.\n\n", seed, cnt, val); return;}
 638 | 		}
 639 | 	}
 640 | 
 641 | 	// random half
 642 | 
 643 | 	for (cnt = 0 ; cnt < max ; cnt++) r_array[cnt] = (cnt < max / 2) ? cnt : rand();
 644 | 
 645 | 	for (cnt = 1 ; cnt < max ; cnt += 7)
 646 | 	{
 647 | 		memcpy(a_array, r_array, cnt * sizeof(int));
 648 | 		memcpy(v_array, r_array, cnt * sizeof(int));
 649 | 
 650 | 		quadsort(a_array, cnt, sizeof(int), cmp_int);
 651 | 		qsort(v_array, cnt, sizeof(int), cmp_int);
 652 | 
 653 | 		for (val = 0 ; val < cnt ; val++)
 654 | 		{
 655 | 			if (val && v_array[val - 1] > v_array[val]) {printf("\e[1;31mvalidate rand tail: seed %d: size: %d Not properly sorted at index %d.\n", seed, cnt, val); return;}
 656 | 			if (a_array[val] != v_array[val])           {printf("\e[1;31mvalidate rand tail: seed %d: size: %d Not verified at index %d.\n", seed, cnt, val); return;}
 657 | 		}
 658 | 	}
 659 | 	free(a_array);
 660 | 	free(r_array);
 661 | 	free(v_array);
 662 | }
 663 | 
 664 | unsigned int bit_reverse(unsigned int x)
 665 | {
 666 |     x = (((x & 0xaaaaaaaa) >> 1) | ((x & 0x55555555) << 1));
 667 |     x = (((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2));
 668 |     x = (((x & 0xf0f0f0f0) >> 4) | ((x & 0x0f0f0f0f) << 4));
 669 |     x = (((x & 0xff00ff00) >> 8) | ((x & 0x00ff00ff) << 8));
 670 | 
 671 |     return((x >> 16) | (x << 16));
 672 | }
 673 | 
 674 | void run_test(void *a_array, void *r_array, void *v_array, int minimum, int maximum, int samples, int repetitions, int copies, const char *desc, size_t size, CMPFUNC *cmpf)
 675 | {
 676 | 	int cnt, rep;
 677 | 
 678 | 	memcpy(v_array, r_array, maximum * size);
 679 | 
 680 | 	for (rep = 0 ; rep < copies ; rep++)
 681 | 	{
 682 | 		memcpy((char *) r_array + rep * maximum * size, v_array, maximum * size);
 683 | 	}
 684 | 	quadsort(v_array, maximum, size, cmpf);
 685 | 
 686 | 	for (cnt = 0 ; (size_t) cnt < sizeof(sorts) / sizeof(char *) ; cnt++)
 687 | 	{
 688 | 		test_sort(a_array, r_array, v_array, minimum, maximum, samples, repetitions, qsort, sorts[cnt], desc, size, cmpf);
 689 | 	}
 690 | }
 691 | 
 692 | void range_test(int max, int samples, int repetitions, int seed)
 693 | {
 694 | 	int cnt, last;
 695 | 	int mem = max * 10 > 32768 * 64 ? max * 10 : 32768 * 64;
 696 | 	char dist[40];
 697 | 
 698 | 	int *a_array = (int *) malloc(max * sizeof(int));
 699 | 	int *r_array = (int *) malloc(mem * sizeof(int));
 700 | 	int *v_array = (int *) malloc(max * sizeof(int));
 701 | 
 702 | 	srand(seed);
 703 | 
 704 | 	for (cnt = 0 ; cnt < mem ; cnt++)
 705 | 	{
 706 | 		r_array[cnt] = rand();
 707 | 	}
 708 | 
 709 | 	if (max <= 4096)
 710 | 	{
 711 | 		for (last = 1, samples = 32768*4, repetitions = 4 ; repetitions <= max ; repetitions *= 2, samples /= 2)
 712 | 		{
 713 | 			if (max >= repetitions)
 714 | 			{
 715 | 				sprintf(dist, "random %d-%d", last, repetitions);
 716 | 
 717 | 				memcpy(v_array, r_array, repetitions * sizeof(int));
 718 | 				quadsort(v_array, repetitions, sizeof(int), cmp_int);
 719 | 
 720 | 				for (cnt = 0 ; (size_t) cnt < sizeof(sorts) / sizeof(char *) ; cnt++)
 721 | 				{
 722 | 					test_sort(a_array, r_array, v_array, last, repetitions, 50, samples, qsort, sorts[cnt], dist, sizeof(int), cmp_int);
 723 | 				}
 724 | 				last = repetitions + 1;
 725 | 			}
 726 | 		}
 727 | 		free(a_array);
 728 | 		free(r_array);
 729 | 		free(v_array);
 730 | 		return;
 731 | 	}
 732 | 
 733 | 	if (max == 10000000)
 734 | 	{
 735 | 		repetitions = 10000000;
 736 | 
 737 | 		for (max = 10 ; max <= 10000000 ; max *= 10)
 738 | 		{
 739 | 			repetitions /= 10;
 740 | 
 741 | 			memcpy(v_array, r_array, max * sizeof(int));
 742 | 			quadsort_prim(v_array, max, sizeof(int));
 743 | 
 744 | 			sprintf(dist, "random %d", max);
 745 | 
 746 | 			for (cnt = 0 ; (size_t) cnt < sizeof(sorts) / sizeof(char *) ; cnt++)
 747 | 			{
 748 | 				test_sort(a_array, r_array, v_array, max, max, 10, repetitions, qsort, sorts[cnt], dist, sizeof(int), cmp_int);
 749 | 			}
 750 | 		}
 751 | 	}
 752 | 	else
 753 | 	{
 754 | 		for (samples = 32768*4, repetitions = 4 ; samples > 0 ; repetitions *= 2, samples /= 2)
 755 | 		{
 756 | 			if (max >= repetitions)
 757 | 			{
 758 | 				memcpy(v_array, r_array, repetitions * sizeof(int));
 759 | 				quadsort(v_array, repetitions, sizeof(int), cmp_int);
 760 | 
 761 | 				sprintf(dist, "random %d", repetitions);
 762 | 
 763 | 				for (cnt = 0 ; (size_t) cnt < sizeof(sorts) / sizeof(char *) ; cnt++)
 764 | 				{
 765 | 					test_sort(a_array, r_array, v_array, repetitions, repetitions, 100, samples, qsort, sorts[cnt], dist, sizeof(int), cmp_int);
 766 | 				}
 767 | 			}
 768 | 		}
 769 | 	}
 770 | 	free(a_array);
 771 | 	free(r_array);
 772 | 	free(v_array);
 773 | 	return;
 774 | }
 775 | 
 776 | #define VAR int
 777 | 
 778 | int main(int argc, char **argv)
 779 | {
 780 | 	int max = 100000;
 781 | 	int samples = 10;
 782 | 	int repetitions = 1;
 783 | 	int seed = 0;
 784 | 	int cnt, mem;
 785 | 	VAR *a_array, *r_array, *v_array, sum;
 786 | 
 787 | 	if (argc >= 1 && argv[1] && *argv[1])
 788 | 	{
 789 | 		max = atoi(argv[1]);
 790 | 	}
 791 | 
 792 | 	if (argc >= 2 && argv[2] && *argv[2])
 793 | 	{
 794 | 		samples = atoi(argv[2]);
 795 | 	}
 796 | 
 797 | 	if (argc >= 3 && argv[3] && *argv[3])
 798 | 	{
 799 | 		repetitions = atoi(argv[3]);
 800 | 	}
 801 | 
 802 | 	if (argc >= 4 && argv[4] && *argv[4])
 803 | 	{
 804 | 		seed = atoi(argv[4]);
 805 | 	}
 806 | 
 807 | 	validate();
 808 | 
 809 | 	seed = seed ? seed : time(NULL);
 810 | 
 811 | 	printf("Info: int = %zu, long long = %zu, long double = %zu\n\n", sizeof(int) * 8, sizeof(long long) * 8, sizeof(long double) * 8);
 812 | 
 813 | 	printf("Benchmark: array size: %d, samples: %d, repetitions: %d, seed: %d\n\n", max, samples, repetitions, seed);
 814 | 
 815 | 	if (repetitions == 0)
 816 | 	{
 817 | 		range_test(max, samples, repetitions, seed);
 818 | 		return 0;
 819 | 	}
 820 | 
 821 | 	mem = max * repetitions;
 822 | 
 823 | #ifndef SKIP_STRINGS
 824 | #ifndef cmp
 825 | 
 826 | 	// C string
 827 | 
 828 | 	{
 829 | 		char **sa_array = (char **) malloc(max * sizeof(char **));
 830 | 		char **sr_array = (char **) malloc(mem * sizeof(char **));
 831 | 		char **sv_array = (char **) malloc(max * sizeof(char **));
 832 | 
 833 | 		char *buffer = (char *) malloc(mem * 16);
 834 | 
 835 | 		seed_rand(seed);
 836 | 
 837 | 		for (cnt = 0 ; cnt < mem ; cnt++)
 838 | 		{
 839 | 			sprintf(buffer + cnt * 16, "%X", rand() % 1000000);
 840 | 
 841 | 			sr_array[cnt] = buffer + cnt * 16;
 842 | 		}
 843 | 		run_test(sa_array, sr_array, sv_array, max, max, samples, repetitions, 0, "random string", sizeof(char **), cmp_str);
 844 | 
 845 | 		free(sa_array);
 846 | 		free(sr_array);
 847 | 		free(sv_array);
 848 | 
 849 | 		free(buffer);
 850 | 	}
 851 | 
 852 | 	// long double table
 853 | 
 854 | 	{
 855 | 		long double **da_array = (long double **) malloc(max * sizeof(long double *));
 856 | 		long double **dr_array = (long double **) malloc(mem * sizeof(long double *));
 857 | 		long double **dv_array = (long double **) malloc(max * sizeof(long double *));
 858 | 
 859 | 		long double *buffer = (long double *) malloc(mem * sizeof(long double));
 860 | 
 861 | 		if (da_array == NULL || dr_array == NULL || dv_array == NULL || buffer == NULL)
 862 | 		{
 863 | 			printf("main(%d,%d,%d): malloc: %s\n", max, samples, repetitions, strerror(errno));
 864 | 
 865 | 			return 0;
 866 | 		}
 867 | 
 868 | 		seed_rand(seed);
 869 | 
 870 | 		for (cnt = 0 ; cnt < mem ; cnt++)
 871 | 		{
 872 | 			buffer[cnt] = (long double) rand();
 873 | 			buffer[cnt] += (long double) ((unsigned long long) rand() << 32ULL);
 874 | 
 875 | 			dr_array[cnt] = buffer + cnt;
 876 | 		}
 877 | 		run_test(da_array, dr_array, dv_array, max, max, samples, repetitions, 0, "random double", sizeof(long double *), cmp_long_double_ptr);
 878 | 
 879 | 		free(da_array);
 880 | 		free(dr_array);
 881 | 		free(dv_array);
 882 | 
 883 | 		free(buffer);
 884 | 	}
 885 | 
 886 | 	// long long table
 887 | 
 888 | 	{
 889 | 		long long **la_array = (long long **) malloc(max * sizeof(long long *));
 890 | 		long long **lr_array = (long long **) malloc(mem * sizeof(long long *));
 891 | 		long long **lv_array = (long long **) malloc(max * sizeof(long long *));
 892 | 
 893 | 		long long *buffer = (long long *) malloc(mem * sizeof(long long));
 894 | 
 895 | 		if (la_array == NULL || lr_array == NULL || lv_array == NULL || buffer == NULL)
 896 | 		{
 897 | 			printf("main(%d,%d,%d): malloc: %s\n", max, samples, repetitions, strerror(errno));
 898 | 
 899 | 			return 0;
 900 | 		}
 901 | 
 902 | 		seed_rand(seed);
 903 | 
 904 | 		for (cnt = 0 ; cnt < mem ; cnt++)
 905 | 		{
 906 | 			buffer[cnt] = (long long) rand();
 907 | 			buffer[cnt] += (long long) ((unsigned long long) rand() << 32ULL);
 908 | 
 909 | 			lr_array[cnt] = buffer + cnt;
 910 | 		}
 911 | 		run_test(la_array, lr_array, lv_array, max, max, samples, repetitions, 0, "random long", sizeof(long long *), cmp_long_ptr);
 912 | 
 913 | 
 914 | 		free(la_array);
 915 | 		free(lr_array);
 916 | 		free(lv_array);
 917 | 
 918 | 		free(buffer);
 919 | 	}
 920 | 
 921 | 	// int table
 922 | 
 923 | 	{
 924 | 		int **la_array = (int **) malloc(max * sizeof(int *));
 925 | 		int **lr_array = (int **) malloc(mem * sizeof(int *));
 926 | 		int **lv_array = (int **) malloc(max * sizeof(int *));
 927 | 
 928 | 		int *buffer = (int *) malloc(mem * sizeof(int));
 929 | 
 930 | 		if (la_array == NULL || lr_array == NULL || lv_array == NULL || buffer == NULL)
 931 | 		{
 932 | 			printf("main(%d,%d,%d): malloc: %s\n", max, samples, repetitions, strerror(errno));
 933 | 
 934 | 			return 0;
 935 | 		}
 936 | 
 937 | 		seed_rand(seed);
 938 | 
 939 | 		for (cnt = 0 ; cnt < mem ; cnt++)
 940 | 		{
 941 | 			buffer[cnt] = rand();
 942 | 
 943 | 			lr_array[cnt] = buffer + cnt;
 944 | 		}
 945 | 		run_test(la_array, lr_array, lv_array, max, max, samples, repetitions, 0, "random int", sizeof(int *), cmp_int_ptr);
 946 | 
 947 | 		free(la_array);
 948 | 		free(lr_array);
 949 | 		free(lv_array);
 950 | 
 951 | 		free(buffer);
 952 | 
 953 | 		printf("\n");
 954 | 	}
 955 | #endif
 956 | #endif
 957 | 	// 128 bit
 958 | 
 959 | #ifndef SKIP_DOUBLES
 960 | 	long double *da_array = (long double *) malloc(max * sizeof(long double));
 961 | 	long double *dr_array = (long double *) malloc(mem * sizeof(long double));
 962 | 	long double *dv_array = (long double *) malloc(max * sizeof(long double));
 963 | 
 964 | 	if (da_array == NULL || dr_array == NULL || dv_array == NULL)
 965 | 	{
 966 | 		printf("main(%d,%d,%d): malloc: %s\n", max, samples, repetitions, strerror(errno));
 967 | 
 968 | 		return 0;
 969 | 	}
 970 | 
 971 | 	seed_rand(seed);
 972 | 
 973 | 	for (cnt = 0 ; cnt < mem ; cnt++)
 974 | 	{
 975 | 		dr_array[cnt] = (long double) rand();
 976 | 		dr_array[cnt] += (long double) ((unsigned long long) rand() << 32ULL);
 977 | 		dr_array[cnt] += 1.0L / 3.0L;
 978 | 	}
 979 | 
 980 | 	memcpy(dv_array, dr_array, max * sizeof(long double));
 981 | 	quadsort(dv_array, max, sizeof(long double), cmp_long_double);
 982 | 
 983 | 	for (cnt = 0 ; (size_t) cnt < sizeof(sorts) / sizeof(char *) ; cnt++)
 984 | 	{
 985 | 		test_sort(da_array, dr_array, dv_array, max, max, samples, repetitions, qsort, sorts[cnt], "random order", sizeof(long double), cmp_long_double);
 986 | 	}
 987 | #ifndef cmp
 988 | #ifdef QUADSORT_H
 989 | 	test_sort(da_array, dr_array, dv_array, max, max, samples, repetitions, qsort, "s_quadsort", "random order", sizeof(long double), cmp_long_double_ptr);
 990 | #endif
 991 | #endif
 992 | 	free(da_array);
 993 | 	free(dr_array);
 994 | 	free(dv_array);
 995 | 
 996 | 	printf("\n");
 997 | #endif
 998 | 	// 64 bit
 999 | 
1000 | #ifndef SKIP_LONGS
1001 | 	long long *la_array = (long long *) malloc(max * sizeof(long long));
1002 | 	long long *lr_array = (long long *) malloc(mem * sizeof(long long));
1003 | 	long long *lv_array = (long long *) malloc(max * sizeof(long long));
1004 | 
1005 | 	if (la_array == NULL || lr_array == NULL || lv_array == NULL)
1006 | 	{
1007 | 		printf("main(%d,%d,%d): malloc: %s\n", max, samples, repetitions, strerror(errno));
1008 | 
1009 | 		return 0;
1010 | 	}
1011 | 
1012 | 	seed_rand(seed);
1013 | 
1014 | 	for (cnt = 0 ; cnt < mem ; cnt++)
1015 | 	{
1016 | 		lr_array[cnt] = rand();
1017 | 		lr_array[cnt] += (unsigned long long) rand() << 32ULL;
1018 | 	}
1019 | 
1020 | 	memcpy(lv_array, lr_array, max * sizeof(long long));
1021 | 	quadsort(lv_array, max, sizeof(long long), cmp_long);
1022 | 
1023 | 	for (cnt = 0 ; (size_t) cnt < sizeof(sorts) / sizeof(char *) ; cnt++)
1024 | 	{
1025 | 		test_sort(la_array, lr_array, lv_array, max, max, samples, repetitions, qsort, sorts[cnt], "random order", sizeof(long long), cmp_long);
1026 | 	}
1027 | 
1028 | 	free(la_array);
1029 | 	free(lr_array);
1030 | 	free(lv_array);
1031 | 
1032 | 	printf("\n");
1033 | #endif
1034 | 	// 32 bit
1035 | 
1036 | 	a_array = (VAR *) malloc(max * sizeof(VAR));
1037 | 	r_array = (VAR *) malloc(mem * sizeof(VAR));
1038 | 	v_array = (VAR *) malloc(max * sizeof(VAR));
1039 | 
1040 | 	int quad0 = 0;
1041 | 	int nmemb = max;
1042 | 	int half1 = nmemb / 2;
1043 | 	int half2 = nmemb - half1;
1044 | 	int quad1 = half1 / 2;
1045 | 	int quad2 = half1 - quad1;
1046 | 	int quad3 = half2 / 2;
1047 | 	int quad4 = half2 - quad3;
1048 | 
1049 | 	int span3 = quad1 + quad2 + quad3;
1050 | 
1051 | 	// random
1052 | 
1053 | 	seed_rand(seed);
1054 | 
1055 | 	for (cnt = 0 ; cnt < mem ; cnt++)
1056 | 	{
1057 | 		r_array[cnt] = rand();
1058 | 	}
1059 | 	run_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, "random order", sizeof(VAR), cmp_int);
1060 | 
1061 | 	// random % 100
1062 | 
1063 | 	for (cnt = 0 ; cnt < mem ; cnt++)
1064 | 	{
1065 | 		r_array[cnt] = rand() % 100;
1066 | 	}
1067 | 	run_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, "random % 100", sizeof(VAR), cmp_int);
1068 | 
1069 | 	// ascending
1070 | 
1071 | 	for (cnt = sum = 0 ; cnt < mem ; cnt++)
1072 | 	{
1073 | 		r_array[cnt] = sum; sum += rand() % 5;
1074 | 	}
1075 | 
1076 | 	run_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, "ascending order", sizeof(VAR), cmp_int);
1077 | 
1078 | 	// ascending saw
1079 | 
1080 | 	for (cnt = 0 ; cnt < max ; cnt++)
1081 | 	{
1082 | 		r_array[cnt] = rand();
1083 | 	}
1084 | 
1085 | 	quadsort(r_array + quad0, quad1, sizeof(VAR), cmp_int);
1086 | 	quadsort(r_array + quad1, quad2, sizeof(VAR), cmp_int);
1087 | 	quadsort(r_array + half1, quad3, sizeof(VAR), cmp_int);
1088 | 	quadsort(r_array + span3, quad4, sizeof(VAR), cmp_int);
1089 | 
1090 | 	run_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, "ascending saw", sizeof(VAR), cmp_int);
1091 | 
1092 | 	// pipe organ
1093 | 
1094 | 	for (cnt = 0 ; cnt < max ; cnt++)
1095 | 	{
1096 | 		r_array[cnt] = rand();
1097 | 	}
1098 | 
1099 | 	quadsort(r_array + quad0, half1, sizeof(VAR), cmp_int);
1100 | 	qsort(r_array + half1, half2, sizeof(VAR), cmp_rev);
1101 | 
1102 | 	for (cnt = half1 + 1 ; cnt < max ; cnt++)
1103 | 	{
1104 | 		if (r_array[cnt] >= r_array[cnt - 1])
1105 | 		{
1106 | 			r_array[cnt] = r_array[cnt - 1] - 1; // guarantee the run is strictly descending
1107 | 		}
1108 | 	}
1109 | 
1110 | 	run_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, "pipe organ", sizeof(VAR), cmp_int);
1111 | 
1112 | 	// descending
1113 | 
1114 | 	for (cnt = 0, sum = mem * 10 ; cnt < mem ; cnt++)
1115 | 	{
1116 | 		r_array[cnt] = sum; sum -= 1 + rand() % 5;
1117 | 	}
1118 | 	run_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, "descending order", sizeof(VAR), cmp_int);
1119 | 
1120 | 	// descending saw
1121 | 
1122 | 	for (cnt = 0 ; cnt < max ; cnt++)
1123 | 	{
1124 | 		r_array[cnt] = rand();
1125 | 	}
1126 | 
1127 | 	qsort(r_array + quad0, quad1, sizeof(VAR), cmp_rev);
1128 | 	qsort(r_array + quad1, quad2, sizeof(VAR), cmp_rev);
1129 | 	qsort(r_array + half1, quad3, sizeof(VAR), cmp_rev);
1130 | 	qsort(r_array + span3, quad4, sizeof(VAR), cmp_rev);
1131 | 
1132 | 	for (cnt = 1 ; cnt < max ; cnt++)
1133 | 	{
1134 | 		if (cnt == quad1 || cnt == half1 || cnt == span3) continue;
1135 | 
1136 | 		if (r_array[cnt] >= r_array[cnt - 1])
1137 | 		{
1138 | 			r_array[cnt] = r_array[cnt - 1] - 1; // guarantee the run is strictly descending
1139 | 		}
1140 | 	}
1141 | 
1142 | 	run_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, "descending saw", sizeof(VAR), cmp_int);
1143 | 
1144 | 
1145 | 	// random tail 25%
1146 | 
1147 | 	for (cnt = 0 ; cnt < max ; cnt++)
1148 | 	{
1149 | 		r_array[cnt] = rand();
1150 | 	}
1151 | 	quadsort(r_array, span3, sizeof(VAR), cmp_int);
1152 | 
1153 | 	run_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, "random tail", sizeof(VAR), cmp_int);
1154 | 
1155 | 	// random tail 50%
1156 | 
1157 | 	for (cnt = 0 ; cnt < max ; cnt++)
1158 | 	{
1159 | 		r_array[cnt] = rand();
1160 | 	}
1161 | 	quadsort(r_array, half1, sizeof(VAR), cmp_int);
1162 | 
1163 | 	run_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, "random half", sizeof(VAR), cmp_int);
1164 | 
1165 | 	// tiles
1166 | 
1167 | 	for (cnt = 0 ; cnt < mem ; cnt++)
1168 | 	{
1169 | 		if (cnt % 2 == 0)
1170 | 		{
1171 | 			r_array[cnt] = 16777216 + cnt;
1172 | 		}
1173 | 		else
1174 | 		{
1175 | 			r_array[cnt] = 33554432 + cnt;
1176 | 		}
1177 | 	}
1178 | 	run_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, "ascending tiles", sizeof(VAR), cmp_int);
1179 | 
1180 | 	// bit-reversal
1181 | 
1182 | 	for (cnt = 0 ; cnt < mem ; cnt++)
1183 | 	{
1184 | 		r_array[cnt] = bit_reverse(cnt);
1185 | 	}
1186 | 	run_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, "bit reversal", sizeof(VAR), cmp_int);
1187 | 
1188 | #ifndef cmp
1189 |   #ifdef ANTIQSORT
1190 |     test_antiqsort;
1191 |   #endif
1192 | #endif
1193 | 
1194 | #define QUAD_DEBUG
1195 | #if __has_include("extra_tests.c")
1196 |   #include "extra_tests.c"
1197 | #endif
1198 | 
1199 | 	free(a_array);
1200 | 	free(r_array);
1201 | 	free(v_array);
1202 | 
1203 | 	return 0;
1204 | }
1205 | 


--------------------------------------------------------------------------------
/src/quadsort.c:
--------------------------------------------------------------------------------
   1 | // quadsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com
   2 | 
   3 | // the next seven functions are used for sorting 0 to 31 elements
   4 | 
   5 | void FUNC(parity_swap_four)(VAR *array, CMPFUNC *cmp)
   6 | {
   7 | 	VAR tmp, *pta = array;
   8 | 	size_t x;
   9 | 
  10 | 	branchless_swap(pta, tmp, x, cmp); pta += 2;
  11 | 	branchless_swap(pta, tmp, x, cmp); pta--;
  12 | 
  13 | 	if (cmp(pta, pta + 1) > 0)
  14 | 	{
  15 | 		tmp = pta[0]; pta[0] = pta[1]; pta[1] = tmp; pta--;
  16 | 
  17 | 		branchless_swap(pta, tmp, x, cmp); pta += 2;
  18 | 		branchless_swap(pta, tmp, x, cmp); pta--;
  19 | 		branchless_swap(pta, tmp, x, cmp);
  20 | 	}
  21 | }
  22 | 
  23 | void FUNC(parity_swap_five)(VAR *array, CMPFUNC *cmp)
  24 | {
  25 | 	VAR tmp, *pta = array;
  26 | 	size_t x, y;
  27 | 
  28 | 	branchless_swap(pta, tmp, x, cmp); pta += 2;
  29 | 	branchless_swap(pta, tmp, x, cmp); pta -= 1;
  30 | 	branchless_swap(pta, tmp, x, cmp); pta += 2;
  31 | 	branchless_swap(pta, tmp, y, cmp); pta = array;
  32 | 
  33 | 	if (x + y)
  34 | 	{
  35 | 		branchless_swap(pta, tmp, x, cmp); pta += 2;
  36 | 		branchless_swap(pta, tmp, x, cmp); pta -= 1;
  37 | 		branchless_swap(pta, tmp, x, cmp); pta += 2;
  38 | 		branchless_swap(pta, tmp, x, cmp); pta = array;
  39 | 		branchless_swap(pta, tmp, x, cmp); pta += 2;
  40 | 		branchless_swap(pta, tmp, x, cmp); pta -= 1;
  41 | 	}
  42 | }
  43 | 
  44 | void FUNC(parity_swap_six)(VAR *array, VAR *swap, CMPFUNC *cmp)
  45 | {
  46 | 	VAR tmp, *pta = array, *ptl, *ptr;
  47 | 	size_t x, y;
  48 | 
  49 | 	branchless_swap(pta, tmp, x, cmp); pta++;
  50 | 	branchless_swap(pta, tmp, x, cmp); pta += 3;
  51 | 	branchless_swap(pta, tmp, x, cmp); pta--;
  52 | 	branchless_swap(pta, tmp, x, cmp); pta = array;
  53 | 
  54 | 	if (cmp(pta + 2, pta + 3) <= 0)
  55 | 	{
  56 | 		branchless_swap(pta, tmp, x, cmp); pta += 4;
  57 | 		branchless_swap(pta, tmp, x, cmp);
  58 | 		return;
  59 | 	}
  60 | 	x = cmp(pta, pta + 1) > 0; y = !x; swap[0] = pta[x]; swap[1] = pta[y]; swap[2] = pta[2]; pta += 4;
  61 | 	x = cmp(pta, pta + 1) > 0; y = !x; swap[4] = pta[x]; swap[5] = pta[y]; swap[3] = pta[-1];
  62 | 
  63 | 	pta = array; ptl = swap; ptr = swap + 3;
  64 | 
  65 | 	head_branchless_merge(pta, x, ptl, ptr, cmp);
  66 | 	head_branchless_merge(pta, x, ptl, ptr, cmp);
  67 | 	head_branchless_merge(pta, x, ptl, ptr, cmp);
  68 | 
  69 | 	pta = array + 5; ptl = swap + 2; ptr = swap + 5;
  70 | 
  71 | 	tail_branchless_merge(pta, y, ptl, ptr, cmp);
  72 | 	tail_branchless_merge(pta, y, ptl, ptr, cmp);
  73 | 	*pta = cmp(ptl, ptr)  > 0 ? *ptl : *ptr;
  74 | }
  75 | 
  76 | void FUNC(parity_swap_seven)(VAR *array, VAR *swap, CMPFUNC *cmp)
  77 | {
  78 | 	VAR tmp, *pta = array, *ptl, *ptr;
  79 | 	size_t x, y;
  80 | 
  81 | 	branchless_swap(pta, tmp, x, cmp); pta += 2;
  82 | 	branchless_swap(pta, tmp, x, cmp); pta += 2;
  83 | 	branchless_swap(pta, tmp, x, cmp); pta -= 3;
  84 | 	branchless_swap(pta, tmp, y, cmp); pta += 2;
  85 | 	branchless_swap(pta, tmp, x, cmp); pta += 2; y += x;
  86 | 	branchless_swap(pta, tmp, x, cmp); pta -= 1; y += x;
  87 | 
  88 | 	if (y == 0) return;
  89 | 
  90 | 	branchless_swap(pta, tmp, x, cmp); pta = array;
  91 | 
  92 | 	x = cmp(pta, pta + 1) > 0; swap[0] = pta[x]; swap[1] = pta[!x]; swap[2] = pta[2]; pta += 3;
  93 | 	x = cmp(pta, pta + 1) > 0; swap[3] = pta[x]; swap[4] = pta[!x]; pta += 2;
  94 | 	x = cmp(pta, pta + 1) > 0; swap[5] = pta[x]; swap[6] = pta[!x];
  95 | 
  96 | 	pta = array; ptl = swap; ptr = swap + 3;
  97 | 
  98 | 	head_branchless_merge(pta, x, ptl, ptr, cmp);
  99 | 	head_branchless_merge(pta, x, ptl, ptr, cmp);
 100 | 	head_branchless_merge(pta, x, ptl, ptr, cmp);
 101 | 
 102 | 	pta = array + 6; ptl = swap + 2; ptr = swap + 6;
 103 | 
 104 | 	tail_branchless_merge(pta, y, ptl, ptr, cmp);
 105 | 	tail_branchless_merge(pta, y, ptl, ptr, cmp);
 106 | 	tail_branchless_merge(pta, y, ptl, ptr, cmp);
 107 | 	*pta = cmp(ptl, ptr) > 0 ? *ptl : *ptr;
 108 | }
 109 | 
 110 | void FUNC(tiny_sort)(VAR *array, VAR *swap, size_t nmemb, CMPFUNC *cmp)
 111 | {
 112 | 	VAR tmp;
 113 | 	size_t x;
 114 | 
 115 | 	switch (nmemb)
 116 | 	{
 117 | 		case 0:
 118 | 		case 1:
 119 | 			return;
 120 | 		case 2:
 121 | 			branchless_swap(array, tmp, x, cmp);
 122 | 			return;
 123 | 		case 3:
 124 | 			branchless_swap(array, tmp, x, cmp); array++;
 125 | 			branchless_swap(array, tmp, x, cmp); array--;
 126 | 			branchless_swap(array, tmp, x, cmp);
 127 | 			return;
 128 | 		case 4:
 129 | 			FUNC(parity_swap_four)(array, cmp);
 130 | 			return;
 131 | 		case 5:
 132 | 			FUNC(parity_swap_five)(array, cmp);
 133 | 			return;
 134 | 		case 6:
 135 | 			FUNC(parity_swap_six)(array, swap, cmp);
 136 | 			return;
 137 | 		case 7:
 138 | 			FUNC(parity_swap_seven)(array, swap, cmp);
 139 | 			return;
 140 | 	}
 141 | }
 142 | 
 143 | // left must be equal to right or one smaller than right
 144 | 
 145 | void FUNC(parity_merge)(VAR *dest, VAR *from, size_t left, size_t right, CMPFUNC *cmp)
 146 | {
 147 | 	VAR *ptl, *ptr, *tpl, *tpr, *tpd, *ptd;
 148 | #if !defined __clang__
 149 | 	size_t x, y;
 150 | #endif
 151 | 	ptl = from;
 152 | 	ptr = from + left;
 153 | 	ptd = dest;
 154 | 	tpl = ptr - 1;
 155 | 	tpr = tpl + right;
 156 | 	tpd = dest + left + right - 1;
 157 | 
 158 | 	if (left < right)
 159 | 	{
 160 | 		*ptd++ = cmp(ptl, ptr) <= 0 ? *ptl++ : *ptr++;
 161 | 	}
 162 | 	*ptd++ = cmp(ptl, ptr) <= 0 ? *ptl++ : *ptr++;
 163 | 
 164 | #if !defined cmp && !defined __clang__ // cache limit workaround for gcc
 165 | 	if (left > QUAD_CACHE)
 166 | 	{
 167 | 		while (--left)
 168 | 		{
 169 | 			*ptd++ = cmp(ptl, ptr) <= 0 ? *ptl++ : *ptr++;
 170 | 			*tpd-- = cmp(tpl, tpr)  > 0 ? *tpl-- : *tpr--;
 171 | 		}
 172 | 	}
 173 | 	else
 174 | #endif
 175 | 	{
 176 | 		while (--left)
 177 | 		{
 178 | 			head_branchless_merge(ptd, x, ptl, ptr, cmp);
 179 | 			tail_branchless_merge(tpd, y, tpl, tpr, cmp);
 180 | 		}
 181 | 	}
 182 | 	*tpd = cmp(tpl, tpr)  > 0 ? *tpl : *tpr;
 183 | }
 184 | 
 185 | void FUNC(tail_swap)(VAR *array, VAR *swap, size_t nmemb, CMPFUNC *cmp)
 186 | {
 187 | 	if (nmemb < 8)
 188 | 	{
 189 | 		FUNC(tiny_sort)(array, swap, nmemb, cmp);
 190 | 		return;
 191 | 	}
 192 | 	size_t quad1, quad2, quad3, quad4, half1, half2;
 193 | 
 194 | 	half1 = nmemb / 2;
 195 | 	quad1 = half1 / 2;
 196 | 	quad2 = half1 - quad1;
 197 | 	half2 = nmemb - half1;
 198 | 	quad3 = half2 / 2;
 199 | 	quad4 = half2 - quad3;
 200 | 
 201 | 	VAR *pta = array;
 202 | 
 203 | 	FUNC(tail_swap)(pta, swap, quad1, cmp); pta += quad1;
 204 | 	FUNC(tail_swap)(pta, swap, quad2, cmp); pta += quad2;
 205 | 	FUNC(tail_swap)(pta, swap, quad3, cmp); pta += quad3;
 206 | 	FUNC(tail_swap)(pta, swap, quad4, cmp);
 207 | 
 208 | 	if (cmp(array + quad1 - 1, array + quad1) <= 0 && cmp(array + half1 - 1, array + half1) <= 0 && cmp(pta - 1, pta) <= 0)
 209 | 	{
 210 | 		return;
 211 | 	}
 212 | 	FUNC(parity_merge)(swap, array, quad1, quad2, cmp);
 213 | 	FUNC(parity_merge)(swap + half1, array + half1, quad3, quad4, cmp);
 214 | 	FUNC(parity_merge)(array, swap, half1, half2, cmp);
 215 | }
 216 | 
 217 | // the next three functions create sorted blocks of 32 elements
 218 | 
 219 | void FUNC(quad_reversal)(VAR *pta, VAR *ptz)
 220 | {
 221 | 	VAR *ptb, *pty, tmp1, tmp2;
 222 | 
 223 | 	size_t loop = (ptz - pta) / 2;
 224 | 
 225 | 	ptb = pta + loop;
 226 | 	pty = ptz - loop;
 227 | 
 228 | 	if (loop % 2 == 0)
 229 | 	{
 230 | 		tmp2 = *ptb; *ptb-- = *pty; *pty++ = tmp2; loop--;
 231 | 	}
 232 | 
 233 | 	loop /= 2;
 234 | 
 235 | 	do
 236 | 	{
 237 | 		tmp1 = *pta; *pta++ = *ptz; *ptz-- = tmp1;
 238 | 		tmp2 = *ptb; *ptb-- = *pty; *pty++ = tmp2;
 239 | 	}
 240 | 	while (loop--);
 241 | }
 242 | 
 243 | void FUNC(quad_swap_merge)(VAR *array, VAR *swap, CMPFUNC *cmp)
 244 | {
 245 | 	VAR *pts, *ptl, *ptr;
 246 | #if !defined __clang__
 247 | 	size_t x;
 248 | #endif
 249 | 	parity_merge_two(array + 0, swap + 0, x, ptl, ptr, pts, cmp);
 250 | 	parity_merge_two(array + 4, swap + 4, x, ptl, ptr, pts, cmp);
 251 | 
 252 | 	parity_merge_four(swap, array, x, ptl, ptr, pts, cmp);
 253 | }
 254 | 
 255 | void FUNC(tail_merge)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t block, CMPFUNC *cmp);
 256 | 
 257 | size_t FUNC(quad_swap)(VAR *array, size_t nmemb, CMPFUNC *cmp)
 258 | {
 259 | 	VAR tmp, swap[32];
 260 | 	size_t count;
 261 | 	VAR *pta, *pts;
 262 | 	unsigned char v1, v2, v3, v4, x;
 263 | 	pta = array;
 264 | 
 265 | 	count = nmemb / 8;
 266 | 
 267 | 	while (count--)
 268 | 	{
 269 | 		v1 = cmp(pta + 0, pta + 1) > 0;
 270 | 		v2 = cmp(pta + 2, pta + 3) > 0;
 271 | 		v3 = cmp(pta + 4, pta + 5) > 0;
 272 | 		v4 = cmp(pta + 6, pta + 7) > 0;
 273 | 
 274 | 		switch (v1 + v2 * 2 + v3 * 4 + v4 * 8)
 275 | 		{
 276 | 			case 0:
 277 | 				if (cmp(pta + 1, pta + 2) <= 0 && cmp(pta + 3, pta + 4) <= 0 && cmp(pta + 5, pta + 6) <= 0)
 278 | 				{
 279 | 					goto ordered;
 280 | 				}
 281 | 				FUNC(quad_swap_merge)(pta, swap, cmp);
 282 | 				break;
 283 | 
 284 | 			case 15:
 285 | 				if (cmp(pta + 1, pta + 2) > 0 && cmp(pta + 3, pta + 4) > 0 && cmp(pta + 5, pta + 6) > 0)
 286 | 				{
 287 | 					pts = pta;
 288 | 					goto reversed;
 289 | 				}
 290 | 				// fall through into not_ordered
 291 | 			default:
 292 | 			not_ordered:
 293 | 				x = !v1; tmp = pta[x]; pta[0] = pta[v1]; pta[1] = tmp; pta += 2;
 294 | 				x = !v2; tmp = pta[x]; pta[0] = pta[v2]; pta[1] = tmp; pta += 2;
 295 | 				x = !v3; tmp = pta[x]; pta[0] = pta[v3]; pta[1] = tmp; pta += 2;
 296 | 				x = !v4; tmp = pta[x]; pta[0] = pta[v4]; pta[1] = tmp; pta -= 6;
 297 | 
 298 | 				FUNC(quad_swap_merge)(pta, swap, cmp);
 299 | 		}
 300 | 		pta += 8;
 301 | 
 302 | 		continue;
 303 | 
 304 | 		ordered:
 305 | 
 306 | 		pta += 8;
 307 | 
 308 | 		if (count--)
 309 | 		{
 310 | 			if ((v1 = cmp(pta + 0, pta + 1) > 0) | (v2 = cmp(pta + 2, pta + 3) > 0) | (v3 = cmp(pta + 4, pta + 5) > 0) | (v4 = cmp(pta + 6, pta + 7) > 0))
 311 | 			{
 312 | 				if (v1 + v2 + v3 + v4 == 4 && cmp(pta + 1, pta + 2) > 0 && cmp(pta + 3, pta + 4) > 0 && cmp(pta + 5, pta + 6) > 0)
 313 | 				{
 314 | 					pts = pta;
 315 | 					goto reversed;
 316 | 				}
 317 | 				goto not_ordered;
 318 | 			}
 319 | 			if (cmp(pta + 1, pta + 2) <= 0 && cmp(pta + 3, pta + 4) <= 0 && cmp(pta + 5, pta + 6) <= 0)
 320 | 			{
 321 | 				goto ordered;
 322 | 			}
 323 | 			FUNC(quad_swap_merge)(pta, swap, cmp);
 324 | 			pta += 8;
 325 | 			continue;
 326 | 		}
 327 | 		break;
 328 | 
 329 | 		reversed:
 330 | 
 331 | 		pta += 8;
 332 | 
 333 | 		if (count--)
 334 | 		{
 335 | 			if ((v1 = cmp(pta + 0, pta + 1) <= 0) | (v2 = cmp(pta + 2, pta + 3) <= 0) | (v3 = cmp(pta + 4, pta + 5) <= 0) | (v4 = cmp(pta + 6, pta + 7) <= 0))
 336 | 			{
 337 | 				// not reversed
 338 | 			}
 339 | 			else
 340 | 			{
 341 | 				if (cmp(pta - 1, pta) > 0 && cmp(pta + 1, pta + 2) > 0 && cmp(pta + 3, pta + 4) > 0 && cmp(pta + 5, pta + 6) > 0)
 342 | 				{
 343 | 					goto reversed;
 344 | 				}
 345 | 			}
 346 | 			FUNC(quad_reversal)(pts, pta - 1);
 347 | 
 348 | 			if (v1 + v2 + v3 + v4 == 4 && cmp(pta + 1, pta + 2) <= 0 && cmp(pta + 3, pta + 4) <= 0 && cmp(pta + 5, pta + 6) <= 0)
 349 | 			{
 350 | 				goto ordered;
 351 | 			}
 352 | 			if (v1 + v2 + v3 + v4 == 0 && cmp(pta + 1, pta + 2)  > 0 && cmp(pta + 3, pta + 4)  > 0 && cmp(pta + 5, pta + 6)  > 0)
 353 | 			{
 354 | 				pts = pta;
 355 | 				goto reversed;
 356 | 			}
 357 | 
 358 | 			x = !v1; tmp = pta[v1]; pta[0] = pta[x]; pta[1] = tmp; pta += 2;
 359 | 			x = !v2; tmp = pta[v2]; pta[0] = pta[x]; pta[1] = tmp; pta += 2;
 360 | 			x = !v3; tmp = pta[v3]; pta[0] = pta[x]; pta[1] = tmp; pta += 2;
 361 | 			x = !v4; tmp = pta[v4]; pta[0] = pta[x]; pta[1] = tmp; pta -= 6;
 362 | 
 363 | 			if (cmp(pta + 1, pta + 2) > 0 || cmp(pta + 3, pta + 4) > 0 || cmp(pta + 5, pta + 6) > 0)
 364 | 			{
 365 | 				FUNC(quad_swap_merge)(pta, swap, cmp);
 366 | 			}
 367 | 			pta += 8;
 368 | 			continue;
 369 | 		}
 370 | 
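 371 | 		switch (nmemb % 8) // cases fall through on purpose: the tail joins the reversal only if it continues the strictly descending run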
 372 | 		{
 373 | 			case 7: if (cmp(pta + 5, pta + 6) <= 0) break;
 374 | 			case 6: if (cmp(pta + 4, pta + 5) <= 0) break;
 375 | 			case 5: if (cmp(pta + 3, pta + 4) <= 0) break;
 376 | 			case 4: if (cmp(pta + 2, pta + 3) <= 0) break;
 377 | 			case 3: if (cmp(pta + 1, pta + 2) <= 0) break;
 378 | 			case 2: if (cmp(pta + 0, pta + 1) <= 0) break;
 379 | 			case 1: if (cmp(pta - 1, pta + 0) <= 0) break;
 380 | 			case 0:
 381 | 				FUNC(quad_reversal)(pts, pta + nmemb % 8 - 1);
 382 | 
 383 | 				if (pts == array)
 384 | 				{
 385 | 					return 1;
 386 | 				}
 387 | 				goto reverse_end;
 388 | 		}
 389 | 		FUNC(quad_reversal)(pts, pta - 1);
 390 | 		break;
 391 | 	}
 392 | 	FUNC(tail_swap)(pta, swap, nmemb % 8, cmp);
 393 | 
 394 | 	reverse_end:
 395 | 
 396 | 	pta = array;
 397 | 
 398 | 	for (count = nmemb / 32 ; count-- ; pta += 32)
 399 | 	{
 400 | 		if (cmp(pta + 7, pta + 8) <= 0 && cmp(pta + 15, pta + 16) <= 0 && cmp(pta + 23, pta + 24) <= 0)
 401 | 		{
 402 | 			continue;
 403 | 		}
 404 | 		FUNC(parity_merge)(swap, pta, 8, 8, cmp);
 405 | 		FUNC(parity_merge)(swap + 16, pta + 16, 8, 8, cmp);
 406 | 		FUNC(parity_merge)(pta, swap, 16, 16, cmp);
 407 | 	}
 408 | 
 409 | 	if (nmemb % 32 > 8)
 410 | 	{
 411 | 		FUNC(tail_merge)(pta, swap, 32, nmemb % 32, 8, cmp);
 412 | 	}
 413 | 	return 0;
 414 | }
 415 | 
 416 | // the next six functions are quad merge support routines
 417 | 
 418 | void FUNC(cross_merge)(VAR *dest, VAR *from, size_t left, size_t right, CMPFUNC *cmp)
 419 | {
 420 | 	VAR *ptl, *tpl, *ptr, *tpr, *ptd, *tpd;
 421 | 	size_t loop;
 422 | #if !defined __clang__
 423 | 	size_t x, y;
 424 | #endif
 425 | 	ptl = from;
 426 | 	ptr = from + left;
 427 | 	tpl = ptr - 1;
 428 | 	tpr = tpl + right;
 429 | 
 430 | 	if (left + 1 >= right && right >= left && left >= 32)
 431 | 	{
 432 | 		if (cmp(ptl + 15, ptr) > 0 && cmp(ptl, ptr + 15) <= 0 && cmp(tpl, tpr - 15) > 0 && cmp(tpl - 15, tpr) <= 0)
 433 | 		{
 434 | 			FUNC(parity_merge)(dest, from, left, right, cmp);
 435 | 			return;
 436 | 		}
 437 | 	}
 438 | 	ptd = dest;
 439 | 	tpd = dest + left + right - 1;
 440 | 
 441 | 	while (1)
 442 | 	{
 443 | 		if (tpl - ptl > 8)
 444 | 		{
 445 | 			ptl8_ptr: if (cmp(ptl + 7, ptr) <= 0)
 446 | 			{
 447 | 				memcpy(ptd, ptl, 8 * sizeof(VAR)); ptd += 8; ptl += 8;
 448 | 
 449 | 				if (tpl - ptl > 8) {goto ptl8_ptr;} continue;
 450 | 			}
 451 | 
 452 | 			tpl8_tpr: if (cmp(tpl - 7, tpr) > 0)
 453 | 			{
 454 | 				tpd -= 7; tpl -= 7; memcpy(tpd--, tpl--, 8 * sizeof(VAR));
 455 | 
 456 | 				if (tpl - ptl > 8) {goto tpl8_tpr;} continue;
 457 | 			}
 458 | 		}
 459 | 
 460 | 		if (tpr - ptr > 8)
 461 | 		{
 462 | 			ptl_ptr8: if (cmp(ptl, ptr + 7) > 0)
 463 | 			{
 464 | 				memcpy(ptd, ptr, 8 * sizeof(VAR)); ptd += 8; ptr += 8;
 465 | 
 466 | 				if (tpr - ptr > 8) {goto ptl_ptr8;} continue;
 467 | 			}
 468 | 
 469 | 			tpl_tpr8: if (cmp(tpl, tpr - 7) <= 0)
 470 | 			{
 471 | 				tpd -= 7; tpr -= 7; memcpy(tpd--, tpr--, 8 * sizeof(VAR));
 472 | 
 473 | 				if (tpr - ptr > 8) {goto tpl_tpr8;} continue;
 474 | 			}
 475 | 		}
 476 | 
 477 | 		if (tpd - ptd < 16)
 478 | 		{
 479 | 			break;
 480 | 		}
 481 | 
 482 | #if !defined cmp && !defined __clang__
 483 | 		if (left > QUAD_CACHE)
 484 | 		{
 485 | 			loop = 8; do
 486 | 			{
 487 | 				*ptd++ = cmp(ptl, ptr) <= 0 ? *ptl++ : *ptr++;
 488 | 				*tpd-- = cmp(tpl, tpr)  > 0 ? *tpl-- : *tpr--;
 489 | 			}
 490 | 			while (--loop);
 491 | 		}
 492 | 		else
 493 | #endif
 494 | 		{
 495 | 			loop = 8; do
 496 | 			{
 497 | 				head_branchless_merge(ptd, x, ptl, ptr, cmp);
 498 | 				tail_branchless_merge(tpd, y, tpl, tpr, cmp);
 499 | 			}
 500 | 			while (--loop);
 501 | 		}
 502 | 	}
 503 | 
 504 | 	while (ptl <= tpl && ptr <= tpr)
 505 | 	{
 506 | 		*ptd++ = cmp(ptl, ptr) <= 0 ? *ptl++ : *ptr++;
 507 | 	}
 508 | 	while (ptl <= tpl)
 509 | 	{
 510 | 		*ptd++ = *ptl++;
 511 | 	}
 512 | 	while (ptr <= tpr)
 513 | 	{
 514 | 		*ptd++ = *ptr++;
 515 | 	}
 516 | }
 517 | 
 518 | void FUNC(quad_merge_block)(VAR *array, VAR *swap, size_t block, CMPFUNC *cmp)
 519 | {
 520 | 	VAR *pt1, *pt2, *pt3;
 521 | 	size_t block_x_2 = block * 2;
 522 | 
 523 | 	pt1 = array + block;
 524 | 	pt2 = pt1 + block;
 525 | 	pt3 = pt2 + block;
 526 | 
 527 | 	switch ((cmp(pt1 - 1, pt1) <= 0) | (cmp(pt3 - 1, pt3) <= 0) * 2)
 528 | 	{
 529 | 		case 0:
 530 | 			FUNC(cross_merge)(swap, array, block, block, cmp);
 531 | 			FUNC(cross_merge)(swap + block_x_2, pt2, block, block, cmp);
 532 | 			break;
 533 | 		case 1:
 534 | 			memcpy(swap, array, block_x_2 * sizeof(VAR));
 535 | 			FUNC(cross_merge)(swap + block_x_2, pt2, block, block, cmp);
 536 | 			break;
 537 | 		case 2:
 538 | 			FUNC(cross_merge)(swap, array, block, block, cmp);
 539 | 			memcpy(swap + block_x_2, pt2, block_x_2 * sizeof(VAR));
 540 | 			break;
 541 | 		case 3:
 542 | 			if (cmp(pt2 - 1, pt2) <= 0)
 543 | 				return;
 544 | 			memcpy(swap, array, block_x_2 * 2 * sizeof(VAR));
 545 | 	}
 546 | 	FUNC(cross_merge)(array, swap, block_x_2, block_x_2, cmp);
 547 | }
 548 | 
 549 | size_t FUNC(quad_merge)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t block, CMPFUNC *cmp)
 550 | {
 551 | 	VAR *pta, *pte;
 552 | 
 553 | 	pte = array + nmemb;
 554 | 
 555 | 	block *= 4;
 556 | 
 557 | 	while (block <= nmemb && block <= swap_size)
 558 | 	{
 559 | 		pta = array;
 560 | 
 561 | 		do
 562 | 		{
 563 | 			FUNC(quad_merge_block)(pta, swap, block / 4, cmp);
 564 | 
 565 | 			pta += block;
 566 | 		}
 567 | 		while (pta + block <= pte);
 568 | 
 569 | 		FUNC(tail_merge)(pta, swap, swap_size, pte - pta, block / 4, cmp);
 570 | 
 571 | 		block *= 4;
 572 | 	}
 573 | 
 574 | 	FUNC(tail_merge)(array, swap, swap_size, nmemb, block / 4, cmp);
 575 | 
 576 | 	return block / 2;
 577 | }
 578 | 
 579 | void FUNC(partial_forward_merge)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t block, CMPFUNC *cmp)
 580 | {
 581 | 	VAR *ptl, *ptr, *tpl, *tpr;
 582 | 	size_t x;
 583 | 
 584 | 	if (nmemb == block)
 585 | 	{
 586 | 		return;
 587 | 	}
 588 | 
 589 | 	ptr = array + block;
 590 | 	tpr = array + nmemb - 1;
 591 | 
 592 | 	if (cmp(ptr - 1, ptr) <= 0)
 593 | 	{
 594 | 		return;
 595 | 	}
 596 | 
 597 | 	memcpy(swap, array, block * sizeof(VAR));
 598 | 
 599 | 	ptl = swap;
 600 | 	tpl = swap + block - 1;
 601 | 
 602 | 	while (ptl < tpl - 1 && ptr < tpr - 1)
 603 | 	{
 604 | 		ptr2: if (cmp(ptl, ptr + 1) > 0)
 605 | 		{
 606 | 			*array++ = *ptr++; *array++ = *ptr++;
 607 | 
 608 | 			if (ptr < tpr - 1) {goto ptr2;} break;
 609 | 		}
 610 | 		if (cmp(ptl + 1, ptr) <= 0)
 611 | 		{
 612 | 			*array++ = *ptl++; *array++ = *ptl++;
 613 | 
 614 | 			if (ptl < tpl - 1) {goto ptl2;} break;
 615 | 		}
 616 | 
 617 | 		goto cross_swap;
 618 | 
 619 | 		ptl2: if (cmp(ptl + 1, ptr) <= 0)
 620 | 		{
 621 | 			*array++ = *ptl++; *array++ = *ptl++;
 622 | 
 623 | 			if (ptl < tpl - 1) {goto ptl2;} break;
 624 | 		}
 625 | 
 626 | 		if (cmp(ptl, ptr + 1) > 0)
 627 | 		{
 628 | 			*array++ = *ptr++; *array++ = *ptr++;
 629 | 
 630 | 			if (ptr < tpr - 1) {goto ptr2;} break;
 631 | 		}
 632 | 
 633 | 		cross_swap:
 634 | 
 635 | 		x = cmp(ptl, ptr) <= 0; array[x] = *ptr; ptr += 1; array[!x] = *ptl; ptl += 1; array += 2;
 636 | 		head_branchless_merge(array, x, ptl, ptr, cmp);
 637 | 	}
 638 | 
 639 | 	while (ptl <= tpl && ptr <= tpr)
 640 | 	{
 641 | 		*array++ = cmp(ptl, ptr) <= 0 ? *ptl++ : *ptr++;
 642 | 	}
 643 | 
 644 | 	while (ptl <= tpl)
 645 | 	{
 646 | 		*array++ = *ptl++;
 647 | 	}
 648 | }
 649 | 
 650 | void FUNC(partial_backward_merge)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t block, CMPFUNC *cmp)
 651 | {
 652 | 	VAR *tpl, *tpa, *tpr;
 653 | 	size_t right, loop, x;
 654 | 
 655 | 	if (nmemb == block)
 656 | 	{
 657 | 		return;
 658 | 	}
 659 | 
 660 | 	tpl = array + block - 1;
 661 | 	tpa = array + nmemb - 1;
 662 | 
 663 | 	if (cmp(tpl, tpl + 1) <= 0)
 664 | 	{
 665 | 		return;
 666 | 	}
 667 | 
 668 | 	right = nmemb - block;
 669 | 
 670 | 	if (nmemb <= swap_size && right >= 64)
 671 | 	{
 672 | 		FUNC(cross_merge)(swap, array, block, right, cmp);
 673 | 
 674 | 		memcpy(array, swap, nmemb * sizeof(VAR));
 675 | 
 676 | 		return;
 677 | 	}
 678 | 
 679 | 	memcpy(swap, array + block, right * sizeof(VAR));
 680 | 
 681 | 	tpr = swap + right - 1;
 682 | 
 683 | 	while (tpl > array + 16 && tpr > swap + 16)
 684 | 	{
 685 | 		tpl_tpr16: if (cmp(tpl, tpr - 15) <= 0)
 686 | 		{
 687 | 			loop = 16; do *tpa-- = *tpr--; while (--loop);
 688 | 
 689 | 			if (tpr > swap + 16) {goto tpl_tpr16;} break;
 690 | 		}
 691 | 
 692 | 		tpl16_tpr: if (cmp(tpl - 15, tpr) > 0)
 693 | 		{
 694 | 			loop = 16; do *tpa-- = *tpl--; while (--loop);
 695 | 			
 696 | 			if (tpl > array + 16) {goto tpl16_tpr;} break;
 697 | 		}
 698 | 		loop = 8; do
 699 | 		{
 700 | 			if (cmp(tpl, tpr - 1) <= 0)
 701 | 			{
 702 | 				*tpa-- = *tpr--; *tpa-- = *tpr--;
 703 | 			}
 704 | 			else if (cmp(tpl - 1, tpr) > 0)
 705 | 			{
 706 | 				*tpa-- = *tpl--; *tpa-- = *tpl--;
 707 | 			}
 708 | 			else
 709 | 			{
 710 | 				x = cmp(tpl, tpr) <= 0; tpa--; tpa[x] = *tpr; tpr -= 1; tpa[!x] = *tpl; tpl -= 1; tpa--;
 711 | 				tail_branchless_merge(tpa, x, tpl, tpr, cmp);
 712 | 			}
 713 | 		}
 714 | 		while (--loop);
 715 | 	}
 716 | 
 717 | 	while (tpr > swap + 1 && tpl > array + 1)
 718 | 	{
 719 | 		tpr2: if (cmp(tpl, tpr - 1) <= 0)
 720 | 		{
 721 | 			*tpa-- = *tpr--; *tpa-- = *tpr--;
 722 | 			
 723 | 			if (tpr > swap + 1) {goto tpr2;} break;
 724 | 		}
 725 | 
 726 | 		if (cmp(tpl - 1, tpr) > 0)
 727 | 		{
 728 | 			*tpa-- = *tpl--; *tpa-- = *tpl--;
 729 | 
 730 | 			if (tpl > array + 1) {goto tpl2;} break;
 731 | 		}
 732 | 		goto cross_swap;
 733 | 
 734 | 		tpl2: if (cmp(tpl - 1, tpr) > 0)
 735 | 		{
 736 | 			*tpa-- = *tpl--; *tpa-- = *tpl--;
 737 | 
 738 | 			if (tpl > array + 1) {goto tpl2;} break;
 739 | 		}
 740 | 
 741 | 		if (cmp(tpl, tpr - 1) <= 0)
 742 | 		{
 743 | 			*tpa-- = *tpr--; *tpa-- = *tpr--;
 744 | 			
 745 | 			if (tpr > swap + 1) {goto tpr2;} break;
 746 | 		}
 747 | 		cross_swap:
 748 | 
 749 | 		x = cmp(tpl, tpr) <= 0; tpa--; tpa[x] = *tpr; tpr -= 1; tpa[!x] = *tpl; tpl -= 1; tpa--;
 750 | 		tail_branchless_merge(tpa, x, tpl, tpr, cmp);
 751 | 	}
 752 | 
 753 | 	while (tpr >= swap && tpl >= array)
 754 | 	{
 755 | 		*tpa-- = cmp(tpl, tpr) > 0 ? *tpl-- : *tpr--;
 756 | 	}
 757 | 
 758 | 	while (tpr >= swap)
 759 | 	{
 760 | 		*tpa-- = *tpr--;
 761 | 	}
 762 | }
 763 | 
 764 | void FUNC(tail_merge)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t block, CMPFUNC *cmp)
 765 | {
 766 | 	VAR *pta, *pte;
 767 | 
 768 | 	pte = array + nmemb;
 769 | 
 770 | 	while (block < nmemb && block <= swap_size)
 771 | 	{
 772 | 		for (pta = array ; pta + block < pte ; pta += block * 2)
 773 | 		{
 774 | 			if (pta + block * 2 < pte)
 775 | 			{
 776 | 				FUNC(partial_backward_merge)(pta, swap, swap_size, block * 2, block, cmp);
 777 | 
 778 | 				continue;
 779 | 			}
 780 | 			FUNC(partial_backward_merge)(pta, swap, swap_size, pte - pta, block, cmp);
 781 | 
 782 | 			break;
 783 | 		}
 784 | 		block *= 2;
 785 | 	}
 786 | }
 787 | 
 788 | // the next four functions provide in-place rotate merge support
 789 | 
 790 | void FUNC(trinity_rotation)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t left)
 791 | {
 792 | 	VAR temp;
 793 | 	size_t bridge, right = nmemb - left;
 794 | 
 795 | 	if (swap_size > 65536)
 796 | 	{
 797 | 		swap_size = 65536;
 798 | 	}
 799 | 
 800 | 	if (left < right)
 801 | 	{
 802 | 		if (left <= swap_size)
 803 | 		{
 804 | 			memcpy(swap, array, left * sizeof(VAR));
 805 | 			memmove(array, array + left, right * sizeof(VAR));
 806 | 			memcpy(array + right, swap, left * sizeof(VAR));
 807 | 		}
 808 | 		else
 809 | 		{
 810 | 			VAR *pta, *ptb, *ptc, *ptd;
 811 | 
 812 | 			pta = array;
 813 | 			ptb = pta + left;
 814 | 
 815 | 			bridge = right - left;
 816 | 
 817 | 			if (bridge <= swap_size && bridge > 3)
 818 | 			{
 819 | 				ptc = pta + right;
 820 | 				ptd = ptc + left;
 821 | 
 822 | 				memcpy(swap, ptb, bridge * sizeof(VAR));
 823 | 
 824 | 				while (left--)
 825 | 				{
 826 | 					*--ptc = *--ptd; *ptd = *--ptb;
 827 | 				}
 828 | 				memcpy(pta, swap, bridge * sizeof(VAR));
 829 | 			}
 830 | 			else
 831 | 			{
 832 | 				ptc = ptb;
 833 | 				ptd = ptc + right;
 834 | 
 835 | 				bridge = left / 2;
 836 | 
 837 | 				while (bridge--)
 838 | 				{
 839 | 					temp = *--ptb; *ptb = *pta; *pta++ = *ptc; *ptc++ = *--ptd; *ptd = temp;
 840 | 				}
 841 | 
 842 | 				bridge = (ptd - ptc) / 2;
 843 | 
 844 | 				while (bridge--)
 845 | 				{
 846 | 					temp = *ptc; *ptc++ = *--ptd; *ptd = *pta; *pta++ = temp;
 847 | 				}
 848 | 
 849 | 				bridge = (ptd - pta) / 2;
 850 | 
 851 | 				while (bridge--)
 852 | 				{
 853 | 					temp = *pta; *pta++ = *--ptd; *ptd = temp;
 854 | 				}
 855 | 			}
 856 | 		}
 857 | 	}
 858 | 	else if (right < left)
 859 | 	{
 860 | 		if (right <= swap_size)
 861 | 		{
 862 | 			memcpy(swap, array + left, right * sizeof(VAR));
 863 | 			memmove(array + right, array, left * sizeof(VAR));
 864 | 			memcpy(array, swap, right * sizeof(VAR));
 865 | 		}
 866 | 		else
 867 | 		{
 868 | 			VAR *pta, *ptb, *ptc, *ptd;
 869 | 
 870 | 			pta = array;
 871 | 			ptb = pta + left;
 872 | 
 873 | 			bridge = left - right;
 874 | 
 875 | 			if (bridge <= swap_size && bridge > 3)
 876 | 			{
 877 | 				ptc = pta + right;
 878 | 				ptd = ptc + left;
 879 | 
 880 | 				memcpy(swap, ptc, bridge * sizeof(VAR));
 881 | 
 882 | 				while (right--)
 883 | 				{
 884 | 					*ptc++ = *pta; *pta++ = *ptb++;
 885 | 				}
 886 | 				memcpy(ptd - bridge, swap, bridge * sizeof(VAR));
 887 | 			}
 888 | 			else
 889 | 			{
 890 | 				ptc = ptb;
 891 | 				ptd = ptc + right;
 892 | 
 893 | 				bridge = right / 2;
 894 | 
 895 | 				while (bridge--)
 896 | 				{
 897 | 					temp = *--ptb; *ptb = *pta; *pta++ = *ptc; *ptc++ = *--ptd; *ptd = temp;
 898 | 				}
 899 | 
 900 | 				bridge = (ptb - pta) / 2;
 901 | 
 902 | 				while (bridge--)
 903 | 				{
 904 | 					temp = *--ptb; *ptb = *pta; *pta++ = *--ptd; *ptd = temp;
 905 | 				}
 906 | 
 907 | 				bridge = (ptd - pta) / 2;
 908 | 
 909 | 				while (bridge--)
 910 | 				{
 911 | 					temp = *pta; *pta++ = *--ptd; *ptd = temp;
 912 | 				}
 913 | 			}
 914 | 		}
 915 | 	}
 916 | 	else
 917 | 	{
 918 | 		VAR *pta, *ptb;
 919 | 
 920 | 		pta = array;
 921 | 		ptb = pta + left;
 922 | 
 923 | 		while (left--)
 924 | 		{
 925 | 			temp = *pta; *pta++ = *ptb; *ptb++ = temp;
 926 | 		}
 927 | 	}
 928 | }
 929 | 
 930 | size_t FUNC(monobound_binary_first)(VAR *array, VAR *value, size_t top, CMPFUNC *cmp)
 931 | {
 932 | 	VAR *end;
 933 | 	size_t mid;
 934 | 
 935 | 	end = array + top;
 936 | 
 937 | 	while (top > 1)
 938 | 	{
 939 | 		mid = top / 2;
 940 | 
 941 | 		if (cmp(value, end - mid) <= 0)
 942 | 		{
 943 | 			end -= mid;
 944 | 		}
 945 | 		top -= mid;
 946 | 	}
 947 | 
 948 | 	if (cmp(value, end - 1) <= 0)
 949 | 	{
 950 | 		end--;
 951 | 	}
 952 | 	return (end - array);
 953 | }
 954 | 
 955 | void FUNC(rotate_merge_block)(VAR *array, VAR *swap, size_t swap_size, size_t lblock, size_t right, CMPFUNC *cmp)
 956 | {
 957 | 	size_t left, rblock, unbalanced;
 958 | 
 959 | 	if (cmp(array + lblock - 1, array + lblock) <= 0)
 960 | 	{
 961 | 		return;
 962 | 	}
 963 | 
 964 | 	rblock = lblock / 2;
 965 | 	lblock -= rblock;
 966 | 
 967 | 	left = FUNC(monobound_binary_first)(array + lblock + rblock, array + lblock, right, cmp);
 968 | 
 969 | 	right -= left;
 970 | 
 971 | 	// [ lblock ] [ rblock ] [ left ] [ right ]
 972 | 
 973 | 	if (left)
 974 | 	{
 975 | 		if (lblock + left <= swap_size)
 976 | 		{
 977 | 			memcpy(swap, array, lblock * sizeof(VAR));
 978 | 			memcpy(swap + lblock, array + lblock + rblock, left * sizeof(VAR));
 979 | 			memmove(array + lblock + left, array + lblock, rblock * sizeof(VAR));
 980 | 
 981 | 			FUNC(cross_merge)(array, swap, lblock, left, cmp);
 982 | 		}
 983 | 		else
 984 | 		{
 985 | 			FUNC(trinity_rotation)(array + lblock, swap, swap_size, rblock + left, rblock);
 986 | 
 987 | 			unbalanced = (left * 2 < lblock) | (lblock * 2 < left);
 988 | 
 989 | 			if (unbalanced && left <= swap_size)
 990 | 			{
 991 | 				FUNC(partial_backward_merge)(array, swap, swap_size, lblock + left, lblock, cmp);
 992 | 			}
 993 | 			else if (unbalanced && lblock <= swap_size)
 994 | 			{
 995 | 				FUNC(partial_forward_merge)(array, swap, swap_size, lblock + left, lblock, cmp);
 996 | 			}
 997 | 			else
 998 | 			{
 999 | 				FUNC(rotate_merge_block)(array, swap, swap_size, lblock, left, cmp);
1000 | 			}
1001 | 		}
1002 | 	}
1003 | 
1004 | 	if (right)
1005 | 	{
1006 | 		unbalanced = (right * 2 < rblock) | (rblock * 2 < right);
1007 | 
1008 | 		if ((unbalanced && right <= swap_size) || right + rblock <= swap_size)
1009 | 		{
1010 | 			FUNC(partial_backward_merge)(array + lblock + left, swap, swap_size, rblock + right, rblock, cmp);
1011 | 		}
1012 | 		else if (unbalanced && rblock <= swap_size)
1013 | 		{
1014 | 			FUNC(partial_forward_merge)(array + lblock + left, swap, swap_size, rblock + right, rblock, cmp);
1015 | 		}
1016 | 		else
1017 | 		{
1018 | 			FUNC(rotate_merge_block)(array + lblock + left, swap, swap_size, rblock, right, cmp);
1019 | 		}
1020 | 	}
1021 | }
1022 | 
1023 | void FUNC(rotate_merge)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t block, CMPFUNC *cmp)
1024 | {
1025 | 	VAR *pta, *pte;
1026 | 
1027 | 	pte = array + nmemb;
1028 | 
1029 | 	if (nmemb <= block * 2 && nmemb - block <= swap_size)
1030 | 	{
1031 | 		FUNC(partial_backward_merge)(array, swap, swap_size, nmemb, block, cmp);
1032 | 
1033 | 		return;
1034 | 	}
1035 | 
1036 | 	while (block < nmemb)
1037 | 	{
1038 | 		for (pta = array ; pta + block < pte ; pta += block * 2)
1039 | 		{
1040 | 			if (pta + block * 2 < pte)
1041 | 			{
1042 | 				FUNC(rotate_merge_block)(pta, swap, swap_size, block, block, cmp);
1043 | 
1044 | 				continue;
1045 | 			}
1046 | 			FUNC(rotate_merge_block)(pta, swap, swap_size, block, pte - pta - block, cmp);
1047 | 
1048 | 			break;
1049 | 		}
1050 | 		block *= 2;
1051 | 	}
1052 | }
1053 | 
1054 | ///////////////////////////////////////////////////////////////////////////////
1055 | //┌─────────────────────────────────────────────────────────────────────────┐//
1056 | //│    ██████┐ ██┐   ██┐ █████┐ ██████┐ ███████┐ ██████┐ ██████┐ ████████┐  │//
1057 | //│   ██┌───██┐██│   ██│██┌──██┐██┌──██┐██┌────┘██┌───██┐██┌──██┐└──██┌──┘  │//
1058 | //│   ██│   ██│██│   ██│███████│██│  ██│███████┐██│   ██│██████┌┘   ██│     │//
1059 | //│   ██│▄▄ ██│██│   ██│██┌──██│██│  ██│└────██│██│   ██│██┌──██┐   ██│     │//
1060 | //│   └██████┌┘└██████┌┘██│  ██│██████┌┘███████│└██████┌┘██│  ██│   ██│     │//
1061 | //│    └──▀▀─┘  └─────┘ └─┘  └─┘└─────┘ └──────┘ └─────┘ └─┘  └─┘   └─┘     │//
1062 | //└─────────────────────────────────────────────────────────────────────────┘//
1063 | ///////////////////////////////////////////////////////////////////////////////
1064 | 
1065 | void FUNC(quadsort)(void *array, size_t nmemb, CMPFUNC *cmp)
1066 | {
1067 | 	VAR *pta = (VAR *) array;
1068 | 
1069 | 	if (nmemb < 32)
1070 | 	{
1071 | 		VAR swap[nmemb];
1072 | 
1073 | 		FUNC(tail_swap)(pta, swap, nmemb, cmp);
1074 | 	}
1075 | 	else if (FUNC(quad_swap)(pta, nmemb, cmp) == 0)
1076 | 	{
1077 | 		VAR *swap = NULL;
1078 | 		size_t block, swap_size = nmemb;
1079 | 
1080 | 		if (nmemb > 4194304) for (swap_size = 4194304 ; swap_size * 8 <= nmemb ; swap_size *= 4) {} // above 4194304 elements, use a smaller swap buffer of more than nmemb / 8 elements
1081 | 
1082 | 		swap = (VAR *) malloc(swap_size * sizeof(VAR));
1083 | 
1084 | 		if (swap == NULL)
1085 | 		{
1086 | 			VAR stack[512];
1087 | 
1088 | 			block = FUNC(quad_merge)(pta, stack, 512, nmemb, 32, cmp);
1089 | 
1090 | 			FUNC(rotate_merge)(pta, stack, 512, nmemb, block, cmp);
1091 | 
1092 | 			return;
1093 | 		}
1094 | 		block = FUNC(quad_merge)(pta, swap, swap_size, nmemb, 32, cmp);
1095 | 
1096 | 		FUNC(rotate_merge)(pta, swap, swap_size, nmemb, block, cmp);
1097 | 
1098 | 		free(swap);
1099 | 	}
1100 | }
1101 | 
1102 | void FUNC(quadsort_swap)(void *array, void *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp)
1103 | {
1104 | 	VAR *pta = (VAR *) array;
1105 | 	VAR *pts = (VAR *) swap;
1106 | 
1107 | 	if (nmemb <= 96)
1108 | 	{
1109 | 		FUNC(tail_swap)(pta, pts, nmemb, cmp);
1110 | 	}
1111 | 	else if (FUNC(quad_swap)(pta, nmemb, cmp) == 0)
1112 | 	{
1113 | 		size_t block = FUNC(quad_merge)(pta, pts, swap_size, nmemb, 32, cmp);
1114 | 
1115 | 		FUNC(rotate_merge)(pta, pts, swap_size, nmemb, block, cmp);
1116 | 	}
1117 | }
1118 | 
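
The swap-size selection in `quadsort()` above can be isolated as a small helper. This is a minimal sketch, not part of the library: the name `quadsort_swap_size` is hypothetical, and the loop is copied from the source.

```c
#include <stddef.h>

/* Hypothetical helper (not part of quadsort): mirrors the swap_size loop
 * in quadsort(). Up to 4194304 elements the swap buffer matches the array.
 * Above that, it starts at 4194304 and grows by a factor of 4 until it
 * holds more than nmemb / 8 elements. */
size_t quadsort_swap_size(size_t nmemb)
{
	size_t swap_size = nmemb;

	if (nmemb > 4194304)
	{
		for (swap_size = 4194304 ; swap_size * 8 <= nmemb ; swap_size *= 4) {}
	}
	return swap_size;
}
```

For example, `quadsort_swap_size(1000)` is 1000, while `quadsort_swap_size(33554432)` is 16777216, half the array.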


--------------------------------------------------------------------------------
/src/quadsort.h:
--------------------------------------------------------------------------------
  1 | // quadsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com
  2 | 
  3 | #ifndef QUADSORT_H
  4 | #define QUADSORT_H
  5 | 
  6 | #include <stdlib.h>
  7 | #include <stdio.h>
  8 | #include <assert.h>
  9 | #include <errno.h>
 10 | #include <float.h>
 11 | #include <string.h>
 12 | 
 13 | //#include <stdalign.h>
 14 | 
 15 | typedef int CMPFUNC (const void *a, const void *b);
 16 | 
 17 | //#define cmp(a,b) (*(a) > *(b))
 18 | 
 19 | 
 20 | // When sorting a large array of pointers, such as a string array, QUAD_CACHE
 21 | // needs to be set for good performance.
 22 | // quadsort_prim() can be used to sort arrays of 32 and 64 bit integers
 23 | // without a comparison function or cache restrictions.
 24 | 
 25 | // With a 6 MB L3 cache a value of 262144 works well.
 26 | 
 27 | #ifdef cmp
 28 |   #define QUAD_CACHE 4294967295
 29 | #else
 30 | //#define QUAD_CACHE 131072
 31 |   #define QUAD_CACHE 262144
 32 | //#define QUAD_CACHE 524288
 33 | //#define QUAD_CACHE 4294967295
 34 | #endif
 35 | 
 36 | // utilize branchless ternary operations in clang
 37 | 
 38 | #if !defined __clang__
 39 | #define head_branchless_merge(ptd, x, ptl, ptr, cmp)  \
 40 | 	x = cmp(ptl, ptr) <= 0;  \
 41 | 	*ptd = *ptl;  \
 42 | 	ptl += x;  \
 43 | 	ptd[x] = *ptr;  \
 44 | 	ptr += !x;  \
 45 | 	ptd++;
 46 | #else
 47 | #define head_branchless_merge(ptd, x, ptl, ptr, cmp)  \
 48 | 	*ptd++ = cmp(ptl, ptr) <= 0 ? *ptl++ : *ptr++;
 49 | #endif
 50 | 
 51 | #if !defined __clang__
 52 | #define tail_branchless_merge(tpd, y, tpl, tpr, cmp)  \
 53 | 	y = cmp(tpl, tpr) <= 0;  \
 54 | 	*tpd = *tpl;  \
 55 | 	tpl -= !y;  \
 56 | 	tpd--;  \
 57 | 	tpd[y] = *tpr;  \
 58 | 	tpr -= y;
 59 | #else
 60 | #define tail_branchless_merge(tpd, y, tpl, tpr, cmp)  \
 61 | 	*tpd-- = cmp(tpl, tpr) > 0 ? *tpl-- : *tpr--;
 62 | #endif
 63 | 
 64 | // guarantee small parity merges are inlined with minimal overhead
 65 | 
 66 | #define parity_merge_two(array, swap, x, ptl, ptr, pts, cmp)  \
 67 | 	ptl = array; ptr = array + 2; pts = swap;  \
 68 | 	head_branchless_merge(pts, x, ptl, ptr, cmp);  \
 69 | 	*pts = cmp(ptl, ptr) <= 0 ? *ptl : *ptr;  \
 70 |   \
 71 | 	ptl = array + 1; ptr = array + 3; pts = swap + 3;  \
 72 | 	tail_branchless_merge(pts, x, ptl, ptr, cmp);  \
 73 | 	*pts = cmp(ptl, ptr)  > 0 ? *ptl : *ptr;
 74 | 
 75 | #define parity_merge_four(array, swap, x, ptl, ptr, pts, cmp)  \
 76 | 	ptl = array + 0; ptr = array + 4; pts = swap;  \
 77 | 	head_branchless_merge(pts, x, ptl, ptr, cmp);  \
 78 | 	head_branchless_merge(pts, x, ptl, ptr, cmp);  \
 79 | 	head_branchless_merge(pts, x, ptl, ptr, cmp);  \
 80 | 	*pts = cmp(ptl, ptr) <= 0 ? *ptl : *ptr;  \
 81 |   \
 82 | 	ptl = array + 3; ptr = array + 7; pts = swap + 7;  \
 83 | 	tail_branchless_merge(pts, x, ptl, ptr, cmp);  \
 84 | 	tail_branchless_merge(pts, x, ptl, ptr, cmp);  \
 85 | 	tail_branchless_merge(pts, x, ptl, ptr, cmp);  \
 86 | 	*pts = cmp(ptl, ptr)  > 0 ? *ptl : *ptr;
 87 | 
 88 | 
 89 | #if !defined __clang__
 90 | #define branchless_swap(pta, swap, x, cmp)  \
 91 | 	x = cmp(pta, pta + 1) > 0;  \
 92 | 	swap = pta[!x];  \
 93 | 	pta[0] = pta[x];  \
 94 | 	pta[1] = swap;
 95 | #else
 96 | #define branchless_swap(pta, swap, x, cmp)  \
 97 | 	x = 0;  \
 98 | 	swap = cmp(pta, pta + 1) > 0 ? pta[x++] : pta[1];  \
 99 | 	pta[0] = pta[x];  \
100 | 	pta[1] = swap;
101 | #endif
102 | 
103 | #define swap_branchless(pta, swap, x, y, cmp)  \
104 | 	x = cmp(pta, pta + 1) > 0;  \
105 | 	y = !x;  \
106 | 	swap = pta[y];  \
107 | 	pta[0] = pta[x];  \
108 | 	pta[1] = swap;
109 | 
110 | //////////////////////////////////////////////////////////
111 | // ┌───────────────────────────────────────────────────┐//
112 | // │       ██████┐ ██████┐    ██████┐ ██████┐████████┐ │//
113 | // │       └────██┐└────██┐   ██┌──██┐└─██┌─┘└──██┌──┘ │//
114 | // │        █████┌┘ █████┌┘   ██████┌┘  ██│     ██│    │//
115 | // │        └───██┐██┌───┘    ██┌──██┐  ██│     ██│    │//
116 | // │       ██████┌┘███████┐   ██████┌┘██████┐   ██│    │//
117 | // │       └─────┘ └──────┘   └─────┘ └─────┘   └─┘    │//
118 | // └───────────────────────────────────────────────────┘//
119 | //////////////////////////////////////////////////////////
120 | 
121 | #define VAR int
122 | #define FUNC(NAME) NAME##32
123 | 
124 | #include "quadsort.c"
125 | 
126 | #undef VAR
127 | #undef FUNC
128 | 
129 | // quadsort_prim
130 | 
131 | #define VAR int
132 | #define FUNC(NAME) NAME##_int32
133 | #ifndef cmp
134 |   #define cmp(a,b) (*(a) > *(b))
135 |   #include "quadsort.c"
136 |   #undef cmp
137 | #else
138 |   #include "quadsort.c"
139 | #endif
140 | #undef VAR
141 | #undef FUNC
142 | 
143 | #define VAR unsigned int
144 | #define FUNC(NAME) NAME##_uint32
145 | #ifndef cmp
146 |   #define cmp(a,b) (*(a) > *(b))
147 |   #include "quadsort.c"
148 |   #undef cmp
149 | #else
150 |   #include "quadsort.c"
151 | #endif
152 | #undef VAR
153 | #undef FUNC
154 | 
155 | //////////////////////////////////////////////////////////
156 | // ┌───────────────────────────────────────────────────┐//
157 | // │        █████┐ ██┐  ██┐   ██████┐ ██████┐████████┐ │//
158 | // │       ██┌───┘ ██│  ██│   ██┌──██┐└─██┌─┘└──██┌──┘ │//
159 | // │       ██████┐ ███████│   ██████┌┘  ██│     ██│    │//
160 | // │       ██┌──██┐└────██│   ██┌──██┐  ██│     ██│    │//
161 | // │       └█████┌┘     ██│   ██████┌┘██████┐   ██│    │//
162 | // │        └────┘      └─┘   └─────┘ └─────┘   └─┘    │//
163 | // └───────────────────────────────────────────────────┘//
164 | //////////////////////////////////////////////////////////
165 | 
166 | #define VAR long long
167 | #define FUNC(NAME) NAME##64
168 | 
169 | #include "quadsort.c"
170 | 
171 | #undef VAR
172 | #undef FUNC
173 | 
174 | // quadsort_prim
175 | 
176 | #define VAR long long
177 | #define FUNC(NAME) NAME##_int64
178 | #ifndef cmp
179 |   #define cmp(a,b) (*(a) > *(b))
180 |   #include "quadsort.c"
181 |   #undef cmp
182 | #else
183 |   #include "quadsort.c"
184 | #endif
185 | #undef VAR
186 | #undef FUNC
187 | 
188 | #define VAR unsigned long long
189 | #define FUNC(NAME) NAME##_uint64
190 | #ifndef cmp
191 |   #define cmp(a,b) (*(a) > *(b))
192 |   #include "quadsort.c"
193 |   #undef cmp
194 | #else
195 |   #include "quadsort.c"
196 | #endif
197 | #undef VAR
198 | #undef FUNC
199 | 
200 | // This section is outside of 32/64 bit pointer territory, so no cache checks
201 | // are necessary, unless sorting 32+ byte structures.
202 | 
203 | #undef QUAD_CACHE
204 | #define QUAD_CACHE 4294967295
205 | 
206 | //////////////////////////////////////////////////////////
207 | //┌────────────────────────────────────────────────────┐//
208 | //│                █████┐    ██████┐ ██████┐████████┐  │//
209 | //│               ██┌──██┐   ██┌──██┐└─██┌─┘└──██┌──┘  │//
210 | //│               └█████┌┘   ██████┌┘  ██│     ██│     │//
211 | //│               ██┌──██┐   ██┌──██┐  ██│     ██│     │//
212 | //│               └█████┌┘   ██████┌┘██████┐   ██│     │//
213 | //│                └────┘    └─────┘ └─────┘   └─┘     │//
214 | //└────────────────────────────────────────────────────┘//
215 | //////////////////////////////////////////////////////////
216 | 
217 | #define VAR char
218 | #define FUNC(NAME) NAME##8
219 | 
220 | #include "quadsort.c"
221 | 
222 | #undef VAR
223 | #undef FUNC
224 | 
225 | //////////////////////////////////////////////////////////
226 | //┌────────────────────────────────────────────────────┐//
227 | //│           ▄██┐   █████┐    ██████┐ ██████┐████████┐│//
228 | //│          ████│  ██┌───┘    ██┌──██┐└─██┌─┘└──██┌──┘│//
229 | //│          └─██│  ██████┐    ██████┌┘  ██│     ██│   │//
230 | //│            ██│  ██┌──██┐   ██┌──██┐  ██│     ██│   │//
231 | //│          ██████┐└█████┌┘   ██████┌┘██████┐   ██│   │//
232 | //│          └─────┘ └────┘    └─────┘ └─────┘   └─┘   │//
233 | //└────────────────────────────────────────────────────┘//
234 | //////////////////////////////////////////////////////////
235 | 
236 | #define VAR short
237 | #define FUNC(NAME) NAME##16
238 | 
239 | #include "quadsort.c"
240 | 
241 | #undef VAR
242 | #undef FUNC
243 | 
244 | //////////////////////////////////////////////////////////
245 | //┌────────────────────────────────────────────────────┐//
246 | //│  ▄██┐  ██████┐  █████┐    ██████┐ ██████┐████████┐ │//
247 | //│ ████│  └────██┐██┌──██┐   ██┌──██┐└─██┌─┘└──██┌──┘ │//
248 | //│ └─██│   █████┌┘└█████┌┘   ██████┌┘  ██│     ██│    │//
249 | //│   ██│  ██┌───┘ ██┌──██┐   ██┌──██┐  ██│     ██│    │//
250 | //│ ██████┐███████┐└█████┌┘   ██████┌┘██████┐   ██│    │//
251 | //│ └─────┘└──────┘ └────┘    └─────┘ └─────┘   └─┘    │//
252 | //└────────────────────────────────────────────────────┘//
253 | //////////////////////////////////////////////////////////
254 | 
255 | // 128 reflects the name, though the actual size of a long double is 64, 80,
256 | // 96, or 128 bits, depending on platform.
257 | 
258 | #if (DBL_MANT_DIG < LDBL_MANT_DIG)
259 |   #define VAR long double
260 |   #define FUNC(NAME) NAME##128
261 |   #include "quadsort.c"
262 |   #undef VAR
263 |   #undef FUNC
264 | #endif
265 | 
266 | ///////////////////////////////////////////////////////////
267 | //┌─────────────────────────────────────────────────────┐//
268 | //│ ██████┐██┐   ██┐███████┐████████┐ ██████┐ ███┐  ███┐│//
269 | //│██┌────┘██│   ██│██┌────┘└──██┌──┘██┌───██┐████┐████││//
270 | //│██│     ██│   ██│███████┐   ██│   ██│   ██│██┌███┌██││//
271 | //│██│     ██│   ██│└────██│   ██│   ██│   ██│██│└█┌┘██││//
272 | //│└██████┐└██████┌┘███████│   ██│   └██████┌┘██│ └┘ ██││//
273 | //│ └─────┘ └─────┘ └──────┘   └─┘    └─────┘ └─┘    └─┘│//
274 | //└─────────────────────────────────────────────────────┘//
275 | ///////////////////////////////////////////////////////////
276 | 
277 | /*
278 | typedef struct {char bytes[32];} struct256;
279 | #define VAR struct256
280 | #define FUNC(NAME) NAME##256
281 | 
282 | #include "quadsort.c"
283 | 
284 | #undef VAR
285 | #undef FUNC
286 | */
287 | 
288 | ///////////////////////////////////////////////////////////////////////////////
289 | //┌─────────────────────────────────────────────────────────────────────────┐//
290 | //│    ██████┐ ██┐   ██┐ █████┐ ██████┐ ███████┐ ██████┐ ██████┐ ████████┐  │//
291 | //│   ██┌───██┐██│   ██│██┌──██┐██┌──██┐██┌────┘██┌───██┐██┌──██┐└──██┌──┘  │//
292 | //│   ██│   ██│██│   ██│███████│██│  ██│███████┐██│   ██│██████┌┘   ██│     │//
293 | //│   ██│▄▄ ██│██│   ██│██┌──██│██│  ██│└────██│██│   ██│██┌──██┐   ██│     │//
294 | //│   └██████┌┘└██████┌┘██│  ██│██████┌┘███████│└██████┌┘██│  ██│   ██│     │//
295 | //│    └──▀▀─┘  └─────┘ └─┘  └─┘└─────┘ └──────┘ └─────┘ └─┘  └─┘   └─┘     │//
296 | //└─────────────────────────────────────────────────────────────────────────┘//
297 | ///////////////////////////////////////////////////////////////////////////////
298 | 
299 | 
300 | void quadsort(void *array, size_t nmemb, size_t size, CMPFUNC *cmp)
301 | {
302 | 	if (nmemb < 2)
303 | 	{
304 | 		return;
305 | 	}
306 | 
307 | 	switch (size)
308 | 	{
309 | 		case sizeof(char):
310 | 			quadsort8(array, nmemb, cmp);
311 | 			return;
312 | 
313 | 		case sizeof(short):
314 | 			quadsort16(array, nmemb, cmp);
315 | 			return;
316 | 
317 | 		case sizeof(int):
318 | 			quadsort32(array, nmemb, cmp);
319 | 			return;
320 | 
321 | 		case sizeof(long long):
322 | 			quadsort64(array, nmemb, cmp);
323 | 			return;
324 | #if (DBL_MANT_DIG < LDBL_MANT_DIG)
325 | 		case sizeof(long double):
326 | 			quadsort128(array, nmemb, cmp);
327 | 			return;
328 | #endif
329 | //		case sizeof(struct256):
330 | //			quadsort256(array, nmemb, cmp);
331 | //			return;
332 | 
333 | 		default:
334 | #if (DBL_MANT_DIG < LDBL_MANT_DIG)
335 | 			assert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long) || size == sizeof(long double));
336 | #else
337 | 			assert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long));
338 | #endif
339 | //			qsort(array, nmemb, size, cmp);
340 | 	}
341 | }
342 | 
343 | // suggested size values for primitives (quadsort_prim() below only handles 4, 5, 8, and 9):
344 | 
345 | //		case  0: unsigned char
346 | //		case  1: signed char
347 | //		case  2: signed short
348 | //		case  3: unsigned short
349 | //		case  4: signed int
350 | //		case  5: unsigned int
351 | //		case  6: float
352 | //		case  7: double
353 | //		case  8: signed long long
354 | //		case  9: unsigned long long
355 | //		case  ?: long double, use sizeof(long double):
356 | 
357 | void quadsort_prim(void *array, size_t nmemb, size_t size)
358 | {
359 | 	if (nmemb < 2)
360 | 	{
361 | 		return;
362 | 	}
363 | 
364 | 	switch (size)
365 | 	{
366 | 		case 4:
367 | 			quadsort_int32(array, nmemb, NULL);
368 | 			return;
369 | 		case 5:
370 | 			quadsort_uint32(array, nmemb, NULL);
371 | 			return;
372 | 		case 8:
373 | 			quadsort_int64(array, nmemb, NULL);
374 | 			return;
375 | 		case 9:
376 | 			quadsort_uint64(array, nmemb, NULL);
377 | 			return;
378 | 		default:
379 | 			assert(size == sizeof(int) || size == sizeof(int) + 1 || size == sizeof(long long) || size == sizeof(long long) + 1);
380 | 			return;
381 | 	}
382 | }
383 | 
384 | // Sort arrays of structures. The comparison function must be by reference: it receives pointers to element pointers, as when sorting an array of pointers.
385 | 
386 | void quadsort_size(void *array, size_t nmemb, size_t size, CMPFUNC *cmp)
387 | {
388 | 	char **pti, *pta, *pts;
389 | 	size_t index, offset;
390 | 
391 | 	if (nmemb < 2)
392 | 	{
393 | 		return;
394 | 	}
395 | 	pta = (char *) array;
396 | 	pti = (char **) malloc(nmemb * sizeof(char *));
397 | 
398 | 	assert(pti != NULL);
399 | 
400 | 	for (index = offset = 0 ; index < nmemb ; index++)
401 | 	{
402 | 		pti[index] = pta + offset;
403 | 
404 | 		offset += size;
405 | 	}
406 | 
407 | 	switch (sizeof(size_t))
408 | 	{
409 | 		case 4: quadsort32(pti, nmemb, cmp); break;
410 | 		case 8: quadsort64(pti, nmemb, cmp); break;
411 | 	}
412 | 
413 | 	pts = (char *) malloc(nmemb * size);
414 | 
415 | 	assert(pts != NULL);
416 | 	
417 | 	for (index = 0 ; index < nmemb ; index++)
418 | 	{
419 | 		memcpy(pts, pti[index], size);
420 | 
421 | 		pts += size;
422 | 	}
423 | 	pts -= nmemb * size;
424 | 
425 | 	memcpy(array, pts, nmemb * size);
426 | 
427 | 	free(pti);
428 | 	free(pts);
429 | }
430 | 
431 | #undef QUAD_CACHE
432 | 
433 | #endif
434 | 
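
The approach taken by `quadsort_size()` (sort an array of element pointers, then gather the elements in sorted order) can be sketched in isolation. This is an illustration under stated assumptions, not the library's code: `sort_by_pointer`, `record`, and `cmp_record_ref` are hypothetical names, the standard `qsort()` stands in for `quadsort32`/`quadsort64` (and, unlike quadsort, is not guaranteed to be stable), and allocation failure is reported with a return value rather than `assert()`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example type. */
typedef struct { int key; char tag; } record;

/* Comparator "by reference": a and b point to element pointers,
 * so each argument is dereferenced once to reach the record. */
static int cmp_record_ref(const void *a, const void *b)
{
	const char *pa = *(char * const *) a;
	const char *pb = *(char * const *) b;
	const record *ra = (const record *) pa;
	const record *rb = (const record *) pb;

	return (ra->key > rb->key) - (ra->key < rb->key);
}

/* Returns 0 on success, -1 if an allocation fails. */
int sort_by_pointer(void *array, size_t nmemb, size_t size, int (*cmp)(const void *, const void *))
{
	char *pta = (char *) array, *pts, **pti;
	size_t index;

	if (nmemb < 2)
	{
		return 0;
	}
	pti = (char **) malloc(nmemb * sizeof(char *));
	pts = (char *) malloc(nmemb * size);

	if (pti == NULL || pts == NULL)
	{
		free(pti);
		free(pts);
		return -1;
	}

	/* Build the index: one pointer per element. */
	for (index = 0 ; index < nmemb ; index++)
	{
		pti[index] = pta + index * size;
	}

	/* Stand-in for quadsort32/quadsort64: only the pointers move. */
	qsort(pti, nmemb, sizeof(char *), cmp);

	/* Gather the elements in sorted order, then copy them back. */
	for (index = 0 ; index < nmemb ; index++)
	{
		memcpy(pts + index * size, pti[index], size);
	}
	memcpy(array, pts, nmemb * size);

	free(pti);
	free(pts);
	return 0;
}
```

A call such as `sort_by_pointer(records, count, sizeof(record), cmp_record_ref)` sorts `records` by `key`. Each element is copied only twice regardless of how many comparisons the sort makes, which is the reason to sort pointers when elements are large.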

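
The non-clang variant of `head_branchless_merge` in quadsort.h selects the next output element without an `if`. A minimal sketch of that step, written as a plain function over `int` with a direct comparison instead of `cmp`, follows. The name `branchless_merge` is hypothetical, and the loop conditions and tail copies around the step are ordinary branching code added for illustration.

```c
#include <stddef.h>

/* Hypothetical illustration: merge two sorted int runs into dst.
 * dst must have room for nl + nr elements. */
void branchless_merge(int *dst, const int *left, size_t nl, const int *right, size_t nr)
{
	const int *le = left + nl, *re = right + nr;
	size_t x;

	while (left < le && right < re)
	{
		x = *left <= *right; /* 1 if left is taken; ties favor left, keeping the merge stable */
		*dst = *left;        /* speculatively write the left element */
		left += x;
		dst[x] = *right;     /* x == 0: overwrite dst[0] with right; x == 1: scratch write to dst[1] */
		right += !x;
		dst++;
	}

	/* Copy whichever run still has elements. */
	while (left < le)
	{
		*dst++ = *left++;
	}
	while (right < re)
	{
		*dst++ = *right++;
	}
}
```

The scratch write to `dst[1]` stays in bounds because both runs are non-empty at that point, so at least two output slots remain. The slot is overwritten by a later step.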

--------------------------------------------------------------------------------