├── .github └── workflows │ └── test.yml ├── .gitignore ├── CONTRIBUTING.md ├── CONTRIBUTORS.md ├── LICENSE.md ├── Makefile ├── README.md ├── astyle.options ├── benchmark.c ├── demo.c ├── doc └── timsort.txt ├── generate_bitonic_sort.py ├── multidemo.c ├── sort.h ├── sort_extra.h └── stresstest.c /.github/workflows/test.yml: -------------------------------------------------------------------------------- 1 | on: 2 | push: 3 | 4 | jobs: 5 | test-linux: 6 | strategy: 7 | matrix: 8 | cc: [gcc-10, gcc-11, gcc-12, gcc-13, clang-13, clang-14, clang-15] 9 | runs-on: ubuntu-22.04 10 | steps: 11 | - name: Checkout code 12 | uses: actions/checkout@v2 13 | - name: test 14 | run: make CC=${{ matrix.cc }} 15 | test-macos: 16 | runs-on: macos-latest 17 | steps: 18 | - name: Checkout code 19 | uses: actions/checkout@v2 20 | - name: test 21 | run: make 22 | test-windows: 23 | runs-on: windows-latest 24 | steps: 25 | - name: Checkout code 26 | uses: actions/checkout@v2 27 | - name: Setup cl 28 | uses: ilammy/msvc-dev-cmd@v1 29 | - name: Compile stresstest 30 | run: cl /Zi /DDSET_SORT_EXTRA stresstest.c 31 | - shell: bash 32 | run: ls 33 | - shell: bash 34 | name: stresstest 35 | run: ./stresstest.exe 36 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | benchmark 2 | benchmark_extra 3 | benchmark.txt 4 | demo 5 | demo_extra 6 | multidemo 7 | multidemo_extra 8 | stresstest 9 | stresstest_extra 10 | *.dSYM 11 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to sort.h 2 | 3 | `sort.h` is based on the hard work and fun times of several people, and I hope 4 | you'll join us! 
5 | 6 | ## Make your change 7 | 8 | Step 1 is always to check out the code, play with it, and make whatever 9 | change you were planning to make. 10 | Maybe adding a new algorithm, improving the performance of something, 11 | or making something simpler? 12 | 13 | Adding more documentation would also be wonderful! 14 | 15 | ## We use C89 16 | 17 | To be compatible with as many C compilers as possible, we stick to the rather 18 | old C89 standard. 19 | In particular, some Microsoft compilers are very picky about this, and 20 | we want `sort.h` to run on all platforms. 21 | 22 | ## No dependencies 23 | 24 | `sort.h` is pure C and macros, and should compile without any other libraries. 25 | We don't want to create a dependency problem. 26 | 27 | We want to keep it that way, because dependencies are an awful problem in C. 28 | 29 | ## No libraries 30 | 31 | `sort.h` is almost entirely self-contained (`sort_common.h` is only 32 | ever included once, and has some common functions for `sort.h`). 33 | It doesn't build libraries. 34 | 35 | We want to keep it that way, because building libraries, versioning them, and 36 | linking to them correctly are awful problems in C. 37 | 38 | ## static 39 | 40 | All functions should be `static` in `sort.h` and `sort_common.h`. 41 | 42 | Combined with the fact that we don't build any libraries, this allows 43 | the compiler to make all sorts of optimizations it can never make otherwise, 44 | such as inlining functions, using fast arithmetic, and avoiding pointers. 45 | 46 | ## __inline 47 | 48 | C89 has no `inline` keyword, so the way to request inlining here is to mark a function as `__inline`, an extension that GCC, Clang, and MSVC accept in C89 mode -- 49 | this is useful for comparisons and other small functions that are used often. 50 | 51 | ## Make sure tests pass 52 | 53 | Run `make`, and it should build `stresstest` and run it, which 54 | will make sure nothing is broken. 
55 | 56 | ## Add more tests, and make sure those pass too 57 | 58 | If you are writing new code, make sure that it is tested in 59 | `stresstest.c`. 60 | 61 | ## Run `make format` 62 | 63 | Run `make format` to ensure that the source files all conform to the 64 | project's style guidelines for C (see `astyle.options`). 65 | 66 | (On OS X, you can use `brew install astyle` to install `astyle`.) 67 | 68 | ## Push your branch to GitHub 69 | 70 | On your fork, push your branch up with your changes, and create a pull request. 71 | 72 | ## We review! 73 | 74 | Swenson should review the code shortly, and provide you with feedback, or merge 75 | if it looks great. 76 | 77 | ## Questions 78 | 79 | If you have questions, problems, or just aren't even sure where to start, 80 | please reach out! 81 | Open an issue, or reach out on Twitter (https://twitter.com/chris_swenson) 82 | or email (chris@caswenson.com), and we'd be happy to help. 83 | 84 | <3 85 | -------------------------------------------------------------------------------- /CONTRIBUTORS.md: -------------------------------------------------------------------------------- 1 | List of contributors, in alphabetical order: 2 | 3 | - Andrey Astrelin 4 | - Antony Dovgal 5 | - [@Baobaobear](https://github.com/Baobaobear) 6 | - Christopher Swenson 7 | - [@drfie](https://github.com/drfie) 8 | - [@DrMarkS](https://github.com/DrMarkS) 9 | - Emanuel Falkenauer 10 | - Google Inc. 
11 | - Haneef Mubarak 12 | - Matthieu Darbois 13 | - Vojtech Fried 14 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2010-2019 Christopher Swenson and [others as listed in CONTRIBUTORS.md](CONTRIBUTORS.md) 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2010-2024 Christopher Swenson. 2 | # Copyright (c) 2012 Google Inc. All Rights Reserved. 
3 | 4 | CC ?= gcc 5 | CFLAGS ?= -O3 -g -Wall -std=c89 -pedantic -Wno-long-long -Wno-format 6 | EXTRA = -DSET_SORT_EXTRA 7 | 8 | default: benchmark demo multidemo stresstest test benchmark_extra demo_extra multidemo_extra stresstest_extra 9 | 10 | .PHONY: default clean test format 11 | 12 | test: stresstest_extra benchmark_extra 13 | ./benchmark_extra | tee benchmark.txt 14 | ./stresstest_extra 15 | 16 | clean: 17 | rm -f demo multidemo stresstest benchmark demo_extra multidemo_extra stresstest_extra benchmark_extra 18 | 19 | demo: demo.c sort.h 20 | $(CC) $(CFLAGS) demo.c -o $@ 21 | 22 | demo_extra: demo.c sort.h sort_extra.h 23 | $(CC) $(CFLAGS) demo.c -o $@ $(EXTRA) 24 | 25 | multidemo: multidemo.c sort.h 26 | $(CC) $(CFLAGS) multidemo.c -o $@ 27 | 28 | multidemo_extra: multidemo.c sort.h sort_extra.h 29 | $(CC) $(CFLAGS) multidemo.c -o $@ $(EXTRA) 30 | 31 | stresstest: stresstest.c sort.h 32 | $(CC) $(CFLAGS) stresstest.c -o $@ 33 | 34 | stresstest_extra: stresstest.c sort.h sort_extra.h 35 | $(CC) $(CFLAGS) stresstest.c -o $@ $(EXTRA) 36 | 37 | benchmark: benchmark.c sort.h 38 | $(CC) $(CFLAGS) benchmark.c -o $@ 39 | 40 | benchmark_extra: benchmark.c sort.h sort_extra.h 41 | $(CC) $(CFLAGS) benchmark.c -o $@ $(EXTRA) 42 | 43 | format: 44 | astyle --options=astyle.options sort.h sort_extra.h demo.c multidemo.c stresstest.c benchmark.c -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | sort.h 2 | ====== 3 | 4 | Overview 5 | -------- 6 | 7 | `sort.h` is an implementation of a ton of sorting algorithms in C with a 8 | user-defined type that is provided at include time. 9 | 10 | This means you don't have to pay the function call overhead of using 11 | a standard library routine. This also gives us the power of higher-level 12 | language generics. 
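To make the overhead claim concrete, here is a minimal sketch (not part of `sort.h`) of what the standard library route looks like: `qsort` reaches the comparison through a function pointer every time it compares two elements. The names `cmp_calls`, `counting_cmp`, and `sort_with_qsort` are made up for this illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical counter: how many times qsort called back into our code. */
static unsigned long cmp_calls = 0;

/* qsort only knows void pointers, so every comparison is an indirect call
   plus two casts and two loads. */
static int counting_cmp(const void *a, const void *b) {
  const int64_t da = *((const int64_t *) a);
  const int64_t db = *((const int64_t *) b);
  cmp_calls++;
  return (da < db) ? -1 : (da == db) ? 0 : 1;
}

/* Sorts arr in ascending order and returns the number of comparator calls. */
static unsigned long sort_with_qsort(int64_t *arr, size_t size) {
  cmp_calls = 0;
  qsort(arr, size, sizeof(int64_t), counting_cmp);
  return cmp_calls;
}
```

With `sort.h`, the comparison is instead the `SORT_CMP` macro expanded inside the sort routine itself, so the compiler is free to inline it rather than call through a pointer.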
13 | 14 | In addition, you don't have to link in a library: 15 | the entirety of this sorting library is contained in the `sort.h` header file. 16 | 17 | You get the choice of many sorting routines, including: 18 | 19 | * Timsort (stable) 20 | * Quicksort 21 | * Merge sort (stable) 22 | * In-place merge sort (*not* stable) 23 | * Shellsort 24 | * Binary insertion sort 25 | * Heapsort 26 | 27 | If you define `SORT_EXTRA` and have `sort_extra.h` available in the path, there are some additional, specialized sorting routines available: 28 | 29 | * Selection sort (this is really only here for comparison) 30 | * Bubble sort 31 | * Grail sort (stable) 32 | * Based on [`B-C. Huang and M. A. Langston, *Fast Stable Merging and Sorting in 33 | Constant Extra Space* (1989-1992)`](http://comjnl.oxfordjournals.org/content/35/6/643.full.pdf). 34 | 35 | Thanks to Andrey Astrelin for the implementation. 36 | * Sqrt Sort (stable, based on Grail sort, also by Andrey Astrelin). 37 | 38 | If you don't know which one to use, you should probably use Timsort. 39 | 40 | If you have a lot of data that is semi-structured (for example, already partially sorted), then you should definitely use Timsort. 41 | 42 | If you have data that is really and truly random, quicksort is probably fastest. 43 | 44 | 45 | Usage 46 | ----- 47 | 48 | To use this library, you need to do three things: 49 | 50 | * `#define SORT_TYPE` to be the type of the elements of the array you 51 | want to sort. (For pointers, you should declare this like: `#define SORT_TYPE int*`) 52 | * `#define SORT_NAME` to be a unique name that will be prepended to all 53 | the routines, e.g., `#define SORT_NAME mine` would give you routines 54 | named `mine_heap_sort`, and so forth. 55 | * `#include "sort.h"`. Make sure that `sort.h` is in your include path. 56 | 57 | Then, enjoy using the sorting routines. 
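The three steps above amount to a template instantiation done by the preprocessor. As a rough sketch of the mechanism (not the actual `sort.h` source), the following single file uses token pasting to build a prefixed function name from a name and a type supplied up front. `DEMO_NAME`, `DEMO_TYPE`, `DEMO_CMP`, `DEMO_CONCAT`, and `DEMO_INSERTION_SORT` are hypothetical stand-ins used only for illustration.

```c
#include <stddef.h>

/* What the user would define before the include. */
#define DEMO_NAME mine
#define DEMO_TYPE int
#define DEMO_CMP(x, y) ((x) < (y) ? -1 : ((y) < (x) ? 1 : 0))

/* Two-level paste so DEMO_NAME is expanded before ## is applied. */
#define DEMO_CONCAT_(a, b) a ## _ ## b
#define DEMO_CONCAT(a, b) DEMO_CONCAT_(a, b)
#define DEMO_INSERTION_SORT DEMO_CONCAT(DEMO_NAME, insertion_sort)

/* Expands to: static void mine_insertion_sort(int *dst, const size_t size) */
static void DEMO_INSERTION_SORT(DEMO_TYPE *dst, const size_t size) {
  size_t i, j;

  for (i = 1; i < size; i++) {
    const DEMO_TYPE x = dst[i];

    /* Shift larger elements right; equal elements stay put (stable). */
    for (j = i; j > 0 && DEMO_CMP(dst[j - 1], x) > 0; j--) {
      dst[j] = dst[j - 1];
    }

    dst[j] = x;
  }
}
```

After this, `mine_insertion_sort(arr, n)` is an ordinary function. Repeating the definition with a different name and type yields a second, independent routine, which is the effect `multidemo.c` gets by including `sort.h` twice to obtain both `sorter_*` and `sorter2_*`.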
58 | 59 | Quick example: 60 | 61 | ```c 62 | #define SORT_NAME int64 63 | #define SORT_TYPE int64_t 64 | #define SORT_CMP(x, y) ((x) - (y)) 65 | #include "sort.h" 66 | ``` 67 | 68 | You would now have access to `int64_quick_sort`, `int64_tim_sort`, etc., 69 | which you can use like 70 | 71 | ```c 72 | /* Assumes you have some int64_t *arr or int64_t arr[128]; */ 73 | int64_quick_sort(arr, 128); 74 | ``` 75 | 76 | See `demo.c` for a more detailed usage example. 77 | 78 | If you are going to use your own custom type, you must redefine 79 | `SORT_CMP(x, y)` with your comparison function, so that it returns 80 | a value less than zero if `x < y`, equal to zero if `x == y`, and 81 | greater than 0 if `x > y`. 82 | 83 | The default just uses the built-in `<` operator: 84 | 85 | ```c 86 | #define SORT_CMP(x, y) ((x) < (y) ? -1 : ((y) < (x) ? 1 : 0)) 87 | ``` 88 | 89 | It is often fine to just subtract the arguments as well (though the subtraction 90 | can overflow for integers of large magnitude and opposite sign, and can cause some stability problems with floating-point types): 91 | 92 | ```c 93 | #define SORT_CMP(x, y) ((x) - (y)) 94 | ``` 95 | 96 | You can also redefine `TIM_SORT_STACK_SIZE` (default 128) to control 97 | the size of the Timsort stack (which can be used to reduce memory). 98 | Reducing it too far can cause Timsort to overflow the stack though. 99 | 100 | You can define `SORT_DEF` to set the specifier placed in front of every function 101 | that `sort.h` defines. Making sort functions static increases the likelihood a 102 | compiler will eliminate dead code. 103 | 104 | ```c 105 | #define SORT_DEF static 106 | ``` 107 | 108 | Speed of routines 109 | ----------------- 110 | 111 | The speed of each routine is highly dependent on your computer and the 112 | structure of your data. 113 | 114 | If your data has a lot of partially sorted sequences, then Timsort 115 | will beat the kilt off of anything else. 
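One rough way to tell whether your data has "a lot of partially sorted sequences" is to count its natural runs. The sketch below is not part of `sort.h`; `count_ascending_runs` is a hypothetical helper that counts maximal non-descending stretches only (Timsort also exploits strictly descending ones). Few runs relative to the array length suggests input that Timsort can merge with little work.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of maximal non-descending runs in arr.
   0 for an empty array, 1 for already-sorted input, size for strictly
   descending input. */
static size_t count_ascending_runs(const int64_t *arr, const size_t size) {
  size_t i;
  size_t runs;

  if (size == 0) {
    return 0;
  }

  runs = 1;

  for (i = 1; i < size; i++) {
    if (arr[i - 1] > arr[i]) {
      runs++; /* order breaks here, so a new run starts at i */
    }
  }

  return runs;
}
```

For truly random input the count comes out close to half the array length, which is the case where the README suggests quicksort instead.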
116 | 117 | Timsort is not as good if memory movement is many orders of magnitude more 118 | expensive than comparisons (that is, far more than for ordinary `int` and `double`). 119 | If so, then quicksort is probably your routine. On the other hand, Timsort 120 | does extremely well if the comparison operator is very expensive, 121 | since it strives hard to minimize comparisons. 122 | 123 | Here is the output of `demo.c`, which will give you the timings for a run of 124 | 10,000 `int64_t`s on a 2014-era MacBook Pro: 125 | 126 | ``` 127 | Running tests 128 | stdlib qsort time: 1285.00 us per iteration 129 | stdlib heapsort time: 2109.00 us per iteration 130 | stdlib mergesort time: 1299.00 us per iteration 131 | quick sort time: 579.00 us per iteration 132 | selection sort time: 127176.00 us per iteration 133 | merge sort time: 999.00 us per iteration 134 | binary insertion sort time: 13443.00 us per iteration 135 | heap sort time: 592.00 us per iteration 136 | shell sort time: 1054.00 us per iteration 137 | tim sort time: 1005.00 us per iteration 138 | in-place merge sort time: 903.00 us per iteration 139 | grail sort time: 1220.00 us per iteration 140 | sqrt sort time: 1095.00 us per iteration 141 | ``` 142 | 143 | Quicksort is the winner here. Heapsort, in-place merge sort, 144 | and Timsort are also often quite fast. 145 | 146 | Contributing 147 | ------------ 148 | 149 | See [CONTRIBUTING.md](CONTRIBUTING.md). 150 | 151 | References 152 | ---------- 153 | 154 | * [Wikipedia: Timsort](https://en.wikipedia.org/wiki/Timsort) 155 | * [`timsort.txt`](doc/timsort.txt) 156 | 157 | License 158 | ------- 159 | 160 | Available under the MIT License. See [LICENSE.md](LICENSE.md) for details. 
161 | -------------------------------------------------------------------------------- /astyle.options: -------------------------------------------------------------------------------- 1 | --mode=c 2 | --style=google # brackets 3 | --indent=spaces=2 4 | --max-code-length=100 5 | --break-blocks 6 | --pad-oper 7 | --pad-header 8 | --delete-empty-lines 9 | --suffix=none 10 | -------------------------------------------------------------------------------- /benchmark.c: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2010-2014 Christopher Swenson. */ 2 | /* Copyright (c) 2012 Google Inc. All Rights Reserved. */ 3 | 4 | #define _XOPEN_SOURCE 5 | #include 6 | #include 7 | #include 8 | 9 | #define SORT_NAME sorter 10 | #define SORT_TYPE int64_t 11 | #define MAX(x,y) (((x) > (y) ? (x) : (y))) 12 | #define MIN(x,y) (((x) < (y) ? (x) : (y))) 13 | #define SORT_CMP(x, y) ((x) - (y)) 14 | #define SORT_CSWAP(x, y) {SORT_TYPE _sort_swap_temp = MAX((x), (y)); (x) = MIN((x),(y)); (y) = _sort_swap_temp;} 15 | #ifdef SET_SORT_EXTRA 16 | #define SORT_EXTRA 17 | #endif 18 | #include "sort.h" 19 | 20 | /* Used to control the stress test */ 21 | #define SEED 123 22 | #define FAST_ITERATIONS 1 23 | #define SLOW_ITERATIONS 1 24 | #define SIZES 1 25 | 26 | size_t sizes[SIZES] = {100000}; 27 | 28 | /* used for stdlib */ 29 | static __inline int simple_cmp(const void *a, const void *b) { 30 | const int64_t da = *((const int64_t *) a); 31 | const int64_t db = *((const int64_t *) b); 32 | return (da < db) ? -1 : (da == db) ? 
0 : 1; 33 | } 34 | 35 | static __inline double utime(void) { 36 | struct timeval t; 37 | gettimeofday(&t, NULL); 38 | return (1000000.0 * t.tv_sec + t.tv_usec); 39 | } 40 | 41 | static void fill_random(int64_t *dst, const int size) { 42 | int i; 43 | srand48(SEED); 44 | 45 | for (i = 0; i < size; i++) { 46 | dst[i] = lrand48(); 47 | } 48 | } 49 | 50 | void capitalize(const char *word, char *new_word) { 51 | int len; 52 | len = strlen(word); 53 | 54 | if (len < 1) { 55 | return; 56 | } 57 | 58 | strcpy(new_word, word); 59 | new_word[0] = toupper(new_word[0]); 60 | } 61 | 62 | int platform_bits(void) { 63 | #if defined (__amd64__) || defined (__x86_64__) 64 | 65 | if (1) { /* avoid two returns in a row */ 66 | return 64; 67 | } 68 | 69 | #endif 70 | /* backup case */ 71 | return sizeof(void *) * 8; 72 | } 73 | 74 | void platform_name(char *output) { 75 | char *name; 76 | #if defined (__amd64__) || defined (__x86_64__) 77 | name = "x86"; 78 | #elif defined (__arm__) 79 | name = "arm"; 80 | #else 81 | name = "unknown"; 82 | #endif 83 | sprintf(output, "%s_%d", name, platform_bits()); 84 | } 85 | 86 | #define TEST_STDLIB(name) do { \ 87 | capitalize(#name, capital_word); \ 88 | for (test = 0; test < SIZES; test++) { \ 89 | int64_t size = sizes[test]; \ 90 | int64_t *dst = (int64_t *) malloc(sizeof(int64_t) * size); \ 91 | diff = 0; \ 92 | iter = 0; \ 93 | while (1) { \ 94 | fill_random(dst, size); \ 95 | usec1 = utime(); \ 96 | name (dst, size, sizeof(int64_t), simple_cmp); \ 97 | usec2 = utime(); \ 98 | diff += usec2 - usec1; \ 99 | iter++; \ 100 | if (diff >= 1000000.0) { \ 101 | break; \ 102 | } \ 103 | } \ 104 | free(dst); \ 105 | sprintf(name_buf, "%s %lld %s", capital_word, size, platform); \ 106 | printf("%-40s %4d %16.1f ns/op\n", name_buf, iter, diff * 1000.0 / (double) iter); \ 107 | } \ 108 | } while (0) 109 | 110 | #define TEST_SORT_H(name) do { \ 111 | capitalize(#name, capital_word); \ 112 | for (test = 0; test < SIZES; test++) { \ 113 | int64_t size = 
sizes[test]; \ 114 | int64_t *dst = (int64_t *) malloc(sizeof(int64_t) * size); \ 115 | diff = 0; \ 116 | iter = 0; \ 117 | while (1) { \ 118 | fill_random(dst, size); \ 119 | usec1 = utime(); \ 120 | sorter_ ## name (dst, size); \ 121 | usec2 = utime(); \ 122 | diff += usec2 - usec1; \ 123 | iter++; \ 124 | if (diff >= 1000000.0) { \ 125 | break; \ 126 | } \ 127 | } \ 128 | free(dst); \ 129 | sprintf(name_buf, "%s %lld %s", capital_word, size, platform); \ 130 | printf("%-40s %4d %16.1f ns/op\n", name_buf, iter, diff * 1000.0 / (double) iter); \ 131 | } \ 132 | } while (0) 133 | 134 | 135 | int main(void) { 136 | int test, iter; 137 | double usec1, usec2, diff; 138 | char capital_word[128]; 139 | char platform[128]; 140 | char name_buf[128]; 141 | \ 142 | platform_name(platform); 143 | TEST_STDLIB(qsort); 144 | #if !defined(__linux__) && !defined(__CYGWIN__) 145 | TEST_STDLIB(heapsort); 146 | TEST_STDLIB(mergesort); 147 | #endif 148 | TEST_SORT_H(binary_insertion_sort); 149 | TEST_SORT_H(bitonic_sort); 150 | TEST_SORT_H(quick_sort); 151 | TEST_SORT_H(merge_sort); 152 | TEST_SORT_H(heap_sort); 153 | TEST_SORT_H(shell_sort); 154 | TEST_SORT_H(tim_sort); 155 | TEST_SORT_H(merge_sort_in_place); 156 | #ifdef SET_SORT_EXTRA 157 | TEST_SORT_H(grail_sort); 158 | TEST_SORT_H(sqrt_sort); 159 | TEST_SORT_H(rec_stable_sort); 160 | TEST_SORT_H(grail_sort_dyn_buffer); 161 | #endif 162 | return 0; 163 | } 164 | -------------------------------------------------------------------------------- /demo.c: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2010-2014 Christopher Swenson. */ 2 | /* Copyright (c) 2012 Google Inc. All Rights Reserved. */ 3 | 4 | #define _XOPEN_SOURCE 5 | #include 6 | 7 | #define SORT_NAME sorter 8 | #define SORT_TYPE int64_t 9 | /* You can redefine the comparison operator. 10 | The default is 11 | #define SORT_CMP(x, y) ((x) < (y) ? -1 : ((x) == (y) ? 
0 : 1)) 12 | but the one below is often faster for integer types. 13 | */ 14 | #define SORT_CMP(x, y) (x - y) 15 | #define MAX(x,y) (((x) > (y) ? (x) : (y))) 16 | #define MIN(x,y) (((x) < (y) ? (x) : (y))) 17 | #define SORT_CSWAP(x, y) {SORT_TYPE _sort_swap_temp = MAX((x), (y)); (x) = MIN((x),(y)); (y) = _sort_swap_temp;} 18 | #ifdef SET_SORT_EXTRA 19 | #define SORT_EXTRA 20 | #endif 21 | #include "sort.h" 22 | 23 | /* 24 | We now have the following functions defined 25 | * sorter_shell_sort 26 | * sorter_binary_insertion_sort 27 | * sorter_heap_sort 28 | * sorter_quick_sort 29 | * sorter_merge_sort 30 | * sorter_selection_sort 31 | * sorter_tim_sort 32 | 33 | Each takes two arguments: int64_t *array, size_t size 34 | */ 35 | 36 | 37 | /* Used to control the demo */ 38 | #define SEED 123 39 | #define SIZE 10000 40 | #define RUNS 1 41 | 42 | /* helper functions */ 43 | void verify(int64_t *dst, const int size) { 44 | int i; 45 | 46 | for (i = 1; i < size; i++) { 47 | if (dst[i - 1] > dst[i]) { 48 | printf("Verify failed! at %d\n", i); 49 | /* 50 | for (i = i - 2; i < SIZE; i++) { 51 | printf(" %lld", (long long) dst[i]); 52 | } 53 | */ 54 | printf("\n"); 55 | break; 56 | } 57 | } 58 | } 59 | 60 | static __inline double utime(void) { 61 | struct timeval t; 62 | gettimeofday(&t, NULL); 63 | return (1000000.0 * t.tv_sec + t.tv_usec); 64 | } 65 | 66 | static void fill(int64_t *arr, const int size) { 67 | int i; 68 | 69 | for (i = 0; i < size; i++) { 70 | arr[i] = lrand48(); 71 | } 72 | } 73 | 74 | /* used for stdlib */ 75 | static __inline int simple_cmp(const void *a, const void *b) { 76 | const int64_t da = *((const int64_t *) a); 77 | const int64_t db = *((const int64_t *) b); 78 | return (da < db) ? -1 : (da == db) ? 
0 : 1; 79 | } 80 | 81 | 82 | void run_tests(void) { 83 | int i; 84 | int64_t arr[SIZE]; 85 | int64_t dst[SIZE]; 86 | double start_time; 87 | double end_time; 88 | double total_time; 89 | printf("Running tests\n"); 90 | srand48(SEED); 91 | total_time = 0.0; 92 | 93 | for (i = 0; i < RUNS; i++) { 94 | fill(arr, SIZE); 95 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 96 | start_time = utime(); 97 | qsort(dst, SIZE, sizeof(int64_t), simple_cmp); 98 | end_time = utime(); 99 | total_time += end_time - start_time; 100 | verify(dst, SIZE); 101 | } 102 | 103 | printf("stdlib qsort time: %10.2f us per iteration\n", total_time / RUNS); 104 | #if !defined(__linux__) && !defined(__CYGWIN__) 105 | srand48(SEED); 106 | total_time = 0.0; 107 | 108 | for (i = 0; i < RUNS; i++) { 109 | fill(arr, SIZE); 110 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 111 | start_time = utime(); 112 | heapsort(dst, SIZE, sizeof(int64_t), simple_cmp); 113 | end_time = utime(); 114 | total_time += end_time - start_time; 115 | verify(dst, SIZE); 116 | } 117 | 118 | printf("stdlib heapsort time: %10.2f us per iteration\n", total_time / RUNS); 119 | srand48(SEED); 120 | total_time = 0.0; 121 | 122 | for (i = 0; i < RUNS; i++) { 123 | fill(arr, SIZE); 124 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 125 | start_time = utime(); 126 | mergesort(dst, SIZE, sizeof(int64_t), simple_cmp); 127 | end_time = utime(); 128 | total_time += end_time - start_time; 129 | verify(dst, SIZE); 130 | } 131 | 132 | printf("stdlib mergesort time: %10.2f us per iteration\n", total_time / RUNS); 133 | #endif 134 | srand48(SEED); 135 | total_time = 0.0; 136 | 137 | for (i = 0; i < RUNS; i++) { 138 | fill(arr, SIZE); 139 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 140 | start_time = utime(); 141 | sorter_quick_sort(dst, SIZE); 142 | end_time = utime(); 143 | total_time += end_time - start_time; 144 | verify(dst, SIZE); 145 | } 146 | 147 | printf("quick sort time: %10.2f us per iteration\n", total_time / RUNS); 148 | srand48(SEED); 149 
| total_time = 0.0; 150 | #ifdef SET_SORT_EXTRA 151 | 152 | for (i = 0; i < RUNS; i++) { 153 | fill(arr, SIZE); 154 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 155 | start_time = utime(); 156 | sorter_selection_sort(dst, SIZE); 157 | end_time = utime(); 158 | total_time += end_time - start_time; 159 | verify(dst, SIZE); 160 | } 161 | 162 | printf("selection sort time: %10.2f us per iteration\n", total_time / RUNS); 163 | srand48(SEED); 164 | total_time = 0.0; 165 | 166 | for (i = 0; i < RUNS; i++) { 167 | fill(arr, SIZE); 168 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 169 | start_time = utime(); 170 | sorter_bubble_sort(dst, SIZE); 171 | end_time = utime(); 172 | total_time += end_time - start_time; 173 | verify(dst, SIZE); 174 | } 175 | 176 | printf("bubble sort time: %10.2f us per iteration\n", total_time / RUNS); 177 | srand48(SEED); 178 | total_time = 0.0; 179 | #endif 180 | 181 | for (i = 0; i < RUNS; i++) { 182 | fill(arr, SIZE); 183 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 184 | start_time = utime(); 185 | sorter_bitonic_sort(dst, SIZE); 186 | end_time = utime(); 187 | total_time += end_time - start_time; 188 | verify(dst, SIZE); 189 | } 190 | 191 | printf("bitonic sort time: %10.2f us per iteration\n", total_time / RUNS); 192 | srand48(SEED); 193 | total_time = 0.0; 194 | 195 | for (i = 0; i < RUNS; i++) { 196 | fill(arr, SIZE); 197 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 198 | start_time = utime(); 199 | sorter_merge_sort(dst, SIZE); 200 | end_time = utime(); 201 | total_time += end_time - start_time; 202 | verify(dst, SIZE); 203 | } 204 | 205 | printf("merge sort time: %10.2f us per iteration\n", total_time / RUNS); 206 | srand48(SEED); 207 | total_time = 0.0; 208 | 209 | for (i = 0; i < RUNS; i++) { 210 | fill(arr, SIZE); 211 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 212 | start_time = utime(); 213 | sorter_binary_insertion_sort(dst, SIZE); 214 | end_time = utime(); 215 | total_time += end_time - start_time; 216 | verify(dst, SIZE); 217 | } 218 
| 219 | printf("binary insertion sort time: %10.2f us per iteration\n", total_time / RUNS); 220 | srand48(SEED); 221 | total_time = 0.0; 222 | 223 | for (i = 0; i < RUNS; i++) { 224 | fill(arr, SIZE); 225 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 226 | start_time = utime(); 227 | sorter_heap_sort(dst, SIZE); 228 | end_time = utime(); 229 | total_time += end_time - start_time; 230 | verify(dst, SIZE); 231 | } 232 | 233 | printf("heap sort time: %10.2f us per iteration\n", total_time / RUNS); 234 | srand48(SEED); 235 | total_time = 0.0; 236 | 237 | for (i = 0; i < RUNS; i++) { 238 | fill(arr, SIZE); 239 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 240 | start_time = utime(); 241 | sorter_shell_sort(dst, SIZE); 242 | end_time = utime(); 243 | total_time += end_time - start_time; 244 | verify(dst, SIZE); 245 | } 246 | 247 | printf("shell sort time: %10.2f us per iteration\n", total_time / RUNS); 248 | srand48(SEED); 249 | total_time = 0.0; 250 | 251 | for (i = 0; i < RUNS; i++) { 252 | fill(arr, SIZE); 253 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 254 | start_time = utime(); 255 | sorter_tim_sort(dst, SIZE); 256 | end_time = utime(); 257 | total_time += end_time - start_time; 258 | verify(dst, SIZE); 259 | } 260 | 261 | printf("tim sort time: %10.2f us per iteration\n", total_time / RUNS); 262 | srand48(SEED); 263 | total_time = 0.0; 264 | 265 | for (i = 0; i < RUNS; i++) { 266 | fill(arr, SIZE); 267 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 268 | start_time = utime(); 269 | sorter_merge_sort_in_place(dst, SIZE); 270 | end_time = utime(); 271 | total_time += end_time - start_time; 272 | verify(dst, SIZE); 273 | } 274 | 275 | printf("in-place merge sort time: %10.2f us per iteration\n", total_time / RUNS); 276 | srand48(SEED); 277 | total_time = 0.0; 278 | #ifdef SET_SORT_EXTRA 279 | 280 | for (i = 0; i < RUNS; i++) { 281 | fill(arr, SIZE); 282 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 283 | start_time = utime(); 284 | sorter_grail_sort(dst, SIZE); 285 | end_time 
= utime(); 286 | total_time += end_time - start_time; 287 | verify(dst, SIZE); 288 | } 289 | 290 | printf("grail sort time: %10.2f us per iteration\n", total_time / RUNS); 291 | srand48(SEED); 292 | total_time = 0.0; 293 | 294 | for (i = 0; i < RUNS; i++) { 295 | fill(arr, SIZE); 296 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 297 | start_time = utime(); 298 | sorter_sqrt_sort(dst, SIZE); 299 | end_time = utime(); 300 | total_time += end_time - start_time; 301 | verify(dst, SIZE); 302 | } 303 | 304 | printf("sqrt sort time: %10.2f us per iteration\n", total_time / RUNS); 305 | srand48(SEED); 306 | total_time = 0.0; 307 | 308 | for (i = 0; i < RUNS; i++) { 309 | fill(arr, SIZE); 310 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 311 | start_time = utime(); 312 | sorter_rec_stable_sort(dst, SIZE); 313 | end_time = utime(); 314 | total_time += end_time - start_time; 315 | verify(dst, SIZE); 316 | } 317 | 318 | printf("rec stable sort sort time: %10.2f us per iteration\n", total_time / RUNS); 319 | srand48(SEED); 320 | total_time = 0.0; 321 | 322 | for (i = 0; i < RUNS; i++) { 323 | fill(arr, SIZE); 324 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 325 | start_time = utime(); 326 | sorter_grail_sort_dyn_buffer(dst, SIZE); 327 | end_time = utime(); 328 | total_time += end_time - start_time; 329 | verify(dst, SIZE); 330 | } 331 | 332 | printf("grail sort dyn buffer sort time: %10.2f us per iteration\n", total_time / RUNS); 333 | #endif 334 | } 335 | 336 | int main(void) { 337 | run_tests(); 338 | return 0; 339 | } 340 | -------------------------------------------------------------------------------- /doc/timsort.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/swenson/sort/24f5b8b13810ad130109c7b56daf8e99ab0fe1b8/doc/timsort.txt -------------------------------------------------------------------------------- /generate_bitonic_sort.py: 
-------------------------------------------------------------------------------- 1 | import requests 2 | 3 | max_small_sort = 16 4 | small_sort_only = True 5 | 6 | def generate_small_sort(n, rev=False): 7 | s = "http://jgamble.ripco.net/cgi-bin/nw.cgi?inputs=%d&algorithm=best&output=svg" % n 8 | ret = requests.get(s) 9 | content = ret.text 10 | lines = [l for l in content.split('\n') if l.startswith('[[')] 11 | swps = "" 12 | for _l in lines: 13 | l = _l[1:-1] 14 | ll = l.split(']') 15 | for _t in ll: 16 | if '[' not in _t: 17 | continue 18 | t = _t.split('[')[-1] 19 | i, j = t.split(',') 20 | if not rev: 21 | swps += "\tSORT_CSWAP(dst[%s], dst[%s]);\n" % (i, j) 22 | else: 23 | swps += "\tSORT_CSWAP(dst[%s], dst[%s]);\n" % (j, i) 24 | swps += '\n' 25 | if not rev: 26 | f = """ 27 | #define BITONIC_SORT_%d SORT_MAKE_STR(bitonic_sort_%d) 28 | static __inline void BITONIC_SORT_%d(SORT_TYPE *dst) { 29 | %s} 30 | """ % (n, n, n, swps[:-1]) 31 | else: 32 | f = """ 33 | #define BITONIC_SORT_REVERSE_%d SORT_MAKE_STR(bitonic_sort_reverse_%d) 34 | static __inline void BITONIC_SORT_REVERSE_%d(SORT_TYPE *dst) { 35 | %s} 36 | """ % (n, n, n, swps[:-1]) 37 | return f 38 | 39 | 40 | bitonic_sort_str = """ 41 | #ifndef SORT_CSWAP 42 | #define SORT_CSWAP(x, y) { if(SORT_CMP((x),(y)) > 0) {SORT_SWAP((x),(y));}} 43 | #endif 44 | """ 45 | bitonic_sort_str += "\n".join(generate_small_sort(i) for i in range(2,max_small_sort+1)) 46 | 47 | if small_sort_only: 48 | bitonic_sort_case = "\n".join(" case %d:\n BITONIC_SORT_%d(dst);\n break;" % (n, n) for n in range(2, max_small_sort+1)) 49 | bitonic_sort_str += """ 50 | void BITONIC_SORT(SORT_TYPE *dst, const size_t size) { 51 | switch(size) { 52 | case 0: 53 | case 1: 54 | break; 55 | %s 56 | default: 57 | BINARY_INSERTION_SORT(dst, size); 58 | } 59 | } 60 | """ % (bitonic_sort_case) 61 | print(bitonic_sort_str) 62 | exit() 63 | 64 | 65 | 66 | bitonic_sort_str += "\n".join(generate_small_sort(i, rev=True) for i in range(2,max_small_sort+1)) 
67 | 68 | bitonic_sort_case = "\n".join(" case %d:\n BITONIC_SORT_%d(dst);\n break;" % (n, n) for n in range(2, max_small_sort+1)) 69 | bitonic_sort_case_rev = "\n".join(" case %d:\n BITONIC_SORT_REVERSE_%d(dst);\n break;" % (n, n) for n in range(2, max_small_sort+1)) 70 | 71 | bitonic_sort_str += """ 72 | #define BITONIC_SORT SORT_MAKE_STR(bitonic_sort) 73 | void BITONIC_SORT(SORT_TYPE *dst, const size_t size); 74 | #define BITONIC_SORT_REVERSE SORT_MAKE_STR(bitonic_sort_reverse) 75 | void BITONIC_SORT_REVERSE(SORT_TYPE *dst, const size_t size); 76 | """ 77 | 78 | bitonic_sort_str += """ 79 | #define BITONIC_MERGE SORT_MAKE_STR(bitonic_merge) 80 | void BITONIC_MERGE(SORT_TYPE *dst, const size_t size) { 81 | size_t m, i, j; 82 | if (size <= 1) { 83 | return; 84 | } 85 | m = 1ULL<<(63 - CLZ(size-1)); 86 | j = m; 87 | for (i = 0; i < size - m; ++i, ++j) { 88 | SORT_CSWAP(dst[i], dst[j]); 89 | } 90 | BITONIC_MERGE(dst, m); 91 | BITONIC_MERGE(dst + m, size - m); 92 | } 93 | 94 | #define BITONIC_MERGE_REVERSE SORT_MAKE_STR(bitonic_merge_reverse) 95 | void BITONIC_MERGE_REVERSE(SORT_TYPE *dst, const size_t size) { 96 | size_t m, i, j; 97 | if (size <= 1) { 98 | return; 99 | } 100 | m = 1ULL<<(63 - CLZ(size-1)); 101 | j = m; 102 | for (i = 0; i < size - m; ++i, ++j) { 103 | SORT_CSWAP(dst[j], dst[i]); 104 | } 105 | BITONIC_MERGE_REVERSE(dst, m); 106 | BITONIC_MERGE_REVERSE(dst + m, size - m); 107 | } 108 | 109 | """ 110 | 111 | bitonic_sort_str += """ 112 | void BITONIC_SORT(SORT_TYPE *dst, const size_t size) { 113 | switch(size) { 114 | case 0: 115 | case 1: 116 | break; 117 | %s 118 | default: 119 | /*printf("Bitonic sort size = %%ld", size);*/ 120 | BITONIC_SORT_REVERSE(dst, (size>>1)); 121 | BITONIC_SORT(dst + (size>>1), size - (size>>1)); 122 | BITONIC_MERGE(dst, size); 123 | } 124 | } 125 | """ % (bitonic_sort_case) 126 | 127 | bitonic_sort_str += """ 128 | void BITONIC_SORT_REVERSE(SORT_TYPE *dst, const size_t size) { 129 | switch(size) { 130 | case 0: 131 | case 
1: 132 | break; 133 | %s 134 | default: 135 | /*printf("Bitonic sort reverse size = %%ld", size);*/ 136 | BITONIC_SORT(dst, (size>>1)); 137 | BITONIC_SORT_REVERSE(dst + (size>>1), size - (size>>1)); 138 | BITONIC_MERGE_REVERSE(dst, size); 139 | } 140 | } 141 | """ % (bitonic_sort_case_rev) 142 | 143 | import os 144 | 145 | with open('bitonic_sort.h', 'w') as F: 146 | F.write(bitonic_sort_str) 147 | 148 | os.system('astyle --options=astyle.options bitonic_sort.h') -------------------------------------------------------------------------------- /multidemo.c: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2010-2014 Christopher Swenson. */ 2 | /* Copyright (c) 2012 Google Inc. All Rights Reserved. */ 3 | 4 | #define _XOPEN_SOURCE 5 | #include <sys/time.h> 6 | 7 | /* sort #1 */ 8 | 9 | #define SORT_NAME sorter 10 | #define SORT_TYPE int64_t 11 | /* You can redefine the comparison operator. 12 | The default is 13 | #define SORT_CMP(x, y) ((x) < (y) ? -1 : ((y) < (x) ? 1 : 0)) 14 | but the one below is often faster for integer types (valid only when x - y cannot overflow). 15 | */ 16 | #define SORT_CMP(x, y) (x - y) 17 | #ifdef SET_SORT_EXTRA 18 | #define SORT_EXTRA 19 | #endif 20 | #include "sort.h" 21 | 22 | /* 23 | We now have the following functions defined 24 | * sorter_shell_sort 25 | * sorter_binary_insertion_sort 26 | * sorter_heap_sort 27 | * sorter_quick_sort 28 | * sorter_merge_sort 29 | * sorter_selection_sort 30 | * sorter_tim_sort 31 | 32 | Each takes two arguments: int64_t *array, size_t size 33 | 34 | */ 35 | 36 | /* sort #2 */ 37 | 38 | #define SORT_NAME sorter2 39 | #define SORT_TYPE int64_t 40 | /* You can redefine the comparison operator. 41 | The default sorts in ascending order; an explicit descending comparison would be 42 | #define SORT_CMP(x, y) ((x) > (y) ? -1 : ((x) == (y) ? 0 : 1)) 43 | but the one below is often faster for integer types (valid only when y - x cannot overflow).
44 | */ 45 | #define SORT_CMP(x, y) (y - x) 46 | #ifdef SET_SORT_EXTRA 47 | #define SORT_EXTRA 48 | #endif 49 | #include "sort.h" 50 | 51 | /* 52 | We now have the following functions defined 53 | * sorter2_shell_sort 54 | * sorter2_binary_insertion_sort 55 | * sorter2_heap_sort 56 | * sorter2_quick_sort 57 | * sorter2_merge_sort 58 | * sorter2_selection_sort 59 | * sorter2_tim_sort 60 | 61 | Each takes two arguments: int64_t *array, size_t size 62 | 63 | */ 64 | 65 | 66 | /* Used to control the demo */ 67 | #define SEED 123 68 | #define SIZE 10000 69 | #define RUNS 1 70 | 71 | /* helper functions: on failure, print the array starting up to two elements before the bad index */ 72 | void verify(int64_t *dst, const int size) { 73 | int i; 74 | 75 | for (i = 1; i < size; i++) { 76 | if (dst[i - 1] > dst[i]) { 77 | printf("Verify failed! at %d\n", i); 78 | 79 | for (i = (i > 1 ? i - 2 : 0); i < size; i++) { 80 | printf(" %lld", (long long int)dst[i]); 81 | } 82 | 83 | printf("\n"); 84 | break; 85 | } 86 | } 87 | } 88 | 89 | void verify2(int64_t *dst, const int size) { 90 | int i; 91 | 92 | for (i = 1; i < size; i++) { 93 | if (dst[i - 1] < dst[i]) { 94 | printf("Verify failed! at %d\n", i); 95 | 96 | for (i = (i > 1 ? i - 2 : 0); i < size; i++) { 97 | printf(" %lld", (long long int)dst[i]); 98 | } 99 | 100 | printf("\n"); 101 | break; 102 | } 103 | } 104 | } 105 | 106 | static __inline double utime(void) { 107 | struct timeval t; 108 | gettimeofday(&t, NULL); 109 | return (1000000.0 * t.tv_sec + t.tv_usec); 110 | } 111 | 112 | static void fill(int64_t *arr, const int size) { 113 | int i; 114 | 115 | for (i = 0; i < size; i++) { 116 | arr[i] = lrand48(); 117 | } 118 | } 119 | 120 | /* used for stdlib */ 121 | static __inline int simple_cmp(const void *a, const void *b) { 122 | const int64_t da = *((const int64_t *) a); 123 | const int64_t db = *((const int64_t *) b); 124 | return (da < db) ? -1 : (da == db) ? 
0 : 1; 125 | } 126 | 127 | static __inline int simple_cmp2(const void *a, const void *b) { 128 | const int64_t da = *((const int64_t *) a); 129 | const int64_t db = *((const int64_t *) b); 130 | return (da > db) ? -1 : (da == db) ? 0 : 1; 131 | } 132 | 133 | void run_tests(void) { 134 | int i; 135 | int64_t arr[SIZE]; 136 | int64_t dst[SIZE]; 137 | double start_time; 138 | double end_time; 139 | double total_time; 140 | printf("Running tests\n"); 141 | srand48(SEED); 142 | total_time = 0.0; 143 | 144 | for (i = 0; i < RUNS; i++) { 145 | fill(arr, SIZE); 146 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 147 | start_time = utime(); 148 | qsort(dst, SIZE, sizeof(int64_t), simple_cmp); 149 | end_time = utime(); 150 | total_time += end_time - start_time; 151 | verify(dst, SIZE); 152 | } 153 | 154 | printf("stdlib qsort time: %10.2f us per iteration\n", total_time / RUNS); 155 | #if !defined(__linux__) && !defined(__CYGWIN__) 156 | srand48(SEED); 157 | total_time = 0.0; 158 | 159 | for (i = 0; i < RUNS; i++) { 160 | fill(arr, SIZE); 161 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 162 | start_time = utime(); 163 | heapsort(dst, SIZE, sizeof(int64_t), simple_cmp); 164 | end_time = utime(); 165 | total_time += end_time - start_time; 166 | verify(dst, SIZE); 167 | } 168 | 169 | printf("stdlib heapsort time: %10.2f us per iteration\n", total_time / RUNS); 170 | srand48(SEED); 171 | total_time = 0.0; 172 | 173 | for (i = 0; i < RUNS; i++) { 174 | fill(arr, SIZE); 175 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 176 | start_time = utime(); 177 | mergesort(dst, SIZE, sizeof(int64_t), simple_cmp); 178 | end_time = utime(); 179 | total_time += end_time - start_time; 180 | verify(dst, SIZE); 181 | } 182 | 183 | printf("stdlib mergesort time: %10.2f us per iteration\n", total_time / RUNS); 184 | #endif 185 | srand48(SEED); 186 | total_time = 0.0; 187 | 188 | for (i = 0; i < RUNS; i++) { 189 | fill(arr, SIZE); 190 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 191 | start_time = utime(); 
192 | sorter_quick_sort(dst, SIZE); 193 | end_time = utime(); 194 | total_time += end_time - start_time; 195 | verify(dst, SIZE); 196 | } 197 | 198 | printf("quick sort time: %10.2f us per iteration\n", total_time / RUNS); 199 | srand48(SEED); 200 | total_time = 0.0; 201 | #ifdef SET_SORT_EXTRA 202 | 203 | for (i = 0; i < RUNS; i++) { 204 | fill(arr, SIZE); 205 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 206 | start_time = utime(); 207 | sorter_selection_sort(dst, SIZE); 208 | end_time = utime(); 209 | total_time += end_time - start_time; 210 | verify(dst, SIZE); 211 | } 212 | 213 | printf("selection sort time: %10.2f us per iteration\n", total_time / RUNS); 214 | srand48(SEED); 215 | total_time = 0.0; 216 | 217 | for (i = 0; i < RUNS; i++) { 218 | fill(arr, SIZE); 219 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 220 | start_time = utime(); 221 | sorter_bubble_sort(dst, SIZE); 222 | end_time = utime(); 223 | total_time += end_time - start_time; 224 | verify(dst, SIZE); 225 | } 226 | 227 | printf("bubble sort time: %10.2f us per iteration\n", total_time / RUNS); 228 | srand48(SEED); 229 | total_time = 0.0; 230 | #endif 231 | 232 | for (i = 0; i < RUNS; i++) { 233 | fill(arr, SIZE); 234 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 235 | start_time = utime(); 236 | sorter_merge_sort(dst, SIZE); 237 | end_time = utime(); 238 | total_time += end_time - start_time; 239 | verify(dst, SIZE); 240 | } 241 | 242 | printf("merge sort time: %10.2f us per iteration\n", total_time / RUNS); 243 | srand48(SEED); 244 | total_time = 0.0; 245 | 246 | for (i = 0; i < RUNS; i++) { 247 | fill(arr, SIZE); 248 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 249 | start_time = utime(); 250 | sorter_binary_insertion_sort(dst, SIZE); 251 | end_time = utime(); 252 | total_time += end_time - start_time; 253 | verify(dst, SIZE); 254 | } 255 | 256 | printf("binary insertion sort time: %10.2f us per iteration\n", total_time / RUNS); 257 | srand48(SEED); 258 | total_time = 0.0; 259 | 260 | for (i = 0; i 
< RUNS; i++) { 261 | fill(arr, SIZE); 262 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 263 | start_time = utime(); 264 | sorter_heap_sort(dst, SIZE); 265 | end_time = utime(); 266 | total_time += end_time - start_time; 267 | verify(dst, SIZE); 268 | } 269 | 270 | printf("heap sort time: %10.2f us per iteration\n", total_time / RUNS); 271 | srand48(SEED); 272 | total_time = 0.0; 273 | 274 | for (i = 0; i < RUNS; i++) { 275 | fill(arr, SIZE); 276 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 277 | start_time = utime(); 278 | sorter_shell_sort(dst, SIZE); 279 | end_time = utime(); 280 | total_time += end_time - start_time; 281 | verify(dst, SIZE); 282 | } 283 | 284 | printf("shell sort time: %10.2f us per iteration\n", total_time / RUNS); 285 | srand48(SEED); 286 | total_time = 0.0; 287 | 288 | for (i = 0; i < RUNS; i++) { 289 | fill(arr, SIZE); 290 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 291 | start_time = utime(); 292 | sorter_tim_sort(dst, SIZE); 293 | end_time = utime(); 294 | total_time += end_time - start_time; 295 | verify(dst, SIZE); 296 | } 297 | 298 | printf("tim sort time: %10.2f us per iteration\n", total_time / RUNS); 299 | } 300 | 301 | void run_tests2(void) { 302 | int i; 303 | int64_t arr[SIZE]; 304 | int64_t dst[SIZE]; 305 | double start_time; 306 | double end_time; 307 | double total_time; 308 | printf("Running tests - 2\n"); 309 | srand48(SEED); 310 | total_time = 0.0; 311 | 312 | for (i = 0; i < RUNS; i++) { 313 | fill(arr, SIZE); 314 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 315 | start_time = utime(); 316 | qsort(dst, SIZE, sizeof(int64_t), simple_cmp2); 317 | end_time = utime(); 318 | total_time += end_time - start_time; 319 | verify2(dst, SIZE); 320 | } 321 | 322 | printf("stdlib qsort time: %10.2f us per iteration\n", total_time / RUNS); 323 | #if !defined(__linux__) && !defined(__CYGWIN__) 324 | srand48(SEED); 325 | total_time = 0.0; 326 | 327 | for (i = 0; i < RUNS; i++) { 328 | fill(arr, SIZE); 329 | memcpy(dst, arr, sizeof(int64_t) * 
SIZE); 330 | start_time = utime(); 331 | heapsort(dst, SIZE, sizeof(int64_t), simple_cmp2); 332 | end_time = utime(); 333 | total_time += end_time - start_time; 334 | verify2(dst, SIZE); 335 | } 336 | 337 | printf("stdlib heapsort time: %10.2f us per iteration\n", total_time / RUNS); 338 | srand48(SEED); 339 | total_time = 0.0; 340 | 341 | for (i = 0; i < RUNS; i++) { 342 | fill(arr, SIZE); 343 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 344 | start_time = utime(); 345 | mergesort(dst, SIZE, sizeof(int64_t), simple_cmp2); 346 | end_time = utime(); 347 | total_time += end_time - start_time; 348 | verify2(dst, SIZE); 349 | } 350 | 351 | printf("stdlib mergesort time: %10.2f us per iteration\n", total_time / RUNS); 352 | #endif 353 | srand48(SEED); 354 | total_time = 0.0; 355 | 356 | for (i = 0; i < RUNS; i++) { 357 | fill(arr, SIZE); 358 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 359 | start_time = utime(); 360 | sorter2_quick_sort(dst, SIZE); 361 | end_time = utime(); 362 | total_time += end_time - start_time; 363 | verify2(dst, SIZE); 364 | } 365 | 366 | printf("quick sort time: %10.2f us per iteration\n", total_time / RUNS); 367 | srand48(SEED); 368 | total_time = 0.0; 369 | #ifdef SET_SORT_EXTRA 370 | 371 | for (i = 0; i < RUNS; i++) { 372 | fill(arr, SIZE); 373 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 374 | start_time = utime(); 375 | sorter2_selection_sort(dst, SIZE); 376 | end_time = utime(); 377 | total_time += end_time - start_time; 378 | verify2(dst, SIZE); 379 | } 380 | 381 | printf("selection sort time: %10.2f us per iteration\n", total_time / RUNS); 382 | srand48(SEED); 383 | total_time = 0.0; 384 | 385 | for (i = 0; i < RUNS; i++) { 386 | fill(arr, SIZE); 387 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 388 | start_time = utime(); 389 | sorter2_bubble_sort(dst, SIZE); 390 | end_time = utime(); 391 | total_time += end_time - start_time; 392 | verify2(dst, SIZE); 393 | } 394 | 395 | printf("bubble sort time: %10.2f us per iteration\n", total_time / 
RUNS); 396 | srand48(SEED); 397 | total_time = 0.0; 398 | #endif 399 | 400 | for (i = 0; i < RUNS; i++) { 401 | fill(arr, SIZE); 402 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 403 | start_time = utime(); 404 | sorter2_merge_sort(dst, SIZE); 405 | end_time = utime(); 406 | total_time += end_time - start_time; 407 | verify2(dst, SIZE); 408 | } 409 | 410 | printf("merge sort time: %10.2f us per iteration\n", total_time / RUNS); 411 | srand48(SEED); 412 | total_time = 0.0; 413 | 414 | for (i = 0; i < RUNS; i++) { 415 | fill(arr, SIZE); 416 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 417 | start_time = utime(); 418 | sorter2_binary_insertion_sort(dst, SIZE); 419 | end_time = utime(); 420 | total_time += end_time - start_time; 421 | verify2(dst, SIZE); 422 | } 423 | 424 | printf("binary insertion sort time: %10.2f us per iteration\n", total_time / RUNS); 425 | srand48(SEED); 426 | total_time = 0.0; 427 | 428 | for (i = 0; i < RUNS; i++) { 429 | fill(arr, SIZE); 430 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 431 | start_time = utime(); 432 | sorter2_heap_sort(dst, SIZE); 433 | end_time = utime(); 434 | total_time += end_time - start_time; 435 | verify2(dst, SIZE); 436 | } 437 | 438 | printf("heap sort time: %10.2f us per iteration\n", total_time / RUNS); 439 | srand48(SEED); 440 | total_time = 0.0; 441 | 442 | for (i = 0; i < RUNS; i++) { 443 | fill(arr, SIZE); 444 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 445 | start_time = utime(); 446 | sorter2_shell_sort(dst, SIZE); 447 | end_time = utime(); 448 | total_time += end_time - start_time; 449 | verify2(dst, SIZE); 450 | } 451 | 452 | printf("shell sort time: %10.2f us per iteration\n", total_time / RUNS); 453 | srand48(SEED); 454 | total_time = 0.0; 455 | 456 | for (i = 0; i < RUNS; i++) { 457 | fill(arr, SIZE); 458 | memcpy(dst, arr, sizeof(int64_t) * SIZE); 459 | start_time = utime(); 460 | sorter2_tim_sort(dst, SIZE); 461 | end_time = utime(); 462 | total_time += end_time - start_time; 463 | verify2(dst, SIZE); 464 
| } 465 | 466 | printf("tim sort time: %10.2f us per iteration\n", total_time / RUNS); 467 | } 468 | int main(void) { 469 | run_tests(); 470 | run_tests2(); 471 | return 0; 472 | } 473 | -------------------------------------------------------------------------------- /sort.h: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2010-2024 Christopher Swenson. */ 2 | /* Copyright (c) 2012 Vojtech Fried. */ 3 | /* Copyright (c) 2012 Google Inc. All Rights Reserved. */ 4 | 5 | #include <stdlib.h> 6 | #include <stdio.h> 7 | #include <string.h> 8 | #include <stdint.h> 9 | 10 | #ifndef SORT_NAME 11 | #error "Must declare SORT_NAME" 12 | #endif 13 | 14 | #ifndef SORT_TYPE 15 | #error "Must declare SORT_TYPE" 16 | #endif 17 | 18 | #ifndef SORT_CMP 19 | #define SORT_CMP(x, y) ((x) < (y) ? -1 : ((y) < (x) ? 1 : 0)) 20 | #endif 21 | 22 | #ifndef SORT_DEF 23 | #define SORT_DEF 24 | #else 25 | #ifdef __GNUC__ 26 | #define SORT_GCC_PUSH 27 | #pragma GCC diagnostic push 28 | #pragma GCC diagnostic ignored "-Wunused-function" 29 | #endif 30 | #endif 31 | 32 | #ifdef __cplusplus 33 | #ifndef SORT_SAFE_CPY 34 | #define SORT_SAFE_CPY 0 35 | #endif 36 | #else 37 | #undef SORT_SAFE_CPY 38 | #define SORT_SAFE_CPY 0 39 | #endif 40 | 41 | #ifndef TIM_SORT_STACK_SIZE 42 | #define TIM_SORT_STACK_SIZE 128 43 | #endif 44 | 45 | #ifndef SORT_SWAP 46 | #define SORT_SWAP(x,y) {SORT_TYPE _sort_swap_temp = (x); (x) = (y); (y) = _sort_swap_temp;} 47 | #endif 48 | 49 | /* Common, type-agnostic functions and constants that we don't want to declare twice. */ 50 | #ifndef SORT_COMMON_H 51 | #define SORT_COMMON_H 52 | 53 | #ifndef MAX 54 | #define MAX(x,y) (((x) > (y) ? (x) : (y))) 55 | #endif 56 | 57 | #ifndef MIN 58 | #define MIN(x,y) (((x) < (y) ? 
(x) : (y))) 59 | #endif 60 | 61 | static int compute_minrun(const uint64_t); 62 | 63 | /* From http://oeis.org/classic/A102549 */ 64 | static const uint64_t shell_gaps[48] = {1, 4, 10, 23, 57, 132, 301, 701, 1750, 4376, 10941, 27353, 68383, 170958, 427396, 1068491, 2671228, 6678071, 16695178, 41737946, 104344866, 260862166, 652155416, 1630388541, 4075971353LL, 10189928383LL, 25474820958LL, 63687052396LL, 159217630991LL, 398044077478LL, 995110193696LL, 2487775484241LL, 6219438710603LL, 15548596776508LL, 38871491941271LL, 97178729853178LL, 242946824632946LL, 607367061582366LL, 1518417653955916LL, 3796044134889791LL, 9490110337224478LL, 23725275843061196LL, 59313189607652991LL, 148282974019132478LL, 370707435047831196LL, 926768587619577991LL, 2316921469048944978LL, 5792303672622362446LL}; 65 | 66 | #ifndef CLZ 67 | /* clang-only */ 68 | #ifndef __has_builtin 69 | #define __has_builtin(x) 0 70 | #endif 71 | #if __has_builtin(__builtin_clzll) || (defined(__GNUC__) && ((__GNUC__ == 3 && __GNUC_MINOR__ >= 4) || (__GNUC__ > 3))) 72 | #define CLZ __builtin_clzll 73 | #else 74 | 75 | static int clzll(uint64_t); 76 | 77 | /* adapted from Hacker's Delight */ 78 | static int clzll(uint64_t x) { 79 | int n; 80 | 81 | if (x == 0) { 82 | return 64; 83 | } 84 | 85 | n = 0; 86 | 87 | if (x <= 0x00000000FFFFFFFFL) { 88 | n = n + 32; 89 | x = x << 32; 90 | } 91 | 92 | if (x <= 0x0000FFFFFFFFFFFFL) { 93 | n = n + 16; 94 | x = x << 16; 95 | } 96 | 97 | if (x <= 0x00FFFFFFFFFFFFFFL) { 98 | n = n + 8; 99 | x = x << 8; 100 | } 101 | 102 | if (x <= 0x0FFFFFFFFFFFFFFFL) { 103 | n = n + 4; 104 | x = x << 4; 105 | } 106 | 107 | if (x <= 0x3FFFFFFFFFFFFFFFL) { 108 | n = n + 2; 109 | x = x << 2; 110 | } 111 | 112 | if (x <= 0x7FFFFFFFFFFFFFFFL) { 113 | n = n + 1; 114 | } 115 | 116 | return n; 117 | } 118 | 119 | #define CLZ clzll 120 | #endif 121 | #endif 122 | 123 | static __inline int compute_minrun(const uint64_t size) { 124 | const int top_bit = 64 - CLZ(size); 125 | const int shift = 
MAX(top_bit, 6) - 6; 126 | const int minrun = (int)(size >> shift); 127 | const uint64_t mask = (1ULL << shift) - 1; 128 | 129 | if (mask & size) { 130 | return minrun + 1; 131 | } 132 | 133 | return minrun; 134 | } 135 | 136 | static __inline size_t rbnd(size_t len) { 137 | int k; 138 | 139 | if (len < 16) { 140 | return 2; 141 | } 142 | 143 | k = 62 - CLZ(len); 144 | return 1ULL << ((2 * k) / 3); 145 | } 146 | 147 | #endif /* SORT_COMMON_H */ 148 | 149 | #define SORT_CONCAT(x, y) x ## _ ## y 150 | #define SORT_MAKE_STR1(x, y) SORT_CONCAT(x,y) 151 | #define SORT_MAKE_STR(x) SORT_MAKE_STR1(SORT_NAME,x) 152 | 153 | #ifndef SMALL_SORT_BND 154 | #define SMALL_SORT_BND 16 155 | #endif 156 | #ifndef SMALL_SORT 157 | #define SMALL_SORT BITONIC_SORT 158 | /*#define SMALL_SORT BINARY_INSERTION_SORT*/ 159 | #endif 160 | #ifndef SMALL_STABLE_SORT 161 | #define SMALL_STABLE_SORT BINARY_INSERTION_SORT 162 | #endif 163 | 164 | #define SORT_TYPE_CPY SORT_MAKE_STR(sort_type_cpy) 165 | #define SORT_TYPE_MOVE SORT_MAKE_STR(sort_type_move) 166 | #define SORT_NEW_BUFFER SORT_MAKE_STR(sort_new_buffer) 167 | #define SORT_DELETE_BUFFER SORT_MAKE_STR(sort_delete_buffer) 168 | #define BITONIC_SORT SORT_MAKE_STR(bitonic_sort) 169 | #define BINARY_INSERTION_FIND SORT_MAKE_STR(binary_insertion_find) 170 | #define BINARY_INSERTION_SORT_START SORT_MAKE_STR(binary_insertion_sort_start) 171 | #define BINARY_INSERTION_SORT SORT_MAKE_STR(binary_insertion_sort) 172 | #define REVERSE_ELEMENTS SORT_MAKE_STR(reverse_elements) 173 | #define COUNT_RUN SORT_MAKE_STR(count_run) 174 | #define CHECK_INVARIANT SORT_MAKE_STR(check_invariant) 175 | #define TIM_SORT SORT_MAKE_STR(tim_sort) 176 | #define TIM_SORT_RESIZE SORT_MAKE_STR(tim_sort_resize) 177 | #define TIM_SORT_MERGE SORT_MAKE_STR(tim_sort_merge) 178 | #define TIM_SORT_COLLAPSE SORT_MAKE_STR(tim_sort_collapse) 179 | #define HEAP_SORT SORT_MAKE_STR(heap_sort) 180 | #define MEDIAN SORT_MAKE_STR(median) 181 | #define QUICK_SORT SORT_MAKE_STR(quick_sort) 
182 | #define MERGE_SORT SORT_MAKE_STR(merge_sort) 183 | #define MERGE_SORT_RECURSIVE SORT_MAKE_STR(merge_sort_recursive) 184 | #define MERGE_SORT_IN_PLACE SORT_MAKE_STR(merge_sort_in_place) 185 | #define MERGE_SORT_IN_PLACE_RMERGE SORT_MAKE_STR(merge_sort_in_place_rmerge) 186 | #define MERGE_SORT_IN_PLACE_BACKMERGE SORT_MAKE_STR(merge_sort_in_place_backmerge) 187 | #define MERGE_SORT_IN_PLACE_FRONTMERGE SORT_MAKE_STR(merge_sort_in_place_frontmerge) 188 | #define MERGE_SORT_IN_PLACE_ASWAP SORT_MAKE_STR(merge_sort_in_place_aswap) 189 | #define SHELL_SORT SORT_MAKE_STR(shell_sort) 190 | #define QUICK_SORT_PARTITION SORT_MAKE_STR(quick_sort_partition) 191 | #define QUICK_SORT_RECURSIVE SORT_MAKE_STR(quick_sort_recursive) 192 | #define HEAP_SIFT_DOWN SORT_MAKE_STR(heap_sift_down) 193 | #define HEAPIFY SORT_MAKE_STR(heapify) 194 | #define TIM_SORT_RUN_T SORT_MAKE_STR(tim_sort_run_t) 195 | #define TEMP_STORAGE_T SORT_MAKE_STR(temp_storage_t) 196 | #define PUSH_NEXT SORT_MAKE_STR(push_next) 197 | 198 | #ifndef MAX 199 | #define MAX(x,y) (((x) > (y) ? (x) : (y))) 200 | #endif 201 | #ifndef MIN 202 | #define MIN(x,y) (((x) < (y) ? 
(x) : (y))) 203 | #endif 204 | #ifndef SORT_CSWAP 205 | #define SORT_CSWAP(x, y) { if(SORT_CMP((x),(y)) > 0) {SORT_SWAP((x),(y));}} 206 | #endif 207 | 208 | typedef struct { 209 | size_t start; 210 | size_t length; 211 | } TIM_SORT_RUN_T; 212 | 213 | 214 | SORT_DEF void SHELL_SORT(SORT_TYPE *dst, const size_t size); 215 | SORT_DEF void BINARY_INSERTION_SORT(SORT_TYPE *dst, const size_t size); 216 | SORT_DEF void HEAP_SORT(SORT_TYPE *dst, const size_t size); 217 | SORT_DEF void QUICK_SORT(SORT_TYPE *dst, const size_t size); 218 | SORT_DEF void MERGE_SORT(SORT_TYPE *dst, const size_t size); 219 | SORT_DEF void MERGE_SORT_IN_PLACE(SORT_TYPE *dst, const size_t size); 220 | SORT_DEF void TIM_SORT(SORT_TYPE *dst, const size_t size); 221 | SORT_DEF void BITONIC_SORT(SORT_TYPE *dst, const size_t size); 222 | 223 | /* A full bitonic sort is not implemented here. Since we only want to use 224 | sorting networks for short lists, we use optimal sorting networks for 225 | lists of length <= 16 and fall back to BINARY_INSERTION_SORT for anything 226 | longer than 16. 227 | Optimal sorting networks for short lists.
228 | Taken from https://pages.ripco.net/~jgamble/nw.html */ 229 | #define BITONIC_SORT_2 SORT_MAKE_STR(bitonic_sort_2) 230 | static __inline void BITONIC_SORT_2(SORT_TYPE *dst) { 231 | SORT_CSWAP(dst[0], dst[1]); 232 | } 233 | 234 | 235 | #define BITONIC_SORT_3 SORT_MAKE_STR(bitonic_sort_3) 236 | static __inline void BITONIC_SORT_3(SORT_TYPE *dst) { 237 | SORT_CSWAP(dst[1], dst[2]); 238 | SORT_CSWAP(dst[0], dst[2]); 239 | SORT_CSWAP(dst[0], dst[1]); 240 | } 241 | 242 | 243 | #define BITONIC_SORT_4 SORT_MAKE_STR(bitonic_sort_4) 244 | static __inline void BITONIC_SORT_4(SORT_TYPE *dst) { 245 | SORT_CSWAP(dst[0], dst[1]); 246 | SORT_CSWAP(dst[2], dst[3]); 247 | SORT_CSWAP(dst[0], dst[2]); 248 | SORT_CSWAP(dst[1], dst[3]); 249 | SORT_CSWAP(dst[1], dst[2]); 250 | } 251 | 252 | 253 | #define BITONIC_SORT_5 SORT_MAKE_STR(bitonic_sort_5) 254 | static __inline void BITONIC_SORT_5(SORT_TYPE *dst) { 255 | SORT_CSWAP(dst[0], dst[1]); 256 | SORT_CSWAP(dst[3], dst[4]); 257 | SORT_CSWAP(dst[2], dst[4]); 258 | SORT_CSWAP(dst[2], dst[3]); 259 | SORT_CSWAP(dst[1], dst[4]); 260 | SORT_CSWAP(dst[0], dst[3]); 261 | SORT_CSWAP(dst[0], dst[2]); 262 | SORT_CSWAP(dst[1], dst[3]); 263 | SORT_CSWAP(dst[1], dst[2]); 264 | } 265 | 266 | 267 | #define BITONIC_SORT_6 SORT_MAKE_STR(bitonic_sort_6) 268 | static __inline void BITONIC_SORT_6(SORT_TYPE *dst) { 269 | SORT_CSWAP(dst[1], dst[2]); 270 | SORT_CSWAP(dst[4], dst[5]); 271 | SORT_CSWAP(dst[0], dst[2]); 272 | SORT_CSWAP(dst[3], dst[5]); 273 | SORT_CSWAP(dst[0], dst[1]); 274 | SORT_CSWAP(dst[3], dst[4]); 275 | SORT_CSWAP(dst[2], dst[5]); 276 | SORT_CSWAP(dst[0], dst[3]); 277 | SORT_CSWAP(dst[1], dst[4]); 278 | SORT_CSWAP(dst[2], dst[4]); 279 | SORT_CSWAP(dst[1], dst[3]); 280 | SORT_CSWAP(dst[2], dst[3]); 281 | } 282 | 283 | 284 | #define BITONIC_SORT_7 SORT_MAKE_STR(bitonic_sort_7) 285 | static __inline void BITONIC_SORT_7(SORT_TYPE *dst) { 286 | SORT_CSWAP(dst[1], dst[2]); 287 | SORT_CSWAP(dst[3], dst[4]); 288 | SORT_CSWAP(dst[5], dst[6]); 
289 | SORT_CSWAP(dst[0], dst[2]); 290 | SORT_CSWAP(dst[3], dst[5]); 291 | SORT_CSWAP(dst[4], dst[6]); 292 | SORT_CSWAP(dst[0], dst[1]); 293 | SORT_CSWAP(dst[4], dst[5]); 294 | SORT_CSWAP(dst[2], dst[6]); 295 | SORT_CSWAP(dst[0], dst[4]); 296 | SORT_CSWAP(dst[1], dst[5]); 297 | SORT_CSWAP(dst[0], dst[3]); 298 | SORT_CSWAP(dst[2], dst[5]); 299 | SORT_CSWAP(dst[1], dst[3]); 300 | SORT_CSWAP(dst[2], dst[4]); 301 | SORT_CSWAP(dst[2], dst[3]); 302 | } 303 | 304 | 305 | #define BITONIC_SORT_8 SORT_MAKE_STR(bitonic_sort_8) 306 | static __inline void BITONIC_SORT_8(SORT_TYPE *dst) { 307 | SORT_CSWAP(dst[0], dst[1]); 308 | SORT_CSWAP(dst[2], dst[3]); 309 | SORT_CSWAP(dst[4], dst[5]); 310 | SORT_CSWAP(dst[6], dst[7]); 311 | SORT_CSWAP(dst[0], dst[2]); 312 | SORT_CSWAP(dst[1], dst[3]); 313 | SORT_CSWAP(dst[4], dst[6]); 314 | SORT_CSWAP(dst[5], dst[7]); 315 | SORT_CSWAP(dst[1], dst[2]); 316 | SORT_CSWAP(dst[5], dst[6]); 317 | SORT_CSWAP(dst[0], dst[4]); 318 | SORT_CSWAP(dst[3], dst[7]); 319 | SORT_CSWAP(dst[1], dst[5]); 320 | SORT_CSWAP(dst[2], dst[6]); 321 | SORT_CSWAP(dst[1], dst[4]); 322 | SORT_CSWAP(dst[3], dst[6]); 323 | SORT_CSWAP(dst[2], dst[4]); 324 | SORT_CSWAP(dst[3], dst[5]); 325 | SORT_CSWAP(dst[3], dst[4]); 326 | } 327 | 328 | 329 | #define BITONIC_SORT_9 SORT_MAKE_STR(bitonic_sort_9) 330 | static __inline void BITONIC_SORT_9(SORT_TYPE *dst) { 331 | SORT_CSWAP(dst[0], dst[1]); 332 | SORT_CSWAP(dst[3], dst[4]); 333 | SORT_CSWAP(dst[6], dst[7]); 334 | SORT_CSWAP(dst[1], dst[2]); 335 | SORT_CSWAP(dst[4], dst[5]); 336 | SORT_CSWAP(dst[7], dst[8]); 337 | SORT_CSWAP(dst[0], dst[1]); 338 | SORT_CSWAP(dst[3], dst[4]); 339 | SORT_CSWAP(dst[6], dst[7]); 340 | SORT_CSWAP(dst[2], dst[5]); 341 | SORT_CSWAP(dst[0], dst[3]); 342 | SORT_CSWAP(dst[1], dst[4]); 343 | SORT_CSWAP(dst[5], dst[8]); 344 | SORT_CSWAP(dst[3], dst[6]); 345 | SORT_CSWAP(dst[4], dst[7]); 346 | SORT_CSWAP(dst[2], dst[5]); 347 | SORT_CSWAP(dst[0], dst[3]); 348 | SORT_CSWAP(dst[1], dst[4]); 349 | 
SORT_CSWAP(dst[5], dst[7]); 350 | SORT_CSWAP(dst[2], dst[6]); 351 | SORT_CSWAP(dst[1], dst[3]); 352 | SORT_CSWAP(dst[4], dst[6]); 353 | SORT_CSWAP(dst[2], dst[4]); 354 | SORT_CSWAP(dst[5], dst[6]); 355 | SORT_CSWAP(dst[2], dst[3]); 356 | } 357 | 358 | 359 | #define BITONIC_SORT_10 SORT_MAKE_STR(bitonic_sort_10) 360 | static __inline void BITONIC_SORT_10(SORT_TYPE *dst) { 361 | SORT_CSWAP(dst[4], dst[9]); 362 | SORT_CSWAP(dst[3], dst[8]); 363 | SORT_CSWAP(dst[2], dst[7]); 364 | SORT_CSWAP(dst[1], dst[6]); 365 | SORT_CSWAP(dst[0], dst[5]); 366 | SORT_CSWAP(dst[1], dst[4]); 367 | SORT_CSWAP(dst[6], dst[9]); 368 | SORT_CSWAP(dst[0], dst[3]); 369 | SORT_CSWAP(dst[5], dst[8]); 370 | SORT_CSWAP(dst[0], dst[2]); 371 | SORT_CSWAP(dst[3], dst[6]); 372 | SORT_CSWAP(dst[7], dst[9]); 373 | SORT_CSWAP(dst[0], dst[1]); 374 | SORT_CSWAP(dst[2], dst[4]); 375 | SORT_CSWAP(dst[5], dst[7]); 376 | SORT_CSWAP(dst[8], dst[9]); 377 | SORT_CSWAP(dst[1], dst[2]); 378 | SORT_CSWAP(dst[4], dst[6]); 379 | SORT_CSWAP(dst[7], dst[8]); 380 | SORT_CSWAP(dst[3], dst[5]); 381 | SORT_CSWAP(dst[2], dst[5]); 382 | SORT_CSWAP(dst[6], dst[8]); 383 | SORT_CSWAP(dst[1], dst[3]); 384 | SORT_CSWAP(dst[4], dst[7]); 385 | SORT_CSWAP(dst[2], dst[3]); 386 | SORT_CSWAP(dst[6], dst[7]); 387 | SORT_CSWAP(dst[3], dst[4]); 388 | SORT_CSWAP(dst[5], dst[6]); 389 | SORT_CSWAP(dst[4], dst[5]); 390 | } 391 | 392 | 393 | #define BITONIC_SORT_11 SORT_MAKE_STR(bitonic_sort_11) 394 | static __inline void BITONIC_SORT_11(SORT_TYPE *dst) { 395 | SORT_CSWAP(dst[0], dst[1]); 396 | SORT_CSWAP(dst[2], dst[3]); 397 | SORT_CSWAP(dst[4], dst[5]); 398 | SORT_CSWAP(dst[6], dst[7]); 399 | SORT_CSWAP(dst[8], dst[9]); 400 | SORT_CSWAP(dst[1], dst[3]); 401 | SORT_CSWAP(dst[5], dst[7]); 402 | SORT_CSWAP(dst[0], dst[2]); 403 | SORT_CSWAP(dst[4], dst[6]); 404 | SORT_CSWAP(dst[8], dst[10]); 405 | SORT_CSWAP(dst[1], dst[2]); 406 | SORT_CSWAP(dst[5], dst[6]); 407 | SORT_CSWAP(dst[9], dst[10]); 408 | SORT_CSWAP(dst[0], dst[4]); 409 | 
SORT_CSWAP(dst[3], dst[7]); 410 | SORT_CSWAP(dst[1], dst[5]); 411 | SORT_CSWAP(dst[6], dst[10]); 412 | SORT_CSWAP(dst[4], dst[8]); 413 | SORT_CSWAP(dst[5], dst[9]); 414 | SORT_CSWAP(dst[2], dst[6]); 415 | SORT_CSWAP(dst[0], dst[4]); 416 | SORT_CSWAP(dst[3], dst[8]); 417 | SORT_CSWAP(dst[1], dst[5]); 418 | SORT_CSWAP(dst[6], dst[10]); 419 | SORT_CSWAP(dst[2], dst[3]); 420 | SORT_CSWAP(dst[8], dst[9]); 421 | SORT_CSWAP(dst[1], dst[4]); 422 | SORT_CSWAP(dst[7], dst[10]); 423 | SORT_CSWAP(dst[3], dst[5]); 424 | SORT_CSWAP(dst[6], dst[8]); 425 | SORT_CSWAP(dst[2], dst[4]); 426 | SORT_CSWAP(dst[7], dst[9]); 427 | SORT_CSWAP(dst[5], dst[6]); 428 | SORT_CSWAP(dst[3], dst[4]); 429 | SORT_CSWAP(dst[7], dst[8]); 430 | } 431 | 432 | 433 | #define BITONIC_SORT_12 SORT_MAKE_STR(bitonic_sort_12) 434 | static __inline void BITONIC_SORT_12(SORT_TYPE *dst) { 435 | SORT_CSWAP(dst[0], dst[1]); 436 | SORT_CSWAP(dst[2], dst[3]); 437 | SORT_CSWAP(dst[4], dst[5]); 438 | SORT_CSWAP(dst[6], dst[7]); 439 | SORT_CSWAP(dst[8], dst[9]); 440 | SORT_CSWAP(dst[10], dst[11]); 441 | SORT_CSWAP(dst[1], dst[3]); 442 | SORT_CSWAP(dst[5], dst[7]); 443 | SORT_CSWAP(dst[9], dst[11]); 444 | SORT_CSWAP(dst[0], dst[2]); 445 | SORT_CSWAP(dst[4], dst[6]); 446 | SORT_CSWAP(dst[8], dst[10]); 447 | SORT_CSWAP(dst[1], dst[2]); 448 | SORT_CSWAP(dst[5], dst[6]); 449 | SORT_CSWAP(dst[9], dst[10]); 450 | SORT_CSWAP(dst[0], dst[4]); 451 | SORT_CSWAP(dst[7], dst[11]); 452 | SORT_CSWAP(dst[1], dst[5]); 453 | SORT_CSWAP(dst[6], dst[10]); 454 | SORT_CSWAP(dst[3], dst[7]); 455 | SORT_CSWAP(dst[4], dst[8]); 456 | SORT_CSWAP(dst[5], dst[9]); 457 | SORT_CSWAP(dst[2], dst[6]); 458 | SORT_CSWAP(dst[0], dst[4]); 459 | SORT_CSWAP(dst[7], dst[11]); 460 | SORT_CSWAP(dst[3], dst[8]); 461 | SORT_CSWAP(dst[1], dst[5]); 462 | SORT_CSWAP(dst[6], dst[10]); 463 | SORT_CSWAP(dst[2], dst[3]); 464 | SORT_CSWAP(dst[8], dst[9]); 465 | SORT_CSWAP(dst[1], dst[4]); 466 | SORT_CSWAP(dst[7], dst[10]); 467 | SORT_CSWAP(dst[3], dst[5]); 468 | 
SORT_CSWAP(dst[6], dst[8]); 469 | SORT_CSWAP(dst[2], dst[4]); 470 | SORT_CSWAP(dst[7], dst[9]); 471 | SORT_CSWAP(dst[5], dst[6]); 472 | SORT_CSWAP(dst[3], dst[4]); 473 | SORT_CSWAP(dst[7], dst[8]); 474 | } 475 | 476 | 477 | #define BITONIC_SORT_13 SORT_MAKE_STR(bitonic_sort_13) 478 | static __inline void BITONIC_SORT_13(SORT_TYPE *dst) { 479 | SORT_CSWAP(dst[1], dst[7]); 480 | SORT_CSWAP(dst[9], dst[11]); 481 | SORT_CSWAP(dst[3], dst[4]); 482 | SORT_CSWAP(dst[5], dst[8]); 483 | SORT_CSWAP(dst[0], dst[12]); 484 | SORT_CSWAP(dst[2], dst[6]); 485 | SORT_CSWAP(dst[0], dst[1]); 486 | SORT_CSWAP(dst[2], dst[3]); 487 | SORT_CSWAP(dst[4], dst[6]); 488 | SORT_CSWAP(dst[8], dst[11]); 489 | SORT_CSWAP(dst[7], dst[12]); 490 | SORT_CSWAP(dst[5], dst[9]); 491 | SORT_CSWAP(dst[0], dst[2]); 492 | SORT_CSWAP(dst[3], dst[7]); 493 | SORT_CSWAP(dst[10], dst[11]); 494 | SORT_CSWAP(dst[1], dst[4]); 495 | SORT_CSWAP(dst[6], dst[12]); 496 | SORT_CSWAP(dst[7], dst[8]); 497 | SORT_CSWAP(dst[11], dst[12]); 498 | SORT_CSWAP(dst[4], dst[9]); 499 | SORT_CSWAP(dst[6], dst[10]); 500 | SORT_CSWAP(dst[3], dst[4]); 501 | SORT_CSWAP(dst[5], dst[6]); 502 | SORT_CSWAP(dst[8], dst[9]); 503 | SORT_CSWAP(dst[10], dst[11]); 504 | SORT_CSWAP(dst[1], dst[7]); 505 | SORT_CSWAP(dst[2], dst[6]); 506 | SORT_CSWAP(dst[9], dst[11]); 507 | SORT_CSWAP(dst[1], dst[3]); 508 | SORT_CSWAP(dst[4], dst[7]); 509 | SORT_CSWAP(dst[8], dst[10]); 510 | SORT_CSWAP(dst[0], dst[5]); 511 | SORT_CSWAP(dst[2], dst[5]); 512 | SORT_CSWAP(dst[6], dst[8]); 513 | SORT_CSWAP(dst[9], dst[10]); 514 | SORT_CSWAP(dst[1], dst[2]); 515 | SORT_CSWAP(dst[3], dst[5]); 516 | SORT_CSWAP(dst[7], dst[8]); 517 | SORT_CSWAP(dst[4], dst[6]); 518 | SORT_CSWAP(dst[2], dst[3]); 519 | SORT_CSWAP(dst[4], dst[5]); 520 | SORT_CSWAP(dst[6], dst[7]); 521 | SORT_CSWAP(dst[8], dst[9]); 522 | SORT_CSWAP(dst[3], dst[4]); 523 | SORT_CSWAP(dst[5], dst[6]); 524 | } 525 | 526 | 527 | #define BITONIC_SORT_14 SORT_MAKE_STR(bitonic_sort_14) 528 | static __inline void 
BITONIC_SORT_14(SORT_TYPE *dst) { 529 | SORT_CSWAP(dst[0], dst[1]); 530 | SORT_CSWAP(dst[2], dst[3]); 531 | SORT_CSWAP(dst[4], dst[5]); 532 | SORT_CSWAP(dst[6], dst[7]); 533 | SORT_CSWAP(dst[8], dst[9]); 534 | SORT_CSWAP(dst[10], dst[11]); 535 | SORT_CSWAP(dst[12], dst[13]); 536 | SORT_CSWAP(dst[0], dst[2]); 537 | SORT_CSWAP(dst[4], dst[6]); 538 | SORT_CSWAP(dst[8], dst[10]); 539 | SORT_CSWAP(dst[1], dst[3]); 540 | SORT_CSWAP(dst[5], dst[7]); 541 | SORT_CSWAP(dst[9], dst[11]); 542 | SORT_CSWAP(dst[0], dst[4]); 543 | SORT_CSWAP(dst[8], dst[12]); 544 | SORT_CSWAP(dst[1], dst[5]); 545 | SORT_CSWAP(dst[9], dst[13]); 546 | SORT_CSWAP(dst[2], dst[6]); 547 | SORT_CSWAP(dst[3], dst[7]); 548 | SORT_CSWAP(dst[0], dst[8]); 549 | SORT_CSWAP(dst[1], dst[9]); 550 | SORT_CSWAP(dst[2], dst[10]); 551 | SORT_CSWAP(dst[3], dst[11]); 552 | SORT_CSWAP(dst[4], dst[12]); 553 | SORT_CSWAP(dst[5], dst[13]); 554 | SORT_CSWAP(dst[5], dst[10]); 555 | SORT_CSWAP(dst[6], dst[9]); 556 | SORT_CSWAP(dst[3], dst[12]); 557 | SORT_CSWAP(dst[7], dst[11]); 558 | SORT_CSWAP(dst[1], dst[2]); 559 | SORT_CSWAP(dst[4], dst[8]); 560 | SORT_CSWAP(dst[1], dst[4]); 561 | SORT_CSWAP(dst[7], dst[13]); 562 | SORT_CSWAP(dst[2], dst[8]); 563 | SORT_CSWAP(dst[5], dst[6]); 564 | SORT_CSWAP(dst[9], dst[10]); 565 | SORT_CSWAP(dst[2], dst[4]); 566 | SORT_CSWAP(dst[11], dst[13]); 567 | SORT_CSWAP(dst[3], dst[8]); 568 | SORT_CSWAP(dst[7], dst[12]); 569 | SORT_CSWAP(dst[6], dst[8]); 570 | SORT_CSWAP(dst[10], dst[12]); 571 | SORT_CSWAP(dst[3], dst[5]); 572 | SORT_CSWAP(dst[7], dst[9]); 573 | SORT_CSWAP(dst[3], dst[4]); 574 | SORT_CSWAP(dst[5], dst[6]); 575 | SORT_CSWAP(dst[7], dst[8]); 576 | SORT_CSWAP(dst[9], dst[10]); 577 | SORT_CSWAP(dst[11], dst[12]); 578 | SORT_CSWAP(dst[6], dst[7]); 579 | SORT_CSWAP(dst[8], dst[9]); 580 | } 581 | 582 | 583 | #define BITONIC_SORT_15 SORT_MAKE_STR(bitonic_sort_15) 584 | static __inline void BITONIC_SORT_15(SORT_TYPE *dst) { 585 | SORT_CSWAP(dst[0], dst[1]); 586 | SORT_CSWAP(dst[2], 
dst[3]); 587 | SORT_CSWAP(dst[4], dst[5]); 588 | SORT_CSWAP(dst[6], dst[7]); 589 | SORT_CSWAP(dst[8], dst[9]); 590 | SORT_CSWAP(dst[10], dst[11]); 591 | SORT_CSWAP(dst[12], dst[13]); 592 | SORT_CSWAP(dst[0], dst[2]); 593 | SORT_CSWAP(dst[4], dst[6]); 594 | SORT_CSWAP(dst[8], dst[10]); 595 | SORT_CSWAP(dst[12], dst[14]); 596 | SORT_CSWAP(dst[1], dst[3]); 597 | SORT_CSWAP(dst[5], dst[7]); 598 | SORT_CSWAP(dst[9], dst[11]); 599 | SORT_CSWAP(dst[0], dst[4]); 600 | SORT_CSWAP(dst[8], dst[12]); 601 | SORT_CSWAP(dst[1], dst[5]); 602 | SORT_CSWAP(dst[9], dst[13]); 603 | SORT_CSWAP(dst[2], dst[6]); 604 | SORT_CSWAP(dst[10], dst[14]); 605 | SORT_CSWAP(dst[3], dst[7]); 606 | SORT_CSWAP(dst[0], dst[8]); 607 | SORT_CSWAP(dst[1], dst[9]); 608 | SORT_CSWAP(dst[2], dst[10]); 609 | SORT_CSWAP(dst[3], dst[11]); 610 | SORT_CSWAP(dst[4], dst[12]); 611 | SORT_CSWAP(dst[5], dst[13]); 612 | SORT_CSWAP(dst[6], dst[14]); 613 | SORT_CSWAP(dst[5], dst[10]); 614 | SORT_CSWAP(dst[6], dst[9]); 615 | SORT_CSWAP(dst[3], dst[12]); 616 | SORT_CSWAP(dst[13], dst[14]); 617 | SORT_CSWAP(dst[7], dst[11]); 618 | SORT_CSWAP(dst[1], dst[2]); 619 | SORT_CSWAP(dst[4], dst[8]); 620 | SORT_CSWAP(dst[1], dst[4]); 621 | SORT_CSWAP(dst[7], dst[13]); 622 | SORT_CSWAP(dst[2], dst[8]); 623 | SORT_CSWAP(dst[11], dst[14]); 624 | SORT_CSWAP(dst[5], dst[6]); 625 | SORT_CSWAP(dst[9], dst[10]); 626 | SORT_CSWAP(dst[2], dst[4]); 627 | SORT_CSWAP(dst[11], dst[13]); 628 | SORT_CSWAP(dst[3], dst[8]); 629 | SORT_CSWAP(dst[7], dst[12]); 630 | SORT_CSWAP(dst[6], dst[8]); 631 | SORT_CSWAP(dst[10], dst[12]); 632 | SORT_CSWAP(dst[3], dst[5]); 633 | SORT_CSWAP(dst[7], dst[9]); 634 | SORT_CSWAP(dst[3], dst[4]); 635 | SORT_CSWAP(dst[5], dst[6]); 636 | SORT_CSWAP(dst[7], dst[8]); 637 | SORT_CSWAP(dst[9], dst[10]); 638 | SORT_CSWAP(dst[11], dst[12]); 639 | SORT_CSWAP(dst[6], dst[7]); 640 | SORT_CSWAP(dst[8], dst[9]); 641 | } 642 | 643 | 644 | #define BITONIC_SORT_16 SORT_MAKE_STR(bitonic_sort_16) 645 | static __inline void 
BITONIC_SORT_16(SORT_TYPE *dst) { 646 | SORT_CSWAP(dst[0], dst[1]); 647 | SORT_CSWAP(dst[2], dst[3]); 648 | SORT_CSWAP(dst[4], dst[5]); 649 | SORT_CSWAP(dst[6], dst[7]); 650 | SORT_CSWAP(dst[8], dst[9]); 651 | SORT_CSWAP(dst[10], dst[11]); 652 | SORT_CSWAP(dst[12], dst[13]); 653 | SORT_CSWAP(dst[14], dst[15]); 654 | SORT_CSWAP(dst[0], dst[2]); 655 | SORT_CSWAP(dst[4], dst[6]); 656 | SORT_CSWAP(dst[8], dst[10]); 657 | SORT_CSWAP(dst[12], dst[14]); 658 | SORT_CSWAP(dst[1], dst[3]); 659 | SORT_CSWAP(dst[5], dst[7]); 660 | SORT_CSWAP(dst[9], dst[11]); 661 | SORT_CSWAP(dst[13], dst[15]); 662 | SORT_CSWAP(dst[0], dst[4]); 663 | SORT_CSWAP(dst[8], dst[12]); 664 | SORT_CSWAP(dst[1], dst[5]); 665 | SORT_CSWAP(dst[9], dst[13]); 666 | SORT_CSWAP(dst[2], dst[6]); 667 | SORT_CSWAP(dst[10], dst[14]); 668 | SORT_CSWAP(dst[3], dst[7]); 669 | SORT_CSWAP(dst[11], dst[15]); 670 | SORT_CSWAP(dst[0], dst[8]); 671 | SORT_CSWAP(dst[1], dst[9]); 672 | SORT_CSWAP(dst[2], dst[10]); 673 | SORT_CSWAP(dst[3], dst[11]); 674 | SORT_CSWAP(dst[4], dst[12]); 675 | SORT_CSWAP(dst[5], dst[13]); 676 | SORT_CSWAP(dst[6], dst[14]); 677 | SORT_CSWAP(dst[7], dst[15]); 678 | SORT_CSWAP(dst[5], dst[10]); 679 | SORT_CSWAP(dst[6], dst[9]); 680 | SORT_CSWAP(dst[3], dst[12]); 681 | SORT_CSWAP(dst[13], dst[14]); 682 | SORT_CSWAP(dst[7], dst[11]); 683 | SORT_CSWAP(dst[1], dst[2]); 684 | SORT_CSWAP(dst[4], dst[8]); 685 | SORT_CSWAP(dst[1], dst[4]); 686 | SORT_CSWAP(dst[7], dst[13]); 687 | SORT_CSWAP(dst[2], dst[8]); 688 | SORT_CSWAP(dst[11], dst[14]); 689 | SORT_CSWAP(dst[5], dst[6]); 690 | SORT_CSWAP(dst[9], dst[10]); 691 | SORT_CSWAP(dst[2], dst[4]); 692 | SORT_CSWAP(dst[11], dst[13]); 693 | SORT_CSWAP(dst[3], dst[8]); 694 | SORT_CSWAP(dst[7], dst[12]); 695 | SORT_CSWAP(dst[6], dst[8]); 696 | SORT_CSWAP(dst[10], dst[12]); 697 | SORT_CSWAP(dst[3], dst[5]); 698 | SORT_CSWAP(dst[7], dst[9]); 699 | SORT_CSWAP(dst[3], dst[4]); 700 | SORT_CSWAP(dst[5], dst[6]); 701 | SORT_CSWAP(dst[7], dst[8]); 702 | 
SORT_CSWAP(dst[9], dst[10]); 703 | SORT_CSWAP(dst[11], dst[12]); 704 | SORT_CSWAP(dst[6], dst[7]); 705 | SORT_CSWAP(dst[8], dst[9]); 706 | } 707 | 708 | SORT_DEF void BITONIC_SORT(SORT_TYPE *dst, const size_t size) { 709 | switch (size) { 710 | case 0: 711 | case 1: 712 | break; 713 | 714 | case 2: 715 | BITONIC_SORT_2(dst); 716 | break; 717 | 718 | case 3: 719 | BITONIC_SORT_3(dst); 720 | break; 721 | 722 | case 4: 723 | BITONIC_SORT_4(dst); 724 | break; 725 | 726 | case 5: 727 | BITONIC_SORT_5(dst); 728 | break; 729 | 730 | case 6: 731 | BITONIC_SORT_6(dst); 732 | break; 733 | 734 | case 7: 735 | BITONIC_SORT_7(dst); 736 | break; 737 | 738 | case 8: 739 | BITONIC_SORT_8(dst); 740 | break; 741 | 742 | case 9: 743 | BITONIC_SORT_9(dst); 744 | break; 745 | 746 | case 10: 747 | BITONIC_SORT_10(dst); 748 | break; 749 | 750 | case 11: 751 | BITONIC_SORT_11(dst); 752 | break; 753 | 754 | case 12: 755 | BITONIC_SORT_12(dst); 756 | break; 757 | 758 | case 13: 759 | BITONIC_SORT_13(dst); 760 | break; 761 | 762 | case 14: 763 | BITONIC_SORT_14(dst); 764 | break; 765 | 766 | case 15: 767 | BITONIC_SORT_15(dst); 768 | break; 769 | 770 | case 16: 771 | BITONIC_SORT_16(dst); 772 | break; 773 | 774 | default: 775 | BINARY_INSERTION_SORT(dst, size); 776 | } 777 | } 778 | 779 | #if SORT_SAFE_CPY 780 | 781 | SORT_DEF void SORT_TYPE_CPY(SORT_TYPE *dst, SORT_TYPE *src, const size_t size) { 782 | size_t i = 0; 783 | 784 | for (; i < size; ++i) { 785 | dst[i] = src[i]; 786 | } 787 | } 788 | 789 | SORT_DEF void SORT_TYPE_MOVE(SORT_TYPE *dst, SORT_TYPE *src, const size_t size) { 790 | size_t i; 791 | 792 | if (dst < src) { 793 | SORT_TYPE_CPY(dst, src, size); 794 | } else if (dst != src && size > 0) { 795 | for (i = size - 1; i > 0; --i) { 796 | dst[i] = src[i]; 797 | } 798 | 799 | *dst = *src; 800 | } 801 | } 802 | 803 | #else 804 | 805 | #undef SORT_TYPE_CPY 806 | #define SORT_TYPE_CPY(dst, src, size) memcpy((dst), (src), (size) * sizeof(SORT_TYPE)) 807 | #undef SORT_TYPE_MOVE 808 | 
#define SORT_TYPE_MOVE(dst, src, size) memmove((dst), (src), (size) * sizeof(SORT_TYPE)) 809 | 810 | #endif 811 | 812 | SORT_DEF SORT_TYPE* SORT_NEW_BUFFER(size_t size) { 813 | #if SORT_SAFE_CPY 814 | return new SORT_TYPE[size]; 815 | #else 816 | return (SORT_TYPE*)malloc(size * sizeof(SORT_TYPE)); 817 | #endif 818 | } 819 | 820 | SORT_DEF void SORT_DELETE_BUFFER(SORT_TYPE* pointer) { 821 | #if SORT_SAFE_CPY 822 | delete[] pointer; 823 | #else 824 | free(pointer); 825 | #endif 826 | } 827 | 828 | 829 | /* Shell sort implementation based on Wikipedia article 830 | http://en.wikipedia.org/wiki/Shell_sort 831 | */ 832 | SORT_DEF void SHELL_SORT(SORT_TYPE *dst, const size_t size) { 833 | /* don't bother sorting an array of size 0 or 1 */ 834 | /* TODO: binary search to find first gap? */ 835 | int inci = 47; 836 | size_t inc = shell_gaps[inci]; 837 | size_t i; 838 | 839 | if (size <= 1) { 840 | return; 841 | } 842 | 843 | while (inc > (size >> 1)) { 844 | inc = shell_gaps[--inci]; 845 | } 846 | 847 | while (1) { 848 | for (i = inc; i < size; i++) { 849 | SORT_TYPE temp = dst[i]; 850 | size_t j = i; 851 | 852 | while ((j >= inc) && (SORT_CMP(dst[j - inc], temp) > 0)) { 853 | dst[j] = dst[j - inc]; 854 | j -= inc; 855 | } 856 | 857 | dst[j] = temp; 858 | } 859 | 860 | if (inc == 1) { 861 | break; 862 | } 863 | 864 | inc = shell_gaps[--inci]; 865 | } 866 | } 867 | 868 | /* Function used to do a binary search for binary insertion sort */ 869 | static __inline size_t BINARY_INSERTION_FIND(SORT_TYPE *dst, const SORT_TYPE x, 870 | const size_t size) { 871 | size_t l, c, r; 872 | SORT_TYPE cx; 873 | l = 0; 874 | r = size - 1; 875 | c = r >> 1; 876 | 877 | /* check for out of bounds at the beginning. 
*/ 878 | if (SORT_CMP(x, dst[0]) < 0) { 879 | return 0; 880 | } else if (SORT_CMP(x, dst[r]) > 0) { 881 | return r; 882 | } 883 | 884 | cx = dst[c]; 885 | 886 | while (1) { 887 | const int val = SORT_CMP(x, cx); 888 | 889 | if (val < 0) { 890 | if (c - l <= 1) { 891 | return c; 892 | } 893 | 894 | r = c; 895 | } else { /* allow = for stability. The binary search favors the right. */ 896 | if (r - c <= 1) { 897 | return c + 1; 898 | } 899 | 900 | l = c; 901 | } 902 | 903 | c = l + ((r - l) >> 1); 904 | cx = dst[c]; 905 | } 906 | } 907 | 908 | /* Binary insertion sort, but knowing that the first "start" entries are sorted. Used in timsort. */ 909 | static void BINARY_INSERTION_SORT_START(SORT_TYPE *dst, const size_t start, const size_t size) { 910 | size_t i; 911 | 912 | for (i = start; i < size; i++) { 913 | size_t j; 914 | SORT_TYPE x; 915 | size_t location; 916 | 917 | /* If this entry is already correct, just move along */ 918 | if (SORT_CMP(dst[i - 1], dst[i]) <= 0) { 919 | continue; 920 | } 921 | 922 | /* Else we need to find the right place, shift everything over, and squeeze in */ 923 | x = dst[i]; 924 | location = BINARY_INSERTION_FIND(dst, x, i); 925 | 926 | for (j = i - 1; j >= location; j--) { 927 | dst[j + 1] = dst[j]; 928 | 929 | if (j == 0) { /* check edge case because j is unsigned */ 930 | break; 931 | } 932 | } 933 | 934 | dst[location] = x; 935 | } 936 | } 937 | 938 | /* Binary insertion sort */ 939 | SORT_DEF void BINARY_INSERTION_SORT(SORT_TYPE *dst, const size_t size) { 940 | /* don't bother sorting an array of size <= 1 */ 941 | if (size <= 1) { 942 | return; 943 | } 944 | 945 | BINARY_INSERTION_SORT_START(dst, 1, size); 946 | } 947 | 948 | /* In-place mergesort */ 949 | SORT_DEF void MERGE_SORT_IN_PLACE_ASWAP(SORT_TYPE * dst1, SORT_TYPE * dst2, size_t len) { 950 | do { 951 | SORT_SWAP(*dst1, *dst2); 952 | dst1++; 953 | dst2++; 954 | } while (--len); 955 | } 956 | 957 | SORT_DEF void MERGE_SORT_IN_PLACE_FRONTMERGE(SORT_TYPE *dst1, size_t l1, 
SORT_TYPE *dst2, 958 | size_t l2) { 959 | SORT_TYPE *dst0 = dst2 - l1; 960 | 961 | if (SORT_CMP(dst1[l1 - 1], dst2[0]) <= 0) { 962 | MERGE_SORT_IN_PLACE_ASWAP(dst1, dst0, l1); 963 | return; 964 | } 965 | 966 | do { 967 | while (SORT_CMP(*dst2, *dst1) > 0) { 968 | SORT_SWAP(*dst1, *dst0); 969 | dst1++; 970 | dst0++; 971 | 972 | if (--l1 == 0) { 973 | return; 974 | } 975 | } 976 | 977 | SORT_SWAP(*dst2, *dst0); 978 | dst2++; 979 | dst0++; 980 | } while (--l2); 981 | 982 | do { 983 | SORT_SWAP(*dst1, *dst0); 984 | dst1++; 985 | dst0++; 986 | } while (--l1); 987 | } 988 | 989 | SORT_DEF size_t MERGE_SORT_IN_PLACE_BACKMERGE(SORT_TYPE * dst1, size_t l1, SORT_TYPE * dst2, 990 | size_t l2) { 991 | size_t res; 992 | SORT_TYPE *dst0 = dst2 + l1; 993 | 994 | if (SORT_CMP(dst1[1 - l1], dst2[0]) >= 0) { 995 | MERGE_SORT_IN_PLACE_ASWAP(dst1 - l1 + 1, dst0 - l1 + 1, l1); 996 | return l1; 997 | } 998 | 999 | do { 1000 | while (SORT_CMP(*dst2, *dst1) < 0) { 1001 | SORT_SWAP(*dst1, *dst0); 1002 | dst1--; 1003 | dst0--; 1004 | 1005 | if (--l1 == 0) { 1006 | return 0; 1007 | } 1008 | } 1009 | 1010 | SORT_SWAP(*dst2, *dst0); 1011 | dst2--; 1012 | dst0--; 1013 | } while (--l2); 1014 | 1015 | res = l1; 1016 | 1017 | do { 1018 | SORT_SWAP(*dst1, *dst0); 1019 | dst1--; 1020 | dst0--; 1021 | } while (--l1); 1022 | 1023 | return res; 1024 | } 1025 | 1026 | /* merge dst[p0..p1) by buffer dst[p1..p1+r) */ 1027 | SORT_DEF void MERGE_SORT_IN_PLACE_RMERGE(SORT_TYPE *dst, size_t len, size_t lp, size_t r) { 1028 | size_t i, lq; 1029 | int cv; 1030 | 1031 | if (SORT_CMP(dst[lp], dst[lp - 1]) >= 0) { 1032 | return; 1033 | } 1034 | 1035 | lq = lp; 1036 | 1037 | for (i = 0; i < len; i += r) { 1038 | /* select smallest dst[p0+n*r] */ 1039 | size_t q = i, j; 1040 | 1041 | for (j = lp; j <= lq; j += r) { 1042 | cv = SORT_CMP(dst[j], dst[q]); 1043 | 1044 | if (cv == 0) { 1045 | cv = SORT_CMP(dst[j + r - 1], dst[q + r - 1]); 1046 | } 1047 | 1048 | if (cv < 0) { 1049 | q = j; 1050 | } 1051 | } 1052 | 1053 | 
if (q != i) { 1054 | MERGE_SORT_IN_PLACE_ASWAP(dst + i, dst + q, r); /* swap it with current position */ 1055 | 1056 | if (q == lq && q < (len - r)) { 1057 | lq += r; 1058 | } 1059 | } 1060 | 1061 | if (i != 0 && SORT_CMP(dst[i], dst[i - 1]) < 0) { 1062 | MERGE_SORT_IN_PLACE_ASWAP(dst + len, dst + i, r); /* swap current position with buffer */ 1063 | MERGE_SORT_IN_PLACE_BACKMERGE(dst + (len + r - 1), r, dst + (i - 1), 1064 | r); /* buffer :merge: dst[i-r..i) -> dst[i-r..i+r) */ 1065 | } 1066 | 1067 | if (lp == i) { 1068 | lp += r; 1069 | } 1070 | } 1071 | } 1072 | 1073 | /* In-place Merge Sort implementation. (c)2012, Andrey Astrelin, astrelin@tochka.ru */ 1074 | SORT_DEF void MERGE_SORT_IN_PLACE(SORT_TYPE *dst, const size_t len) { 1075 | /* don't bother sorting an array of size <= 1 */ 1076 | size_t r = rbnd(len); 1077 | size_t lr = (len / r - 1) * r; 1078 | SORT_TYPE *dst1 = dst - 1; 1079 | size_t p, m, q, q1, p0; 1080 | 1081 | if (len <= 1) { 1082 | return; 1083 | } 1084 | 1085 | if (len <= SMALL_SORT_BND) { 1086 | SMALL_SORT(dst, len); 1087 | return; 1088 | } 1089 | 1090 | for (p = 2; p <= lr; p += 2) { 1091 | dst1 += 2; 1092 | 1093 | if (SORT_CMP(dst1[0], dst1[-1]) < 0) { 1094 | SORT_SWAP(dst1[0], dst1[-1]); 1095 | } 1096 | 1097 | if (p & 2) { 1098 | continue; 1099 | } 1100 | 1101 | m = len - p; 1102 | q = 2; 1103 | 1104 | while ((p & q) == 0) { 1105 | if (SORT_CMP(dst1[1 - q], dst1[-(int) q]) < 0) { 1106 | break; 1107 | } 1108 | 1109 | q *= 2; 1110 | } 1111 | 1112 | if (p & q) { 1113 | continue; 1114 | } 1115 | 1116 | if (q < m) { 1117 | p0 = len - q; 1118 | MERGE_SORT_IN_PLACE_ASWAP(dst + p - q, dst + p0, q); 1119 | 1120 | for (;;) { 1121 | q1 = 2 * q; 1122 | 1123 | if ((q1 > m) || (p & q1)) { 1124 | break; 1125 | } 1126 | 1127 | p0 = len - q1; 1128 | MERGE_SORT_IN_PLACE_FRONTMERGE(dst + (p - q1), q, dst + p0 + q, q); 1129 | q = q1; 1130 | } 1131 | 1132 | MERGE_SORT_IN_PLACE_BACKMERGE(dst + (len - 1), q, dst1 - q, q); 1133 | q *= 2; 1134 | } 1135 | 1136 | q1 
= q; 1137 | 1138 | while (q1 > m) { 1139 | q1 /= 2; 1140 | } 1141 | 1142 | while ((q & p) == 0) { 1143 | q *= 2; 1144 | MERGE_SORT_IN_PLACE_RMERGE(dst + (p - q), q, q / 2, q1); 1145 | } 1146 | } 1147 | 1148 | q1 = 0; 1149 | 1150 | for (q = r; q < lr; q *= 2) { 1151 | if ((lr & q) != 0) { 1152 | q1 += q; 1153 | 1154 | if (q1 != q) { 1155 | MERGE_SORT_IN_PLACE_RMERGE(dst + (lr - q1), q1, q, r); 1156 | } 1157 | } 1158 | } 1159 | 1160 | m = len - lr; 1161 | MERGE_SORT_IN_PLACE(dst + lr, m); 1162 | MERGE_SORT_IN_PLACE_ASWAP(dst, dst + lr, m); 1163 | m += MERGE_SORT_IN_PLACE_BACKMERGE(dst + (m - 1), m, dst + (lr - 1), lr - m); 1164 | MERGE_SORT_IN_PLACE(dst, m); 1165 | } 1166 | 1167 | /* Standard merge sort */ 1168 | SORT_DEF void MERGE_SORT_RECURSIVE(SORT_TYPE *newdst, SORT_TYPE *dst, const size_t size) { 1169 | const size_t middle = size / 2; 1170 | size_t out = 0; 1171 | size_t i = 0; 1172 | size_t j = middle; 1173 | 1174 | /* don't bother sorting an array of size <= 1 */ 1175 | if (size <= 1) { 1176 | return; 1177 | } 1178 | 1179 | if (size <= SMALL_SORT_BND) { 1180 | SMALL_STABLE_SORT(dst, size); 1181 | return; 1182 | } 1183 | 1184 | MERGE_SORT_RECURSIVE(newdst, dst, middle); 1185 | MERGE_SORT_RECURSIVE(newdst, &dst[middle], size - middle); 1186 | 1187 | while (out != size) { 1188 | if (i < middle) { 1189 | if (j < size) { 1190 | if (SORT_CMP(dst[i], dst[j]) <= 0) { 1191 | newdst[out] = dst[i++]; 1192 | } else { 1193 | newdst[out] = dst[j++]; 1194 | } 1195 | } else { 1196 | newdst[out] = dst[i++]; 1197 | } 1198 | } else { 1199 | newdst[out] = dst[j++]; 1200 | } 1201 | 1202 | out++; 1203 | } 1204 | 1205 | SORT_TYPE_CPY(dst, newdst, size); 1206 | } 1207 | 1208 | /* Standard merge sort */ 1209 | SORT_DEF void MERGE_SORT(SORT_TYPE *dst, const size_t size) { 1210 | SORT_TYPE *newdst; 1211 | 1212 | /* don't bother sorting an array of size <= 1 */ 1213 | if (size <= 1) { 1214 | return; 1215 | } 1216 | 1217 | if (size <= SMALL_SORT_BND) { 1218 | SMALL_STABLE_SORT(dst, 
size); 1219 | return; 1220 | } 1221 | 1222 | newdst = SORT_NEW_BUFFER(size); 1223 | MERGE_SORT_RECURSIVE(newdst, dst, size); 1224 | SORT_DELETE_BUFFER(newdst); 1225 | } 1226 | 1227 | 1228 | static __inline size_t QUICK_SORT_PARTITION(SORT_TYPE *dst, const size_t left, 1229 | const size_t right, const size_t pivot) { 1230 | SORT_TYPE value = dst[pivot]; 1231 | size_t index = left; 1232 | size_t i; 1233 | int not_all_same = 0; 1234 | /* move the pivot to the right */ 1235 | SORT_SWAP(dst[pivot], dst[right]); 1236 | 1237 | for (i = left; i < right; i++) { 1238 | int cmp = SORT_CMP(dst[i], value); 1239 | /* check if everything is all the same */ 1240 | not_all_same |= cmp; 1241 | 1242 | if (cmp < 0) { 1243 | SORT_SWAP(dst[i], dst[index]); 1244 | index++; 1245 | } 1246 | } 1247 | 1248 | SORT_SWAP(dst[right], dst[index]); 1249 | 1250 | /* avoid degenerate case */ 1251 | if (not_all_same == 0) { 1252 | return SIZE_MAX; 1253 | } 1254 | 1255 | return index; 1256 | } 1257 | 1258 | /* Based on Knuth vol. 3 1259 | static __inline size_t QUICK_SORT_HOARE_PARTITION(SORT_TYPE *dst, const size_t l, 1260 | const size_t r, const size_t pivot) { 1261 | SORT_TYPE value; 1262 | size_t i = l + 1; 1263 | size_t j = r; 1264 | 1265 | if (pivot != l) { 1266 | SORT_SWAP(dst[pivot], dst[l]); 1267 | } 1268 | value = dst[l]; 1269 | 1270 | while (1) { 1271 | while (SORT_CMP(dst[i], value) < 0) { 1272 | i++; 1273 | } 1274 | while (SORT_CMP(value, dst[j]) < 0) { 1275 | j--; 1276 | } 1277 | if (j <= i) { 1278 | SORT_SWAP(dst[l], dst[j]); 1279 | return j; 1280 | } 1281 | SORT_SWAP(dst[i], dst[j]); 1282 | i++; 1283 | j--; 1284 | } 1285 | return 0; 1286 | } 1287 | */ 1288 | 1289 | 1290 | /* Return the median index of the objects at the three indices. 
*/ 1291 | static __inline size_t MEDIAN(const SORT_TYPE *dst, const size_t a, const size_t b, 1292 | const size_t c) { 1293 | const int AB = SORT_CMP(dst[a], dst[b]) < 0; 1294 | 1295 | if (AB) { 1296 | /* a < b */ 1297 | const int BC = SORT_CMP(dst[b], dst[c]) < 0; 1298 | 1299 | if (BC) { 1300 | /* a < b < c */ 1301 | return b; 1302 | } else { 1303 | /* a < b, c < b */ 1304 | const int AC = SORT_CMP(dst[a], dst[c]) < 0; 1305 | 1306 | if (AC) { 1307 | /* a < c < b */ 1308 | return c; 1309 | } else { 1310 | /* c < a < b */ 1311 | return a; 1312 | } 1313 | } 1314 | } else { 1315 | /* b < a */ 1316 | const int AC = SORT_CMP(dst[a], dst[c]) < 0; 1317 | 1318 | if (AC) { 1319 | /* b < a < c */ 1320 | return a; 1321 | } else { 1322 | /* b < a, c < a */ 1323 | const int BC = SORT_CMP(dst[b], dst[c]) < 0; 1324 | 1325 | if (BC) { 1326 | /* b < c < a */ 1327 | return c; 1328 | } else { 1329 | /* c < b < a */ 1330 | return b; 1331 | } 1332 | } 1333 | } 1334 | } 1335 | 1336 | static void QUICK_SORT_RECURSIVE(SORT_TYPE *dst, const size_t original_left, 1337 | const size_t original_right) { 1338 | size_t left; 1339 | size_t right; 1340 | size_t pivot; 1341 | size_t new_pivot; 1342 | size_t middle; 1343 | int loop_count = 0; 1344 | const int max_loops = 64 - CLZ(original_right - original_left); /* ~lg N */ 1345 | left = original_left; 1346 | right = original_right; 1347 | 1348 | while (1) { 1349 | if (right <= left) { 1350 | return; 1351 | } 1352 | 1353 | if ((right - left + 1U) <= SMALL_SORT_BND) { 1354 | SMALL_SORT(&dst[left], right - left + 1U); 1355 | return; 1356 | } 1357 | 1358 | if (++loop_count >= max_loops) { 1359 | /* we have recursed / looped too many times; switch to heap sort */ 1360 | HEAP_SORT(&dst[left], right - left + 1U); 1361 | return; 1362 | } 1363 | 1364 | /* median of 5 */ 1365 | middle = left + ((right - left) >> 1); 1366 | pivot = MEDIAN((const SORT_TYPE *) dst, left, middle, right); 1367 | pivot = MEDIAN((const SORT_TYPE *) dst, left + ((middle - left) >>
1), pivot, 1368 | middle + ((right - middle) >> 1)); 1369 | new_pivot = QUICK_SORT_PARTITION(dst, left, right, pivot); 1370 | 1371 | /* check for partition all equal */ 1372 | if (new_pivot == SIZE_MAX) { 1373 | return; 1374 | } 1375 | 1376 | /* recurse only on the small part to avoid degenerate stack sizes */ 1377 | /* and manually do tail call on the large part */ 1378 | if (new_pivot - 1U - left > right - new_pivot - 1U) { 1379 | /* left is bigger than right */ 1380 | QUICK_SORT_RECURSIVE(dst, new_pivot + 1U, right); 1381 | /* tail call for left */ 1382 | right = new_pivot - 1U; 1383 | } else { 1384 | /* right is bigger than left */ 1385 | QUICK_SORT_RECURSIVE(dst, left, new_pivot - 1U); 1386 | /* tail call for right */ 1387 | left = new_pivot + 1U; 1388 | } 1389 | } 1390 | } 1391 | 1392 | void QUICK_SORT(SORT_TYPE *dst, const size_t size) { 1393 | /* don't bother sorting an array of size 1 */ 1394 | if (size <= 1) { 1395 | return; 1396 | } 1397 | 1398 | QUICK_SORT_RECURSIVE(dst, 0U, size - 1U); 1399 | } 1400 | 1401 | 1402 | /* timsort implementation, based on timsort.txt */ 1403 | 1404 | static __inline void REVERSE_ELEMENTS(SORT_TYPE *dst, size_t start, size_t end) { 1405 | while (1) { 1406 | if (start >= end) { 1407 | return; 1408 | } 1409 | 1410 | SORT_SWAP(dst[start], dst[end]); 1411 | start++; 1412 | end--; 1413 | } 1414 | } 1415 | 1416 | static size_t COUNT_RUN(SORT_TYPE *dst, const size_t start, const size_t size) { 1417 | size_t curr; 1418 | 1419 | if (size - start == 1) { 1420 | return 1; 1421 | } 1422 | 1423 | if (start >= size - 2) { 1424 | if (SORT_CMP(dst[size - 2], dst[size - 1]) > 0) { 1425 | SORT_SWAP(dst[size - 2], dst[size - 1]); 1426 | } 1427 | 1428 | return 2; 1429 | } 1430 | 1431 | curr = start + 2; 1432 | 1433 | if (SORT_CMP(dst[start], dst[start + 1]) <= 0) { 1434 | /* increasing run */ 1435 | while (1) { 1436 | if (curr == size - 1) { 1437 | break; 1438 | } 1439 | 1440 | if (SORT_CMP(dst[curr - 1], dst[curr]) > 0) { 1441 | break; 1442 | 
} 1443 | 1444 | curr++; 1445 | } 1446 | 1447 | return curr - start; 1448 | } else { 1449 | /* decreasing run */ 1450 | while (1) { 1451 | if (curr == size - 1) { 1452 | break; 1453 | } 1454 | 1455 | if (SORT_CMP(dst[curr - 1], dst[curr]) <= 0) { 1456 | break; 1457 | } 1458 | 1459 | curr++; 1460 | } 1461 | 1462 | /* reverse in-place */ 1463 | REVERSE_ELEMENTS(dst, start, curr - 1); 1464 | return curr - start; 1465 | } 1466 | } 1467 | 1468 | static int CHECK_INVARIANT(TIM_SORT_RUN_T *stack, const int stack_curr) { 1469 | size_t A, B, C; 1470 | 1471 | if (stack_curr < 2) { 1472 | return 1; 1473 | } 1474 | 1475 | if (stack_curr == 2) { 1476 | const size_t A1 = stack[stack_curr - 2].length; 1477 | const size_t B1 = stack[stack_curr - 1].length; 1478 | 1479 | if (A1 <= B1) { 1480 | return 0; 1481 | } 1482 | 1483 | return 1; 1484 | } 1485 | 1486 | A = stack[stack_curr - 3].length; 1487 | B = stack[stack_curr - 2].length; 1488 | C = stack[stack_curr - 1].length; 1489 | 1490 | if ((A <= B + C) || (B <= C)) { 1491 | return 0; 1492 | } 1493 | 1494 | return 1; 1495 | } 1496 | 1497 | typedef struct { 1498 | size_t alloc; 1499 | SORT_TYPE *storage; 1500 | } TEMP_STORAGE_T; 1501 | 1502 | static void TIM_SORT_RESIZE(TEMP_STORAGE_T *store, const size_t new_size) { 1503 | if ((store->storage == NULL) || (store->alloc < new_size)) { 1504 | SORT_TYPE *tempstore = (SORT_TYPE *)realloc(store->storage, new_size * sizeof(SORT_TYPE)); 1505 | 1506 | if (tempstore == NULL) { 1507 | fprintf(stderr, "Error allocating temporary storage for tim sort: need %lu bytes", 1508 | (unsigned long)(sizeof(SORT_TYPE) * new_size)); 1509 | exit(1); 1510 | } 1511 | 1512 | store->storage = tempstore; 1513 | store->alloc = new_size; 1514 | } 1515 | } 1516 | 1517 | static void TIM_SORT_MERGE(SORT_TYPE *dst, const TIM_SORT_RUN_T *stack, const int stack_curr, 1518 | TEMP_STORAGE_T *store) { 1519 | const size_t A = stack[stack_curr - 2].length; 1520 | const size_t B = stack[stack_curr - 1].length; 1521 | const 
size_t curr = stack[stack_curr - 2].start; 1522 | SORT_TYPE *storage; 1523 | size_t i, j, k; 1524 | TIM_SORT_RESIZE(store, MIN(A, B)); 1525 | storage = store->storage; 1526 | 1527 | /* left merge */ 1528 | if (A < B) { 1529 | SORT_TYPE_CPY(storage, &dst[curr], A); 1530 | i = 0; 1531 | j = curr + A; 1532 | 1533 | for (k = curr; k < curr + A + B; k++) { 1534 | if ((i < A) && (j < curr + A + B)) { 1535 | if (SORT_CMP(storage[i], dst[j]) <= 0) { 1536 | dst[k] = storage[i++]; 1537 | } else { 1538 | dst[k] = dst[j++]; 1539 | } 1540 | } else if (i < A) { 1541 | dst[k] = storage[i++]; 1542 | } else { 1543 | break; 1544 | } 1545 | } 1546 | } else { 1547 | /* right merge */ 1548 | SORT_TYPE_CPY(storage, &dst[curr + A], B); 1549 | i = B; 1550 | j = curr + A; 1551 | k = curr + A + B; 1552 | 1553 | while (k-- > curr) { 1554 | if ((i > 0) && (j > curr)) { 1555 | if (SORT_CMP(dst[j - 1], storage[i - 1]) > 0) { 1556 | dst[k] = dst[--j]; 1557 | } else { 1558 | dst[k] = storage[--i]; 1559 | } 1560 | } else if (i > 0) { 1561 | dst[k] = storage[--i]; 1562 | } else { 1563 | break; 1564 | } 1565 | } 1566 | } 1567 | } 1568 | 1569 | static int TIM_SORT_COLLAPSE(SORT_TYPE *dst, TIM_SORT_RUN_T *stack, int stack_curr, 1570 | TEMP_STORAGE_T *store, const size_t size) { 1571 | while (1) { 1572 | size_t A, B, C, D; 1573 | int ABC, BCD, CD; 1574 | 1575 | /* if the stack only has one thing on it, we are done with the collapse */ 1576 | if (stack_curr <= 1) { 1577 | break; 1578 | } 1579 | 1580 | /* if this is the last merge, just do it */ 1581 | if ((stack_curr == 2) && (stack[0].length + stack[1].length == size)) { 1582 | TIM_SORT_MERGE(dst, stack, stack_curr, store); 1583 | stack[0].length += stack[1].length; 1584 | stack_curr--; 1585 | break; 1586 | } 1587 | /* check if the invariant is off for a stack of 2 elements */ 1588 | else if ((stack_curr == 2) && (stack[0].length <= stack[1].length)) { 1589 | TIM_SORT_MERGE(dst, stack, stack_curr, store); 1590 | stack[0].length += stack[1].length; 1591 
| stack_curr--; 1592 | break; 1593 | } else if (stack_curr == 2) { 1594 | break; 1595 | } 1596 | 1597 | B = stack[stack_curr - 3].length; 1598 | C = stack[stack_curr - 2].length; 1599 | D = stack[stack_curr - 1].length; 1600 | 1601 | if (stack_curr >= 4) { 1602 | A = stack[stack_curr - 4].length; 1603 | ABC = (A <= B + C); 1604 | } else { 1605 | ABC = 0; 1606 | } 1607 | 1608 | BCD = (B <= C + D) || ABC; 1609 | CD = (C <= D); 1610 | 1611 | /* Both invariants are good */ 1612 | if (!BCD && !CD) { 1613 | break; 1614 | } 1615 | 1616 | /* left merge */ 1617 | if (BCD && !CD) { 1618 | TIM_SORT_MERGE(dst, stack, stack_curr - 1, store); 1619 | stack[stack_curr - 3].length += stack[stack_curr - 2].length; 1620 | stack[stack_curr - 2] = stack[stack_curr - 1]; 1621 | stack_curr--; 1622 | } else { 1623 | /* right merge */ 1624 | TIM_SORT_MERGE(dst, stack, stack_curr, store); 1625 | stack[stack_curr - 2].length += stack[stack_curr - 1].length; 1626 | stack_curr--; 1627 | } 1628 | } 1629 | 1630 | return stack_curr; 1631 | } 1632 | 1633 | static __inline int PUSH_NEXT(SORT_TYPE *dst, 1634 | const size_t size, 1635 | TEMP_STORAGE_T *store, 1636 | const size_t minrun, 1637 | TIM_SORT_RUN_T *run_stack, 1638 | size_t *stack_curr, 1639 | size_t *curr) { 1640 | size_t len = COUNT_RUN(dst, *curr, size); 1641 | size_t run = minrun; 1642 | 1643 | if (run > size - *curr) { 1644 | run = size - *curr; 1645 | } 1646 | 1647 | if (run > len) { 1648 | BINARY_INSERTION_SORT_START(&dst[*curr], len, run); 1649 | len = run; 1650 | } 1651 | 1652 | run_stack[*stack_curr].start = *curr; 1653 | run_stack[*stack_curr].length = len; 1654 | (*stack_curr)++; 1655 | *curr += len; 1656 | 1657 | if (*curr == size) { 1658 | /* finish up */ 1659 | while (*stack_curr > 1) { 1660 | TIM_SORT_MERGE(dst, run_stack, (int)*stack_curr, store); 1661 | run_stack[*stack_curr - 2].length += run_stack[*stack_curr - 1].length; 1662 | (*stack_curr)--; 1663 | } 1664 | 1665 | if (store->storage != NULL) { 1666 | 
free(store->storage); 1667 | store->storage = NULL; 1668 | } 1669 | 1670 | return 0; 1671 | } 1672 | 1673 | return 1; 1674 | } 1675 | 1676 | SORT_DEF void TIM_SORT(SORT_TYPE *dst, const size_t size) { 1677 | size_t minrun; 1678 | TEMP_STORAGE_T _store, *store; 1679 | TIM_SORT_RUN_T run_stack[TIM_SORT_STACK_SIZE]; 1680 | size_t stack_curr = 0; 1681 | size_t curr = 0; 1682 | 1683 | /* don't bother sorting an array of size 1 */ 1684 | if (size <= 1) { 1685 | return; 1686 | } 1687 | 1688 | if (size < 64) { 1689 | SMALL_STABLE_SORT(dst, size); 1690 | return; 1691 | } 1692 | 1693 | /* compute the minimum run length */ 1694 | minrun = compute_minrun(size); 1695 | /* temporary storage for merges */ 1696 | store = &_store; 1697 | store->alloc = 0; 1698 | store->storage = NULL; 1699 | 1700 | if (!PUSH_NEXT(dst, size, store, minrun, run_stack, &stack_curr, &curr)) { 1701 | return; 1702 | } 1703 | 1704 | if (!PUSH_NEXT(dst, size, store, minrun, run_stack, &stack_curr, &curr)) { 1705 | return; 1706 | } 1707 | 1708 | if (!PUSH_NEXT(dst, size, store, minrun, run_stack, &stack_curr, &curr)) { 1709 | return; 1710 | } 1711 | 1712 | while (1) { 1713 | if (!CHECK_INVARIANT(run_stack, (int)stack_curr)) { 1714 | stack_curr = TIM_SORT_COLLAPSE(dst, run_stack, (int)stack_curr, store, size); 1715 | continue; 1716 | } 1717 | 1718 | if (!PUSH_NEXT(dst, size, store, minrun, run_stack, &stack_curr, &curr)) { 1719 | return; 1720 | } 1721 | } 1722 | } 1723 | 1724 | /* heap sort: based on wikipedia */ 1725 | 1726 | static __inline void HEAP_SIFT_DOWN(SORT_TYPE *dst, const size_t start, const size_t end) { 1727 | size_t root = start; 1728 | 1729 | while ((root << 1) <= end) { 1730 | size_t child = root << 1; 1731 | 1732 | if ((child < end) && (SORT_CMP(dst[child], dst[child + 1]) < 0)) { 1733 | child++; 1734 | } 1735 | 1736 | if (SORT_CMP(dst[root], dst[child]) < 0) { 1737 | SORT_SWAP(dst[root], dst[child]); 1738 | root = child; 1739 | } else { 1740 | return; 1741 | } 1742 | } 1743 | } 1744 | 1745 
| static __inline void HEAPIFY(SORT_TYPE *dst, const size_t size) { 1746 | size_t start = size >> 1; 1747 | 1748 | while (1) { 1749 | HEAP_SIFT_DOWN(dst, start, size - 1); 1750 | 1751 | if (start == 0) { 1752 | break; 1753 | } 1754 | 1755 | start--; 1756 | } 1757 | } 1758 | 1759 | SORT_DEF void HEAP_SORT(SORT_TYPE *dst, const size_t size) { 1760 | size_t end = size - 1; 1761 | 1762 | /* don't bother sorting an array of size <= 1 */ 1763 | if (size <= 1) { 1764 | return; 1765 | } 1766 | 1767 | HEAPIFY(dst, size); 1768 | 1769 | while (end > 0) { 1770 | SORT_SWAP(dst[end], dst[0]); 1771 | HEAP_SIFT_DOWN(dst, 0, end - 1); 1772 | end--; 1773 | } 1774 | } 1775 | 1776 | #ifdef SORT_EXTRA 1777 | #include "sort_extra.h" 1778 | #endif 1779 | 1780 | #undef SORT_SAFE_CPY 1781 | #undef SORT_TYPE_CPY 1782 | #undef SORT_TYPE_MOVE 1783 | #undef SORT_NEW_BUFFER 1784 | #undef SORT_DELETE_BUFFER 1785 | #undef QUICK_SORT 1786 | #undef MEDIAN 1787 | #undef SORT_CONCAT 1788 | #undef SORT_MAKE_STR1 1789 | #undef SORT_MAKE_STR 1790 | #undef SORT_NAME 1791 | #undef SORT_TYPE 1792 | #undef SORT_CMP 1793 | #undef TEMP_STORAGE_T 1794 | #undef TIM_SORT_RUN_T 1795 | #undef PUSH_NEXT 1796 | #undef SORT_SWAP 1797 | #undef SORT_MAKE_STR1 1798 | #undef SORT_MAKE_STR 1799 | #undef BINARY_INSERTION_FIND 1800 | #undef BINARY_INSERTION_SORT_START 1801 | #undef BINARY_INSERTION_SORT 1802 | #undef REVERSE_ELEMENTS 1803 | #undef COUNT_RUN 1804 | #undef TIM_SORT 1805 | #undef TIM_SORT_RESIZE 1806 | #undef TIM_SORT_COLLAPSE 1807 | #undef TIM_SORT_RUN_T 1808 | #undef TEMP_STORAGE_T 1809 | #undef MERGE_SORT 1810 | #undef MERGE_SORT_RECURSIVE 1811 | #undef MERGE_SORT_IN_PLACE 1812 | #undef MERGE_SORT_IN_PLACE_RMERGE 1813 | #undef MERGE_SORT_IN_PLACE_BACKMERGE 1814 | #undef MERGE_SORT_IN_PLACE_FRONTMERGE 1815 | #undef MERGE_SORT_IN_PLACE_ASWAP 1816 | 1817 | #ifdef SORT_GCC_PUSH 1818 | #undef SORT_GCC_PUSH 1819 | #pragma GCC diagnostic pop 1820 | #endif 1821 | 
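The heap sort at the end of `sort.h` (heapify, then repeatedly swap the maximum to the end and sift down) can be sketched as a standalone routine. This is only an illustrative sketch under stated assumptions: it is fixed to `int` instead of the `SORT_TYPE`/`SORT_CMP` macros, the names `sketch_sift_down` and `sketch_heap_sort` are hypothetical and not part of the library, and it uses the textbook `2 * root + 1` child indexing rather than the header's `root << 1`.

```c
#include <stddef.h>

/* Hypothetical standalone helper (not part of sort.h): restore the max-heap
   property for the subtree rooted at `root`; `end` is the last valid index. */
static void sketch_sift_down(int *a, size_t root, size_t end) {
  for (;;) {
    size_t child = 2 * root + 1; /* left child in a 0-based heap */
    int tmp;

    if (child > end) {
      return; /* root is a leaf */
    }

    /* pick the larger of the two children, if a right child exists */
    if (child < end && a[child] < a[child + 1]) {
      child++;
    }

    if (a[root] >= a[child]) {
      return; /* heap property already holds */
    }

    tmp = a[root];
    a[root] = a[child];
    a[child] = tmp;
    root = child;
  }
}

/* Hypothetical standalone heap sort for int arrays, ascending order. */
static void sketch_heap_sort(int *a, size_t size) {
  size_t start, end;

  if (size <= 1) {
    return;
  }

  /* heapify: sift down every parent, from the last parent back to index 0 */
  start = (size - 2) / 2 + 1;

  while (start-- > 0) {
    sketch_sift_down(a, start, size - 1);
  }

  /* move the current maximum to the end, then repair the shrunken heap */
  for (end = size - 1; end > 0; end--) {
    int tmp = a[0];
    a[0] = a[end];
    a[end] = tmp;
    sketch_sift_down(a, 0, end - 1);
  }
}
```

Like the library version, the sketch sorts in place with no extra allocation and is not stable; swapping the `int` comparisons for a comparison macro is what the real header does to stay type-generic.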
-------------------------------------------------------------------------------- /sort_extra.h: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2010-2024 Christopher Swenson. */ 2 | /* Copyright (c) 2012 Vojtech Fried. */ 3 | /* Copyright (c) 2012 Google Inc. All Rights Reserved. */ 4 | 5 | #define GRAIL_SWAP1 SORT_MAKE_STR(grail_swap1) 6 | #define REC_STABLE_SORT SORT_MAKE_STR(rec_stable_sort) 7 | #define GRAIL_REC_MERGE SORT_MAKE_STR(grail_rec_merge) 8 | #define GRAIL_SORT_DYN_BUFFER SORT_MAKE_STR(grail_sort_dyn_buffer) 9 | #define GRAIL_SORT_FIXED_BUFFER SORT_MAKE_STR(grail_sort_fixed_buffer) 10 | #define GRAIL_COMMON_SORT SORT_MAKE_STR(grail_common_sort) 11 | #define GRAIL_SORT SORT_MAKE_STR(grail_sort) 12 | #define GRAIL_COMBINE_BLOCKS SORT_MAKE_STR(grail_combine_blocks) 13 | #define GRAIL_LAZY_STABLE_SORT SORT_MAKE_STR(grail_lazy_stable_sort) 14 | #define GRAIL_MERGE_WITHOUT_BUFFER SORT_MAKE_STR(grail_merge_without_buffer) 15 | #define GRAIL_ROTATE SORT_MAKE_STR(grail_rotate) 16 | #define GRAIL_BIN_SEARCH_LEFT SORT_MAKE_STR(grail_bin_search_left) 17 | #define GRAIL_BUILD_BLOCKS SORT_MAKE_STR(grail_build_blocks) 18 | #define GRAIL_FIND_KEYS SORT_MAKE_STR(grail_find_keys) 19 | #define GRAIL_MERGE_BUFFERS_LEFT_WITH_X_BUF SORT_MAKE_STR(grail_merge_buffers_left_with_x_buf) 20 | #define GRAIL_BIN_SEARCH_RIGHT SORT_MAKE_STR(grail_bin_search_right) 21 | #define GRAIL_MERGE_BUFFERS_LEFT SORT_MAKE_STR(grail_merge_buffers_left) 22 | #define GRAIL_SMART_MERGE_WITH_X_BUF SORT_MAKE_STR(grail_smart_merge_with_x_buf) 23 | #define GRAIL_MERGE_LEFT_WITH_X_BUF SORT_MAKE_STR(grail_merge_left_with_x_buf) 24 | #define GRAIL_SMART_MERGE_WITHOUT_BUFFER SORT_MAKE_STR(grail_smart_merge_without_buffer) 25 | #define GRAIL_SMART_MERGE_WITH_BUFFER SORT_MAKE_STR(grail_smart_merge_with_buffer) 26 | #define GRAIL_MERGE_RIGHT SORT_MAKE_STR(grail_merge_right) 27 | #define GRAIL_MERGE_LEFT SORT_MAKE_STR(grail_merge_left) 28 | #define 
GRAIL_SWAP_N SORT_MAKE_STR(grail_swap_n) 29 | #define SQRT_SORT SORT_MAKE_STR(sqrt_sort) 30 | #define SQRT_SORT_BUILD_BLOCKS SORT_MAKE_STR(sqrt_sort_build_blocks) 31 | #define SQRT_SORT_MERGE_BUFFERS_LEFT_WITH_X_BUF SORT_MAKE_STR(sqrt_sort_merge_buffers_left_with_x_buf) 32 | #define SQRT_SORT_MERGE_DOWN SORT_MAKE_STR(sqrt_sort_merge_down) 33 | #define SQRT_SORT_MERGE_LEFT_WITH_X_BUF SORT_MAKE_STR(sqrt_sort_merge_left_with_x_buf) 34 | #define SQRT_SORT_MERGE_RIGHT SORT_MAKE_STR(sqrt_sort_merge_right) 35 | #define SQRT_SORT_SWAP_N SORT_MAKE_STR(sqrt_sort_swap_n) 36 | #define SQRT_SORT_SWAP_1 SORT_MAKE_STR(sqrt_sort_swap_1) 37 | #define SQRT_SORT_SMART_MERGE_WITH_X_BUF SORT_MAKE_STR(sqrt_sort_smart_merge_with_x_buf) 38 | #define SQRT_SORT_SORT_INS SORT_MAKE_STR(sqrt_sort_sort_ins) 39 | #define SQRT_SORT_COMBINE_BLOCKS SORT_MAKE_STR(sqrt_sort_combine_blocks) 40 | #define SQRT_SORT_COMMON_SORT SORT_MAKE_STR(sqrt_sort_common_sort) 41 | #define BUBBLE_SORT SORT_MAKE_STR(bubble_sort) 42 | #define SELECTION_SORT SORT_MAKE_STR(selection_sort) 43 | 44 | SORT_DEF void REC_STABLE_SORT(SORT_TYPE *dst, const size_t size); 45 | SORT_DEF void GRAIL_SORT_DYN_BUFFER(SORT_TYPE *dst, const size_t size); 46 | SORT_DEF void GRAIL_SORT_FIXED_BUFFER(SORT_TYPE *dst, const size_t size); 47 | SORT_DEF void GRAIL_SORT(SORT_TYPE *dst, const size_t size); 48 | SORT_DEF void SQRT_SORT(SORT_TYPE *dst, const size_t size); 49 | SORT_DEF void BUBBLE_SORT(SORT_TYPE *dst, const size_t size); 50 | SORT_DEF void SELECTION_SORT(SORT_TYPE *dst, const size_t size); 51 | 52 | /********* Sqrt sorting *********************************/ 53 | /* */ 54 | /* (c) 2014 by Andrey Astrelin */ 55 | /* */ 56 | /* */ 57 | /* Stable sorting that works in O(N*log(N)) worst time */ 58 | /* and uses O(sqrt(N)) extra memory */ 59 | /* */ 60 | /* Define SORT_TYPE and SORT_CMP */ 61 | /* and then call SqrtSort() function */ 62 | /* */ 63 | /*********************************************************/ 64 | 65 | #define 
SORT_CMP_A(a,b) SORT_CMP(*(a),*(b)) 66 | 67 | static __inline void SQRT_SORT_SWAP_1(SORT_TYPE *a, SORT_TYPE *b) { 68 | SORT_TYPE c = *a; 69 | *a++ = *b; 70 | *b++ = c; 71 | } 72 | 73 | static __inline void SQRT_SORT_SWAP_N(SORT_TYPE *a, SORT_TYPE *b, int n) { 74 | while (n--) { 75 | SQRT_SORT_SWAP_1(a++, b++); 76 | } 77 | } 78 | 79 | 80 | static void SQRT_SORT_MERGE_RIGHT(SORT_TYPE *arr, int L1, int L2, int M) { 81 | int p0 = L1 + L2 + M - 1, p2 = L1 + L2 - 1, p1 = L1 - 1; 82 | 83 | while (p1 >= 0) { 84 | if (p2 < L1 || SORT_CMP_A(arr + p1, arr + p2) > 0) { 85 | arr[p0--] = arr[p1--]; 86 | } else { 87 | arr[p0--] = arr[p2--]; 88 | } 89 | } 90 | 91 | if (p2 != p0) while (p2 >= L1) { 92 | arr[p0--] = arr[p2--]; 93 | } 94 | } 95 | 96 | /* arr[M..-1] - free, arr[0,L1-1]++arr[L1,L1+L2-1] -> arr[M,M+L1+L2-1] */ 97 | static void SQRT_SORT_MERGE_LEFT_WITH_X_BUF(SORT_TYPE *arr, int L1, int L2, int M) { 98 | int p0 = 0, p1 = L1; 99 | L2 += L1; 100 | 101 | while (p1 < L2) { 102 | if (p0 == L1 || SORT_CMP_A(arr + p0, arr + p1) > 0) { 103 | arr[M++] = arr[p1++]; 104 | } else { 105 | arr[M++] = arr[p0++]; 106 | } 107 | } 108 | 109 | if (M != p0) while (p0 < L1) { 110 | arr[M++] = arr[p0++]; 111 | } 112 | } 113 | 114 | /* arr[0,L1-1] ++ arr2[0,L2-1] -> arr[-L1,L2-1], arr2 is "before" arr1 */ 115 | static void SQRT_SORT_MERGE_DOWN(SORT_TYPE *arr, SORT_TYPE *arr2, int L1, int L2) { 116 | int p0 = 0, p1 = 0, M = -L2; 117 | 118 | while (p1 < L2) { 119 | if (p0 == L1 || SORT_CMP_A(arr + p0, arr2 + p1) >= 0) { 120 | arr[M++] = arr2[p1++]; 121 | } else { 122 | arr[M++] = arr[p0++]; 123 | } 124 | } 125 | 126 | if (M != p0) while (p0 < L1) { 127 | arr[M++] = arr[p0++]; 128 | } 129 | } 130 | 131 | static void SQRT_SORT_SMART_MERGE_WITH_X_BUF(SORT_TYPE *arr, int *alen1, int *atype, int len2, 132 | int lkeys) { 133 | int p0 = -lkeys, p1 = 0, p2 = *alen1, q1 = p2, q2 = p2 + len2; 134 | int ftype = 1 - *atype; /* 1 if inverted */ 135 | 136 | while (p1 < q1 && p2 < q2) { 137 | if 
(SORT_CMP_A(arr + p1, arr + p2) - ftype < 0) { 138 | arr[p0++] = arr[p1++]; 139 | } else { 140 | arr[p0++] = arr[p2++]; 141 | } 142 | } 143 | 144 | if (p1 < q1) { 145 | *alen1 = q1 - p1; 146 | 147 | while (p1 < q1) { 148 | arr[--q2] = arr[--q1]; 149 | } 150 | } else { 151 | *alen1 = q2 - p2; 152 | *atype = ftype; 153 | } 154 | } 155 | 156 | 157 | /* 158 | arr - starting array. arr[-lblock..-1] - buffer (if havebuf). 159 | lblock - length of regular blocks. First nblocks are stable sorted by 1st elements and key-coded 160 | keys - arrays of keys, in same order as blocks. key < midkey means stream A 161 | nblock2 are regular blocks from stream A. llast is length of last (irregular) block from stream B, that should go before nblock2 blocks. 162 | llast=0 requires nblock2=0 (no irregular blocks). llast > 0, nblock2=0 is possible. 163 | */ 164 | static void SQRT_SORT_MERGE_BUFFERS_LEFT_WITH_X_BUF(int *keys, int midkey, SORT_TYPE *arr, 165 | int nblock, int lblock, int nblock2, int llast) { 166 | int l, prest, lrest, frest, pidx, cidx, fnext; 167 | 168 | if (nblock == 0) { 169 | l = nblock2 * lblock; 170 | SQRT_SORT_MERGE_LEFT_WITH_X_BUF(arr, l, llast, -lblock); 171 | return; 172 | } 173 | 174 | lrest = lblock; 175 | frest = keys[0] < midkey ? 0 : 1; 176 | pidx = lblock; 177 | 178 | for (cidx = 1; cidx < nblock; cidx++, pidx += lblock) { 179 | prest = pidx - lrest; 180 | fnext = keys[cidx] < midkey ?
0 : 1; 181 | 182 | if (fnext == frest) { 183 | SORT_TYPE_CPY(arr + prest - lblock, arr + prest, lrest); 184 | prest = pidx; 185 | lrest = lblock; 186 | } else { 187 | SQRT_SORT_SMART_MERGE_WITH_X_BUF(arr + prest, &lrest, &frest, lblock, lblock); 188 | } 189 | } 190 | 191 | prest = pidx - lrest; 192 | 193 | if (llast) { 194 | if (frest) { 195 | SORT_TYPE_CPY(arr + prest - lblock, arr + prest, lrest); 196 | prest = pidx; 197 | lrest = lblock * nblock2; 198 | frest = 0; 199 | } else { 200 | lrest += lblock * nblock2; 201 | } 202 | 203 | SQRT_SORT_MERGE_LEFT_WITH_X_BUF(arr + prest, lrest, llast, -lblock); 204 | } else { 205 | SORT_TYPE_CPY(arr + prest - lblock, arr + prest, lrest); 206 | } 207 | } 208 | 209 | /* 210 | build blocks of length K 211 | input: [-K,-1] elements are buffer 212 | output: first K elements are buffer, blocks 2*K and last subblock sorted 213 | */ 214 | static void SQRT_SORT_BUILD_BLOCKS(SORT_TYPE *arr, int L, int K) { 215 | int m, u, h, p0, p1, rest, restk, p; 216 | 217 | for (m = 1; m < L; m += 2) { 218 | u = 0; 219 | 220 | if (SORT_CMP_A(arr + (m - 1), arr + m) > 0) { 221 | u = 1; 222 | } 223 | 224 | arr[m - 3] = arr[m - 1 + u]; 225 | arr[m - 2] = arr[m - u]; 226 | } 227 | 228 | if (L % 2) { 229 | arr[L - 3] = arr[L - 1]; 230 | } 231 | 232 | arr -= 2; 233 | 234 | for (h = 2; h < K; h *= 2) { 235 | p0 = 0; 236 | p1 = L - 2 * h; 237 | 238 | while (p0 <= p1) { 239 | SQRT_SORT_MERGE_LEFT_WITH_X_BUF(arr + p0, h, h, -h); 240 | p0 += 2 * h; 241 | } 242 | 243 | rest = L - p0; 244 | 245 | if (rest > h) { 246 | SQRT_SORT_MERGE_LEFT_WITH_X_BUF(arr + p0, h, rest - h, -h); 247 | } else { 248 | for (; p0 < L; p0++) { 249 | arr[p0 - h] = arr[p0]; 250 | } 251 | } 252 | 253 | arr -= h; 254 | } 255 | 256 | restk = L % (2 * K); 257 | p = L - restk; 258 | 259 | if (restk <= K) { 260 | SORT_TYPE_CPY(arr + p + K, arr + p, restk); 261 | } else { 262 | SQRT_SORT_MERGE_RIGHT(arr + p, K, restk - K, K); 263 | } 264 | 265 | while (p > 0) { 266 | p -= 2 * K; 267 | 
SQRT_SORT_MERGE_RIGHT(arr + p, K, K, K); 268 | } 269 | } 270 | 271 | 272 | static void SQRT_SORT_SORT_INS(SORT_TYPE *arr, int len) { 273 | int i, j; 274 | 275 | for (i = 1; i < len; i++) { 276 | for (j = i - 1; j >= 0 && SORT_CMP_A(arr + (j + 1), arr + j) < 0; j--) { 277 | SQRT_SORT_SWAP_1(arr + j, arr + (j + 1)); 278 | } 279 | } 280 | } 281 | 282 | /* 283 | keys are on the left of arr. Blocks of length LL combined. We'll combine them in pairs 284 | LL and nkeys are powers of 2. (2*LL/lblock) keys are guaranteed 285 | */ 286 | static void SQRT_SORT_COMBINE_BLOCKS(SORT_TYPE *arr, int len, int LL, int lblock, int *tags) { 287 | int M, b, NBlk, midkey, lrest, u, i, p, v, kc, nbl2, llast; 288 | SORT_TYPE *arr1; 289 | M = len / (2 * LL); 290 | lrest = len % (2 * LL); 291 | 292 | if (lrest <= LL) { 293 | len -= lrest; 294 | lrest = 0; 295 | } 296 | 297 | for (b = 0; b <= M; b++) { 298 | if (b == M && lrest == 0) { 299 | break; 300 | } 301 | 302 | arr1 = arr + b * 2 * LL; 303 | NBlk = (b == M ? lrest : 2 * LL) / lblock; 304 | u = NBlk + (b == M ?
1 : 0); 305 | 306 | for (i = 0; i <= u; i++) { 307 | tags[i] = i; 308 | } 309 | 310 | midkey = LL / lblock; 311 | 312 | for (u = 1; u < NBlk; u++) { 313 | p = u - 1; 314 | 315 | for (v = u; v < NBlk; v++) { 316 | kc = SORT_CMP_A(arr1 + p * lblock, arr1 + v * lblock); 317 | 318 | if (kc > 0 || (kc == 0 && tags[p] > tags[v])) { 319 | p = v; 320 | } 321 | } 322 | 323 | if (p != u - 1) { 324 | SQRT_SORT_SWAP_N(arr1 + (u - 1)*lblock, arr1 + p * lblock, lblock); 325 | i = tags[u - 1]; 326 | tags[u - 1] = tags[p]; 327 | tags[p] = i; 328 | } 329 | } 330 | 331 | nbl2 = llast = 0; 332 | 333 | if (b == M) { 334 | llast = lrest % lblock; 335 | } 336 | 337 | if (llast != 0) { 338 | while (nbl2 < NBlk && SORT_CMP_A(arr1 + NBlk * lblock, arr1 + (NBlk - nbl2 - 1) * lblock) < 0) { 339 | nbl2++; 340 | } 341 | } 342 | 343 | SQRT_SORT_MERGE_BUFFERS_LEFT_WITH_X_BUF(tags, midkey, arr1, NBlk - nbl2, lblock, nbl2, llast); 344 | } 345 | 346 | for (p = len; --p >= 0;) { 347 | arr[p] = arr[p - lblock]; 348 | } 349 | } 350 | 351 | 352 | static void SQRT_SORT_COMMON_SORT(SORT_TYPE *arr, int Len, SORT_TYPE *extbuf, int *Tags) { 353 | int lblock, cbuf; 354 | 355 | if (Len < 16) { 356 | SQRT_SORT_SORT_INS(arr, Len); 357 | return; 358 | } 359 | 360 | lblock = 1; 361 | 362 | while (lblock * lblock < Len) { 363 | lblock *= 2; 364 | } 365 | 366 | SORT_TYPE_CPY(extbuf, arr, lblock); 367 | SQRT_SORT_COMMON_SORT(extbuf, lblock, arr, Tags); 368 | SQRT_SORT_BUILD_BLOCKS(arr + lblock, Len - lblock, lblock); 369 | cbuf = lblock; 370 | 371 | while (Len > (cbuf *= 2)) { 372 | SQRT_SORT_COMBINE_BLOCKS(arr + lblock, Len - lblock, cbuf, lblock, Tags); 373 | } 374 | 375 | SQRT_SORT_MERGE_DOWN(arr + lblock, extbuf, Len - lblock, lblock); 376 | } 377 | 378 | void SQRT_SORT(SORT_TYPE *arr, size_t Len) { 379 | int L = 1; 380 | SORT_TYPE *ExtBuf; 381 | int *Tags; 382 | int NK; 383 | 384 | while (L * L < Len) { 385 | L *= 2; 386 | } 387 | 388 | NK = (int)((Len - 1) / L + 2); 389 | ExtBuf = SORT_NEW_BUFFER(L); 390 | 391 
| if (ExtBuf == NULL) { 392 | return; /* fail */ 393 | } 394 | 395 | Tags = (int*)malloc(NK * sizeof(int)); 396 | 397 | if (Tags == NULL) { 398 | SORT_DELETE_BUFFER(ExtBuf); return; /* fail; release ExtBuf first so it does not leak */ 399 | } 400 | 401 | SQRT_SORT_COMMON_SORT(arr, (int)Len, ExtBuf, Tags); 402 | free(Tags); 403 | SORT_DELETE_BUFFER(ExtBuf); 404 | } 405 | 406 | /********* Grail sorting *********************************/ 407 | /* */ 408 | /* (c) 2013 by Andrey Astrelin */ 409 | /* */ 410 | /* */ 411 | /* Stable sorting that works in O(N*log(N)) worst time */ 412 | /* and uses O(1) extra memory */ 413 | /* */ 414 | /* Define SORT_TYPE and SORT_CMP */ 415 | /* and then call GrailSort() function */ 416 | /* */ 417 | /* For sorting with fixed external buffer (512 items) */ 418 | /* use GrailSortWithBuffer() */ 419 | /* */ 420 | /* For sorting with dynamic external buffer (O(sqrt(N)) items) */ 421 | /* use GrailSortWithDynBuffer() */ 422 | /* */ 423 | /* Also classic in-place merge sort is implemented */ 424 | /* under the name of RecStableSort() */ 425 | /* */ 426 | /*********************************************************/ 427 | 428 | #define GRAIL_EXT_BUFFER_LENGTH 512 429 | 430 | static __inline void GRAIL_SWAP1(SORT_TYPE *a, SORT_TYPE *b) { 431 | SORT_TYPE c = *a; 432 | *a = *b; 433 | *b = c; 434 | } 435 | 436 | static __inline void GRAIL_SWAP_N(SORT_TYPE *a, SORT_TYPE *b, int n) { 437 | while (n--) { 438 | GRAIL_SWAP1(a++, b++); 439 | } 440 | } 441 | 442 | static void GRAIL_ROTATE(SORT_TYPE *a, int l1, int l2) { 443 | while (l1 && l2) { 444 | if (l1 <= l2) { 445 | GRAIL_SWAP_N(a, a + l1, l1); 446 | a += l1; 447 | l2 -= l1; 448 | } else { 449 | GRAIL_SWAP_N(a + (l1 - l2), a + l1, l2); 450 | l1 -= l2; 451 | } 452 | } 453 | } 454 | 455 | static int GRAIL_BIN_SEARCH_LEFT(SORT_TYPE *arr, int len, SORT_TYPE *key) { 456 | int a = -1, b = len, c; 457 | 458 | while (a < b - 1) { 459 | c = a + ((b - a) >> 1); 460 | 461 | if (SORT_CMP_A(arr + c, key) >= 0) { 462 | b = c; 463 | } else { 464 | a = c; 465 | } 466 | } 467 | 468 | return b; 469
| } 470 | static int GRAIL_BIN_SEARCH_RIGHT(SORT_TYPE *arr, int len, SORT_TYPE *key) { 471 | int a = -1, b = len, c; 472 | 473 | while (a < b - 1) { 474 | c = a + ((b - a) >> 1); 475 | 476 | if (SORT_CMP_A(arr + c, key) > 0) { 477 | b = c; 478 | } else { 479 | a = c; 480 | } 481 | } 482 | 483 | return b; 484 | } 485 | 486 | /* cost: 2*len+nk^2/2 */ 487 | static int GRAIL_FIND_KEYS(SORT_TYPE *arr, int len, int nkeys) { 488 | int h = 1, h0 = 0; /* first key is always here */ 489 | int u = 1, r; 490 | 491 | while (u < len && h < nkeys) { 492 | r = GRAIL_BIN_SEARCH_LEFT(arr + h0, h, arr + u); 493 | 494 | if (r == h || SORT_CMP_A(arr + u, arr + (h0 + r)) != 0) { 495 | GRAIL_ROTATE(arr + h0, h, u - (h0 + h)); 496 | h0 = u - h; 497 | GRAIL_ROTATE(arr + (h0 + r), h - r, 1); 498 | h++; 499 | } 500 | 501 | u++; 502 | } 503 | 504 | GRAIL_ROTATE(arr, h0, h); 505 | return h; 506 | } 507 | 508 | /* cost: min(L1,L2)^2+max(L1,L2) */ 509 | static void GRAIL_MERGE_WITHOUT_BUFFER(SORT_TYPE *arr, int len1, int len2) { 510 | int h; 511 | 512 | if (len1 < len2) { 513 | while (len1) { 514 | h = GRAIL_BIN_SEARCH_LEFT(arr + len1, len2, arr); 515 | 516 | if (h != 0) { 517 | GRAIL_ROTATE(arr, len1, h); 518 | arr += h; 519 | len2 -= h; 520 | } 521 | 522 | if (len2 == 0) { 523 | break; 524 | } 525 | 526 | do { 527 | arr++; 528 | len1--; 529 | } while (len1 && SORT_CMP_A(arr, arr + len1) <= 0); 530 | } 531 | } else { 532 | while (len2) { 533 | h = GRAIL_BIN_SEARCH_RIGHT(arr, len1, arr + (len1 + len2 - 1)); 534 | 535 | if (h != len1) { 536 | GRAIL_ROTATE(arr + h, len1 - h, len2); 537 | len1 = h; 538 | } 539 | 540 | if (len1 == 0) { 541 | break; 542 | } 543 | 544 | do { 545 | len2--; 546 | } while (len2 && SORT_CMP_A(arr + len1 - 1, arr + len1 + len2 - 1) <= 0); 547 | } 548 | } 549 | } 550 | 551 | /* arr[M..-1] - buffer, arr[0,L1-1]++arr[L1,L1+L2-1] -> arr[M,M+L1+L2-1] */ 552 | static void GRAIL_MERGE_LEFT(SORT_TYPE *arr, int L1, int L2, int M) { 553 | int p0 = 0, p1 = L1; 554 | L2 += L1; 555 | 
556 | while (p1 < L2) { 557 | if (p0 == L1 || SORT_CMP_A(arr + p0, arr + p1) > 0) { 558 | GRAIL_SWAP1(arr + (M++), arr + (p1++)); 559 | } else { 560 | GRAIL_SWAP1(arr + (M++), arr + (p0++)); 561 | } 562 | } 563 | 564 | if (M != p0) { 565 | GRAIL_SWAP_N(arr + M, arr + p0, L1 - p0); 566 | } 567 | } 568 | static void GRAIL_MERGE_RIGHT(SORT_TYPE *arr, int L1, int L2, int M) { 569 | int p0 = L1 + L2 + M - 1, p2 = L1 + L2 - 1, p1 = L1 - 1; 570 | 571 | while (p1 >= 0) { 572 | if (p2 < L1 || SORT_CMP_A(arr + p1, arr + p2) > 0) { 573 | GRAIL_SWAP1(arr + (p0--), arr + (p1--)); 574 | } else { 575 | GRAIL_SWAP1(arr + (p0--), arr + (p2--)); 576 | } 577 | } 578 | 579 | if (p2 != p0) while (p2 >= L1) { 580 | GRAIL_SWAP1(arr + (p0--), arr + (p2--)); 581 | } 582 | } 583 | 584 | static void GRAIL_SMART_MERGE_WITH_BUFFER(SORT_TYPE *arr, int *alen1, int *atype, int len2, 585 | int lkeys) { 586 | int p0 = -lkeys, p1 = 0, p2 = *alen1, q1 = p2, q2 = p2 + len2; 587 | int ftype = 1 - *atype; /* 1 if inverted */ 588 | 589 | while (p1 < q1 && p2 < q2) { 590 | if (SORT_CMP_A(arr + p1, arr + p2) - ftype < 0) { 591 | GRAIL_SWAP1(arr + (p0++), arr + (p1++)); 592 | } else { 593 | GRAIL_SWAP1(arr + (p0++), arr + (p2++)); 594 | } 595 | } 596 | 597 | if (p1 < q1) { 598 | *alen1 = q1 - p1; 599 | 600 | while (p1 < q1) { 601 | GRAIL_SWAP1(arr + (--q1), arr + (--q2)); 602 | } 603 | } else { 604 | *alen1 = q2 - p2; 605 | *atype = ftype; 606 | } 607 | } 608 | static void GRAIL_SMART_MERGE_WITHOUT_BUFFER(SORT_TYPE *arr, int *alen1, int *atype, int _len2) { 609 | int len1, len2, ftype, h; 610 | 611 | if (!_len2) { 612 | return; 613 | } 614 | 615 | len1 = *alen1; 616 | len2 = _len2; 617 | ftype = 1 - *atype; 618 | 619 | if (len1 && SORT_CMP_A(arr + (len1 - 1), arr + len1) - ftype >= 0) { 620 | while (len1) { 621 | h = ftype ? 
GRAIL_BIN_SEARCH_LEFT(arr + len1, len2, arr) : GRAIL_BIN_SEARCH_RIGHT(arr + len1, len2, 622 | arr); 623 | 624 | if (h != 0) { 625 | GRAIL_ROTATE(arr, len1, h); 626 | arr += h; 627 | len2 -= h; 628 | } 629 | 630 | if (len2 == 0) { 631 | *alen1 = len1; 632 | return; 633 | } 634 | 635 | do { 636 | arr++; 637 | len1--; 638 | } while (len1 && SORT_CMP_A(arr, arr + len1) - ftype < 0); 639 | } 640 | } 641 | 642 | *alen1 = len2; 643 | *atype = ftype; 644 | } 645 | 646 | /***** Sort With Extra Buffer *****/ 647 | 648 | /* arr[M..-1] - free, arr[0,L1-1]++arr[L1,L1+L2-1] -> arr[M,M+L1+L2-1] */ 649 | static void GRAIL_MERGE_LEFT_WITH_X_BUF(SORT_TYPE *arr, int L1, int L2, int M) { 650 | int p0 = 0, p1 = L1; 651 | L2 += L1; 652 | 653 | while (p1 < L2) { 654 | if (p0 == L1 || SORT_CMP_A(arr + p0, arr + p1) > 0) { 655 | arr[M++] = arr[p1++]; 656 | } else { 657 | arr[M++] = arr[p0++]; 658 | } 659 | } 660 | 661 | if (M != p0) while (p0 < L1) { 662 | arr[M++] = arr[p0++]; 663 | } 664 | } 665 | 666 | static void GRAIL_SMART_MERGE_WITH_X_BUF(SORT_TYPE *arr, int *alen1, int *atype, int len2, 667 | int lkeys) { 668 | int p0 = -lkeys, p1 = 0, p2 = *alen1, q1 = p2, q2 = p2 + len2; 669 | int ftype = 1 - *atype; /* 1 if inverted */ 670 | 671 | while (p1 < q1 && p2 < q2) { 672 | if (SORT_CMP_A(arr + p1, arr + p2) - ftype < 0) { 673 | arr[p0++] = arr[p1++]; 674 | } else { 675 | arr[p0++] = arr[p2++]; 676 | } 677 | } 678 | 679 | if (p1 < q1) { 680 | *alen1 = q1 - p1; 681 | 682 | while (p1 < q1) { 683 | arr[--q2] = arr[--q1]; 684 | } 685 | } else { 686 | *alen1 = q2 - p2; 687 | *atype = ftype; 688 | } 689 | } 690 | 691 | /* 692 | arr - starting array. arr[-lblock..-1] - buffer (if havebuf). 693 | lblock - length of regular blocks. First nblocks are stable sorted by 1st elements and key-coded 694 | keys - arrays of keys, in same order as blocks. key < midkey means stream A 695 | nblock2 are regular blocks from stream A. llast is length of last (irregular) block from stream B, that should go before nblock2 blocks. 696 | llast=0 requires nblock2=0 (no irregular blocks). llast > 0, nblock2=0 is possible.
697 | */ 698 | static void GRAIL_MERGE_BUFFERS_LEFT_WITH_X_BUF(SORT_TYPE *keys, SORT_TYPE *midkey, SORT_TYPE *arr, 699 | int nblock, int lblock, int nblock2, int llast) { 700 | int l, prest, lrest, frest, pidx, cidx, fnext; 701 | 702 | if (nblock == 0) { 703 | l = nblock2 * lblock; 704 | GRAIL_MERGE_LEFT_WITH_X_BUF(arr, l, llast, -lblock); 705 | return; 706 | } 707 | 708 | lrest = lblock; 709 | frest = SORT_CMP_A(keys, midkey) < 0 ? 0 : 1; 710 | pidx = lblock; 711 | 712 | for (cidx = 1; cidx < nblock; cidx++, pidx += lblock) { 713 | prest = pidx - lrest; 714 | fnext = SORT_CMP_A(keys + cidx, midkey) < 0 ? 0 : 1; 715 | 716 | if (fnext == frest) { 717 | SORT_TYPE_CPY(arr + prest - lblock, arr + prest, lrest); 718 | prest = pidx; 719 | lrest = lblock; 720 | } else { 721 | GRAIL_SMART_MERGE_WITH_X_BUF(arr + prest, &lrest, &frest, lblock, lblock); 722 | } 723 | } 724 | 725 | prest = pidx - lrest; 726 | 727 | if (llast) { 728 | if (frest) { 729 | SORT_TYPE_CPY(arr + prest - lblock, arr + prest, lrest); 730 | prest = pidx; 731 | lrest = lblock * nblock2; 732 | frest = 0; 733 | } else { 734 | lrest += lblock * nblock2; 735 | } 736 | 737 | GRAIL_MERGE_LEFT_WITH_X_BUF(arr + prest, lrest, llast, -lblock); 738 | } else { 739 | SORT_TYPE_CPY(arr + prest - lblock, arr + prest, lrest); 740 | } 741 | } 742 | 743 | /***** End Sort With Extra Buffer *****/ 744 | 745 | /* 746 | build blocks of length K 747 | input: [-K,-1] elements are buffer 748 | output: first K elements are buffer, blocks 2*K and last subblock sorted 749 | */ 750 | static void GRAIL_BUILD_BLOCKS(SORT_TYPE *arr, int L, int K, SORT_TYPE *extbuf, int LExtBuf) { 751 | int m, u, h, p0, p1, rest, restk, p, kbuf; 752 | kbuf = K < LExtBuf ? 
K : LExtBuf; 753 | 754 | while (kbuf & (kbuf - 1)) { 755 | kbuf &= kbuf - 1; /* round down to the largest power of 2 - just in case */ 756 | } 757 | 758 | if (kbuf) { 759 | SORT_TYPE_CPY(extbuf, arr - kbuf, kbuf); 760 | 761 | for (m = 1; m < L; m += 2) { 762 | u = 0; 763 | 764 | if (SORT_CMP_A(arr + (m - 1), arr + m) > 0) { 765 | u = 1; 766 | } 767 | 768 | arr[m - 3] = arr[m - 1 + u]; 769 | arr[m - 2] = arr[m - u]; 770 | } 771 | 772 | if (L % 2) { 773 | arr[L - 3] = arr[L - 1]; 774 | } 775 | 776 | arr -= 2; 777 | 778 | for (h = 2; h < kbuf; h *= 2) { 779 | p0 = 0; 780 | p1 = L - 2 * h; 781 | 782 | while (p0 <= p1) { 783 | GRAIL_MERGE_LEFT_WITH_X_BUF(arr + p0, h, h, -h); 784 | p0 += 2 * h; 785 | } 786 | 787 | rest = L - p0; 788 | 789 | if (rest > h) { 790 | GRAIL_MERGE_LEFT_WITH_X_BUF(arr + p0, h, rest - h, -h); 791 | } else { 792 | for (; p0 < L; p0++) { 793 | arr[p0 - h] = arr[p0]; 794 | } 795 | } 796 | 797 | arr -= h; 798 | } 799 | 800 | SORT_TYPE_CPY(arr + L, extbuf, kbuf); 801 | } else { 802 | for (m = 1; m < L; m += 2) { 803 | u = 0; 804 | 805 | if (SORT_CMP_A(arr + (m - 1), arr + m) > 0) { 806 | u = 1; 807 | } 808 | 809 | GRAIL_SWAP1(arr + (m - 3), arr + (m - 1 + u)); 810 | GRAIL_SWAP1(arr + (m - 2), arr + (m - u)); 811 | } 812 | 813 | if (L % 2) { 814 | GRAIL_SWAP1(arr + (L - 1), arr + (L - 3)); 815 | } 816 | 817 | arr -= 2; 818 | h = 2; 819 | } 820 | 821 | for (; h < K; h *= 2) { 822 | p0 = 0; 823 | p1 = L - 2 * h; 824 | 825 | while (p0 <= p1) { 826 | GRAIL_MERGE_LEFT(arr + p0, h, h, -h); 827 | p0 += 2 * h; 828 | } 829 | 830 | rest = L - p0; 831 | 832 | if (rest > h) { 833 | GRAIL_MERGE_LEFT(arr + p0, h, rest - h, -h); 834 | } else { 835 | GRAIL_ROTATE(arr + p0 - h, h, rest); 836 | } 837 | 838 | arr -= h; 839 | } 840 | 841 | restk = L % (2 * K); 842 | p = L - restk; 843 | 844 | if (restk <= K) { 845 | GRAIL_ROTATE(arr + p, restk, K); 846 | } else { 847 | GRAIL_MERGE_RIGHT(arr + p, K, restk - K, K); 848 | } 849 | 850 | while (p > 0) { 851 | p -= 2 * K; 852 | GRAIL_MERGE_RIGHT(arr
+ p, K, K, K); 853 | } 854 | } 855 | 856 | /* 857 | arr - starting array. arr[-lblock..-1] - buffer (if havebuf). 858 | lblock - length of regular blocks. First nblocks are stable sorted by 1st elements and key-coded 859 | keys - arrays of keys, in same order as blocks. key < midkey means stream A 860 | nblock2 are regular blocks from stream A. llast is length of last (irregular) block from stream B, that should go before nblock2 blocks. 861 | llast=0 requires nblock2=0 (no irregular blocks). llast > 0, nblock2=0 is possible. 862 | */ 863 | static void GRAIL_MERGE_BUFFERS_LEFT(SORT_TYPE *keys, SORT_TYPE *midkey, SORT_TYPE *arr, int nblock, 864 | int lblock, int havebuf, int nblock2, int llast) { 865 | int l, prest, lrest, frest, pidx, cidx, fnext; 866 | 867 | if (nblock == 0) { 868 | l = nblock2 * lblock; 869 | 870 | if (havebuf) { 871 | GRAIL_MERGE_LEFT(arr, l, llast, -lblock); 872 | } else { 873 | GRAIL_MERGE_WITHOUT_BUFFER(arr, l, llast); 874 | } 875 | 876 | return; 877 | } 878 | 879 | lrest = lblock; 880 | frest = SORT_CMP_A(keys, midkey) < 0 ? 0 : 1; 881 | pidx = lblock; 882 | 883 | for (cidx = 1; cidx < nblock; cidx++, pidx += lblock) { 884 | prest = pidx - lrest; 885 | fnext = SORT_CMP_A(keys + cidx, midkey) < 0 ?
0 : 1; 886 | 887 | if (fnext == frest) { 888 | if (havebuf) { 889 | GRAIL_SWAP_N(arr + prest - lblock, arr + prest, lrest); 890 | } 891 | 892 | prest = pidx; 893 | lrest = lblock; 894 | } else { 895 | if (havebuf) { 896 | GRAIL_SMART_MERGE_WITH_BUFFER(arr + prest, &lrest, &frest, lblock, lblock); 897 | } else { 898 | GRAIL_SMART_MERGE_WITHOUT_BUFFER(arr + prest, &lrest, &frest, lblock); 899 | } 900 | } 901 | } 902 | 903 | prest = pidx - lrest; 904 | 905 | if (llast) { 906 | if (frest) { 907 | if (havebuf) { 908 | GRAIL_SWAP_N(arr + prest - lblock, arr + prest, lrest); 909 | } 910 | 911 | prest = pidx; 912 | lrest = lblock * nblock2; 913 | frest = 0; 914 | } else { 915 | lrest += lblock * nblock2; 916 | } 917 | 918 | if (havebuf) { 919 | GRAIL_MERGE_LEFT(arr + prest, lrest, llast, -lblock); 920 | } else { 921 | GRAIL_MERGE_WITHOUT_BUFFER(arr + prest, lrest, llast); 922 | } 923 | } else { 924 | if (havebuf) { 925 | GRAIL_SWAP_N(arr + prest, arr + (prest - lblock), lrest); 926 | } 927 | } 928 | } 929 | 930 | static void GRAIL_LAZY_STABLE_SORT(SORT_TYPE *arr, int L) { 931 | int m, h, p0, p1, rest; 932 | 933 | for (m = 1; m < L; m += 2) { 934 | if (SORT_CMP_A(arr + m - 1, arr + m) > 0) { 935 | GRAIL_SWAP1(arr + (m - 1), arr + m); 936 | } 937 | } 938 | 939 | for (h = 2; h < L; h *= 2) { 940 | p0 = 0; 941 | p1 = L - 2 * h; 942 | 943 | while (p0 <= p1) { 944 | GRAIL_MERGE_WITHOUT_BUFFER(arr + p0, h, h); 945 | p0 += 2 * h; 946 | } 947 | 948 | rest = L - p0; 949 | 950 | if (rest > h) { 951 | GRAIL_MERGE_WITHOUT_BUFFER(arr + p0, h, rest - h); 952 | } 953 | } 954 | } 955 | 956 | /* 957 | keys are on the left of arr. Blocks of length LL combined. We'll combine them in pairs 958 | LL and nkeys are powers of 2. 
(2*LL/lblock) keys are guaranteed 959 | */ 960 | static void GRAIL_COMBINE_BLOCKS(SORT_TYPE *keys, SORT_TYPE *arr, int len, int LL, int lblock, 961 | int havebuf, SORT_TYPE *xbuf) { 962 | int M, b, NBlk, midkey, lrest, u, p, v, kc, nbl2, llast; 963 | SORT_TYPE *arr1; 964 | M = len / (2 * LL); 965 | lrest = len % (2 * LL); 966 | 967 | if (lrest <= LL) { 968 | len -= lrest; 969 | lrest = 0; 970 | } 971 | 972 | if (xbuf) { 973 | SORT_TYPE_CPY(xbuf, arr - lblock, lblock); 974 | } 975 | 976 | for (b = 0; b <= M; b++) { 977 | if (b == M && lrest == 0) { 978 | break; 979 | } 980 | 981 | arr1 = arr + b * 2 * LL; 982 | NBlk = (b == M ? lrest : 2 * LL) / lblock; 983 | SMALL_STABLE_SORT(keys, NBlk + (b == M ? 1 : 0)); 984 | midkey = LL / lblock; 985 | 986 | for (u = 1; u < NBlk; u++) { 987 | p = u - 1; 988 | 989 | for (v = u; v < NBlk; v++) { 990 | kc = SORT_CMP_A(arr1 + p * lblock, arr1 + v * lblock); 991 | 992 | if (kc > 0 || (kc == 0 && SORT_CMP_A(keys + p, keys + v) > 0)) { 993 | p = v; 994 | } 995 | } 996 | 997 | if (p != u - 1) { 998 | GRAIL_SWAP_N(arr1 + (u - 1)*lblock, arr1 + p * lblock, lblock); 999 | GRAIL_SWAP1(keys + (u - 1), keys + p); 1000 | 1001 | if (midkey == u - 1 || midkey == p) { 1002 | midkey ^= (u - 1)^p; 1003 | } 1004 | } 1005 | } 1006 | 1007 | nbl2 = llast = 0; 1008 | 1009 | if (b == M) { 1010 | llast = lrest % lblock; 1011 | } 1012 | 1013 | if (llast != 0) { 1014 | while (nbl2 < NBlk && SORT_CMP_A(arr1 + NBlk * lblock, arr1 + (NBlk - nbl2 - 1) * lblock) < 0) { 1015 | nbl2++; 1016 | } 1017 | } 1018 | 1019 | if (xbuf) { 1020 | GRAIL_MERGE_BUFFERS_LEFT_WITH_X_BUF(keys, keys + midkey, arr1, NBlk - nbl2, lblock, nbl2, llast); 1021 | } else { 1022 | GRAIL_MERGE_BUFFERS_LEFT(keys, keys + midkey, arr1, NBlk - nbl2, lblock, havebuf, nbl2, llast); 1023 | } 1024 | } 1025 | 1026 | if (xbuf) { 1027 | for (p = len; --p >= 0;) { 1028 | arr[p] = arr[p - lblock]; 1029 | } 1030 | 1031 | SORT_TYPE_CPY(arr - lblock, xbuf, lblock); 1032 | } else if (havebuf) { 1033 |
while (--len >= 0) { 1034 | GRAIL_SWAP1(arr + len, arr + len - lblock); 1035 | } 1036 | } 1037 | } 1038 | 1039 | 1040 | static void GRAIL_COMMON_SORT(SORT_TYPE *arr, int Len, SORT_TYPE *extbuf, int LExtBuf) { 1041 | int lblock, nkeys, findkeys, ptr, cbuf, lb, nk; 1042 | int havebuf, chavebuf; 1043 | long long s; 1044 | 1045 | if (Len <= SMALL_SORT_BND) { 1046 | SMALL_STABLE_SORT(arr, Len); 1047 | return; 1048 | } 1049 | 1050 | lblock = 1; 1051 | 1052 | while (lblock * lblock < Len) { 1053 | lblock *= 2; 1054 | } 1055 | 1056 | nkeys = (Len - 1) / lblock + 1; 1057 | findkeys = GRAIL_FIND_KEYS(arr, Len, nkeys + lblock); 1058 | havebuf = 1; 1059 | 1060 | if (findkeys < nkeys + lblock) { 1061 | if (findkeys < 4) { 1062 | GRAIL_LAZY_STABLE_SORT(arr, Len); 1063 | return; 1064 | } 1065 | 1066 | nkeys = lblock; 1067 | 1068 | while (nkeys > findkeys) { 1069 | nkeys /= 2; 1070 | } 1071 | 1072 | havebuf = 0; 1073 | lblock = 0; 1074 | } 1075 | 1076 | ptr = lblock + nkeys; 1077 | cbuf = havebuf ? lblock : nkeys; 1078 | 1079 | if (havebuf) { 1080 | GRAIL_BUILD_BLOCKS(arr + ptr, Len - ptr, cbuf, extbuf, LExtBuf); 1081 | } else { 1082 | GRAIL_BUILD_BLOCKS(arr + ptr, Len - ptr, cbuf, NULL, 0); 1083 | } 1084 | 1085 | /* 2*cbuf are built */ 1086 | while (Len - ptr > (cbuf *= 2)) { 1087 | lb = lblock; 1088 | chavebuf = havebuf; 1089 | 1090 | if (!havebuf) { 1091 | if (nkeys > 4 && nkeys / 8 * nkeys >= cbuf) { 1092 | lb = nkeys / 2; 1093 | chavebuf = 1; 1094 | } else { 1095 | nk = 1; 1096 | s = (long long)cbuf * findkeys / 2; 1097 | 1098 | while (nk < nkeys && s != 0) { 1099 | nk *= 2; 1100 | s /= 8; 1101 | } 1102 | 1103 | lb = (2 * cbuf) / nk; 1104 | } 1105 | } 1106 | 1107 | GRAIL_COMBINE_BLOCKS(arr, arr + ptr, Len - ptr, cbuf, lb, chavebuf, chavebuf 1108 | && lb <= LExtBuf ? 
extbuf : NULL); 1109 | } 1110 | 1111 | SMALL_STABLE_SORT(arr, ptr); 1112 | GRAIL_MERGE_WITHOUT_BUFFER(arr, ptr, Len - ptr); 1113 | } 1114 | 1115 | SORT_DEF void GRAIL_SORT(SORT_TYPE *arr, size_t Len) { 1116 | GRAIL_COMMON_SORT(arr, (int)Len, NULL, 0); 1117 | } 1118 | 1119 | SORT_DEF void GRAIL_SORT_FIXED_BUFFER(SORT_TYPE *arr, size_t Len) { 1120 | SORT_TYPE ExtBuf[GRAIL_EXT_BUFFER_LENGTH]; 1121 | GRAIL_COMMON_SORT(arr, (int)Len, ExtBuf, GRAIL_EXT_BUFFER_LENGTH); 1122 | } 1123 | 1124 | SORT_DEF void GRAIL_SORT_DYN_BUFFER(SORT_TYPE *arr, size_t Len) { 1125 | int L = 1; 1126 | SORT_TYPE *ExtBuf; 1127 | 1128 | while (L * L < Len) { 1129 | L *= 2; 1130 | } 1131 | 1132 | ExtBuf = SORT_NEW_BUFFER(L); 1133 | 1134 | if (ExtBuf == NULL) { 1135 | GRAIL_SORT_FIXED_BUFFER(arr, Len); 1136 | } else { 1137 | GRAIL_COMMON_SORT(arr, (int)Len, ExtBuf, L); 1138 | SORT_DELETE_BUFFER(ExtBuf); 1139 | } 1140 | } 1141 | 1142 | /****** classic MergeInPlace *************/ 1143 | 1144 | static void GRAIL_REC_MERGE(SORT_TYPE *A, int L1, int L2) { 1145 | int K, k1, k2, m1, m2; 1146 | 1147 | if (L1 < 3 || L2 < 3) { 1148 | GRAIL_MERGE_WITHOUT_BUFFER(A, L1, L2); 1149 | return; 1150 | } 1151 | 1152 | if (L1 < L2) { 1153 | K = L1 + L2 / 2; 1154 | } else { 1155 | K = L1 / 2; 1156 | } 1157 | 1158 | k1 = k2 = GRAIL_BIN_SEARCH_LEFT(A, L1, A + K); 1159 | 1160 | if (k2 < L1 && SORT_CMP_A(A + k2, A + K) == 0) { 1161 | k2 = GRAIL_BIN_SEARCH_RIGHT(A + k1, L1 - k1, A + K) + k1; 1162 | } 1163 | 1164 | m1 = GRAIL_BIN_SEARCH_LEFT(A + L1, L2, A + K); 1165 | m2 = m1; 1166 | 1167 | if (m2 < L2 && SORT_CMP_A(A + L1 + m2, A + K) == 0) { 1168 | m2 = GRAIL_BIN_SEARCH_RIGHT(A + L1 + m1, L2 - m1, A + K) + m1; 1169 | } 1170 | 1171 | if (k1 == k2) { 1172 | GRAIL_ROTATE(A + k2, L1 - k2, m2); 1173 | } else { 1174 | GRAIL_ROTATE(A + k1, L1 - k1, m1); 1175 | 1176 | if (m2 != m1) { 1177 | GRAIL_ROTATE(A + (k2 + m1), L1 - k2, m2 - m1); 1178 | } 1179 | } 1180 | 1181 | GRAIL_REC_MERGE(A + (k2 + m2), L1 - k2, L2 - m2); 1182 | 
  GRAIL_REC_MERGE(A, k1, m1);
}

SORT_DEF void REC_STABLE_SORT(SORT_TYPE *arr, size_t L) {
  int m, h, p0, p1, rest;

  for (m = 1; m < L; m += 2) {
    if (SORT_CMP_A(arr + m - 1, arr + m) > 0) {
      GRAIL_SWAP1(arr + (m - 1), arr + m);
    }
  }

  for (h = 2; h < L; h *= 2) {
    p0 = 0;
    p1 = (int)(L - 2 * h);

    while (p0 <= p1) {
      GRAIL_REC_MERGE(arr + p0, h, h);
      p0 += 2 * h;
    }

    rest = (int)(L - p0);

    if (rest > h) {
      GRAIL_REC_MERGE(arr + p0, h, rest - h);
    }
  }
}

/* Bubble sort implementation based on Wikipedia article
   https://en.wikipedia.org/wiki/Bubble_sort
*/
SORT_DEF void BUBBLE_SORT(SORT_TYPE *dst, const size_t size) {
  size_t n = size;

  while (n) {
    size_t i, newn = 0U;

    for (i = 1U; i < n; ++i) {
      if (SORT_CMP(dst[i - 1U], dst[i]) > 0) {
        SORT_SWAP(dst[i - 1U], dst[i]);
        newn = i;
      }
    }

    n = newn;
  }
}

/* Selection sort */
SORT_DEF void SELECTION_SORT(SORT_TYPE *dst, const size_t size) {
  size_t i, j;

  /* don't bother sorting an array of size <= 1 */
  if (size <= 1) {
    return;
  }

  for (i = 0; i < size; i++) {
    for (j = i + 1; j < size; j++) {
      if (SORT_CMP(dst[j], dst[i]) < 0) {
        SORT_SWAP(dst[i], dst[j]);
      }
    }
  }
}

#undef GRAIL_SWAP1
#undef REC_STABLE_SORT
#undef GRAIL_REC_MERGE
#undef GRAIL_SORT_DYN_BUFFER
#undef GRAIL_SORT_FIXED_BUFFER
#undef GRAIL_COMMON_SORT
#undef GRAIL_SORT
#undef GRAIL_COMBINE_BLOCKS
#undef GRAIL_LAZY_STABLE_SORT
#undef GRAIL_MERGE_WITHOUT_BUFFER
#undef GRAIL_ROTATE
#undef GRAIL_BIN_SEARCH_LEFT
#undef GRAIL_BUILD_BLOCKS
#undef GRAIL_FIND_KEYS
#undef GRAIL_MERGE_BUFFERS_LEFT_WITH_X_BUF
#undef GRAIL_BIN_SEARCH_RIGHT
#undef GRAIL_MERGE_BUFFERS_LEFT
#undef GRAIL_SMART_MERGE_WITH_X_BUF
#undef GRAIL_MERGE_LEFT_WITH_X_BUF
#undef GRAIL_SMART_MERGE_WITHOUT_BUFFER
#undef GRAIL_SMART_MERGE_WITH_BUFFER
#undef GRAIL_MERGE_RIGHT
#undef GRAIL_MERGE_LEFT
#undef GRAIL_SWAP_N
#undef SQRT_SORT
#undef SQRT_SORT_BUILD_BLOCKS
#undef SQRT_SORT_MERGE_BUFFERS_LEFT_WITH_X_BUF
#undef SQRT_SORT_MERGE_DOWN
#undef SQRT_SORT_MERGE_LEFT_WITH_X_BUF
#undef SQRT_SORT_MERGE_RIGHT
#undef SQRT_SORT_SWAP_N
#undef SQRT_SORT_SWAP_1
#undef SQRT_SORT_SMART_MERGE_WITH_X_BUF
#undef SQRT_SORT_SORT_INS
#undef SQRT_SORT_COMBINE_BLOCKS
#undef SQRT_SORT_COMMON_SORT
#undef SORT_CMP_A
#undef BUBBLE_SORT
#undef SELECTION_SORT
#undef GRAIL_EXT_BUFFER_LENGTH

--------------------------------------------------------------------------------
/stresstest.c:
--------------------------------------------------------------------------------
/* Copyright (c) 2010-2014 Christopher Swenson. */
/* Copyright (c) 2012 Google Inc. All Rights Reserved.
 */

#define _XOPEN_SOURCE

#define SORT_NAME sorter
#define SORT_TYPE int64_t
#define SORT_CMP(x, y) ((x) - (y))
#ifdef SET_SORT_EXTRA
#define SORT_EXTRA
#endif
#include "sort.h"

#define SORT_NAME stable
#define SORT_TYPE int*
#define SORT_CMP(x, y) (*(x) - *(y))
#ifdef SET_SORT_EXTRA
#define SORT_EXTRA
#endif
#include "sort.h"

/* Used to control the stress test */
#define SEED 123
#define MAXSIZE 45000
#define TESTS 1000

#define RAND_RANGE(__n, __min, __max) \
  (__n) = (__min) + (long) ((double) ( (double) (__max) - (__min) + 1.0) * ((__n) / (0x7fffffff + 1.0)))

enum {
  FILL_RANDOM,
  FILL_SAME,
  FILL_SORTED,
  FILL_SORTED_10,
  FILL_SORTED_100,
  FILL_SORTED_10000,
  FILL_SWAPPED_N2,
  FILL_SWAPPED_N8,
  FILL_EVIL,
  FILL_LAST_ELEMENT
};

char *test_names[FILL_LAST_ELEMENT] = {
  "random numbers",
  "same number",
  "sorted numbers",
  "sorted blocks of length 10",
  "sorted blocks of length 100",
  "sorted blocks of length 10000",
  "swapped size/2 pairs",
  "swapped size/8 pairs",
  "known evil data"
};

/* used for stdlib */
static __inline int simple_cmp(const void *a, const void *b) {
  const int64_t da = *((const int64_t *) a);
  const int64_t db = *((const int64_t *) b);
  return (da < db) ? -1 : (da == db) ? 0 : 1;
}

#ifdef _WIN32

/* Header name was lost in extraction; <time.h> is assumed because it
   declares timespec_get and struct timespec, which are used below. */
#include <time.h>
static __inline void srand48(long seed) {
  srand(seed);
}

static __inline long lrand48(void) {
  unsigned int x; /* rand_s takes an unsigned int pointer */
  rand_s(&x);
  return x & 0x7fffffff;
}

static __inline double utime(void) {
  struct timespec ts;
  timespec_get(&ts, TIME_UTC);
  return 1000000.0 * ts.tv_sec + ts.tv_nsec / 1000.0;
}
#else

/* Header name was lost in extraction; <sys/time.h> is assumed because it
   declares gettimeofday and struct timeval, which are used below. */
#include <sys/time.h>
static __inline double utime(void) {
  struct timeval t;
  gettimeofday(&t, NULL);
  return (1000000.0 * t.tv_sec + t.tv_usec);
}
#endif

/* helper functions */
int verify(int64_t *dst, const int size) {
  int i;

  for (i = 1; i < size; i++) {
    if (dst[i - 1] > dst[i]) {
      printf("Verify failed! at %d", i);
      return 0;
    }
  }

  return 1;
}

static void fill_random(int64_t *dst, const int size) {
  int i;

  for (i = 0; i < size; i++) {
    dst[i] = lrand48();
  }
}

static void fill_same(int64_t *dst, const int size) {
  int i;

  for (i = 0; i < size; i++) {
    dst[i] = 0;
  }
}

static void fill_sorted(int64_t *dst, const int size) {
  int i;

  for (i = 0; i < size; i++) {
    dst[i] = i;
  }
}

static void fill_sorted_blocks(int64_t *dst, const int size, const int block_size) {
  int i, filled, this_block_size;
  filled = 0;

  for (i = 0; i < size; i += block_size) {
    this_block_size = (filled + block_size) < size ? block_size : (size - filled);
    fill_random(dst + filled, this_block_size);
    qsort(dst + filled, this_block_size, sizeof(int64_t), simple_cmp);
    filled += this_block_size;
  }
}

static void fill_swapped(int64_t *dst, const int size, const int swapped_cnt) {
  int i;
  int64_t tmp; /* same type as the elements being swapped */
  size_t ind1 = 0;
  size_t ind2 = 0;
  fill_sorted(dst, size);

  for (i = 0; i < swapped_cnt; i++) {
    ind1 = lrand48();
    RAND_RANGE(ind1, 0, size - 1);
    ind2 = lrand48();
    RAND_RANGE(ind2, 0, size - 1);
    tmp = dst[ind1];
    dst[ind1] = dst[ind2];
    dst[ind2] = tmp;
  }
}

static void fill_evil(int64_t *dst, const int size) {
  int i;

  for (i = 0; i < size; i++) {
    dst[i] = i ^ 1;
  }
}

static void fill(int64_t *dst, const int size, int type) {
  switch (type) {
  case FILL_SORTED:
    fill_sorted(dst, size);
    break;

  case FILL_SORTED_10:
    fill_sorted_blocks(dst, size, 10);
    break;

  case FILL_SORTED_100:
    fill_sorted_blocks(dst, size, 100);
    break;

  case FILL_SORTED_10000:
    fill_sorted_blocks(dst, size, 10000);
    break;

  case FILL_SWAPPED_N2:
    fill_swapped(dst, size, size / 2);
    break;

  case FILL_SWAPPED_N8:
    fill_swapped(dst, size, size / 8);
    break;

  case FILL_SAME:
    fill_same(dst, size);
    break;

  case FILL_EVIL:
    fill_evil(dst, size);
    break;

  case FILL_RANDOM:
  default:
    fill_random(dst, size);
    break;
  }
}

#define TEST_STDLIB(name) do { \
  res = 0; \
  diff = 0; \
  printf("%-29s", "stdlib " #name ); \
  for (test = 0; test < sizes_cnt; test++) { \
    int64_t size = sizes[test]; \
    fill(dst, size, type); \
    usec1 = utime(); \
    name (dst, size, sizeof(int64_t), simple_cmp); \
    usec2 = utime(); \
    res = verify(dst, size); \
    if (!res) { \
      break; \
    } \
    diff += usec2 - usec1; \
  } \
  printf(" - %s, %10.1f usec\n", res ? "ok" : "FAILED", diff); \
  /* main() treats a nonzero result from run_tests as a failure */ \
  if (!res) { free(dst); return 1; } \
} while (0)


#define TEST_SORT_H(name) do { \
  res = 0; \
  diff = 0; \
  printf("%-29s", "sort.h " #name); \
  for (test = 0; test < sizes_cnt; test++) { \
    int64_t size = sizes[test]; \
    fill(dst, size, type); \
    usec1 = utime(); \
    sorter_ ## name (dst, size); \
    usec2 = utime(); \
    res = verify(dst, size); \
    if (!res) { \
      break; \
    } \
    diff += usec2 - usec1; \
  } \
  printf(" - %s, %10.1f usec\n", res ? "ok" : "FAILED", diff); \
  /* main() treats a nonzero result from run_tests as a failure */ \
  if (!res) { free(dst); return 1; } \
} while (0)

int run_tests(int64_t *sizes, int sizes_cnt, int type) {
  int test, res;
  double usec1, usec2, diff;
  int64_t * dst = (int64_t *)malloc(MAXSIZE * sizeof(int64_t));
  printf("-------\nRunning tests with %s:\n-------\n", test_names[type]);
  TEST_STDLIB(qsort);
#if !defined(__linux__) && !defined(__CYGWIN__) && !defined(_WIN32)
  TEST_STDLIB(heapsort);
  TEST_STDLIB(mergesort);
#endif

  if (MAXSIZE < 10000) {
#ifdef SET_SORT_EXTRA
    TEST_SORT_H(selection_sort);
    TEST_SORT_H(bubble_sort);
#endif
    TEST_SORT_H(binary_insertion_sort);
    TEST_SORT_H(bitonic_sort);
  }

  TEST_SORT_H(quick_sort);
  TEST_SORT_H(merge_sort);
  TEST_SORT_H(heap_sort);
  TEST_SORT_H(shell_sort);
  TEST_SORT_H(tim_sort);
  TEST_SORT_H(merge_sort_in_place);
#ifdef SET_SORT_EXTRA
  TEST_SORT_H(grail_sort);
  TEST_SORT_H(sqrt_sort);
  TEST_SORT_H(rec_stable_sort);
  TEST_SORT_H(grail_sort_dyn_buffer);
#endif
  free(dst);
  return 0;
}

/* stability testing functions */
/* cheap hack to keep a copy to compare against */
static int **original;
static int first_original = 0;

/* make an int* array */
void make_intp_array(int **array, int64_t size, int num_values) {
  int64_t i;

  if (first_original == 0) {
    first_original = 1;
    original = malloc(sizeof(int *) * size);
  }

  for (i = 0; i < size; i++) {
    array[i] = original[i] = malloc(sizeof(int));
    *(array[i]) = lrand48() % num_values;
  }
}

/* free all the pointers */
void clean_intp_array(int **array, int64_t size) {
  int64_t i;

  for (i = 0; i < size; i++) {
    free(array[i]);
  }
}

/* find the first instance of an element in an array of pointers */
int64_t find_next(int **array, int64_t start, int64_t last, int value) {
  int64_t i;

  for (i = start; i < last; i++) {
    if (*(array[i]) == value) {
      break;
    }
  }

  return i;
}

/* verify that the given list is stable */
int verify_stable(int **array, int64_t size, int num_values) {
  int value;
  int64_t i = 0;
  int64_t j = 0;

  for (value = 0; value < num_values; value++) {
    /* The original is unsorted, so rescan it from the start for each value.
       Without this reset, i stays at size after value 0 and no other value
       is ever checked. j can keep advancing because array is sorted. */
    i = 0;

    while (1) {
      i = find_next(original, i, size, value);

      /* unlikely, but possible */
      if (i == size) {
        break;
      }

      j = find_next(array, j, size, value);

      if (j == size) {
        return 0;
      }

      if (original[i] != array[j]) {
        return 0;
      }

      i++;
      j++;
    }
  }

  return 1;
}

/* Checks that given sort function is stable. */
void check_stable(char *name, void (*sort_fun)(int **arr, size_t size), int size, int num_values) {
  int **array = malloc(sizeof(int *) * size);
  make_intp_array(array, size, num_values);
  sort_fun(array, size);
  printf("%21s -- %s\n", name, verify_stable(array, size, num_values) ? "stable" : "UNSTABLE");
  clean_intp_array(array, size);
  free(array);
}

/* Check which sorts are stable. */
void stable_tests(void) {
  int size = 100000;
  int num_values = 1000;
  check_stable("binary insertion sort", stable_binary_insertion_sort, size, num_values);
#ifdef SET_SORT_EXTRA
  check_stable("selection sort", stable_selection_sort, size, num_values);
  check_stable("bubble sort", stable_bubble_sort, size, num_values);
#endif
  check_stable("quick sort", stable_quick_sort, size, num_values);
  check_stable("merge sort", stable_merge_sort, size, num_values);
  check_stable("heap sort", stable_heap_sort, size, num_values);
  check_stable("shell sort", stable_shell_sort, size, num_values);
  check_stable("tim sort", stable_tim_sort, size, num_values);
  check_stable("merge (in-place) sort", stable_merge_sort_in_place, size, num_values);
#ifdef SET_SORT_EXTRA
  check_stable("grail sort", stable_grail_sort, size, num_values);
  check_stable("sqrt sort", stable_sqrt_sort, size, num_values);
  check_stable("rec stable sort", stable_rec_stable_sort, size, num_values);
  check_stable("grail sort dyn buffer", stable_grail_sort_dyn_buffer, size, num_values);
#endif
}

int main(void) {
  int i = 0;
  int64_t sizes[TESTS];
  srand48(SEED);
  stable_tests();
  fill_random(sizes, TESTS);

  for (i = 0; i < TESTS; i++) {
    RAND_RANGE(sizes[i], 1, MAXSIZE);
  }

  sizes[TESTS - 1] = MAXSIZE;

  for (i = 0; i < FILL_LAST_ELEMENT; i++) {
    int result = run_tests(sizes, TESTS, i);

    if (result) {
      return 1;
    }
  }

  return 0;
}

--------------------------------------------------------------------------------