├── Makefile
├── README.md
├── bench.cc
└── qsort.h


/Makefile:
--------------------------------------------------------------------------------
RPM_OPT_FLAGS ?= -O2 -g -Wall
all: bench
bench: bench.cc qsort.h mjt.h
	$(CXX) $(RPM_OPT_FLAGS) -fwhole-program -o $@ $<
	./$@
mjt.h:
	wget -O $@ http://www.corpit.ru/mjt/qsort/qsort.h
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# qsort.h - Quicksort as a C macro

This is a traditional [Quicksort](https://en.wikipedia.org/wiki/Quicksort)
implementation which for the most part follows
[Robert Sedgewick's 1978 paper](http://penguin.ewu.edu/cscd300/Topic/AdvSorting/Sedgewick.pdf).
It is implemented as a C macro, which means that comparisons can be inlined.
A distinctive feature of this implementation is that it works entirely on array
indices, while actual access to the array elements is abstracted out with
the `less` and `swap` primitives provided by the caller.
Here is an example
of how to sort an array of integers:

```c
#include <stddef.h>
#include "qsort.h"
void isort(int A[], size_t n)
{
    int tmp;
#define LESS(i, j) A[i] < A[j]
#define SWAP(i, j) tmp = A[i], A[i] = A[j], A[j] = tmp
    QSORT(n, LESS, SWAP);
}
```
Since access to the actual array is so completely abstracted out,
the macro can be used to sort a few dependent arrays (which,
to the best of my knowledge, no other implementation can do):

```c
#include <stddef.h>
#include "qsort.h"
void sortByAge(size_t n, const char *names[], int ages[])
{
    const char *tmpName;
    int tmpAge;
#define LESS(i, j) ages[i] < ages[j]
#define SWAP(i, j) tmpName = names[i], tmpAge = ages[i], \
                   names[i] = names[j], ages[i] = ages[j], \
                   names[j] = tmpName, ages[j] = tmpAge
    QSORT(n, LESS, SWAP);
}
```
The sort is not [stable](https://en.wikipedia.org/wiki/Sorting_algorithm#Stability)
(this is inherent to most Quicksort variants). To impose order among
the names with the same age, the `LESS` macro can be enhanced like this:

```c
#define LESS(i, j) ages[i] < ages[j] || \
    (ages[i] == ages[j] && strcmp(names[i], names[j]) < 0)
```
This Quicksort implementation was written by Alexey Tourbin.
The source code is provided under the
[MIT License](https://en.wikipedia.org/wiki/MIT_License).

## Performance

A [benchmark](bench.cc) is provided which evaluates the performance
of a few implementations: libc's `qsort(3)`, STL's `std::sort` (denoted
resp. `stdlib` and `stl`), Michael Tokarev's
[Inline QSORT() implementation](http://www.corpit.ru/mjt/qsort.html),
and this implementation (denoted resp. `mjt` and `svpv`).
Michael Tokarev's implementation is based on an older glibc version
of Quicksort.
Modern glibc versions, including the one used below,
use [merge sort](https://en.wikipedia.org/wiki/Merge_sort).

A word of warning: this benchmark does only a tiny bit of averaging.
For conclusive evidence, the program needs to be run multiple times.

By default, the `bench` program sorts 1M random integers.

```
$ make
g++ -O2 -g -Wall -fwhole-program -o bench bench.cc
./bench
stdlib     402584990    19644762
stl        230878632    25344013
mjt        272302466    24316349
svpv       245908342    23287211
```
The STL implementation turns out to be the fastest (the first column
indicates the number of
[RDTSC cycles](https://en.wikipedia.org/wiki/Time_Stamp_Counter)),
despite the fact that it performs the largest number of comparisons
(the second column).

One reason my implementation comes in second to STL is some of its design
limitations. The `swap` macro issues three moves as a whole, while some
parts of the algorithm, notably
[insertion sort](https://en.wikipedia.org/wiki/Insertion_sort),
can benefit from copying items to the right one position rather than doing
full exchanges. (It wouldn't be enough to factor `swap` into `save`,
`restore`, and `copy`, though. After an item is saved to the temporary
register, it is further required to compare other items to that temporary
register, which `less` can't do.)

Of course, to a considerable degree, performance depends on the compiler
being used. I found that Clang favours my implementation and, at the same
time, handles `std::sort` poorly (using `-O3` doesn't help). Sedgewick was
right when he said we exposed ourselves to the whims of compilers.
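
To make the `swap` limitation described above more concrete, here is a minimal
sketch contrasting the two styles of insertion sort on a plain `int` array.
It is not part of `qsort.h`, and the function names `insertion_sort_swap` and
`insertion_sort_shift` are hypothetical, chosen only for this illustration.
The shift-based variant compares against a saved temporary, which is what an
index-based `less(i, j)` cannot express.

```c
#include <assert.h>
#include <stddef.h>

/* Swap-based insertion sort, in the style of Q_INSERTION_SORT:
 * only index-based comparisons and exchanges are needed, but each
 * step down the array costs three moves. */
static void insertion_sort_swap(int A[], size_t n)
{
    assert(A != NULL || n == 0);
    for (size_t i = 1; i < n; i++)
        for (size_t j = i; j > 0 && A[j] < A[j - 1]; j--) {
            int tmp = A[j];
            A[j] = A[j - 1];
            A[j - 1] = tmp;
        }
}

/* Shift-based insertion sort (illustrative assumption, not the
 * author's code): the item is saved once, larger items are copied
 * one position to the right, and the saved item is restored at the
 * end. The comparison is against `saved`, not an array index. */
static void insertion_sort_shift(int A[], size_t n)
{
    assert(A != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        int saved = A[i];
        size_t j = i;
        while (j > 0 && saved < A[j - 1]) {
            A[j] = A[j - 1];
            j--;
        }
        A[j] = saved;
    }
}
```

Both functions leave the array in the same sorted order; the second one performs
one move per displaced item instead of three.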

Here is the same benchmark built with Clang:

```
$ rm -f bench && make CXX=clang
clang -O2 -g -Wall -fwhole-program -o bench bench.cc
clang: warning: optimization flag '-fwhole-program' is not supported
./bench
stdlib     414620784    19644762
stl        321896126    25344013
mjt        270644434    24316349
svpv       233669286    23287211
```

Relevant to performance is another characteristic of my implementation:
it does not assume that comparisons are cheap, as with integers, and
deliberately tries to reduce the number of comparisons when it is easily
possible (specifically, during insertion sort, it does not trade boundary
checks for extra comparisons). This pays off when comparisons are
expensive, such as when comparing string keys with `strcmp(3)`.
In the following example, I use filenames and dependencies from the RPM
database as the set of strings to be sorted, shuffling them with `shuf(1)`.

```
$ rpm -qa --list --requires --provides | shuf >lines
$ wc -l
```

In the above examples, GCC 6.3.1 and Clang 3.8.0 were used
on a Haswell CPU.
--------------------------------------------------------------------------------
/bench.cc:
--------------------------------------------------------------------------------
/*
 * Copyright (c) 2013, 2017 Alexey Tourbin
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 */
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

/* Number of comparisons that a sort function makes. */
size_t ncmp;
#if 1
#define NCMPINC ncmp++
#else
#define NCMPINC ((void)0)
#endif

/*
 * Numeric comparison
 */
static int icmp(const void *i1, const void *i2)
{
    NCMPINC;
    /* Plain subtraction cannot overflow here: rand() yields non-negative ints. */
    return *(int *) i1 - *(int *) i2;
}
void stdlib_isort(int A[], size_t n)
{
    qsort(A, n, sizeof(int), icmp);
}

#ifdef __cplusplus
#include <algorithm>
static inline bool iless(int a, int b)
{
    NCMPINC;
    return a < b;
}
void stl_isort(int A[], size_t n)
{
    std::sort(A, A + n, iless);
}
#endif

#include "mjt.h"
void mjt_isort(int A[], size_t n)
{
    /* mjt's QSORT passes pointers to elements, not indices. */
#define MJT_ILESS(a, b) (NCMPINC, *(a) < *(b))
    QSORT(int, A, n, MJT_ILESS);
#undef QSORT
}

#include "qsort.h"
void svpv_isort(int A[], size_t n)
{
    int tmp;
#define ILESS(i, j) (NCMPINC, A[i] < A[j])
#define SWAP(i, j) tmp = A[i], A[i] = A[j], A[j] = tmp
    QSORT(n, ILESS, SWAP);
#undef QSORT
}

/*
 * String comparison
 */
#include <string.h>
static int pstrcmp(const void *a, const void *b)
{
    NCMPINC;
    return strcmp(*(const char **)a, *(const char **)b);
}
void stdlib_strsort(const char *A[], size_t n)
{
    qsort(A, n, sizeof *A, pstrcmp);
}

#ifdef __cplusplus
static inline bool strless(const char *a, const char *b)
{
    NCMPINC;
    return strcmp(a, b) < 0;
}
void stl_strsort(const char *A[], size_t n)
{
    std::sort(A, A + n, strless);
}
#endif

#include "mjt.h"
void mjt_strsort(const char *A[], size_t n)
{
#define MJT_STRLESS(a, b) (NCMPINC, strcmp(*(a), *(b)) < 0)
    QSORT(const char *, A, n, MJT_STRLESS);
#undef QSORT
}

/* Re-include qsort.h to get QSORT back after the #undef above. */
#undef QSORT_H
#include "qsort.h"
void svpv_strsort(const char *A[], size_t n)
{
    /* SWAP, defined in svpv_isort, refers to the local names A and tmp. */
    const char *tmp;
#define STRLESS(i, j) (NCMPINC, strcmp(A[i], A[j]) < 0)
    QSORT(n, STRLESS, SWAP);
}

/*
 * Benchmarking
 */
#include <stdint.h>
#include <inttypes.h>

#define N (1 << 20)
static int orig[N];
static int copy[N];

#include <unistd.h>
#include <x86intrin.h>

uint64_t bench_int(size_t n, void (*sort)(int A[], size_t n))
{
    // Make 4 runs, throw away min and max, average the other two.
    uint64_t min = UINT64_MAX, max = 0, sum = 0;
    for (int i = 0; i < 4; i++) {
        usleep(1);
        // Warm-up run, not timed.
        memcpy(copy, orig, sizeof orig);
        sort(copy, n);
        memcpy(copy, orig, sizeof orig);
        ncmp = 0;
        // Don't reorder instructions.
        asm volatile ("" ::: "memory");
        uint64_t t = __rdtsc();
        asm volatile ("" ::: "memory");
        sort(copy, n);
        asm volatile ("" ::: "memory");
        t = __rdtsc() - t;
        asm volatile ("" ::: "memory");
        sum += t;
        // Track min and max independently, so that a first run
        // which happens to be the slowest is still counted as max.
        if (t < min)
            min = t;
        if (t > max)
            max = t;
    }
    // See if it can actually sort.
    for (size_t i = 1; i < n; i++)
        assert(copy[i-1] <= copy[i]);
    sum -= min + max;
    return sum / 2;
}

#define N_STR (1 << 20)
static const char *orig_str[N_STR];
static const char *copy_str[N_STR];

uint64_t bench_str(size_t n, void (*strsort)(const char *A[], size_t n))
{
    // Same scheme as bench_int: 4 runs, drop min and max, average the rest.
    uint64_t min = UINT64_MAX, max = 0, sum = 0;
    for (int i = 0; i < 4; i++) {
        usleep(1);
        // Warm-up run, not timed.
        memcpy(copy_str, orig_str, sizeof orig_str);
        strsort(copy_str, n);
        memcpy(copy_str, orig_str, sizeof orig_str);
        ncmp = 0;
        asm volatile ("" ::: "memory");
        uint64_t t = __rdtsc();
        asm volatile ("" ::: "memory");
        strsort(copy_str, n);
        asm volatile ("" ::: "memory");
        t = __rdtsc() - t;
        asm volatile ("" ::: "memory");
        sum += t;
        // Track min and max independently, so that a first run
        // which happens to be the slowest is still counted as max.
        if (t < min)
            min = t;
        if (t > max)
            max = t;
    }
    for (size_t i = 1; i < n; i++)
        assert(strcmp(copy_str[i-1], copy_str[i]) <= 0);
    sum -= min + max;
    return sum / 2;
}

#include <getopt.h>
static int opt_srand;
static int opt_strcmp;
static struct option longopts[] = {
    { "srand", no_argument, &opt_srand, 1 },
    { "strcmp", no_argument, &opt_strcmp, 1 },
    { NULL },
};

int main(int argc, char **argv)
{
    const char *argv0 = argv[0];
    int usage = 0;
    int c;
    while ((c = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
        switch (c) {
        case 0:
            break;
        default:
            usage = 1;
        }
    }
    argc -= optind, argv += optind;
    if (argc && !usage) {
        fprintf(stderr, "%s: too many arguments\n", argv0);
        usage = 1;
    }
    if (usage) {
        fprintf(stderr, "Usage: %s [options]\n", argv0);
        return 1;
    }
    if (opt_srand)
        srand(getpid());
    if (opt_strcmp) {
        if (isatty(0))
            fprintf(stderr, "reading input from stdin\n");
        size_t n = 0;
        // Read up to N_STR lines; each line is allocated by getline(3).
        while (1) {
            char *line = NULL;
            size_t alloc = 0;
            ssize_t len = getline(&line, &alloc, stdin);
            if (len < 0)
                break;
            orig_str[n++] = line;
            if (n == N_STR)
                break;
        }
        printf("stdlib\t%12" PRIu64 "\t", bench_str(n, stdlib_strsort));
        printf("%zu\n", ncmp);
#ifdef __cplusplus
        printf("stl\t%12" PRIu64 "\t", bench_str(n, stl_strsort));
        printf("%zu\n", ncmp);
#endif
        printf("mjt\t%12" PRIu64 "\t", bench_str(n, mjt_strsort));
        printf("%zu\n", ncmp);
        printf("svpv\t%12" PRIu64 "\t", bench_str(n, svpv_strsort));
        printf("%zu\n", ncmp);
    }
    else {
        size_t n = N;
        for (size_t i = 0; i < N; i++)
            orig[i] = rand();
        printf("stdlib\t%12" PRIu64 "\t", bench_int(n, stdlib_isort));
        printf("%zu\n", ncmp);
#ifdef __cplusplus
        printf("stl\t%12" PRIu64 "\t", bench_int(n, stl_isort));
        printf("%zu\n", ncmp);
#endif
        printf("mjt\t%12" PRIu64 "\t", bench_int(n, mjt_isort));
        printf("%zu\n", ncmp);
        printf("svpv\t%12" PRIu64 "\t", bench_int(n, svpv_isort));
        printf("%zu\n", ncmp);
    }
    return 0;
}

// ex:set ts=8 sts=4 sw=4 noet:
--------------------------------------------------------------------------------
/qsort.h:
--------------------------------------------------------------------------------
/*
 * Copyright (c) 2013, 2017 Alexey Tourbin
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 */

/*
 * This is a traditional Quicksort implementation which mostly follows
 * [Sedgewick 1978]. Sorting is performed entirely on array indices,
 * while actual access to the array elements is abstracted out with the
 * user-defined `LESS` and `SWAP` primitives.
 *
 * Synopsis:
 *	QSORT(N, LESS, SWAP);
 * where
 *	N - the number of elements in A[];
 *	LESS(i, j) - compares A[i] to A[j];
 *	SWAP(i, j) - exchanges A[i] with A[j].
 */

#ifndef QSORT_H
#define QSORT_H

/* Sort 3 elements. */
#define Q_SORT3(q_a1, q_a2, q_a3, Q_LESS, Q_SWAP) \
do { \
    if (Q_LESS(q_a2, q_a1)) { \
        if (Q_LESS(q_a3, q_a2)) \
            Q_SWAP(q_a1, q_a3); \
        else { \
            Q_SWAP(q_a1, q_a2); \
            if (Q_LESS(q_a3, q_a2)) \
                Q_SWAP(q_a2, q_a3); \
        } \
    } \
    else if (Q_LESS(q_a3, q_a2)) { \
        Q_SWAP(q_a2, q_a3); \
        if (Q_LESS(q_a2, q_a1)) \
            Q_SWAP(q_a1, q_a2); \
    } \
} while (0)

/* Partition [q_l,q_r] around a pivot. After partitioning,
 * [q_l,q_j] are the elements that are less than or equal to the pivot,
 * while [q_i,q_r] are the elements greater than or equal to the pivot. */
#define Q_PARTITION(q_l, q_r, q_i, q_j, Q_UINT, Q_LESS, Q_SWAP) \
do { \
    /* The middle element, not to be confused with the median. */ \
    Q_UINT q_m = q_l + ((q_r - q_l) >> 1); \
    /* Reorder the second, the middle, and the last items. \
     * As [Edelkamp Weiss 2016] explain, using the second element \
     * instead of the first one helps avoid bad behaviour for \
     * decreasingly sorted arrays. This method is used in recent \
     * versions of gcc's std::sort, see gcc bug 58437#c13, although \
     * the details are somewhat different (cf. #c14). */ \
    Q_SORT3(q_l + 1, q_m, q_r, Q_LESS, Q_SWAP); \
    /* Place the median at the beginning. */ \
    Q_SWAP(q_l, q_m); \
    /* Partition [q_l+2, q_r-1] around the median which is in q_l. \
     * q_i and q_j are initially off by one; q_i gets incremented \
     * and q_j gets decremented in the do-while loops. */ \
    q_i = q_l + 1; q_j = q_r; \
    while (1) { \
        do q_i++; while (Q_LESS(q_i, q_l)); \
        do q_j--; while (Q_LESS(q_l, q_j)); \
        if (q_i >= q_j) break; /* Sedgewick says "until j < i" */ \
        Q_SWAP(q_i, q_j); \
    } \
    /* Compensate for the i==j case. */ \
    q_i = q_j + 1; \
    /* Put the median in its final place. */ \
    Q_SWAP(q_l, q_j); \
    /* The median is not part of the left subfile. */ \
    q_j--; \
} while (0)

/* Insertion sort is applied to small subfiles - this is contrary to
 * Sedgewick's suggestion to run a separate insertion sort pass after
 * the partitioning is done. The reason I don't like a separate pass
 * is that it triggers extra comparisons, because it can't see that the
 * medians are already in their final positions and need not be rechecked.
 * Since I do not assume that comparisons are cheap, I also do not try
 * to eliminate the (q_j > q_l) boundary check. */
#define Q_INSERTION_SORT(q_l, q_r, Q_UINT, Q_LESS, Q_SWAP) \
do { \
    Q_UINT q_i, q_j; \
    /* For each item starting with the second... */ \
    for (q_i = q_l + 1; q_i <= q_r; q_i++) \
    /* move it down the array so that the first part is sorted. */ \
    for (q_j = q_i; q_j > q_l && (Q_LESS(q_j, q_j - 1)); q_j--) \
        Q_SWAP(q_j, q_j - 1); \
} while (0)

/* When the size of [q_l,q_r], i.e. q_r-q_l+1, is greater than or equal to
 * Q_THRESH, the algorithm performs recursive partitioning. When the size
 * drops below Q_THRESH, the algorithm switches to insertion sort.
 * The minimum valid value is probably 5 (with 5 items, the second and
 * the middle items, the middle itself being rounded down, are distinct). */
#define Q_THRESH 16

/* The main loop. */
#define Q_LOOP(Q_UINT, Q_N, Q_LESS, Q_SWAP) \
do { \
    Q_UINT q_l = 0; \
    Q_UINT q_r = (Q_N) - 1; \
    Q_UINT q_sp = 0; /* the number of frames pushed to the stack */ \
    struct { Q_UINT q_l, q_r; } \
    /* On 32-bit platforms, to sort a "char[3GB+]" array, \
     * it may take a full 32 stack frames. On 64-bit CPUs, \
     * though, the address space is limited to 48 bits. \
     * The usage is further reduced if Q_N has a 32-bit type. */ \
    q_st[sizeof(Q_UINT) > 4 && sizeof(Q_N) > 4 ? 48 : 32]; \
    while (1) { \
        if (q_r - q_l + 1 >= Q_THRESH) { \
            Q_UINT q_i, q_j; \
            Q_PARTITION(q_l, q_r, q_i, q_j, Q_UINT, Q_LESS, Q_SWAP); \
            /* Now have two subfiles: [q_l,q_j] and [q_i,q_r]. \
             * Dealing with them depends on which one is bigger. */ \
            if (q_j - q_l >= q_r - q_i) \
                Q_SUBFILES(q_l, q_j, q_i, q_r); \
            else \
                Q_SUBFILES(q_i, q_r, q_l, q_j); \
        } \
        else { \
            Q_INSERTION_SORT(q_l, q_r, Q_UINT, Q_LESS, Q_SWAP); \
            /* Pop subfiles from the stack, until it gets empty. */ \
            if (q_sp == 0) break; \
            q_sp--; \
            q_l = q_st[q_sp].q_l; \
            q_r = q_st[q_sp].q_r; \
        } \
    } \
} while (0)

/* The missing part: dealing with subfiles.
 * Assumes that the first subfile is not smaller than the second. */
#define Q_SUBFILES(q_l1, q_r1, q_l2, q_r2) \
do { \
    /* If the second subfile is only a single element, it needs \
     * no further processing. The first subfile will be processed \
     * on the next iteration (both subfiles cannot be only a single \
     * element, due to Q_THRESH). */ \
    if (q_l2 == q_r2) { \
        q_l = q_l1; \
        q_r = q_r1; \
    } \
    else { \
        /* Otherwise, both subfiles need processing. \
         * Push the larger subfile onto the stack. */ \
        q_st[q_sp].q_l = q_l1; \
        q_st[q_sp].q_r = q_r1; \
        q_sp++; \
        /* Process the smaller subfile on the next iteration. */ \
        q_l = q_l2; \
        q_r = q_r2; \
    } \
} while (0)

/* And now, ladies and gentlemen, may I proudly present to you... */
#define QSORT(Q_N, Q_LESS, Q_SWAP) \
do { \
    if ((Q_N) > 1) \
        /* We could check sizeof(Q_N) and use "unsigned", but at least \
         * on x86_64, this has the performance penalty of up to 5%. */ \
        Q_LOOP(unsigned long, Q_N, Q_LESS, Q_SWAP); \
} while (0)

#endif

/* ex:set ts=8 sts=4 sw=4 noet: */
--------------------------------------------------------------------------------