├── Makefile
├── README.md
├── bench.cc
└── qsort.h
/Makefile:
--------------------------------------------------------------------------------
1 | RPM_OPT_FLAGS ?= -O2 -g -Wall
2 | all: bench
3 | bench: bench.cc qsort.h mjt.h
4 | $(CXX) $(RPM_OPT_FLAGS) -fwhole-program -o $@ $<
5 | ./$@
6 | mjt.h:
7 | wget -O $@ http://www.corpit.ru/mjt/qsort/qsort.h
8 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # qsort.h - Quicksort as a C macro
2 |
3 | This is a traditional [Quicksort](https://en.wikipedia.org/wiki/Quicksort)
4 | implementation which for the most part follows
5 | [Robert Sedgewick's 1978 paper](http://penguin.ewu.edu/cscd300/Topic/AdvSorting/Sedgewick.pdf).
6 | It is implemented as a C macro, which means that comparisons can be inlined.
7 | A distinctive feature of this implementation is that it works entirely on array
8 | indices, while actual access to the array elements is abstracted out with
9 | the `less` and `swap` primitives provided by the caller. Here is an example
10 | of how to sort an array of integers:
11 |
12 | ```c
13 | #include "qsort.h"
14 | void isort(int A[], size_t n)
15 | {
16 | int tmp;
17 | #define LESS(i, j) A[i] < A[j]
18 | #define SWAP(i, j) tmp = A[i], A[i] = A[j], A[j] = tmp
19 | QSORT(n, LESS, SWAP);
20 | }
21 | ```
22 | Since access to the actual array is so completely abstracted out,
23 | the macro can be used to sort a few dependent arrays (which,
24 | to the best of my knowledge, no other implementation can do):
25 |
26 | ```c
27 | #include "qsort.h"
28 | void sortByAge(size_t n, const char *names[], int ages[])
29 | {
30 | const char *tmpName;
31 | int tmpAge;
32 | #define LESS(i, j) ages[i] < ages[j]
33 | #define SWAP(i, j) tmpName = names[i], tmpAge = ages[i], \
34 | names[i] = names[j], ages[i] = ages[j], \
35 | names[j] = tmpName, ages[j] = tmpAge
36 | QSORT(n, LESS, SWAP);
37 | }
38 | ```
39 | The sort is not [stable](https://en.wikipedia.org/wiki/Sorting_algorithm#Stability)
40 | (this is inherent to most Quicksort variants). To impose an order among
41 | the names with the same age, the `LESS` macro can be enhanced like this:
42 |
43 | ```c
44 | #define LESS(i, j) ages[i] < ages[j] || \
45 | (ages[i] == ages[j] && strcmp(names[i], names[j]) < 0)
46 | ```
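
For completeness, here is a minimal sketch of how the enhanced `LESS` might be combined with the `SWAP` from the previous example. To keep the snippet compilable without `qsort.h`, it uses a hypothetical stand-in macro, `TOY_SORT`, which is a plain insertion sort that only mimics the `QSORT(n, LESS, SWAP)` calling convention; the function name `sortByAgeThenName` is likewise an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for QSORT(n, LESS, SWAP): a plain insertion sort
 * with the same caller contract, so the sketch compiles without qsort.h. */
#define TOY_SORT(N, T_LESS, T_SWAP) \
    do { \
        size_t t_i, t_j; \
        for (t_i = 1; t_i < (N); t_i++) \
            for (t_j = t_i; t_j > 0 && (T_LESS(t_j, t_j - 1)); t_j--) \
                T_SWAP(t_j, t_j - 1); \
    } while (0)

/* Sort two dependent arrays by age, breaking ties by name. */
static void sortByAgeThenName(size_t n, const char *names[], int ages[])
{
    const char *tmpName;
    int tmpAge;
#define LESS(i, j) ages[i] < ages[j] || \
    (ages[i] == ages[j] && strcmp(names[i], names[j]) < 0)
#define SWAP(i, j) tmpName = names[i], tmpAge = ages[i], \
    names[i] = names[j], ages[i] = ages[j], \
    names[j] = tmpName, ages[j] = tmpAge
    assert(names != NULL && ages != NULL);
    TOY_SORT(n, LESS, SWAP);
#undef LESS
#undef SWAP
}
```

With the real header, replacing `TOY_SORT` with `QSORT` should be the only change needed. Because the enhanced `LESS` defines a total order over (age, name) pairs, the result no longer depends on the sort being stable.
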
47 | This Quicksort implementation was written by Alexey Tourbin.
48 | The source code is provided under the
49 | [MIT License](https://en.wikipedia.org/wiki/MIT_License).
50 |
51 | ## Performance
52 |
53 | A [benchmark](bench.cc) is provided which evaluates the performance
54 | of a few implementations: libc's `qsort(3)` and STL's `std::sort` (denoted
55 | `stdlib` and `stl`, respectively), Michael Tokarev's
56 | [Inline QSORT() implementation](http://www.corpit.ru/mjt/qsort.html),
57 | and this implementation (denoted `mjt` and `svpv`, respectively).
58 | Michael Tokarev's implementation is based on an older glibc's version
59 | of Quicksort. Modern glibc versions, including the one used below,
60 | use [merge sort](https://en.wikipedia.org/wiki/Merge_sort).
61 |
62 | A word of warning: this benchmark does only a tiny bit of averaging.
63 | For conclusive evidence, the program needs to be run multiple times.
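
The "tiny bit of averaging" amounts to making four runs, discarding the fastest and the slowest, and averaging the remaining two. The idea can be sketched in isolation as follows. This is a minimal sketch rather than the benchmark's own code: the helper name `trimmed_mean` and its array-based interface are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: average the samples after discarding one minimum
 * and one maximum.  Requires at least three samples. */
static uint64_t trimmed_mean(const uint64_t *t, size_t n)
{
    assert(n >= 3);
    uint64_t min = t[0], max = t[0], sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += t[i];
        /* Update min and max independently: a single sample,
         * e.g. the very first one, can be a candidate for both. */
        if (t[i] < min)
            min = t[i];
        if (t[i] > max)
            max = t[i];
    }
    return (sum - min - max) / (n - 2);
}
```

For the samples `{10, 5, 6, 7}` the helper drops 5 and 10 and returns 6 (the integer average of 6 and 7).
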
64 |
65 | By default, the `bench` program sorts 1M random integers.
66 |
67 | ```
68 | $ make
69 | g++ -O2 -g -Wall -fwhole-program -o bench bench.cc
70 | ./bench
71 | stdlib 402584990 19644762
72 | stl 230878632 25344013
73 | mjt 272302466 24316349
74 | svpv 245908342 23287211
75 | ```
76 | The STL implementation turns out to be the fastest (the first column
77 | indicates the number of
78 | [RDTSC cycles](https://en.wikipedia.org/wiki/Time_Stamp_Counter)),
79 | despite the fact that it performs the largest number of comparisons
80 | (the second column).
81 |
82 | One reason my implementation comes in second to STL is some of its design
83 | limitations. The `swap` macro issues three moves as a whole, while some
84 | parts of the algorithm, notably
85 | [insertion sort](https://en.wikipedia.org/wiki/Insertion_sort),
86 | can benefit from copying items to the right one position rather than doing
87 | full exchanges. (It wouldn't be enough to factor `swap` into `save`,
88 | `restore`, and `copy`, though. After an item is saved to the temporary
89 | register, it is further required to compare other items to that temporary
90 | register, which `less` can't do.)
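
The difference between exchanging and shifting can be illustrated with two plain insertion sorts over an `int` array. This is only a sketch under stated assumptions: the function names and the `nmoves` counter are hypothetical and are not part of `qsort.h`, and the shift-based variant accesses the array directly, which is exactly what the index-only `less`/`swap` interface cannot express.

```c
#include <assert.h>
#include <stddef.h>

/* Number of element moves (assignments), counted for illustration. */
static size_t nmoves;

/* Exchange-based insertion sort: three moves per step. */
static void isort_swap(int A[], size_t n)
{
    for (size_t i = 1; i < n; i++)
        for (size_t j = i; j > 0 && A[j] < A[j - 1]; j--) {
            int tmp = A[j]; A[j] = A[j - 1]; A[j - 1] = tmp;
            nmoves += 3;
        }
}

/* Shift-based insertion sort: save the item once, shift bigger items
 * one position to the right, then drop the saved item into the gap.
 * Note the comparison against `key`, which lives outside the array. */
static void isort_shift(int A[], size_t n)
{
    for (size_t i = 1; i < n; i++) {
        int key = A[i];
        size_t j = i;
        nmoves++;
        for (; j > 0 && key < A[j - 1]; j--) {
            A[j] = A[j - 1];
            nmoves++;
        }
        A[j] = key;
        nmoves++;
    }
}
```

On the reversed input `{5, 4, 3, 2, 1}`, the exchange-based variant performs 30 moves while the shift-based variant performs 18; both make the same 10 comparisons.
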
91 |
92 | Of course, to a considerable degree, performance depends on the compiler
93 | being used. I found that my implementation is favoured by Clang, which
94 | also disrespects `std::sort` (using `-O3` doesn't help). Sedgewick was
95 | right when he said we exposed ourselves to the whims of compilers.
96 |
97 | ```
98 | $ rm -f bench && make CXX=clang
99 | clang -O2 -g -Wall -fwhole-program -o bench bench.cc
100 | clang: warning: optimization flag '-fwhole-program' is not supported
101 | ./bench
102 | stdlib 414620784 19644762
103 | stl 321896126 25344013
104 | mjt 270644434 24316349
105 | svpv 233669286 23287211
106 | ```
107 |
108 | Relevant to performance is another characteristic of my implementation:
109 | it does not assume that comparisons are cheap, as with integers, and
110 | deliberately tries to reduce the number of comparisons when it is easily
111 | possible (specifically, during insertion sort, it does not trade boundary
112 | checks for extra comparisons). This pays off when comparisons are
113 | expensive, such as when comparing string keys with `strcmp(3)`.
114 | In the following example, I use filenames and dependencies from the RPM
115 | database as the set of strings to be sorted, shuffling them with `shuf(1)`.
116 |
117 | ```
118 | $ rpm -qa --list --requires --provides | shuf >lines
119 | $ wc -l
120 | ```
132 | In the above examples, GCC 6.3.1 and Clang 3.8.0 have been used
133 | on a Haswell CPU.
134 |
--------------------------------------------------------------------------------
/bench.cc:
--------------------------------------------------------------------------------
1 | /*
2 | * Copyright (c) 2013, 2017 Alexey Tourbin
3 | *
4 | * Permission is hereby granted, free of charge, to any person obtaining a copy
5 | * of this software and associated documentation files (the "Software"), to deal
6 | * in the Software without restriction, including without limitation the rights
7 | * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8 | * copies of the Software, and to permit persons to whom the Software is
9 | * furnished to do so, subject to the following conditions:
10 | *
11 | * The above copyright notice and this permission notice shall be included in
12 | * all copies or substantial portions of the Software.
13 | *
14 | * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 | * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 | * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17 | * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 | * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19 | * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20 | * SOFTWARE.
21 | */
22 | #include <stdio.h>
23 | #include <stdlib.h>
24 | #include <assert.h>
25 |
26 | /* Number of comparisons that a sort function makes. */
27 | size_t ncmp;
28 | #if 1
29 | #define NCMPINC ncmp++
30 | #else
31 | #define NCMPINC (void)0
32 | #endif
33 |
34 | /*
35 | * Numeric comparison
36 | */
37 | static int icmp(const void *i1, const void *i2)
38 | {
39 | NCMPINC;
40 | return *(int *) i1 - *(int *) i2;
41 | }
42 | void stdlib_isort(int A[], size_t n)
43 | {
44 | qsort(A, n, sizeof(int), icmp);
45 | }
46 |
47 | #ifdef __cplusplus
48 | #include <algorithm>
49 | static inline bool iless(int a, int b)
50 | {
51 | NCMPINC;
52 | return a < b;
53 | }
54 | void stl_isort(int A[], size_t n)
55 | {
56 | std::sort(A, A + n, iless);
57 | }
58 | #endif
59 |
60 | #include "mjt.h"
61 | void mjt_isort(int A[], size_t n)
62 | {
63 | #define MJT_ILESS(a, b) (NCMPINC, *(a) < *(b))
64 | QSORT(int, A, n, MJT_ILESS);
65 | #undef QSORT
66 | }
67 |
68 | #include "qsort.h"
69 | void svpv_isort(int A[], size_t n)
70 | {
71 | int tmp;
72 | #define ILESS(i, j) (NCMPINC, A[i] < A[j])
73 | #define SWAP(i, j) tmp = A[i], A[i] = A[j], A[j] = tmp
74 | QSORT(n, ILESS, SWAP);
75 | #undef QSORT
76 | }
77 |
78 | /*
79 | * String comparison
80 | */
81 | #include <string.h>
82 | static int pstrcmp(const void *a, const void *b)
83 | {
84 | NCMPINC;
85 | return strcmp(*(const char **)a, *(const char **)b);
86 | }
87 | void stdlib_strsort(const char *A[], size_t n)
88 | {
89 | qsort(A, n, sizeof *A, pstrcmp);
90 | }
91 |
92 | #ifdef __cplusplus
93 | static inline bool strless(const char *a, const char *b)
94 | {
95 | NCMPINC;
96 | return strcmp(a, b) < 0;
97 | }
98 | void stl_strsort(const char *A[], size_t n)
99 | {
100 | std::sort(A, A + n, strless);
101 | }
102 | #endif
103 |
104 | #include "mjt.h"
105 | void mjt_strsort(const char *A[], size_t n)
106 | {
107 | #define MJT_STRLESS(a, b) (NCMPINC, strcmp(*(a), *(b)) < 0)
108 | QSORT(const char *, A, n, MJT_STRLESS);
109 | #undef QSORT
110 | }
111 |
112 | #undef QSORT_H
113 | #include "qsort.h"
114 | void svpv_strsort(const char *A[], size_t n)
115 | {
116 | const char *tmp;
117 | #define STRLESS(i, j) (NCMPINC, strcmp(A[i], A[j]) < 0)
118 | QSORT(n, STRLESS, SWAP);
119 | }
120 |
121 | /*
122 | * Benchmarking
123 | */
124 | #include <stdint.h>
125 | #include <inttypes.h>
126 |
127 | #define N (1 << 20)
128 | static int orig[N];
129 | static int copy[N];
130 |
131 | #include <unistd.h>
132 | #include <x86intrin.h>
133 |
134 | uint64_t bench_int(size_t n, void (*sort)(int A[], size_t n))
135 | {
136 | // Make 4 runs, throw away min and max, average the other two.
137 | uint64_t min = UINT64_MAX, max = 0, sum = 0;
138 | for (int i = 0; i < 4; i++) {
139 | usleep(1);
140 | memcpy(copy, orig, sizeof orig);
141 | sort(copy, n);
142 | memcpy(copy, orig, sizeof orig);
143 | ncmp = 0;
144 | // Don't reorder instructions.
145 | asm volatile ("" ::: "memory");
146 | uint64_t t = __rdtsc();
147 | asm volatile ("" ::: "memory");
148 | sort(copy, n);
149 | asm volatile ("" ::: "memory");
150 | t = __rdtsc() - t;
151 | asm volatile ("" ::: "memory");
152 | sum += t;
153 | if (t < min)
154 | min = t;
155 | if (t > max)
156 | max = t;
157 | }
158 | // See if it can actually sort.
159 | for (size_t i = 1; i < n; i++)
160 | assert(copy[i-1] <= copy[i]);
161 | sum -= min + max;
162 | return sum / 2;
163 | }
164 |
165 | #define N_STR (1 << 20)
166 | static const char *orig_str[N_STR];
167 | static const char *copy_str[N_STR];
168 |
169 | uint64_t bench_str(size_t n, void (*strsort)(const char *A[], size_t n))
170 | {
171 | uint64_t min = UINT64_MAX, max = 0, sum = 0;
172 | for (int i = 0; i < 4; i++) {
173 | usleep(1);
174 | memcpy(copy_str, orig_str, sizeof orig_str);
175 | strsort(copy_str, n);
176 | memcpy(copy_str, orig_str, sizeof orig_str);
177 | ncmp = 0;
178 | asm volatile ("" ::: "memory");
179 | uint64_t t = __rdtsc();
180 | asm volatile ("" ::: "memory");
181 | strsort(copy_str, n);
182 | asm volatile ("" ::: "memory");
183 | t = __rdtsc() - t;
184 | asm volatile ("" ::: "memory");
185 | sum += t;
186 | if (t < min)
187 | min = t;
188 | if (t > max)
189 | max = t;
190 | }
191 | for (size_t i = 1; i < n; i++)
192 | assert(strcmp(copy_str[i-1], copy_str[i]) <= 0);
193 | sum -= min + max;
194 | return sum / 2;
195 | }
196 |
197 | #include <getopt.h>
198 | static int opt_srand;
199 | static int opt_strcmp;
200 | static struct option longopts[] = {
201 | { "srand", no_argument, &opt_srand, 1 },
202 | { "strcmp", no_argument, &opt_strcmp, 1 },
203 | { NULL },
204 | };
205 |
206 | int main(int argc, char **argv)
207 | {
208 | const char *argv0 = argv[0];
209 | int usage = 0;
210 | int c;
211 | while ((c = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
212 | switch (c) {
213 | case 0:
214 | break;
215 | default:
216 | usage = 1;
217 | }
218 | }
219 | argc -= optind, argv += optind;
220 | if (argc && !usage) {
221 | fprintf(stderr, "%s: too many arguments\n", argv0);
222 | usage = 1;
223 | }
224 | if (usage) {
225 | fprintf(stderr, "Usage: %s [options]\n", argv0);
226 | return 1;
227 | }
228 | if (opt_srand)
229 | srand(getpid());
230 | if (opt_strcmp) {
231 | if (isatty(0))
232 | fprintf(stderr, "reading input from stdin\n");
233 | size_t n = 0;
234 | while (1) {
235 | char *line = NULL;
236 | size_t alloc = 0;
237 | ssize_t len = getline(&line, &alloc, stdin);
238 | if (len < 0)
239 | break;
240 | orig_str[n++] = line;
241 | if (n == N_STR)
242 | break;
243 | }
244 | printf("stdlib\t%12" PRIu64 "\t", bench_str(n, stdlib_strsort));
245 | printf("%zu\n", ncmp);
246 | #ifdef __cplusplus
247 | printf("stl\t%12" PRIu64 "\t", bench_str(n, stl_strsort));
248 | printf("%zu\n", ncmp);
249 | #endif
250 | printf("mjt\t%12" PRIu64 "\t", bench_str(n, mjt_strsort));
251 | printf("%zu\n", ncmp);
252 | printf("svpv\t%12" PRIu64 "\t", bench_str(n, svpv_strsort));
253 | printf("%zu\n", ncmp);
254 | }
255 | else {
256 | size_t n = N;
257 | for (size_t i = 0; i < N; i++)
258 | orig[i] = rand();
259 | printf("stdlib\t%12" PRIu64 "\t", bench_int(n, stdlib_isort));
260 | printf("%zu\n", ncmp);
261 | #ifdef __cplusplus
262 | printf("stl\t%12" PRIu64 "\t", bench_int(n, stl_isort));
263 | printf("%zu\n", ncmp);
264 | #endif
265 | printf("mjt\t%12" PRIu64 "\t", bench_int(n, mjt_isort));
266 | printf("%zu\n", ncmp);
267 | printf("svpv\t%12" PRIu64 "\t", bench_int(n, svpv_isort));
268 | printf("%zu\n", ncmp);
269 | }
270 | return 0;
271 | }
272 |
273 | // ex:set ts=8 sts=4 sw=4 noet:
274 |
--------------------------------------------------------------------------------
/qsort.h:
--------------------------------------------------------------------------------
1 | /*
2 | * Copyright (c) 2013, 2017 Alexey Tourbin
3 | *
4 | * Permission is hereby granted, free of charge, to any person obtaining a copy
5 | * of this software and associated documentation files (the "Software"), to deal
6 | * in the Software without restriction, including without limitation the rights
7 | * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8 | * copies of the Software, and to permit persons to whom the Software is
9 | * furnished to do so, subject to the following conditions:
10 | *
11 | * The above copyright notice and this permission notice shall be included in
12 | * all copies or substantial portions of the Software.
13 | *
14 | * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 | * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 | * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17 | * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 | * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19 | * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20 | * SOFTWARE.
21 | */
22 |
23 | /*
24 | * This is a traditional Quicksort implementation which mostly follows
25 | * [Sedgewick 1978]. Sorting is performed entirely on array indices,
26 | * while actual access to the array elements is abstracted out with the
27 | * user-defined `LESS` and `SWAP` primitives.
28 | *
29 | * Synopsis:
30 | * QSORT(N, LESS, SWAP);
31 | * where
32 | * N - the number of elements in A[];
33 | * LESS(i, j) - compares A[i] to A[j];
34 | * SWAP(i, j) - exchanges A[i] with A[j].
35 | */
36 |
37 | #ifndef QSORT_H
38 | #define QSORT_H
39 |
40 | /* Sort 3 elements. */
41 | #define Q_SORT3(q_a1, q_a2, q_a3, Q_LESS, Q_SWAP) \
42 | do { \
43 | if (Q_LESS(q_a2, q_a1)) { \
44 | if (Q_LESS(q_a3, q_a2)) \
45 | Q_SWAP(q_a1, q_a3); \
46 | else { \
47 | Q_SWAP(q_a1, q_a2); \
48 | if (Q_LESS(q_a3, q_a2)) \
49 | Q_SWAP(q_a2, q_a3); \
50 | } \
51 | } \
52 | else if (Q_LESS(q_a3, q_a2)) { \
53 | Q_SWAP(q_a2, q_a3); \
54 | if (Q_LESS(q_a2, q_a1)) \
55 | Q_SWAP(q_a1, q_a2); \
56 | } \
57 | } while (0)
58 |
59 | /* Partition [q_l,q_r] around a pivot. After partitioning,
60 | * [q_l,q_j] are the elements that are less than or equal to the pivot,
61 | * while [q_i,q_r] are the elements greater than or equal to the pivot. */
62 | #define Q_PARTITION(q_l, q_r, q_i, q_j, Q_UINT, Q_LESS, Q_SWAP) \
63 | do { \
64 | /* The middle element, not to be confused with the median. */ \
65 | Q_UINT q_m = q_l + ((q_r - q_l) >> 1); \
66 | /* Reorder the second, the middle, and the last items. \
67 | * As [Edelkamp Weiss 2016] explain, using the second element \
68 | * instead of the first one helps avoid bad behaviour for \
69 | * decreasingly sorted arrays. This method is used in recent \
70 | * versions of gcc's std::sort, see gcc bug 58437#c13, although \
71 | * the details are somewhat different (cf. #c14). */ \
72 | Q_SORT3(q_l + 1, q_m, q_r, Q_LESS, Q_SWAP); \
73 | /* Place the median at the beginning. */ \
74 | Q_SWAP(q_l, q_m); \
75 | /* Partition [q_l+2, q_r-1] around the median which is in q_l. \
76 | * q_i and q_j are initially off by one: q_i gets incremented \
77 | * and q_j decremented in the do-while loops. */ \
78 | q_i = q_l + 1; q_j = q_r; \
79 | while (1) { \
80 | do q_i++; while (Q_LESS(q_i, q_l)); \
81 | do q_j--; while (Q_LESS(q_l, q_j)); \
82 | if (q_i >= q_j) break; /* Sedgewick says "until j < i" */ \
83 | Q_SWAP(q_i, q_j); \
84 | } \
85 | /* Compensate for the i==j case. */ \
86 | q_i = q_j + 1; \
87 | /* Put the median to its final place. */ \
88 | Q_SWAP(q_l, q_j); \
89 | /* The median is not part of the left subfile. */ \
90 | q_j--; \
91 | } while (0)
92 |
93 | /* Insertion sort is applied to small subfiles - this is contrary to
94 | * Sedgewick's suggestion to run a separate insertion sort pass after
95 | * the partitioning is done. The reason I don't like a separate pass
96 | * is that it triggers extra comparisons, because it can't see that the
97 | * medians are already in their final positions and need not be rechecked.
98 | * Since I do not assume that comparisons are cheap, I also do not try
99 | * to eliminate the (q_j > q_l) boundary check. */
100 | #define Q_INSERTION_SORT(q_l, q_r, Q_UINT, Q_LESS, Q_SWAP) \
101 | do { \
102 | Q_UINT q_i, q_j; \
103 | /* For each item starting with the second... */ \
104 | for (q_i = q_l + 1; q_i <= q_r; q_i++) \
105 | /* move it down the array so that the first part is sorted. */ \
106 | for (q_j = q_i; q_j > q_l && (Q_LESS(q_j, q_j - 1)); q_j--) \
107 | Q_SWAP(q_j, q_j - 1); \
108 | } while (0)
109 |
110 | /* When the size of [q_l,q_r], i.e. q_r-q_l+1, is greater than or equal to
111 | * Q_THRESH, the algorithm performs recursive partitioning. When the size
112 | * drops below Q_THRESH, the algorithm switches to insertion sort.
113 | * The minimum valid value is probably 5 (with 5 items, the second and
114 | * the middle items, the middle itself being rounded down, are distinct). */
115 | #define Q_THRESH 16
116 |
117 | /* The main loop. */
118 | #define Q_LOOP(Q_UINT, Q_N, Q_LESS, Q_SWAP) \
119 | do { \
120 | Q_UINT q_l = 0; \
121 | Q_UINT q_r = (Q_N) - 1; \
122 | Q_UINT q_sp = 0; /* the number of frames pushed to the stack */ \
123 | struct { Q_UINT q_l, q_r; } \
124 | /* On 32-bit platforms, to sort a "char[3GB+]" array, \
125 | * it may take full 32 stack frames. On 64-bit CPUs, \
126 | * though, the address space is limited to 48 bits. \
127 | * The usage is further reduced if Q_N has a 32-bit type. */ \
128 | q_st[sizeof(Q_UINT) > 4 && sizeof(Q_N) > 4 ? 48 : 32]; \
129 | while (1) { \
130 | if (q_r - q_l + 1 >= Q_THRESH) { \
131 | Q_UINT q_i, q_j; \
132 | Q_PARTITION(q_l, q_r, q_i, q_j, Q_UINT, Q_LESS, Q_SWAP); \
133 | /* Now have two subfiles: [q_l,q_j] and [q_i,q_r]. \
134 | * Dealing with them depends on which one is bigger. */ \
135 | if (q_j - q_l >= q_r - q_i) \
136 | Q_SUBFILES(q_l, q_j, q_i, q_r); \
137 | else \
138 | Q_SUBFILES(q_i, q_r, q_l, q_j); \
139 | } \
140 | else { \
141 | Q_INSERTION_SORT(q_l, q_r, Q_UINT, Q_LESS, Q_SWAP); \
142 | /* Pop subfiles from the stack, until it gets empty. */ \
143 | if (q_sp == 0) break; \
144 | q_sp--; \
145 | q_l = q_st[q_sp].q_l; \
146 | q_r = q_st[q_sp].q_r; \
147 | } \
148 | } \
149 | } while (0)
150 |
151 | /* The missing part: dealing with subfiles.
152 | * Assumes that the first subfile is not smaller than the second. */
153 | #define Q_SUBFILES(q_l1, q_r1, q_l2, q_r2) \
154 | do { \
155 | /* If the second subfile is only a single element, it needs \
156 | * no further processing. The first subfile will be processed \
157 | * on the next iteration (both subfiles cannot be only a single \
158 | * element, due to Q_THRESH). */ \
159 | if (q_l2 == q_r2) { \
160 | q_l = q_l1; \
161 | q_r = q_r1; \
162 | } \
163 | else { \
164 | /* Otherwise, both subfiles need processing. \
165 | * Push the larger subfile onto the stack. */ \
166 | q_st[q_sp].q_l = q_l1; \
167 | q_st[q_sp].q_r = q_r1; \
168 | q_sp++; \
169 | /* Process the smaller subfile on the next iteration. */ \
170 | q_l = q_l2; \
171 | q_r = q_r2; \
172 | } \
173 | } while (0)
174 |
175 | /* And now, ladies and gentlemen, may I proudly present to you... */
176 | #define QSORT(Q_N, Q_LESS, Q_SWAP) \
177 | do { \
178 | if ((Q_N) > 1) \
179 | /* We could check sizeof(Q_N) and use "unsigned", but at least \
180 | * on x86_64, this has the performance penalty of up to 5%. */ \
181 | Q_LOOP(unsigned long, Q_N, Q_LESS, Q_SWAP); \
182 | } while (0)
183 |
184 | #endif
185 |
186 | /* ex:set ts=8 sts=4 sw=4 noet: */
187 |
--------------------------------------------------------------------------------