├── README.md
├── shelfsort.cpp
└── times.md


/README.md:
--------------------------------------------------------------------------------
## Shelfsort: a new sorting algorithm

#### What basic properties does the Shelfsort algorithm have?
- O(N * log(N)) time
- O(sqrt(N)) memory (with constant pointer size)
- adaptive (fast for mostly pre-sorted inputs)
- stable (order of equal elements is unchanged)

#### What's novel about the Shelfsort algorithm?
The core idea is:
1. Divide the data into blocks of sqrt(N) size, and sort each block.
2. Use sqrt(N) indices to track which block is where.
3. Clear some blocks by merging data into scratch memory.
4. Merge blocks of data from where they are into whichever blocks are empty.

Here's an example of merging blocks:

1. The data is divided into blocks.
``aaaabbbb``

2. Merging of {a, b} into {C} starts. The blocks of {a, b} are merged into a buffer, starting with the first ones. When blocks are used completely, they're cleared in the main array. Here, {b} is first in the sort, so blocks of {b} are cleared first.
``aaaa__bb``

3. The merged run data {C} is written into the space left by cleared {b} blocks. The location indices are stored in a locations array. Here, locations[i] is the index where the ith output data block can be found.
``aaaaCC_b``
``45______``

4. Here, {a, b} are fully merged into {C} and the locations are stored. If another merge is done, blocks are used from their current positions.
``CCCCCCCC``
``45670123``

5. Once all data has been merged, the blocks are sorted according to their location indices.

#### How fast is Shelfsort?
For large arrays, compared to std::stable_sort with unlimited memory, this implementation takes:
- ~1.1x the time for random inputs
- ~0.1x the time for sorted inputs

[Here are some times](times.md) I got from running the included code.

#### Why name it "Shelfsort"?
The block locations are fixed while elements move between them, like books being moved between shelves. Also, "block sort" was taken.

#### Why a new sorting algorithm?
Most sorting algorithms are either unstable, slow, or require O(N) memory. There are merge sorts using binary search to partition blocks to reduce memory consumption, such as [blitsort](https://github.com/scandum/blitsort/), but this is a different approach that doesn't use binary search.

#### No, I meant, why did you work on this?
- It's an intellectual challenge that's "pure" in a way that I like.
- It's a way to test yourself against some of the best minds of history.
- If you can develop something situationally useful, you can leave some small mark on the world.
- I'm currently unemployed, and the weather's been bad, so I had some free time.
- It will probably get HR people to not throw out my resume. <- this is sarcasm

#### How optimized is this implementation?
It's written in low-level C++ and does some branchless operations, but apart from that, it's not very optimized. Things I *haven't* done include:
- trying initial sorts for small segments that handle >4 elements
- looking at the assembly code
- manual vectorization

Micro-optimization isn't my specialty or my interest, but feel free to try doing some.

#### What limitations does this implementation have?
- To simplify implementation and discourage people from using this code in production, it currently only works on power-of-2 array sizes.
- It hasn't been tested thoroughly, and could have bugs on some inputs.
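
The power-of-2 restriction and the O(sqrt(N)) memory figure can both be checked before sorting. Here's a minimal sketch of that, not part of the repository: `IsPowerOfTwo`, `FloorLog2`, and `ScratchElements` are hypothetical helper names, and `ScratchElements` just mirrors the `scratch_size` expression used inside `ShelfSort` in shelfsort.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not in shelfsort.cpp): true when n is a power of two.
// A power of two has exactly one bit set, so n & (n - 1) clears it to zero.
constexpr bool IsPowerOfTwo(uint32_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// floor(log2(size)), computed the same way as the "determine memory size"
// step at the top of ShelfSort.
inline uint32_t FloorLog2(uint32_t size) {
    assert(size > 0);
    uint32_t log_size = 0;
    while (size >>= 1) { log_size++; }
    return log_size;
}

// Number of scratch elements, mirroring ShelfSort's expression
// 1 << (2 + (log_size + 1) / 2). For an even log_size this is 4 * sqrt(N).
inline uint32_t ScratchElements(uint32_t size) {
    return 1u << (2 + (FloorLog2(size) + 1) / 2);
}
```

A caller could test `IsPowerOfTwo(size)` before calling `ShelfSort(arr, size)`. For 2^20 elements, `ScratchElements` gives 4096, which is 4 * sqrt(N). This only covers the power-of-2 requirement; it says nothing about any minimum size the implementation may need.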

#### Have you made other open-source stuff?
I also programmed:
- a [fork of a roguelike game](https://github.com/b-crawl/bcrawl/)
- a [proof of concept](https://github.com/bhauth/multilevel-ternary-hash-table) for a new hashtable algorithm

#### How can I contact you?
b αt bhauth dot com


--------------------------------------------------------------------------------
/shelfsort.cpp:
--------------------------------------------------------------------------------
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <ctime>
#include <iostream>

#define COPY(dest,src,elements) for (int copy_index = elements - 1; copy_index >= 0; copy_index--) { dest[copy_index] = src[copy_index]; }

#define ELEMENT int64_t
#define LESSEQ(a,b) a <= b
#define LESS(a,b) a < b
#define SMALLSORT_SIZE 4

// sorts exactly SMALLSORT_SIZE (4) elements in place
inline void SmallSort(ELEMENT* arr) {
    auto a = arr[0];
    auto b = arr[1];
    auto c = arr[2];
    auto d = arr[3];

    bool less1 = LESS(b,a);
    auto a2 = less1 ? b : a;
    auto b2 = less1 ? a : b;
    bool less2 = LESS(d,c);
    auto c2 = less2 ? d : c;
    auto d2 = less2 ? c : d;

    if (LESSEQ(b2, c2)) {
        arr[0] = a2;
        arr[1] = b2;
        arr[2] = c2;
        arr[3] = d2;
        return;
    }

    bool x = LESSEQ(a2, c2);
    auto b3 = x ? c2 : a2;
    bool y = LESSEQ(b2, d2);
    auto c3 = y ? b2 : d2;
    bool z = LESSEQ(a2, d2);
    arr[0] = x ? a2 : c2;
    arr[1] = z ? b3 : c3;
    arr[2] = z ? c3 : b3;
    arr[3] = y ? d2 : b2;
}

// merges two sorted runs of equal length into output, filling from the back
void MergePair(ELEMENT* p1, ELEMENT* p2, ELEMENT* output, int n) { // n = size-1
    int i_1 = n;
    int i_2 = n;

    for (int i = n * 2 + 1; i_1 >= 0 && i_2 >= 0; i--) {
        auto x = p1[i_1];
        auto y = p2[i_2];
        bool higher = LESS(y, x);
        bool lower = !higher;
        i_1 -= higher;
        i_2 -= lower;
        output[i] = higher ?
            x : y;
    }

    if (i_1 >= 0) { COPY(output, p1, (i_1 + 1)); }
    else { COPY(output, p2, (i_2 + 1)); }
}


void BlockMerge(ELEMENT* start_p, ELEMENT* scratch, uint32_t* indices, uint32_t* indices_out, int block_count_1, int block_count_2, int block_size) {
    int index_index_1 = block_count_1 - 1;
    int index_index_2 = block_count_2 - 1;
    int block_id_1 = indices[index_index_1];
    int block_id_2 = indices[block_count_1 + index_index_2];
    ELEMENT* p1 = &start_p[block_id_1 * block_size];
    ELEMENT* p2 = &start_p[(block_count_1 + block_id_2) * block_size];

    // merge first blocks into scratch
    ELEMENT* output = scratch;
    int output_block_counter = block_count_1 + block_count_2 - 2;
    int clear_block_id = 0;
    int next_clear_block_id = 0;

    int i = block_size * 2 - 1; // scratch is 2 blocks in size
    int i_1 = block_size - 1;
    int i_2 = i_1;

    // adaptive sort: when in order, just adjust indices
    // p1 already points at the logically last block of run 1,
    // so its last element is the largest element of that run
    ELEMENT last_of_first = p1[block_size - 1];
    ELEMENT first_of_last = start_p[(block_count_1 + indices[block_count_1]) * block_size];
    if (LESSEQ(last_of_first, first_of_last)) {
        for (int i = 0; i < block_count_1; i++) {
            indices_out[i] = indices[i];
        }
        for (int i = block_count_1; i < block_count_1 + block_count_2; i++) {
            indices_out[i] = indices[i] + block_count_1;
        }
        return;
    }

    block_merging:
    while (i_1 >= 0 && i_2 >= 0 && i >= 0) {
        auto x = p1[i_1];
        auto y = p2[i_2];
        bool higher = LESS(y, x);
        bool lower = !higher;
        output[i] = higher ?
            x : y;
        i_1 -= higher;
        i_2 -= lower;
        i--;
    }

    if (i < 0) {
        output = &start_p[next_clear_block_id * block_size];
        output_block_counter--;
        indices_out[output_block_counter] = next_clear_block_id;
        i = block_size - 1;
    }
    if (i_1 < 0) {
        next_clear_block_id = block_id_1;
        index_index_1--;
        if (index_index_1 < 0) {
            goto finish_left;
        }
        block_id_1 = indices[index_index_1];
        p1 = &start_p[block_id_1 * block_size];
        i_1 = block_size - 1;
    }
    if (i_2 < 0) {
        next_clear_block_id = block_count_1 + block_id_2;
        index_index_2--;
        if (index_index_2 < 0) {
            goto finish_right;
        }
        block_id_2 = indices[block_count_1 + index_index_2];
        p2 = &start_p[(block_count_1 + block_id_2) * block_size];
        i_2 = block_size - 1;
    }

    goto block_merging;


    finish_left:
    while (i_2 >= 0 && i >= 0) {
        output[i] = p2[i_2];
        i_2--; i--;
    }

    if (i < 0) {
        clear_block_id = next_clear_block_id;
        output = &start_p[next_clear_block_id * block_size];
        if (i_2 >= 0) {
            indices_out[output_block_counter] = next_clear_block_id;
        }
        output_block_counter--;
        i = block_size - 1;
    }
    if (i_2 < 0) {
        next_clear_block_id = block_count_1 + block_id_2;
        index_index_2--;
        if (index_index_2 < 0) {
            goto unload_scratch;
        }
        block_id_2 = indices[block_count_1 + index_index_2];
        p2 = &start_p[(block_count_1 + block_id_2) * block_size];
        i_2 = block_size - 1;
    }

    goto finish_left;


    finish_right:
    while (i_1 >= 0 && i >= 0) {
        output[i] = p1[i_1];
        i_1--; i--;
    }

    if (i < 0) {
        clear_block_id = next_clear_block_id;
        output = &start_p[next_clear_block_id * block_size];
        if (i_1 >= 0) {
            indices_out[output_block_counter] = next_clear_block_id;
        }
        output_block_counter--;
        i = block_size - 1;
    }
    if (i_1 < 0) {
        next_clear_block_id = block_id_1;
        index_index_1--;
        if (index_index_1 < 0) {
            goto unload_scratch;
        }
        block_id_1 = indices[index_index_1];
        p1 = &start_p[block_id_1 * block_size];
        i_1 = block_size - 1;
    }

    goto finish_right;


    unload_scratch: {
        int block_bytes = block_size * sizeof(ELEMENT);
        memcpy(output, scratch, block_bytes);
        memcpy(&start_p[next_clear_block_id * block_size], &scratch[block_size], block_bytes);
        int last = block_count_1 + block_count_2 - 1;
        indices_out[last - 1] = clear_block_id;
        indices_out[last] = next_clear_block_id;
    }
}


void FinalBlockSorting(ELEMENT* p1, ELEMENT* scratch, uint32_t* indices, int blocks, int block_size) {
    for (int b = 0; b < blocks; b++) {
        int ix = indices[b];
        if (ix != b) {
            int block_bytes = block_size * sizeof(ELEMENT);
            memcpy(scratch, &p1[b * block_size], block_bytes);
            int empty_block = b;

            while (ix != b) {
                memcpy(&p1[empty_block * block_size], &p1[ix * block_size], block_bytes);
                indices[empty_block] = empty_block;
                empty_block = ix;
                ix = indices[ix];
            }
            memcpy(&p1[empty_block * block_size], scratch, block_bytes);
            indices[empty_block] = empty_block;
        }
    }
}


void ShelfSort(ELEMENT* arr, unsigned int size) {
    // determine memory size
    unsigned int v = size;
    unsigned int log_size = 0;
    while (v >>= 1) { log_size++; }
    unsigned int scratch_size = 1 << (2 + (log_size + 1) / 2);

    // allocate memory
    char* allocated_memory = reinterpret_cast<char*>(malloc(scratch_size *
        (sizeof(ELEMENT)
        + std::max(sizeof(ELEMENT), 2 * sizeof(uint32_t)))));
    ELEMENT* scratch = reinterpret_cast<ELEMENT*>(allocated_memory);
    uint32_t* indices_a =
        reinterpret_cast<uint32_t*>(&allocated_memory[scratch_size * (sizeof(ELEMENT))]);
    uint32_t* indices_b = &indices_a[scratch_size];

    // test other small sorts?
    for (int i = 0; i < size; i += SMALLSORT_SIZE) {
        SmallSort(&arr[i]);
    }
    unsigned int sorted_zone_size = SMALLSORT_SIZE;

    // pingpong quad merge
    while (sorted_zone_size <= (scratch_size / 2)) {
        unsigned int run_len = sorted_zone_size;
        sorted_zone_size *= 2;
        for (int i = 0; i < size; i += sorted_zone_size * 2) {
            ELEMENT* p1 = &arr[i];
            ELEMENT* p2 = &arr[i + run_len];
            bool less1 = LESSEQ(p1[run_len-1], p2[0]);
            if (!less1) { // skip if already sorted
                MergePair(p1, p2, scratch, run_len-1);
            }

            ELEMENT* p3 = &p2[run_len];
            ELEMENT* p4 = &p3[run_len];
            ELEMENT* scratch2 = &scratch[sorted_zone_size];
            bool less2 = LESSEQ(p3[run_len-1], p4[0]);
            if (!less2) {
                MergePair(p3, p4, scratch2, run_len-1);
            }

            if (less1 || less2) {
                if (!less1) {COPY(scratch2, p3, sorted_zone_size);}
                else if (!less2) {COPY(scratch, p1, sorted_zone_size);}
                else {
                    bool less3 = LESSEQ(p1[sorted_zone_size-1], p3[0]);
                    if (less3) { continue; } else {
                        COPY(scratch, p1, sorted_zone_size * 2);
                    }
                }
            }

            MergePair(scratch, scratch2, p1, sorted_zone_size-1);
        }
        sorted_zone_size *= 2;
    }

    // initialize block indices
    unsigned int block_size = scratch_size / 2;
    int total_blocks = size / block_size;
    int blocks_per_run = sorted_zone_size / block_size;
    for (int i = 0; i < total_blocks; i += blocks_per_run)
        for (int j = 0; j < blocks_per_run; j += 1)
            indices_a[i + j] = j;

    // do the block sorting runs
    while (sorted_zone_size <= (size / 2)) {
        unsigned int run_len = sorted_zone_size;
        int blocks1 = (sorted_zone_size / block_size);
        int blocks2 = blocks1;
        sorted_zone_size *= 2;

        for (int i = 0; i < total_blocks; i += (blocks1 + blocks2)) {
            BlockMerge(&arr[i * block_size], scratch, &indices_a[i], &indices_b[i], blocks1, blocks2, block_size);
        }

        std::swap(indices_a, indices_b);
    }

    FinalBlockSorting(arr, scratch, indices_a, (sorted_zone_size / block_size), block_size);
    free(allocated_memory);
}

#define OUT(data) std::cout << data

double TimedTest_Shelfsort(ELEMENT* test_data, int test_size, int cycles) {
    double startTime = (double)clock() / CLOCKS_PER_SEC;
    for (int i=0; i (malloc(total_size * (sizeof(ELEMENT))));
    ELEMENT* test_data2 = reinterpret_cast<ELEMENT*>(malloc(total_size * (sizeof(ELEMENT))));

    OUT("beginning test: 2^"); OUT(log_size); OUT(" size blocks, ");
    OUT(cycles); OUT(" cycles\n");

    srand(time(0));
    for (int i = 0; i < test_size; i++) {
        ELEMENT x = rand();
        test_data[i] = x;
        test_data2[i] = x;
    }

    double time1 = TimedTest_Shelfsort(test_data, test_size, cycles);
    double time1_sorted = -1;
    time1_sorted = TimedTest_Shelfsort(test_data, test_size, cycles);

    double time_std = TimedTest_stable_sort(test_data2, test_size, cycles);
    double time_std_sorted = -1;
    time_std_sorted = TimedTest_stable_sort(test_data2, test_size, cycles);

    bool correct_sort = true;
    for (int i=0; i < total_size; i++) {
        if (test_data[i] != test_data2[i]) {
            correct_sort = false;
        }
    }

    OUT("shelfsort time: "); WriteTime(time1, total_size);
    OUT(" / presorted: "); WriteTime(time1_sorted, total_size); OUT("\n");
    OUT("std::stable_sort time: "); WriteTime(time_std, total_size);
    OUT(" / presorted: "); WriteTime(time_std_sorted, total_size); OUT("\n");
    OUT("matching sort = "); OUT(correct_sort); OUT("\n\n");

    free(test_data);
    free(test_data2);
}

int main() {
    OUT("Shelfsort speed test\n");
    OUT("times are in ns/item\n\n");
    int total_size = 1 << 22;
    for (int size = 16; size <= 24; size++)
        { RunTest(size, total_size); }
}


--------------------------------------------------------------------------------
/times.md:
--------------------------------------------------------------------------------
Shelfsort speed test
times are in ns/item

**test: 2^16 size blocks, 64 cycles**
shelfsort time: 5.24 / presorted: 1.90
std::stable_sort time: 12.39 / presorted: 9.29

**test: 2^17 size blocks, 32 cycles**
shelfsort time: 6.67 / presorted: 1.90
std::stable_sort time: 15.97 / presorted: 12.15

**test: 2^18 size blocks, 16 cycles**
shelfsort time: 9.53 / presorted: 1.90
std::stable_sort time: 18.83 / presorted: 12.63

**test: 2^19 size blocks, 8 cycles**
shelfsort time: 15.25 / presorted: 2.14
std::stable_sort time: 24.8 / presorted: 14.6

**test: 2^20 size blocks, 4 cycles**
shelfsort time: 28.37 / presorted: 2.14
std::stable_sort time: 33.85 / presorted: 15.49

**test: 2^21 size blocks, 2 cycles**
shelfsort time: 51.2 / presorted: 2.14
std::stable_sort time: 53.64 / presorted: 19.7

**test: 2^22 size blocks, 1 cycles**
shelfsort time: 100.13 / presorted: 2.14
std::stable_sort time: 88.69 / presorted: 21.21

**test: 2^23 size blocks, 1 cycles**
shelfsort time: 103.35 / presorted: 2.26
std::stable_sort time: 92.2 / presorted: 23.0

**test: 2^24 size blocks, 1 cycles**
shelfsort time: 108.65 / presorted: 2.26
std::stable_sort time: 92.26 / presorted: 23.54

--------------------------------------------------------------------------------