├── LICENSE
├── README.md
├── graphs
    ├── exp14.png
    ├── exp20.png
    └── exp24.png
├── logPartition.c
├── logsort.h
└── test.c


/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2022-2024 aphitorite
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | 
  2 | ## Introduction
  3 | 
  4 | The well-known Quicksort is an O(n log n) algorithm that uses O(n) partitioning to sort data.  Such partitioning schemes are easily done in-place (O(1) extra space) but are not stable: they do not preserve the order of equal elements.  Stable Quicksorts can also be done in O(n log n) time, but they use an extra O(n) space for stable partitioning and are no longer in-place.
  5 | 
  6 | Logsort is a novel practical O(n log n) quicksort that is both in-place and stable.  The algorithm stably partitions data in O(n) time using O(log n) space (hence the name), an amount of space that many already consider in-place despite not being the optimal O(1).  Unlike well-known in-place stable sorts which are O(n log² n), such as std::stable_sort, Logsort is asymptotically optimal.
  7 | 
  8 | To see Logsort's practical performance, jump to [Results](https://github.com/aphitorite/Logsort#Results).
  9 | 
 10 | > [!NOTE]
 11 | > **Usage:** define `VAR` element type and `CMP` comparison function.
 12 | 
 13 | ## Visualization
 14 | 
 15 | (Youtube) Logsort visualized on N = 2049 with 9 extra space allocated.
 16 | 
 17 | [![Video](https://img.youtube.com/vi/be9dpGwciUo/0.jpg)](https://www.youtube.com/watch?v=be9dpGwciUo)
 18 | 
 19 | ## Motivation
 20 | 
 21 | O(n log n) in-place stable sorting is hard to achieve for sorting algorithms.  Bubble Sort and Insertion Sort are stable and in-place but suboptimal.  Efficient sorts, such as Quicksort and Heapsort, are in-place and O(n log n) but unstable.  
 22 | 
 23 | One class of sorting algorithms that achieves in-place operation, stability, and O(n log n) time is Block Sort (a.k.a. Block Merge Sort), such as [Wikisort](https://github.com/BonzaiThePenguin/WikiSort) and [Grailsort](https://github.com/Mrrl/GrailSort), which are in-place merge sorts.  However, they are incredibly complicated and hard to implement.  In addition, in-place stable partitioning is a rather obscure problem in sorting.  Katajainen & Pasanen 1992 describe an O(1) space, O(n) time partitioning algorithm, but it is only of theoretical interest and not practical.
 24 | 
 25 | Logsort is a new sorting algorithm that aims to provide a simple and practical O(n log n) in-place stable sort implementation as an alternative to Block Sort.  The algorithm uses a novel O(n) in-place stable partitioning algorithm different from that of Katajainen & Pasanen 1992 and borrows ideas from [Aeos Quicksort](https://www.youtube.com/watch?v=_YTl2VJnQ4s) (stable quicksort with O(sqrt n) size blocking).  By sorting recursively using this partition, we get an O(n log n) sorting algorithm.
 26 | 
 27 | ## Algorithm
 28 | 
 29 | Partitioning is analogous to sorting an array of 0's and 1's, where elements smaller than the pivot are 0 and elements larger are 1. (Munro et al. 1990)  Logsort sorts 0's and 1's stably in O(n) time and O(log n) space via its partition.
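To make the 0/1 view concrete, here is a minimal sketch of the baseline that Logsort improves on: a stable partition that uses a full O(n) buffer. It is not Logsort's method. The function name `stable_partition_buffered`, the `int` element type, and the choice to treat elements equal to the pivot as 1's are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical baseline: stable partition with an O(n) buffer.
 * Elements < pivot count as 0's, elements >= pivot as 1's (an assumption).
 * buf must have room for n ints. Returns the number of 0's. */
size_t stable_partition_buffered(int *a, size_t n, int pivot, int *buf) {
    size_t zeros = 0, ones = 0;
    assert(n == 0 || (a != NULL && buf != NULL));
    for (size_t i = 0; i < n; i++) {
        if (a[i] < pivot) a[zeros++] = a[i];   /* 0's compact to the front in order */
        else              buf[ones++] = a[i];  /* 1's wait in the buffer in order */
    }
    memcpy(a + zeros, buf, ones * sizeof *a);  /* append the 1's after the 0's */
    return zeros;
}
```

Both groups keep their original relative order, which is exactly the property Logsort has to preserve while shrinking the buffer from O(n) to O(log n).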
 30 | 
 31 | In brief, Logsort groups the 0 and 1 elements into blocks of size O(log n) using its available space.  By swapping elements among the 0 and 1 blocks, each block is assigned a unique index with O(log n) bits.  An LL block swap partition is performed on the blocks which preserves the order of one partition but not the other.  Using the assigned indices from earlier, the order of the blocks in the scrambled partition is restored.  Lastly, the leftover 0's not divisible by the block length are shifted into place, and the Logsort partition is completed in O(n) time and O(log n) space.
 32 | 
 33 | The four phases of the partition algorithm in more detail along with proofs:
 34 | 1. [Grouping elements into blocks](https://github.com/aphitorite/Logsort#grouping-phase)
 35 | 2. [Bit encoding the blocks](https://github.com/aphitorite/Logsort#bit-encoding)
 36 | 3. [Swapping the blocks](https://github.com/aphitorite/Logsort#swapping-the-blocks)
 37 | 4. [Sorting the blocks](https://github.com/aphitorite/Logsort#sorting-the-blocks) (+ [cleanup](https://github.com/aphitorite/Logsort#cleaning-up))
 38 | 
 39 | The entire partition is implemented in about 100 lines of C code: [logPartition.c](https://github.com/aphitorite/Logsort/blob/main/logPartition.c)
 40 | 
 41 | ## Grouping phase
 42 | 
 43 | Given an unordered list of 0's and 1's, we group them into blocks of a fixed size where each block contains either only 0's or 1's.  Given two buckets of B extra space, one can easily group blocks of size B (Katajainen & Pasanen 1992):
 44 | 
 45 | ```
 46 | Grouping blocks of size 2:
 47 | 
 48 |         ↓ move to ones bucket
 49 | array: [1, 0, 1, 1, 0, 0, 1, 1]
 50 | zeros: [ ,  ]
 51 |  ones: [ ,  ]
 52 |  
 53 |            ↓ move to zeros bucket
 54 | array: [ , 0, 1, 1, 0, 0, 1, 1]
 55 | zeros: [ ,  ]
 56 |  ones: [1,  ]
 57 |  
 58 |               ↓ move to ones bucket
 59 | array: [ ,  , 1, 1, 0, 0, 1, 1]
 60 | zeros: [0,  ]
 61 |  ones: [1,  ]
 62 |  
 63 | array: [ ,  ,  , 1, 0, 0, 1, 1]      
 64 | zeros: [0,  ]
 65 |  ones: [1, 1] ← ones bucket full: output back into array
 66 | 
 67 |         ↓  ↓ first block created: continue looping
 68 | array: [1, 1,  , 1, 0, 0, 1, 1]
 69 | zeros: [0,  ]
 70 |  ones: [ ,  ] 
 71 |  ```
 72 | 
 73 | It's almost guaranteed we end up with partially filled buckets at the end of this phase.  In that case, we output the 0 elements followed by the 1's at the end of the array.
 74 | 
 75 | ```
 76 |                           ↓ output zeros
 77 | array: [1, 1, 0, 0, 1, 1,  ,  ]
 78 | zeros: [0,  ]
 79 |  ones: [1,  ]
 80 | 
 81 |                              ↓ output ones
 82 | array: [1, 1, 0, 0, 1, 1, 0,  ]
 83 | zeros: [ ,  ]
 84 |  ones: [1,  ]
 85 | 
 86 | 
 87 | array: [1, 1, 0, 0, 1, 1, 0, 1]
 88 | 
 89 |        ┌──── blocks ────┐ ↓ leftover zeros
 90 |        [1, 1][0, 0][1, 1][0][1]
 91 | ```
 92 | 
 93 | At the end of the phase, we are left with some leftover zeros that don't make a complete block.  We handle them in the very last step of the partition.
 94 | 
 95 | Logsort's O(log n) space usage comes from grouping blocks of size O(log n), and this will be important in the encoding phase.  In the actual implementation, a space optimization from Aeos Quicksort is used which only needs one bucket instead of two. (Anonymous0726 2021)  Like Wikisort and Grailsort, Logsort's external buffer size can be configured, provided it is at least Ω(log n).
 96 | 
 97 | > Since each element is moved a constant number of times, once to the bucket and once back to the array, the grouping phase is O(n) regardless of block size but requires O(block size) extra space.
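The two-bucket grouping walked through above can be sketched as follows. This is an illustrative version only: it uses two buckets and plain `int` values of 0 and 1, whereas the actual implementation in logPartition.c uses a single bucket. The name `group_blocks` and its signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: group a 0/1 array into single-valued blocks of length
 * blen using two buckets of blen ints each. Returns the number of complete
 * blocks; the leftover 0's, then the leftover 1's, follow the blocks. */
size_t group_blocks(int *a, size_t n, size_t blen, int *zeros, int *ones) {
    size_t out = 0, nz = 0, no = 0, blocks = 0;
    assert(blen > 0);
    for (size_t i = 0; i < n; i++) {
        if (a[i]) ones[no++]  = a[i];
        else      zeros[nz++] = a[i];
        /* out + nz + no == i + 1, so a full bucket always fits in the part
         * of the array that has already been read. */
        if (nz == blen) {
            memcpy(a + out, zeros, blen * sizeof *a);
            out += blen; nz = 0; blocks++;
        } else if (no == blen) {
            memcpy(a + out, ones, blen * sizeof *a);
            out += blen; no = 0; blocks++;
        }
    }
    memcpy(a + out, zeros, nz * sizeof *a);      /* leftover 0's first */
    memcpy(a + out + nz, ones, no * sizeof *a);  /* then leftover 1's */
    return blocks;
}
```

Running it on the example array `[1, 0, 1, 1, 0, 0, 1, 1]` with a block length of 2 reproduces the layout from the diagrams: three complete blocks followed by one leftover 0 and one leftover 1.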
 98 | 
 99 | ## Bit encoding
100 | 
101 | 0's and 1's can also be concatenated to make binary numbers, so we can encode numbers in blocks by swapping elements between 0 blocks and 1 blocks.  Decoding a number in a block requires a scan of the block which costs O(log n) comparisons.  Since Logsort's blocks are O(log n) in size, there are enough bits to assign a unique index to each block.
102 | 
103 | > There are at most O(n/log n) encodable blocks which require log(n/log n) = O(log n) bits to represent a number range from 0 to O(n/log n).
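As a worked instance of this bound, assume a block length B and base-2 logarithms. The symbols B and w, and the value B = 32, are introduced here purely for illustration and are not taken from the implementation.

```latex
\[
\#\text{blocks} \le \frac{n}{B}, \qquad
w = \left\lceil \log_2 \frac{n}{B} \right\rceil \le \log_2 n, \qquad
B \ge w + 1 \ \text{(index bits plus one reserved bit)}.
\]
\[
n = 2^{24},\ B = 32: \qquad \frac{n}{B} = 2^{19}, \qquad w = 19, \qquad w + 1 = 20 \le 32 .
\]
```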
104 | 
105 | ```
106 | Encode decimal number 13 = 0b1101:
107 | 
108 | [0, 0, 0, 0, 0] [1, 1, 1, 1, 1]
109 |  ↑  ↑     ↑      ↑  ↑     ↑
110 |  1  2     3      1  2     3  ← swap the following
111 |           
112 | ┌─── 13 ───┐    ┌── ~13 ───┐
113 | [1, 1, 0, 1, 0] [0, 0, 1, 0, 1] ← the pair of blocks are now encoded with 13
114 |              ↑               ↑ 
115 |              last bit reserved to determine 0 or 1 block
116 | ```
117 | 
118 | Using this technique, pairs of 0 and 1 blocks are encoded with a unique index during the encoding phase.  We maintain two iterators, one for 0 blocks and one for 1 blocks:
119 | 
120 | ```
121 | Encode blocks:
122 | 
123 |  (0)  (0)
124 | [ 0 ][ 1 ][ 1 ][ 1 ][ 0 ][ 1 ][ 0 ][ 1 ]
125 | └───┘└───┘ encode 0
126 | 
127 |  (0)  (0)  (1)       (1)
128 | [ 0 ][ 1 ][ 1 ][ 1 ][ 0 ][ 1 ][ 0 ][ 1 ]
129 |           └───┘     └───┘ encode 1
130 | 
131 |  (0)  (0)  (1)  (2)  (1)       (2)
132 | [ 0 ][ 1 ][ 1 ][ 1 ][ 0 ][ 1 ][ 0 ][ 1 ]
133 |                └───┘          └───┘ encode 2 and finish
134 | ```
135 | 
136 | In this example, there were fewer 0 blocks, so all 0 blocks get encoded leaving some 1 blocks untouched.  These leftover blocks will be handled in the swapping phase.
137 | 
138 | > Encoding an index costs O(log n) swaps.  Since each block is size O(log n), there are O(n/log n) blocks, so this phase is O(n/log n) \* O(log n) = O(n).
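A minimal sketch of encoding and decoding an index in a pair of blocks follows. The helper names `block_xor` and `block_read` are hypothetical, and blocks are modelled as arrays of `int` holding 0 or 1. The sketch stores the least significant bit first, which is how `log_block_read` in logPartition.c reads a block, whereas the diagram above writes the most significant bit first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: swap element i of the two blocks wherever bit i of v
 * is set. Applying it twice with the same v restores both blocks. */
void block_xor(int *zero_block, int *one_block, size_t v) {
    assert(zero_block != NULL && one_block != NULL);
    for (size_t i = 0; v != 0; i++, v >>= 1) {
        if (v & 1) {
            int t = zero_block[i];
            zero_block[i] = one_block[i];
            one_block[i] = t;
        }
    }
}

/* Hypothetical helper: read the first wlen elements as a binary number,
 * least significant bit first. Costs wlen comparisons. */
size_t block_read(const int *block, size_t wlen) {
    size_t r = 0;
    for (size_t i = 0; i < wlen; i++)
        r |= (size_t)(block[i] != 0) << i;
    return r;
}
```

After encoding 13 with a 4-bit index, the 0 block decodes to 13 and the 1 block to its 4-bit complement, 2. The fifth element of each block is untouched and still identifies the block type.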
139 | 
140 | ## Swapping the blocks
141 | 
142 | Once the blocks are encoded, we swap the blocks belonging to the larger partition.  In our example, there are more 1 blocks than 0 blocks so we scan the blocks' reserved bit and swap the 1 blocks to the right into their correct place:
143 | 
144 | ```
145 | Swap blocks:
146 | 
147 |  (0)   (0)   (1)   (2)   (1)         (2)
148 | [ 0a ][ 1a ][ 1b ][ 1c ][ 0b ][ 1d ][ 0c ][ 1e ]
149 |                               └────┘└────┘ block swap
150 | 
151 |  (0)   (0)   (1)   (2)   (1)   (2)
152 | [ 0a ][ 1a ][ 1b ][ 1c ][ 0b ][ 0c ][ 1d ][ 1e ]
153 |                   └────┘      └────┘ block swap
154 | 
155 |  (0)   (0)   (1)   (2)   (1)   (2)
156 | [ 0a ][ 1a ][ 1b ][ 0c ][ 0b ][ 1c ][ 1d ][ 1e ]
157 |             └────┘      └────┘ block swap
158 | 
159 |  (0)   (0)   (1)   (2)   (1)   (2)
160 | [ 0a ][ 1a ][ 0b ][ 0c ][ 1b ][ 1c ][ 1d ][ 1e ]
161 |       └────┘      └────┘ block swap
162 | 
163 |  (0)   (2)   (1)   (0)   (1)   (2)
164 | [ 0a ][ 0c ][ 0b ][ 1a ][ 1b ][ 1c ][ 1d ][ 1e ]
165 |                          (finished block swapping)
166 | ```
167 | 
168 | Swapping blocks in this fashion preserves the order of the 1 blocks.  If there were more 0 blocks, we would swap them to the left.  In this example, after swapping, the 1's partition is stably ordered.  However, the order of the 0 blocks is now scrambled.  Using the encoded indices in the 0 blocks, we can reorder them stably in the sorting phase.
169 | 
170 | > Each block in the phase is swapped once.  Since each block swap costs O(log n) swaps and there are at most O(n/log n) blocks swapped, the swapping phase is O(n/log n) \* O(log n) = O(n).
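The right-to-left scan in the example can be sketched like this. For brevity each block is abstracted as a single `Block` record, so a "block swap" is one struct swap instead of O(log n) element swaps. The `Block` type, the `tag` field, and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a block of O(log n) elements. */
typedef struct {
    int  kind;  /* 0 or 1: the partition the block belongs to */
    char tag;   /* label used only to make the resulting order visible */
} Block;

/* Move every 1 block to the right end, scanning from right to left.
 * The 1 blocks keep their relative order; the 0 blocks may be scrambled. */
void swap_one_blocks_right(Block *blocks, size_t count) {
    size_t dest = count;  /* one past the next free slot on the right */
    assert(blocks != NULL || count == 0);
    for (size_t pb = count; pb-- > 0; ) {
        if (blocks[pb].kind == 1) {
            dest--;
            Block t = blocks[dest];
            blocks[dest] = blocks[pb];
            blocks[pb] = t;
        }
    }
}
```

On the example sequence `0a 1a 1b 1c 0b 1d 0c 1e` this yields `0a 0c 0b 1a 1b 1c 1d 1e`, matching the last row of the diagram.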
171 | 
172 | ## Sorting the blocks
173 | 
174 | Reordering the scrambled blocks is quite easy: simply iterate across the blocks.  If the current block's decoded index is not equal to the iterator, block swap the current block into the position given by its decoded index.  Repeat this step until the current block's index matches the iterator, then move on to the next block.
175 | 
176 | ```
177 | Sort blocks:
178 | 
179 |  (0)←  (2)   (1)   (0)   (1)   (2)
180 | [ 0a ][ 0c ][ 0b ][ 1a ][ 1b ][ 1c ][ 1d ][ 1e ]
181 | └────┘ 0th block == 0 ? -> Yes: go to next block
182 | 
183 |  (0)   (2)←  (1)   (0)   (1)   (2)
184 | [ 0a ][ 0c ][ 0b ][ 1a ][ 1b ][ 1c ][ 1d ][ 1e ]
185 |       └────┘ 1st block == 2 ? -> No: swap with 2nd block
186 | 
187 |  (0)   (1)   (2)   (0)   (1)   (2)
188 | [ 0a ][ 0b ][ 0c ][ 1a ][ 1b ][ 1c ][ 1d ][ 1e ]
189 |       └────┘└────┘
190 | 
191 |  (0)   (1)←  (2)   (0)   (1)   (2)
192 | [ 0a ][ 0b ][ 0c ][ 1a ][ 1b ][ 1c ][ 1d ][ 1e ]
193 |       └────┘ 1st block == 1 ? -> Yes: go to next block
194 | 
195 |  (0)   (1)   (2)←  (0)   (1)   (2)
196 | [ 0a ][ 0b ][ 0c ][ 1a ][ 1b ][ 1c ][ 1d ][ 1e ]
197 |             └────┘ 2nd block == 2 ? -> Yes: finish
198 | ```
199 | 
200 | In pseudocode:
201 | 
202 | ```
203 | for each index from 0 to blocks.count - 1:
204 | 	current = blocks[index].decode()
205 | 	while current != index:
206 | 		block swap blocks[current] and blocks[index]
207 | 		current = blocks[index].decode()
208 | ```
209 | 
210 | It's worth noting in our example we sorted the blocks belonging to the 0 partition.  However, recall that the encoding of 1 blocks is the bit flip of 0 blocks, so we would need to bit flip the results of the decoding beforehand in the case of sorting 1 blocks. 
211 | 
212 | > Each time a block is swapped, it ends up in its final destination, therefore a block is swapped at most once.  The blocks are also decoded at most twice: once per block swap and once per iterated block.  
213 | > 
214 | > Combined, the operations on a block are O(log n).  Since there are at most O(n/log n) scrambled blocks, the sorting phase is O(n/log n) \* O(log n) = O(n).
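The pseudocode above can be sketched in C by standing in for each block with its decoded index. This is a simplification: in the real partition, reading `idx[i]` is a block decode and each swap moves a whole block. The function name `cycle_sort_blocks` is hypothetical, and the input is assumed to be a permutation of 0..count-1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: idx[i] is the decoded index of the block currently at
 * position i. Returns the number of swaps performed. */
size_t cycle_sort_blocks(size_t *idx, size_t count) {
    size_t swaps = 0;
    for (size_t i = 0; i < count; i++) {
        while (idx[i] != i) {
            size_t j = idx[i];
            assert(j < count);
            /* Swap positions i and j: the block with index j lands in its
             * final slot and is never moved again. */
            idx[i] = idx[j];
            idx[j] = j;
            swaps++;
        }
    }
    return swaps;
}
```

Because every swap fixes one block permanently, the swap count never exceeds the number of blocks. The example order `(0) (2) (1)` needs a single swap.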
215 | 
216 | ### Cleaning up
217 | 
218 | After the sorting phase, the blocks are now partitioned stably in O(n) time.  However, some elements between blocks are still swapped from the encoding phase, and we want to restore the original states of the blocks.  Since both 0 blocks and 1 blocks are in order, we can easily reaccess the original encoded block pairs along with their indices in ascending order.  We then can "uncode" them by applying the encode algorithm again with the same index.  After iterating and uncoding the block pairs, we complete the partition on the blocks.
219 | 
220 | ```
221 | Uncode blocks:
222 | 
223 |  (0)   (1)   (2)   (0)   (1)   (2)
224 | [ 0a ][ 0b ][ 0c ][ 1a ][ 1b ][ 1c ][ 1d ][ 1e ]
225 | └────┘            └────┘ uncode 0 (encode 0)
226 | 
227 |        (1)   (2)         (1)   (2)
228 | [ 0a ][ 0b ][ 0c ][ 1a ][ 1b ][ 1c ][ 1d ][ 1e ]
229 |       └────┘            └────┘ uncode 1 (encode 1)
230 | 
231 |              (2)               (2)
232 | [ 0a ][ 0b ][ 0c ][ 1a ][ 1b ][ 1c ][ 1d ][ 1e ]
233 |             └────┘            └────┘ uncode 2 (encode 2)
234 | 
235 | [ 0a ][ 0b ][ 0c ][ 1a ][ 1b ][ 1c ][ 1d ][ 1e ]
236 | [       0        ][             1              ]
237 | 
238 | blocks and underlying elements partitioned stably!
239 | ```
240 | 
241 | We are not done yet and still have a leftover chunk of 0's that did not make a complete block from the grouping phase.  Simply copy the 0 leftovers, shift the 1's partition to the right, and copy the leftovers back:
242 | 
243 | ```
244 | Clean up:
245 | 
246 | [0 0 0 0 0][1 1 1 1 1 1 1 1 1 1 1 1][0 0 0][1 1 1]
247 |                                     └─────┘ copy out
248 |                                     
249 | [0 0 0 0 0][1 1 1 1 1 1 1 1 1 1 1 1]       [1 1 1]
250 |            └───────────────────────┘ shift --->
251 | 
252 | [0 0 0 0 0]       [1 1 1 1 1 1 1 1 1 1 1 1][1 1 1]
253 |            └─────┘ copy in
254 | 
255 | [0 0 0 0 0][0 0 0][1 1 1 1 1 1 1 1 1 1 1 1][1 1 1]
256 | └─────── 0 ──────┘└────────────── 1 ─────────────┘
257 | 
258 | partition complete!
259 | ```
260 | 
261 | Finally, we've stably partitioned the list in O(n) time and O(log n) space.
262 | 
263 | > There are O(n/log n) pairs of blocks that need to be uncoded.  Since each uncoding operation is O(log n), it costs O(n/log n) \* O(log n) = O(n) operations.
264 | > 
265 | >  Copying the leftovers and shifting the 1's partition is O(n) + O(log n) = O(n).
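The copy out, shift, and copy in steps map directly onto `memcpy` and `memmove`, mirroring the three calls at the end of `log_partition`. The sketch below assumes `int` elements; the function name and parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: m points at the start of the 1's partition, which has
 * ones_len elements and is followed by left_len leftover 0's.
 * buf must have room for left_len ints. */
void shift_leftover_zeros(int *m, size_t ones_len, size_t left_len, int *buf) {
    assert(m != NULL && (buf != NULL || left_len == 0));
    memcpy(buf, m + ones_len, left_len * sizeof *m);  /* copy leftovers out */
    memmove(m + left_len, m, ones_len * sizeof *m);   /* shift 1's right (ranges overlap) */
    memcpy(m, buf, left_len * sizeof *m);             /* copy leftovers in */
}
```

`memmove` is needed for the middle step because the source and destination ranges overlap. The leftover chunk is shorter than one block, so the buffer stays within the O(log n) budget.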
266 | 
267 | ## Results
268 | 
269 | An O(n log n) in-place stable sort in theory sounds great, but how does it compare against existing sorts?  In the following benchmarks, we test Logsort's practicality against four other stable sorting algorithms:
270 | 
271 | - [**Grailsort**](https://github.com/Mrrl/GrailSort) +512 aux (Block Merge Sort)
272 | - [**Octosort**](https://github.com/scandum/octosort) +512 aux (Block Merge Sort, optimized [Wikisort](https://github.com/BonzaiThePenguin/WikiSort))
273 | - [**Sqrtsort**](https://github.com/Mrrl/SqrtSort) +√N aux (Block Merge Sort)
274 | - [**Blitsort**](https://github.com/scandum/blitsort) +512 aux (Fast Rotate Merge/Quick Sort, O(n log² n))
275 | 
276 | All sorts are compiled with `gcc -O3` using GCC 11.4.0 and run on Ubuntu 22.04 using WSL.  The algorithms sort a random linear distribution of 32-bit integers containing N unique, √N unique, and 4 unique values respectively.  The average time among 100 trials is recorded.
277 | 
278 | ![2^14](https://github.com/aphitorite/Logsort/blob/main/graphs/exp14.png)
279 | ![2^20](https://github.com/aphitorite/Logsort/blob/main/graphs/exp20.png)
280 | ![2^24](https://github.com/aphitorite/Logsort/blob/main/graphs/exp24.png)
281 | 
282 | <details><summary>Data table</summary>
283 | 
284 | |Sort           |List Size|Data Type|Best Time (µs)|Avg. Time (µs)|Trials|Distribution   |
285 | |---------------|---------|---------|--------------|--------------|------|---------------|
286 | |Blitsort (512) |16384    |4 bytes  |71            |84            |100   |4 unique       |
287 | |Sqrtsort (√N)  |16384    |4 bytes  |482           |509           |100   |4 unique       |
288 | |Octosort (512) |16384    |4 bytes  |437           |463           |100   |4 unique       |
289 | |Grailsort (512)|16384    |4 bytes  |855           |923           |100   |4 unique       |
290 | |Logsort (512)  |16384    |4 bytes  |107           |119           |100   |4 unique       |
291 | 
292 | |Sort           |List Size|Data Type|Best Time (µs)|Avg. Time (µs)|Trials|Distribution   |
293 | |---------------|---------|---------|--------------|--------------|------|---------------|
294 | |Blitsort (512) |16384    |4 bytes  |211           |227           |100   |128 unique     |
295 | |Sqrtsort (√N)  |16384    |4 bytes  |764           |791           |100   |128 unique     |
296 | |Octosort (512) |16384    |4 bytes  |844           |872           |100   |128 unique     |
297 | |Grailsort (512)|16384    |4 bytes  |1533          |1590          |100   |128 unique     |
298 | |Logsort (512)  |16384    |4 bytes  |375           |397           |100   |128 unique     |
299 | 
300 | |Sort           |List Size|Data Type|Best Time (µs)|Avg. Time (µs)|Trials|Distribution   |
301 | |---------------|---------|---------|--------------|--------------|------|---------------|
302 | |Blitsort (512) |16384    |4 bytes  |416           |435           |100   |16384 unique   |
303 | |Sqrtsort (√N)  |16384    |4 bytes  |947           |972           |100   |16384 unique   |
304 | |Octosort (512) |16384    |4 bytes  |989           |1012          |100   |16384 unique   |
305 | |Grailsort (512)|16384    |4 bytes  |1036          |1064          |100   |16384 unique   |
306 | |Logsort (512)  |16384    |4 bytes  |402           |424           |100   |16384 unique   |
307 | 
308 | |Sort           |List Size|Data Type|Best Time (µs)|Avg. Time (µs)|Trials|Distribution   |
309 | |---------------|---------|---------|--------------|--------------|------|---------------|
310 | |Blitsort (512) |1048576  |4 bytes  |5171          |5964          |100   |4 unique       |
311 | |Sqrtsort (√N)  |1048576  |4 bytes  |37006         |38549         |100   |4 unique       |
312 | |Octosort (512) |1048576  |4 bytes  |30248         |32028         |100   |4 unique       |
313 | |Grailsort (512)|1048576  |4 bytes  |56563         |58549         |100   |4 unique       |
314 | |Logsort (512)  |1048576  |4 bytes  |7098          |7527          |100   |4 unique       |
315 | 
316 | |Sort           |List Size|Data Type|Best Time (µs)|Avg. Time (µs)|Trials|Distribution   |
317 | |---------------|---------|---------|--------------|--------------|------|---------------|
318 | |Blitsort (512) |1048576  |4 bytes  |20459         |20896         |100   |1024 unique    |
319 | |Sqrtsort (√N)  |1048576  |4 bytes  |66212         |68995         |100   |1024 unique    |
320 | |Octosort (512) |1048576  |4 bytes  |75614         |78219         |100   |1024 unique    |
321 | |Grailsort (512)|1048576  |4 bytes  |129576        |133018        |100   |1024 unique    |
322 | |Logsort (512)  |1048576  |4 bytes  |22834         |23245         |100   |1024 unique    |
323 | 
324 | |Sort           |List Size|Data Type|Best Time (µs)|Avg. Time (µs)|Trials|Distribution   |
325 | |---------------|---------|---------|--------------|--------------|------|---------------|
326 | |Blitsort (512) |1048576  |4 bytes  |37693         |39161         |100   |1048576 unique |
327 | |Sqrtsort (√N)  |1048576  |4 bytes  |87581         |90788         |100   |1048576 unique |
328 | |Octosort (512) |1048576  |4 bytes  |96464         |99214         |100   |1048576 unique |
329 | |Grailsort (512)|1048576  |4 bytes  |92196         |95436         |100   |1048576 unique |
330 | |Logsort (512)  |1048576  |4 bytes  |37343         |38983         |100   |1048576 unique |
331 | 
332 | |Sort           |List Size|Data Type|Best Time (µs)|Avg. Time (µs)|Trials|Distribution   |
333 | |---------------|---------|---------|--------------|--------------|------|---------------|
334 | |Blitsort (512) |16777216 |4 bytes  |431614        |462105        |100   |4 unique       |
335 | |Sqrtsort (√N)  |16777216 |4 bytes  |885925        |904531        |100   |4 unique       |
336 | |Octosort (512) |16777216 |4 bytes  |657904        |671961        |100   |4 unique       |
337 | |Grailsort (512)|16777216 |4 bytes  |1059679       |1126016       |100   |4 unique       |
338 | |Logsort (512)  |16777216 |4 bytes  |149899        |161879        |100   |4 unique       |
339 | 
340 | |Sort           |List Size|Data Type|Best Time (µs)|Avg. Time (µs)|Trials|Distribution   |
341 | |---------------|---------|---------|--------------|--------------|------|---------------|
342 | |Blitsort (512) |16777216 |4 bytes  |854276        |860791        |100   |4096 unique    |
343 | |Sqrtsort (√N)  |16777216 |4 bytes  |1463341       |1499482       |100   |4096 unique    |
344 | |Octosort (512) |16777216 |4 bytes  |1682796       |1727569       |100   |4096 unique    |
345 | |Grailsort (512)|16777216 |4 bytes  |2744438       |2954750       |100   |4096 unique    |
346 | |Logsort (512)  |16777216 |4 bytes  |484517        |490591        |100   |4096 unique    |
347 | 
348 | |Sort           |List Size|Data Type|Best Time (µs)|Avg. Time (µs)|Trials|Distribution   |
349 | |---------------|---------|---------|--------------|--------------|------|---------------|
350 | |Blitsort (512) |16777216 |4 bytes  |1021487       |1030391       |100   |16777216 unique|
351 | |Sqrtsort (√N)  |16777216 |4 bytes  |1873933       |1988718       |100   |16777216 unique|
352 | |Octosort (512) |16777216 |4 bytes  |2123259       |2267619       |100   |16777216 unique|
353 | |Grailsort (512)|16777216 |4 bytes  |1965213       |2043090       |100   |16777216 unique|
354 | |Logsort (512)  |16777216 |4 bytes  |812568        |851833        |100   |16777216 unique|
355 | 
356 | </details>
357 | 
358 | ## Concluding remarks
359 | 
360 | Grailsort, Octosort, and Sqrtsort use branched comparisons in merging.  Logsort was optimized with branchless comparisons similar to [Fluxsort](https://github.com/scandum/fluxsort/), which greatly improved its performance with a 40% increase in speed!  The explanation of this speed boost seems to be similar in mechanism to that of Block Quicksort (Edelkamp & Weiss 2016).  Since Logsort is a stable quicksort by nature, branchless comparisons were easier to implement than in a Block Merge Sort.  To further increase the performance as well as simplify the pivot selection, [Piposort](https://github.com/scandum/piposort/) was also used for small arrays.
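The branchless idea can be illustrated with a small sketch in the spirit of the grouping loop in logPartition.c: every element is written to both destinations, and only the write cursors depend on the comparison result. The function name, the `int` element type, and the two separate output arrays are assumptions for illustration. Whether the compiler actually emits branch-free machine code depends on the compiler and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: split src into lo (elements < pivot) and hi (the rest)
 * without branching on the comparison. lo and hi must each have room for n
 * ints. Returns the number of elements placed in lo. */
size_t branchless_split(const int *src, size_t n, int pivot, int *lo, int *hi) {
    size_t l = 0, r = 0;
    assert(n == 0 || (src != NULL && lo != NULL && hi != NULL));
    for (size_t i = 0; i < n; i++) {
        int x = src[i] < pivot;  /* 1 if the element belongs in lo, else 0 */
        lo[l] = src[i];          /* write to both destinations... */
        hi[r] = src[i];
        l += (size_t)x;          /* ...but advance only the matching cursor */
        r += (size_t)!x;
    }
    return l;
}
```

The write that lands in the wrong array is simply overwritten by the next element, so no data-dependent branch is needed inside the loop.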
361 | 
362 | Unlike Block Merge Sorts, Logsort relies on its bit encoding to store information rather than distinct values.  This avoids any overhead in a key collection algorithm; we see this happening with Grailsort spending O(n log n) comparisons finding √N uniques.
363 | 
364 | Being a stable quicksort, Logsort naturally performs well on data with few uniques, boasting an O(n log u) complexity.  However, at the smaller array sizes (2^14 and 2^20 elements), Blitsort beats Logsort on data with 4 uniques despite having a complexity of O(n log n log u).  This is likely due to Logsort's poorer access patterns and overhead compared to simple merges with rotations.
365 | 
366 | Logsort's main rival, Blitsort, is an optimized Rotate Merge/Quick Sort which uses rotations to merge/partition but has a suboptimal O(n log² n) complexity.  Despite this, Rotate Merge is known to beat the optimal Block Merge owing to its simplicity, good locality, and low overhead.
367 | 
368 | In the benchmarks, Logsort remained competitive with Blitsort on 2^14 to 2^20 integers and even beat Blitsort on 2^24 integers and beyond with its superior time complexity.  Unlike Blitsort, however, Logsort is not optimized to be an adaptive sort, and this comparison is only on random data.  A hybrid between a rotate partition and blocked partition is also a good idea, but such an optimization is left to the reader.
369 | 
370 | With further improvements, it's likely that stable O(n log n) in-place sorting has practical application outside of theory.  It's worth noting that Logsort's application is quite galactic, only seeing noticeable benefits on lengths in the tens of millions.  However, despite Logsort's simplicity compared to Block Merge Sorts, these algorithms remain fairly complicated compared to their unstable counterparts.  
371 | 
372 | ## Acknowledgements
373 | 
374 | The author would like to thank members of the Discord server "The Studio" ([https://discord.gg/thestudio](https://discord.gg/thestudio "https://discord.gg/thestudio")) particularly:
375 | 
376 | - **@anonymous0726** for providing Aeos Quicksort as a reference
377 | - **@dystair** for revising the block encoding algorithm
378 | - **@control._.** for giving helpful suggestions regarding cache utilization
379 | - **@scandum** ([github](https://github.com/scandum)) for providing useful open-source C code as reference and helpful answers to questions
380 | - **@kigt** ([github](https://github.com/bzyjin)) for improving the main loop of the grouping phase
381 | 
382 | ## References
383 | 
384 | - **(Munro et al. 1990)** Stable in situ sorting and minimum data movement (https://link.springer.com/article/10.1007/BF02017344) 
385 | - **(Anonymous0726 2021)** \[Seizure Warning\] Aeos Quicksort (https://www.youtube.com/watch?v=_YTl2VJnQ4s) 
386 | - **(Katajainen & Pasanen 1992)** Stable minimum space partitioning in linear time (https://link.springer.com/article/10.1007/BF02017344) 
387 | - **(Edelkamp & Weiss 2016)** BlockQuicksort: How Branch Mispredictions don't affect Quicksort (https://arxiv.org/abs/1604.06697)
388 | 


--------------------------------------------------------------------------------
/graphs/exp14.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aphitorite/Logsort/cb797cac80f872f1a776162b619f709b31d214c4/graphs/exp14.png


--------------------------------------------------------------------------------
/graphs/exp20.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aphitorite/Logsort/cb797cac80f872f1a776162b619f709b31d214c4/graphs/exp20.png


--------------------------------------------------------------------------------
/graphs/exp24.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aphitorite/Logsort/cb797cac80f872f1a776162b619f709b31d214c4/graphs/exp24.png


--------------------------------------------------------------------------------
/logPartition.c:
--------------------------------------------------------------------------------
  1 | /*
  2 |  * 
  3 | MIT License
  4 | 
  5 | Copyright (c) 2022-2024 aphitorite
  6 | 
  7 | Permission is hereby granted, free of charge, to any person obtaining a copy
  8 | of this software and associated documentation files (the "Software"), to deal
  9 | in the Software without restriction, including without limitation the rights
 10 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 11 | copies of the Software, and to permit persons to whom the Software is
 12 | furnished to do so, subject to the following conditions:
 13 | 
 14 | The above copyright notice and this permission notice shall be included in all
 15 | copies or substantial portions of the Software.
 16 | 
 17 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 18 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 19 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 20 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 21 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 22 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 23 | SOFTWARE.
 24 |  *
 25 |  */
 26 | 
 27 | size_t PIVFUNC(log_block_read)(VAR *a, VAR *piv, char wLen) {
 28 | 	size_t r = 0, i = 0;
 29 | 	
 30 | 	while(wLen--) r |= PIVCMP(a++, piv) << (i++);
 31 | 	
 32 | 	return r;
 33 | }
 34 | 
 35 | VAR *PIVFUNC(log_partition_easy)(VAR *array, VAR *swap, size_t n, VAR *piv) {
 36 | 	size_t c, x, y;
 37 | 	VAR *a = array, *b = a+n-1, *i = a, *j = b;
 38 | 	VAR *swapEnd = swap+n, *pa = swap, *pb = swapEnd-1;
 39 | 	
 40 | 	for(c = n/2; c; c--) {
 41 | 		x = PIVCMP(i, piv);
 42 | 		*a = *i; *pa = *i; i++; a += x; pa += !x;
 43 | 		
 44 | 		x = PIVCMP(j, piv);
 45 | 		*b = *j; *pb = *j; j--; b -= !x; pb -= x;
 46 | 	}
 47 | 	if(n % 2) {
 48 | 		x = PIVCMP(i, piv);
 49 | 		*a = *i; *pa = *i; i++; a += x; pa += !x;
 50 | 	}
 51 | 	while(++pb < swapEnd) *a++ = *pb;
 52 | 	while(pa-- > swap)    *b-- = *pa;
 53 | 	
 54 | 	return a;
 55 | }
 56 | VAR *PIVFUNC(log_partition)(VAR *a, VAR *s, size_t n, size_t bLen, VAR *piv) {
 57 | 	if(n <= bLen) return PIVFUNC(log_partition_easy)(a, s, n, piv);
 58 | 	
 59 | 	// group into blocks
 60 | 	
 61 | 	VAR *p;
 62 | 	size_t i, l = 0, r = 0, lb, rb = 0, rem;
 63 | 	char x;
 64 | 
 65 | 	for(i = 0; i < n; i++) { // branchless partitioning from fluxsort
 66 | 		x = PIVCMP(a+i, piv);
 67 | 		a[l] = a[i]; s[r] = a[i];
 68 | 		l += x; r += !x;
 69 | 		
 70 | 		if(r == bLen) { // external buffer full: empty block in main array
 71 | 			
 72 | 			rem = l % bLen; // size of 0's fragment
 73 | 			p = a+l - rem;
 74 | 			
 75 | 			memcpy(p+bLen, p, rem * sizeof(VAR)); // copy 0's fragment
 76 | 			memcpy(p, s, bLen * sizeof(VAR));     // copy 1's block in
 77 | 			
 78 | 			l += bLen; r = 0; rb++;
 79 | 		}
 80 | 	}
 81 | 	p = a+l;
 82 | 	memcpy(p, s, r * sizeof(VAR));
 83 | 	l %= bLen; p -= l;
 84 | 	lb = (n-r)/bLen - rb;
 85 | 	
 86 | 	char left = lb < rb;
 87 | 	size_t min = left ? lb : rb;
 88 | 	VAR *m = a + lb*bLen;
 89 | 	
 90 | 	if(min) {
 91 | 		size_t max = lb+rb - min, v = 0;
 92 | 		char wLen = log_ceil_log(min);
 93 | 		
 94 | 		// encode bits in blocks
 95 | 		
 96 | 		VAR *pa = a, *pb = a;
 97 | 		
 98 | 		for(i = 0; i < min; i++) {
 99 | 			while(!PIVCMP(pa+wLen, piv)) pa += bLen;
100 | 			while( PIVCMP(pb+wLen, piv)) pb += bLen;
101 | 			
102 | 			log_block_xor(pa, pb, v++); 
103 | 			pa += bLen; pb += bLen;
104 | 		}
105 | 		
106 | 		// swap blocks of larger partition
107 | 		
108 | 		pa = left ? p-bLen : a; pb = pa;
109 | 		size_t step = left ? -bLen : bLen;
110 | 		
111 | 		for(i = 0; i < max; ) {
112 | 			if(left ^ PIVCMP(pb+wLen, piv)) {
113 | 				memcpy(s,  pa, bLen * sizeof(VAR));
114 | 				memcpy(pa, pb, bLen * sizeof(VAR));
115 | 				memcpy(pb, s,  bLen * sizeof(VAR));
116 | 				
117 | 				pa += step; i++;
118 | 			}
119 | 			pb += step;
120 | 		}
121 | 		
122 | 		// block cycle sort
123 | 		
124 | 		size_t j, mask = (left << wLen) - left; v = 0;
125 | 		VAR *ps = left ? a : m; pa = ps; pb = left ? m : a;
126 | 		
127 | 		for(i = 0; i < min; i++) {
128 | 			j = mask ^ PIVFUNC(log_block_read)(pa, piv, wLen);
129 | 			
130 | 			while(j != v) {
131 | 				memcpy(s,  pa,          bLen * sizeof(VAR));
132 | 				memcpy(pa, ps + j*bLen, bLen * sizeof(VAR));
133 | 				memcpy(ps + j*bLen,  s, bLen * sizeof(VAR));
134 | 				
135 | 				j = mask ^ PIVFUNC(log_block_read)(pa, piv, wLen);
136 | 			}
137 | 			log_block_xor(pa, pb, v++);
138 | 			pa += bLen; pb += bLen;
139 | 		}
140 | 	}
141 | 	
142 | 	// clean up leftovers: shift 0's fragment in place
143 | 	
144 | 	memcpy(s, p, l * sizeof(VAR));
145 | 	memmove(m+l, m, rb*bLen * sizeof(VAR));
146 | 	memcpy(m, s, l * sizeof(VAR));
147 | 	
148 | 	return m+l;
149 | }
150 | 
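
The first loop of `log_partition` uses the branchless scheme its comment credits to fluxsort: each element is written to both the array and the buffer, and only the two write cursors advance conditionally. Below is a minimal standalone sketch of that idea. It assumes plain `int` keys and a scratch buffer as large as the input, unlike `log_partition`, which works with only `bLen` buffer elements. The name `stable_partition_sketch` is hypothetical and not part of this repository.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stable partition of a[0..n) around pivot.
 * Elements <= pivot keep their relative order at the front,
 * elements > pivot keep theirs at the back.
 * scratch must hold n ints. Returns the count of elements <= pivot. */
size_t stable_partition_sketch(int *a, int *scratch, size_t n, int pivot) {
    assert(a != scratch);           /* the two buffers must not alias */

    size_t l = 0, r = 0;
    for (size_t i = 0; i < n; i++) {
        int le = a[i] <= pivot;     /* 1 if the element belongs on the left */
        a[l] = a[i];                /* unconditional writes, no branch;     */
        scratch[r] = a[i];          /* l <= i, so a[i] is never clobbered   */
        l += le;                    /* only the cursors depend on the test  */
        r += !le;
    }
    /* append the buffered right-hand elements after the left-hand ones */
    memcpy(a + l, scratch, r * sizeof *a);
    return l;
}
```

For example, partitioning `{5, 1, 9, 3, 7, 2, 8}` around pivot 4 leaves `{1, 3, 2, 5, 9, 7, 8}` and returns 3: both sides keep their original order.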


--------------------------------------------------------------------------------
/logsort.h:
--------------------------------------------------------------------------------
  1 | /*
  2 |  * 
  3 | MIT License
  4 | 
  5 | Copyright (c) 2022-2024 aphitorite
  6 | 
  7 | Permission is hereby granted, free of charge, to any person obtaining a copy
  8 | of this software and associated documentation files (the "Software"), to deal
  9 | in the Software without restriction, including without limitation the rights
 10 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 11 | copies of the Software, and to permit persons to whom the Software is
 12 | furnished to do so, subject to the following conditions:
 13 | 
 14 | The above copyright notice and this permission notice shall be included in all
 15 | copies or substantial portions of the Software.
 16 | 
 17 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 18 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 19 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 20 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 21 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 22 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 23 | SOFTWARE.
 24 |  *
 25 |  */
 26 |  
 27 | #ifndef LOGSORT_H
 28 | #define LOGSORT_H
 29 | 
 30 | #include <stdlib.h>
 31 | #include <string.h>
 32 | 
 33 | #define MIN_SMALLSORT 7
 34 | #define MIN_PIPOSORT 512
 35 | 
 36 | char log_ceil_log(size_t n) {
 37 | 	char r = 0;
 38 | 	while(((size_t)1 << r) < n) r++; // shift in size_t: (1 << r) is an int and overflows for large n
 39 | 	return r;
 40 | }
 41 | 
 42 | ////////////////
 43 | //            //
 44 | //  PIPOSORT  //
 45 | //            //
 46 | ////////////////
 47 | 
 48 | // courtesy of @scandum's piposort
 49 | 
 50 | void log_smallsort(VAR *array, size_t nmemb)
 51 | {
 52 | 	VAR swap, *pta, *pte;
 53 | 	unsigned char w = 1, x, y, z = 1;
 54 | 
 55 | 	switch (nmemb) {
 56 | 	default:
 57 | 		pte = array + nmemb - 3;
 58 | 		
 59 | 		do {
 60 | 			pta = pte + (z = !z);
 61 | 
 62 | 			do {
 63 | 				x = CMP(pta, pta + 1) > 0; y = !x; 
 64 | 				swap = pta[y]; pta[0] = pta[x]; pta[1] = swap; 
 65 | 				pta -= 2; w |= x;
 66 | 			}
 67 | 			while(pta >= array);
 68 | 		}
 69 | 		while(w-- && --nmemb);
 70 | 		return;
 71 | 		
 72 | 	case 3:
 73 | 		pta = array;
 74 | 		
 75 | 		x = CMP(pta, pta + 1) > 0; y = !x; 
 76 | 		swap = pta[y]; pta[0] = pta[x]; pta[1] = swap; pta++;
 77 | 		
 78 | 		x = CMP(pta, pta + 1) > 0; y = !x; 
 79 | 		swap = pta[y]; pta[0] = pta[x]; pta[1] = swap;
 80 | 		
 81 | 		if(x == 0) return;
 82 | 		
 83 | 	case 2:
 84 | 		pta = array;
 85 | 		
 86 | 		x = CMP(pta, pta + 1) > 0; y = !x; 
 87 | 		swap = pta[y]; pta[0] = pta[x]; pta[1] = swap;
 88 | 		
 89 | 	case 1:
 90 | 	case 0:
 91 | 		return;
 92 | 	}
 93 | }
 94 | 
 95 | void log_parity_merge(VAR *from, VAR *dest, size_t left, size_t right) {
 96 | 	VAR *ptl, *ptr, *tpl, *tpr, *tpd, *ptd;
 97 | 	unsigned char x;
 98 | 
 99 | 	ptl = from; ptr = from + left; ptd = dest;
100 | 	tpl = from + left-1; tpr = from + left+right-1; tpd = dest + left+right-1;
101 | 
102 | 	if(left < right) *ptd++ = CMP(ptl, ptr) <= 0 ? *ptl++ : *ptr++;
103 | 
104 | 	while(--left) {
105 | 		x = CMP(ptl, ptr) <= 0; *ptd = *ptl; ptl += x; ptd[x] = *ptr; ptr += !x; ptd++;
106 | 		x = CMP(tpl, tpr) <= 0; *tpd = *tpl; tpl -= !x; tpd--; tpd[x] = *tpr; tpr -= x;
107 | 	}
108 | 	*tpd = CMP(tpl, tpr)  > 0 ? *tpl : *tpr;
109 | 	*ptd = CMP(ptl, ptr) <= 0 ? *ptl : *ptr;
110 | }
111 | void log_piposort(VAR *array, VAR *swap, size_t n) {
112 | 	size_t q1, q2, q3, q4, h1, h2;
113 | 
114 | 	if(n <= MIN_SMALLSORT) {
115 | 		log_smallsort(array, n);
116 | 		return;
117 | 	}
118 | 	h1 = n/2;  q1 = h1/2; q2 = h1-q1;
119 | 	h2 = n-h1; q3 = h2/2; q4 = h2-q3;
120 | 
121 | 	log_piposort(array, swap, q1);
122 | 	log_piposort(array + q1, swap, q2);
123 | 	log_piposort(array + h1, swap, q3);
124 | 	log_piposort(array + h1 + q3, swap, q4);
125 | 
126 | 	if(CMP(array + q1-1, array + q1) <= 0 && 
127 | 	   CMP(array + h1-1, array + h1) <= 0 && 
128 | 	   CMP(array + h1+q3-1, array + h1+q3) <= 0)
129 | 		return;
130 | 
131 | 	log_parity_merge(array, swap, q1, q2);
132 | 	log_parity_merge(array + h1, swap + h1, q3, q4);
133 | 	log_parity_merge(swap, array, h1, h2);
134 | }
135 | 
136 | ///////////////////////
137 | //                   //
138 | //  PIVOT SELECTION  //
139 | //                   //
140 | ///////////////////////
141 | 
142 | // courtesy of @scandum's blitsort
143 | 
144 | void log_trim_four(VAR *pta) {
145 | 	VAR swap;
146 | 	size_t x;
147 | 
148 | 	x = CMP(pta, pta + 1)  > 0; swap = pta[!x]; pta[0] = pta[x]; pta[1] = swap; pta += 2;
149 | 	x = CMP(pta, pta + 1)  > 0; swap = pta[!x]; pta[0] = pta[x]; pta[1] = swap; pta -= 2;
150 | 
151 | 	x = (CMP(pta, pta + 2) <= 0) * 2; pta[2] = pta[x]; pta++;
152 | 	x = (CMP(pta, pta + 2)  > 0) * 2; pta[0] = pta[x];
153 | }
154 | 
155 | VAR log_median_of_nine(VAR *a, VAR *s, size_t n) {
156 | 	size_t step = (n-1) / 8, i;
157 | 	VAR *pa = a;
158 | 	
159 | 	for(i = 0; i < 9; i++) 
160 | 		{ s[i] = *pa; pa += step; }
161 | 	
162 | 	log_smallsort(s, 9);
163 | 	return s[4];
164 | }
165 | 
166 | VAR log_smart_median(VAR *array, VAR *swap, size_t n, size_t bLen) {
167 | 	if(bLen < 64) return log_median_of_nine(array, swap, n);
168 | 	
169 | 	size_t cbrt;
170 | 	for(cbrt = 32; cbrt*cbrt*cbrt < n && cbrt < 1024; cbrt *= 2) {}
171 | 	
172 | 	size_t div = bLen < cbrt ? bLen : cbrt;
173 | 	size_t step = n/div, c;
174 | 	VAR *i = array, *j;
175 | 	
176 | 	// copy sample to swap space
177 | 	
178 | 	for(c = 0; c < div; c++) 
179 | 		{ swap[c] = *i; i += step; }
180 | 	
181 | 	// halve the sample using trim fours
182 | 	
183 | 	div /= 2; i = swap; j = swap + div;
184 | 	
185 | 	for(c = (div /= 4); c; c--) {
186 | 		log_trim_four(i);
187 | 		log_trim_four(j);
188 | 		
189 | 		i[0] = j[1]; i[3] = j[2];
190 | 		i += 4; j += 4;
191 | 	}
192 | 	
193 | 	// sort sample for median
194 | 	
195 | 	div *= 4;
196 | 	log_piposort(swap, swap+div, div);
197 | 	
198 | 	return swap[div/2 + 1];
199 | }
200 | 
201 | ///////////////
202 | //           //
203 | //  LOGSORT  //
204 | //           //
205 | ///////////////
206 | 
207 | void log_block_xor(VAR *a, VAR *b, size_t v) {
208 | 	VAR t;
209 | 	
210 | 	while(v) {
211 | 		if(v & 1) { t = *a; *a = *b; *b = t; }
212 | 		v >>= 1; a++; b++;
213 | 	}
214 | }
215 | 
216 | #define PIVFUNC(NAME) NAME##_less
217 | #define PIVCMP(a, b) (CMP((b), (a)) > 0)
218 | 
219 | #include "logPartition.c"
220 | 
221 | #undef PIVFUNC
222 | #undef PIVCMP
223 | 
224 | #define PIVFUNC(NAME) NAME##_less_eq
225 | #define PIVCMP(a, b) (CMP((a), (b)) <= 0)
226 | 
227 | #include "logPartition.c"
228 | 
229 | #undef PIVFUNC
230 | #undef PIVCMP
231 | 
232 | // logsort sorting functions
233 | 
234 | void logsort_rec(VAR *a, VAR *s, size_t n, size_t bLen) {
235 | 	size_t minSort = bLen < MIN_PIPOSORT ? bLen : MIN_PIPOSORT;
236 | 	
237 | 	while(n > minSort) {
238 | 		VAR piv = n < 2048 ? log_median_of_nine(a, s, n)
239 | 		                   : log_smart_median(a, s, n, bLen);
240 | 		
241 | 		VAR *p = log_partition_less_eq(a, s, n, bLen, &piv);
242 | 		size_t m = p-a;
243 | 		
244 | 		if(m == n) { // in the case of many equal elements
245 | 			p = log_partition_less(a, s, n, bLen, &piv);
246 | 			n = p-a;
247 | 			
248 | 			continue;
249 | 		}
250 | 		logsort_rec(p, s, n-m, bLen);
251 | 		n = m;
252 | 	}
253 | 	log_piposort(a, s, n);
254 | }
255 | void logsort(VAR *a, size_t n, size_t bLen) {
256 | 	if(n < bLen) bLen = n;
257 | 	if(bLen < 9) bLen = 9; // for median of nine
258 | 	
259 | 	VAR *s = malloc(bLen * sizeof(VAR));
260 | 	logsort_rec(a, s, n, bLen);
261 | 	free(s);
262 | }
263 | 
264 | #undef MIN_SMALLSORT
265 | #undef MIN_PIPOSORT
266 | 
267 | #endif // LOGSORT_H
268 | 
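
`log_block_xor` and `log_block_read` store a small integer inside a pair of blocks by swapping elements between them, so no extra memory is needed for block indices. The sketch below illustrates the trick on its own, assuming `int` keys, one block whose elements are all `<= pivot` (the "0" block) and one whose elements are all `> pivot` (the "1" block). The names `block_xor_sketch` and `block_read_sketch` are hypothetical simplifications, not the functions used by the header.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Swap a[k] with b[k] for every set bit k of v.
 * Applying the same v a second time restores both blocks. */
void block_xor_sketch(int *a, int *b, size_t v) {
    for (; v; v >>= 1, a++, b++) {
        if (v & 1) { int t = *a; *a = *b; *b = t; }
    }
}

/* Read back a width-bit value: bit k is 1 when a[k] > pivot. */
size_t block_read_sketch(const int *a, int pivot, unsigned width) {
    assert(width < sizeof(size_t) * CHAR_BIT);

    size_t r = 0;
    for (unsigned k = 0; k < width; k++)
        r |= (size_t)(a[k] > pivot) << k;
    return r;
}
```

With `a = {1, 2, 3, 4}`, `b = {11, 12, 13, 14}` and pivot 5, calling `block_xor_sketch(a, b, 5)` makes `a` read back as 5 and `b` as the complement 10 over four bits. A second call with the same value undoes the swaps.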


--------------------------------------------------------------------------------
/test.c:
--------------------------------------------------------------------------------
  1 | /*
  2 |  * 
  3 | MIT License
  4 | 
  5 | Copyright (c) 2022-2024 aphitorite
  6 | 
  7 | Permission is hereby granted, free of charge, to any person obtaining a copy
  8 | of this software and associated documentation files (the "Software"), to deal
  9 | in the Software without restriction, including without limitation the rights
 10 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 11 | copies of the Software, and to permit persons to whom the Software is
 12 | furnished to do so, subject to the following conditions:
 13 | 
 14 | The above copyright notice and this permission notice shall be included in all
 15 | copies or substantial portions of the Software.
 16 | 
 17 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 18 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 19 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 20 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 21 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 22 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 23 | SOFTWARE.
 24 |  *
 25 |  */
 26 | 
 27 | #include <stdlib.h>
 28 | #include <stdio.h>
 29 | #include <string.h>
 30 | #include <sys/time.h>
 31 | #include <time.h>
 32 | #include <errno.h>
 33 | #include <math.h>
 34 | #include <assert.h>
 35 | #include <stdbool.h>
 36 | 
 37 | #define SEED 0
 38 | #define LEN(A) (sizeof(A) / sizeof(*A))
 39 | #define USE_AVG_TIME 1
 40 | 
 41 | #define VAR_TYPE int
 42 | 
 43 | long long utime()
 44 | {
 45 | 	struct timeval now_time;
 46 | 
 47 | 	gettimeofday(&now_time, NULL);
 48 | 
 49 | 	return now_time.tv_sec * 1000000LL + now_time.tv_usec;
 50 | }
 51 | 
 52 | // benching against qsort: no inline
 53 | 
 54 | __attribute__ ((noinline)) int cmp(const void * a, const void * b)
 55 | {
 56 | 	const VAR_TYPE fa = *(const VAR_TYPE *) a;
 57 | 	const VAR_TYPE fb = *(const VAR_TYPE *) b;
 58 | 
 59 | 	return (fa > fb) - (fa < fb); // avoids signed overflow of fa - fb
 60 | }
 61 | 
 62 | // import different sorts
 63 | 
 64 | void qsortTest(VAR_TYPE *a, size_t n, size_t b) {
 65 | 	qsort(a, n, sizeof(*a), cmp);
 66 | }
 67 | 
 68 | void shellsort(VAR_TYPE *array, size_t N, size_t b) {
 69 | 	VAR_TYPE t;
 70 | 	size_t i, j, k;
 71 | 		
 72 | 	for(k = 730725073; k; k = k/4 - k/16) {
 73 | 		for(j = k; j < N; j++) {
 74 | 			t = array[j];
 75 | 			
 76 | 			for(i = j; i >= k && cmp(&array[i-k], &t) > 0; i -= k)
 77 | 				array[i] = array[i-k];
 78 | 			
 79 | 			array[i] = t;
 80 | 		}
 81 | 	}
 82 | }
 83 | 
 84 | 
 85 | #include "algos/blitsort.h"
 86 | #include "algos/octosort.h"
 87 | 
 88 | void octosortTest(VAR_TYPE *a, size_t n, size_t b) {
 89 | 	VAR_TYPE *s = malloc(b * sizeof(VAR_TYPE));
 90 | 	octosort32(a, n, s, b, cmp);
 91 | 	free(s);
 92 | }
 93 | 
 94 | void blitsortTest(VAR_TYPE *a, size_t n, size_t b) {
 95 | 	blitsort32(a, n, cmp);
 96 | }
 97 | 
 98 | void quadsortTest(VAR_TYPE *a, size_t n, size_t b) {
 99 | 	quadsort32(a, n, cmp);
100 | }
101 | 
102 | 
103 | #define SORT_TYPE VAR_TYPE
104 | #define SORT_CMP cmp //(a,b) (*(a) - *(b))
105 | 
106 | #include "algos/GrailSort.h"
107 | #include "algos/SqrtSort.h"
108 | 
109 | #undef SORT_TYPE
110 | #undef SORT_CMP
111 | 
112 | void grailsortTest(VAR_TYPE *a, size_t n, size_t b) {
113 | 	VAR_TYPE *s = malloc(b * sizeof(VAR_TYPE));
114 | 	grail_commonSort(a, n, s, b);
115 | 	free(s);
116 | }
117 | 
118 | void sqrtsortTest(VAR_TYPE *a, size_t n, size_t b) {
119 | 	SqrtSort(a, n);
120 | }
121 | 
122 | 
123 | #ifdef USE_SHELFSORT
124 | 
125 | 	#define ELEMENT VAR_TYPE
126 | 	#define CMP cmp
127 | 
128 | 	#include "algos/shelfsort.h"
129 | 
130 | 	#undef CMP
131 | 	#undef ELEMENT
132 | 
133 | 	void shelfsortTest(VAR_TYPE *a, size_t n, size_t b) {
134 | 		ShelfSort(a, n);
135 | 	}
136 | 	
137 | #endif
138 | 
139 | 
140 | #define VAR VAR_TYPE
141 | #define CMP cmp //(a,b) (*(a) - *(b))
142 | 
143 | #ifdef USE_HELIUMSORT
144 | 
145 | 	#include "algos/heliumSort.h"
146 | 
147 | 	void heliumSortTest(VAR_TYPE *a, size_t n, size_t b) {
148 | 		heliumSort(a, 0, n, b);
149 | 	}
150 | 	
151 | #endif
152 | 
153 | #ifdef USE_ECTASORT
154 | 
155 | 	#include "algos/ectasort.h"
156 | 	
157 | 	void ectasortTest(VAR_TYPE *a, size_t n, size_t b) {
158 | 		ectasort(a, n);
159 | 	}
160 | 	
161 | #endif
162 | 
163 | #include "logsort.h"
164 | 
165 | #undef VAR
166 | #undef CMP
167 | 
168 | 
169 | //array generation
170 | 
171 | void randArray(VAR_TYPE *a, size_t n, size_t s) {
172 | 	for(size_t i = 0; i < n; i++) {
173 | 		size_t j = rand()%(i+1);
174 | 		a[i] = a[j];
175 | 		a[j] = (VAR_TYPE)(i >> s);
176 | 	}
177 | }
178 | char verify(VAR_TYPE *a, size_t n, size_t s) {
179 | 	for(size_t i = 0; i < n; i++)
180 | 		if(a[i] != (VAR_TYPE)(i >> s)) return 0;
181 | 	return 1;
182 | }
183 | 
184 | void backwards(VAR_TYPE *a, size_t n, size_t s) {
185 | 	for(size_t i = 0; i < n; i++) {
186 | 		a[i] = (VAR_TYPE)((n-1 - i) >> s);
187 | 	}
188 | }
189 | 
190 | //sort trial
191 | 
192 | void sortTrial(long long *times, void (*sort)(VAR_TYPE*, size_t, size_t), VAR_TYPE *a, size_t n, size_t bLen, size_t sh, size_t trials, char prog) {
193 | 	srand(SEED);
194 | 	
195 | 	unsigned long long best = -1;
196 | 	double avg = 0;
197 | 	long long start, end, res;
198 | 	
199 | 	for(size_t i = 0; i < trials; i++) {
200 | 		if(prog) {
201 | 			printf("\r(%zu/%zu) ", i+1, trials);
202 | 			fflush(stdout);
203 | 		}
204 | 		randArray(a, n, sh);
205 | 		
206 | 		start = utime();
207 | 		
208 | 		sort(a, n, bLen);
209 | 		
210 | 		end = utime();
211 | 		res = end-start;
212 | 		
213 | 		assert(verify(a, n, sh));
214 | 		
215 | 		best = res < best ? res : best;
216 | 		avg += res;
217 | 	}
218 | 	if(prog) printf("\r");
219 | 	
220 | 	times[0] = best; 
221 | 	times[1] = (long long)(avg / trials + 0.5);
222 | }
223 | 
224 | void sortTrials(long long *times, void (*sorts[])(VAR_TYPE*, size_t, size_t), char *sortNames[], size_t sortCount, void (*shuffle)(VAR_TYPE*, size_t, size_t),
225 |                 VAR_TYPE *a, size_t n, size_t bLen, size_t sh, size_t trials) {
226 | 	
227 | 	printf("Sort,List Size,Data Type,Best Time (\u00B5s),Avg. Time (\u00B5s),Trials,Distribution\n");
228 | 	
229 | 	for(size_t i = 0; i < sortCount; i++) {
230 | 		sortTrial(times, sorts[i], a, n, bLen, sh, trials, 1);
231 | 		
232 | 		if(i > 0) //sometimes the times are biased for the first sort 
233 | 			printf("%s,%zu,%zu bytes,%lld,%lld,%zu,%zu unique\n", sortNames[i], n, sizeof(VAR_TYPE), times[0], times[1], trials, n >> sh);
234 | 	}
235 | 	printf("\n");
236 | }
237 | 
238 | void sortTrialsTest(long long *times, void (*sorts[])(VAR_TYPE*, size_t, size_t), char *sortNames[], size_t sortCount, void (*shuffle)(VAR_TYPE*, size_t, size_t),
239 |                     VAR_TYPE *a, size_t bLen, size_t sh, size_t trials) {
240 | 	sortTrials(times, sorts, sortNames, sortCount, randArray, a, 1 << sh, bLen, sh-2, trials); // 4 unique
241 | 	sortTrials(times, sorts, sortNames, sortCount, randArray, a, 1 << sh, bLen, sh/2, trials); // sqrt unique
242 | 	sortTrials(times, sorts, sortNames, sortCount, randArray, a, 1 << sh, bLen, 0,    trials); // all unique
243 | }
244 | 
245 | void sortTrialsS(long long *times, void (*sorts[])(VAR_TYPE*, size_t, size_t), char *sortNames[], size_t sortCount, void (*shuffle)(VAR_TYPE*, size_t, size_t),
246 |                 VAR_TYPE *a, size_t n, size_t bLen, size_t *shifts, char *sNames[], size_t sCnt, size_t trials) {
247 | 	
248 | 	printf("Avg. Time (\u00B5s):");
249 | 	for(size_t i = 1; i < sortCount; i++)
250 | 		printf(",%s", sortNames[i]);
251 | 	printf("\n");
252 | 	
253 | 	for(size_t j = 0; j < sCnt; j++) {
254 | 		printf("%s", sNames[j]);
255 | 		size_t sh = shifts[j];
256 | 		
257 | 		for(size_t i = 0; i < sortCount; i++) {
258 | 			sortTrial(times, sorts[i], a, n, bLen, sh, trials, 0);
259 | 			
260 | 			if(i > 0) { //sometimes the times are biased for the first sort 
261 | 				printf(",%.6lf", (double) times[USE_AVG_TIME] / n);
262 | 			}
263 | 		}
264 | 		printf("\n");
265 | 	}
266 | }
267 | 
268 | void sortTrialsN(long long *times, void (*sorts[])(VAR_TYPE*, size_t, size_t), char *sortNames[], size_t sortCount, void (*shuffle)(VAR_TYPE*, size_t, size_t),
269 |                 VAR_TYPE *a, size_t *nList, size_t nCnt, size_t *bList, size_t sh, size_t trials) {
270 | 	
271 | 	printf("Avg.Time per Elem. (\u00B5s)");
272 | 	for(size_t i = 1; i < sortCount; i++)
273 | 		printf(",%s", sortNames[i]);
274 | 	printf("\n");
275 | 	
276 | 	for(size_t j = 0; j < nCnt; j++) {
277 | 		size_t n = nList[j];
278 | 		printf("N = %zu", n); 
279 | 		fflush(stdout);
280 | 		
281 | 		for(size_t i = 0; i < sortCount; i++) {
282 | 			sortTrial(times, sorts[i], a, n, bList[i], sh, trials, 0);
283 | 			
284 | 			if(i > 0) { //sometimes the times are biased for the first sort 
285 | 				printf(",%.6lf", (double) times[USE_AVG_TIME] / n);
286 | 				fflush(stdout);
287 | 			}
288 | 		}
289 | 		printf("\n");
290 | 	}
291 | }
292 | 
293 | //printing array / debugging
294 | 
295 | void printA(VAR_TYPE *data_arr, size_t data_length) {
296 |     while(data_length--) {
297 |         printf("%lld,", (long long) *data_arr);
298 |         data_arr++;
299 |     }
300 |     printf("\n");
301 | }
302 | 
303 | void printABars(VAR_TYPE *a, size_t n) {
304 | 	VAR_TYPE *max = a, *pa = a+1;
305 | 	
306 | 	for(size_t i = n-1; i; i--) {
307 | 		if(cmp(pa, max) > 0) {
308 | 			max = pa;
309 | 		}
310 | 		pa++;
311 | 	}
312 | 	for(VAR_TYPE i = *max; i+1; i--) {
313 | 		for(size_t j = 0; j < n; j++) {
314 | 			if(cmp(a+j, &i) >= 0) 
315 | 				printf("##");
316 | 			else
317 | 				printf(". ");
318 | 		}
319 | 		printf("\n");
320 | 	}
321 | }
322 | 
323 | int main() {
324 | 	size_t n = 1 << 24, b = 512;
325 | 	VAR_TYPE *a = (VAR_TYPE*) malloc(n * sizeof(VAR_TYPE));
326 | 	
327 | 	/*void (*sorts[])(VAR_TYPE*, size_t, size_t) = { logsort, 
328 | 		blitsortTest, 
329 | 		ectasortTest,
330 | 		shelfsortTest, 
331 | 		sqrtsortTest, 
332 | 		octosortTest, 
333 | 		heliumSortTest, 
334 | 		grailsortTest, 
335 | 		qsortTest, 
336 | 		logsort
337 | 	};
338 | 	char *sortNames[] = { "", 
339 | 		"Blitsort (512)", 
340 | 		"Ectasort (\u221AN)",
341 | 		"Shelfsort (\u221AN)",
342 | 		"Sqrtsort (\u221AN)", 
343 | 		"Octosort (512)", 
344 | 		"Helium Sort (512)", 
345 | 		"Grailsort (512)", 
346 | 		"qsort", 
347 | 		"Logsort (512)"
348 | 	};*/
349 | 	
350 | 	void (*sorts[])(VAR_TYPE*, size_t, size_t) = { logsort, 
351 | 		blitsortTest,
352 | 		sqrtsortTest, 
353 | 		octosortTest, 
354 | 		grailsortTest, 
355 | 		logsort
356 | 	};
357 | 	char *sortNames[] = { "", 
358 | 		"Blitsort (512)",
359 | 		"Sqrtsort (\u221AN)", 
360 | 		"Octosort (512)", 
361 | 		"Grailsort (512)", 
362 | 		"Logsort (512)"
363 | 	};
364 | 	
365 | 	size_t trials = 100;
366 | 	size_t sortCount = LEN(sorts);
367 | 	long long times[2];
368 | 	
369 | 	/*const size_t exp = 7;
370 | 	size_t bList[] = {512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512};
371 | 	size_t nList[] = {1<<14, 1<<16, 1<<18, 1<<20, 1<<22, 1<<24};
372 | 	size_t nCnt = LEN(nList);
373 | 	
374 | 	sortTrialsN(times, sorts, sortNames, sortCount, randArray, a, nList, nCnt, bList, 0, 100);*/
375 | 	
376 | 	printf("using buffer size %zu\n\n", b);
377 | 	
378 | 	sortTrialsTest(times, sorts, sortNames, sortCount, randArray, a, b, 14, trials);
379 | 	sortTrialsTest(times, sorts, sortNames, sortCount, randArray, a, b, 20, trials);
380 | 	sortTrialsTest(times, sorts, sortNames, sortCount, randArray, a, b, 24, trials);
381 | 	
382 | 	free(a);
383 | 	
384 | 	return 0;
385 | }
386 | 
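
A comparator that returns `fa - fb` is only safe while the difference fits in an `int`. That holds for the non-negative keys this benchmark generates, but not for arbitrary input. The following is a minimal sketch of an overflow-free three-way comparison that works with `qsort`. The names `cmp_int_safe` and `sort_and_check` are hypothetical helpers for illustration and do not appear in the test harness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Three-way comparison without subtraction: returns -1, 0 or 1. */
int cmp_int_safe(const void *a, const void *b) {
    const int fa = *(const int *)a;
    const int fb = *(const int *)b;
    return (fa > fb) - (fa < fb);
}

/* Sort with qsort and check that the result is non-decreasing. */
void sort_and_check(int *a, size_t n) {
    qsort(a, n, sizeof *a, cmp_int_safe);
    for (size_t i = 1; i < n; i++)
        assert(a[i - 1] <= a[i]);
}
```

With 32-bit `int`, comparing 2000000000 against -2000000000 by subtraction would overflow, whereas the form above still orders them correctly.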


--------------------------------------------------------------------------------