├── .gitignore ├── Cargo.toml ├── README.md ├── fuzz ├── Cargo.toml └── fuzz_targets │ └── fuzz_sort.rs └── src ├── branchless_merge.rs ├── gap_guard.rs ├── glidesort.rs ├── lib.rs ├── merge_reduction.rs ├── mut_slice.rs ├── physical_merges.rs ├── pivot_selection.rs ├── powersort.rs ├── small_sort.rs ├── stable_quicksort.rs ├── tracking.rs └── util.rs /.gitignore: -------------------------------------------------------------------------------- 1 | /fuzz/target 2 | /fuzz/corpus 3 | /fuzz/artifacts 4 | /fuzz/coverage 5 | /target 6 | Cargo.lock 7 | .vscode -------------------------------------------------------------------------------- /Cargo.toml: -------------------------------------------------------------------------------- 1 | [package] 2 | name = "glidesort" 3 | version = "0.1.2" 4 | edition = "2021" 5 | authors = ["Orson Peters "] 6 | description = "Glidesort sorting algorithm" 7 | repository = "https://github.com/orlp/glidesort" 8 | license = "MIT OR Apache-2.0" 9 | readme = "README.md" 10 | keywords = ["algorithm", "sort", "sorting"] 11 | categories = ["algorithms"] 12 | 13 | # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html 14 | 15 | [dependencies] 16 | lazy_static = { version = "1.4.0", optional = true } 17 | 18 | [dev-dependencies] 19 | 20 | [features] 21 | tracking = ["lazy_static"] 22 | unstable = [] 23 | 24 | [profile.release] 25 | lto = "thin" -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Glidesort 2 | 3 | Glidesort is a novel stable sorting algorithm that combines the best-case behavior 4 | of Timsort-style merge sorts for pre-sorted data with the best-case behavior of 5 | [pattern-defeating quicksort](https://github.com/orlp/pdqsort) for data with many duplicates. 6 | It is a comparison-based sort supporting arbitrary comparison operators, 7 | and while exceptional on data with patterns it is also very fast for random data. 8 | 9 | For sorting `n` elements with `k` distinct values glidesort has the following 10 | characteristics by default: 11 | 12 | ``` 13 | Best Average Worst Memory Stable Deterministic 14 | n n log k n log n n / 8 Yes Yes 15 | ``` 16 | 17 | Glidesort can use as much (up to `n`) or as little extra memory as you want. If 18 | given only `O(1)` memory the average and worst case become `O(n (log n)^2)`, however 19 | in practice its performance is great for all but the most skewed data size / 20 | auxiliary space ratios. The default is to allocate up to `n` elements worth of 21 | data, unless this exceeds 1 MiB, in which case we scale this down to `n / 2` 22 | elements worth of data up until 1 GiB after which glidesort uses `n / 8` memory. 23 | 24 | # Benchmark 25 | 26 | Performance varies a lot from machine to machine and dataset to dataset, so your 27 | mileage will vary. Nevertheless, an example benchmark from a 2021 Apple M1 28 | machine comparing against `[T]::sort` and `[T]::sort_unstable` for various input 29 | distributions of `u64`: 30 | 31 | ![Performance graph](https://i.imgur.com/8fIACqY.png) 32 | 33 | Compiled with `rustc 1.69.0-nightly (11d96b593)` using `--release --features unstable` and `lto = "thin"`. 34 | 35 | 36 | # Usage 37 | 38 | Use `cargo add glidesort` and replace `a.sort()` with `glidesort::sort(&mut a)`. 39 | A similar process works for `sort_by` and `sort_by_key`. 40 | 41 | Glidesort exposes two more families of sorting functions. 
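Before describing those, here is a minimal sketch of the drop-in usage above; the values are purely illustrative:

```rust
fn main() {
    let mut a = vec![3u64, 1, 2];
    glidesort::sort(&mut a);                     // drop-in for a.sort()
    assert_eq!(a, vec![1, 2, 3]);

    glidesort::sort_by(&mut a, |x, y| y.cmp(x)); // drop-in for a.sort_by(..)
    glidesort::sort_by_key(&mut a, |x| *x);      // drop-in for a.sort_by_key(..)
    assert_eq!(a, vec![1, 2, 3]);
}
```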
42 | `glidesort::sort_with_buffer(&mut a, buf)` asks you to pass a `&mut 43 | [MaybeUninit]` buffer which it will then (exclusively) use as auxiliary space 44 | to sort the elements. `glidesort::sort_in_vec(&mut v)` behaves like normal 45 | glidesort but will allocate its auxiliary space at the end of the passed `Vec`. 46 | This allows future sorting calls to re-use the same space and reduce allocations. 47 | Both these families also support the `_by` and `_by_key` interface. 48 | 49 | # Visualization 50 | 51 | This visualization focuses on demonstrating the advanced merging techniques in glidesort: 52 | 53 | https://user-images.githubusercontent.com/202547/216675278-e4c8f15c-e42d-4224-b8c7-fdc67fdc2bde.mp4 54 | 55 | This visualization shows how glidesort is adaptive to both pre-existing runs as well 56 | as many duplicates together: 57 | 58 | https://user-images.githubusercontent.com/202547/216675274-6e61689f-a120-4b7c-b1a7-9b5aa5fd013e.mp4 59 | 60 | Note that both visualizations have different small sorting thresholds and 61 | auxiliary memory parameters to show the techniques in action on a smaller scale. 62 | 63 | 64 | # Technique overview 65 | 66 | If you prefer I also have a recorded talk 67 | I gave at FOSDEM 2023 that gives a high level overview of glidesort: 68 | 69 | [![Talk recording preview](https://i.imgur.com/Lcl0KbI.png)](https://fosdem.org/2023/schedule/event/rust_glidesort/) 70 | 71 | Glidesort uses a novel main loop based on powersort. Powersort is similar to 72 | Timsort, using heuristics to find a good order of stably merging sorted runs. 73 | Like powersort it does a linear scan over the input, recognizing any ascending 74 | or strictly descending sequences. However, unlike powersort it does not eagerly 75 | sort sequences that are considered unordered into small sorted blocks. Instead 76 | it processes them as-is, unsorted. This process produces *logical runs*, which 77 | may be sorted or unsorted. 78 | 79 | Glidesort repeatedly uses a *logical* merge operation on these logical runs, as 80 | powersort would. In a logical merge unsorted runs are simply concatenated into 81 | larger unsorted runs. Sorted runs are also concatenated into *double sorted* 82 | runs. Only when merging a sorted and unsorted run finally the unsorted run is 83 | sorted using stable quicksort, and when merging double sorted runs glidesort 84 | uses interleaved ping-pong merges. 85 | 86 | Using this novel hybrid approach glidesort can take advantage of arbitrary 87 | sorted runs in the data as well as process data with many duplicate items faster 88 | similar to pattern-defeating quicksort. 89 | 90 | 91 | # Stable merging 92 | 93 | Glidesort merges multiple sorted runs at the same time, and interleaves their 94 | merging loops for better memory-level and instruction-level parallelism as well 95 | as hiding data dependencies. For similar reasons it also interleaves independent 96 | left-to-right and right-to-left merging loops as bidirectional merges, which are 97 | a generalization of [quadsort](https://github.com/scandum/quadsort)s parity 98 | merges. Merging multiple runs at the same time also lets glidesort use ping-pong 99 | merging, avoiding unnecessary `memcpy` calls by using the implicit copy you get 100 | from an out-of-place merge. All merging loops are completely branchless, making 101 | it fast for random data as well. 
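As a rough illustration of what "branchless" means here, consider the following simplified sketch. It is not glidesort's actual merge loop (which operates on raw pointers, merges from both ends and interleaves several merges at once), and the function name and signature are made up for this example. The point is that the comparison result is used as an index, so there is no data-dependent branch to mispredict:

```rust
/// Simplified sketch: copy one element from `left`/`right` into `dst`.
/// Both indices must be in bounds when this is called.
fn merge_one_step(left: &[u64], right: &[u64], li: &mut usize, ri: &mut usize, dst: &mut Vec<u64>) {
    let take_right = (right[*ri] < left[*li]) as usize; // ties go to left, keeping the merge stable
    dst.push([left[*li], right[*ri]][take_right]);      // select the source by index, not by branching
    *li += 1 - take_right;                              // exactly one of the two indices advances
    *ri += take_right;
}
```

Because neither the copy nor the index updates depend on a taken/not-taken branch, throughput stays consistent even when comparison outcomes are unpredictable, as with random data.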
102 | 103 | Glidesort further uses binary searches to split up large merge operations into 104 | smaller merge operations that it then performs at the same time using 105 | instruction-level parallelism. This splitting procedure also allows glidesort to 106 | use arbitrarily small amounts of memory, as it can choose to split a merge 107 | repeatedly until it fits in our scratch space to process. 108 | 109 | 110 | # Stable quicksort 111 | 112 | Yes, stable quicksort. Wikipedia will outright tell you that quicksort is 113 | unstable, or at least all efficient implementations are. That simply isn't true, 114 | all it needs is auxiliary memory. Credit to Igor van den Hoven's 115 | [fluxsort](https://github.com/scandum/fluxsort) for demonstrating that stable 116 | quicksort can be efficient in practice. 117 | 118 | Glidesort uses a novel bidirectional stable partitioning method that interleaves 119 | a left-to-right partition scan with a right-to-left partition scan for greater 120 | memory-level parallelism and hiding data dependencies. Partitioning is done 121 | entirely branchlessly (if the comparison operator is), giving consistent 122 | performance on all data. 123 | 124 | 125 | # License 126 | 127 | Glidesort is dual-licensed under the Apache License, Version 2.0 and the MIT license. 128 | -------------------------------------------------------------------------------- /fuzz/Cargo.toml: -------------------------------------------------------------------------------- 1 | [package] 2 | name = "glidesort-fuzz" 3 | version = "0.0.0" 4 | publish = false 5 | edition = "2021" 6 | 7 | [package.metadata] 8 | cargo-fuzz = true 9 | 10 | [dependencies] 11 | libfuzzer-sys = "0.4" 12 | bytemuck = "1.12" 13 | 14 | [dependencies.glidesort] 15 | path = ".." 16 | 17 | # Prevent this from interfering with workspaces 18 | [workspace] 19 | members = ["."] 20 | 21 | [profile.release] 22 | debug = 1 23 | 24 | [[bin]] 25 | name = "fuzz_sort" 26 | path = "fuzz_targets/fuzz_sort.rs" 27 | test = false 28 | doc = false 29 | -------------------------------------------------------------------------------- /fuzz/fuzz_targets/fuzz_sort.rs: -------------------------------------------------------------------------------- 1 | #![no_main] 2 | 3 | use libfuzzer_sys::fuzz_target; 4 | use core::cell::Cell; 5 | 6 | fuzz_target!(|data: (&[u8], usize)| { 7 | let _ = std::panic::catch_unwind(|| { 8 | let mut arr: Vec<(u8, u8)> = data.0.chunks_exact(2).map(|c| (c[0], c[1])).collect(); 9 | let mut arr2 = arr.clone(); 10 | let mut arr3 = arr.clone(); 11 | 12 | let mut num_cmp = 0; 13 | glidesort::glidesort_by(&mut arr, |a, b| { 14 | num_cmp += 1; 15 | a.1.cmp(&b.1) 16 | }); 17 | arr2.sort_by(|a, b| { a.1.cmp(&b.1) }); 18 | assert_eq!(arr, arr2); 19 | 20 | // Sadly fuzzer doesn't actually work with catch_unwind yet :( 21 | /* 22 | // Simulate arbitrary exception. 
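        // The fuzzer-chosen index below decides after how many comparisons the
        // comparator panics, which is meant to exercise glidesort's panic-safety
        // (gap-filling) code paths.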
23 | let mut except_point = data.1 % (num_cmp + 1); 24 | glidesort::glidesort_by(&mut arr3, |a, b| { 25 | if except_point == 0 { 26 | panic!("foo"); 27 | } 28 | except_point -= 1; 29 | a.1.cmp(&b.1) 30 | }); 31 | */ 32 | }); 33 | }); 34 | -------------------------------------------------------------------------------- /src/branchless_merge.rs: -------------------------------------------------------------------------------- 1 | /* 2 | When merging two arrays we have six pointers, initially as such: 3 | 4 | left_begin right_begin 5 | | | 6 | [ left ] [ right ] 7 | | | 8 | left_end right_end 9 | 10 | dst_begin 11 | | 12 | [ dst ] 13 | | 14 | dst_end 15 | 16 | Note that in the above picture left and right are disjoint with dst, but for 17 | some inputs one of the two may overlap with dst. 18 | 19 | Logically we have two operations, a merge at the beginning, and a merge at 20 | the end. When disjoint both operations are valid, if left overlaps dst 21 | only merging at the end is valid, if right overlaps dst only merging at the 22 | beginning is valid. 23 | 24 | A merge at the beginning compares the elements at left_begin and 25 | right_begin. It picks the smallest of the two (breaking ties towards left 26 | for stability) and copies it to dst_begin, then increments the pointer it 27 | picked and dst_begin. 28 | 29 | Similarly a merge at the end compares the elements at left_end - 1 and 30 | right_end - 1, picks the larger of the two (breaking ties towards right) 31 | and copies it to dst_end - 1, then decrements the pointer it picked and 32 | dst_end. 33 | 34 | For disjoint input/destination arrays of Copy types we can be a bit more 35 | relaxed in our bounds checks and let the left_begin/left_end (and similarly 36 | for right) pointers cross. To see why, first we note that the following 37 | invariants always hold (and similarly for right): 38 | 39 | left_begin == orig_left_begin + times_left_picked_at_begin 40 | left_end == orig_left_end - times_left_picked_at_end 41 | orig_left_len = orig_left_end - orig_left_begin 42 | 43 | The following invariants hold for a valid comparison operator: 44 | 45 | orig_left_len == times_left_picked_at_begin + times_left_picked_at_end 46 | orig_right_len == times_right_picked_at_begin + times_right_picked_at_end 47 | 48 | Note that as long as we check after merging that left_begin <= left_end, 49 | which should hold for any valid comparison operator by the above invariants, 50 | we can rest assured that no element in left was ever accessed again after 51 | being copied, making the copies valid objects, and similarly for right. When 52 | left or right overlaps with dst we must make sure the begin and end pointers 53 | of that side never cross, otherwise we might be accessing copies before we 54 | know if they'll be valid. 55 | 56 | If this doesn't hold we accessed elements after copying them, making the 57 | copies invalid. In this scenario we copy all elements from the original 58 | arrays into the destination without any further processing. If Ord is 59 | violated we make no guarantees about the output order. 
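As a small concrete illustration (the values are arbitrary): with
left = [1, 4] and right = [2, 3], a merge at the beginning compares 1 and 2
and copies 1 (ties break towards left), advancing left_begin and dst_begin.
A merge at the end compares 4 and 3 and copies 4 (ties break towards right),
decrementing left_end and dst_end. At that point left_begin == left_end, and
the remaining two merge operations only consume elements of right, giving
[1, 2, 3, 4].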
60 | 61 | Finally, we have the following invariants after k merges at the beginning 62 | and k merges at the end: 63 | 64 | times_left_picked_at_begin == k - times_right_picked_at_begin 65 | times_left_picked_at_end == k - times_right_picked_at_end 66 | 67 | Note that all the above invariants guarantee that if left/right are disjoint 68 | with dst and n == orig_left_len == orig_right_len, simply doing n merges at 69 | the beginning and n merges at the end fully merges the result. 70 | */ 71 | 72 | use core::marker::PhantomData; 73 | 74 | use crate::gap_guard::GapGuard; 75 | use crate::mut_slice::states::{AlwaysInit, Init, Uninit, Weak}; 76 | use crate::mut_slice::{Brand, MutSlice, Unbranded}; 77 | use crate::tracking::ptr; 78 | use crate::util::*; 79 | 80 | pub struct GapLeft; // Can write to dst at begin. 81 | pub struct GapRight; // Can write to dst at end. 82 | pub struct GapBoth; // Can write on both ends. 83 | 84 | pub trait HasLeftGap {} 85 | impl HasLeftGap for GapLeft {} 86 | impl HasLeftGap for GapBoth {} 87 | pub trait HasRightGap {} 88 | impl HasRightGap for GapRight {} 89 | impl HasRightGap for GapBoth {} 90 | 91 | // Invariants: 92 | // - If G: HasLeftGap then left_begin < left_end implies left_begin is disjoint with 93 | // dst.begin(), and similarly for right. 94 | // - If G: HasRightGap then left_begin < left_end implies left_end.sub(1) is disjoint 95 | // with dst.end().sub(1) and similarly for right. 96 | pub struct BranchlessMergeState<'l, 'r, 'dst, T, G> { 97 | dst: MutSlice<'dst, Unbranded, T, Weak>, 98 | 99 | // We don't use slices for these as in the disjoint case the left/right 100 | // pointers might cross for invalid comparison operators. 101 | left_begin: *mut T, 102 | left_end: *mut T, 103 | right_begin: *mut T, 104 | right_end: *mut T, 105 | _gap: G, 106 | _lt: PhantomData<(&'l mut (), &'r mut ())>, 107 | } 108 | 109 | impl<'l, 'r, 'dst, T, G> BranchlessMergeState<'l, 'r, 'dst, T, G> { 110 | fn new( 111 | left: MutSlice<'l, BL, T, Weak>, 112 | right: MutSlice<'r, BR, T, Weak>, 113 | dst: MutSlice<'dst, BD, T, Weak>, 114 | gap: G, 115 | ) -> Self { 116 | if left.len() + right.len() != dst.len() { 117 | abort(); 118 | } 119 | 120 | Self { 121 | left_begin: left.begin(), 122 | left_end: left.end(), 123 | right_begin: right.begin(), 124 | right_end: right.end(), 125 | dst: dst.weak().forget_brand(), 126 | _gap: gap, 127 | _lt: PhantomData, 128 | } 129 | } 130 | } 131 | 132 | impl<'l, 'r, 'dst, T> BranchlessMergeState<'l, 'r, 'dst, T, GapBoth> { 133 | pub fn new_disjoint( 134 | left: MutSlice<'l, BL, T, Init>, 135 | right: MutSlice<'r, BR, T, Init>, 136 | dst: MutSlice<'dst, BD, T, Uninit>, 137 | ) -> Self { 138 | Self::new(left.weak(), right.weak(), dst.weak(), GapBoth) 139 | } 140 | } 141 | 142 | impl<'l, 'r, T> BranchlessMergeState<'l, 'r, 'r, T, GapLeft> { 143 | pub fn new_gap_left( 144 | left: GapGuard<'l, 'r, BL, BR, T>, 145 | right: MutSlice<'r, BR, T, AlwaysInit>, 146 | ) -> Self { 147 | unsafe { 148 | // SAFETY: our drop impl will always fill the gap. 149 | let dst = left.gap_weak().concat(right.weak()); 150 | let left = left.take_disjoint().0.weak(); 151 | let right = right.raw().weak(); 152 | Self::new(left, right, dst, GapLeft) 153 | } 154 | } 155 | } 156 | 157 | impl<'l, 'r, T> BranchlessMergeState<'l, 'r, 'l, T, GapRight> { 158 | pub fn new_gap_right( 159 | left: MutSlice<'l, BL, T, AlwaysInit>, 160 | right: GapGuard<'r, 'l, BR, BL, T>, 161 | ) -> Self { 162 | unsafe { 163 | // SAFETY: our drop impl will always fill the gap. 
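            // Here dst is left's storage followed by the gap directly after it;
            // with a right-side gap the merge proceeds from the end, so writes
            // fill the gap before they reach left's storage.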
164 | let dst = left.weak().concat(right.gap_weak()); 165 | let left = left.raw().weak(); 166 | let right = right.take_disjoint().0.weak(); 167 | Self::new(left, right, dst, GapRight) 168 | } 169 | } 170 | } 171 | 172 | impl<'l, 'r, 'dst, T, G: HasLeftGap> BranchlessMergeState<'l, 'r, 'dst, T, G> { 173 | /// Merges one element from left, right into the destination, reading/writing 174 | /// at the begin of all the slices. If is_less panics, does nothing. 175 | #[inline(always)] 176 | pub unsafe fn branchless_merge_one_at_begin>(&mut self, is_less: &mut F) { 177 | unsafe { 178 | // Adding 1 and subtracting right_less gave *significantly* faster 179 | // codegen than adding !right_less on my Intel machine since it 180 | // avoided a stack spill, giving much better interleaving after 181 | // inlining. 182 | // 183 | // Do not touch unless you're benchmarking on multiple architectures. 184 | let left_scan = self.left_begin; 185 | let right_scan = self.right_begin; 186 | let right_less = is_less(&*right_scan, &*left_scan); 187 | let src = select(right_less, right_scan, left_scan); 188 | ptr::copy_nonoverlapping(src, self.dst.begin(), 1); 189 | self.dst.add_begin(1); 190 | // self.left_begin = self.left_begin.add((!right_less) as usize); 191 | // self.right_begin = self.right_begin.add(right_less as usize); 192 | self.left_begin = self.left_begin.wrapping_sub(right_less as usize); // Might go out-of-bounds. 193 | self.right_begin = self.right_begin.add(right_less as usize); 194 | self.left_begin = self.left_begin.wrapping_add(1).add(0); // Back in-bounds. 195 | } 196 | } 197 | 198 | /// Exactly the same as branchless_merge_one_at_begin, but does not cause 199 | /// out-of-bounds accesses if *one* of left, right is empty. 200 | #[inline] 201 | pub unsafe fn branchless_merge_one_at_begin_imbalance_guarded>( 202 | &mut self, 203 | is_less: &mut F, 204 | ) { 205 | unsafe { 206 | // Do not touch unless you're benchmarking on multiple architectures. 207 | // See branchless_merge_one_at_begin. 208 | let left_empty = self.left_begin == self.left_end; 209 | let right_nonempty = self.right_begin != self.right_end; 210 | let left_scan = select(left_empty, self.right_begin, self.left_begin); 211 | let right_scan = select(right_nonempty, self.right_begin, self.left_begin); 212 | let right_less = is_less(&*right_scan, &*left_scan); 213 | let shrink_right = right_less & right_nonempty | left_empty; 214 | 215 | let src = select(right_less, right_scan, left_scan); 216 | ptr::copy(src, self.dst.begin(), 1); 217 | self.dst.add_begin(1); 218 | self.left_begin = self.left_begin.wrapping_sub(shrink_right as usize); // Might go out-of-bounds. 219 | self.right_begin = self.right_begin.add(shrink_right as usize); 220 | self.left_begin = self.left_begin.wrapping_add(1).add(0); // Back in-bounds. 221 | } 222 | } 223 | } 224 | 225 | impl<'l, 'r, 'dst, T, G: HasRightGap> BranchlessMergeState<'l, 'r, 'dst, T, G> { 226 | /// Merges one element from left, right into the destination, reading/writing 227 | /// at the end of all the slices. If is_less panics, does nothing. 228 | #[inline(always)] 229 | pub unsafe fn branchless_merge_one_at_end>(&mut self, is_less: &mut F) { 230 | unsafe { 231 | // Do not touch unless you're benchmarking on multiple architectures. 232 | // See branchless_merge_one_at_begin. 
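            // Mirror of the begin case: compare the last elements of left and
            // right, and copy the larger one (ties break towards right, keeping
            // the merge stable) into the last free slot of dst.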
233 | let left_scan = self.left_end.sub(1); 234 | let right_scan = self.right_end.sub(1); 235 | let right_less = is_less(&*right_scan, &*left_scan); 236 | let src = select(right_less, left_scan, right_scan); 237 | self.dst.sub_end(1); 238 | ptr::copy_nonoverlapping(src, self.dst.end(), 1); 239 | // self.left_end = self.left_end.sub(right_less as usize); 240 | // self.right_end = self.right_end.sub((!right_less) as usize); 241 | self.right_end = self.right_end.wrapping_add(right_less as usize); // Might go out-of-bounds. 242 | self.left_end = self.left_end.sub(right_less as usize); 243 | self.right_end = self.right_end.wrapping_sub(1).add(0); // Back in-bounds. 244 | } 245 | } 246 | 247 | /// Exactly the same as branchless_merge_one_at_end, but does not cause 248 | /// out-of-bounds accesses if *one* of left, right is empty. 249 | #[inline] 250 | pub unsafe fn branchless_merge_one_at_end_imbalance_guarded>( 251 | &mut self, 252 | is_less: &mut F, 253 | ) { 254 | unsafe { 255 | let left_nonempty = self.left_begin != self.left_end; 256 | let right_empty = self.right_begin == self.right_end; 257 | let left_scan = select(left_nonempty, self.left_end, self.right_end).sub(1); 258 | let right_scan = select(right_empty, self.left_end, self.right_end).sub(1); 259 | let right_less = is_less(&*right_scan, &*left_scan); 260 | let shrink_left = right_less & left_nonempty | right_empty; 261 | 262 | let src = select(right_less, left_scan, right_scan); 263 | self.dst.sub_end(1); 264 | ptr::copy(src, self.dst.end(), 1); 265 | self.right_end = self.right_end.wrapping_add(shrink_left as usize); // Might go out-of-bounds. 266 | self.left_end = self.left_end.sub(shrink_left as usize); 267 | self.right_end = self.right_end.wrapping_sub(1).add(0); // Back in-bounds. 268 | } 269 | } 270 | } 271 | 272 | impl<'l, 'r, 'dst, T, G> BranchlessMergeState<'l, 'r, 'dst, T, G> { 273 | /// In a symmetric merge (k == left_len == right_len) we merge at begin and 274 | /// at end exactly k times. We start at a total size of 2k and each merge 275 | /// reduces it by one. Thus if we check left_begin == left_end we also know 276 | /// right_begin == right_end. This indicates a successful merge, only an 277 | /// invalid comparison operator can violate this (safely, as long as we do 278 | /// not read the elements in the destination). 279 | #[inline(always)] 280 | pub fn symmetric_merge_successful(&self) -> bool { 281 | // Yes, not also checking right_begin == right_end for sanity was ~1% 282 | // slower overall. 283 | self.left_begin == self.left_end 284 | } 285 | 286 | /// It is safe to call a merge operation this many times. 287 | /// If 0 is returned the merge is effectively done since one of the sides is 288 | /// empty. 289 | pub fn num_safe_merge_ops(&self) -> usize { 290 | unsafe { 291 | let left_len = self.left_end.offset_from(self.left_begin); 292 | let right_len = self.right_end.offset_from(self.right_begin); 293 | let min = left_len.min(right_len); 294 | if min < 0 { 295 | // Our scan pointers crossed. This can only happen because 296 | // someone called branchless_merge_one_at_* directly, in which 297 | // case they should not have called this function. 
298 | abort(); 299 | } 300 | min as usize 301 | } 302 | } 303 | } 304 | 305 | impl<'l, 'r, 'dst, T> BranchlessMergeState<'l, 'r, 'dst, T, GapLeft> { 306 | #[inline(never)] 307 | pub fn finish_merge>(mut self, is_less: &mut F) { 308 | loop { 309 | let n = self.num_safe_merge_ops(); 310 | if n == 0 { 311 | return; 312 | } 313 | 314 | unsafe { 315 | // SAFETY: we just queried that this many merge ops is safe. 316 | for _ in 0..n / 2 { 317 | self.branchless_merge_one_at_begin(is_less); 318 | self.branchless_merge_one_at_begin(is_less); 319 | } 320 | for _ in 0..n % 2 { 321 | self.branchless_merge_one_at_begin(is_less); 322 | } 323 | } 324 | } 325 | } 326 | } 327 | 328 | impl<'l, 'r, 'dst, T> BranchlessMergeState<'l, 'r, 'dst, T, GapRight> { 329 | #[inline(never)] 330 | pub fn finish_merge>(mut self, is_less: &mut F) { 331 | loop { 332 | let n = self.num_safe_merge_ops(); 333 | if n == 0 { 334 | return; 335 | } 336 | 337 | unsafe { 338 | // SAFETY: we just queried that this many merge ops is safe. 339 | for _ in 0..n / 2 { 340 | self.branchless_merge_one_at_end(is_less); 341 | self.branchless_merge_one_at_end(is_less); 342 | } 343 | for _ in 0..n % 2 { 344 | self.branchless_merge_one_at_end(is_less); 345 | } 346 | } 347 | } 348 | } 349 | } 350 | 351 | impl<'l, 'r, 'dst, T> BranchlessMergeState<'l, 'r, 'dst, T, GapBoth> { 352 | #[inline(never)] 353 | pub fn finish_merge>(mut self, is_less: &mut F) { 354 | loop { 355 | let n = self.num_safe_merge_ops(); 356 | if n == 0 { 357 | return; 358 | } 359 | 360 | unsafe { 361 | // SAFETY: we just queried that this many merge ops is safe. 362 | for _ in 0..n / 4 { 363 | self.branchless_merge_one_at_begin(is_less); 364 | self.branchless_merge_one_at_end(is_less); 365 | self.branchless_merge_one_at_begin(is_less); 366 | self.branchless_merge_one_at_end(is_less); 367 | } 368 | for _ in 0..n % 4 { 369 | self.branchless_merge_one_at_begin(is_less); 370 | } 371 | } 372 | } 373 | } 374 | 375 | #[inline(never)] 376 | pub fn finish_merge_interleaved>(mut self, mut other: Self, is_less: &mut F) { 377 | // Interleave loops while possible. 378 | loop { 379 | let common_remaining = self.num_safe_merge_ops().min(other.num_safe_merge_ops()); 380 | if common_remaining < 2 { 381 | break; 382 | } 383 | 384 | unsafe { 385 | // SAFETY: we just checked that this many merge operations is okay for both. 386 | for _ in 0..common_remaining / 2 { 387 | self.branchless_merge_one_at_begin(is_less); 388 | other.branchless_merge_one_at_begin(is_less); 389 | self.branchless_merge_one_at_end(is_less); 390 | other.branchless_merge_one_at_end(is_less); 391 | } 392 | } 393 | } 394 | 395 | self.finish_merge(is_less); 396 | other.finish_merge(is_less); 397 | } 398 | } 399 | 400 | impl<'l, 'r, 'dst, T, G> Drop for BranchlessMergeState<'l, 'r, 'dst, T, G> { 401 | fn drop(&mut self) { 402 | unsafe { 403 | // Extra sanity check. 404 | let left_len = self 405 | .left_end 406 | .offset_from(self.left_begin) 407 | .try_into() 408 | .unwrap_abort(); 409 | let right_len = self 410 | .right_end 411 | .offset_from(self.right_begin) 412 | .try_into() 413 | .unwrap_abort(); 414 | assert_abort(left_len + right_len == self.dst.len()); 415 | 416 | // SAFETY: ok by our sanity check. 
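            // Copy whatever remains unmerged of left and right, in order, into
            // the unwritten part of dst. This leaves the destination fully
            // initialized even if is_less panicked partway through a merge.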
417 | let dst_begin = self.dst.begin(); 418 | let mid = dst_begin.add(left_len); 419 | ptr::copy(self.left_begin, dst_begin, left_len); 420 | ptr::copy(self.right_begin, mid, right_len); 421 | } 422 | } 423 | } 424 | -------------------------------------------------------------------------------- /src/gap_guard.rs: -------------------------------------------------------------------------------- 1 | use crate::mut_slice::states::{Init, Uninit, Weak}; 2 | use crate::mut_slice::{Brand, MutSlice}; 3 | use crate::tracking::ptr; 4 | use crate::util::*; 5 | 6 | /// A struct that makes sure the gap is always filled. 7 | pub struct GapGuard<'data, 'gap, DataB, GapB, T> { 8 | data: MutSlice<'data, DataB, T, Weak>, 9 | gap: MutSlice<'gap, GapB, T, Weak>, 10 | } 11 | 12 | impl<'data, 'gap, DataB: Brand, GapB: Brand, T> GapGuard<'data, 'gap, DataB, GapB, T> { 13 | pub fn new_disjoint( 14 | data: MutSlice<'data, DataB, T, Init>, 15 | gap: MutSlice<'gap, GapB, T, Uninit>, 16 | ) -> Self { 17 | unsafe { Self::new_unchecked(data.weak(), gap.weak()) } 18 | } 19 | 20 | /// SAFETY: it is the caller's responsibility to make sure that when any 21 | /// method other than len, split_at, concat, data_weak or gap_weak is called, 22 | /// or when this struct is dropped, that data is Init. 23 | pub unsafe fn new_unchecked( 24 | data: MutSlice<'data, DataB, T, Weak>, 25 | gap: MutSlice<'gap, GapB, T, Weak>, 26 | ) -> Self { 27 | if data.len() != gap.len() { 28 | abort(); 29 | } 30 | 31 | Self { data, gap } 32 | } 33 | 34 | pub fn len(&self) -> usize { 35 | self.data.len() 36 | } 37 | 38 | pub fn data_weak(&self) -> MutSlice<'data, DataB, T, Weak> { 39 | self.data.weak() 40 | } 41 | 42 | pub fn gap_weak(&self) -> MutSlice<'gap, GapB, T, Weak> { 43 | self.gap.weak() 44 | } 45 | 46 | pub fn split_at(self, i: usize) -> Option<(Self, Self)> { 47 | unsafe { 48 | let (data, gap) = self.take_disjoint(); 49 | if let Some((dl, dr)) = data.split_at(i) { 50 | let (gl, gr) = gap.split_at(i).unwrap_abort(); 51 | Some(( 52 | GapGuard::new_disjoint(dl, gl), 53 | GapGuard::new_disjoint(dr, gr), 54 | )) 55 | } else { 56 | None 57 | } 58 | } 59 | } 60 | 61 | pub fn take_data(self) -> MutSlice<'data, DataB, T, Init> { 62 | unsafe { 63 | let data = self.data.weak(); 64 | core::mem::forget(self); 65 | data.upgrade().assume_init() 66 | } 67 | } 68 | 69 | pub fn as_mut_slice(&mut self) -> &mut [T] { 70 | unsafe { 71 | let begin = self.data.begin(); 72 | core::slice::from_raw_parts_mut(begin, self.data.len()) 73 | } 74 | } 75 | 76 | /// Borrows the gap slice. 77 | /// 78 | /// SAFETY: the gap must be disjoint from the data. 79 | pub unsafe fn borrow_gap<'a>(&'a mut self) -> MutSlice<'a, GapB, T, Uninit> { 80 | unsafe { self.gap.clone().upgrade().assume_uninit() } 81 | } 82 | 83 | /// SAFETY: it is now the callers responsibility to make sure the gap is 84 | /// always filled. The data and gap slices must be disjoint. 
85 | pub unsafe fn take_disjoint( 86 | self, 87 | ) -> ( 88 | MutSlice<'data, DataB, T, Init>, 89 | MutSlice<'gap, GapB, T, Uninit>, 90 | ) { 91 | unsafe { 92 | let data = self.data.weak(); 93 | let gap = self.gap.weak(); 94 | core::mem::forget(self); 95 | (data.upgrade().assume_init(), gap.upgrade().assume_uninit()) 96 | } 97 | } 98 | } 99 | 100 | impl<'gap, 'data, GapB, DataB, T> Drop for GapGuard<'gap, 'data, GapB, DataB, T> { 101 | #[inline(never)] 102 | #[cold] 103 | fn drop(&mut self) { 104 | unsafe { 105 | let data_ptr = self.data.begin(); 106 | let gap_ptr = self.gap.begin(); 107 | ptr::copy(data_ptr, gap_ptr, self.data.len()); 108 | } 109 | } 110 | } 111 | -------------------------------------------------------------------------------- /src/glidesort.rs: -------------------------------------------------------------------------------- 1 | use core::mem::MaybeUninit; 2 | 3 | use crate::mut_slice::states::{AlwaysInit, Uninit}; 4 | use crate::mut_slice::{Brand, MutSlice}; 5 | use crate::physical_merges::{physical_merge, physical_quad_merge, physical_triple_merge}; 6 | use crate::stable_quicksort::quicksort; 7 | #[cfg(feature = "tracking")] 8 | use crate::tracking::ptr; 9 | use crate::util::*; 10 | use crate::{powersort, small_sort, tracking, SMALL_SORT}; 11 | 12 | /// A logical run of elements, which can be a series of unsorted elements, 13 | /// a sorted run, or two sorted runs adjacent in memory. 14 | pub enum LogicalRun<'l, B: Brand, T> { 15 | Unsorted(MutSlice<'l, B, T, AlwaysInit>), 16 | Sorted(MutSlice<'l, B, T, AlwaysInit>), 17 | DoubleSorted(MutSlice<'l, B, T, AlwaysInit>, usize), 18 | } 19 | 20 | impl<'l, B: Brand, T> LogicalRun<'l, B, T> { 21 | /// The length of this logical run. 22 | fn len(&self) -> usize { 23 | match self { 24 | LogicalRun::Unsorted(r) => r.len(), 25 | LogicalRun::Sorted(r) => r.len(), 26 | LogicalRun::DoubleSorted(r, _mid) => r.len(), 27 | } 28 | } 29 | 30 | /// Create a new logical run at the start of el, returning the rest. 31 | fn create>( 32 | mut el: MutSlice<'l, B, T, AlwaysInit>, 33 | is_less: &mut F, 34 | eager_smallsort: bool, 35 | ) -> (Self, MutSlice<'l, B, T, AlwaysInit>) { 36 | // Check if input is (partially) pre-sorted in a meaningful way. 37 | if el.len() >= SMALL_SORT { 38 | let (run_length, descending) = run_length_at_start(el.as_mut_slice(), is_less); 39 | if run_length >= SMALL_SORT && run_length * run_length >= el.len() / 2 { 40 | if descending { 41 | #[cfg(feature = "tracking")] 42 | { 43 | for i in 0..run_length / 2 { 44 | unsafe { 45 | ptr::swap_nonoverlapping( 46 | el.begin().add(i), 47 | el.begin().add(run_length - 1 - i), 48 | 1, 49 | ); 50 | } 51 | } 52 | } 53 | #[cfg(not(feature = "tracking"))] 54 | { 55 | el.as_mut_slice()[..run_length].reverse(); 56 | } 57 | } 58 | let (run, rest) = el.split_at(run_length).unwrap(); 59 | return (LogicalRun::Sorted(run), rest); 60 | } 61 | } 62 | 63 | // Otherwise create a small unsorted run. Capping this at SMALL_SORT ensures 64 | // we're always able to sort this later, regardless of scratch space size. 65 | let skip = SMALL_SORT.min(el.len()); 66 | let (mut run, rest) = el.split_at(skip).unwrap(); 67 | if eager_smallsort { 68 | small_sort::small_sort(run.borrow(), is_less); 69 | (LogicalRun::Sorted(run), rest) 70 | } else { 71 | (LogicalRun::Unsorted(run), rest) 72 | } 73 | } 74 | 75 | /// Merges runs self (left) and right using the given scratch space. 76 | /// 77 | /// Panics if is_less does, aborts if left, right aren't contiguous. 
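    ///
    /// Unsorted runs are concatenated only while the result still fits in the
    /// scratch space; otherwise the unsorted side is quicksorted first. Two
    /// sorted runs are combined into a `DoubleSorted` run, and merges involving
    /// a `DoubleSorted` run are performed physically right away.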
78 | fn logical_merge<'sc, BS: Brand, F: Cmp>( 79 | self, 80 | right: LogicalRun<'l, B, T>, 81 | mut scratch: MutSlice<'sc, BS, T, Uninit>, 82 | is_less: &mut F, 83 | ) -> LogicalRun<'l, B, T> { 84 | use LogicalRun::*; 85 | match (self, right) { 86 | // Only combine unsorted runs if it still fits in the scratch space. 87 | (Unsorted(l), Unsorted(r)) if l.len() + r.len() <= scratch.len() => { 88 | Unsorted(l.concat(r)) 89 | } 90 | (Unsorted(l), r) => { 91 | let l = quicksort(l, scratch.borrow(), is_less); 92 | Sorted(l).logical_merge(r, scratch, is_less) 93 | } 94 | (l, Unsorted(r)) => { 95 | let r = quicksort(r, scratch.borrow(), is_less); 96 | l.logical_merge(Sorted(r), scratch, is_less) 97 | } 98 | (Sorted(l), Sorted(r)) => { 99 | let mid = l.len(); 100 | DoubleSorted(l.concat(r), mid) 101 | } 102 | (DoubleSorted(l, mid), Sorted(r)) => { 103 | let (l0, l1) = l.split_at(mid).unwrap(); 104 | Sorted(physical_triple_merge(l0, l1, r, scratch, is_less)) 105 | } 106 | (Sorted(l), DoubleSorted(r, mid)) => { 107 | let (r0, r1) = r.split_at(mid).unwrap(); 108 | Sorted(physical_triple_merge(l, r0, r1, scratch, is_less)) 109 | } 110 | (DoubleSorted(l, lmid), DoubleSorted(r, rmid)) => { 111 | let (l0, l1) = l.split_at(lmid).unwrap(); 112 | let (r0, r1) = r.split_at(rmid).unwrap(); 113 | Sorted(physical_quad_merge(l0, l1, r0, r1, scratch, is_less)) 114 | } 115 | } 116 | } 117 | 118 | /// Ensures that this run is physically sorted. 119 | fn physical_sort<'sc, BS: Brand, F: Cmp>( 120 | self, 121 | scratch: MutSlice<'sc, BS, T, Uninit>, 122 | is_less: &mut F, 123 | ) -> MutSlice<'l, B, T, AlwaysInit> { 124 | match self { 125 | LogicalRun::Sorted(run) => run, 126 | LogicalRun::Unsorted(run) => quicksort(run, scratch, is_less), 127 | LogicalRun::DoubleSorted(run, mid) => { 128 | let (left, right) = run.split_at(mid).unwrap(); 129 | physical_merge(left, right, scratch, is_less) 130 | } 131 | } 132 | } 133 | } 134 | 135 | // Each logical run on the merge stack represents a node in the merge tree. This 136 | // node has fully completed merging its left children, the result of these merge 137 | // operations is the logical run stored on the stack (even though physically the 138 | // run might be DoubleSorted in which case it would still need one more merge 139 | // operation). 140 | // 141 | // Each node doesn't know exactly its depth in the final merge tree, but it does 142 | // know which depth it would *like* to have in the final merge tree. Using these 143 | // desired depths calculated using Powersort's logic we decide which logical 144 | // runs to merge if any when a new run arrives. Powersort guarantees that our 145 | // stack size remains constant if we follow these merge depths as any desired 146 | // depth is less than 64 and the desired depths on the stack is strictly ascending. 147 | struct MergeStack<'l, B: Brand, T> { 148 | left_children: [MaybeUninit>; 64], 149 | desired_depths: [MaybeUninit; 64], 150 | len: usize, 151 | } 152 | 153 | impl<'l, B: Brand, T> MergeStack<'l, B, T> { 154 | /// Creates an empty merge stack. 155 | fn new() -> Self { 156 | unsafe { 157 | // SAFETY: an array of MaybeUninit's is trivially init. 158 | Self { 159 | left_children: MaybeUninit::uninit().assume_init(), 160 | desired_depths: MaybeUninit::uninit().assume_init(), 161 | len: 0, 162 | } 163 | } 164 | } 165 | 166 | /// Push a merge node on the stack given its left child and desired depth. 
167 | fn push_node(&mut self, left_child: LogicalRun<'l, B, T>, desired_depth: u8) { 168 | self.left_children[self.len] = MaybeUninit::new(left_child); 169 | self.desired_depths[self.len] = MaybeUninit::new(desired_depth); 170 | self.len += 1; 171 | } 172 | 173 | /// Pop a merge node off the stack, returning its left child. 174 | fn pop_node(&mut self) -> Option> { 175 | if self.len == 0 { 176 | return None; 177 | } 178 | 179 | // SAFETY: len > 0 guarantees this is initialized by a previous push. 180 | self.len -= 1; 181 | Some(unsafe { 182 | self.left_children 183 | .get_unchecked(self.len) 184 | .assume_init_read() 185 | }) 186 | } 187 | 188 | /// Returns the desired depth of the merge node at the top of the stack. 189 | fn peek_desired_depth(&self) -> Option { 190 | if self.len == 0 { 191 | return None; 192 | } 193 | 194 | // SAFETY: len > 0 guarantees this is initialized by a previous push. 195 | Some(unsafe { 196 | self.desired_depths 197 | .get_unchecked(self.len - 1) 198 | .assume_init() 199 | }) 200 | } 201 | } 202 | 203 | pub fn glidesort<'el, 'sc, BE: Brand, BS: Brand, T, F: Cmp>( 204 | mut el: MutSlice<'el, BE, T, AlwaysInit>, 205 | mut scratch: MutSlice<'sc, BS, T, Uninit>, 206 | is_less: &mut F, 207 | eager_smallsort: bool, 208 | ) { 209 | if scratch.len() < SMALL_SORT { 210 | // Sanity fallback, we *need* at least SMALL_SORT buffer size. 211 | let mut v = Vec::with_capacity(SMALL_SORT); 212 | let (_, new_buffer) = split_at_spare_mut(&mut v); 213 | return MutSlice::from_maybeuninit_mut_slice(new_buffer, |new_scratch| { 214 | glidesort(el, new_scratch.assume_uninit(), is_less, eager_smallsort) 215 | }); 216 | } 217 | 218 | tracking::register_buffer("input", el.weak()); 219 | tracking::register_buffer("scratch", scratch.weak()); 220 | 221 | let scale_factor = powersort::merge_tree_scale_factor(el.len()); 222 | let mut merge_stack = MergeStack::new(); 223 | 224 | let mut prev_run_start_idx = 0; 225 | let mut prev_run; 226 | (prev_run, el) = LogicalRun::create(el, is_less, eager_smallsort); 227 | while el.len() > 0 { 228 | let next_run_start_idx = prev_run_start_idx + prev_run.len(); 229 | let next_run; 230 | (next_run, el) = LogicalRun::create(el, is_less, eager_smallsort); 231 | 232 | let desired_depth = powersort::merge_tree_depth( 233 | prev_run_start_idx, 234 | next_run_start_idx, 235 | next_run_start_idx + next_run.len(), 236 | scale_factor, 237 | ); 238 | 239 | // Create the left child of our next node and eagerly merge all nodes 240 | // with a deeper desired merge depth into it. 241 | let mut left_child = prev_run; 242 | while merge_stack 243 | .peek_desired_depth() 244 | .map(|top_depth| top_depth >= desired_depth) 245 | .unwrap_or(false) 246 | { 247 | let left_descendant = merge_stack.pop_node().unwrap(); 248 | left_child = left_descendant.logical_merge(left_child, scratch.borrow(), is_less); 249 | } 250 | 251 | merge_stack.push_node(left_child, desired_depth); 252 | prev_run_start_idx = next_run_start_idx; 253 | prev_run = next_run; 254 | } 255 | 256 | // Collapse the stack down to a single logical run and physically sort it. 257 | let mut result = prev_run; 258 | while let Some(left_child) = merge_stack.pop_node() { 259 | result = left_child.logical_merge(result, scratch.borrow(), is_less); 260 | } 261 | result.physical_sort(scratch, is_less); 262 | 263 | tracking::deregister_buffer("input"); 264 | tracking::deregister_buffer("scratch"); 265 | } 266 | 267 | /// Returns the length of the run at the start of v, and if that run is 268 | /// strictly descending. 
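/// For example, `[3, 2, 1, 5]` yields `(3, true)` and `[1, 2, 2, 0]` yields
/// `(3, false)`; equal neighbours only extend non-descending runs, so the later
/// reversal of a descending run cannot reorder equal elements.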
269 | fn run_length_at_start>(v: &[T], is_less: &mut F) -> (usize, bool) { 270 | let descending = v.len() >= 2 && is_less(&v[1], &v[0]); 271 | if descending { 272 | for i in 2..v.len() { 273 | if !is_less(&v[i], &v[i - 1]) { 274 | return (i, true); 275 | } 276 | } 277 | } else { 278 | for i in 2..v.len() { 279 | if is_less(&v[i], &v[i - 1]) { 280 | return (i, false); 281 | } 282 | } 283 | } 284 | (v.len(), descending) 285 | } 286 | -------------------------------------------------------------------------------- /src/lib.rs: -------------------------------------------------------------------------------- 1 | #![deny(unsafe_op_in_unsafe_fn)] 2 | #![cfg_attr(feature = "unstable", feature(core_intrinsics))] 3 | #![cfg_attr(feature = "unstable", allow(incomplete_features))] 4 | #![cfg_attr(feature = "unstable", feature(specialization, marker_trait_attr))] 5 | #![allow(clippy::needless_lifetimes)] // Improves readability despite what clippy claims. 6 | #![allow(clippy::type_complexity)] // Somethings things get complex... 7 | #![allow(clippy::unnecessary_mut_passed)] // Exclusivity assertions. 8 | #![allow(clippy::forget_non_drop)] // Forgetting soon-to-be-overlapped slices is important. 9 | 10 | //! Glidesort is a novel stable sorting algorithm that combines the best-case behavior of 11 | //! Timsort-style merge sorts for pre-sorted data with the best-case behavior of pattern-defeating 12 | //! quicksort for data with many duplicates. It is a comparison-based sort supporting arbitrary 13 | //! comparison operators, and while exceptional on data with patterns it is also very fast for 14 | //! random data. 15 | //! 16 | //! For more information see the [readme](https://github.com/orlp/glidesort). 17 | 18 | // We avoid a dynamic allocation for our scratch buffer if a scratch buffer of 19 | // this size is sufficient and the user did not provide one. 20 | const MAX_STACK_SCRATCH_SIZE_BYTES: usize = 4096; 21 | 22 | // When sorting N elements we allocate a buffer of at most size N, N/2 or N/8 23 | // depending on how large the data is. 24 | const FULL_ALLOC_MAX_BYTES: usize = 1024 * 1024; 25 | const HALF_ALLOC_MAX_BYTES: usize = 1024 * 1024 * 1024; 26 | 27 | // If the total size of a merge operation is above this threshold glidesort will 28 | // attempt to split it into (instruction-level) parallel merges when applicable. 29 | const MERGE_SPLIT_THRESHOLD: usize = 32; 30 | 31 | // Recursively select a pseudomedian if above this threshold. 32 | const PSEUDO_MEDIAN_REC_THRESHOLD: usize = 64; 33 | 34 | // For this many or fewer elements we switch to our small sorting algorithm. 35 | const SMALL_SORT: usize = 48; 36 | 37 | // We always need the tracking module internally to provide a fallback dummy 38 | // implementation to prevent adding conditional compilation everywhere. 
39 | #[cfg(not(feature = "tracking"))] 40 | mod tracking; 41 | #[cfg(feature = "tracking")] 42 | pub mod tracking; 43 | 44 | mod branchless_merge; 45 | mod gap_guard; 46 | mod glidesort; 47 | mod merge_reduction; 48 | mod mut_slice; 49 | mod physical_merges; 50 | mod pivot_selection; 51 | mod powersort; 52 | mod small_sort; 53 | mod stable_quicksort; 54 | mod util; 55 | 56 | use core::cmp::Ordering; 57 | use core::mem::{ManuallyDrop, MaybeUninit}; 58 | 59 | use util::*; 60 | 61 | use crate::mut_slice::states::AlwaysInit; 62 | use crate::mut_slice::{Brand, MutSlice}; 63 | 64 | fn glidesort_alloc_size(n: usize) -> usize { 65 | let tlen = core::mem::size_of::(); 66 | let full_allowed = n.min(FULL_ALLOC_MAX_BYTES / tlen); 67 | let half_allowed = (n / 2).min(HALF_ALLOC_MAX_BYTES / tlen); 68 | let eighth_allowed = n / 8; 69 | full_allowed 70 | .max(half_allowed) 71 | .max(eighth_allowed) 72 | .max(SMALL_SORT) 73 | } 74 | 75 | /// See [`slice::sort`]. 76 | pub fn sort(v: &mut [T]) { 77 | sort_with_vec_by(v, &mut Vec::new(), |a, b| a.cmp(b)) 78 | } 79 | 80 | /// See [`slice::sort_by_key`]. 81 | pub fn sort_by_key K, K: Ord>(v: &mut [T], mut f: F) { 82 | sort_with_vec_by(v, &mut Vec::new(), |a, b| f(a).cmp(&f(b))) 83 | } 84 | 85 | /// See [`slice::sort_by`]. 86 | pub fn sort_by(v: &mut [T], compare: F) 87 | where 88 | F: FnMut(&T, &T) -> Ordering, 89 | { 90 | sort_with_vec_by(v, &mut Vec::new(), compare) 91 | } 92 | 93 | /// Like [`sort`], except this function allocates its scratch space with `scratch_buf.reserve(_)`. 94 | /// 95 | /// This allows you to re-use the same allocation many times. 96 | pub fn sort_with_vec(v: &mut [T], scratch_buf: &mut Vec) { 97 | sort_with_vec_by(v, scratch_buf, |a, b| a.cmp(b)) 98 | } 99 | 100 | /// Like [`sort_by_key`], except this function allocates its scratch space with `scratch_buf.reserve(_)`. 101 | /// 102 | /// This allows you to re-use the same allocation many times. 103 | pub fn sort_with_vec_by_key K, K: Ord>( 104 | v: &mut [T], 105 | scratch_buf: &mut Vec, 106 | mut f: F, 107 | ) { 108 | sort_with_vec_by(v, scratch_buf, |a, b| f(a).cmp(&f(b))) 109 | } 110 | 111 | /// Like [`sort_by`], except this function allocates its scratch space with `scratch_buf.reserve(_)`. 112 | /// 113 | /// This allows you to re-use the same allocation many times. 114 | pub fn sort_with_vec_by(v: &mut [T], scratch_buf: &mut Vec, mut compare: F) 115 | where 116 | F: FnMut(&T, &T) -> Ordering, 117 | { 118 | // Zero-sized types are either always or never sorted, as they can not carry 119 | // any information that would allow the permutation to change. 120 | if core::mem::size_of::() == 0 { 121 | return; 122 | } 123 | 124 | let mut is_less = cmp_from_closure(|a, b| { 125 | tracking::register_cmp(a, b); 126 | compare(a, b) == Ordering::Less 127 | }); 128 | 129 | let n = v.len(); 130 | MutSlice::from_mut_slice(v, |el| { 131 | // Fast path for very small arrays. 132 | if n < SMALL_SORT { 133 | return small_sort::small_sort(el, &mut is_less); 134 | } 135 | 136 | // Avoid dynamic allocation if possible. 
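        // If a scratch buffer of MAX_STACK_SCRATCH_SIZE_BYTES on the stack can
        // hold at least half the elements, sort using it and leave the scratch
        // Vec untouched.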
137 | let stack_buffer_cap = MAX_STACK_SCRATCH_SIZE_BYTES / core::mem::size_of::(); 138 | if stack_buffer_cap >= n / 2 { 139 | return glidesort_with_max_stack_scratch(el, &mut is_less); 140 | } 141 | 142 | let (_, buffer) = make_scratch_after_vec(scratch_buf, glidesort_alloc_size::(n)); 143 | MutSlice::from_maybeuninit_mut_slice(buffer, |scratch| { 144 | glidesort::glidesort(el, scratch.assume_uninit(), &mut is_less, false) 145 | }) 146 | }) 147 | } 148 | 149 | /// Like [`sort`], except this function does not allocate and uses the passed buffer instead. 150 | pub fn sort_with_buffer(v: &mut [T], buffer: &mut [MaybeUninit]) { 151 | sort_with_buffer_by(v, buffer, |a, b| a.cmp(b)) 152 | } 153 | 154 | /// Like [`sort_by_key`], except this function does not allocate and uses the passed buffer instead. 155 | pub fn sort_with_buffer_by_key K, K: Ord>( 156 | v: &mut [T], 157 | buffer: &mut [MaybeUninit], 158 | mut f: F, 159 | ) { 160 | sort_with_buffer_by(v, buffer, |a, b| f(a).cmp(&f(b))) 161 | } 162 | 163 | /// Like [`sort_by`], except this function does not allocate and uses the passed buffer instead. 164 | pub fn sort_with_buffer_by(v: &mut [T], buffer: &mut [MaybeUninit], mut compare: F) 165 | where 166 | F: FnMut(&T, &T) -> Ordering, 167 | { 168 | // Zero-sized types are either always or never sorted, as they can not carry 169 | // any information that would allow the permutation to change. 170 | if core::mem::size_of::() == 0 { 171 | return; 172 | } 173 | 174 | let mut is_less = cmp_from_closure(|a, b| { 175 | tracking::register_cmp(a, b); 176 | compare(a, b) == Ordering::Less 177 | }); 178 | 179 | let n = v.len(); 180 | MutSlice::from_mut_slice(v, |el| { 181 | // Fast path for very small arrays. 182 | if n < SMALL_SORT { 183 | return small_sort::small_sort(el, &mut is_less); 184 | } 185 | 186 | MutSlice::from_maybeuninit_mut_slice(buffer, |scratch| { 187 | glidesort::glidesort(el, scratch.assume_uninit(), &mut is_less, false) 188 | }) 189 | }) 190 | } 191 | 192 | /// Like [`sort`], except this function allocates its space at the end of the given `Vec`. 193 | pub fn sort_in_vec(v: &mut Vec) { 194 | sort_in_vec_by(v, |a, b| a.cmp(b)) 195 | } 196 | 197 | /// Like [`sort_by_key`], except this function allocates its space at the end of the given `Vec`. 198 | pub fn sort_in_vec_by_key K, K: Ord>(v: &mut Vec, mut f: F) { 199 | sort_in_vec_by(v, |a, b| f(a).cmp(&f(b))) 200 | } 201 | 202 | /// Like [`sort_by`], except this function allocates its space at the end of the given `Vec`. 203 | pub fn sort_in_vec_by(v: &mut Vec, mut compare: F) 204 | where 205 | F: FnMut(&T, &T) -> Ordering, 206 | { 207 | // Zero-sized types are either always or never sorted, as they can not carry 208 | // any information that would allow the permutation to change. 209 | if core::mem::size_of::() == 0 { 210 | return; 211 | } 212 | 213 | let mut is_less = cmp_from_closure(|a, b| { 214 | tracking::register_cmp(a, b); 215 | compare(a, b) == Ordering::Less 216 | }); 217 | 218 | let n = v.len(); 219 | // Fast path for very small arrays. 220 | if n < SMALL_SORT { 221 | return MutSlice::from_mut_slice(v, |el| small_sort::small_sort(el, &mut is_less)); 222 | } 223 | 224 | // Avoid dynamic allocation if possible. 
225 | let stack_buffer_cap = MAX_STACK_SCRATCH_SIZE_BYTES / core::mem::size_of::(); 226 | if stack_buffer_cap >= n / 2 { 227 | return MutSlice::from_mut_slice(v, |el| { 228 | glidesort_with_max_stack_scratch(el, &mut is_less) 229 | }); 230 | } 231 | 232 | let (el, buffer) = make_scratch_after_vec(v, glidesort_alloc_size::(n)); 233 | MutSlice::from_mut_slice(el, |el| { 234 | MutSlice::from_maybeuninit_mut_slice(buffer, |scratch| { 235 | glidesort::glidesort(el, scratch.assume_uninit(), &mut is_less, false) 236 | }) 237 | }) 238 | } 239 | 240 | /// Make and return scratch space after the elements of a Vec. 241 | fn make_scratch_after_vec( 242 | buffer: &mut Vec, 243 | mut target_size: usize, 244 | ) -> (&mut [T], &mut [MaybeUninit]) { 245 | // Avoid reallocation if reasonable. 246 | let free_capacity = buffer.capacity() - buffer.len(); 247 | if free_capacity / 2 < target_size || free_capacity < SMALL_SORT { 248 | while buffer.try_reserve(target_size).is_err() { 249 | // We are in a low-memory situation, we'd much prefer a bit slower sorting 250 | // over completely running out, so aggressively reduce our memory request. 251 | target_size /= 8; 252 | if target_size == 0 { 253 | return (&mut buffer[..], &mut []); 254 | } 255 | } 256 | } 257 | 258 | split_at_spare_mut(buffer) 259 | } 260 | 261 | // We really don't want to inline this in order to prevent always taking up 262 | // extra stack space in non-taken branches, mainly for embedded devices. 263 | // A buffer of N bytes aligned as T would be. 264 | #[repr(C)] 265 | union AlignedBuffer { 266 | buffer: [MaybeUninit; N], 267 | _dummy_for_alignment: ManuallyDrop>, 268 | } 269 | 270 | #[inline(never)] 271 | #[cold] 272 | fn glidesort_with_max_stack_scratch<'l, B: Brand, T, F: Cmp>( 273 | el: MutSlice<'l, B, T, AlwaysInit>, 274 | is_less: &mut F, 275 | ) { 276 | unsafe { 277 | // SAFETY: we assume a [MaybeUninit; N] is initialized, which it 278 | // trivially is as it makes no guarantees. 279 | #[allow(clippy::uninit_assumed_init)] 280 | let mut aligned_buffer: AlignedBuffer = 281 | MaybeUninit::uninit().assume_init(); 282 | let aligned_buffer_bytes = aligned_buffer.buffer.as_mut_slice(); 283 | 284 | // SAFETY: our buffer is aligned and we can fit this many elements. 285 | let max_elements = MAX_STACK_SCRATCH_SIZE_BYTES / core::mem::size_of::(); 286 | let buffer = core::slice::from_raw_parts_mut( 287 | aligned_buffer_bytes.as_mut_ptr().cast::>(), 288 | max_elements, 289 | ); 290 | 291 | MutSlice::from_maybeuninit_mut_slice(buffer, |scratch| { 292 | glidesort::glidesort(el, scratch.assume_uninit(), is_less, false) 293 | }) 294 | } 295 | } 296 | -------------------------------------------------------------------------------- /src/merge_reduction.rs: -------------------------------------------------------------------------------- 1 | use crate::util::*; 2 | 3 | /// Shrinks a stable merge of two slices to a (potentially) smaller 4 | /// case. Returns indices (l, r) such that (left[l..], right[..r]) is the 5 | /// smaller case, or returns None if done. 6 | /// 7 | /// Panics if if_less does. 8 | pub fn shrink_stable_merge>( 9 | left: &[T], 10 | right: &[T], 11 | is_less: &mut F, 12 | ) -> Option<(usize, usize)> { 13 | // Is neither left or right empty? 14 | if let (Some(l_last), Some(r_first)) = (left.last(), right.first()) { 15 | // Are we already completely sorted? 16 | if is_less(r_first, l_last) { 17 | // Find extremal elements that end up in a different position when 18 | // merged. 
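            // Any prefix of left that is <= right's first element, and any
            // suffix of right that is >= left's last element, is already in its
            // final position and can be excluded from the merge.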
19 | // TODO: potential optimization here, check first, say, 16 elements 20 | // and if all are already correct switch to binary search. 21 | let first_l_to_r = left.iter().position(|l| is_less(r_first, l)); 22 | let last_r_to_l = right.iter().rposition(|r| is_less(r, l_last)); 23 | 24 | // This should not be possible to fail, but is_less might be buggy. 25 | if let (Some(l), Some(r)) = (first_l_to_r, last_r_to_l) { 26 | // r + 1 because we want an exclusive end bound. 27 | return Some((l, r + 1)); 28 | } 29 | } 30 | } 31 | 32 | None 33 | } 34 | 35 | // Given sorted left, right of equal size n it finds smallest i such that 36 | // for all l in left[i..] and r in right[..n-i] we have l > r. By vacuous truth 37 | // n is always a solution (but not necessarily the smallest). 38 | // 39 | // Panics if left, right aren't equal in size or if is_less does. 40 | pub fn crossover_point>(left: &[T], right: &[T], is_less: &mut F) -> usize { 41 | // Since left and right are sorted we only need to find the smallest i 42 | // satisfying left[i] > right[n-i-1], or return n if there aren't any, 43 | // since we'd have l >= left[i] > right[n-i-1] >= r for any l in 44 | // left[i..] and r in right[..n-i]. 45 | let n = left.len(); 46 | assert!(right.len() == n); 47 | 48 | let mut lo = 0; 49 | let mut maybe = n; 50 | // Invariant (1), every position before lo is ruled out. 51 | // Invariant (2), every position after lo + maybe is ruled out. 52 | // Invariant (3), lo + maybe <= n. 53 | // All hold right now, (1) and (2) by vacuous truth. 54 | 55 | // This terminates because each iteration guarantees that 56 | // new_maybe <= floor(maybe / 2) < maybe. 57 | while maybe > 0 { 58 | let step = maybe / 2; 59 | let i = lo + step; 60 | let i_valid = unsafe { 61 | // SAFETY: step < maybe and by invariant (3) we have 62 | // i = lo + step < lo + maybe <= n, thus i < n. Finally, 63 | // left.len() == right.len() so n - 1 - i is also in bounds 64 | is_less(right.get_unchecked(n - 1 - i), left.get_unchecked(i)) 65 | }; 66 | if i_valid { 67 | // Rule out the positions after i = lo + step as i is valid. 68 | // Explicitly maintains invariant (2), and (3) since step < maybe. 69 | // Invariant (1) is unchanged. 70 | maybe = step; 71 | } else { 72 | // Rule out the elements before i = lo + step and i itself. 73 | // Explicitly maintains invariant (1), invariants (2, 3) are 74 | // unchanged because lo + maybe doesn't change. 75 | lo += step + 1; 76 | maybe -= step + 1; 77 | } 78 | } 79 | 80 | // All elements before and after lo have been ruled out, so lo remains. 81 | lo 82 | } 83 | 84 | // Given two sorted slices left, right computes two splitpoints l, r such that 85 | // stably merging (left[..l], right[..r]) followed by (left[l..], right[r..]) is 86 | // equivalent to stably merging (left, right). It is guaranteed that right[..r] 87 | // and left[l..] are of equal size. 88 | // 89 | // Panics if is_less does. 
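//
// For example, left = [1, 4, 6] and right = [2, 3, 5] give (l, r) = (1, 2):
// stably merging ([1], [2, 3]) and then ([4, 6], [5]) produces the same result
// as merging the full slices, and right[..2] and left[1..] both have length 2.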
90 | pub fn merge_splitpoints>(left: &[T], right: &[T], is_less: &mut F) -> (usize, usize) { 91 | let minlen = left.len().min(right.len()); 92 | let left_skip = left.len() - minlen; 93 | let i = crossover_point(&left[left_skip..], &right[..minlen], is_less); 94 | (left_skip + i, minlen - i) 95 | } 96 | -------------------------------------------------------------------------------- /src/mut_slice.rs: -------------------------------------------------------------------------------- 1 | #![allow(dead_code)] 2 | 3 | use core::marker::PhantomData; 4 | use core::mem::MaybeUninit; 5 | 6 | use crate::tracking::{self, ptr}; 7 | use crate::util::abort; 8 | 9 | // Branding. 10 | // See GhostCell: https://plv.mpi-sws.org/rustbelt/ghostcell/. 11 | pub struct InvariantLifetime<'id>(PhantomData<*mut &'id ()>); 12 | pub struct Unbranded; 13 | 14 | pub trait Brand {} 15 | impl<'id> Brand for InvariantLifetime<'id> {} 16 | 17 | // States. 18 | #[rustfmt::skip] 19 | #[allow(clippy::missing_safety_doc)] 20 | pub mod states { 21 | pub struct Weak; 22 | pub struct Uninit; 23 | pub struct MaybeInit; 24 | pub struct Init; 25 | pub struct AlwaysInit; 26 | 27 | pub unsafe trait State { type Borrowed; } 28 | unsafe impl State for Weak { type Borrowed = Weak; } 29 | unsafe impl State for Uninit { type Borrowed = Uninit; } 30 | unsafe impl State for MaybeInit { type Borrowed = MaybeInit; } 31 | unsafe impl State for Init { type Borrowed = AlwaysInit; } // Borrow may not invalidate Init. 32 | unsafe impl State for AlwaysInit { type Borrowed = AlwaysInit; } 33 | 34 | pub unsafe trait RawAccess {} 35 | unsafe impl RawAccess for Uninit {} 36 | unsafe impl RawAccess for MaybeInit {} 37 | unsafe impl RawAccess for Init {} 38 | 39 | pub unsafe trait IsInit {} 40 | unsafe impl IsInit for Init {} 41 | unsafe impl IsInit for AlwaysInit {} 42 | } 43 | 44 | use states::*; 45 | 46 | /// A more flexible mutable slice for dealing with splitting/concatenation 47 | /// better in a safe way, and tracking initialization state. Without this 48 | /// type-level safety programming almost all of Glidesort would have to be 49 | /// unsafe using raw pointers. NOTE: MutSlice is not to be used with zero-sized 50 | /// types. 51 | /// 52 | /// There are two extra aspects to a MutSlice (other than lifetime 'l and type T): 53 | /// 54 | /// - Branding B. Each MutSlice is either branded or unbranded. If a MutSlice is 55 | /// branded we know for certain that it is uniquely associated with some 56 | /// allocation that its pointers have full provenance over, so we can safely 57 | /// concatenate slices with the same brand. 58 | /// 59 | /// - State S. Used to track the state of this slice. 60 | /// 61 | /// 1. Weak, in this state the slice contents may not be accessed through 62 | /// this slice object, but using it a location can be kept around for 63 | /// bookkeeping purposes. A weak slice can be upgraded to the other 64 | /// states, albeit unsafely, as it could create aliasing mutable slices. 65 | /// 2. Uninit, stating this slice *currently* has no initialized values. 66 | /// This is a very weak guarantee since violating it simply instantly 67 | /// leaks values. It is mainly intended as a hint. 68 | /// 3. MaybeInit, making no guarantees, like &mut [MaybeUninit]. To 69 | /// prevent confusion with that type we chose this name. 70 | /// 4. Init, stating this slice *currently* only has initialized values. 71 | /// 5. 
AlwaysInit, like Init but the region of memory this slice points to 72 | /// must *always* be restored to be fully initialized regardless of what 73 | /// happens, even if something panics. This is similar to &mut [T]. 74 | /// Getting out of this state is always unsafe for this reason. 75 | /// 76 | /// Other than the weak state all states imply disjointness with all other 77 | /// slices. 78 | pub struct MutSlice<'l, B, T, S> { 79 | begin: *mut T, 80 | end: *mut T, 81 | _lifetime: PhantomData<&'l mut T>, 82 | _metadata: PhantomData<(B, S)>, 83 | } 84 | 85 | unsafe impl<'l, B, T, S> Send for MutSlice<'l, B, T, S> {} 86 | 87 | impl<'l, B, T, S> core::fmt::Debug for MutSlice<'l, B, T, S> { 88 | fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { 89 | f.debug_struct("MutSlice") 90 | .field("begin", &self.begin) 91 | .field("end", &self.end) 92 | .finish() 93 | } 94 | } 95 | 96 | // The only ways to create a new branded slice. 97 | impl<'l, 'b, T> MutSlice<'l, InvariantLifetime<'b>, T, AlwaysInit> { 98 | #[inline] 99 | pub fn from_mut_slice( 100 | sl: &'l mut [T], 101 | f: impl for<'b2> FnOnce(MutSlice<'l, InvariantLifetime<'b2>, T, AlwaysInit>) -> R, 102 | ) -> R { 103 | let len = sl.len(); 104 | let ptr = sl.as_mut_ptr(); 105 | f(unsafe { 106 | // SAFETY: we create a new unique distinct brand through phantom 107 | // lifetime 'b2 for this allocation, and we respect &mut [T]'s 108 | // AlwaysInit property. 109 | MutSlice::from_pair_unchecked(ptr, ptr.add(len)) 110 | }) 111 | } 112 | } 113 | 114 | impl<'l, 'b, T> MutSlice<'l, InvariantLifetime<'b>, T, MaybeInit> { 115 | #[inline] 116 | pub fn from_maybeuninit_mut_slice( 117 | sl: &'l mut [MaybeUninit], 118 | f: impl for<'b2> FnOnce(MutSlice<'l, InvariantLifetime<'b2>, T, MaybeInit>) -> R, 119 | ) -> R { 120 | let len = sl.len(); 121 | let ptr = sl.as_mut_ptr() as *mut T; 122 | f(unsafe { 123 | // SAFETY: we create a new unique distinct brand through phantom 124 | // lifetime 'b2 for this allocation. 125 | MutSlice::from_pair_unchecked(ptr, ptr.add(len)) 126 | }) 127 | } 128 | } 129 | 130 | impl<'l, B, T, S> MutSlice<'l, B, T, S> { 131 | // Basics. 132 | #[inline] 133 | pub unsafe fn from_pair_unchecked(begin: *mut T, end: *mut T) -> Self { 134 | // SAFETY: up to the caller to check. 135 | Self { 136 | begin, 137 | end, 138 | _lifetime: PhantomData, 139 | _metadata: PhantomData, 140 | } 141 | } 142 | 143 | #[inline] 144 | pub unsafe fn transmute_metadata(self) -> MutSlice<'l, B2, T, S2> { 145 | // SAFETY: up to the caller to check. 146 | unsafe { MutSlice::from_pair_unchecked(self.begin, self.end) } 147 | } 148 | 149 | #[inline] 150 | pub fn len(&self) -> usize { 151 | // SAFETY: okay per our type invariant. 152 | unsafe { self.end.offset_from(self.begin) as usize } 153 | } 154 | 155 | #[inline] 156 | pub fn contains(&self, ptr: *mut T) -> bool { 157 | (self.begin..self.end).contains(&ptr) 158 | } 159 | 160 | /// Splits this slice into slice[..i] and slice[i..]. 161 | /// Returns None if i > self.len(). 162 | #[inline] 163 | pub fn split_at(self, i: usize) -> Option<(Self, Self)> { 164 | unsafe { 165 | if i <= self.len() { 166 | // SAFETY: if check protects us. 167 | let mid = self.begin.add(i); 168 | 169 | // SAFETY: brand/state is preserved. 
170 | Some(( 171 | Self::from_pair_unchecked(self.begin, mid), 172 | Self::from_pair_unchecked(mid, self.end), 173 | )) 174 | } else { 175 | None 176 | } 177 | } 178 | } 179 | 180 | #[inline] 181 | pub fn split_at_end(self, i: usize) -> Option<(Self, Self)> { 182 | self.len().checked_sub(i).and_then(|ni| self.split_at(ni)) 183 | } 184 | 185 | /// Splits off i elements from the begin of this slice into a separate slice. 186 | #[inline] 187 | pub fn split_off_begin(&mut self, i: usize) -> Self { 188 | if i <= self.len() { 189 | unsafe { 190 | let mid = self.begin.add(i); 191 | let other = Self::from_pair_unchecked(self.begin, mid); 192 | self.begin = mid; 193 | other 194 | } 195 | } else { 196 | abort() 197 | } 198 | } 199 | 200 | /// Splits off i elements from the end of this slice into a separate slice. 201 | #[inline] 202 | pub fn split_off_end(&mut self, i: usize) -> Self { 203 | if i <= self.len() { 204 | unsafe { 205 | let mid = self.end.sub(i); 206 | let other = Self::from_pair_unchecked(mid, self.end); 207 | self.end = mid; 208 | other 209 | } 210 | } else { 211 | abort() 212 | } 213 | } 214 | 215 | // For debugging without becoming unsafe. 216 | pub fn begin_address(&self) -> usize { 217 | self.begin as usize 218 | } 219 | 220 | pub fn end_address(&self) -> usize { 221 | self.end as usize 222 | } 223 | } 224 | 225 | // We only implement concat for branded slices. 226 | impl<'l, B: Brand, T, S> MutSlice<'l, B, T, S> { 227 | /// Concatenates two slices if self and right are contiguous, in that order. 228 | /// Aborts if they are not. 229 | #[inline] 230 | pub fn concat(self, right: Self) -> Self { 231 | unsafe { 232 | if self.end == right.begin { 233 | // SAFETY: the check makes sure this is correct, as the brand on 234 | // Self guarantees us these two slices point in the same allocation 235 | // with full provenance. Brand/state is preserved. 236 | Self::from_pair_unchecked(self.begin, right.end) 237 | } else { 238 | abort(); 239 | } 240 | } 241 | } 242 | } 243 | 244 | // ======== Access/mutation ======== 245 | #[rustfmt::skip] 246 | impl<'l, B, T, S> MutSlice<'l, B, T, S> { 247 | #[inline] pub fn begin(&self) -> *mut T { self.begin } 248 | #[inline] pub fn end(&self) -> *mut T { self.end } 249 | } 250 | 251 | #[rustfmt::skip] 252 | impl<'l, B, T, S> MutSlice<'l, B, T, S> { 253 | // Unsafe bounds mutation, safety must be maintained by caller. 
254 | #[inline] pub unsafe fn add_begin(&mut self, n: usize) { self.begin = unsafe { self.begin.add(n) } } 255 | #[inline] pub unsafe fn sub_begin(&mut self, n: usize) { self.begin = unsafe { self.begin.sub(n) } } 256 | #[inline] pub unsafe fn wrapping_add_begin(&mut self, n: usize) { self.begin = self.begin.wrapping_add(n) } 257 | #[inline] pub unsafe fn wrapping_sub_begin(&mut self, n: usize) { self.begin = self.begin.wrapping_sub(n) } 258 | #[inline] pub unsafe fn add_end(&mut self, n: usize) { self.end = unsafe { self.end.add(n) } } 259 | #[inline] pub unsafe fn sub_end(&mut self, n: usize) { self.end = unsafe { self.end.sub(n) } } 260 | #[inline] pub unsafe fn wrapping_add_end(&mut self, n: usize) { self.end = self.end.wrapping_add(n) } 261 | #[inline] pub unsafe fn wrapping_sub_end(&mut self, n: usize) { self.end = self.end.wrapping_sub(n) } 262 | } 263 | 264 | impl<'l, B, T> MutSlice<'l, B, T, Init> { 265 | #[inline] 266 | pub fn move_to<'dst_l, DstB>( 267 | self, 268 | dst: MutSlice<'dst_l, DstB, T, Uninit>, 269 | ) -> (MutSlice<'l, B, T, Uninit>, MutSlice<'dst_l, DstB, T, Init>) { 270 | unsafe { 271 | if self.len() != dst.len() { 272 | abort(); 273 | } 274 | 275 | // SAFETY: we may write to dst, and this write can be assumed to be 276 | // non-overlapping by slice disjointness. The lengths are equal. 277 | ptr::copy_nonoverlapping(self.begin(), dst.begin(), self.len()); 278 | (self.assume_uninit(), dst.assume_init()) 279 | } 280 | } 281 | } 282 | 283 | // ========= Conversions ========= 284 | // The *only* way to get rid of AlwaysInit is to raw() it, which is unsafe. 285 | impl<'l, B, T> MutSlice<'l, B, T, AlwaysInit> { 286 | /// SAFETY: I solemny swear that I will ensure this slice is properly 287 | /// initialized before returning to whoever gave me the AlwaysInit slice, 288 | /// even in the case of panics. 289 | #[inline] 290 | pub unsafe fn raw(self) -> MutSlice<'l, B, T, Init> { 291 | unsafe { self.transmute_metadata() } 292 | } 293 | } 294 | 295 | // The other way around is always safe. 296 | impl<'l, B, T> MutSlice<'l, B, T, Init> { 297 | #[inline] 298 | pub fn always_init(self) -> MutSlice<'l, B, T, AlwaysInit> { 299 | // SAFETY: this just adds a guarantee that's currently true. 300 | unsafe { self.transmute_metadata() } 301 | } 302 | } 303 | 304 | // Shedding/regaining state. 305 | impl<'l, B, T, S> MutSlice<'l, B, T, S> { 306 | #[inline] 307 | pub fn forget_brand(self) -> MutSlice<'l, Unbranded, T, S> { 308 | // SAFETY: a brand only grants permissions, it's always safe to drop it. 309 | unsafe { self.transmute_metadata() } 310 | } 311 | 312 | #[inline] 313 | pub fn weak(&self) -> MutSlice<'l, B, T, Weak> { 314 | // SAFETY: it's always safe to make a weak slice, even from &self. 315 | unsafe { MutSlice::from_pair_unchecked(self.begin, self.end) } 316 | } 317 | } 318 | 319 | impl<'l, B, T> MutSlice<'l, B, T, Weak> { 320 | /// SAFETY: the caller is responsible for ensuring this slice does not alias. 321 | #[inline] 322 | pub unsafe fn upgrade(self) -> MutSlice<'l, B, T, MaybeInit> { 323 | unsafe { self.transmute_metadata() } 324 | } 325 | } 326 | 327 | // Non-AlwaysInit initialization conversions. 328 | impl<'l, B, T, S: RawAccess> MutSlice<'l, B, T, S> { 329 | /// SAFETY: slice must only contain initialized objects. 
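    // Illustrative state flow using the conversions in this file (not part of the
    // original comments): a typical round trip is
    // AlwaysInit --raw()--> Init --move_to(dst: Uninit)--> (src: Uninit, dst: Init)
    // --always_init()--> AlwaysInit, while assume_init()/assume_uninit() below are
    // for cases where the caller has out-of-band knowledge about the contents.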
330 | #[inline] 331 | pub unsafe fn assume_init(self) -> MutSlice<'l, B, T, Init> { 332 | unsafe { self.transmute_metadata() } 333 | } 334 | 335 | #[inline] 336 | pub fn assume_uninit(self) -> MutSlice<'l, B, T, Uninit> { 337 | // SAFETY: at worst this leaks objects. 338 | unsafe { self.transmute_metadata() } 339 | } 340 | 341 | #[inline] 342 | pub fn allow_maybe_init(self) -> MutSlice<'l, B, T, MaybeInit> { 343 | // SAFETY: this drops all init guarantees. 344 | unsafe { self.transmute_metadata() } 345 | } 346 | } 347 | 348 | // ========= Borrows, clones, moves. ========= 349 | impl<'l, B, T, S> MutSlice<'l, B, T, S> { 350 | /// Takes this slice, leaving behind an empty slice. 351 | #[inline] 352 | pub fn take(&mut self) -> Self { 353 | unsafe { 354 | // SAFETY: we make ourselves the empty slice which is disjoint with 355 | // any other slice and return a copy of original. 356 | let weak = self.weak(); 357 | self.sub_end(self.len()); 358 | weak.transmute_metadata() 359 | } 360 | } 361 | } 362 | 363 | impl<'l, B, T, S: IsInit> MutSlice<'l, B, T, S> { 364 | #[inline] 365 | pub fn as_slice<'a>(&'a self) -> &'a [T] { 366 | // SAFETY: we're disjoint and initialized, so this is fine. 367 | unsafe { core::slice::from_raw_parts(self.begin, self.len()) } 368 | } 369 | 370 | #[inline] 371 | pub fn as_mut_slice<'a>(&'a mut self) -> &'a mut [T] { 372 | // SAFETY: we're disjoint and initialized, so this is fine. 373 | unsafe { core::slice::from_raw_parts_mut(self.begin, self.len()) } 374 | } 375 | } 376 | 377 | impl<'l, B, T, S: State> MutSlice<'l, B, T, S> { 378 | #[inline] 379 | pub fn borrow<'a>(&'a mut self) -> MutSlice<'a, B, T, S::Borrowed> { 380 | // SAFETY: Lifetime 'a ensures self can't be used at the same time as 381 | // the slice we return. The State::Borrowed state ensure that 382 | // no matter what the borrower does (safely), it can't violate the 383 | // original state. 384 | unsafe { MutSlice::from_pair_unchecked(self.begin, self.end) } 385 | } 386 | } 387 | 388 | impl<'l, B, T> Clone for MutSlice<'l, B, T, Weak> { 389 | #[inline] 390 | fn clone(&self) -> Self { 391 | // SAFETY: it's safe to clone a weak slice. 392 | self.weak() 393 | } 394 | } 395 | 396 | /// Helper function to get scratch space on the stack. 397 | #[inline(always)] 398 | pub fn with_stack_scratch(name: &'static str, f: F) -> R 399 | where 400 | F: for<'l, 'b> FnOnce(MutSlice<'l, InvariantLifetime<'b>, T, Uninit>) -> R, 401 | { 402 | let mut scratch_space: [MaybeUninit; N] = unsafe { MaybeUninit::uninit().assume_init() }; 403 | MutSlice::from_maybeuninit_mut_slice(&mut scratch_space, |scratch| { 404 | tracking::register_buffer(name, scratch.weak()); 405 | let ret = f(scratch.assume_uninit()); 406 | tracking::deregister_buffer(name); 407 | ret 408 | }) 409 | } 410 | -------------------------------------------------------------------------------- /src/physical_merges.rs: -------------------------------------------------------------------------------- 1 | // For physical merges we have two main goals: 2 | // 3 | // 1. Reduce unnecessary data movement moving into/from scratch space. Ideally 4 | // every move is part of a merge operation. We do this by combining multiple 5 | // merges in glidesort's logical merge structure, and on the physical level 6 | // by splitting merges into smaller merges. 7 | // 8 | // 2. We want to combine multiple independent merges into interleaved loops to 9 | // use instruction-level parallelism and hide critical path data dependecy 10 | // latencies. 
11 | 12 | use crate::branchless_merge::BranchlessMergeState; 13 | use crate::gap_guard::GapGuard; 14 | use crate::merge_reduction::{merge_splitpoints, shrink_stable_merge}; 15 | use crate::mut_slice::states::{AlwaysInit, Init, Uninit}; 16 | use crate::mut_slice::{Brand, MutSlice}; 17 | use crate::tracking::ptr; 18 | use crate::util::*; 19 | 20 | /// Merges adjacent runs left, right using the given scratch space. 21 | /// 22 | /// Panics if is_less does, aborts if left, right aren't contiguous. 23 | pub(crate) fn physical_merge<'el, 'sc, BE: Brand, BS: Brand, T, F: Cmp<T>>( 24 | mut left: MutSlice<'el, BE, T, AlwaysInit>, 25 | mut right: MutSlice<'el, BE, T, AlwaysInit>, 26 | mut scratch: MutSlice<'sc, BS, T, Uninit>, 27 | is_less: &mut F, 28 | ) -> MutSlice<'el, BE, T, AlwaysInit> { 29 | let ret = left.weak().concat(right.weak()); 30 | 31 | while let Some((left_shrink, right_shrink)) = 32 | shrink_stable_merge(left.as_mut_slice(), right.as_mut_slice(), is_less) 33 | { 34 | // Shrink. 35 | left = left.split_at(left_shrink).unwrap().1; 36 | right = right.split_at(right_shrink).unwrap().0; 37 | 38 | // Split into parallel merge with left1.len() == right0.len(). 39 | let (lsplit, rsplit) = 40 | merge_splitpoints(left.as_mut_slice(), right.as_mut_slice(), is_less); 41 | let (left0, left1) = left.split_at(lsplit).unwrap_abort(); 42 | let (right0, right1) = right.split_at(rsplit).unwrap_abort(); 43 | 44 | // Logically we swap left1 and right0 and then merge (left0, left1) and (right0, right1). 45 | // In reality we can often avoid the explicit swap by renaming scratch buffers. 46 | if let Some((left1_scratch, _)) = scratch.borrow().split_at(left1.len()) { 47 | unsafe { 48 | // SAFETY: should a panic occur first the left1_gap is filled 49 | // by merge_right_gap, consuming right0, then this gap is filled 50 | // by the left1_with_right0_gap guard. 51 | let (left1_gap, left1_scratch) = left1.raw().move_to(left1_scratch); 52 | let left1_with_right0_gap = 53 | GapGuard::new_unchecked(left1_scratch.weak(), right0.weak()); 54 | let right0_with_left1_gap = GapGuard::new_disjoint(right0.raw(), left1_gap); 55 | merge_right_gap(left0, right0_with_left1_gap, is_less); 56 | merge_left_gap(left1_with_right0_gap, right1, is_less); 57 | break; 58 | } 59 | } else { 60 | unsafe { 61 | // Same as left1.as_mut_slice().swap_with_slice(right0.as_mut_slice()); 62 | // Uses ptr for tracking purposes. 63 | ptr::swap_nonoverlapping(left1.weak().begin(), right0.weak().begin(), left1.len()); 64 | } 65 | physical_merge(right0, right1, scratch.borrow(), is_less); 66 | } 67 | 68 | (left, right) = (left0, left1); 69 | } 70 | 71 | // SAFETY: we never permanently moved any elements out of our range. 72 | unsafe { ret.upgrade().assume_init().always_init() } 73 | } 74 | 75 | /// Merges sorted runs r0, r1, r2 using the given scratch space. 76 | /// 77 | /// Panics if is_less does, aborts if r0, r1, r2 aren't contiguous. 78 | pub(crate) fn physical_triple_merge<'el, 'sc, BE: Brand, BS: Brand, T, F: Cmp<T>>( 79 | r0: MutSlice<'el, BE, T, AlwaysInit>, 80 | r1: MutSlice<'el, BE, T, AlwaysInit>, 81 | r2: MutSlice<'el, BE, T, AlwaysInit>, 82 | scratch: MutSlice<'sc, BS, T, Uninit>, 83 | is_less: &mut F, 84 | ) -> MutSlice<'el, BE, T, AlwaysInit> { 85 | unsafe { 86 | // SAFETY: our only constraint and reason this function body is unsafe is 87 | // that the gap guards returned by try_merge_into_scratch may not be forgotten. 88 | 89 | // Merge r0, r1 or r1, r2 first?
90 | if r0.len() < r2.len() { 91 | match try_merge_into_scratch(r0, r1, scratch, is_less) { 92 | (Ok(r0r1), _rest_scratch) => merge_left_gap(r0r1, r2, is_less), 93 | (Err((r0, r1)), mut scratch) => { 94 | let r0r1 = physical_merge(r0, r1, scratch.borrow(), is_less); 95 | physical_merge(r0r1, r2, scratch, is_less) 96 | } 97 | } 98 | } else { 99 | match try_merge_into_scratch(r1, r2, scratch, is_less) { 100 | (Ok(r1r2), _rest_scratch) => merge_right_gap(r0, r1r2, is_less), 101 | (Err((r1, r2)), mut scratch) => { 102 | let r1r2 = physical_merge(r1, r2, scratch.borrow(), is_less); 103 | physical_merge(r0, r1r2, scratch, is_less) 104 | } 105 | } 106 | } 107 | } 108 | } 109 | 110 | /// Merges sorted runs r0, r1, r2, r3 using the given scratch space. 111 | /// 112 | /// Panics if is_less does, aborts if r0, r1, r2, r3 aren't contiguous. 113 | pub(crate) fn physical_quad_merge<'el, 'sc, BE: Brand, BS: Brand, T, F: Cmp>( 114 | r0: MutSlice<'el, BE, T, AlwaysInit>, 115 | r1: MutSlice<'el, BE, T, AlwaysInit>, 116 | r2: MutSlice<'el, BE, T, AlwaysInit>, 117 | r3: MutSlice<'el, BE, T, AlwaysInit>, 118 | mut scratch: MutSlice<'sc, BS, T, Uninit>, 119 | is_less: &mut F, 120 | ) -> MutSlice<'el, BE, T, AlwaysInit> { 121 | let left_len = r0.len() + r1.len(); 122 | let right_len = r2.len() + r3.len(); 123 | 124 | if let Some((scratch, _)) = scratch.borrow().split_at(left_len + right_len) { 125 | unsafe { 126 | let dst = r0 127 | .weak() 128 | .concat(r1.weak()) 129 | .concat(r2.weak()) 130 | .concat(r3.weak()); 131 | let guard = GapGuard::new_unchecked(scratch.weak(), dst); 132 | double_merge_into(r0.raw(), r1.raw(), r2.raw(), r3.raw(), scratch, is_less); 133 | let (left, right) = guard.split_at(left_len).unwrap_abort(); 134 | return merge_into_gap(left, right, is_less).always_init(); 135 | } 136 | } 137 | 138 | unsafe { 139 | // Try to merge the bigger pair into scratch first. This guarantees that if the 140 | // bigger one fits but the smaller one doesn't we can use the old space of 141 | // the bigger one as scratch space while merging the smaller one. 142 | let (left_merge, right_merge); 143 | if left_len >= right_len { 144 | (left_merge, scratch) = try_merge_into_scratch(r0, r1, scratch, is_less); 145 | (right_merge, scratch) = try_merge_into_scratch(r2, r3, scratch, is_less); 146 | } else { 147 | (right_merge, scratch) = try_merge_into_scratch(r2, r3, scratch, is_less); 148 | (left_merge, scratch) = try_merge_into_scratch(r0, r1, scratch, is_less); 149 | }; 150 | 151 | match (left_merge, right_merge) { 152 | (Ok(_left), Ok(_right)) => unreachable!(), 153 | 154 | (Ok(mut left), Err((r2, r3))) => { 155 | // We can use the gap from left to help merge r2, r3. It must be 156 | // bigger than whatever's left of the original scratch space, otherwise 157 | // both would've fit. 158 | let right = physical_merge(r2, r3, left.borrow_gap(), is_less); 159 | merge_left_gap(left, right, is_less) 160 | } 161 | 162 | (Err((r0, r1)), Ok(mut right)) => { 163 | // Vice versa. 164 | let left = physical_merge(r0, r1, right.borrow_gap(), is_less); 165 | merge_right_gap(left, right, is_less) 166 | } 167 | 168 | (Err((r0, r1)), Err((r2, r3))) => { 169 | let left = physical_merge(r0, r1, scratch.borrow(), is_less); 170 | let right = physical_merge(r2, r3, scratch.borrow(), is_less); 171 | physical_merge(left, right, scratch, is_less) 172 | } 173 | } 174 | } 175 | } 176 | 177 | /// Merges sorted runs left, right, using the given gap which must be equal in 178 | /// size to left and just before right (otherwise we abort). 
179 | /// 180 | /// Should a panic occur all input is written to left.weak_gap().concat(right), 181 | /// in an unspecified order. 182 | pub(crate) fn merge_left_gap<'l, 'r, BL: Brand, BR: Brand, T, F: Cmp>( 183 | mut left: GapGuard<'l, 'r, BL, BR, T>, 184 | mut right: MutSlice<'r, BR, T, AlwaysInit>, 185 | is_less: &mut F, 186 | ) -> MutSlice<'r, BR, T, AlwaysInit> { 187 | let ret = left.gap_weak().concat(right.weak()); 188 | loop { 189 | if left.len().min(right.len()) >= crate::MERGE_SPLIT_THRESHOLD { 190 | let (lsplit, rsplit) = 191 | merge_splitpoints(left.as_mut_slice(), right.as_mut_slice(), is_less); 192 | let (left0, left1) = left.split_at(lsplit).unwrap_abort(); 193 | let (right0, right1) = right.split_at(rsplit).unwrap_abort(); 194 | 195 | unsafe { 196 | // SAFETY: merge_into_gap ensures right0 moves into left1_gap. 197 | // Afterwards the other gap guard ensures left1_data moves into 198 | // the gap from right0. 199 | let (left1_data, left1_gap) = left1.take_disjoint(); 200 | let left1_with_right0_gap = 201 | GapGuard::new_unchecked(left1_data.weak(), right0.weak()); 202 | let right0_with_left1_gap = GapGuard::new_disjoint(right0.raw(), left1_gap); 203 | merge_into_gap(left0, right0_with_left1_gap, is_less); 204 | left = left1_with_right0_gap; 205 | right = right1; 206 | } 207 | } else { 208 | let merge_state = BranchlessMergeState::new_gap_left(left, right); 209 | merge_state.finish_merge(is_less); 210 | return unsafe { ret.upgrade().assume_init().always_init() }; 211 | } 212 | } 213 | } 214 | 215 | /// Merges sorted runs left, right, using the given gap which must be equal in 216 | /// size to right and just after left (otherwise we abort). 217 | /// 218 | /// Should a panic occur all input is written to left.concat(gap), in an 219 | /// unspecified order. 220 | pub(crate) fn merge_right_gap<'l, 'r, BL: Brand, BR: Brand, T, F: Cmp>( 221 | mut left: MutSlice<'l, BL, T, AlwaysInit>, 222 | mut right: GapGuard<'r, 'l, BR, BL, T>, 223 | is_less: &mut F, 224 | ) -> MutSlice<'l, BL, T, AlwaysInit> { 225 | let ret = left.weak().concat(right.gap_weak()); 226 | loop { 227 | if left.len().min(right.len()) >= crate::MERGE_SPLIT_THRESHOLD { 228 | let (lsplit, rsplit) = 229 | merge_splitpoints(left.as_mut_slice(), right.as_mut_slice(), is_less); 230 | let (left0, left1) = left.split_at(lsplit).unwrap_abort(); 231 | let (right0, right1) = right.split_at(rsplit).unwrap_abort(); 232 | 233 | unsafe { 234 | // SAFETY: merge_into_gap ensures left1 moves into right0_gap. 235 | // Afterwards the other gap guard ensures right0_data moves into 236 | // the gap from left1. 237 | let (right0_data, right0_gap) = right0.take_disjoint(); 238 | let right0_with_left1_gap = 239 | GapGuard::new_unchecked(right0_data.weak(), left1.weak()); 240 | let left1_with_right0_gap = GapGuard::new_disjoint(left1.raw(), right0_gap); 241 | merge_into_gap(left1_with_right0_gap, right1, is_less); 242 | left = left0; 243 | right = right0_with_left1_gap; 244 | } 245 | } else { 246 | let merge_state = BranchlessMergeState::new_gap_right(left, right); 247 | merge_state.finish_merge(is_less); 248 | return unsafe { ret.upgrade().assume_init().always_init() }; 249 | } 250 | } 251 | } 252 | 253 | // Merges consecutive runs left, right into the given scratch space. Returns an 254 | // object that would move the elements back into the gap if it were to be 255 | // dropped, use res.take() if you want the actual result. 256 | // 257 | // Returns either the merge result if successful, or the original two slices. 
258 | // Also returns what is left of the scratch space. 259 | // 260 | /// Panics if is_less does, aborts if left, right aren't contiguous. If a panic 261 | /// occurs all elements are returned to left, right. 262 | /// 263 | /// SAFETY: the returned gap guard may not be forgotten. 264 | pub(crate) unsafe fn try_merge_into_scratch<'el, 'sc, BE: Brand, BS: Brand, T, F: Cmp<T>>( 265 | left: MutSlice<'el, BE, T, AlwaysInit>, 266 | right: MutSlice<'el, BE, T, AlwaysInit>, 267 | mut scratch: MutSlice<'sc, BS, T, Uninit>, 268 | is_less: &mut F, 269 | ) -> ( 270 | Result< 271 | GapGuard<'sc, 'el, BS, BE, T>, 272 | ( 273 | MutSlice<'el, BE, T, AlwaysInit>, 274 | MutSlice<'el, BE, T, AlwaysInit>, 275 | ), 276 | >, 277 | MutSlice<'sc, BS, T, Uninit>, 278 | ) { 279 | let gap = left.weak().concat(right.weak()); 280 | if scratch.len() >= gap.len() { 281 | unsafe { 282 | let dst = scratch.split_off_begin(gap.len()); 283 | // SAFETY: Should something panic all elements first get moved into 284 | // the scratch before being moved back by ret's gap guard. 285 | let ret = GapGuard::new_unchecked(dst.weak(), gap.weak()); 286 | let (left_scratch, right_scratch) = dst.split_at(left.len()).unwrap_abort(); 287 | let left = GapGuard::new_disjoint(left.raw(), left_scratch); 288 | let right = GapGuard::new_disjoint(right.raw(), right_scratch); 289 | merge_into_gap(left, right, is_less); 290 | (Ok(ret), scratch) 291 | } 292 | } else { 293 | (Err((left, right)), scratch) 294 | } 295 | } 296 | 297 | /// Merges sorted runs left, right into their gaps, which must be contiguous. 298 | /// 299 | /// Should a panic occur the gap is filled. 300 | pub(crate) fn merge_into_gap<'src, 'dst, BL: Brand, BR: Brand, BD: Brand, T, F: Cmp<T>>( 301 | mut left: GapGuard<'src, 'dst, BL, BD, T>, 302 | mut right: GapGuard<'src, 'dst, BR, BD, T>, 303 | is_less: &mut F, 304 | ) -> MutSlice<'dst, BD, T, Init> { 305 | let ret = left.gap_weak().concat(right.gap_weak()); 306 | if left.len().min(right.len()) >= crate::MERGE_SPLIT_THRESHOLD { 307 | let (lsplit, rsplit) = 308 | merge_splitpoints(left.as_mut_slice(), right.as_mut_slice(), is_less); 309 | 310 | unsafe { 311 | // These merge states will ensure that in a panic we write all output to dest. 312 | let (left_data, left_gap) = left.take_disjoint(); 313 | let (right_data, right_gap) = right.take_disjoint(); 314 | let (left0, left1) = left_data.split_at(lsplit).unwrap_abort(); 315 | let (right0, right1) = right_data.split_at(rsplit).unwrap_abort(); 316 | double_merge_into( 317 | left0, 318 | right0, 319 | left1, 320 | right1, 321 | left_gap.concat(right_gap), 322 | is_less, 323 | ); 324 | } 325 | } else { 326 | unsafe { 327 | let merge_state = BranchlessMergeState::new_disjoint( 328 | left.take_disjoint().0, 329 | right.take_disjoint().0, 330 | ret.clone().upgrade().assume_uninit(), 331 | ); 332 | merge_state.finish_merge(is_less); 333 | } 334 | } 335 | 336 | unsafe { ret.upgrade().assume_init() } 337 | } 338 | 339 | /// Merges sorted runs left0, left1, right0, right1 into the destination. 340 | /// 341 | /// Should a panic occur all elements are moved into the destination.
342 | pub(crate) fn double_merge_into<'src, 'dst, BL0, BL1, BR0, BR1, BD, T, F: Cmp>( 343 | left0: MutSlice<'src, BL0, T, Init>, 344 | left1: MutSlice<'src, BL1, T, Init>, 345 | right0: MutSlice<'src, BR0, T, Init>, 346 | right1: MutSlice<'src, BR1, T, Init>, 347 | dest: MutSlice<'dst, BD, T, Uninit>, 348 | is_less: &mut F, 349 | ) -> MutSlice<'dst, BD, T, Init> { 350 | let ret = dest.weak(); 351 | let left_len = left0.len() + left1.len(); 352 | let (left_dest, right_dest) = dest.split_at(left_len).unwrap_abort(); 353 | // These merge states will ensure that in a panic we write all output to dest. 354 | let left_merge_state = BranchlessMergeState::new_disjoint(left0, left1, left_dest); 355 | let right_merge_state = BranchlessMergeState::new_disjoint(right0, right1, right_dest); 356 | left_merge_state.finish_merge_interleaved(right_merge_state, is_less); 357 | unsafe { ret.upgrade().assume_init() } 358 | } 359 | -------------------------------------------------------------------------------- /src/pivot_selection.rs: -------------------------------------------------------------------------------- 1 | use crate::mut_slice::states::AlwaysInit; 2 | use crate::mut_slice::MutSlice; 3 | use crate::util::*; 4 | 5 | /// Selects a pivot from left, right. 6 | #[inline] 7 | pub fn select_pivot<'l, 'r, BL, BR, T, F: Cmp>( 8 | left: MutSlice<'l, BL, T, AlwaysInit>, 9 | right: MutSlice<'r, BR, T, AlwaysInit>, 10 | is_less: &mut F, 11 | ) -> *mut T { 12 | unsafe { 13 | // We use unsafe code and raw pointers here because we're dealing with 14 | // two non-contiguous buffers and heavy recursion. Passing safe slices 15 | // around would involve a lot of branches and function call overhead. 16 | let left = left.raw(); 17 | let right = right.raw(); 18 | 19 | // Get a, b, c as the start of three regions of size n / 8, avoiding the 20 | // boundary in the elements. 21 | let n = left.len() + right.len(); 22 | let a = if left.len() >= n / 8 { 23 | left.begin() 24 | } else { 25 | right.begin() 26 | }; 27 | let b = if left.len() >= n / 2 { 28 | left.begin().add(n / 2).sub(n / 8) 29 | } else { 30 | right.end().sub(n / 2) 31 | }; 32 | let c = if right.len() >= n / 8 { 33 | right.end().sub(n / 8) 34 | } else { 35 | left.end().sub(n / 8) 36 | }; 37 | 38 | if n < crate::PSEUDO_MEDIAN_REC_THRESHOLD { 39 | median3(a, b, c, is_less) 40 | } else { 41 | median3_rec(a, b, c, n / 8, is_less) 42 | } 43 | } 44 | } 45 | 46 | /// Calculates an approximate median of 3 elements from sections a, b, c, or recursively from an 47 | /// approximation of each, if they're large enough. By dividing the size of each section by 8 when 48 | /// recursing we have logarithmic recursion depth and overall sample from 49 | /// f(n) = 3*f(n/8) -> f(n) = O(n^(log(3)/log(8))) ~= O(n^0.528) elements. 50 | /// 51 | /// SAFETY: a, b, c must point to the start of initialized regions of memory of 52 | /// at least n elements. 53 | #[cold] 54 | pub unsafe fn median3_rec>( 55 | mut a: *mut T, 56 | mut b: *mut T, 57 | mut c: *mut T, 58 | n: usize, 59 | is_less: &mut F, 60 | ) -> *mut T { 61 | unsafe { 62 | if n * 8 >= crate::PSEUDO_MEDIAN_REC_THRESHOLD { 63 | let n8 = n / 8; 64 | a = median3_rec(a, a.add(n8 * 4), a.add(n8 * 7), n8, is_less); 65 | b = median3_rec(b, b.add(n8 * 4), b.add(n8 * 7), n8, is_less); 66 | c = median3_rec(c, c.add(n8 * 4), c.add(n8 * 7), n8, is_less); 67 | } 68 | median3(a, b, c, is_less) 69 | } 70 | } 71 | 72 | /// Calculates the median of 3 elements. 73 | /// 74 | /// SAFETY: a, b, c must be valid initialized elements. 
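// Illustrative trace (not part of the original comment, assuming is_less is the
// usual < on integers): for (*a, *b, *c) = (3, 1, 2) we get x = (3 < 1) = false
// and y = (3 < 2) = false, so x == y and we compute z = (1 < 2) = true, returning
// select(z ^ x, c, b) = c, the element 2, which is indeed the median of {3, 1, 2}.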
75 | #[inline(always)] 76 | unsafe fn median3>(a: *mut T, b: *mut T, c: *mut T, is_less: &mut F) -> *mut T { 77 | // Compiler tends to make this branchless when sensible, and avoids the 78 | // third comparison when not. 79 | unsafe { 80 | let x = is_less(&*a, &*b); 81 | let y = is_less(&*a, &*c); 82 | if x == y { 83 | // If x=y=0 then b, c <= a. In this case we want to return max(b, c). 84 | // If x=y=1 then a < b, c. In this case we want to return min(b, c). 85 | // By toggling the outcome of b < c using XOR x we get this behavior. 86 | let z = is_less(&*b, &*c); 87 | select(z ^ x, c, b) 88 | } else { 89 | // Either c <= a < b or b <= a < c, thus a is our median. 90 | a 91 | } 92 | } 93 | } 94 | -------------------------------------------------------------------------------- /src/powersort.rs: -------------------------------------------------------------------------------- 1 | // Nearly-Optimal Mergesorts: Fast, Practical Sorting Methods That Optimally 2 | // Adapt to Existing Runs by J. Ian Munro and Sebastian Wild. 3 | // 4 | // This method forms a binary merge tree, where each internal node corresponds 5 | // to a splitting point between the adjacent runs that have to be merged. If we 6 | // visualize our array as the number line from 0 to 1, we want to find the 7 | // dyadic fraction with smallest denominator that lies between the midpoints of 8 | // our to-be-merged slices. The exponent in the dyadic fraction indicates the 9 | // desired depth in the binary merge tree this internal node wishes to have. 10 | // This does not always correspond to the actual depth due to the inherent 11 | // imbalance in runs, but we follow it as closely as possible. 12 | // 13 | // As an optimization we rescale the number line from [0, 1) to [0, 2^62). Then 14 | // finding the simplest dyadic fraction between midpoints corresponds to finding 15 | // the most significant bit difference of the midpoints. We save scale_factor = 16 | // ceil(2^62 / n) to perform this rescaling using a multiplication, avoiding 17 | // having to repeatedly do integer divides. This rescaling isn't exact when n is 18 | // not a power of two since we use integers and not reals, but the result is 19 | // very close, and in fact when n < 2^30 the resulting tree is equivalent as the 20 | // approximation errors stay entirely in the lower order bits. 21 | // 22 | // Thus for the splitting point between two adjacent slices [a, b) and [b, c) 23 | // the desired depth of the corresponding merge node is CLZ((a+b)*f ^ (b+c)*f), 24 | // where CLZ counts the number of leading zeros in an integer and f is our scale 25 | // factor. Note that we omitted the division by two in the midpoint 26 | // calculations, as this simply shifts the bits by one position (and thus always 27 | // adds one to the result), and we only care about the relative depths. 28 | // 29 | // It is important that for any three slices [a, b), [b, c), [c, d) with 30 | // a < b < c < d we have that tree_depth([a, b), [b, c)) != tree_depth([b, c), 31 | // [c, d)), as this would break our implicit tree structure and potentially our 32 | // log2(n) stack size limit. This is proven in the original paper, but our 33 | // approximation complicates things. Let x, y, z respectively be (a+b)*f, 34 | // (b+c)*f, (c+d)*f, then what we wish to prove is CLZ(x ^ y) != CLZ(y ^ z). 35 | // 36 | // Because a < c we have x < y, and similarly we have y < z. 
Since x < y we can 37 | // conclude that CLZ(x ^ y) will be determined by a bit position where x is 0 38 | // and y is 1, as it is the most significant bit difference. Apply a similar 39 | // logic for y and z and you would conclude that it is determined by a bit 40 | // position where y is 0 and z is 1. Thus looking at y it can't be the same bit 41 | // position in both cases, and we must have CLZ(x ^ y) != CLZ(y ^ z). 42 | // 43 | // Finally, if we try to upper bound z giving z = ceil(2^62 / n) * (n-1 + n) then 44 | // z < (2^62 / n + 1) * 2n 45 | // z < 2^63 + 2n 46 | // So as long as n < 2^62 we find that z < 2^64, meaning our operations do not 47 | // overflow. 48 | pub fn merge_tree_scale_factor(n: usize) -> u64 { 49 | ((1 << 62) + n as u64 - 1) / n as u64 50 | } 51 | 52 | pub fn merge_tree_depth(left: usize, mid: usize, right: usize, scale_factor: u64) -> u8 { 53 | let x = left as u64 + mid as u64; 54 | let y = mid as u64 + right as u64; 55 | ((scale_factor * x) ^ (scale_factor * y)).leading_zeros() as u8 56 | } 57 | -------------------------------------------------------------------------------- /src/small_sort.rs: -------------------------------------------------------------------------------- 1 | use crate::branchless_merge::BranchlessMergeState; 2 | use crate::mut_slice::states::{AlwaysInit, Init, MaybeInit, Uninit, Weak}; 3 | use crate::mut_slice::{with_stack_scratch, Brand, MutSlice, Unbranded}; 4 | use crate::tracking::ptr; 5 | use crate::util::*; 6 | 7 | pub fn small_sort<'l, B: Brand, T, F: Cmp>(el: MutSlice<'l, B, T, AlwaysInit>, is_less: &mut F) { 8 | block_insertion_sort(el, is_less) 9 | } 10 | 11 | /// Sorts four elements from src into dst. 12 | /// 13 | /// SAFETY: src and dst may not overlap. 14 | #[inline(always)] 15 | pub unsafe fn sort4_raw>(srcp: *mut T, dstp: *mut T, is_less: &mut F) { 16 | unsafe { 17 | // Stably create two pairs a <= b and c <= d. 18 | let c1 = is_less(&*srcp.add(1), &*srcp) as usize; 19 | let c2 = is_less(&*srcp.add(3), &*srcp.add(2)) as usize; 20 | let a = srcp.add(c1); 21 | let b = srcp.add(c1 ^ 1); 22 | let c = srcp.add(2 + c2); 23 | let d = srcp.add(2 + (c2 ^ 1)); 24 | 25 | // Compare (a, c) and (b, d) to identify max/min. We're left with two 26 | // unknown elements, but because we are a stable sort we must know which 27 | // one is leftmost and which one is rightmost. 28 | // c3, c4 | min max unk_left unk_right 29 | // 0, 0 | a d b c 30 | // 0, 1 | a b c d 31 | // 1, 0 | c d a b 32 | // 1, 1 | c b a d 33 | let c3 = is_less(&*c, &*a); 34 | let c4 = is_less(&*d, &*b); 35 | let min = select(c3, c, a); 36 | let max = select(c4, b, d); 37 | let unk_left = select(c3, a, select(c4, c, b)); 38 | let unk_right = select(c4, d, select(c3, b, c)); 39 | 40 | // Sort the last two unknown elements. 41 | let c5 = is_less(&*unk_right, &*unk_left); 42 | let lo = select(c5, unk_right, unk_left); 43 | let hi = select(c5, unk_left, unk_right); 44 | 45 | ptr::copy_nonoverlapping(min, dstp, 1); 46 | ptr::copy_nonoverlapping(lo, dstp.add(1), 1); 47 | ptr::copy_nonoverlapping(hi, dstp.add(2), 1); 48 | ptr::copy_nonoverlapping(max, dstp.add(3), 1); 49 | } 50 | } 51 | 52 | /// A helper struct for creating sorts of small 2^n sized arrays. It ensures 53 | /// that if the comparison operator panics all elements are moved back to the 54 | /// original src location. 
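// Illustrative dataflow (summarizing the sortN_into helpers further below, not
// part of the original comment): sort8_into writes two sorted groups of four
// from src into scratch with sort_groups_of_four_from_src_to_dst and then merges
// them straight into the destination with final_merge_from_dst_into, while
// sort16_into and sort32_into insert double_merge_from_src_to_dst rounds that
// ping-pong between two regions of the scratch buffer before the final merge
// into dst.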
55 | struct Pow2SmallSort<'l, BR, T> { 56 | orig_src: MutSlice<'l, BR, T, Weak>, 57 | cur_src: MutSlice<'l, Unbranded, T, Init>, 58 | cur_dst: MutSlice<'l, Unbranded, T, Uninit>, 59 | } 60 | 61 | impl<'l, BR, T> Pow2SmallSort<'l, BR, T> { 62 | pub fn new(src: MutSlice<'l, BR, T, Init>, dst: MutSlice<'l, BD, T, Uninit>) -> Self { 63 | assert_abort(src.len() == dst.len()); 64 | Self { 65 | orig_src: src.weak(), 66 | cur_src: src.forget_brand(), 67 | cur_dst: dst.forget_brand(), 68 | } 69 | } 70 | } 71 | 72 | impl<'l, BR, T> Pow2SmallSort<'l, BR, T> { 73 | // Set a new output destination. The current source must be exhausted. 74 | #[inline] 75 | pub fn set_new_dst(&mut self, new_dst: MutSlice<'l, BND, T, Uninit>) { 76 | unsafe { 77 | // SAFETY: because the old source location is empty, the old 78 | // destination must now be filled with initialized elements. 79 | assert_abort(self.cur_src.len() == 0); 80 | let mut old_dst = core::mem::replace(&mut self.cur_dst, new_dst.forget_brand()); 81 | old_dst.sub_begin(self.orig_src.len()); 82 | self.cur_src = old_dst.assume_init(); 83 | } 84 | } 85 | 86 | // Swap the source and destination. The current source must be exhausted. 87 | #[inline] 88 | pub fn swap_src_dst(&mut self) { 89 | unsafe { 90 | // SAFETY: because the old source location is empty, the old 91 | // destination must now be filled with initialized elements. 92 | assert_abort(self.cur_src.len() == 0); 93 | let mut new_dst = self.cur_src.weak().upgrade().assume_uninit(); 94 | new_dst.sub_begin(self.orig_src.len()); 95 | self.set_new_dst(new_dst); 96 | } 97 | } 98 | 99 | // Sort N/4 four-element arrays into N/4 four-element arrays from src into dst. 100 | #[inline] 101 | pub fn sort_groups_of_four_from_src_to_dst>( 102 | &mut self, 103 | is_less: &mut F, 104 | ) { 105 | unsafe { 106 | assert_abort(N % 4 == 0 && N == self.cur_src.len()); 107 | for _ in 0..N / 4 { 108 | sort4_raw(self.cur_src.begin(), self.cur_dst.begin(), is_less); 109 | self.cur_src.add_begin(4); 110 | self.cur_dst.add_begin(4); 111 | } 112 | } 113 | } 114 | 115 | /// Merge two k-element arrays into one 2k-element array from src into dst. 116 | pub fn final_merge_from_dst_into<'dst: 'l, const N: usize, BD, F: Cmp>( 117 | mut self, 118 | dst: MutSlice<'dst, BD, T, Uninit>, 119 | is_less: &mut F, 120 | ) -> MutSlice<'dst, BD, T, Init> { 121 | unsafe { 122 | assert_abort(N % 4 == 0 && dst.len() == N); 123 | let k = N / 2; 124 | let ret = dst.weak(); 125 | self.set_new_dst(dst); 126 | 127 | // The BranchlessMergeState will ensure things are moved to dest should 128 | // a panic occur, after which our drop handler will move it back to 129 | // orig_src. 130 | let backup_src = self.cur_src.weak(); 131 | let backup_dst = self.cur_dst.weak(); 132 | let left = self.cur_src.split_off_begin(k); 133 | let right = self.cur_src.split_off_begin(k); 134 | let dst = self.cur_dst.split_off_begin(2 * k); 135 | let mut merge_state = BranchlessMergeState::new_disjoint(left, right, dst); 136 | 137 | if T::may_call_ord_on_copy() { 138 | for _ in 0..k { 139 | merge_state.branchless_merge_one_at_begin(is_less); 140 | merge_state.branchless_merge_one_at_end(is_less); 141 | } 142 | 143 | if !merge_state.symmetric_merge_successful() { 144 | // Bad comparison operator, just copy over input. 
145 | ptr::copy(backup_src.upgrade().begin(), backup_dst.begin(), N); 146 | } 147 | } else { 148 | for _ in 0..k / 2 { 149 | merge_state.branchless_merge_one_at_begin(is_less); 150 | merge_state.branchless_merge_one_at_end(is_less); 151 | } 152 | for _ in 0..k / 2 { 153 | // For Copy types these could be unguarded. All memory accesses 154 | // are in-bounds regardless, without the guard we would however 155 | // call the comparison operator on copies we would forget. 156 | merge_state.branchless_merge_one_at_begin_imbalance_guarded(is_less); 157 | merge_state.branchless_merge_one_at_end_imbalance_guarded(is_less); 158 | } 159 | } 160 | 161 | // All elements are properly initialized in dst. 162 | core::mem::forget(merge_state); 163 | core::mem::forget(self); 164 | ret.upgrade().assume_init() 165 | } 166 | } 167 | 168 | /// Merge four k-element arrays into two k-element arrays from src into dst. 169 | #[inline(never)] 170 | pub fn double_merge_from_src_to_dst>(&mut self, is_less: &mut F) { 171 | unsafe { 172 | assert_abort(N % 8 == 0 && N <= self.cur_src.len()); 173 | let k = N / 4; 174 | 175 | // The BranchlessMergeState will ensure things are moved to dest should 176 | // a panic occur, after which our drop handler will move it back to 177 | // orig_src. 178 | let backup_src = self.cur_src.weak(); 179 | let backup_dst = self.cur_dst.weak(); 180 | let left0 = self.cur_src.split_off_begin(k); 181 | let left1 = self.cur_src.split_off_begin(k); 182 | let right0 = self.cur_src.split_off_begin(k); 183 | let right1 = self.cur_src.split_off_begin(k); 184 | let left_dst = self.cur_dst.split_off_begin(2 * k); 185 | let right_dst = self.cur_dst.split_off_begin(2 * k); 186 | let mut left_merge = BranchlessMergeState::new_disjoint(left0, left1, left_dst); 187 | let mut right_merge = BranchlessMergeState::new_disjoint(right0, right1, right_dst); 188 | 189 | if T::may_call_ord_on_copy() { 190 | for _ in 0..k { 191 | left_merge.branchless_merge_one_at_begin(is_less); 192 | right_merge.branchless_merge_one_at_begin(is_less); 193 | left_merge.branchless_merge_one_at_end(is_less); 194 | right_merge.branchless_merge_one_at_end(is_less); 195 | } 196 | 197 | if !left_merge.symmetric_merge_successful() 198 | || !right_merge.symmetric_merge_successful() 199 | { 200 | // Bad comparison operator, just copy over input. 201 | ptr::copy(backup_src.upgrade().begin(), backup_dst.begin(), N); 202 | } 203 | } else { 204 | for _ in 0..k / 2 { 205 | left_merge.branchless_merge_one_at_begin(is_less); 206 | right_merge.branchless_merge_one_at_begin(is_less); 207 | left_merge.branchless_merge_one_at_end(is_less); 208 | right_merge.branchless_merge_one_at_end(is_less); 209 | } 210 | for _ in 0..k / 2 { 211 | // For Copy types these could be unguarded. All memory accesses 212 | // are in-bounds regardless, without the guard we would however 213 | // call the comparison operator on copies we would forget. 214 | left_merge.branchless_merge_one_at_begin_imbalance_guarded(is_less); 215 | right_merge.branchless_merge_one_at_begin_imbalance_guarded(is_less); 216 | left_merge.branchless_merge_one_at_end_imbalance_guarded(is_less); 217 | right_merge.branchless_merge_one_at_end_imbalance_guarded(is_less); 218 | } 219 | } 220 | 221 | // Merging fully done. 222 | core::mem::forget(left_merge); 223 | core::mem::forget(right_merge); 224 | } 225 | } 226 | } 227 | 228 | impl<'l, BR, T> Drop for Pow2SmallSort<'l, BR, T> { 229 | #[cold] 230 | fn drop(&mut self) { 231 | unsafe { 232 | // Put all elements back in orig_src. 
233 | let num_in_src = self.cur_src.len(); 234 | let num_in_dst = self.orig_src.len() - num_in_src; 235 | ptr::copy(self.cur_src.begin(), self.orig_src.begin(), num_in_src); 236 | ptr::copy( 237 | self.cur_dst.begin().sub(num_in_dst), 238 | self.orig_src.begin().add(num_in_src), 239 | num_in_dst, 240 | ); 241 | } 242 | } 243 | } 244 | 245 | fn sort4_into<'src, 'dst, 'tmp, BS: Brand, BD: Brand, BT: Brand, T, F: Cmp>( 246 | src: MutSlice<'src, BS, T, Init>, 247 | dst: MutSlice<'dst, BD, T, Weak>, 248 | scratch: MutSlice<'tmp, BT, T, Uninit>, 249 | is_less: &mut F, 250 | ) -> MutSlice<'dst, BD, T, Init> { 251 | unsafe { 252 | sort4_raw(src.begin(), scratch.begin(), is_less); 253 | core::mem::forget(src); 254 | scratch 255 | .assume_init() 256 | .move_to(dst.upgrade().assume_uninit()) 257 | .1 258 | } 259 | } 260 | 261 | fn sort8_into<'src, 'dst, 'tmp, BS: Brand, BD: Brand, BT: Brand, T, F: Cmp>( 262 | src: MutSlice<'src, BS, T, Init>, 263 | dst: MutSlice<'dst, BD, T, Weak>, 264 | scratch: MutSlice<'tmp, BT, T, Uninit>, 265 | is_less: &mut F, 266 | ) -> MutSlice<'dst, BD, T, Init> { 267 | let mut sort = Pow2SmallSort::new(src, scratch); 268 | sort.sort_groups_of_four_from_src_to_dst::<8, F>(is_less); 269 | let dst = unsafe { dst.upgrade().assume_uninit() }; 270 | sort.final_merge_from_dst_into::<8, BD, F>(dst, is_less) 271 | } 272 | 273 | fn sort16_into<'src, 'dst, 'tmp, BS: Brand, BD: Brand, BT: Brand, T, F: Cmp>( 274 | src: MutSlice<'src, BS, T, Init>, 275 | dst: MutSlice<'dst, BD, T, Weak>, 276 | scratch: MutSlice<'tmp, BT, T, Uninit>, 277 | is_less: &mut F, 278 | ) -> MutSlice<'dst, BD, T, Init> { 279 | let (scratch0, scratch1) = scratch.split_at(16).unwrap_abort(); 280 | let mut sort = Pow2SmallSort::new(src, scratch0); 281 | sort.sort_groups_of_four_from_src_to_dst::<16, F>(is_less); 282 | sort.set_new_dst(scratch1); 283 | sort.double_merge_from_src_to_dst::<16, F>(is_less); 284 | let dst = unsafe { dst.upgrade().assume_uninit() }; 285 | sort.final_merge_from_dst_into::<16, BD, F>(dst, is_less) 286 | } 287 | 288 | fn sort32_into<'src, 'dst, 'tmp, BS: Brand, BD: Brand, BT: Brand, T, F: Cmp>( 289 | src: MutSlice<'src, BS, T, Init>, 290 | dst: MutSlice<'dst, BD, T, Weak>, 291 | scratch: MutSlice<'tmp, BT, T, Uninit>, 292 | is_less: &mut F, 293 | ) -> MutSlice<'dst, BD, T, Init> { 294 | let (scratch0, scratch1) = scratch.split_at(32).unwrap_abort(); 295 | let mut sort = Pow2SmallSort::new(src, scratch0); 296 | sort.sort_groups_of_four_from_src_to_dst::<32, F>(is_less); 297 | sort.set_new_dst(scratch1); 298 | sort.double_merge_from_src_to_dst::<16, F>(is_less); 299 | sort.double_merge_from_src_to_dst::<16, F>(is_less); 300 | sort.swap_src_dst(); 301 | sort.double_merge_from_src_to_dst::<32, F>(is_less); 302 | let dst = unsafe { dst.upgrade().assume_uninit() }; 303 | sort.final_merge_from_dst_into::<32, BD, F>(dst, is_less) 304 | } 305 | 306 | // A helper function that inserts sorted run src into dst, dst_hole, where the 307 | // hole is initially directly after dst. On a panic the hole is closed. 
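// Illustrative example (not part of the original comment): with dst = [1, 4, 6],
// a two-element hole directly after it and src = [3, 5], insert() walks src from
// the back: 6 is shifted right past 5 and 5 is placed, then 4 is shifted right
// past 3 and 3 is placed, leaving [1, 3, 4, 5, 6] with the hole consumed.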
308 | struct BlockInserter<'l, BS, BD, T> { 309 | src: MutSlice<'l, BS, T, Init>, 310 | dst: MutSlice<'l, BD, T, MaybeInit>, 311 | hole_begin: *mut T, 312 | } 313 | 314 | impl<'l, BS, BD: Brand, T> BlockInserter<'l, BS, BD, T> { 315 | fn new( 316 | src: MutSlice<'l, BS, T, Init>, 317 | dst: MutSlice<'l, BD, T, Init>, 318 | dst_hole: MutSlice<'l, BD, T, Uninit>, 319 | ) -> Self { 320 | assert_abort(src.len() == dst_hole.len()); 321 | let hole_begin = dst_hole.begin(); 322 | let dst = dst.allow_maybe_init().concat(dst_hole.allow_maybe_init()); 323 | Self { 324 | src, 325 | dst, 326 | hole_begin, 327 | } 328 | } 329 | 330 | fn insert>(mut self, is_less: &mut F) { 331 | unsafe { 332 | while self.src.len() > 0 { 333 | let p = self.src.end().sub(1); 334 | while self.hole_begin != self.dst.begin() && is_less(&*p, &*self.hole_begin.sub(1)) 335 | { 336 | self.hole_begin = self.hole_begin.sub(1); 337 | self.dst.sub_end(1); 338 | ptr::copy_nonoverlapping(self.hole_begin, self.dst.end(), 1); 339 | } 340 | 341 | self.src.sub_end(1); 342 | self.dst.sub_end(1); 343 | ptr::copy_nonoverlapping(p, self.dst.end(), 1); 344 | } 345 | 346 | core::mem::forget(self); 347 | } 348 | } 349 | } 350 | 351 | impl<'l, BS, BD, T> Drop for BlockInserter<'l, BS, BD, T> { 352 | #[inline(never)] 353 | #[cold] 354 | fn drop(&mut self) { 355 | unsafe { 356 | ptr::copy_nonoverlapping(self.src.begin(), self.hole_begin, self.src.len()); 357 | } 358 | } 359 | } 360 | 361 | fn partial_sort_into<'src, 'dst, BS: Brand, BD: Brand, T, F: Cmp>( 362 | mut src: MutSlice<'src, BS, T, Init>, 363 | mut dst: MutSlice<'dst, BD, T, Weak>, 364 | is_less: &mut F, 365 | ) -> MutSlice<'dst, BD, T, Init> { 366 | with_stack_scratch::<64, T, _, _>("partial-sort-into-scratch", |mut scratch| { 367 | let n = src.len(); 368 | assert_abort(dst.len() >= n.min(32)); 369 | if n >= 32 { 370 | sort32_into( 371 | src.split_off_begin(32), 372 | dst.split_off_begin(32), 373 | scratch, 374 | is_less, 375 | ) 376 | } else if n >= 16 { 377 | sort16_into( 378 | src.split_off_begin(16), 379 | dst.split_off_begin(16), 380 | scratch.split_off_begin(32), // Yes, 32. 
381 | is_less, 382 | ) 383 | } else if n >= 8 { 384 | sort8_into( 385 | src.split_off_begin(8), 386 | dst.split_off_begin(8), 387 | scratch.split_off_begin(8), 388 | is_less, 389 | ) 390 | } else if n >= 4 { 391 | sort4_into( 392 | src.split_off_begin(4), 393 | dst.split_off_begin(4), 394 | scratch.split_off_begin(4), 395 | is_less, 396 | ) 397 | } else if n >= 2 { 398 | unsafe { 399 | let first = src.begin(); 400 | let second = src.begin().add(1); 401 | if is_less(&*second, &*first) { 402 | ptr::swap_nonoverlapping(first, second, 1); 403 | } 404 | ptr::copy(first, dst.begin(), 1); 405 | ptr::copy(second, dst.begin().add(1), 1); 406 | dst.split_off_begin(2).upgrade().assume_init() 407 | } 408 | } else { 409 | unsafe { 410 | ptr::copy(src.begin(), dst.begin(), 1); 411 | dst.split_off_begin(1).upgrade().assume_init() 412 | } 413 | } 414 | }) 415 | } 416 | 417 | pub fn block_insertion_sort<'l, B: Brand, T, F: Cmp>( 418 | mut el: MutSlice<'l, B, T, AlwaysInit>, 419 | is_less: &mut F, 420 | ) { 421 | let n = el.len(); 422 | if n <= 1 { 423 | return; 424 | } 425 | 426 | unsafe { 427 | let el_weak = el.weak(); 428 | let n = el.len(); 429 | let mut num_sorted = partial_sort_into(el.borrow().raw(), el_weak, is_less).len(); 430 | 431 | with_stack_scratch::<32, T, _, _>("block-insertion-sort-scratch", |scratch| { 432 | while num_sorted < n { 433 | let (sorted, unsorted) = el.borrow().split_at(num_sorted).unwrap_abort(); 434 | let mut unsorted_weak = unsorted.weak(); 435 | let in_scratch = partial_sort_into(unsorted.raw(), scratch.weak(), is_less); 436 | num_sorted += in_scratch.len(); 437 | let gap = unsorted_weak.split_off_begin(in_scratch.len()); 438 | BlockInserter::new(in_scratch, sorted.raw(), gap.upgrade().assume_uninit()) 439 | .insert(is_less); 440 | } 441 | }); 442 | } 443 | } 444 | -------------------------------------------------------------------------------- /src/stable_quicksort.rs: -------------------------------------------------------------------------------- 1 | /* 2 | Bidirectional partitioning. 3 | 4 | We assume we always have a contiguous scratch space equal in size to our 5 | total amount of elements in each recursive call. Similarly we assume we have 6 | a destination space of equal size. Our input logically consists of a single 7 | slice of elements, although physically it can consist of two slices. We will 8 | call the first slice left, and the second slice right. 9 | 10 | Our input slices can overlap with either scratch or destination, but the 11 | left slice if it overlaps must start at the beginning, and the right slice 12 | must end at the end of the overlap. 13 | 14 | In this scenario (which is the usual starting scenario) both left and right 15 | overlap the destination (address space goes from left-to-right, vertically 16 | aligned arrays overlap): 17 | 18 | [ scratch ] 19 | [ destination ] 20 | [ left ][ right ] 21 | 22 | We start scanning through our logical input slice from both ends: forwards 23 | from the start, backwards from the end. Initially that means left becomes 24 | forward and right becomes backward, but if either left or right is fully 25 | consumed the remaining slice will be split up equally into a new forwards 26 | and backwards pair of slices. 27 | 28 | In the forward scan we will put elements that are less than the pivot in the 29 | destination, and those that are greater than or equal in the scratch, both 30 | at the start. 
In the backwards scan we do the exact opposite and put 31 | elements that are less than the pivot in the scratch and those that are 32 | greater or equal in the destination, both at the end. Note that this also 33 | means the aforementioned overlaps are fine, as we will read each element 34 | before it gets overwritten. 35 | 36 | The above example might look like this after partitioning: 37 | 38 | [ >= | scratch | < ] 39 | [ c' | d' | a' | b' ] 40 | [ < | dst | >= ] 41 | [ a | b | c | d ] 42 | 43 | We have marked some regions with letters. It is intended regardless of ASCII 44 | art accuracy, that a is as large as a', etc. Note that 45 | 46 | |a| + |b'| + |c'| + |d| = |a| + |b| + |c| + |d| = n, 47 | 48 | where n is the total number of elements. a and c' were produced by the 49 | forward scan, b' and d were produced by the backwards scan. Then if 50 | 51 | partition(left, right, dest, scratch) 52 | 53 | is our type signature, our two recursive calls are: 54 | 55 | partition(a, b', concat(a, b), concat(a', b')) 56 | partition(c', d, concat(c, d), concat(c', d')) 57 | 58 | Note that this maintains the invariant that both destination and scratch 59 | space equal left, right in total size. Note that it also places all elements 60 | smaller than the pivot before those greater or equal. When our recursive 61 | call total size becomes too small we simply copy left, right into the 62 | destination and use a dedicated small sorting routine. 63 | */ 64 | 65 | use core::mem::ManuallyDrop; 66 | 67 | use crate::gap_guard::GapGuard; 68 | use crate::mut_slice::states::{AlwaysInit, Init, Uninit, Weak}; 69 | use crate::mut_slice::{Brand, MutSlice, Unbranded}; 70 | use crate::pivot_selection::select_pivot; 71 | use crate::small_sort; 72 | use crate::tracking; 73 | use crate::tracking::ptr; 74 | use crate::util::*; 75 | 76 | struct BidirPartitionState<'l, BD: Brand, BS: Brand, T> { 77 | // The elements that still have to be scanned. 78 | forward_scan: MutSlice<'l, Unbranded, T, Init>, 79 | backward_scan: MutSlice<'l, Unbranded, T, Init>, 80 | 81 | // Our destination and scratch output slices. 82 | dest: MutSlice<'l, BD, T, Weak>, 83 | scratch: MutSlice<'l, BS, T, Weak>, 84 | 85 | // To get the most optimal loop body we use this weird representation where 86 | // dest.begin().add(num_at_dest_begin) is our dest write head and 87 | // scratch_forwards_cursor.sub(num_at_dest_begin) is our scratch write head. 88 | // This allows us to unconditionally increment scratch_forwards_cursor each 89 | // iteration, instead of subtracting the negation of our conditional. 90 | num_at_dest_begin: usize, 91 | scratch_forwards_cursor: *mut T, 92 | 93 | // Similarly, dest_backwards_cursor.add(num_at_scratch_end).sub(1) and 94 | // scratch.end().sub(num_at_scratch_end + 1) are our dest write tail and 95 | // scratch write tail respectively. 96 | // Yes, this extra subtraction in the loop is faster than pre-decrementing 97 | // once, likely due to LLVM loop bounds proving. 
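    // A simplified sketch of one forward partitioning step under this
    // representation (illustrative only; dest_head and scratch_head stand for the
    // expressions described above, see partition_one_forward for the real code,
    // which also special-cases large T with a single select()ed copy):
    //
    //     let less = is_less(&*scan, &*pivot) as usize;         // 0 or 1
    //     ptr::copy(scan, dest_head, 1);                        // speculative write
    //     ptr::copy(scan, scratch_head, 1);                     // speculative write
    //     num_at_dest_begin += less;                            // dest head advances by `less`
    //     scratch_forwards_cursor = scratch_forwards_cursor.add(1); // scratch head by 1 - `less`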
98 | num_at_scratch_end: usize, 99 | dest_backwards_cursor: *mut T, 100 | } 101 | 102 | impl<'l, BD: Brand, BS: Brand, T> BidirPartitionState<'l, BD, BS, T> { 103 | pub fn new( 104 | left: MutSlice<'l, BL, T, Init>, 105 | right: MutSlice<'l, BR, T, Init>, 106 | dest: MutSlice<'l, BD, T, Weak>, 107 | scratch: MutSlice<'l, BS, T, Weak>, 108 | ) -> Self { 109 | Self { 110 | forward_scan: left.forget_brand(), 111 | backward_scan: right.forget_brand(), 112 | num_at_dest_begin: 0, 113 | scratch_forwards_cursor: scratch.begin(), 114 | num_at_scratch_end: 0, 115 | dest_backwards_cursor: dest.end(), 116 | dest, 117 | scratch, 118 | } 119 | } 120 | 121 | /// Take ownership of the output slices, in ascending order. 122 | pub fn take( 123 | &mut self, 124 | ) -> ( 125 | MutSlice<'l, BD, T, Init>, 126 | MutSlice<'l, BS, T, Init>, 127 | MutSlice<'l, BS, T, Init>, 128 | MutSlice<'l, BD, T, Init>, 129 | ) { 130 | unsafe { 131 | let less_in_dest = self.dest.split_off_begin(self.num_at_dest_begin); 132 | let scratch_end_ptr = self.scratch_forwards_cursor.sub(self.num_at_dest_begin); 133 | let num_in_scratch = scratch_end_ptr.offset_from(self.scratch.begin()) as usize; 134 | let geq_in_scratch = self.scratch.split_off_begin(num_in_scratch); 135 | self.num_at_dest_begin = 0; 136 | self.scratch_forwards_cursor = self.scratch.begin(); 137 | 138 | let dest_begin_ptr = self.dest_backwards_cursor.add(self.num_at_scratch_end); 139 | let num_in_dest = self.dest.end().offset_from(dest_begin_ptr) as usize; 140 | let geq_in_dest = self.dest.split_off_end(num_in_dest); 141 | let less_in_scratch = self.scratch.split_off_end(self.num_at_scratch_end); 142 | self.num_at_scratch_end = 0; 143 | self.dest_backwards_cursor = self.dest.end(); 144 | 145 | ( 146 | less_in_dest.upgrade().assume_init(), 147 | less_in_scratch.upgrade().assume_init(), 148 | geq_in_scratch.upgrade().assume_init(), 149 | geq_in_dest.upgrade().assume_init(), 150 | ) 151 | } 152 | } 153 | 154 | /// Partitions one element using our forward scan. 155 | /// 156 | /// SAFETY: self.forward_scan may not be empty. 157 | #[inline] 158 | unsafe fn partition_one_forward>( 159 | &mut self, 160 | pivot: *mut T, 161 | is_less: &mut F, 162 | ) -> *mut T { 163 | unsafe { 164 | let scan = self.forward_scan.begin(); 165 | let less_than_pivot = is_less(&*scan, &*pivot); 166 | let dest_out = self.dest.begin().add(self.num_at_dest_begin); 167 | let scratch_out = self.scratch_forwards_cursor.sub(self.num_at_dest_begin); 168 | let out = select(less_than_pivot, dest_out, scratch_out); 169 | if core::mem::size_of::() <= core::mem::size_of::() 170 | && !cfg!(feature = "tracking") 171 | { 172 | // We'll overwrite a bad answer in a later iteration anyway. 173 | // Or not, in which case we'll never read it. 174 | ptr::copy(scan, dest_out, 1); 175 | ptr::copy(scan, scratch_out, 1); 176 | } else { 177 | ptr::copy(scan, out, 1); 178 | } 179 | self.num_at_dest_begin += less_than_pivot as usize; 180 | self.scratch_forwards_cursor = self.scratch_forwards_cursor.add(1); 181 | self.forward_scan.add_begin(1); 182 | out 183 | } 184 | } 185 | 186 | /// Partitions one element using our backward scan. 187 | /// 188 | /// SAFETY: self.backward_scan may not be empty. 
189 | #[inline] 190 | unsafe fn partition_one_backward>( 191 | &mut self, 192 | pivot: *mut T, 193 | is_less: &mut F, 194 | ) -> *mut T { 195 | unsafe { 196 | let scan = self.backward_scan.end().sub(1); 197 | let less_than_pivot = is_less(&*scan, &*pivot); 198 | let dest_out = self 199 | .dest_backwards_cursor 200 | .add(self.num_at_scratch_end) 201 | .sub(1); 202 | let scratch_out = self.scratch.end().sub(self.num_at_scratch_end + 1); 203 | let out = select(less_than_pivot, scratch_out, dest_out); 204 | if core::mem::size_of::() <= core::mem::size_of::() 205 | && !cfg!(feature = "tracking") 206 | { 207 | // We'll overwrite a bad answer in a later iteration anyway. 208 | // Or not, in which case we'll never read it. 209 | ptr::copy(scan, dest_out, 1); 210 | ptr::copy(scan, scratch_out, 1); 211 | } else { 212 | ptr::copy(scan, out, 1); 213 | } 214 | self.num_at_scratch_end += less_than_pivot as usize; 215 | self.dest_backwards_cursor = self.dest_backwards_cursor.sub(1); 216 | self.backward_scan.sub_end(1); 217 | out 218 | } 219 | } 220 | 221 | /// Partitions forwards and backwards n times. 222 | /// 223 | /// SAFETY: self.forward_scan and self.backward_scan must be at least n 224 | /// elements long, and respectively their prefix and suffix of n elements 225 | /// may not contain pivot_pos. 226 | unsafe fn partition_bidir_n>( 227 | &mut self, 228 | pivot_pos: *mut T, 229 | n: usize, 230 | is_less: &mut F, 231 | ) { 232 | // In case of a panic we must write back the pivot, which we move into a 233 | // local to prove to the compiler that our writes don't overwrite the 234 | // pivot, and thus it does not need to be reloaded. 235 | struct WriteBackPivot { 236 | local_pivot: ManuallyDrop, 237 | pivot_pos: *mut T, 238 | } 239 | 240 | impl Drop for WriteBackPivot { 241 | fn drop(&mut self) { 242 | unsafe { 243 | ptr::copy_nonoverlapping(&mut *self.local_pivot, self.pivot_pos, 1); 244 | tracking::track_copy(&mut *self.local_pivot, self.pivot_pos, 1); 245 | tracking::deregister_buffer("pivot"); 246 | } 247 | } 248 | } 249 | 250 | unsafe { 251 | let mut guard = WriteBackPivot { 252 | local_pivot: ManuallyDrop::new(ptr::read(pivot_pos)), 253 | pivot_pos, 254 | }; 255 | let local_pivot_ptr = &mut *guard.local_pivot as *mut T; 256 | tracking::register_buffer( 257 | "pivot", 258 | MutSlice::::from_pair_unchecked( 259 | local_pivot_ptr, 260 | local_pivot_ptr.add(1), 261 | ), 262 | ); 263 | tracking::track_copy(pivot_pos, local_pivot_ptr, 1); 264 | for _ in 0..n / 4 { 265 | self.partition_one_forward(local_pivot_ptr, is_less); 266 | self.partition_one_backward(local_pivot_ptr, is_less); 267 | self.partition_one_forward(local_pivot_ptr, is_less); 268 | self.partition_one_backward(local_pivot_ptr, is_less); 269 | self.partition_one_forward(local_pivot_ptr, is_less); 270 | self.partition_one_backward(local_pivot_ptr, is_less); 271 | self.partition_one_forward(local_pivot_ptr, is_less); 272 | self.partition_one_backward(local_pivot_ptr, is_less); 273 | } 274 | for _ in 0..n % 4 { 275 | self.partition_one_forward(local_pivot_ptr, is_less); 276 | self.partition_one_backward(local_pivot_ptr, is_less); 277 | } 278 | } 279 | } 280 | 281 | /// Fully partitions the data. 282 | /// 283 | /// SAFETY: pivot_pos must point to a valid object. 
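    // Rough shape of the loop below: each iteration computes how far both
    // scans can go without their heads touching pivot_pos, bulk-partitions
    // that many elements on each side, and then either returns (both scans
    // exhausted), partitions the pivot element itself once a scan head has
    // reached it, or rebalances by splitting whichever side still has elements
    // left into a fresh forward and backward half.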
284 |     pub unsafe fn partition_bidir<F: Cmp<T>>( 285 |         &mut self, 286 |         mut pivot_pos: *mut T, 287 |         is_less: &mut F, 288 |     ) -> *mut T { 289 |         unsafe { 290 |             loop { 291 |                 // Our main causes of worry are that the pivot (depending on how 292 |                 // it's chosen) is in our array, and that the forward and backward 293 |                 // scans may not be of the same size. We would also like to keep 294 |                 // track of the position of our left ancestor pivot. So we 295 |                 // compute how far we can safely scan. 296 |                 let mut forward_limit = self.forward_scan.len(); 297 |                 if self.forward_scan.contains(pivot_pos) { 298 |                     forward_limit = pivot_pos.offset_from(self.forward_scan.begin()) as usize; 299 |                 } 300 | 301 |                 let mut backward_limit = self.backward_scan.len(); 302 |                 if self.backward_scan.contains(pivot_pos) { 303 |                     backward_limit = self.backward_scan.end().offset_from(pivot_pos) as usize - 1; 304 |                 } 305 | 306 |                 // We found how far we can safely scan on both sides, so do that. 307 |                 let limit = forward_limit.min(backward_limit); 308 |                 self.partition_bidir_n(pivot_pos, limit, is_less); 309 | 310 |                 // We could be done, or hit one of our limits. 311 |                 if self.forward_scan.len() == 0 && self.backward_scan.len() == 0 { 312 |                     return pivot_pos; 313 |                 } else if self.forward_scan.len() > 0 && self.forward_scan.begin() == pivot_pos { 314 |                     pivot_pos = self.partition_one_forward(pivot_pos, is_less) 315 |                 } else if self.backward_scan.len() > 0 316 |                     && self.backward_scan.end().sub(1) == pivot_pos 317 |                 { 318 |                     pivot_pos = self.partition_one_backward(pivot_pos, is_less) 319 |                 } else if self.forward_scan.len() == 0 { 320 |                     // Handle odd input sizes. 321 |                     if self.backward_scan.len() % 2 > 0 { 322 |                         self.partition_one_backward(pivot_pos, is_less); 323 |                     } 324 |                     self.forward_scan = self 325 |                         .backward_scan 326 |                         .split_off_begin(self.backward_scan.len() / 2); 327 |                 } else { 328 |                     // Handle odd input sizes. 329 |                     if self.forward_scan.len() % 2 > 0 { 330 |                         self.partition_one_forward(pivot_pos, is_less); 331 |                     } 332 |                     self.backward_scan = 333 |                         self.forward_scan.split_off_end(self.forward_scan.len() / 2); 334 |                 } 335 |             } 336 |         } 337 |     } 338 | } 339 | 340 | impl<'l, BD: Brand, BS: Brand, T> Drop for BidirPartitionState<'l, BD, BS, T> { 341 |     fn drop(&mut self) { 342 |         unsafe { 343 |             // Make sure all elements are moved into the destination. 344 |             // We should only run this upon panicking. 345 |             let (_dest_less, scratch_less, scratch_geq, _dest_geq) = self.take(); 346 |             let fwd = self.dest.split_off_begin(self.forward_scan.len()); 347 |             let less = self.dest.split_off_begin(scratch_less.len()); 348 |             let geq = self.dest.split_off_begin(scratch_geq.len()); 349 |             let bck = self.dest.split_off_begin(self.backward_scan.len()); 350 |             assert_abort(self.dest.len() == 0); 351 |             // Our scans can overlap the destination, so we should copy them 352 |             // first lest we overwrite them by accident.
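            // The copies below fill this middle region of the destination as
            // [ fwd | less | geq | bck ]. Together with the `< pivot` prefix
            // and `>= pivot` suffix already in place, every element then lives
            // in the destination exactly once, which is all that is needed for
            // panic safety.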
353 | ptr::copy(self.forward_scan.begin(), fwd.begin(), fwd.len()); 354 | ptr::copy(self.backward_scan.begin(), bck.begin(), bck.len()); 355 | ptr::copy(scratch_less.begin(), less.begin(), less.len()); 356 | ptr::copy(scratch_geq.begin(), geq.begin(), geq.len()); 357 | } 358 | } 359 | } 360 | 361 | enum PartitionStrategy { 362 | LeftWithPivot(*mut T), 363 | LeftIfNewPivotEquals(*mut T), 364 | LeftIfNewPivotEqualsCopy(T), 365 | RightWithNewPivot, 366 | } 367 | 368 | unsafe fn stable_bidir_quicksort_into< 369 | 'l, 370 | BL: Brand, 371 | BR: Brand, 372 | BD: Brand, 373 | BS: Brand, 374 | T, 375 | F: Cmp, 376 | >( 377 | left: MutSlice<'l, BL, T, Init>, 378 | right: MutSlice<'l, BR, T, Init>, 379 | dest: MutSlice<'l, BD, T, Weak>, 380 | scratch: MutSlice<'l, BS, T, Weak>, 381 | partition_strategy: PartitionStrategy, 382 | recursion_limit: usize, 383 | is_less: &mut F, 384 | ) { 385 | let n = left.len() + right.len(); 386 | assert_abort(dest.len() == scratch.len()); 387 | assert_abort(dest.len() == n); 388 | 389 | if n < crate::SMALL_SORT || recursion_limit == 0 { 390 | unsafe { 391 | let (left_dest, right_dest) = dest.clone().split_at(left.len()).unwrap_abort(); 392 | if left.begin() != dest.begin() { 393 | left.move_to(left_dest.upgrade().assume_uninit()); 394 | } 395 | if right.begin() != right_dest.begin() { 396 | right.move_to(right_dest.upgrade().assume_uninit()); 397 | } 398 | let data = dest.upgrade().assume_init().always_init(); 399 | 400 | if n < crate::SMALL_SORT { 401 | small_sort::small_sort(data, is_less); 402 | } else { 403 | crate::glidesort::glidesort(data, scratch.upgrade().assume_uninit(), is_less, true); 404 | } 405 | return; 406 | } 407 | } 408 | 409 | // Load left/right into state first in case pivot selection panics as our guard. 410 | let mut state = BidirPartitionState::new(left, right, dest.clone(), scratch.clone()); 411 | let mut pivot_pos = if let PartitionStrategy::LeftWithPivot(p) = partition_strategy { 412 | p 413 | } else { 414 | select_pivot( 415 | state.forward_scan.borrow(), 416 | state.backward_scan.borrow(), 417 | is_less, 418 | ) 419 | }; 420 | let partition_left = match partition_strategy { 421 | PartitionStrategy::LeftWithPivot(_) => true, 422 | PartitionStrategy::LeftIfNewPivotEquals(p) => unsafe { !is_less(&*p, &*pivot_pos) }, 423 | PartitionStrategy::LeftIfNewPivotEqualsCopy(ref p) => unsafe { !is_less(p, &*pivot_pos) }, 424 | PartitionStrategy::RightWithNewPivot => false, 425 | }; 426 | 427 | unsafe { 428 | pivot_pos = if partition_left { 429 | state.partition_bidir(pivot_pos, &mut cmp_from_closure(|a, b| !is_less(b, a))) 430 | } else { 431 | state.partition_bidir(pivot_pos, is_less) 432 | }; 433 | } 434 | let (less_in_dest, less_in_scratch, geq_in_scratch, geq_in_dest) = state.take(); 435 | core::mem::forget(state); 436 | 437 | // Compute recursive slices, and construct panic safety gap guards. 
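    // In terms of the layout described at the top of this file: the `< pivot`
    // recursion gets the first less_n elements of dest as its destination and
    // the last less_n elements of scratch as its scratch, while the `>= pivot`
    // recursion gets the last geq_n elements of dest and the first geq_n
    // elements of scratch. The gap guards pair the elements currently parked
    // in scratch with the not-yet-filled gaps of those destinations, so they
    // can still be moved into the destination if a comparison panics.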
438 | let less_n = less_in_dest.len() + less_in_scratch.len(); 439 | let geq_n = geq_in_dest.len() + geq_in_scratch.len(); 440 | let (less_rec_dest, geq_rec_dest) = dest.clone().split_at(less_n).unwrap_abort(); 441 | let (geq_rec_scratch, less_rec_scratch) = scratch.split_at(geq_n).unwrap_abort(); 442 | 443 | let (less_gap, geq_gap) = { 444 | let (_less_in_dest, less_gap) = less_rec_dest 445 | .clone() 446 | .split_at(less_in_dest.len()) 447 | .unwrap_abort(); 448 | let (geq_gap, _geq_in_dest) = geq_rec_dest 449 | .clone() 450 | .split_at_end(geq_in_dest.len()) 451 | .unwrap_abort(); 452 | unsafe { 453 | ( 454 | less_gap.upgrade().assume_uninit(), 455 | geq_gap.upgrade().assume_uninit(), 456 | ) 457 | } 458 | }; 459 | let less_in_scratch_guard = GapGuard::new_disjoint(less_in_scratch, less_gap); 460 | let geq_in_scratch_guard = GapGuard::new_disjoint(geq_in_scratch, geq_gap); 461 | 462 | // Both recursive calls are small, we can use this to overlap two slightly 463 | // overly large small sorts for faster smallsort sizes. 464 | if less_n < crate::SMALL_SORT && geq_n < crate::SMALL_SORT { 465 | let mut dest = unsafe { 466 | drop(less_in_scratch_guard); 467 | drop(geq_in_scratch_guard); 468 | dest.upgrade().assume_init().always_init() 469 | }; 470 | 471 | if less_n <= 32 && less_n & 0b1000 > 0 { 472 | // Round up lower 3 bits. 473 | let round = (less_n + 0b111) & !0b111; 474 | small_sort::small_sort(dest.borrow().split_at(round).unwrap_abort().0, is_less); 475 | } else { 476 | small_sort::small_sort(dest.borrow().split_at(less_n).unwrap_abort().0, is_less); 477 | } 478 | 479 | if geq_n <= 32 && geq_n & 0b1000 > 0 { 480 | // Round up lower 3 bits. 481 | let round = (geq_n + 0b111) & !0b111; 482 | small_sort::small_sort(dest.borrow().split_at_end(round).unwrap_abort().1, is_less); 483 | } else { 484 | small_sort::small_sort(dest.borrow().split_at_end(geq_n).unwrap_abort().1, is_less); 485 | } 486 | return; 487 | } 488 | 489 | if less_n == 0 && !partition_left { 490 | unsafe { 491 | stable_bidir_quicksort_into( 492 | geq_in_scratch_guard.take_data(), 493 | geq_in_dest, 494 | geq_rec_dest, 495 | geq_rec_scratch, 496 | PartitionStrategy::LeftWithPivot(pivot_pos), 497 | recursion_limit - 1, 498 | is_less, 499 | ); 500 | } 501 | return; 502 | } 503 | 504 | unsafe { 505 | // This ensures the two recursive calls are completely independent, for potential parallelism, even if the 506 | // comparison operator is invalid. 
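        // Strategy selection below: if we just partitioned right and T is a
        // plain Copy type, the `>= pivot` recursion gets its own copy of the
        // pivot and will partition left if its newly selected pivot compares
        // equal, grouping runs of equal elements much like pattern-defeating
        // quicksort. If T is not Copy but the pivot element itself ended up in
        // the `>= pivot` half, its address is passed instead; otherwise we
        // fall back to a regular right partition with a fresh pivot.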
507 | let pivot_in_geq = 508 | geq_in_scratch_guard.data_weak().contains(pivot_pos) || geq_in_dest.contains(pivot_pos); 509 | let less_strategy = 510 | if let PartitionStrategy::LeftIfNewPivotEqualsCopy(p) = partition_strategy { 511 | PartitionStrategy::LeftIfNewPivotEqualsCopy(p) 512 | } else { 513 | PartitionStrategy::RightWithNewPivot 514 | }; 515 | 516 | let geq_strategy = if !partition_left && T::is_copy_type() { 517 | PartitionStrategy::LeftIfNewPivotEqualsCopy(ptr::read(pivot_pos)) 518 | } else if !partition_left && pivot_in_geq { 519 | PartitionStrategy::LeftIfNewPivotEquals(pivot_pos) 520 | } else { 521 | PartitionStrategy::RightWithNewPivot 522 | }; 523 | 524 | if !partition_left { 525 | stable_bidir_quicksort_into( 526 | less_in_dest, 527 | less_in_scratch_guard.take_data(), 528 | less_rec_dest, 529 | less_rec_scratch, 530 | less_strategy, 531 | recursion_limit - 1, 532 | is_less, 533 | ); 534 | } 535 | 536 | stable_bidir_quicksort_into( 537 | geq_in_scratch_guard.take_data(), 538 | geq_in_dest, 539 | geq_rec_dest, 540 | geq_rec_scratch, 541 | geq_strategy, 542 | recursion_limit - 1, 543 | is_less, 544 | ); 545 | } 546 | } 547 | 548 | pub fn quicksort<'el, 'sc, BE: Brand, BS: Brand, T, F: Cmp>( 549 | el: MutSlice<'el, BE, T, AlwaysInit>, 550 | scratch: MutSlice<'sc, BS, T, Uninit>, 551 | is_less: &mut F, 552 | ) -> MutSlice<'el, BE, T, AlwaysInit> { 553 | unsafe { 554 | let n = el.len(); 555 | let logn = core::mem::size_of::() * 8 - n.leading_zeros() as usize; 556 | let scratch = scratch.split_at(n).unwrap_abort().0; 557 | let dest = el.weak(); 558 | let (left, right) = el.raw().split_at(n / 2).unwrap_abort(); 559 | stable_bidir_quicksort_into( 560 | left, 561 | right, 562 | dest.clone(), 563 | scratch.weak(), 564 | PartitionStrategy::RightWithNewPivot, 565 | 2 * logn, 566 | is_less, 567 | ); 568 | dest.upgrade().assume_init().always_init() 569 | } 570 | } 571 | -------------------------------------------------------------------------------- /src/tracking.rs: -------------------------------------------------------------------------------- 1 | #![allow(dead_code)] 2 | 3 | use crate::mut_slice::states::Weak; 4 | use crate::mut_slice::MutSlice; 5 | 6 | #[derive(Copy, Clone, Debug)] 7 | pub struct Location { 8 | pub buffer: &'static str, 9 | pub idx: usize, 10 | } 11 | 12 | #[derive(Copy, Clone, Debug)] 13 | pub enum Operation { 14 | Move { from: Location, to: Location }, 15 | 16 | Swap { a: Location, b: Location }, 17 | 18 | Compare { lhs: Location, rhs: Location }, 19 | } 20 | 21 | #[cfg(feature = "tracking")] 22 | mod tracking_impl { 23 | use std::collections::{HashMap, HashSet}; 24 | use std::sync::Mutex; 25 | 26 | use super::*; 27 | 28 | #[derive(Default)] 29 | struct TrackingRegister { 30 | known_buffers: HashMap<&'static str, (usize, usize)>, 31 | registered_reads: HashSet, 32 | registered_writes: HashSet, 33 | ops: Vec, 34 | } 35 | 36 | impl TrackingRegister { 37 | pub fn register_buffer<'l, B, T>( 38 | &mut self, 39 | name: &'static str, 40 | buf: MutSlice<'l, B, T, Weak>, 41 | ) { 42 | let old = self 43 | .known_buffers 44 | .insert(name, (buf.begin_address(), buf.end_address())); 45 | assert!(old.is_none(), "duplicate buffer {name}"); 46 | } 47 | 48 | pub fn deregister_buffer(&mut self, name: &'static str) -> (usize, usize) { 49 | let old = self.known_buffers.remove(name); 50 | assert!(old.is_some(), "unknown buffer {name}"); 51 | old.unwrap() 52 | } 53 | 54 | fn locate(&self, ptr: *const T) -> Option { 55 | let iptr = ptr as usize; 56 | for (buf, (begin, end)) in 
self.known_buffers.iter() { 57 | if (*begin..*end).contains(&iptr) { 58 | return Some(Location { 59 | buffer: *buf, 60 | idx: (iptr - begin) / std::mem::size_of::(), 61 | }); 62 | } 63 | } 64 | None 65 | } 66 | } 67 | 68 | lazy_static::lazy_static! { 69 | static ref TRACKING_REGISTER: Mutex = { 70 | Mutex::new(TrackingRegister::default()) 71 | }; 72 | } 73 | 74 | pub fn read_tracked_ops() -> Vec { 75 | let mut register = TRACKING_REGISTER.lock().unwrap(); 76 | assert!(register.registered_reads.len() == 0); 77 | assert!(register.registered_writes.len() == 0); 78 | assert!(register.known_buffers.len() == 0); 79 | core::mem::take(&mut register.ops) 80 | } 81 | 82 | pub fn register_buffer<'l, B, T>(name: &'static str, buf: MutSlice<'l, B, T, Weak>) { 83 | let mut register = TRACKING_REGISTER.lock().unwrap(); 84 | register.register_buffer(name, buf); 85 | } 86 | 87 | pub fn deregister_buffer(name: &'static str) { 88 | let mut register = TRACKING_REGISTER.lock().unwrap(); 89 | register.deregister_buffer(name); 90 | } 91 | 92 | pub fn register_cmp(left: *const T, right: *const T) { 93 | let mut register = TRACKING_REGISTER.lock().unwrap(); 94 | let lhs = register.locate(left).expect("unregistered cmp lhs"); 95 | let rhs = register.locate(right).expect("unregistered cmp rhs"); 96 | register.ops.push(Operation::Compare { lhs, rhs }) 97 | } 98 | 99 | pub fn track_copy(src: *const T, dst: *mut T, count: usize) { 100 | if count == 0 { 101 | return; 102 | } 103 | 104 | let mut register = TRACKING_REGISTER.lock().unwrap(); 105 | let from = register 106 | .locate(src) 107 | .expect("unregistered copy src destination"); 108 | let to = register 109 | .locate(dst) 110 | .expect("unregistered copy dst destination"); 111 | if (src..src.wrapping_add(count)).contains(&(dst as *const T)) { 112 | for i in (0..count).rev() { 113 | register.ops.push(Operation::Move { 114 | from: Location { 115 | idx: from.idx + i, 116 | buffer: from.buffer, 117 | }, 118 | to: Location { 119 | idx: to.idx + i, 120 | buffer: to.buffer, 121 | }, 122 | }); 123 | } 124 | } else { 125 | for i in 0..count { 126 | register.ops.push(Operation::Move { 127 | from: Location { 128 | idx: from.idx + i, 129 | buffer: from.buffer, 130 | }, 131 | to: Location { 132 | idx: to.idx + i, 133 | buffer: to.buffer, 134 | }, 135 | }); 136 | } 137 | } 138 | } 139 | 140 | pub fn track_swap_nonoverlapping(src: *const T, dst: *mut T, count: usize) { 141 | if count == 0 { 142 | return; 143 | } 144 | 145 | let mut register = TRACKING_REGISTER.lock().unwrap(); 146 | let from = register 147 | .locate(src) 148 | .expect("unregistered copy src destination"); 149 | let to = register 150 | .locate(dst) 151 | .expect("unregistered copy dst destination"); 152 | for i in 0..count { 153 | register.ops.push(Operation::Swap { 154 | a: Location { 155 | idx: from.idx + i, 156 | buffer: from.buffer, 157 | }, 158 | b: Location { 159 | idx: to.idx + i, 160 | buffer: to.buffer, 161 | }, 162 | }); 163 | } 164 | } 165 | } 166 | 167 | /// Dummy implementation. 
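// With the `tracking` feature disabled the stubs below are empty #[inline]
// functions and `ptr` further down is simply re-exported from core::ptr, so
// the tracking hooks cost nothing in ordinary builds.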
168 | #[cfg(not(feature = "tracking"))] 169 | #[allow(dead_code)] 170 | mod tracking_impl { 171 | use super::*; 172 | 173 | #[inline] 174 | pub fn register_cmp(_left: *const T, _right: *const T) {} 175 | #[inline] 176 | pub fn register_buffer<'l, B, T>(_name: &'static str, _sl: MutSlice<'l, B, T, Weak>) {} 177 | #[inline] 178 | pub fn deregister_buffer(_name: &'static str) {} 179 | #[inline] 180 | pub fn track_copy(_src: *const T, _dst: *mut T, _count: usize) {} 181 | #[inline] 182 | pub fn track_swap_nonoverlapping(_a: *const T, _b: *mut T, _count: usize) {} 183 | } 184 | 185 | #[cfg(not(feature = "tracking"))] 186 | pub(crate) use core::ptr; 187 | 188 | #[cfg(feature = "tracking")] 189 | pub use tracking_impl::read_tracked_ops; 190 | pub(crate) use tracking_impl::{deregister_buffer, register_buffer, register_cmp, track_copy}; 191 | 192 | #[cfg(feature = "tracking")] 193 | pub(crate) mod ptr { 194 | use core::ptr as cptr; 195 | 196 | pub use cptr::{read, write}; 197 | 198 | #[inline] 199 | pub unsafe fn swap_nonoverlapping(a: *mut T, b: *mut T, count: usize) { 200 | super::tracking_impl::track_swap_nonoverlapping(a, b, count); 201 | unsafe { cptr::swap_nonoverlapping(a, b, count) } 202 | } 203 | 204 | #[inline] 205 | pub unsafe fn copy_nonoverlapping(src: *const T, dst: *mut T, count: usize) { 206 | super::tracking_impl::track_copy(src, dst, count); 207 | unsafe { cptr::copy_nonoverlapping(src, dst, count) } 208 | } 209 | 210 | #[inline] 211 | pub unsafe fn copy(src: *const T, dst: *mut T, count: usize) { 212 | super::tracking_impl::track_copy(src, dst, count); 213 | unsafe { cptr::copy(src, dst, count) } 214 | } 215 | } 216 | -------------------------------------------------------------------------------- /src/util.rs: -------------------------------------------------------------------------------- 1 | use core::mem::MaybeUninit; 2 | 3 | /// Trait alias for comparison functions. 4 | pub trait Cmp: FnMut(&T, &T) -> bool {} 5 | impl bool> Cmp for F {} 6 | 7 | /// Helper function for the compiler to infer a closure as Cmp. 8 | #[inline] 9 | pub fn cmp_from_closure(f: F) -> F 10 | where 11 | F: FnMut(&T, &T) -> bool, 12 | { 13 | f 14 | } 15 | 16 | #[inline] 17 | pub fn select(cond: bool, if_true: *mut T, if_false: *mut T) -> *mut T { 18 | // let mut ret = if_false; 19 | // unsafe { 20 | // core::arch::asm! { 21 | // "test {cond}, {cond}", 22 | // "cmovnz {ret}, {if_true}", 23 | // cond = in(reg) (cond as usize), 24 | // if_true = in(reg) if_true, 25 | // ret = inlateout(reg) ret, 26 | // options(pure, nomem, nostack) 27 | // }; 28 | // } 29 | // ret 30 | 31 | // let mut res = if_false as usize; 32 | // cmov::cmovnz(cond as usize, if_true as usize, &mut res); 33 | // res as *mut T 34 | 35 | // let ab = [if_false, if_true]; 36 | // ab[cond as usize] 37 | 38 | // let tpi = if_true as usize; 39 | // let fpi = if_false as usize; 40 | 41 | // let xor = tpi ^ fpi; 42 | // let cond_mask = (-(cond as isize)) as usize; 43 | // let xor_if_true = xor & cond_mask; 44 | // return (fpi ^ xor_if_true) as *mut T; 45 | 46 | if cond { 47 | if_true 48 | } else { 49 | if_false 50 | } 51 | } 52 | 53 | #[inline] 54 | #[cold] 55 | pub fn abort() -> ! 
{ 56 | // panic!("abort called"); 57 | #[cfg(not(feature = "unstable"))] 58 | { 59 | std::process::abort(); 60 | } 61 | #[cfg(feature = "unstable")] 62 | { 63 | core::intrinsics::abort(); 64 | } 65 | // unsafe { std::hint::unreachable_unchecked() } 66 | } 67 | 68 | #[inline(always)] 69 | pub fn assert_abort(b: bool) { 70 | if !b { 71 | abort(); 72 | } 73 | } 74 | 75 | pub trait UnwrapAbort { 76 | type Inner; 77 | fn unwrap_abort(self) -> Self::Inner; 78 | } 79 | 80 | impl UnwrapAbort for Option { 81 | type Inner = T; 82 | 83 | #[inline] 84 | fn unwrap_abort(self) -> Self::Inner { 85 | if let Some(inner) = self { 86 | inner 87 | } else { 88 | abort() 89 | } 90 | } 91 | } 92 | 93 | impl UnwrapAbort for Result { 94 | type Inner = T; 95 | 96 | #[inline] 97 | fn unwrap_abort(self) -> Self::Inner { 98 | if let Ok(inner) = self { 99 | inner 100 | } else { 101 | abort() 102 | } 103 | } 104 | } 105 | 106 | // split_at_spare_mut not stabilized yet. 107 | #[inline] 108 | pub fn split_at_spare_mut(v: &mut Vec) -> (&mut [T], &mut [MaybeUninit]) { 109 | unsafe { 110 | let ptr = v.as_mut_ptr(); 111 | let len = v.len(); 112 | let spare_ptr = ptr.add(len); 113 | let spare_len = v.capacity() - len; 114 | 115 | // SAFETY: the two slices are non-overlapping, and both match their 116 | // initialization requirements. 117 | ( 118 | core::slice::from_raw_parts_mut(ptr, len), 119 | core::slice::from_raw_parts_mut(spare_ptr.cast::>(), spare_len), 120 | ) 121 | } 122 | } 123 | 124 | /// # Safety 125 | /// Only implemented for copy types. 126 | pub unsafe trait IsCopyType { 127 | fn is_copy_type() -> bool; 128 | } 129 | 130 | #[cfg(not(feature = "unstable"))] 131 | unsafe impl IsCopyType for T { 132 | fn is_copy_type() -> bool { 133 | false 134 | } 135 | } 136 | 137 | #[cfg(feature = "unstable")] 138 | unsafe impl IsCopyType for T { 139 | default fn is_copy_type() -> bool { 140 | false 141 | } 142 | } 143 | 144 | #[cfg(feature = "unstable")] 145 | unsafe impl IsCopyType for T { 146 | fn is_copy_type() -> bool { 147 | true 148 | } 149 | } 150 | 151 | /// # Safety 152 | /// Only implemented for types for which we may call Ord on (soon to be 153 | /// forgotten) copies, even if T isn't Copy. 154 | pub unsafe trait MayCallOrdOnCopy { 155 | fn may_call_ord_on_copy() -> bool; 156 | } 157 | 158 | #[cfg(not(feature = "unstable"))] 159 | unsafe impl MayCallOrdOnCopy for T { 160 | fn may_call_ord_on_copy() -> bool { 161 | false 162 | } 163 | } 164 | 165 | #[cfg(feature = "unstable")] 166 | unsafe impl MayCallOrdOnCopy for T { 167 | default fn may_call_ord_on_copy() -> bool { 168 | false 169 | } 170 | } 171 | 172 | #[cfg(feature = "unstable")] 173 | #[marker] 174 | unsafe trait SafeToCall {} 175 | 176 | #[cfg(feature = "unstable")] 177 | unsafe impl MayCallOrdOnCopy for T { 178 | fn may_call_ord_on_copy() -> bool { 179 | true 180 | } 181 | } 182 | 183 | #[cfg(feature = "unstable")] 184 | unsafe impl SafeToCall for T {} 185 | 186 | #[cfg(feature = "unstable")] 187 | unsafe impl SafeToCall for (T,) {} 188 | 189 | #[cfg(feature = "unstable")] 190 | unsafe impl SafeToCall for (T, U) {} 191 | 192 | #[cfg(feature = "unstable")] 193 | unsafe impl SafeToCall for (T, U, V) {} 194 | 195 | macro_rules! impl_safetocallord { 196 | ($($t:ty, )*) => { 197 | $( 198 | #[cfg(feature = "unstable")] 199 | unsafe impl SafeToCall for $t { } 200 | )* 201 | }; 202 | } 203 | 204 | impl_safetocallord!(String,); 205 | --------------------------------------------------------------------------------