├── .cargo └── config ├── .circleci └── config.yml ├── .github └── workflows │ └── rust.yml ├── .gitignore ├── Cargo.toml ├── LICENSE ├── README.md ├── benches └── bench.rs ├── examples └── 2d_delta_bench.rs ├── proptest-regressions ├── nibblepack_simd.txt └── nibblepacking.txt ├── rust-toolchain ├── rustfmt.toml ├── src ├── byteutils.rs ├── error.rs ├── filter.rs ├── histogram.rs ├── lib.rs ├── nibblepack_simd.rs ├── nibblepacking.rs ├── section.rs ├── sink.rs └── vector.rs └── vector_format.md /.cargo/config: -------------------------------------------------------------------------------- 1 | [build] 2 | rustflags = ["-C", "target-cpu=native", "-C", "target-feature=+avx2"] 3 | -------------------------------------------------------------------------------- /.circleci/config.yml: -------------------------------------------------------------------------------- 1 | version: 2 2 | 3 | jobs: 4 | build: 5 | docker: 6 | # The image used to build our project, build 7 | # your own using the Dockerfile provided below 8 | # and replace here. I put my own image here for 9 | # the example. 10 | # - image: abronan/rust-circleci:latest 11 | - image: liuchong/rustup:all 12 | 13 | environment: 14 | # Set your codecov token if your repository is private. 15 | #CODECOV_TOKEN: 16 | #TZ: "/usr/share/zoneinfo/Europe/Paris" 17 | 18 | steps: 19 | - checkout 20 | - restore_cache: 21 | key: project-cache 22 | # - run: 23 | # name: Check formatting 24 | # command: | 25 | # rustfmt --version 26 | # cargo fmt -- --write-mode=diff 27 | - run: 28 | name: Nightly Build 29 | command: | 30 | rustup run nightly rustc --version --verbose 31 | rustup run nightly cargo --version --verbose 32 | rustup run nightly cargo build --release 33 | # - run: 34 | # name: Stable Build 35 | # command: | 36 | # rustc --version --verbose 37 | # cargo --version --verbose 38 | # cargo build --release 39 | - run: 40 | name: Test 41 | command: rustup run nightly cargo test 42 | - save_cache: 43 | key: project-cache 44 | paths: 45 | - "~/.cargo" 46 | - "./target" -------------------------------------------------------------------------------- /.github/workflows/rust.yml: -------------------------------------------------------------------------------- 1 | name: Rust 2 | 3 | on: 4 | push: 5 | branches: [ main ] 6 | pull_request: 7 | branches: [ main ] 8 | 9 | env: 10 | CARGO_TERM_COLOR: always 11 | 12 | jobs: 13 | build: 14 | 15 | runs-on: ubuntu-latest 16 | 17 | steps: 18 | - uses: actions/checkout@v2 19 | - name: Build 20 | run: cargo build --verbose 21 | - name: Run tests 22 | run: cargo test --verbose 23 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | /target 2 | **/*.rs.bk 3 | /out.stacks 4 | /*.svg 5 | /Cargo.lock 6 | .DS_Store 7 | perf.data 8 | -------------------------------------------------------------------------------- /Cargo.toml: -------------------------------------------------------------------------------- 1 | [package] 2 | name = "compressed_vec" 3 | version = "0.1.1" 4 | authors = ["Evan Chan "] 5 | edition = "2021" 6 | description = "Floating point and integer compressed vector library, SIMD-enabled for fast processing/iteration over compressed representations." 
7 | license = "Apache-2.0" 8 | readme = "README.md" 9 | repository = "https://github.com/velvia/compressed-vec" 10 | categories = ["compression", "data-structures", "encoding"] 11 | keywords = ["compression", "data-structures", "simd", "columnar", "float"] 12 | 13 | [dependencies] 14 | memoffset = "0.6.3" 15 | plain = "0.2.3" 16 | scroll = { version = "0.11", features = ["derive"] } 17 | arrayref = "0.3.7" 18 | enum_dispatch = "0.3.12" 19 | num = "0.4.1" 20 | smallvec = "1.11.2" 21 | num_enum = "0.7.1" 22 | 23 | # TODO: put this behind a feature flag 24 | packed_simd = { version = "0.3.9", features = ["into_bits"] } 25 | 26 | [dev-dependencies] 27 | criterion = "0.3" 28 | proptest = "0.9.1" 29 | 30 | [[bench]] 31 | name = "bench" 32 | harness = false 33 | 34 | [profile.bench] 35 | opt-level = 3 36 | debug = true 37 | 38 | [profile.release] 39 | debug = true 40 | 41 | [lib] 42 | # NOTE: Type "cdylib" is for JVM integration; "lib" is needed for benchmarks 43 | crate-type = ["staticlib", "cdylib", "lib"] 44 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 
47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "{}" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright {yyyy} {name of copyright owner} 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 
202 | 203 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## compressed_vec 2 | 3 | [![crate](https://img.shields.io/crates/v/compressed_vec.svg)](https://crates.io/crates/compressed_vec) 4 | [![documentation](https://docs.rs/compressed_vec/badge.svg)](https://docs.rs/compressed_vec) 5 | [![CircleCI](https://circleci.com/gh/velvia/compressed-vec.svg?style=shield)](https://circleci.com/gh/velvia/compressed-vec) 6 | 7 | Floating point and integer compressed vector library, SIMD-enabled for fast processing/iteration over compressed representations. 8 | 9 | This is a *compressed vec* library, rather than a *compression* library. What does that mean? A compression library takes some uncompressed data and provides essentially compress() and decompress() functions. Typically you have to decompress data to be able to do anything with it, resulting in extra latency and allocations. 10 | 11 | On the other hand, this *compressed vec* library allows you to iterate over and process the compressed representations directly. It is designed to balance fast iteration and SIMD processing/filtering, while compressing vectors to within 2x of the best columnar compression technology such as Apache Parquet, using techniques such as delta and XOR encoding. Applications: 12 | 13 | * Database engines 14 | * Large in-memory data processing 15 | * Games and other apps needing fast access to large quantities of FP vectors/matrices 16 | 17 | ### Performance Numbers 18 | 19 | Numbers are from my laptop: 2.9 GHz Core i9, 6/12 cores, 12MB L3, AVX2; from running `cargo bench vector`, which benchmarks a filter-and-count-matches operation directly on encoded/compressed vectors. 20 | 21 | | Vector type(s) | Elements/sec | Raw GBs per sec | 22 | | -------------- | ------------ | --------------- | 23 | | u32 dense (no sparsity) | 1.7 Gelems/s | 6.8 GB/s | 24 | | u32 sparse (99% zeros) | 13.9 Gelems/s | 55.6 GB/s | 25 | | Two u32 vectors (sparse + dense)* | 1.3-5.2 Gelems/s | 5-20 GB/s | 26 | | u64 vector, dense | 955M - 1.1 Gelems/s | 7.6 - 9.1 GB/s | 27 | | f32, XOR encoded, 60% density | 985 Melems/s | 3.9 GB/s | 28 | 29 | * The filtering speed for two u32 vectors (using `MultiVectorFilter`) depends on the order of the vectors. It is faster to filter the sparse vector first. 30 | 31 | ### Creation, Iteration 32 | 33 | To create an f32 compressed vector: 34 | 35 | ```rust 36 | use compressed_vec::VectorF32XorAppender; 37 | let mut appender = VectorF32XorAppender::try_new(2048).unwrap(); 38 | let encoded_bytes = appender.encode_all(vec![1.0, 1.5, 2.0, 2.5]).unwrap(); 39 | ``` 40 | 41 | The simplest way to iterate on this compressed vector (note this does not allocate at all): 42 | 43 | ```rust 44 | use compressed_vec::VectorReader; 45 | let reader = VectorReader::<f32>::try_new(&encoded_bytes[..]).unwrap(); 46 | let sum = reader.iterate().sum::<f32>(); // Yay, no allocations! 47 | ``` 48 | 49 | ### Filtering and SIMD Processing 50 | 51 | `iterate()` is the easiest API to go through individual elements of the compressed vector, but it is not the fastest. Fast data processing, such as done in the filter-and-count benchmarks above, is performed using `Sink`s, which are defined in the `sink` module. Sinks operate on a SIMD word at a time, and the sink API is designed for inlining. 52 | 53 | For example, let's say that we want to add 2.5 to the f32 vector above, and then write out the results to a `Vec<f32>`.
Internally, XOR encoding and decoding is performed (using a sink). The sinks can be stacked during decoding, for an almost entirely SIMD pipeline: 54 | - `XorSink` (used automatically for f32 decoding) 55 | - `AddConstSink` (to add 2.5, again done using SIMD) 56 | - `VecSink` (writes output to a normal Vec) 57 | 58 | ```rust 59 | use compressed_vec::{VectorReader, AddConstSink, VecSink}; 60 | let reader = VectorReader::<f32>::try_new(&encoded_bytes[..]).unwrap(); 61 | let mut vecsink = VecSink::<f32>::new(); 62 | let mut addsink = AddConstSink::new(2.5f32, &mut vecsink); 63 | reader.decode_to_sink(&mut addsink).unwrap(); 64 | println!("And the transformed vector is: {:?}", vecsink.vec); 65 | ``` 66 | 67 | ### Vector Format 68 | 69 | Details of the vector format can be found [here](https://github.com/velvia/compressed-vec/blob/main/vector_format.md). 70 | 71 | The vector format follows columnar compression techniques used throughout the big data and database world, and roughly follows the Google [Procella](https://blog.acolyer.org/2019/09/11/procella/) paper with its custom Artus format: 72 | 73 | * Compression within 2x of ZSTD while operating directly on the data 74 | * Compression for this format is within 2x of Parquet, but is written to allow filtering and operating on the data directly without needing a separate decompression step for the entire vector 75 | * Multi-pass encoding 76 | * The `VectorAppender` collects min/max and other stats on the raw data and uses it to decide on the best encoding strategy (delta, etc.) 77 | * Exposing dictionary indices to query engine and aggressive pushdown to the data format 78 | * The format is designed to filter over dictionary codes, which speeds up filtering 79 | * The use of sections allows for many optimizations for filtering. For example, null sections and constant sections allow for very fast filter short-circuiting. 80 | 81 | ### Collaboration 82 | 83 | Please reach out to me to collaborate! -------------------------------------------------------------------------------- /benches/bench.rs: -------------------------------------------------------------------------------- 1 | #[macro_use] 2 | extern crate criterion; 3 | extern crate compressed_vec; 4 | 5 | use criterion::{Criterion, Benchmark, BenchmarkId, Throughput}; 6 | use compressed_vec::*; 7 | use compressed_vec::sink::{Sink, U32_256Sink}; 8 | use compressed_vec::section::{FixedSectReader, NibblePackMedFixedSect}; 9 | 10 | fn nibblepack8_varlen(c: &mut Criterion) { 11 | // This method from Criterion allows us to run benchmarks and vary some variable. 12 | // Thus we can discover how much effect the # of bits and # of nonzero elements make. 13 | // Two discoveries: 14 | // 1) Varying the # of nonzero elements matters, but not really the # of bits per element.
15 | // 2) For some odd reason, running just a single benchmark speeds it up significantly, running two slows down each one by half 16 | c.bench_function_over_inputs("nibblepack8 varying nonzeroes", |b, &&nonzeroes| { 17 | let mut inputbuf = [0u64; 8]; 18 | let mut buf = [0u8; 1024]; 19 | for i in 0..nonzeroes { 20 | inputbuf[i] = 0x1234u64 + i as u64; 21 | } 22 | b.iter(|| { 23 | nibblepacking::nibble_pack8(&inputbuf, &mut buf, 0).unwrap(); 24 | }) 25 | }, &[0, 2, 4, 6, 8]); 26 | } 27 | 28 | 29 | fn make_nonzeroes_u64x64(num_nonzeroes: usize) -> [u64; 64] { 30 | let mut inputs = [0u64; 64]; 31 | for i in 1..=num_nonzeroes { 32 | inputs[i] = (((i as f32) * std::f32::consts::PI / (num_nonzeroes as f32)).sin() * 1000.0) as u64 33 | } 34 | inputs 35 | } 36 | 37 | fn increasing_nonzeroes_u64x64(num_nonzeroes: usize) -> [u64; 64] { 38 | let mut inputs = make_nonzeroes_u64x64(num_nonzeroes); 39 | for i in 1..64 { 40 | inputs[i] = inputs[i - 1] + inputs[i]; 41 | } 42 | inputs 43 | } 44 | 45 | fn sinewave_varnonzeros_u32(fract_nonzeroes: f32, len: usize) -> Vec { 46 | let amp = 7.0 / fract_nonzeroes; 47 | let dist = 15.0 - amp; 48 | // Sinusoid between -1 and 1 with period of ~20 49 | (0..len).map(|i| ((i as f32) * std::f32::consts::PI / 10.0).sin()) 50 | // Change amplitude to 8/fract_nonzeroes; center so max is 16: 51 | // If value is negative, turn it into a zero 52 | .map(|i| { 53 | let new_value = i * amp + dist; 54 | if new_value >= 0.0 { new_value as u32 } else { 0 } 55 | }).collect() 56 | } 57 | 58 | // About 50% nonzeroes, but vary number of bits 59 | fn sinewave_varnumbits_u32(max_numbits: u8, len: usize) -> Vec { 60 | let amp = 2usize.pow(max_numbits as u32) - 1; 61 | // Sinusoid between -1 and 1 with period of ~20 62 | (0..len).map(|i| ((i as f32) * std::f32::consts::PI / 10.0).sin()) 63 | // If value is negative, turn it into a zero 64 | .map(|i| { 65 | let new_value = i * amp as f32; 66 | if new_value >= 0.0 { new_value as u32 } else { 0 } 67 | }).collect() 68 | } 69 | 70 | // Pack 64 u64's, variable number of them are 0 71 | fn pack_delta_u64s_varlen(c: &mut Criterion) { 72 | c.bench_function_over_inputs("pack delta u64s varying nonzero numbers", |b, &&nonzeroes| { 73 | let inputs = increasing_nonzeroes_u64x64(nonzeroes); 74 | let mut buf = [0u8; 1024]; 75 | b.iter(|| { 76 | nibblepacking::pack_u64_delta(&inputs, &mut buf).unwrap(); 77 | }) 78 | }, &[2, 4, 8, 16]); 79 | } 80 | 81 | fn unpack_delta_u64s(c: &mut Criterion) { 82 | c.bench_function("unpack delta u64s", |b| { 83 | let inputs = increasing_nonzeroes_u64x64(24); 84 | let mut buf = [0u8; 1024]; 85 | nibblepacking::pack_u64_delta(&inputs, &mut buf).unwrap(); 86 | 87 | let mut sink = nibblepacking::DeltaSink::new(); 88 | b.iter(|| { 89 | sink.reset(); 90 | nibblepacking::unpack(&buf[..], &mut sink, inputs.len()).unwrap(); 91 | }) 92 | }); 93 | } 94 | 95 | use section::FixedSectionWriter; 96 | 97 | fn section32_decode_dense_lowcard_varnonzeroes(c: &mut Criterion) { 98 | let mut group = c.benchmark_group("section u32 decode"); 99 | group.throughput(Throughput::Elements(256)); 100 | 101 | for nonzero_f in [0.05, 0.25, 0.5, 0.9, 1.0].iter() { 102 | let inputs = sinewave_varnonzeros_u32(*nonzero_f, 256); 103 | let mut buf = [0u8; 1024]; 104 | NibblePackMedFixedSect::::gen_stats_and_write(&mut buf, 0, &inputs[..]).unwrap(); 105 | 106 | group.bench_with_input(BenchmarkId::new("dense low card, nonzero%: ", *nonzero_f), &buf, 107 | |b, buf| b.iter(|| { 108 | let mut sink = U32_256Sink::new(); 109 | 
NibblePackMedFixedSect::::try_from(buf).unwrap().decode_to_sink(&mut sink).unwrap(); 110 | })); 111 | } 112 | } 113 | 114 | fn section32_decode_dense_varnumbits(c: &mut Criterion) { 115 | let mut group = c.benchmark_group("section u32 decode numbits"); 116 | group.throughput(Throughput::Elements(256)); 117 | 118 | for numbits in [4, 8, 12, 16, 20].iter() { 119 | let inputs = sinewave_varnumbits_u32(*numbits, 256); 120 | let mut buf = [0u8; 1024]; 121 | NibblePackMedFixedSect::::gen_stats_and_write(&mut buf, 0, &inputs[..]).unwrap(); 122 | 123 | group.bench_with_input(BenchmarkId::new("dense low card, numbits: ", *numbits), &buf, 124 | |b, buf| b.iter(|| { 125 | let mut sink = U32_256Sink::new(); 126 | NibblePackMedFixedSect::::try_from(buf).unwrap().decode_to_sink(&mut sink).unwrap(); 127 | })); 128 | } 129 | } 130 | 131 | const VECTOR_LENGTH: usize = 10000; 132 | 133 | fn dense_lowcard_vector() -> Vec { 134 | let inputs = sinewave_varnonzeros_u32(1.0, VECTOR_LENGTH); 135 | let mut appender = vector::VectorU32Appender::try_new(8192).unwrap(); 136 | appender.encode_all(inputs).unwrap() 137 | } 138 | 139 | fn dense_lowcard_u64_vector() -> Vec { 140 | let inputs = sinewave_varnonzeros_u32(1.0, VECTOR_LENGTH); 141 | let mut appender = vector::VectorU64Appender::try_new(8192).unwrap(); 142 | inputs.iter().for_each(|&a| appender.append(a as u64).unwrap()); 143 | appender.finish(VECTOR_LENGTH).unwrap() 144 | } 145 | 146 | fn dense_lowcard_f32_vector() -> Vec { 147 | let inputs = sinewave_varnonzeros_u32(0.6, VECTOR_LENGTH); 148 | let mut appender = vector::VectorF32XorAppender::try_new(VECTOR_LENGTH * 2).unwrap(); 149 | inputs.iter().for_each(|&a| appender.append(a as f32 / 1.4).unwrap()); 150 | appender.finish(VECTOR_LENGTH).unwrap() 151 | } 152 | 153 | fn dense_delta_u64_vector(delta: u64) -> Vec { 154 | let inputs = sinewave_varnonzeros_u32(1.0, VECTOR_LENGTH); 155 | let mut appender = vector::VectorU64Appender::try_new(8192).unwrap(); 156 | inputs.iter().for_each(|&a| appender.append(a as u64 + delta).unwrap()); 157 | appender.finish(VECTOR_LENGTH).unwrap() 158 | } 159 | 160 | fn sparse_lowcard_vector(num_nonzeroes: usize) -> Vec { 161 | let nonzeroes = sinewave_varnonzeros_u32(1.0, num_nonzeroes/2); 162 | let nulls = VECTOR_LENGTH - num_nonzeroes; 163 | 164 | let mut appender = vector::VectorU32Appender::try_new(8192).unwrap(); 165 | appender.append_nulls(nulls/4).unwrap(); 166 | nonzeroes.iter().for_each(|a| appender.append(*a).unwrap()); 167 | appender.append_nulls(nulls/2).unwrap(); 168 | nonzeroes.iter().for_each(|a| appender.append(*a).unwrap()); 169 | appender.finish(VECTOR_LENGTH).unwrap() 170 | } 171 | 172 | fn bench_filter_vect(c: &mut Criterion) { 173 | let mut group = c.benchmark_group("vector filtering"); 174 | group.throughput(Throughput::Elements(VECTOR_LENGTH as u64)); 175 | 176 | let dense_vect = dense_lowcard_vector(); 177 | let sparse_vect = sparse_lowcard_vector(100); 178 | let dense_reader = vector::VectorReader::::try_new(&dense_vect[..]).unwrap(); 179 | let sparse_reader = vector::VectorReader::::try_new(&sparse_vect[..]).unwrap(); 180 | // To verify composition of dense vector 181 | // dbg!(vector::VectorStats::new(&dense_reader)); 182 | // println!("Dense vector summary: {}", vector::VectorStats::new(&dense_reader).summary_string()); 183 | 184 | group.bench_function("lowcard u32", |b| b.iter(|| { 185 | let filter_iter = dense_reader.filter_iter(filter::EqualsSink::::new(&3)); 186 | filter::count_hits(filter_iter); 187 | })); 188 | group.bench_function("very sparse 
lowcard u32", |b| b.iter(|| { 189 | let filter_iter = sparse_reader.filter_iter(filter::EqualsSink::::new(&15)); 190 | filter::count_hits(filter_iter); 191 | })); 192 | group.bench_function("dense + sparse lowcard combo", |b| b.iter(|| { 193 | let dense_iter = dense_reader.filter_iter(filter::EqualsSink::::new(&3)); 194 | let sparse_iter = sparse_reader.filter_iter(filter::EqualsSink::::new(&15)); 195 | let filter_iter = filter::MultiVectorFilter::new(vec![dense_iter, sparse_iter]); 196 | filter::count_hits(filter_iter); 197 | })); 198 | group.bench_function("sparse + dense lowcard combo", |b| b.iter(|| { 199 | let dense_iter = dense_reader.filter_iter(filter::EqualsSink::::new(&3)); 200 | let sparse_iter = sparse_reader.filter_iter(filter::EqualsSink::::new(&15)); 201 | let filter_iter = filter::MultiVectorFilter::new(vec![sparse_iter, dense_iter]); 202 | filter::count_hits(filter_iter); 203 | })); 204 | 205 | group.finish(); 206 | } 207 | 208 | fn bench_filter_u64_vect(c: &mut Criterion) { 209 | let mut group = c.benchmark_group("u64 vector filtering"); 210 | group.throughput(Throughput::Elements(VECTOR_LENGTH as u64)); 211 | 212 | let dense_vect = dense_lowcard_u64_vector(); 213 | let dense_reader = vector::VectorReader::::try_new(&dense_vect[..]).unwrap(); 214 | 215 | let delta_vect = dense_delta_u64_vector(100_000u64); 216 | let delta_reader = vector::VectorReader::::try_new(&delta_vect[..]).unwrap(); 217 | 218 | group.bench_function("lowcard", |b| b.iter(|| { 219 | let filter_iter = dense_reader.filter_iter(filter::EqualsSink::::new(&3)); 220 | filter::count_hits(filter_iter); 221 | })); 222 | 223 | group.bench_function("delta lowcard", |b| b.iter(|| { 224 | let filter_iter = delta_reader.filter_iter(filter::EqualsSink::::new(&100_003)); 225 | filter::count_hits(filter_iter); 226 | })); 227 | 228 | group.finish(); 229 | } 230 | 231 | fn bench_filter_f32_vect(c: &mut Criterion) { 232 | let mut group = c.benchmark_group("f32 vector filtering"); 233 | group.throughput(Throughput::Elements(VECTOR_LENGTH as u64)); 234 | 235 | let dense_vect = dense_lowcard_f32_vector(); 236 | let dense_reader = vector::VectorReader::::try_new(&dense_vect[..]).unwrap(); 237 | 238 | group.bench_function("lowcard 60% density", |b| b.iter(|| { 239 | let filter_iter = dense_reader.filter_iter(filter::EqualsSink::::new(&3.0)); 240 | filter::count_hits(filter_iter); 241 | })); 242 | 243 | group.finish(); 244 | } 245 | 246 | const BATCH_SIZE: usize = 100; 247 | 248 | fn repack_2d_deltas(c: &mut Criterion) { 249 | c.bench("repack 2D diff deltas", 250 | Benchmark::new("100x u64", |b| { 251 | 252 | let orig = increasing_nonzeroes_u64x64(16); 253 | let mut inputs = [0u64; 64]; 254 | let mut srcbuf = [0u8; 1024]; 255 | for i in 0..BATCH_SIZE { 256 | for j in 0..orig.len() { 257 | inputs[j] = orig[j] + ((j + i) as u64); 258 | } 259 | nibblepacking::pack_u64_delta(&inputs, &mut srcbuf).unwrap(); 260 | } 261 | 262 | let mut out_buf = [0u8; 4096]; 263 | let mut sink = histogram::DeltaDiffPackSink::new(inputs.len(), &mut out_buf); 264 | 265 | b.iter(|| { 266 | // Reset out_buf and sink last_deltas state 267 | sink.reset(); 268 | let mut slice = &srcbuf[..]; 269 | 270 | for _ in 0..BATCH_SIZE { 271 | let res = nibblepacking::unpack(slice, &mut sink, 64); 272 | sink.finish(); 273 | slice = res.unwrap(); 274 | } 275 | }) 276 | }).throughput(Throughput::Elements(BATCH_SIZE as u64))); 277 | } 278 | 279 | criterion_group!(benches, //nibblepack8_varlen, 280 | pack_delta_u64s_varlen, 281 | unpack_delta_u64s, 282 | 
section32_decode_dense_lowcard_varnonzeroes, 283 | section32_decode_dense_varnumbits, 284 | bench_filter_vect, 285 | bench_filter_u64_vect, 286 | bench_filter_f32_vect, 287 | // repack_2d_deltas, 288 | ); 289 | criterion_main!(benches); 290 | -------------------------------------------------------------------------------- /examples/2d_delta_bench.rs: -------------------------------------------------------------------------------- 1 | use std::fs::File; 2 | use std::io::{BufRead, BufReader}; 3 | use std::time::Instant; 4 | use compressed_vec::{histogram, nibblepacking}; 5 | use compressed_vec::sink::Sink; 6 | 7 | /// 8 | /// 2d_delta_bench <> 9 | /// An example benchmark that reads histogram data from a file and repeatedly decodes delta-encoded 10 | /// histograms using DeltaDiffPackSink (ie re-encoding into 2D diff encoded histograms that aren't 11 | /// increasing any more) 12 | /// Each line of the file is CSV, no headers, and is expected to be Prom style (ie increasing bucket to bucket 13 | /// and increasing over time). 14 | /// 15 | /// NOTE: be sure to compile in release mode for benchmarking, ie cargo build --release --example 2d_delta_bench 16 | fn main() { 17 | const NUM_BUCKETS: usize = 64; 18 | const NUM_LOOPS: usize = 1000; 19 | 20 | let filename = std::env::args().nth(1).expect("No filename given"); 21 | let file = File::open(filename).unwrap(); 22 | let mut srcbuf = [0u8; 65536]; 23 | let mut num_lines = 0; 24 | let mut offset = 0; 25 | 26 | for line in BufReader::new(&file).lines() { 27 | // Split and trim lines, parsing into u64. Delta encode 28 | let mut last = 0u64; 29 | let line = line.expect("Could not parse line"); 30 | let num_iter = line.split(',') 31 | .map(|s| s.trim().parse::().unwrap()) 32 | .map(|n| { 33 | let delta = n.saturating_sub(last); 34 | last = n; 35 | delta 36 | }); 37 | offset = nibblepacking::pack_u64(num_iter, &mut srcbuf, offset).unwrap(); 38 | num_lines += 1; 39 | } 40 | 41 | println!("Finished reading and compressing {} histograms, now running {} iterations of 2D Delta...", 42 | num_lines, NUM_LOOPS); 43 | 44 | let mut out_buf = [0u8; 4096]; 45 | let mut sink = histogram::DeltaDiffPackSink::new(NUM_BUCKETS, &mut out_buf); 46 | let start = Instant::now(); 47 | 48 | for _ in 0..NUM_LOOPS { 49 | // Reset out_buf and sink last_deltas state 50 | sink.reset(); 51 | let mut slice = &srcbuf[..]; 52 | 53 | for _ in 0..num_lines { 54 | let res = nibblepacking::unpack(slice, &mut sink, NUM_BUCKETS); 55 | sink.finish(); 56 | slice = res.unwrap(); 57 | } 58 | } 59 | 60 | let elapsed_millis = start.elapsed().as_millis(); 61 | let rate = (num_lines * NUM_LOOPS * 1000) as u128 / elapsed_millis; 62 | println!("{} encoded in {} ms = {} histograms/sec", num_lines * NUM_LOOPS, elapsed_millis, rate); 63 | } -------------------------------------------------------------------------------- /proptest-regressions/nibblepack_simd.txt: -------------------------------------------------------------------------------- 1 | # Seeds for failure cases proptest has generated in the past. It is 2 | # automatically read and these particular cases re-run before any 3 | # novel cases are generated. 4 | # 5 | # It is recommended to check this file in to source control so that 6 | # everyone who runs the test benefits from these saved cases. 
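# Each `cc <hash>` line below records one minimized failing input that proptest will replay before generating new cases.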
7 | cc e1f68d10a8614a3050e2e369102ec8956610f7fa95068f25abbc7deae9ad16cd # shrinks to input = [0, 0] 8 | cc 4e600b27f6415c12f427161f90438393356dde5bf8af88243941dc8cbd81e74f # shrinks to input = [1, 1, 256] 9 | -------------------------------------------------------------------------------- /proptest-regressions/nibblepacking.txt: -------------------------------------------------------------------------------- 1 | # Seeds for failure cases proptest has generated in the past. It is 2 | # automatically read and these particular cases re-run before any 3 | # novel cases are generated. 4 | # 5 | # It is recommended to check this file in to source control so that 6 | # everyone who runs the test benefits from these saved cases. 7 | cc ee2680382a64334086235c0cdb396b7c352c31903126af9bcb51f4bf5775547f # shrinks to input = [0, 0, 0, 0, 1152921504606846976, 0, 0, 1] 8 | cc ed3dc445c907210ec88143e0a29a0b708c0ea587fdc912030ec9c3e2a339eac1 # shrinks to input = [0, 0, 0, 0, 0, 0, 1, 16] 9 | cc 925f091431c47b84c47529bd2485ea0af517efa9975cc4ad4a222abbb9bb2c75 # shrinks to input = [1, 2, 3, 4, 5, 6, 7, 4294967303, 4294967304, 4294967305, 4294967306, 4294967307, 8589934603, 8589934604, 8589934605, 8589934606, 8589934607, 8589934608, 8589934609, 8589934610, 8589934611, 8589934612, 12884901908, 12884901909, 12884901910, 12884901911, 12884901912, 12884901913, 12884901914, 17179869210, 17179869211, 17179869212, 17179869213, 17179869214, 17179869215, 17179869216, 17179869217, 17179869218, 17179869219, 21474836515, 21474836516, 21474836517, 21474836518, 21474836519, 21474836520, 21474836521, 25769803817, 25769803818, 25769803819, 30064771115, 30064771116, 30064771117, 30064771118, 30064771119] 10 | cc b8f14f1e84da566aff6471dc24a2f49c5b062f23a2fa1422a42165628d11f7c3 # shrinks to input = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5596655808] 11 | -------------------------------------------------------------------------------- /rust-toolchain: -------------------------------------------------------------------------------- 1 | nightly-2023-12-06 -------------------------------------------------------------------------------- /rustfmt.toml: -------------------------------------------------------------------------------- 1 | unstable_features = true 2 | indent_style = "Visual" 3 | use_small_heuristics = "Max" 4 | -------------------------------------------------------------------------------- /src/byteutils.rs: -------------------------------------------------------------------------------- 1 | use crate::error::CodingError; 2 | 3 | use scroll::{Pread, Pwrite, LE}; 4 | 5 | /// Fast write of u64. numbytes least significant bytes are written. 6 | /// Writes into out_buffer[offset..offset+numbytes]. 7 | /// Returns offset+numbytes 8 | #[inline] 9 | pub fn direct_write_uint_le(out_buffer: &mut [u8], 10 | offset: usize, 11 | value: u64, 12 | numbytes: usize) -> Result { 13 | // By default, write all 8 bytes checking that there's enough space. 14 | // We only adjust offset by numbytes, so the happy path is pretty fast. 
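    // Example of the contract: direct_write_uint_le(&mut buf, 4, 0x0011_2233_4455_6677, 3)
    // leaves the three least significant bytes 0x77, 0x66, 0x55 at buf[4..7] and returns Ok(7),
    // i.e. offset + numbytes. On the fast path all 8 little-endian bytes are written when there
    // is room; only `numbytes` positions are claimed, and later writes at the returned offset
    // overwrite the extra bytes.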
15 | let _num_written = out_buffer.pwrite_with(value, offset, LE) 16 | .or_else(|err| match err { 17 | _ => { 18 | if out_buffer.len() < offset + numbytes { 19 | Err(CodingError::NotEnoughSpace) 20 | } else { 21 | // Copy only numbytes bytes to be memory safe 22 | let bytes = value.to_le_bytes(); 23 | out_buffer[offset..offset+numbytes].copy_from_slice(&bytes[0..numbytes]); 24 | Ok(numbytes) 25 | } 26 | }, 27 | })?; 28 | Ok(offset + numbytes) 29 | } 30 | 31 | /// Reads u64 value, even if there are less than 8 bytes left. Reads are little endian. 32 | /// Will never read beyond end of inbuf. Pos is position within inbuf. 33 | #[inline(always)] 34 | pub fn direct_read_uint_le(inbuf: &[u8], pos: usize) -> Result { 35 | inbuf.pread_with::(pos, LE) 36 | .or_else(|err| match err { 37 | _ => { 38 | let remaining = inbuf.len() as isize - (pos as isize); 39 | if remaining > 0 { 40 | let mut buf = [0u8; 8]; 41 | buf[0..remaining as usize].copy_from_slice(&inbuf[pos..]); 42 | Ok(u64::from_le_bytes(buf)) 43 | } else { 44 | Err(CodingError::NotEnoughSpace) 45 | } 46 | } 47 | }) 48 | } 49 | -------------------------------------------------------------------------------- /src/error.rs: -------------------------------------------------------------------------------- 1 | #[derive(Debug, PartialEq)] 2 | pub enum CodingError { 3 | NotEnoughSpace, 4 | InputTooShort, 5 | BadOffset(usize), 6 | InvalidSectionType(u8), 7 | InvalidFormat(String), 8 | InvalidNumRows(usize, usize), // Number passed into finish(), number of actual rows written so far 9 | WrongVectorType(u8), // Eg Used a VectorReader:: on a u32 vector 10 | ScrollErr(String), 11 | } 12 | 13 | impl From for CodingError { 14 | fn from(err: scroll::Error) -> CodingError { 15 | match err { 16 | scroll::Error::TooBig { .. } => CodingError::NotEnoughSpace, 17 | scroll::Error::BadOffset(off) => CodingError::BadOffset(off), 18 | _ => CodingError::ScrollErr(err.to_string()), 19 | } 20 | } 21 | } 22 | -------------------------------------------------------------------------------- /src/filter.rs: -------------------------------------------------------------------------------- 1 | /// The `filter` module contains traits for fast filtering of vectors. 2 | /// U32 vectors have SIMD-enabled filtering support for each section, which is 3 | /// 256 elements long to enable SIMD bitmasking on AVX2 with a single instruction. 4 | /// 5 | /// TODO: add examples for EqualsSink, OneOfSink, etc. 6 | /// 7 | use core::marker::PhantomData; 8 | 9 | use packed_simd::u32x8; 10 | use smallvec::SmallVec; 11 | 12 | use crate::section::*; 13 | use crate::sink::{Sink, SinkInput}; 14 | 15 | 16 | /// A Sink designed to filter 256-section vectors. The workflow: 17 | /// 1. Call sink.reset() 18 | /// 2. Call decode on section with this sink 19 | /// 3. 
get_mask() 20 | /// - If the section is null, instead call null_mask() 21 | pub trait SectFilterSink: Sink { 22 | /// Gets the mask, one bit is ON for each match in the section 23 | fn get_mask(&self) -> u32x8; 24 | 25 | /// Returns a mask when its a null section 26 | fn null_mask(&self) -> u32x8; 27 | } 28 | 29 | 30 | /// A Predicate is the value(s) for a filter to filter against 31 | pub trait Predicate { 32 | type Input; 33 | 34 | /// Returns true if the predicate matches null or zero values 35 | fn pred_matches_zero(input: &Self::Input) -> bool; 36 | 37 | /// Creates this predicate from a predicate input type 38 | fn from_input(input: &Self::Input) -> Self; 39 | } 40 | 41 | pub trait InnerFilter { 42 | type P: Predicate; 43 | 44 | /// This method is called with the SinkInput from the decoder, and has to do filtering using 45 | /// the predicate type and return a bitmask; LSB=first item processed 46 | fn filter_bitmask(pred: &Self::P, decoded: T::SI) -> u8; 47 | } 48 | 49 | /// Sink designed to filter 8 items at a time from the decoder, building up a bitmask for each section. 50 | /// It is generic for different predicates and base types. Has optimizations for null sections. 51 | #[repr(align(16))] // To ensure the mask is aligned and can transmute to u32 52 | #[derive(Debug)] 53 | pub struct GenericFilterSink> { 54 | mask: [u8; 32], 55 | predicate: IF::P, 56 | i: usize, 57 | match_zero: bool, // true if zero value will be matched by the predicate 58 | } 59 | 60 | impl> GenericFilterSink { 61 | pub fn new(input: &>::Input) -> Self { 62 | Self { 63 | mask: [0u8; 32], 64 | predicate: IF::P::from_input(input), 65 | i: 0, 66 | match_zero: IF::P::pred_matches_zero(input), 67 | } 68 | } 69 | } 70 | 71 | impl> Sink for GenericFilterSink { 72 | #[inline] 73 | fn process_zeroes(&mut self) { 74 | self.mask[self.i] = if self.match_zero { 0xff } else { 0 }; 75 | self.i += 1; 76 | } 77 | 78 | #[inline] 79 | fn process(&mut self, unpacked: T::SI) { 80 | self.mask[self.i] = IF::filter_bitmask(&self.predicate, unpacked); 81 | self.i += 1; 82 | } 83 | 84 | #[inline] 85 | fn reset(&mut self) { 86 | self.i = 0; 87 | } 88 | } 89 | 90 | const ALL_MATCHES: u32x8 = u32x8::splat(0xffff_ffff); // All 1's 91 | const NO_MATCHES: u32x8 = u32x8::splat(0); 92 | 93 | impl> SectFilterSink for GenericFilterSink { 94 | #[inline] 95 | fn get_mask(&self) -> u32x8 { 96 | // NOTE: we transmute the mask to u32; 8. This is safe because we have aligned the struct for 16 bytes. 
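        // Layout recap: `mask` holds 32 bytes, one u8 bitmask per group of 8 decoded elements,
        // so 32 * 8 = 256 bits covers one full 256-element section. Reinterpreting the 32 bytes
        // as [u32; 8] below packs them into the single u32x8 section mask; the #[repr(align(16))]
        // on the struct provides the alignment the transmute relies on.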
97 | let u32array = unsafe { 98 | std::mem::transmute::<[u8; 32], [u32; 8]>(self.mask) 99 | }; 100 | u32x8::from(u32array) 101 | } 102 | 103 | #[inline] 104 | fn null_mask(&self) -> u32x8 { 105 | if self.match_zero { ALL_MATCHES } else { NO_MATCHES } 106 | } 107 | } 108 | 109 | 110 | /// A predicate containing 8 values (probably SIMD) of each type for single comparisons 111 | // type SingleValuePredicate = ::SI; 112 | pub struct SingleValuePredicate { 113 | pred: T::SI, 114 | } 115 | 116 | impl Predicate for SingleValuePredicate { 117 | type Input = T; 118 | #[inline] 119 | fn pred_matches_zero(input: &T) -> bool { 120 | input.is_zero() 121 | } 122 | 123 | #[inline] 124 | fn from_input(input: &T) -> Self { 125 | Self { pred: T::SI::splat(*input) } 126 | } 127 | } 128 | 129 | pub struct EqualsIF {} 130 | 131 | impl InnerFilter for EqualsIF { 132 | type P = SingleValuePredicate; 133 | #[inline] 134 | fn filter_bitmask(p: &Self::P, decoded: T::SI) -> u8 { 135 | T::SI::eq_mask(p.pred, decoded) 136 | } 137 | } 138 | 139 | pub type EqualsSink = GenericFilterSink; 140 | 141 | 142 | /// A predicate for low cardinality SET membership (one of/IN matches), consisting of a Vec of 8 values each 143 | pub struct MembershipPredicate { 144 | set: Vec, 145 | } 146 | 147 | impl Predicate for MembershipPredicate { 148 | // Up to 4 items in the set, heap allocation not needed 149 | type Input = SmallVec<[T; 4]>; 150 | #[inline] 151 | fn pred_matches_zero(input: &Self::Input) -> bool { 152 | // If any member of set is 0, then pred can match 0 153 | input.iter().any(|x| x.is_zero()) 154 | } 155 | 156 | #[inline] 157 | fn from_input(input: &Self::Input) -> Self { 158 | Self { set: input.iter().map(|&item| T::SI::splat(item)).collect() } 159 | } 160 | } 161 | 162 | pub struct OneOfIF {} 163 | 164 | impl InnerFilter for OneOfIF { 165 | type P = MembershipPredicate; 166 | #[inline] 167 | fn filter_bitmask(p: &Self::P, decoded: T::SI) -> u8 { 168 | // SIMD compare of decoded value with each of the predicates, OR resulting masks 169 | let mut mask = 0u8; 170 | for pred in &p.set { 171 | mask |= T::SI::eq_mask(*pred, decoded); 172 | } 173 | mask 174 | } 175 | } 176 | 177 | pub type OneOfSink = GenericFilterSink; 178 | 179 | 180 | /// A Unary filter takes one mask input, does some kind of filtering and creates a new mask. 181 | /// Filters that process and filter vectors are a subset of the above. 182 | pub trait UnaryFilter { 183 | /// Filters input mask where each bit ON = match, and returns output mask 184 | fn filter(input: u32x8) -> u32x8; 185 | } 186 | 187 | /// Allows for filtering over each section of a vector. 188 | /// Yields an Iterator of u32x8 mask for each section in the vector. 189 | pub struct VectorFilter<'buf, SF, T> 190 | where T: VectBase, 191 | SF: SectFilterSink { 192 | sect_iter: FixedSectIterator<'buf, T>, 193 | sf: SF, 194 | _t: PhantomData, 195 | } 196 | 197 | impl<'buf, SF, T> VectorFilter<'buf, SF, T> 198 | where T: VectBase, 199 | SF: SectFilterSink { 200 | pub fn new(vector_bytes: &'buf [u8], sf: SF) -> Self { 201 | Self { sect_iter: FixedSectIterator::new(vector_bytes), sf, _t: PhantomData } 202 | } 203 | 204 | /// Advances the iterator without calling the filter. This is used to skip processing the filter 205 | /// for short circuiting. 
206 | #[inline] 207 | pub fn advance(&mut self) { 208 | self.sect_iter.next(); 209 | } 210 | } 211 | 212 | impl<'buf, SF, T> Iterator for VectorFilter<'buf, SF, T> 213 | where T: VectBase, 214 | SF: SectFilterSink { 215 | type Item = u32x8; 216 | 217 | #[inline] 218 | fn next(&mut self) -> Option { 219 | self.sect_iter.next() 220 | .and_then(|res| { 221 | let sect = res.expect("This should not fail!"); 222 | if sect.is_null() { 223 | Some(self.sf.null_mask()) 224 | } else { 225 | self.sf.reset(); 226 | sect.decode(&mut self.sf).ok()?; 227 | Some(self.sf.get_mask()) 228 | } 229 | }) 230 | } 231 | } 232 | 233 | /// Helper to facilitate filtering multiple vectors at the same time, 234 | /// this one filters by the same type of filter (eg all Equals). 235 | /// For each group of sections, the same section filter masks are then ANDed together. 236 | /// It has one optimization: it short-circuits the ANDing as soon as the masking creates 237 | /// an all-zero mask. Thus it makes sense to put the most sparse and least likely to hit 238 | /// vector first. 239 | pub struct MultiVectorFilter<'buf , SF, T> 240 | where SF: SectFilterSink, 241 | T: VectBase { 242 | vect_filters: Vec> 243 | } 244 | 245 | impl<'buf, SF, T> MultiVectorFilter<'buf, SF, T> 246 | where SF: SectFilterSink, 247 | T: VectBase { 248 | pub fn new(vect_filters: Vec>) -> Self { 249 | if vect_filters.is_empty() { panic!("Cannot pass in empty filters to MultiVectorFilter"); } 250 | Self { vect_filters } 251 | } 252 | } 253 | 254 | impl<'buf, SF, T> Iterator for MultiVectorFilter<'buf, SF, T> 255 | where SF: SectFilterSink, 256 | T: VectBase { 257 | type Item = u32x8; 258 | 259 | #[inline] 260 | fn next(&mut self) -> Option { 261 | // Get first filter 262 | let mut mask = match self.vect_filters[0].next() { 263 | Some(m) => m, 264 | None => return None, // Assume end of one vector is end of all 265 | }; 266 | let mut n = 1; 267 | 268 | // Keep going if filter is not empty and there are still vectors to go 269 | while n < self.vect_filters.len() && mask != NO_MATCHES { 270 | mask &= match self.vect_filters[n].next() { 271 | Some(m) => m, 272 | None => return None, 273 | }; 274 | n += 1; 275 | } 276 | 277 | // short-circuit: just advance the iterator if we're already at zero mask. 278 | // No need to do expensive filtering. 
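        // (advance() only steps the underlying section iterator; it skips decoding and mask
        // computation entirely, which keeps all vectors aligned on the same section while
        // avoiding wasted work once the combined mask is already all zeroes.)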
279 | while n < self.vect_filters.len() { 280 | self.vect_filters[n].advance(); 281 | n += 1; 282 | } 283 | 284 | Some(mask) 285 | } 286 | } 287 | 288 | pub type EmptyFilter = std::iter::Empty; 289 | 290 | pub const EMPTY_FILTER: EmptyFilter = std::iter::empty::(); 291 | 292 | 293 | /// Counts the output of VectorFilter iterator (or multiple VectorFilter results ANDed together) 294 | /// for all the 1's in the output and returns the total 295 | /// SIMD count_ones() is used for fast counting 296 | pub fn count_hits(filter_iter: I) -> usize 297 | where I: Iterator { 298 | (filter_iter.map(|mask| mask.count_ones().wrapping_sum()).sum::()) as usize 299 | } 300 | 301 | /// Creates a Vec of the element positions where matches occur 302 | pub fn match_positions(filter_iter: I) -> Vec 303 | where I: Iterator { 304 | let mut pos = 0; 305 | let mut matches = Vec::::new(); 306 | filter_iter.for_each(|mask| { 307 | for word in 0..8 { 308 | let u32mask = mask.extract(word); 309 | if u32mask != 0 { 310 | // TODO: find highest bit (intrinsic) for O(on bits) speed 311 | for bit in 0..32 { 312 | if (u32mask & (1 << bit)) != 0 { 313 | matches.push(pos); 314 | } 315 | pos += 1; 316 | } 317 | } 318 | } 319 | }); 320 | matches 321 | } 322 | 323 | #[cfg(test)] 324 | mod tests { 325 | use super::*; 326 | 327 | use smallvec::smallvec; 328 | use crate::filter::match_positions; 329 | use crate::vector::{VectorU32Appender, VectorU64Appender, VectorReader}; 330 | 331 | #[test] 332 | fn test_filter_u64_equals() { 333 | let vector_size: usize = 400; 334 | let mut appender = VectorU64Appender::try_new(1024).unwrap(); 335 | for i in 0..vector_size { 336 | appender.append((i as u64 % 4) + 1).unwrap(); 337 | } 338 | let finished_vec = appender.finish(vector_size).unwrap(); 339 | 340 | let reader = VectorReader::::try_new(&finished_vec[..]).unwrap(); 341 | let filter_iter = reader.filter_iter(EqualsSink::::new(&3)); 342 | let matches = match_positions(filter_iter); 343 | assert_eq!(matches.len(), vector_size / 4); 344 | 345 | // 1, 2, 3... so match for 3 starts at position 2 346 | let expected_pos: Vec<_> = (2..vector_size).step_by(4).collect(); 347 | assert_eq!(matches, expected_pos); 348 | } 349 | 350 | #[test] 351 | fn test_filter_u32_oneof() { 352 | let vector_size: usize = 400; 353 | let mut appender = VectorU32Appender::try_new(1024).unwrap(); 354 | for i in 0..vector_size { 355 | appender.append((i as u32 % 12) + 1).unwrap(); 356 | } 357 | let finished_vec = appender.finish(vector_size).unwrap(); 358 | 359 | let reader = VectorReader::::try_new(&finished_vec[..]).unwrap(); 360 | let filter_iter = reader.filter_iter(OneOfSink::::new(&smallvec![3, 5])); 361 | let matches = match_positions(filter_iter); 362 | 363 | // 3 and 5 are 1/6th of 12 values. 400/6=66 but 400%12=4, so the 3 is last value matched again 364 | assert_eq!(matches.len(), 67); 365 | 366 | // 3, 5 are positions 2, 4..... etc. 
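        // Worked out: values are (i % 12) + 1, so 3 matches where i % 12 == 2 (34 hits for i < 400)
        // and 5 matches where i % 12 == 4 (33 hits), giving the 67 total asserted above.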
367 | let mut expected_pos: Vec<_> = (2..vector_size).step_by(12).map(|i| vec![i, i+2]).flatten().collect(); 368 | // have to trim last item 369 | expected_pos.resize(67, 0); 370 | assert_eq!(matches, expected_pos); 371 | } 372 | } 373 | -------------------------------------------------------------------------------- /src/histogram.rs: -------------------------------------------------------------------------------- 1 | use packed_simd::u64x8; 2 | use plain::Plain; 3 | use crate::nibblepacking::*; 4 | use crate::sink::Sink; 5 | 6 | #[derive(Copy, Clone, Debug)] 7 | #[repr(u8)] 8 | pub enum BinHistogramFormat { 9 | Empty = 0x00, 10 | GeometricDelta = 0x01, 11 | Geometric1Delta = 0x02, 12 | } 13 | 14 | /// Header for a compressed histogram, not including any length prefix bytes. A compressed histogram 15 | /// contains bucket definitions and compressed bucket values, usually compressed using nibblepacking. 16 | #[repr(C, packed)] 17 | #[derive(Copy, Clone, Debug)] 18 | struct BinHistogramHeader { 19 | format_code: BinHistogramFormat, 20 | bucket_def_len: u16, 21 | num_buckets: u16, 22 | } 23 | 24 | unsafe impl Plain for BinHistogramHeader {} 25 | 26 | impl BinHistogramHeader { 27 | #[allow(dead_code)] 28 | pub fn from_bytes(buf: &[u8]) -> &BinHistogramHeader { 29 | plain::from_bytes(buf).expect("The buffer is either too short or not aligned!") 30 | } 31 | 32 | // Returns the byte slice for the compressed binary bucket values 33 | #[allow(dead_code)] 34 | pub fn values_byteslice<'a>(&self, buf: &'a [u8]) -> &'a [u8] { 35 | let values_index = offset_of!(BinHistogramHeader, num_buckets) + self.bucket_def_len as usize; 36 | &buf[values_index..] 37 | } 38 | } 39 | 40 | #[repr(C, packed)] 41 | #[derive(Copy, Clone, Debug)] 42 | struct PackedGeometricBuckets { 43 | initial_bucket: f64, 44 | multiplier: f64, 45 | } 46 | 47 | unsafe impl Plain for PackedGeometricBuckets {} 48 | 49 | /// 50 | /// Compresses raw histogram values with geometric bucket definitions and non-increasing bucket values as a delta- 51 | /// encoded compressed histogram -- ie, the raw values will be considered deltas and parsed as increasing buckets. 52 | /// 53 | /// This method should be called to convert non-increasing histogram buckets to the internal increasing bucket 54 | /// format. The outbuf must have been cleared already though it can have other data in it. 55 | pub fn compress_geom_nonincreasing(num_buckets: u16, 56 | initial_bucket: f64, 57 | multiplier: f64, 58 | format_code: BinHistogramFormat, 59 | bucket_values: &[u64], 60 | outbuf: &mut [u8]) { 61 | // First, write out BinHistogramHeader 62 | let bucket_def_len = mem::size_of::() as u16 + 2; 63 | let header = BinHistogramHeader::from_mut_bytes(outbuf).unwrap(); 64 | header.format_code = format_code; 65 | header.bucket_def_len = bucket_def_len; 66 | header.num_buckets = num_buckets; 67 | 68 | // Then, write out geometric values 69 | let header_size = mem::size_of::(); 70 | let geom_buckets = PackedGeometricBuckets::from_mut_bytes(&mut outbuf[header_size..]).unwrap(); 71 | geom_buckets.initial_bucket = initial_bucket; 72 | geom_buckets.multiplier = multiplier; 73 | 74 | // Finally, pack the values 75 | pack_u64(bucket_values.into_iter().cloned(), outbuf, (bucket_def_len + 3) as usize).unwrap(); 76 | } 77 | 78 | /// 79 | /// A sink used for increasing histogram counters. 
In one shot: 80 | /// - Unpacks a delta-encoded NibblePack compressed Histogram 81 | /// - Subtracts the values from lastHistValues, noting if the difference is not >= 0 (means counter reset) 82 | /// - Packs the subtracted values 83 | /// - Updates lastHistValues to the latest unpacked values so this sink can be used again 84 | /// 85 | /// Meant to be used again and again to parse next histogram, thus the last_hist_deltas 86 | /// state is reused to compute the next set of deltas. 87 | /// If the new set of values is less than last_hist_deltas then the new set of values is 88 | /// encoded instead of the diffs. 89 | /// For more details, see the "2D Delta" section in [compression.md](doc/compression.md) 90 | #[derive(Default)] 91 | #[derive(Debug)] 92 | pub struct DeltaDiffPackSink<'a> { 93 | value_dropped: bool, 94 | i: usize, 95 | last_hist_deltas: Vec, 96 | pack_array: [u64; 8], 97 | out_offset: usize, 98 | out_buf: &'a mut [u8], 99 | } 100 | 101 | impl<'a> DeltaDiffPackSink<'a> { 102 | /// Creates new DeltaDiffPackSink 103 | pub fn new(num_buckets: usize, out_buf: &'a mut [u8]) -> Self { 104 | let mut last_hist_deltas = Vec::::with_capacity(num_buckets); 105 | last_hist_deltas.resize(num_buckets, 0); 106 | Self { last_hist_deltas, out_buf, ..Default::default() } 107 | } 108 | 109 | pub fn reset_out_buf(&mut self) { 110 | self.out_offset = 0; 111 | } 112 | 113 | /// Call this to finish packing the remainder of the deltas and reset for next go 114 | #[inline] 115 | pub fn finish(&mut self) { 116 | // TODO: move this to a pack_remainder function? 117 | if self.i != 0 { 118 | for j in self.i..8 { 119 | self.pack_array[j] = 0; 120 | } 121 | self.out_offset = nibble_pack8(&self.pack_array, self.out_buf, self.out_offset).unwrap(); 122 | } 123 | self.i = 0; 124 | self.value_dropped = false; 125 | } 126 | } 127 | 128 | impl<'a> Sink for DeltaDiffPackSink<'a> { 129 | #[inline] 130 | fn process(&mut self, data: u64x8) { 131 | let maxlen = self.last_hist_deltas.len(); 132 | let looplen = if self.i + 8 <= maxlen { 8 } else { maxlen - self.i }; 133 | for n in 0..looplen { 134 | let last_value = self.last_hist_deltas[self.i + n]; 135 | // If data dropped from last, write data instead of diff 136 | // TODO: actually try to use the SIMD 137 | let data_item = data.extract(n); 138 | if data_item < last_value { 139 | self.value_dropped = true; 140 | self.pack_array[n] = data_item; 141 | } else { 142 | self.pack_array[n] = data_item - last_value; 143 | } 144 | } 145 | // copy data wholesale to last_hist_deltas 146 | for n in self.i..(self.i+looplen) { 147 | self.last_hist_deltas[n] = data.extract(n - self.i); 148 | } 149 | // if numElems < 8, zero out remainder of packArray 150 | for n in looplen..8 { 151 | self.pack_array[n] = 0; 152 | } 153 | self.out_offset = nibble_pack8(&self.pack_array, self.out_buf, self.out_offset).unwrap(); 154 | self.i += 8; 155 | } 156 | 157 | fn process_zeroes(&mut self) { 158 | todo!(); 159 | } 160 | 161 | // Resets everythin, even the out_buf. 
Probably should be used only for testing 162 | #[inline] 163 | fn reset(&mut self) { 164 | self.i = 0; 165 | self.value_dropped = false; 166 | for elem in self.last_hist_deltas.iter_mut() { 167 | *elem = 0; 168 | } 169 | self.out_offset = 0; 170 | } 171 | } 172 | 173 | use std::mem; 174 | 175 | #[test] 176 | fn dump_header_structure() { 177 | let header = BinHistogramHeader { 178 | format_code: BinHistogramFormat::GeometricDelta, 179 | bucket_def_len: 2, 180 | num_buckets: 16, 181 | }; 182 | 183 | println!("size of header: {:?}", mem::size_of::()); 184 | println!("align of header: {:?}", mem::align_of::()); 185 | println!("span of bucket_def_len: {:?}", span_of!(BinHistogramHeader, bucket_def_len)); 186 | 187 | unsafe { 188 | let slice = plain::as_bytes(&header); 189 | assert_eq!(slice, [0x01u8, 0x02, 0, 16, 0]); 190 | 191 | let new_header = BinHistogramHeader::from_bytes(slice); 192 | println!("new_header: {:?}", new_header); 193 | } 194 | } 195 | 196 | #[test] 197 | fn delta_diffpack_sink_test() { 198 | let inputs = [ [0u64, 1000, 1001, 1002, 1003, 2005, 2010, 3034, 4045, 5056, 6067, 7078], 199 | [3u64, 1004, 1006, 1008, 1009, 2012, 2020, 3056, 4070, 5090, 6101, 7150], 200 | // [3u64, 1004, 1006, 1008, 1009, 2010, 2020, 3056, 4070, 5090, 6101, 7150], 201 | [7u64, 1010, 1016, 1018, 1019, 2022, 2030, 3078, 4101, 5122, 6134, 7195] ]; 202 | let diffs = inputs.windows(2).map(|pair| { 203 | pair[1].iter().zip(pair[0].iter()).map(|(nb, na)| nb - na ).collect::>() 204 | }).collect::>(); 205 | 206 | // Compress each individual input into its own buffer 207 | let compressed_inputs: Vec<[u8; 256]> = inputs.iter().map(|input| { 208 | let mut buf = [0u8; 256]; 209 | pack_u64_delta(&input[..], &mut buf).unwrap(); 210 | buf 211 | }).collect(); 212 | 213 | let mut out_buf = [0u8; 1024]; 214 | let mut sink = DeltaDiffPackSink::new(inputs[0].len(), &mut out_buf); 215 | 216 | // Verify delta on first one (empty diffs) yields back the original 217 | let _res = unpack(&compressed_inputs[0], &mut sink, inputs[0].len()); 218 | // assert_eq!(res.unwrap().len(), 0); 219 | sink.finish(); 220 | 221 | let mut dsink = DeltaSink::new(); 222 | let _res = unpack(sink.out_buf, &mut dsink, inputs[0].len()); 223 | assert_eq!(dsink.output_vec()[..inputs[0].len()], inputs[0]); 224 | 225 | // Second and subsequent inputs shouyld correspond to diffs 226 | for i in 1..3 { 227 | sink.reset_out_buf(); // need to reset output 228 | let _res = unpack(&compressed_inputs[i], &mut sink, inputs[0].len()); 229 | // assert_eq!(res.unwrap().len(), 0); 230 | assert_eq!(sink.value_dropped, false); // should not have dropped? 231 | sink.finish(); 232 | // dbg!(&sink.out_vec); 233 | 234 | let mut dsink = DeltaSink::new(); 235 | let _res = unpack(sink.out_buf, &mut dsink, inputs[0].len()); 236 | assert_eq!(dsink.output_vec()[..inputs[0].len()], diffs[i - 1][..]); 237 | } 238 | } 239 | -------------------------------------------------------------------------------- /src/lib.rs: -------------------------------------------------------------------------------- 1 | //! ## compressed_vec 2 | //! 3 | //! Floating point and integer compressed vector library, SIMD-enabled for fast processing/iteration over compressed representations. 4 | //! 5 | //! This is a *compressed vec* library, rather than a *compression* library. What does that mean? A compression library takes some uncompressed data and provides essentially compress() and decompress() functions. 
Typically you have to decompress data to be able to do anything with it, resulting in extra latency and allocations. 6 | //! 7 | //! On the other hand, this *compressed vec* library allows you to iterate over and process the compressed representations directly. It is designed to balance fast iteration and SIMD processing/filtering, while compressing vectors to within 2x of the best columnar compression technology such as Apache Parquet, using techniques such as delta and XOR encoding. Applications: 8 | //! 9 | //! * Database engines 10 | //! * Large in-memory data processing 11 | //! * Games and other apps needing fast access to large quantities of FP vectors/matrices 12 | //! 13 | //! ### Performance Numbers 14 | //! 15 | //! Numbers are from my laptop: 2.9 GHz Core i9, 6/12 cores, 12MB L3, AVX2; from running `cargo bench vector`, which benchmarks a filter-and-count-matches operation directly on encoded/compressed vectors. 16 | //! 17 | //! | Vector type(s) | Elements/sec | Raw GBs per sec | 18 | //! | -------------- | ------------ | --------------- | 19 | //! | u32 dense (no sparsity) | 1.7 Gelems/s | 6.8 GB/s | 20 | //! | u32 sparse (99% zeros) | 13.9 Gelems/s | 55.6 GB/s | 21 | //! | Two u32 vectors (sparse + dense)* | 1.3-5.2 Gelems/s | 5-20 GB/s | 22 | //! | u64 vector, dense | 955M - 1.1 Gelems/s | 7.6 - 9.1 GB/s | 23 | //! | f32, XOR encoded, 60% density | 985 Melems/s | 3.9 GB/s | 24 | //! 25 | //! * The two u32 vector filtering speed (using `MultiVectorFilter`) depends on the order of the vectors. It is faster to filter the sparse vector first. 26 | //! 27 | //! ### Creation, Iteration 28 | //! 29 | //! To create an f32 compressed vector: 30 | //! 31 | //! ``` 32 | //! use compressed_vec::VectorF32XorAppender; 33 | //! let mut appender = VectorF32XorAppender::try_new(2048).unwrap(); 34 | //! let encoded_bytes = appender.encode_all(vec![1.0, 1.5, 2.0, 2.5]).unwrap(); 35 | //! ``` 36 | //! 37 | //! The simplest way to iterate on this compressed vector (note this does not allocate at all): 38 | //! 39 | //! ``` 40 | //! # use compressed_vec::VectorF32XorAppender; 41 | //! # let mut appender = VectorF32XorAppender::try_new(2048).unwrap(); 42 | //! # let encoded_bytes = appender.encode_all(vec![1.0, 1.5, 2.0, 2.5]).unwrap(); 43 | //! use compressed_vec::VectorReader; 44 | //! let reader = VectorReader::::try_new(&encoded_bytes[..]).unwrap(); 45 | //! let sum = reader.iterate().sum::(); // Yay, no allocations! 46 | //! ``` 47 | //! 48 | //! ### Filtering and SIMD Processing 49 | //! 50 | //! `iterate()` is the easiest API to go through individual elements of the compressed vector, but it is not the fastest. Fast data processing, such as done in the filter-and-count benchmarks above, are performed using `Sink`s, which are defined in the `sink` module. Sinks operate on a SIMD word at a time, and the sink API is designed for inlining. 51 | //! 52 | //! For example, let's say that we want to add 2.5 to the f32 vector above, and then write out the results to a `Vec`. Internally, XOR encoding and decoding is performed (using a sink). The sinks can be stacked during decoding, for an almost entirely SIMD pipeline: 53 | //! - `XorSink` (used automatically for f32 decoding) 54 | //! - `AddConstSink` (to add 2.5, again done using SIMD) 55 | //! - `VecSink` (writes output to a normal Vec) 56 | //! 57 | //! ``` 58 | //! # use compressed_vec::VectorF32XorAppender; 59 | //! # let mut appender = VectorF32XorAppender::try_new(2048).unwrap(); 60 | //! 
# let encoded_bytes = appender.encode_all(vec![1.0, 1.5, 2.0, 2.5]).unwrap(); 61 | //! use compressed_vec::{VectorReader, AddConstSink, VecSink}; 62 | //! let reader = VectorReader::::try_new(&encoded_bytes[..]).unwrap(); 63 | //! let mut vecsink = VecSink::::new(); 64 | //! let mut addsink = AddConstSink::new(2.5f32, &mut vecsink); 65 | //! reader.decode_to_sink(&mut addsink).unwrap(); 66 | //! println!("And the transformed vector is: {:?}", vecsink.vec); 67 | //! ``` 68 | //! 69 | //! ### Vector Format 70 | //! 71 | //! Details of the vector format can be found [here](https://github.com/velvia/compressed-vec/blob/main/vector_format.md). 72 | //! 73 | //! The vector format follows columnar compression techniques used throughout the big data and database world, and roughly follows the Google [Procella](https://blog.acolyer.org/2019/09/11/procella/) paper with its custom Artus format: 74 | //! 75 | //! * Compression within 2x of ZSTD while operating directly on the data 76 | //! * Compression for this format is within 2x of Parquet, but is written to allow filtering and operating on the data directly without needing a separate decompression step for the entire vector 77 | //! * Multi-pass encoding 78 | //! * The `VectorAppender` collects min/max and other stats on the raw data and uses it to decide on the best encoding strategy (delta, etc.) 79 | //! * Exposing dictionary indices to query engine and aggressive pushdown to the data format 80 | //! * The format is designed to filter over dictionary codes, which speeds up filtering 81 | //! * The use of sections allows for many optimizations for filtering. For example, null sections and constant sections allow for very fast filter short-circuiting. 82 | 83 | 84 | #![feature(associated_type_defaults)] 85 | 86 | #[macro_use] 87 | extern crate memoffset; 88 | 89 | pub mod nibblepacking; 90 | pub mod nibblepack_simd; 91 | pub mod byteutils; 92 | pub mod vector; 93 | pub mod histogram; 94 | pub mod section; 95 | pub mod error; 96 | pub mod filter; 97 | pub mod sink; 98 | 99 | // Public crate-level exports for convenience 100 | pub use vector::{VectorU64Appender, VectorU32Appender, VectorF32XorAppender, 101 | VectorReader}; 102 | pub use sink::{VecSink, Section256Sink, AddConstSink}; -------------------------------------------------------------------------------- /src/nibblepack_simd.rs: -------------------------------------------------------------------------------- 1 | #![allow(unused)] // needed for dbg!() macro, but folks say this should not be needed 2 | #![feature(slice_fill)] 3 | 4 | use core::ops::BitAnd; 5 | use std::ops::{Shl, Shr}; 6 | 7 | use crate::byteutils::*; 8 | use crate::error::CodingError; 9 | use crate::nibblepacking::*; 10 | use crate::sink::*; 11 | 12 | use packed_simd::{shuffle, u64x8, u32x8, m32x8, isizex8, cptrx8}; 13 | 14 | 15 | const ZEROES_U64X8: u64x8 = u64x8::splat(0); 16 | const ZEROES_U32X8: u32x8 = u32x8::splat(0); 17 | 18 | /// Partially SIMD-based packing of eight u64 values. Writes at offset into out_buffer; 19 | /// Returns final offset. 20 | // TODO: make rest of steps SIMD too. Right now only input, bitmask and nibble word computation is SIMD. 
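// A minimal usage sketch (illustration only), assuming a buffer large enough for the
// worst case of 2 header bytes plus 8 x 8 value bytes:
//
//     let inputs = u64x8::new(1, 2, 3, 4, 5, 6, 7, 8);
//     let mut buf = [0u8; 66];
//     let end = pack8_u64_simd(inputs, &mut buf, 0).unwrap();
//     // buf[0] is the nonzero bitmask (0xff here), buf[1] encodes the nibble count,
//     // and buf[2..end] holds the packed nibbles.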
21 | #[inline] 22 | pub fn pack8_u64_simd(inputs: u64x8, out_buffer: &mut [u8], offset: usize) -> Result { 23 | if (offset + 2) >= out_buffer.len() { 24 | return Err(CodingError::NotEnoughSpace); 25 | } 26 | 27 | // Compute nonzero bitmask, comparing each input word to zeroes 28 | let nonzero_mask = inputs.ne(ZEROES_U64X8).bitmask(); 29 | out_buffer[offset] = nonzero_mask; 30 | let mut off = offset + 1; 31 | 32 | if nonzero_mask != 0 { 33 | // Compute min of leading and trailing zeroes, using SIMD for speed. 34 | // Fastest way is to OR all the bits together, then can use the ORed bits to find leading/trailing zeroes 35 | let ored_bits = inputs.or(); 36 | let min_leading_zeros = ored_bits.leading_zeros(); 37 | let min_trailing_zeros = ored_bits.trailing_zeros(); 38 | 39 | // Convert min leading/trailing to # nibbles. Start packing! 40 | // NOTE: num_nibbles cannot be 0; that would imply every input was zero 41 | let trailing_nibbles = min_trailing_zeros / 4; 42 | let num_nibbles = 16 - (min_leading_zeros / 4) - trailing_nibbles; 43 | let nibble_word = (((num_nibbles - 1) << 4) | trailing_nibbles) as u8; 44 | out_buffer[off] = nibble_word; 45 | off += 1; 46 | 47 | let mut input_buf = [0u64; 8]; 48 | inputs.write_to_slice_unaligned(&mut input_buf); 49 | if (num_nibbles % 2) == 0 { 50 | pack_to_even_nibbles(&input_buf, out_buffer, off, num_nibbles, trailing_nibbles) 51 | } else { 52 | pack_universal(&input_buf, out_buffer, off, num_nibbles, trailing_nibbles) 53 | } 54 | } else { 55 | Ok(off) 56 | } 57 | } 58 | 59 | // Variable shifts for each SIMD lane to decode NibblePacked data 60 | const U32_SIMD_SHIFTS: [u32x8; 9] = [ 61 | // 0 nibbles: this should never be used 62 | u32x8::splat(0), 63 | // 1 nibble / 4 bits for same u32 word 64 | u32x8::new(0, 4, 8, 12, 16, 20, 24, 28), 65 | // 2 nibbles: lower u32 (8 bits x 4), upper u32 (8 bits x 4) 66 | u32x8::new(0, 8, 16, 24, 0, 8, 16, 24), 67 | // 3 nibbles: 4 groups of u32 words (12 bits x 2) 68 | u32x8::new(0, 12, 0, 12, 0, 12, 0, 12), 69 | // 4 nibbles: 4 groups of u32 words (16 bits x 2) 70 | u32x8::new(0, 16, 0, 16, 0, 16, 0, 16), 71 | // 5-8 nibbles: 8 u32 words shifted 72 | u32x8::new(0, 4, 0, 4, 0, 4, 0, 4), 73 | u32x8::splat(0), 74 | u32x8::new(0, 4, 0, 4, 0, 4, 0, 4), 75 | u32x8::splat(0), 76 | ]; 77 | 78 | // Byte offsets for reading U32 values from memory vs number of nibbles. 79 | // Combined with U32_SIMD_SHIFTS, allows us to place shifted U32 values into each lane. 80 | const U32_SIMD_PTR_OFFSETS: [isizex8; 9] = [ 81 | // 0 nibbles: should never be used 82 | isizex8::splat(0), 83 | // 1 nibble, 4x8 bits fits into one u32, so no offset 84 | isizex8::splat(0), 85 | // 2 nibbles, 8x8 bits, two u32s offset by 4 bytes 86 | isizex8::new(0, 0, 0, 0, 4, 4, 4, 4), 87 | // 3 nibbles. 4 groups of u32s 3 bytes apart 88 | isizex8::new(0, 0, 3, 3, 6, 6, 9, 9), 89 | // 4 nibbles. 4 groups of u32s 4 bytes apart 90 | isizex8::new(0, 0, 4, 4, 8, 8, 12, 12), 91 | // 5-8 nibbles: individual u32 words spaced apart 92 | isizex8::new(0, 2, 5, 7, 10, 12, 15, 17), 93 | isizex8::new(0, 3, 6, 9, 12, 15, 18, 21), 94 | isizex8::new(0, 3, 7, 10, 14, 17, 21, 24), 95 | isizex8::new(0, 4, 8, 12, 16, 20, 24, 28), 96 | ]; 97 | 98 | // Bitmask for ANDing during SIMD unpacking 99 | const U32_SIMD_ANDMASK: [u32x8; 9] = [ 100 | u32x8::splat(0x0f), 101 | // 1 nibble 102 | u32x8::splat(0x0f), 103 | // 2 nibbles, etc. 
104 | u32x8::splat(0x0ff), 105 | u32x8::splat(0x0fff), 106 | u32x8::splat(0x0ffff), 107 | u32x8::splat(0x0f_ffff), 108 | u32x8::splat(0x0ff_ffff), 109 | u32x8::splat(0x0fff_ffff), 110 | u32x8::splat(0xffff_ffff), 111 | ]; 112 | 113 | const U32_SIMD_ZEROES: u32x8 = u32x8::splat(0); 114 | 115 | // Shuffles used in unpacking. Given input bitmask, it calculates the shuffle 116 | // matrix needed to "expand" or move the elements to the right place given null elements. 117 | // from is the source element number. NOTE: lazy_static was too slow so these constants were 118 | // generated using the following code 119 | // lazy_static! { 120 | // static ref SHUFFLE_UNPACK_IDX_U32: [u32x8; 256] = { 121 | // let mut shuffle_indices = [u32x8::splat(0); 256]; 122 | // for bitmask in 0usize..256 { 123 | // let mut from_pos = 0; 124 | // let mut indices = [0u32; 8]; 125 | // for to_pos in 0..8 { 126 | // // If bit in bitmask is on, then map from_pos to current pos 127 | // if bitmask & (1 << to_pos) != 0 { 128 | // indices[to_pos] = from_pos; 129 | // from_pos += 1; 130 | // // If bit is off, then use the last index into which 0 is stuffed. 131 | // } else { 132 | // indices[to_pos] = 7; 133 | // } 134 | // } 135 | // shuffle_indices[bitmask as usize] = u32x8::from(indices); 136 | // } 137 | // shuffle_indices 138 | // }; 139 | // } 140 | 141 | const SHUFFLE_UNPACK_IDX_U32: [u32x8; 256] = [ 142 | u32x8::new(7, 7, 7, 7, 7, 7, 7, 7), 143 | u32x8::new(0, 7, 7, 7, 7, 7, 7, 7), 144 | u32x8::new(7, 0, 7, 7, 7, 7, 7, 7), 145 | u32x8::new(0, 1, 7, 7, 7, 7, 7, 7), 146 | u32x8::new(7, 7, 0, 7, 7, 7, 7, 7), 147 | u32x8::new(0, 7, 1, 7, 7, 7, 7, 7), 148 | u32x8::new(7, 0, 1, 7, 7, 7, 7, 7), 149 | u32x8::new(0, 1, 2, 7, 7, 7, 7, 7), 150 | u32x8::new(7, 7, 7, 0, 7, 7, 7, 7), 151 | u32x8::new(0, 7, 7, 1, 7, 7, 7, 7), 152 | u32x8::new(7, 0, 7, 1, 7, 7, 7, 7), 153 | u32x8::new(0, 1, 7, 2, 7, 7, 7, 7), 154 | u32x8::new(7, 7, 0, 1, 7, 7, 7, 7), 155 | u32x8::new(0, 7, 1, 2, 7, 7, 7, 7), 156 | u32x8::new(7, 0, 1, 2, 7, 7, 7, 7), 157 | u32x8::new(0, 1, 2, 3, 7, 7, 7, 7), 158 | u32x8::new(7, 7, 7, 7, 0, 7, 7, 7), 159 | u32x8::new(0, 7, 7, 7, 1, 7, 7, 7), 160 | u32x8::new(7, 0, 7, 7, 1, 7, 7, 7), 161 | u32x8::new(0, 1, 7, 7, 2, 7, 7, 7), 162 | u32x8::new(7, 7, 0, 7, 1, 7, 7, 7), 163 | u32x8::new(0, 7, 1, 7, 2, 7, 7, 7), 164 | u32x8::new(7, 0, 1, 7, 2, 7, 7, 7), 165 | u32x8::new(0, 1, 2, 7, 3, 7, 7, 7), 166 | u32x8::new(7, 7, 7, 0, 1, 7, 7, 7), 167 | u32x8::new(0, 7, 7, 1, 2, 7, 7, 7), 168 | u32x8::new(7, 0, 7, 1, 2, 7, 7, 7), 169 | u32x8::new(0, 1, 7, 2, 3, 7, 7, 7), 170 | u32x8::new(7, 7, 0, 1, 2, 7, 7, 7), 171 | u32x8::new(0, 7, 1, 2, 3, 7, 7, 7), 172 | u32x8::new(7, 0, 1, 2, 3, 7, 7, 7), 173 | u32x8::new(0, 1, 2, 3, 4, 7, 7, 7), 174 | u32x8::new(7, 7, 7, 7, 7, 0, 7, 7), 175 | u32x8::new(0, 7, 7, 7, 7, 1, 7, 7), 176 | u32x8::new(7, 0, 7, 7, 7, 1, 7, 7), 177 | u32x8::new(0, 1, 7, 7, 7, 2, 7, 7), 178 | u32x8::new(7, 7, 0, 7, 7, 1, 7, 7), 179 | u32x8::new(0, 7, 1, 7, 7, 2, 7, 7), 180 | u32x8::new(7, 0, 1, 7, 7, 2, 7, 7), 181 | u32x8::new(0, 1, 2, 7, 7, 3, 7, 7), 182 | u32x8::new(7, 7, 7, 0, 7, 1, 7, 7), 183 | u32x8::new(0, 7, 7, 1, 7, 2, 7, 7), 184 | u32x8::new(7, 0, 7, 1, 7, 2, 7, 7), 185 | u32x8::new(0, 1, 7, 2, 7, 3, 7, 7), 186 | u32x8::new(7, 7, 0, 1, 7, 2, 7, 7), 187 | u32x8::new(0, 7, 1, 2, 7, 3, 7, 7), 188 | u32x8::new(7, 0, 1, 2, 7, 3, 7, 7), 189 | u32x8::new(0, 1, 2, 3, 7, 4, 7, 7), 190 | u32x8::new(7, 7, 7, 7, 0, 1, 7, 7), 191 | u32x8::new(0, 7, 7, 7, 1, 2, 7, 7), 192 | u32x8::new(7, 0, 7, 7, 1, 2, 7, 7), 193 | 
u32x8::new(0, 1, 7, 7, 2, 3, 7, 7), 194 | u32x8::new(7, 7, 0, 7, 1, 2, 7, 7), 195 | u32x8::new(0, 7, 1, 7, 2, 3, 7, 7), 196 | u32x8::new(7, 0, 1, 7, 2, 3, 7, 7), 197 | u32x8::new(0, 1, 2, 7, 3, 4, 7, 7), 198 | u32x8::new(7, 7, 7, 0, 1, 2, 7, 7), 199 | u32x8::new(0, 7, 7, 1, 2, 3, 7, 7), 200 | u32x8::new(7, 0, 7, 1, 2, 3, 7, 7), 201 | u32x8::new(0, 1, 7, 2, 3, 4, 7, 7), 202 | u32x8::new(7, 7, 0, 1, 2, 3, 7, 7), 203 | u32x8::new(0, 7, 1, 2, 3, 4, 7, 7), 204 | u32x8::new(7, 0, 1, 2, 3, 4, 7, 7), 205 | u32x8::new(0, 1, 2, 3, 4, 5, 7, 7), 206 | u32x8::new(7, 7, 7, 7, 7, 7, 0, 7), 207 | u32x8::new(0, 7, 7, 7, 7, 7, 1, 7), 208 | u32x8::new(7, 0, 7, 7, 7, 7, 1, 7), 209 | u32x8::new(0, 1, 7, 7, 7, 7, 2, 7), 210 | u32x8::new(7, 7, 0, 7, 7, 7, 1, 7), 211 | u32x8::new(0, 7, 1, 7, 7, 7, 2, 7), 212 | u32x8::new(7, 0, 1, 7, 7, 7, 2, 7), 213 | u32x8::new(0, 1, 2, 7, 7, 7, 3, 7), 214 | u32x8::new(7, 7, 7, 0, 7, 7, 1, 7), 215 | u32x8::new(0, 7, 7, 1, 7, 7, 2, 7), 216 | u32x8::new(7, 0, 7, 1, 7, 7, 2, 7), 217 | u32x8::new(0, 1, 7, 2, 7, 7, 3, 7), 218 | u32x8::new(7, 7, 0, 1, 7, 7, 2, 7), 219 | u32x8::new(0, 7, 1, 2, 7, 7, 3, 7), 220 | u32x8::new(7, 0, 1, 2, 7, 7, 3, 7), 221 | u32x8::new(0, 1, 2, 3, 7, 7, 4, 7), 222 | u32x8::new(7, 7, 7, 7, 0, 7, 1, 7), 223 | u32x8::new(0, 7, 7, 7, 1, 7, 2, 7), 224 | u32x8::new(7, 0, 7, 7, 1, 7, 2, 7), 225 | u32x8::new(0, 1, 7, 7, 2, 7, 3, 7), 226 | u32x8::new(7, 7, 0, 7, 1, 7, 2, 7), 227 | u32x8::new(0, 7, 1, 7, 2, 7, 3, 7), 228 | u32x8::new(7, 0, 1, 7, 2, 7, 3, 7), 229 | u32x8::new(0, 1, 2, 7, 3, 7, 4, 7), 230 | u32x8::new(7, 7, 7, 0, 1, 7, 2, 7), 231 | u32x8::new(0, 7, 7, 1, 2, 7, 3, 7), 232 | u32x8::new(7, 0, 7, 1, 2, 7, 3, 7), 233 | u32x8::new(0, 1, 7, 2, 3, 7, 4, 7), 234 | u32x8::new(7, 7, 0, 1, 2, 7, 3, 7), 235 | u32x8::new(0, 7, 1, 2, 3, 7, 4, 7), 236 | u32x8::new(7, 0, 1, 2, 3, 7, 4, 7), 237 | u32x8::new(0, 1, 2, 3, 4, 7, 5, 7), 238 | u32x8::new(7, 7, 7, 7, 7, 0, 1, 7), 239 | u32x8::new(0, 7, 7, 7, 7, 1, 2, 7), 240 | u32x8::new(7, 0, 7, 7, 7, 1, 2, 7), 241 | u32x8::new(0, 1, 7, 7, 7, 2, 3, 7), 242 | u32x8::new(7, 7, 0, 7, 7, 1, 2, 7), 243 | u32x8::new(0, 7, 1, 7, 7, 2, 3, 7), 244 | u32x8::new(7, 0, 1, 7, 7, 2, 3, 7), 245 | u32x8::new(0, 1, 2, 7, 7, 3, 4, 7), 246 | u32x8::new(7, 7, 7, 0, 7, 1, 2, 7), 247 | u32x8::new(0, 7, 7, 1, 7, 2, 3, 7), 248 | u32x8::new(7, 0, 7, 1, 7, 2, 3, 7), 249 | u32x8::new(0, 1, 7, 2, 7, 3, 4, 7), 250 | u32x8::new(7, 7, 0, 1, 7, 2, 3, 7), 251 | u32x8::new(0, 7, 1, 2, 7, 3, 4, 7), 252 | u32x8::new(7, 0, 1, 2, 7, 3, 4, 7), 253 | u32x8::new(0, 1, 2, 3, 7, 4, 5, 7), 254 | u32x8::new(7, 7, 7, 7, 0, 1, 2, 7), 255 | u32x8::new(0, 7, 7, 7, 1, 2, 3, 7), 256 | u32x8::new(7, 0, 7, 7, 1, 2, 3, 7), 257 | u32x8::new(0, 1, 7, 7, 2, 3, 4, 7), 258 | u32x8::new(7, 7, 0, 7, 1, 2, 3, 7), 259 | u32x8::new(0, 7, 1, 7, 2, 3, 4, 7), 260 | u32x8::new(7, 0, 1, 7, 2, 3, 4, 7), 261 | u32x8::new(0, 1, 2, 7, 3, 4, 5, 7), 262 | u32x8::new(7, 7, 7, 0, 1, 2, 3, 7), 263 | u32x8::new(0, 7, 7, 1, 2, 3, 4, 7), 264 | u32x8::new(7, 0, 7, 1, 2, 3, 4, 7), 265 | u32x8::new(0, 1, 7, 2, 3, 4, 5, 7), 266 | u32x8::new(7, 7, 0, 1, 2, 3, 4, 7), 267 | u32x8::new(0, 7, 1, 2, 3, 4, 5, 7), 268 | u32x8::new(7, 0, 1, 2, 3, 4, 5, 7), 269 | u32x8::new(0, 1, 2, 3, 4, 5, 6, 7), 270 | u32x8::new(7, 7, 7, 7, 7, 7, 7, 0), 271 | u32x8::new(0, 7, 7, 7, 7, 7, 7, 1), 272 | u32x8::new(7, 0, 7, 7, 7, 7, 7, 1), 273 | u32x8::new(0, 1, 7, 7, 7, 7, 7, 2), 274 | u32x8::new(7, 7, 0, 7, 7, 7, 7, 1), 275 | u32x8::new(0, 7, 1, 7, 7, 7, 7, 2), 276 | u32x8::new(7, 0, 1, 7, 7, 7, 7, 2), 277 | u32x8::new(0, 1, 2, 7, 7, 
7, 7, 3), 278 | u32x8::new(7, 7, 7, 0, 7, 7, 7, 1), 279 | u32x8::new(0, 7, 7, 1, 7, 7, 7, 2), 280 | u32x8::new(7, 0, 7, 1, 7, 7, 7, 2), 281 | u32x8::new(0, 1, 7, 2, 7, 7, 7, 3), 282 | u32x8::new(7, 7, 0, 1, 7, 7, 7, 2), 283 | u32x8::new(0, 7, 1, 2, 7, 7, 7, 3), 284 | u32x8::new(7, 0, 1, 2, 7, 7, 7, 3), 285 | u32x8::new(0, 1, 2, 3, 7, 7, 7, 4), 286 | u32x8::new(7, 7, 7, 7, 0, 7, 7, 1), 287 | u32x8::new(0, 7, 7, 7, 1, 7, 7, 2), 288 | u32x8::new(7, 0, 7, 7, 1, 7, 7, 2), 289 | u32x8::new(0, 1, 7, 7, 2, 7, 7, 3), 290 | u32x8::new(7, 7, 0, 7, 1, 7, 7, 2), 291 | u32x8::new(0, 7, 1, 7, 2, 7, 7, 3), 292 | u32x8::new(7, 0, 1, 7, 2, 7, 7, 3), 293 | u32x8::new(0, 1, 2, 7, 3, 7, 7, 4), 294 | u32x8::new(7, 7, 7, 0, 1, 7, 7, 2), 295 | u32x8::new(0, 7, 7, 1, 2, 7, 7, 3), 296 | u32x8::new(7, 0, 7, 1, 2, 7, 7, 3), 297 | u32x8::new(0, 1, 7, 2, 3, 7, 7, 4), 298 | u32x8::new(7, 7, 0, 1, 2, 7, 7, 3), 299 | u32x8::new(0, 7, 1, 2, 3, 7, 7, 4), 300 | u32x8::new(7, 0, 1, 2, 3, 7, 7, 4), 301 | u32x8::new(0, 1, 2, 3, 4, 7, 7, 5), 302 | u32x8::new(7, 7, 7, 7, 7, 0, 7, 1), 303 | u32x8::new(0, 7, 7, 7, 7, 1, 7, 2), 304 | u32x8::new(7, 0, 7, 7, 7, 1, 7, 2), 305 | u32x8::new(0, 1, 7, 7, 7, 2, 7, 3), 306 | u32x8::new(7, 7, 0, 7, 7, 1, 7, 2), 307 | u32x8::new(0, 7, 1, 7, 7, 2, 7, 3), 308 | u32x8::new(7, 0, 1, 7, 7, 2, 7, 3), 309 | u32x8::new(0, 1, 2, 7, 7, 3, 7, 4), 310 | u32x8::new(7, 7, 7, 0, 7, 1, 7, 2), 311 | u32x8::new(0, 7, 7, 1, 7, 2, 7, 3), 312 | u32x8::new(7, 0, 7, 1, 7, 2, 7, 3), 313 | u32x8::new(0, 1, 7, 2, 7, 3, 7, 4), 314 | u32x8::new(7, 7, 0, 1, 7, 2, 7, 3), 315 | u32x8::new(0, 7, 1, 2, 7, 3, 7, 4), 316 | u32x8::new(7, 0, 1, 2, 7, 3, 7, 4), 317 | u32x8::new(0, 1, 2, 3, 7, 4, 7, 5), 318 | u32x8::new(7, 7, 7, 7, 0, 1, 7, 2), 319 | u32x8::new(0, 7, 7, 7, 1, 2, 7, 3), 320 | u32x8::new(7, 0, 7, 7, 1, 2, 7, 3), 321 | u32x8::new(0, 1, 7, 7, 2, 3, 7, 4), 322 | u32x8::new(7, 7, 0, 7, 1, 2, 7, 3), 323 | u32x8::new(0, 7, 1, 7, 2, 3, 7, 4), 324 | u32x8::new(7, 0, 1, 7, 2, 3, 7, 4), 325 | u32x8::new(0, 1, 2, 7, 3, 4, 7, 5), 326 | u32x8::new(7, 7, 7, 0, 1, 2, 7, 3), 327 | u32x8::new(0, 7, 7, 1, 2, 3, 7, 4), 328 | u32x8::new(7, 0, 7, 1, 2, 3, 7, 4), 329 | u32x8::new(0, 1, 7, 2, 3, 4, 7, 5), 330 | u32x8::new(7, 7, 0, 1, 2, 3, 7, 4), 331 | u32x8::new(0, 7, 1, 2, 3, 4, 7, 5), 332 | u32x8::new(7, 0, 1, 2, 3, 4, 7, 5), 333 | u32x8::new(0, 1, 2, 3, 4, 5, 7, 6), 334 | u32x8::new(7, 7, 7, 7, 7, 7, 0, 1), 335 | u32x8::new(0, 7, 7, 7, 7, 7, 1, 2), 336 | u32x8::new(7, 0, 7, 7, 7, 7, 1, 2), 337 | u32x8::new(0, 1, 7, 7, 7, 7, 2, 3), 338 | u32x8::new(7, 7, 0, 7, 7, 7, 1, 2), 339 | u32x8::new(0, 7, 1, 7, 7, 7, 2, 3), 340 | u32x8::new(7, 0, 1, 7, 7, 7, 2, 3), 341 | u32x8::new(0, 1, 2, 7, 7, 7, 3, 4), 342 | u32x8::new(7, 7, 7, 0, 7, 7, 1, 2), 343 | u32x8::new(0, 7, 7, 1, 7, 7, 2, 3), 344 | u32x8::new(7, 0, 7, 1, 7, 7, 2, 3), 345 | u32x8::new(0, 1, 7, 2, 7, 7, 3, 4), 346 | u32x8::new(7, 7, 0, 1, 7, 7, 2, 3), 347 | u32x8::new(0, 7, 1, 2, 7, 7, 3, 4), 348 | u32x8::new(7, 0, 1, 2, 7, 7, 3, 4), 349 | u32x8::new(0, 1, 2, 3, 7, 7, 4, 5), 350 | u32x8::new(7, 7, 7, 7, 0, 7, 1, 2), 351 | u32x8::new(0, 7, 7, 7, 1, 7, 2, 3), 352 | u32x8::new(7, 0, 7, 7, 1, 7, 2, 3), 353 | u32x8::new(0, 1, 7, 7, 2, 7, 3, 4), 354 | u32x8::new(7, 7, 0, 7, 1, 7, 2, 3), 355 | u32x8::new(0, 7, 1, 7, 2, 7, 3, 4), 356 | u32x8::new(7, 0, 1, 7, 2, 7, 3, 4), 357 | u32x8::new(0, 1, 2, 7, 3, 7, 4, 5), 358 | u32x8::new(7, 7, 7, 0, 1, 7, 2, 3), 359 | u32x8::new(0, 7, 7, 1, 2, 7, 3, 4), 360 | u32x8::new(7, 0, 7, 1, 2, 7, 3, 4), 361 | u32x8::new(0, 1, 7, 2, 3, 7, 4, 5), 362 | 
u32x8::new(7, 7, 0, 1, 2, 7, 3, 4), 363 | u32x8::new(0, 7, 1, 2, 3, 7, 4, 5), 364 | u32x8::new(7, 0, 1, 2, 3, 7, 4, 5), 365 | u32x8::new(0, 1, 2, 3, 4, 7, 5, 6), 366 | u32x8::new(7, 7, 7, 7, 7, 0, 1, 2), 367 | u32x8::new(0, 7, 7, 7, 7, 1, 2, 3), 368 | u32x8::new(7, 0, 7, 7, 7, 1, 2, 3), 369 | u32x8::new(0, 1, 7, 7, 7, 2, 3, 4), 370 | u32x8::new(7, 7, 0, 7, 7, 1, 2, 3), 371 | u32x8::new(0, 7, 1, 7, 7, 2, 3, 4), 372 | u32x8::new(7, 0, 1, 7, 7, 2, 3, 4), 373 | u32x8::new(0, 1, 2, 7, 7, 3, 4, 5), 374 | u32x8::new(7, 7, 7, 0, 7, 1, 2, 3), 375 | u32x8::new(0, 7, 7, 1, 7, 2, 3, 4), 376 | u32x8::new(7, 0, 7, 1, 7, 2, 3, 4), 377 | u32x8::new(0, 1, 7, 2, 7, 3, 4, 5), 378 | u32x8::new(7, 7, 0, 1, 7, 2, 3, 4), 379 | u32x8::new(0, 7, 1, 2, 7, 3, 4, 5), 380 | u32x8::new(7, 0, 1, 2, 7, 3, 4, 5), 381 | u32x8::new(0, 1, 2, 3, 7, 4, 5, 6), 382 | u32x8::new(7, 7, 7, 7, 0, 1, 2, 3), 383 | u32x8::new(0, 7, 7, 7, 1, 2, 3, 4), 384 | u32x8::new(7, 0, 7, 7, 1, 2, 3, 4), 385 | u32x8::new(0, 1, 7, 7, 2, 3, 4, 5), 386 | u32x8::new(7, 7, 0, 7, 1, 2, 3, 4), 387 | u32x8::new(0, 7, 1, 7, 2, 3, 4, 5), 388 | u32x8::new(7, 0, 1, 7, 2, 3, 4, 5), 389 | u32x8::new(0, 1, 2, 7, 3, 4, 5, 6), 390 | u32x8::new(7, 7, 7, 0, 1, 2, 3, 4), 391 | u32x8::new(0, 7, 7, 1, 2, 3, 4, 5), 392 | u32x8::new(7, 0, 7, 1, 2, 3, 4, 5), 393 | u32x8::new(0, 1, 7, 2, 3, 4, 5, 6), 394 | u32x8::new(7, 7, 0, 1, 2, 3, 4, 5), 395 | u32x8::new(0, 7, 1, 2, 3, 4, 5, 6), 396 | u32x8::new(7, 0, 1, 2, 3, 4, 5, 6), 397 | u32x8::new(0, 1, 2, 3, 4, 5, 6, 7), 398 | ]; 399 | 400 | // mask for SIMD gather/pointer reading based on number of nonzeroes in group of 8. 401 | // Only read from memory for which values are guaranteed to exist. 402 | const U32_SIMD_READMASKS: [m32x8; 9] = [ 403 | m32x8::splat(false), 404 | m32x8::new(true, false, false, false, false, false, false, false), 405 | m32x8::new(true, true, false, false, false, false, false, false), 406 | m32x8::new(true, true, true, false, false, false, false, false), 407 | m32x8::new(true, true, true, true, false, false, false, false), 408 | m32x8::new(true, true, true, true, true, false, false, false), 409 | m32x8::new(true, true, true, true, true, true, false, false), 410 | m32x8::new(true, true, true, true, true, true, true, false), 411 | m32x8::new(true, true, true, true, true, true, true, true), 412 | ]; 413 | 414 | // Used for when we aren't sure there's enough space to use preload_simd 415 | #[inline] 416 | fn preload_u32x8_3_4_nibble(buf: &[u8], 417 | stride: usize, 418 | nonzeroes: u32) -> Result<(u32x8, u32), CodingError> { 419 | let total_bytes = (stride * nonzeroes as usize + 1) / 2; 420 | let inword1 = direct_read_uint_le(buf, 2)?; 421 | let words0 = inword1 as u32; 422 | let words1 = (inword1 >> (stride * 8)) as u32; 423 | let (words2, words3) = if (stride * 2) < total_bytes { 424 | // We have processed stride*2 bytes. If total bytes is more than that, keep reading. 
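// The +2 skips the bitmask and nibble-count header bytes, so this second read starts
// right after the first stride*2 payload bytes.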
425 | let inword2 = direct_read_uint_le(buf, 2 + stride*2)?; 426 | (inword2 as u32, (inword2 >> (stride * 8)) as u32) 427 | } else { (0, 0) }; 428 | let simd_word = u32x8::new(words0, words0, words1, words1, words2, words2, words3, words3); 429 | Ok((simd_word, total_bytes as u32)) 430 | } 431 | 432 | #[inline] 433 | fn preload_u32x8_nibbles(buf: &[u8], 434 | num_nibbles: usize, 435 | nonzeroes: u32) -> Result<(u32x8, u32), CodingError> { 436 | let total_bytes = (num_nibbles * nonzeroes as usize + 1) / 2; 437 | let mut i = 0; 438 | let mut off = 2; 439 | let simd_word = u32x8::splat(0); 440 | while i < 8 && off < (total_bytes + 2) { 441 | let inword = direct_read_uint_le(buf, off)?; 442 | // Safe because we are checking boundaries in while loop conditions 443 | unsafe { simd_word.replace_unchecked(i, inword as u32) }; 444 | let shift2 = (num_nibbles * 4) / 8 * 8; // round off shift to lower byte boundary 445 | unsafe { simd_word.replace_unchecked(i + 1, (inword >> shift2) as u32) }; 446 | i += 2; 447 | off += num_nibbles; 448 | } 449 | Ok((simd_word, total_bytes as u32)) 450 | } 451 | 452 | /// SIMD GATHER/cptr based loading of SIMD u32x8 register, fast for 3+ nibbles 453 | /// Can be used to load from any number of nibbles for u32 454 | // TODO: only enable this for x86* and architectures with safe unaligned reads? 455 | #[inline(always)] 456 | unsafe fn preload_u32x8_simd(buf: &[u8], 457 | num_nibbles: u8, 458 | nonzeroes: u32) -> u32x8 { 459 | // Get pointer to beginning of buf encoded bytes section. This is safe due to length check above 460 | let first_byte = buf.as_ptr().offset(2); 461 | let u8_ptrs = cptrx8::splat(first_byte); 462 | 463 | // Add variable offsets so we read from right parts of buffer for each word 464 | let u8_offset = u8_ptrs.offset(U32_SIMD_PTR_OFFSETS[num_nibbles as usize]); 465 | // Change type from *u8 to *u32 and force unaligned reads 466 | let u32_offsets: cptrx8 = std::mem::transmute(u8_offset); 467 | 468 | // Read with mask 469 | let loaded: u32x8 = u32_offsets.read(U32_SIMD_READMASKS[nonzeroes as usize], ZEROES_U32X8); 470 | // Ensure little endian. This should be NOP on x86 and other LE architectures 471 | loaded.to_le() 472 | } 473 | 474 | // Optimized shuffle using AVX2 instruction, which is not available in packed_simd for some reason ?? 
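// Worked example (illustration only): for nonzero_mask = 0b0010_1101 (lanes 0, 2, 3 and 5
// are nonzero), the decoded values arrive densely packed in the low lanes as
// [a, b, c, d, x, x, x, x]. Lane 7 is first stuffed with 0, then the shuffle indices
// SHUFFLE_UNPACK_IDX_U32[0b0010_1101] = (0, 7, 1, 2, 7, 3, 7, 7) scatter them back to their
// original positions, yielding [a, 0, b, c, 0, d, 0, 0].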
475 | #[cfg(all(any(target_arch = "x86", target_arch = "x86_64"), 476 | target_feature = "avx2"))] 477 | #[inline(always)] 478 | fn unpack_shuffle(input: u32x8, nonzero_mask: u8) -> u32x8 { 479 | #[cfg(target_arch = "x86")] 480 | use core::arch::x86::_mm256_permutevar8x32_epi32; 481 | #[cfg(target_arch = "x86_64")] 482 | use core::arch::x86_64::_mm256_permutevar8x32_epi32; 483 | 484 | let shifted1 = input.replace(7, 0); // Stuff 0 into unused final slot 485 | unsafe { 486 | std::mem::transmute( 487 | _mm256_permutevar8x32_epi32( 488 | std::mem::transmute(shifted1), 489 | std::mem::transmute(SHUFFLE_UNPACK_IDX_U32[nonzero_mask as usize]) 490 | ) 491 | ) 492 | } 493 | } 494 | 495 | // Unoptimized using packed_simd which doesn't support above instruction 496 | #[cfg(not(all(any(target_arch = "x86", target_arch = "x86_64"), 497 | target_feature = "avx2")))] 498 | #[inline(always)] 499 | fn unpack_shuffle(input: u32x8, nonzero_mask: u8) -> u32x8 { 500 | let shifted1 = input.replace(7, 0); // Stuff 0 into unused final slot 501 | shifted1.shuffle1_dyn(SHUFFLE_UNPACK_IDX_U32[nonzero_mask as usize]) 502 | } 503 | 504 | // Max number of bytes that a U32 nibblepacked 8 inputs could take up: 2 + 8*4; 505 | pub const MAX_U32_NIBBLEPACKED_LEN: usize = 34; 506 | 507 | /// SIMD-based decoding of NibblePacked data to u32x8. Errors out if number of nibbles exceeds 8. 508 | /// Checks that the input buffer has enough room to decode. 509 | /// Really fast for 1-2 nibbles, but still fast for 3-8 nibbles. 510 | #[inline] 511 | pub fn unpack8_u32_simd<'a, Output: Sink>( 512 | inbuf: &'a [u8], 513 | output: &mut Output, 514 | ) -> Result<&'a [u8], CodingError> { 515 | if inbuf.is_empty() { return Err(CodingError::NotEnoughSpace) } 516 | let nonzero_mask = inbuf[0]; 517 | let nonzero_count = nonzero_mask.count_ones(); 518 | if nonzero_mask == 0 { 519 | output.process_zeroes(); 520 | Ok(&inbuf[1..]) 521 | } else { 522 | // NOTE: if nonzero values, must be at least two more bytes: the nibble count and packed nibbles 523 | if inbuf.len() < 3 { return Err(CodingError::NotEnoughSpace) } 524 | let num_nibbles = (inbuf[1] >> 4) + 1; 525 | let trailing_zeros = (inbuf[1] & 0x0f) * 4; 526 | 527 | // First step: load encoded bytes in parallel to SIMD registers 528 | // Also figure out how many bytes are taken up by packed nibbles 529 | let (simd_inputs, num_bytes) = match num_nibbles { 530 | // NOTE: the code for 1/2 nibbles is faster than preload_simd, but for 3+ nibbles preload_simd is faster 531 | 1 => { // one nibble, easy peasy. 532 | // Step 1. single nibble x 8 is u32, so we can just splat it :) 533 | let encoded0 = direct_read_uint_le(inbuf, 2)? as u32; 534 | (u32x8::splat(encoded0), (nonzero_count + 1) / 2) 535 | }, 536 | 2 => { 537 | // 2 nibbles/byte: first 4 values gets lower 32 bits, second gets upper 32 bits 538 | let in_word = direct_read_uint_le(inbuf, 2)?; 539 | let lower_u32 = in_word as u32; 540 | let upper_u32 = (in_word >> 32) as u32; 541 | (u32x8::new(lower_u32, lower_u32, lower_u32, lower_u32, 542 | upper_u32, upper_u32, upper_u32, upper_u32), 543 | nonzero_count) 544 | }, 545 | 3..=8 => { 546 | if inbuf.len() >= MAX_U32_NIBBLEPACKED_LEN { 547 | let total_bytes = (num_nibbles as usize * nonzero_count as usize + 1) / 2; 548 | // Call below is safe since we have checked length above 549 | (unsafe { preload_u32x8_simd(inbuf, num_nibbles, nonzero_count) }, total_bytes as u32) 550 | } else if num_nibbles <= 4 { 551 | preload_u32x8_3_4_nibble(inbuf, num_nibbles as usize, nonzero_count)? 
552 | } else { 553 | preload_u32x8_nibbles(inbuf, num_nibbles as usize, nonzero_count)? 554 | } 555 | }, 556 | _ => return Err(CodingError::InvalidFormat( 557 | format!("{:?} nibbles is too many for u32 decoder", num_nibbles))), 558 | }; 559 | 560 | let shuffled = simd_unpack_inner(simd_inputs, num_nibbles, trailing_zeros, 561 | nonzero_count, nonzero_mask); 562 | 563 | // Step 6. Send to sink, and advance input slice 564 | output.process(shuffled); 565 | Ok(&inbuf[(2 + num_bytes as usize)..]) 566 | } 567 | } 568 | 569 | // Inner SIMD decoding steps, produces a final shuffled 8 u32's 570 | #[inline(always)] 571 | fn simd_unpack_inner(simd_inputs: u32x8, num_nibbles: u8, trailing_zeros: u8, 572 | nonzero_count: u32, 573 | nonzero_mask: u8) -> u32x8 { 574 | // Step 2. Variable right shift to shift each set of nibbles in right place 575 | let shifted = simd_inputs.shr(U32_SIMD_SHIFTS[num_nibbles as usize]); 576 | 577 | // Step 3. AND mask to strip upper bits, so each lane left with its own value 578 | let anded = shifted.bitand(U32_SIMD_ANDMASK[num_nibbles as usize]); 579 | 580 | // Step 4. Left shift for trailing zeroes, if needed 581 | let leftshifted = if (trailing_zeros == 0) { anded } else { anded.shl(trailing_zeros as u32) }; 582 | 583 | // Step 5. Shuffle inputs based on nonzero mask to proper places 584 | if (nonzero_count == 8) { leftshifted } else { unpack_shuffle(leftshifted, nonzero_mask) } 585 | } 586 | 587 | 588 | #[test] 589 | fn test_unpack_u32simd_1_2nibbles() { 590 | let mut buf = [55u8; 512]; 591 | 592 | // 1 nibble, no nulls 593 | let mut sink = U32_256Sink::new(); 594 | let data = [1u32, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]; 595 | let written = pack_u64(data.iter().map(|&x| x as u64), &mut buf, 0).unwrap(); 596 | let rest = unpack8_u32_simd(&buf[..written], &mut sink).unwrap(); 597 | 598 | // should use up all but last 4 bytes, and first 8 bytes should be identical 599 | assert_eq!(rest.len(), 4); 600 | assert_eq!(sink.values[..8], data[..8]); 601 | 602 | unpack8_u32_simd(rest, &mut sink).unwrap(); 603 | assert_eq!(sink.values[8..12], data[8..12]); 604 | 605 | // 2 nibbles, no nulls. 
NOTE; final values all are multiples of 16; this tests leading_zeroes == 4 606 | let mut sink = U32_256Sink::new(); 607 | let data2 = [32u32, 34, 40, 48, 56, 72, 80, 88, 96, 112, 128, 144]; 608 | let written = pack_u64(data2.iter().map(|&x| x as u64), &mut buf, 0).unwrap(); 609 | let rest2 = unpack8_u32_simd(&buf[..written], &mut sink).unwrap(); 610 | 611 | // assert_eq!(rest2.len(), 6); 612 | assert_eq!(sink.values[..8], data2[..8]); 613 | 614 | unpack8_u32_simd(rest2, &mut sink).unwrap(); 615 | assert_eq!(sink.values[8..12], data2[8..12]); 616 | 617 | // 1 nibble, nulls 618 | let mut sink = U32_256Sink::new(); 619 | let data = [1u32, 2, 0, 3, 4, 0, 5, 6, 0, 8, 9, 10, 0, 12]; 620 | let written = pack_u64(data.iter().map(|&x| x as u64), &mut buf, 0).unwrap(); 621 | let rest = unpack8_u32_simd(&buf[..written], &mut sink).unwrap(); 622 | 623 | // should use up all but last 4 bytes, and first 8 bytes should be identical 624 | assert_eq!(rest.len(), 4); 625 | assert_eq!(sink.values[..8], data[..8]); 626 | 627 | unpack8_u32_simd(rest, &mut sink).unwrap(); 628 | assert_eq!(sink.values[8..data.len()], data[8..]); 629 | 630 | // 2 nibbles, nulls 631 | let mut sink = U32_256Sink::new(); 632 | let data2 = [32u32, 34, 40, 0, 0, 48, 56, 72, 80, 0, 88, 0, 96]; 633 | let written = pack_u64(data2.iter().map(|&x| x as u64), &mut buf, 0).unwrap(); 634 | let rest2 = unpack8_u32_simd(&buf[..written], &mut sink).unwrap(); 635 | 636 | // assert_eq!(rest2.len(), 6); 637 | assert_eq!(sink.values[..8], data2[..8]); 638 | 639 | unpack8_u32_simd(rest2, &mut sink).unwrap(); 640 | assert_eq!(sink.values[8..12], data2[8..12]); 641 | } 642 | 643 | #[test] 644 | fn test_unpack_u32simd_3_4nibbles() { 645 | // Tests edge case where 4 nibbles (16 bits) pack edge 646 | // 4 nibbles = 2^16, so values < 65536 647 | let inputs = [65535u64; 8]; 648 | let mut buf = [0u8; 512]; 649 | let written = nibble_pack8(&inputs, &mut buf, 0).unwrap(); 650 | 651 | let mut sink = U32_256Sink::new(); 652 | let _rest = unpack8_u32_simd(&buf[..written], &mut sink).unwrap(); 653 | 654 | assert_eq!(sink.values[..8], [65535u32; 8]); 655 | 656 | // case 2 - first 8 use 3 nibbles, and then 4 nibbles. 657 | let mut sink = U32_256Sink::new(); 658 | let inputs = [0u32, 1000, 1001, 1002, 1003, 2005, 2010, 3034, 4045, 5056, 6067, 7078]; 659 | 660 | let written = pack_u64(inputs.iter().map(|&x| x as u64), &mut buf, 0).unwrap(); 661 | let rest = unpack8_u32_simd(&buf[..written], &mut sink).unwrap(); 662 | 663 | unpack8_u32_simd(rest, &mut sink).unwrap(); 664 | assert_eq!(sink.values[..inputs.len()], inputs); 665 | } 666 | 667 | // NOTE: cfg(test) is needed so that proptest can just be a "dev-dependency" and not linked for final library 668 | // NOTE2: somehow cargo is happier when we put props tests in its own module 669 | #[cfg(test)] 670 | mod props { 671 | extern crate proptest; 672 | 673 | use self::proptest::prelude::*; 674 | use super::*; 675 | 676 | // Generators (Arb's) for numbers of given # bits with fractional chance of being zero. 677 | // Also input arrays of 8 with the given properties above. 678 | prop_compose! { 679 | /// zero_chance: 0..1.0 chance of obtaining a zero 680 | fn arb_maybezero_nbits_u32 681 | (nbits: usize, zero_chance: f32) 682 | (is_zero in prop::bool::weighted(zero_chance as f64), 683 | n in 0u32..(1 << nbits)) 684 | -> u32 685 | { 686 | if is_zero { 0 } else { n } 687 | } 688 | } 689 | 690 | // Generate random u32 source arrays 691 | prop_compose! 
{ 692 | fn arb_u32_vectors() 693 | (nbits in 4usize..30, chance in 0.1f32..0.6) 694 | (mut v in proptest::collection::vec(arb_maybezero_nbits_u32(nbits, chance), 2..40)) 695 | -> Vec { v } 696 | } 697 | 698 | prop_compose! { 699 | /// zero_chance: 0..1.0 chance of obtaining a zero 700 | fn arb_maybezero_nbits_u64 701 | (nbits: usize, zero_chance: f32) 702 | (is_zero in prop::bool::weighted(zero_chance as f64), 703 | n in 0u64..(1 << nbits)) 704 | -> u64 705 | { 706 | if is_zero { 0 } else { n } 707 | } 708 | } 709 | 710 | // random u64 source arrays 711 | prop_compose! { 712 | fn arb_u64_vectors() 713 | (nbits in 16usize..40, chance in 0.1f32..0.6) 714 | (mut v in proptest::collection::vec(arb_maybezero_nbits_u64(nbits, chance), 2..10)) 715 | -> Vec { v } 716 | } 717 | 718 | proptest! { 719 | #[test] 720 | fn prop_u32simd_pack_unpack(input in arb_u32_vectors()) { 721 | let mut buf = [0u8; 2048]; 722 | pack_u64(input.iter().map(|&x| x as u64), &mut buf, 0).unwrap(); 723 | let mut sink = U32_256Sink::new(); 724 | let res = unpack8_u32_simd(&buf, &mut sink).unwrap(); 725 | let maxlen = 8.min(input.len()); 726 | assert_eq!(sink.values[..maxlen], input[..maxlen]); 727 | } 728 | 729 | #[test] 730 | fn prop_u64simd_pack(input in arb_u64_vectors()) { 731 | let mut buf = [0u8; 1024]; 732 | let mut inbuf = [0u64; 8]; 733 | let numelems = input.len().min(8); 734 | inbuf[..numelems].copy_from_slice(&input[..numelems]); 735 | let simd_inputs = u64x8::from_slice_unaligned(&inbuf[..]); 736 | let _off = pack8_u64_simd(simd_inputs, &mut buf, 0).unwrap(); 737 | 738 | let mut sink = U64_256Sink::new(); 739 | let res = nibble_unpack8(&buf, &mut sink).unwrap(); 740 | assert_eq!(sink.values[..numelems], input[..numelems]); 741 | } 742 | } 743 | } 744 | -------------------------------------------------------------------------------- /src/nibblepacking.rs: -------------------------------------------------------------------------------- 1 | use packed_simd::{u32x8, u64x8, FromCast}; 2 | 3 | use crate::error::CodingError; 4 | use crate::byteutils::*; 5 | use crate::sink::*; 6 | use crate::nibblepack_simd::unpack8_u32_simd; 7 | 8 | /// Packs a slice of u64 numbers that are increasing, using delta encoding. That is, the delta between successive 9 | /// elements is encoded, rather than the absolute numbers. The first number is encoded as is. 10 | /// 11 | /// ## Numbers must be increasing 12 | /// This is currently only designed for the case where successive numbers are either the same or increasing 13 | /// (such as Prometheus-style increasing histograms). If a successive input is less than the previous input, 14 | /// currently this method WILL CLIP and record the difference as 0. 15 | pub fn pack_u64_delta(inputs: &[u64], out_buffer: &mut [u8]) -> Result { 16 | let mut last = 0u64; 17 | let deltas = inputs.into_iter().map(|&n| { 18 | let delta = n.saturating_sub(last); 19 | last = n; 20 | delta 21 | }); 22 | pack_u64(deltas, out_buffer, 0) 23 | } 24 | 25 | /// Packs a stream of double-precision IEEE-754 / f64 numbers using XOR encoding. 26 | /// The first f64 is written as is; after that, each successive f64 is XORed with the previous one and the xor 27 | /// value is written, based on the premise that when changes are small so is the XORed value. 
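// For intuition (illustration only): 1.0f64 has bit pattern 0x3FF0_0000_0000_0000 and
// 1.5f64 has 0x3FF8_0000_0000_0000, so their XOR is 0x0008_0000_0000_0000 -- a single set
// bit that NibblePacking can store in one nibble.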
28 | /// Stream must have at least one value, otherwise InputTooShort is returned 29 | pub fn pack_f64_xor>(mut stream: I, 30 | out_buffer: &mut [u8]) -> Result { 31 | let mut last: u64 = match stream.next() { 32 | Some(num) => { 33 | let num_bits = num.to_bits(); 34 | direct_write_uint_le(out_buffer, 0, num_bits, 8)?; 35 | num_bits 36 | }, 37 | None => return Err(CodingError::InputTooShort) 38 | }; 39 | pack_u64(stream.map(|f| { 40 | let f_bits = f.to_bits(); 41 | let delta = last ^ f_bits; 42 | last = f_bits; 43 | delta 44 | }), out_buffer, 8) 45 | } 46 | 47 | 48 | /// 49 | /// Packs a stream of plain u64 numbers using NibblePacking. 50 | /// 51 | /// This is especially powerful when combined with 52 | /// other packers which can do for example delta or floating point XOR or other kinds of encoding which reduces 53 | /// the # of bits needed and produces many zeroes. This is why an Iterator is used for the API, as sources will 54 | /// typically transform the incoming data by reducing the bits needed. 55 | /// This method does no transformations to the input data. You might want one of the other pack_* methods. 56 | /// 57 | /// ``` 58 | /// # use compressed_vec::nibblepacking; 59 | /// let inputs = [0u64, 1000, 1001, 1002, 1003, 2005, 2010, 3034, 4045, 5056, 6067, 7078]; 60 | /// let mut buf = [0u8; 1024]; 61 | /// nibblepacking::pack_u64(inputs.into_iter().cloned(), &mut buf, 0); 62 | /// ``` 63 | /// NOTE: The NibblePack algorithm always packs 8 u64's at a time. If the length of the input stream is not 64 | /// divisible by 8, extra 0 values pad the input. 65 | // TODO: should this really be a function, or maybe a struct with more methods? 66 | // TODO: also benchmark this vs just reading from a slice of u64's 67 | #[inline] 68 | pub fn pack_u64>(stream: I, 69 | out_buffer: &mut [u8], 70 | offset: usize) -> Result { 71 | let mut in_buffer = [0u64; 8]; 72 | let mut bufindex = 0; 73 | let mut off = offset; 74 | // NOTE: using pointer math is actually NOT any faster! 75 | for num in stream { 76 | in_buffer[bufindex] = num; 77 | bufindex += 1; 78 | if bufindex >= 8 { 79 | // input buffer is full, encode! 80 | off = nibble_pack8(&in_buffer, out_buffer, off)?; 81 | bufindex = 0; 82 | } 83 | } 84 | // If buffer is partially filled, then encode the remainer 85 | if bufindex > 0 { 86 | while bufindex < 8 { 87 | in_buffer[bufindex] = 0; 88 | bufindex += 1; 89 | } 90 | off = nibble_pack8(&in_buffer, out_buffer, off)?; 91 | } 92 | Ok(off) 93 | } 94 | 95 | /// 96 | /// NibblePacking is an encoding technique for packing 8 u64's tightly into the same number of nibbles. 97 | /// It can be combined with a prediction algorithm to efficiency encode floats and long values. 98 | /// This is really an inner function; the intention is for the user to use one of the higher level pack* methods. 99 | /// Please see http://github.com/filodb/FiloDB/doc/compression.md for more answers. 100 | /// 101 | /// # Arguments 102 | /// * `inputs` - ref to 8 u64 values to pack, could be the output of a predictor 103 | /// * `out_buffer` - a &mut [u8] to write the encoded output to. 104 | /// * `offset` - offset within the out_buffer to write to 105 | /// Outputs the ending offset, or an error. 106 | /// 107 | #[inline(always)] 108 | pub fn nibble_pack8(inputs: &[u64; 8], 109 | out_buffer: &mut [u8], 110 | offset: usize) -> Result { 111 | // Compute the nonzero bitmask. 
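// Worked example of the header bytes (illustration only, matching the tests below): if all
// eight inputs need six significant nibbles with four trailing zero nibbles, byte 0 is the
// bitmask 0xff and byte 1 is ((6 - 1) << 4) | 4 = 0x54, followed by 8 * 3 bytes of packed nibbles.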
TODO: use SIMD here 112 | let mut nonzero_mask = 0u8; 113 | let mut off = offset; 114 | for i in 0..8 { 115 | if inputs[i] != 0 { 116 | nonzero_mask |= 1 << i; 117 | } 118 | } 119 | // Check for both nonzero byte and at least one more byte after that for nibbles 120 | if (off + 1) >= out_buffer.len() { 121 | return Err(CodingError::NotEnoughSpace); 122 | } 123 | out_buffer[off] = nonzero_mask; 124 | off += 1; 125 | 126 | // if no nonzero values, we're done! 127 | if nonzero_mask != 0 { 128 | // TODO: use SIMD here 129 | // otherwise, get min of leading and trailing zeros, encode it 130 | let min_leading_zeros = inputs.into_iter().map(|x| x.leading_zeros()).min().unwrap(); 131 | let min_trailing_zeros = inputs.into_iter().map(|x| x.trailing_zeros()).min().unwrap(); 132 | // Below impl seems to be equally fast, though it generates much more efficient code and SHOULD be much faster 133 | // let mut ored_bits = 0u64; 134 | // inputs.into_iter().for_each(|&x| ored_bits |= x); 135 | // let min_leading_zeros = ored_bits.leading_zeros(); 136 | // let min_trailing_zeros = ored_bits.trailing_zeros(); 137 | 138 | // Convert min leading/trailing to # nibbles. Start packing! 139 | // NOTE: num_nibbles cannot be 0; that would imply every input was zero 140 | let trailing_nibbles = min_trailing_zeros / 4; 141 | let num_nibbles = 16 - (min_leading_zeros / 4) - trailing_nibbles; 142 | let nibble_word = (((num_nibbles - 1) << 4) | trailing_nibbles) as u8; 143 | out_buffer[off] = nibble_word; 144 | off += 1; 145 | 146 | if (num_nibbles % 2) == 0 { 147 | off = pack_to_even_nibbles(inputs, out_buffer, off, num_nibbles, trailing_nibbles)?; 148 | } else { 149 | off = pack_universal(inputs, out_buffer, off, num_nibbles, trailing_nibbles)?; 150 | } 151 | } 152 | Ok(off) 153 | } 154 | 155 | /// 156 | /// Inner function to pack the raw inputs to nibbles when # nibbles is even (always # bytes) 157 | /// It's somehow really fast, perhaps because it is really simple. 158 | /// 159 | /// # Arguments 160 | /// * `trailing_zero_nibbles` - the min # of trailing zero nibbles across all inputs 161 | /// * `num_nibbles` - the max # of nibbles having nonzero bits in all inputs 162 | #[inline] 163 | pub(crate) fn pack_to_even_nibbles( 164 | inputs: &[u64; 8], 165 | out_buffer: &mut [u8], 166 | offset: usize, 167 | num_nibbles: u32, 168 | trailing_zero_nibbles: u32 169 | ) -> Result { 170 | // In the future, explore these optimizations: functions just for specific nibble widths 171 | let shift = trailing_zero_nibbles * 4; 172 | assert!(num_nibbles % 2 == 0); 173 | let num_bytes_each = (num_nibbles / 2) as usize; 174 | let mut off = offset; 175 | 176 | // for each nonzero input, shift and write out exact # of bytes 177 | for &x in inputs { 178 | if x != 0 { 179 | off = direct_write_uint_le(out_buffer, off, x >> shift, num_bytes_each)?; 180 | } 181 | }; 182 | Ok(off) 183 | } 184 | 185 | /// Universal, generic nibble packing algorithm, packing 8 64-bit values to a byte buffer. 186 | /// This code is inspired by bitpacking crate: https://github.com/tantivy-search/bitpacking/ 187 | /// but modified for the NibblePacking algorithm. No macros, so slightly less efficient. 
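// Bit layout sketch for the odd-nibble path (illustration only): with num_nibbles = 5 each
// packed value takes 20 bits, so the first three nonzero values fill bits 0..20, 20..40 and
// 40..60 of the first little-endian u64; the fourth straddles the word boundary, its low
// 4 bits finishing that word and the remaining 16 bits starting the next.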
188 | /// TODO: consider using macros like in bitpacking to achieve even more speed :D 189 | #[inline] 190 | pub(crate) fn pack_universal( 191 | inputs: &[u64; 8], 192 | out_buffer: &mut [u8], 193 | offset: usize, 194 | num_nibbles: u32, 195 | trailing_zero_nibbles: u32 196 | ) -> Result { 197 | let trailing_shift = trailing_zero_nibbles * 4; 198 | let num_bits = num_nibbles * 4; 199 | let mut out_word = 0u64; 200 | let mut bit_cursor = 0; 201 | let mut off = offset; 202 | 203 | for &x in inputs { 204 | if x != 0 { 205 | let remaining = 64 - bit_cursor; 206 | let shifted_input = x >> trailing_shift; 207 | 208 | // This is least significant portion of input 209 | out_word |= shifted_input << bit_cursor; 210 | 211 | // Write out current word if we've used up all 64 bits 212 | if remaining <= num_bits { 213 | off = direct_write_uint_le(out_buffer, off, out_word, 8)?; 214 | 215 | if remaining < num_bits { 216 | // Most significant portion left over from previous word 217 | out_word = shifted_input >> (remaining as i32); 218 | } else { 219 | out_word = 0; // reset for 64-bit input case 220 | } 221 | } 222 | 223 | bit_cursor = (bit_cursor + num_bits) % 64; 224 | } 225 | }; 226 | 227 | // Write remainder word if there are any bits remaining, and only advance buffer right # of bytes 228 | if bit_cursor > 0 { 229 | off = direct_write_uint_le(out_buffer, off, out_word, ((bit_cursor + 7) / 8) as usize)?; 230 | } 231 | Ok(off) 232 | } 233 | 234 | 235 | const ZERO_U64OCTET: u64x8 = u64x8::splat(0); 236 | 237 | /// A Sink which accumulates delta-encoded NibblePacked data back into increasing u64 numbers 238 | #[derive(Debug)] 239 | pub struct DeltaSink { 240 | acc: u64, 241 | sink: VecSink, 242 | } 243 | 244 | impl DeltaSink { 245 | pub fn with_sink(inner_sink: VecSink) -> DeltaSink { 246 | DeltaSink { acc: 0, sink: inner_sink } 247 | } 248 | 249 | pub fn new() -> DeltaSink { 250 | DeltaSink::with_sink(VecSink::::new()) 251 | } 252 | 253 | pub fn output_vec(&self) -> &Vec { 254 | &self.sink.vec 255 | } 256 | } 257 | 258 | impl Sink for DeltaSink { 259 | #[inline] 260 | fn process(&mut self, data: u64x8) { 261 | let mut buf = u64x8::splat(0); 262 | let mut acc = self.acc; 263 | for i in 0..8 { 264 | acc += data.extract(i); 265 | buf = buf.replace(i, acc); 266 | } 267 | self.acc = acc; 268 | self.sink.process(buf); 269 | } 270 | 271 | fn process_zeroes(&mut self) { 272 | todo!(); 273 | } 274 | 275 | fn reset(&mut self) { 276 | self.acc = 0; 277 | self.sink.reset() 278 | } 279 | } 280 | 281 | /// A sink which uses simple successive XOR encoding to decode a NibblePacked floating point stream 282 | /// encoded using [`pack_f64_xor`]: #method.pack_f64_xor 283 | #[derive(Debug)] 284 | pub struct DoubleXorSink { 285 | last: u64, 286 | vec: Vec, 287 | } 288 | 289 | impl DoubleXorSink { 290 | /// Creates a new DoubleXorSink with a vec which is owned by this struct. 
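// (unpack_f64_xor clears and refills this vec on every call, via the private reset method)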
291 | pub fn new(the_vec: Vec) -> DoubleXorSink { 292 | DoubleXorSink { last: 0, vec: the_vec } 293 | } 294 | 295 | fn reset(&mut self, init_value: u64) { 296 | self.vec.clear(); 297 | self.vec.push(f64::from_bits(init_value)); 298 | self.last = init_value; 299 | } 300 | } 301 | 302 | impl Sink for DoubleXorSink { 303 | #[inline] 304 | fn process(&mut self, data: u64x8) { 305 | let mut buf = [0f64; 8]; 306 | let mut last = self.last; 307 | for i in 0..8 { 308 | // XOR new piece of data with last, which yields original value 309 | let numbits = last ^ data.extract(i); 310 | buf[i] = f64::from_bits(numbits); 311 | last = numbits 312 | } 313 | self.last = last; 314 | 315 | self.vec.extend(&buf); 316 | } 317 | 318 | fn process_zeroes(&mut self) { 319 | todo!(); 320 | } 321 | 322 | fn reset(&mut self) { 323 | self.vec.clear(); 324 | } 325 | } 326 | 327 | /// A sink that converts u32x8 output from SIMD 32-bit unpacker to 64-bit 328 | // TODO: figure out right place for this? 329 | #[derive(Debug)] 330 | struct U32ToU64Sink<'a, S: Sink> { 331 | u64sink: &'a mut S 332 | } 333 | 334 | impl<'a, S: Sink> U32ToU64Sink<'a, S> { 335 | #[inline] 336 | pub fn new(u64sink: &'a mut S) -> Self { 337 | Self { u64sink } 338 | } 339 | } 340 | 341 | impl<'a, S: Sink> Sink for U32ToU64Sink<'a, S> { 342 | #[inline] 343 | fn process(&mut self, data: u32x8) { 344 | self.u64sink.process(u64x8::from_cast(data)); 345 | } 346 | 347 | #[inline] 348 | fn process_zeroes(&mut self) { 349 | self.u64sink.process_zeroes(); 350 | } 351 | 352 | fn reset(&mut self) {} 353 | } 354 | 355 | /// Unpacks num_values values from an encoded buffer, by calling nibble_unpack8 enough times. 356 | /// The output.process() method is called numValues times rounded up to the next multiple of 8. 357 | /// Returns "remainder" byteslice or unpacking error (say if one ran out of space) 358 | /// 359 | /// # Arguments 360 | /// * `inbuf` - NibblePacked compressed byte slice containing "remaining" bytes, starting with bitmask byte 361 | /// * `output` - a Trait which processes each resulting u64 362 | /// * `num_values` - the number of u64 values to decode 363 | #[inline] 364 | pub fn unpack<'a, Output>( 365 | encoded: &'a [u8], 366 | output: &mut Output, 367 | num_values: usize, 368 | ) -> Result<&'a [u8], CodingError> 369 | where Output: Sink { 370 | let mut values_left = num_values as isize; 371 | let mut inbuf = encoded; 372 | while values_left > 0 { 373 | inbuf = nibble_unpack8(inbuf, output)?; 374 | values_left -= 8; 375 | } 376 | Ok(inbuf) 377 | } 378 | 379 | /// Unpacks a buffer encoded with [`pack_f64_xor`]: #method.pack_f64_xor 380 | /// 381 | /// This wraps unpack() method with a read of the initial f64 value. InputTooShort error is returned 382 | /// if the input does not have enough bytes given the number of values read. 383 | /// NOTE: the sink is automatically cleared at the beginning. 
384 | /// 385 | /// ``` 386 | /// # use compressed_vec::nibblepacking; 387 | /// # let encoded = [0xffu8; 16]; 388 | /// let mut out = Vec::::with_capacity(64); 389 | /// let mut sink = nibblepacking::DoubleXorSink::new(out); 390 | /// let res = nibblepacking::unpack_f64_xor(&encoded[..], &mut sink, 16); 391 | /// ``` 392 | pub fn unpack_f64_xor<'a>(encoded: &'a [u8], 393 | sink: &mut DoubleXorSink, 394 | num_values: usize) -> Result<&'a [u8], CodingError> { 395 | assert!(num_values >= 1); 396 | let init_value = direct_read_uint_le(encoded, 0)?; 397 | sink.reset(init_value); 398 | 399 | unpack(&encoded[8..], sink, num_values - 1) 400 | } 401 | 402 | /// Unpacks 8 u64's packed using nibble_pack8 by calling the output.process() method 8 times, once for each encoded 403 | /// value. Always calls 8 times regardless of what is in the input, unless the input is too short. 404 | /// Returns "remainder" byteslice or unpacking error (say if one ran out of space). 405 | /// Uses the SIMD U32 unpack func if possible to speed things up 406 | /// 407 | /// # Arguments 408 | /// * `inbuf` - NibblePacked compressed byte slice containing "remaining" bytes, starting with bitmask byte 409 | /// * `output` - a Trait which processes each resulting u64 410 | // NOTE: The 'a is a lifetime annotation. When you use two references in Rust, and return one, Rust needs 411 | // annotations to help it determine to which input the output lifetime is related to, so Rust knows 412 | // that the output of the slice will live as long as the reference to the input slice is valid. 413 | #[inline] 414 | pub fn nibble_unpack8<'a, Output: Sink>( 415 | inbuf: &'a [u8], 416 | output: &mut Output, 417 | ) -> Result<&'a [u8], CodingError> { 418 | if inbuf.is_empty() { return Err(CodingError::NotEnoughSpace) } 419 | let nonzero_mask = inbuf[0]; 420 | if nonzero_mask == 0 { 421 | // All 8 words are 0; skip further processing 422 | output.process(ZERO_U64OCTET); 423 | Ok(&inbuf[1..]) 424 | } else { 425 | if inbuf.len() < 2 { return Err(CodingError::NotEnoughSpace) } 426 | let num_bits = ((inbuf[1] >> 4) + 1) * 4; 427 | let trailing_zeros = (inbuf[1] & 0x0f) * 4; 428 | 429 | // Use SIMD u32 unpacker if total resulting bits is <= 32 430 | // Improves filtering throughput about 2x 431 | if (num_bits + trailing_zeros) <= 32 { 432 | let mut wrapper_sink = U32ToU64Sink::new(output); 433 | return unpack8_u32_simd(inbuf, &mut wrapper_sink); 434 | } 435 | 436 | let total_bytes = 2 + (num_bits as u32 * nonzero_mask.count_ones() + 7) / 8; 437 | let mask: u64 = if num_bits >= 64 { std::u64::MAX } else { (1u64 << num_bits) - 1u64 }; 438 | let mut bit_cursor = 0; 439 | let mut out_array = [0u64; 8]; 440 | 441 | // Read in first word 442 | let mut in_word = direct_read_uint_le(inbuf, 2)?; 443 | let mut pos = 10; 444 | 445 | for bit in 0..8 { 446 | if (nonzero_mask & (1 << bit)) != 0 { 447 | let remaining = 64 - bit_cursor; 448 | 449 | // Shift and read in LSB (or entire nibbles if they fit) 450 | let shifted_in = in_word >> bit_cursor; 451 | let mut out_word = shifted_in & mask; 452 | 453 | // If remaining bits are in next word, read next word -- if there's space 454 | // We don't want to read the next word though if we're already at the end 455 | if remaining <= num_bits && pos < (total_bytes as usize) { 456 | // Read in MSB bits from next word 457 | in_word = direct_read_uint_le(inbuf, pos)?; 458 | pos += 8; 459 | if remaining < num_bits { 460 | let shifted = in_word << remaining; 461 | out_word |= shifted & mask; 462 | } 463 | } 464 | 465 | 
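// Shift left to restore the trailing zero nibbles that were stripped off during packing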
out_array[bit] = out_word << trailing_zeros; 466 | 467 | // Update other indices 468 | bit_cursor = (bit_cursor + num_bits) % 64; 469 | } 470 | } 471 | output.process(u64x8::from_slice_unaligned(&out_array)); 472 | // Return the "remaining slice" - the rest of input buffer after we've parsed our bytes. 473 | // This allows for easy and clean chaining of nibble_unpack8 calls with no mutable state 474 | Ok(&inbuf[(total_bytes as usize)..]) 475 | } 476 | } 477 | 478 | #[test] 479 | fn nibblepack8_all_zeroes() { 480 | let mut buf = [0u8; 512]; 481 | let inputs = [0u64; 8]; 482 | let res = nibble_pack8(&inputs, &mut buf, 0); 483 | dbg!(is_x86_feature_detected!("avx2")); 484 | assert_eq!(res, Ok(1)); 485 | assert_eq!(buf[..1], [0u8]); 486 | } 487 | 488 | #[rustfmt::skip] 489 | #[test] 490 | fn nibblepack8_all_evennibbles() { 491 | // All 8 are nonzero, even # nibbles 492 | let mut buf = [0u8; 512]; 493 | let inputs = [ 0x0000_00fe_dcba_0000u64, 0x0000_0033_2211_0000u64, 494 | 0x0000_0044_3322_0000u64, 0x0000_0055_4433_0000u64, 495 | 0x0000_0066_5544_0000u64, 0x0000_0076_5432_0000u64, 496 | 0x0000_0087_6543_0000u64, 0x0000_0098_7654_0000u64, ]; 497 | let res = nibble_pack8(&inputs, &mut buf, 0); 498 | 499 | // Expected result: 500 | let expected_buf = [ 501 | 0xffu8, // Every input is nonzero, all bits on 502 | 0x54u8, // six nibbles wide, four zero nibbles trailing 503 | 0xbau8, 0xdcu8, 0xfeu8, 0x11u8, 0x22u8, 0x33u8, 0x22u8, 0x33u8, 0x44u8, 504 | 0x33u8, 0x44u8, 0x55u8, 0x44u8, 0x55u8, 0x66u8, 0x32u8, 0x54u8, 0x76u8, 505 | 0x43u8, 0x65u8, 0x87u8, 0x54u8, 0x76u8, 0x98u8, ]; 506 | assert_eq!(res, Ok(2 + 3 * 8)); 507 | assert_eq!(buf[..expected_buf.len()], expected_buf); 508 | } 509 | 510 | // Even nibbles with different combos of partial 511 | #[rustfmt::skip] // We format the arrays specially to help visually see input vs output. Don't reformat. 
512 | #[test] 513 | fn nibblepack8_partial_evennibbles() { 514 | // All 8 are nonzero, even # nibbles 515 | let mut buf = [0u8; 512]; 516 | let inputs = [ 517 | 0u64, 518 | 0x0000_0033_2211_0000u64, 0x0000_0044_3322_0000u64, 519 | 0x0000_0055_4433_0000u64, 0x0000_0066_5544_0000u64, 520 | 0u64, 521 | 0u64, 522 | 0u64, 523 | ]; 524 | let res = nibble_pack8(&inputs, &mut buf, 0); 525 | 526 | // Expected result: 527 | let expected_buf = [ 528 | 0b0001_1110u8, // only some bits on 529 | 0x54u8, // six nibbles wide, four zero nibbles trailing 530 | 0x11u8, 0x22u8, 0x33u8, 0x22u8, 0x33u8, 0x44u8, 531 | 0x33u8, 0x44u8, 0x55u8, 0x44u8, 0x55u8, 0x66u8, 532 | ]; 533 | assert_eq!(res, Ok(2 + 3 * 4)); 534 | assert_eq!(buf[..expected_buf.len()], expected_buf); 535 | } 536 | 537 | // Odd nibbles with different combos of partial 538 | #[rustfmt::skip] 539 | #[test] 540 | fn nibblepack8_partial_oddnibbles() { 541 | // All 8 are nonzero, even # nibbles 542 | let mut buf = [0u8; 512]; 543 | let inputs = [ 544 | 0u64, 545 | 0x0000_0033_2210_0000u64, 0x0000_0044_3320_0000u64, 546 | 0x0000_0055_4430_0000u64, 0x0000_0066_5540_0000u64, 547 | 0x0000_0076_5430_0000u64, 0u64, 0u64, 548 | ]; 549 | let res = nibble_pack8(&inputs, &mut buf, 0); 550 | 551 | // Expected result: 552 | let expected_buf = [ 553 | 0b0011_1110u8, // only some bits on 554 | 0x45u8, // five nibbles wide, five zero nibbles trailing 555 | 0x21u8, 0x32u8, 0x23u8, 0x33u8, 0x44u8, // First two values 556 | 0x43u8, 0x54u8, 0x45u8, 0x55u8, 0x66u8, 557 | 0x43u8, 0x65u8, 0x07u8, 558 | ]; 559 | assert_eq!(res, Ok(expected_buf.len())); 560 | assert_eq!(buf[..expected_buf.len()], expected_buf); 561 | } 562 | 563 | // Odd nibbles > 8 nibbles 564 | #[rustfmt::skip] 565 | #[test] 566 | fn nibblepack8_partial_oddnibbles_large() { 567 | // All 8 are nonzero, even # nibbles 568 | let mut buf = [0u8; 512]; 569 | let inputs = [ 570 | 0u64, 571 | 0x0005_4433_2211_0000u64, 0x0000_0044_3320_0000u64, 572 | 0x0007_6655_4433_0000u64, 0x0000_0066_5540_0000u64, 573 | 0x0001_9876_5430_0000u64, 0u64, 0u64, 574 | ]; 575 | let res = nibble_pack8(&inputs, &mut buf, 0); 576 | 577 | // Expected result: 578 | let expected_buf = [ 579 | 0b0011_1110u8, // only some bits on 580 | 0x84u8, // nine nibbles wide, four zero nibbles trailing 581 | 0x11u8, 0x22u8, 0x33u8, 0x44u8, 0x05u8, 0x32u8, 0x43u8, 0x04u8, 0, 582 | 0x33u8, 0x44u8, 0x55u8, 0x66u8, 0x07u8, 0x54u8, 0x65u8, 0x06u8, 0, 583 | 0x30u8, 0x54u8, 0x76u8, 0x98u8, 0x01u8, 584 | ]; 585 | assert_eq!(res, Ok(expected_buf.len())); 586 | assert_eq!(buf[..expected_buf.len()], expected_buf); 587 | } 588 | 589 | #[rustfmt::skip] 590 | #[test] 591 | fn nibblepack8_64bit_numbers() { 592 | let mut buf = [0u8; 512]; 593 | let inputs = [0, 0, -1i32 as u64, -2i32 as u64, 0, -100234i32 as u64, 0, 0]; 594 | let res = nibble_pack8(&inputs, &mut buf, 0); 595 | 596 | let expected_buf = [ 597 | 0b0010_1100u8, 598 | 0xf0u8, // all 16 nibbles wide, zero nibbles trailing 599 | 0xffu8, 0xffu8, 0xffu8, 0xffu8, 0xffu8, 0xffu8, 0xffu8, 0xffu8, 600 | 0xfeu8, 0xffu8, 0xffu8, 0xffu8, 0xffu8, 0xffu8, 0xffu8, 0xffu8, 601 | 0x76u8, 0x78u8, 0xfeu8, 0xffu8, 0xffu8, 0xffu8, 0xffu8, 0xffu8, 602 | ]; 603 | assert_eq!(res, Ok(expected_buf.len())); 604 | assert_eq!(buf[..expected_buf.len()], expected_buf); 605 | } 606 | 607 | #[test] 608 | fn unpack8_all_zeroes() { 609 | let compressed_array = [0x00u8]; 610 | let mut sink = VecSink::::new(); 611 | let res = nibble_unpack8(&compressed_array, &mut sink); 612 | assert_eq!(res.unwrap().is_empty(), true); 613 | 
assert_eq!(sink.vec.len(), 8); 614 | assert_eq!(sink.vec[..], [0u64; 8]); 615 | } 616 | 617 | #[rustfmt::skip] 618 | #[test] 619 | fn unpack8_input_too_short() { 620 | let compressed = [ 621 | 0b0011_1110u8, // only some bits on 622 | 0x84u8, // nine nibbles wide, four zero nibbles trailing 623 | 0x11u8, 0x22u8, 0x33u8, 0x44u8, 0x05u8, 0x32u8, 0x43u8, 0x04u8, 0, 624 | 0x33u8, 0x44u8, 0x55u8, 0x66u8, 0x07u8, 625 | ]; // too short!! 626 | let mut sink = VecSink::::new(); 627 | let res = nibble_unpack8(&compressed, &mut sink); 628 | assert_eq!(res, Err(CodingError::NotEnoughSpace)); 629 | } 630 | 631 | // Tests the case where nibbles lines up with 64-bit boundaries - edge case 632 | #[test] 633 | fn unpack8_4nibbles_allfull() { 634 | // 4 nibbles = 2^16, so values < 65536 635 | let inputs = [65535u64; 8]; 636 | let mut buf = [0u8; 512]; 637 | let written = nibble_pack8(&inputs, &mut buf, 0).unwrap(); 638 | 639 | let mut sink = VecSink::::new(); 640 | let res = nibble_unpack8(&buf[..written], &mut sink); 641 | assert_eq!(res.unwrap().len(), 0); 642 | assert_eq!(sink.vec[..], inputs); 643 | } 644 | 645 | #[test] 646 | fn unpack8_partial_oddnibbles() { 647 | let compressed = [ 648 | 0b0011_1110u8, // only some bits on 649 | 0x84u8, // nine nibbles wide, four zero nibbles trailing 650 | 0x11u8, 0x22u8, 0x33u8, 0x44u8, 0x05u8, 0x32u8, 0x43u8, 0x04u8, 0, 651 | 0x33u8, 0x44u8, 0x55u8, 0x66u8, 0x07u8, 0x54u8, 0x65u8, 0x06u8, 0, 652 | 0x30u8, 0x54u8, 0x76u8, 0x98u8, 0x01u8, 653 | 0x00u8, ]; // extra padding... just to test the return value 654 | let mut sink = VecSink::::new(); 655 | let res = nibble_unpack8(&compressed, &mut sink); 656 | assert_eq!(res.unwrap().len(), 1); 657 | assert_eq!(sink.vec.len(), 8); 658 | 659 | let orig = [ 660 | 0u64, 661 | 0x0005_4433_2211_0000u64, 0x0000_0044_3320_0000u64, 662 | 0x0007_6655_4433_0000u64, 0x0000_0066_5540_0000u64, 663 | 0x0001_9876_5430_0000u64, 0u64, 0u64, 664 | ]; 665 | 666 | assert_eq!(sink.vec[..], orig); 667 | } 668 | 669 | #[test] 670 | fn pack_unpack_u64_plain() { 671 | let inputs = [0u64, 1000, 1001, 1002, 1003, 2005, 2010, 3034, 4045, 5056, 6067, 7078]; 672 | let mut buf = [0u8; 512]; 673 | let written = pack_u64(inputs.iter().cloned(), &mut buf, 0).unwrap(); 674 | println!("Packed {} u64 inputs (plain) into {} bytes", inputs.len(), written); 675 | 676 | let mut sink = VecSink::::new(); 677 | let res = unpack(&buf[..written], &mut sink, inputs.len()); 678 | assert_eq!(res.unwrap().len(), 0); 679 | assert_eq!(sink.vec[..inputs.len()], inputs); 680 | } 681 | 682 | #[test] 683 | fn test_unpack_u64_plain_iter() { 684 | let inputs = [0u64, 1000, 1001, 1002, 1003, 2005, 2010, 3034, 4045, 5056, 6067, 7078]; 685 | let mut buf = [0u8; 512]; 686 | // NOTE: into_iter() of an array returns an Iterator, cloned() is needed to convert back to u64 687 | let written = pack_u64(inputs.iter().cloned(), &mut buf, 0).unwrap(); 688 | 689 | let mut sink = U64_256Sink::new(); 690 | unpack(&buf[0..written], &mut sink, inputs.len()).unwrap(); 691 | assert_eq!(sink.values[0..inputs.len()], inputs); 692 | } 693 | 694 | #[test] 695 | fn pack_unpack_u64_deltas() { 696 | let inputs = [0u64, 1000, 1001, 1002, 1003, 2005, 2010, 3034, 4045, 5056, 6067, 7078]; 697 | let mut buf = [0u8; 512]; 698 | // NOTE: into_iter() of an array returns an Iterator, cloned() is needed to convert back to u64 699 | let written = pack_u64_delta(&inputs[..], &mut buf).unwrap(); 700 | println!("Packed {} u64 inputs (delta) into {} bytes", inputs.len(), written); 701 | 702 | let mut sink = 
DeltaSink::new(); 703 | let res = unpack(&buf[..written], &mut sink, inputs.len()); 704 | assert_eq!(res.unwrap().len(), 0); 705 | assert_eq!(sink.sink.vec[..inputs.len()], inputs); 706 | } 707 | 708 | #[test] 709 | fn pack_unpack_f64_xor() { 710 | let inputs = [0f64, 0.5, 2.5, 10., 25., 100.]; 711 | let mut buf = [0u8; 512]; 712 | let written = pack_f64_xor(inputs.iter().cloned(), &mut buf).unwrap(); 713 | println!("Packed {} f64 inputs (XOR) into {} bytes", inputs.len(), written); 714 | 715 | let out = Vec::::with_capacity(64); 716 | let mut sink = DoubleXorSink::new(out); 717 | let res = unpack_f64_xor(&buf[..written], &mut sink, inputs.len()); 718 | assert_eq!(res.unwrap().len(), 0); 719 | assert_eq!(sink.vec[..inputs.len()], inputs); 720 | } 721 | 722 | // NOTE: cfg(test) is needed so that proptest can just be a "dev-dependency" and not linked for final library 723 | // NOTE2: somehow cargo is happier when we put props tests in its own module 724 | #[cfg(test)] 725 | mod props { 726 | extern crate proptest; 727 | 728 | use self::proptest::prelude::*; 729 | use super::*; 730 | 731 | // Generators (Arb's) for numbers of given # bits with fractional chance of being zero. 732 | // Also input arrays of 8 with the given properties above. 733 | prop_compose! { 734 | /// zero_chance: 0..1.0 chance of obtaining a zero 735 | fn arb_maybezero_nbits_u64 736 | (nbits: usize, zero_chance: f32) 737 | (is_zero in prop::bool::weighted(zero_chance as f64), 738 | n in 0u64..(1 << nbits)) 739 | -> u64 740 | { 741 | if is_zero { 0 } else { n } 742 | } 743 | } 744 | 745 | // Try different # bits and # nonzero elements 746 | prop_compose! { 747 | fn arb_8longs_nbits() 748 | (nbits in 4usize..64, chance in 0.2f32..0.8) 749 | (input in prop::array::uniform8(arb_maybezero_nbits_u64(nbits, chance))) -> [u64; 8] { 750 | input 751 | } 752 | } 753 | 754 | // Generate variable length increasing/deltas u64's 755 | prop_compose! { 756 | fn arb_varlen_deltas() 757 | (nbits in 4usize..48, chance in 0.2f32..0.8) 758 | (mut v in proptest::collection::vec(arb_maybezero_nbits_u64(nbits, chance), 2..64)) -> Vec { 759 | for i in 1..v.len() { 760 | // make numbers increasing 761 | v[i] = v[i - 1] + v[i]; 762 | } 763 | v 764 | } 765 | } 766 | 767 | proptest! { 768 | #[test] 769 | fn prop_pack_unpack_identity(input in arb_8longs_nbits()) { 770 | let mut buf = [0u8; 256]; 771 | nibble_pack8(&input, &mut buf, 0).unwrap(); 772 | 773 | let mut sink = VecSink::::new(); 774 | let _res = nibble_unpack8(&buf[..], &mut sink); 775 | assert_eq!(sink.vec[..], input); 776 | } 777 | 778 | #[test] 779 | fn prop_delta_u64s_packing(input in arb_varlen_deltas()) { 780 | let mut buf = [0u8; 512]; 781 | pack_u64_delta(&input[..], &mut buf).unwrap(); 782 | let mut sink = DeltaSink::new(); 783 | let _res = unpack(&buf, &mut sink, input.len()); 784 | assert_eq!(sink.sink.vec[..input.len()], input[..]); 785 | } 786 | } 787 | } 788 | -------------------------------------------------------------------------------- /src/sink.rs: -------------------------------------------------------------------------------- 1 | /// A sink processes data during unpacking. The type, Input, is supposed to represent 8 integers of fixed width, 2 | /// since NibblePack works on 8 ints at a time. 3 | /// This module contains common sinks for all types, such as ones used for vector/section unpacking/iteration, 4 | /// and sinks for writing to Vecs. 5 | /// Sinks can be stacked for processing. 
For example, unpack and multiply f32's, then store to Vec: 6 | /// regular unpack8_u32_simd -> u32 to f32 XOR sink -> MultiplySink -> VecSink 7 | /// TODO: examples 8 | use core::marker::PhantomData; 9 | use std::ops::{Add, BitXor}; 10 | 11 | use crate::section::VectBase; 12 | 13 | use num::{Zero, Unsigned, Float}; 14 | use packed_simd::{u32x8, u64x8, f32x8, FromCast, FromBits, IntoBits}; 15 | 16 | /// An input to a sink. Sinks take a type which represents 8 values of an int, such as [u64; 8]. 17 | /// Item type represents the underlying type of each individual item in the 8 item SinkInput. 18 | pub trait SinkInput: Copy + core::fmt::Debug { 19 | type Item: Zero + Copy; 20 | 21 | const ZERO: Self; // The zero item for myself 22 | 23 | /// Writes the sink input to a mutable slice of type Item 24 | fn write_to_slice(&self, slice: &mut [Self::Item]); 25 | 26 | /// Creates one of these types from a base Item type by splatting (replicating it 8x) 27 | fn splat(item: Self::Item) -> Self; 28 | 29 | /// Methods for implementing filtering/masking. 30 | /// Compares my 8 values to other 8 values, returning a bitmask for equality 31 | fn eq_mask(self, other: Self) -> u8; 32 | 33 | /// Loads the bits from a slice into a u64x8. Mostly used for converting FP bits to int bits for XORing. 34 | fn to_u64x8_bits(slice: &[Self::Item]) -> u64x8; 35 | } 36 | 37 | // TODO: remove 38 | impl SinkInput for [u64; 8] { 39 | type Item = u64; 40 | const ZERO: [u64; 8] = [0u64; 8]; 41 | 42 | #[inline] 43 | fn write_to_slice(&self, slice: &mut [Self::Item]) { slice.copy_from_slice(self) } 44 | 45 | #[inline] 46 | fn splat(item: u64) -> Self { [item; 8] } 47 | 48 | #[inline] 49 | fn eq_mask(self, other: Self) -> u8 { 50 | let mut mask = 0u8; 51 | for i in 0..8 { 52 | if self[i] == other[i] { 53 | mask |= 1 << i; 54 | } 55 | } 56 | mask 57 | } 58 | 59 | #[inline] 60 | fn to_u64x8_bits(_slice: &[u64]) -> u64x8 { todo!("blah") } 61 | } 62 | 63 | impl SinkInput for u64x8 { 64 | type Item = u64; 65 | const ZERO: u64x8 = u64x8::splat(0); 66 | 67 | #[inline] 68 | fn write_to_slice(&self, slice: &mut [Self::Item]) { 69 | self.write_to_slice_unaligned(slice); 70 | } 71 | 72 | #[inline] 73 | fn splat(item: u64) -> Self { u64x8::splat(item) } 74 | 75 | #[inline] 76 | fn eq_mask(self, other: Self) -> u8 { 77 | self.eq(other).bitmask() 78 | } 79 | 80 | #[inline] 81 | fn to_u64x8_bits(slice: &[u64]) -> u64x8 { u64x8::from_slice_unaligned(slice) } 82 | } 83 | 84 | impl SinkInput for u32x8 { 85 | type Item = u32; 86 | const ZERO: u32x8 = u32x8::splat(0); 87 | 88 | #[inline] 89 | fn write_to_slice(&self, slice: &mut [Self::Item]) { 90 | // NOTE: use unaligned writes for now. See simd_aligned for a possible solution. 91 | // Pointer check align_offset is not enabled for now. 
92 | self.write_to_slice_unaligned(slice); 93 | } 94 | 95 | #[inline] 96 | fn splat(item: u32) -> Self { u32x8::splat(item) } 97 | 98 | #[inline] 99 | fn eq_mask(self, other: Self) -> u8 { 100 | self.eq(other).bitmask() 101 | } 102 | 103 | #[inline] 104 | fn to_u64x8_bits(slice: &[u32]) -> u64x8 { 105 | u64x8::from_cast(u32x8::from_slice_unaligned(slice)) 106 | } 107 | } 108 | 109 | impl SinkInput for f32x8 { 110 | type Item = f32; 111 | const ZERO: f32x8 = f32x8::splat(0.0); 112 | 113 | #[inline] 114 | fn write_to_slice(&self, slice: &mut [f32]) { 115 | self.write_to_slice_unaligned(slice); 116 | } 117 | 118 | #[inline] 119 | fn splat(item: f32) -> Self { f32x8::splat(item) } 120 | 121 | #[inline] 122 | fn eq_mask(self, other: Self) -> u8 { 123 | self.eq(other).bitmask() 124 | } 125 | 126 | #[inline] 127 | fn to_u64x8_bits(slice: &[f32]) -> u64x8 { 128 | let f_bits: u32x8 = f32x8::from_slice_unaligned(slice).into_bits(); 129 | u64x8::from_cast(f_bits) 130 | } 131 | } 132 | 133 | /// A sink processes data during unpacking. The type, Input, is supposed to represent 8 integers of fixed width, 134 | /// since NibblePack works on 8 ints at a time. 135 | pub trait Sink { 136 | /// Processes 8 items. Sink responsible for space allocation and safety. 137 | fn process(&mut self, data: Input); 138 | 139 | /// Called when all zeroes or 8 null outputs 140 | fn process_zeroes(&mut self); 141 | 142 | /// Resets state in the sink; exact meaning depends on the sink itself. Many sinks operate on more than 143 | /// 8 items; for example 256 items or entire sections. 144 | fn reset(&mut self); 145 | } 146 | 147 | 148 | /// A Sink which writes all values to a Vec. A good choice as the final Sink in a chain of Sink processors! 149 | /// Important! This Sink will decode entire sections at a time, so the result will have up to 255 extra values. 150 | #[derive(Debug)] 151 | pub struct VecSink { 152 | pub vec: Vec, 153 | } 154 | 155 | const DEFAULT_CAPACITY: usize = 64; 156 | 157 | impl VecSink { 158 | pub fn new() -> Self { 159 | VecSink { vec: Vec::with_capacity(DEFAULT_CAPACITY) } 160 | } 161 | } 162 | 163 | impl Sink for VecSink { 164 | #[inline] 165 | fn process(&mut self, data: T::SI) { 166 | // So first we need to resize the Vec, then we write in values using write_to_slice 167 | let new_len = self.vec.len() + 8; 168 | self.vec.resize(new_len, T::zero()); 169 | data.write_to_slice(&mut self.vec[new_len-8..new_len]); 170 | } 171 | 172 | #[inline] 173 | fn process_zeroes(&mut self) { 174 | for _ in 0..8 { 175 | self.vec.push(T::zero()); 176 | } 177 | } 178 | 179 | fn reset(&mut self) { 180 | self.vec.clear() 181 | } 182 | } 183 | 184 | // #[repr(simd)] // SIMD 32x8 alignment 185 | // struct U32Values([u32; 256]); 186 | 187 | /// A simple sink storing up to 256 values in an array, ie all the values in a section. 188 | /// Useful for iterating over or processing all the raw values of a section. 189 | // NOTE (u32x8): we want to do fast aligned SIMD writes, but looks like that might not happen. 190 | // See simd_aligned for a possible solution. It is possible the alignment check might fail 191 | // due to values being a [u32];. 192 | // TODO for SIMD: Try using aligned crate (https://docs.rs/aligned/0.3.2/aligned/) and see if 193 | // it allows for aligned writes 194 | #[repr(align(32))] // SIMD alignment? 
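///
/// A small usage sketch, adapted from the nibblepacking unit tests (it assumes `pack_u64` and
/// `unpack` are exported from `compressed_vec::nibblepacking`, and uses the `U64_256Sink` alias
/// defined below):
/// ```ignore
/// use compressed_vec::nibblepacking::{pack_u64, unpack};
/// use compressed_vec::sink::U64_256Sink;
///
/// let inputs = [0u64, 1000, 1001, 1002, 1003, 2005, 2010, 3034];
/// let mut buf = [0u8; 512];
/// let written = pack_u64(inputs.iter().cloned(), &mut buf, 0).unwrap();
///
/// let mut sink = U64_256Sink::new();
/// unpack(&buf[..written], &mut sink, inputs.len()).unwrap();
/// assert_eq!(sink.values[..inputs.len()], inputs);
/// ```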
195 | pub struct Section256Sink 196 | where T: VectBase { 197 | pub values: [T; 256], 198 | i: usize, 199 | } 200 | 201 | impl Section256Sink 202 | where T: VectBase { 203 | pub fn new() -> Self { 204 | Self { values: [T::zero(); 256], i: 0 } 205 | } 206 | } 207 | 208 | impl Sink for Section256Sink 209 | where T: VectBase { 210 | #[inline] 211 | fn process(&mut self, unpacked: T::SI) { 212 | if self.i < self.values.len() { 213 | unpacked.write_to_slice(&mut self.values[self.i..self.i+8]); 214 | self.i += 8; 215 | } 216 | } 217 | 218 | #[inline] 219 | fn process_zeroes(&mut self) { 220 | if self.i < self.values.len() { 221 | // We need to write zeroes in case the sink is reused; previous values won't be zero. 222 | // This is fairly fast in any case. NOTE: fill() is a new API in nightly 223 | // Alternative, not quite as fast, is use copy_from_slice() and memcpy from zero slice 224 | self.values[self.i..self.i+8].fill(T::zero()); 225 | self.i += 8; 226 | } 227 | } 228 | 229 | fn reset(&mut self) { 230 | self.i = 0; // No need to zero things out, process() methods will fill properly 231 | } 232 | } 233 | 234 | pub type U32_256Sink = Section256Sink; 235 | pub type U64_256Sink = Section256Sink; 236 | 237 | 238 | /// A sink for FP/XOR decoding. Keeps a running "last bits" octet and XORs each new octet with the last one. 239 | /// Forwards resulting XORed/restored output to another sink. 240 | #[derive(Debug)] 241 | pub struct XorSink<'a, F, I, S> 242 | where F: VectBase + Float, // Output floating point type 243 | I: VectBase + Unsigned, // Input: unsigned (u32/u64) int type 244 | S: Sink { 245 | last_bits: I::SI, 246 | inner_sink: &'a mut S, 247 | _f: PhantomData, 248 | } 249 | 250 | impl<'a, F, I, S> XorSink<'a, F, I, S> 251 | where F: VectBase + Float, // Output floating point type 252 | I: VectBase + Unsigned, // Input: unsigned (u32/u64) type 253 | S: Sink { 254 | pub fn new(inner_sink: &'a mut S) -> Self { 255 | Self { 256 | last_bits: I::SI::ZERO, 257 | inner_sink, 258 | _f: PhantomData, 259 | } 260 | } 261 | } 262 | 263 | impl<'a, F, I, S> Sink for XorSink<'a, F, I, S> 264 | where F: VectBase + Float, // Output floating point type 265 | I: VectBase + Unsigned, // Input: unsigned (u32/u64) type 266 | S: Sink, 267 | // bitxor is supported for underlying int types, and into_bits supported to/from FP types 268 | I::SI: BitXor, 269 | F::SI: FromBits { 270 | #[inline] 271 | fn process(&mut self, unpacked: I::SI) where { 272 | let new_bits = self.last_bits.bitxor(unpacked); 273 | self.last_bits = new_bits; 274 | self.inner_sink.process(new_bits.into_bits()); 275 | } 276 | 277 | #[inline] 278 | fn process_zeroes(&mut self) { 279 | // last XOR 0 == last 280 | self.inner_sink.process(self.last_bits.into_bits()); 281 | } 282 | 283 | fn reset(&mut self) {} 284 | } 285 | 286 | /// A Sink for adding a constant value to all output elements. Note that all SIMD types we use also support Add :) 287 | /// This sink is also used for decoding Delta-encoded u64/u32 values, then passing the output to another sink. 
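///
/// A minimal usage sketch (an illustration rather than one of the crate's doctests; it assumes
/// `u64: VectBase` and that `AddConstSink`, `VecSink` and `Sink` are reachable via
/// `compressed_vec::sink`):
/// ```ignore
/// use compressed_vec::sink::{AddConstSink, Sink, VecSink};
///
/// let mut out = VecSink::<u64>::new();
/// let mut adder = AddConstSink::new(1000u64, &mut out);
/// // An all-zeroes octet is forwarded as eight copies of the base value
/// adder.process_zeroes();
/// assert_eq!(out.vec[..8], [1000u64; 8]);
/// ```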
288 | #[derive(Debug)] 289 | pub struct AddConstSink<'a, T, S> 290 | where T: VectBase, 291 | S: Sink { 292 | base: T::SI, 293 | inner_sink: &'a mut S, 294 | } 295 | 296 | impl<'a, T, S> AddConstSink<'a, T, S> 297 | where T: VectBase, 298 | S: Sink { 299 | pub fn new(base: T, inner_sink: &'a mut S) -> Self { 300 | Self { base: T::SI::splat(base), inner_sink } 301 | } 302 | } 303 | 304 | impl<'a, T, S> Sink for AddConstSink<'a, T, S> 305 | where T: VectBase, 306 | S: Sink, 307 | T::SI: Add { 308 | #[inline] 309 | fn process(&mut self, unpacked: T::SI) { 310 | self.inner_sink.process(unpacked + self.base); 311 | } 312 | 313 | #[inline] 314 | fn process_zeroes(&mut self) { 315 | // base + 0 == base 316 | self.inner_sink.process(self.base); 317 | } 318 | 319 | fn reset(&mut self) {} 320 | } -------------------------------------------------------------------------------- /src/vector.rs: -------------------------------------------------------------------------------- 1 | /// vector module contains `BinaryVector`, which allows creation of compressed binary vectors which can be 2 | /// appended to and read, queried, filtered, etc. quickly. 3 | /// 4 | /// ## Appending values and reading them back 5 | /// 6 | /// Appending values is easy. Appenders dynamically size the input buffer. 7 | /// ``` 8 | /// # use compressed_vec::vector::*; 9 | /// let mut appender = VectorU32Appender::try_new(1024).unwrap(); 10 | /// appender.append(1).unwrap(); 11 | /// appender.append(2).unwrap(); 12 | /// appender.append_nulls(3).unwrap(); 13 | /// assert_eq!(appender.num_elements(), 5); 14 | /// 15 | /// let reader = appender.reader(); 16 | /// println!("Elements so far: {:?}", reader.iterate().count()); 17 | /// 18 | /// // Continue appending! 19 | /// appender.append(10).unwrap(); 20 | /// ``` 21 | /// 22 | /// ## Finishing vectors 23 | /// 24 | /// Calling `finish()` clones the vector bytes to the smallest representation possible, after which the 25 | /// appender is reset for creation of another new vector. The finished vector is then immutable and the 26 | /// caller can read it. 27 | use std::collections::HashMap; 28 | use std::marker::PhantomData; 29 | use std::mem; 30 | 31 | use scroll::{ctx, Endian, Pread, Pwrite, LE}; 32 | 33 | use crate::error::CodingError; 34 | use crate::filter::{SectFilterSink, VectorFilter}; 35 | use crate::section::*; 36 | use crate::sink::*; 37 | 38 | /// BinaryVector: a compressed vector storing data of the same type 39 | /// enabling high speed operations on compressed data without 40 | /// the need for decompressing (in many cases, exceptions noted) 41 | /// 42 | /// A BinaryVector MAY consist of multiple sections. Each section can represent 43 | /// potentially different encoding parameters (bit widths, sparsity, etc.) and 44 | /// has its own header to allow for quickly skipping ahead even when different 45 | /// sections are encoded differently. 46 | /// 47 | /// This struct describes a common header for all BinaryVectors. Note that the 48 | /// first 16 bytes of a BinaryVector are reserved for the header, not just what is 49 | /// defined here. 50 | /// The major and minor types and the header bytes are compatible with FiloDB BinaryVectors. 51 | #[repr(C)] 52 | #[derive(Debug, Clone, PartialEq, Pwrite)] 53 | pub struct BinaryVector { 54 | num_bytes: u32, // Number of bytes in vector following this length 55 | major_type: VectorType, // These should probably be enums no? 
56 | minor_type: VectorSubType, 57 | _padding: u16, 58 | } 59 | 60 | #[repr(u8)] 61 | #[derive(Copy, Clone, Debug, PartialEq)] 62 | pub enum VectorType { 63 | Empty = 0x01, 64 | BinSimple = 0x06, 65 | BinDict = 0x07, 66 | Delta2 = 0x08, // Delta-delta encoded 67 | Histogram = 0x09, // FiloDB sections with Histogram chunks per section 68 | FixedSection256 = 0x10, // Fixed 256-element sections 69 | } 70 | 71 | impl VectorType { 72 | pub fn as_num(&self) -> u8 { *self as u8 } 73 | } 74 | 75 | impl ctx::TryIntoCtx for &VectorType { 76 | type Error = scroll::Error; 77 | fn try_into_ctx(self, buf: &mut [u8], ctx: Endian) -> Result { 78 | u8::try_into_ctx(self.as_num(), buf, ctx) 79 | } 80 | } 81 | 82 | #[repr(u8)] 83 | #[derive(Copy, Clone, Debug, PartialEq)] 84 | pub enum VectorSubType { 85 | Primitive = 0x00, 86 | STRING = 0x01, 87 | UTF8 = 0x02, 88 | FIXEDMAXUTF8 = 0x03, // fixed max size per blob, length byte 89 | DATETIME = 0x04, 90 | PrimitiveNoMask = 0x05, 91 | REPEATED = 0x06, // vectors.ConstVector 92 | INT = 0x07, // Int gets special type because Longs and Doubles may be encoded as Int 93 | IntNoMask = 0x08, 94 | FixedU64 = 0x10, // FixedSection256 with u64 elements 95 | FixedU32 = 0x11, // FixedSection256 with u32 elements 96 | FixedF32 = 0x12, // FixedSection256 with f32 elements 97 | } 98 | 99 | impl VectorSubType { 100 | pub fn as_num(&self) -> u8 { *self as u8 } 101 | } 102 | 103 | impl ctx::TryIntoCtx for &VectorSubType { 104 | type Error = scroll::Error; 105 | fn try_into_ctx(self, buf: &mut [u8], ctx: Endian) -> Result { 106 | u8::try_into_ctx(self.as_num(), buf, ctx) 107 | } 108 | } 109 | 110 | const NUM_HEADER_BYTES_TOTAL: usize = 16; 111 | const BINARYVECT_HEADER_SIZE: usize = std::mem::size_of::(); 112 | 113 | impl BinaryVector { 114 | pub fn new(major_type: VectorType, minor_type: VectorSubType) -> Self { 115 | Self { num_bytes: NUM_HEADER_BYTES_TOTAL as u32 - 4, major_type, minor_type, _padding: 0 } 116 | } 117 | 118 | /// Returns the length of the BinaryVector including the length bytes 119 | pub fn whole_length(&self) -> u32 { 120 | self.num_bytes + (mem::size_of::() as u32) 121 | } 122 | 123 | pub fn reset(&mut self) { 124 | self.num_bytes = NUM_HEADER_BYTES_TOTAL as u32 - 4; 125 | } 126 | 127 | /// Writes the entire BinaryVector header into the beginning of the given buffer 128 | pub fn write_header(&self, buf: &mut [u8]) -> Result<(), CodingError> { 129 | buf.pwrite_with(self, 0, LE)?; 130 | Ok(()) 131 | } 132 | 133 | /// Updates the number of bytes in the vector. 134 | /// The num_body_bytes should be the number of bytes AFTER the 16-byte BinaryVector header. 135 | /// The buffer slice should point to the beginning of the header ie the length bytes 136 | pub fn update_num_bytes(&mut self, 137 | buf: &mut [u8], 138 | num_body_bytes: u32) -> Result<(), CodingError> { 139 | self.num_bytes = num_body_bytes + (NUM_HEADER_BYTES_TOTAL - 4) as u32; 140 | buf.pwrite_with(self.num_bytes, 0, LE)?; 141 | Ok(()) 142 | } 143 | } 144 | 145 | 146 | /// Mapping of VectBase type to VectorSubType. Allows checking of vector type by reader. 
147 | pub trait BaseSubtypeMapping { 148 | fn vect_subtype() -> VectorSubType; 149 | } 150 | 151 | impl BaseSubtypeMapping for u64 { 152 | fn vect_subtype() -> VectorSubType { VectorSubType::FixedU64 } 153 | } 154 | 155 | impl BaseSubtypeMapping for u32 { 156 | fn vect_subtype() -> VectorSubType { VectorSubType::FixedU32 } 157 | } 158 | 159 | impl BaseSubtypeMapping for f32 { 160 | fn vect_subtype() -> VectorSubType { VectorSubType::FixedF32 } 161 | } 162 | 163 | #[derive(Debug, Copy, Clone, Pread, Pwrite)] 164 | pub struct FixedSectStats { 165 | pub num_elements: u32, 166 | num_null_sections: u16, 167 | } 168 | 169 | impl FixedSectStats { 170 | pub fn new() -> Self { 171 | Self { num_elements: 0, num_null_sections: 0 } 172 | } 173 | 174 | pub fn reset(&mut self) { 175 | self.num_elements = 0; 176 | self.num_null_sections = 0; 177 | } 178 | 179 | /// Updates the number of elements only. Writes entire stats at once. 180 | /// Assumes buf points to beginning of _vector_ not this struct. 181 | pub fn update_num_elems(&mut self, buf: &mut [u8], num_elements: u32) -> Result<(), CodingError> { 182 | self.num_elements = num_elements; 183 | buf.pwrite_with(*self, BINARYVECT_HEADER_SIZE, LE)?; 184 | Ok(()) 185 | } 186 | } 187 | 188 | const GROW_BYTES: usize = 4096; 189 | 190 | /// A builder for a BinaryVector holding encoded/compressed integral/floating values 191 | /// as 256-element FixedSections. Buffers elements to be written and writes 192 | /// them in 256-element sections at a time. This builder owns its own write buffer memory, expanding it 193 | /// as needed. `finish()` wraps up the vector, cloning a copy, and `reset()` can be called to reuse 194 | /// this appender. The write buffer stays with this builder to minimize allocations. 195 | /// NOTE: the vector state (elements, num bytes etc) are only updated when a section is updated. 196 | /// So readers who read just the vector itself will not get the updates in write_buf. 197 | /// This appender must be consulted for querying write_buf values. 198 | /// 199 | /// The easiest way to encode a vector is to create an appender, then use `encode_all()`: 200 | /// ``` 201 | /// # use compressed_vec::vector::VectorF32XorAppender; 202 | /// let my_vec = vec![0.5, 1.0, 1.5]; 203 | /// let mut appender = VectorF32XorAppender::try_new(2048).unwrap(); 204 | /// let bytes = appender.encode_all(my_vec).unwrap(); 205 | /// ``` 206 | #[derive(Clone)] 207 | pub struct VectorAppender 208 | where T: VectBase + Clone + PartialOrd, 209 | W: FixedSectionWriter { 210 | vect_buf: Vec, 211 | offset: usize, 212 | header: BinaryVector, 213 | write_buf: Vec, 214 | stats: FixedSectStats, 215 | sect_writer: PhantomData // Uses no space, this tells rustc we need W 216 | } 217 | 218 | impl VectorAppender 219 | where T: VectBase + Clone + PartialOrd + BaseSubtypeMapping, 220 | W: FixedSectionWriter { 221 | /// Creates a new VectorAppender. Initializes the vect_buf with a valid section header. 222 | /// Initial capacity is the initial size of the write buffer, which can grow. 
223 | pub fn try_new(initial_capacity: usize) -> Result { 224 | let mut new_self = Self { 225 | vect_buf: vec![0; initial_capacity], 226 | offset: NUM_HEADER_BYTES_TOTAL, 227 | header: BinaryVector::new(VectorType::FixedSection256, T::vect_subtype()), 228 | write_buf: Vec::with_capacity(FIXED_LEN), 229 | stats: FixedSectStats::new(), 230 | sect_writer: PhantomData 231 | }; 232 | new_self.write_header()?; 233 | Ok(new_self) 234 | } 235 | 236 | /// Convenience method to append all values from a collection and finish a vector, returning the encoded bytes. 237 | /// Appender is reset and ready to use, so this can be called repeatedly for successive vectors. 238 | pub fn encode_all(&mut self, collection: C) -> Result, CodingError> 239 | where C: IntoIterator { 240 | let mut count = 0; 241 | for x in collection.into_iter() { 242 | count += 1; 243 | self.append(x)?; 244 | }; 245 | self.finish(count) 246 | } 247 | 248 | /// Total number of elements including encoded sections and write buffer 249 | pub fn num_elements(&self) -> usize { 250 | self.stats.num_elements as usize + self.write_buf.len() 251 | } 252 | 253 | /// Resets the internal state for appending a new vector. 254 | pub fn reset(&mut self) -> Result<(), CodingError> { 255 | self.header.reset(); 256 | self.offset = NUM_HEADER_BYTES_TOTAL; 257 | self.write_buf.clear(); 258 | self.vect_buf.resize(self.vect_buf.capacity(), 0); // Make sure entire vec is usable 259 | self.stats.reset(); 260 | self.stats.update_num_elems(&mut self.vect_buf, 0)?; 261 | self.write_header() 262 | } 263 | 264 | /// Writes out the header for the vector. Done automatically during try_new() / reset(). 265 | fn write_header(&mut self) -> Result<(), CodingError> { 266 | self.header.write_header(self.vect_buf.as_mut_slice()) 267 | } 268 | 269 | /// Encodes all the values in write_buf. Adjust the number of elements and other vector state. 270 | fn encode_section(&mut self) -> Result<(), CodingError> { 271 | assert!(self.write_buf.len() == FIXED_LEN); 272 | self.offset = self.retry_grow(|s| W::gen_stats_and_write(s.vect_buf.as_mut_slice(), 273 | s.offset, 274 | &s.write_buf[..]))?; 275 | self.write_buf.clear(); 276 | self.stats.update_num_elems(&mut self.vect_buf, self.stats.num_elements + FIXED_LEN as u32)?; 277 | self.header.update_num_bytes(self.vect_buf.as_mut_slice(), 278 | (self.offset - NUM_HEADER_BYTES_TOTAL) as u32) 279 | } 280 | 281 | /// Retries a func which might return Result<..., CodingError> by growing the vect_buf. 282 | /// If it still fails then we return the Err. 283 | fn retry_grow(&mut self, mut func: F) -> Result 284 | where F: FnMut(&mut Self) -> Result { 285 | func(self).or_else(|err| { 286 | match err { 287 | CodingError::NotEnoughSpace | CodingError::BadOffset(_) => { 288 | // Expand vect_buf 289 | self.vect_buf.reserve(GROW_BYTES); 290 | self.vect_buf.resize(self.vect_buf.capacity(), 0); 291 | func(self) 292 | } 293 | _ => Err(err), 294 | } 295 | }) 296 | } 297 | 298 | /// Appends a single value to this vector. When a section fills up, will encode all values in write buffer 299 | /// into the vector. 300 | pub fn append(&mut self, value: T) -> Result<(), CodingError> { 301 | self.write_buf.push(value); 302 | if self.write_buf.len() >= FIXED_LEN { 303 | self.encode_section() 304 | } else { 305 | Ok(()) 306 | } 307 | } 308 | 309 | /// Appends a number of nulls at once to the vector. Super useful and fast for sparse data. 310 | /// Nulls are equivalent to zero value for type T. 
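///
/// A small sketch of building a sparse vector this way (it mirrors the unit tests at the
/// bottom of this file):
/// ```
/// # use compressed_vec::vector::VectorU32Appender;
/// let mut appender = VectorU32Appender::try_new(1024).unwrap();
/// appender.append(1).unwrap();
/// appender.append_nulls(9999).unwrap();
/// assert_eq!(appender.num_elements(), 10000);
/// ```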
311 | pub fn append_nulls(&mut self, num_nulls: usize) -> Result<(), CodingError> { 312 | let mut left = num_nulls; 313 | while left > 0 { 314 | // If current write_buf is not empty, fill it up with zeroes and flush (maybe) 315 | if self.write_buf.len() > 0 { 316 | let num_to_fill = left.min(FIXED_LEN - self.write_buf.len()); 317 | self.write_buf.resize(self.write_buf.len() + num_to_fill as usize, T::zero()); 318 | left -= num_to_fill; 319 | if self.write_buf.len() >= FIXED_LEN { self.encode_section()?; } 320 | // If empty, and we have at least FIXED_LEN nulls to go, insert a null section. 321 | } else if left >= FIXED_LEN { 322 | self.offset = self.retry_grow(|s| NullFixedSect::write(s.vect_buf.as_mut_slice(), s.offset))?; 323 | self.stats.num_null_sections += 1; 324 | self.stats.update_num_elems(&mut self.vect_buf, self.stats.num_elements + FIXED_LEN as u32)?; 325 | self.header.update_num_bytes(self.vect_buf.as_mut_slice(), 326 | (self.offset - NUM_HEADER_BYTES_TOTAL) as u32)?; 327 | left -= FIXED_LEN; 328 | // If empty, and less than fixed_len nulls, insert nulls into write_buf 329 | } else { 330 | self.write_buf.resize(left as usize, T::zero()); 331 | left = 0; 332 | } 333 | } 334 | Ok(()) 335 | } 336 | 337 | /// Call this method to wrap up a vector and any unfinished sections, and clone out resulting vector. 338 | /// We have no more values, and need to fill up the appender with nulls/0's until it is the right length. 339 | /// This is because most query engines expect all vectors to be of the same number of elements. 340 | /// The number passed in will be stored as the actual number of elements for iteration purposes, however 341 | /// since this is a fixed size section vector, the number will be rounded up to the next FIXED_LEN so that 342 | /// an entire section is written. 343 | /// NOTE: TooFewRows is returned if total_num_rows is below the total number of elements written so far. 344 | pub fn finish(&mut self, total_num_rows: usize) -> Result, CodingError> { 345 | let total_so_far = self.stats.num_elements as usize + self.write_buf.len(); 346 | if total_so_far > total_num_rows { return Err(CodingError::InvalidNumRows(total_num_rows, total_so_far)); } 347 | if total_num_rows > u32::max_value() as usize { 348 | return Err(CodingError::InvalidNumRows(total_num_rows, u32::max_value() as usize)); 349 | } 350 | 351 | // Round out the section if needed 352 | if self.write_buf.len() > 0 { 353 | let number_to_fill = FIXED_LEN - self.write_buf.len(); 354 | self.append_nulls(number_to_fill)?; 355 | } 356 | 357 | while self.stats.num_elements < total_num_rows as u32 { 358 | self.append_nulls(256)?; 359 | } 360 | 361 | // Re-write the number of elements to reflect total_num_rows 362 | self.stats.update_num_elems(self.vect_buf.as_mut_slice(), total_num_rows as u32)?; 363 | self.vect_buf.as_mut_slice().pwrite_with(&self.stats, BINARYVECT_HEADER_SIZE, LE)?; 364 | 365 | self.vect_buf.resize(self.offset, 0); 366 | let mut returned_vec = Vec::with_capacity(self.offset); 367 | returned_vec.append(&mut self.vect_buf); 368 | self.reset()?; 369 | Ok(returned_vec) 370 | } 371 | 372 | /// Obtains a reader for reading from the bytes of this appender. 373 | /// NOTE: reader will only read what has been written so far, and due to Rust borrowing rules, one should 374 | /// not attempt to read and append at the same time; the returned reader is not safe across threads. 
375 | pub fn reader(&self) -> VectorReader { 376 | // This should never fail, as we have already proven we can initialize the vector 377 | VectorReader::try_new(&self.vect_buf[..self.offset]).expect("Getting reader from appender failed") 378 | } 379 | } 380 | 381 | /// Regular U64 appender with AutoEncoder 382 | pub type VectorU64Appender = VectorAppender; 383 | 384 | /// Regular U32 appender with AutoEncoder 385 | pub type VectorU32Appender = VectorAppender; 386 | 387 | /// Regular F32 appender with XOR-based optimizing encoder 388 | pub type VectorF32XorAppender = VectorAppender>; 389 | 390 | 391 | /// A reader for reading sections and elements from a `VectorAppender` written vector. 392 | /// Use the same base type - eg VectorU32Appender -> VectorReader:: 393 | /// Can be reused many times; it has no mutable state and creates new iterators every time. 394 | // TODO: have a reader trait of some kind? 395 | pub struct VectorReader<'buf, T: VectBase> { 396 | vect_bytes: &'buf [u8], 397 | _reader: PhantomData, 398 | } 399 | 400 | impl<'buf, T> VectorReader<'buf, T> 401 | where T: VectBase + BaseSubtypeMapping { 402 | /// Creates a new reader out of the bytes for the vector. 403 | // TODO: verify that the vector is a fixed sect int. 404 | pub fn try_new(vect_bytes: &'buf [u8]) -> Result { 405 | let bytes_from_header: u32 = vect_bytes.pread_with(0, LE)?; 406 | let subtype: u8 = vect_bytes.pread_with(offset_of!(BinaryVector, minor_type), LE)?; 407 | if vect_bytes.len() < (bytes_from_header + 4) as usize { 408 | Err(CodingError::InputTooShort) 409 | } else if subtype != T::vect_subtype() as u8 { 410 | Err(CodingError::WrongVectorType(subtype)) 411 | } else { 412 | Ok(Self { vect_bytes, _reader: PhantomData }) 413 | } 414 | } 415 | 416 | pub fn num_elements(&self) -> usize { 417 | // Should not fail since we have verified in try_new() that we have all header bytes 418 | self.get_stats().num_elements as usize 419 | } 420 | 421 | pub fn total_bytes(&self) -> usize { 422 | self.vect_bytes.len() 423 | } 424 | 425 | /// Iterates and discovers the number of null sections. O(num_sections). It will be faster to just use 426 | /// get_stats(). 427 | pub fn num_null_sections(&self) -> Result { 428 | let mut count = 0; 429 | for sect_res in self.sect_iter() { 430 | let sect = sect_res?; 431 | if sect.is_null() { count += 1 } 432 | } 433 | Ok(count) 434 | } 435 | 436 | /// Returns a FixedSectStats extracted from the vector header. 437 | pub fn get_stats(&self) -> FixedSectStats { 438 | self.vect_bytes.pread_with(BINARYVECT_HEADER_SIZE, LE).unwrap() 439 | } 440 | 441 | /// Returns an iterator over each section in this vector 442 | pub fn sect_iter(&self) -> FixedSectIterator<'buf, T> { 443 | FixedSectIterator::new(&self.vect_bytes[NUM_HEADER_BYTES_TOTAL..]) 444 | } 445 | 446 | /// Returns a VectorFilter that iterates over 256-bit masks filtered from vector elements 447 | pub fn filter_iter>(&self, f: F) -> VectorFilter<'buf, F, T> { 448 | VectorFilter::new(&self.vect_bytes[NUM_HEADER_BYTES_TOTAL..], f) 449 | } 450 | 451 | /// Returns an iterator over all items in this vector. 452 | pub fn iterate(&self) -> VectorItemIter<'buf, T> { 453 | VectorItemIter::new(self.sect_iter(), self.num_elements()) 454 | } 455 | 456 | /// Decodes/processes this vector's elements through a Sink. This is the most general purpose vector 457 | /// decoding/processing API. 
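///
/// A short sketch mirroring the tests below. Note that sinks decode whole 256-element
/// sections, so the sink may hold trailing zeroes beyond the values actually appended:
/// ```
/// # use compressed_vec::vector::{VectorU32Appender, VectorReader};
/// # use compressed_vec::sink::VecSink;
/// let mut appender = VectorU32Appender::try_new(1024).unwrap();
/// let bytes = appender.encode_all(vec![1u32, 2, 3, 4]).unwrap();
/// let reader = VectorReader::<u32>::try_new(&bytes[..]).unwrap();
/// let mut sink = VecSink::<u32>::new();
/// reader.decode_to_sink(&mut sink).unwrap();
/// assert_eq!(sink.vec[..4], [1u32, 2, 3, 4]);
/// ```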
458 | pub fn decode_to_sink(&self, output: &mut Output) -> Result<(), CodingError> 459 | where Output: Sink { 460 | for sect in self.sect_iter() { 461 | sect?.decode(output)?; 462 | } 463 | Ok(()) 464 | } 465 | } 466 | 467 | 468 | /// Detailed stats, for debugging or perf analysis, on a Vector. Includes the section types. 469 | #[derive(Debug)] 470 | pub struct VectorStats { 471 | num_bytes: usize, 472 | bytes_per_elem: f32, 473 | stats: FixedSectStats, 474 | sect_types: Vec, 475 | } 476 | 477 | impl VectorStats { 478 | pub fn new<'buf, T: VectBase + BaseSubtypeMapping>(reader: &VectorReader<'buf, T>) -> Self { 479 | let stats = reader.get_stats(); 480 | Self { 481 | num_bytes: reader.total_bytes(), 482 | bytes_per_elem: reader.total_bytes() as f32 / stats.num_elements as f32, 483 | stats, 484 | sect_types: reader.sect_iter().map(|sect| sect.unwrap().sect_type()).collect(), 485 | } 486 | } 487 | 488 | /// Creates a histogram or count of each section type 489 | pub fn sect_types_histogram(&self) -> HashMap { 490 | let mut map = HashMap::new(); 491 | self.sect_types.iter().for_each(|§_type| { 492 | let count = map.entry(sect_type).or_insert(0); 493 | *count += 1; 494 | }); 495 | map 496 | } 497 | 498 | /// Returns a short summary string of the stats, including a histogram summary 499 | pub fn summary_string(&self) -> String { 500 | let keyvalues: Vec<_> = self.sect_types_histogram().iter() 501 | .map(|(k, v)| format!("{:?}={:?}", k, v)).collect(); 502 | format!("#bytes={:?} #elems={:?} bytes-per-elem={:?}\nsection type hist: {}", 503 | self.num_bytes, self.stats.num_elements, self.bytes_per_elem, 504 | keyvalues.join(", ")) 505 | } 506 | } 507 | 508 | 509 | /// Iterator struct over all items in a vector, for convenience 510 | /// Panics on decoding error - there's no really good way for an iterator to return an error 511 | // NOTE: part of reason to do this is to better control lifetimes which is hard otherwise 512 | pub struct VectorItemIter<'buf, T: VectBase> { 513 | sect_iter: FixedSectIterator<'buf, T>, 514 | sink: Section256Sink, 515 | num_elems: usize, 516 | i: usize, 517 | } 518 | 519 | impl<'buf, T: VectBase> VectorItemIter<'buf, T> { 520 | pub fn new(sect_iter: FixedSectIterator<'buf, T>, num_elems: usize) -> Self { 521 | let mut s = Self { 522 | sect_iter, 523 | sink: Section256Sink::::new(), 524 | num_elems, 525 | i: 0, 526 | }; 527 | if num_elems > 0 { 528 | s.next_section(); 529 | } 530 | s 531 | } 532 | 533 | fn next_section(&mut self) { 534 | self.sink.reset(); 535 | if let Some(Ok(next_sect)) = self.sect_iter.next() { 536 | next_sect.decode(&mut self.sink).expect("Unexpected end of section"); 537 | } 538 | } 539 | } 540 | 541 | impl<'buf, T: VectBase> Iterator for VectorItemIter<'buf, T> { 542 | type Item = T; 543 | fn next(&mut self) -> Option { 544 | if self.i < self.num_elems { 545 | let thing = self.sink.values[self.i % FIXED_LEN]; 546 | self.i += 1; 547 | // If at boundary, get next_section 548 | if self.i % FIXED_LEN == 0 && self.i < self.num_elems { 549 | self.next_section(); 550 | } 551 | Some(thing) 552 | } else { 553 | None 554 | } 555 | } 556 | } 557 | 558 | #[cfg(test)] 559 | mod test { 560 | use super::*; 561 | use crate::filter::{EqualsSink, count_hits}; 562 | 563 | #[test] 564 | fn test_append_u64_nonulls() { 565 | // Make sure the fixed sect stats above can still fit in total headers 566 | assert!(std::mem::size_of::() + BINARYVECT_HEADER_SIZE <= NUM_HEADER_BYTES_TOTAL); 567 | 568 | // Append more than 256 values, see if we get two sections and the right data 
back 569 | let num_values: usize = 500; 570 | let data: Vec = (0..num_values as u64).collect(); 571 | 572 | let mut appender = VectorU64Appender::try_new(1024).unwrap(); 573 | { 574 | let reader = appender.reader(); 575 | 576 | assert_eq!(reader.num_elements(), 0); 577 | assert_eq!(reader.sect_iter().count(), 0); 578 | // Note: due to Rust borrowing rules we can only have reader as long as we are not appending. 579 | } 580 | 581 | // Now append the data 582 | data.iter().for_each(|&e| appender.append(e).unwrap()); 583 | 584 | // At this point only 1 section has been written, the vector is not finished yet. 585 | let reader = appender.reader(); 586 | assert_eq!(reader.num_elements(), 256); 587 | assert_eq!(reader.sect_iter().count(), 1); 588 | 589 | let finished_vec = appender.finish(num_values).unwrap(); 590 | 591 | let reader = VectorReader::try_new(&finished_vec[..]).unwrap(); 592 | assert_eq!(reader.num_elements(), num_values); 593 | assert_eq!(reader.sect_iter().count(), 2); 594 | assert_eq!(reader.num_null_sections().unwrap(), 0); 595 | 596 | let elems: Vec = reader.iterate().collect(); 597 | assert_eq!(elems, data); 598 | } 599 | 600 | #[test] 601 | fn test_append_u64_mixed_nulls() { 602 | // Have some values, then append a large number of nulls 603 | // (enough to pack rest of section, plus a null section, plus more in next section) 604 | // Thus sections should be: Sect1: 100 values + 156 nulls 605 | // Sect2: null section 606 | // Sect3: 50 nulls + 50 more values 607 | let data1: Vec = (0..100).collect(); 608 | let num_nulls = (256 - data1.len()) + 256 + 50; 609 | let data2: Vec = (0..50).collect(); 610 | 611 | let total_elems = data1.len() + data2.len() + num_nulls; 612 | 613 | let mut all_data = Vec::::with_capacity(total_elems); 614 | all_data.extend_from_slice(&data1[..]); 615 | (0..num_nulls).for_each(|_i| all_data.push(0)); 616 | all_data.extend_from_slice(&data2[..]); 617 | 618 | let mut appender = VectorU64Appender::try_new(1024).unwrap(); 619 | data1.iter().for_each(|&e| appender.append(e).unwrap()); 620 | appender.append_nulls(num_nulls).unwrap(); 621 | data2.iter().for_each(|&e| appender.append(e).unwrap()); 622 | 623 | let finished_vec = appender.finish(total_elems).unwrap(); 624 | 625 | let reader = VectorReader::try_new(&finished_vec[..]).unwrap(); 626 | assert_eq!(reader.num_elements(), total_elems); 627 | assert_eq!(reader.sect_iter().count(), 3); 628 | assert_eq!(reader.num_null_sections().unwrap(), 1); 629 | 630 | assert_eq!(reader.get_stats().num_null_sections, 1); 631 | 632 | let elems: Vec = reader.iterate().collect(); 633 | assert_eq!(elems, all_data); 634 | } 635 | 636 | #[test] 637 | fn test_append_u64_mixed_nulls_grow() { 638 | // Same as last test but use smaller buffer to force growing of encoding buffer 639 | let data1: Vec = (0..300).collect(); 640 | let num_nulls = 350; 641 | 642 | let total_elems = (data1.len() + num_nulls) * 2; 643 | 644 | let mut all_data = Vec::::with_capacity(total_elems); 645 | all_data.extend_from_slice(&data1[..]); 646 | (0..num_nulls).for_each(|_i| all_data.push(0)); 647 | all_data.extend_from_slice(&data1[..]); 648 | (0..num_nulls).for_each(|_i| all_data.push(0)); 649 | 650 | let mut appender = VectorU64Appender::try_new(300).unwrap(); 651 | data1.iter().for_each(|&e| appender.append(e).unwrap()); 652 | appender.append_nulls(num_nulls).unwrap(); 653 | data1.iter().for_each(|&e| appender.append(e).unwrap()); 654 | appender.append_nulls(num_nulls).unwrap(); 655 | 656 | let finished_vec = appender.finish(total_elems).unwrap(); 
657 | 658 | let reader = VectorReader::try_new(&finished_vec[..]).unwrap(); 659 | println!("summary: {}", VectorStats::new(&reader).summary_string()); 660 | assert_eq!(reader.num_elements(), total_elems); 661 | assert_eq!(reader.sect_iter().count(), 6); 662 | assert_eq!(reader.num_null_sections().unwrap(), 2); 663 | 664 | let elems: Vec = reader.iterate().collect(); 665 | assert_eq!(elems, all_data); 666 | } 667 | 668 | #[test] 669 | fn test_append_u32_and_filter() { 670 | // First test appending with no nulls. Just 1,2,3,4 and filter for 3, should get 1/4 of appended elements 671 | let vector_size = 400; 672 | let mut appender = VectorU32Appender::try_new(1024).unwrap(); 673 | for i in 0..vector_size { 674 | appender.append((i % 4) + 1).unwrap(); 675 | } 676 | let finished_vec = appender.finish(vector_size as usize).unwrap(); 677 | 678 | let reader = VectorReader::::try_new(&finished_vec[..]).unwrap(); 679 | assert_eq!(reader.num_elements(), vector_size as usize); 680 | assert_eq!(reader.sect_iter().count(), 2); 681 | 682 | let filter_iter = reader.filter_iter(EqualsSink::::new(&3)); 683 | let count = count_hits(filter_iter) as u32; 684 | assert_eq!(count, vector_size / 4); 685 | 686 | // Test appending with stretches of nulls. 300, then 400 nulls, then 300 elements again 687 | let nonnulls = 300; 688 | let total_elems = nonnulls * 2 + 400; 689 | for i in 0..nonnulls { 690 | appender.append((i % 4) + 1).unwrap(); 691 | } 692 | appender.append_nulls(400).unwrap(); 693 | for i in 0..nonnulls { 694 | appender.append((i % 4) + 1).unwrap(); 695 | } 696 | let finished_vec = appender.finish(total_elems as usize).unwrap(); 697 | 698 | let reader = VectorReader::::try_new(&finished_vec[..]).unwrap(); 699 | assert_eq!(reader.num_elements(), total_elems as usize); 700 | 701 | let filter_iter = reader.filter_iter(EqualsSink::::new(&3)); 702 | let count = count_hits(filter_iter) as u32; 703 | assert_eq!(count, nonnulls * 2 / 4); 704 | 705 | // Iterate and decode_to_sink to VecSink should produce same values... 
except for trailing zeroes 706 | let mut sink = VecSink::::new(); 707 | reader.decode_to_sink(&mut sink).unwrap(); 708 | let it_data: Vec = reader.iterate().collect(); 709 | assert_eq!(sink.vec[..total_elems as usize], it_data[..]); 710 | } 711 | 712 | #[test] 713 | fn test_append_u32_large_vector() { 714 | // 9999 nulls, then an item, 10 times = 100k items total 715 | let mut appender = VectorU32Appender::try_new(4096).unwrap(); 716 | let vector_size = 100000; 717 | for _ in 0..10 { 718 | appender.append_nulls(9999).unwrap(); 719 | appender.append(2).unwrap(); 720 | } 721 | assert_eq!(appender.num_elements(), vector_size); 722 | 723 | let finished_vec = appender.finish(vector_size).unwrap(); 724 | let reader = VectorReader::::try_new(&finished_vec[..]).unwrap(); 725 | assert_eq!(reader.num_elements(), vector_size as usize); 726 | } 727 | 728 | #[test] 729 | fn test_read_wrong_type_error() { 730 | let vector_size = 400; 731 | let mut appender = VectorU32Appender::try_new(1024).unwrap(); 732 | for i in 0..vector_size { 733 | appender.append((i % 4) + 1).unwrap(); 734 | } 735 | let finished_vec = appender.finish(vector_size as usize).unwrap(); 736 | 737 | let res = VectorReader::::try_new(&finished_vec[..]); 738 | assert_eq!(res.err().unwrap(), CodingError::WrongVectorType(VectorSubType::FixedU32 as u8)) 739 | } 740 | 741 | #[test] 742 | fn test_append_f32_decode() { 743 | let mut appender = VectorF32XorAppender::try_new(2048).unwrap(); 744 | let vector_size = 280; 745 | let data: Vec = (0..vector_size).map(|x| x as f32 / 2.8).collect(); 746 | 747 | let finished_vec = appender.encode_all(data.clone()).unwrap(); 748 | let reader = VectorReader::::try_new(&finished_vec[..]).unwrap(); 749 | assert_eq!(reader.num_elements(), vector_size); 750 | 751 | // Iterate and decode_to_sink to VecSink should produce same values... except for trailing zeroes 752 | let mut sink = VecSink::::new(); 753 | reader.decode_to_sink(&mut sink).unwrap(); 754 | assert_eq!(sink.vec[..vector_size], data[..]); 755 | } 756 | } 757 | 758 | -------------------------------------------------------------------------------- /vector_format.md: -------------------------------------------------------------------------------- 1 | ## compressed_vec Vector Format 2 | 3 | The format for each vector is defined in detail below. It is loosely based on the vector format in [FiloDB](https://github.com/filodb/FiloDB) used for histograms. Each vector is divided into sections of 256 elements. 4 | 5 | The vector bytes are wire-format ready, they can be written to and read from disk or network and interpreted/read with no further transformations needed. 6 | 7 | The goals of the vector format are: 8 | * Optimize for fast, SIMD-based decoding. 9 | * Enable fast, aligned data filtering and processing by having fixed section boundaries 10 | * Varied encoding techniques to bring compression to within roughly 2x of good general purpose compression techniques, or better 11 | * Contain metadata to enable fast filtering 12 | 13 | ### Header 14 | 15 | The header is 16 bytes. The first 6 bytes are binary compatible with FiloDB vectors. The structs are defined in `src/vector.rs` in the `BinaryVector` and `FixedSectStats` structs. 
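As a rough sketch (a hypothetical helper, not an API of this crate), the fixed header fields listed in the table below can be read straight off the first 16 bytes; multi-byte fields are little-endian, matching how `src/vector.rs` writes them with `scroll` and `LE`:

```rust
/// Hypothetical reader for the 16-byte vector header laid out in the table below.
/// Assumes `buf` holds at least the 16 header bytes.
fn read_vector_header(buf: &[u8]) -> (u32, u8, u8, u32, u16) {
    let num_bytes         = u32::from_le_bytes(buf[0..4].try_into().unwrap());    // +0
    let major_type        = buf[4];                                               // +4
    let minor_type        = buf[5];                                               // +5
    let num_elements      = u32::from_le_bytes(buf[8..12].try_into().unwrap());   // +8
    let num_null_sections = u16::from_le_bytes(buf[12..14].try_into().unwrap());  // +12
    (num_bytes, major_type, minor_type, num_elements, num_null_sections)
}
```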
16 | 17 | | offset | description | 18 | | ------ | ----------- | 19 | | +0 | u32: total number of bytes in this vector, NOT including these 4 length bytes | 20 | | +4 | u8: Major vector type, see the `VectorType` enum for details | 21 | | +5 | u8: Vector subtype, see the `VectorSubType` enum for details | 22 | | +8 | u32: total number of elements in this vector | 23 | | +12 | u16: number of null sections in this vector, used for quickly determining relative sparsity | 24 | 25 | For the vectors produced by this crate, the major type code used is `VectorType::FixedSection256` (0x10), while the minor type code is `Primitive`. 26 | 27 | ### Sections 28 | 29 | Following the header are one or more sections of fixed 256 elements each. If the last section does not have 256 elements, nulls are added until the section has 256 elements. Sections cannot carry over state to adjacent sections; each section must contain enough state to completely decode itself. This is needed for fast iteration and skipping over sections in filtering and data processing. 30 | 31 | The first byte of a section contains the section code. See the `SectionType` enum in `src/section.rs` for the up to date list of codes, but it is included here for convenience: 32 | 33 | ```rust 34 | pub enum SectionType { 35 | Null = 0, // FIXED_LEN unavailable or null elements in a row 36 | NibblePackedMedium = 1, // Nibble-packed u64/u32's, total size < 64KB 37 | DeltaNPMedium = 3, // Nibble-packed u64/u32's, delta encoded, total size < 64KB 38 | Constant = 5, // Constant value section 39 | XorNPMedium = 6, // XORed f64/f32, NibblePacked, total size < 64KB 40 | } 41 | ``` 42 | 43 | ### Null Sections 44 | 45 | A null section represents 256 zeroes, and just contains the single Null section type byte. 46 | 47 | Null sections are key to encoding sparse vectors efficiently, and should be leveraged as much as possible. 48 | 49 | ### NibblePacked U64/U32 sections 50 | 51 | The NibblePacked section codes (1/2) represent 256 values (u32 or u64), packed in groups of 8 using the [NibblePacking](https://github.com/filodb/FiloDB/blob/develop/doc/compression.md#predictive-nibblepacking) algorithm from FiloDB (used in production at massive scale). NibblePacking uses only 1 bit for zero values, and stores the minimum number of nibbles only. From the start of the section, there are 3 header bytes, followed by the NibblePack-encoded data. 52 | 53 | | offset | description | 54 | | ------ | ----------- | 55 | | +0 | u8: section type code: 1 | 56 | | +1 | u16: number of bytes of this section, excluding these 3 header bytes | 57 | | +3 | Start of NibblePack-encoded data, back to back. This starts with the bitmask byte, then the number of nibbles byte, then the nibbles, repeated for every group of 8 u64's/u32's | 58 | 59 | ### Delta-Encoded NibblePacked Sections 60 | 61 | For values such as timestamps which are mostly in a certain narrow range, the naive NibblePacked algorithm above might result in more nibbles than necessary. Delta-encoded sections store a delta from the minimum value in the stretch of 256 raw values, and the deltas are then NibblePack compressed. The goal here is to attain higher compression as the deltas should be smaller. 62 | 63 | | offset | description | 64 | | ------ | ----------- | 65 | | +0 | u8: section type code: 3 | 66 | | +1 | u16: number of bytes of this section, excluding these 3 header bytes | 67 | | +3 | u8: number of bits needed for the largest delta. Or the smallest n where 2^n >= max raw value. 
| 68 | | +4 | u64: The "base" value to which all deltas are added to form original value | 69 | | +12 | Start of NibblePack-encoded deltas, back to back. This starts with the bitmask byte, then the number of nibbles byte, then the nibbles, repeated for every group of 8 u64's/u32's | 70 | 71 | ### Constant Sections 72 | 73 | These sections represent 256 repeated values. 74 | 75 | | offset | description | 76 | | ------ | ----------- | 77 | | +0 | u8: section type code: 5 | 78 | | +1 | u32/u64/etc.: the constant value | 79 | 80 | ### XOR floating point NibblePacked sections 81 | 82 | This is a Gorilla- and Prometheus- inspired algorithm but designed for fast SIMD unpacking. Floating point numbers that are similar will XOR such that the result only contains a few set bits. NibblePacking algorithm then packs only the nonzero nibbles, taking care of long trailing zero nibbles. The algorithm starts with 0's, thus the initial octet gets NibblePacked in the stream. 83 | 84 | | offset | description | 85 | | ------ | ----------- | 86 | | +0 | u8: section type code: 6 | 87 | | +1 | u16: number of bytes of this section, including header bytes | 88 | | +3 | NibblePacked XORed values | 89 | 90 | Each set of 8 values are XORed against the previous set of 8 values, and the difference is NibblePacked. 91 | 92 | ### Filtering and Vector Processing 93 | 94 | Fast filtering and vector processing of multiple vectors is enabled by the following: 95 | * All sections are the same number of elements across vectors, thus 96 | * we can iterate through sections and process the same set of elements across multiple vectors at the same time 97 | * processing of sparse vectors with null sections can be optimized and special cased 98 | 99 | We can see examples of this in `src/filter.rs`, with the `VectorFilter` and `MultiVectorFilter` structs. 100 | --------------------------------------------------------------------------------