├── .gitignore
├── Cargo.toml
├── LICENSE
├── README.md
├── benches
│   └── main.rs
└── src
    ├── bit.rs
    └── lib.rs

/.gitignore:
--------------------------------------------------------------------------------
target
Cargo.lock
--------------------------------------------------------------------------------
/Cargo.toml:
--------------------------------------------------------------------------------
[package]
name = "bset"
version = "0.1.0"
authors = ["Maroš Grego "]
edition = "2018"
description = "Fast and compact sets of bytes or ASCII characters"
repository = "https://github.com/grego/bset"
homepage = "https://github.com/grego/bset"
keywords = ["byte", "set", "ascii", "search", "fast"]
categories = ["no-std", "data-structures", "rust-patterns", "embedded"]
license = "MIT"
readme = "README.md"

[dev-dependencies]
rand = "0.8"
byte_set = "0.1.3"
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
Copyright 2021 Maroš Grego

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# bset
[![Crates.io status](https://badgen.net/crates/v/bset)](https://crates.io/crates/bset)
[![Docs](https://docs.rs/bset/badge.svg)](https://docs.rs/bset)

Fast and compact sets of bytes and ASCII characters,
useful for searching, parsing and determining membership of a given byte
in the given set.
They don't use any allocation, nor even any `std` features.
In fact, all of the functions for building sets and stacks are `const`,
so they can be freely constructed at compile time.

## Sets
This crate exports two set types: `ByteSet` for a set of general bytes
and `AsciiSet` for a set restricted to the range of ASCII characters.
The latter is half the size in memory, but comes with a slight
performance trade-off in having to check whether the given character
belongs to the ASCII range.
```rust
use bset::AsciiSet;

const OP: AsciiSet = AsciiSet::new().add_bytes(b"+-*/%&|^");
assert!(OP.contains(b'%'));
```
The sets are implemented as arrays of pointer-sized (`usize`) bit masks.

## Stack of sets
Inspired by [this article](https://maciej.codes/2020-04-19-stacking-luts-in-logos.html)
by Maciej Hirsz, this crate provides a way to stack multiple sets into one
structure to gain even more performance.
To avoid any runtime overhead, a set is selected from the stack at the type
level. For this, the module `bits` contains types `B0`, `B1`, ..., `B7`
representing indices of a set in the stack.
Because `const fn`s currently don't support trait bounds on generic parameters,
the sets are indexed by the order in which they were added to the stack.
Type aliases can be used to name the sets within the stack:
```rust
use bset::{bits::*, ByteSet, ByteStack};

const BYTE_STACK: ByteStack<B3> = ByteStack::new()
    .add_set(ByteSet::DIGITS)
    .add_set(ByteSet::ALPHABETIC)
    .add_set(ByteSet::new().add_bytes(b"+-*/%&|^"));
type Digits = B0;
type Alphabetic = B1;
type Operations = B2;
assert!(BYTE_STACK.contains::<Operations>(b'%'));
```
Again, there are two versions: `ByteStack` for all bytes and `AsciiStack`,
restricted to the ASCII range. Benchmarks show that testing set membership
is about 20% faster with stacked sets. A stack takes 8 times more memory than
a single set (128/256 bytes vs. 16/32 bytes), but this size does not grow as
sets are added, so when 8 sets (the maximum number) are used in one stack,
the total memory size is the same.

## Benchmarks
The stacked full-byte version (`ByteStack`) consistently outperforms both `match`ing
and the `std` `is_ascii_*` functions. For simple sets, such as lowercase or
alphabetic characters, the non-stacked sets can be slower than both.

Alphanumeric characters:
```
test alnum_ascii_set ... bench: 1,051 ns/iter (+/- 48) = 974 MB/s
test alnum_ascii_stack ... bench: 801 ns/iter (+/- 33) = 1278 MB/s
test alnum_byte_set ... bench: 839 ns/iter (+/- 50) = 1220 MB/s
test alnum_byte_stack ... bench: 620 ns/iter (+/- 33) = 1651 MB/s
test alnum_is_alnum ... bench: 1,574 ns/iter (+/- 70) = 650 MB/s
test alnum_match ... bench: 1,573 ns/iter (+/- 86) = 650 MB/s
```

Alphabetic characters:
```
test letter_ascii_set ... bench: 1,027 ns/iter (+/- 42) = 997 MB/s
test letter_ascii_stack ... bench: 943 ns/iter (+/- 45) = 1085 MB/s
test letter_byte_set ... bench: 839 ns/iter (+/- 34) = 1220 MB/s
test letter_byte_stack ... bench: 619 ns/iter (+/- 29) = 1654 MB/s
test letter_is_alphabetic ... bench: 820 ns/iter (+/- 42) = 1248 MB/s
test letter_match ... bench: 825 ns/iter (+/- 36) = 1241 MB/s
```

Lowercase characters:
```
test lowercase_ascii_set ... bench: 1,197 ns/iter (+/- 52) = 855 MB/s
test lowercase_ascii_stack ... bench: 893 ns/iter (+/- 45) = 1146 MB/s
test lowercase_byte_set ... bench: 890 ns/iter (+/- 44) = 1150 MB/s
test lowercase_byte_stack ... bench: 451 ns/iter (+/- 14) = 2270 MB/s
test lowercase_is_lowercase ... bench: 752 ns/iter (+/- 33) = 1361 MB/s
test lowercase_match ... bench: 771 ns/iter (+/- 67) = 1328 MB/s
```

URI reserved characters (per RFC 3986, section 2.2):
```
test uri_ascii_set ... bench: 1,243 ns/iter (+/- 87) = 823 MB/s
test uri_ascii_stack ... bench: 887 ns/iter (+/- 103) = 1154 MB/s
test uri_byte_set ... bench: 905 ns/iter (+/- 84) = 1131 MB/s
test uri_byte_stack ... bench: 610 ns/iter (+/- 35) = 1678 MB/s
test uri_match ... bench: 1,294 ns/iter (+/- 45) = 791 MB/s
```

License: MIT
--------------------------------------------------------------------------------
/benches/main.rs:
--------------------------------------------------------------------------------
#![feature(test)]
extern crate test;

use bset::{bits::*, AsciiSet, AsciiStack, ByteSet, ByteStack};
use rand::{thread_rng, Rng};
use test::Bencher;

use byte_set::ByteSet as OtherByteSet;

const SAMPLE_SIZE: usize = 1024;

const ASCII_STACK: AsciiStack<B4> = AsciiStack::new()
    .add_set(AsciiSet::LOWERCASE)
    .add_set(AsciiSet::ALPHABETIC)
    .add_set(AsciiSet::ALPHANUMERIC)
    .add_set(AsciiSet::URI_RESERVED);

const BYTE_STACK: ByteStack<B4> = ByteStack::new()
    .add_set(AsciiSet::LOWERCASE)
    .add_set(AsciiSet::ALPHABETIC)
    .add_set(AsciiSet::ALPHANUMERIC)
    .add_set(AsciiSet::URI_RESERVED);

type Lowercase = B0;
type Alphabetic = B1;
type Alphanumeric = B2;
type UriReserved = B3;

/// Counts the bytes of a random sample that satisfy `f`.
fn bench_fn<F>(b: &mut Bencher, f: F)
where
    F: Fn(&u8) -> bool,
{
    let mut input = [0_u8; SAMPLE_SIZE];
    thread_rng().fill(&mut input);
    b.bytes = SAMPLE_SIZE as u64;
    b.iter(|| input.iter().copied().filter(&f).count());
}

// Lowercase cases
#[bench]
fn lowercase_ascii_set(b: &mut Bencher) {
    bench_fn(b, |&c| AsciiSet::LOWERCASE.contains(c));
}

#[bench]
fn lowercase_ascii_stack(b: &mut Bencher) {
    bench_fn(b, |&c| ASCII_STACK.contains::<Lowercase>(c));
}

#[bench]
fn lowercase_byte_set(b: &mut Bencher) {
    bench_fn(b, |&c| ByteSet::LOWERCASE.contains(c));
}

#[bench]
fn lowercase_byte_stack(b: &mut Bencher) {
    bench_fn(b, |&c| BYTE_STACK.contains::<Lowercase>(c));
}

#[bench]
fn lowercase_match(b: &mut Bencher) {
    bench_fn(b, |&c| matches!(c, b'a'..=b'z'));
}

#[bench]
fn lowercase_is_lowercase(b: &mut Bencher) {
    bench_fn(b, |&c| c.is_ascii_lowercase());
}

#[bench]
fn lowercase_other_byte_set(b: &mut Bencher) {
    bench_fn(b, |&c| OtherByteSet::ASCII_LOWERCASE.contains(c));
}

// Alphabetic cases
#[bench]
fn letter_ascii_set(b: &mut Bencher) {
    bench_fn(b, |&c| AsciiSet::ALPHABETIC.contains(c));
}

#[bench]
fn letter_ascii_stack(b: &mut Bencher) {
    bench_fn(b, |&c| ASCII_STACK.contains::<Alphabetic>(c));
}

#[bench]
fn letter_byte_set(b: &mut Bencher) {
    bench_fn(b, |&c| ByteSet::ALPHABETIC.contains(c));
}

#[bench]
fn letter_byte_stack(b: &mut Bencher) {
    bench_fn(b, |&c| BYTE_STACK.contains::<Alphabetic>(c));
}

#[bench]
fn letter_is_alphabetic(b: &mut Bencher) {
    bench_fn(b, |&c| c.is_ascii_alphabetic());
}

#[bench]
fn letter_match(b: &mut Bencher) {
    bench_fn(b, |&c| matches!(c, b'a'..=b'z' | b'A'..=b'Z'));
}

#[bench]
fn letter_other_byte_set(b: &mut Bencher) {
    bench_fn(b, |&c| OtherByteSet::ASCII_ALPHABETIC.contains(c));
}

// Alphanumeric cases
#[bench]
fn alnum_ascii_set(b: &mut Bencher) {
    bench_fn(b, |&c| AsciiSet::ALPHANUMERIC.contains(c));
}

#[bench]
fn alnum_ascii_stack(b: &mut Bencher) {
    bench_fn(b, |&c| ASCII_STACK.contains::<Alphanumeric>(c));
}

#[bench]
fn alnum_byte_set(b: &mut Bencher) {
    bench_fn(b, |&c| ByteSet::ALPHANUMERIC.contains(c));
}

#[bench]
fn alnum_byte_stack(b: &mut Bencher) {
    bench_fn(b, |&c| BYTE_STACK.contains::<Alphanumeric>(c));
}

#[bench]
fn alnum_is_alnum(b: &mut Bencher) {
    bench_fn(b, |&c| c.is_ascii_alphanumeric());
}

#[bench]
fn alnum_match(b: &mut Bencher) {
    bench_fn(b, |&c| matches!(c, b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9'));
}

#[bench]
fn alnum_other_byte_set(b: &mut Bencher) {
    bench_fn(b, |&c| OtherByteSet::ASCII_ALPHANUMERIC.contains(c));
}

// URI reserved cases
#[bench]
fn uri_ascii_set(b: &mut Bencher) {
    bench_fn(b, |&c| AsciiSet::URI_RESERVED.contains(c));
}

#[bench]
fn uri_ascii_stack(b: &mut Bencher) {
    bench_fn(b, |&c| ASCII_STACK.contains::<UriReserved>(c));
}

#[bench]
fn uri_byte_set(b: &mut Bencher) {
    bench_fn(b, |&c| ByteSet::URI_RESERVED.contains(c));
}

#[bench]
fn uri_byte_stack(b: &mut Bencher) {
    bench_fn(b, |&c| BYTE_STACK.contains::<UriReserved>(c));
}

#[bench]
fn uri_match(b: &mut Bencher) {
    bench_fn(b, |&c| {
        matches!(
            c,
            b'!' | b'#'
                | b'$'
                | b'&'
                | b'\''
                | b'('
                | b')'
                | b'*'
                | b'+'
                | b','
                | b'/'
                | b':'
                | b';'
                | b'='
                | b'?'
                | b'@'
                | b'['
                | b']'
        )
    });
}
--------------------------------------------------------------------------------
/src/bit.rs:
--------------------------------------------------------------------------------
/// The 0th bit.
pub struct B0;
/// The 1st bit.
pub struct B1;
/// The 2nd bit.
pub struct B2;
/// The 3rd bit.
pub struct B3;
/// The 4th bit.
pub struct B4;
/// The 5th bit.
pub struct B5;
/// The 6th bit.
pub struct B6;
/// The 7th bit.
pub struct B7;

/// A type-level index of a set within a byte stack.
pub trait Bit {
    /// The numerical value of this index.
    const NUMBER: usize;
    /// The next index; `()` after `B7`, which caps a stack at 8 sets.
    type Successor;
}

impl Bit for B0 {
    const NUMBER: usize = 0;
    type Successor = B1;
}

impl Bit for B1 {
    const NUMBER: usize = 1;
    type Successor = B2;
}

impl Bit for B2 {
    const NUMBER: usize = 2;
    type Successor = B3;
}

impl Bit for B3 {
    const NUMBER: usize = 3;
    type Successor = B4;
}

impl Bit for B4 {
    const NUMBER: usize = 4;
    type Successor = B5;
}

impl Bit for B5 {
    const NUMBER: usize = 5;
    type Successor = B6;
}

impl Bit for B6 {
    const NUMBER: usize = 6;
    type Successor = B7;
}

impl Bit for B7 {
    const NUMBER: usize = 7;
    type Successor = ();
}
--------------------------------------------------------------------------------
/src/lib.rs:
--------------------------------------------------------------------------------
//! Fast and compact sets of bytes or ASCII characters,
//! useful for searching, parsing and determining membership of a given byte
//! in the given set.
//! They don't use any allocation, nor even any `std` features.
//! In fact, all of the functions for building sets and stacks are `const`,
//! so they can be freely constructed at compile time.
//!
//! # Sets
//! This crate exports two set types: `ByteSet` for a set of general bytes
//! and `AsciiSet` for a set restricted to the range of ASCII characters.
//! The latter is half the size in memory, but comes with a slight
//! performance trade-off in having to check whether the given character
//! belongs to the ASCII range.
//! ```
//! use bset::AsciiSet;
//!
//! const OP: AsciiSet = AsciiSet::new().add_bytes(b"+-*/%&|^");
//! assert!(OP.contains(b'%'));
//! ```
//! The sets are implemented as arrays of pointer-sized (`usize`) bit masks.
//!
//! # Stack of sets
//! Inspired by [this article](https://maciej.codes/2020-04-19-stacking-luts-in-logos.html)
//! by Maciej Hirsz, this crate provides a way to stack multiple sets into one
//! structure to gain even more performance.
//! To avoid any runtime overhead, a set is selected from the stack at the type
//! level. For this, the module `bits` contains types `B0`, `B1`, ..., `B7`
//! representing indices of a set in the stack.
//! Because `const fn`s currently don't support trait bounds on generic parameters,
//! the sets are indexed by the order in which they were added to the stack.
//! Type aliases can be used to name the sets within the stack:
//! ```
//! use bset::{bits::*, ByteSet, ByteStack};
//!
//! const BYTE_STACK: ByteStack<B3> = ByteStack::new()
//!     .add_set(ByteSet::DIGITS)
//!     .add_set(ByteSet::ALPHABETIC)
//!     .add_set(ByteSet::new().add_bytes(b"+-*/%&|^"));
//! type Digits = B0;
//! type Alphabetic = B1;
//! type Operations = B2;
//! assert!(BYTE_STACK.contains::<Operations>(b'%'));
//! ```
//! Again, there are two versions: `ByteStack` for all bytes and `AsciiStack`,
//! restricted to the ASCII range. Benchmarks show that testing set membership
//! is about 20% faster with stacked sets. A stack takes 8 times more memory than
//! a single set (128/256 bytes vs. 16/32 bytes), but this size does not grow as
//! sets are added, so when 8 sets (the maximum number) are used in one stack,
//! the total memory size is the same.
#![no_std]
#![warn(missing_docs)]
mod bit;
/// Types that denote the position of a byte set within a byte stack.
pub mod bits {
    pub use crate::bit::{B0, B1, B2, B3, B4, B5, B6, B7};
}
use bit::Bit;
use bits::*;
use core::marker::PhantomData;
use core::ops::RangeInclusive;

type Chunk = usize;
/// Length of the ASCII range (the number of ASCII characters).
pub const ASCII_RANGE_LEN: usize = 0x80;
/// Size of one chunk of the mask, in bytes.
pub const CHUNK_SIZE: usize = core::mem::size_of::<Chunk>();
/// Number of bits in one chunk of the mask.
pub const BITS_PER_CHUNK: usize = 8 * CHUNK_SIZE;
/// Number of chunks in the ASCII set.
pub const CHUNKS: usize = ASCII_RANGE_LEN / BITS_PER_CHUNK;

/// A compact set of bytes.
/// Only two instances, `AsciiSet` and `ByteSet`, can be constructed.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub struct AnyByteSet<const N: usize> {
    mask: [Chunk; N],
}

/// A compact set of ASCII bytes. Spans only 16 bytes.
pub type AsciiSet = AnyByteSet<CHUNKS>;
/// A compact set of all bytes. Spans only 32 bytes.
pub type ByteSet = AnyByteSet<{ 2 * CHUNKS }>;

/// A compact stack of up to 8 byte sets for fast lookup.
/// Only two instances, `AsciiStack` and `ByteStack`, can be constructed.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub struct AnyByteStack<B, const N: usize> {
    masks: [u8; N],
    current: PhantomData<B>,
}

/// A compact stack of up to 8 ASCII sets for fast lookup.
pub type AsciiStack<B> = AnyByteStack<B, ASCII_RANGE_LEN>;
/// A compact stack of up to 8 full byte sets for fast lookup.
pub type ByteStack<B> = AnyByteStack<B, { 2 * ASCII_RANGE_LEN }>;

impl AsciiSet {
    /// Creates a new, empty `AsciiSet`.
    pub const fn new() -> Self {
        Self { mask: [0; CHUNKS] }
    }

    /// Tests whether this set contains the `byte`.
    #[inline]
    pub const fn contains(&self, byte: u8) -> bool {
        if byte >= ASCII_RANGE_LEN as u8 {
            return false;
        }
        let chunk = self.mask[byte as usize / BITS_PER_CHUNK];
        let mask = 1 << (byte as usize % BITS_PER_CHUNK);
        (chunk & mask) != 0
    }
}

impl ByteSet {
    /// Creates a new, empty `ByteSet`.
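    ///
    /// # Examples
    /// ```
    /// use bset::ByteSet;
    ///
    /// // Unlike an `AsciiSet`, a `ByteSet` can also hold non-ASCII bytes.
    /// const HIGH: ByteSet = ByteSet::new().add(0xFF);
    /// assert!(HIGH.contains(0xFF));
    /// assert!(!HIGH.contains(b'a'));
    /// ```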
    pub const fn new() -> Self {
        Self {
            mask: [0; 2 * CHUNKS],
        }
    }

    /// Tests whether this set contains the `byte`.
    #[inline]
    pub const fn contains(&self, byte: u8) -> bool {
        let chunk = self.mask[byte as usize / BITS_PER_CHUNK];
        let mask = 1 << (byte as usize % BITS_PER_CHUNK);
        (chunk & mask) != 0
    }
}

impl<const N: usize> AnyByteSet<N> {
    /// Lowercase letters (`a` - `z`)
    pub const LOWERCASE: Self = Self::blank().add_range(b'a'..=b'z');
    /// Uppercase letters (`A` - `Z`)
    pub const UPPERCASE: Self = Self::blank().add_range(b'A'..=b'Z');
    /// Numerical digits (`0` - `9`)
    pub const DIGITS: Self = Self::blank().add_range(b'0'..=b'9');
    /// Uppercase and lowercase letters
    pub const ALPHABETIC: Self = Self::LOWERCASE.union(Self::UPPERCASE);
    /// Uppercase and lowercase letters and digits
    pub const ALPHANUMERIC: Self = Self::ALPHABETIC.union(Self::DIGITS);

    /// Space and tab
    pub const SPACE_TAB: Self = Self::blank().add_bytes(b" \t");
    /// Line feed and carriage return
    pub const NEWLINE: Self = Self::blank().add_bytes(b"\r\n");
    /// Space, tab, line feed and carriage return
    pub const WHITESPACE: Self = Self::SPACE_TAB.union(Self::NEWLINE);

    /// ASCII graphic characters
    pub const GRAPHIC: Self = Self::blank().add_range(b'!'..=b'~');
    /// Reserved URI characters (per RFC 3986, section 2.2)
    pub const URI_RESERVED: Self = Self::blank().add_bytes(b"!#$&'()*+,/:;=?@[]");

    const fn blank() -> Self {
        Self { mask: [0; N] }
    }

    /// Adds the `byte` to the set.
    ///
    /// # Panics
    /// For an `AsciiSet`, panics if `byte` is not an ASCII character.
    pub const fn add(&self, byte: u8) -> Self {
        let mut mask = self.mask;
        mask[byte as usize / BITS_PER_CHUNK] |= 1 << (byte as usize % BITS_PER_CHUNK);
        Self { mask }
    }

    /// Removes the `byte` from the set.
    ///
    /// # Panics
    /// For an `AsciiSet`, panics if `byte` is not an ASCII character.
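    ///
    /// # Examples
    /// ```
    /// use bset::AsciiSet;
    ///
    /// const NONZERO_DIGITS: AsciiSet = AsciiSet::DIGITS.remove(b'0');
    /// assert!(!NONZERO_DIGITS.contains(b'0'));
    /// assert!(NONZERO_DIGITS.contains(b'9'));
    /// ```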
    pub const fn remove(&self, byte: u8) -> Self {
        let mut mask = self.mask;
        mask[byte as usize / BITS_PER_CHUNK] &= !(1 << (byte as usize % BITS_PER_CHUNK));
        Self { mask }
    }

    /// Adds every byte from the slice to the set.
    pub const fn add_bytes(&self, bytes: &[u8]) -> Self {
        let mut aset = *self;
        let mut i = 0;
        while i < bytes.len() {
            aset = aset.add(bytes[i]);
            i += 1;
        }
        aset
    }

    /// Removes every byte in the slice from the set.
    pub const fn remove_bytes(&self, bytes: &[u8]) -> Self {
        let mut aset = *self;
        let mut i = 0;
        while i < bytes.len() {
            aset = aset.remove(bytes[i]);
            i += 1;
        }
        aset
    }

    /// Adds every byte from the inclusive range to the set.
    pub const fn add_range(&self, range: RangeInclusive<u8>) -> Self {
        let mut aset = *self;
        let mut c = *range.start();
        let end = *range.end();
        // Stop before `end`, so that `c += 1` cannot overflow when `end == 255`.
        while c < end {
            aset = aset.add(c);
            c += 1;
        }
        // Here `c == end` unless the range is empty (`start > end`).
        if c == end {
            aset = aset.add(c);
        }
        aset
    }

    /// Removes every byte in the inclusive range from the set.
    pub const fn remove_range(&self, range: RangeInclusive<u8>) -> Self {
        let mut aset = *self;
        let mut c = *range.start();
        let end = *range.end();
        // Stop before `end`, so that `c += 1` cannot overflow when `end == 255`.
        while c < end {
            aset = aset.remove(c);
            c += 1;
        }
        // Here `c == end` unless the range is empty (`start > end`).
        if c == end {
            aset = aset.remove(c);
        }
        aset
    }

    /// Returns the union of this set and `other`.
    ///
    /// # Panics
    /// Panics if `other` is smaller than `self`.
    ///
    /// # Examples
    /// ```
    /// use bset::AsciiSet;
    /// assert_eq!(AsciiSet::ALPHABETIC, AsciiSet::UPPERCASE.union(AsciiSet::LOWERCASE));
    /// ```
    pub const fn union<const M: usize>(&self, other: AnyByteSet<M>) -> Self {
        let mut mask = [0; N];
        let mut i = 0;
        while i < N {
            mask[i] = self.mask[i] | other.mask[i];
            i += 1;
        }
        Self { mask }
    }

    /// Returns the intersection of this set and `other`.
    ///
    /// # Panics
    /// Panics if `other` is smaller than `self`.
    ///
    /// # Examples
    /// ```
    /// use bset::AsciiSet;
    /// assert_eq!(AsciiSet::LOWERCASE, AsciiSet::ALPHABETIC.intersection(AsciiSet::LOWERCASE));
    /// ```
    pub const fn intersection<const M: usize>(&self, other: AnyByteSet<M>) -> Self {
        let mut mask = [0; N];
        let mut i = 0;
        while i < N {
            mask[i] = self.mask[i] & other.mask[i];
            i += 1;
        }
        Self { mask }
    }

    /// Returns the set of all bytes not in `self`
    /// (for an `AsciiSet`, all ASCII characters not in `self`).
    pub const fn complement(&self) -> Self {
        let mut mask = self.mask;
        let mut i = 0;
        while i < N {
            mask[i] = !mask[i];
            i += 1;
        }
        Self { mask }
    }

    /// Returns the set of bytes in `self` but not in `other`.
    ///
    /// # Panics
    /// Panics if `other` is smaller than `self`.
    ///
    /// # Examples
    /// ```
    /// use bset::AsciiSet;
    /// assert_eq!(AsciiSet::LOWERCASE, AsciiSet::ALPHABETIC.difference(AsciiSet::UPPERCASE));
    /// ```
    pub const fn difference<const M: usize>(&self, other: AnyByteSet<M>) -> Self {
        self.intersection(other.complement())
    }
}

impl<S> AsciiStack<S> {
    /// Tests whether the set at the position `B` in the stack contains the `byte`.
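    ///
    /// # Examples
    /// ```
    /// use bset::{bits::*, AsciiSet, AsciiStack};
    ///
    /// const STACK: AsciiStack<B2> = AsciiStack::new()
    ///     .add_set(AsciiSet::DIGITS)
    ///     .add_set(AsciiSet::WHITESPACE);
    /// assert!(STACK.contains::<B0>(b'7'));
    /// assert!(STACK.contains::<B1>(b'\t'));
    /// // Non-ASCII bytes are never members of an `AsciiStack`.
    /// assert!(!STACK.contains::<B0>(0xFF));
    /// ```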
    #[inline]
    pub fn contains<B: Bit>(&self, byte: u8) -> bool {
        byte < ASCII_RANGE_LEN as u8 && self.masks[byte as usize] & (1 << B::NUMBER) != 0
    }
}

impl AsciiStack<B0> {
    /// Creates a new, empty `AsciiStack`.
    pub const fn new() -> Self {
        Self {
            masks: [0; ASCII_RANGE_LEN],
            current: PhantomData,
        }
    }
}

impl<S> ByteStack<S> {
    /// Tests whether the set at the position `B` in the stack contains the `byte`.
    #[inline]
    pub fn contains<B: Bit>(&self, byte: u8) -> bool {
        self.masks[byte as usize] & (1 << B::NUMBER) != 0
    }
}

impl ByteStack<B0> {
    /// Creates a new, empty `ByteStack`.
    pub const fn new() -> Self {
        Self {
            masks: [0; 2 * ASCII_RANGE_LEN],
            current: PhantomData,
        }
    }
}

// TODO: Implement this generically once generic bounds are stable for const fns.
macro_rules! implement_add_set {
    ($($ty:ty),*) => {
        $(impl<const N: usize> AnyByteStack<$ty, N> {
            /// Adds this byte set to the next available position in this stack.
            ///
            /// # Panics
            /// Panics if this is an `AsciiStack` and `aset` contains a non-ASCII byte.
            pub const fn add_set<const M: usize>(
                &self,
                aset: AnyByteSet<M>,
            ) -> AnyByteStack<<$ty as Bit>::Successor, N> {
                let mut masks = self.masks;
                let mask = aset.mask;
                let mut i = 0;
                while i < M {
                    let mut j = 0;
                    while j < BITS_PER_CHUNK {
                        // Byte `i * BITS_PER_CHUNK + j` is in `aset`: mark it in this set's bit.
                        if mask[i] & (1 << j) != 0 {
                            masks[i * BITS_PER_CHUNK + j] |= 1 << <$ty>::NUMBER;
                        }
                        j += 1;
                    }
                    i += 1;
                }

                AnyByteStack {
                    masks,
                    current: PhantomData,
                }
            }
        })*
    }
}
implement_add_set!(B0, B1, B2, B3, B4, B5, B6, B7);
--------------------------------------------------------------------------------
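The stacked lookup table behind `ByteStack` can be illustrated independently of the crate. The sketch below is a minimal, self-contained approximation, not the crate's implementation: a hypothetical `const fn build_stack` fills a 256-entry `u8` table at compile time, where bit `k` of entry `b` records whether byte `b` belongs to set `k`, so a membership test is a single indexed load and a bitwise AND. The names `DIGIT`, `ALPHA`, `OP`, `STACK` and `contains` are illustrative assumptions; the bit constants play the role of `B0`, `B1` and `B2`.

```rust
// Hypothetical bit positions of three sets within one stacked table.
const DIGIT: u8 = 1 << 0;
const ALPHA: u8 = 1 << 1;
const OP: u8 = 1 << 2;

/// Builds a 256-entry table where bit `k` of entry `b` is set
/// iff byte `b` belongs to set `k`.
const fn build_stack() -> [u8; 256] {
    let mut table = [0u8; 256];
    let mut i: usize = 0;
    while i < 256 {
        let c = i as u8;
        if matches!(c, b'0'..=b'9') {
            table[i] |= DIGIT;
        }
        if matches!(c, b'a'..=b'z' | b'A'..=b'Z') {
            table[i] |= ALPHA;
        }
        i += 1;
    }
    // Operators are listed explicitly, like `add_bytes(b"+-*/%&|^")`.
    let ops: &[u8] = b"+-*/%&|^";
    let mut j = 0;
    while j < ops.len() {
        table[ops[j] as usize] |= OP;
        j += 1;
    }
    table
}

/// The whole stack is computed at compile time.
static STACK: [u8; 256] = build_stack();

/// Membership test for any stacked set: one load and one AND.
fn contains(set: u8, byte: u8) -> bool {
    STACK[byte as usize] & set != 0
}

fn main() {
    assert!(contains(DIGIT, b'7'));
    assert!(contains(OP, b'%'));
    assert!(!contains(ALPHA, b'%'));
    // Count the alphabetic bytes with one table lookup per byte.
    let n = b"bset 0.1.0".iter().filter(|&&c| contains(ALPHA, c)).count();
    assert_eq!(n, 4);
}
```

Because all sets share one table, adding more sets (up to 8, one per bit of `u8`) costs no extra memory, which is why a full stack takes the same space as 8 separate `ByteSet`s.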