├── .gitignore ├── Cargo.lock ├── Cargo.toml ├── README.md └── src ├── convert.rs ├── lib.rs ├── owned_utf32.rs ├── owned_utf8.rs ├── slice_utf32.rs ├── slice_utf32_index.rs ├── slice_utf8.rs ├── slice_utf8_index.rs ├── string_base.rs ├── traits_utf32.rs ├── traits_utf8.rs └── validation.rs /.gitignore: -------------------------------------------------------------------------------- 1 | /target/ 2 | -------------------------------------------------------------------------------- /Cargo.lock: -------------------------------------------------------------------------------- 1 | # This file is automatically @generated by Cargo. 2 | # It is not intended for manual editing. 3 | version = 3 4 | 5 | [[package]] 6 | name = "cl-generic-vec" 7 | version = "0.3.3" 8 | source = "registry+https://github.com/rust-lang/crates.io-index" 9 | checksum = "928bbac2485cd21ba262e5f0148eebd4753e1a64db262048c47bb46330d9c7b9" 10 | 11 | [[package]] 12 | name = "generic-str" 13 | version = "0.3.1" 14 | dependencies = [ 15 | "cl-generic-vec", 16 | ] 17 | -------------------------------------------------------------------------------- /Cargo.toml: -------------------------------------------------------------------------------- 1 | [package] 2 | name = "generic-str" 3 | version = "0.3.1" 4 | edition = "2021" 5 | author = "Conrad Ludgate This project intends to be a proof-of-concept for an idea I had a few months back. 8 | > There is lots of unsafe code, and it requires nightly. Tested on `cargo 1.58.0-nightly (2e2a16e98 2021-11-08)` 9 | 10 | ## Explanation 11 | 12 | Rust notoriously has a few different string types. The two main contenders are: 13 | 14 | - `&str`, which is a 'string reference'. It's non-resizable and its mutability is limited. 15 | - `String`, which is an 'owned string'. It's resizable and can be mutated simply. 16 | 17 | It turns out that these two strings aren't too different. 18 | `str` is just a string that's backed by a `[u8]` byte slice. 
19 | Similarly, `String` is just a string that's [backed by a `Vec<u8>`](https://github.com/rust-lang/rust/blob/88e5ae2dd3/library/alloc/src/string.rs#L294-L296). 20 | 21 | So why are they really different types? Couldn't we theoretically have something like 22 | 23 | ```rust 24 | type str = StringBase<[u8]>; 25 | type String = StringBase<Vec<u8>>; 26 | ``` 27 | 28 | So that's what this is. It's mostly at feature parity with the standard library strings. A lot of the standard trait implementations are there too. 29 | 30 | ## generic-vec 31 | 32 | So there was some [discussion about whether `Allocator` was the best abstraction for customising `Vec` storage](https://internals.rust-lang.org/t/is-custom-allocators-the-right-abstraction/13460). 33 | I was very intrigued by this concept, and in this project I made use of an [implementation that RustyYato contributed](https://github.com/RustyYato/generic-vec) in the thread. 34 | 35 | So, now I have 36 | 37 | ```rust 38 | use generic_vec::{GenericVec, raw::Heap}; 39 | pub type String = OwnedString], A>>; 40 | pub type OwnedString = StringBase>; 41 | ``` 42 | 43 | Which might look more complicated, and you'd be right. Implementation-wise, `GenericVec>` is supposed to be identical to `Vec`, so it should be functionally the same as before. 44 | 45 | But the added power of this storage-backed system allows for statically allocated but resizable† strings! 46 | 47 | ```rust 48 | pub type ArrayString = OwnedString<[MaybeUninit; N]>; 49 | ``` 50 | 51 | And I get to re-use all of the same code from when implementing `String`, 52 | because all string manipulation that needs resizability is implemented on the base `OwnedString` type. 53 | 54 | > †: obviously, they cannot be resized larger than the pre-defined `N` value, and pushing beyond that capacity will panic. 
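
To make the `StringBase<[u8]>` / `StringBase<Vec<u8>>` idea from the Explanation section concrete, here is a minimal sketch that compiles on stable Rust. It is not this crate's implementation: it uses a plain `Vec<u8>` instead of `GenericVec`, and the names `Str`, `OwnedStr`, `from_std`, `as_std` and `demo` are hypothetical, chosen only for illustration. It assumes that a `repr(transparent)` wrapper around the storage is enough to reinterpret a `[u8]` reference as the slice-backed string type.

```rust
use std::ops::Deref;

/// Hypothetical stand-in for a storage-generic string: the bytes live in
/// whatever storage `S` provides. Invariant: the bytes are valid UTF-8.
#[repr(transparent)]
pub struct StringBase<S: ?Sized> {
    storage: S,
}

/// Slice-backed string (plays the role of `str`).
pub type Str = StringBase<[u8]>;
/// Vec-backed string (plays the role of `String`).
pub type OwnedStr = StringBase<Vec<u8>>;

impl Str {
    /// Reinterprets a standard `&str` as a `&Str` without copying.
    pub fn from_std(s: &str) -> &Str {
        // SAFETY: `StringBase<[u8]>` is `repr(transparent)` over `[u8]`, so
        // both pointers have the same layout, and `s` is valid UTF-8.
        unsafe { &*(s.as_bytes() as *const [u8] as *const Str) }
    }

    /// Views the bytes as a standard `&str`.
    pub fn as_std(&self) -> &str {
        // SAFETY: every constructor in this sketch only accepts valid UTF-8.
        unsafe { std::str::from_utf8_unchecked(&self.storage) }
    }

    /// Length in bytes.
    pub fn len(&self) -> usize {
        self.storage.len()
    }
}

impl OwnedStr {
    /// Creates an empty owned string.
    pub fn new() -> Self {
        StringBase { storage: Vec::new() }
    }

    /// Appends a string slice; valid UTF-8 plus valid UTF-8 stays valid.
    pub fn push_str(&mut self, s: &Str) {
        self.storage.extend_from_slice(&s.storage);
    }
}

/// The owned string dereferences to the slice string, so every method
/// written once on `Str` is also available on `OwnedStr`.
impl Deref for OwnedStr {
    type Target = Str;

    fn deref(&self) -> &Str {
        // SAFETY: same layout argument as in `from_std`; the Vec only ever
        // receives bytes copied from valid `Str` values.
        unsafe { &*(self.storage.as_slice() as *const [u8] as *const Str) }
    }
}

/// Builds "foobar" from two slices and reports its contents and byte length.
pub fn demo() -> (String, usize) {
    let mut s = OwnedStr::new();
    s.push_str(Str::from_std("foo"));
    s.push_str(Str::from_std("bar"));
    (s.as_std().to_owned(), s.len())
}
```

Calling `demo()` builds an owned string from two slices and reads it back through methods that exist only on the slice type. That is the code-sharing benefit the README describes: swapping `Vec<u8>` for another storage type would leave the `Str` methods untouched.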
55 | -------------------------------------------------------------------------------- /src/convert.rs: -------------------------------------------------------------------------------- 1 | use core::str::Utf8Error; 2 | 3 | use crate::string_base::StringBase; 4 | 5 | /// Converts a slice of bytes to a string slice. 6 | /// 7 | /// A string slice ([`&str`]) is made of bytes ([`u8`]), and a byte slice 8 | /// ([`&[u8]`][byteslice]) is made of bytes, so this function converts between 9 | /// the two. Not all byte slices are valid string slices, however: [`&str`] requires 10 | /// that it is valid UTF-8. `from_utf8()` checks to ensure that the bytes are valid 11 | /// UTF-8, and then does the conversion. 12 | /// 13 | /// [`&str`]: crate::str 14 | /// [byteslice]: slice 15 | /// 16 | /// If you are sure that the byte slice is valid UTF-8, and you don't want to 17 | /// incur the overhead of the validity check, there is an unsafe version of 18 | /// this function, [`from_utf8_unchecked`], which has the same 19 | /// behavior but skips the check. 20 | /// 21 | /// If you need a `String` instead of a `&str`, consider 22 | /// [`String::from_utf8`][string]. 23 | /// 24 | /// [string]: StringBase::from_utf8 25 | /// 26 | /// Because you can stack-allocate a `[u8; N]`, and you can take a 27 | /// [`&[u8]`][byteslice] of it, this function is one way to have a 28 | /// stack-allocated string. There is an example of this in the 29 | /// examples section below. 30 | /// 31 | /// [byteslice]: slice 32 | /// 33 | /// # Errors 34 | /// 35 | /// Returns `Err` if the slice is not UTF-8 with a description as to why the 36 | /// provided slice is not UTF-8. 37 | /// 38 | /// # Examples 39 | /// 40 | /// Basic usage: 41 | /// 42 | /// ``` 43 | /// # use generic_str::str; 44 | /// // some bytes, in a vector 45 | /// let sparkle_heart = vec![240, 159, 146, 150]; 46 | /// 47 | /// // We know these bytes are valid, so just use `unwrap()`. 
48 | /// let sparkle_heart = generic_str::from_utf8(&sparkle_heart).unwrap(); 49 | /// 50 | /// assert_eq!(sparkle_heart, <&str>::from("💖")); 51 | /// ``` 52 | /// 53 | /// Incorrect bytes: 54 | /// 55 | /// ``` 56 | /// // some invalid bytes, in a vector 57 | /// let sparkle_heart = vec![0, 159, 146, 150]; 58 | /// 59 | /// assert!(generic_str::from_utf8(&sparkle_heart).is_err()); 60 | /// ``` 61 | /// 62 | /// See the docs for [`Utf8Error`] for more details on the kinds of 63 | /// errors that can be returned. 64 | /// 65 | /// A "stack allocated string": 66 | /// 67 | /// ``` 68 | /// # use generic_str::str; 69 | /// // some bytes, in a stack-allocated array 70 | /// let sparkle_heart = [240, 159, 146, 150]; 71 | /// 72 | /// // We know these bytes are valid, so just use `unwrap()`. 73 | /// let sparkle_heart = generic_str::from_utf8(&sparkle_heart).unwrap(); 74 | /// 75 | /// assert_eq!(sparkle_heart, <&str>::from("💖")); 76 | /// ``` 77 | pub fn from_utf8(v: &[u8]) -> Result<&StringBase<[u8]>, Utf8Error> { 78 | Ok(core::str::from_utf8(v)?.into()) 79 | } 80 | 81 | /// Converts a mutable slice of bytes to a mutable string slice. 82 | /// 83 | /// # Examples 84 | /// 85 | /// Basic usage: 86 | /// 87 | /// ``` 88 | /// # use generic_str::str; 89 | /// // "Hello, Rust!" as a mutable vector 90 | /// let mut hellorust = vec![72, 101, 108, 108, 111, 44, 32, 82, 117, 115, 116, 33]; 91 | /// 92 | /// // As we know these bytes are valid, we can use `unwrap()` 93 | /// let outstr = generic_str::from_utf8_mut(&mut hellorust).unwrap(); 94 | /// 95 | /// assert_eq!(outstr, <&str>::from("Hello, Rust!")); 96 | /// ``` 97 | /// 98 | /// Incorrect bytes: 99 | /// 100 | /// ``` 101 | /// // Some invalid bytes in a mutable vector 102 | /// let mut invalid = vec![128, 223]; 103 | /// 104 | /// assert!(generic_str::from_utf8_mut(&mut invalid).is_err()); 105 | /// ``` 106 | /// See the docs for [`Utf8Error`] for more details on the kinds of 107 | /// errors that can be returned. 
108 | pub fn from_utf8_mut(v: &mut [u8]) -> Result<&mut StringBase<[u8]>, Utf8Error> { 109 | Ok(core::str::from_utf8_mut(v)?.into()) 110 | } 111 | 112 | /// Converts a slice of bytes to a string slice without checking 113 | /// that the string contains valid UTF-8. 114 | /// 115 | /// See the safe version, [`from_utf8`], for more information. 116 | /// 117 | /// # Safety 118 | /// 119 | /// This function is unsafe because it does not check that the bytes passed to 120 | /// it are valid UTF-8. If this constraint is violated, undefined behavior 121 | /// results, as the rest of Rust assumes that [`&str`]s are valid UTF-8. 122 | /// 123 | /// [`&str`]: str 124 | /// 125 | /// # Examples 126 | /// 127 | /// Basic usage: 128 | /// 129 | /// ``` 130 | /// # use generic_str::str; 131 | /// // some bytes, in a vector 132 | /// let sparkle_heart = vec![240, 159, 146, 150]; 133 | /// 134 | /// let sparkle_heart = unsafe { 135 | /// generic_str::from_utf8_unchecked(&sparkle_heart) 136 | /// }; 137 | /// 138 | /// assert_eq!(sparkle_heart, <&str>::from("💖")); 139 | /// ``` 140 | #[inline] 141 | pub const unsafe fn from_utf8_unchecked(v: &[u8]) -> &StringBase<[u8]> { 142 | // SAFETY: the caller must guarantee that the bytes `v` are valid UTF-8. 143 | // Also relies on `&str` and `&[u8]` having the same layout. 144 | core::mem::transmute(v) 145 | } 146 | 147 | /// Converts a slice of bytes to a string slice without checking 148 | /// that the string contains valid UTF-8. 149 | /// 150 | /// See the safe version, [`from_utf8`], for more information. 151 | /// 152 | /// # Safety 153 | /// 154 | /// This function is unsafe because it does not check that the bytes passed to 155 | /// it are valid UTF-8. If this constraint is violated, undefined behavior 156 | /// results, as the rest of Rust assumes that [`&str`]s are valid UTF-8. 
157 | /// 158 | /// [`&str`]: str 159 | /// 160 | /// # Examples 161 | /// 162 | /// Basic usage: 163 | /// 164 | /// ``` 165 | /// # use generic_str::str; 166 | /// // some bytes, in a vector 167 | /// let sparkle_heart = vec![240, 159, 146, 150]; 168 | /// 169 | /// let sparkle_heart = unsafe { 170 | /// generic_str::from_utf8_unchecked(&sparkle_heart) 171 | /// }; 172 | /// 173 | /// assert_eq!(sparkle_heart, <&str>::from("💖")); 174 | /// ``` 175 | #[inline] 176 | pub const unsafe fn from_utf8_unchecked_mut(v: &mut [u8]) -> &mut StringBase<[u8]> { 177 | // SAFETY: the caller must guarantee that the bytes `v` are valid UTF-8. 178 | // Also relies on `&str` and `&[u8]` having the same layout. 179 | core::mem::transmute(v) 180 | } 181 | -------------------------------------------------------------------------------- /src/lib.rs: -------------------------------------------------------------------------------- 1 | #![cfg_attr(not(any(doc, feature = "std")), no_std)] 2 | 3 | //! The one and only string type in Rust 4 | //! 5 | //! ``` 6 | //! # use generic_str::str; 7 | //! let foo: &str = "foo".into(); 8 | //! let expected: &str = "foobar".into(); 9 | //! 10 | //! let mut foobar = foo.to_owned(); 11 | //! foobar.push_str("bar".into()); 12 | //! 13 | //! assert_eq!(foobar, *expected); 14 | //! 
``` 15 | #![cfg_attr(feature = "alloc", feature(vec_into_raw_parts))] 16 | #![feature(str_internals)] 17 | #![feature(allocator_api)] 18 | #![feature(slice_range)] 19 | #![feature(slice_index_methods)] 20 | #![feature(slice_ptr_get)] 21 | #![feature(slice_ptr_len)] 22 | #![feature(const_mut_refs)] 23 | #![feature(const_fn_trait_bound)] 24 | #![feature(unicode_internals)] 25 | 26 | mod convert; 27 | mod owned_utf32; 28 | mod owned_utf8; 29 | mod slice_utf32; 30 | mod slice_utf32_index; 31 | mod slice_utf8; 32 | mod slice_utf8_index; 33 | mod string_base; 34 | mod traits_utf32; 35 | mod traits_utf8; 36 | mod validation; 37 | 38 | pub use convert::*; 39 | pub use owned_utf32::*; 40 | pub use owned_utf8::*; 41 | pub use slice_utf32::*; 42 | pub use slice_utf8::*; 43 | pub use string_base::*; 44 | 45 | #[cfg(test)] 46 | mod tests { 47 | use crate::str; 48 | 49 | #[test] 50 | fn test() { 51 | let foo: &str = "foo".into(); 52 | let expected: &str = "foobar".into(); 53 | 54 | let mut foobar = foo.to_owned(); 55 | foobar.push_str("bar".into()); 56 | 57 | assert_eq!(foobar, *expected); 58 | } 59 | } 60 | -------------------------------------------------------------------------------- /src/owned_utf32.rs: -------------------------------------------------------------------------------- 1 | #[cfg(feature = "alloc")] 2 | use std::alloc::{Allocator, Global}; 3 | use std::mem::MaybeUninit; 4 | 5 | use generic_vec::{ 6 | raw::{Storage, StorageWithCapacity}, 7 | ArrayVec, GenericVec, 8 | }; 9 | 10 | use crate::{OwnedString, StringBase}; 11 | 12 | /// UTF-32 Owned String that supports reallocation 13 | /// 14 | /// ``` 15 | /// # use generic_str::String32; 16 | /// let mut s = String32::new(); 17 | /// s.push_str32(&String32::from("foobar")); 18 | /// assert_eq!(s, String32::from("foobar")); 19 | /// ``` 20 | #[cfg(feature = "alloc")] 21 | pub type String32 = OwnedString], A>>; 22 | 23 | /// UTF-32 Owned String that has a fixed capacity 24 | /// 25 | /// ``` 26 | /// # use 
generic_str::{String32, ArrayString32}; 27 | /// let mut s = ArrayString32::<8>::new(); 28 | /// assert_eq!(std::mem::size_of_val(&s), 8 * 4 + 8); // 8 chars of storage, 8 bytes for length 29 | /// 30 | /// s.push_str32(&String32::from("foo")); 31 | /// let t = s.clone(); // cloning requires no heap allocations 32 | /// s.push_str32(&String32::from("bar")); 33 | /// 34 | /// assert_eq!(t, String32::from("foo")); 35 | /// assert_eq!(s, String32::from("foobar")); 36 | /// ``` 37 | pub type ArrayString32 = OwnedString; N]>; 38 | 39 | #[cfg(feature = "alloc")] 40 | impl String32 { 41 | /// Creates a new empty `String32`. 42 | /// 43 | /// Given that the `String32` is empty, this will not allocate any initial 44 | /// buffer. While that means that this initial operation is very 45 | /// inexpensive, it may cause excessive allocation later when you add 46 | /// data. If you have an idea of how much data the `String32` will hold, 47 | /// consider the [`with_capacity`] method to prevent excessive 48 | /// re-allocation. 49 | /// 50 | /// [`with_capacity`]: String32::with_capacity 51 | /// 52 | /// # Examples 53 | /// 54 | /// Basic usage: 55 | /// 56 | /// ``` 57 | /// # use generic_str::String32; 58 | /// let s = String32::new(); 59 | /// ``` 60 | #[inline] 61 | pub fn new() -> Self { 62 | Self::with_storage(Box::default()) 63 | } 64 | 65 | /// Creates a new empty `String` with a particular capacity. 66 | /// 67 | /// `String`s have an internal buffer to hold their data. The capacity is 68 | /// the length of that buffer, and can be queried with the [`capacity`] 69 | /// method. This method creates an empty `String`, but one with an initial 70 | /// buffer that can hold `capacity` bytes. This is useful when you may be 71 | /// appending a bunch of data to the `String`, reducing the number of 72 | /// reallocations it needs to do. 
73 | /// 74 | /// [`capacity`]: StringBase::capacity 75 | /// 76 | /// If the given capacity is `0`, no allocation will occur, and this method 77 | /// is identical to the [`new`] method. 78 | /// 79 | /// [`new`]: StringBase::new 80 | /// 81 | /// # Examples 82 | /// 83 | /// Basic usage: 84 | /// 85 | /// ``` 86 | /// # use generic_str::String32; 87 | /// let mut s = String32::with_capacity(10); 88 | /// 89 | /// // The String contains no chars, even though it has capacity for more 90 | /// assert_eq!(s.len(), 0); 91 | /// 92 | /// // These are all done without reallocating... 93 | /// let cap = s.capacity(); 94 | /// for _ in 0..10 { 95 | /// s.push('a'); 96 | /// } 97 | /// 98 | /// assert_eq!(s.capacity(), cap); 99 | /// 100 | /// // ...but this may make the string reallocate 101 | /// s.push('a'); 102 | /// ``` 103 | #[inline] 104 | pub fn with_capacity(capacity: usize) -> Self { 105 | Self::new_with_capacity(capacity) 106 | } 107 | } 108 | 109 | #[cfg(feature = "alloc")] 110 | impl String32 { 111 | pub fn with_alloc(alloc: A) -> Self { 112 | Self::with_storage(Box::new_uninit_slice_in(0, alloc)) 113 | } 114 | } 115 | 116 | impl ArrayString32 { 117 | /// Creates a new empty `ArrayString`. 118 | /// 119 | /// # Examples 120 | /// 121 | /// Basic usage: 122 | /// 123 | /// ``` 124 | /// # use generic_str::ArrayString32; 125 | /// let s = ArrayString32::<8>::new(); 126 | /// ``` 127 | #[inline] 128 | pub fn new() -> Self { 129 | Self { 130 | storage: ArrayVec::new(), 131 | } 132 | } 133 | } 134 | 135 | impl> OwnedString { 136 | /// Appends a given string slice onto the end of this `String`. 
137 | /// 138 | /// # Examples 139 | /// 140 | /// Basic usage: 141 | /// 142 | /// ``` 143 | /// # use generic_str::String32; 144 | /// let mut s = String32::from("foo"); 145 | /// let t = String32::from("bar"); 146 | /// 147 | /// s.push_str32(&t); 148 | /// 149 | /// assert_eq!(s, String32::from("foobar")); 150 | /// ``` 151 | #[inline] 152 | pub fn push_str32(&mut self, string: &crate::str32) { 153 | self.storage.extend_from_slice(&string.storage) 154 | } 155 | 156 | /// Appends the given [`char`] to the end of this `String`. 157 | /// 158 | /// # Examples 159 | /// 160 | /// Basic usage: 161 | /// 162 | /// ``` 163 | /// # use generic_str::String32; 164 | /// let mut s = String32::new(); 165 | /// 166 | /// s.push('1'); 167 | /// s.push('2'); 168 | /// s.push('3'); 169 | /// 170 | /// assert_eq!(s, String32::from("123")); 171 | /// ``` 172 | #[inline] 173 | pub fn push(&mut self, ch: char) { 174 | self.storage.push(ch); 175 | } 176 | 177 | /// Removes the last character from the string buffer and returns it. 178 | /// 179 | /// Returns [`None`] if this `String` is empty. 180 | /// 181 | /// # Examples 182 | /// 183 | /// Basic usage: 184 | /// 185 | /// ``` 186 | /// # use generic_str::String32; 187 | /// let mut s = String32::from("foo"); 188 | /// 189 | /// assert_eq!(s.pop(), Some('o')); 190 | /// assert_eq!(s.pop(), Some('o')); 191 | /// assert_eq!(s.pop(), Some('f')); 192 | /// 193 | /// assert_eq!(s.pop(), None); 194 | /// ``` 195 | #[inline] 196 | pub fn pop(&mut self) -> Option { 197 | self.storage.try_pop() 198 | } 199 | 200 | /// Shortens this `String` to the specified length. 201 | /// 202 | /// If `new_len` is greater than the string's current length, this has no 203 | /// effect. 204 | /// 205 | /// Note that this method has no effect on the allocated capacity 206 | /// of the string 207 | /// 208 | /// # Panics 209 | /// 210 | /// Panics if `new_len` does not lie on a [`char`] boundary. 
211 | /// 212 | /// # Examples 213 | /// 214 | /// Basic usage: 215 | /// 216 | /// ``` 217 | /// # use generic_str::String32; 218 | /// let mut s = String32::from("hello"); 219 | /// 220 | /// s.truncate(2); 221 | /// 222 | /// assert_eq!(s, String32::from("he")); 223 | /// ``` 224 | #[inline] 225 | pub fn truncate(&mut self, new_len: usize) { 226 | self.storage.truncate(new_len) 227 | } 228 | 229 | /// Removes a [`char`] from this `String` at a byte position and returns it. 230 | /// 231 | /// This is an *O*(*n*) operation, as it requires copying every element in the 232 | /// buffer. 233 | /// 234 | /// # Panics 235 | /// 236 | /// Panics if `idx` is larger than or equal to the `String`'s length, 237 | /// or if it does not lie on a [`char`] boundary. 238 | /// 239 | /// # Examples 240 | /// 241 | /// Basic usage: 242 | /// 243 | /// ``` 244 | /// # use generic_str::String32; 245 | /// let mut s = String32::from("foo"); 246 | /// 247 | /// assert_eq!(s.remove(0), 'f'); 248 | /// assert_eq!(s.remove(1), 'o'); 249 | /// assert_eq!(s.remove(0), 'o'); 250 | /// ``` 251 | #[inline] 252 | pub fn remove(&mut self, idx: usize) -> char { 253 | self.storage.remove(idx) 254 | } 255 | 256 | /// Inserts a character into this `String` at a byte position. 257 | /// 258 | /// This is an *O*(*n*) operation as it requires copying every element in the 259 | /// buffer. 260 | /// 261 | /// # Panics 262 | /// 263 | /// Panics if `idx` is larger than the `String`'s length, or if it does not 264 | /// lie on a [`char`] boundary. 
265 | /// 266 | /// # Examples 267 | /// 268 | /// Basic usage: 269 | /// 270 | /// ``` 271 | /// # use generic_str::String32; 272 | /// let mut s = String32::with_capacity(3); 273 | /// 274 | /// s.insert(0, 'f'); 275 | /// s.insert(1, 'o'); 276 | /// s.insert(2, 'o'); 277 | /// 278 | /// assert_eq!(s, String32::from("foo")); 279 | /// ``` 280 | #[inline] 281 | pub fn insert(&mut self, idx: usize, ch: char) { 282 | self.storage.insert(idx, ch); 283 | } 284 | 285 | /// Returns a mutable reference to the contents of this `String`. 286 | /// 287 | /// # Examples 288 | /// 289 | /// Basic usage: 290 | /// 291 | /// ``` 292 | /// # use generic_str::String32; 293 | /// let mut s = String32::from("hello"); 294 | /// 295 | /// unsafe { 296 | /// let vec = s.as_mut_vec(); 297 | /// assert_eq!(&['h', 'e', 'l', 'l', 'o'][..], &vec[..]); 298 | /// 299 | /// vec.reverse(); 300 | /// } 301 | /// assert_eq!(s, String32::from("olleh")); 302 | /// ``` 303 | #[inline] 304 | pub fn as_mut_vec(&mut self) -> &mut GenericVec { 305 | &mut self.storage 306 | } 307 | 308 | /// Splits the string into two at the given byte index. 309 | /// 310 | /// Returns a newly allocated `String`. `self` contains bytes `[0, at)`, and 311 | /// the returned `String` contains bytes `[at, len)`. `at` must be on the 312 | /// boundary of a UTF-8 code point. 313 | /// 314 | /// Note that the capacity of `self` does not change. 315 | /// 316 | /// # Panics 317 | /// 318 | /// Panics if `at` is not on a `UTF-8` code point boundary, or if it is beyond the last 319 | /// code point of the string. 
320 | /// 321 | /// # Examples 322 | /// 323 | /// ``` 324 | /// # use generic_str::String32; 325 | /// # fn main() { 326 | /// let mut hello = String32::from("Hello, World!"); 327 | /// let world: String32 = hello.split_off(7); 328 | /// assert_eq!(hello, String32::from("Hello, ")); 329 | /// assert_eq!(world, String32::from("World!")); 330 | /// # } 331 | /// ``` 332 | #[inline] 333 | #[must_use = "use `.truncate()` if you don't need the other half"] 334 | pub fn split_off>( 335 | &mut self, 336 | at: usize, 337 | ) -> OwnedString { 338 | let other = self.storage.split_off(at); 339 | StringBase { storage: other } 340 | } 341 | 342 | /// Truncates this `String`, removing all contents. 343 | /// 344 | /// While this means the `String` will have a length of zero, it does not 345 | /// touch its capacity. 346 | /// 347 | /// # Examples 348 | /// 349 | /// Basic usage: 350 | /// 351 | /// ``` 352 | /// # use generic_str::String32; 353 | /// let mut s = String32::from("foo"); 354 | /// let cap = s.capacity(); 355 | /// 356 | /// s.clear(); 357 | /// 358 | /// assert!(s.is_empty()); 359 | /// assert_eq!(0, s.len()); 360 | /// assert_eq!(cap, s.capacity()); 361 | /// ``` 362 | #[inline] 363 | pub fn clear(&mut self) { 364 | self.storage.clear() 365 | } 366 | 367 | /// Returns this `String`'s capacity, in bytes. 
368 | /// 369 | /// # Examples 370 | /// 371 | /// Basic usage: 372 | /// 373 | /// ``` 374 | /// # use generic_str::String32; 375 | /// let s = String32::with_capacity(10); 376 | /// 377 | /// assert!(s.capacity() >= 10); 378 | /// ``` 379 | #[inline] 380 | pub fn capacity(&self) -> usize { 381 | self.storage.capacity() 382 | } 383 | } 384 | -------------------------------------------------------------------------------- /src/owned_utf8.rs: -------------------------------------------------------------------------------- 1 | use core::str::Utf8Error; 2 | use std::mem::MaybeUninit; 3 | 4 | use generic_vec::{ 5 | raw::{AllocResult, Storage, StorageWithCapacity}, 6 | ArrayVec, GenericVec, 7 | }; 8 | 9 | #[cfg(feature = "alloc")] 10 | use std::alloc::{Allocator, Global}; 11 | 12 | use crate::{string_base::StringBase, OwnedString}; 13 | 14 | /// Exactly the same as [`std::string::String`], except generic 15 | /// 16 | /// ``` 17 | /// # use generic_str::{str, String}; 18 | /// let mut s = String::new(); 19 | /// s.push_str("foobar".into()); 20 | /// assert_eq!(s, <&str>::from("foobar")); 21 | /// ``` 22 | #[cfg(feature = "alloc")] 23 | pub type String = OwnedString], A>>; 24 | 25 | /// Same API as [`String`] but without any re-allocation. Can only hold up to `N` bytes 26 | /// 27 | /// ``` 28 | /// # use generic_str::{str, ArrayString}; 29 | /// let mut s = ArrayString::<8>::new(); 30 | /// assert_eq!(std::mem::size_of_val(&s), 8 + 8); // 8 bytes of storage, 8 bytes for length 31 | /// 32 | /// s.push_str("foo".into()); 33 | /// let t = s.clone(); // cloning requires no heap allocations 34 | /// s.push_str("bar".into()); 35 | /// 36 | /// assert_eq!(t, <&str>::from("foo")); 37 | /// assert_eq!(s, <&str>::from("foobar")); 38 | /// ``` 39 | pub type ArrayString = OwnedString; N]>; 40 | 41 | #[cfg(feature = "alloc")] 42 | impl String { 43 | /// Creates a new empty `String`. 
44 | /// 45 | /// Given that the `String` is empty, this will not allocate any initial 46 | /// buffer. While that means that this initial operation is very 47 | /// inexpensive, it may cause excessive allocation later when you add 48 | /// data. If you have an idea of how much data the `String` will hold, 49 | /// consider the [`with_capacity`] method to prevent excessive 50 | /// re-allocation. 51 | /// 52 | /// [`with_capacity`]: String::with_capacity 53 | /// 54 | /// # Examples 55 | /// 56 | /// Basic usage: 57 | /// 58 | /// ``` 59 | /// # use generic_str::String; 60 | /// let s = String::new(); 61 | /// ``` 62 | #[inline] 63 | pub fn new() -> Self { 64 | Self::with_storage(Box::default()) 65 | } 66 | 67 | /// Creates a new empty `String` with a particular capacity. 68 | /// 69 | /// `String`s have an internal buffer to hold their data. The capacity is 70 | /// the length of that buffer, and can be queried with the [`capacity`] 71 | /// method. This method creates an empty `String`, but one with an initial 72 | /// buffer that can hold `capacity` bytes. This is useful when you may be 73 | /// appending a bunch of data to the `String`, reducing the number of 74 | /// reallocations it needs to do. 75 | /// 76 | /// [`capacity`]: StringBase::capacity 77 | /// 78 | /// If the given capacity is `0`, no allocation will occur, and this method 79 | /// is identical to the [`new`] method. 80 | /// 81 | /// [`new`]: StringBase::new 82 | /// 83 | /// # Examples 84 | /// 85 | /// Basic usage: 86 | /// 87 | /// ``` 88 | /// # use generic_str::String; 89 | /// let mut s = String::with_capacity(10); 90 | /// 91 | /// // The String contains no chars, even though it has capacity for more 92 | /// assert_eq!(s.len(), 0); 93 | /// 94 | /// // These are all done without reallocating... 
95 | /// let cap = s.capacity(); 96 | /// for _ in 0..10 { 97 | /// s.push('a'); 98 | /// } 99 | /// 100 | /// assert_eq!(s.capacity(), cap); 101 | /// 102 | /// // ...but this may make the string reallocate 103 | /// s.push('a'); 104 | /// ``` 105 | #[inline] 106 | pub fn with_capacity(capacity: usize) -> Self { 107 | Self::new_with_capacity(capacity) 108 | } 109 | } 110 | 111 | #[cfg(feature = "alloc")] 112 | impl String { 113 | pub fn with_alloc(alloc: A) -> Self { 114 | Self::with_storage(Box::new_uninit_slice_in(0, alloc)) 115 | } 116 | } 117 | 118 | impl ArrayString { 119 | /// Creates a new empty `ArrayString`. 120 | /// 121 | /// # Examples 122 | /// 123 | /// Basic usage: 124 | /// 125 | /// ``` 126 | /// # use generic_str::ArrayString; 127 | /// let s = ArrayString::<8>::new(); 128 | /// ``` 129 | #[inline] 130 | pub fn new() -> Self { 131 | Self { 132 | storage: ArrayVec::new(), 133 | } 134 | } 135 | } 136 | 137 | #[derive(PartialEq, Eq)] 138 | pub struct FromUtf8Error> { 139 | bytes: GenericVec, 140 | error: Utf8Error, 141 | } 142 | 143 | use core::fmt; 144 | impl> fmt::Debug for FromUtf8Error { 145 | fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { 146 | f.debug_struct("FromUtf8Error") 147 | .field("bytes", &self.bytes) 148 | .field("error", &self.error) 149 | .finish() 150 | } 151 | } 152 | 153 | impl> OwnedString { 154 | /// Converts a vector of bytes to a `String`. 155 | /// 156 | /// A string ([`String`]) is made of bytes ([`u8`]), and a vector of bytes 157 | /// ([`Vec`]) is made of bytes, so this function converts between the 158 | /// two. Not all byte slices are valid `String`s, however: `String` 159 | /// requires that it is valid UTF-8. `from_utf8()` checks to ensure that 160 | /// the bytes are valid UTF-8, and then does the conversion. 
161 | /// 162 | /// If you are sure that the byte slice is valid UTF-8, and you don't want 163 | /// to incur the overhead of the validity check, there is an unsafe version 164 | /// of this function, [`from_utf8_unchecked`], which has the same behavior 165 | /// but skips the check. 166 | /// 167 | /// This method will take care to not copy the vector, for efficiency's 168 | /// sake. 169 | /// 170 | /// If you need a [`&str`] instead of a `String`, consider 171 | /// [`from_utf8`]. 172 | /// 173 | /// [`from_utf8`]: crate::from_utf8 174 | /// 175 | /// The inverse of this method is [`into_bytes`]. 176 | /// 177 | /// # Errors 178 | /// 179 | /// Returns [`Err`] if the slice is not UTF-8 with a description as to why the 180 | /// provided bytes are not UTF-8. The vector you moved in is also included. 181 | /// 182 | /// # Examples 183 | /// 184 | /// Basic usage: 185 | /// 186 | /// ``` 187 | /// # use generic_str::{str, String}; 188 | /// // some bytes, in a vector 189 | /// let sparkle_heart = vec![240, 159, 146, 150]; 190 | /// 191 | /// // We know these bytes are valid, so we'll use `unwrap()`. 192 | /// let sparkle_heart = String::from_utf8(sparkle_heart.into()).unwrap(); 193 | /// 194 | /// assert_eq!(sparkle_heart, <&str>::from("💖")); 195 | /// ``` 196 | /// 197 | /// Incorrect bytes: 198 | /// 199 | /// ``` 200 | /// # use generic_str::String; 201 | /// // some invalid bytes, in a vector 202 | /// let sparkle_heart = vec![0, 159, 146, 150]; 203 | /// 204 | /// assert!(String::from_utf8(sparkle_heart.into()).is_err()); 205 | /// ``` 206 | /// 207 | /// See the docs for [`FromUtf8Error`] for more details on what you can do 208 | /// with this error. 
209 | /// 210 | /// [`from_utf8_unchecked`]: StringBase::from_utf8_unchecked 211 | /// [`Vec`]: std::vec::Vec 212 | /// [`&str`]: prim@str 213 | /// [`into_bytes`]: StringBase::into_bytes 214 | #[inline] 215 | pub fn from_utf8(vec: GenericVec) -> Result> 216 | where 217 | S: Sized, 218 | { 219 | match core::str::from_utf8(&vec) { 220 | Ok(..) => Ok(Self { storage: vec }), 221 | Err(e) => Err(FromUtf8Error { 222 | bytes: vec, 223 | error: e, 224 | }), 225 | } 226 | } 227 | /// Converts a vector of bytes to a `String` without checking that the 228 | /// string contains valid UTF-8. 229 | /// 230 | /// See the safe version, [`from_utf8`], for more details. 231 | /// 232 | /// [`from_utf8`]: StringBase::from_utf8 233 | /// 234 | /// # Safety 235 | /// 236 | /// This function is unsafe because it does not check that the bytes passed 237 | /// to it are valid UTF-8. If this constraint is violated, it may cause 238 | /// memory unsafety issues with future users of the `String`, as the rest of 239 | /// the standard library assumes that `String`s are valid UTF-8. 240 | /// 241 | /// # Examples 242 | /// 243 | /// Basic usage: 244 | /// 245 | /// ``` 246 | /// # use generic_str::{str, String}; 247 | /// // some bytes, in a vector 248 | /// let sparkle_heart = vec![240, 159, 146, 150]; 249 | /// 250 | /// let sparkle_heart = unsafe { 251 | /// String::from_utf8_unchecked(sparkle_heart.into()) 252 | /// }; 253 | /// 254 | /// assert_eq!(sparkle_heart, <&str>::from("💖")); 255 | /// ``` 256 | #[inline] 257 | pub unsafe fn from_utf8_unchecked(vec: GenericVec) -> Self 258 | where 259 | S: Sized, 260 | { 261 | Self { storage: vec } 262 | } 263 | /// Converts a `String` into a byte vector. 264 | /// 265 | /// This consumes the `String`, so we do not need to copy its contents. 
266 | /// 267 | /// # Examples 268 | /// 269 | /// Basic usage: 270 | /// 271 | /// ``` 272 | /// # use generic_str::String; 273 | /// let s = String::from("hello"); 274 | /// let bytes = s.into_bytes(); 275 | /// 276 | /// assert_eq!(&[104, 101, 108, 108, 111][..], &bytes[..]); 277 | /// ``` 278 | #[inline] 279 | pub fn into_bytes(self) -> GenericVec 280 | where 281 | S: Sized, 282 | { 283 | self.storage 284 | } 285 | /// Extracts a string slice containing the entire `String`. 286 | /// 287 | /// # Examples 288 | /// 289 | /// Basic usage: 290 | /// 291 | /// ``` 292 | /// # use generic_str::{str, String}; 293 | /// let s = String::from("foo"); 294 | /// 295 | /// assert_eq!(s.as_str(), <&str>::from("foo")); 296 | /// ``` 297 | #[inline] 298 | pub fn as_str(&self) -> &crate::str { 299 | self 300 | } 301 | /// Converts a `String` into a mutable string slice. 302 | /// 303 | /// # Examples 304 | /// 305 | /// Basic usage: 306 | /// 307 | /// ``` 308 | /// # use generic_str::{str, String}; 309 | /// let mut s = String::from("foobar"); 310 | /// let s_mut_str = s.as_mut_str(); 311 | /// 312 | /// s_mut_str.make_ascii_uppercase(); 313 | /// 314 | /// assert_eq!(s_mut_str, <&str>::from("FOOBAR")); 315 | /// ``` 316 | #[inline] 317 | pub fn as_mut_str(&mut self) -> &mut crate::str { 318 | self 319 | } 320 | /// Appends a given string slice onto the end of this `String`. 321 | /// 322 | /// # Examples 323 | /// 324 | /// Basic usage: 325 | /// 326 | /// ``` 327 | /// # use generic_str::{str, String}; 328 | /// let mut s = String::from("foo"); 329 | /// 330 | /// s.push_str("bar".into()); 331 | /// 332 | /// assert_eq!(s, <&str>::from("foobar")); 333 | /// ``` 334 | #[inline] 335 | pub fn push_str(&mut self, string: &crate::str) { 336 | self.storage.extend_from_slice(&string.storage) 337 | } 338 | /// Ensures that this `String`'s capacity is at least `additional` bytes 339 | /// larger than its length. 
340 | /// 341 | /// The capacity may be increased by more than `additional` bytes 342 | /// to prevent frequent reallocations. 343 | /// 344 | /// # Panics 345 | /// 346 | /// Panics if the new capacity overflows [`usize`]. 347 | /// 348 | /// # Examples 349 | /// 350 | /// Basic usage: 351 | /// 352 | /// ``` 353 | /// # use generic_str::String; 354 | /// let mut s = String::new(); 355 | /// 356 | /// s.reserve(10); 357 | /// 358 | /// assert!(s.capacity() >= 10); 359 | /// ``` 360 | /// 361 | /// This may not actually increase the capacity: 362 | /// 363 | /// ``` 364 | /// # use generic_str::String; 365 | /// let mut s = String::with_capacity(10); 366 | /// s.push('a'); 367 | /// s.push('b'); 368 | /// 369 | /// // s now has a length of 2 and a capacity of 10 370 | /// assert_eq!(2, s.len()); 371 | /// assert_eq!(10, s.capacity()); 372 | /// 373 | /// // Since we already have an extra 8 capacity, calling this... 374 | /// s.reserve(8); 375 | /// 376 | /// // ... doesn't actually increase. 377 | /// assert_eq!(10, s.capacity()); 378 | /// ``` 379 | #[inline] 380 | pub fn reserve(&mut self, additional: usize) { 381 | self.storage.reserve(additional) 382 | } 383 | /// Tries to reserve capacity for at least `additional` more bytes to be inserted 384 | /// in the given `String`. The collection may reserve more space to avoid 385 | /// frequent reallocations. After a successful `try_reserve`, capacity will be 386 | /// greater than or equal to `self.len() + additional`. Does nothing if 387 | /// capacity is already sufficient. 388 | /// 389 | /// # Errors 390 | /// 391 | /// If the capacity overflows, or the allocator reports a failure, then an error 392 | /// is returned. 393 | pub fn try_reserve(&mut self, additional: usize) -> AllocResult { 394 | self.storage.try_reserve(additional) 395 | } 396 | /// Appends the given [`char`] to the end of this `String`.
397 | /// 398 | /// # Examples 399 | /// 400 | /// Basic usage: 401 | /// 402 | /// ``` 403 | /// # use generic_str::{str, String}; 404 | /// let mut s = String::from("abc"); 405 | /// 406 | /// s.push('1'); 407 | /// s.push('2'); 408 | /// s.push('3'); 409 | /// 410 | /// assert_eq!(s, <&str>::from("abc123")); 411 | /// ``` 412 | #[inline] 413 | pub fn push(&mut self, ch: char) { 414 | match ch.len_utf8() { 415 | 1 => { 416 | self.storage.push(ch as u8); 417 | } 418 | _ => self 419 | .storage 420 | .extend_from_slice(ch.encode_utf8(&mut [0; 4]).as_bytes()), 421 | } 422 | } 423 | 424 | /// Removes the last character from the string buffer and returns it. 425 | /// 426 | /// Returns [`None`] if this `String` is empty. 427 | /// 428 | /// # Examples 429 | /// 430 | /// Basic usage: 431 | /// 432 | /// ``` 433 | /// # use generic_str::String; 434 | /// let mut s = String::from("foo"); 435 | /// 436 | /// assert_eq!(s.pop(), Some('o')); 437 | /// assert_eq!(s.pop(), Some('o')); 438 | /// assert_eq!(s.pop(), Some('f')); 439 | /// 440 | /// assert_eq!(s.pop(), None); 441 | /// ``` 442 | #[inline] 443 | pub fn pop(&mut self) -> Option<char> { 444 | let ch = self.chars().rev().next()?; 445 | let newlen = self.len() - ch.len_utf8(); 446 | unsafe { 447 | self.storage.set_len_unchecked(newlen); 448 | } 449 | Some(ch) 450 | } 451 | 452 | /// Shortens this `String` to the specified length. 453 | /// 454 | /// If `new_len` is greater than the string's current length, this has no 455 | /// effect. 456 | /// 457 | /// Note that this method has no effect on the allocated capacity 458 | /// of the string. 459 | /// 460 | /// # Panics 461 | /// 462 | /// Panics if `new_len` does not lie on a [`char`] boundary.
463 | /// 464 | /// # Examples 465 | /// 466 | /// Basic usage: 467 | /// 468 | /// ``` 469 | /// # use generic_str::{str, String}; 470 | /// let mut s = String::from("hello"); 471 | /// 472 | /// s.truncate(2); 473 | /// 474 | /// assert_eq!(s, <&str>::from("he")); 475 | /// ``` 476 | #[inline] 477 | pub fn truncate(&mut self, new_len: usize) { 478 | if new_len <= self.len() { 479 | assert!(self.is_char_boundary(new_len)); 480 | self.storage.truncate(new_len) 481 | } 482 | } 483 | 484 | /// Removes a [`char`] from this `String` at a byte position and returns it. 485 | /// 486 | /// This is an *O*(*n*) operation, as it requires copying every element in the 487 | /// buffer. 488 | /// 489 | /// # Panics 490 | /// 491 | /// Panics if `idx` is larger than or equal to the `String`'s length, 492 | /// or if it does not lie on a [`char`] boundary. 493 | /// 494 | /// # Examples 495 | /// 496 | /// Basic usage: 497 | /// 498 | /// ``` 499 | /// # use generic_str::String; 500 | /// let mut s = String::from("foo"); 501 | /// 502 | /// assert_eq!(s.remove(0), 'f'); 503 | /// assert_eq!(s.remove(1), 'o'); 504 | /// assert_eq!(s.remove(0), 'o'); 505 | /// ``` 506 | #[inline] 507 | pub fn remove(&mut self, idx: usize) -> char { 508 | let ch = match self[idx..].chars().next() { 509 | Some(ch) => ch, 510 | None => panic!("cannot remove a char from the end of a string"), 511 | }; 512 | 513 | let next = idx + ch.len_utf8(); 514 | let len = self.len(); 515 | unsafe { 516 | core::ptr::copy( 517 | self.storage.as_ptr().add(next), 518 | self.storage.as_mut_ptr().add(idx), 519 | len - next, 520 | ); 521 | self.storage.set_len_unchecked(len - (next - idx)); 522 | } 523 | ch 524 | } 525 | 526 | /// Inserts a character into this `String` at a byte position. 527 | /// 528 | /// This is an *O*(*n*) operation as it requires copying every element in the 529 | /// buffer. 
530 | /// 531 | /// # Panics 532 | /// 533 | /// Panics if `idx` is larger than the `String`'s length, or if it does not 534 | /// lie on a [`char`] boundary. 535 | /// 536 | /// # Examples 537 | /// 538 | /// Basic usage: 539 | /// 540 | /// ``` 541 | /// # use generic_str::{str, String}; 542 | /// let mut s = String::with_capacity(3); 543 | /// 544 | /// s.insert(0, 'f'); 545 | /// s.insert(1, 'o'); 546 | /// s.insert(2, 'o'); 547 | /// 548 | /// assert_eq!(s, <&str>::from("foo")); 549 | /// ``` 550 | #[inline] 551 | pub fn insert(&mut self, idx: usize, ch: char) { 552 | assert!(self.is_char_boundary(idx)); 553 | let mut bits = [0; 4]; 554 | let bits = ch.encode_utf8(&mut bits).as_bytes(); 555 | 556 | unsafe { 557 | self.insert_bytes(idx, bits); 558 | } 559 | } 560 | 561 | unsafe fn insert_bytes(&mut self, idx: usize, bytes: &[u8]) { 562 | let len = self.len(); 563 | let amt = bytes.len(); 564 | self.storage.reserve(amt); 565 | 566 | core::ptr::copy( 567 | self.storage.as_ptr().add(idx), 568 | self.storage.as_mut_ptr().add(idx + amt), 569 | len - idx, 570 | ); 571 | core::ptr::copy(bytes.as_ptr(), self.storage.as_mut_ptr().add(idx), amt); 572 | self.storage.set_len_unchecked(len + amt); 573 | } 574 | 575 | /// Inserts a string slice into this `String` at a byte position. 576 | /// 577 | /// This is an *O*(*n*) operation as it requires copying every element in the 578 | /// buffer. 579 | /// 580 | /// # Panics 581 | /// 582 | /// Panics if `idx` is larger than the `String`'s length, or if it does not 583 | /// lie on a [`char`] boundary. 
584 | /// 585 | /// # Examples 586 | /// 587 | /// Basic usage: 588 | /// 589 | /// ``` 590 | /// # use generic_str::{str, String}; 591 | /// let mut s = String::from("bar"); 592 | /// 593 | /// s.insert_str(0, "foo"); 594 | /// 595 | /// assert_eq!(s, <&str>::from("foobar")); 596 | /// ``` 597 | #[inline] 598 | pub fn insert_str(&mut self, idx: usize, string: &str) { 599 | assert!(self.is_char_boundary(idx)); 600 | 601 | unsafe { 602 | self.insert_bytes(idx, string.as_bytes()); 603 | } 604 | } 605 | 606 | /// Returns a mutable reference to the contents of this `String`. 607 | /// 608 | /// # Safety 609 | /// 610 | /// This function is unsafe because it does not check that the bytes passed 611 | /// to it are valid UTF-8. If this constraint is violated, it may cause 612 | /// memory unsafety issues with future users of the `String`, as the rest of 613 | /// the standard library assumes that `String`s are valid UTF-8. 614 | /// 615 | /// # Examples 616 | /// 617 | /// Basic usage: 618 | /// 619 | /// ``` 620 | /// # use generic_str::{str, String}; 621 | /// let mut s = String::from("hello"); 622 | /// 623 | /// unsafe { 624 | /// let vec = s.as_mut_vec(); 625 | /// assert_eq!(&[104, 101, 108, 108, 111][..], &vec[..]); 626 | /// 627 | /// vec.reverse(); 628 | /// } 629 | /// assert_eq!(s, <&str>::from("olleh")); 630 | /// ``` 631 | #[inline] 632 | pub unsafe fn as_mut_vec(&mut self) -> &mut GenericVec { 633 | &mut self.storage 634 | } 635 | 636 | /// Splits the string into two at the given byte index. 637 | /// 638 | /// Returns a newly allocated `String`. `self` contains bytes `[0, at)`, and 639 | /// the returned `String` contains bytes `[at, len)`. `at` must be on the 640 | /// boundary of a UTF-8 code point. 641 | /// 642 | /// Note that the capacity of `self` does not change. 643 | /// 644 | /// # Panics 645 | /// 646 | /// Panics if `at` is not on a `UTF-8` code point boundary, or if it is beyond the last 647 | /// code point of the string. 
648 | /// 649 | /// # Examples 650 | /// 651 | /// ``` 652 | /// # use generic_str::{str, String}; 653 | /// # fn main() { 654 | /// let mut hello = String::from("Hello, World!"); 655 | /// let world: String = hello.split_off(7); 656 | /// assert_eq!(hello, <&str>::from("Hello, ")); 657 | /// assert_eq!(world, <&str>::from("World!")); 658 | /// # } 659 | /// ``` 660 | #[inline] 661 | #[must_use = "use `.truncate()` if you don't need the other half"] 662 | pub fn split_off>( 663 | &mut self, 664 | at: usize, 665 | ) -> StringBase> { 666 | assert!(self.is_char_boundary(at)); 667 | let other = self.storage.split_off(at); 668 | unsafe { StringBase::from_utf8_unchecked(other) } 669 | } 670 | 671 | /// Truncates this `String`, removing all contents. 672 | /// 673 | /// While this means the `String` will have a length of zero, it does not 674 | /// touch its capacity. 675 | /// 676 | /// # Examples 677 | /// 678 | /// Basic usage: 679 | /// 680 | /// ``` 681 | /// # use generic_str::String; 682 | /// let mut s = String::from("foo"); 683 | /// 684 | /// s.clear(); 685 | /// 686 | /// assert!(s.is_empty()); 687 | /// assert_eq!(0, s.len()); 688 | /// assert_eq!(3, s.capacity()); 689 | /// ``` 690 | #[inline] 691 | pub fn clear(&mut self) { 692 | self.storage.clear() 693 | } 694 | 695 | /// Returns this `String`'s capacity, in bytes. 
696 | /// 697 | /// # Examples 698 | /// 699 | /// Basic usage: 700 | /// 701 | /// ``` 702 | /// # use generic_str::String; 703 | /// let s = String::with_capacity(10); 704 | /// 705 | /// assert!(s.capacity() >= 10); 706 | /// ``` 707 | #[inline] 708 | pub fn capacity(&self) -> usize { 709 | self.storage.capacity() 710 | } 711 | } 712 | -------------------------------------------------------------------------------- /src/slice_utf32.rs: -------------------------------------------------------------------------------- 1 | use core::slice::SliceIndex; 2 | 3 | use crate::StringSlice; 4 | 5 | #[allow(non_camel_case_types)] 6 | /// Like [`std::primitive::str`], but UTF-32: backed by a slice of [`char`]s 7 | pub type str32 = StringSlice<char>; 8 | 9 | impl str32 { 10 | /// Returns the length of `self`. 11 | /// 12 | /// This length is in [`char`]s, not bytes or graphemes. In other words, 13 | /// it may not be what a human considers the length of the string. 14 | /// 15 | /// [`char`]: prim@char 16 | /// 17 | /// # Examples 18 | /// 19 | /// Basic usage: 20 | /// 21 | /// ``` 22 | /// # use generic_str::String32; 23 | /// assert_eq!(String32::from("foo").len(), 3); 24 | /// assert_eq!(String32::from("ƒoo").len(), 3); // fancy f! 25 | /// ``` 26 | #[inline] 27 | pub fn len(&self) -> usize { 28 | self.storage.as_ref().len() 29 | } 30 | 31 | /// Returns `true` if `self` has a length of zero [`char`]s. 32 | /// 33 | /// # Examples 34 | /// 35 | /// Basic usage: 36 | /// 37 | /// ``` 38 | /// # use generic_str::String32; 39 | /// let s = String32::from(""); 40 | /// assert!(s.is_empty()); 41 | /// 42 | /// let s = String32::from("not empty"); 43 | /// assert!(!s.is_empty()); 44 | /// ``` 45 | #[inline] 46 | pub fn is_empty(&self) -> bool { 47 | self.storage.is_empty() 48 | } 49 | 50 | /// Converts a string slice to a raw pointer. 51 | /// 52 | /// As UTF-32 string slices are a slice of [`char`]s, the raw pointer points to a 53 | /// [`char`]. This pointer will be pointing to the first [`char`] of the string 54 | /// slice.
55 | /// 56 | /// The caller must ensure that the returned pointer is never written to. 57 | /// If you need to mutate the contents of the string slice, use [`as_mut_ptr`]. 58 | /// 59 | /// [`as_mut_ptr`]: str32::as_mut_ptr 60 | /// 61 | /// # Examples 62 | /// 63 | /// Basic usage: 64 | /// 65 | /// ``` 66 | /// # use generic_str::String32; 67 | /// let s = String32::from("Hello"); 68 | /// let ptr = s.as_ptr(); 69 | /// ``` 70 | #[inline] 71 | pub fn as_ptr(&self) -> *const char { 72 | self.storage.as_ref() as *const [char] as *const char 73 | } 74 | 75 | /// Converts a mutable string slice to a raw pointer. 76 | /// 77 | /// As UTF-32 string slices are a slice of [`char`]s, the raw pointer points to a 78 | /// [`char`]. This pointer will be pointing to the first [`char`] of the string 79 | /// slice. 80 | #[inline] 81 | pub fn as_mut_ptr(&mut self) -> *mut char { 82 | self.storage.as_mut() as *mut [char] as *mut char 83 | } 84 | 85 | /// Converts a slice of [`char`]s to a string slice. 86 | /// 87 | /// Every [`char`] is a valid Unicode scalar value, so any `[char]` is 88 | /// already a valid UTF-32 string and this conversion needs no 89 | /// validation. 90 | #[inline] 91 | pub fn from_slice(data: &[char]) -> &Self { 92 | unsafe { core::mem::transmute(data) } 93 | } 94 | 95 | /// Converts a mutable slice of [`char`]s to a mutable string slice. 96 | /// 97 | /// As with `from_slice`, this conversion needs no validation, 98 | /// because every [`char`] is already a valid Unicode scalar 99 | /// value. 100 | #[inline] 101 | pub fn from_slice_mut(data: &mut [char]) -> &mut Self { 102 | unsafe { core::mem::transmute(data) } 103 | } 104 | 105 | /// Returns a subslice of `str32`. 106 | /// 107 | /// This is the non-panicking alternative to indexing the `str32`. Returns 108 | /// [`None`] whenever the equivalent indexing operation would panic.
109 | /// 110 | /// # Examples 111 | /// 112 | /// ``` 113 | /// # use generic_str::{str, String32}; 114 | /// let v = String32::from("🗻∈🌏"); 115 | /// 116 | /// assert_eq!(v.get(0..2).unwrap().to_owned(), String32::from("🗻∈")); 117 | /// 118 | /// // out of bounds 119 | /// assert!(v.get(..4).is_none()); 120 | /// ``` 121 | #[inline] 122 | pub fn get<I: SliceIndex<Self>>(&self, i: I) -> Option<&I::Output> { 123 | i.get(self.as_ref()) 124 | } 125 | 126 | /// Returns a mutable subslice of `str32`. 127 | /// 128 | /// This is the non-panicking alternative to indexing the `str32`. Returns 129 | /// [`None`] whenever the equivalent indexing operation would panic. 130 | /// 131 | /// # Examples 132 | /// 133 | /// ``` 134 | /// # use generic_str::{str, String32}; 135 | /// let mut v = String32::from("hello"); 136 | /// // correct length 137 | /// assert!(v.get_mut(0..5).is_some()); 138 | /// // out of bounds 139 | /// assert!(v.get_mut(..42).is_none()); 140 | /// 141 | /// { 142 | /// let s = v.get_mut(0..2); 143 | /// let s = s.map(|s| { 144 | /// s.make_ascii_uppercase(); 145 | /// &*s 146 | /// }); 147 | /// } 148 | /// assert_eq!(v, String32::from("HEllo")); 149 | /// ``` 150 | #[inline] 151 | pub fn get_mut<I: SliceIndex<Self>>(&mut self, i: I) -> Option<&mut I::Output> { 152 | i.get_mut(self.as_mut()) 153 | } 154 | 155 | /// Returns an unchecked subslice of `str32`. 156 | /// 157 | /// This is the unchecked alternative to indexing the `str32`. 158 | /// 159 | /// # Safety 160 | /// 161 | /// Callers of this function must ensure that these preconditions are 162 | /// satisfied: 163 | /// 164 | /// * The starting index must not exceed the ending index; 165 | /// * Indexes must be within bounds of the original slice. 166 | /// 167 | /// Indexes count [`char`]s, so there are no UTF-8 boundaries to respect. 168 | /// Failing these preconditions, the returned string slice may reference 169 | /// invalid memory.
170 | /// 171 | /// # Examples 172 | /// 173 | /// ``` 174 | /// # use generic_str::String32; 175 | /// let v = String32::from("🗻∈🌏"); 176 | /// unsafe { 177 | /// assert_eq!(*v.get_unchecked(0..1), String32::from("🗻")); 178 | /// assert_eq!(*v.get_unchecked(1..2), String32::from("∈")); 179 | /// assert_eq!(*v.get_unchecked(2..3), String32::from("🌏")); 180 | /// } 181 | /// ``` 182 | #[inline] 183 | pub unsafe fn get_unchecked<I: SliceIndex<Self>>(&self, i: I) -> &I::Output { 184 | // SAFETY: the caller must uphold the safety contract for `get_unchecked`; 185 | // the slice is dereferenceable because `self` is a safe reference. 186 | // The returned pointer is safe because impls of `SliceIndex` have to guarantee that it is. 187 | &*i.get_unchecked(self) 188 | } 189 | 190 | /// Returns a mutable, unchecked subslice of `str32`. 191 | /// 192 | /// This is the unchecked alternative to indexing the `str32`. 193 | /// 194 | /// # Safety 195 | /// 196 | /// Callers of this function must ensure that these preconditions are 197 | /// satisfied: 198 | /// 199 | /// * The starting index must not exceed the ending index; 200 | /// * Indexes must be within bounds of the original slice. 201 | /// 202 | /// Indexes count [`char`]s, so there are no UTF-8 boundaries to respect. 203 | /// Failing these preconditions, the returned string slice may reference 204 | /// invalid memory. 205 | /// 206 | /// # Examples 207 | /// 208 | /// ``` 209 | /// # use generic_str::String32; 210 | /// let mut v = String32::from("🗻∈🌏"); 211 | /// unsafe { 212 | /// assert_eq!(*v.get_unchecked_mut(0..2), String32::from("🗻∈")); 213 | /// } 214 | /// ``` 215 | #[inline] 216 | pub unsafe fn get_unchecked_mut<I: SliceIndex<Self>>(&mut self, i: I) -> &mut I::Output { 217 | // SAFETY: the caller must uphold the safety contract for `get_unchecked_mut`; 218 | // the slice is dereferenceable because `self` is a safe reference. 219 | // The returned pointer is safe because impls of `SliceIndex` have to guarantee that it is.
220 | &mut *i.get_unchecked_mut(self) 221 | } 222 | 223 | /// Divide one string slice into two at an index. 224 | /// 225 | /// The two slices returned go from the start of the string slice to `mid`, 226 | /// and from `mid` to the end of the string slice. 227 | /// 228 | /// To get mutable string slices instead, see the [`split_at_mut`] 229 | /// method. 230 | /// 231 | /// [`split_at_mut`]: str32::split_at_mut 232 | /// 233 | /// # Panics 234 | /// 235 | /// Panics if `mid` is past the end of the last code point of the string slice. 236 | /// 237 | /// # Examples 238 | /// 239 | /// Basic usage: 240 | /// 241 | /// ``` 242 | /// # use generic_str::String32; 243 | /// let s = String32::from("Per Martin-Löf"); 244 | /// 245 | /// let (first, last) = s.split_at(3); 246 | /// 247 | /// assert_eq!(first.to_owned(), String32::from("Per")); 248 | /// assert_eq!(last.to_owned(), String32::from(" Martin-Löf")); 249 | /// ``` 250 | #[inline] 251 | pub fn split_at(&self, mid: usize) -> (&Self, &Self) { 252 | if mid <= self.len() { 253 | unsafe { 254 | ( 255 | self.get_unchecked(0..mid), 256 | self.get_unchecked(mid..self.len()), 257 | ) 258 | } 259 | } else { 260 | #[cfg(feature = "alloc")] 261 | panic!("char index {} is out of bounds of `{}`", mid, self); 262 | 263 | #[cfg(not(feature = "alloc"))] 264 | panic!("char index {} is out of bounds", mid); 265 | } 266 | } 267 | 268 | /// Divide one mutable string slice into two at an index. 269 | /// 270 | /// The argument, `mid`, is a [`char`] offset from the start of the 271 | /// string, so it can never fall inside a character. 272 | /// 273 | /// The two slices returned go from the start of the string slice to `mid`, 274 | /// and from `mid` to the end of the string slice. 275 | /// 276 | /// To get immutable string slices instead, see the [`split_at`] method.
277 | /// 278 | /// [`split_at`]: str32::split_at 279 | /// 280 | /// # Panics 281 | /// 282 | /// Panics if `mid` is past the end of the last code point of the string 283 | /// slice. 284 | /// 285 | /// # Examples 286 | /// 287 | /// Basic usage: 288 | /// 289 | /// ``` 290 | /// # use generic_str::String32; 291 | /// let mut s = String32::from("Per Martin-Löf"); 292 | /// { 293 | /// let (first, last) = s.split_at_mut(3); 294 | /// first.make_ascii_uppercase(); 295 | /// assert_eq!(first.to_owned(), String32::from("PER")); 296 | /// assert_eq!(last.to_owned(), String32::from(" Martin-Löf")); 297 | /// } 298 | /// assert_eq!(s, String32::from("PER Martin-Löf")); 299 | /// ``` 300 | #[inline] 301 | pub fn split_at_mut(&mut self, mid: usize) -> (&mut Self, &mut Self) { 302 | // `mid` may equal `len`, which yields an empty second half, as in `split_at` 303 | if mid <= self.len() { 304 | let len = self.len(); 305 | let ptr = self.as_mut_ptr(); 306 | // SAFETY: just checked that `mid <= len`, so both halves are in bounds and do not overlap. 307 | unsafe { 308 | ( 309 | Self::from_slice_mut(core::slice::from_raw_parts_mut(ptr, mid)), 310 | Self::from_slice_mut(core::slice::from_raw_parts_mut(ptr.add(mid), len - mid)), 311 | ) 312 | } 313 | } else { 314 | #[cfg(feature = "alloc")] 315 | panic!("char index {} is out of bounds of `{}`", mid, self); 316 | 317 | #[cfg(not(feature = "alloc"))] 318 | panic!("char index {} is out of bounds", mid); 319 | } 320 | } 321 | 322 | /// Converts this string to its ASCII upper case equivalent in-place. 323 | /// 324 | /// ASCII letters 'a' to 'z' are mapped to 'A' to 'Z', 325 | /// but non-ASCII letters are unchanged. 326 | /// 327 | /// To return a new uppercased value without modifying the existing one, use 328 | /// [`to_ascii_uppercase()`].
329 | /// 330 | /// [`to_ascii_uppercase()`]: #method.to_ascii_uppercase 331 | /// 332 | /// # Examples 333 | /// 334 | /// ``` 335 | /// # use generic_str::String32; 336 | /// let mut s = String32::from("Grüße, Jürgen ❤"); 337 | /// 338 | /// s.make_ascii_uppercase(); 339 | /// 340 | /// assert_eq!(s, String32::from("GRüßE, JüRGEN ❤")); 341 | /// ``` 342 | #[inline] 343 | pub fn make_ascii_uppercase(&mut self) { 344 | self.storage.iter_mut().for_each(char::make_ascii_uppercase) 345 | } 346 | 347 | /// Converts this string to its ASCII lower case equivalent in-place. 348 | /// 349 | /// ASCII letters 'A' to 'Z' are mapped to 'a' to 'z', 350 | /// but non-ASCII letters are unchanged. 351 | /// 352 | /// To return a new lowercased value without modifying the existing one, use 353 | /// [`to_ascii_lowercase()`]. 354 | /// 355 | /// [`to_ascii_lowercase()`]: #method.to_ascii_lowercase 356 | /// 357 | /// # Examples 358 | /// 359 | /// ``` 360 | /// # use generic_str::String32; 361 | /// let mut s = String32::from("GRÜßE, JÜRGEN ❤"); 362 | /// 363 | /// s.make_ascii_lowercase(); 364 | /// 365 | /// assert_eq!(s, String32::from("grÜße, jÜrgen ❤")); 366 | /// ``` 367 | #[inline] 368 | pub fn make_ascii_lowercase(&mut self) { 369 | self.storage.iter_mut().for_each(char::make_ascii_lowercase) 370 | } 371 | } 372 | -------------------------------------------------------------------------------- /src/slice_utf32_index.rs: -------------------------------------------------------------------------------- 1 | use core::slice::SliceIndex; 2 | 3 | use crate::string_base::StringBase; 4 | 5 | /// Implements substring slicing with syntax `&self[..]` or `&mut self[..]`. 6 | /// 7 | /// Returns a slice of the whole string, i.e., returns `&self` or `&mut 8 | /// self`. Equivalent to `&self[0 .. len]` or `&mut self[0 .. len]`. Unlike 9 | /// other indexing operations, this can never panic. 10 | /// 11 | /// This operation is *O*(1). 
12 | /// 13 | /// Prior to 1.20.0, these indexing operations were still supported by 14 | /// direct implementation of `Index` and `IndexMut`. 15 | /// 16 | /// Equivalent to `&self[0 .. len]` or `&mut self[0 .. len]`. 17 | unsafe impl SliceIndex<StringBase<[char]>> for core::ops::RangeFull { 18 | type Output = StringBase<[char]>; 19 | #[inline] 20 | fn get(self, slice: &StringBase<[char]>) -> Option<&Self::Output> { 21 | Some(slice) 22 | } 23 | #[inline] 24 | fn get_mut(self, slice: &mut StringBase<[char]>) -> Option<&mut Self::Output> { 25 | Some(slice) 26 | } 27 | #[inline] 28 | unsafe fn get_unchecked(self, slice: *const StringBase<[char]>) -> *const Self::Output { 29 | slice 30 | } 31 | #[inline] 32 | unsafe fn get_unchecked_mut(self, slice: *mut StringBase<[char]>) -> *mut Self::Output { 33 | slice 34 | } 35 | #[inline] 36 | fn index(self, slice: &StringBase<[char]>) -> &Self::Output { 37 | slice 38 | } 39 | #[inline] 40 | fn index_mut(self, slice: &mut StringBase<[char]>) -> &mut Self::Output { 41 | slice 42 | } 43 | } 44 | 45 | /// Implements substring slicing with syntax `&self[begin .. end]` or `&mut 46 | /// self[begin .. end]`. 47 | /// 48 | /// Returns a slice of the given string from the char range 49 | /// [`begin`, `end`). 50 | /// 51 | /// This operation is *O*(1). 52 | /// 53 | /// Prior to 1.20.0, these indexing operations were still supported by 54 | /// direct implementation of `Index` and `IndexMut`. 55 | /// 56 | /// # Panics 57 | /// 58 | /// Panics if `begin > end`, or if `end > len`.
59 | unsafe impl SliceIndex<StringBase<[char]>> for core::ops::Range<usize> { 60 | type Output = StringBase<[char]>; 61 | #[inline] 62 | fn get(self, slice: &StringBase<[char]>) -> Option<&Self::Output> { 63 | if self.start <= self.end && self.end <= slice.len() { 64 | Some(unsafe { &*self.get_unchecked(slice) }) 65 | } else { 66 | None 67 | } 68 | } 69 | #[inline] 70 | fn get_mut(self, slice: &mut StringBase<[char]>) -> Option<&mut Self::Output> { 71 | if self.start <= self.end && self.end <= slice.len() { 72 | // SAFETY: just checked that `start <= end` and that `end` is in bounds. 73 | // We know the pointer is unique because we got it from `slice`. 74 | Some(unsafe { &mut *self.get_unchecked_mut(slice) }) 75 | } else { 76 | None 77 | } 78 | } 79 | #[inline] 80 | unsafe fn get_unchecked(self, slice: *const StringBase<[char]>) -> *const Self::Output { 81 | let slice = slice as *const [char]; 82 | // SAFETY: the caller guarantees that `self` is in bounds of `slice` 83 | // which satisfies all the conditions for `add`. 84 | let ptr = slice.as_ptr().add(self.start); 85 | let len = self.end - self.start; 86 | core::ptr::slice_from_raw_parts(ptr, len) as *const StringBase<[char]> 87 | } 88 | #[inline] 89 | unsafe fn get_unchecked_mut(self, slice: *mut StringBase<[char]>) -> *mut Self::Output { 90 | let slice = slice as *mut [char]; 91 | // SAFETY: see comments for `get_unchecked`.
92 | let ptr = slice.as_mut_ptr().add(self.start); 93 | let len = self.end - self.start; 94 | core::ptr::slice_from_raw_parts_mut(ptr, len) as *mut StringBase<[char]> 95 | } 96 | #[inline] 97 | fn index(self, slice: &StringBase<[char]>) -> &Self::Output { 98 | let end = self.end; 99 | match self.get(slice) { 100 | Some(s) => s, 101 | 102 | #[cfg(feature = "alloc")] 103 | None => panic!("char index {} is out of bounds of `{}`", end, &*slice), 104 | 105 | #[cfg(not(feature = "alloc"))] 106 | None => panic!("char index {} is out of bounds", end), 107 | } 108 | } 109 | #[inline] 110 | fn index_mut(self, slice: &mut StringBase<[char]>) -> &mut Self::Output { 111 | if self.start <= self.end && self.end <= slice.len() { 112 | // SAFETY: just checked that `start` and `end` are on a char boundary. 113 | // We know the pointer is unique because we got it from `slice`. 114 | unsafe { &mut *self.get_unchecked_mut(slice) } 115 | } else { 116 | #[cfg(feature = "alloc")] 117 | panic!("char index {} is out of bounds of `{}`", self.end, &*slice); 118 | 119 | #[cfg(not(feature = "alloc"))] 120 | panic!("char index {} is out of bounds", self.end); 121 | } 122 | } 123 | } 124 | 125 | /// Implements substring slicing with syntax `&self[.. end]` or `&mut 126 | /// self[.. end]`. 127 | /// 128 | /// Returns a slice of the given string from the char range [`0`, `end`). 129 | /// Equivalent to `&self[0 .. end]` or `&mut self[0 .. end]`. 130 | /// 131 | /// This operation is *O*(1). 132 | /// 133 | /// Prior to 1.20.0, these indexing operations were still supported by 134 | /// direct implementation of `Index` and `IndexMut`. 135 | /// 136 | /// # Panics 137 | /// 138 | /// Panics if `end > len`. 
139 | unsafe impl SliceIndex<StringBase<[char]>> for core::ops::RangeTo<usize> { 140 | type Output = StringBase<[char]>; 141 | #[inline] 142 | fn get(self, slice: &StringBase<[char]>) -> Option<&Self::Output> { 143 | if self.end <= slice.len() { 144 | // SAFETY: just checked that `end` is in bounds, 145 | // and we are passing in a safe reference, so the return value will also be one. 146 | Some(unsafe { &*self.get_unchecked(slice) }) 147 | } else { 148 | None 149 | } 150 | } 151 | #[inline] 152 | fn get_mut(self, slice: &mut StringBase<[char]>) -> Option<&mut Self::Output> { 153 | if self.end <= slice.len() { 154 | // SAFETY: just checked that `end` is in bounds, 155 | // and we are passing in a safe reference, so the return value will also be one. 156 | Some(unsafe { &mut *self.get_unchecked_mut(slice) }) 157 | } else { 158 | None 159 | } 160 | } 161 | #[inline] 162 | unsafe fn get_unchecked(self, slice: *const StringBase<[char]>) -> *const Self::Output { 163 | let slice = slice as *const [char]; 164 | let ptr = slice.as_ptr(); 165 | core::ptr::slice_from_raw_parts(ptr, self.end) as *const StringBase<[char]> 166 | } 167 | #[inline] 168 | unsafe fn get_unchecked_mut(self, slice: *mut StringBase<[char]>) -> *mut Self::Output { 169 | let slice = slice as *mut [char]; 170 | let ptr = slice.as_mut_ptr(); 171 | core::ptr::slice_from_raw_parts_mut(ptr, self.end) as *mut StringBase<[char]> 172 | } 173 | #[inline] 174 | fn index(self, slice: &StringBase<[char]>) -> &Self::Output { 175 | let end = self.end; 176 | match self.get(slice) { 177 | Some(s) => s, 178 | 179 | #[cfg(feature = "alloc")] 180 | None => panic!("char index {} is out of bounds of `{}`", end, &*slice), 181 | 182 | #[cfg(not(feature = "alloc"))] 183 | None => panic!("char index {} is out of bounds", end), 184 | } 185 | } 186 | #[inline] 187 | fn index_mut(self, slice: &mut StringBase<[char]>) -> &mut Self::Output { 188 | if self.end <= slice.len() { 189 | // SAFETY: just checked that `end` is in bounds,
190 | // and we are passing in a safe reference, so the return value will also be one. 191 | unsafe { &mut *self.get_unchecked_mut(slice) } 192 | } else { 193 | #[cfg(feature = "alloc")] 194 | panic!("char index {} is out of bounds of `{}`", self.end, &*slice); 195 | 196 | #[cfg(not(feature = "alloc"))] 197 | panic!("char index {} is out of bounds", self.end); 198 | } 199 | } 200 | } 201 | 202 | /// Implements substring slicing with syntax `&self[begin ..]` or `&mut 203 | /// self[begin ..]`. 204 | /// 205 | /// Returns a slice of the given string from the char range [`begin`, 206 | /// `len`). Equivalent to `&self[begin .. len]` or `&mut self[begin .. 207 | /// len]`. 208 | /// 209 | /// This operation is *O*(1). 210 | /// 211 | /// Prior to 1.20.0, these indexing operations were still supported by 212 | /// direct implementation of `Index` and `IndexMut`. 213 | /// 214 | /// # Panics 215 | /// 216 | /// Panics if `begin > len`. 217 | unsafe impl SliceIndex<StringBase<[char]>> for core::ops::RangeFrom<usize> { 218 | type Output = StringBase<[char]>; 219 | #[inline] 220 | fn get(self, slice: &StringBase<[char]>) -> Option<&Self::Output> { 221 | if self.start <= slice.len() { 222 | // SAFETY: just checked that `start` is in bounds, 223 | // and we are passing in a safe reference, so the return value will also be one. 224 | Some(unsafe { &*self.get_unchecked(slice) }) 225 | } else { 226 | None 227 | } 228 | } 229 | #[inline] 230 | fn get_mut(self, slice: &mut StringBase<[char]>) -> Option<&mut Self::Output> { 231 | if self.start <= slice.len() { 232 | // SAFETY: just checked that `start` is in bounds, 233 | // and we are passing in a safe reference, so the return value will also be one.
234 | Some(unsafe { &mut *self.get_unchecked_mut(slice) }) 235 | } else { 236 | None 237 | } 238 | } 239 | #[inline] 240 | unsafe fn get_unchecked(self, slice: *const StringBase<[char]>) -> *const Self::Output { 241 | let slice = slice as *const [char]; 242 | // SAFETY: the caller guarantees that `self` is in bounds of `slice` 243 | // which satisfies all the conditions for `add`. 244 | let ptr = slice.as_ptr().add(self.start); 245 | let len = slice.len() - self.start; 246 | core::ptr::slice_from_raw_parts(ptr, len) as *const StringBase<[char]> 247 | } 248 | #[inline] 249 | unsafe fn get_unchecked_mut(self, slice: *mut StringBase<[char]>) -> *mut Self::Output { 250 | let slice = slice as *mut [char]; 251 | // SAFETY: identical to `get_unchecked`. 252 | let ptr = slice.as_mut_ptr().add(self.start); 253 | let len = slice.len() - self.start; 254 | core::ptr::slice_from_raw_parts_mut(ptr, len) as *mut StringBase<[char]> 255 | } 256 | #[inline] 257 | fn index(self, slice: &StringBase<[char]>) -> &Self::Output { 258 | let start = self.start; 259 | match self.get(slice) { 260 | Some(s) => s, 261 | 262 | #[cfg(feature = "alloc")] 263 | None => panic!("char index {} is out of bounds of `{}`", start, &*slice), 264 | 265 | #[cfg(not(feature = "alloc"))] 266 | None => panic!("char index {} is out of bounds", start), 267 | } 268 | } 269 | #[inline] 270 | fn index_mut(self, slice: &mut StringBase<[char]>) -> &mut Self::Output { 271 | if self.start <= slice.len() { 272 | // SAFETY: just checked that `start` is on a char boundary, 273 | // and we are passing in a safe reference, so the return value will also be one. 
274 | unsafe { &mut *self.get_unchecked_mut(slice) } 275 | } else { 276 | #[cfg(feature = "alloc")] 277 | panic!( 278 | "char index {} is out of bounds of `{}`", 279 | self.start, &*slice 280 | ); 281 | 282 | #[cfg(not(feature = "alloc"))] 283 | panic!("char index {} is out of bounds", self.start); 284 | } 285 | } 286 | } 287 | 288 | /// Implements substring slicing with syntax `&self[..= end]` or `&mut 289 | /// self[..= end]`. 290 | /// 291 | /// Returns a slice of the given string from the char range [0, `end`]. 292 | /// Equivalent to `&self[0 .. end + 1]`, except if `end` has the maximum 293 | /// value for `usize`. 294 | /// 295 | /// This operation is *O*(1). 296 | /// 297 | /// # Panics 298 | /// 299 | /// Panics if `end >= len`. Because this is a slice of `char`s, every 300 | /// index is already a character boundary; the only other failure is 301 | /// `end == usize::MAX`, where `end + 1` would overflow. 302 | unsafe impl SliceIndex<StringBase<[char]>> for core::ops::RangeToInclusive<usize> { 303 | type Output = StringBase<[char]>; 304 | #[inline] 305 | fn get(self, slice: &StringBase<[char]>) -> Option<&Self::Output> { 306 | if self.end == usize::MAX { 307 | None 308 | } else { 309 | (..self.end + 1).get(slice) 310 | } 311 | } 312 | #[inline] 313 | fn get_mut(self, slice: &mut StringBase<[char]>) -> Option<&mut Self::Output> { 314 | if self.end == usize::MAX { 315 | None 316 | } else { 317 | (..self.end + 1).get_mut(slice) 318 | } 319 | } 320 | #[inline] 321 | unsafe fn get_unchecked(self, slice: *const StringBase<[char]>) -> *const Self::Output { 322 | // SAFETY: the caller must uphold the safety contract for `get_unchecked`. 323 | (..self.end + 1).get_unchecked(slice) 324 | } 325 | #[inline] 326 | unsafe fn get_unchecked_mut(self, slice: *mut StringBase<[char]>) -> *mut Self::Output { 327 | // SAFETY: the caller must uphold the safety contract for `get_unchecked_mut`.
328 | (..self.end + 1).get_unchecked_mut(slice) 329 | } 330 | #[inline] 331 | fn index(self, slice: &StringBase<[char]>) -> &Self::Output { 332 | if self.end == usize::MAX { 333 | str_index_overflow_fail(); 334 | } 335 | (..self.end + 1).index(slice) 336 | } 337 | #[inline] 338 | fn index_mut(self, slice: &mut StringBase<[char]>) -> &mut Self::Output { 339 | if self.end == usize::MAX { 340 | str_index_overflow_fail(); 341 | } 342 | (..self.end + 1).index_mut(slice) 343 | } 344 | } 345 | 346 | #[inline(never)] 347 | #[cold] 348 | #[track_caller] 349 | fn str_index_overflow_fail() -> ! { 350 | panic!("attempted to index str up to maximum usize"); 351 | } 352 | -------------------------------------------------------------------------------- /src/slice_utf8.rs: -------------------------------------------------------------------------------- 1 | use core::{ 2 | slice::SliceIndex, 3 | str::{Bytes, CharIndices, Chars}, 4 | }; 5 | 6 | use crate::{from_utf8_unchecked_mut, validation::truncate_to_char_boundary, StringSlice}; 7 | 8 | #[allow(non_camel_case_types)] 9 | /// Exactly the same as [`std::primitive::str`], except generic 10 | pub type str = StringSlice; 11 | 12 | impl str { 13 | /// Returns the length of `self`. 14 | /// 15 | /// This length is in bytes, not [`char`]s or graphemes. In other words, 16 | /// it may not be what a human considers the length of the string. 17 | /// 18 | /// [`char`]: prim@char 19 | /// 20 | /// # Examples 21 | /// 22 | /// Basic usage: 23 | /// 24 | /// ``` 25 | /// # use generic_str::str; 26 | /// let len = <&str>::from("foo").len(); 27 | /// assert_eq!(3, len); 28 | /// 29 | /// assert_eq!("ƒoo".len(), 4); // fancy f! 30 | /// assert_eq!("ƒoo".chars().count(), 3); 31 | /// ``` 32 | #[inline] 33 | pub fn len(&self) -> usize { 34 | self.storage.as_ref().len() 35 | } 36 | 37 | /// Returns `true` if `self` has a length of zero bytes. 
38 | /// 39 | /// # Examples 40 | /// 41 | /// Basic usage: 42 | /// 43 | /// ``` 44 | /// # use generic_str::str; 45 | /// let s: &str = "".into(); 46 | /// assert!(s.is_empty()); 47 | /// 48 | /// let s: &str = "not empty".into(); 49 | /// assert!(!s.is_empty()); 50 | /// ``` 51 | #[inline] 52 | pub fn is_empty(&self) -> bool { 53 | self.storage.is_empty() 54 | } 55 | 56 | /// Checks that `index`-th byte is the first byte in a UTF-8 code point 57 | /// sequence or the end of the string. 58 | /// 59 | /// The start and end of the string (when `index == self.len()`) are 60 | /// considered to be boundaries. 61 | /// 62 | /// Returns `false` if `index` is greater than `self.len()`. 63 | /// 64 | /// # Examples 65 | /// 66 | /// ``` 67 | /// # use generic_str::str; 68 | /// let s: &str = "Löwe 老虎 Léopard".into(); 69 | /// assert!(s.is_char_boundary(0)); 70 | /// // start of `老` 71 | /// assert!(s.is_char_boundary(6)); 72 | /// assert!(s.is_char_boundary(s.len())); 73 | /// 74 | /// // second byte of `ö` 75 | /// assert!(!s.is_char_boundary(2)); 76 | /// 77 | /// // third byte of `老` 78 | /// assert!(!s.is_char_boundary(8)); 79 | /// ``` 80 | #[inline] 81 | pub fn is_char_boundary(&self, index: usize) -> bool { 82 | // 0 is always ok. 83 | // Test for 0 explicitly so that it can optimize out the check 84 | // easily and skip reading string data for that case. 85 | // Note that optimizing `self.get(..index)` relies on this. 86 | if index == 0 { 87 | return true; 88 | } 89 | 90 | match self.as_bytes().get(index) { 91 | // For `None` we have two options: 92 | // 93 | // - index == self.len() 94 | // Empty strings are valid, so return true 95 | // - index > self.len() 96 | // In this case return false 97 | // 98 | // The check is placed exactly here, because it improves generated 99 | // code on higher opt-levels. See PR #84751 for more details. 
100 | None => index == self.len(), 101 | 102 | // This is bit magic equivalent to: b < 128 || b >= 192 103 | Some(&b) => (b as i8) >= -0x40, 104 | } 105 | } 106 | 107 | /// Converts a string slice to a byte slice. To convert the byte slice back 108 | /// into a string slice, use the [`from_utf8`] function. 109 | /// 110 | /// [`from_utf8`]: crate::from_utf8 111 | /// 112 | /// # Examples 113 | /// 114 | /// Basic usage: 115 | /// 116 | /// ``` 117 | /// # use generic_str::str; 118 | /// let bytes = <&str>::from("bors").as_bytes(); 119 | /// assert_eq!(b"bors", bytes); 120 | /// ``` 121 | #[inline(always)] 122 | pub fn as_bytes(&self) -> &[u8] { 123 | // SAFETY: const sound because we transmute two types with the same layout 124 | unsafe { core::mem::transmute(self.storage.as_ref()) } 125 | } 126 | 127 | /// Converts a mutable string slice to a mutable byte slice. 128 | /// 129 | /// # Safety 130 | /// 131 | /// The caller must ensure that the content of the slice is valid UTF-8 132 | /// before the borrow ends and the underlying `str` is used. 133 | /// 134 | /// Use of a `str` whose contents are not valid UTF-8 is undefined behavior. 
135 | /// 136 | /// # Examples 137 | /// 138 | /// Basic usage: 139 | /// 140 | /// ``` 141 | /// # use generic_str::String; 142 | /// let mut s = String::from("Hello"); 143 | /// let bytes = unsafe { s.as_bytes_mut() }; 144 | /// 145 | /// assert_eq!(bytes, b"Hello"); 146 | /// ``` 147 | /// 148 | /// Mutability: 149 | /// 150 | /// ``` 151 | /// # use generic_str::{str, String}; 152 | /// let mut s = String::from("🗻∈🌏"); 153 | /// 154 | /// unsafe { 155 | /// let bytes = s.as_bytes_mut(); 156 | /// 157 | /// bytes[0] = 0xF0; 158 | /// bytes[1] = 0x9F; 159 | /// bytes[2] = 0x8D; 160 | /// bytes[3] = 0x94; 161 | /// } 162 | /// 163 | /// assert_eq!(s, <&str>::from("🍔∈🌏")); 164 | /// ``` 165 | #[inline(always)] 166 | pub unsafe fn as_bytes_mut(&mut self) -> &mut [u8] { 167 | // SAFETY: const sound because we transmute two types with the same layout 168 | core::mem::transmute(self.storage.as_mut()) 169 | } 170 | 171 | /// Converts a string slice to a raw pointer. 172 | /// 173 | /// As string slices are a slice of bytes, the raw pointer points to a 174 | /// [`u8`]. This pointer will be pointing to the first byte of the string 175 | /// slice. 176 | /// 177 | /// The caller must ensure that the returned pointer is never written to. 178 | /// If you need to mutate the contents of the string slice, use [`as_mut_ptr`]. 179 | /// 180 | /// [`as_mut_ptr`]: str::as_mut_ptr 181 | /// 182 | /// # Examples 183 | /// 184 | /// Basic usage: 185 | /// 186 | /// ``` 187 | /// # use generic_str::str; 188 | /// let s: &str = "Hello".into(); 189 | /// let ptr = s.as_ptr(); 190 | /// ``` 191 | #[inline] 192 | pub fn as_ptr(&self) -> *const u8 { 193 | self.storage.as_ref() as *const [u8] as *const u8 194 | } 195 | 196 | /// Converts a mutable string slice to a raw pointer. 197 | /// 198 | /// As string slices are a slice of bytes, the raw pointer points to a 199 | /// [`u8`]. This pointer will be pointing to the first byte of the string 200 | /// slice. 
201 | /// 202 | /// It is your responsibility to make sure that the string slice only gets 203 | /// modified in a way that it remains valid UTF-8. 204 | #[inline] 205 | pub fn as_mut_ptr(&mut self) -> *mut u8 { 206 | self.storage.as_mut() as *mut [u8] as *mut u8 207 | } 208 | 209 | /// Returns a subslice of `str`. 210 | /// 211 | /// This is the non-panicking alternative to indexing the `str`. Returns 212 | /// [`None`] whenever equivalent indexing operation would panic. 213 | /// 214 | /// # Examples 215 | /// 216 | /// ``` 217 | /// # use generic_str::{str, String}; 218 | /// let v = String::from("🗻∈🌏"); 219 | /// 220 | /// assert_eq!(v.get(0..4), Some(<&str>::from("🗻"))); 221 | /// 222 | /// // indices not on UTF-8 sequence boundaries 223 | /// assert!(v.get(1..).is_none()); 224 | /// assert!(v.get(..8).is_none()); 225 | /// 226 | /// // out of bounds 227 | /// assert!(v.get(..42).is_none()); 228 | /// ``` 229 | #[inline] 230 | pub fn get<I: SliceIndex<Self>>(&self, i: I) -> Option<&I::Output> { 231 | i.get(self.as_ref()) 232 | } 233 | 234 | /// Returns a mutable subslice of `str`. 235 | /// 236 | /// This is the non-panicking alternative to indexing the `str`. Returns 237 | /// [`None`] whenever equivalent indexing operation would panic.
238 | /// 239 | /// # Examples 240 | /// 241 | /// ``` 242 | /// # use generic_str::{str, String}; 243 | /// let mut v = String::from("hello"); 244 | /// // correct length 245 | /// assert!(v.get_mut(0..5).is_some()); 246 | /// // out of bounds 247 | /// assert!(v.get_mut(..42).is_none()); 248 | /// assert_eq!(v.get_mut(0..2).map(|v| &*v), Some(<&str>::from("he"))); 249 | /// 250 | /// assert_eq!(v, <&str>::from("hello")); 251 | /// { 252 | /// let s = v.get_mut(0..2); 253 | /// let s = s.map(|s| { 254 | /// s.make_ascii_uppercase(); 255 | /// &*s 256 | /// }); 257 | /// assert_eq!(s, Some(<&str>::from("HE"))); 258 | /// } 259 | /// assert_eq!(v, <&str>::from("HEllo")); 260 | /// ``` 261 | #[inline] 262 | pub fn get_mut<I: SliceIndex<Self>>(&mut self, i: I) -> Option<&mut I::Output> { 263 | i.get_mut(self.as_mut()) 264 | } 265 | 266 | /// Returns an unchecked subslice of `str`. 267 | /// 268 | /// This is the unchecked alternative to indexing the `str`. 269 | /// 270 | /// # Safety 271 | /// 272 | /// Callers of this function are responsible that these preconditions are 273 | /// satisfied: 274 | /// 275 | /// * The starting index must not exceed the ending index; 276 | /// * Indexes must be within bounds of the original slice; 277 | /// * Indexes must lie on UTF-8 sequence boundaries. 278 | /// 279 | /// Failing that, the returned string slice may reference invalid memory or 280 | /// violate the invariants communicated by the `str` type.
281 | /// 282 | /// # Examples 283 | /// 284 | /// ``` 285 | /// # use generic_str::str; 286 | /// let v = <&str>::from("🗻∈🌏"); 287 | /// unsafe { 288 | /// assert_eq!(v.get_unchecked(0..4), <&str>::from("🗻")); 289 | /// assert_eq!(v.get_unchecked(4..7), <&str>::from("∈")); 290 | /// assert_eq!(v.get_unchecked(7..11), <&str>::from("🌏")); 291 | /// } 292 | /// ``` 293 | #[inline] 294 | pub unsafe fn get_unchecked<I: SliceIndex<Self>>(&self, i: I) -> &I::Output { 295 | // SAFETY: the caller must uphold the safety contract for `get_unchecked`; 296 | // the slice is dereferenceable because `self` is a safe reference. 297 | // The returned pointer is safe because impls of `SliceIndex` have to guarantee that it is. 298 | &*i.get_unchecked(self) 299 | } 300 | 301 | /// Returns a mutable, unchecked subslice of `str`. 302 | /// 303 | /// This is the unchecked alternative to indexing the `str`. 304 | /// 305 | /// # Safety 306 | /// 307 | /// Callers of this function are responsible that these preconditions are 308 | /// satisfied: 309 | /// 310 | /// * The starting index must not exceed the ending index; 311 | /// * Indexes must be within bounds of the original slice; 312 | /// * Indexes must lie on UTF-8 sequence boundaries. 313 | /// 314 | /// Failing that, the returned string slice may reference invalid memory or 315 | /// violate the invariants communicated by the `str` type.
316 | /// 317 | /// # Examples 318 | /// 319 | /// ``` 320 | /// # use generic_str::{str, String}; 321 | /// let mut v = String::from("🗻∈🌏"); 322 | /// unsafe { 323 | /// assert_eq!(v.get_unchecked_mut(0..4), <&str>::from("🗻")); 324 | /// assert_eq!(v.get_unchecked_mut(4..7), <&str>::from("∈")); 325 | /// assert_eq!(v.get_unchecked_mut(7..11), <&str>::from("🌏")); 326 | /// } 327 | /// ``` 328 | #[inline] 329 | pub unsafe fn get_unchecked_mut<I: SliceIndex<Self>>(&mut self, i: I) -> &mut I::Output { 330 | // SAFETY: the caller must uphold the safety contract for `get_unchecked_mut`; 331 | // the slice is dereferenceable because `self` is a safe reference. 332 | // The returned pointer is safe because impls of `SliceIndex` have to guarantee that it is. 333 | &mut *i.get_unchecked_mut(self) 334 | } 335 | 336 | /// Divide one string slice into two at an index. 337 | /// 338 | /// The argument, `mid`, should be a byte offset from the start of the 339 | /// string. It must also be on the boundary of a UTF-8 code point. 340 | /// 341 | /// The two slices returned go from the start of the string slice to `mid`, 342 | /// and from `mid` to the end of the string slice. 343 | /// 344 | /// To get mutable string slices instead, see the [`split_at_mut`] 345 | /// method. 346 | /// 347 | /// [`split_at_mut`]: str::split_at_mut 348 | /// 349 | /// # Panics 350 | /// 351 | /// Panics if `mid` is not on a UTF-8 code point boundary, or if it is 352 | /// past the end of the last code point of the string slice.
353 | /// 354 | /// # Examples 355 | /// 356 | /// Basic usage: 357 | /// 358 | /// ``` 359 | /// # use generic_str::str; 360 | /// let s: &str = "Per Martin-Löf".into(); 361 | /// 362 | /// let (first, last) = s.split_at(3); 363 | /// 364 | /// assert_eq!(first, <&str>::from("Per")); 365 | /// assert_eq!(last, <&str>::from(" Martin-Löf")); 366 | /// ``` 367 | #[inline] 368 | pub fn split_at(&self, mid: usize) -> (&Self, &Self) { 369 | // is_char_boundary checks that the index is in [0, .len()] 370 | if self.is_char_boundary(mid) { 371 | // SAFETY: just checked that `mid` is on a char boundary. 372 | unsafe { 373 | ( 374 | self.get_unchecked(0..mid), 375 | self.get_unchecked(mid..self.len()), 376 | ) 377 | } 378 | } else { 379 | slice_error_fail(self, 0, mid) 380 | } 381 | } 382 | 383 | /// Divide one mutable string slice into two at an index. 384 | /// 385 | /// The argument, `mid`, should be a byte offset from the start of the 386 | /// string. It must also be on the boundary of a UTF-8 code point. 387 | /// 388 | /// The two slices returned go from the start of the string slice to `mid`, 389 | /// and from `mid` to the end of the string slice. 390 | /// 391 | /// To get immutable string slices instead, see the [`split_at`] method. 392 | /// 393 | /// [`split_at`]: str::split_at 394 | /// 395 | /// # Panics 396 | /// 397 | /// Panics if `mid` is not on a UTF-8 code point boundary, or if it is 398 | /// past the end of the last code point of the string slice. 
399 | /// 400 | /// # Examples 401 | /// 402 | /// Basic usage: 403 | /// 404 | /// ``` 405 | /// # use generic_str::{str, String}; 406 | /// let mut s = String::from("Per Martin-Löf"); 407 | /// { 408 | /// let (first, last) = s.split_at_mut(3); 409 | /// first.make_ascii_uppercase(); 410 | /// assert_eq!(first, <&str>::from("PER")); 411 | /// assert_eq!(last, <&str>::from(" Martin-Löf")); 412 | /// } 413 | /// assert_eq!(s, <&str>::from("PER Martin-Löf")); 414 | /// ``` 415 | #[inline] 416 | pub fn split_at_mut(&mut self, mid: usize) -> (&mut Self, &mut Self) { 417 | // is_char_boundary checks that the index is in [0, .len()] 418 | if self.is_char_boundary(mid) { 419 | let len = self.len(); 420 | let ptr = self.as_mut_ptr(); 421 | // SAFETY: just checked that `mid` is on a char boundary. 422 | unsafe { 423 | ( 424 | from_utf8_unchecked_mut(core::slice::from_raw_parts_mut(ptr, mid)), 425 | from_utf8_unchecked_mut(core::slice::from_raw_parts_mut( 426 | ptr.add(mid), 427 | len - mid, 428 | )), 429 | ) 430 | } 431 | } else { 432 | slice_error_fail(self, 0, mid) 433 | } 434 | } 435 | 436 | /// Returns an iterator over the [`char`]s of a string slice. 437 | /// 438 | /// As a string slice consists of valid UTF-8, we can iterate through a 439 | /// string slice by [`char`]. This method returns such an iterator. 440 | /// 441 | /// It's important to remember that [`char`] represents a Unicode Scalar 442 | /// Value, and may not match your idea of what a 'character' is. Iteration 443 | /// over grapheme clusters may be what you actually want. This functionality 444 | /// is not provided by Rust's standard library, check crates.io instead. 
445 | /// 446 | /// # Examples 447 | /// 448 | /// Basic usage: 449 | /// 450 | /// ``` 451 | /// # use generic_str::str; 452 | /// let word = <&str>::from("goodbye"); 453 | /// 454 | /// let count = word.chars().count(); 455 | /// assert_eq!(7, count); 456 | /// 457 | /// let mut chars = word.chars(); 458 | /// 459 | /// assert_eq!(Some('g'), chars.next()); 460 | /// assert_eq!(Some('o'), chars.next()); 461 | /// assert_eq!(Some('o'), chars.next()); 462 | /// assert_eq!(Some('d'), chars.next()); 463 | /// assert_eq!(Some('b'), chars.next()); 464 | /// assert_eq!(Some('y'), chars.next()); 465 | /// assert_eq!(Some('e'), chars.next()); 466 | /// 467 | /// assert_eq!(None, chars.next()); 468 | /// ``` 469 | /// 470 | /// Remember, [`char`]s may not match your intuition about characters: 471 | /// 472 | /// [`char`]: prim@char 473 | /// 474 | /// ``` 475 | /// let y = "y̆"; 476 | /// 477 | /// let mut chars = y.chars(); 478 | /// 479 | /// assert_eq!(Some('y'), chars.next()); // not 'y̆' 480 | /// assert_eq!(Some('\u{0306}'), chars.next()); 481 | /// 482 | /// assert_eq!(None, chars.next()); 483 | /// ``` 484 | #[inline] 485 | pub fn chars(&self) -> Chars<'_> { 486 | let s: &core::primitive::str = self.into(); 487 | s.chars() 488 | } 489 | pub fn char_indices(&self) -> CharIndices<'_> { 490 | let s: &core::primitive::str = self.into(); 491 | s.char_indices() 492 | } 493 | 494 | /// An iterator over the bytes of a string slice. 495 | /// 496 | /// As a string slice consists of a sequence of bytes, we can iterate 497 | /// through a string slice by byte. This method returns such an iterator. 
498 | /// 499 | /// # Examples 500 | /// 501 | /// Basic usage: 502 | /// 503 | /// ``` 504 | /// # use generic_str::str; 505 | /// let mut bytes = <&str>::from("bors").bytes(); 506 | /// 507 | /// assert_eq!(Some(b'b'), bytes.next()); 508 | /// assert_eq!(Some(b'o'), bytes.next()); 509 | /// assert_eq!(Some(b'r'), bytes.next()); 510 | /// assert_eq!(Some(b's'), bytes.next()); 511 | /// 512 | /// assert_eq!(None, bytes.next()); 513 | /// ``` 514 | #[inline] 515 | pub fn bytes(&self) -> Bytes<'_> { 516 | let s: &core::primitive::str = self.into(); 517 | s.bytes() 518 | } 519 | 520 | /// Checks if all characters in this string are within the ASCII range. 521 | /// 522 | /// # Examples 523 | /// 524 | /// ``` 525 | /// # use generic_str::str; 526 | /// let ascii = <&str>::from("hello!\n"); 527 | /// let non_ascii = <&str>::from("Grüße, Jürgen ❤"); 528 | /// 529 | /// assert!(ascii.is_ascii()); 530 | /// assert!(!non_ascii.is_ascii()); 531 | /// ``` 532 | #[inline] 533 | pub fn is_ascii(&self) -> bool { 534 | // We can treat each byte as character here: all multibyte characters 535 | // start with a byte that is not in the ascii range, so we will stop 536 | // there already. 537 | self.as_bytes().is_ascii() 538 | } 539 | 540 | /// Checks that two strings are an ASCII case-insensitive match. 541 | /// 542 | /// Same as `to_ascii_lowercase(a) == to_ascii_lowercase(b)`, 543 | /// but without allocating and copying temporaries. 
544 | /// 545 | /// # Examples 546 | /// 547 | /// ``` 548 | /// # use generic_str::str; 549 | /// assert!(<&str>::from("Ferris").eq_ignore_ascii_case("FERRIS".into())); 550 | /// assert!(<&str>::from("Ferrös").eq_ignore_ascii_case("FERRöS".into())); 551 | /// assert!(!<&str>::from("Ferrös").eq_ignore_ascii_case("FERRÖS".into())); 552 | /// ``` 553 | #[inline] 554 | pub fn eq_ignore_ascii_case(&self, other: &Self) -> bool { 555 | self.as_bytes().eq_ignore_ascii_case(other.as_bytes()) 556 | } 557 | 558 | /// Converts this string to its ASCII upper case equivalent in-place. 559 | /// 560 | /// ASCII letters 'a' to 'z' are mapped to 'A' to 'Z', 561 | /// but non-ASCII letters are unchanged. 562 | /// 563 | /// To return a new uppercased value without modifying the existing one, use 564 | /// [`to_ascii_uppercase()`]. 565 | /// 566 | /// [`to_ascii_uppercase()`]: #method.to_ascii_uppercase 567 | /// 568 | /// # Examples 569 | /// 570 | /// ``` 571 | /// # use generic_str::{str, String}; 572 | /// let mut s = String::from("Grüße, Jürgen ❤"); 573 | /// 574 | /// s.make_ascii_uppercase(); 575 | /// 576 | /// assert_eq!(s, <&str>::from("GRüßE, JüRGEN ❤")); 577 | /// ``` 578 | #[inline] 579 | pub fn make_ascii_uppercase(&mut self) { 580 | // SAFETY: safe because we transmute two types with the same layout. 581 | let me = unsafe { self.as_bytes_mut() }; 582 | me.make_ascii_uppercase() 583 | } 584 | 585 | /// Converts this string to its ASCII lower case equivalent in-place. 586 | /// 587 | /// ASCII letters 'A' to 'Z' are mapped to 'a' to 'z', 588 | /// but non-ASCII letters are unchanged. 589 | /// 590 | /// To return a new lowercased value without modifying the existing one, use 591 | /// [`to_ascii_lowercase()`]. 
592 | /// 593 | /// [`to_ascii_lowercase()`]: #method.to_ascii_lowercase 594 | /// 595 | /// # Examples 596 | /// 597 | /// ``` 598 | /// # use generic_str::{str, String}; 599 | /// let mut s = String::from("GRÜßE, JÜRGEN ❤"); 600 | /// 601 | /// s.make_ascii_lowercase(); 602 | /// 603 | /// assert_eq!(s, <&str>::from("grÜße, jÜrgen ❤")); 604 | /// ``` 605 | #[inline] 606 | pub fn make_ascii_lowercase(&mut self) { 607 | // SAFETY: safe because we transmute two types with the same layout. 608 | let me = unsafe { self.as_bytes_mut() }; 609 | me.make_ascii_lowercase() 610 | } 611 | 612 | /// Returns the lowercase equivalent of this string slice, as a new [`String`]. 613 | /// 614 | /// 'Lowercase' is defined according to the terms of the Unicode Derived Core Property 615 | /// `Lowercase`. 616 | /// 617 | /// Since some characters can expand into multiple characters when changing 618 | /// the case, this function returns a [`String`] instead of modifying the 619 | /// parameter in-place. 620 | /// 621 | /// # Examples 622 | /// 623 | /// Basic usage: 624 | /// 625 | /// ``` 626 | /// # use generic_str::str; 627 | /// let s = <&str>::from("HELLO"); 628 | /// 629 | /// assert_eq!(s.to_lowercase(), <&str>::from("hello")); 630 | /// ``` 631 | /// 632 | /// A tricky example, with sigma: 633 | /// 634 | /// ``` 635 | /// # use generic_str::str; 636 | /// let sigma = <&str>::from("Σ"); 637 | /// 638 | /// assert_eq!(sigma.to_lowercase(), <&str>::from("σ")); 639 | /// 640 | /// // but at the end of a word, it's ς, not σ: 641 | /// let odysseus = <&str>::from("ὈΔΥΣΣΕΎΣ"); 642 | /// 643 | /// assert_eq!(odysseus.to_lowercase(), <&str>::from("ὀδυσσεύς")); 644 | /// ``` 645 | /// 646 | /// Languages without case are not changed: 647 | /// 648 | /// ``` 649 | /// # use generic_str::str; 650 | /// let new_year = <&str>::from("农历新年"); 651 | /// 652 | /// assert_eq!(new_year, new_year.to_lowercase()); 653 | /// ``` 654 | #[cfg(feature = "alloc")] 655 | pub fn to_lowercase(&self) -> 
crate::String { 656 | use core::unicode::conversions; 657 | 658 | let mut s = crate::String::with_capacity(self.len()); 659 | for (i, c) in self[..].char_indices() { 660 | if c == 'Σ' { 661 | // Σ maps to σ, except at the end of a word where it maps to ς. 662 | // This is the only conditional (contextual) but language-independent mapping 663 | // in `SpecialCasing.txt`, 664 | // so hard-code it rather than have a generic "condition" mechanism. 665 | // See https://github.com/rust-lang/rust/issues/26035 666 | map_uppercase_sigma(self, i, &mut s) 667 | } else { 668 | match conversions::to_lower(c) { 669 | [a, '\0', _] => s.push(a), 670 | [a, b, '\0'] => { 671 | s.push(a); 672 | s.push(b); 673 | } 674 | [a, b, c] => { 675 | s.push(a); 676 | s.push(b); 677 | s.push(c); 678 | } 679 | } 680 | } 681 | } 682 | return s; 683 | 684 | fn map_uppercase_sigma(from: &str, i: usize, to: &mut crate::String) { 685 | // See http://www.unicode.org/versions/Unicode7.0.0/ch03.pdf#G33992 686 | // for the definition of `Final_Sigma`. 687 | debug_assert!('Σ'.len_utf8() == 2); 688 | let is_word_final = case_ignoreable_then_cased(from[..i].chars().rev()) 689 | && !case_ignoreable_then_cased(from[i + 2..].chars()); 690 | to.push_str(if is_word_final { "ς" } else { "σ" }.into()); 691 | } 692 | 693 | fn case_ignoreable_then_cased<I: Iterator<Item = char>>(mut iter: I) -> bool { 694 | use core::unicode::{Case_Ignorable, Cased}; 695 | match iter.find(|&c| !Case_Ignorable(c)) { 696 | Some(c) => Cased(c), 697 | None => false, 698 | } 699 | } 700 | } 701 | 702 | /// Returns the uppercase equivalent of this string slice, as a new [`String`]. 703 | /// 704 | /// 'Uppercase' is defined according to the terms of the Unicode Derived Core Property 705 | /// `Uppercase`. 706 | /// 707 | /// Since some characters can expand into multiple characters when changing 708 | /// the case, this function returns a [`String`] instead of modifying the 709 | /// parameter in-place.
710 | /// 711 | /// # Examples 712 | /// 713 | /// Basic usage: 714 | /// 715 | /// ``` 716 | /// # use generic_str::str; 717 | /// let s = <&str>::from("hello"); 718 | /// 719 | /// assert_eq!(s.to_uppercase(), <&str>::from("HELLO")); 720 | /// ``` 721 | /// 722 | /// Scripts without case are not changed: 723 | /// 724 | /// ``` 725 | /// # use generic_str::str; 726 | /// let new_year = <&str>::from("农历新年"); 727 | /// 728 | /// assert_eq!(new_year, new_year.to_uppercase()); 729 | /// ``` 730 | /// 731 | /// One character can become multiple: 732 | /// ``` 733 | /// # use generic_str::str; 734 | /// let s = <&str>::from("tschüß"); 735 | /// 736 | /// assert_eq!(s.to_uppercase(), <&str>::from("TSCHÜSS")); 737 | /// ``` 738 | #[cfg(feature = "alloc")] 739 | pub fn to_uppercase(&self) -> crate::String { 740 | use core::unicode::conversions; 741 | 742 | let mut s = crate::String::with_capacity(self.len()); 743 | for c in self[..].chars() { 744 | match conversions::to_upper(c) { 745 | [a, '\0', _] => s.push(a), 746 | [a, b, '\0'] => { 747 | s.push(a); 748 | s.push(b); 749 | } 750 | [a, b, c] => { 751 | s.push(a); 752 | s.push(b); 753 | s.push(c); 754 | } 755 | } 756 | } 757 | s 758 | } 759 | } 760 | 761 | #[inline(never)] 762 | #[cold] 763 | #[track_caller] 764 | pub(crate) fn slice_error_fail(s: &str, begin: usize, end: usize) -> ! { 765 | const MAX_DISPLAY_LENGTH: usize = 256; 766 | let (truncated, s_trunc) = truncate_to_char_boundary(s, MAX_DISPLAY_LENGTH); 767 | let ellipsis = if truncated { "[...]" } else { "" }; 768 | 769 | // 1. out of bounds 770 | if begin > s.len() || end > s.len() { 771 | let oob_index = if begin > s.len() { begin } else { end }; 772 | panic!( 773 | "byte index {} is out of bounds of `{}`{}", 774 | oob_index, s_trunc, ellipsis 775 | ); 776 | } 777 | 778 | // 2. begin <= end 779 | assert!( 780 | begin <= end, 781 | "begin <= end ({} <= {}) when slicing `{}`{}", 782 | begin, 783 | end, 784 | s_trunc, 785 | ellipsis 786 | ); 787 | 788 | // 3. 
character boundary 789 | let index = if !s.is_char_boundary(begin) { 790 | begin 791 | } else { 792 | end 793 | }; 794 | // find the character 795 | let mut char_start = index; 796 | while !s.is_char_boundary(char_start) { 797 | char_start -= 1; 798 | } 799 | // `char_start` must be less than len and a char boundary 800 | let ch = s[char_start..].chars().next().unwrap(); 801 | let char_range = char_start..char_start + ch.len_utf8(); 802 | panic!( 803 | "byte index {} is not a char boundary; it is inside {:?} (bytes {:?}) of `{}`{}", 804 | index, ch, char_range, s_trunc, ellipsis 805 | ); 806 | } 807 | -------------------------------------------------------------------------------- /src/slice_utf8_index.rs: -------------------------------------------------------------------------------- 1 | use core::slice::SliceIndex; 2 | 3 | use crate::{slice_utf8::slice_error_fail, string_base::StringBase}; 4 | 5 | /// Implements substring slicing with syntax `&self[..]` or `&mut self[..]`. 6 | /// 7 | /// Returns a slice of the whole string, i.e., returns `&self` or `&mut 8 | /// self`. Equivalent to `&self[0 .. len]` or `&mut self[0 .. len]`. Unlike 9 | /// other indexing operations, this can never panic. 10 | /// 11 | /// This operation is *O*(1). 12 | /// 13 | /// Prior to 1.20.0, these indexing operations were still supported by 14 | /// direct implementation of `Index` and `IndexMut`. 15 | /// 16 | /// Equivalent to `&self[0 .. len]` or `&mut self[0 .. len]`. 
17 | unsafe impl SliceIndex<StringBase<[u8]>> for core::ops::RangeFull { 18 | type Output = StringBase<[u8]>; 19 | #[inline] 20 | fn get(self, slice: &StringBase<[u8]>) -> Option<&Self::Output> { 21 | Some(slice) 22 | } 23 | #[inline] 24 | fn get_mut(self, slice: &mut StringBase<[u8]>) -> Option<&mut Self::Output> { 25 | Some(slice) 26 | } 27 | #[inline] 28 | unsafe fn get_unchecked(self, slice: *const StringBase<[u8]>) -> *const Self::Output { 29 | slice 30 | } 31 | #[inline] 32 | unsafe fn get_unchecked_mut(self, slice: *mut StringBase<[u8]>) -> *mut Self::Output { 33 | slice 34 | } 35 | #[inline] 36 | fn index(self, slice: &StringBase<[u8]>) -> &Self::Output { 37 | slice 38 | } 39 | #[inline] 40 | fn index_mut(self, slice: &mut StringBase<[u8]>) -> &mut Self::Output { 41 | slice 42 | } 43 | } 44 | 45 | /// Implements substring slicing with syntax `&self[begin .. end]` or `&mut 46 | /// self[begin .. end]`. 47 | /// 48 | /// Returns a slice of the given string from the byte range 49 | /// [`begin`, `end`). 50 | /// 51 | /// This operation is *O*(1). 52 | /// 53 | /// Prior to 1.20.0, these indexing operations were still supported by 54 | /// direct implementation of `Index` and `IndexMut`. 55 | /// 56 | /// # Panics 57 | /// 58 | /// Panics if `begin` or `end` does not point to the starting byte offset of 59 | /// a character (as defined by `is_char_boundary`), if `begin > end`, or if 60 | /// `end > len`. 61 | /// 62 | /// # Examples 63 | /// 64 | /// ``` 65 | /// let s = "Löwe 老虎 Léopard"; 66 | /// assert_eq!(&s[0 .. 1], "L"); 67 | /// 68 | /// assert_eq!(&s[1 .. 9], "öwe 老"); 69 | /// 70 | /// // these will panic: 71 | /// // byte 2 lies within `ö`: 72 | /// // &s[2 ..3]; 73 | /// 74 | /// // byte 8 lies within `老` 75 | /// // &s[1 .. 8]; 76 | /// 77 | /// // byte 100 is outside the string 78 | /// // &s[3 .. 
100]; 79 | /// ``` 80 | unsafe impl SliceIndex<StringBase<[u8]>> for core::ops::Range<usize> { 81 | type Output = StringBase<[u8]>; 82 | #[inline] 83 | fn get(self, slice: &StringBase<[u8]>) -> Option<&Self::Output> { 84 | if self.start <= self.end 85 | && slice.is_char_boundary(self.start) 86 | && slice.is_char_boundary(self.end) 87 | { 88 | // SAFETY: just checked that `start` and `end` are on a char boundary, 89 | // and we are passing in a safe reference, so the return value will also be one. 90 | // We also checked char boundaries, so this is valid UTF-8. 91 | Some(unsafe { &*self.get_unchecked(slice) }) 92 | } else { 93 | None 94 | } 95 | } 96 | #[inline] 97 | fn get_mut(self, slice: &mut StringBase<[u8]>) -> Option<&mut Self::Output> { 98 | if self.start <= self.end 99 | && slice.is_char_boundary(self.start) 100 | && slice.is_char_boundary(self.end) 101 | { 102 | // SAFETY: just checked that `start` and `end` are on a char boundary. 103 | // We know the pointer is unique because we got it from `slice`. 104 | Some(unsafe { &mut *self.get_unchecked_mut(slice) }) 105 | } else { 106 | None 107 | } 108 | } 109 | #[inline] 110 | unsafe fn get_unchecked(self, slice: *const StringBase<[u8]>) -> *const Self::Output { 111 | let slice = slice as *const [u8]; 112 | // SAFETY: the caller guarantees that `self` is in bounds of `slice` 113 | // which satisfies all the conditions for `add`. 114 | let ptr = slice.as_ptr().add(self.start); 115 | let len = self.end - self.start; 116 | core::ptr::slice_from_raw_parts(ptr, len) as *const StringBase<[u8]> 117 | } 118 | #[inline] 119 | unsafe fn get_unchecked_mut(self, slice: *mut StringBase<[u8]>) -> *mut Self::Output { 120 | let slice = slice as *mut [u8]; 121 | // SAFETY: see comments for `get_unchecked`.
        let ptr = slice.as_mut_ptr().add(self.start);
        let len = self.end - self.start;
        core::ptr::slice_from_raw_parts_mut(ptr, len) as *mut StringBase<[u8]>
    }
    #[inline]
    fn index(self, slice: &StringBase<[u8]>) -> &Self::Output {
        let (start, end) = (self.start, self.end);
        match self.get(slice) {
            Some(s) => s,
            None => slice_error_fail(slice, start, end),
        }
    }
    #[inline]
    fn index_mut(self, slice: &mut StringBase<[u8]>) -> &mut Self::Output {
        // is_char_boundary checks that the index is in [0, .len()]
        // cannot reuse `get` as above, because of NLL trouble
        if self.start <= self.end
            && slice.is_char_boundary(self.start)
            && slice.is_char_boundary(self.end)
        {
            // SAFETY: just checked that `start` and `end` are on a char boundary,
            // and we are passing in a safe reference, so the return value will also be one.
            unsafe { &mut *self.get_unchecked_mut(slice) }
        } else {
            slice_error_fail(slice, self.start, self.end)
        }
    }
}

/// Implements substring slicing with syntax `&self[.. end]` or `&mut
/// self[.. end]`.
///
/// Returns a slice of the given string from the byte range [`0`, `end`).
/// Equivalent to `&self[0 .. end]` or `&mut self[0 .. end]`.
///
/// This operation is *O*(1).
///
/// Prior to 1.20.0, these indexing operations were still supported by
/// direct implementation of `Index` and `IndexMut`.
///
/// # Panics
///
/// Panics if `end` does not point to the starting byte offset of a
/// character (as defined by `is_char_boundary`), or if `end > len`.
unsafe impl SliceIndex<StringBase<[u8]>> for core::ops::RangeTo<usize> {
    type Output = StringBase<[u8]>;
    #[inline]
    fn get(self, slice: &StringBase<[u8]>) -> Option<&Self::Output> {
        if slice.is_char_boundary(self.end) {
            // SAFETY: just checked that `end` is on a char boundary,
            // and we are passing in a safe reference, so the return value will also be one.
            Some(unsafe { &*self.get_unchecked(slice) })
        } else {
            None
        }
    }
    #[inline]
    fn get_mut(self, slice: &mut StringBase<[u8]>) -> Option<&mut Self::Output> {
        if slice.is_char_boundary(self.end) {
            // SAFETY: just checked that `end` is on a char boundary,
            // and we are passing in a safe reference, so the return value will also be one.
            Some(unsafe { &mut *self.get_unchecked_mut(slice) })
        } else {
            None
        }
    }
    #[inline]
    unsafe fn get_unchecked(self, slice: *const StringBase<[u8]>) -> *const Self::Output {
        let slice = slice as *const [u8];
        let ptr = slice.as_ptr();
        core::ptr::slice_from_raw_parts(ptr, self.end) as *const StringBase<[u8]>
    }
    #[inline]
    unsafe fn get_unchecked_mut(self, slice: *mut StringBase<[u8]>) -> *mut Self::Output {
        let slice = slice as *mut [u8];
        let ptr = slice.as_mut_ptr();
        core::ptr::slice_from_raw_parts_mut(ptr, self.end) as *mut StringBase<[u8]>
    }
    #[inline]
    fn index(self, slice: &StringBase<[u8]>) -> &Self::Output {
        let end = self.end;
        match self.get(slice) {
            Some(s) => s,
            None => slice_error_fail(slice, 0, end),
        }
    }
    #[inline]
    fn index_mut(self, slice: &mut StringBase<[u8]>) -> &mut Self::Output {
        if slice.is_char_boundary(self.end) {
            // SAFETY: just checked that `end` is on a char boundary,
            // and we are passing in a safe reference, so the return value will also be one.
            unsafe { &mut *self.get_unchecked_mut(slice) }
        } else {
            slice_error_fail(slice, 0, self.end)
        }
    }
}

/// Implements substring slicing with syntax `&self[begin ..]` or `&mut
/// self[begin ..]`.
///
/// Returns a slice of the given string from the byte range [`begin`,
/// `len`). Equivalent to `&self[begin .. len]` or `&mut self[begin ..
/// len]`.
///
/// This operation is *O*(1).
///
/// Prior to 1.20.0, these indexing operations were still supported by
/// direct implementation of `Index` and `IndexMut`.
///
/// # Panics
///
/// Panics if `begin` does not point to the starting byte offset of
/// a character (as defined by `is_char_boundary`), or if `begin > len`.
unsafe impl SliceIndex<StringBase<[u8]>> for core::ops::RangeFrom<usize> {
    type Output = StringBase<[u8]>;
    #[inline]
    fn get(self, slice: &StringBase<[u8]>) -> Option<&Self::Output> {
        if slice.is_char_boundary(self.start) {
            // SAFETY: just checked that `start` is on a char boundary,
            // and we are passing in a safe reference, so the return value will also be one.
            Some(unsafe { &*self.get_unchecked(slice) })
        } else {
            None
        }
    }
    #[inline]
    fn get_mut(self, slice: &mut StringBase<[u8]>) -> Option<&mut Self::Output> {
        if slice.is_char_boundary(self.start) {
            // SAFETY: just checked that `start` is on a char boundary,
            // and we are passing in a safe reference, so the return value will also be one.
            Some(unsafe { &mut *self.get_unchecked_mut(slice) })
        } else {
            None
        }
    }
    #[inline]
    unsafe fn get_unchecked(self, slice: *const StringBase<[u8]>) -> *const Self::Output {
        let slice = slice as *const [u8];
        // SAFETY: the caller guarantees that `self` is in bounds of `slice`
        // which satisfies all the conditions for `add`.
        let ptr = slice.as_ptr().add(self.start);
        let len = slice.len() - self.start;
        core::ptr::slice_from_raw_parts(ptr, len) as *const StringBase<[u8]>
    }
    #[inline]
    unsafe fn get_unchecked_mut(self, slice: *mut StringBase<[u8]>) -> *mut Self::Output {
        let slice = slice as *mut [u8];
        // SAFETY: identical to `get_unchecked`.
        let ptr = slice.as_mut_ptr().add(self.start);
        let len = slice.len() - self.start;
        core::ptr::slice_from_raw_parts_mut(ptr, len) as *mut StringBase<[u8]>
    }
    #[inline]
    fn index(self, slice: &StringBase<[u8]>) -> &Self::Output {
        let (start, end) = (self.start, slice.len());
        match self.get(slice) {
            Some(s) => s,
            None => slice_error_fail(slice, start, end),
        }
    }
    #[inline]
    fn index_mut(self, slice: &mut StringBase<[u8]>) -> &mut Self::Output {
        if slice.is_char_boundary(self.start) {
            // SAFETY: just checked that `start` is on a char boundary,
            // and we are passing in a safe reference, so the return value will also be one.
            unsafe { &mut *self.get_unchecked_mut(slice) }
        } else {
            slice_error_fail(slice, self.start, slice.len())
        }
    }
}

/// Implements substring slicing with syntax `&self[..= end]` or `&mut
/// self[..= end]`.
///
/// Returns a slice of the given string from the byte range [0, `end`].
/// Equivalent to `&self [0 .. end + 1]`, except if `end` has the maximum
/// value for `usize`.
///
/// This operation is *O*(1).
///
/// # Panics
///
/// Panics if `end` does not point to the ending byte offset of a character
/// (`end + 1` is either a starting byte offset as defined by
/// `is_char_boundary`, or equal to `len`), or if `end >= len`.
unsafe impl SliceIndex<StringBase<[u8]>> for core::ops::RangeToInclusive<usize> {
    type Output = StringBase<[u8]>;
    #[inline]
    fn get(self, slice: &StringBase<[u8]>) -> Option<&Self::Output> {
        if self.end == usize::MAX {
            None
        } else {
            (..self.end + 1).get(slice)
        }
    }
    #[inline]
    fn get_mut(self, slice: &mut StringBase<[u8]>) -> Option<&mut Self::Output> {
        if self.end == usize::MAX {
            None
        } else {
            (..self.end + 1).get_mut(slice)
        }
    }
    #[inline]
    unsafe fn get_unchecked(self, slice: *const StringBase<[u8]>) -> *const Self::Output {
        // SAFETY: the caller must uphold the safety contract for `get_unchecked`.
        (..self.end + 1).get_unchecked(slice)
    }
    #[inline]
    unsafe fn get_unchecked_mut(self, slice: *mut StringBase<[u8]>) -> *mut Self::Output {
        // SAFETY: the caller must uphold the safety contract for `get_unchecked_mut`.
        (..self.end + 1).get_unchecked_mut(slice)
    }
    #[inline]
    fn index(self, slice: &StringBase<[u8]>) -> &Self::Output {
        if self.end == usize::MAX {
            str_index_overflow_fail();
        }
        (..self.end + 1).index(slice)
    }
    #[inline]
    fn index_mut(self, slice: &mut StringBase<[u8]>) -> &mut Self::Output {
        if self.end == usize::MAX {
            str_index_overflow_fail();
        }
        (..self.end + 1).index_mut(slice)
    }
}

#[inline(never)]
#[cold]
#[track_caller]
fn str_index_overflow_fail() -> ! {
    panic!("attempted to index str up to maximum usize");
}

--------------------------------------------------------------------------------
/src/string_base.rs:
--------------------------------------------------------------------------------
use core::cmp::Ordering;

use generic_vec::{
    raw::{Storage, StorageWithCapacity},
    GenericVec,
};

#[cfg(feature = "alloc")]
use generic_vec::HeapVec;
#[cfg(feature = "alloc")]
use std::{alloc::Allocator, borrow::Borrow};

#[derive(Default, Copy, Clone)]
#[repr(transparent)]
pub struct StringBase<S: ?Sized> {
    pub(crate) storage: S,
}

impl StringBase> {
    #[inline]
    pub fn new_with_capacity(capacity: usize) -> Self {
        Self::with_storage(S::with_capacity(capacity))
    }
}

pub type OwnedString = StringBase>;
pub type StringSlice = StringBase<[U]>;

impl StringBase> {
    /// Creates a new empty `String` with a particular storage backend.
    ///
    /// `String`s have an internal buffer to hold their data. The capacity is
    /// the length of that buffer, and can be queried with the [`capacity`]
    /// method. This method creates an empty `String`, but one with an initial
    /// buffer that can hold `capacity` bytes. This is useful when you may be
    /// appending a bunch of data to the `String`, reducing the number of
    /// reallocations it needs to do.
    ///
    /// [`capacity`]: StringBase::capacity
    ///
    /// If the given capacity is `0`, no allocation will occur, and this method
    /// is identical to the [`new`] method.
    ///
    /// [`new`]: StringBase::new
    ///
    /// # Examples
    ///
    /// Basic usage:
    ///
    /// ```
    /// # use generic_str::String;
    /// let mut s = String::with_capacity(10);
    ///
    /// // The String contains no chars, even though it has capacity for more
    /// assert_eq!(s.len(), 0);
    ///
    /// // These are all done without reallocating...
    /// let cap = s.capacity();
    /// for _ in 0..10 {
    ///     s.push('a');
    /// }
    ///
    /// assert_eq!(s.capacity(), cap);
    ///
    /// // ...but this may make the string reallocate
    /// s.push('a');
    /// ```
    #[inline]
    pub fn with_storage(storage: S) -> Self
    where
        S: Sized,
    {
        StringBase {
            storage: GenericVec::with_storage(storage),
        }
    }
}

impl core::ops::Deref for StringBase {
    type Target = StringBase;

    fn deref(&self) -> &StringBase {
        unsafe { core::mem::transmute::<&T::Target, &StringBase>(self.storage.deref()) }
    }
}

impl core::ops::DerefMut for StringBase {
    fn deref_mut(&mut self) -> &mut StringBase {
        unsafe {
            core::mem::transmute::<&mut T::Target, &mut StringBase>(
                self.storage.deref_mut(),
            )
        }
    }
}

impl, U: ?Sized> AsRef> for StringBase {
    fn as_ref(&self) -> &StringBase {
        unsafe { core::mem::transmute::<&U, &StringBase>(self.storage.as_ref()) }
    }
}

impl, U: ?Sized> AsMut> for StringBase {
    fn as_mut(&mut self) -> &mut StringBase {
        unsafe { core::mem::transmute::<&mut U, &mut StringBase>(self.storage.as_mut()) }
    }
}

#[cfg(feature = "alloc")]
impl Borrow> for StringBase> {
    fn borrow(&self) -> &StringBase<[T]> {
        unsafe { std::mem::transmute::<&[T], &StringBase<[T]>>(self.storage.borrow()) }
    }
}

#[cfg(feature = "alloc")]
impl ToOwned for StringBase<[T]> {
    type Owned = StringBase>;

    fn to_owned(&self) -> Self::Owned {
        Self::Owned {
            storage: self.storage.to_owned().into(),
        }
    }
}

impl PartialEq>
    for StringBase>
where
    GenericVec: PartialEq>,
{
    fn eq(&self, other: &OwnedString) -> bool {
        self.storage.eq(&other.storage)
    }
}

impl PartialEq> for StringSlice
where
    S::Item: PartialEq,
{
    fn eq(&self, other: &OwnedString) -> bool {
        other.storage.eq(&self.storage)
    }
}

impl PartialEq> for &StringSlice
where
    S::Item: PartialEq,
{
    fn eq(&self, other: &OwnedString) -> bool {
        other.storage.eq(&self.storage)
    }
}

impl PartialEq> for OwnedString
where
    S::Item: PartialEq,
{
    fn eq(&self, other: &StringSlice) -> bool {
        self.storage.eq(&other.storage)
    }
}

impl PartialEq<&StringSlice> for OwnedString
where
    S::Item: PartialEq,
{
    fn eq(&self, other: &&StringSlice) -> bool {
        self.storage.eq(&other.storage)
    }
}

impl PartialEq> for StringSlice {
    fn eq(&self, other: &StringSlice) -> bool {
        self.storage.eq(&other.storage)
    }
}

impl Eq for OwnedString where S::Item: Eq {}
impl Eq for StringSlice {}

impl PartialOrd>
    for OwnedString
where
    GenericVec: PartialOrd>,
{
    fn partial_cmp(&self, other: &OwnedString) -> Option<Ordering> {
        self.storage.partial_cmp(&other.storage)
    }
}

impl PartialOrd> for StringSlice
where
    S::Item: PartialOrd,
{
    fn partial_cmp(&self, other: &OwnedString) -> Option<Ordering> {
        other.storage.partial_cmp(&self.storage)
    }
}

impl PartialOrd> for &StringSlice
where
    S::Item: PartialOrd,
{
    fn partial_cmp(&self, other: &OwnedString) -> Option<Ordering> {
        other.storage.partial_cmp(&self.storage)
    }
}

impl PartialOrd> for OwnedString
where
    S::Item: PartialOrd,
{
    fn partial_cmp(&self, other: &StringSlice) -> Option<Ordering> {
        self.storage.partial_cmp(&other.storage)
    }
}

impl PartialOrd<&StringSlice> for OwnedString
where
    S::Item: PartialOrd,
{
    fn partial_cmp(&self, other: &&StringSlice) -> Option<Ordering> {
        self.storage.partial_cmp(&other.storage)
    }
}

impl PartialOrd> for StringSlice {
    fn partial_cmp(&self, other: &StringSlice) -> Option<Ordering> {
        self.storage.partial_cmp(&other.storage)
    }
}

impl Ord for OwnedString
where
    S::Item: Ord,
{
    fn cmp(&self, other: &Self) -> Ordering {
        self.storage.cmp(&other.storage)
    }
}
impl Ord for StringSlice {
    fn cmp(&self, other: &Self) -> Ordering {
        self.storage.cmp(&other.storage)
    }
}

--------------------------------------------------------------------------------
/src/traits_utf32.rs:
--------------------------------------------------------------------------------
use generic_vec::{raw::Storage, GenericVec};

#[cfg(feature = "alloc")]
use generic_vec::HeapVec;
#[cfg(feature = "alloc")]
use std::{
    borrow::Cow,
    iter::FromIterator,
    ops::{Add, AddAssign},
};

use crate::{string_base::StringBase, OwnedString};
use core::{
    ops::{Index, IndexMut},
    slice::SliceIndex,
};

impl<I> Index<I> for crate::str32
where
    I: SliceIndex<crate::str32>,
{
    type Output = I::Output;

    #[inline]
    fn index(&self, index: I) -> &I::Output {
        index.index(self)
    }
}

impl<I> IndexMut<I> for crate::str32
where
    I: SliceIndex<crate::str32>,
{
    #[inline]
    fn index_mut(&mut self, index: I) -> &mut I::Output {
        index.index_mut(self)
    }
}

#[cfg(feature = "alloc")]
impl> std::fmt::Debug for OwnedString {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        let s: String = self.as_ref().storage.iter().collect();
        write!(f, "{:?}", s)
    }
}

#[cfg(feature = "alloc")]
impl> std::fmt::Display for OwnedString {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        let s: String = self.as_ref().storage.iter().collect();
        write!(f, "{}", s)
    }
}

#[cfg(feature = "alloc")]
impl std::fmt::Debug for crate::str32 {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        let s: String = self.storage.iter().collect();
        write!(f, "{:?}", s)
    }
}

#[cfg(feature = "alloc")]
impl std::fmt::Display for crate::str32 {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        let s: String = self.storage.iter().collect();
        write!(f, "{}", s)
    }
}

#[cfg(feature = "alloc")]
impl From<String> for crate::String32 {
    fn from(s: String) -> Self {
        s.chars().collect()
    }
}

#[cfg(feature = "alloc")]
impl FromIterator<char> for crate::String32 {
    fn from_iter<I: IntoIterator<Item = char>>(iter: I) -> Self {
        let mut new = Self::new();
        iter.into_iter().for_each(|c| new.push(c));
        new
    }
}

#[cfg(feature = "alloc")]
impl From<&str> for StringBase> {
    fn from(s: &str) -> Self {
        s.to_owned().into()
    }
}

// FIXME: `str` holds UTF-8 bytes while `str32` wraps `[char]`, so this
// transmute keeps the byte length as the element count; this looks unsound.
impl From<&str> for &crate::str32 {
    fn from(s: &str) -> Self {
        unsafe { core::mem::transmute(s) }
    }
}

// FIXME: same concern as the shared-reference conversion above.
impl From<&mut str> for &mut crate::str32 {
    fn from(s: &mut str) -> Self {
        unsafe { core::mem::transmute(s) }
    }
}

impl, T: ?Sized + AsRef<[char]>> PartialEq
    for StringBase>
{
    fn eq(&self, other: &T) -> bool {
        self.storage.as_ref() == other.as_ref()
    }
}

impl> PartialEq for crate::str32 {
    fn eq(&self, other: &T) -> bool {
        self.storage.as_ref() == other.as_ref()
    }
}

#[cfg(feature = "alloc")]
impl<'a> Add<&'a crate::str32> for Cow<'a, crate::str32> {
    type Output = Cow<'a, crate::str32>;

    #[inline]
    fn add(mut self, rhs: &'a crate::str32) -> Self::Output {
        self += rhs;
        self
    }
}

#[cfg(feature = "alloc")]
impl<'a> AddAssign<&'a crate::str32> for Cow<'a, crate::str32> {
    fn add_assign(&mut self, rhs: &'a crate::str32) {
        if self.is_empty() {
            *self = Cow::Borrowed(rhs)
        } else if !rhs.is_empty() {
            if let Cow::Borrowed(lhs) = *self {
                let mut s = crate::String32::with_capacity(lhs.len() + rhs.len());
                s.push_str32(lhs);
                *self = Cow::Owned(s);
            }
            self.to_mut().push_str32(rhs);
        }
    }
}

--------------------------------------------------------------------------------
/src/traits_utf8.rs:
--------------------------------------------------------------------------------
use generic_vec::raw::Storage;

#[cfg(feature = "alloc")]
use generic_vec::HeapVec;
#[cfg(feature = "alloc")]
use std::{
    borrow::Cow,
    ops::{Add, AddAssign},
};

use crate::{string_base::StringBase, OwnedString};
use core::{
    ops::{Index, IndexMut},
    slice::SliceIndex,
};

impl AsRef<str> for StringBase<[u8]> {
    fn as_ref(&self) -> &str {
        unsafe { core::str::from_utf8_unchecked(&self.storage) }
    }
}

impl AsMut<str> for StringBase<[u8]> {
    fn as_mut(&mut self) -> &mut str {
        unsafe { core::str::from_utf8_unchecked_mut(&mut self.storage) }
    }
}

impl<I> Index<I> for StringBase<[u8]>
where
    I: SliceIndex<StringBase<[u8]>>,
{
    type Output = I::Output;

    #[inline]
    fn index(&self, index: I) -> &I::Output {
        index.index(self)
    }
}

impl<I> IndexMut<I> for StringBase<[u8]>
where
    I: SliceIndex<StringBase<[u8]>>,
{
    #[inline]
    fn index_mut(&mut self, index: I) -> &mut I::Output {
        index.index_mut(self)
    }
}

impl> core::fmt::Debug for OwnedString {
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        let s: &StringBase<[S::Item]> = self.as_ref();
        let s: &str = s.as_ref();
        write!(f, "{:?}", s)
    }
}

impl> core::fmt::Display for OwnedString {
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        let s: &StringBase<[S::Item]> = self.as_ref();
        let s: &str = s.as_ref();
        write!(f, "{}", s)
    }
}

impl core::fmt::Debug for StringBase<[u8]> {
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        let s: &str = self.as_ref();
        write!(f, "{:?}", s)
    }
}

impl core::fmt::Display for StringBase<[u8]> {
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        let s: &str = self.as_ref();
        write!(f, "{}", s)
    }
}

#[cfg(feature = "alloc")]
impl From for StringBase> {
    fn from(s: String) -> Self {
        let (ptr, len, capacity) = s.into_raw_parts();
        unsafe {
            let ptr = std::ptr::slice_from_raw_parts_mut(ptr.cast(), capacity);
            let storage = Box::from_raw(ptr);
            let storage = HeapVec::from_raw_parts(len, storage);
            Self { storage }
        }
    }
}

#[cfg(feature = "alloc")]
impl From<&str> for StringBase> {
    fn from(s: &str) -> Self {
        s.to_owned().into()
    }
}

impl From<&str> for &StringBase<[u8]> {
    fn from(s: &str) -> Self {
        unsafe { core::mem::transmute(s) }
    }
}

impl From<&mut str> for &mut StringBase<[u8]> {
    fn from(s: &mut str) -> Self {
        unsafe { core::mem::transmute(s) }
    }
}

impl<'a, S: ?Sized + AsRef<[u8]>> From<&'a StringBase<S>> for &'a str {
    fn from(s: &'a StringBase<S>) -> Self {
        unsafe { core::str::from_utf8_unchecked(s.storage.as_ref()) }
    }
}

#[cfg(feature = "alloc")]
impl<'a> Add<&'a crate::str> for Cow<'a, crate::str> {
    type Output = Cow<'a, crate::str>;

    #[inline]
    fn add(mut self, rhs: &'a crate::str) -> Self::Output {
        self += rhs;
        self
    }
}

#[cfg(feature = "alloc")]
impl<'a> AddAssign<&'a crate::str> for Cow<'a, crate::str> {
    fn add_assign(&mut self, rhs: &'a crate::str) {
        if self.is_empty() {
            *self = Cow::Borrowed(rhs)
        } else if !rhs.is_empty() {
            if let Cow::Borrowed(lhs) = *self {
                let mut s = crate::String::with_capacity(lhs.len() + rhs.len());
                s.push_str(lhs);
                *self = Cow::Owned(s);
            }
            self.to_mut().push_str(rhs);
        }
    }
}

--------------------------------------------------------------------------------
/src/validation.rs:
--------------------------------------------------------------------------------
use crate::string_base::StringBase;

// Truncate `&StringBase<[u8]>` to a length of at most `max` bytes.
// Returns `true` if it was truncated, together with the new str.
pub(super) fn truncate_to_char_boundary(
    s: &StringBase<[u8]>,
    mut max: usize,
) -> (bool, &StringBase<[u8]>) {
    if max >= s.len() {
        (false, s)
    } else {
        while !s.is_char_boundary(max) {
            max -= 1;
        }
        (true, &s[..max])
    }
}

--------------------------------------------------------------------------------
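The truncation helper in `/src/validation.rs` walks backwards from `max` until it reaches a char boundary, then slices with `..max`. As a minimal sketch of the same idea, the function below is a hypothetical stand-in written against the standard library's `str` rather than `StringBase<[u8]>`. That substitution is an assumption made only so the snippet builds on stable Rust without this crate; it is not part of the crate itself.

```rust
/// Hypothetical stand-in for `truncate_to_char_boundary`, using std `str`
/// in place of `StringBase<[u8]>` (an assumption for illustration only).
fn truncate_to_char_boundary(s: &str, mut max: usize) -> (bool, &str) {
    if max >= s.len() {
        // Nothing to cut: the whole string already fits in `max` bytes.
        (false, s)
    } else {
        // Offset 0 is always a char boundary, so this loop terminates
        // before `max` can underflow.
        while !s.is_char_boundary(max) {
            max -= 1;
        }
        // `max` is now on a boundary, so this slice cannot panic.
        (true, &s[..max])
    }
}
```

With `"Löwe"`, where `ö` occupies bytes 1 and 2, a limit of 2 falls inside `ö`, so the cut should move back to byte 1 and yield `"L"`. A limit of 3 already sits on a boundary and yields `"Lö"`.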