├── .github ├── CODE_OF_CONDUCT ├── CONTRIBUTING.md └── workflows │ └── ci.yaml ├── .gitignore ├── Cargo.toml ├── LICENSE-MIT ├── README.md ├── benches ├── bench.rs └── nfa.rs └── src ├── lib.rs └── nfa.rs /.github/CODE_OF_CONDUCT: -------------------------------------------------------------------------------- 1 | # Contributor Covenant Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | In the interest of fostering an open and welcoming environment, we as 6 | contributors and maintainers pledge to making participation in our project and 7 | our community a harassment-free experience for everyone, regardless of age, body 8 | size, disability, ethnicity, gender identity and expression, level of 9 | experience, 10 | education, socio-economic status, nationality, personal appearance, race, 11 | religion, or sexual identity and orientation. 12 | 13 | ## Our Standards 14 | 15 | Examples of behavior that contributes to creating a positive environment 16 | include: 17 | 18 | - Using welcoming and inclusive language 19 | - Being respectful of differing viewpoints and experiences 20 | - Gracefully accepting constructive criticism 21 | - Focusing on what is best for the community 22 | - Showing empathy towards other community members 23 | 24 | Examples of unacceptable behavior by participants include: 25 | 26 | - The use of sexualized language or imagery and unwelcome sexual attention or 27 | advances 28 | - Trolling, insulting/derogatory comments, and personal or political attacks 29 | - Public or private harassment 30 | - Publishing others' private information, such as a physical or electronic 31 | address, without explicit permission 32 | - Other conduct which could reasonably be considered inappropriate in a 33 | professional setting 34 | 35 | 36 | ## Our Responsibilities 37 | 38 | Project maintainers are responsible for clarifying the standards of acceptable 39 | behavior and are expected to take appropriate and fair corrective action in 40 | response to any instances of 
unacceptable behavior. 41 | 42 | Project maintainers have the right and responsibility to remove, edit, or 43 | reject comments, commits, code, wiki edits, issues, and other contributions 44 | that are not aligned to this Code of Conduct, or to ban temporarily or 45 | permanently any contributor for other behaviors that they deem inappropriate, 46 | threatening, offensive, or harmful. 47 | 48 | ## Scope 49 | 50 | This Code of Conduct applies both within project spaces and in public spaces 51 | when an individual is representing the project or its community. Examples of 52 | representing a project or community include using an official project e-mail 53 | address, posting via an official social media account, or acting as an appointed 54 | representative at an online or offline event. Representation of a project may be 55 | further defined and clarified by project maintainers. 56 | 57 | ## Enforcement 58 | 59 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 60 | reported by contacting the project team at yoshuawuyts@gmail.com, or through 61 | IRC. All complaints will be reviewed and investigated and will result in a 62 | response that is deemed necessary and appropriate to the circumstances. The 63 | project team is obligated to maintain confidentiality with regard to the 64 | reporter of an incident. 65 | Further details of specific enforcement policies may be posted separately. 66 | 67 | Project maintainers who do not follow or enforce the Code of Conduct in good 68 | faith may face temporary or permanent repercussions as determined by other 69 | members of the project's leadership. 
70 | 71 | ## Attribution 72 | 73 | This Code of Conduct is adapted from the Contributor Covenant, version 1.4, 74 | available at 75 | https://www.contributor-covenant.org/version/1/4/code-of-conduct.html 76 | -------------------------------------------------------------------------------- /.github/CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing 2 | Contributions include code, documentation, answering user questions, running the 3 | project's infrastructure, and advocating for all types of users. 4 | 5 | The project welcomes all contributions from anyone willing to work in good faith 6 | with other contributors and the community. No contribution is too small and all 7 | contributions are valued. 8 | 9 | This guide explains the process for contributing to the project's GitHub 10 | Repository. 11 | 12 | - [Code of Conduct](#code-of-conduct) 13 | - [Bad Actors](#bad-actors) 14 | 15 | ## Code of Conduct 16 | The project has a [Code of Conduct](./CODE_OF_CONDUCT.md) that *all* 17 | contributors are expected to follow. This code describes the *minimum* behavior 18 | expectations for all contributors. 19 | 20 | As a contributor, how you choose to act and interact towards your 21 | fellow contributors, as well as to the community, will reflect back not only 22 | on yourself but on the project as a whole. The Code of Conduct is designed and 23 | intended, above all else, to help establish a culture within the project that 24 | allows anyone and everyone who wants to contribute to feel safe doing so. 25 | 26 | Should any individual act in any way that is considered in violation of the 27 | [Code of Conduct](./CODE_OF_CONDUCT.md), corrective actions will be taken. 
It is 28 | possible, however, for any individual to *act* in a manner that is not in 29 | violation of the strict letter of the Code of Conduct guidelines while still 30 | going completely against the spirit of what that Code is intended to accomplish. 31 | 32 | Open, diverse, and inclusive communities live and die on the basis of trust. 33 | Contributors can disagree with one another so long as they trust that those 34 | disagreements are in good faith and everyone is working towards a common 35 | goal. 36 | 37 | ## Bad Actors 38 | All contributors tacitly agree to abide by both the letter and 39 | spirit of the [Code of Conduct](./CODE_OF_CONDUCT.md). Failure, or 40 | unwillingness, to do so will result in contributions being respectfully 41 | declined. 42 | 43 | A *bad actor* is someone who repeatedly violates the *spirit* of the Code of 44 | Conduct through consistent failure to self-regulate the way in which they 45 | interact with other contributors in the project. In doing so, bad actors 46 | alienate other contributors, discourage collaboration, and generally reflect 47 | poorly on the project as a whole. 48 | 49 | Being a bad actor may be intentional or unintentional. Typically, unintentional 50 | bad behavior can be easily corrected by being quick to apologize and correct 51 | course *even if you are not entirely convinced you need to*. Giving other 52 | contributors the benefit of the doubt and having a sincere willingness to admit 53 | that you *might* be wrong is critical for any successful open collaboration. 54 | 55 | Don't be a bad actor. 
56 | -------------------------------------------------------------------------------- /.github/workflows/ci.yaml: -------------------------------------------------------------------------------- 1 | name: CI 2 | 3 | on: 4 | pull_request: 5 | push: 6 | branches: 7 | - main 8 | - staging 9 | - trying 10 | 11 | env: 12 | RUSTFLAGS: -Dwarnings 13 | 14 | jobs: 15 | build_and_test: 16 | name: Build and test 17 | runs-on: ${{ matrix.os }} 18 | strategy: 19 | matrix: 20 | os: [ubuntu-latest, windows-latest, macOS-latest] 21 | rust: [stable] 22 | 23 | steps: 24 | - uses: actions/checkout@master 25 | 26 | - name: Install ${{ matrix.rust }} 27 | uses: actions-rs/toolchain@v1 28 | with: 29 | toolchain: ${{ matrix.rust }} 30 | override: true 31 | 32 | - name: check 33 | uses: actions-rs/cargo@v1 34 | with: 35 | command: check 36 | args: --all --bins --examples 37 | 38 | - name: tests 39 | uses: actions-rs/cargo@v1 40 | with: 41 | command: test 42 | args: --all 43 | 44 | check_fmt_and_docs: 45 | name: Checking fmt and docs 46 | runs-on: ubuntu-latest 47 | steps: 48 | - uses: actions/checkout@master 49 | - uses: actions-rs/toolchain@v1 50 | with: 51 | toolchain: nightly 52 | components: rustfmt, clippy 53 | override: true 54 | 55 | - name: fmt 56 | run: cargo fmt --all -- --check 57 | 58 | - name: Docs 59 | run: cargo doc 60 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | /target 2 | /Cargo.lock 3 | -------------------------------------------------------------------------------- /Cargo.toml: -------------------------------------------------------------------------------- 1 | [package] 2 | name = "route-recognizer" 3 | description = "Recognizes URL patterns with support for dynamic and wildcard segments" 4 | license = "MIT" 5 | repository = "https://github.com/rustasync/route-recognizer" 6 | keywords = ["router", "url"] 7 | edition = "2018" 8 | 9 | 
version = "0.3.1" 10 | authors = ["wycats", "rustasync"] 11 | -------------------------------------------------------------------------------- /LICENSE-MIT: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2020 The http-rs contributors 4 | Copyright (c) 2019 The rustasync contributors 5 | Copyright (c) 2014 Yehuda Katz 6 | 7 | Permission is hereby granted, free of charge, to any person obtaining a copy 8 | of this software and associated documentation files (the "Software"), to deal 9 | in the Software without restriction, including without limitation the rights 10 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 11 | copies of the Software, and to permit persons to whom the Software is 12 | furnished to do so, subject to the following conditions: 13 | 14 | The above copyright notice and this permission notice shall be included in all 15 | copies or substantial portions of the Software. 16 | 17 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 18 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 19 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 20 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 21 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 22 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 23 | SOFTWARE. 24 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 |

route-recognizer

Recognizes URL patterns with support for dynamic and wildcard segments

API Docs | Releases | Contributing

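The description above mentions dynamic and wildcard segments. As a rough, standalone illustration of how a route pattern breaks down into those kinds, here is a small sketch that uses only the standard library. `SegmentKind`, `classify`, and `classify_route` are hypothetical names invented for this example and are not part of the crate's API. The sketch also assumes `/` is the only separator, whereas the crate additionally treats `.` as one.

```rust
/// The kinds of route segment described in the crate docs.
/// Hypothetical type for this sketch only; the crate does not export it.
#[derive(Debug, PartialEq)]
enum SegmentKind<'a> {
    /// A literal segment such as `posts`.
    Static(&'a str),
    /// A `:name` segment that captures one path component.
    Param(&'a str),
    /// A `*name` segment that captures the rest of the path.
    NamedWildcard(&'a str),
    /// A bare `*` segment that matches without being recorded.
    UnnamedWildcard,
}

/// Classify a single segment by its leading character.
fn classify(segment: &str) -> SegmentKind<'_> {
    if let Some(name) = segment.strip_prefix(':') {
        SegmentKind::Param(name)
    } else if let Some(name) = segment.strip_prefix('*') {
        if name.is_empty() {
            SegmentKind::UnnamedWildcard
        } else {
            SegmentKind::NamedWildcard(name)
        }
    } else {
        SegmentKind::Static(segment)
    }
}

/// Split a route on `/` (an assumption of this sketch) and classify each part.
fn classify_route(route: &str) -> Vec<SegmentKind<'_>> {
    route
        .split('/')
        .filter(|s| !s.is_empty())
        .map(classify)
        .collect()
}

fn main() {
    for route in ["/posts/:post_id/comments/:id", "/static/*path", "/files/*"] {
        println!("{} -> {:?}", route, classify_route(route));
    }
}
```

In the crate itself this classification happens inside `Router::add`, which also counts static, dynamic, and wildcard segments so that more specific routes can be preferred when several match.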
43 | 44 | ## Installation 45 | ```sh 46 | $ cargo add route-recognizer 47 | ``` 48 | 49 | ## Safety 50 | This crate uses ``#![deny(unsafe_code)]`` to ensure everything is implemented in 51 | 100% Safe Rust. 52 | 53 | ## Contributing 54 | Want to join us? Check out our ["Contributing" guide][contributing] and take a 55 | look at some of these issues: 56 | 57 | - [Issues labeled "good first issue"][good-first-issue] 58 | - [Issues labeled "help wanted"][help-wanted] 59 | 60 | [contributing]: https://github.com/http-rs/route-recognizer/blob/master/.github/CONTRIBUTING.md 61 | [good-first-issue]: https://github.com/http-rs/route-recognizer/labels/good%20first%20issue 62 | [help-wanted]: https://github.com/http-rs/route-recognizer/labels/help%20wanted 63 | 64 | ## License 65 | 66 | 67 | Licensed under the MIT license. 68 | 69 | 70 |
71 | 72 | 73 | Unless you explicitly state otherwise, any contribution intentionally submitted 74 | for inclusion in this crate by you, as defined in the Apache-2.0 license, shall 75 | be dual licensed as MIT / Apache-2.0, without any additional terms or 76 | conditions. 77 | 78 | -------------------------------------------------------------------------------- /benches/bench.rs: -------------------------------------------------------------------------------- 1 | #![feature(test)] 2 | 3 | extern crate route_recognizer; 4 | extern crate test; 5 | 6 | use route_recognizer::Router; 7 | 8 | #[bench] 9 | fn benchmark(b: &mut test::Bencher) { 10 | let mut router = Router::new(); 11 | router.add("/posts/:post_id/comments/:id", "comment".to_string()); 12 | router.add("/posts/:post_id/comments", "comments".to_string()); 13 | router.add("/posts/:post_id", "post".to_string()); 14 | router.add("/posts", "posts".to_string()); 15 | router.add("/comments", "comments2".to_string()); 16 | router.add("/comments/:id", "comment2".to_string()); 17 | 18 | b.iter(|| router.recognize("/posts/100/comments/200")); 19 | } 20 | -------------------------------------------------------------------------------- /benches/nfa.rs: -------------------------------------------------------------------------------- 1 | #![feature(test)] 2 | 3 | extern crate route_recognizer; 4 | extern crate test; 5 | 6 | use route_recognizer::nfa::CharSet; 7 | use std::collections::{BTreeSet, HashSet}; 8 | 9 | #[bench] 10 | fn bench_char_set(b: &mut test::Bencher) { 11 | let mut set = CharSet::new(); 12 | set.insert('p'); 13 | set.insert('n'); 14 | set.insert('/'); 15 | 16 | b.iter(|| { 17 | assert!(set.contains('p')); 18 | assert!(set.contains('/')); 19 | assert!(!set.contains('z')); 20 | }); 21 | } 22 | 23 | #[bench] 24 | fn bench_hash_set(b: &mut test::Bencher) { 25 | let mut set = HashSet::new(); 26 | set.insert('p'); 27 | set.insert('n'); 28 | set.insert('/'); 29 | 30 | b.iter(|| { 31 | assert!(set.contains(&'p')); 
32 | assert!(set.contains(&'/')); 33 | assert!(!set.contains(&'z')); 34 | }); 35 | } 36 | 37 | #[bench] 38 | fn bench_btree_set(b: &mut test::Bencher) { 39 | let mut set = BTreeSet::new(); 40 | set.insert('p'); 41 | set.insert('n'); 42 | set.insert('/'); 43 | 44 | b.iter(|| { 45 | assert!(set.contains(&'p')); 46 | assert!(set.contains(&'/')); 47 | assert!(!set.contains(&'z')); 48 | }); 49 | } 50 | -------------------------------------------------------------------------------- /src/lib.rs: -------------------------------------------------------------------------------- 1 | //! Recognizes URL patterns with support for dynamic and wildcard segments 2 | //! 3 | //! # Examples 4 | //! 5 | //! ``` 6 | //! use route_recognizer::{Router, Params}; 7 | //! 8 | //! let mut router = Router::new(); 9 | //! 10 | //! router.add("/thomas", "Thomas".to_string()); 11 | //! router.add("/tom", "Tom".to_string()); 12 | //! router.add("/wycats", "Yehuda".to_string()); 13 | //! 14 | //! let m = router.recognize("/thomas").unwrap(); 15 | //! 16 | //! assert_eq!(m.handler().as_str(), "Thomas"); 17 | //! assert_eq!(m.params(), &Params::new()); 18 | //! ``` 19 | //! 20 | //! # Routing params 21 | //! 22 | //! The router supports four kinds of route segments: 23 | //! - __segments__: these are of the format `/a/b`. 24 | //! - __params__: these are of the format `/a/:b`. 25 | //! - __named wildcards__: these are of the format `/a/*b`. 26 | //! - __unnamed wildcards__: these are of the format `/a/*`. 27 | //! 28 | //! The difference between a "named wildcard" and a "param" is how the 29 | //! matching rules apply. Given the router `/a/:b`, passing in `/foo/bar/baz` 30 | //! will not match because `/baz` has no counterpart in the router. 31 | //! 32 | //! However if we define the route `/a/*b` and we pass `/foo/bar/baz` we end up 33 | //! with a named param `"b"` that contains the value `"bar/baz"`. Wildcard 34 | //! routing rules are useful when you don't know which routes may follow. 
The 35 | //! difference between "named" and "unnamed" wildcards is that the former will 36 | //! show up in `Params`, while the latter won't. 37 | 38 | #![cfg_attr(feature = "docs", feature(doc_cfg))] 39 | #![deny(unsafe_code)] 40 | #![deny(missing_debug_implementations, nonstandard_style)] 41 | #![warn(missing_docs, unreachable_pub, future_incompatible, rust_2018_idioms)] 42 | #![doc(test(attr(deny(warnings))))] 43 | #![doc(test(attr(allow(unused_extern_crates, unused_variables))))] 44 | #![doc(html_favicon_url = "https://yoshuawuyts.com/assets/http-rs/favicon.ico")] 45 | #![doc(html_logo_url = "https://yoshuawuyts.com/assets/http-rs/logo-rounded.png")] 46 | 47 | use std::cmp::Ordering; 48 | use std::collections::{btree_map, BTreeMap}; 49 | use std::ops::Index; 50 | 51 | use crate::nfa::{CharacterClass, NFA}; 52 | 53 | #[doc(hidden)] 54 | pub mod nfa; 55 | 56 | #[derive(Clone, Eq, Debug)] 57 | struct Metadata { 58 | statics: u32, 59 | dynamics: u32, 60 | wildcards: u32, 61 | param_names: Vec<String>, 62 | } 63 | 64 | impl Metadata { 65 | pub(crate) fn new() -> Self { 66 | Self { 67 | statics: 0, 68 | dynamics: 0, 69 | wildcards: 0, 70 | param_names: Vec::new(), 71 | } 72 | } 73 | } 74 | 75 | impl Ord for Metadata { 76 | fn cmp(&self, other: &Self) -> Ordering { 77 | if self.statics > other.statics { 78 | Ordering::Greater 79 | } else if self.statics < other.statics { 80 | Ordering::Less 81 | } else if self.dynamics > other.dynamics { 82 | Ordering::Greater 83 | } else if self.dynamics < other.dynamics { 84 | Ordering::Less 85 | } else if self.wildcards > other.wildcards { 86 | Ordering::Greater 87 | } else if self.wildcards < other.wildcards { 88 | Ordering::Less 89 | } else { 90 | Ordering::Equal 91 | } 92 | } 93 | } 94 | 95 | impl PartialOrd for Metadata { 96 | fn partial_cmp(&self, other: &Self) -> Option<Ordering> { 97 | Some(self.cmp(other)) 98 | } 99 | } 100 | 101 | impl PartialEq for Metadata { 102 | fn eq(&self, other: &Self) -> bool { 103 | self.statics == other.statics 
104 | && self.dynamics == other.dynamics 105 | && self.wildcards == other.wildcards 106 | } 107 | } 108 | 109 | /// Router parameters. 110 | #[derive(PartialEq, Clone, Debug, Default)] 111 | pub struct Params { 112 | map: BTreeMap<String, String>, 113 | } 114 | 115 | impl Params { 116 | /// Create a new instance of `Params`. 117 | pub fn new() -> Self { 118 | Self { 119 | map: BTreeMap::new(), 120 | } 121 | } 122 | 123 | /// Insert a new param into `Params`. 124 | pub fn insert(&mut self, key: String, value: String) { 125 | self.map.insert(key, value); 126 | } 127 | 128 | /// Find a param by name in `Params`. 129 | pub fn find(&self, key: &str) -> Option<&str> { 130 | self.map.get(key).map(|s| &s[..]) 131 | } 132 | 133 | /// Iterate over all named params. 134 | /// 135 | /// This will return all named params and named wildcards. 136 | pub fn iter(&self) -> Iter<'_> { 137 | Iter(self.map.iter()) 138 | } 139 | } 140 | 141 | impl Index<&str> for Params { 142 | type Output = String; 143 | fn index(&self, index: &str) -> &String { 144 | match self.map.get(index) { 145 | None => panic!("params[{}] did not exist", index), 146 | Some(s) => s, 147 | } 148 | } 149 | } 150 | 151 | impl<'a> IntoIterator for &'a Params { 152 | type IntoIter = Iter<'a>; 153 | type Item = (&'a str, &'a str); 154 | 155 | fn into_iter(self) -> Iter<'a> { 156 | self.iter() 157 | } 158 | } 159 | 160 | /// An iterator over `Params`. 161 | #[derive(Debug)] 162 | pub struct Iter<'a>(btree_map::Iter<'a, String, String>); 163 | 164 | impl<'a> Iterator for Iter<'a> { 165 | type Item = (&'a str, &'a str); 166 | 167 | #[inline] 168 | fn next(&mut self) -> Option<(&'a str, &'a str)> { 169 | self.0.next().map(|(k, v)| (&**k, &**v)) 170 | } 171 | 172 | fn size_hint(&self) -> (usize, Option<usize>) { 173 | self.0.size_hint() 174 | } 175 | } 176 | 177 | /// The result of a successful match returned by `Router::recognize`. 178 | #[derive(Debug)] 179 | pub struct Match<T> { 180 | /// Return the endpoint handler. 
181 | handler: T, 182 | /// Return the params. 183 | params: Params, 184 | } 185 | 186 | impl<T> Match<T> { 187 | /// Create a new instance of `Match`. 188 | pub fn new(handler: T, params: Params) -> Self { 189 | Self { handler, params } 190 | } 191 | 192 | /// Get a handle to the handler. 193 | pub fn handler(&self) -> &T { 194 | &self.handler 195 | } 196 | 197 | /// Get a mutable handle to the handler. 198 | pub fn handler_mut(&mut self) -> &mut T { 199 | &mut self.handler 200 | } 201 | 202 | /// Get a handle to the params. 203 | pub fn params(&self) -> &Params { 204 | &self.params 205 | } 206 | 207 | /// Get a mutable handle to the params. 208 | pub fn params_mut(&mut self) -> &mut Params { 209 | &mut self.params 210 | } 211 | } 212 | 213 | /// Recognizes URL patterns with support for dynamic and wildcard segments. 214 | #[derive(Clone, Debug)] 215 | pub struct Router<T> { 216 | nfa: NFA<Metadata>, 217 | handlers: BTreeMap<usize, T>, 218 | } 219 | 220 | fn segments(route: &str) -> Vec<(Option<char>, &str)> { 221 | let predicate = |c| c == '.' || c == '/'; 222 | 223 | let mut segments = vec![]; 224 | let mut segment_start = 0; 225 | 226 | while segment_start < route.len() { 227 | let segment_end = route[segment_start + 1..] 228 | .find(predicate) 229 | .map(|i| i + segment_start + 1) 230 | .unwrap_or_else(|| route.len()); 231 | let potential_sep = route.chars().nth(segment_start); 232 | let sep_and_segment = match potential_sep { 233 | Some(sep) if predicate(sep) => (Some(sep), &route[segment_start + 1..segment_end]), 234 | _ => (None, &route[segment_start..segment_end]), 235 | }; 236 | 237 | segments.push(sep_and_segment); 238 | segment_start = segment_end; 239 | } 240 | 241 | segments 242 | } 243 | 244 | impl<T> Router<T> { 245 | /// Create a new instance of `Router`. 246 | pub fn new() -> Self { 247 | Self { 248 | nfa: NFA::new(), 249 | handlers: BTreeMap::new(), 250 | } 251 | } 252 | 253 | /// Add a route to the router. 
254 | pub fn add(&mut self, mut route: &str, dest: T) { 255 | if !route.is_empty() && route.as_bytes()[0] == b'/' { 256 | route = &route[1..]; 257 | } 258 | 259 | let nfa = &mut self.nfa; 260 | let mut state = 0; 261 | let mut metadata = Metadata::new(); 262 | 263 | for (separator, segment) in segments(route) { 264 | if let Some(separator) = separator { 265 | state = nfa.put(state, CharacterClass::valid_char(separator)); 266 | } 267 | 268 | if !segment.is_empty() && segment.as_bytes()[0] == b':' { 269 | state = process_dynamic_segment(nfa, state); 270 | metadata.dynamics += 1; 271 | metadata.param_names.push(segment[1..].to_string()); 272 | } else if !segment.is_empty() && segment.as_bytes()[0] == b'*' { 273 | state = process_star_state(nfa, state); 274 | metadata.wildcards += 1; 275 | metadata.param_names.push(segment[1..].to_string()); 276 | } else { 277 | state = process_static_segment(segment, nfa, state); 278 | metadata.statics += 1; 279 | } 280 | } 281 | 282 | nfa.acceptance(state); 283 | nfa.metadata(state, metadata); 284 | self.handlers.insert(state, dest); 285 | } 286 | 287 | /// Match a route on the router. 
288 | pub fn recognize(&self, mut path: &str) -> Result<Match<&T>, String> { 289 | if !path.is_empty() && path.as_bytes()[0] == b'/' { 290 | path = &path[1..]; 291 | } 292 | 293 | let nfa = &self.nfa; 294 | let result = nfa.process(path, |index| nfa.get(index).metadata.as_ref().unwrap()); 295 | 296 | match result { 297 | Ok(nfa_match) => { 298 | let mut map = Params::new(); 299 | let state = &nfa.get(nfa_match.state); 300 | let metadata = state.metadata.as_ref().unwrap(); 301 | let param_names = metadata.param_names.clone(); 302 | 303 | for (i, capture) in nfa_match.captures.iter().enumerate() { 304 | if !param_names[i].is_empty() { 305 | map.insert(param_names[i].to_string(), capture.to_string()); 306 | } 307 | } 308 | 309 | let handler = self.handlers.get(&nfa_match.state).unwrap(); 310 | Ok(Match::new(handler, map)) 311 | } 312 | Err(str) => Err(str), 313 | } 314 | } 315 | } 316 | 317 | impl<T> Default for Router<T> { 318 | fn default() -> Self { 319 | Self::new() 320 | } 321 | } 322 | 323 | fn process_static_segment(segment: &str, nfa: &mut NFA<Metadata>, mut state: usize) -> usize { 324 | for char in segment.chars() { 325 | state = nfa.put(state, CharacterClass::valid_char(char)); 326 | } 327 | 328 | state 329 | } 330 | 331 | fn process_dynamic_segment(nfa: &mut NFA<Metadata>, mut state: usize) -> usize { 332 | state = nfa.put(state, CharacterClass::invalid_char('/')); 333 | nfa.put_state(state, state); 334 | nfa.start_capture(state); 335 | nfa.end_capture(state); 336 | 337 | state 338 | } 339 | 340 | fn process_star_state(nfa: &mut NFA<Metadata>, mut state: usize) -> usize { 341 | state = nfa.put(state, CharacterClass::any()); 342 | nfa.put_state(state, state); 343 | nfa.start_capture(state); 344 | nfa.end_capture(state); 345 | 346 | state 347 | } 348 | 349 | #[cfg(test)] 350 | mod tests { 351 | use super::{Params, Router}; 352 | 353 | #[test] 354 | fn basic_router() { 355 | let mut router = Router::new(); 356 | 357 | router.add("/thomas", "Thomas".to_string()); 358 | router.add("/tom", 
"Tom".to_string()); 359 | router.add("/wycats", "Yehuda".to_string()); 360 | 361 | let m = router.recognize("/thomas").unwrap(); 362 | 363 | assert_eq!(*m.handler, "Thomas".to_string()); 364 | assert_eq!(m.params, Params::new()); 365 | } 366 | 367 | #[test] 368 | fn root_router() { 369 | let mut router = Router::new(); 370 | router.add("/", 10); 371 | assert_eq!(*router.recognize("/").unwrap().handler, 10) 372 | } 373 | 374 | #[test] 375 | fn empty_path() { 376 | let mut router = Router::new(); 377 | router.add("/", 12); 378 | assert_eq!(*router.recognize("").unwrap().handler, 12) 379 | } 380 | 381 | #[test] 382 | fn empty_route() { 383 | let mut router = Router::new(); 384 | router.add("", 12); 385 | assert_eq!(*router.recognize("/").unwrap().handler, 12) 386 | } 387 | 388 | #[test] 389 | fn ambiguous_router() { 390 | let mut router = Router::new(); 391 | 392 | router.add("/posts/new", "new".to_string()); 393 | router.add("/posts/:id", "id".to_string()); 394 | 395 | let id = router.recognize("/posts/1").unwrap(); 396 | 397 | assert_eq!(*id.handler, "id".to_string()); 398 | assert_eq!(id.params, params("id", "1")); 399 | 400 | let new = router.recognize("/posts/new").unwrap(); 401 | assert_eq!(*new.handler, "new".to_string()); 402 | assert_eq!(new.params, Params::new()); 403 | } 404 | 405 | #[test] 406 | fn ambiguous_router_b() { 407 | let mut router = Router::new(); 408 | 409 | router.add("/posts/:id", "id".to_string()); 410 | router.add("/posts/new", "new".to_string()); 411 | 412 | let id = router.recognize("/posts/1").unwrap(); 413 | 414 | assert_eq!(*id.handler, "id".to_string()); 415 | assert_eq!(id.params, params("id", "1")); 416 | 417 | let new = router.recognize("/posts/new").unwrap(); 418 | assert_eq!(*new.handler, "new".to_string()); 419 | assert_eq!(new.params, Params::new()); 420 | } 421 | 422 | #[test] 423 | fn multiple_params() { 424 | let mut router = Router::new(); 425 | 426 | router.add("/posts/:post_id/comments/:id", "comment".to_string()); 427 | 
router.add("/posts/:post_id/comments", "comments".to_string()); 428 | 429 | let com = router.recognize("/posts/12/comments/100").unwrap(); 430 | let coms = router.recognize("/posts/12/comments").unwrap(); 431 | 432 | assert_eq!(*com.handler, "comment".to_string()); 433 | assert_eq!(com.params, two_params("post_id", "12", "id", "100")); 434 | 435 | assert_eq!(*coms.handler, "comments".to_string()); 436 | assert_eq!(coms.params, params("post_id", "12")); 437 | assert_eq!(coms.params["post_id"], "12".to_string()); 438 | } 439 | 440 | #[test] 441 | fn wildcard() { 442 | let mut router = Router::new(); 443 | 444 | router.add("*foo", "test".to_string()); 445 | router.add("/bar/*foo", "test2".to_string()); 446 | 447 | let m = router.recognize("/test").unwrap(); 448 | assert_eq!(*m.handler, "test".to_string()); 449 | assert_eq!(m.params, params("foo", "test")); 450 | 451 | let m = router.recognize("/foo/bar").unwrap(); 452 | assert_eq!(*m.handler, "test".to_string()); 453 | assert_eq!(m.params, params("foo", "foo/bar")); 454 | 455 | let m = router.recognize("/bar/foo").unwrap(); 456 | assert_eq!(*m.handler, "test2".to_string()); 457 | assert_eq!(m.params, params("foo", "foo")); 458 | } 459 | 460 | #[test] 461 | fn wildcard_colon() { 462 | let mut router = Router::new(); 463 | 464 | router.add("/a/*b", "ab".to_string()); 465 | router.add("/a/*b/c", "abc".to_string()); 466 | router.add("/a/*b/c/:d", "abcd".to_string()); 467 | 468 | let m = router.recognize("/a/foo").unwrap(); 469 | assert_eq!(*m.handler, "ab".to_string()); 470 | assert_eq!(m.params, params("b", "foo")); 471 | 472 | let m = router.recognize("/a/foo/bar").unwrap(); 473 | assert_eq!(*m.handler, "ab".to_string()); 474 | assert_eq!(m.params, params("b", "foo/bar")); 475 | 476 | let m = router.recognize("/a/foo/c").unwrap(); 477 | assert_eq!(*m.handler, "abc".to_string()); 478 | assert_eq!(m.params, params("b", "foo")); 479 | 480 | let m = router.recognize("/a/foo/bar/c").unwrap(); 481 | assert_eq!(*m.handler, 
"abc".to_string()); 482 | assert_eq!(m.params, params("b", "foo/bar")); 483 | 484 | let m = router.recognize("/a/foo/c/baz").unwrap(); 485 | assert_eq!(*m.handler, "abcd".to_string()); 486 | assert_eq!(m.params, two_params("b", "foo", "d", "baz")); 487 | 488 | let m = router.recognize("/a/foo/bar/c/baz").unwrap(); 489 | assert_eq!(*m.handler, "abcd".to_string()); 490 | assert_eq!(m.params, two_params("b", "foo/bar", "d", "baz")); 491 | 492 | let m = router.recognize("/a/foo/bar/c/baz/bay").unwrap(); 493 | assert_eq!(*m.handler, "ab".to_string()); 494 | assert_eq!(m.params, params("b", "foo/bar/c/baz/bay")); 495 | } 496 | 497 | #[test] 498 | fn unnamed_parameters() { 499 | let mut router = Router::new(); 500 | 501 | router.add("/foo/:/bar", "test".to_string()); 502 | router.add("/foo/:bar/*", "test2".to_string()); 503 | let m = router.recognize("/foo/test/bar").unwrap(); 504 | assert_eq!(*m.handler, "test"); 505 | assert_eq!(m.params, Params::new()); 506 | 507 | let m = router.recognize("/foo/test/blah").unwrap(); 508 | assert_eq!(*m.handler, "test2"); 509 | assert_eq!(m.params, params("bar", "test")); 510 | } 511 | 512 | fn params(key: &str, val: &str) -> Params { 513 | let mut map = Params::new(); 514 | map.insert(key.to_string(), val.to_string()); 515 | map 516 | } 517 | 518 | fn two_params(k1: &str, v1: &str, k2: &str, v2: &str) -> Params { 519 | let mut map = Params::new(); 520 | map.insert(k1.to_string(), v1.to_string()); 521 | map.insert(k2.to_string(), v2.to_string()); 522 | map 523 | } 524 | 525 | #[test] 526 | fn dot() { 527 | let mut router = Router::new(); 528 | router.add("/1/baz.:wibble", ()); 529 | router.add("/2/:bar.baz", ()); 530 | router.add("/3/:dynamic.:extension", ()); 531 | router.add("/4/static.static", ()); 532 | 533 | let m = router.recognize("/1/baz.jpg").unwrap(); 534 | assert_eq!(m.params, params("wibble", "jpg")); 535 | 536 | let m = router.recognize("/2/test.baz").unwrap(); 537 | assert_eq!(m.params, params("bar", "test")); 538 | 539 | 
let m = router.recognize("/3/any.thing").unwrap(); 540 | assert_eq!(m.params, two_params("dynamic", "any", "extension", "thing")); 541 | 542 | let m = router.recognize("/3/this.performs.a.greedy.match").unwrap(); 543 | assert_eq!( 544 | m.params, 545 | two_params("dynamic", "this.performs.a.greedy", "extension", "match") 546 | ); 547 | 548 | let m = router.recognize("/4/static.static").unwrap(); 549 | assert_eq!(m.params, Params::new()); 550 | 551 | let m = router.recognize("/4/static/static"); 552 | assert!(m.is_err()); 553 | 554 | let m = router.recognize("/4.static.static"); 555 | assert!(m.is_err()); 556 | } 557 | 558 | #[test] 559 | fn test_chinese() { 560 | let mut router = Router::new(); 561 | router.add("/crates/:foo/:bar", "Hello".to_string()); 562 | 563 | let m = router.recognize("/crates/实打实打算/d's'd").unwrap(); 564 | assert_eq!(m.handler().as_str(), "Hello"); 565 | assert_eq!(m.params().find("foo"), Some("实打实打算")); 566 | assert_eq!(m.params().find("bar"), Some("d's'd")); 567 | } 568 | } 569 | -------------------------------------------------------------------------------- /src/nfa.rs: -------------------------------------------------------------------------------- 1 | use std::collections::HashSet; 2 | 3 | use self::CharacterClass::{Ascii, InvalidChars, ValidChars}; 4 | 5 | #[derive(PartialEq, Eq, Clone, Default, Debug)] 6 | pub struct CharSet { 7 | low_mask: u64, 8 | high_mask: u64, 9 | non_ascii: HashSet<char>, 10 | } 11 | 12 | impl CharSet { 13 | pub fn new() -> Self { 14 | Self { 15 | low_mask: 0, 16 | high_mask: 0, 17 | non_ascii: HashSet::new(), 18 | } 19 | } 20 | 21 | pub fn insert(&mut self, char: char) { 22 | let val = char as u32 - 1; 23 | 24 | if val > 127 { 25 | self.non_ascii.insert(char); 26 | } else if val > 63 { 27 | let bit = 1 << (val - 64); 28 | self.high_mask |= bit; 29 | } else { 30 | let bit = 1 << val; 31 | self.low_mask |= bit; 32 | } 33 | } 34 | 35 | pub fn contains(&self, char: char) -> bool { 36 | let val = char as u32 - 1; 37 | 38 | 
        if val > 127 {
            self.non_ascii.contains(&char)
        } else if val > 63 {
            let bit = 1 << (val - 64);
            self.high_mask & bit != 0
        } else {
            let bit = 1 << val;
            self.low_mask & bit != 0
        }
    }
}

#[derive(PartialEq, Eq, Clone, Debug)]
pub enum CharacterClass {
    Ascii(u64, u64, bool),
    ValidChars(CharSet),
    InvalidChars(CharSet),
}

impl CharacterClass {
    pub fn any() -> Self {
        Ascii(u64::max_value(), u64::max_value(), true)
    }

    pub fn valid(string: &str) -> Self {
        ValidChars(Self::str_to_set(string))
    }

    pub fn invalid(string: &str) -> Self {
        InvalidChars(Self::str_to_set(string))
    }

    pub fn valid_char(char: char) -> Self {
        let val = char as u32 - 1;

        if val > 127 {
            ValidChars(Self::char_to_set(char))
        } else if val > 63 {
            Ascii(1 << (val - 64), 0, false)
        } else {
            Ascii(0, 1 << val, false)
        }
    }

    pub fn invalid_char(char: char) -> Self {
        let val = char as u32 - 1;

        if val > 127 {
            InvalidChars(Self::char_to_set(char))
        } else if val > 63 {
            Ascii(u64::max_value() ^ (1 << (val - 64)), u64::max_value(), true)
        } else {
            Ascii(u64::max_value(), u64::max_value() ^ (1 << val), true)
        }
    }

    pub fn matches(&self, char: char) -> bool {
        match *self {
            ValidChars(ref valid) => valid.contains(char),
            InvalidChars(ref invalid) => !invalid.contains(char),
            Ascii(high, low, unicode) => {
                let val = char as u32 - 1;
                if val > 127 {
                    unicode
                } else if val > 63 {
                    high & (1 << (val - 64)) != 0
                } else {
                    low & (1 << val) != 0
                }
            }
        }
    }

    fn char_to_set(char: char) -> CharSet {
        let mut set = CharSet::new();
        set.insert(char);
        set
    }

    fn str_to_set(string: &str) -> CharSet {
        let mut set = CharSet::new();
        for char in string.chars() {
            set.insert(char);
        }
        set
    }
}

#[derive(Clone)]
struct Thread {
    state: usize,
    captures: Vec<(usize, usize)>,
    capture_begin: Option<usize>,
}

impl Thread {
    pub(crate) fn new() -> Self {
        Self {
            state: 0,
            captures: Vec::new(),
            capture_begin: None,
        }
    }

    #[inline]
    pub(crate) fn start_capture(&mut self, start: usize) {
        self.capture_begin = Some(start);
    }

    #[inline]
    pub(crate) fn end_capture(&mut self, end: usize) {
        self.captures.push((self.capture_begin.unwrap(), end));
        self.capture_begin = None;
    }

    pub(crate) fn extract<'a>(&self, source: &'a str) -> Vec<&'a str> {
        self.captures
            .iter()
            .map(|&(begin, end)| &source[begin..end])
            .collect()
    }
}

#[derive(Clone, Debug)]
pub struct State<T> {
    pub index: usize,
    pub chars: CharacterClass,
    pub next_states: Vec<usize>,
    pub acceptance: bool,
    pub start_capture: bool,
    pub end_capture: bool,
    pub metadata: Option<T>,
}

impl<T> PartialEq for State<T> {
    fn eq(&self, other: &Self) -> bool {
        self.index == other.index
    }
}

impl<T> State<T> {
    pub fn new(index: usize, chars: CharacterClass) -> Self {
        Self {
            index,
            chars,
            next_states: Vec::new(),
            acceptance: false,
            start_capture: false,
            end_capture: false,
            metadata: None,
        }
    }
}

#[derive(Debug)]
pub struct Match<'a> {
    pub state: usize,
    pub captures: Vec<&'a str>,
}

impl<'a> Match<'a> {
    pub fn new(state: usize, captures: Vec<&'_ str>) -> Match<'_> {
        Match { state, captures }
    }
}

#[derive(Clone, Default, Debug)]
pub struct NFA<T> {
    states: Vec<State<T>>,
    start_capture: Vec<bool>,
    end_capture: Vec<bool>,
    acceptance: Vec<bool>,
}

impl<T> NFA<T> {
    pub fn new() -> Self {
        let root = State::new(0, CharacterClass::any());
        Self {
            states: vec![root],
            start_capture: vec![false],
            end_capture: vec![false],
            acceptance: vec![false],
        }
    }

    pub fn process<'a, I, F>(&self, string: &'a str, mut ord: F) -> Result<Match<'a>, String>
    where
        I: Ord,
        F: FnMut(usize) -> I,
    {
        let mut threads = vec![Thread::new()];

        for (i, char) in string.char_indices() {
            let next_threads = self.process_char(threads, char, i);

            if next_threads.is_empty() {
                return Err(format!("Couldn't process {}", string));
            }

            threads = next_threads;
        }

        let returned = threads
            .into_iter()
            .filter(|thread| self.get(thread.state).acceptance);

        let thread = returned
            .fold(None, |prev, y| {
                let y_v = ord(y.state);
                match prev {
                    None => Some((y_v, y)),
                    Some((x_v, x)) => {
                        if x_v < y_v {
                            Some((y_v, y))
                        } else {
                            Some((x_v, x))
                        }
                    }
                }
            })
            .map(|p| p.1);

        match thread {
            None => Err("The string was exhausted before reaching an \
                         acceptance state"
                .to_string()),
            Some(mut thread) => {
                if thread.capture_begin.is_some() {
                    thread.end_capture(string.len());
                }
                let state = self.get(thread.state);
                Ok(Match::new(state.index, thread.extract(string)))
            }
        }
    }

    #[inline]
    fn process_char(&self, threads: Vec<Thread>, char: char, pos: usize) -> Vec<Thread> {
        let mut returned = Vec::with_capacity(threads.len());

        for mut thread in threads {
            let current_state = self.get(thread.state);

            let mut count = 0;
            let mut found_state = 0;

            for &index in &current_state.next_states {
                let state = &self.states[index];

                if state.chars.matches(char) {
                    count += 1;
                    found_state = index;
                }
            }

            if count == 1 {
                thread.state = found_state;
                capture(self, &mut thread, current_state.index, found_state, pos);
                returned.push(thread);
                continue;
            }

            for &index in &current_state.next_states {
                let state = &self.states[index];
                if state.chars.matches(char) {
                    let mut thread = fork_thread(&thread, state);
                    capture(self, &mut thread, current_state.index, index, pos);
                    returned.push(thread);
                }
            }
        }

        returned
    }

    #[inline]
    pub fn get(&self, state: usize) -> &State<T> {
        &self.states[state]
    }

    pub fn get_mut(&mut self, state: usize) -> &mut State<T> {
        &mut self.states[state]
    }

    pub fn put(&mut self, index: usize, chars: CharacterClass) -> usize {
        {
            let state = self.get(index);

            for &index in &state.next_states {
                let state = self.get(index);
                if state.chars == chars {
                    return index;
                }
            }
        }

        let state = self.new_state(chars);
        self.get_mut(index).next_states.push(state);
        state
    }

    pub fn put_state(&mut self, index: usize, child: usize) {
        if !self.states[index].next_states.contains(&child) {
            self.get_mut(index).next_states.push(child);
        }
    }

    pub fn acceptance(&mut self, index: usize) {
        self.get_mut(index).acceptance = true;
        self.acceptance[index] = true;
    }

    pub fn start_capture(&mut self, index: usize) {
        self.get_mut(index).start_capture = true;
        self.start_capture[index] = true;
    }

    pub fn end_capture(&mut self, index: usize) {
        self.get_mut(index).end_capture = true;
        self.end_capture[index] = true;
    }

    pub fn metadata(&mut self, index: usize, metadata: T) {
        self.get_mut(index).metadata = Some(metadata);
    }

    fn new_state(&mut self, chars: CharacterClass) -> usize {
        let index = self.states.len();
        let state = State::new(index, chars);
        self.states.push(state);

        self.acceptance.push(false);
        self.start_capture.push(false);
        self.end_capture.push(false);

        index
    }
}

#[inline]
fn fork_thread<T>(thread: &Thread, state: &State<T>) -> Thread {
    let mut new_trace = thread.clone();
    new_trace.state = state.index;
    new_trace
}

#[inline]
fn capture<T>(
    nfa: &NFA<T>,
    thread: &mut Thread,
    current_state: usize,
    next_state: usize,
    pos: usize,
) {
    if thread.capture_begin == None && nfa.start_capture[next_state] {
        thread.start_capture(pos);
    }

    if thread.capture_begin != None && nfa.end_capture[current_state] && next_state > current_state
    {
        thread.end_capture(pos);
    }
}

#[cfg(test)]
mod tests {
    use super::{CharSet, CharacterClass, NFA};

    #[test]
    fn basic_test() {
        let mut nfa = NFA::<()>::new();
        let a = nfa.put(0, CharacterClass::valid("h"));
        let b = nfa.put(a, CharacterClass::valid("e"));
        let c = nfa.put(b, CharacterClass::valid("l"));
        let d = nfa.put(c, CharacterClass::valid("l"));
        let e = nfa.put(d, CharacterClass::valid("o"));
        nfa.acceptance(e);

        let m = nfa.process("hello", |a| a);

        assert!(
            m.unwrap().state == e,
            "You didn't get the right final state"
        );
    }

    #[test]
    fn multiple_solutions() {
        let mut nfa = NFA::<()>::new();
        let a1 = nfa.put(0, CharacterClass::valid("n"));
        let b1 = nfa.put(a1, CharacterClass::valid("e"));
        let c1 = nfa.put(b1, CharacterClass::valid("w"));
        nfa.acceptance(c1);

        let a2 = nfa.put(0, CharacterClass::invalid(""));
        let b2 = nfa.put(a2, CharacterClass::invalid(""));
        let c2 = nfa.put(b2, CharacterClass::invalid(""));
        nfa.acceptance(c2);

        let m = nfa.process("new", |a| a);

        assert!(m.unwrap().state == c2, "The two states were not found");
    }

    #[test]
    fn multiple_paths() {
        let mut nfa = NFA::<()>::new();
        let a = nfa.put(0, CharacterClass::valid("t")); // t
        let b1 = nfa.put(a, CharacterClass::valid("h")); // th
        let c1 = nfa.put(b1, CharacterClass::valid("o")); // tho
        let d1 = nfa.put(c1, CharacterClass::valid("m")); // thom
        let e1 = nfa.put(d1, CharacterClass::valid("a")); // thoma
        let f1 = nfa.put(e1, CharacterClass::valid("s")); // thomas

        let b2 = nfa.put(a, CharacterClass::valid("o")); // to
        let c2 = nfa.put(b2, CharacterClass::valid("m")); // tom

        nfa.acceptance(f1);
        nfa.acceptance(c2);

        let thomas = nfa.process("thomas", |a| a);
        let tom = nfa.process("tom", |a| a);
        let thom = nfa.process("thom", |a| a);
        let nope = nfa.process("nope", |a| a);

        assert!(thomas.unwrap().state == f1, "thomas was parsed correctly");
        assert!(tom.unwrap().state == c2, "tom was parsed correctly");
        assert!(thom.is_err(), "thom didn't reach an acceptance state");
        assert!(nope.is_err(), "nope wasn't parsed");
    }

    #[test]
    fn repetitions() {
        let mut nfa = NFA::<()>::new();
        let a = nfa.put(0, CharacterClass::valid("p")); // p
        let b = nfa.put(a, CharacterClass::valid("o")); // po
        let c = nfa.put(b, CharacterClass::valid("s")); // pos
        let d = nfa.put(c, CharacterClass::valid("t")); // post
        let e = nfa.put(d, CharacterClass::valid("s")); // posts
        let f = nfa.put(e, CharacterClass::valid("/")); // posts/
        let g = nfa.put(f, CharacterClass::invalid("/")); // posts/[^/]
        nfa.put_state(g, g);

        nfa.acceptance(g);

        let post = nfa.process("posts/1", |a| a);
        let new_post = nfa.process("posts/new", |a| a);
        let invalid = nfa.process("posts/", |a| a);

        assert!(post.unwrap().state == g, "posts/1 was parsed");
        assert!(new_post.unwrap().state == g, "posts/new was parsed");
        assert!(invalid.is_err(), "posts/ was invalid");
    }

    #[test]
    fn repetitions_with_ambiguous() {
        let mut nfa = NFA::<()>::new();
        let a = nfa.put(0, CharacterClass::valid("p")); // p
        let b = nfa.put(a, CharacterClass::valid("o")); // po
        let c = nfa.put(b, CharacterClass::valid("s")); // pos
        let d = nfa.put(c, CharacterClass::valid("t")); // post
        let e = nfa.put(d, CharacterClass::valid("s")); // posts
        let f = nfa.put(e, CharacterClass::valid("/")); // posts/
        let g1 = nfa.put(f, CharacterClass::invalid("/")); // posts/[^/]
        let g2 = nfa.put(f, CharacterClass::valid("n")); // posts/n
        let h2 = nfa.put(g2, CharacterClass::valid("e")); // posts/ne
        let i2 = nfa.put(h2, CharacterClass::valid("w")); // posts/new

        nfa.put_state(g1, g1);

        nfa.acceptance(g1);
        nfa.acceptance(i2);

        let post = nfa.process("posts/1", |a| a);
        let ambiguous = nfa.process("posts/new", |a| a);
        let invalid = nfa.process("posts/", |a| a);

        assert!(post.unwrap().state == g1, "posts/1 was parsed");
        assert!(ambiguous.unwrap().state == i2, "posts/new was ambiguous");
        assert!(invalid.is_err(), "posts/ was invalid");
    }

    #[test]
    fn captures() {
        let mut nfa = NFA::<()>::new();
        let a = nfa.put(0, CharacterClass::valid("n"));
        let b = nfa.put(a, CharacterClass::valid("e"));
        let c = nfa.put(b, CharacterClass::valid("w"));

        nfa.acceptance(c);
        nfa.start_capture(a);
        nfa.end_capture(c);

        let post = nfa.process("new", |a| a);

        assert_eq!(post.unwrap().captures, vec!["new"]);
    }

    #[test]
    fn capture_mid_match() {
        let mut nfa = NFA::<()>::new();
        let a = nfa.put(0, valid('p'));
        let b = nfa.put(a, valid('/'));
        let c = nfa.put(b, invalid('/'));
        let d = nfa.put(c, valid('/'));
        let e = nfa.put(d, valid('c'));

        nfa.put_state(c, c);
        nfa.acceptance(e);
        nfa.start_capture(c);
        nfa.end_capture(c);

        let post = nfa.process("p/123/c", |a| a);

        assert_eq!(post.unwrap().captures, vec!["123"]);
    }

    #[test]
    fn capture_multiple_captures() {
        let mut nfa = NFA::<()>::new();
        let a = nfa.put(0, valid('p'));
        let b = nfa.put(a, valid('/'));
        let c = nfa.put(b, invalid('/'));
        let d = nfa.put(c, valid('/'));
        let e = nfa.put(d, valid('c'));
        let f = nfa.put(e, valid('/'));
        let g = nfa.put(f, invalid('/'));

        nfa.put_state(c, c);
        nfa.put_state(g, g);
        nfa.acceptance(g);

        nfa.start_capture(c);
        nfa.end_capture(c);

        nfa.start_capture(g);
        nfa.end_capture(g);

        let post = nfa.process("p/123/c/456", |a| a);
        assert_eq!(post.unwrap().captures, vec!["123", "456"]);
    }

    #[test]
    fn test_ascii_set() {
        let mut set = CharSet::new();
        set.insert('?');
        set.insert('a');
        set.insert('é');

        assert!(set.contains('?'), "The set contains char 63");
        assert!(set.contains('a'), "The set contains char 97");
        assert!(set.contains('é'), "The set contains char 233");
        assert!(!set.contains('q'), "The set does not contain q");
        assert!(!set.contains('ü'), "The set does not contain ü");
    }

    fn valid(char: char) -> CharacterClass {
        CharacterClass::valid_char(char)
    }

    fn invalid(char: char) -> CharacterClass {
        CharacterClass::invalid_char(char)
    }
}

--------------------------------------------------------------------------------
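The `CharSet` type in `src/nfa.rs` keeps membership for low code points in two 64-bit masks and falls back to a `HashSet` for everything else. The following is a minimal standalone sketch of that idea, not the crate's code: `SketchCharSet` is a hypothetical name, and unlike the original it indexes bits from zero instead of subtracting 1 from the code point, so inserting `'\0'` involves no subtraction.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for `CharSet` (an assumption for illustration only).
/// Code points 0..=63 live in `low_mask`, 64..=127 in `high_mask`,
/// and anything above 127 goes into `non_ascii`.
#[derive(Default, Debug)]
pub struct SketchCharSet {
    low_mask: u64,
    high_mask: u64,
    non_ascii: HashSet<char>,
}

impl SketchCharSet {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record `c` as a member of the set.
    pub fn insert(&mut self, c: char) {
        let val = c as u32;
        if val > 127 {
            self.non_ascii.insert(c);
        } else if val > 63 {
            // Bit (val - 64) of the high mask represents code point `val`.
            self.high_mask |= 1u64 << (val - 64);
        } else {
            // Bit `val` of the low mask represents code point `val`.
            self.low_mask |= 1u64 << val;
        }
    }

    /// Report whether `c` was previously inserted.
    pub fn contains(&self, c: char) -> bool {
        let val = c as u32;
        if val > 127 {
            self.non_ascii.contains(&c)
        } else if val > 63 {
            self.high_mask & (1u64 << (val - 64)) != 0
        } else {
            self.low_mask & (1u64 << val) != 0
        }
    }
}
```

With this layout an ASCII lookup is a shift and a mask, and only non-ASCII characters pay for hashing. Inserting `'?'`, `'a'`, and `'é'` and then querying `'q'` or `'ü'` mirrors the `test_ascii_set` test above.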