├── .github └── workflows │ └── test.yml ├── .gitignore ├── CODE_OF_CONDUCT.md ├── Cargo.toml ├── LICENSE ├── README.md ├── assets ├── logo.png └── vulnreach.webp ├── bench ├── Cargo.toml ├── packages └── src │ └── main.rs ├── rustfmt.toml ├── vuln-reach-cli ├── Cargo.toml ├── README.md ├── sample.toml ├── src │ └── main.rs └── tests │ └── fixtures │ └── jest-environment-jsdom-28.1.3.toml └── vuln-reach ├── Cargo.toml ├── build.rs └── src ├── javascript ├── lang │ ├── accesses.rs │ ├── exports.rs │ ├── imports.rs │ ├── mod.rs │ └── symbol_table.rs ├── mod.rs ├── module │ ├── mod.rs │ ├── module_cache.rs │ └── resolver │ │ ├── fs.rs │ │ ├── mem.rs │ │ ├── mod.rs │ │ └── tgz.rs ├── package │ ├── mod.rs │ ├── reachability.rs │ └── resolver.rs ├── project │ └── mod.rs └── queries │ ├── commonjs-exports.lsp │ ├── commonjs-imports.lsp │ ├── esm-exports.lsp │ └── esm-imports.lsp ├── lib.rs └── util.rs /.github/workflows/test.yml: -------------------------------------------------------------------------------- 1 | --- 2 | name: Test 3 | 4 | on: 5 | workflow_dispatch: 6 | pull_request: 7 | push: 8 | branches: 9 | - main 10 | 11 | jobs: 12 | test: 13 | runs-on: ubuntu-latest 14 | steps: 15 | - name: Checkout 16 | uses: actions/checkout@v3 17 | 18 | - name: Install Rust toolchains 19 | run: | 20 | rustup toolchain install stable --profile minimal -c clippy 21 | rustup toolchain install nightly --profile minimal -c rustfmt 22 | rustup default stable 23 | 24 | - name: Format check 25 | run: cargo +nightly fmt --all -- --check 26 | 27 | - name: Clippy 28 | run: cargo clippy -- -D warnings 29 | 30 | - name: Oldstable 31 | run: | 32 | oldstable=$(cat "./vuln-reach/Cargo.toml" | grep "rust-version" | sed 's/.*"\(.*\)".*/\1/') 33 | rustup toolchain install --profile minimal "$oldstable" 34 | cargo "+$oldstable" check 35 | 36 | - name: Tests 37 | run: cargo test 38 | -------------------------------------------------------------------------------- /.gitignore: 
-------------------------------------------------------------------------------- 1 | # Generated by Cargo 2 | # will have compiled files and executables 3 | debug/ 4 | target/ 5 | 6 | # Remove Cargo.lock from gitignore if creating an executable, leave it for libraries 7 | # More information here https://doc.rust-lang.org/cargo/guide/cargo-toml-vs-cargo-lock.html 8 | Cargo.lock 9 | 10 | # These are backup files generated by rustfmt 11 | **/*.rs.bk 12 | 13 | # MSVC Windows builds of rustc generate these, which store debugging information 14 | *.pdb 15 | 16 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Covenant Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | We as members, contributors, and leaders pledge to make participation in our 6 | community a harassment-free experience for everyone, regardless of age, body 7 | size, visible or invisible disability, ethnicity, sex characteristics, gender 8 | identity and expression, level of experience, education, socio-economic status, 9 | nationality, personal appearance, race, religion, or sexual identity 10 | and orientation. 11 | 12 | We pledge to act and interact in ways that contribute to an open, welcoming, 13 | diverse, inclusive, and healthy community. 
14 | 15 | ## Our Standards 16 | 17 | Examples of behavior that contributes to a positive environment for our 18 | community include: 19 | 20 | * Demonstrating empathy and kindness toward other people 21 | * Being respectful of differing opinions, viewpoints, and experiences 22 | * Giving and gracefully accepting constructive feedback 23 | * Accepting responsibility and apologizing to those affected by our mistakes, 24 | and learning from the experience 25 | * Focusing on what is best not just for us as individuals, but for the 26 | overall community 27 | 28 | Examples of unacceptable behavior include: 29 | 30 | * The use of sexualized language or imagery, and sexual attention or 31 | advances of any kind 32 | * Trolling, insulting or derogatory comments, and personal or political attacks 33 | * Public or private harassment 34 | * Publishing others' private information, such as a physical or email 35 | address, without their explicit permission 36 | * Other conduct which could reasonably be considered inappropriate in a 37 | professional setting 38 | 39 | ## Enforcement Responsibilities 40 | 41 | Community leaders are responsible for clarifying and enforcing our standards of 42 | acceptable behavior and will take appropriate and fair corrective action in 43 | response to any behavior that they deem inappropriate, threatening, offensive, 44 | or harmful. 45 | 46 | Community leaders have the right and responsibility to remove, edit, or reject 47 | comments, commits, code, wiki edits, issues, and other contributions that are 48 | not aligned to this Code of Conduct, and will communicate reasons for moderation 49 | decisions when appropriate. 50 | 51 | ## Scope 52 | 53 | This Code of Conduct applies within all community spaces, and also applies when 54 | an individual is officially representing the community in public spaces. 
55 | Examples of representing our community include using an official e-mail address, 56 | posting via an official social media account, or acting as an appointed 57 | representative at an online or offline event. 58 | 59 | ## Enforcement 60 | 61 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 62 | reported to the community leaders responsible for enforcement at 63 | phylum@phylum.io. 64 | All complaints will be reviewed and investigated promptly and fairly. 65 | 66 | All community leaders are obligated to respect the privacy and security of the 67 | reporter of any incident. 68 | 69 | ## Enforcement Guidelines 70 | 71 | Community leaders will follow these Community Impact Guidelines in determining 72 | the consequences for any action they deem in violation of this Code of Conduct: 73 | 74 | ### 1. Correction 75 | 76 | **Community Impact**: Use of inappropriate language or other behavior deemed 77 | unprofessional or unwelcome in the community. 78 | 79 | **Consequence**: A private, written warning from community leaders, providing 80 | clarity around the nature of the violation and an explanation of why the 81 | behavior was inappropriate. A public apology may be requested. 82 | 83 | ### 2. Warning 84 | 85 | **Community Impact**: A violation through a single incident or series 86 | of actions. 87 | 88 | **Consequence**: A warning with consequences for continued behavior. No 89 | interaction with the people involved, including unsolicited interaction with 90 | those enforcing the Code of Conduct, for a specified period of time. This 91 | includes avoiding interactions in community spaces as well as external channels 92 | like social media. Violating these terms may lead to a temporary or 93 | permanent ban. 94 | 95 | ### 3. Temporary Ban 96 | 97 | **Community Impact**: A serious violation of community standards, including 98 | sustained inappropriate behavior. 
99 | 100 | **Consequence**: A temporary ban from any sort of interaction or public 101 | communication with the community for a specified period of time. No public or 102 | private interaction with the people involved, including unsolicited interaction 103 | with those enforcing the Code of Conduct, is allowed during this period. 104 | Violating these terms may lead to a permanent ban. 105 | 106 | ### 4. Permanent Ban 107 | 108 | **Community Impact**: Demonstrating a pattern of violation of community 109 | standards, including sustained inappropriate behavior, harassment of an 110 | individual, or aggression toward or disparagement of classes of individuals. 111 | 112 | **Consequence**: A permanent ban from any sort of public interaction within 113 | the community. 114 | 115 | ## Attribution 116 | 117 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], 118 | version 2.0, available at 119 | https://www.contributor-covenant.org/version/2/0/code_of_conduct.html. 120 | 121 | Community Impact Guidelines were inspired by [Mozilla's code of conduct 122 | enforcement ladder](https://github.com/mozilla/diversity). 123 | 124 | [homepage]: https://www.contributor-covenant.org 125 | 126 | For answers to common questions about this code of conduct, see the FAQ at 127 | https://www.contributor-covenant.org/faq. Translations are available at 128 | https://www.contributor-covenant.org/translations. 
129 | -------------------------------------------------------------------------------- /Cargo.toml: -------------------------------------------------------------------------------- 1 | [workspace] 2 | resolver = "2" 3 | members = [ 4 | "vuln-reach", 5 | "vuln-reach-cli", 6 | "bench", 7 | ] 8 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 
34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. 
Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ![Vuln Reach Logo](https://github.com/phylum-dev/vuln-reach/raw/main/assets/logo.png) 2 | 3 | ![GitHub Repo stars](https://img.shields.io/github/stars/phylum-dev/vuln-reach) ![GitHub](https://img.shields.io/github/license/phylum-dev/vuln-reach) ![Discord](https://img.shields.io/discord/1070071012353376387) 4 | 5 | --- 6 | 7 | # Overview 8 | **Vuln Reach** is a library for developing tools that determine if a given vulnerability is reachable. Provided to the open source community by [Phylum](https://phylum.io) to help reduce false positives and increase signal-to-noise for software developers. 9 | 10 | 11 | 12 | # How does it work? 13 | 14 | **Vuln Reach** is a static analysis library written in Rust that leverages [`tree-sitter`](https://tree-sitter.github.io/tree-sitter/) for parsing. 15 | It currently supports Javascript. 
16 | 17 | It builds an access graph of the source code of a package and its transitive dependencies, and then uses it to search for a path to a known vulnerable identifier node. 18 | 19 | # Usage 20 | 21 | Add this to your `Cargo.toml`: 22 | ```toml 23 | [dependencies] 24 | vuln-reach = { git = "https://github.com/phylum-dev/vuln-reach" } 25 | ``` 26 | 27 | # Example 28 | 29 | Here's an example of how you can find out whether an identifier node in a package is reachable from another package. 30 | 31 | ```rust 32 | use vuln_reach::javascript::package::reachability::VulnerableNode; 33 | use vuln_reach::javascript::package::resolver::PackageResolver; 34 | use vuln_reach::javascript::package::Package; 35 | use vuln_reach::javascript::project::Project; 36 | 37 | // Build a package resolver. 38 | let package_resolver = PackageResolver::builder() 39 | .with_package("path-scurry", Package::from_tarball_path("./tarballs/path-scurry-1.6.1.tgz")) 40 | .with_package("lru-cache", Package::from_tarball_path("./tarballs/lru-cache-7.14.1.tgz")) 41 | .with_package("minipass", Package::from_tarball_path("./tarballs/minipass-4.0.2.tgz")) 42 | .build(); 43 | 44 | // Build a project from the package resolver. 45 | let project = Project::new(package_resolver); 46 | 47 | // Define a target node (rows/columns are zero-indexed). 48 | let vulnerable_node = VulnerableNode::new("lru-cache", "index.js", 1017, 17, 1017, 24); 49 | 50 | // Compute the reachability graph. 51 | let reachability = project.reachability(&vulnerable_node); 52 | 53 | // Find a path to the vulnerable node, starting from the given package. 54 | let path = reachability.find_path("path-scurry"); 55 | ``` 56 | 57 | To find out what the transitive dependencies for your project are, you can use [Phylum](https://phylum.io)! 58 | 59 | For a more complete example of usage, check out the [cli](https://github.com/phylum-dev/vuln-reach/tree/main/vuln-reach-cli). 
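The path search described under "How does it work?" can be pictured as a breadth-first search over a directed graph of accesses. The sketch below is only an illustration under stated assumptions: `AccessGraph`, `add_access`, `find_path`, and `example_graph` are hypothetical names that are not part of the `vuln-reach` API, nodes are reduced to plain string labels, and the edges in the example are made up rather than extracted from the real packages.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical, simplified access graph (not the vuln-reach data structure).
/// An edge `a -> b` means "code in `a` accesses `b`".
#[derive(Default)]
pub struct AccessGraph {
    edges: HashMap<String, Vec<String>>,
}

impl AccessGraph {
    /// Record that `from` accesses `to`.
    pub fn add_access(&mut self, from: &str, to: &str) {
        self.edges.entry(from.to_string()).or_default().push(to.to_string());
    }

    /// Breadth-first search: returns a shortest chain of accesses from
    /// `start` to `target`, or `None` if the target is unreachable.
    pub fn find_path(&self, start: &str, target: &str) -> Option<Vec<String>> {
        let mut predecessors: HashMap<&str, &str> = HashMap::new();
        let mut visited: HashSet<&str> = HashSet::from([start]);
        let mut queue: VecDeque<&str> = VecDeque::from([start]);

        while let Some(node) = queue.pop_front() {
            if node == target {
                // Walk the predecessor chain backwards, then reverse it.
                let mut path = vec![node.to_string()];
                let mut current = node;
                while let Some(&prev) = predecessors.get(current) {
                    path.push(prev.to_string());
                    current = prev;
                }
                path.reverse();
                return Some(path);
            }
            for next in self.edges.get(node).into_iter().flatten() {
                // `insert` returns false for nodes that were already seen.
                if visited.insert(next.as_str()) {
                    predecessors.insert(next.as_str(), node);
                    queue.push_back(next.as_str());
                }
            }
        }
        None
    }
}

/// Toy graph reusing the package names from the example above.
/// The edges and the node label for the vulnerable identifier are invented.
pub fn example_graph() -> AccessGraph {
    let mut graph = AccessGraph::default();
    graph.add_access("path-scurry", "lru-cache");
    graph.add_access("path-scurry", "minipass");
    graph.add_access("lru-cache", "lru-cache/index.js:1017");
    graph
}
```

Calling `example_graph().find_path("path-scurry", "lru-cache/index.js:1017")` walks from the starting package to the vulnerable node through `lru-cache`, while a search starting from `minipass` finds no path. Breadth-first search is used here because it yields a shortest chain, which tends to be the easiest one to inspect by hand; the library itself may use a different strategy.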
60 | 61 | # Contributing 62 | 63 | ## How do you add support for additional languages? 64 | 65 | At the moment, the codebase is relatively tightly coupled to Javascript. Plans are underway to abstract the non-language-specific bits to be used by all languages. 66 | 67 | Adding support for a new language requires the following steps: 68 | - Add the relevant tree-sitter parser to [`build.rs`](https://github.com/phylum-dev/vuln-reach/blob/main/vuln-reach/build.rs). 69 | - Create a module directory for your language in the [top level](https://github.com/phylum-dev/vuln-reach/blob/main/vuln-reach/src) of the `vuln-reach` package. 70 | - Implement abstractions for the language's imports and exports. 71 | - Implement [the concept of access](https://github.com/phylum-dev/vuln-reach/blob/main/vuln-reach/src/javascript/lang/accesses.rs) for your language -- this could be as simple as being equivalent to "function call" or as complex as necessary. 72 | 73 | # Commercial Licensing 74 | If you're interested in using `vuln reach` in a commercial project and need a different licensing agreement, please reach out to partnerships@phylum.io. 
75 | -------------------------------------------------------------------------------- /assets/logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/phylum-dev/vuln-reach/38a042035d8143fc510e561f9a3d89ccb06039c5/assets/logo.png -------------------------------------------------------------------------------- /assets/vulnreach.webp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/phylum-dev/vuln-reach/38a042035d8143fc510e561f9a3d89ccb06039c5/assets/vulnreach.webp -------------------------------------------------------------------------------- /bench/Cargo.toml: -------------------------------------------------------------------------------- 1 | [package] 2 | name = "bench" 3 | version = "0.1.0" 4 | edition = "2021" 5 | 6 | [dependencies] 7 | vuln-reach = { path = "../vuln-reach" } 8 | vuln-reach-upstream = { package = "vuln-reach", git = "https://github.com/phylum-dev/vuln-reach" } 9 | tokio = { version = "1.28.1", features = ["macros", "rt-multi-thread"] } 10 | reqwest = "0.11.18" 11 | bytes = "1.4.0" 12 | -------------------------------------------------------------------------------- /bench/packages: -------------------------------------------------------------------------------- 1 | https://registry.npmjs.org/codemirror/-/codemirror-5.5.0.tgz 2 | https://registry.npmjs.org/mongoose/-/mongoose-7.2.1.tgz 3 | https://registry.npmjs.org/core-js/-/core-js-3.30.2.tgz 4 | https://registry.npmjs.org/express/-/express-4.17.1.tgz 5 | https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz 6 | https://registry.npmjs.org/react/-/react-17.0.2.tgz 7 | https://registry.npmjs.org/axios/-/axios-0.21.1.tgz 8 | https://registry.npmjs.org/moment/-/moment-2.29.1.tgz 9 | https://registry.npmjs.org/dotenv/-/dotenv-10.0.0.tgz 10 | https://registry.npmjs.org/nodemon/-/nodemon-2.0.12.tgz 11 | https://registry.npmjs.org/mongoose/-/mongoose-6.0.5.tgz 12 | 
https://registry.npmjs.org/request/-/request-2.88.2.tgz 13 | https://registry.npmjs.org/cors/-/cors-2.8.5.tgz 14 | https://registry.npmjs.org/bcrypt/-/bcrypt-5.0.1.tgz 15 | https://registry.npmjs.org/socket.io/-/socket.io-4.1.2.tgz 16 | https://registry.npmjs.org/multer/-/multer-1.4.2.tgz 17 | https://registry.npmjs.org/joi/-/joi-17.4.0.tgz 18 | https://registry.npmjs.org/chalk/-/chalk-4.1.2.tgz 19 | https://registry.npmjs.org/fs-extra/-/fs-extra-10.0.0.tgz 20 | https://registry.npmjs.org/winston/-/winston-3.3.3.tgz 21 | https://registry.npmjs.org/helmet/-/helmet-4.6.0.tgz 22 | https://registry.npmjs.org/mysql/-/mysql-2.18.1.tgz 23 | https://registry.npmjs.org/jsonwebtoken/-/jsonwebtoken-8.5.1.tgz 24 | https://registry.npmjs.org/react-dom/-/react-dom-17.0.2.tgz 25 | https://registry.npmjs.org/jest/-/jest-27.0.6.tgz 26 | https://registry.npmjs.org/debug/-/debug-4.3.1.tgz 27 | https://registry.npmjs.org/axios-mock-adapter/-/axios-mock-adapter-1.19.0.tgz 28 | https://registry.npmjs.org/react-router-dom/-/react-router-dom-5.3.0.tgz 29 | https://registry.npmjs.org/firebase/-/firebase-8.8.1.tgz 30 | https://registry.npmjs.org/passport/-/passport-0.4.1.tgz 31 | https://registry.npmjs.org/morgan/-/morgan-1.10.0.tgz 32 | https://registry.npmjs.org/pg/-/pg-8.7.1.tgz 33 | https://registry.npmjs.org/prop-types/-/prop-types-15.7.2.tgz 34 | https://registry.npmjs.org/react-redux/-/react-redux-7.2.5.tgz 35 | https://registry.npmjs.org/nodemailer/-/nodemailer-6.6.3.tgz 36 | https://registry.npmjs.org/redux/-/redux-4.1.1.tgz 37 | https://registry.npmjs.org/moment-timezone/-/moment-timezone-0.5.34.tgz 38 | https://registry.npmjs.org/sequelize/-/sequelize-6.6.5.tgz 39 | https://registry.npmjs.org/cookie-parser/-/cookie-parser-1.4.5.tgz 40 | https://registry.npmjs.org/react-native-vector-icons/-/react-native-vector-icons-9.0.0.tgz 41 | https://registry.npmjs.org/passport-local/-/passport-local-1.0.0.tgz 42 | 
https://registry.npmjs.org/axios-cookiejar-support/-/axios-cookiejar-support-1.0.0.tgz 43 | https://registry.npmjs.org/express-validator/-/express-validator-6.12.1.tgz 44 | https://registry.npmjs.org/compression/-/compression-1.7.4.tgz 45 | https://registry.npmjs.org/rimraf/-/rimraf-3.0.2.tgz 46 | https://registry.npmjs.org/lodash.get/-/lodash.get-4.4.2.tgz 47 | https://registry.npmjs.org/lodash.merge/-/lodash.merge-4.6.2.tgz 48 | https://registry.npmjs.org/http-proxy-middleware/-/http-proxy-middleware-2.0.1.tgz 49 | https://registry.npmjs.org/mongoose-paginate-v2/-/mongoose-paginate-v2-1.4.3.tgz 50 | https://registry.npmjs.org/glob/-/glob-7.1.7.tgz 51 | https://registry.npmjs.org/chai/-/chai-4.3.4.tgz 52 | https://registry.npmjs.org/react-native-elements/-/react-native-elements-3.4.2.tgz 53 | https://registry.npmjs.org/express-session/-/express-session-1.17.2.tgz 54 | https://registry.npmjs.org/styled-components/-/styled-components-5.3.0.tgz 55 | https://registry.npmjs.org/bcryptjs/-/bcryptjs-2.4.3.tgz 56 | https://registry.npmjs.org/date-fns/-/date-fns-2.23.0.tgz 57 | https://registry.npmjs.org/concurrently/-/concurrently-6.4.0.tgz 58 | https://registry.npmjs.org/react-native-gesture-handler/-/react-native-gesture-handler-1.10.3.tgz 59 | https://registry.npmjs.org/formik/-/formik-2.2.9.tgz 60 | https://registry.npmjs.org/multer-s3/-/multer-s3-2.10.0.tgz 61 | https://registry.npmjs.org/socket.io-client/-/socket.io-client-4.3.2.tgz 62 | https://registry.npmjs.org/react-native-maps/-/react-native-maps-0.29.4.tgz 63 | https://registry.npmjs.org/helmet-csp/-/helmet-csp-3.0.0.tgz 64 | https://registry.npmjs.org/react-scripts/-/react-scripts-4.0.3.tgz 65 | https://registry.npmjs.org/mongoose-unique-validator/-/mongoose-unique-validator-3.0.0.tgz 66 | https://registry.npmjs.org/react-datepicker/-/react-datepicker-4.2.1.tgz 67 | https://registry.npmjs.org/passport-jwt/-/passport-jwt-4.0.0.tgz 68 | https://registry.npmjs.org/cheerio/-/cheerio-1.0.0-rc.3.tgz 69 | 
https://registry.npmjs.org/react-bootstrap/-/react-bootstrap-2.1.1.tgz 70 | https://registry.npmjs.org/dotenv-webpack/-/dotenv-webpack-7.0.3.tgz 71 | https://registry.npmjs.org/nodemailer-sendgrid-transport/-/nodemailer-sendgrid-transport-0.2.0.tgz 72 | https://registry.npmjs.org/winston-daily-rotate-file/-/winston-daily-rotate-file-4.5.2.tgz 73 | https://registry.npmjs.org/html-webpack-plugin/-/html-webpack-plugin-5.3.1.tgz 74 | https://registry.npmjs.org/typeorm/-/typeorm-0.2.36.tgz 75 | https://registry.npmjs.org/moment-duration-format/-/moment-duration-format-2.3.2.tgz 76 | https://registry.npmjs.org/cross-env/-/cross-env-7.0.3.tgz 77 | https://registry.npmjs.org/passport-facebook/-/passport-facebook-3.0.0.tgz 78 | https://registry.npmjs.org/redux-thunk/-/redux-thunk-2.4.0.tgz 79 | https://registry.npmjs.org/koa/-/koa-2.13.0.tgz 80 | https://registry.npmjs.org/react-native-navigation/-/react-native-navigation-7.15.0.tgz 81 | https://registry.npmjs.org/dotenv-flow/-/dotenv-flow-3.2.0.tgz 82 | https://registry.npmjs.org/cors-anywhere/-/cors-anywhere-0.4.4.tgz 83 | https://registry.npmjs.org/bodymen/-/bodymen-1.1.1.tgz 84 | https://registry.npmjs.org/axios-https-proxy-fix/-/axios-https-proxy-fix-0.17.1.tgz 85 | https://registry.npmjs.org/jsonwebtoken-promisified/-/jsonwebtoken-promisified-1.0.3.tgz 86 | https://registry.npmjs.org/jsonwebtoken-redis/-/jsonwebtoken-redis-1.0.6.tgz 87 | https://registry.npmjs.org/react-native-image-picker/-/react-native-image-picker-5.4.0.tgz 88 | https://registry.npmjs.org/jest-expect-message/-/jest-expect-message-1.1.3.tgz 89 | https://registry.npmjs.org/jest-mock-axios/-/jest-mock-axios-4.7.2.tgz 90 | https://registry.npmjs.org/csv-parser/-/csv-parser-3.0.0.tgz 91 | https://registry.npmjs.org/react-router/-/react-router-6.11.2.tgz 92 | https://registry.npmjs.org/morgan-body/-/morgan-body-2.6.9.tgz 93 | https://registry.npmjs.org/formik-material-ui/-/formik-material-ui-3.0.0.tgz 94 | 
https://registry.npmjs.org/debug-logdown/-/debug-logdown-0.2.0.tgz 95 | https://registry.npmjs.org/react-router-config/-/react-router-config-5.1.1.tgz 96 | https://registry.npmjs.org/@types/body-parser/-/body-parser-1.19.2.tgz 97 | 98 | # TODO: Currently broken for one reason or another. 99 | # https://registry.npmjs.org/react-native/-/react-native-0.64.2.tgz 100 | # https://registry.npmjs.org/yargs/-/yargs-17.0.1.tgz 101 | # https://registry.npmjs.org/react-native-camera/-/react-native-camera-4.2.0.tgz 102 | # https://registry.npmjs.org/sequelize-cli/-/sequelize-cli-6.2.0.tgz 103 | # https://registry.npmjs.org/vue/-/vue-2.6.14.tgz 104 | -------------------------------------------------------------------------------- /bench/src/main.rs: -------------------------------------------------------------------------------- 1 | //! Benchmark module loading performance. 2 | 3 | use std::time::Duration; 4 | 5 | use bytes::Bytes; 6 | use vuln_reach::javascript::module::TarballModuleResolver; 7 | use vuln_reach::javascript::package::Package; 8 | use vuln_reach_upstream::javascript::module::TarballModuleResolver as UpstreamTarballModuleResolver; 9 | use vuln_reach_upstream::javascript::package::Package as UpstreamPackage; 10 | 11 | const PACKAGES: &str = include_str!("../packages"); 12 | 13 | #[tokio::main] 14 | async fn main() { 15 | let mut perf_changes: Vec<f64> = Vec::with_capacity(PACKAGES.lines().count()); 16 | 17 | for package_uri in PACKAGES.lines() { 18 | // Skip commented-out packages and blank lines. 19 | if package_uri.starts_with('#') || package_uri.trim().is_empty() { 20 | continue; 21 | } 22 | 23 | println!("Benchmarking {package_uri}:"); 24 | 25 | let tarball = reqwest::get(package_uri).await.unwrap().bytes().await.unwrap(); 26 | 27 | // Time loading HEAD. 28 | let (package, elapsed) = package(&tarball); 29 | println!(" HEAD loaded in {:?}", elapsed); 30 | 31 | // Time loading upstream.
32 | let (upstream_package, upstream_elapsed) = upstream_package(&tarball); 33 | println!(" Upstream loaded in {:?}", upstream_elapsed); 34 | 35 | // Ensure equivalence. 36 | assert_eq(upstream_package, package); 37 | 38 | // Compute and store percentage change. 39 | // If the package was loaded faster with HEAD than with upstream, the 40 | // change will be a negative value. 41 | let elapsed = elapsed.as_secs_f64(); 42 | let upstream_elapsed = upstream_elapsed.as_secs_f64(); 43 | let pct_change = elapsed / upstream_elapsed - 1f64; 44 | 45 | println!( 46 | "\x1b[{};1m {:.2}%\x1b[0m {}", 47 | if pct_change < 0. { 32 } else { 31 }, 48 | pct_change.abs() * 100., 49 | if pct_change < 0. { "improvement" } else { "regression" } 50 | ); 51 | 52 | perf_changes.push(pct_change); 53 | } 54 | 55 | let mean: f64 = perf_changes.iter().copied().sum::<f64>() / perf_changes.len() as f64; 56 | let std: f64 = (perf_changes.iter().copied().map(|val| (val - mean).powf(2f64)).sum::<f64>() 57 | / (perf_changes.len() - 1) as f64) 58 | .sqrt(); 59 | 60 | println!( 61 | "\nMean: {:5.2}% {}\n Std: {:5.2}", 62 | mean * 100., 63 | if mean < 0. { "improvement" } else { "regression" }, 64 | std * 100. 65 | ); 66 | } 67 | 68 | // Load package for the current revision. 69 | fn package(tarball: &Bytes) -> (Package, Duration) { 70 | let start = std::time::Instant::now(); 71 | let package = Package::from_tarball_bytes(tarball.to_vec()).unwrap(); 72 | let elapsed = start.elapsed(); 73 | 74 | (package, elapsed) 75 | } 76 | 77 | // Load package for the upstream revision. 78 | fn upstream_package(tarball: &Bytes) -> (UpstreamPackage, Duration) { 79 | let start = std::time::Instant::now(); 80 | let package = UpstreamPackage::from_tarball_bytes(tarball.to_vec()).unwrap(); 81 | let elapsed = start.elapsed(); 82 | 83 | (package, elapsed) 84 | } 85 | 86 | // Check for equivalence of upstream/HEAD.
87 | fn assert_eq( 88 | upstream: UpstreamPackage, 89 | head: Package, 90 | ) { 91 | let upstream_graph = upstream.cache().graph(); 92 | let graph = head.cache().graph(); 93 | 94 | for (key, upstream_value) in upstream_graph { 95 | let value = match graph.get(key) { 96 | Some(value) => value, 97 | None => panic!("Head missing key {key:?}"), 98 | }; 99 | 100 | for (key, upstream_value) in upstream_value { 101 | assert_eq!(value.get(key), Some(upstream_value)); 102 | } 103 | } 104 | 105 | assert_eq!(upstream_graph, graph); 106 | } 107 | -------------------------------------------------------------------------------- /rustfmt.toml: -------------------------------------------------------------------------------- 1 | unstable_features = true 2 | version = "Two" 3 | format_code_in_doc_comments = true 4 | group_imports = "StdExternalCrate" 5 | match_block_trailing_comma = true 6 | condense_wildcard_suffixes = true 7 | use_field_init_shorthand = true 8 | normalize_doc_attributes = true 9 | overflow_delimited_expr = true 10 | imports_granularity = "Module" 11 | use_small_heuristics = "Max" 12 | normalize_comments = true 13 | reorder_impl_items = true 14 | use_try_shorthand = true 15 | newline_style = "Unix" 16 | format_strings = true 17 | wrap_comments = true 18 | -------------------------------------------------------------------------------- /vuln-reach-cli/Cargo.toml: -------------------------------------------------------------------------------- 1 | [package] 2 | name = "vuln-reach-cli" 3 | version = "0.1.0" 4 | edition = "2021" 5 | 6 | # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html 7 | 8 | [dependencies] 9 | anyhow = "1.0.70" 10 | clap = { version = "4.2.1", features = ["derive"] } 11 | futures = "0.3.28" 12 | vuln-reach = { version = "0.1.0", path = "../vuln-reach" } 13 | reqwest = { version = "0.11.16", features = ["rustls-tls", "json"] } 14 | serde = { version = "1.0.159", features = ["derive"] } 15 | tokio = { 
version = "1.27.0", features = ["macros", "net", "fs", "rt-multi-thread"] } 16 | toml = "0.7.3" 17 | -------------------------------------------------------------------------------- /vuln-reach-cli/README.md: -------------------------------------------------------------------------------- 1 | # vuln-reach-cli 2 | 3 | This tool is intended to ease local research and development, and test case design. 4 | 5 | ## Building 6 | 7 | ``` 8 | cargo build --release --package vuln-reach-cli 9 | ``` 10 | 11 | ## Usage 12 | 13 | ### Specification file 14 | 15 | The specification file is written in TOML. 16 | 17 | `vuln-reach-cli` relies on it to retrieve the packages from the 18 | [npm registry](https://registry.npmjs.com/). 19 | 20 | The file must contain a list of `projects` entries, structured like this: 21 | 22 | ```toml 23 | [[projects]] 24 | name = "@redis/client" 25 | tarballs = "./tarballs" 26 | packages = [ 27 | { name = "@redis/client", version = "1.0.6" }, 28 | { name = "cluster-key-slot", version = "1.1.2" }, 29 | { name = "yallist", version = "4.0.0" }, 30 | { name = "generic-pool", version = "3.8.2" }, 31 | ] 32 | vuln = [ 33 | { package = "generic-pool", module = "lib/Pool.js", start_row = 744, start_column = 18, end_row = 744, end_column = 22 } 34 | ] 35 | 36 | [[projects]] 37 | name = "path-scurry" 38 | tarballs = "./tarballs" 39 | packages = [ 40 | { name = "path-scurry", version = "1.6.1" }, 41 | { name = "lru-cache", version = "7.14.1" }, 42 | { name = "minipass", version = "4.0.2" }, 43 | ] 44 | vuln = [ 45 | { package = "lru-cache", module = "index.js", start_row = 1018, start_column = 18, end_row = 1018, end_column = 25 } 46 | ] 47 | ``` 48 | 49 | In each of these entries: 50 | - `packages` lists the packages available in the project, with their versions. It must contain 51 | the entire _transitive_ dependency tree, not just the first-level dependencies.
52 | You can find the transitive dependency tree via the [Phylum CLI](https://github.com/phylum-dev/cli): 53 | ```shell 54 | $ npm init -y && npm i --save # for a new project 55 | $ phylum parse package-lock.json 56 | ``` 57 | - `vuln` identifies a vulnerable node in the specified `package`, in the file 58 | `module`, spanning `(start_row, start_column)` to `(end_row, end_column)`. Note that, 59 | unlike in the library, positions here are 1-indexed, so specify these values the same 60 | way a text editor displays them. 61 | 62 | The term "vulnerable node" is used loosely here -- any identifier can be chosen as the 63 | target node that will be searched for. 64 | 65 | ### Running 66 | 67 | ``` 68 | cargo run --release --bin vuln-reach-cli -- sample.toml 69 | ``` 70 | 71 | All the specifications in `sample.toml` will be analyzed for reachability. 72 | If the node is determined to be reachable, the tool will print the access path that was found. 73 | -------------------------------------------------------------------------------- /vuln-reach-cli/sample.toml: -------------------------------------------------------------------------------- 1 | [[projects]] 2 | name = "@redis/client" 3 | tarballs = "./tarballs" 4 | packages = [ 5 | { name = "@redis/client", version = "1.0.6" }, 6 | { name = "cluster-key-slot", version = "1.1.2" }, 7 | { name = "yallist", version = "4.0.0" }, 8 | { name = "generic-pool", version = "3.8.2" }, 9 | ] 10 | vuln = [ 11 | { package = "generic-pool", module = "lib/Pool.js", start_row = 744, start_column = 18, end_row = 744, end_column = 22 } 12 | ] 13 | 14 | [[projects]] 15 | name = "path-scurry" 16 | tarballs = "./tarballs" 17 | packages = [ 18 | { name = "path-scurry", version = "1.6.1" }, 19 | { name = "lru-cache", version = "7.14.1" }, 20 | { name = "minipass", version = "4.0.2" }, 21 | ] 22 | vuln = [ 23 | { package = "lru-cache", module = "index.js", start_row = 1018, start_column = 18, end_row = 1018, end_column = 25 } 24 | ] 25 | 
-------------------------------------------------------------------------------- /vuln-reach-cli/src/main.rs: -------------------------------------------------------------------------------- 1 | use std::collections::HashMap; 2 | use std::fmt::Display; 3 | use std::path::{Path, PathBuf}; 4 | 5 | use anyhow::{anyhow, Result}; 6 | use clap::Parser; 7 | use futures::future; 8 | use serde::de::Error; 9 | use serde::{Deserialize, Deserializer}; 10 | use tokio::fs; 11 | use vuln_reach::javascript::package::reachability::{NodePath, VulnerableNode}; 12 | use vuln_reach::javascript::package::resolver::PackageResolver; 13 | use vuln_reach::javascript::package::Package; 14 | use vuln_reach::javascript::project::Project; 15 | 16 | type StdResult<T, E> = std::result::Result<T, E>; 17 | 18 | #[derive(Deserialize)] 19 | struct NpmRegistry { 20 | versions: HashMap<String, Version>, 21 | } 22 | 23 | #[derive(Deserialize)] 24 | struct Version { 25 | dist: Dist, 26 | } 27 | 28 | #[derive(Deserialize)] 29 | struct Dist { 30 | tarball: String, 31 | } 32 | 33 | #[derive(Deserialize, Debug, Hash, PartialEq, Eq)] 34 | struct PackageDef { 35 | name: String, 36 | version: String, 37 | } 38 | 39 | impl PackageDef { 40 | fn tarball(&self) -> PathBuf { 41 | PathBuf::from(format!("{}-{}.tgz", self.name, self.version)) 42 | } 43 | 44 | async fn retrieve(&self, prefix: &Path) -> Result<()> { 45 | let save_path = prefix.join(self.tarball()); 46 | 47 | if save_path.exists() { 48 | return Ok(()); 49 | } 50 | 51 | println!( 52 | "Downloading \x1b[36m{}-{}\x1b[0m to \x1b[33m{}\x1b[0m...", 53 | self.name, 54 | self.version, 55 | save_path.display() 56 | ); 57 | 58 | let registry = reqwest::get(format!("https://registry.npmjs.org/{}", self.name)) 59 | .await? 60 | .json::<NpmRegistry>() 61 | .await?; 62 | 63 | let tarball_url = &registry 64 | .versions 65 | .get(&self.version) 66 | .ok_or_else(|| anyhow!("{}-{} not found", self.name, self.version))? 
67 | .dist 68 | .tarball; 69 | 70 | let tarball_data = reqwest::get(tarball_url).await?.bytes().await?; 71 | 72 | fs::create_dir_all(save_path.parent().unwrap()).await?; 73 | fs::write(save_path, tarball_data).await?; 74 | 75 | Ok(()) 76 | } 77 | } 78 | 79 | #[derive(Deserialize, Debug)] 80 | struct ProjectDef { 81 | name: String, 82 | tarballs: PathBuf, 83 | packages: Vec<PackageDef>, 84 | #[serde(deserialize_with = "deserialize_vulnerable_node")] 85 | vuln: Vec<VulnerableNode>, 86 | } 87 | 88 | #[derive(Default)] 89 | struct NodeValidation { 90 | start_after_end: Option<((usize, usize), (usize, usize))>, 91 | zero_value: bool, 92 | } 93 | 94 | impl NodeValidation { 95 | fn new(node: &VulnerableNode) -> Self { 96 | let start_row = node.start_row(); 97 | let start_column = node.start_column(); 98 | let end_row = node.end_row(); 99 | let end_column = node.end_column(); 100 | 101 | let start_after_end = 102 | if start_row > end_row || (start_row == end_row && start_column > end_column) { 103 | Some(((start_row, start_column), (end_row, end_column))) 104 | } else { 105 | None 106 | }; 107 | 108 | Self { 109 | start_after_end, 110 | zero_value: node.start_row() == 0 111 | || node.end_row() == 0 112 | || node.start_column() == 0 113 | || node.end_column() == 0, 114 | } 115 | } 116 | 117 | fn is_error(&self) -> bool { 118 | self.start_after_end.is_some() || self.zero_value 119 | } 120 | } 121 | 122 | impl Display for NodeValidation { 123 | fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { 124 | write!(f, "Invalid node representation: ")?; 125 | 126 | if let Some(((start_row, start_column), (end_row, end_column))) = self.start_after_end { 127 | write!( 128 | f, 129 | "Start position ({start_row}, {start_column}) is after end position ({end_row}, \ 130 | {end_column}); " 131 | )?; 132 | } 133 | 134 | if self.zero_value { 135 | write!(f, "Zero values are not allowed")?; 136 | } 137 | 138 | Ok(()) 139 | } 140 | } 141 | 142 | fn deserialize_vulnerable_node<'de, D: Deserializer<'de>>( 
143 | deserializer: D, 144 | ) -> StdResult<Vec<VulnerableNode>, D::Error> { 145 | let nodes = Vec::<VulnerableNode>::deserialize(deserializer)?; 146 | nodes 147 | .into_iter() 148 | .map(|node| { 149 | let validation = NodeValidation::new(&node); 150 | if validation.is_error() { 151 | return Err(D::Error::custom(validation.to_string())); 152 | } 153 | 154 | let start_row = node.start_row(); 155 | let start_column = node.start_column(); 156 | let end_row = node.end_row(); 157 | let end_column = node.end_column(); 158 | 159 | Ok(VulnerableNode::new( 160 | node.package(), 161 | node.module(), 162 | start_row.saturating_sub(1), 163 | start_column.saturating_sub(1), 164 | end_row.saturating_sub(1), 165 | end_column.saturating_sub(1), 166 | )) 167 | }) 168 | .collect::<StdResult<Vec<_>, D::Error>>() 169 | } 170 | 171 | impl ProjectDef { 172 | async fn sync(&self) -> Result<()> { 173 | let results = future::join_all( 174 | self.packages.iter().map(|package| async { package.retrieve(&self.tarballs).await }), 175 | ) 176 | .await; 177 | 178 | for r in results { 179 | r?; 180 | } 181 | 182 | Ok(()) 183 | } 184 | 185 | fn reachability(self) -> Result<()> { 186 | // Build the package resolver. 187 | let mut package_resolver = PackageResolver::builder(); 188 | 189 | for package_def in &self.packages { 190 | let tarball_path = self.tarballs.join(package_def.tarball()); 191 | let tarball = Package::from_tarball_path(&tarball_path) 192 | .map_err(|e| anyhow!("{tarball_path:?}: {e:?}"))?; 193 | package_resolver = package_resolver.with_package(&package_def.name, tarball); 194 | } 195 | 196 | let package_resolver = package_resolver.build(); 197 | 198 | // Build the project object. 199 | let project = Project::new(package_resolver); 200 | 201 | // Compute reachability for each node. 
202 | for node in &self.vuln { 203 | let reachability = project.reachability(node)?; 204 | 205 | let path = reachability.find_path(&self.name); 206 | 207 | match path { 208 | Some(path) => { 209 | println!( 210 | "\n\x1b[31m *** Node {}/{}:{}:{} is reachable!\x1b[0m\n", 211 | node.package(), 212 | node.module(), 213 | node.start_row(), 214 | node.start_column() 215 | ); 216 | print_path(path); 217 | }, 218 | None => { 219 | println!( 220 | "\n\x1b[33m *** No paths to {}/{}:{}:{} found.\x1b[0m", 221 | node.package(), 222 | node.module(), 223 | node.start_row(), 224 | node.start_column() 225 | ); 226 | }, 227 | } 228 | } 229 | 230 | Ok(()) 231 | } 232 | } 233 | 234 | #[derive(Deserialize, Debug)] 235 | struct Projects { 236 | projects: Vec<ProjectDef>, 237 | } 238 | 239 | impl Projects { 240 | async fn sync(&self) -> Result<()> { 241 | let results = future::join_all(self.projects.iter().map(ProjectDef::sync)).await; 242 | for r in results { 243 | r?; 244 | } 245 | 246 | Ok(()) 247 | } 248 | } 249 | 250 | fn print_path(package_path: Vec<(String, Vec<(String, NodePath)>)>) { 251 | for (package, module_path) in package_path { 252 | println!(" \x1b[34;1m{package}\x1b[0m:"); 253 | for (module, node_path) in module_path { 254 | println!(" \x1b[33;1m{module}\x1b[0m:"); 255 | for node_step in node_path { 256 | let (r, c) = node_step.start(); 257 | println!(" {:>4}:{:<5} {}", r, c, node_step.symbol(),); 258 | } 259 | } 260 | } 261 | } 262 | 263 | #[derive(Parser, Debug)] 264 | struct Cli { 265 | /// Path to a reachability definition .toml file 266 | path: String, 267 | } 268 | 269 | async fn read_document(path: &str) -> Result<Projects> { 270 | Ok(toml::from_str::<Projects>(fs::read_to_string(path).await?.as_str())?) 
271 | } 272 | 273 | #[tokio::main] 274 | async fn main() -> Result<()> { 275 | let cli = Cli::parse(); 276 | 277 | let documents = read_document(&cli.path).await?; 278 | documents.sync().await?; 279 | 280 | for document in documents.projects { 281 | println!("\n\x1b[46;30m Reachability for {} \x1b[0m\n", document.name); 282 | document.reachability()?; 283 | } 284 | 285 | Ok(()) 286 | } 287 | -------------------------------------------------------------------------------- /vuln-reach-cli/tests/fixtures/jest-environment-jsdom-28.1.3.toml: -------------------------------------------------------------------------------- 1 | [[projects]] 2 | name = "jest-environment-jsdom:28.1.3" 3 | tarballs = "./tarballs" 4 | packages = [ 5 | { "name" = "@babel/code-frame", "version" = "7.22.13" }, 6 | { "name" = "ansi-styles", "version" = "3.2.1" }, 7 | { "name" = "chalk", "version" = "2.4.2" }, 8 | { "name" = "color-convert", "version" = "1.9.3" }, 9 | { "name" = "color-name", "version" = "1.1.3" }, 10 | { "name" = "escape-string-regexp", "version" = "1.0.5" }, 11 | { "name" = "has-flag", "version" = "3.0.0" }, 12 | { "name" = "supports-color", "version" = "5.5.0" }, 13 | { "name" = "@babel/helper-validator-identifier", "version" = "7.22.20" }, 14 | { "name" = "@babel/highlight", "version" = "7.22.20" }, 15 | { "name" = "ansi-styles", "version" = "3.2.1" }, 16 | { "name" = "chalk", "version" = "2.4.2" }, 17 | { "name" = "color-convert", "version" = "1.9.3" }, 18 | { "name" = "color-name", "version" = "1.1.3" }, 19 | { "name" = "escape-string-regexp", "version" = "1.0.5" }, 20 | { "name" = "has-flag", "version" = "3.0.0" }, 21 | { "name" = "supports-color", "version" = "5.5.0" }, 22 | { "name" = "@jest/environment", "version" = "28.1.3" }, 23 | { "name" = "@jest/fake-timers", "version" = "28.1.3" }, 24 | { "name" = "@jest/schemas", "version" = "28.1.3" }, 25 | { "name" = "@jest/types", "version" = "28.1.3" }, 26 | { "name" = "@sinclair/typebox", "version" = "0.24.51" }, 27 | { "name" 
= "@sinonjs/commons", "version" = "1.8.6" }, 28 | { "name" = "@sinonjs/fake-timers", "version" = "9.1.2" }, 29 | { "name" = "@tootallnate/once", "version" = "2.0.0" }, 30 | { "name" = "@types/istanbul-lib-coverage", "version" = "2.0.6" }, 31 | { "name" = "@types/istanbul-lib-report", "version" = "3.0.3" }, 32 | { "name" = "@types/istanbul-reports", "version" = "3.0.4" }, 33 | { "name" = "@types/jsdom", "version" = "16.2.15" }, 34 | { "name" = "@types/node", "version" = "20.9.1" }, 35 | { "name" = "@types/parse5", "version" = "6.0.3" }, 36 | { "name" = "@types/stack-utils", "version" = "2.0.3" }, 37 | { "name" = "@types/tough-cookie", "version" = "4.0.5" }, 38 | { "name" = "@types/yargs", "version" = "17.0.31" }, 39 | { "name" = "@types/yargs-parser", "version" = "21.0.3" }, 40 | { "name" = "abab", "version" = "2.0.6" }, 41 | { "name" = "acorn", "version" = "8.11.2" }, 42 | { "name" = "acorn-globals", "version" = "6.0.0" }, 43 | { "name" = "acorn", "version" = "7.4.1" }, 44 | { "name" = "acorn-walk", "version" = "7.2.0" }, 45 | { "name" = "agent-base", "version" = "6.0.2" }, 46 | { "name" = "ansi-regex", "version" = "5.0.1" }, 47 | { "name" = "ansi-styles", "version" = "4.3.0" }, 48 | { "name" = "asynckit", "version" = "0.4.0" }, 49 | { "name" = "braces", "version" = "3.0.2" }, 50 | { "name" = "browser-process-hrtime", "version" = "1.0.0" }, 51 | { "name" = "chalk", "version" = "4.1.2" }, 52 | { "name" = "ci-info", "version" = "3.9.0" }, 53 | { "name" = "color-convert", "version" = "2.0.1" }, 54 | { "name" = "color-name", "version" = "1.1.4" }, 55 | { "name" = "combined-stream", "version" = "1.0.8" }, 56 | { "name" = "cssom", "version" = "0.5.0" }, 57 | { "name" = "cssstyle", "version" = "2.3.0" }, 58 | { "name" = "cssom", "version" = "0.3.8" }, 59 | { "name" = "data-urls", "version" = "3.0.2" }, 60 | { "name" = "whatwg-url", "version" = "11.0.0" }, 61 | { "name" = "debug", "version" = "4.3.4" }, 62 | { "name" = "decimal.js", "version" = "10.4.3" }, 63 | { "name" = 
"delayed-stream", "version" = "1.0.0" }, 64 | { "name" = "domexception", "version" = "4.0.0" }, 65 | { "name" = "escape-string-regexp", "version" = "2.0.0" }, 66 | { "name" = "escodegen", "version" = "2.1.0" }, 67 | { "name" = "esprima", "version" = "4.0.1" }, 68 | { "name" = "estraverse", "version" = "5.3.0" }, 69 | { "name" = "esutils", "version" = "2.0.3" }, 70 | { "name" = "fill-range", "version" = "7.0.1" }, 71 | { "name" = "form-data", "version" = "4.0.0" }, 72 | { "name" = "graceful-fs", "version" = "4.2.11" }, 73 | { "name" = "has-flag", "version" = "4.0.0" }, 74 | { "name" = "html-encoding-sniffer", "version" = "3.0.0" }, 75 | { "name" = "http-proxy-agent", "version" = "5.0.0" }, 76 | { "name" = "https-proxy-agent", "version" = "5.0.1" }, 77 | { "name" = "iconv-lite", "version" = "0.6.3" }, 78 | { "name" = "is-number", "version" = "7.0.0" }, 79 | { "name" = "is-potential-custom-element-name", "version" = "1.0.1" }, 80 | { "name" = "jest-environment-jsdom", "version" = "28.1.3" }, 81 | { "name" = "jest-message-util", "version" = "28.1.3" }, 82 | { "name" = "jest-mock", "version" = "28.1.3" }, 83 | { "name" = "jest-util", "version" = "28.1.3" }, 84 | { "name" = "js-tokens", "version" = "4.0.0" }, 85 | { "name" = "jsdom", "version" = "19.0.0" }, 86 | { "name" = "micromatch", "version" = "4.0.5" }, 87 | { "name" = "mime-db", "version" = "1.52.0" }, 88 | { "name" = "mime-types", "version" = "2.1.35" }, 89 | { "name" = "ms", "version" = "2.1.2" }, 90 | { "name" = "nwsapi", "version" = "2.2.7" }, 91 | { "name" = "parse5", "version" = "6.0.1" }, 92 | { "name" = "picomatch", "version" = "2.3.1" }, 93 | { "name" = "pretty-format", "version" = "28.1.3" }, 94 | { "name" = "ansi-styles", "version" = "5.2.0" }, 95 | { "name" = "psl", "version" = "1.9.0" }, 96 | { "name" = "punycode", "version" = "2.3.1" }, 97 | { "name" = "react-is", "version" = "18.2.0" }, 98 | { "name" = "safer-buffer", "version" = "2.1.2" }, 99 | { "name" = "saxes", "version" = "5.0.1" }, 100 | { 
"name" = "slash", "version" = "3.0.0" }, 101 | { "name" = "source-map", "version" = "0.6.1" }, 102 | { "name" = "stack-utils", "version" = "2.0.6" }, 103 | { "name" = "supports-color", "version" = "7.2.0" }, 104 | { "name" = "symbol-tree", "version" = "3.2.4" }, 105 | { "name" = "to-regex-range", "version" = "5.0.1" }, 106 | { "name" = "tough-cookie", "version" = "4.0.0" }, 107 | { "name" = "tr46", "version" = "3.0.0" }, 108 | { "name" = "type-detect", "version" = "4.0.8" }, 109 | { "name" = "undici-types", "version" = "5.26.5" }, 110 | { "name" = "universalify", "version" = "0.1.2" }, 111 | { "name" = "w3c-hr-time", "version" = "1.0.2" }, 112 | { "name" = "w3c-xmlserializer", "version" = "3.0.0" }, 113 | { "name" = "webidl-conversions", "version" = "7.0.0" }, 114 | { "name" = "whatwg-encoding", "version" = "2.0.0" }, 115 | { "name" = "whatwg-mimetype", "version" = "3.0.0" }, 116 | { "name" = "whatwg-url", "version" = "10.0.0" }, 117 | { "name" = "ws", "version" = "8.14.2" }, 118 | { "name" = "xml-name-validator", "version" = "4.0.0" }, 119 | { "name" = "xmlchars", "version" = "2.2.0" } 120 | ] 121 | vuln = [ 122 | { package = "tough-cookie", module = "lib/memstore.js", start_row = 111, start_column = 32, end_row = 111, end_column = 34 } 123 | ] 124 | -------------------------------------------------------------------------------- /vuln-reach/Cargo.toml: -------------------------------------------------------------------------------- 1 | [package] 2 | name = "vuln-reach" 3 | version = "0.1.1" 4 | edition = "2021" 5 | description = "Code reachability path analysis." 6 | authors = ["Phylum, Inc. 
"] 7 | repository = "https://github.com/phylum-dev/vuln-reach" 8 | documentation = "https://docs.rs/vuln-reach" 9 | license-file = "../LICENSE" 10 | rust-version = "1.74" 11 | readme = "../README.md" 12 | 13 | [dependencies] 14 | flate2 = "1.0.24" 15 | itertools = "0.10.5" 16 | lazy_static = "1.4.0" 17 | ouroboros = "0.17.0" 18 | serde = { version = "1.0.152", features = ["derive"] } 19 | serde_json = "1.0.93" 20 | tar = "0.4.38" 21 | thiserror = "1.0.37" 22 | tree-sitter = "0.20.9" 23 | walkdir = "2.3.2" 24 | 25 | [build-dependencies] 26 | cc = "1.0.76" 27 | 28 | [dev-dependencies] 29 | maplit = "1.0.2" 30 | tempfile = "3.3.0" 31 | textwrap = "0.16.0" 32 | -------------------------------------------------------------------------------- /vuln-reach/build.rs: -------------------------------------------------------------------------------- 1 | use std::env; 2 | use std::path::{Path, PathBuf}; 3 | use std::process::Command; 4 | 5 | fn out_dir() -> PathBuf { 6 | PathBuf::from(env::var("OUT_DIR").unwrap()) 7 | } 8 | 9 | struct TreeSitterLang { 10 | lang: &'static str, 11 | path: PathBuf, 12 | repo: String, 13 | tag: Option<&'static str>, 14 | } 15 | 16 | impl TreeSitterLang { 17 | fn new(lang: &'static str, tag: Option<&'static str>) -> Self { 18 | let path = out_dir().join(format!("tree-sitter-{lang}")); 19 | let repo = format!("https://github.com/tree-sitter/tree-sitter-{lang}"); 20 | 21 | TreeSitterLang { lang, path, repo, tag } 22 | } 23 | 24 | fn path(&self) -> &Path { 25 | &self.path 26 | } 27 | 28 | fn repo(&self) -> &str { 29 | &self.repo 30 | } 31 | 32 | fn clone_repository(&self) { 33 | if self.path().exists() { 34 | return; 35 | } 36 | 37 | let mut cmd = Command::new("git"); 38 | 39 | cmd.current_dir(out_dir()).arg("clone").arg(self.repo()).arg(self.path()); 40 | 41 | if let Some(tag) = self.tag { 42 | cmd.arg("-b").arg(tag); 43 | } 44 | 45 | let status = cmd.status().expect("Couldn't run git command"); 46 | 47 | if !status.success() { 48 | panic!("Couldn't 
clone git repo for {}", self.lang); 49 | } 50 | } 51 | } 52 | 53 | fn main() { 54 | let js = TreeSitterLang::new("javascript", Some("v0.20.1")); 55 | js.clone_repository(); 56 | 57 | cc::Build::new() 58 | .warnings(false) 59 | .include(js.path().join("src")) 60 | .file(js.path().join("src/parser.c")) 61 | .file(js.path().join("src/scanner.c")) 62 | .compile("tree-sitter-obj"); 63 | } 64 | -------------------------------------------------------------------------------- /vuln-reach/src/javascript/lang/exports.rs: -------------------------------------------------------------------------------- 1 | //! ESM and CommonJS exports. 2 | use std::collections::{HashMap, HashSet}; 3 | 4 | use lazy_static::lazy_static; 5 | use tree_sitter::{Node, Query, QueryCursor}; 6 | 7 | use crate::{Error, Tree, JS}; 8 | 9 | // CommonJS 10 | // 11 | // All the compatible export statements in CommonJS are the following: 12 | // 1. module.exports = an object, an identifier, or a function 13 | // 14 | // When this is found, override all previous exports. 15 | // 16 | // 2. module.exports.foo = anything 17 | // 18 | // When this is found, override exports of the same name only. 19 | // 20 | // 3. exports.foo = anything 21 | // 22 | // This is compatible with 2., but if there is even one instance of 1., this 23 | // definition does nothing. This is because the module-global `exports` is a 24 | // shorthand for `module.exports`, but if something is assigned to 25 | // `module.exports`, this simply gets cloaked and the reference gets lost. 26 | // 27 | // For the time, we only care about top-level exported objects, 28 | // i.e.: 29 | // 30 | // Ok: module.exports = { a, b, c } 31 | // Ok: module.exports.a = ... 32 | // Ok: exports.a = ... 33 | // No: module.exports.foo.bar = ... (is anyone _that_ evil?) 34 | #[derive(Debug)] 35 | pub enum CommonJsExports<'a> { 36 | Name(Node<'a>), 37 | Scope(Node<'a>), 38 | // The value in this map can either be an identifier or a scope. 
The 39 | // pathfinder should match this against its list of scopes first and, 40 | // failing that, retrieve the correspondingly named identifier and its 41 | // scope. 42 | Object(HashMap<&'a str, Node<'a>>), 43 | None, 44 | } 45 | 46 | impl<'a> TryFrom<&'a Tree> for CommonJsExports<'a> { 47 | type Error = Error; 48 | 49 | fn try_from(tree: &'a Tree) -> Result<Self, Self::Error> { 50 | let mut cur = QueryCursor::new(); 51 | 52 | lazy_static! { 53 | static ref QUERY: Query = 54 | Query::new(*JS, include_str!("../queries/commonjs-exports.lsp")).unwrap(); 55 | }; 56 | 57 | let mut exports_object = CommonJsExports::Object(Default::default()); 58 | let mut found = false; 59 | 60 | // For a given node, find the one that can track accesses. 61 | fn exportable_node(node: Node<'_>) -> Node<'_> { 62 | if let Some(body) = node.child_by_field_name("body") { 63 | // Function nodes that have a child field called `body` return 64 | // that, since it's a scope. 65 | body 66 | } else { 67 | // Return the node itself by default. Note: primarily this is 68 | // intended for identifiers, which are to be looked up as-is, 69 | // but for the moment it also works the same for other kinds of 70 | // nodes, should any such edge cases crop up. 71 | node 72 | } 73 | } 74 | 75 | for query_match in cur.matches(&QUERY, tree.root_node(), tree.buf().as_bytes()) { 76 | match query_match.pattern_index { 77 | 0 => { 78 | // module.exports = identifier 79 | let target_ident = query_match.captures[2].node; 80 | exports_object = CommonJsExports::Name(target_ident); 81 | found = true; 82 | }, 83 | 1 => { 84 | // module.exports = function () { ... } 85 | let target_scope = query_match.captures[2].node; 86 | exports_object = CommonJsExports::Scope(target_scope); 87 | found = true; 88 | }, 89 | 2 => { 90 | // module.exports = { a, b, c ... 
} 91 | // 92 | // Parse an object, which can only be one of: 93 | // - (pair key: _, value: _) save the statement block iff there is any one 94 | // present under value 95 | // - (spread_element) we don't care 96 | // - (method_definition name: _ body: (statement_block)) save the statement 97 | // block 98 | // - (shorthand_property_identifier) resolve later to surrounding scope 99 | let mut object_map = HashMap::default(); 100 | 101 | let target_object = query_match.captures[2].node; 102 | let mut cur = target_object.walk(); 103 | for object_entry in target_object.children(&mut cur) { 104 | match object_entry.kind() { 105 | "pair" => { 106 | let key = object_entry.child_by_field_name("key").unwrap(); 107 | let value = exportable_node( 108 | object_entry.child_by_field_name("value").unwrap(), 109 | ); 110 | object_map.insert(tree.repr_of(key), value); 111 | }, 112 | "method_definition" => { 113 | let name = object_entry.child_by_field_name("name").unwrap(); 114 | let stmt = object_entry.child_by_field_name("body").unwrap(); 115 | object_map.insert(tree.repr_of(name), stmt); 116 | }, 117 | "shorthand_property_identifier" => { 118 | object_map.insert(tree.repr_of(object_entry), object_entry); 119 | }, 120 | _ => {}, 121 | } 122 | } 123 | 124 | exports_object = CommonJsExports::Object(object_map); 125 | found = true; 126 | }, 127 | 3 => { 128 | // module.exports.foo = ... 129 | if let CommonJsExports::Object(ref mut object_map) = &mut exports_object { 130 | let target_name = query_match.captures[2].node; 131 | let target_object = exportable_node(query_match.captures[3].node); 132 | object_map.insert(tree.repr_of(target_name), target_object); 133 | } 134 | found = true; 135 | }, 136 | 4 => { 137 | // exports.foo = ... 
138 | if let CommonJsExports::Object(ref mut object_map) = &mut exports_object { 139 | let target_name = query_match.captures[1].node; 140 | let target_object = exportable_node(query_match.captures[2].node); 141 | object_map.insert(tree.repr_of(target_name), target_object); 142 | } 143 | 144 | // silently fail if module.exports not an object: it will 145 | // look like assigning to the properties of a function or a 146 | // primitive object, which is, respectively, already accounted 147 | // for by the pathfinder, and invalid JS. 148 | found = true; 149 | }, 150 | k => unreachable!("{}: {:#?}", k, query_match), 151 | } 152 | } 153 | 154 | Ok(if found { exports_object } else { CommonJsExports::None }) 155 | } 156 | } 157 | 158 | // ESM Exports 159 | // 160 | // All the compatible ESM export statements are the following: 161 | // 1. export declaration 162 | // - export let a, b 163 | // - export const a = 1, b = 2 164 | // - XXX export { name: value } is _not_ valid! 165 | // 2. export list 166 | // - export { a, b, c } 167 | // - export { a as b, c as "d with spaces" } 168 | // - export { something as default } 169 | // 3. export default declaration (identifier|object|function|class) 170 | // 4. export default 171 | // 172 | // Unsupported statements: 173 | // 5. export * from "module" / export { default } from "module" / export { name 174 | // } from "module" 175 | #[derive(Debug, Default)] 176 | pub struct EsmExports<'a> { 177 | // The nodes in this map can either be identifiers or scopes. The 178 | // pathfinder should match this against its list of scopes first and, 179 | // failing that, retrieve the correspondingly named identifier and its 180 | // scope. 181 | pub objects: HashMap<&'a str, EsmExport<'a>>, 182 | pub default: Option<EsmExport<'a>>, 183 | // Contains a vector of all full-module exports. At this point, we can't 184 | // know what the exported names are, because we haven't processed the 185 | // reexported module. 
186 | pub reexports: HashSet<Reexport<'a>>, 187 | } 188 | 189 | #[derive(Debug)] 190 | pub enum EsmExport<'a> { 191 | // Perform a module-scoped symbol lookup of this node. 192 | Name(Node<'a>), 193 | // Mark all reachable nodes in this scope as reachable through this export. 194 | Scope(Node<'a>), 195 | // Mark all identifier nodes inside of the expression as reachable. 196 | // XXX needs an expression evaluation module. 197 | Expression(Node<'a>), 198 | } 199 | 200 | impl<'a> EsmExport<'a> { 201 | pub fn node(&self) -> Node<'a> { 202 | match self { 203 | EsmExport::Name(node) | EsmExport::Scope(node) | EsmExport::Expression(node) => *node, 204 | } 205 | } 206 | 207 | // Determine if the expression export contains the specified node. 208 | pub fn expression_contains(&self, node: Node<'a>) -> bool { 209 | match self { 210 | EsmExport::Name(_) | EsmExport::Scope(_) => false, 211 | EsmExport::Expression(export) => { 212 | export.start_byte() <= node.start_byte() && export.end_byte() >= node.end_byte() 213 | }, 214 | } 215 | } 216 | } 217 | 218 | #[derive(Debug, PartialEq, Eq, Hash)] 219 | pub enum Reexport<'a> { 220 | All(&'a str), 221 | Named(&'a str, Option<&'a str>, &'a str), 222 | } 223 | 224 | impl<'a> TryFrom<&'a Tree> for Option<EsmExports<'a>> { 225 | type Error = Error; 226 | 227 | fn try_from(tree: &'a Tree) -> Result<Self, Self::Error> { 228 | let mut cur = QueryCursor::new(); 229 | 230 | lazy_static! 
{ 231 | static ref QUERY: Query = 232 | Query::new(*JS, include_str!("../queries/esm-exports.lsp")).unwrap(); 233 | }; 234 | 235 | let mut exports = EsmExports::default(); 236 | let mut found = false; 237 | 238 | for query_match in cur.matches(&QUERY, tree.root_node(), tree.buf().as_bytes()) { 239 | match query_match.pattern_index { 240 | 0 => { 241 | // export let name = 1 242 | // export const name = 1 243 | let export_decl = query_match.captures[0].node; 244 | exports.objects.insert(tree.repr_of(export_decl), EsmExport::Name(export_decl)); 245 | found = true; 246 | }, 247 | 1 => { 248 | // export function name() {} 249 | // export function* name() {} 250 | // export class ClassName {} 251 | // 252 | // These are the equivalent of `function name() {}; export name`. 253 | let export_name = query_match.captures[0].node; 254 | let export_scope = query_match.captures[1].node; 255 | 256 | // The grammar has a quirk where `export default function foo() {}` will 257 | // look exactly like `export function foo() {}`; but that is misleading 258 | // because in the former case the exported identifier is `default` and in 259 | // the latter it is `foo`. So we need to manually match the first child 260 | // of the export statement and confirm that it is in fact the literal 261 | // `default`; use this to decide whether to insert the export in the 262 | // default slot or in the objects. 
263 | let parent = export_name.parent().unwrap().parent().unwrap(); 264 | if let Some("default") = parent.child(1).map(|c| tree.repr_of(c)) { 265 | exports.default = Some(EsmExport::Scope(export_scope)); 266 | } else { 267 | exports 268 | .objects 269 | .insert(tree.repr_of(export_name), EsmExport::Name(export_name)); 270 | } 271 | found = true; 272 | }, 273 | 2 => { 274 | // export { foo, bar, baz as quux } 275 | let export_spec = query_match.captures[0].node; 276 | let name = export_spec.child_by_field_name("name").unwrap(); 277 | let alias = export_spec.child_by_field_name("alias"); 278 | exports 279 | .objects 280 | .insert(tree.repr_of(alias.unwrap_or(name)), EsmExport::Name(name)); 281 | found = true; 282 | }, 283 | 3 => { 284 | // export default function name() {} 285 | // export default function* name() {} 286 | // export default class ClassName {} 287 | // export default function () {} 288 | // export default function* () {} 289 | // export default class {} 290 | let export_scope = query_match.captures[0].node; 291 | // Discard previous export, if it exists: there can only 292 | // ever be a single default export. 293 | exports.default = Some(EsmExport::Scope(export_scope)); 294 | found = true; 295 | }, 296 | 4 => { 297 | // export default identifier 298 | let export_ident = query_match.captures[0].node; 299 | // Discard previous export, if it exists: there can only 300 | // ever be a single default export. 
301 | exports.default = Some(EsmExport::Name(export_ident)); 302 | found = true; 303 | }, 304 | 5 | 6 => { 305 | // export const { foo, bar } = baz 306 | let export_pattern = query_match.captures[0].node; 307 | let export_source = query_match.captures[1].node; 308 | exports 309 | .objects 310 | .insert(tree.repr_of(export_pattern), EsmExport::Name(export_source)); 311 | found = true; 312 | }, 313 | 7 => { 314 | // export default { a: 1, b: 2 } 315 | let export_value = query_match.captures[0].node; 316 | exports.default = Some(EsmExport::Expression(export_value)); 317 | found = true; 318 | }, 319 | 8 => { 320 | // export * from 'foo' 321 | // export { a as b, c } from 'foo' 322 | if query_match.captures.len() == 1 { 323 | let export_source = tree.repr_of(query_match.captures[0].node); 324 | exports.reexports.insert(Reexport::All(export_source)); 325 | } else { 326 | let export_clause = query_match.captures[0].node; 327 | let export_source = tree.repr_of(query_match.captures[1].node); 328 | 329 | for spec in (0..export_clause.named_child_count()) 330 | .filter_map(|i| export_clause.named_child(i)) 331 | { 332 | let name = tree.repr_of(spec.child_by_field_name(b"name").unwrap()); 333 | let alias = 334 | spec.child_by_field_name(b"alias").map(|node| tree.repr_of(node)); 335 | exports.reexports.insert(Reexport::Named(name, alias, export_source)); 336 | } 337 | } 338 | found = true; 339 | }, 340 | k => unreachable!("{}: {:#?}", k, query_match), 341 | } 342 | } 343 | 344 | Ok(if found { Some(exports) } else { None }) 345 | } 346 | } 347 | 348 | #[derive(Debug)] 349 | pub enum Exports<'a> { 350 | Esm(EsmExports<'a>), 351 | CommonJs(CommonJsExports<'a>), 352 | None, 353 | } 354 | 355 | impl<'a> Exports<'a> { 356 | pub fn new(tree: &'a Tree) -> Self { 357 | if let Ok(Some(esm_exports)) = Option::<EsmExports>::try_from(tree) { 358 | Exports::Esm(esm_exports) 359 | } else if let Ok(cjs_exports) = CommonJsExports::try_from(tree) { 360 | Exports::CommonJs(cjs_exports) 361 | } else { 362 | 
Exports::None 363 | } 364 | } 365 | } 366 | 367 | // Very coarse grained functions for telling whether a JS file contains an 368 | // export at all. 369 | 370 | pub fn has_exports_cjs(tree: &Tree) -> bool { 371 | let mut cur = QueryCursor::new(); 372 | 373 | lazy_static! { 374 | static ref QUERY: Query = Query::new( 375 | *JS, 376 | r#" 377 | ( 378 | (member_expression 379 | object: (identifier) @module 380 | property: (property_identifier) @exports) 381 | (#eq? @module "module") 382 | (#eq? @exports "exports") 383 | ) 384 | ( 385 | (identifier) @exports 386 | (#eq? @exports "exports") 387 | ) 388 | "# 389 | ) 390 | .unwrap(); 391 | }; 392 | 393 | cur.matches(&QUERY, tree.root_node(), tree.buf().as_bytes()).count() > 0 394 | } 395 | 396 | pub fn has_exports_mjs(tree: &Tree) -> bool { 397 | let mut cur = QueryCursor::new(); 398 | 399 | lazy_static! { 400 | static ref QUERY: Query = Query::new(*JS, r#"(export_statement)"#).unwrap(); 401 | }; 402 | 403 | cur.matches(&QUERY, tree.root_node(), tree.buf().as_bytes()).count() > 0 404 | } 405 | 406 | #[test] 407 | fn test_module_exports() { 408 | let tree = Tree::new(r#"module.exports = function () {}"#.to_string()).unwrap(); 409 | assert!(matches!(CommonJsExports::try_from(&tree), Ok(CommonJsExports::Scope(_)))); 410 | 411 | let tree = Tree::new( 412 | r#" 413 | module.exports = { a, b, c } 414 | module.exports.foo = function () {} 415 | "# 416 | .to_string(), 417 | ) 418 | .unwrap(); 419 | let exports = CommonJsExports::try_from(&tree).unwrap(); 420 | println!("{:#?}", exports); 421 | assert!(matches!(exports, CommonJsExports::Object(e) if e.len() == 4)); 422 | 423 | let tree = Tree::new( 424 | r#" 425 | module.exports = { 426 | a, b, c, 427 | foo() {}, 428 | d: function() {}, 429 | e: {} 430 | } 431 | "# 432 | .to_string(), 433 | ) 434 | .unwrap(); 435 | let exports = CommonJsExports::try_from(&tree).unwrap(); 436 | println!("{:#?}", exports); 437 | assert!(matches!(exports, CommonJsExports::Object(e) if e.len() == 6)); 
438 | } 439 | 440 | #[test] 441 | fn test_esm_exports() { 442 | let tree = Tree::new( 443 | r#" 444 | // Exporting declarations 445 | export let name1, name2; // also var 446 | export const name3 = 1, name4 = 2; // also var, let 447 | export function functionName() {} 448 | export class ClassName {} 449 | export function* generatorFunctionName() {} 450 | export const { name5, name6: bar } = o; 451 | export const [ name7, name8 ] = array; 452 | 453 | // Export list 454 | export { name9, name10 }; 455 | export { variable1 as name11, variable2 as name12, name13 }; 456 | export { variable1 as "string name" }; 457 | export { name14 as default }; 458 | 459 | // Default exports 460 | export default expression; 461 | export default function defaultFunctionName() {} 462 | export default class DefaultClassName {} 463 | export default function* DefaultGeneratorFunctionName() {} 464 | export default function () {} 465 | export default class {} 466 | export default function* () {} 467 | "# 468 | .to_string(), 469 | ) 470 | .unwrap(); 471 | let exports = Option::<EsmExports>::try_from(&tree).unwrap().unwrap(); 472 | println!("{:#?}", exports); 473 | 474 | assert!(matches!(exports.objects.get("name1"), Some(EsmExport::Name(_)))); 475 | assert!(matches!(exports.objects.get("name2"), Some(EsmExport::Name(_)))); 476 | assert!(matches!(exports.objects.get("name3"), Some(EsmExport::Name(_)))); 477 | assert!(matches!(exports.objects.get("name4"), Some(EsmExport::Name(_)))); 478 | assert!(matches!(exports.objects.get("name5"), Some(EsmExport::Name(_)))); 479 | assert!(matches!(exports.objects.get("name6"), Some(EsmExport::Name(_)))); 480 | assert!(matches!(exports.objects.get("name7"), Some(EsmExport::Name(_)))); 481 | assert!(matches!(exports.objects.get("name8"), Some(EsmExport::Name(_)))); 482 | assert!(matches!(exports.objects.get("name9"), Some(EsmExport::Name(_)))); 483 | assert!(matches!(exports.objects.get("name10"), Some(EsmExport::Name(_)))); 484 | 
assert!(matches!(exports.objects.get("name11"), Some(EsmExport::Name(_)))); 485 | assert!(matches!(exports.objects.get("name12"), Some(EsmExport::Name(_)))); 486 | assert!(matches!(exports.objects.get("name13"), Some(EsmExport::Name(_)))); 487 | assert!(matches!(exports.default, Some(EsmExport::Scope(_)))); 488 | 489 | assert!(!exports.objects.contains_key("DefaultClassName")); 490 | assert!(!exports.objects.contains_key("defaultFunctionName")); 491 | assert!(!exports.objects.contains_key("defaultGeneratorFunctionName")); 492 | 493 | assert!(exports.objects.contains_key("ClassName")); 494 | assert!(exports.objects.contains_key("functionName")); 495 | assert!(exports.objects.contains_key("generatorFunctionName")); 496 | 497 | assert_eq!(exports.objects.len(), 18); 498 | } 499 | 500 | #[test] 501 | fn test_esm_reexport() { 502 | let tree = Tree::new( 503 | r#" 504 | // Aggregating modules 505 | export * from "module-name"; 506 | export { name2, nameN } from "module-name"; 507 | export { import1 as name3, import2 as name4, name5 } from "module-name"; 508 | export { default } from "module-name"; 509 | export { default as foo } from "module-name"; 510 | "# 511 | .to_string(), 512 | ) 513 | .unwrap(); 514 | let exports = Option::<EsmExports>::try_from(&tree).unwrap().unwrap(); 515 | println!("{:#?}", exports); 516 | 517 | macro_rules! 
assert_reexport_contains { 518 | ($pattern:pat_param) => { 519 | assert!(exports.reexports.iter().find(|e| matches!(e, $pattern)).is_some()); 520 | }; 521 | } 522 | 523 | assert_eq!(exports.reexports.len(), 8); 524 | assert_reexport_contains!(Reexport::All("module-name")); 525 | assert_reexport_contains!(Reexport::Named("name2", None, "module-name")); 526 | assert_reexport_contains!(Reexport::Named("nameN", None, "module-name")); 527 | assert_reexport_contains!(Reexport::Named("import1", Some("name3"), "module-name")); 528 | assert_reexport_contains!(Reexport::Named("import2", Some("name4"), "module-name")); 529 | assert_reexport_contains!(Reexport::Named("name5", None, "module-name")); 530 | assert_reexport_contains!(Reexport::Named("default", None, "module-name")); 531 | assert_reexport_contains!(Reexport::Named("default", Some("foo"), "module-name")); 532 | } 533 | -------------------------------------------------------------------------------- /vuln-reach/src/javascript/lang/imports.rs: -------------------------------------------------------------------------------- 1 | //! ESM and CommonJS imports. 2 | 3 | use std::iter::Copied; 4 | use std::ops::Deref; 5 | 6 | use lazy_static::lazy_static; 7 | use tree_sitter::{Node, Query, QueryCursor}; 8 | 9 | use crate::{Cursor, Error, Tree, JS}; 10 | 11 | // CommonJS 12 | // 13 | // Imports in CommonJS are always of a single form: 14 | // 15 | // require("module") 16 | // 17 | // Destructuring, assignments and member lookups should be dealt with by the 18 | // reachability queries. 19 | 20 | #[derive(Debug, Clone, Copy)] 21 | pub struct CommonJsImport<'a> { 22 | node: Node<'a>, 23 | source: &'a str, 24 | } 25 | 26 | impl<'a> CommonJsImport<'a> { 27 | /// The node of the import. 28 | /// 29 | /// # Example 30 | /// 31 | /// In the following, it is the node referring to the string segment `bar`. 
32 | /// 33 | /// ```js 34 | /// const foo = require('bar') 35 | /// ``` 36 | pub fn node(&self) -> Node<'a> { 37 | self.node 38 | } 39 | 40 | /// The source of the import; a string representation of the import node. 41 | /// 42 | /// # Example 43 | /// 44 | /// In the following, it is the string `bar`. 45 | /// 46 | /// ```js 47 | /// const foo = require('bar') 48 | /// ``` 49 | pub fn source(&self) -> &'a str { 50 | self.source 51 | } 52 | 53 | /// The node through which the import is accessed. 54 | /// 55 | /// # Example 56 | /// 57 | /// In the following, the `bar` string segment is `self.node`, 58 | /// and `foo` is the access node. 59 | /// 60 | /// ```js 61 | /// const foo = require('bar') 62 | /// ``` 63 | pub fn access_node(&self, tree: &'a Tree) -> Node<'a> { 64 | let cursor = Cursor::new(tree, self.node).unwrap(); 65 | cursor 66 | .parents() 67 | .find_map(|node| match node.kind() { 68 | "assignment_expression" | "augmented_assignment_expression" => { 69 | node.child_by_field_name("left") 70 | }, 71 | "variable_declarator" => node.child_by_field_name("name"), 72 | _ => None, 73 | }) 74 | .unwrap_or(self.node) 75 | } 76 | } 77 | 78 | #[derive(Default, Debug)] 79 | pub struct CommonJsImports<'a>(Vec<CommonJsImport<'a>>); 80 | 81 | impl<'a> CommonJsImports<'a> { 82 | pub fn new(imports: Vec<CommonJsImport<'a>>) -> Self { 83 | Self(imports) 84 | } 85 | } 86 | 87 | impl<'a> Deref for CommonJsImports<'a> { 88 | type Target = Vec<CommonJsImport<'a>>; 89 | 90 | fn deref(&self) -> &Self::Target { 91 | &self.0 92 | } 93 | } 94 | 95 | impl<'a> TryFrom<&'a Tree> for CommonJsImports<'a> { 96 | type Error = Error; 97 | 98 | fn try_from(tree: &'a Tree) -> Result<Self, Self::Error> { 99 | let mut cur = QueryCursor::new(); 100 | 101 | lazy_static! 
{ 102 | static ref QUERY: Query = 103 | Query::new(*JS, include_str!("../queries/commonjs-imports.lsp")).unwrap(); 104 | }; 105 | 106 | Ok(CommonJsImports( 107 | cur.matches(&QUERY, tree.root_node(), tree.buf().as_bytes()) 108 | .map(|query_match| query_match.captures[1].node) 109 | .map(|node| CommonJsImport { node, source: tree.repr_of(node) }) 110 | .collect(), 111 | )) 112 | } 113 | } 114 | 115 | impl<'a> IntoIterator for CommonJsImports<'a> { 116 | type IntoIter = <Vec<CommonJsImport<'a>> as IntoIterator>::IntoIter; 117 | type Item = CommonJsImport<'a>; 118 | 119 | fn into_iter(self) -> Self::IntoIter { 120 | self.0.into_iter() 121 | } 122 | } 123 | 124 | impl<'a, 'b> IntoIterator for &'b CommonJsImports<'a> { 125 | type IntoIter = Copied<<&'b Vec<CommonJsImport<'a>> as IntoIterator>::IntoIter>; 126 | type Item = CommonJsImport<'a>; 127 | 128 | fn into_iter(self) -> Self::IntoIter { 129 | self.0.iter().copied() 130 | } 131 | } 132 | 133 | // ESM Imports 134 | // 135 | // All the compatible ESM import statements are the following: 136 | // 137 | // 1. Default imports 138 | // - import name from 'module' Puts `name` in the module scope, refers to the 139 | // object exported as `default` in 'module'. 140 | // 2. Namespace imports 141 | // - import * as name from 'module' Puts `name` in the module scope, refers 142 | // to the object of all exports from 'module'. 143 | // 3. Named imports 144 | // - import { foo, bar } from 'module' Puts `foo` and `bar` in the module 145 | // scope. 146 | // 4. Aliased named imports 147 | // - import { foo as bar } from 'module' Puts `bar` in the module scope, 148 | // refers to `foo` in 'module'. 149 | // 5. Reexports (actually an export, but works like an import!) 
150 | // - export { foo as bar } from 'module' 151 | // 152 | 153 | #[derive(Debug, Clone)] 154 | pub enum EsmImport<'a> { 155 | Named { 156 | name: &'a str, 157 | alias: Option<&'a str>, 158 | alias_node: Option<Node<'a>>, 159 | source: &'a str, 160 | node: Node<'a>, 161 | }, 162 | Namespace { 163 | name: &'a str, 164 | source: &'a str, 165 | node: Node<'a>, 166 | }, 167 | Default { 168 | name: &'a str, 169 | source: &'a str, 170 | node: Node<'a>, 171 | }, 172 | } 173 | 174 | #[derive(Default, Debug)] 175 | pub struct EsmImports<'a>(Vec<EsmImport<'a>>); 176 | 177 | impl<'a> EsmImports<'a> { 178 | pub fn new(imports: Vec<EsmImport<'a>>) -> Self { 179 | Self(imports) 180 | } 181 | } 182 | 183 | impl<'a> Deref for EsmImports<'a> { 184 | type Target = Vec<EsmImport<'a>>; 185 | 186 | fn deref(&self) -> &Self::Target { 187 | &self.0 188 | } 189 | } 190 | 191 | impl<'a> TryFrom<&'a Tree> for EsmImports<'a> { 192 | type Error = Error; 193 | 194 | fn try_from(tree: &'a Tree) -> Result<Self, Self::Error> { 195 | let mut cur = QueryCursor::new(); 196 | 197 | lazy_static! 
{ 198 | static ref QUERY: Query = 199 | Query::new(*JS, include_str!("../queries/esm-imports.lsp")).unwrap(); 200 | }; 201 | 202 | let mut imports = Vec::new(); 203 | 204 | for query_match in cur.matches(&QUERY, tree.root_node(), tree.buf().as_bytes()) { 205 | match query_match.pattern_index { 206 | 0 => { 207 | // import name from "module" 208 | imports.push(EsmImport::Default { 209 | name: tree.repr_of(query_match.captures[0].node), 210 | source: tree.repr_of(query_match.captures[1].node), 211 | node: query_match.captures[0].node, 212 | }); 213 | }, 214 | 1 => { 215 | // import * as name from "module" 216 | imports.push(EsmImport::Namespace { 217 | name: tree.repr_of(query_match.captures[0].node), 218 | source: tree.repr_of(query_match.captures[1].node), 219 | node: query_match.captures[0].node, 220 | }); 221 | }, 222 | 2 => { 223 | // import { foo } from "module" 224 | // import { foo as bar } from "module" 225 | let import_spec = query_match.captures[0].node; 226 | let source = query_match.captures[1].node; 227 | let name = import_spec.child_by_field_name("name").unwrap(); 228 | let alias = import_spec.child_by_field_name("alias"); 229 | imports.push(EsmImport::Named { 230 | name: tree.repr_of(name), 231 | source: tree.repr_of(source), 232 | alias: alias.map(|node| tree.repr_of(node)), 233 | alias_node: alias, 234 | node: name, 235 | }); 236 | }, 237 | k => unreachable!("{}: {:#?}", k, query_match), 238 | } 239 | } 240 | 241 | Ok(EsmImports(imports)) 242 | } 243 | } 244 | 245 | impl<'a> IntoIterator for EsmImports<'a> { 246 | type IntoIter = <Vec<EsmImport<'a>> as IntoIterator>::IntoIter; 247 | type Item = EsmImport<'a>; 248 | 249 | fn into_iter(self) -> Self::IntoIter { 250 | self.0.into_iter() 251 | } 252 | } 253 | 254 | impl<'a, 'b> IntoIterator for &'b EsmImports<'a> { 255 | type IntoIter = <&'b Vec<EsmImport<'a>> as IntoIterator>::IntoIter; 256 | type Item = &'b EsmImport<'a>; 257 | 258 | fn into_iter(self) -> Self::IntoIter { 259 | self.0.iter() 260 | } 261 | } 262 | 263 | impl<'a> 
EsmImport<'a> { 264 | pub fn name(&self) -> &str { 265 | match self { 266 | &EsmImport::Named { name, .. } 267 | | &EsmImport::Namespace { name, .. } 268 | | &EsmImport::Default { name, .. } => name, 269 | } 270 | } 271 | 272 | pub fn source(&self) -> &str { 273 | match self { 274 | &EsmImport::Named { source, .. } 275 | | &EsmImport::Namespace { source, .. } 276 | | &EsmImport::Default { source, .. } => source, 277 | } 278 | } 279 | 280 | pub fn alias(&self) -> Option<&str> { 281 | match self { 282 | EsmImport::Named { alias, .. } => *alias, 283 | _ => None, 284 | } 285 | } 286 | 287 | pub fn alias_node(&self) -> Option<Node<'a>> { 288 | match self { 289 | EsmImport::Named { alias_node, .. } => *alias_node, 290 | _ => None, 291 | } 292 | } 293 | 294 | pub fn node(&self) -> Node<'a> { 295 | match self { 296 | EsmImport::Named { node, .. } 297 | | EsmImport::Namespace { node, .. } 298 | | EsmImport::Default { node, .. } => *node, 299 | } 300 | } 301 | } 302 | 303 | #[derive(Debug)] 304 | pub enum Imports<'a> { 305 | Esm(EsmImports<'a>), 306 | CommonJs(CommonJsImports<'a>), 307 | None, 308 | } 309 | 310 | impl<'a> Imports<'a> { 311 | pub fn new(tree: &'a Tree) -> Self { 312 | if let Ok(esm_imports) = EsmImports::try_from(tree) { 313 | if !esm_imports.is_empty() { 314 | return Imports::Esm(esm_imports); 315 | } 316 | } 317 | 318 | if let Ok(cjs_imports) = CommonJsImports::try_from(tree) { 319 | if !cjs_imports.is_empty() { 320 | return Imports::CommonJs(cjs_imports); 321 | } 322 | } 323 | 324 | Imports::None 325 | } 326 | } 327 | 328 | #[cfg(test)] 329 | mod tests { 330 | use super::*; 331 | 332 | #[test] 333 | fn test_commonjs_imports() { 334 | let tree = Tree::new( 335 | r#" 336 | const name1 = require("foo") 337 | let name2 = require("foo") 338 | var name3 = require("foo") 339 | name2 = require("foo").bar.baz 340 | "# 341 | .to_string(), 342 | ) 343 | .unwrap(); 344 | 345 | let imports = CommonJsImports::try_from(&tree).unwrap(); 346 | println!("{:#?}", imports); 347 | 348 
| assert_eq!(imports.len(), 4); 349 | 350 | for import in imports { 351 | assert_eq!("foo", tree.repr_of(import.node())); 352 | assert_eq!("foo", import.source()); 353 | } 354 | } 355 | 356 | #[test] 357 | fn test_esm_imports() { 358 | let tree = Tree::new( 359 | r#" 360 | import * as namespaced from 'foo' 361 | import defaultImport from 'foo' 362 | import defaultImportComposite, { named1, named2 as foo } from 'foo' 363 | "# 364 | .to_string(), 365 | ) 366 | .unwrap(); 367 | 368 | let imports = EsmImports::try_from(&tree).unwrap(); 369 | println!("{:#?}", imports); 370 | 371 | assert_eq!(imports.0.len(), 5); 372 | 373 | assert!(matches!(imports.0[0], EsmImport::Namespace { 374 | name: "namespaced", 375 | source: "foo", 376 | .. 377 | })); 378 | assert!(matches!(imports.0[1], EsmImport::Default { 379 | name: "defaultImport", 380 | source: "foo", 381 | .. 382 | })); 383 | assert!(matches!(imports.0[2], EsmImport::Default { 384 | name: "defaultImportComposite", 385 | source: "foo", 386 | .. 387 | })); 388 | assert!(matches!(imports.0[3], EsmImport::Named { 389 | name: "named1", 390 | alias: None, 391 | source: "foo", 392 | .. 393 | })); 394 | assert!(matches!(imports.0[4], EsmImport::Named { 395 | name: "named2", 396 | alias: Some("foo"), 397 | source: "foo", 398 | .. 399 | })); 400 | } 401 | } 402 | -------------------------------------------------------------------------------- /vuln-reach/src/javascript/lang/mod.rs: -------------------------------------------------------------------------------- 1 | //! Javascript language facilities. 2 | pub mod accesses; 3 | pub mod exports; 4 | pub mod imports; 5 | pub mod symbol_table; 6 | -------------------------------------------------------------------------------- /vuln-reach/src/javascript/lang/symbol_table.rs: -------------------------------------------------------------------------------- 1 | //! Table of symbols in a source file. 
2 | 3 | use std::collections::{HashMap, HashSet}; 4 | use std::mem; 5 | 6 | use itertools::Itertools; 7 | use tree_sitter::{Node, QueryCursor}; 8 | 9 | use crate::{Cursor, Tree}; 10 | 11 | // A (lexical) scope. 12 | #[derive(Debug)] 13 | pub struct Scope<'a> { 14 | level: usize, 15 | node: Node<'a>, 16 | names: HashSet<Node<'a>>, 17 | assignments: HashSet<Node<'a>>, 18 | } 19 | 20 | impl<'a> Scope<'a> { 21 | /// The nesting level of the scope. 22 | /// 23 | /// `0` is the scope for the `program` node. 24 | pub fn level(&self) -> usize { 25 | self.level 26 | } 27 | 28 | /// The tree node for this scope. 29 | pub fn node(&self) -> Node<'a> { 30 | self.node 31 | } 32 | 33 | /// The names that are defined in this scope. 34 | pub fn names(&self) -> &HashSet<Node<'a>> { 35 | &self.names 36 | } 37 | 38 | /// Add a symbol to the names in the scope. 39 | pub fn define(&mut self, symbol: Node<'a>) { 40 | self.names.insert(symbol); 41 | } 42 | 43 | /// Lookup a symbol by name. 44 | /// 45 | /// Find the node among the `names` whose representation is equivalent to 46 | /// the supplied string. 47 | pub fn lookup_by_name(&self, name: &str, tree: &Tree) -> HashSet<Node<'a>> { 48 | self.names.iter().filter(|&&n| tree.repr_of(n) == name).copied().collect() 49 | } 50 | } 51 | 52 | // A stateful visitor that traverses the AST and gathers the scoped symbol table. 53 | #[derive(Debug, Default)] 54 | struct SymbolTableBuilder<'a> { 55 | cur_level: usize, 56 | scope_table: Vec<Scope<'a>>, 57 | scope_stack: Vec<Scope<'a>>, 58 | visited_nodes: HashSet<Node<'a>>, 59 | } 60 | 61 | impl<'a> SymbolTableBuilder<'a> { 62 | fn build(tree: &'a Tree) -> SymbolTable<'a> { 63 | let mut visitor = SymbolTableBuilder::default(); 64 | 65 | // Recursively visit all the nodes in the tree, starting from the root. 66 | visitor.visit(tree.root_node()); 67 | 68 | // Reverse the scope table. This makes it so that the first scope is the 69 | // `program` node's scope. 
70 | visitor.scope_table = visitor.scope_table.into_iter().rev().collect(); 71 | 72 | // Create a dictionary that associates each scope node to its index 73 | // in the scope table. 74 | let scope_indices = visitor 75 | .scope_table 76 | .iter() 77 | .enumerate() 78 | .map(|(idx, scope)| (scope.node, idx)) 79 | .collect::<HashMap<_, _>>(); 80 | 81 | let mut table = SymbolTable { tree, scopes: visitor.scope_table, scope_indices }; 82 | 83 | // Hoist assignments which were not declared as variables. 84 | for i in 0..table.scopes.len() { 85 | let assignments = mem::take(&mut table.scopes[i].assignments); 86 | for name in assignments { 87 | let cursor = Cursor::new(tree, name).unwrap(); 88 | if table.lookup(cursor).is_none() { 89 | table.scopes[0].define(name); 90 | } 91 | } 92 | } 93 | 94 | table 95 | } 96 | 97 | /// Retrieve the root scope. 98 | fn root_scope(&mut self) -> &mut Scope<'a> { 99 | self.scope_stack.first_mut().unwrap() 100 | } 101 | 102 | // Starting from the end of the scope stack, find the first scope that's a 103 | // function; default to the program scope if none is found. 104 | fn find_parent_function_scope(&mut self) -> &mut Scope<'a> { 105 | self.scope_stack 106 | .iter_mut() 107 | .rev() 108 | .find(|scope| { 109 | if scope.level == 0 { 110 | return true; 111 | } 112 | 113 | let parent = scope.node.parent().unwrap(); 114 | 115 | matches!( 116 | parent.kind(), 117 | "function_declaration" 118 | | "generator_function_declaration" 119 | | "function" 120 | | "generator_function" 121 | ) 122 | }) 123 | .unwrap() 124 | } 125 | 126 | // Push a scope on the stack. 127 | fn push_scope(&mut self, node: Node<'a>) { 128 | self.scope_stack.push(Scope { 129 | node, 130 | level: self.cur_level, 131 | assignments: Default::default(), 132 | names: Default::default(), 133 | }); 134 | self.cur_level += 1; 135 | } 136 | 137 | // Pop the last scope off the stack, and add it to the scope table. 
138 | fn pop_scope(&mut self) { 139 | self.scope_table.push(self.scope_stack.pop().unwrap()); 140 | self.cur_level -= 1; 141 | } 142 | 143 | // Recursively visit a node in a depth-first fashion. 144 | // 145 | // Visiting a node alters the state of the scope tree by adding scopes and 146 | // defining names according to the language semantics of each kind of node. 147 | fn visit(&mut self, node: Node<'a>) { 148 | // Each node is visited only once. 149 | if self.visited_nodes.contains(&node) { 150 | return; 151 | } 152 | 153 | self.visited_nodes.insert(node); 154 | 155 | match node.kind() { 156 | "statement_block" | "program" => { 157 | // Statement block and program nodes are the only node types 158 | // that create a new scope. 159 | self.push_scope(node); 160 | 161 | let mut cur_node = node.prev_named_sibling(); 162 | 163 | // Visit a `formal_parameters` sibling node, if it exists. 164 | while let Some(node) = cur_node { 165 | if node.kind() == "formal_parameters" { 166 | self.visit(node); 167 | } 168 | cur_node = node.prev_named_sibling(); 169 | } 170 | 171 | // Visit all of the child nodes of this scope. 172 | self.visit_children(node); 173 | 174 | // Go back to the parent scope. 175 | self.pop_scope(); 176 | }, 177 | "function_declaration" | "generator_function_declaration" | "class_declaration" => { 178 | // Function declarations show up in the parent *function* scope. 179 | 180 | let name = node.child_by_field_name(b"name").unwrap(); 181 | let scope = self.find_parent_function_scope(); 182 | scope.define(name); 183 | 184 | self.visit_children(node); 185 | }, 186 | "variable_declaration" => { 187 | // Variable declarations show up in the parent *function* scope. 188 | 189 | // Loop through the `variable_declarator` child nodes of this node. 
190 | for declarator_node in node.named_children(&mut node.walk()) { 191 | if declarator_node.kind() != "variable_declarator" { 192 | continue; 193 | } 194 | 195 | let name = declarator_node.child_by_field_name(b"name").unwrap(); 196 | let scope = self.find_parent_function_scope(); 197 | scope.define(name); 198 | } 199 | 200 | self.visit_children(node); 201 | }, 202 | "lexical_declaration" => { 203 | // Lexical declarations (const, let) show up in the parent *lexical* scope, 204 | // i.e. if/for blocks etc. 205 | 206 | // Loop through the `variable_declarator` child nodes of this node. 207 | for declarator_node in node.named_children(&mut node.walk()) { 208 | if declarator_node.kind() != "variable_declarator" { 209 | continue; 210 | } 211 | 212 | // Retrieve the current scope from the stack. 213 | let scope = self.scope_stack.last_mut().unwrap(); 214 | 215 | // Retrieve the name of the declaration. 216 | let name = declarator_node.child_by_field_name(b"name").unwrap(); 217 | 218 | // Define the name in the current scope. 219 | scope.define(name); 220 | } 221 | 222 | self.visit_children(node); 223 | }, 224 | "assignment_expression" | "augmented_assignment_expression" => { 225 | // Add assignments to their scope, allowing identification of hoisted variables. 226 | let scope = self.find_parent_function_scope(); 227 | let name = node.child_by_field_name(b"left").unwrap(); 228 | if !scope.names.contains(&name) { 229 | scope.assignments.insert(name); 230 | } 231 | 232 | self.visit_children(node); 233 | }, 234 | "formal_parameters" => { 235 | // Formal parameters show up in the scope of the function they belong to. 236 | 237 | // Retrieve the current scope. 238 | let scope = self.scope_stack.last_mut().unwrap(); 239 | 240 | // Define each formal parameter's identifier in the current scope. 
241 | for i in 0..node.named_child_count() { 242 | let parameter_name = node.named_child(i).unwrap(); 243 | scope.define(parameter_name); 244 | } 245 | }, 246 | "catch_clause" => { 247 | // Catch clause identifier has to be registered in the child statement block. 248 | if let Some(catch_statement) = node.child_by_field_name(b"body") { 249 | // Create scope for catch block. 250 | self.push_scope(catch_statement); 251 | 252 | // Define catch parameter for its block. 253 | let scope = self.scope_stack.last_mut().unwrap(); 254 | 255 | if let Some(catch_param) = node.child_by_field_name(b"parameter") { 256 | scope.define(catch_param); 257 | } 258 | 259 | // Parse catch block children. 260 | self.visit_children(catch_statement); 261 | 262 | // Go back to parent scope. 263 | self.pop_scope(); 264 | } 265 | }, 266 | "import_specifier" => { 267 | // import { a, b as c } from 'foo' 268 | 269 | // Retrieve the root scope, as that's where all imports are defined. 270 | let scope = self.root_scope(); 271 | 272 | // An import defines a name in the root scope. The name can be defined as 273 | // itself or as an alias, if one is present. 274 | let name = node.child_by_field_name(b"name").unwrap(); 275 | let alias = node.child_by_field_name(b"alias"); 276 | scope.define(alias.unwrap_or(name)); 277 | 278 | self.visit_children(node); 279 | }, 280 | "namespace_import" | "import_clause" => { 281 | // import something from 'foo' 282 | // import * as something from 'foo' 283 | 284 | // Retrieve the root scope, as that's where all imports are defined. 285 | let scope = self.root_scope(); 286 | 287 | // For namespace imports and import clauses, directly put 288 | // immediate `identifier` children nodes in the root scope. 
289 | for i in 0..node.named_child_count() { 290 | let identifier = node.named_child(i).unwrap(); 291 | if identifier.kind() == "identifier" { 292 | scope.define(identifier); 293 | } 294 | } 295 | 296 | self.visit_children(node); 297 | }, 298 | _ => { 299 | // Recursively keep visiting children for all the other kinds of nodes. 300 | self.visit_children(node); 301 | }, 302 | } 303 | } 304 | 305 | fn visit_children(&mut self, node: Node<'a>) { 306 | // Visit the children in reverse order so that the scopes are pushed 307 | // onto the stack in the correct order. 308 | for i in (0..node.named_child_count()).rev() { 309 | self.visit(node.named_child(i).unwrap()); 310 | } 311 | } 312 | } 313 | 314 | #[derive(Debug)] 315 | pub struct SymbolTable<'a> { 316 | tree: &'a Tree, 317 | scopes: Vec<Scope<'a>>, 318 | // Cache for O(1) lookups of the `scopes` vector. Associate each scope node 319 | // to its index in the scopes vector. 320 | scope_indices: HashMap<Node<'a>, usize>, 321 | } 322 | 323 | impl<'a> SymbolTable<'a> { 324 | pub fn new(tree: &'a Tree) -> Self { 325 | SymbolTableBuilder::build(tree) 326 | } 327 | 328 | /// Return the list of scopes. 329 | pub fn scopes(&self) -> &Vec<Scope<'a>> { 330 | &self.scopes 331 | } 332 | 333 | /// Return the root scope. 334 | pub fn root_scope(&self) -> &Scope<'a> { 335 | &self.scopes[0] 336 | } 337 | 338 | /// Define a new symbol in the given scope. 339 | pub fn define(&mut self, scope: Node<'a>, symbol: Node<'a>) { 340 | if let Some(scope) = 341 | self.scope_indices.get(&scope).and_then(|&idx| self.scopes.get_mut(idx)) 342 | { 343 | scope.define(symbol); 344 | } 345 | } 346 | 347 | /// Retrieve the scope object for a given scope node. 348 | pub fn get_scope(&self, scope: Node<'a>) -> Option<&Scope<'a>> { 349 | self.scope_indices.get(&scope).and_then(|&idx| self.scopes.get(idx)) 350 | } 351 | 352 | /// Look up an identifier. Returns the scope in which it is defined, and the 353 | /// node at which it is declared.
354 | pub fn lookup(&self, mut cursor: Cursor<'a>) -> Option<(&Scope<'a>, Node<'a>)> { 355 | // Skip lookup if the node is not an identifier. 356 | let node = cursor.node(); 357 | if node.kind() != "identifier" { 358 | return None; 359 | } 360 | 361 | let name = self.tree.repr_of(node); 362 | 363 | // For function parameters, the scope is the function body. 364 | let mut parent = cursor.goto_parent().unwrap(); 365 | if parent.kind() == "formal_parameters" { 366 | // Get function body. 367 | let grandparent = cursor.clone().goto_parent().unwrap(); 368 | let body = grandparent.child_by_field_name("body").unwrap(); 369 | 370 | if body.kind() == "statement_block" { 371 | let scope = self.get_scope(body).unwrap(); 372 | 373 | return Some((scope, node)); 374 | } else { 375 | // Lambda parameters without body do not belong to any scope. 376 | // 377 | // Example: `(param) => console.log(param);` 378 | return None; 379 | } 380 | } 381 | 382 | // For parameters in a catch clause, the scope is the catch body. 383 | if parent.kind() == "catch_clause" { 384 | // Get catch body. 385 | let body = parent.child_by_field_name("body").unwrap(); 386 | let scope = self.get_scope(body).unwrap(); 387 | 388 | return Some((scope, node)); 389 | } 390 | 391 | // Find parent scope node. Guaranteed to exist: "program" is the topmost and 392 | // worst case. 393 | let parent_scope_node = loop { 394 | if matches!(parent.kind(), "statement_block" | "program") { 395 | break parent; 396 | } 397 | 398 | parent = cursor.goto_parent().unwrap(); 399 | }; 400 | 401 | // Find the parent scope (where the identifier is used). 402 | // 403 | // Invariants: scope_indices.get(parent_scope_node) exists if the 404 | // visitor algorithm is correct and all the scopes were accounted 405 | // for. Otherwise, the algorithm is incorrect and stop-the-world is 406 | // appropriate behavior. 
407 | let mut parent_scope_index = *self.scope_indices.get(&parent_scope_node).unwrap(); 408 | 409 | // Find the declaration scope (where the identifier is declared). 410 | // 411 | // Starting from the parent scope's index, walk back through the parent 412 | // layers until a scope with the node name is declared. 413 | let (decl_scope, decl_node) = 414 | loop { 415 | // Retrieve the scope. 416 | let scope = &self.scopes[parent_scope_index]; 417 | 418 | // Find a declaration with the same text representation as the name we are 419 | // looking up. 420 | if let Some(decl_node) = scope.names.iter().find_map(|&node| { 421 | if self.tree.repr_of(node) == name { Some(node) } else { None } 422 | }) { 423 | break (scope, decl_node); 424 | } 425 | 426 | // If we walked all the way to the program node, we haven't found the 427 | // declaration. We can directly return `None` from here. 428 | if parent_scope_index == 0 { 429 | return None; 430 | } 431 | 432 | let cur_level = scope.level; 433 | 434 | // Walking upwards, find the first scope with level less than the current. 435 | while parent_scope_index > 0 && self.scopes[parent_scope_index].level >= cur_level { 436 | parent_scope_index -= 1; 437 | } 438 | }; 439 | 440 | Some((decl_scope, decl_node)) 441 | } 442 | 443 | // Pretty print the source code, with identifiers colored according to the 444 | // scope they belong to. 445 | pub fn pretty_display(&self) { 446 | // color_table[i] is the color of the i-th scope. 
447 | let color_table = 448 | (0..self.scopes.len()).map(|color| 16 + ((color + 1) * 32) % 231).collect::<Vec<_>>(); 449 | 450 | let query = self.tree.query("(identifier) @ident").unwrap(); 451 | let mut cur = QueryCursor::new(); 452 | let colorings: Vec<(Node<'a>, Option<usize>)> = cur 453 | .matches(&query, self.tree.root_node(), self.tree.buf().as_bytes()) 454 | .map(|query_match| { 455 | let node = query_match.captures[0].node; 456 | let cursor = Cursor::new(self.tree, node).unwrap(); 457 | let scope_index = self 458 | .lookup(cursor) 459 | .map(|(scope, _)| *self.scope_indices.get(&scope.node).unwrap()); 460 | (node, scope_index) 461 | }) 462 | .sorted_by_key(|(node, _)| node.start_byte()) 463 | .collect(); 464 | 465 | let mut cur_point = 0usize; 466 | let buf = self.tree.buf(); 467 | for (node, color_idx) in colorings { 468 | if node.start_byte() < cur_point { 469 | continue; 470 | } 471 | 472 | print!("\x1b[0m{}", &buf[cur_point..node.start_byte()],); 473 | if let Some(idx) = color_idx { 474 | print!("\x1b[38;5;{}m", color_table[idx]); 475 | } else { 476 | print!("\x1b[37m\x1b[41m"); 477 | } 478 | print!("{}", &buf[node.start_byte()..node.end_byte()]); 479 | cur_point = node.end_byte(); 480 | } 481 | print!("\x1b[0m{}", &buf[cur_point..]); 482 | } 483 | } 484 | 485 | #[cfg(test)] 486 | mod tests { 487 | use super::*; 488 | 489 | #[test] 490 | fn test_simple() { 491 | let tree = Tree::new( 492 | r#" 493 | const const_global 494 | 495 | function func1_global() { 496 | const const_scope1 497 | 498 | { 499 | const const_scope2 500 | } 501 | } 502 | 503 | function func2_global(param_scope3) { 504 | const const_scope3 505 | var var_scope3 506 | 507 | if (undefined_ident1 == 2) { 508 | let let_scope4 509 | } 510 | 511 | if (undefined_ident2 == 2) { 512 | var var_scope3 513 | let let_scope5 514 | 515 | function func3_scope3() { 516 | var var_scope6 517 | let let_scope6 518 | } 519 | } 520 | } 521 | "# 522 | .to_string(), 523 | ) 524 | .unwrap(); 525 | 526 | let st = 
SymbolTable::new(&tree); 527 | st.pretty_display(); 528 | } 529 | 530 | // Test the uncommon (but valid) situation in which there is a comment between 531 | // a formal parameter list and a function body. 532 | #[test] 533 | fn test_formal_param_comment() { 534 | let tree = Tree::new( 535 | r#" 536 | function fn1(arg1, arg2) {} 537 | function fn2(arg1, arg2) /* comment */ {} 538 | "# 539 | .to_string(), 540 | ) 541 | .expect("Should not panic"); 542 | 543 | let st = SymbolTable::new(&tree); 544 | st.pretty_display(); 545 | } 546 | } 547 | -------------------------------------------------------------------------------- /vuln-reach/src/javascript/mod.rs: -------------------------------------------------------------------------------- 1 | pub mod lang; 2 | pub mod module; 3 | pub mod package; 4 | pub mod project; 5 | -------------------------------------------------------------------------------- /vuln-reach/src/javascript/module/mod.rs: -------------------------------------------------------------------------------- 1 | //! Module-level (i.e. single JavaScript file) data structures and methods. 
2 | 3 | pub mod module_cache; 4 | pub mod resolver; 5 | 6 | use std::collections::HashSet; 7 | 8 | use itertools::Itertools; 9 | use ouroboros::self_referencing; 10 | use tree_sitter::Node; 11 | 12 | use super::lang::exports::EsmExport; 13 | use crate::javascript::lang::accesses::{AccessEdge, AccessGraph}; 14 | use crate::javascript::lang::exports::{CommonJsExports, EsmExports, Exports}; 15 | use crate::javascript::lang::imports::Imports; 16 | use crate::javascript::lang::symbol_table::SymbolTable; 17 | pub use crate::javascript::module::module_cache::ModuleCache; 18 | pub use crate::javascript::module::resolver::fs::FilesystemModuleResolver; 19 | pub use crate::javascript::module::resolver::mem::MemModuleResolver; 20 | pub use crate::javascript::module::resolver::tgz::TarballModuleResolver; 21 | pub use crate::javascript::module::resolver::ModuleResolver; 22 | use crate::{Error, Result, Tree}; 23 | 24 | #[derive(Clone, Debug)] 25 | pub(crate) enum PathToExport<'a> { 26 | Esm { name: &'a str, access_path: Vec<AccessEdge<'a>> }, 27 | Cjs { name: &'a str, access_path: Vec<AccessEdge<'a>>, export: Node<'a> }, 28 | SideEffect { name: &'a str, access_path: Vec<AccessEdge<'a>>, effect_node: Node<'a> }, 29 | } 30 | 31 | impl<'a> PathToExport<'a> { 32 | pub(crate) fn name(&self) -> &str { 33 | match self { 34 | PathToExport::Esm { name, .. } 35 | | PathToExport::Cjs { name, .. } 36 | | PathToExport::SideEffect { name, .. } => name, 37 | } 38 | } 39 | } 40 | 41 | /// A module object. It contains all information about symbols, accesses, 42 | /// imports and exports and is self-referencing. 
43 | #[self_referencing] 44 | pub struct Module { 45 | tree: Tree, 46 | #[borrows(tree)] 47 | #[covariant] 48 | symbol_table: SymbolTable<'this>, 49 | #[borrows(tree, symbol_table)] 50 | #[covariant] 51 | accesses: AccessGraph<'this>, 52 | #[borrows(tree)] 53 | #[covariant] 54 | imports: Imports<'this>, 55 | #[borrows(tree)] 56 | #[covariant] 57 | exports: Exports<'this>, 58 | } 59 | 60 | impl TryFrom<Tree> for Module { 61 | type Error = Error; 62 | 63 | fn try_from(tree: Tree) -> Result<Self> { 64 | // Fail if there is an error anywhere in the AST. 65 | // 66 | // `tree-sitter` is capable of parsing code with errors in it, but 67 | // code with errors won't be executable by a runtime anyway, and as 68 | // such anything in it will also be unreachable. We can safely skip 69 | // processing these cases. 70 | if tree.root_node().has_error() { 71 | Err(Error::ParseError) 72 | } else { 73 | Ok(Module::new( 74 | tree, 75 | |tree| SymbolTable::new(tree), 76 | |tree, symbol_table| AccessGraph::new(tree, symbol_table), 77 | |tree| Imports::new(tree), 78 | |tree| Exports::new(tree), 79 | )) 80 | } 81 | } 82 | } 83 | 84 | impl Module { 85 | pub fn tree(&self) -> &Tree { 86 | self.borrow_tree() 87 | } 88 | 89 | pub fn imports(&self) -> &Imports { 90 | self.borrow_imports() 91 | } 92 | 93 | pub fn exports(&self) -> &Exports { 94 | self.borrow_exports() 95 | } 96 | 97 | pub fn symbol_table(&self) -> &SymbolTable { 98 | self.borrow_symbol_table() 99 | } 100 | 101 | pub fn accesses(&self) -> &AccessGraph { 102 | self.borrow_accesses() 103 | } 104 | 105 | /// Find paths to ES Module exports. 106 | fn paths_to_exports_esm<'a>( 107 | &'a self, 108 | source: Node<'a>, 109 | exports: &'a EsmExports, 110 | ) -> Result<Vec<PathToExport<'a>>> { 111 | let mut export_nodes = HashSet::new(); 112 | 113 | // If there is a default export, add its node to the set. 114 | if let Some(export) = exports.default.as_ref() { 115 | export_nodes.insert(export.node()); 116 | } 117 | 118 | // Add all exported objects' nodes to the set. 
119 | for export in exports.objects.values() { 120 | export_nodes.insert(export.node()); 121 | } 122 | 123 | // Compute paths to all ESM export nodes. 124 | Ok(self 125 | .accesses() 126 | .compute_paths( 127 | |access| { 128 | // An access is a target if either its node or its scope are in the set 129 | // of export nodes. 130 | export_nodes.contains(&access.node) || export_nodes.contains(&access.scope) 131 | }, 132 | source, 133 | )? 134 | .into_iter() 135 | .map(|access_path| { 136 | let last_node = access_path.last().unwrap(); 137 | // The name of this path to export is either "default" or the name of the 138 | // last identifier in the path. 139 | // 140 | // The guard condition ensures that `last_node` is the default export node, 141 | // to prevent shadowing other exports. 142 | let name = match exports.default.as_ref() { 143 | Some(EsmExport::Scope(node)) if node == &last_node.access().scope => "default", 144 | Some(EsmExport::Name(node)) if node == &last_node.access().node => "default", 145 | Some(export @ EsmExport::Expression(_)) 146 | if export.expression_contains(last_node.access().node) => 147 | { 148 | "default" 149 | }, 150 | _ => self.tree().repr_of(last_node.accessed()), 151 | }; 152 | 153 | PathToExport::Esm { name, access_path } 154 | }) 155 | .collect()) 156 | } 157 | 158 | /// Find paths to CommonJS exports. 159 | fn paths_to_exports_cjs<'a>( 160 | &'a self, 161 | source: Node<'a>, 162 | exports: &'a CommonJsExports, 163 | ) -> Result<Vec<PathToExport<'a>>> { 164 | Ok(self 165 | .accesses() 166 | .compute_paths( 167 | |access| { 168 | // Depending on which kind of export we have in this module, run a different 169 | // check against the current access to determine whether that is a target. 
170 | match exports { 171 | CommonJsExports::Name(n) => access.node == *n, 172 | CommonJsExports::Scope(s) => access.scope == *s, 173 | CommonJsExports::Object(o) => { 174 | o.values().contains(&access.node) || o.values().contains(&access.scope) 175 | }, 176 | CommonJsExports::None => false, 177 | } 178 | }, 179 | source, 180 | )? 181 | .into_iter() 182 | .map(|access_path| { 183 | let last_access = access_path.last().unwrap(); 184 | 185 | // Determine the name of the exported node in the path. 186 | let (name, export) = match exports { 187 | &CommonJsExports::Name(n) => (self.tree().repr_of(n), n), 188 | &CommonJsExports::Scope(s) => ("", s), 189 | CommonJsExports::Object(o) => o 190 | .iter() 191 | .find_map(|(&k, &v)| { 192 | if last_access.access().node == v || last_access.access().scope == v { 193 | Some((k, v)) 194 | } else { 195 | None 196 | } 197 | }) 198 | .unwrap(), 199 | CommonJsExports::None => unreachable!(), 200 | }; 201 | 202 | PathToExport::Cjs { name, access_path, export } 203 | }) 204 | .collect()) 205 | } 206 | 207 | /// Find paths to side effects, i.e. nodes that are always accessed when 208 | /// importing a module. 209 | fn paths_to_side_effects<'a>(&'a self, source: Node<'a>) -> Result<Vec<PathToExport<'a>>> { 210 | Ok(self 211 | .accesses() 212 | .compute_paths(|access| access.is_side_effect(), source)? 213 | .into_iter() 214 | .map(|access_path| { 215 | let last_access = access_path.last().unwrap().access(); 216 | 217 | // The name of this path is the representation of its access node. 218 | let effect_node = last_access.node; 219 | let name = self.tree().repr_of(effect_node); 220 | 221 | PathToExport::SideEffect { name, access_path, effect_node } 222 | }) 223 | .collect()) 224 | } 225 | 226 | /// Find the paths to exports for all kinds of modules. 227 | pub(crate) fn paths_to_exports<'a>( 228 | &'a self, 229 | source: Node<'a>, 230 | ) -> Result<Vec<PathToExport<'a>>> { 231 | // Always include paths to side effects. 
232 | let mut paths = self.paths_to_side_effects(source)?; 233 | 234 | match self.exports() { 235 | Exports::Esm(exports) => paths.extend(self.paths_to_exports_esm(source, exports)?), 236 | Exports::CommonJs(exports) => paths.extend(self.paths_to_exports_cjs(source, exports)?), 237 | Exports::None => {}, 238 | } 239 | 240 | Ok(paths) 241 | } 242 | } 243 | 244 | #[cfg(test)] 245 | mod tests { 246 | use std::fmt::Write; 247 | 248 | use textwrap::dedent; 249 | use tree_sitter::Point; 250 | 251 | use super::*; 252 | 253 | impl<'a> PathToExport<'a> { 254 | pub(crate) fn repr(&self, tree: &Tree) -> String { 255 | let mut buf = String::new(); 256 | 257 | match self { 258 | PathToExport::Esm { name, access_path } => { 259 | write!(&mut buf, "`{}`: ", name).ok(); 260 | for edge in access_path { 261 | let accessed = edge.accessed(); 262 | write!( 263 | &mut buf, 264 | "{}:{},{}", 265 | tree.repr_of(accessed), 266 | accessed.start_position().row, 267 | accessed.start_position().column 268 | ) 269 | .ok(); 270 | 271 | if edge.access().accessor.is_some() { 272 | write!(&mut buf, " -> ").ok(); 273 | } 274 | } 275 | }, 276 | PathToExport::Cjs { name, access_path, export } => { 277 | write!(&mut buf, "`{}`: ", name).ok(); 278 | for edge in access_path { 279 | let accessed = edge.accessed(); 280 | write!( 281 | &mut buf, 282 | "{}:{},{}", 283 | tree.repr_of(accessed), 284 | accessed.start_position().row, 285 | accessed.start_position().column 286 | ) 287 | .ok(); 288 | 289 | write!(&mut buf, " -> ").ok(); 290 | } 291 | 292 | write!( 293 | &mut buf, 294 | "{}:{},{}", 295 | name, 296 | export.start_position().row, 297 | export.start_position().column 298 | ) 299 | .ok(); 300 | }, 301 | PathToExport::SideEffect { name, access_path, effect_node } => { 302 | write!(&mut buf, "`{}`: ", name).ok(); 303 | for edge in access_path { 304 | let accessed = edge.accessed(); 305 | write!( 306 | &mut buf, 307 | "{}:{},{}", 308 | tree.repr_of(accessed), 309 | accessed.start_position().row, 310 | 
accessed.start_position().column 311 | ) 312 | .ok(); 313 | 314 | write!(&mut buf, " -> ").ok(); 315 | } 316 | 317 | write!( 318 | &mut buf, 319 | "{}:{},{}", 320 | name, 321 | effect_node.start_position().row, 322 | effect_node.start_position().column 323 | ) 324 | .ok(); 325 | }, 326 | } 327 | 328 | buf 329 | } 330 | } 331 | 332 | #[ignore] 333 | #[test] 334 | fn test_paths_to_exports_mjs_default_function() { 335 | let module = Module::try_from( 336 | Tree::new(dedent( 337 | r#" 338 | const foo = 3 339 | 340 | function bar() { 341 | const c = foo 342 | } 343 | 344 | export default function() { 345 | bar() 346 | } 347 | "#, 348 | )) 349 | .unwrap(), 350 | ) 351 | .unwrap(); 352 | 353 | let paths = module 354 | .paths_to_exports( 355 | module 356 | .tree() 357 | .root_node() 358 | .descendant_for_point_range(Point::new(1, 6), Point::new(1, 8)) 359 | .unwrap(), 360 | ) 361 | .unwrap() 362 | .into_iter() 363 | .filter(|path| matches!(path, PathToExport::Esm { .. })) 364 | .collect::>(); 365 | 366 | assert_eq!(paths.len(), 1); 367 | 368 | let path = paths.into_iter().next().unwrap(); 369 | println!("{}", path.repr(module.tree())); 370 | assert_eq!(path.name(), "default"); 371 | } 372 | 373 | #[ignore] 374 | #[test] 375 | fn test_paths_to_exports_mjs_default_binding() { 376 | let module = Module::try_from( 377 | Tree::new(dedent( 378 | r#" 379 | const foo = 3 380 | 381 | function bar() { 382 | const c = foo 383 | } 384 | 385 | export default bar 386 | "#, 387 | )) 388 | .unwrap(), 389 | ) 390 | .unwrap(); 391 | 392 | let paths = module 393 | .paths_to_exports( 394 | module 395 | .tree() 396 | .root_node() 397 | .descendant_for_point_range(Point::new(1, 6), Point::new(1, 8)) 398 | .unwrap(), 399 | ) 400 | .unwrap() 401 | .into_iter() 402 | .filter(|path| matches!(path, PathToExport::Esm { .. 
})) 403 | .collect::<Vec<_>>(); 404 | 405 | assert_eq!(paths.len(), 1); 406 | 407 | let path = paths.into_iter().next().unwrap(); 408 | println!("{}", path.repr(module.tree())); 409 | assert_eq!(path.name(), "default"); 410 | } 411 | 412 | #[test] 413 | fn test_paths_to_side_effects() { 414 | let module = Module::try_from( 415 | Tree::new(dedent( 416 | r#" 417 | let value = 3 418 | 419 | function foo() { 420 | value = 4 421 | } 422 | 423 | function bar() { 424 | foo() 425 | } 426 | 427 | foo() 428 | "#, 429 | )) 430 | .unwrap(), 431 | ) 432 | .unwrap(); 433 | 434 | let paths = module 435 | .paths_to_exports( 436 | module 437 | .tree() 438 | .root_node() 439 | .descendant_for_point_range(Point::new(1, 4), Point::new(1, 8)) 440 | .unwrap(), 441 | ) 442 | .unwrap(); 443 | 444 | assert!(paths.len() == 1, "Wrong number of paths found"); 445 | 446 | let Some(PathToExport::SideEffect { name, access_path, effect_node }) = 447 | paths.into_iter().next() 448 | else { 449 | panic!("Path found is not to a side effect") 450 | }; 451 | 452 | assert_eq!(name, "foo", "Wrong side effect name {name}"); 453 | assert_eq!( 454 | effect_node.start_position(), 455 | Point::new(11, 0), 456 | "Wrong side effect node position" 457 | ); 458 | assert_eq!( 459 | access_path.first().unwrap().accessed().start_position(), 460 | Point::new(1, 4), 461 | "Wrong node accessed" 462 | ); 463 | } 464 | 465 | #[test] 466 | fn test_module_with_errors() { 467 | let tree = Tree::new( 468 | r#" 469 | #[test] 470 | fn test_function() { 471 | panic!("I am not even JavaScript code"); 472 | } 473 | "# 474 | .to_string(), 475 | ) 476 | .expect("The tree should be parsed anyway"); 477 | 478 | assert!(matches!(Module::try_from(tree), Err(Error::ParseError))); 479 | } 480 | } 481 | -------------------------------------------------------------------------------- /vuln-reach/src/javascript/module/module_cache.rs: -------------------------------------------------------------------------------- 1 | use 
std::collections::hash_map::Entry; 2 | use std::collections::{HashMap, HashSet, VecDeque}; 3 | use std::fmt::Write; 4 | use std::path::{Path, PathBuf}; 5 | 6 | use super::resolver::resolve_path; 7 | use crate::javascript::lang::imports::Imports; 8 | use crate::javascript::module::{Module, ModuleResolver}; 9 | use crate::Result; 10 | 11 | // Type aliases are just for clarity. 12 | type RelativeSpec = PathBuf; 13 | type AbsoluteSpec = PathBuf; 14 | 15 | /// Represents a graph of modules in a package. 16 | pub struct ModuleCache { 17 | /// The dictionary that maps absolute import specs to Module objects. 18 | cache: HashMap, 19 | /// The graph of import relationships. 20 | /// 21 | /// Example: if `foo/bar/baz.js` imports `../quux.js`, there will 22 | /// be a mapping of the form 23 | /// ```js 24 | /// { "foo/bar/baz.js": { "../quux.js": "foo/quux.js" } } 25 | /// ``` 26 | module_graph: HashMap>, 27 | } 28 | 29 | impl ModuleCache { 30 | /// Construct the cache by going through all the paths in the provided 31 | /// resolver. 32 | pub fn new(resolver: &R) -> Result { 33 | ModuleCache::with_initial_nodes(resolver, resolver.all_paths()) 34 | } 35 | 36 | /// Construct the cache by going through the specified entry point only. 37 | /// 38 | /// This will create a smaller graph with only the imports reachable from 39 | /// the specified entry point. 40 | pub fn with_entry_point, R: ModuleResolver>( 41 | resolver: &R, 42 | entry_point: P, 43 | ) -> Result { 44 | ModuleCache::with_initial_nodes(resolver, Some(entry_point)) 45 | } 46 | 47 | /// Generic constructor that evaluates edges going out of all the supplied 48 | /// import paths. 49 | fn with_initial_nodes, R: ModuleResolver>( 50 | resolver: &R, 51 | nodes: impl IntoIterator, 52 | ) -> Result { 53 | // Create a queue and add all the supplied import paths. 
54 | let mut q = VecDeque::<(Option, PathBuf)>::new(); 55 | nodes.into_iter().for_each(|node| { 56 | q.push_back((None, node.as_ref().to_path_buf())); 57 | }); 58 | 59 | // Cache visited nodes (i.e. modules). 60 | let mut visited = HashSet::new(); 61 | 62 | let mut cache = HashMap::default(); 63 | let mut module_graph: HashMap> = 64 | HashMap::default(); 65 | 66 | while !q.is_empty() { 67 | // Pick the top of the queue. 68 | let (current_module_spec, import_spec) = q.pop_front().unwrap(); 69 | 70 | // Skip if the module has already been visited. 71 | { 72 | let b = (current_module_spec.clone(), import_spec.clone()); 73 | if visited.contains(&b) { 74 | continue; 75 | } 76 | visited.insert(b); 77 | } 78 | 79 | // Ignore resolution errors for now; we only need to load the package's modules. 80 | // Resolution errors can either be bona fide mistakes (i.e. a module which 81 | // doesn't exist in the package, but should) or dependency imports 82 | // (which shouldn't exist in a context where only the package itself 83 | // exists, i.e. a tarball). 84 | 85 | // Retrieve absolute spec. For example, if 'foo/bar.js' imports '../baz/quux', 86 | // return 'foo/baz/quux'. 87 | let absolute_spec = match current_module_spec 88 | .as_ref() 89 | .map(|current_module_spec| { 90 | resolver.resolve_relative(&import_spec, current_module_spec) 91 | }) 92 | .unwrap_or_else(|| resolver.resolve_absolute(&import_spec)) 93 | { 94 | Ok(absolute_spec) => absolute_spec, 95 | Err(_) => continue, 96 | }; 97 | 98 | // Load modules into cache on demand. 99 | let module = match cache.entry(absolute_spec.clone()) { 100 | Entry::Occupied(module) => module.into_mut(), 101 | Entry::Vacant(vacant) => { 102 | // Load a new module. 103 | let module = match resolver.load(&absolute_spec) { 104 | Ok(module) => module, 105 | Err(_) => continue, 106 | }; 107 | vacant.insert(module) 108 | }, 109 | }; 110 | 111 | // Only process import sources now. 
112 | match module.imports() { 113 | Imports::Esm(esm_imports) => { 114 | // Add to the queue all the paths that the current ES Module file imports. 115 | for import in esm_imports { 116 | q.push_back((Some(absolute_spec.clone()), import.source().into())); 117 | } 118 | }, 119 | Imports::CommonJs(cjs_imports) => { 120 | // Add to the queue all the paths that the current CommonJS file imports. 121 | for import in cjs_imports { 122 | q.push_back((Some(absolute_spec.clone()), import.source().into())); 123 | } 124 | }, 125 | Imports::None => {}, 126 | } 127 | 128 | // Insert an edge to the import in the base module's adjacency list. 129 | if let Some(current_module) = current_module_spec { 130 | module_graph 131 | .entry(current_module.to_path_buf()) 132 | .or_default() 133 | .insert(import_spec, absolute_spec); 134 | } 135 | } 136 | 137 | Ok(Self { cache, module_graph }) 138 | } 139 | 140 | /// Retrieve the dependencies of the given import specification. 141 | pub fn module_deps>( 142 | &self, 143 | spec: P, 144 | ) -> Option<&HashMap> { 145 | self.module_graph.get(spec.as_ref()) 146 | } 147 | 148 | /// Retrieve the module associated to the given specification. 149 | pub fn module>(&self, spec: P) -> Option<&Module> { 150 | self.cache.get(spec.as_ref()) 151 | } 152 | 153 | /// Get the dictionary of all the modules. 154 | pub fn modules(&self) -> &HashMap { 155 | &self.cache 156 | } 157 | 158 | /// Get the dependency graph. 159 | pub fn graph(&self) -> &HashMap> { 160 | &self.module_graph 161 | } 162 | 163 | // Find out which modules import the specified dependency. 
164 | pub fn dependents_of<'a, P: AsRef + 'a>( 165 | &'a self, 166 | dependency: P, 167 | ) -> impl Iterator { 168 | self.graph().iter().filter_map(move |(spec, deps)| { 169 | deps.values() 170 | .find_map(|import_spec| resolve_path(import_spec, |f| f == dependency.as_ref())) 171 | .map(|_| spec) 172 | }) 173 | } 174 | 175 | // Find out if dependent_spec (absolute) depends on dependency_spec (may be 176 | // relative) 177 | pub fn depends_on, Q: AsRef>( 178 | &self, 179 | dependent_spec: P, 180 | dependency_spec: Q, 181 | ) -> bool { 182 | let deps = self.graph().get(dependent_spec.as_ref()).unwrap(); 183 | deps.contains_key(dependency_spec.as_ref()) 184 | || deps.values().any(|dep| dep == dependency_spec.as_ref()) 185 | } 186 | 187 | /// Get a module by using a spec relative to another module. 188 | /// 189 | /// It is assumed that `spec` can only ever appear inside a given module. 190 | pub fn module_rel, Q: AsRef>(&self, spec: P, base: Q) -> Option<&Module> { 191 | self.module_deps(base) 192 | .and_then(|deps| deps.get(spec.as_ref())) 193 | .and_then(|absolute_spec| self.cache.get(absolute_spec)) 194 | } 195 | 196 | /// Convert the graph to the Graphviz dot format. 
197 | pub fn to_dot(&self) -> String { 198 | let mut buf = String::from("digraph G {\n rankdir = LR;\n"); 199 | let mut node_id = 0usize; 200 | let mut node_ids = HashMap::::new(); 201 | 202 | for (k, vs) in self.graph() { 203 | let k_id = *node_ids.entry(k.clone()).or_insert_with(|| { 204 | node_id += 1; 205 | node_id 206 | }); 207 | 208 | for v in vs.values() { 209 | let v_id = *node_ids.entry(v.clone()).or_insert_with(|| { 210 | node_id += 1; 211 | node_id 212 | }); 213 | 214 | writeln!(buf, " node{k_id:03} -> node{v_id:03}").unwrap(); 215 | } 216 | } 217 | writeln!(buf, "\n").unwrap(); 218 | 219 | for (path, id) in node_ids { 220 | writeln!(buf, " node{id:03} [label=\"{}\"]", path.display()).unwrap(); 221 | } 222 | writeln!(buf, "}}").unwrap(); 223 | 224 | buf 225 | } 226 | } 227 | -------------------------------------------------------------------------------- /vuln-reach/src/javascript/module/resolver/fs.rs: -------------------------------------------------------------------------------- 1 | use std::fs; 2 | use std::path::{Path, PathBuf}; 3 | 4 | use walkdir::WalkDir; 5 | 6 | use super::{entry_point, is_valid_js_extension}; 7 | use crate::javascript::module::resolver::ModuleResolver; 8 | use crate::javascript::module::Module; 9 | use crate::{Error, Result, Tree}; 10 | 11 | pub struct FilesystemModuleResolver { 12 | base_path: PathBuf, 13 | entry_point: PathBuf, 14 | } 15 | 16 | impl FilesystemModuleResolver { 17 | pub fn new>(base_path: P) -> Result { 18 | let base_path = fs::canonicalize(base_path.as_ref()) 19 | .map_err(|e| Error::Generic(format!("{}: {}", base_path.as_ref().display(), e)))?; 20 | 21 | let package_json = fs::read_to_string(base_path.join("package.json")) 22 | .map_err(|e| Error::Generic(format!("package.json: {}", e)))?; 23 | 24 | let entry_point = entry_point(&package_json)?; 25 | 26 | Ok(Self { base_path, entry_point }) 27 | } 28 | } 29 | 30 | impl ModuleResolver for FilesystemModuleResolver { 31 | fn entry_point(&self) -> &str { 32 | // 
Unwrap is always valid because the entry point comes from a 33 | // `package.json` file which should be UTF-8 to begin with. 34 | self.entry_point.to_str().unwrap() 35 | } 36 | 37 | fn load>(&self, path: P) -> Result { 38 | let path_abs = self.base_path.join(path.as_ref()); 39 | let path = self.resolve_absolute(path_abs)?; 40 | 41 | let source = fs::read_to_string(&path) 42 | .map_err(|e| Error::Generic(format!("{}: {}", path.display(), e)))?; 43 | let tree = Tree::new(source)?.try_into()?; 44 | 45 | Ok(tree) 46 | } 47 | 48 | fn all_paths(&self) -> Box + '_> { 49 | let paths = WalkDir::new(&self.base_path) 50 | .into_iter() 51 | .flatten() 52 | .filter(|entry| is_valid_js_extension(entry.path())) 53 | .filter_map(|entry| { 54 | entry.path().strip_prefix(&self.base_path).ok().map(|p| p.to_path_buf()) 55 | }); 56 | Box::new(paths) 57 | } 58 | 59 | fn exists>(&self, path: P) -> bool { 60 | path.as_ref().exists() 61 | } 62 | 63 | fn is_dir>(&self, spec: P) -> bool { 64 | spec.as_ref().is_dir() 65 | } 66 | } 67 | 68 | #[cfg(test)] 69 | mod tests { 70 | use std::fs; 71 | 72 | use super::*; 73 | use crate::javascript::lang::exports::{EsmExport, Exports}; 74 | use crate::javascript::lang::imports::{EsmImport, Imports}; 75 | 76 | #[test] 77 | fn test_filesystem_resolver() { 78 | let dir = tempfile::tempdir().unwrap(); 79 | 80 | let file1 = dir.path().join("file1.js"); 81 | let file2 = dir.path().join("file2.js"); 82 | let package_json = dir.path().join("package.json"); 83 | 84 | fs::write(&file1, r#"export default function() { console.log("foo") }"#).unwrap(); 85 | fs::write(&file2, r#"import foo from "./file1.js""#).unwrap(); 86 | fs::write(package_json, r#"{ "main": "file1.js" }"#).unwrap(); 87 | 88 | let resolver = FilesystemModuleResolver::new(dir.path()).unwrap(); 89 | let module1 = resolver.load(&file1).unwrap(); 90 | let module2 = resolver.load(&file2).unwrap(); 91 | 92 | match module1.exports() { 93 | Exports::Esm(e) => { 94 | if let Some(EsmExport::Scope(e)) = 
e.default { 95 | println!("Default export: {:?}", module1.tree().repr_of(e)); 96 | } else { 97 | panic!("Failed assertion: {e:?}"); 98 | } 99 | }, 100 | Exports::CommonJs(_) => unreachable!(), 101 | Exports::None => unreachable!(), 102 | } 103 | 104 | match module2.imports() { 105 | Imports::Esm(i) => { 106 | if let Some(EsmImport::Default { name, source, .. }) = i.first() { 107 | println!("Default import: {name} from {source}"); 108 | } else { 109 | panic!("Failed assertion: {i:?}"); 110 | } 111 | }, 112 | Imports::CommonJs(_) => unreachable!(), 113 | Imports::None => unreachable!(), 114 | } 115 | 116 | let mut paths = resolver.all_paths().collect::<Vec<_>>(); 117 | paths.sort(); 118 | assert_eq!(paths, vec![PathBuf::from("file1.js"), PathBuf::from("file2.js")]); 119 | } 120 | } 121 | -------------------------------------------------------------------------------- /vuln-reach/src/javascript/module/resolver/mem.rs: -------------------------------------------------------------------------------- 1 | use std::collections::HashMap; 2 | use std::path::{Path, PathBuf}; 3 | 4 | use super::{entry_point, is_valid_js_extension}; 5 | use crate::javascript::module::resolver::ModuleResolver; 6 | use crate::javascript::module::Module; 7 | use crate::{Error, Result, Tree}; 8 | 9 | pub struct MemModuleResolver { 10 | backing: HashMap<PathBuf, String>, 11 | entry_point: PathBuf, 12 | } 13 | 14 | impl MemModuleResolver { 15 | pub fn new<K, V>(backing: HashMap<K, V>) -> Self 16 | where 17 | K: Into<PathBuf>, 18 | V: Into<String>, 19 | { 20 | let backing: HashMap<PathBuf, String> = 21 | backing.into_iter().map(|(k, v)| (k.into(), v.into())).collect(); 22 | let package_json = backing 23 | .get(Path::new("package.json")) 24 | .ok_or_else(|| Error::Generic("package.json: not found".to_string())) 25 | .unwrap(); 26 | 27 | let entry_point = entry_point(package_json).unwrap(); 28 | 29 | Self { backing, entry_point } 30 | } 31 | } 32 | 33 | impl ModuleResolver for MemModuleResolver { 34 | fn entry_point(&self) -> &str { 35 | // Unwrap is always valid because
the entry point comes from a 36 | // `package.json` file which should be UTF-8 to begin with. 37 | self.entry_point.to_str().unwrap() 38 | } 39 | 40 | fn load<P: AsRef<Path>>(&self, path: P) -> Result<Module> { 41 | let path = self.resolve_absolute(path)?; 42 | 43 | let tree = Tree::new(self.backing.get(path.as_path()).unwrap().to_string())?.try_into()?; 44 | 45 | Ok(tree) 46 | } 47 | 48 | fn all_paths(&self) -> Box<dyn Iterator<Item = PathBuf> + '_> { 49 | let paths = self.backing.keys().filter(|path| is_valid_js_extension(path)).cloned(); 50 | Box::new(paths) 51 | } 52 | 53 | fn exists<P: AsRef<Path>>(&self, path: P) -> bool { 54 | self.backing.contains_key(path.as_ref()) 55 | } 56 | 57 | fn is_dir<P: AsRef<Path>>(&self, _spec: P) -> bool { 58 | // Paths in memory are never directories. 59 | false 60 | } 61 | } 62 | 63 | #[cfg(test)] 64 | mod tests { 65 | use maplit::hashmap; 66 | 67 | use super::*; 68 | use crate::javascript::lang::exports::{EsmExport, Exports}; 69 | use crate::javascript::lang::imports::{EsmImport, Imports}; 70 | 71 | #[test] 72 | fn test_mem_resolver() { 73 | let resolver = MemModuleResolver::new(hashmap! { 74 | "package.json" => "{}", 75 | "file1.js" => r#"export default function() { console.log("foo") }"#, 76 | "file2.js" => r#"import foo from "./file1.js""#, 77 | }); 78 | 79 | let module1 = resolver.load("file1.js").unwrap(); 80 | let module2 = resolver.load("file2.js").unwrap(); 81 | 82 | match module1.exports() { 83 | Exports::Esm(e) => { 84 | if let Some(EsmExport::Scope(e)) = e.default { 85 | println!("Default export: {:?}", module1.tree().repr_of(e)); 86 | } else { 87 | panic!("Failed assertion: {e:?}"); 88 | } 89 | }, 90 | Exports::CommonJs(_) => unreachable!(), 91 | Exports::None => unreachable!(), 92 | } 93 | 94 | match module2.imports() { 95 | Imports::Esm(i) => { 96 | if let Some(EsmImport::Default { name, source, ..
}) = i.first() { 97 | println!("Default import: {name} from {source}"); 98 | } else { 99 | panic!("Failed assertion: {i:?}"); 100 | } 101 | }, 102 | Imports::CommonJs(_) => unreachable!(), 103 | Imports::None => unreachable!(), 104 | } 105 | 106 | let mut paths = resolver.all_paths().collect::<Vec<_>>(); 107 | paths.sort(); 108 | assert_eq!(paths, vec![PathBuf::from("file1.js"), PathBuf::from("file2.js")]); 109 | } 110 | } 111 | -------------------------------------------------------------------------------- /vuln-reach/src/javascript/module/resolver/mod.rs: -------------------------------------------------------------------------------- 1 | pub mod fs; 2 | pub mod mem; 3 | pub mod tgz; 4 | 5 | use std::path::{Path, PathBuf}; 6 | 7 | use serde::Deserialize; 8 | 9 | use crate::javascript::module::Module; 10 | use crate::{util, Error, Result}; 11 | 12 | /// Trait for implementing module resolvers. 13 | pub trait ModuleResolver { 14 | /// Resolve a module spec in an absolute sense. 15 | /// 16 | /// This is meant to resolve specs starting from an entry point, i.e. the 17 | /// `main` key in `package.json`. 18 | fn resolve_absolute<P: AsRef<Path>>(&self, spec: P) -> Result<PathBuf> { 19 | let normalized_spec = util::normalize_path(spec.as_ref()); 20 | let full_path = resolve_path(&normalized_spec, |p| self.exists(p) && !self.is_dir(p)) 21 | .ok_or_else(|| { 22 | Error::Generic(format!("Path not found: {}", normalized_spec.display())) 23 | })?; 24 | 25 | Ok(full_path) 26 | } 27 | 28 | /// Resolve a module spec relative to another spec. 29 | /// 30 | /// This is meant to resolve specs starting from other modules, i.e. in 31 | /// import statements inside the code itself.
32 | fn resolve_relative<P: AsRef<Path>, Q: AsRef<Path>>( 33 | &self, 34 | spec: P, 35 | relative_to_spec: Q, 36 | ) -> Result<PathBuf> { 37 | if !is_relative(&spec) { 38 | return self.resolve_absolute(spec); 39 | } 40 | 41 | let source_spec = self.resolve_absolute(relative_to_spec)?; 42 | let base_spec = if !self.is_dir(&source_spec) { 43 | source_spec.parent().unwrap_or(&source_spec) 44 | } else { 45 | &source_spec 46 | }; 47 | 48 | let absolute_spec = base_spec.join(spec); 49 | 50 | self.resolve_absolute(absolute_spec) 51 | } 52 | 53 | /// Determine a package's entry point module. 54 | fn entry_point(&self) -> &str; 55 | 56 | /// Load (absolute) a given path into a [`Module`]. 57 | fn load<P: AsRef<Path>>(&self, spec: P) -> Result<Module>; 58 | 59 | /// Return all paths for the package. 60 | fn all_paths(&self) -> Box<dyn Iterator<Item = PathBuf> + '_>; 61 | 62 | /// Determine whether the supplied module specifier exists within 63 | /// the current resolver. 64 | fn exists<P: AsRef<Path>>(&self, spec: P) -> bool; 65 | 66 | /// Determine whether the supplied module specifier is a directory 67 | /// within the current resolver. 68 | fn is_dir<P: AsRef<Path>>(&self, spec: P) -> bool; 69 | } 70 | 71 | // Implement a best-effort Node.js-compatible module path determination, 72 | // that takes into account both CommonJS and ESM. 73 | // See https://nodejs.org/api/packages.html#determining-module-system 74 | pub(crate) fn resolve_path<P, F>(base_path: P, exists_predicate: F) -> Option<PathBuf> 75 | where 76 | P: AsRef<Path>, 77 | F: Fn(&Path) -> bool, 78 | { 79 | let mut path = base_path.as_ref().to_path_buf(); 80 | 81 | // Check if path itself exists. 82 | if exists_predicate(&path) { 83 | return Some(path); 84 | } 85 | 86 | // Try path combined with JavaScript suffixes. 87 | for suffix in ["js", "mjs", "cjs"] { 88 | path.set_extension(suffix); 89 | if exists_predicate(&path) { 90 | return Some(path); 91 | } 92 | } 93 | 94 | // Try entrypoints in the path's directory.
95 | for entrypoint in ["index.js", "index.mjs", "index.cjs"] { 96 | path = base_path.as_ref().to_path_buf().join(entrypoint); 97 | if exists_predicate(&path) { 98 | return Some(path); 99 | } 100 | } 101 | 102 | None 103 | } 104 | 105 | // For an import specifier to be recognized as relative in JavaScript, it _has_ 106 | // to start with either "." or "..". This means that we can use this predicate 107 | // to avoid trying to resolve as relative a spec which is bare or absolute. 108 | // 109 | // https://nodejs.org/api/modules.html#all-together 110 | // https://nodejs.org/api/esm.html#import-specifiers 111 | fn is_relative<P: AsRef<Path>>(path: P) -> bool { 112 | let path = path.as_ref(); 113 | path.starts_with(".") || path.starts_with("..") 114 | } 115 | 116 | fn is_valid_js_extension<P: AsRef<Path>>(path: P) -> bool { 117 | path.as_ref().extension().map_or(false, |ext| { 118 | let lowercase_ext = ext.to_string_lossy().to_lowercase(); 119 | ["js", "mjs", "cjs"].contains(&&*lowercase_ext) 120 | }) 121 | } 122 | 123 | #[derive(Deserialize)] 124 | struct PackageJson { 125 | main: Option<PathBuf>, 126 | } 127 | 128 | // Retrieve the entry point from the "main" field in `package.json`.
129 | pub(crate) fn entry_point(package_json: &str) -> Result<PathBuf> { 130 | serde_json::from_str::<PackageJson>(package_json) 131 | .map_err(|e| Error::Generic(format!("package.json deserialization error: {e:?}"))) 132 | .map(|p| p.main.unwrap_or_else(|| PathBuf::from("index.js"))) 133 | .map(|path| util::normalize_path(&path)) 134 | } 135 | -------------------------------------------------------------------------------- /vuln-reach/src/javascript/module/resolver/tgz.rs: -------------------------------------------------------------------------------- 1 | use std::collections::HashMap; 2 | use std::io::Read; 3 | use std::path::{Path, PathBuf}; 4 | 5 | use flate2::read::GzDecoder; 6 | use tar::Archive; 7 | 8 | use super::{entry_point, is_valid_js_extension}; 9 | use crate::javascript::module::resolver::ModuleResolver; 10 | use crate::javascript::module::Module; 11 | use crate::{Error, Result, Tree}; 12 | 13 | pub struct TarballModuleResolver { 14 | files: HashMap<PathBuf, Vec<u8>>, 15 | entry_point: PathBuf, 16 | } 17 | 18 | impl TarballModuleResolver { 19 | pub fn new(bytes: Vec<u8>) -> Result<Self> { 20 | // Immediately extract the file in-memory because random access directly 21 | // from the tarball object is not possible. 22 | let files = Archive::new(GzDecoder::new(&bytes[..])) 23 | .entries()? 24 | .filter_map(|entry| entry.ok()) 25 | .filter_map(|mut entry| { 26 | // Skip the toplevel directory, to go to the module's root.
27 | let mut path = PathBuf::from(entry.path().unwrap()); 28 | path = path.components().skip(1).collect(); 29 | 30 | let mut data = Vec::new(); 31 | entry.read_to_end(&mut data).ok()?; 32 | 33 | Some((path, data)) 34 | }) 35 | .collect::<HashMap<_, _>>(); 36 | 37 | let package_json = std::str::from_utf8(files.get(Path::new("package.json")).unwrap()) 38 | .map_err(|e| Error::Generic(format!("package.json: {}", e)))?; 39 | 40 | let entry_point = entry_point(package_json)?; 41 | 42 | Ok(Self { files, entry_point }) 43 | } 44 | } 45 | 46 | impl ModuleResolver for TarballModuleResolver { 47 | fn entry_point(&self) -> &str { 48 | // Unwrap is always valid because the entry point comes from a 49 | // `package.json` file which should be UTF-8 to begin with. 50 | self.entry_point.to_str().unwrap() 51 | } 52 | 53 | fn load<P: AsRef<Path>>(&self, path: P) -> Result<Module> { 54 | let path = self.resolve_absolute(path)?; 55 | 56 | let source = std::str::from_utf8(self.files.get(&path).unwrap()) 57 | .map_err(|e| Error::Generic(format!("{}: {}", path.display(), e)))? 58 | .to_string(); 59 | let tree = Tree::new(source)?.try_into()?; 60 | 61 | Ok(tree) 62 | } 63 | 64 | fn all_paths(&self) -> Box<dyn Iterator<Item = PathBuf> + '_> { 65 | let paths = self.files.keys().filter(|path| is_valid_js_extension(path)).cloned(); 66 | Box::new(paths) 67 | } 68 | 69 | fn exists<P: AsRef<Path>>(&self, path: P) -> bool { 70 | self.files.contains_key(path.as_ref()) 71 | } 72 | 73 | fn is_dir<P: AsRef<Path>>(&self, _spec: P) -> bool { 74 | // Paths in a tarball are never directories. 75 | false 76 | } 77 | } 78 | -------------------------------------------------------------------------------- /vuln-reach/src/javascript/package/mod.rs: -------------------------------------------------------------------------------- 1 | //! Package-level (i.e. JavaScript files pertaining to a `package.json`) data 2 | //! structures and methods.
3 | pub mod reachability; 4 | pub mod resolver; 5 | 6 | use std::collections::HashMap; 7 | use std::fs; 8 | use std::path::{Path, PathBuf}; 9 | 10 | use reachability::{PackageReachability, VulnerableNode}; 11 | 12 | use super::lang::imports::{CommonJsImports, EsmImports}; 13 | use crate::javascript::lang::imports::Imports; 14 | use crate::javascript::module::{ 15 | FilesystemModuleResolver, MemModuleResolver, Module, ModuleCache, ModuleResolver, 16 | TarballModuleResolver, 17 | }; 18 | use crate::Result; 19 | 20 | /// A JavaScript package. 21 | pub struct Package<R: ModuleResolver> { 22 | cache: ModuleCache, 23 | resolver: R, 24 | } 25 | 26 | impl<R> Package<R> 27 | where 28 | R: ModuleResolver, 29 | { 30 | pub fn resolve_relative<P: AsRef<Path>, Q: AsRef<Path>>( 31 | &self, 32 | spec: P, 33 | base: Q, 34 | ) -> Option<&Module> { 35 | self.resolver 36 | .resolve_relative(spec.as_ref(), base.as_ref()) 37 | .ok() 38 | .and_then(|path| self.cache.module(path)) 39 | } 40 | 41 | pub fn resolve_absolute<P: AsRef<Path>>(&self, spec: P) -> Option<&Module> { 42 | self.resolver.resolve_absolute(spec.as_ref()).ok().and_then(|path| self.cache.module(path)) 43 | } 44 | 45 | pub fn cache(&self) -> &ModuleCache { 46 | &self.cache 47 | } 48 | 49 | pub fn resolver(&self) -> &R { 50 | &self.resolver 51 | } 52 | 53 | pub fn reachability(&self, vulnerable_node: &VulnerableNode) -> Result<PackageReachability> { 54 | PackageReachability::new(self, vulnerable_node) 55 | } 56 | 57 | /// Identify foreign imports for each module. 58 | /// 59 | /// A foreign import is an import from a source that's outside this package. 60 | /// For convenience, we are going to mark all imports that _don't_ resolve 61 | /// inside the package as foreign; true unreachable exports will be just 62 | /// dropped. 63 | pub fn foreign_imports(&self) -> HashMap<&PathBuf, Imports> { 64 | let mut foreign_imports = HashMap::new(); 65 | 66 | // Strategy for detecting foreign imports: discard trivially relative 67 | // paths (i.e.
starting with '.'), then attempt to resolve paths from 68 | // inside the package, marking the import as foreign if that fails. 69 | for (spec, module) in self.cache().modules() { 70 | let import_filter = |import: &str| { 71 | !import.starts_with('.') && self.resolve_relative(import, spec).is_none() 72 | }; 73 | 74 | let import_specs = match module.imports() { 75 | Imports::Esm(imports) => Imports::Esm(EsmImports::new( 76 | imports 77 | .into_iter() 78 | .filter(|import| import_filter(import.source())) 79 | .cloned() 80 | .collect(), 81 | )), 82 | Imports::CommonJs(imports) => Imports::CommonJs(CommonJsImports::new( 83 | imports.into_iter().filter(|&import| import_filter(import.source())).collect(), 84 | )), 85 | Imports::None => continue, 86 | }; 87 | foreign_imports.insert(spec, import_specs); 88 | } 89 | 90 | foreign_imports 91 | } 92 | } 93 | 94 | impl Package<TarballModuleResolver> { 95 | pub fn from_tarball_path<P: AsRef<Path>>(tarball: P) -> Result<Self> { 96 | fs::read(tarball.as_ref()).map_err(|e| e.into()).and_then(Package::from_tarball_bytes) 97 | } 98 | 99 | pub fn from_tarball_bytes(bytes: Vec<u8>) -> Result<Self> { 100 | let resolver = TarballModuleResolver::new(bytes)?; 101 | let cache = ModuleCache::new(&resolver)?; 102 | Ok(Self { resolver, cache }) 103 | } 104 | } 105 | 106 | impl Package<FilesystemModuleResolver> { 107 | pub fn from_path<P: AsRef<Path>>(path: P) -> Result<Self> { 108 | let resolver = FilesystemModuleResolver::new(path.as_ref())?; 109 | let cache = ModuleCache::new(&resolver)?; 110 | Ok(Self { resolver, cache }) 111 | } 112 | } 113 | 114 | impl Package<MemModuleResolver> { 115 | pub fn from_mem<K, V>(backing: HashMap<K, V>) -> Result<Self> 116 | where 117 | K: Into<PathBuf>, 118 | V: Into<String>, 119 | { 120 | let resolver = MemModuleResolver::new(backing); 121 | let cache = ModuleCache::new(&resolver)?; 122 | Ok(Self { resolver, cache }) 123 | } 124 | } 125 | 126 | #[cfg(test)] 127 | mod tests { 128 | use maplit::hashmap; 129 | use textwrap::dedent; 130 | 131 | use super::*; 132 | 133 | fn fixture() -> Package<MemModuleResolver> { 134 | Package::from_mem(hashmap!
{ 135 | "package.json" => r#"{ "main": "index.js" }"#.to_string(), 136 | "index.js" => dedent(r#" 137 | import foo from '@foo/bar' 138 | import bar from 'foobar' 139 | import baz from './baz.js' 140 | import quux from './quux.js' 141 | "#), 142 | "baz.js" => dedent(r#" 143 | export default const c = 1 144 | "#) 145 | }) 146 | .unwrap() 147 | } 148 | 149 | #[test] 150 | fn test_foreign_imports() { 151 | let fixture = fixture(); 152 | let foreign_imports = fixture.foreign_imports(); 153 | 154 | fn vec_of_imports<'a>(imports: &'a Imports) -> Vec<&'a str> { 155 | match imports { 156 | Imports::Esm(imports) => imports.into_iter().map(|v| v.source()).collect(), 157 | _ => unreachable!(), 158 | } 159 | } 160 | 161 | println!("{:#?}", foreign_imports); 162 | assert_eq!( 163 | foreign_imports 164 | .iter() 165 | .map(|(&k, v)| (k.clone(), vec_of_imports(v))) 166 | .collect::<HashMap<_, _>>(), 167 | hashmap! { 168 | PathBuf::from("index.js") => vec!["@foo/bar", "foobar"] 169 | } 170 | ); 171 | } 172 | } 173 | -------------------------------------------------------------------------------- /vuln-reach/src/javascript/package/resolver.rs: -------------------------------------------------------------------------------- 1 | use std::collections::HashMap; 2 | use std::path::{Component, Path}; 3 | 4 | use crate::javascript::module::{Module, ModuleResolver}; 5 | use crate::javascript::package::Package; 6 | 7 | /// A cache of packages. 8 | pub struct PackageResolver<R: ModuleResolver> { 9 | packages: HashMap<String, Package<R>>, 10 | } 11 | 12 | impl<R> PackageResolver<R> 13 | where 14 | R: ModuleResolver, 15 | { 16 | /// Creates a new builder. 17 | pub fn builder() -> PackageResolverBuilder<R> { 18 | Default::default() 19 | } 20 | 21 | /// Returns the specified package, if present. 22 | pub fn resolve_package(&self, package: &str) -> Option<&Package<R>> { 23 | self.packages.get(package) 24 | } 25 | 26 | /// Resolves a package and a module given a full import spec.
27 | /// 28 | /// # Examples 29 | /// 30 | /// ```js 31 | /// import "@foo/bar/baz/quux.js" // yields ("@foo/bar", "baz/quux.js") 32 | /// import "foo/bar/baz.js" // yields ("foo", "bar/baz.js") 33 | /// ``` 34 | pub fn resolve_spec(&self, full_spec: &Path) -> Option<(&Package<R>, &Module)> { 35 | let (pkg_path, spec) = match full_spec.components().next() { 36 | Some(Component::Normal(x)) if x.to_string_lossy().starts_with('@') => { 37 | let mut c = full_spec.components(); 38 | let scope = c.next()?; 39 | let package = c.next()?; 40 | let spec = c.as_path(); 41 | 42 | let scope_path: &Path = scope.as_ref(); 43 | let package_path: &Path = package.as_ref(); 44 | 45 | Some((scope_path.to_path_buf().join(package_path), spec)) 46 | }, 47 | Some(Component::Normal(_)) => { 48 | let mut c = full_spec.components(); 49 | let package = c.next()?; 50 | let spec = c.as_path(); 51 | 52 | let package_path: &Path = package.as_ref(); 53 | 54 | Some((package_path.to_path_buf(), spec)) 55 | }, 56 | _ => None, 57 | }?; 58 | 59 | let package = self.resolve_package(&pkg_path.to_string_lossy())?; 60 | let module = package.resolve_absolute(spec)?; 61 | 62 | Some((package, module)) 63 | } 64 | 65 | pub fn package_specs(&self) -> impl Iterator<Item = &String> { 66 | self.packages.keys() 67 | } 68 | } 69 | 70 | /// Builder for [`PackageResolver`]. 71 | pub struct PackageResolverBuilder<R: ModuleResolver> { 72 | packages: HashMap<String, Package<R>>, 73 | } 74 | 75 | impl<R> Default for PackageResolverBuilder<R> 76 | where 77 | R: ModuleResolver, 78 | { 79 | fn default() -> Self { 80 | Self { packages: Default::default() } 81 | } 82 | } 83 | 84 | impl<R> PackageResolverBuilder<R> 85 | where 86 | R: ModuleResolver, 87 | { 88 | /// Inserts the specified package in the resolver. 89 | pub fn with_package<S: Into<String>>(mut self, k: S, v: Package<R>) -> Self { 90 | self.packages.insert(k.into(), v); 91 | self 92 | } 93 | 94 | /// Builds the value.
95 | pub fn build(self) -> PackageResolver<R> { 96 | PackageResolver { packages: self.packages } 97 | } 98 | } 99 | -------------------------------------------------------------------------------- /vuln-reach/src/javascript/project/mod.rs: -------------------------------------------------------------------------------- 1 | //! Project-wide (i.e. tree-of-packages) reachability. 2 | 3 | use std::collections::{HashMap, HashSet, VecDeque}; 4 | use std::hash::Hash; 5 | 6 | use serde::{Deserialize, Serialize}; 7 | use tree_sitter::Node; 8 | 9 | use super::lang::imports::Imports; 10 | use super::package::reachability::{PackageReachability, VulnerableNode}; 11 | use super::package::Package; 12 | use crate::javascript::module::resolver::ModuleResolver; 13 | use crate::javascript::package::reachability::NodePath; 14 | use crate::javascript::package::resolver::PackageResolver; 15 | use crate::{Error, Result}; 16 | 17 | /// Specifies the package and module that direct towards a vulnerability. 18 | #[derive(Serialize, Deserialize, Clone, Debug, PartialEq, Eq)] 19 | pub struct VulnerableEdge { 20 | package: String, 21 | module: String, 22 | } 23 | 24 | impl VulnerableEdge { 25 | pub fn new<S1: Into<String>, S2: Into<String>>(package: S1, module: S2) -> Self { 26 | Self { package: package.into(), module: module.into() } 27 | } 28 | } 29 | 30 | /// An instance of in-package reachability that leads either to 31 | /// the vulnerability itself, or to an intermediate package. 32 | #[derive(Serialize, Deserialize, Debug, PartialEq, Eq)] 33 | pub enum ReachabilityEdge { 34 | /// The vulnerability itself is reachable. 35 | Own(PackageReachability), 36 | /// The vulnerability is reachable through another package in the dependency 37 | /// tree.
38 | Foreign(PackageReachability, VulnerableEdge), 39 | } 40 | 41 | impl ReachabilityEdge { 42 | fn reachability(&self) -> &PackageReachability { 43 | match self { 44 | ReachabilityEdge::Own(p) => p, 45 | ReachabilityEdge::Foreign(p, _) => p, 46 | } 47 | } 48 | } 49 | 50 | /// A single path between a node in a package and a vulnerable node. 51 | /// 52 | /// This is a hierarchical ordered representation, of the following form: 53 | /// 54 | /// ```json 55 | /// { 56 | /// "dependent_package": { 57 | /// "module1.js": ["node1", "node2", ...] 58 | /// }, 59 | /// ..., 60 | /// "vulnerable_package": { 61 | /// "module2.js": ["node3", "node4", ... "vulnerable_node"] 62 | /// } 63 | /// } 64 | /// ``` 65 | pub type ReachabilityPath = Vec<(String, Vec<(String, NodePath)>)>; 66 | 67 | /// Reachability of a given node inside of a project. 68 | /// 69 | /// Maps a package name to the graph of accesses (represented as adjacency 70 | /// lists) that lead to the vulnerable node, starting from that package. 71 | #[derive(Serialize, Deserialize, Default, Debug, PartialEq, Eq)] 72 | pub struct ProjectReachability(HashMap<String, Vec<ReachabilityEdge>>); 73 | 74 | impl ProjectReachability { 75 | pub fn new(adjacency_lists: HashMap<String, Vec<ReachabilityEdge>>) -> Self { 76 | Self(adjacency_lists) 77 | } 78 | 79 | /// Return the graph edges leaving from the specified package, if they 80 | /// exist. 81 | pub fn edges_from(&self, package: &str) -> Option<&Vec<ReachabilityEdge>> { 82 | self.0.get(package) 83 | } 84 | 85 | /// Find one possible path from the specified package to the vulnerable 86 | /// node. 87 | pub fn find_path<S: AsRef<str>>(&self, start_package: S) -> Option<ReachabilityPath> { 88 | struct EvaluationStep<'a> { 89 | src_package: &'a str, 90 | src_module: &'a str, 91 | step_path: Vec<(&'a str, Vec<(String, NodePath)>)>, 92 | } 93 | let mut bfs_q = VecDeque::new(); 94 | let mut visited = HashSet::new(); 95 | 96 | // Initialize the queue with steps for all modules in the top level package.
97 | // 98 | // This maps to the client having N files in their project, and starting a 99 | // search for a path from each of those N files. 100 | for edge in self.0.get(start_package.as_ref())? { 101 | for module_spec in edge.reachability().modules() { 102 | bfs_q.push_back(EvaluationStep { 103 | src_package: start_package.as_ref(), 104 | src_module: module_spec.as_ref(), 105 | step_path: Vec::new(), 106 | }); 107 | } 108 | } 109 | 110 | while let Some(EvaluationStep { src_package, src_module, step_path }) = bfs_q.pop_front() { 111 | // Skip already visited packages (would be a circular dependency). 112 | if visited.contains(src_package) { 113 | continue; 114 | } 115 | visited.insert(src_package); 116 | 117 | // Stop searching in this branch if it would lead to a non-existing package. 118 | // In practice, this should only happen in the top-level package if it is 119 | // misnamed, as `self` should contain only valid edges. 120 | let Some(edges) = self.0.get(src_package) else { continue }; 121 | 122 | for edge in edges { 123 | // Each edge maintains its own copy of the step path. 124 | let mut step_path = step_path.clone(); 125 | 126 | // If a subpath forward is found, add it to the path. Otherwise, 127 | // skip processing this edge. 128 | if let Some(path) = edge.reachability().find_path(src_module) { 129 | step_path.push((src_package, path)); 130 | } else { 131 | continue; 132 | } 133 | 134 | match edge { 135 | ReachabilityEdge::Own(_) => { 136 | // This is the last branch. Return the found path. 137 | return Some( 138 | step_path.into_iter().map(|(k, v)| (k.to_string(), v)).collect(), 139 | ); 140 | }, 141 | ReachabilityEdge::Foreign(_, symbol) => { 142 | // This is an intermediate branch. Add a step towards the next symbol. 
143 | bfs_q.push_back(EvaluationStep { 144 | src_package: &symbol.package, 145 | src_module: &symbol.module, 146 | step_path: step_path.clone(), 147 | }); 148 | }, 149 | } 150 | } 151 | } 152 | 153 | None 154 | } 155 | } 156 | 157 | pub struct Project<R: ModuleResolver> { 158 | package_resolver: PackageResolver<R>, 159 | } 160 | 161 | impl<R: ModuleResolver> Project<R> { 162 | pub fn new(package_resolver: PackageResolver<R>) -> Self { 163 | Self { package_resolver } 164 | } 165 | 166 | pub fn reachability(&self, vuln_node: &VulnerableNode) -> Result<ProjectReachability> { 167 | self.reachability_inner(vuln_node, Default::default()) 168 | } 169 | 170 | pub fn reachability_extend( 171 | &self, 172 | project_reachability: ProjectReachability, 173 | vuln_node: &VulnerableNode, 174 | ) -> Result<ProjectReachability> { 175 | self.reachability_inner(vuln_node, project_reachability) 176 | } 177 | 178 | pub fn all_packages(&self) -> Vec<(&str, &Package<R>)> { 179 | self.package_resolver 180 | .package_specs() 181 | .filter_map(|package_spec| { 182 | self.package_resolver 183 | .resolve_package(package_spec) 184 | .map(|package| (package_spec.as_str(), package)) 185 | }) 186 | .collect() 187 | } 188 | 189 | fn reachability_inner<'a>( 190 | &'a self, 191 | vuln_node: &'a VulnerableNode, 192 | package_reachabilities: ProjectReachability, 193 | ) -> Result<ProjectReachability> { 194 | let ProjectReachability(mut package_reachabilities) = package_reachabilities; 195 | 196 | // Using foreign imports for each package, build a list of edges (a, b) 197 | // where a depends on b.
198 | let mut edges = HashSet::<(&str, &str)>::new(); 199 | 200 | let all_foreign_imports: HashMap<_, _> = self 201 | .all_packages() 202 | .into_iter() 203 | .map(|(package_spec, package)| (package_spec, (package, package.foreign_imports()))) 204 | .collect(); 205 | 206 | for (package_spec, (_package, foreign_imports)) in &all_foreign_imports { 207 | for dependencies in foreign_imports.values() { 208 | let insert_edge = |(dependency, _)| { 209 | edges.insert((package_spec, dependency)); 210 | }; 211 | 212 | match &dependencies { 213 | Imports::Esm(imports) => imports 214 | .into_iter() 215 | .map(|import| import.source()) 216 | .map(package_spec_split) 217 | .for_each(insert_edge), 218 | Imports::CommonJs(imports) => imports 219 | .into_iter() 220 | .map(|import| import.source()) 221 | .map(package_spec_split) 222 | .for_each(insert_edge), 223 | Imports::None => continue, 224 | }; 225 | } 226 | } 227 | 228 | let edges = edges.into_iter().collect::<Vec<_>>(); 229 | 230 | // Find the topological sorting of this graph so that we can find the 231 | // reachability for the leaves first and for all the dependents afterwards. 232 | let topo_ordering = topological_sort(&edges)?; 233 | 234 | // For each dependent A on dependency B, compute reachability between A's 235 | // imports and B's exports. 236 | // 237 | // Let's say that the first package with a vulnerability comes Kth in the list. 238 | // - Packages 1 to K-1 will have empty reachability data by construction; they 239 | // do not depend on (any of) the vulnerable package(s). 240 | // - Package K will have the reachability information starting from the given 241 | // vulnerable node(s). 242 | // - Packages K+1 to N will be able to construct their reachability information 243 | // starting from the information in package K's reachability.
244 | 245 | for package_spec in topo_ordering.into_iter().rev() { 246 | if package_reachabilities.contains_key(package_spec) { 247 | continue; 248 | } 249 | 250 | let mut target_nodes: Vec<(VulnerableNode, Option<VulnerableEdge>)> = Vec::new(); 251 | 252 | // If the vulnerable spec pertains to the currently processed package, 253 | // add the node to the list of target nodes. 254 | if package_spec == vuln_node.package() { 255 | target_nodes.push((vuln_node.clone(), None)); 256 | } 257 | 258 | // A package in topo_ordering must always be present in the resolver 259 | // and it should also have an entry in all_foreign_imports. Skip 260 | // (but report) missing specs as they might be system packages 261 | // (e.g. `fs`, `events`). 262 | let Some((package, foreign_imports)) = all_foreign_imports.get(package_spec) else { 263 | eprintln!("Package spec not found in project: {package_spec}"); 264 | continue; 265 | }; 266 | 267 | // Identify target nodes coming from foreign imports. This will not 268 | // run for the leaves, i.e. packages with no dependencies. 269 | for (module_spec, specs) in foreign_imports { 270 | // Link foreign imports to the symbol exported from the foreign package. 271 | let mut link_exports = |name: Option<&str>, source: &str, node: Node<'_>| { 272 | // Extract package name, module name and reachability struct for the 273 | // imported foreign package.
274 | let (dep_package, dep_module) = package_spec_split(source); 275 | let dep_module = dep_module 276 | .or_else(|| { 277 | let package = self.package_resolver.resolve_package(dep_package)?; 278 | Some(package.resolver().entry_point()) 279 | }) 280 | .unwrap_or("index.js"); 281 | let Some(dep_reachabilities) = package_reachabilities.get(dep_package) else { 282 | return; 283 | }; 284 | 285 | let reachable_exports = dep_reachabilities 286 | .iter() 287 | .map(|edge| edge.reachability().reachable_exports()) 288 | .fold(HashSet::new(), |mut set, exports| { 289 | set.extend(exports); 290 | set 291 | }); 292 | 293 | let reachability_check = match name { 294 | // The hardcoded "" check is for CommonJs modules, and it represents 295 | // exports that are a function. Unconditionally, any access to exports at 296 | // all that are in this form is to be considered 297 | // reaching. 298 | // 299 | // This part in CommonJs is currently shortcircuited by the `None` branch. 300 | Some(name) => { 301 | reachable_exports.contains(&(dep_module, name)) 302 | || reachable_exports.contains(&(dep_module, "")) 303 | }, 304 | // On this branch, all exports match, regardless of their symbol. 305 | // This is an over-coloring that is only useful with CommonJs until 306 | // a better way of evaluating objects is found. 307 | None => reachable_exports.iter().any(|&(module, _)| module == dep_module), 308 | }; 309 | 310 | // If there is a reachability match between imports and 311 | // foreign exports, insert the current import node in the 312 | // target nodes, and attach information about the imported 313 | // symbol. 
314 | if reachability_check { 315 | let start_pos = node.start_position(); 316 | let end_pos = node.end_position(); 317 | target_nodes.push(( 318 | VulnerableNode::new( 319 | package_spec, 320 | module_spec.to_string_lossy(), 321 | start_pos.row, 322 | start_pos.column, 323 | end_pos.row, 324 | end_pos.column, 325 | ), 326 | Some(VulnerableEdge::new(dep_package, dep_module)), 327 | )); 328 | } 329 | }; 330 | 331 | match &specs { 332 | Imports::Esm(imports) => { 333 | for import in imports { 334 | link_exports(Some(import.name()), import.source(), import.node()); 335 | } 336 | }, 337 | Imports::CommonJs(imports) => { 338 | let module = package.cache().module(module_spec).unwrap(); 339 | let tree = module.tree(); 340 | 341 | for import in imports { 342 | let access_node = import.access_node(tree); 343 | link_exports(None, import.source(), access_node); 344 | } 345 | }, 346 | Imports::None => {}, 347 | } 348 | } 349 | 350 | // Compute the reachability from each target node to the visible 351 | // exports/side effects in a package. 352 | for (reachability, foreign) in target_nodes 353 | .into_iter() 354 | .map(|(target_node, foreign)| (package.reachability(&target_node), foreign)) 355 | { 356 | match reachability { 357 | Ok(reachability) => { 358 | if !reachability.is_empty() { 359 | let reachability_edge = match foreign { 360 | Some(foreign) => ReachabilityEdge::Foreign(reachability, foreign), 361 | None => ReachabilityEdge::Own(reachability), 362 | }; 363 | 364 | package_reachabilities 365 | .entry(package_spec.to_string()) 366 | .or_default() 367 | .push(reachability_edge); 368 | } 369 | }, 370 | Err(e) => eprintln!("Reachability failed: {e:?}"), 371 | } 372 | } 373 | } 374 | 375 | // Pick out reachability for the topmost package. By construction, it should 376 | // always be the last one in the topological ordering. We should build tests 377 | // to ensure this is the case. 
378 | Ok(ProjectReachability::new(package_reachabilities)) 379 | } 380 | } 381 | 382 | // Utilities 383 | // 384 | 385 | // Split a package spec into package name and module spec. 386 | // 387 | // # Example 388 | // 389 | // ```rust 390 | // assert_eq!(package_spec_split("@foo/bar"), ("@foo/bar", None)); 391 | // assert_eq!(package_spec_split("@foo/bar/baz.js"), ("@foo/bar", Some("baz.js"))); 392 | // assert_eq!(package_spec_split("foo"), ("foo", None)); 393 | // assert_eq!(package_spec_split("foo/bar.js"), ("foo", Some("bar.js"))); 394 | // ``` 395 | fn package_spec_split(s: &str) -> (&str, Option<&str>) { 396 | let idx = if s.starts_with('@') { 1 } else { 0 }; 397 | 398 | s.match_indices('/') 399 | .nth(idx) 400 | .map(|(index, _)| s.split_at(index)) 401 | .map(|(package, module)| (package, Some(&module[1..]))) 402 | .unwrap_or_else(|| (s, None)) 403 | } 404 | 405 | // Implementation of https://en.wikipedia.org/wiki/Topological_sorting#Kahn's_algorithm 406 | fn topological_sort<N: Copy + Eq + std::hash::Hash>(edges: &[(N, N)]) -> Result<Vec<N>> { 407 | let mut nodes = { 408 | // Build a set of all vertices. 409 | let all_nodes = edges 410 | .iter() 411 | .copied() 412 | .flat_map(|(dependent, dependency)| [dependent, dependency]) 413 | .collect::<HashSet<_>>(); 414 | 415 | // Build a set of vertices with at least one incoming edge. 416 | let nodes_with_inc_edges = 417 | edges.iter().copied().map(|(_, dependency)| dependency).collect::<HashSet<_>>(); 418 | 419 | // Build the set of vertices with no incoming edge by difference between the two 420 | // above. This is also going to be the working set of the algorithm. 421 | all_nodes.difference(&nodes_with_inc_edges).copied().collect::<VecDeque<_>>() 422 | }; 423 | 424 | // Build reversed adjacency lists to gradually remove edges from.
425 | let mut adj_lists: HashMap<N, HashSet<N>> = 426 | edges.iter().copied().fold(HashMap::new(), |mut adj_lists, (dependent, dependency)| { 427 | adj_lists.entry(dependency).or_default().insert(dependent); 428 | adj_lists 429 | }); 430 | 431 | let mut ordering = Vec::new(); 432 | 433 | while !nodes.is_empty() { 434 | let n = nodes.pop_front().unwrap(); 435 | ordering.push(n); 436 | 437 | for (dependency, dependents) in &mut adj_lists { 438 | if dependents.remove(&n) && dependents.is_empty() { 439 | nodes.push_back(*dependency); 440 | } 441 | } 442 | } 443 | 444 | // The input graph should be a DAG. Any edge still left at this point was never 445 | // processed, which means the graph contains a cycle. 446 | if adj_lists.iter().any(|(_, s)| !s.is_empty()) { 447 | return Err(Error::Generic( 448 | "Could not find topological ordering: cycles detected in the dependency tree".into(), 449 | )); 450 | } 451 | 452 | Ok(ordering) 453 | } 454 | 455 | #[cfg(test)] 456 | mod tests { 457 | use maplit::hashmap; 458 | use textwrap::dedent; 459 | 460 | use super::*; 461 | use crate::javascript::module::MemModuleResolver; 462 | use crate::javascript::package::Package; 463 | 464 | macro_rules! mem_fixture { 465 | ($($module:expr => $src:expr,)*) => { 466 | Package::from_mem(hashmap!
{ 467 | $($module => dedent($src)),* 468 | }).unwrap() 469 | } 470 | } 471 | 472 | fn project_trivial_esm() -> Project { 473 | let package_resolver = PackageResolver::builder() 474 | .with_package( 475 | "dependency", 476 | mem_fixture!( 477 | "package.json" => r#"{ "main": "index.js" }"#, 478 | "index.js" => r#" 479 | import { vuln } from './vuln.js' 480 | 481 | export function vuln2() { vuln() } 482 | "#, 483 | "vuln.js" => r#" 484 | export function vuln() { const foo = 2 } 485 | "#, 486 | ), 487 | ) 488 | .with_package( 489 | "dependent", 490 | mem_fixture!( 491 | "package.json" => r#"{ "main": "index.js" }"#, 492 | "index.js" => r#" 493 | import { vuln2 } from 'dependency' 494 | 495 | export function dependent_vuln() { vuln2() } 496 | "#, 497 | ), 498 | ) 499 | .build(); 500 | 501 | Project { package_resolver } 502 | } 503 | 504 | fn project_trivial_cjs() -> Project { 505 | let package_resolver = PackageResolver::builder() 506 | .with_package( 507 | "dependency", 508 | mem_fixture!( 509 | "package.json" => r#"{ "main": "index.js" }"#, 510 | "index.js" => r#" 511 | const vuln = require('./vuln.js') 512 | 513 | module.exports.vuln2 = function() { vuln.vuln() } 514 | "#, 515 | "vuln.js" => r#" 516 | module.exports.vuln = function() { const foo = 2 } 517 | "#, 518 | ), 519 | ) 520 | .with_package( 521 | "dependent", 522 | mem_fixture!( 523 | "package.json" => r#"{ "main": "index.js" }"#, 524 | "index.js" => r#" 525 | const dependency = require('dependency') 526 | 527 | module.exports.vuln3 = function() { dependency.vuln2() } 528 | "#, 529 | ), 530 | ) 531 | .build(); 532 | 533 | Project { package_resolver } 534 | } 535 | 536 | #[test] 537 | fn test_topo_sort() { 538 | let edges = 539 | &[(5, 11), (7, 11), (7, 8), (3, 8), (3, 10), (11, 2), (11, 9), (11, 10), (8, 9)]; 540 | 541 | let t = topological_sort(edges).unwrap(); 542 | 543 | let is_before = 544 | |a, b| t.iter().position(|i| i == a).unwrap() < t.iter().position(|i| i == b).unwrap(); 545 | 546 | // Every 
dependent must come before all of its dependencies. 547 | for (a, b) in edges { 548 | assert!(is_before(a, b), "{a} -> {b}"); 549 | } 550 | } 551 | 552 | #[test] 553 | fn test_package_spec_split() { 554 | assert_eq!(package_spec_split("@foo/bar"), ("@foo/bar", None)); 555 | assert_eq!(package_spec_split("@foo/bar/baz.js"), ("@foo/bar", Some("baz.js"))); 556 | assert_eq!(package_spec_split("foo"), ("foo", None)); 557 | assert_eq!(package_spec_split("foo/bar.js"), ("foo", Some("bar.js"))); 558 | } 559 | 560 | #[test] 561 | fn test_trivial_reachability_esm() { 562 | let project = project_trivial_esm(); 563 | 564 | let r = project 565 | .reachability_inner( 566 | &VulnerableNode::new("dependency", "vuln.js", 1, 31, 1, 34), 567 | Default::default(), 568 | ) 569 | .unwrap(); 570 | let path = r.find_path("dependent").unwrap(); 571 | println!("{:#?}", r); 572 | print_path(path); 573 | } 574 | 575 | #[test] 576 | fn test_trivial_reachability_cjs() { 577 | let project = project_trivial_cjs(); 578 | 579 | let r = project 580 | .reachability_inner( 581 | &VulnerableNode::new("dependency", "vuln.js", 1, 41, 1, 44), 582 | Default::default(), 583 | ) 584 | .unwrap(); 585 | let path = r.find_path("dependent").unwrap(); 586 | println!("{:#?}", r); 587 | print_path(path); 588 | } 589 | 590 | fn print_path(package_path: Vec<(String, Vec<(String, NodePath)>)>) { 591 | for (package, module_path) in package_path { 592 | println!("\x1b[34;1m{package}\x1b[0m:"); 593 | for (module, node_path) in module_path { 594 | println!(" \x1b[36;1m{module}\x1b[0m:"); 595 | for node_step in node_path { 596 | let (r, c) = node_step.start(); 597 | println!(" {:>4}:{:<5} {}", r, c, node_step.symbol(),); 598 | } 599 | } 600 | } 601 | } 602 | } 603 | -------------------------------------------------------------------------------- /vuln-reach/src/javascript/queries/commonjs-exports.lsp: -------------------------------------------------------------------------------- 1 | ; The order of the patterns must be 
preserved, or maintained alongside the 2 | ; users of this query. Unfortunately there is no efficient way of identifying 3 | ; a query pattern except its index. 4 | 5 | ; Pattern #0 6 | ; module.exports = identifier produces an unnamed identifier 7 | ( 8 | (assignment_expression 9 | left: (member_expression 10 | object: (identifier) @cjs-module 11 | property: (property_identifier) @cjs-exports) 12 | right: (identifier) @target-ident) 13 | (#eq? @cjs-module "module") 14 | (#eq? @cjs-exports "exports") 15 | ) 16 | 17 | ; Pattern #1 18 | ; module.exports = function () {} produces an unnamed scope 19 | ( 20 | (assignment_expression 21 | left: (member_expression 22 | object: (identifier) @cjs-module 23 | property: (property_identifier) @cjs-exports) 24 | right: (_ body: (_) @target-scope)) 25 | (#eq? @cjs-module "module") 26 | (#eq? @cjs-exports "exports") 27 | ) 28 | 29 | ; Pattern #2 30 | ; module.exports = { a, b, c } produces [a, b, c] 31 | ( 32 | (assignment_expression 33 | left: (member_expression 34 | object: (identifier) @cjs-module 35 | property: (property_identifier) @cjs-exports) 36 | right: (object) @target-object) 37 | (#eq? @cjs-module "module") 38 | (#eq? @cjs-exports "exports") 39 | ) 40 | 41 | ; Pattern #3 42 | ; module.exports.foo = ... produces [foo] 43 | ( 44 | (assignment_expression 45 | left: (member_expression 46 | object: (member_expression 47 | object: (identifier) @cjs-module 48 | property: (property_identifier) @cjs-exports) 49 | property: (property_identifier) @target-name) 50 | right: (_) @target-object) 51 | (#eq? @cjs-module "module") 52 | (#eq? @cjs-exports "exports") 53 | ) 54 | 55 | ; Pattern #4 56 | ; exports.foo = ... produces [foo] (rare, but happens) 57 | ( 58 | (assignment_expression 59 | left: (member_expression 60 | object: (identifier) @cjs-exports 61 | property: (property_identifier) @target-name) 62 | right: (_) @target-object) 63 | (#eq? 
@cjs-exports "exports") 64 | ) 65 | -------------------------------------------------------------------------------- /vuln-reach/src/javascript/queries/commonjs-imports.lsp: -------------------------------------------------------------------------------- 1 | ( 2 | (call_expression 3 | function: (identifier) @id-require 4 | arguments: (arguments (string (string_fragment) @import-source))) 5 | (#eq? @id-require "require") 6 | ) 7 | -------------------------------------------------------------------------------- /vuln-reach/src/javascript/queries/esm-exports.lsp: -------------------------------------------------------------------------------- 1 | ; The order of the patterns must be preserved, or maintained alongside the 2 | ; users of this query. Unfortunately there is no efficient way of identifying 3 | ; a query pattern except its index. 4 | 5 | ; Pattern #0 6 | ; 7 | ; export let name = 1 8 | ; export const name = 1 9 | (export_statement 10 | !source 11 | declaration: [ 12 | (lexical_declaration (variable_declarator name: (identifier) @export-decl)) 13 | (variable_declaration (variable_declarator name: (identifier) @export-decl)) 14 | ]) 15 | 16 | ; Pattern #1 17 | ; 18 | ; export function name() {} 19 | ; export function* name() {} 20 | ; export class ClassName {} 21 | (export_statement 22 | !source 23 | declaration: [ 24 | (_ 25 | name: (identifier) @export-decl 26 | body: (statement_block) @export-scope) 27 | (class_declaration 28 | name: (identifier) @export-decl 29 | body: (class_body) @export-scope) 30 | ]) 31 | 32 | ; Pattern #2 33 | ; export_specifier has a `name` field and, optionally, an `alias` field. 
34 | ; 35 | ; export { foo, bar, baz } 36 | (export_statement 37 | !source 38 | (export_clause (export_specifier) @export-list-spec)) 39 | 40 | ; Pattern #3 41 | ; 42 | ; export default function name() {} 43 | ; export default function* name() {} 44 | ; export default class ClassName {} 45 | ; export default function () {} 46 | ; export default function* () {} 47 | ; export default class {} 48 | (export_statement 49 | !source 50 | value: [ 51 | (function body: (statement_block) @export-scope) 52 | (generator_function body: (statement_block) @export-scope) 53 | (class body: (class_body) @export-scope) 54 | ]) 55 | 56 | ; Pattern #4 57 | ; 58 | ; export default identifier 59 | (export_statement 60 | !source 61 | value: (identifier) @export-name) 62 | 63 | ; Pattern #5 64 | ; 65 | ; export const { foo, bar } = baz 66 | ; export let { foo, bar } = baz 67 | ; export var { foo, bar } = baz 68 | ; 69 | ; Object pattern is a special case because export names are linked to the 70 | ; pattern's RHS rather than getting picked up straight from the scope in which 71 | ; they are defined. 72 | (export_statement 73 | !source 74 | declaration: (_ 75 | (variable_declarator 76 | name: (object_pattern [ 77 | (shorthand_property_identifier_pattern) @export-pattern 78 | (pair_pattern key: (_) @export-pattern)]) 79 | value: (_) @export-source))) 80 | 81 | ; Pattern #6 82 | ; 83 | ; export const [foo, bar] = baz 84 | ; export let [foo, bar] = baz 85 | ; export var [foo, bar] = baz 86 | ; 87 | ; Same as above 88 | (export_statement 89 | !source 90 | declaration: (_ 91 | (variable_declarator 92 | name: (array_pattern (_) @export-pattern) 93 | value: (_) @export-source))) 94 | 95 | ; Pattern #7 96 | ; 97 | ; export default { a: 1, b: 2 } 98 | ; 99 | ; Same as above 100 | (export_statement 101 | !source 102 | value: (_) @export-value) 103 | 104 | ; Pattern #8 105 | ; 106 | ; export * from 'source' 107 | ; export { a as b, c } from 'source' 108 | (export_statement 109 | (export_clause)? 
@export_clause 110 | source: (string (string_fragment) @export-source)) 111 | -------------------------------------------------------------------------------- /vuln-reach/src/javascript/queries/esm-imports.lsp: -------------------------------------------------------------------------------- 1 | ; The order of the patterns must be preserved, or maintained alongside the 2 | ; users of this query. Unfortunately there is no efficient way of identifying 3 | ; a query pattern except its index. 4 | 5 | ; Pattern #0 6 | ; 7 | ; import name from "module" 8 | (import_statement 9 | (import_clause (identifier) @import-default) 10 | source: (string (string_fragment) @import-source)) 11 | 12 | ; Pattern #1 13 | ; 14 | ; import * as name from "module" 15 | (import_statement 16 | (import_clause (namespace_import (identifier) @import-star)) 17 | source: (string (string_fragment) @import-source)) 18 | 19 | ; Pattern #2 20 | ; 21 | ; import { foo } from "module" 22 | ; import { foo as bar } from "module" 23 | (import_statement 24 | (import_clause (named_imports (import_specifier) @import-spec)) 25 | source: (string (string_fragment) @import-source)) 26 | -------------------------------------------------------------------------------- /vuln-reach/src/lib.rs: -------------------------------------------------------------------------------- 1 | use std::collections::HashMap; 2 | use std::io; 3 | use std::ops::{Deref, DerefMut}; 4 | 5 | use lazy_static::lazy_static; 6 | use thiserror::Error; 7 | use tree_sitter::{Language, LanguageError, Node, Parser, Query, QueryError, Tree as TsTree}; 8 | 9 | pub mod javascript; 10 | pub mod util; 11 | 12 | pub use tree_sitter; 13 | 14 | extern "C" { 15 | fn tree_sitter_javascript() -> Language; 16 | } 17 | 18 | lazy_static! 
{ 19 | pub static ref JS: Language = unsafe { tree_sitter_javascript() }; 20 | } 21 | 22 | pub fn js_parser() -> Parser { 23 | let mut p = Parser::new(); 24 | p.set_language(*JS).unwrap(); 25 | p 26 | } 27 | 28 | #[derive(Error, Debug)] 29 | pub enum Error { 30 | #[error("language error: {0}")] 31 | LanguageError(#[from] LanguageError), 32 | #[error("query error: {0}")] 33 | QueryError(#[from] QueryError), 34 | #[error("i/o error: {0}")] 35 | IoError(#[from] io::Error), 36 | #[error("{0}")] 37 | Generic(String), 38 | #[error("tree contains parse errors")] 39 | ParseError, 40 | #[error("node does not exist in tree")] 41 | InvalidNode, 42 | } 43 | 44 | impl From<String> for Error { 45 | fn from(value: String) -> Self { 46 | Error::Generic(value) 47 | } 48 | } 49 | 50 | pub type Result<T> = std::result::Result<T, Error>; 51 | 52 | #[derive(Debug)] 53 | pub struct Tree { 54 | buf: String, 55 | lang: Language, 56 | tree: TsTree, 57 | } 58 | 59 | impl Tree { 60 | pub fn new(buf: String) -> Result<Self> { 61 | let mut parser = js_parser(); 62 | parser.set_language(*JS)?; 63 | 64 | Self::with_parser(&mut parser, buf) 65 | } 66 | 67 | pub fn with_parser(parser: &mut Parser, buf: String) -> Result<Self> { 68 | let lang = parser 69 | .language() 70 | .ok_or_else(|| Error::Generic("Language for parser not specified".to_string()))?; 71 | let tree = parser 72 | .parse(&buf, None) 73 | .ok_or_else(|| Error::Generic("Could not parse source".to_string()))?; 74 | 75 | Ok(Tree { buf, lang, tree }) 76 | } 77 | 78 | pub fn buf(&self) -> &str { 79 | &self.buf 80 | } 81 | 82 | pub fn query(&self, text: &str) -> Result<Query> { 83 | Ok(Query::new(self.lang, text)?)
84 | } 85 | 86 | pub fn repr_of(&self, node: Node) -> &str { 87 | &self.buf[node.start_byte()..node.end_byte()] 88 | } 89 | } 90 | 91 | impl Deref for Tree { 92 | type Target = TsTree; 93 | 94 | fn deref(&self) -> &Self::Target { 95 | &self.tree 96 | } 97 | } 98 | 99 | impl DerefMut for Tree { 100 | fn deref_mut(&mut self) -> &mut Self::Target { 101 | &mut self.tree 102 | } 103 | } 104 | 105 | /// Cache for fast duplication of previously used cursors. 106 | pub struct TreeCursorCache<'a> { 107 | cursors: HashMap<Node<'a>, Cursor<'a>>, 108 | tree: &'a Tree, 109 | } 110 | 111 | impl<'a> TreeCursorCache<'a> { 112 | fn new(tree: &'a Tree) -> Self { 113 | Self { tree, cursors: HashMap::new() } 114 | } 115 | 116 | fn cursor(&mut self, node: Node<'a>) -> Result<Cursor<'a>> { 117 | match self.cursors.get(&node) { 118 | Some(cursor) => Ok(cursor.clone()), 119 | None => { 120 | let cursor = Cursor::new(self.tree, node)?; 121 | self.cursors.insert(node, cursor.clone()); 122 | Ok(cursor) 123 | }, 124 | } 125 | } 126 | } 127 | 128 | /// Cursor for upwards traversal of a [`tree_sitter::Tree`]. 129 | #[derive(Clone)] 130 | pub struct Cursor<'a> { 131 | nodes: Vec<Node<'a>>, 132 | } 133 | 134 | impl<'a> Cursor<'a> { 135 | /// Construct a new cursor and move it to the `node`. 136 | pub fn new(tree: &'a Tree, node: Node<'a>) -> Result<Self> { 137 | let mut nodes = Vec::new(); 138 | 139 | // Start cursor at the root, so we know all parents. 140 | let mut cursor = tree.root_node().walk(); 141 | nodes.push(cursor.node()); 142 | 143 | // Iterate through children until we've found the desired node. 144 | let node_end = node.end_byte(); 145 | while node != nodes[nodes.len() - 1] { 146 | let child_offset = cursor.goto_first_child_for_byte(node_end); 147 | 148 | // Child does not exist in the tree. 149 | if child_offset.is_none() { 150 | return Err(Error::InvalidNode); 151 | } 152 | 153 | nodes.push(cursor.node()); 154 | } 155 | 156 | Ok(Self { nodes }) 157 | } 158 | 159 | /// Move the cursor to the parent node.
160 | pub fn goto_parent(&mut self) -> Option<Node<'a>> { 161 | (self.nodes.len() > 1).then(|| { 162 | self.nodes.pop(); 163 | self.node() 164 | }) 165 | } 166 | 167 | /// Get the cursor's current node. 168 | pub fn node(&self) -> Node<'a> { 169 | self.nodes[self.nodes.len() - 1] 170 | } 171 | 172 | /// Get the node's parent. 173 | pub fn parent(&mut self) -> Option<Node<'a>> { 174 | (self.nodes.len() > 1).then(|| self.nodes[self.nodes.len() - 2]) 175 | } 176 | 177 | /// Get an iterator from the cursor's current node to the tree root. 178 | pub fn parents(self) -> ParentIterator<'a> { 179 | ParentIterator { cursor: self } 180 | } 181 | } 182 | 183 | pub struct ParentIterator<'a> { 184 | cursor: Cursor<'a>, 185 | } 186 | 187 | impl<'a> Iterator for ParentIterator<'a> { 188 | type Item = Node<'a>; 189 | 190 | fn next(&mut self) -> Option<Self::Item> { 191 | self.cursor.goto_parent() 192 | } 193 | } 194 | -------------------------------------------------------------------------------- /vuln-reach/src/util.rs: -------------------------------------------------------------------------------- 1 | use std::path::{Component, Path, PathBuf}; 2 | 3 | /// Normalize a path, removing things like `.` and `..`. 4 | /// 5 | /// CAUTION: This does not resolve symlinks (unlike 6 | /// [`std::fs::canonicalize`]). This may cause incorrect or surprising 7 | /// behavior at times. This should be used carefully. Unfortunately, 8 | /// [`std::fs::canonicalize`] can be hard to use correctly, since it can often 9 | /// fail, or on Windows returns annoying device paths. This is a problem Cargo 10 | /// needs to improve on. 11 | /// 12 | /// Taken from .
13 | pub fn normalize_path(path: &Path) -> PathBuf { 14 | let mut components = path.components().peekable(); 15 | let mut ret = if let Some(c @ Component::Prefix(..)) = components.peek().cloned() { 16 | components.next(); 17 | PathBuf::from(c.as_os_str()) 18 | } else { 19 | PathBuf::new() 20 | }; 21 | 22 | for component in components { 23 | match component { 24 | Component::Prefix(..) => unreachable!(), 25 | Component::RootDir => { 26 | ret.push(component.as_os_str()); 27 | }, 28 | Component::CurDir => {}, 29 | Component::ParentDir => { 30 | ret.pop(); 31 | }, 32 | Component::Normal(c) => { 33 | ret.push(c); 34 | }, 35 | } 36 | } 37 | ret 38 | } 39 | 40 | #[test] 41 | fn test_normalize_path() { 42 | // Check that traversal is prevented. 43 | assert_eq!(normalize_path(Path::new("./foo/bar")), PathBuf::from("foo/bar")); 44 | assert_eq!(normalize_path(Path::new("../foo/bar")), PathBuf::from("foo/bar")); 45 | assert_eq!(normalize_path(Path::new("../foo/../bar")), PathBuf::from("bar")); 46 | } 47 | --------------------------------------------------------------------------------
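The `normalize_path` helper in `util.rs` above drops leading `..` components instead of erroring, which is what its test means by "traversal is prevented". The sketch below shows one way that property could be used when resolving a relative import against the importing module's location. `normalize_path` is copied from the file above so the block is self-contained; `resolve_relative` is a hypothetical helper introduced only for illustration and is not part of vuln-reach.

```rust
use std::path::{Component, Path, PathBuf};

// Copied from `normalize_path` in util.rs above so that the sketch compiles on its own.
fn normalize_path(path: &Path) -> PathBuf {
    let mut components = path.components().peekable();
    let mut ret = if let Some(c @ Component::Prefix(..)) = components.peek().cloned() {
        components.next();
        PathBuf::from(c.as_os_str())
    } else {
        PathBuf::new()
    };

    for component in components {
        match component {
            Component::Prefix(..) => unreachable!(),
            Component::RootDir => ret.push(component.as_os_str()),
            Component::CurDir => {},
            // `PathBuf::pop` is a no-op on an empty path, so a `..` that would
            // climb above the starting point is silently discarded.
            Component::ParentDir => {
                ret.pop();
            },
            Component::Normal(c) => ret.push(c),
        }
    }
    ret
}

// Hypothetical helper (an assumption, not taken from the repository): resolve a
// relative import specifier against the directory of the importing module.
fn resolve_relative(importer: &Path, spec: &str) -> PathBuf {
    // A bare file name such as `index.js` has an empty parent directory.
    let base = importer.parent().unwrap_or_else(|| Path::new(""));
    normalize_path(&base.join(spec))
}
```

Under these assumptions, `resolve_relative(Path::new("lib/index.js"), "../vuln.js")` yields `vuln.js`, and a specifier such as `../../etc/passwd` imported from `index.js` collapses to `etc/passwd` rather than escaping the package root. Symlinks are still not resolved, as the CAUTION note on `normalize_path` points out.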