├── .github └── ISSUE_TEMPLATE │ └── bug-report.md ├── LICENSE ├── README.md ├── config.json ├── figure └── taint-mini.svg ├── main.py ├── pdg_js ├── LICENSE ├── README.md ├── __init__.py ├── build_ast.py ├── build_pdg.py ├── control_flow.py ├── data_flow.py ├── display_graph.py ├── extended_ast.py ├── js_operators.py ├── js_reserved.py ├── node.py ├── package-lock.json ├── package.json ├── parser.js ├── pointer_analysis.py ├── scope.py ├── utility_df.py └── value_filters.py ├── requirements.txt └── taint_mini ├── __init__.py ├── storage.py ├── taintmini.py ├── wxjs.py └── wxml.py /.github/ISSUE_TEMPLATE/bug-report.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Bug report 3 | about: Create a report to help us improve 4 | title: '' 5 | labels: bug 6 | assignees: chaowangsec 7 | 8 | --- 9 | 10 | **Describe the bug** 11 | A clear and concise description of what the bug is. 12 | 13 | **To Reproduce** 14 | Steps to reproduce the behavior: 15 | 1. Configure the environment '....' 16 | 2. Type commands '....' 17 | 3. Run for a while '....' 18 | 4. Exception raised 19 | 20 | **Expected behavior** 21 | A clear and concise description of what you expected to happen. 22 | 23 | **Screenshots** 24 | If applicable, add screenshots to help explain your problem. 25 | 26 | **Environment (please complete the following information):** 27 | - OS: [e.g. Debian] 28 | - Version: [e.g. bookworm] 29 | - Python version: [e.g. 3.7] 30 | - Other environment: 31 | 32 | 33 | **Command line arguments** 34 | 35 | 36 | **Exception traceback (if applicable)** 37 | 38 | 39 | **Additional context** 40 | Add any other context about the problem here. 41 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | GNU AFFERO GENERAL PUBLIC LICENSE 2 | Version 3, 19 November 2007 3 | 4 | Copyright (C) 2007 Free Software Foundation, Inc. 5 | Everyone is permitted to copy and distribute verbatim copies 6 | of this license document, but changing it is not allowed. 7 | 8 | Preamble 9 | 10 | The GNU Affero General Public License is a free, copyleft license for 11 | software and other kinds of works, specifically designed to ensure 12 | cooperation with the community in the case of network server software. 13 | 14 | The licenses for most software and other practical works are designed 15 | to take away your freedom to share and change the works. By contrast, 16 | our General Public Licenses are intended to guarantee your freedom to 17 | share and change all versions of a program--to make sure it remains free 18 | software for all its users. 19 | 20 | When we speak of free software, we are referring to freedom, not 21 | price. Our General Public Licenses are designed to make sure that you 22 | have the freedom to distribute copies of free software (and charge for 23 | them if you wish), that you receive source code or can get it if you 24 | want it, that you can change the software or use pieces of it in new 25 | free programs, and that you know you can do these things. 26 | 27 | Developers that use our General Public Licenses protect your rights 28 | with two steps: (1) assert copyright on the software, and (2) offer 29 | you this License which gives you legal permission to copy, distribute 30 | and/or modify the software. 
31 | 32 | A secondary benefit of defending all users' freedom is that 33 | improvements made in alternate versions of the program, if they 34 | receive widespread use, become available for other developers to 35 | incorporate. Many developers of free software are heartened and 36 | encouraged by the resulting cooperation. However, in the case of 37 | software used on network servers, this result may fail to come about. 38 | The GNU General Public License permits making a modified version and 39 | letting the public access it on a server without ever releasing its 40 | source code to the public. 41 | 42 | The GNU Affero General Public License is designed specifically to 43 | ensure that, in such cases, the modified source code becomes available 44 | to the community. It requires the operator of a network server to 45 | provide the source code of the modified version running there to the 46 | users of that server. Therefore, public use of a modified version, on 47 | a publicly accessible server, gives the public access to the source 48 | code of the modified version. 49 | 50 | An older license, called the Affero General Public License and 51 | published by Affero, was designed to accomplish similar goals. This is 52 | a different license, not a version of the Affero GPL, but Affero has 53 | released a new version of the Affero GPL which permits relicensing under 54 | this license. 55 | 56 | The precise terms and conditions for copying, distribution and 57 | modification follow. 58 | 59 | TERMS AND CONDITIONS 60 | 61 | 0. Definitions. 62 | 63 | "This License" refers to version 3 of the GNU Affero General Public License. 64 | 65 | "Copyright" also means copyright-like laws that apply to other kinds of 66 | works, such as semiconductor masks. 67 | 68 | "The Program" refers to any copyrightable work licensed under this 69 | License. Each licensee is addressed as "you". "Licensees" and 70 | "recipients" may be individuals or organizations. 71 | 72 | To "modify" a work means to copy from or adapt all or part of the work 73 | in a fashion requiring copyright permission, other than the making of an 74 | exact copy. The resulting work is called a "modified version" of the 75 | earlier work or a work "based on" the earlier work. 76 | 77 | A "covered work" means either the unmodified Program or a work based 78 | on the Program. 79 | 80 | To "propagate" a work means to do anything with it that, without 81 | permission, would make you directly or secondarily liable for 82 | infringement under applicable copyright law, except executing it on a 83 | computer or modifying a private copy. Propagation includes copying, 84 | distribution (with or without modification), making available to the 85 | public, and in some countries other activities as well. 86 | 87 | To "convey" a work means any kind of propagation that enables other 88 | parties to make or receive copies. Mere interaction with a user through 89 | a computer network, with no transfer of a copy, is not conveying. 90 | 91 | An interactive user interface displays "Appropriate Legal Notices" 92 | to the extent that it includes a convenient and prominently visible 93 | feature that (1) displays an appropriate copyright notice, and (2) 94 | tells the user that there is no warranty for the work (except to the 95 | extent that warranties are provided), that licensees may convey the 96 | work under this License, and how to view a copy of this License. 
If 97 | the interface presents a list of user commands or options, such as a 98 | menu, a prominent item in the list meets this criterion. 99 | 100 | 1. Source Code. 101 | 102 | The "source code" for a work means the preferred form of the work 103 | for making modifications to it. "Object code" means any non-source 104 | form of a work. 105 | 106 | A "Standard Interface" means an interface that either is an official 107 | standard defined by a recognized standards body, or, in the case of 108 | interfaces specified for a particular programming language, one that 109 | is widely used among developers working in that language. 110 | 111 | The "System Libraries" of an executable work include anything, other 112 | than the work as a whole, that (a) is included in the normal form of 113 | packaging a Major Component, but which is not part of that Major 114 | Component, and (b) serves only to enable use of the work with that 115 | Major Component, or to implement a Standard Interface for which an 116 | implementation is available to the public in source code form. A 117 | "Major Component", in this context, means a major essential component 118 | (kernel, window system, and so on) of the specific operating system 119 | (if any) on which the executable work runs, or a compiler used to 120 | produce the work, or an object code interpreter used to run it. 121 | 122 | The "Corresponding Source" for a work in object code form means all 123 | the source code needed to generate, install, and (for an executable 124 | work) run the object code and to modify the work, including scripts to 125 | control those activities. However, it does not include the work's 126 | System Libraries, or general-purpose tools or generally available free 127 | programs which are used unmodified in performing those activities but 128 | which are not part of the work. For example, Corresponding Source 129 | includes interface definition files associated with source files for 130 | the work, and the source code for shared libraries and dynamically 131 | linked subprograms that the work is specifically designed to require, 132 | such as by intimate data communication or control flow between those 133 | subprograms and other parts of the work. 134 | 135 | The Corresponding Source need not include anything that users 136 | can regenerate automatically from other parts of the Corresponding 137 | Source. 138 | 139 | The Corresponding Source for a work in source code form is that 140 | same work. 141 | 142 | 2. Basic Permissions. 143 | 144 | All rights granted under this License are granted for the term of 145 | copyright on the Program, and are irrevocable provided the stated 146 | conditions are met. This License explicitly affirms your unlimited 147 | permission to run the unmodified Program. The output from running a 148 | covered work is covered by this License only if the output, given its 149 | content, constitutes a covered work. This License acknowledges your 150 | rights of fair use or other equivalent, as provided by copyright law. 151 | 152 | You may make, run and propagate covered works that you do not 153 | convey, without conditions so long as your license otherwise remains 154 | in force. You may convey covered works to others for the sole purpose 155 | of having them make modifications exclusively for you, or provide you 156 | with facilities for running those works, provided that you comply with 157 | the terms of this License in conveying all material for which you do 158 | not control copyright. 
Those thus making or running the covered works 159 | for you must do so exclusively on your behalf, under your direction 160 | and control, on terms that prohibit them from making any copies of 161 | your copyrighted material outside their relationship with you. 162 | 163 | Conveying under any other circumstances is permitted solely under 164 | the conditions stated below. Sublicensing is not allowed; section 10 165 | makes it unnecessary. 166 | 167 | 3. Protecting Users' Legal Rights From Anti-Circumvention Law. 168 | 169 | No covered work shall be deemed part of an effective technological 170 | measure under any applicable law fulfilling obligations under article 171 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or 172 | similar laws prohibiting or restricting circumvention of such 173 | measures. 174 | 175 | When you convey a covered work, you waive any legal power to forbid 176 | circumvention of technological measures to the extent such circumvention 177 | is effected by exercising rights under this License with respect to 178 | the covered work, and you disclaim any intention to limit operation or 179 | modification of the work as a means of enforcing, against the work's 180 | users, your or third parties' legal rights to forbid circumvention of 181 | technological measures. 182 | 183 | 4. Conveying Verbatim Copies. 184 | 185 | You may convey verbatim copies of the Program's source code as you 186 | receive it, in any medium, provided that you conspicuously and 187 | appropriately publish on each copy an appropriate copyright notice; 188 | keep intact all notices stating that this License and any 189 | non-permissive terms added in accord with section 7 apply to the code; 190 | keep intact all notices of the absence of any warranty; and give all 191 | recipients a copy of this License along with the Program. 192 | 193 | You may charge any price or no price for each copy that you convey, 194 | and you may offer support or warranty protection for a fee. 195 | 196 | 5. Conveying Modified Source Versions. 197 | 198 | You may convey a work based on the Program, or the modifications to 199 | produce it from the Program, in the form of source code under the 200 | terms of section 4, provided that you also meet all of these conditions: 201 | 202 | a) The work must carry prominent notices stating that you modified 203 | it, and giving a relevant date. 204 | 205 | b) The work must carry prominent notices stating that it is 206 | released under this License and any conditions added under section 207 | 7. This requirement modifies the requirement in section 4 to 208 | "keep intact all notices". 209 | 210 | c) You must license the entire work, as a whole, under this 211 | License to anyone who comes into possession of a copy. This 212 | License will therefore apply, along with any applicable section 7 213 | additional terms, to the whole of the work, and all its parts, 214 | regardless of how they are packaged. This License gives no 215 | permission to license the work in any other way, but it does not 216 | invalidate such permission if you have separately received it. 217 | 218 | d) If the work has interactive user interfaces, each must display 219 | Appropriate Legal Notices; however, if the Program has interactive 220 | interfaces that do not display Appropriate Legal Notices, your 221 | work need not make them do so. 
222 | 223 | A compilation of a covered work with other separate and independent 224 | works, which are not by their nature extensions of the covered work, 225 | and which are not combined with it such as to form a larger program, 226 | in or on a volume of a storage or distribution medium, is called an 227 | "aggregate" if the compilation and its resulting copyright are not 228 | used to limit the access or legal rights of the compilation's users 229 | beyond what the individual works permit. Inclusion of a covered work 230 | in an aggregate does not cause this License to apply to the other 231 | parts of the aggregate. 232 | 233 | 6. Conveying Non-Source Forms. 234 | 235 | You may convey a covered work in object code form under the terms 236 | of sections 4 and 5, provided that you also convey the 237 | machine-readable Corresponding Source under the terms of this License, 238 | in one of these ways: 239 | 240 | a) Convey the object code in, or embodied in, a physical product 241 | (including a physical distribution medium), accompanied by the 242 | Corresponding Source fixed on a durable physical medium 243 | customarily used for software interchange. 244 | 245 | b) Convey the object code in, or embodied in, a physical product 246 | (including a physical distribution medium), accompanied by a 247 | written offer, valid for at least three years and valid for as 248 | long as you offer spare parts or customer support for that product 249 | model, to give anyone who possesses the object code either (1) a 250 | copy of the Corresponding Source for all the software in the 251 | product that is covered by this License, on a durable physical 252 | medium customarily used for software interchange, for a price no 253 | more than your reasonable cost of physically performing this 254 | conveying of source, or (2) access to copy the 255 | Corresponding Source from a network server at no charge. 256 | 257 | c) Convey individual copies of the object code with a copy of the 258 | written offer to provide the Corresponding Source. This 259 | alternative is allowed only occasionally and noncommercially, and 260 | only if you received the object code with such an offer, in accord 261 | with subsection 6b. 262 | 263 | d) Convey the object code by offering access from a designated 264 | place (gratis or for a charge), and offer equivalent access to the 265 | Corresponding Source in the same way through the same place at no 266 | further charge. You need not require recipients to copy the 267 | Corresponding Source along with the object code. If the place to 268 | copy the object code is a network server, the Corresponding Source 269 | may be on a different server (operated by you or a third party) 270 | that supports equivalent copying facilities, provided you maintain 271 | clear directions next to the object code saying where to find the 272 | Corresponding Source. Regardless of what server hosts the 273 | Corresponding Source, you remain obligated to ensure that it is 274 | available for as long as needed to satisfy these requirements. 275 | 276 | e) Convey the object code using peer-to-peer transmission, provided 277 | you inform other peers where the object code and Corresponding 278 | Source of the work are being offered to the general public at no 279 | charge under subsection 6d. 280 | 281 | A separable portion of the object code, whose source code is excluded 282 | from the Corresponding Source as a System Library, need not be 283 | included in conveying the object code work. 
284 | 285 | A "User Product" is either (1) a "consumer product", which means any 286 | tangible personal property which is normally used for personal, family, 287 | or household purposes, or (2) anything designed or sold for incorporation 288 | into a dwelling. In determining whether a product is a consumer product, 289 | doubtful cases shall be resolved in favor of coverage. For a particular 290 | product received by a particular user, "normally used" refers to a 291 | typical or common use of that class of product, regardless of the status 292 | of the particular user or of the way in which the particular user 293 | actually uses, or expects or is expected to use, the product. A product 294 | is a consumer product regardless of whether the product has substantial 295 | commercial, industrial or non-consumer uses, unless such uses represent 296 | the only significant mode of use of the product. 297 | 298 | "Installation Information" for a User Product means any methods, 299 | procedures, authorization keys, or other information required to install 300 | and execute modified versions of a covered work in that User Product from 301 | a modified version of its Corresponding Source. The information must 302 | suffice to ensure that the continued functioning of the modified object 303 | code is in no case prevented or interfered with solely because 304 | modification has been made. 305 | 306 | If you convey an object code work under this section in, or with, or 307 | specifically for use in, a User Product, and the conveying occurs as 308 | part of a transaction in which the right of possession and use of the 309 | User Product is transferred to the recipient in perpetuity or for a 310 | fixed term (regardless of how the transaction is characterized), the 311 | Corresponding Source conveyed under this section must be accompanied 312 | by the Installation Information. But this requirement does not apply 313 | if neither you nor any third party retains the ability to install 314 | modified object code on the User Product (for example, the work has 315 | been installed in ROM). 316 | 317 | The requirement to provide Installation Information does not include a 318 | requirement to continue to provide support service, warranty, or updates 319 | for a work that has been modified or installed by the recipient, or for 320 | the User Product in which it has been modified or installed. Access to a 321 | network may be denied when the modification itself materially and 322 | adversely affects the operation of the network or violates the rules and 323 | protocols for communication across the network. 324 | 325 | Corresponding Source conveyed, and Installation Information provided, 326 | in accord with this section must be in a format that is publicly 327 | documented (and with an implementation available to the public in 328 | source code form), and must require no special password or key for 329 | unpacking, reading or copying. 330 | 331 | 7. Additional Terms. 332 | 333 | "Additional permissions" are terms that supplement the terms of this 334 | License by making exceptions from one or more of its conditions. 335 | Additional permissions that are applicable to the entire Program shall 336 | be treated as though they were included in this License, to the extent 337 | that they are valid under applicable law. 
If additional permissions 338 | apply only to part of the Program, that part may be used separately 339 | under those permissions, but the entire Program remains governed by 340 | this License without regard to the additional permissions. 341 | 342 | When you convey a copy of a covered work, you may at your option 343 | remove any additional permissions from that copy, or from any part of 344 | it. (Additional permissions may be written to require their own 345 | removal in certain cases when you modify the work.) You may place 346 | additional permissions on material, added by you to a covered work, 347 | for which you have or can give appropriate copyright permission. 348 | 349 | Notwithstanding any other provision of this License, for material you 350 | add to a covered work, you may (if authorized by the copyright holders of 351 | that material) supplement the terms of this License with terms: 352 | 353 | a) Disclaiming warranty or limiting liability differently from the 354 | terms of sections 15 and 16 of this License; or 355 | 356 | b) Requiring preservation of specified reasonable legal notices or 357 | author attributions in that material or in the Appropriate Legal 358 | Notices displayed by works containing it; or 359 | 360 | c) Prohibiting misrepresentation of the origin of that material, or 361 | requiring that modified versions of such material be marked in 362 | reasonable ways as different from the original version; or 363 | 364 | d) Limiting the use for publicity purposes of names of licensors or 365 | authors of the material; or 366 | 367 | e) Declining to grant rights under trademark law for use of some 368 | trade names, trademarks, or service marks; or 369 | 370 | f) Requiring indemnification of licensors and authors of that 371 | material by anyone who conveys the material (or modified versions of 372 | it) with contractual assumptions of liability to the recipient, for 373 | any liability that these contractual assumptions directly impose on 374 | those licensors and authors. 375 | 376 | All other non-permissive additional terms are considered "further 377 | restrictions" within the meaning of section 10. If the Program as you 378 | received it, or any part of it, contains a notice stating that it is 379 | governed by this License along with a term that is a further 380 | restriction, you may remove that term. If a license document contains 381 | a further restriction but permits relicensing or conveying under this 382 | License, you may add to a covered work material governed by the terms 383 | of that license document, provided that the further restriction does 384 | not survive such relicensing or conveying. 385 | 386 | If you add terms to a covered work in accord with this section, you 387 | must place, in the relevant source files, a statement of the 388 | additional terms that apply to those files, or a notice indicating 389 | where to find the applicable terms. 390 | 391 | Additional terms, permissive or non-permissive, may be stated in the 392 | form of a separately written license, or stated as exceptions; 393 | the above requirements apply either way. 394 | 395 | 8. Termination. 396 | 397 | You may not propagate or modify a covered work except as expressly 398 | provided under this License. Any attempt otherwise to propagate or 399 | modify it is void, and will automatically terminate your rights under 400 | this License (including any patent licenses granted under the third 401 | paragraph of section 11). 
402 | 403 | However, if you cease all violation of this License, then your 404 | license from a particular copyright holder is reinstated (a) 405 | provisionally, unless and until the copyright holder explicitly and 406 | finally terminates your license, and (b) permanently, if the copyright 407 | holder fails to notify you of the violation by some reasonable means 408 | prior to 60 days after the cessation. 409 | 410 | Moreover, your license from a particular copyright holder is 411 | reinstated permanently if the copyright holder notifies you of the 412 | violation by some reasonable means, this is the first time you have 413 | received notice of violation of this License (for any work) from that 414 | copyright holder, and you cure the violation prior to 30 days after 415 | your receipt of the notice. 416 | 417 | Termination of your rights under this section does not terminate the 418 | licenses of parties who have received copies or rights from you under 419 | this License. If your rights have been terminated and not permanently 420 | reinstated, you do not qualify to receive new licenses for the same 421 | material under section 10. 422 | 423 | 9. Acceptance Not Required for Having Copies. 424 | 425 | You are not required to accept this License in order to receive or 426 | run a copy of the Program. Ancillary propagation of a covered work 427 | occurring solely as a consequence of using peer-to-peer transmission 428 | to receive a copy likewise does not require acceptance. However, 429 | nothing other than this License grants you permission to propagate or 430 | modify any covered work. These actions infringe copyright if you do 431 | not accept this License. Therefore, by modifying or propagating a 432 | covered work, you indicate your acceptance of this License to do so. 433 | 434 | 10. Automatic Licensing of Downstream Recipients. 435 | 436 | Each time you convey a covered work, the recipient automatically 437 | receives a license from the original licensors, to run, modify and 438 | propagate that work, subject to this License. You are not responsible 439 | for enforcing compliance by third parties with this License. 440 | 441 | An "entity transaction" is a transaction transferring control of an 442 | organization, or substantially all assets of one, or subdividing an 443 | organization, or merging organizations. If propagation of a covered 444 | work results from an entity transaction, each party to that 445 | transaction who receives a copy of the work also receives whatever 446 | licenses to the work the party's predecessor in interest had or could 447 | give under the previous paragraph, plus a right to possession of the 448 | Corresponding Source of the work from the predecessor in interest, if 449 | the predecessor has it or can get it with reasonable efforts. 450 | 451 | You may not impose any further restrictions on the exercise of the 452 | rights granted or affirmed under this License. For example, you may 453 | not impose a license fee, royalty, or other charge for exercise of 454 | rights granted under this License, and you may not initiate litigation 455 | (including a cross-claim or counterclaim in a lawsuit) alleging that 456 | any patent claim is infringed by making, using, selling, offering for 457 | sale, or importing the Program or any portion of it. 458 | 459 | 11. Patents. 460 | 461 | A "contributor" is a copyright holder who authorizes use under this 462 | License of the Program or a work on which the Program is based. 
The 463 | work thus licensed is called the contributor's "contributor version". 464 | 465 | A contributor's "essential patent claims" are all patent claims 466 | owned or controlled by the contributor, whether already acquired or 467 | hereafter acquired, that would be infringed by some manner, permitted 468 | by this License, of making, using, or selling its contributor version, 469 | but do not include claims that would be infringed only as a 470 | consequence of further modification of the contributor version. For 471 | purposes of this definition, "control" includes the right to grant 472 | patent sublicenses in a manner consistent with the requirements of 473 | this License. 474 | 475 | Each contributor grants you a non-exclusive, worldwide, royalty-free 476 | patent license under the contributor's essential patent claims, to 477 | make, use, sell, offer for sale, import and otherwise run, modify and 478 | propagate the contents of its contributor version. 479 | 480 | In the following three paragraphs, a "patent license" is any express 481 | agreement or commitment, however denominated, not to enforce a patent 482 | (such as an express permission to practice a patent or covenant not to 483 | sue for patent infringement). To "grant" such a patent license to a 484 | party means to make such an agreement or commitment not to enforce a 485 | patent against the party. 486 | 487 | If you convey a covered work, knowingly relying on a patent license, 488 | and the Corresponding Source of the work is not available for anyone 489 | to copy, free of charge and under the terms of this License, through a 490 | publicly available network server or other readily accessible means, 491 | then you must either (1) cause the Corresponding Source to be so 492 | available, or (2) arrange to deprive yourself of the benefit of the 493 | patent license for this particular work, or (3) arrange, in a manner 494 | consistent with the requirements of this License, to extend the patent 495 | license to downstream recipients. "Knowingly relying" means you have 496 | actual knowledge that, but for the patent license, your conveying the 497 | covered work in a country, or your recipient's use of the covered work 498 | in a country, would infringe one or more identifiable patents in that 499 | country that you have reason to believe are valid. 500 | 501 | If, pursuant to or in connection with a single transaction or 502 | arrangement, you convey, or propagate by procuring conveyance of, a 503 | covered work, and grant a patent license to some of the parties 504 | receiving the covered work authorizing them to use, propagate, modify 505 | or convey a specific copy of the covered work, then the patent license 506 | you grant is automatically extended to all recipients of the covered 507 | work and works based on it. 508 | 509 | A patent license is "discriminatory" if it does not include within 510 | the scope of its coverage, prohibits the exercise of, or is 511 | conditioned on the non-exercise of one or more of the rights that are 512 | specifically granted under this License. 
You may not convey a covered 513 | work if you are a party to an arrangement with a third party that is 514 | in the business of distributing software, under which you make payment 515 | to the third party based on the extent of your activity of conveying 516 | the work, and under which the third party grants, to any of the 517 | parties who would receive the covered work from you, a discriminatory 518 | patent license (a) in connection with copies of the covered work 519 | conveyed by you (or copies made from those copies), or (b) primarily 520 | for and in connection with specific products or compilations that 521 | contain the covered work, unless you entered into that arrangement, 522 | or that patent license was granted, prior to 28 March 2007. 523 | 524 | Nothing in this License shall be construed as excluding or limiting 525 | any implied license or other defenses to infringement that may 526 | otherwise be available to you under applicable patent law. 527 | 528 | 12. No Surrender of Others' Freedom. 529 | 530 | If conditions are imposed on you (whether by court order, agreement or 531 | otherwise) that contradict the conditions of this License, they do not 532 | excuse you from the conditions of this License. If you cannot convey a 533 | covered work so as to satisfy simultaneously your obligations under this 534 | License and any other pertinent obligations, then as a consequence you may 535 | not convey it at all. For example, if you agree to terms that obligate you 536 | to collect a royalty for further conveying from those to whom you convey 537 | the Program, the only way you could satisfy both those terms and this 538 | License would be to refrain entirely from conveying the Program. 539 | 540 | 13. Remote Network Interaction; Use with the GNU General Public License. 541 | 542 | Notwithstanding any other provision of this License, if you modify the 543 | Program, your modified version must prominently offer all users 544 | interacting with it remotely through a computer network (if your version 545 | supports such interaction) an opportunity to receive the Corresponding 546 | Source of your version by providing access to the Corresponding Source 547 | from a network server at no charge, through some standard or customary 548 | means of facilitating copying of software. This Corresponding Source 549 | shall include the Corresponding Source for any work covered by version 3 550 | of the GNU General Public License that is incorporated pursuant to the 551 | following paragraph. 552 | 553 | Notwithstanding any other provision of this License, you have 554 | permission to link or combine any covered work with a work licensed 555 | under version 3 of the GNU General Public License into a single 556 | combined work, and to convey the resulting work. The terms of this 557 | License will continue to apply to the part which is the covered work, 558 | but the work with which it is combined will remain governed by version 559 | 3 of the GNU General Public License. 560 | 561 | 14. Revised Versions of this License. 562 | 563 | The Free Software Foundation may publish revised and/or new versions of 564 | the GNU Affero General Public License from time to time. Such new versions 565 | will be similar in spirit to the present version, but may differ in detail to 566 | address new problems or concerns. 567 | 568 | Each version is given a distinguishing version number. 
If the 569 | Program specifies that a certain numbered version of the GNU Affero General 570 | Public License "or any later version" applies to it, you have the 571 | option of following the terms and conditions either of that numbered 572 | version or of any later version published by the Free Software 573 | Foundation. If the Program does not specify a version number of the 574 | GNU Affero General Public License, you may choose any version ever published 575 | by the Free Software Foundation. 576 | 577 | If the Program specifies that a proxy can decide which future 578 | versions of the GNU Affero General Public License can be used, that proxy's 579 | public statement of acceptance of a version permanently authorizes you 580 | to choose that version for the Program. 581 | 582 | Later license versions may give you additional or different 583 | permissions. However, no additional obligations are imposed on any 584 | author or copyright holder as a result of your choosing to follow a 585 | later version. 586 | 587 | 15. Disclaimer of Warranty. 588 | 589 | THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY 590 | APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT 591 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY 592 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, 593 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 594 | PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM 595 | IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF 596 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 597 | 598 | 16. Limitation of Liability. 599 | 600 | IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING 601 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS 602 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY 603 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE 604 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF 605 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD 606 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), 607 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF 608 | SUCH DAMAGES. 609 | 610 | 17. Interpretation of Sections 15 and 16. 611 | 612 | If the disclaimer of warranty and limitation of liability provided 613 | above cannot be given local legal effect according to their terms, 614 | reviewing courts shall apply local law that most closely approximates 615 | an absolute waiver of all civil liability in connection with the 616 | Program, unless a warranty or assumption of liability accompanies a 617 | copy of the Program in return for a fee. 618 | 619 | END OF TERMS AND CONDITIONS 620 | 621 | How to Apply These Terms to Your New Programs 622 | 623 | If you develop a new program, and you want it to be of the greatest 624 | possible use to the public, the best way to achieve this is to make it 625 | free software which everyone can redistribute and change under these terms. 626 | 627 | To do so, attach the following notices to the program. It is safest 628 | to attach them to the start of each source file to most effectively 629 | state the exclusion of warranty; and each file should have at least 630 | the "copyright" line and a pointer to where the full notice is found. 
631 | 632 | 633 | Copyright (C) 634 | 635 | This program is free software: you can redistribute it and/or modify 636 | it under the terms of the GNU Affero General Public License as published 637 | by the Free Software Foundation, either version 3 of the License, or 638 | (at your option) any later version. 639 | 640 | This program is distributed in the hope that it will be useful, 641 | but WITHOUT ANY WARRANTY; without even the implied warranty of 642 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 643 | GNU Affero General Public License for more details. 644 | 645 | You should have received a copy of the GNU Affero General Public License 646 | along with this program. If not, see . 647 | 648 | Also add information on how to contact you by electronic and paper mail. 649 | 650 | If your software can interact with users remotely through a computer 651 | network, you should also make sure that it provides a way for users to 652 | get its source. For example, if your program is a web application, its 653 | interface could display a "Source" link that leads users to an archive 654 | of the code. There are many ways you could offer source, and different 655 | solutions will be better for different programs; see section 13 for the 656 | specific requirements. 657 | 658 | You should also get your employer (if you work as a programmer) or school, 659 | if any, to sign a "copyright disclaimer" for the program, if necessary. 660 | For more information on this, and how to apply and follow the GNU AGPL, see 661 | . 662 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # TaintMini 2 | 3 | TaintMini is a framework for detecting flows of sensitive data in Mini-Programs with static taint analysis. It is a novel universal data flow graph approach that captures data flows within 4 | and across mini-programs. 5 | 6 |

![taintmini](figure/taint-mini.svg)

7 | 8 | We implemented TaintMini based on `pdg_js` (from [DoubleX](https://github.com/Aurore54F/DoubleX) by [Aurore Fass](https://aurore54f.github.io/) *et al*.). For more implementation details, please refer to our [paper](https://chaowang.dev/publications/icse23.pdf) and the [DoubleX paper](https://swag.cispa.saarland/papers/fass2021doublex.pdf). 9 | 10 | ## Table of contents 11 | 12 | - [TaintMini](#taintmini) 13 | - [Table of contents](#table-of-contents) 14 | - [Prerequisites](#prerequisites) 15 | - [Environment](#environment) 16 | - [Dependencies](#dependencies) 17 | - [Pre-processing](#pre-processing) 18 | - [Usage](#usage) 19 | - [Config](#config) 20 | - [Examples](#examples) 21 | - [Single MiniProgram](#single-miniprogram) 22 | - [Multiple MiniPrograms](#multiple-miniprograms) 23 | - [Citation](#citation) 24 | - [License](#license) 25 | 26 | ## Prerequisites 27 | 28 | ### Environment 29 | 30 | For optimal performance, we recommend allocating at least 4 cores and 16 GiB of memory to run the tool. 31 | Additionally, for best IO performance during analysis, we recommend using SSDs rather than hard disk drives, due to the large number of small files (less than one page size) that Mini-Programs typically have. 32 | As a reference, we used 16 vCPUs of Intel Xeon Silver 4314, 128 GiB of 3200 MHz DDR4 memory, and 2 TiB of NVMe SSD (700 KIOPS) as the host for building and validating our artifact evaluation submission. 33 | 34 | ### Dependencies 35 | 36 | Install Node.js dependencies for `pdg_js` first. 37 | 38 | ```bash 39 | # make sure node.js and npm is installed 40 | node --version && cd pdg_js && npm i 41 | ``` 42 | 43 | Install requirements for python. 44 | 45 | ```bash 46 | # install requirements 47 | pip install -r requirements.txt 48 | ``` 49 | 50 | ### Pre-processing 51 | 52 | TaintMini operates on unpacked WeChat Mini-Programs, necessitating the use of a WeChat Mini-Program unpacking tool in advance. 53 | Please note that we are unable to provide such a tool directly due to potential legal implications. 54 | We recommend seeking it out on external websites. 55 | 56 | ## Usage 57 | 58 | ``` 59 | usage: mini-taint [-h] -i path [-o path] [-c path] [-j number] [-b] 60 | 61 | optional arguments: 62 | -h, --help show this help message and exit 63 | -i path, --input path 64 | path of input mini program(s). Single mini program directory or index files will both be fine. 65 | -o path, --output path 66 | path of output results. The output file will be stored outside of the mini program directories. 67 | -c path, --config path 68 | path of config file. See default config file for example. Leave the field empty to include all results. 69 | -j number, --jobs number 70 | number of workers. 71 | -b, --bench enable benchmark data log. Default: False 72 | ``` 73 | 74 | Results will be written to the directory provided by the `-o/--output` flag. 75 | Result files are named `$(basename )-result.csv`, 76 | along with `$(basename )-bench.csv` if `-b/--bench` option is present. 77 | 78 | ## Config 79 | 80 | The `config.json` is a JSON formatted file, which includes two fields: `sources` and `sinks`: 81 | 82 | - `sources` is an array, indicating the source APIs that need to be included. Please note there is a special value named `[double_binding]` which indicates the data flows from `WXML`. 83 | - `sinks` is an array, indicating the sink APIs that need to be included. 84 | 85 | For examples, please refer to the `config.json` file. 
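As a quick illustration, the snippet below drives the same analysis from Python instead of the command line. It is a minimal sketch that mirrors what `main.py` does: it builds a config with the same `sources`/`sinks` shape as the bundled `config.json`, then calls `taintmini.analyze_mini_program` with the same positional arguments `main.py` uses. The mini-program path is a placeholder, and the script is assumed to run from the repository root so that the `taint_mini` package is importable.

```python
# Minimal sketch: programmatic use of TaintMini, mirroring main.py.
# Assumes it runs from the repository root; the mini-program path is a placeholder.
import json

from taint_mini import taintmini

# Sources and sinks to keep in the results -- same shape as the bundled config.json.
# "[double_binding]" is the special source name for data flows coming from WXML.
config = {
    "sources": ["wx.getStorage", "wx.getStorageSync", "[double_binding]"],
    "sinks": ["wx.setStorage", "wx.setStorageSync", "wx.navigateTo"],
}

# Optionally save it so the same config can be passed to main.py with -c my-config.json.
with open("my-config.json", "w") as f:
    json.dump(config, f, indent=2)

# Analyze one unpacked mini-program and write the CSV results under ./results.
# Arguments are positional, exactly as main.py passes them:
# (input path, output path, config dict, number of workers, benchmark flag).
taintmini.analyze_mini_program("/path/to/miniprogram", "./results", config, None, False)
```

Passing an empty dict (`{}`) instead of `config` keeps all sources and sinks in the results, matching the CLI behaviour when `-c` is omitted.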
86 | 87 | ## Examples 88 | 89 | ### Single MiniProgram 90 | 91 | Analyze a single MiniProgram; Include all sources and sinks; Enable multi-processing (all available CPU cores); No benchmark required. 92 | 93 | ```bash 94 | python main.py -i /path/to/miniprogram -o ./results -j $(nproc) 95 | ``` 96 | 97 | ### Multiple MiniPrograms 98 | 99 | Analyze multiple MiniPrograms; Include all sources and sinks; Enable multi-processing (all available CPU cores); Benchmarks required. 100 | 101 | ```bash 102 | # generate index 103 | find /path/to/miniprograms -maxdepth 1 -type d -name "wx*" > index.txt 104 | # start analysis 105 | python main.py -i ./index.txt -o ./results -j $(nproc) -b 106 | ``` 107 | 108 | ## Citation 109 | 110 | If you find TaintMini useful, please consider citing our paper and DoubleX: 111 | 112 | ```plaintext 113 | @inproceedings{wang2023taintmini, 114 | title={TAINTMINI: Detecting Flow of Sensitive Data in Mini-Programs with Static Taint Analysis}, 115 | author={Wang, Chao and Ko, Ronny and Zhang, Yue and Yang, Yuqing and Lin, Zhiqiang}, 116 | booktitle={Proceedings of the 45th International Conference on Software Engineering}, 117 | year={2023} 118 | } 119 | 120 | @inproceedings{fass2021doublex, 121 | author="Aurore Fass and Doli{\`e}re Francis Som{\'e} and Michael Backes and Ben Stock", 122 | title="{\textsc{DoubleX}: Statically Detecting Vulnerable Data Flows in Browser Extensions at Scale}", 123 | booktitle="ACM CCS", 124 | year="2021" 125 | } 126 | ``` 127 | 128 | ## License 129 | 130 | This project is licensed under the terms of the AGPLV3 license. 131 | 132 | * **pdg_js** is credit to [**DoubleX**](https://github.com/Aurore54F/DoubleX/) 133 | 134 | 135 | -------------------------------------------------------------------------------- /config.json: -------------------------------------------------------------------------------- 1 | { 2 | "sources": ["wx.getStorage", "wx.getStorageSync", "[double_binding]"], 3 | "sinks": ["wx.setStorage", "wx.setStorageSync", "wx.navigateTo", "wx.navigateToMiniProgram"] 4 | } -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | import json 2 | from taint_mini import taintmini 3 | import argparse 4 | import os 5 | 6 | 7 | def main(): 8 | parser = argparse.ArgumentParser(prog="taint-mini", 9 | formatter_class=argparse.RawTextHelpFormatter) 10 | 11 | parser.add_argument("-i", "--input", dest="input", metavar="path", type=str, required=True, 12 | help="path of input mini program(s)." 13 | "Single mini program directory or index files will both be fine.") 14 | parser.add_argument("-o", "--output", dest="output", metavar="path", type=str, default="results", 15 | help="path of output results." 16 | "The output file will be stored outside of the mini program directories.") 17 | parser.add_argument("-c", "--config", dest="config", metavar="path", type=str, 18 | help="path of config file." 19 | "See default config file for example. Leave the field empty to include all results.") 20 | parser.add_argument("-j", "--jobs", dest="workers", metavar="number", type=int, default=None, 21 | help="number of workers.") 22 | parser.add_argument("-b", "--bench", dest="bench", action="store_true", 23 | help="enable benchmark data log." 
24 | "Default: False") 25 | 26 | args = parser.parse_args() 27 | input_path = args.input 28 | output_path = args.output 29 | config_path = args.config 30 | workers = args.workers 31 | bench = args.bench 32 | 33 | # test config 34 | config = None 35 | if config_path is None: 36 | # no config given, include all sources and sinks 37 | config = dict() 38 | else: 39 | try: 40 | config = json.load(open(config_path)) 41 | except FileNotFoundError: 42 | print(f"[main] error: config not found") 43 | exit(-1) 44 | 45 | # test input_path 46 | if os.path.exists(input_path): 47 | if os.path.isfile(input_path): 48 | # handle index files 49 | with open(input_path) as f: 50 | for i in f.readlines(): 51 | taintmini.analyze_mini_program(str.strip(i), output_path, config, workers, bench) 52 | elif os.path.isdir(input_path): 53 | # handle single mini program 54 | taintmini.analyze_mini_program(input_path, output_path, config, workers, bench) 55 | else: 56 | print(f"[main] error: invalid input path") 57 | 58 | 59 | if __name__ == "__main__": 60 | main() 61 | -------------------------------------------------------------------------------- /pdg_js/README.md: -------------------------------------------------------------------------------- 1 | # pdg_js 2 | 3 | Statically building the enhanced AST (with control and data flow, as well as pointer analysis information) for JavaScript inputs (sometimes referred to as PDG). 4 | 5 | 6 | ## Setup (if not already done for DoubleX) 7 | 8 | ``` 9 | install python3 # (tested with 3.7.3 and 3.7.4) 10 | 11 | install nodejs 12 | install npm 13 | cd src/pdg_js 14 | npm install esprima # (tested with 4.0.1) 15 | npm install escodegen # (tested with 1.14.2 and 2.0.0) 16 | cd .. 17 | ``` 18 | 19 | To install graphviz (only for drawing graphs, not yet documented, please open an issue if interested) 20 | ``` 21 | pip3 install graphviz 22 | On MacOS: install brew and then brew install graphviz 23 | On Linux: sudo apt-get install graphviz 24 | ``` 25 | 26 | ## Usage 27 | 28 | ### PDG Generation - Multiprocessing 29 | 30 | Let's consider a directory `EXTENSIONS` containing several extension's folders. For each extension, their corresponding folder contains *.js files for each component. We would like to generate the PDGs (= ASTs enhanced with control and data flow, and pointer analysis) of each file. For each extension, the corresponding PDG will be stored in the folder `PDG`. 31 | To generate these PDGs, launch the following shell command from the `pdg_js` folder location: 32 | ``` 33 | $ python3 -c "from build_pdg import store_extension_pdg_folder; store_extension_pdg_folder('EXTENSIONS')" 34 | ``` 35 | 36 | The corresponding PDGs will be stored in EXTENSIONS/\/PDG`. 37 | 38 | Currently, we are using 1 CPU, but you can change that by modifying the variable NUM\_WORKERS from `pdg_js/utility_df.py` (the one **line 51**). 39 | 40 | 41 | ### Single PDG Generation 42 | 43 | To generate the PDG of a specific *.js file, launch the following python3 commands from the `pdg_js` folder location: 44 | ``` 45 | >>> from build_pdg import get_data_flow 46 | >>> pdg = get_data_flow('INPUT_FILE', benchmarks=dict()) 47 | ``` 48 | 49 | Per default, the corresponding PDG will not be stored. To store it in an **existing** PDG\_PATH folder, call: 50 | ``` 51 | $ python3 -c "from build_pdg import get_data_flow; get_data_flow('INPUT_FILE', benchmarks=dict(), store_pdgs='PDG_PATH')" 52 | ``` 53 | 54 | 55 | Note that we added a timeout of 10 min for the data flow/pointer analysis (cf. 
line 149 of `pdg_js/build_pdg.py`), and a memory limit of 20GB (cf. line 115 of `pdg_js/build_pdg.py`). -------------------------------------------------------------------------------- /pdg_js/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OSUSecLab/TaintMini/bbf0af5801c9b40f95dff82a040a11e9433a8ed5/pdg_js/__init__.py -------------------------------------------------------------------------------- /pdg_js/build_ast.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2021 Aurore Fass 2 | # 3 | # This program is free software: you can redistribute it and/or modify 4 | # it under the terms of the GNU Affero General Public License as published 5 | # by the Free Software Foundation, either version 3 of the License, or 6 | # (at your option) any later version. 7 | # 8 | # This program is distributed in the hope that it will be useful, 9 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 10 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 11 | # GNU Affero General Public License for more details. 12 | # 13 | # You should have received a copy of the GNU Affero General Public License 14 | # along with this program. If not, see . 15 | 16 | 17 | """ 18 | From JS source code to an Esprima AST exported in JSON. 19 | From JSON to ExtendedAst and Node objects. 20 | From Node objects to JSON. 21 | From JSON to JS source code using Escodegen. 22 | """ 23 | 24 | # Note: improved from HideNoSeek (bugs correction + semantic information to the nodes) 25 | 26 | 27 | import logging 28 | import json 29 | import os 30 | import subprocess 31 | 32 | from . import node as _node 33 | from . import extended_ast as _extended_ast 34 | 35 | SRC_PATH = os.path.abspath(os.path.join(os.path.dirname(__file__))) 36 | 37 | 38 | def get_extended_ast(input_file, json_path, remove_json=False): 39 | """ 40 | JavaScript AST production. 41 | 42 | ------- 43 | Parameters: 44 | - input_file: str 45 | Path of the file to produce an AST from. 46 | - json_path: str 47 | Path of the JSON file to temporary store the AST in. 48 | - remove_json: bool 49 | Indicates whether to remove or not the JSON file containing the Esprima AST. 50 | Default: True. 51 | 52 | ------- 53 | Returns: 54 | - ExtendedAst 55 | The extended AST (i.e., contains type, filename, body, sourceType, range, comments, 56 | tokens, and possibly leadingComments) of input_file. 57 | - None if an error occurred. 
58 | """ 59 | 60 | try: 61 | produce_ast = subprocess.run(['node', os.path.join(SRC_PATH, 'parser.js'), 62 | input_file, json_path], 63 | stdout=subprocess.PIPE, check=True) 64 | except subprocess.CalledProcessError: 65 | logging.critical('Esprima parsing error for %s', input_file) 66 | return None 67 | 68 | if produce_ast.returncode == 0: 69 | 70 | with open(json_path) as json_data: 71 | esprima_ast = json.loads(json_data.read()) 72 | if remove_json: 73 | os.remove(json_path) 74 | 75 | extended_ast = _extended_ast.ExtendedAst() 76 | extended_ast.filename = input_file 77 | extended_ast.set_type(esprima_ast['type']) 78 | extended_ast.set_body(esprima_ast['body']) 79 | extended_ast.set_source_type(esprima_ast['sourceType']) 80 | extended_ast.set_range(esprima_ast['range']) 81 | extended_ast.set_tokens(esprima_ast['tokens']) 82 | extended_ast.set_comments(esprima_ast['comments']) 83 | if 'leadingComments' in esprima_ast: 84 | extended_ast.set_leading_comments(esprima_ast['leadingComments']) 85 | 86 | return extended_ast 87 | 88 | logging.critical('Esprima could not produce an AST for %s', input_file) 89 | return None 90 | 91 | 92 | def indent(depth_dict): 93 | """ Indentation size. """ 94 | return '\t' * depth_dict 95 | 96 | 97 | def brace(key): 98 | """ Write a word between cases. """ 99 | return '|<' + key + '>' 100 | 101 | 102 | def print_dict(depth_dict, key, value, max_depth, delete_leaf): 103 | """ Print the content of a dict with specific indentation and braces for the keys. """ 104 | if depth_dict <= max_depth: 105 | print('%s%s' % (indent(depth_dict), brace(key))) 106 | beautiful_print_ast(value, depth=depth_dict + 1, max_depth=max_depth, 107 | delete_leaf=delete_leaf) 108 | 109 | 110 | def print_value(depth_dict, key, value, max_depth, delete_leaf): 111 | """ Print a dict value with respect to the indentation. """ 112 | if depth_dict <= max_depth: 113 | if all(dont_consider != key for dont_consider in delete_leaf): 114 | print(indent(depth_dict) + "| %s = %s" % (key, value)) 115 | 116 | 117 | def beautiful_print_ast(ast, delete_leaf, depth=0, max_depth=2 ** 63): 118 | """ 119 | Walking through an AST and printing it beautifully 120 | 121 | ------- 122 | Parameters: 123 | - ast: dict 124 | Contains an Esprima AST of a JS file, i.e., get_extended_ast(, ) 125 | output or get_extended_ast(, ).get_ast() output. 126 | - depth: int 127 | Initial depth of the tree. Default: 0. 128 | - max_depth: int 129 | Indicates the depth up to which the AST is printed. Default: 2**63. 130 | - delete_leaf: list 131 | Contains the leaf that should not be printed (e.g. 'range'). Default: [''], 132 | beware it is mutable. 133 | """ 134 | 135 | for k, v in ast.items(): # Because need k everywhere 136 | if isinstance(v, dict): 137 | print_dict(depth, k, v, max_depth, delete_leaf) 138 | elif isinstance(v, list): 139 | if not v: 140 | print_value(depth, k, v, max_depth, delete_leaf) 141 | for el in v: 142 | if isinstance(el, dict): 143 | print_dict(depth, k, el, max_depth, delete_leaf) 144 | else: 145 | print_value(depth, k, el, max_depth, delete_leaf) 146 | else: 147 | print_value(depth, k, v, max_depth, delete_leaf) 148 | 149 | 150 | def create_node(dico, node_body, parent_node, cond=False, filename=''): 151 | """ Node creation. 
""" 152 | 153 | if dico is None: # Not a Node, but needed a construct to store, e.g., [, a] = array 154 | node = _node.Node(name='None', parent=parent_node) 155 | parent_node.set_child(node) 156 | node.set_body(node_body) 157 | if cond: 158 | node.set_body_list(True) 159 | node.filename = filename 160 | 161 | elif 'type' in dico: 162 | if dico['type'] == 'FunctionDeclaration': 163 | node = _node.FunctionDeclaration(name=dico['type'], parent=parent_node) 164 | elif dico['type'] == 'FunctionExpression' or dico['type'] == 'ArrowFunctionExpression': 165 | node = _node.FunctionExpression(name=dico['type'], parent=parent_node) 166 | elif dico['type'] == 'ReturnStatement': 167 | node = _node.ReturnStatement(name=dico['type'], parent=parent_node) 168 | elif dico['type'] in _node.STATEMENTS: 169 | node = _node.Statement(name=dico['type'], parent=parent_node) 170 | elif dico['type'] in _node.VALUE_EXPR: 171 | node = _node.ValueExpr(name=dico['type'], parent=parent_node) 172 | elif dico['type'] == 'Identifier': 173 | node = _node.Identifier(name=dico['type'], parent=parent_node) 174 | else: 175 | node = _node.Node(name=dico['type'], parent=parent_node) 176 | 177 | if not node.is_comment(): # Otherwise comments are children and it is getting messy! 178 | parent_node.set_child(node) 179 | node.set_body(node_body) 180 | if cond: 181 | node.set_body_list(True) # Some attributes are stored in a list even when they 182 | # are alone. If we do not respect the initial syntax, Escodegen cannot built the 183 | # JS code back. 184 | node.filename = filename 185 | ast_to_ast_nodes(dico, node) 186 | 187 | 188 | def ast_to_ast_nodes(ast, ast_nodes=_node.Node('Program')): 189 | """ 190 | Convert an AST to Node objects. 191 | 192 | ------- 193 | Parameters: 194 | - ast: dict 195 | Output of get_extended_ast(, ).get_ast(). 196 | - ast_nodes: Node 197 | Current Node to be built. Default: ast_nodes=Node('Program'). Beware, always call the 198 | function indicating the default argument, otherwise the last value will be used 199 | (because the default parameter is mutable). 200 | 201 | ------- 202 | Returns: 203 | - Node 204 | The AST in format Node object. 205 | """ 206 | 207 | if 'filename' in ast: 208 | filename = ast['filename'] 209 | ast_nodes.set_attribute('filename', filename) 210 | else: 211 | filename = '' 212 | 213 | for k in ast: 214 | if k == 'filename' or k == 'loc' or k == 'range' or k == 'value' \ 215 | or (k != 'type' and not isinstance(ast[k], list) 216 | and not isinstance(ast[k], dict)) or k == 'regex': 217 | ast_nodes.set_attribute(k, ast[k]) # range is a list but stored as attributes 218 | if isinstance(ast[k], dict): 219 | if k == 'range': # Case leadingComments as range: {0: begin, 1: end} 220 | ast_nodes.set_attribute(k, ast[k]) 221 | else: 222 | create_node(dico=ast[k], node_body=k, parent_node=ast_nodes, filename=filename) 223 | elif isinstance(ast[k], list): 224 | if not ast[k]: # Case with empty list, e.g. params: [] 225 | ast_nodes.set_attribute(k, ast[k]) 226 | for el in ast[k]: 227 | if isinstance(el, dict): 228 | create_node(dico=el, node_body=k, parent_node=ast_nodes, cond=True, 229 | filename=filename) 230 | elif el is None: # Case [None, {stuff about a}] for [, a] = array 231 | create_node(dico=el, node_body=k, parent_node=ast_nodes, cond=True, 232 | filename=filename) 233 | return ast_nodes 234 | 235 | 236 | def print_ast_nodes(ast_nodes): 237 | """ 238 | Print the Nodes of ast_nodes with their properties. 239 | Debug function. 
240 | 241 | ------- 242 | Parameters: 243 | - ast_nodes: Node 244 | Output of ast_to_ast_nodes(, ast_nodes=Node('Program')). 245 | """ 246 | 247 | for child in ast_nodes.children: 248 | print('Parent: ' + child.parent.name) 249 | print('Child: ' + child.name) 250 | print('Id: ' + str(child.id)) 251 | print('Attributes:') 252 | print(child.attributes) 253 | print('Body: ' + str(child.body)) 254 | print('Body_list: ' + str(child.body_list)) 255 | print('Is-leaf: ' + str(child.is_leaf())) 256 | print('-----------------------') 257 | print_ast_nodes(child) 258 | 259 | 260 | def build_json(ast_nodes, dico): 261 | """ 262 | Convert an AST format Node objects to JSON format. 263 | 264 | ------- 265 | Parameters: 266 | - ast_nodes: Node 267 | Output of ast_to_ast_nodes(, ast_nodes=Node('Program')). 268 | - dico: dict 269 | Current dict to be built. 270 | 271 | ------- 272 | Returns: 273 | - dict 274 | The AST in format JSON. 275 | """ 276 | 277 | if ast_nodes.name != 'None': # Nothing interesting in the None Node 278 | dico['type'] = ast_nodes.name 279 | if len(ast_nodes.children) >= 1: 280 | for child in ast_nodes.children: 281 | dico2 = {} 282 | if child.body_list: 283 | if child.body not in dico: 284 | dico[child.body] = [] # Some attributes just have to be stored in a list. 285 | build_json(child, dico2) 286 | if not dico2: # Case [, a] = array -> [None, {stuff about a}] (None and not {}) 287 | dico2 = None # Not sure if it could not be legitimate sometimes 288 | logging.warning('Transformed {} into None for Escodegen; was it legitimate?') 289 | dico[child.body].append(dico2) 290 | else: 291 | build_json(child, dico2) 292 | dico[child.body] = dico2 293 | elif ast_nodes.body_list == 'special': 294 | dico[ast_nodes.body] = [] 295 | else: 296 | pass 297 | for att in ast_nodes.attributes: 298 | dico[att] = ast_nodes.attributes[att] 299 | return dico 300 | 301 | 302 | def save_json(ast_nodes, json_path): 303 | """ 304 | Stores an AST format Node objects in a JSON file. 305 | 306 | ------- 307 | Parameters: 308 | - ast_nodes: Node 309 | Output of ast_to_ast_nodes(, ast_nodes=Node('Program')). 310 | - json_path: str 311 | Path of the JSON file to store the AST in. 312 | """ 313 | 314 | data = build_json(ast_nodes, dico={}) 315 | with open(json_path, 'w') as json_data: 316 | json.dump(data, json_data, indent=4) 317 | 318 | 319 | def get_code(json_path, code_path='1', remove_json=True, test=False): 320 | """ 321 | Convert JSON format back to JavaScript code. 322 | 323 | ------- 324 | Parameters: 325 | - json_path: str 326 | Path of the JSON file to build the code from. 327 | - code_path: str 328 | Path of the file to store the code in. If 1, then displays it to stdout. 329 | - remove_json: bool 330 | Indicates whether to remove or not the JSON file containing the Esprima AST. 331 | Default: True. 332 | - test: bool 333 | Indicates whether we are in test mode. Default: False. 
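    Illustrative usage (a minimal sketch; the paths are hypothetical):
        save_json(ast_nodes, '/tmp/ast.json')
        js_code = get_code('/tmp/ast.json')   # default code_path='1': returns the code as a string
    With the default code_path='1', the regenerated code is returned as a single string with
    newlines stripped; with a real file path, the code is written to that file and the path
    is returned.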
334 | """ 335 | 336 | try: 337 | code = subprocess.run(['node', os.path.join(SRC_PATH, 'generate_js.js'), 338 | json_path, code_path], 339 | stdout=subprocess.PIPE, check=True) 340 | except subprocess.CalledProcessError: 341 | logging.exception('Something went wrong to get the code from the AST for %s', json_path) 342 | return None 343 | 344 | if remove_json: 345 | os.remove(json_path) 346 | if code.returncode != 0: 347 | logging.error('Something wrong happened while converting JS back to code for %s', json_path) 348 | return None 349 | 350 | if code_path == '1': 351 | if test: 352 | print((code.stdout.decode('utf-8')).replace('\n', '')) 353 | return (code.stdout.decode('utf-8')).replace('\n', '') 354 | 355 | return code_path 356 | -------------------------------------------------------------------------------- /pdg_js/build_pdg.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2021 Aurore Fass 2 | # Copyright (C) 2022 Anonymous 3 | # 4 | # This program is free software: you can redistribute it and/or modify 5 | # it under the terms of the GNU Affero General Public License as published 6 | # by the Free Software Foundation, either version 3 of the License, or 7 | # (at your option) any later version. 8 | # 9 | # This program is distributed in the hope that it will be useful, 10 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 11 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 12 | # GNU Affero General Public License for more details. 13 | # 14 | # You should have received a copy of the GNU Affero General Public License 15 | # along with this program. If not, see . 16 | 17 | 18 | """ 19 | Generation and storage of JavaScript PDGs. Possibility for multiprocessing (NUM_WORKERS 20 | defined in utility_df.py). 21 | """ 22 | 23 | import os 24 | import pickle 25 | import logging 26 | import timeit 27 | import json 28 | from multiprocessing import Process, Queue 29 | 30 | from . import node as _node 31 | from . import build_ast 32 | from . import utility_df 33 | from . import control_flow 34 | from . import data_flow 35 | from . import scope as _scope 36 | from . import display_graph 37 | 38 | # Builds the JS code from the AST, or not, to check for possible bugs in the AST building process. 39 | CHECK_JSON = utility_df.CHECK_JSON 40 | 41 | 42 | def pickle_dump_process(dfg_nodes, store_pdg): 43 | """ Call to pickle.dump """ 44 | pickle.dump(dfg_nodes, open(store_pdg, 'wb')) 45 | 46 | 47 | def function_hoisting(node, entry): 48 | """ Hoists FunctionDeclaration at the beginning of a basic block = Function bloc. """ 49 | 50 | # Will avoid problem if function first called and then defined 51 | for child in node.children: 52 | if child.name == 'FunctionDeclaration': 53 | child.adopt_child(step_daddy=entry) # Sets new parent and deletes old one 54 | function_hoisting(child, entry=child) # New basic block = FunctionDeclaration = child 55 | elif child.name == 'FunctionExpression': 56 | function_hoisting(child, entry=child) # New basic block = FunctionExpression = child 57 | else: 58 | function_hoisting(child, entry=entry) # Current basic block = entry 59 | 60 | 61 | def traverse(node): 62 | """ Debug function, traverse node. """ 63 | 64 | for child in node.children: 65 | print(child.name) 66 | traverse(child) 67 | 68 | 69 | def get_data_flow_process(js_path, benchmarks, store_pdgs): 70 | """ Call to get_data_flow. 
""" 71 | 72 | try: 73 | # save_path_pdg = js_path.split(".")[0] 74 | get_data_flow(input_file=js_path, benchmarks=benchmarks, store_pdgs=store_pdgs, 75 | beautiful_print=False, check_json=False, save_path_pdg=False) 76 | except Exception as e: 77 | print(e) 78 | raise e 79 | 80 | 81 | def get_data_flow(input_file, benchmarks, store_pdgs=None, check_var=False, beautiful_print=False, 82 | save_path_ast=False, save_path_cfg=False, save_path_pdg=False, 83 | check_json=CHECK_JSON, alt_json_path=None): 84 | """ 85 | Builds the PDG: enhances the AST with CF, DF, and pointer analysis for a given file. 86 | 87 | ------- 88 | Parameters: 89 | - input_file: str 90 | Path of the file to analyze. 91 | - benchmarks: dict 92 | Contains the different micro benchmarks. Should be empty. 93 | - store_pdgs: str 94 | Path of the folder to store the PDG in. 95 | Or None to pursue without storing it. 96 | - check_var: bool 97 | Returns the unknown variables (not the PDG). 98 | - save_path_ast / cfg / pdg: 99 | False --> does neither produce nor store the graphical representation; 100 | None --> produces + displays the graphical representation; 101 | Valid-path --> produces + stores the graphical representation under the name Valid-path. 102 | - beautiful_print: bool 103 | Whether to beautiful print the AST or not. 104 | - check_json: bool 105 | Builds the JS code from the AST, or not, to check for bugs in the AST building process. 106 | 107 | ------- 108 | Returns: 109 | - Node 110 | PDG of the file. 111 | - or None if problems to build the PDG. 112 | - or list of unknown variables if check_var is True. 113 | """ 114 | 115 | start = timeit.default_timer() 116 | utility_df.limit_memory(20*10**9) # Limiting the memory usage to 20GB 117 | if input_file.endswith('.js'): 118 | esprima_json = input_file.replace('.js', '.json') 119 | else: 120 | esprima_json = input_file + '.json' 121 | 122 | if alt_json_path is not None: 123 | if not os.path.exists(alt_json_path): 124 | os.mkdir(alt_json_path) 125 | esprima_json = os.path.join(alt_json_path, esprima_json[1:]) 126 | extended_ast = build_ast.get_extended_ast(input_file, esprima_json) 127 | 128 | benchmarks['errors'] = [] 129 | 130 | if extended_ast is not None: 131 | benchmarks['got AST'] = timeit.default_timer() - start 132 | start = utility_df.micro_benchmark('Successfully got Esprima AST in', 133 | timeit.default_timer() - start) 134 | ast = extended_ast.get_ast() 135 | if beautiful_print: 136 | build_ast.beautiful_print_ast(ast, delete_leaf=[]) 137 | ast_nodes = build_ast.ast_to_ast_nodes(ast, ast_nodes=_node.Node('Program')) 138 | function_hoisting(ast_nodes, ast_nodes) # Hoists FunDecl at a basic block's beginning 139 | 140 | benchmarks['AST'] = timeit.default_timer() - start 141 | start = utility_df.micro_benchmark('Successfully produced the AST in', 142 | timeit.default_timer() - start) 143 | if save_path_ast is not False: 144 | display_graph.draw_ast(ast_nodes, attributes=True, save_path=save_path_ast) 145 | 146 | cfg_nodes = control_flow.control_flow(ast_nodes) 147 | benchmarks['CFG'] = timeit.default_timer() - start 148 | start = utility_df.micro_benchmark('Successfully produced the CFG in', 149 | timeit.default_timer() - start) 150 | if save_path_cfg is not False: 151 | display_graph.draw_cfg(cfg_nodes, attributes=True, save_path=save_path_cfg) 152 | 153 | unknown_var = [] 154 | try: 155 | with utility_df.Timeout(600): # Tries to produce DF within 10 minutes 156 | scopes = [_scope.Scope('Global')] 157 | dfg_nodes, scopes = data_flow.df_scoping(cfg_nodes, 
scopes=scopes, 158 | id_list=[], entry=1) 159 | # This may have to be added if we want to make the fake hoisting work 160 | # dfg_nodes = data_flow.df_scoping(dfg_nodes, scopes=scopes, id_list=[], entry=1)[0] 161 | except utility_df.Timeout.Timeout: 162 | logging.critical('Building the PDG timed out for %s', input_file) 163 | benchmarks['errors'].append('pdg-timeout') 164 | return _node.Node('Program') # Empty PDG to avoid trying to get the children of None 165 | 166 | # except MemoryError: # Catching it will catch ALL memory errors, 167 | # while we just want to avoid getting over our 20GB limit 168 | # logging.critical('Too much memory used for %s', input_file) 169 | # return _node.Node('Program') # Empty PDG to avoid trying to get the children of None 170 | 171 | benchmarks['PDG'] = timeit.default_timer() - start 172 | utility_df.micro_benchmark('Successfully produced the PDG in', 173 | timeit.default_timer() - start) 174 | if save_path_pdg is not False: 175 | display_graph.draw_pdg(dfg_nodes, attributes=True, save_path=save_path_pdg) 176 | 177 | if check_json: # Looking for possible bugs when building the AST / json doc in build_ast 178 | my_json = esprima_json.replace('.json', '-back.json') 179 | build_ast.save_json(dfg_nodes, my_json) 180 | print(build_ast.get_code(my_json)) 181 | 182 | if check_var: 183 | for scope in scopes: 184 | for unknown in scope.unknown_var: 185 | if not unknown.data_dep_parents: 186 | # If DD: not unknown, can happen because of hoisting FunctionDeclaration 187 | # After second function run, not unknown anymore 188 | logging.warning('The variable %s is not declared in the scope %s', 189 | unknown.attributes['name'], scope.name) 190 | unknown_var.append(unknown) 191 | return unknown_var 192 | 193 | if store_pdgs is not None: 194 | store_pdg = os.path.join(store_pdgs, os.path.basename(input_file.replace('.js', ''))) 195 | pickle_dump_process(dfg_nodes, store_pdg) 196 | json_analysis = os.path.join(store_pdgs, os.path.basename(esprima_json)) 197 | with open(json_analysis, 'w') as json_data: 198 | json.dump(benchmarks, json_data, indent=4, sort_keys=False, default=default, 199 | skipkeys=True) 200 | return dfg_nodes 201 | benchmarks['errors'].append('parsing-error') 202 | return _node.Node('ParsingError') # Empty PDG to avoid trying to get the children of None 203 | 204 | 205 | def default(o): 206 | """ To avoid TypeError, conversion of problematic objects into str. """ 207 | 208 | return str(o) 209 | 210 | 211 | def handle_one_pdg(root, js, store_pdgs): 212 | """ Stores the PDG of js located in root, in store_pdgs. 
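    Returns False if js_path does not exist or if the child process building the PDG fails,
    True otherwise.
    Illustrative usage (the paths are hypothetical):
        handle_one_pdg('/data/ext', 'background.js', '/data/ext/PDG')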
""" 213 | 214 | benchmarks = dict() 215 | if js.endswith('.js'): 216 | print(os.path.join(store_pdgs, js.replace('.js', ''))) 217 | js_path = os.path.join(root, js) 218 | if not os.path.isfile(js_path): 219 | logging.error('The path %s does not exist', js_path) 220 | return False 221 | # Some PDGs lead to Segfault, avoids killing the current process 222 | p = Process(target=get_data_flow_process, args=(js_path, benchmarks, store_pdgs)) 223 | p.start() 224 | p.join() 225 | if p.exitcode != 0: 226 | logging.critical('Something wrong occurred with %s PDG generation', js_path) 227 | return False 228 | return True 229 | 230 | 231 | def worker(my_queue): 232 | """ Worker """ 233 | 234 | while True: 235 | try: 236 | root, js, store_pdgs = my_queue.get(timeout=2) 237 | handle_one_pdg(root, js, store_pdgs) 238 | except Exception as e: 239 | logging.exception(e) 240 | break 241 | 242 | 243 | def store_pdg_folder(folder_js): 244 | """ 245 | Stores the PDGs of the JS files from folder_js. 246 | 247 | ------- 248 | Parameter: 249 | - folder_js: str 250 | Path of the folder containing the files to get the PDG of. 251 | """ 252 | 253 | start = timeit.default_timer() 254 | 255 | my_queue = Queue() 256 | workers = list() 257 | 258 | if not os.path.exists(folder_js): 259 | logging.exception('The path %s does not exist', folder_js) 260 | return 261 | store_pdgs = os.path.join(folder_js, 'PDG') 262 | if not os.path.exists(store_pdgs): 263 | os.makedirs(store_pdgs) 264 | 265 | for root, _, files in os.walk(folder_js): 266 | for js in files: 267 | my_queue.put([root, js, store_pdgs]) 268 | 269 | for _ in range(utility_df.NUM_WORKERS): 270 | p = Process(target=worker, args=(my_queue,)) 271 | p.start() 272 | print("Starting process") 273 | workers.append(p) 274 | 275 | for w in workers: 276 | w.join() 277 | 278 | utility_df.micro_benchmark('Total elapsed time:', timeit.default_timer() - start) 279 | 280 | 281 | def store_extension_pdg_folder(extensions_path): 282 | """ Stores the PDGs of all JS files contained in all extensions_path's folders. 
TO CALL""" 283 | 284 | start = timeit.default_timer() 285 | 286 | my_queue = Queue() 287 | workers = list() 288 | 289 | for extension_folder in os.listdir(extensions_path): 290 | extension_path = os.path.join(extensions_path, extension_folder) 291 | if os.path.isdir(extension_path): 292 | extension_pdg_path = os.path.join(extension_path, 'PDG') 293 | if not os.path.exists(extension_pdg_path): 294 | os.makedirs(extension_pdg_path) 295 | for component in os.listdir(extension_path): 296 | # To handle only files not handled yet 297 | # if not os.path.isfile(os.path.join(extension_pdg_path, 298 | # os.path.basename(component).replace('.js', 299 | # ''))): 300 | my_queue.put([extension_path, component, extension_pdg_path]) 301 | 302 | for _ in range(utility_df.NUM_WORKERS): 303 | p = Process(target=worker, args=(my_queue,)) 304 | p.start() 305 | print("Starting process") 306 | workers.append(p) 307 | 308 | for w in workers: 309 | w.join() 310 | 311 | utility_df.micro_benchmark('Total elapsed time:', timeit.default_timer() - start) 312 | -------------------------------------------------------------------------------- /pdg_js/control_flow.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2021 Aurore Fass 2 | # 3 | # This program is free software: you can redistribute it and/or modify 4 | # it under the terms of the GNU Affero General Public License as published 5 | # by the Free Software Foundation, either version 3 of the License, or 6 | # (at your option) any later version. 7 | # 8 | # This program is distributed in the hope that it will be useful, 9 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 10 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 11 | # GNU Affero General Public License for more details. 12 | # 13 | # You should have received a copy of the GNU Affero General Public License 14 | # along with this program. If not, see . 15 | 16 | 17 | """ 18 | Adds control flow to the AST. 19 | """ 20 | 21 | # Note: slightly improved from HideNoSeek 22 | 23 | 24 | from . import node as _node 25 | 26 | 27 | def link_expression(node, node_parent): 28 | """ Non-statement node. """ 29 | if node.is_comment(): 30 | pass 31 | else: 32 | node_parent.set_statement_dependency(extremity=node) 33 | return node 34 | 35 | 36 | def epsilon_statement_cf(node): 37 | """ Non-conditional statements. """ 38 | for child in node.children: 39 | if isinstance(child, _node.Statement): 40 | node.set_control_dependency(extremity=child, label='e') 41 | else: 42 | link_expression(node=child, node_parent=node) 43 | 44 | 45 | def do_while_cf(node): 46 | """ DoWhileStatement. """ 47 | # Element 0: body (Statement) 48 | # Element 1: test (Expression) 49 | node.set_control_dependency(extremity=node.children[0], label=True) 50 | link_expression(node=node.children[1], node_parent=node) 51 | 52 | 53 | def for_cf(node): 54 | """ ForStatement. """ 55 | # Element 0: init 56 | # Element 1: test (Expression) 57 | # Element 2: update (Expression) 58 | # Element 3: body (Statement) 59 | """ ForOfStatement. """ 60 | # Element 0: left 61 | # Element 1: right 62 | # Element 2: body (Statement) 63 | i = 0 64 | for child in node.children: 65 | if child.body != 'body': 66 | link_expression(node=child, node_parent=node) 67 | elif not child.is_comment(): 68 | node.set_control_dependency(extremity=child, label=True) 69 | i += 1 70 | 71 | 72 | def if_cf(node): 73 | """ IfStatement. 
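    The test expression is linked as a statement dependency; the consequent gets a control
    dependency labeled True and the alternate (if present) one labeled False.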
""" 74 | # Element 0: test (Expression) 75 | # Element 1: consequent (Statement) 76 | # Element 2: alternate (Statement) 77 | link_expression(node=node.children[0], node_parent=node) 78 | if len(node.children) > 1: # Not sure why, but can happen... 79 | node.set_control_dependency(extremity=node.children[1], label=True) 80 | if len(node.children) > 2: 81 | if node.children[2].is_comment(): 82 | pass 83 | else: 84 | node.set_control_dependency(extremity=node.children[2], label=False) 85 | 86 | 87 | def try_cf(node): 88 | """ TryStatement. """ 89 | # Element 0: block (Statement) 90 | # Element 1: handler (Statement) / finalizer (Statement) 91 | # Element 2: finalizer (Statement) 92 | node.set_control_dependency(extremity=node.children[0], label=True) 93 | if node.children[1].body == 'handler': 94 | node.set_control_dependency(extremity=node.children[1], label=False) 95 | else: # finalizer 96 | node.set_control_dependency(extremity=node.children[1], label='e') 97 | if len(node.children) > 2: 98 | if node.children[2].body == 'finalizer': 99 | node.set_control_dependency(extremity=node.children[2], label='e') 100 | 101 | 102 | def while_cf(node): 103 | """ WhileStatement. """ 104 | # Element 0: test (Expression) 105 | # Element 1: body (Statement) 106 | link_expression(node=node.children[0], node_parent=node) 107 | node.set_control_dependency(extremity=node.children[1], label=True) 108 | 109 | 110 | def switch_cf(node): 111 | """ SwitchStatement. """ 112 | # Element 0: discriminant 113 | # Element 1: cases (SwitchCase) 114 | 115 | switch_cases = node.children 116 | link_expression(node=switch_cases[0], node_parent=node) 117 | if len(switch_cases) > 1: 118 | # SwitchStatement -> True -> SwitchCase for first one 119 | node.set_control_dependency(extremity=switch_cases[1], label='e') 120 | switch_case_cf(switch_cases[1]) 121 | for i in range(2, len(switch_cases)): 122 | if switch_cases[i].is_comment(): 123 | pass 124 | else: 125 | # SwitchCase -> False -> SwitchCase for the other ones 126 | switch_cases[i - 1].set_control_dependency(extremity=switch_cases[i], label=False) 127 | if i != len(switch_cases) - 1: 128 | switch_case_cf(switch_cases[i]) 129 | else: # Because the last switch is executed per default, i.e. without condition 1st 130 | switch_case_cf(switch_cases[i], last=True) 131 | # Otherwise, we could just have a switch(something) {} 132 | 133 | 134 | def switch_case_cf(node, last=False): 135 | """ SwitchCase. """ 136 | # Element 0: test 137 | # Element 1: consequent (Statement) 138 | nb_child = len(node.children) 139 | if nb_child > 1: 140 | if not last: # As all switches but the last have to respect a condition to enter the branch 141 | link_expression(node=node.children[0], node_parent=node) 142 | j = 1 143 | else: 144 | j = 0 145 | for i in range(j, nb_child): 146 | if node.children[i].is_comment(): 147 | pass 148 | else: 149 | node.set_control_dependency(extremity=node.children[i], label=True) 150 | elif nb_child == 1: 151 | node.set_control_dependency(extremity=node.children[0], label=True) 152 | 153 | 154 | def conditional_statement_cf(node): 155 | """ For the conditional nodes. 
""" 156 | if node.name == 'DoWhileStatement': 157 | do_while_cf(node) 158 | elif node.name == 'ForStatement' or node.name == 'ForOfStatement'\ 159 | or node.name == 'ForInStatement': 160 | for_cf(node) 161 | elif node.name == 'IfStatement' or node.name == 'ConditionalExpression': 162 | if_cf(node) 163 | elif node.name == 'WhileStatement': 164 | while_cf(node) 165 | elif node.name == 'TryStatement': 166 | try_cf(node) 167 | elif node.name == 'SwitchStatement': 168 | switch_cf(node) 169 | elif node.name == 'SwitchCase': 170 | pass # Already handled in SwitchStatement 171 | 172 | 173 | def control_flow(ast_nodes): 174 | """ 175 | Enhance the AST by adding statement and control dependencies to each Node. 176 | 177 | ------- 178 | Parameters: 179 | - ast_nodes: Node 180 | Output of ast_to_ast_nodes(, ast_nodes=Node('Program')). 181 | 182 | ------- 183 | Returns: 184 | - Node 185 | With statement and control dependencies added. 186 | """ 187 | 188 | for child in ast_nodes.children: 189 | if child.name in _node.EPSILON or child.name in _node.UNSTRUCTURED: 190 | epsilon_statement_cf(child) 191 | elif child.name in _node.CONDITIONAL: 192 | conditional_statement_cf(child) 193 | else: 194 | for grandchild in child.children: 195 | link_expression(node=grandchild, node_parent=child) 196 | control_flow(child) 197 | return ast_nodes 198 | -------------------------------------------------------------------------------- /pdg_js/display_graph.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2021 Aurore Fass 2 | # 3 | # This program is free software: you can redistribute it and/or modify 4 | # it under the terms of the GNU Affero General Public License as published 5 | # by the Free Software Foundation, either version 3 of the License, or 6 | # (at your option) any later version. 7 | # 8 | # This program is distributed in the hope that it will be useful, 9 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 10 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 11 | # GNU Affero General Public License for more details. 12 | # 13 | # You should have received a copy of the GNU Affero General Public License 14 | # along with this program. If not, see . 15 | 16 | # Additional permission under GNU GPL version 3 section 7 17 | # 18 | # If you modify this Program, or any covered work, by linking or combining it with 19 | # graphviz (or a modified version of that library), containing parts covered by the 20 | # terms of The Common Public License, the licensors of this Program grant you 21 | # additional permission to convey the resulting work. 22 | 23 | 24 | """ 25 | Display graphs (AST, CFG, PDG) using the graphviz library. 26 | """ 27 | 28 | import graphviz 29 | 30 | from . import node as _node 31 | 32 | 33 | def append_leaf_attr(node, graph): 34 | """ 35 | Append the leaf's attribute to the graph in graphviz format. 36 | 37 | ------- 38 | Parameters: 39 | - node: Node 40 | Node. 41 | - graph: Digraph/Graph 42 | Graph object. Be careful it is mutable. 
43 | """ 44 | 45 | if node.is_leaf(): 46 | leaf_id = str(node.id) + 'leaf_' 47 | graph.attr('node', style='filled', color='lightgoldenrodyellow', 48 | fillcolor='lightgoldenrodyellow') 49 | graph.attr('edge', color='orange') 50 | got_attr, node_attributes = node.get_node_attributes() 51 | if got_attr: # Got attributes 52 | leaf_attr = str(node_attributes) 53 | graph.node(leaf_id, leaf_attr) 54 | graph.edge(str(node.id), leaf_id) 55 | 56 | 57 | def produce_ast(ast_nodes, attributes, graph=graphviz.Graph(comment='AST representation')): 58 | """ 59 | Produce an AST in graphviz format. 60 | 61 | ------- 62 | Parameters: 63 | - ast_nodes: Node 64 | Output of ast_to_ast_nodes(, ast_nodes=Node('Program')). 65 | - graph: Graph 66 | Graph object. Be careful it is mutable. 67 | - attributes: bool 68 | Whether to display the leaf attributes or not. 69 | 70 | ------- 71 | Returns: 72 | - graph 73 | graphviz formatted graph. 74 | """ 75 | 76 | graph.attr('node', color='black', style='filled', fillcolor='white') 77 | graph.attr('edge', color='black') 78 | graph.node(str(ast_nodes.id), ast_nodes.name) 79 | for child in ast_nodes.children: 80 | graph.attr('node', color='black', style='filled', fillcolor='white') 81 | graph.attr('edge', color='black') 82 | graph.edge(str(ast_nodes.id), str(child.id)) 83 | produce_ast(child, attributes, graph) 84 | if attributes: 85 | append_leaf_attr(child, graph) 86 | return graph 87 | 88 | 89 | def draw_ast(ast_nodes, attributes=False, save_path=None): 90 | """ 91 | Plot an AST. 92 | 93 | ------- 94 | Parameters: 95 | - ast_nodes: Node 96 | Output of ast_to_ast_nodes(, ast_nodes=Node('Program')). 97 | - save_path: str 98 | Path of the file to store the AST in. 99 | - attributes: bool 100 | Whether to display the leaf attributes or not. Default: False. 101 | """ 102 | 103 | dot = produce_ast(ast_nodes, attributes) 104 | if save_path is None: 105 | dot.view() 106 | else: 107 | dot.render(save_path, view=False) 108 | graphviz.render(filepath=save_path, engine='dot', format='eps') 109 | dot.clear() 110 | 111 | 112 | def cfg_type_node(child): 113 | """ Different form according to statement node or not. """ 114 | 115 | if isinstance(child, _node.Statement) or child.is_comment(): 116 | return ['box', 'red', 'lightpink'] 117 | return ['ellipse', 'blue', 'lightblue2'] 118 | 119 | 120 | def produce_cfg_one_child(child, data_flow, attributes, 121 | graph=graphviz.Digraph(comment='Control flow representation')): 122 | """ 123 | Produce a CFG in graphviz format. 124 | 125 | ------- 126 | Parameters: 127 | - child: Node 128 | Node to begin with. 129 | - data_flow: bool 130 | Whether to display the data flow or not. Default: False. 131 | - attributes: bool 132 | Whether to display the leaf attributes or not. 133 | - graph: Digraph 134 | Graph object. Be careful it is mutable. 135 | 136 | ------- 137 | Returns: 138 | - graph 139 | graphviz formatted graph. 
140 | """ 141 | 142 | type_node = cfg_type_node(child) 143 | graph.attr('node', shape=type_node[0], style='filled', color=type_node[2], 144 | fillcolor=type_node[2]) 145 | graph.attr('edge', color=type_node[1]) 146 | graph.node(str(child.id), child.name) 147 | 148 | for child_statement_dep in child.statement_dep_children: 149 | child_statement = child_statement_dep.extremity 150 | type_node = cfg_type_node(child_statement) 151 | graph.attr('node', shape=type_node[0], color=type_node[2], fillcolor=type_node[2]) 152 | graph.attr('edge', color=type_node[1]) 153 | graph.edge(str(child.id), str(child_statement.id), label=child_statement_dep.label) 154 | produce_cfg_one_child(child_statement, data_flow=data_flow, attributes=attributes, 155 | graph=graph) 156 | if attributes: 157 | append_leaf_attr(child_statement, graph) 158 | 159 | if isinstance(child, _node.Statement): 160 | for child_cf_dep in child.control_dep_children: 161 | child_cf = child_cf_dep.extremity 162 | type_node = cfg_type_node(child_cf) 163 | graph.attr('node', shape=type_node[0], color=type_node[2], fillcolor=type_node[2]) 164 | graph.attr('edge', color=type_node[1]) 165 | graph.edge(str(child.id), str(child_cf.id), label=str(child_cf_dep.label)) 166 | produce_cfg_one_child(child_cf, data_flow=data_flow, attributes=attributes, graph=graph) 167 | if attributes: 168 | append_leaf_attr(child_cf, graph) 169 | 170 | if data_flow: 171 | graph.attr('edge', color='green') 172 | if isinstance(child, _node.Identifier): 173 | for child_data_dep in child.data_dep_children: 174 | child_data = child_data_dep.extremity 175 | type_node = cfg_type_node(child) 176 | graph.attr('node', shape=type_node[0], color=type_node[2], fillcolor=type_node[2]) 177 | graph.edge(str(child.id), str(child_data.id), label=child_data_dep.label) 178 | # No call to the func as already recursive for data/statmt dep on the same nodes 179 | # logging.info("Data dependency on the variable " + child_data.attributes['name']) 180 | graph.attr('edge', color='seagreen') 181 | if hasattr(child, 'fun_param_parents'): # Function parameters flow 182 | for child_param in child.fun_param_parents: 183 | type_node = cfg_type_node(child) 184 | graph.attr('node', shape=type_node[0], color=type_node[2], fillcolor=type_node[2]) 185 | graph.edge(str(child.id), str(child_param.id), label='param') 186 | 187 | return graph 188 | 189 | 190 | def draw_cfg(cfg_nodes, attributes=False, save_path=None): 191 | """ 192 | Plot a CFG. 193 | 194 | ------- 195 | Parameters: 196 | - cfg_nodes: Node 197 | Output of produce_cfg(ast_to_ast_nodes(, ast_nodes=Node('Program'))). 198 | - save_path: str 199 | Path of the file to store the CFG in. 200 | - attributes: bool 201 | Whether to display the leaf attributes or not. Default: False. 202 | """ 203 | 204 | dot = graphviz.Digraph() 205 | for child in cfg_nodes.children: 206 | dot = produce_cfg_one_child(child=child, data_flow=False, attributes=attributes) 207 | if save_path is None: 208 | dot.view() 209 | else: 210 | dot.render(save_path, view=False) 211 | graphviz.render(filepath=save_path, engine='dot', format='eps') 212 | dot.clear() 213 | 214 | 215 | def draw_pdg(dfg_nodes, attributes=False, save_path=None): 216 | """ 217 | Plot a PDG. 218 | 219 | ------- 220 | Parameters: 221 | - dfg_nodes: Node 222 | Output of produce_dfg(produce_cfg(ast_to_ast_nodes(, ast_nodes=Node('Program')))). 223 | - save_path: str 224 | Path of the file to store the PDG in. 225 | - attributes: bool 226 | Whether to display the leaf attributes or not. Default: False. 
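    Illustrative usage (a minimal sketch; the file path is hypothetical and graphviz must be
    available):
        from pdg_js import build_pdg, display_graph
        pdg = build_pdg.get_data_flow('example.js', benchmarks=dict())
        display_graph.draw_pdg(pdg, attributes=True, save_path='example-pdg')
    With save_path=None the graph is opened in a viewer instead of being rendered to disk.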
227 | """ 228 | 229 | dot = graphviz.Digraph() 230 | for child in dfg_nodes.children: 231 | dot = produce_cfg_one_child(child=child, data_flow=True, attributes=attributes) 232 | if save_path is None: 233 | dot.view() 234 | else: 235 | dot.render(save_path, view=False) 236 | graphviz.render(filepath=save_path, engine='dot', format='eps') 237 | dot.clear() 238 | -------------------------------------------------------------------------------- /pdg_js/extended_ast.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2021 Aurore Fass 2 | # 3 | # This program is free software: you can redistribute it and/or modify 4 | # it under the terms of the GNU Affero General Public License as published 5 | # by the Free Software Foundation, either version 3 of the License, or 6 | # (at your option) any later version. 7 | # 8 | # This program is distributed in the hope that it will be useful, 9 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 10 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 11 | # GNU Affero General Public License for more details. 12 | # 13 | # You should have received a copy of the GNU Affero General Public License 14 | # along with this program. If not, see . 15 | 16 | 17 | """ 18 | Definition of the class ExtendedAst: corresponds to the output of Esprima's parse function 19 | with the arguments: {range: true, loc: true, tokens: true, tolerant: true, comment: true}. 20 | """ 21 | 22 | # Note: slightly improved from HideNoSeek 23 | 24 | 25 | class ExtendedAst: 26 | """ Stores the Esprima formatted AST into python objects. """ 27 | 28 | def __init__(self): 29 | self.type = None 30 | self.filename = '' 31 | self.body = [] 32 | self.source_type = None 33 | self.range = [] 34 | self.comments = [] 35 | self.tokens = [] 36 | self.leading_comments = [] 37 | 38 | def get_type(self): 39 | return self.type 40 | 41 | def set_type(self, root): 42 | self.type = root 43 | 44 | def get_body(self): 45 | return self.body 46 | 47 | def set_body(self, body): 48 | self.body = body 49 | 50 | def get_extended_ast(self): 51 | return {'type': self.get_type(), 'body': self.get_body(), 52 | 'sourceType': self.get_source_type(), 'range': self.get_range(), 53 | 'comments': self.get_comments(), 'tokens': self.get_tokens(), 54 | 'filename': self.filename, 55 | 'leadingComments': self.get_leading_comments()} 56 | 57 | def get_ast(self): 58 | return {'type': self.get_type(), 'body': self.get_body(), 'filename': self.filename} 59 | 60 | def get_source_type(self): 61 | return self.source_type 62 | 63 | def set_source_type(self, source_type): 64 | self.source_type = source_type 65 | 66 | def get_range(self): 67 | return self.range 68 | 69 | def set_range(self, ast_range): 70 | self.range = ast_range 71 | 72 | def get_comments(self): 73 | return self.comments 74 | 75 | def set_comments(self, comments): 76 | self.comments = comments 77 | 78 | def get_tokens(self): 79 | return self.tokens 80 | 81 | def set_tokens(self, tokens): 82 | self.tokens = tokens 83 | 84 | def get_leading_comments(self): 85 | return self.leading_comments 86 | 87 | def set_leading_comments(self, leading_comments): 88 | self.leading_comments = leading_comments 89 | -------------------------------------------------------------------------------- /pdg_js/js_operators.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2021 Aurore Fass 2 | # 3 | # This program is free software: you can redistribute it and/or modify 4 
| # it under the terms of the GNU Affero General Public License as published 5 | # by the Free Software Foundation, either version 3 of the License, or 6 | # (at your option) any later version. 7 | # 8 | # This program is distributed in the hope that it will be useful, 9 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 10 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 11 | # GNU Affero General Public License for more details. 12 | # 13 | # You should have received a copy of the GNU Affero General Public License 14 | # along with this program. If not, see . 15 | 16 | 17 | """ 18 | Operators computation for pointer analysis; computing values of variables. 19 | """ 20 | 21 | import logging 22 | 23 | from . import node as _node 24 | 25 | """ 26 | In the following, 27 | - node: Node 28 | Current node. 29 | - initial_node: Node 30 | Node, which we leveraged to compute the value of node (for provenance purpose). 31 | """ 32 | 33 | 34 | def get_node_value(node, initial_node=None, recdepth=0, recvisited=None): 35 | """ Gets the value of node, depending on its type. """ 36 | 37 | if recvisited is None: 38 | recvisited = set() 39 | 40 | if isinstance(node, _node.ValueExpr): 41 | if node.value is not None: # Special case if node references a Node whose value changed 42 | return node.value 43 | 44 | got_attr, node_attributes = node.get_node_attributes() 45 | if got_attr: # Got attributes, returns the value 46 | return node_attributes 47 | 48 | logging.debug('Getting the value from %s', node.name) 49 | 50 | if node.name == 'UnaryExpression': 51 | return compute_unary_expression(node, initial_node=initial_node, 52 | recdepth=recdepth + 1, recvisited=recvisited) 53 | if node.name in ('BinaryExpression', 'LogicalExpression'): 54 | return compute_binary_expression(node, initial_node=initial_node, 55 | recdepth=recdepth + 1, recvisited=recvisited) 56 | if node.name == 'ArrayExpression': 57 | return node 58 | if node.name in ('ObjectExpression', 'ObjectPattern'): 59 | return node 60 | if node.name == 'MemberExpression': 61 | return compute_member_expression(node, initial_node=initial_node, 62 | recdepth=recdepth + 1, recvisited=recvisited) 63 | if node.name == 'ThisExpression': 64 | return 'this' 65 | if isinstance(node, _node.FunctionExpression): 66 | return compute_function_expression(node) 67 | if node.name == 'CallExpression' and isinstance(node.children[0], _node.FunctionExpression): 68 | return node.children[0].fun_name # Function called; mapping to the function name if any 69 | if node.name in _node.CALL_EXPR: 70 | return compute_call_expression(node, initial_node=initial_node, 71 | recdepth=recdepth + 1, recvisited=recvisited) 72 | if node.name == 'ReturnStatement' or node.name == 'BlockStatement': 73 | if node.children: 74 | return get_node_computed_value(node.children[0], initial_node=initial_node, 75 | recdepth=recdepth + 1, recvisited=recvisited) 76 | return None 77 | if node.name == 'TemplateLiteral': 78 | return compute_template_literal(node, initial_node=initial_node, 79 | recdepth=recdepth + 1, recvisited=recvisited) 80 | if node.name == 'ConditionalExpression': 81 | return compute_conditional_expression(node, initial_node=initial_node, 82 | recdepth=recdepth + 1, recvisited=recvisited) 83 | if node.name == 'AssignmentExpression': 84 | return compute_assignment_expression(node, initial_node=initial_node, 85 | recdepth=recdepth + 1, recvisited=recvisited) 86 | if node.name == 'UpdateExpression': 87 | return get_node_computed_value(node.children[0], 
initial_node=initial_node, 88 | recdepth=recdepth + 1, recvisited=recvisited) 89 | 90 | for child in node.children: 91 | get_node_computed_value(child, initial_node=initial_node, 92 | recdepth=recdepth + 1, recvisited=recvisited) 93 | 94 | logging.warning('Could not get the value of the node %s, whose attributes are %s', 95 | node.name, node.attributes) 96 | 97 | return None 98 | 99 | 100 | def get_node_computed_value(node, initial_node=None, keep_none=False, recdepth=0, recvisited=None): 101 | """ Computes the value of node, depending on its type. """ 102 | 103 | if recvisited is None: 104 | recvisited = set() 105 | 106 | logging.debug("Visiting node: %s", node.attributes) 107 | 108 | if node in recvisited: 109 | if isinstance(node, _node.Value): 110 | logging.debug("Revisiting node: %s %s (value: %s)", node.attributes, initial_node, 111 | node.value) 112 | return node.value 113 | logging.debug("Revisiting node: %s %s (none)", node.attributes, initial_node) 114 | return None 115 | recvisited.add(node) 116 | if recdepth > 1000: 117 | logging.debug("Recursion depth for get_node_computed_value exceeded: %s", node.attributes) 118 | if hasattr(node, "value"): 119 | return node.value 120 | return None 121 | 122 | value = None 123 | if isinstance(initial_node, _node.Value): 124 | logging.debug('%s is depending on %s', initial_node.attributes, node.attributes) 125 | initial_node.set_provenance(node) 126 | 127 | if isinstance(node, _node.Value): # if we already know the value 128 | value = node.value # might be directly a value (int/str) or a Node referring to the value 129 | logging.debug('Computing the value of an %s node, got %s', node.name, value) 130 | 131 | if isinstance(value, _node.Node): # node.value is a Node 132 | # computing actual value 133 | if node.value != node: 134 | value = get_node_computed_value(node.value, initial_node=initial_node, 135 | recdepth=recdepth + 1, recvisited=recvisited) 136 | logging.debug('Its value is a node, computed it and got %s', value) 137 | 138 | if value is None and not keep_none: # node is not an Identifier or is None 139 | # keep_none True is just for display_temp, to avoid having an Identifier variable with 140 | # None value being equal to the variable because of the call to get_node_value on itself 141 | value = get_node_value(node, initial_node=initial_node, 142 | recdepth=recdepth + 1, recvisited=recvisited) 143 | logging.debug('The value should be computed, got %s', value) 144 | 145 | if isinstance(node, _node.Value) and node.name not in _node.CALL_EXPR: 146 | # Do not store value for CallExpr as could have changed and should be recomputed 147 | node.set_value(value) # Stores the value so as not to compute it again 148 | 149 | return value 150 | 151 | 152 | def compute_operators(operator, node_a, node_b, initial_node=None, recdepth=0, recvisited=None): 153 | """ Evaluates node_a operator node_b. """ 154 | 155 | if isinstance(node_a, _node.Node): # Standard case 156 | if isinstance(node_a, _node.Identifier): 157 | # If it is an Identifier, it should have a value, possibly None. 158 | # But the value should not be the Identifier's name. 
159 | a = get_node_computed_value(node_a, initial_node=initial_node, keep_none=True, 160 | recdepth=recdepth + 1, recvisited=recvisited) 161 | else: 162 | a = get_node_computed_value(node_a, initial_node=initial_node, 163 | recdepth=recdepth + 1, recvisited=recvisited) 164 | else: # Specific to compute_binary_expression 165 | a = node_a # node_a may not be a Node but already a computed result 166 | if isinstance(node_b, _node.Node): # Standard case 167 | if isinstance(node_b, _node.Identifier): 168 | b = get_node_computed_value(node_b, initial_node=initial_node, keep_none=True, 169 | recdepth=recdepth + 1, recvisited=recvisited) 170 | else: 171 | b = get_node_computed_value(node_b, initial_node=initial_node, 172 | recdepth=recdepth + 1, recvisited=recvisited) 173 | else: # Specific to compute_binary_expression 174 | b = node_b # node_b may not be a Node but already a computed result 175 | 176 | if not isinstance(a, (int, float)) or not isinstance(b, (int, float)): 177 | if operator in ('+=', '+') and (isinstance(a, str) or isinstance(b, str)): 178 | return operator_plus(a, b) 179 | if a is None or b is None: 180 | return None 181 | if (not isinstance(a, str) or isinstance(a, str) and not '.' in a)\ 182 | and (not isinstance(b, str) or isinstance(b, str) and not '.' in b): 183 | # So that if MemExpr could not be fully computed we do not take any hasty decisions 184 | # For ex: data.message.split(-).1.toUpperCase() == POST is undecidable for us 185 | # But abc == abc is not 186 | pass 187 | else: 188 | logging.warning('Unable to compute %s %s %s', a, operator, b) 189 | return None 190 | 191 | try: 192 | if operator in ('+=', '+'): 193 | return operator_plus(a, b) 194 | if operator in ('-=', '-'): 195 | return operator_minus(a, b) 196 | if operator in ('*=', '*'): 197 | return operator_asterisk(a, b) 198 | if operator in ('/=', '/'): 199 | return operator_slash(a, b) 200 | if operator in ('**=', '**'): 201 | return operator_2asterisk(a, b) 202 | if operator in ('%=', '%'): 203 | return operator_modulo(a, b) 204 | if operator == '++': 205 | return operator_plus_plus(a) 206 | if operator == '--': 207 | return operator_minus_minus(a) 208 | if operator in ('==', '==='): 209 | return operator_equal(a, b) 210 | if operator in ('!=', '!=='): 211 | return operator_different(a, b) 212 | if operator == '!': 213 | return operator_not(a) 214 | if operator == '>=': 215 | return operator_bigger_equal(a, b) 216 | if operator == '>': 217 | return operator_bigger(a, b) 218 | if operator == '<=': 219 | return operator_smaller_equal(a, b) 220 | if operator == '<': 221 | return operator_smaller(a, b) 222 | if operator == '&&': 223 | return operator_and(a, b) 224 | if operator == '||': 225 | return operator_or(a, b) 226 | if operator in ('&', '>>', '>>>', '<<', '^', '|', '&=', '>>=', '>>>=', '<<=', '^=', '|=', 227 | 'in', 'instanceof'): 228 | logging.warning('Currently not handling the operator %s', operator) 229 | return None 230 | 231 | except TypeError: 232 | logging.warning('Type problem, could not compute %s %s %s', a, operator, b) 233 | return None 234 | 235 | logging.error('Unknown operator %s', operator) 236 | return None 237 | 238 | 239 | def compute_unary_expression(node, initial_node, recdepth=0, recvisited=None): 240 | """ Evaluates an UnaryExpression node. 
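    For example, !true yields False and -x yields the negated number when x is numeric; for a
    string operand, the unary operator is simply prepended to the computed value.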
""" 241 | 242 | compute_unary = get_node_computed_value(node.children[0], initial_node=initial_node, 243 | recdepth=recdepth + 1, recvisited=recvisited) 244 | if compute_unary is None: 245 | return None 246 | if isinstance(compute_unary, bool): 247 | return not compute_unary 248 | if isinstance(compute_unary, (int, float)): 249 | return - compute_unary 250 | if isinstance(compute_unary, str): # So as not to lose the current compute_unary value 251 | return node.attributes['operator'] + compute_unary # Adds the UnaryOp before value 252 | 253 | logging.warning('Could not compute the unary operation %s on %s', 254 | node.attributes['operator'], compute_unary) 255 | return None 256 | 257 | 258 | def compute_binary_expression(node, initial_node, recdepth=0, recvisited=None): 259 | """ Evaluates a BinaryExpression node. """ 260 | 261 | operator = node.attributes['operator'] 262 | node_a = node.children[0] 263 | node_b = node.children[1] 264 | 265 | # node_a operator node_b 266 | return compute_operators(operator, node_a, node_b, initial_node=initial_node, 267 | recdepth=recdepth, recvisited=recvisited) 268 | 269 | 270 | def compute_member_expression(node, initial_node, compute=True, recdepth=0, recvisited=None): 271 | """ Evaluates a MemberExpression node. """ 272 | 273 | obj = node.children[0] 274 | prop = node.children[1] 275 | prop_value = get_node_computed_value(prop, initial_node=initial_node, recdepth=recdepth + 1, 276 | recvisited=recvisited) # Computes the value 277 | obj_value = get_node_computed_value(obj, initial_node=initial_node, 278 | recdepth=recdepth + 1, recvisited=recvisited) 279 | if obj.name == 'ThisExpression' or obj_value in _node.GLOBAL_VAR: 280 | return prop_value 281 | 282 | if not isinstance(obj_value, _node.Node): 283 | # Specific case if we changed an Array/Object type 284 | # var my_array = [[1]]; my_array[0] = 18; e = my_array[0][0]; -> e undefined hence None 285 | # If ArrayExpression or ObjectExpression, we are trying to access an element that does not 286 | # exist anymore, will be displayed as .prop 287 | # Otherwise: obj.prop 288 | if isinstance(obj_value, list): # Special case for TaggedTemplateExpression 289 | if isinstance(prop_value, int): 290 | try: 291 | return obj_value[prop_value] # Params passed in obj_value, cf. 
data_flow 292 | except IndexError as e: 293 | logging.exception(e) 294 | logging.exception('Could not get the property %s of %s', prop_value, obj_value) 295 | return None 296 | elif isinstance(obj_value, dict): # Special case for already defined objects with new prop 297 | if prop_value in obj_value: 298 | return obj_value[prop_value] # ex: localStorage.firstTime 299 | return None 300 | return display_member_expression_value(node, '', initial_node=initial_node)[0:-1] 301 | 302 | # obj_value.prop_value or obj_value[prop_value] 303 | if obj_value.name == 'Literal' or obj_value.name == 'Identifier': 304 | member_expression_value = obj_value # We already have the value 305 | else: 306 | if isinstance(prop_value, str): # obj_value.prop_value -> prop_value str = object property 307 | obj_prop_list = [] 308 | search_object_property(obj_value, prop_value, obj_prop_list) 309 | if obj_prop_list: # Stores all matches 310 | member_expression_value = None 311 | for obj_prop in obj_prop_list: 312 | member_expression_value, worked = get_property_value(obj_prop, 313 | initial_node=initial_node, 314 | recdepth=recdepth + 1, 315 | recvisited=recvisited) 316 | if worked: # Takes the first one that is working 317 | break 318 | else: 319 | member_expression_value = None 320 | logging.warning('Could not get the property %s of the %s with value %s', 321 | prop_value, obj.name, obj_value) 322 | elif isinstance(prop_value, int): # obj_value[prop_value] -> prop_value int = array index 323 | if len(obj_value.children) > prop_value: 324 | member_expression_value = obj_value.children[prop_value] # We fetch the value 325 | else: 326 | member_expression_value = display_member_expression_value\ 327 | (node, '', initial_node=initial_node)[0:-1] 328 | else: 329 | logging.error('Expected an str or an int, got a %s', type(prop_value)) 330 | member_expression_value = None 331 | 332 | if compute and isinstance(member_expression_value, _node.Node): 333 | # Computes the value 334 | return get_node_computed_value(member_expression_value, initial_node=initial_node, 335 | recdepth=recdepth + 1) 336 | 337 | return member_expression_value # Returns the node referencing the value 338 | 339 | 340 | def search_object_property(node, prop, found_list): 341 | """ Search in an object definition where a given property (-> prop = str) is defined. 342 | Storing all the matches in case the first one is not the right one, e.g., 343 | var obj = { 344 | f1: function(a) {obj.f2(1)}, 345 | f2: function(a) {} 346 | }; 347 | obj.f2(); 348 | By looking for f2, the 1st match is wrong and will lead to an error, the 2nd one is correct.""" 349 | 350 | if 'name' in node.attributes: 351 | if isinstance(prop, str): 352 | if node.attributes['name'] == prop: 353 | # prop is already the value 354 | found_list.append(node) 355 | elif 'value' in node.attributes: 356 | if isinstance(prop, str): 357 | if node.attributes['value'] == prop: 358 | # prop is already the value 359 | found_list.append(node) 360 | 361 | for child in node.children: 362 | search_object_property(child, prop, found_list) 363 | 364 | 365 | def get_property_value(node, initial_node, recdepth=0, recvisited=None): 366 | """ Get the value of an object's property. 
""" 367 | 368 | if (isinstance(node, _node.Identifier) or node.name == 'Literal')\ 369 | and node.parent.name == 'Property': 370 | prop_value = node.parent.children[1] 371 | if prop_value.name == 'Literal': 372 | return prop_value, True 373 | return get_node_computed_value(prop_value, initial_node=initial_node, recdepth=recdepth + 1, 374 | recvisited=recvisited), True 375 | 376 | logging.warning('Trying to get the property value of %s whose parent is %s', 377 | node.name, node.parent.name) 378 | return None, False 379 | 380 | 381 | def compute_function_expression(node): 382 | """ Computes a (Arrow)FunctionExpression node. """ 383 | 384 | fun_name = node.fun_name 385 | if fun_name is not None: 386 | return fun_name # Mapping to the function's name if any 387 | return node # Otherwise mapping to the FunExpr handler 388 | 389 | 390 | def compute_call_expression(node, initial_node, recdepth=0, recvisited=None): 391 | """ Gets the value of CallExpression with parameters. """ 392 | 393 | if isinstance(initial_node, _node.Value): 394 | initial_node.set_provenance(node) 395 | 396 | callee = node.children[0] 397 | params = '(' 398 | 399 | for arg in range(1, len(node.children)): 400 | # Computes the value of the arguments: a.b...(arg1, arg2...) 401 | params += str(get_node_computed_value(node.children[arg], initial_node=initial_node, 402 | recdepth=recdepth + 1, recvisited=recvisited)) 403 | if arg < len(node.children) - 1: 404 | params += ', ' 405 | 406 | params += ')' 407 | 408 | if isinstance(callee, _node.Identifier): 409 | return str(get_node_computed_value(callee, initial_node=initial_node, 410 | recdepth=recdepth + 1, recvisited=recvisited)) + params 411 | 412 | if callee.name == 'MemberExpression': 413 | value = display_member_expression_value(callee, '', initial_node=initial_node) 414 | value = value[0:-1] + params 415 | return value 416 | # return compute_member_expression(callee) + params # To test if problems here 417 | 418 | if callee.name in _node.CALL_EXPR: 419 | if get_node_computed_value(callee, initial_node=initial_node, recdepth=recdepth + 1, 420 | recvisited=recvisited) is None or params is None: 421 | return None 422 | return get_node_computed_value(callee, initial_node=initial_node, 423 | recdepth=recdepth + 1, recvisited=recvisited) + params 424 | 425 | if callee.name == 'LogicalExpression': # a || b, if a not False a otherwise b 426 | if get_node_computed_value(callee.children[0], initial_node=initial_node, 427 | recdepth=recdepth + 1, recvisited=recvisited) is False: 428 | return get_node_computed_value(callee.children[1], initial_node=initial_node, 429 | recdepth=recdepth + 1, recvisited=recvisited) 430 | return get_node_computed_value(callee.children[0], initial_node=initial_node, 431 | recdepth=recdepth + 1, recvisited=recvisited) 432 | 433 | logging.error('Got a CallExpression on %s with attributes %s and id %s', 434 | callee.name, callee.attributes, callee.id) 435 | return None 436 | 437 | 438 | def compute_template_literal(node, initial_node, recdepth=0, recvisited=None): 439 | """ Gets the value of TemplateLiteral. """ 440 | 441 | template_element = [] # Seems that TemplateElement = similar to Literal and in front 442 | expressions = [] # vs. 
Expressions has to be computed and are at the end 443 | template_literal = '' 444 | 445 | for child in node.children: 446 | if child.name == 'TemplateElement': # Either that 447 | template_element.append(child) 448 | else: # Or Expressions 449 | expressions.append(child) 450 | 451 | len_template_element = len(template_element) 452 | len_expressions = len(expressions) 453 | 454 | if len_template_element != len_expressions + 1: 455 | logging.error('Unexpected %s with %s TemplateElements and %s Expressions', node.type, 456 | len_template_element, len_expressions) 457 | return None 458 | 459 | for i in range(len_expressions): 460 | # Will concatenate: 1 TemplateElement, 1 Expr, ..., 1 TemplateElement 461 | template_literal += str(get_node_computed_value(template_element[i], 462 | initial_node=initial_node, 463 | recdepth=recdepth + 1, 464 | recvisited=recvisited)) \ 465 | + str(get_node_computed_value(expressions[i], 466 | initial_node=initial_node, 467 | recdepth=recdepth + 1, 468 | recvisited=recvisited)) 469 | template_literal += str(get_node_computed_value(template_element[len_template_element - 1], 470 | initial_node=initial_node, 471 | recdepth=recdepth + 1, 472 | recvisited=recvisited)) 473 | 474 | return template_literal 475 | 476 | 477 | def display_member_expression_value(node, value, initial_node): 478 | """ Displays the value of elements from a MemberExpression. """ 479 | 480 | for child in node.children: 481 | if child.name == 'MemberExpression': 482 | value = display_member_expression_value(child, value, initial_node=initial_node) 483 | else: 484 | value += str(get_node_computed_value(child, initial_node=initial_node)) + '.' 485 | return value 486 | 487 | 488 | def compute_object_expr(node, initial_node): 489 | """ For debug: displays the content of an ObjectExpression. """ 490 | 491 | node_value = '{' 492 | 493 | for prop in node.children: 494 | key = prop.children[0] 495 | key_value = get_node_computed_value(key, initial_node=initial_node) 496 | value = prop.children[1] 497 | value_value = get_node_computed_value(value, initial_node=initial_node) 498 | 499 | prop_value = str(key_value) + ': ' + str(value_value) 500 | node_value += '\n\t' + prop_value 501 | 502 | node_value += '\n}' 503 | return node_value 504 | 505 | 506 | def compute_conditional_expression(node, initial_node, recdepth=0, recvisited=None): 507 | """ Gets the value of a ConditionalExpression. """ 508 | 509 | test = get_node_computed_value(node.children[0], initial_node=initial_node, 510 | recdepth=recdepth + 1, recvisited=recvisited) 511 | consequent = get_node_computed_value(node.children[1], initial_node=initial_node, 512 | recdepth=recdepth + 1, recvisited=recvisited) 513 | alternate = get_node_computed_value(node.children[2], initial_node=initial_node, 514 | recdepth=recdepth + 1, recvisited=recvisited) 515 | if not isinstance(test, bool): 516 | test = None # So that must be either True, False or None 517 | if test is None: 518 | return [alternate, consequent] 519 | if test: 520 | return consequent 521 | return alternate 522 | 523 | 524 | def compute_assignment_expression(node, initial_node, recdepth=0, recvisited=None): 525 | """ Computes the value of an AssignmentExpression node. 
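    E.g. for a chained assignment a = b = 42, the value of the inner AssignmentExpression
    (b = 42) is obtained from its left-hand side b, so a is computed knowing b.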
""" 526 | 527 | var = node.children[0] # Value coming from the right: a = b = value, computing a knowing b 528 | if isinstance(var, _node.Value) and var.value is not None: 529 | return var.value 530 | return get_node_computed_value(var, initial_node=initial_node, 531 | recdepth=recdepth + 1, recvisited=recvisited) 532 | 533 | 534 | def operator_plus(a, b): 535 | """ Evaluates a + b. """ 536 | if isinstance(a, str) or isinstance(b, str): 537 | return str(a) + str(b) 538 | return a + b 539 | 540 | 541 | def operator_minus(a, b): 542 | """ Evaluates a - b. """ 543 | return a - b 544 | 545 | 546 | def operator_asterisk(a, b): 547 | """ Evaluates a * b. """ 548 | return a * b 549 | 550 | 551 | def operator_slash(a, b): 552 | """ Evaluates a / b. """ 553 | if b == 0: 554 | logging.warning('Trying to compute %s / %s', a, b) 555 | return None 556 | return a / b 557 | 558 | 559 | def operator_2asterisk(a, b): 560 | """ Evaluates a ** b. """ 561 | return a ** b 562 | 563 | 564 | def operator_modulo(a, b): 565 | """ Evaluates a % b. """ 566 | return a % b 567 | 568 | 569 | def operator_plus_plus(a): 570 | """ Evaluates a++. """ 571 | return a + 1 572 | 573 | 574 | def operator_minus_minus(a): 575 | """ Evaluates a--. """ 576 | return a - 1 577 | 578 | 579 | def operator_equal(a, b): 580 | """ Evaluates a == b. """ 581 | return a == b 582 | 583 | 584 | def operator_different(a, b): 585 | """ Evaluates a != b. """ 586 | return a != b 587 | 588 | 589 | def operator_not(a): 590 | """ Evaluates !a. """ 591 | return not a 592 | 593 | 594 | def operator_bigger_equal(a, b): 595 | """ Evaluates a >= b. """ 596 | return a >= b 597 | 598 | 599 | def operator_bigger(a, b): 600 | """ Evaluates a > b. """ 601 | return a > b 602 | 603 | 604 | def operator_smaller_equal(a, b): 605 | """ Evaluates a <= b. """ 606 | return a <= b 607 | 608 | 609 | def operator_smaller(a, b): 610 | """ Evaluates a < b. """ 611 | return a < b 612 | 613 | 614 | def operator_and(a, b): 615 | """ Evaluates a and b. """ 616 | return a and b 617 | 618 | 619 | def operator_or(a, b): 620 | """ Evaluates a or b. """ 621 | return a or b 622 | -------------------------------------------------------------------------------- /pdg_js/js_reserved.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2021 Aurore Fass 2 | # 3 | # This program is free software: you can redistribute it and/or modify 4 | # it under the terms of the GNU Affero General Public License as published 5 | # by the Free Software Foundation, either version 3 of the License, or 6 | # (at your option) any later version. 7 | # 8 | # This program is distributed in the hope that it will be useful, 9 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 10 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 11 | # GNU Affero General Public License for more details. 12 | # 13 | # You should have received a copy of the GNU Affero General Public License 14 | # along with this program. If not, see . 15 | 16 | 17 | """ 18 | JavaScript reserved keywords or words known by the interpreter. 
19 | """ 20 | 21 | # Note: slightly improved from HideNoSeek (browser extension keywords) 22 | 23 | 24 | RESERVED_WORDS = ["abstract", "arguments", "await", "boolean", "break", "byte", "case", "catch", 25 | "char", "class", "const", "continue", "debugger", "default", "delete", "do", 26 | "double", "else", "enum", "eval", "export", "extends", "false", "final", 27 | "finally", "float", "for", "function", "goto", "if", "implements", "import", "in", 28 | "instanceof", "int", "interface", "let", "long", "native", "new", "null", 29 | "package", "private", "protected", "public", "return", "short", "static", "super", 30 | "switch", "synchronized", "this", "throw", "throws", "transient", "true", "try", 31 | "typeof", "var", "void", "volatile", "while", "with", "yield", "Array", 32 | "Date", "eval", "function", "hasOwnProperty", "Infinity", "isFinite", "isNaN", 33 | "isPrototypeOf", "length", "Math", "NaN", "name", "Number", "Object", "prototype", 34 | "String", "toString", "undefined", "valueOf", "getClass", "java", "JavaArray", 35 | "javaClass", "JavaObject", "JavaPackage", "alert", "all", "anchor", "anchors", 36 | "area", "assign", "blur", "button", "checkbox", "clearInterval", "clearTimeout", 37 | "clientInformation", "close", "closed", "confirm", "constructor", "crypto", 38 | "decodeURI", "decodeURIComponent", "defaultStatus", "document", "element", 39 | "elements", "embed", "embeds", "encodeURI", "encodeURIComponent", "escape", 40 | "event", "fileUpload", "focus", "form", "forms", "frame", "innerHeight", 41 | "innerWidth", "layer", "layers", "link", "location", "mimeTypes", "navigate", 42 | "navigator", "frames", "frameRate", "hidden", "history", "image", "images", 43 | "offscreenBuffering", "open", "opener", "option", "outerHeight", "outerWidth", 44 | "packages", "pageXOffset", "pageYOffset", "parent", "parseFloat", "parseInt", 45 | "password", "pkcs11", "plugin", "prompt", "propertyIsEnum", "radio", "reset", 46 | "screenX", "screenY", "scroll", "secure", "select", "self", "setInterval", 47 | "setTimeout", "status", "submit", "taint", "text", "textarea", "top", "unescape", 48 | "untaint", "window", "onblur", "onclick", "onerror", "onfocus", "onkeydown", 49 | "onkeypress", "onkeyup", "onmouseover", "onload", "onmouseup", "onmousedown", 50 | "onsubmit", 51 | "define", "exports", "require", "each", "ActiveXObject", "console", "module", 52 | "Error", "TypeError", "RangeError", "RegExp", "Symbol", "Set"] 53 | 54 | 55 | BROWSER_EXTENSIONS = ['addEventListener', 'browser', 'chrome', 'localStorage', 'postMessage', 56 | 'Promise', 'JSON', 'XMLHttpRequest', '$', 'screen', 'CryptoJS'] 57 | 58 | KNOWN_WORDS_LOWER = [word.lower() for word in RESERVED_WORDS + BROWSER_EXTENSIONS] 59 | -------------------------------------------------------------------------------- /pdg_js/node.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2021 Aurore Fass 2 | # 3 | # This program is free software: you can redistribute it and/or modify 4 | # it under the terms of the GNU Affero General Public License as published 5 | # by the Free Software Foundation, either version 3 of the License, or 6 | # (at your option) any later version. 7 | # 8 | # This program is distributed in the hope that it will be useful, 9 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 10 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 11 | # GNU Affero General Public License for more details. 
12 | # 13 | # You should have received a copy of the GNU Affero General Public License 14 | # along with this program. If not, see . 15 | 16 | 17 | """ 18 | Definition of classes: 19 | - Dependence; 20 | - Node; 21 | - Value; 22 | - Identifier(Node, Value); 23 | - ValueExpr(Node, Value); 24 | - Statement(Node); 25 | - ReturnStatement(Statement, Value); 26 | - Function; 27 | - FunctionDeclaration(Statement, Function); 28 | - FunctionExpression(Node, Function) 29 | """ 30 | 31 | # Note: going significantly beyond the node structure of HideNoSeek: 32 | # semantic information to the nodes, which have different properties, e.g., DF on Identifier, 33 | # parameter flows, value handling, provenance tracking, etc 34 | 35 | 36 | import logging 37 | import random 38 | 39 | from . import utility_df 40 | 41 | EXPRESSIONS = ['AssignmentExpression', 'ArrayExpression', 'ArrowFunctionExpression', 42 | 'AwaitExpression', 'BinaryExpression', 'CallExpression', 'ClassExpression', 43 | 'ConditionalExpression', 'FunctionExpression', 'LogicalExpression', 44 | 'MemberExpression', 'NewExpression', 'ObjectExpression', 'SequenceExpression', 45 | 'TaggedTemplateExpression', 'ThisExpression', 'UnaryExpression', 'UpdateExpression', 46 | 'YieldExpression'] 47 | 48 | EPSILON = ['BlockStatement', 'DebuggerStatement', 'EmptyStatement', 49 | 'ExpressionStatement', 'LabeledStatement', 'ReturnStatement', 50 | 'ThrowStatement', 'WithStatement', 'CatchClause', 'VariableDeclaration', 51 | 'FunctionDeclaration', 'ClassDeclaration'] 52 | 53 | CONDITIONAL = ['DoWhileStatement', 'ForStatement', 'ForOfStatement', 'ForInStatement', 54 | 'IfStatement', 'SwitchCase', 'SwitchStatement', 'TryStatement', 55 | 'WhileStatement', 'ConditionalExpression'] 56 | 57 | UNSTRUCTURED = ['BreakStatement', 'ContinueStatement'] 58 | 59 | STATEMENTS = EPSILON + CONDITIONAL + UNSTRUCTURED 60 | CALL_EXPR = ['CallExpression', 'TaggedTemplateExpression', 'NewExpression'] 61 | VALUE_EXPR = ['Literal', 'ArrayExpression', 'ObjectExpression', 'ObjectPattern'] + CALL_EXPR 62 | COMMENTS = ['Line', 'Block'] 63 | 64 | GLOBAL_VAR = ['window', 'this', 'self', 'top', 'global', 'that'] 65 | 66 | LIMIT_SIZE = utility_df.LIMIT_SIZE # To avoid list values with over 1,000 characters 67 | 68 | 69 | class Dependence: 70 | """ For control, data, comment, and statement dependencies. """ 71 | 72 | def __init__(self, dependency_type, extremity, label, nearest_statement=None): 73 | self.type = dependency_type 74 | self.extremity = extremity 75 | self.nearest_statement = nearest_statement 76 | self.label = label 77 | 78 | 79 | class Node: 80 | """ Defines a Node that is used in the AST. 
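    Each Node keeps the Esprima node type as its name, a per-process increasing id
    (started at a random offset to limit collisions between ASTs built in separate
    processes), its parent, an ordered list of children, the raw Esprima attributes
    and statement dependencies. Minimal construction sketch (illustrative only; real
    nodes are built from the Esprima JSON output):

        program = Node('Program')
        stmt = Node('ExpressionStatement', parent=program)
        program.set_child(stmt)
        assert stmt.is_leaf() and not program.is_leaf()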
""" 81 | 82 | id = random.randint(0, 2*32) # To limit id collision between 2 ASTs from separate processes 83 | 84 | def __init__(self, name, parent=None): 85 | self.name = name 86 | self.id = Node.id 87 | Node.id += 1 88 | self.filename = '' 89 | self.attributes = {} 90 | self.body = None 91 | self.body_list = False 92 | self.parent = parent 93 | self.children = [] 94 | self.statement_dep_parents = [] 95 | self.statement_dep_children = [] # Between Statement and their non-Statement descendants 96 | 97 | def is_leaf(self): 98 | return not self.children 99 | 100 | def set_attribute(self, attribute_type, node_attribute): 101 | self.attributes[attribute_type] = node_attribute 102 | 103 | def set_body(self, body): 104 | self.body = body 105 | 106 | def set_body_list(self, bool_body_list): 107 | self.body_list = bool_body_list 108 | 109 | def set_parent(self, parent): 110 | self.parent = parent 111 | 112 | def set_child(self, child): 113 | self.children.append(child) 114 | 115 | def adopt_child(self, step_daddy): # child = self changes parent 116 | old_parent = self.parent 117 | old_parent.children.remove(self) # Old parent does not point to the child anymore 118 | step_daddy.children.insert(0, self) # New parent points to the child 119 | self.set_parent(step_daddy) # The child points to its new parent 120 | 121 | def set_statement_dependency(self, extremity): 122 | self.statement_dep_children.append(Dependence('statement dependency', extremity, '')) 123 | extremity.statement_dep_parents.append(Dependence('statement dependency', self, '')) 124 | 125 | # def set_comment_dependency(self, extremity): 126 | # self.statement_dep_children.append(Dependence('comment dependency', extremity, 'c')) 127 | # extremity.statement_dep_parents.append(Dependence('comment dependency', self, 'c')) 128 | 129 | def is_comment(self): 130 | if self.name in COMMENTS: 131 | return True 132 | return False 133 | 134 | def get_node_attributes(self): 135 | """ Get the attributes regex, value or name of a node. """ 136 | node_attribute = self.attributes 137 | if 'regex' in node_attribute: 138 | regex = node_attribute['regex'] 139 | if isinstance(regex, dict) and 'pattern' in regex: 140 | return True, '/' + str(regex['pattern']) + '/' 141 | if 'value' in node_attribute: 142 | value = node_attribute['value'] 143 | if isinstance(value, dict) and 'raw' in value: 144 | return True, value['raw'] 145 | return True, node_attribute['value'] 146 | if 'name' in node_attribute: 147 | return True, node_attribute['name'] 148 | return False, None # Just None was a pb when used in get_node_value as value could be None 149 | 150 | def get_line(self): 151 | """ Gets the line number where a given node is defined. """ 152 | try: 153 | line_begin = self.attributes['loc']['start']['line'] 154 | line_end = self.attributes['loc']['end']['line'] 155 | return str(line_begin) + ' - ' + str(line_end) 156 | except KeyError: 157 | return None 158 | 159 | def get_file(self): 160 | parent = self 161 | while True: 162 | if parent is not None and parent.parent: 163 | parent = parent.parent 164 | else: 165 | break 166 | if parent is not None: 167 | if "filename" in parent.attributes: 168 | return parent.attributes["filename"] 169 | return '' 170 | 171 | 172 | def literal_type(literal_node): 173 | """ Gets the type of a Literal node. 
""" 174 | 175 | if 'value' in literal_node.attributes: 176 | literal = literal_node.attributes['value'] 177 | if isinstance(literal, str): 178 | return 'String' 179 | if isinstance(literal, int): 180 | return 'Int' 181 | if isinstance(literal, float): 182 | return 'Numeric' 183 | if isinstance(literal, bool): 184 | return 'Bool' 185 | if literal == 'null' or literal is None: 186 | return 'Null' 187 | if 'regex' in literal_node.attributes: 188 | return 'RegExp' 189 | logging.error('The literal %s has an unknown type', literal_node.attributes['raw']) 190 | return None 191 | 192 | 193 | def shorten_value_list(value_list, value_list_shortened, counter=0): 194 | """ When a value is a list, shorten it so that keep at most LIMIT_SIZE characters. """ 195 | 196 | for el in value_list: 197 | if isinstance(el, list): 198 | value_list_shortened.append([]) 199 | counter = shorten_value_list(el, value_list_shortened[-1], counter) 200 | if counter >= LIMIT_SIZE: 201 | return counter 202 | elif isinstance(el, str): 203 | counter += len(el) 204 | if counter < LIMIT_SIZE: 205 | value_list_shortened.append(el) 206 | else: 207 | counter += len(str(el)) 208 | if counter < LIMIT_SIZE: 209 | value_list_shortened.append(el) 210 | return counter 211 | 212 | 213 | def shorten_value_dict(value_dict, value_dict_shortened, counter=0, visited=None): 214 | """ When a value is a dict, shorten it so that keep at most LIMIT_SIZE characters. """ 215 | 216 | if visited is None: 217 | visited = set() 218 | if id(value_dict) in visited: 219 | return counter 220 | visited.add(id(value_dict)) 221 | 222 | for k, v in value_dict.items(): 223 | if isinstance(k, str): 224 | counter += len(k) 225 | if isinstance(v, list): 226 | value_dict_shortened[k] = [] 227 | counter = shorten_value_list(v, value_dict_shortened[k], counter) 228 | if counter >= LIMIT_SIZE: 229 | return counter 230 | elif isinstance(v, dict): 231 | value_dict_shortened[k] = {} 232 | if id(v) in visited: 233 | return counter 234 | counter = shorten_value_dict(v, value_dict_shortened[k], counter, visited) 235 | if counter >= LIMIT_SIZE: 236 | return counter 237 | elif isinstance(v, str): 238 | counter += len(v) 239 | if counter < LIMIT_SIZE: 240 | value_dict_shortened[k] = v 241 | else: 242 | counter += len(str(v)) 243 | if counter < LIMIT_SIZE: 244 | value_dict_shortened[k] = v 245 | return counter 246 | 247 | 248 | class Value: 249 | """ To store the value of a specific node. 
""" 250 | 251 | def __init__(self): 252 | self.value = None 253 | self.update_value = True 254 | self.provenance_children = [] 255 | self.provenance_parents = [] 256 | self.provenance_children_set = set() 257 | self.provenance_parents_set = set() 258 | self.seen_provenance = set() 259 | 260 | def set_value(self, value): 261 | if isinstance(value, list): # To shorten value if over LIMIT_SIZE characters 262 | value_shortened = [] 263 | counter = shorten_value_list(value, value_shortened) 264 | if counter >= LIMIT_SIZE: 265 | value = value_shortened 266 | logging.warning('Shortened the value of %s %s', self.name, self.attributes) 267 | elif isinstance(value, dict): # To shorten value if over LIMIT_SIZE characters 268 | value_shortened = {} 269 | counter = shorten_value_dict(value, value_shortened) 270 | if counter >= LIMIT_SIZE: 271 | value = value_shortened 272 | logging.warning('Shortened the value of %s %s', self.name, self.attributes) 273 | elif isinstance(value, str): # To shorten value if over LIMIT_SIZE characters 274 | value = value[:LIMIT_SIZE] 275 | self.value = value 276 | 277 | def set_update_value(self, update_value): 278 | self.update_value = update_value 279 | 280 | def set_provenance_dd(self, extremity): # Set Node provenance, set_data_dependency case 281 | # self is the origin of the DD while extremity is the destination of the DD 282 | if extremity.provenance_children: 283 | for child in extremity.provenance_children: 284 | if child not in self.provenance_children_set: 285 | self.provenance_children_set.add(child) 286 | self.provenance_children.append(child) 287 | else: 288 | if extremity not in self.provenance_children_set: 289 | self.provenance_children_set.add(extremity) 290 | self.provenance_children.append(extremity) 291 | if self.provenance_parents: 292 | for parent in self.provenance_parents: 293 | if parent not in extremity.provenance_parents_set: 294 | extremity.provenance_parents_set.add(parent) 295 | extremity.provenance_parents.append(parent) 296 | else: 297 | if self not in extremity.provenance_parents_set: 298 | extremity.provenance_parents_set.add(self) 299 | extremity.provenance_parents.append(self) 300 | 301 | def set_provenance(self, extremity): # Set Node provenance, computed value case 302 | """ 303 | a.b = c 304 | """ 305 | if extremity in self.seen_provenance: 306 | pass 307 | self.seen_provenance.add(extremity) 308 | # extremity was leveraged to compute the value of self 309 | if not isinstance(extremity, Node): # extremity is None: 310 | if self not in self.provenance_parents_set: 311 | self.provenance_parents_set.add(self) 312 | self.provenance_parents.append(self) 313 | elif isinstance(extremity, Value): 314 | if extremity.provenance_parents: 315 | for parent in extremity.provenance_parents: 316 | if parent not in self.provenance_parents_set: 317 | self.provenance_parents_set.add(parent) 318 | self.provenance_parents.append(parent) 319 | else: 320 | if extremity not in self.provenance_parents_set: 321 | self.provenance_parents_set.add(extremity) 322 | self.provenance_parents.append(extremity) 323 | if self.provenance_children: 324 | for child in self.provenance_children: 325 | if child not in extremity.provenance_children_set: 326 | extremity.provenance_children_set.add(child) 327 | extremity.provenance_children.append(child) 328 | else: 329 | if self not in extremity.provenance_children_set: 330 | extremity.provenance_children_set.add(self) 331 | extremity.provenance_children.append(self) 332 | elif isinstance(extremity, Node): # Otherwise very 
restrictive 333 | self.provenance_parents_set.add(extremity) 334 | self.provenance_parents.append(extremity) 335 | for extremity_child in extremity.children: # Not necessarily useful 336 | self.set_provenance(extremity_child) 337 | 338 | def set_provenance_rec(self, extremity): 339 | self.set_provenance(extremity) 340 | for child in extremity.children: 341 | self.set_provenance_rec(child) 342 | 343 | 344 | class Identifier(Node, Value): 345 | """ Identifier Nodes. DD is on Identifier nodes. """ 346 | 347 | def __init__(self, name, parent): 348 | Node.__init__(self, name, parent) 349 | Value.__init__(self) 350 | self.code = None 351 | self.fun = None 352 | self.data_dep_parents = [] 353 | self.data_dep_children = [] 354 | 355 | def set_code(self, code): 356 | self.code = code 357 | 358 | def set_fun(self, fun): # The Identifier node refers to a function ('s name) 359 | self.fun = fun 360 | 361 | def set_data_dependency(self, extremity, nearest_statement=None): 362 | if extremity not in [el.extremity for el in self.data_dep_children]: # Avoids duplicates 363 | self.data_dep_children.append(Dependence('data dependency', extremity, 'data', 364 | nearest_statement)) 365 | extremity.data_dep_parents.append(Dependence('data dependency', self, 'data', 366 | nearest_statement)) 367 | self.set_provenance_dd(extremity) # Stored provenance 368 | 369 | 370 | class ValueExpr(Node, Value): 371 | """ Nodes from VALUE_EXPR which therefore have a value that should be stored. """ 372 | 373 | def __init__(self, name, parent): 374 | Node.__init__(self, name, parent) 375 | Value.__init__(self) 376 | 377 | 378 | class Statement(Node): 379 | """ Statement Nodes, see STATEMENTS. """ 380 | 381 | def __init__(self, name, parent): 382 | Node.__init__(self, name, parent) 383 | self.control_dep_parents = [] 384 | self.control_dep_children = [] 385 | 386 | def set_control_dependency(self, extremity, label): 387 | self.control_dep_children.append(Dependence('control dependency', extremity, label)) 388 | try: 389 | extremity.control_dep_parents.append(Dependence('control dependency', self, label)) 390 | except AttributeError as e: 391 | logging.debug('Unable to build a CF to go up the tree: %s', e) 392 | 393 | def remove_control_dependency(self, extremity): 394 | for i, _ in enumerate(self.control_dep_children): 395 | elt = self.control_dep_children[i] 396 | if elt.extremity.id == extremity.id: 397 | del self.control_dep_children[i] 398 | try: 399 | del extremity.control_dep_parents[i] 400 | except AttributeError as e: 401 | logging.debug('No CF going up the tree to delete: %s', e) 402 | 403 | 404 | class ReturnStatement(Statement, Value): 405 | """ ReturnStatement Node. It is a Statement that also has the attributes of a Value. """ 406 | 407 | def __init__(self, name, parent): 408 | Statement.__init__(self, name, parent) 409 | Value.__init__(self) 410 | 411 | 412 | class Function: 413 | """ To store function related information. 
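    Keeps the function's name node (fun_name), its parameter nodes (fun_params), the
    return nodes collected so far (fun_return, of which only the last one is
    ultimately considered), whether the body is being traversed again (retraverse)
    and whether the function has been called. Illustrative sketch (assuming `decl`
    is a FunctionDeclaration node and `name_node` an Identifier node):

        decl.set_fun_name(name_node)  # name_node.fun now points back to decl
        decl.call_function()          # marks the function as called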
""" 414 | 415 | def __init__(self): 416 | self.fun_name = None 417 | self.fun_params = [] 418 | self.fun_return = [] 419 | self.retraverse = False # Indicates if we are traversing a given node again 420 | self.called = False 421 | 422 | def set_fun_name(self, fun_name): 423 | self.fun_name = fun_name 424 | fun_name.set_fun(self) # Identifier fun_name has a handler to the function declaration self 425 | 426 | def add_fun_param(self, fun_param): 427 | self.fun_params.append(fun_param) 428 | 429 | def add_fun_return(self, fun_return): 430 | # if fun_return.id not in [el.id for el in self.fun_return]: # Avoids duplicates 431 | # Duplicates are okay, because we only consider the last return value from the list 432 | return_id_list = [el.id for el in self.fun_return] 433 | if not return_id_list: 434 | self.fun_return.append(fun_return) 435 | elif fun_return.id != return_id_list[-1]: # Avoids duplicates if already considered one 436 | self.fun_return.append(fun_return) 437 | 438 | def set_retraverse(self): 439 | self.retraverse = True 440 | 441 | def call_function(self): 442 | self.called = True 443 | 444 | 445 | class FunctionDeclaration(Statement, Function): 446 | """ FunctionDeclaration Node. It is a Statement that also has the attributes of a Function. """ 447 | 448 | def __init__(self, name, parent): 449 | Statement.__init__(self, name, parent) 450 | Function.__init__(self) 451 | 452 | 453 | class FunctionExpression(Node, Function): 454 | """ FunctionExpression and ArrowFunctionExpression Nodes. Have the attributes of a Function. """ 455 | 456 | def __init__(self, name, parent): 457 | Node.__init__(self, name, parent) 458 | Function.__init__(self) 459 | self.fun_intern_name = None 460 | 461 | def set_fun_intern_name(self, fun_intern_name): 462 | self.fun_intern_name = fun_intern_name # Name used if FunExpr referenced inside itself 463 | fun_intern_name.set_fun(self) # fun_intern_name has a handler to the function declaration 464 | -------------------------------------------------------------------------------- /pdg_js/package-lock.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "pdg_js", 3 | "lockfileVersion": 2, 4 | "requires": true, 5 | "packages": { 6 | "": { 7 | "dependencies": { 8 | "escodegen": "^2.0.0", 9 | "esprima": "^4.0.1" 10 | } 11 | }, 12 | "node_modules/deep-is": { 13 | "version": "0.1.4", 14 | "resolved": "https://registry.npmjs.org/deep-is/-/deep-is-0.1.4.tgz", 15 | "integrity": "sha512-oIPzksmTg4/MriiaYGO+okXDT7ztn/w3Eptv/+gSIdMdKsJo0u4CfYNFJPy+4SKMuCqGw2wxnA+URMg3t8a/bQ==" 16 | }, 17 | "node_modules/escodegen": { 18 | "version": "2.0.0", 19 | "resolved": "https://registry.npmjs.org/escodegen/-/escodegen-2.0.0.tgz", 20 | "integrity": "sha512-mmHKys/C8BFUGI+MAWNcSYoORYLMdPzjrknd2Vc+bUsjN5bXcr8EhrNB+UTqfL1y3I9c4fw2ihgtMPQLBRiQxw==", 21 | "dependencies": { 22 | "esprima": "^4.0.1", 23 | "estraverse": "^5.2.0", 24 | "esutils": "^2.0.2", 25 | "optionator": "^0.8.1" 26 | }, 27 | "bin": { 28 | "escodegen": "bin/escodegen.js", 29 | "esgenerate": "bin/esgenerate.js" 30 | }, 31 | "engines": { 32 | "node": ">=6.0" 33 | }, 34 | "optionalDependencies": { 35 | "source-map": "~0.6.1" 36 | } 37 | }, 38 | "node_modules/esprima": { 39 | "version": "4.0.1", 40 | "resolved": "https://registry.npmjs.org/esprima/-/esprima-4.0.1.tgz", 41 | "integrity": "sha512-eGuFFw7Upda+g4p+QHvnW0RyTX/SVeJBDM/gCtMARO0cLuT2HcEKnTPvhjV6aGeqrCB/sbNop0Kszm0jsaWU4A==", 42 | "bin": { 43 | "esparse": "bin/esparse.js", 44 | "esvalidate": "bin/esvalidate.js" 45 | 
}, 46 | "engines": { 47 | "node": ">=4" 48 | } 49 | }, 50 | "node_modules/estraverse": { 51 | "version": "5.3.0", 52 | "resolved": "https://registry.npmjs.org/estraverse/-/estraverse-5.3.0.tgz", 53 | "integrity": "sha512-MMdARuVEQziNTeJD8DgMqmhwR11BRQ/cBP+pLtYdSTnf3MIO8fFeiINEbX36ZdNlfU/7A9f3gUw49B3oQsvwBA==", 54 | "engines": { 55 | "node": ">=4.0" 56 | } 57 | }, 58 | "node_modules/esutils": { 59 | "version": "2.0.3", 60 | "resolved": "https://registry.npmjs.org/esutils/-/esutils-2.0.3.tgz", 61 | "integrity": "sha512-kVscqXk4OCp68SZ0dkgEKVi6/8ij300KBWTJq32P/dYeWTSwK41WyTxalN1eRmA5Z9UU/LX9D7FWSmV9SAYx6g==", 62 | "engines": { 63 | "node": ">=0.10.0" 64 | } 65 | }, 66 | "node_modules/fast-levenshtein": { 67 | "version": "2.0.6", 68 | "resolved": "https://registry.npmjs.org/fast-levenshtein/-/fast-levenshtein-2.0.6.tgz", 69 | "integrity": "sha1-PYpcZog6FqMMqGQ+hR8Zuqd5eRc=" 70 | }, 71 | "node_modules/levn": { 72 | "version": "0.3.0", 73 | "resolved": "https://registry.npmjs.org/levn/-/levn-0.3.0.tgz", 74 | "integrity": "sha1-OwmSTt+fCDwEkP3UwLxEIeBHZO4=", 75 | "dependencies": { 76 | "prelude-ls": "~1.1.2", 77 | "type-check": "~0.3.2" 78 | }, 79 | "engines": { 80 | "node": ">= 0.8.0" 81 | } 82 | }, 83 | "node_modules/optionator": { 84 | "version": "0.8.3", 85 | "resolved": "https://registry.npmjs.org/optionator/-/optionator-0.8.3.tgz", 86 | "integrity": "sha512-+IW9pACdk3XWmmTXG8m3upGUJst5XRGzxMRjXzAuJ1XnIFNvfhjjIuYkDvysnPQ7qzqVzLt78BCruntqRhWQbA==", 87 | "dependencies": { 88 | "deep-is": "~0.1.3", 89 | "fast-levenshtein": "~2.0.6", 90 | "levn": "~0.3.0", 91 | "prelude-ls": "~1.1.2", 92 | "type-check": "~0.3.2", 93 | "word-wrap": "~1.2.3" 94 | }, 95 | "engines": { 96 | "node": ">= 0.8.0" 97 | } 98 | }, 99 | "node_modules/prelude-ls": { 100 | "version": "1.1.2", 101 | "resolved": "https://registry.npmjs.org/prelude-ls/-/prelude-ls-1.1.2.tgz", 102 | "integrity": "sha1-IZMqVJ9eUv/ZqCf1cOBL5iqX2lQ=", 103 | "engines": { 104 | "node": ">= 0.8.0" 105 | } 106 | }, 107 | "node_modules/source-map": { 108 | "version": "0.6.1", 109 | "resolved": "https://registry.npmjs.org/source-map/-/source-map-0.6.1.tgz", 110 | "integrity": "sha512-UjgapumWlbMhkBgzT7Ykc5YXUT46F0iKu8SGXq0bcwP5dz/h0Plj6enJqjz1Zbq2l5WaqYnrVbwWOWMyF3F47g==", 111 | "optional": true, 112 | "engines": { 113 | "node": ">=0.10.0" 114 | } 115 | }, 116 | "node_modules/type-check": { 117 | "version": "0.3.2", 118 | "resolved": "https://registry.npmjs.org/type-check/-/type-check-0.3.2.tgz", 119 | "integrity": "sha1-WITKtRLPHTVeP7eE8wgEsrUg23I=", 120 | "dependencies": { 121 | "prelude-ls": "~1.1.2" 122 | }, 123 | "engines": { 124 | "node": ">= 0.8.0" 125 | } 126 | }, 127 | "node_modules/word-wrap": { 128 | "version": "1.2.3", 129 | "resolved": "https://registry.npmjs.org/word-wrap/-/word-wrap-1.2.3.tgz", 130 | "integrity": "sha512-Hz/mrNwitNRh/HUAtM/VT/5VH+ygD6DV7mYKZAtHOrbs8U7lvPS6xf7EJKMF0uW1KJCl0H701g3ZGus+muE5vQ==", 131 | "engines": { 132 | "node": ">=0.10.0" 133 | } 134 | } 135 | }, 136 | "dependencies": { 137 | "deep-is": { 138 | "version": "0.1.4", 139 | "resolved": "https://registry.npmjs.org/deep-is/-/deep-is-0.1.4.tgz", 140 | "integrity": "sha512-oIPzksmTg4/MriiaYGO+okXDT7ztn/w3Eptv/+gSIdMdKsJo0u4CfYNFJPy+4SKMuCqGw2wxnA+URMg3t8a/bQ==" 141 | }, 142 | "escodegen": { 143 | "version": "2.0.0", 144 | "resolved": "https://registry.npmjs.org/escodegen/-/escodegen-2.0.0.tgz", 145 | "integrity": "sha512-mmHKys/C8BFUGI+MAWNcSYoORYLMdPzjrknd2Vc+bUsjN5bXcr8EhrNB+UTqfL1y3I9c4fw2ihgtMPQLBRiQxw==", 146 | "requires": { 147 | "esprima": "^4.0.1", 148 | 
"estraverse": "^5.2.0", 149 | "esutils": "^2.0.2", 150 | "optionator": "^0.8.1", 151 | "source-map": "~0.6.1" 152 | } 153 | }, 154 | "esprima": { 155 | "version": "4.0.1", 156 | "resolved": "https://registry.npmjs.org/esprima/-/esprima-4.0.1.tgz", 157 | "integrity": "sha512-eGuFFw7Upda+g4p+QHvnW0RyTX/SVeJBDM/gCtMARO0cLuT2HcEKnTPvhjV6aGeqrCB/sbNop0Kszm0jsaWU4A==" 158 | }, 159 | "estraverse": { 160 | "version": "5.3.0", 161 | "resolved": "https://registry.npmjs.org/estraverse/-/estraverse-5.3.0.tgz", 162 | "integrity": "sha512-MMdARuVEQziNTeJD8DgMqmhwR11BRQ/cBP+pLtYdSTnf3MIO8fFeiINEbX36ZdNlfU/7A9f3gUw49B3oQsvwBA==" 163 | }, 164 | "esutils": { 165 | "version": "2.0.3", 166 | "resolved": "https://registry.npmjs.org/esutils/-/esutils-2.0.3.tgz", 167 | "integrity": "sha512-kVscqXk4OCp68SZ0dkgEKVi6/8ij300KBWTJq32P/dYeWTSwK41WyTxalN1eRmA5Z9UU/LX9D7FWSmV9SAYx6g==" 168 | }, 169 | "fast-levenshtein": { 170 | "version": "2.0.6", 171 | "resolved": "https://registry.npmjs.org/fast-levenshtein/-/fast-levenshtein-2.0.6.tgz", 172 | "integrity": "sha1-PYpcZog6FqMMqGQ+hR8Zuqd5eRc=" 173 | }, 174 | "levn": { 175 | "version": "0.3.0", 176 | "resolved": "https://registry.npmjs.org/levn/-/levn-0.3.0.tgz", 177 | "integrity": "sha1-OwmSTt+fCDwEkP3UwLxEIeBHZO4=", 178 | "requires": { 179 | "prelude-ls": "~1.1.2", 180 | "type-check": "~0.3.2" 181 | } 182 | }, 183 | "optionator": { 184 | "version": "0.8.3", 185 | "resolved": "https://registry.npmjs.org/optionator/-/optionator-0.8.3.tgz", 186 | "integrity": "sha512-+IW9pACdk3XWmmTXG8m3upGUJst5XRGzxMRjXzAuJ1XnIFNvfhjjIuYkDvysnPQ7qzqVzLt78BCruntqRhWQbA==", 187 | "requires": { 188 | "deep-is": "~0.1.3", 189 | "fast-levenshtein": "~2.0.6", 190 | "levn": "~0.3.0", 191 | "prelude-ls": "~1.1.2", 192 | "type-check": "~0.3.2", 193 | "word-wrap": "~1.2.3" 194 | } 195 | }, 196 | "prelude-ls": { 197 | "version": "1.1.2", 198 | "resolved": "https://registry.npmjs.org/prelude-ls/-/prelude-ls-1.1.2.tgz", 199 | "integrity": "sha1-IZMqVJ9eUv/ZqCf1cOBL5iqX2lQ=" 200 | }, 201 | "source-map": { 202 | "version": "0.6.1", 203 | "resolved": "https://registry.npmjs.org/source-map/-/source-map-0.6.1.tgz", 204 | "integrity": "sha512-UjgapumWlbMhkBgzT7Ykc5YXUT46F0iKu8SGXq0bcwP5dz/h0Plj6enJqjz1Zbq2l5WaqYnrVbwWOWMyF3F47g==", 205 | "optional": true 206 | }, 207 | "type-check": { 208 | "version": "0.3.2", 209 | "resolved": "https://registry.npmjs.org/type-check/-/type-check-0.3.2.tgz", 210 | "integrity": "sha1-WITKtRLPHTVeP7eE8wgEsrUg23I=", 211 | "requires": { 212 | "prelude-ls": "~1.1.2" 213 | } 214 | }, 215 | "word-wrap": { 216 | "version": "1.2.3", 217 | "resolved": "https://registry.npmjs.org/word-wrap/-/word-wrap-1.2.3.tgz", 218 | "integrity": "sha512-Hz/mrNwitNRh/HUAtM/VT/5VH+ygD6DV7mYKZAtHOrbs8U7lvPS6xf7EJKMF0uW1KJCl0H701g3ZGus+muE5vQ==" 219 | } 220 | } 221 | } 222 | -------------------------------------------------------------------------------- /pdg_js/package.json: -------------------------------------------------------------------------------- 1 | { 2 | "dependencies": { 3 | "escodegen": "^2.0.0", 4 | "esprima": "^4.0.1" 5 | } 6 | } 7 | -------------------------------------------------------------------------------- /pdg_js/parser.js: -------------------------------------------------------------------------------- 1 | // Copyright (C) 2021 Aurore Fass 2 | // Copyright (C) 2022 Anonymous 3 | // 4 | // This program is free software: you can redistribute it and/or modify 5 | // it under the terms of the GNU Affero General Public License as published 6 | // by the Free Software Foundation, 
either version 3 of the License, or 7 | // (at your option) any later version. 8 | // 9 | // This program is distributed in the hope that it will be useful, 10 | // but WITHOUT ANY WARRANTY; without even the implied warranty of 11 | // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 12 | // GNU Affero General Public License for more details. 13 | // 14 | // You should have received a copy of the GNU Affero General Public License 15 | // along with this program. If not, see . 16 | 17 | 18 | // Conversion of a JS file into its Esprima AST. 19 | 20 | 21 | module.exports = { 22 | js2ast: js2ast, 23 | }; 24 | 25 | 26 | var esprima = require("esprima"); 27 | var es = require("escodegen"); 28 | var fs = require("fs"); 29 | var path = require("path"); 30 | var process = require("process"); 31 | 32 | 33 | /** 34 | * Extraction of the AST of an input JS file using Esprima. 35 | * 36 | * @param js 37 | * @param json_path 38 | * @returns {*} 39 | */ 40 | function js2ast(js, json_path) { 41 | var text = fs.readFileSync(js).toString('utf-8'); 42 | try { 43 | var ast = esprima.parseModule(text, { 44 | range: true, 45 | loc: true, 46 | tokens: true, 47 | tolerant: true, 48 | comment: true 49 | }); 50 | } catch(e) { 51 | console.error(js, e); 52 | process.exit(1); 53 | } 54 | 55 | // Attaching comments is a separate step for Escodegen 56 | ast = es.attachComments(ast, ast.comments, ast.tokens); 57 | 58 | fs.mkdirSync(path.dirname(json_path), {recursive: true}); 59 | fs.writeFile(json_path, JSON.stringify(ast), function (err) { 60 | if (err) { 61 | console.error(err); 62 | } 63 | }); 64 | 65 | return ast; 66 | } 67 | 68 | js2ast(process.argv[2], process.argv[3]); 69 | -------------------------------------------------------------------------------- /pdg_js/pointer_analysis.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2021 Aurore Fass 2 | # 3 | # This program is free software: you can redistribute it and/or modify 4 | # it under the terms of the GNU Affero General Public License as published 5 | # by the Free Software Foundation, either version 3 of the License, or 6 | # (at your option) any later version. 7 | # 8 | # This program is distributed in the hope that it will be useful, 9 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 10 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 11 | # GNU Affero General Public License for more details. 12 | # 13 | # You should have received a copy of the GNU Affero General Public License 14 | # along with this program. If not, see . 15 | 16 | 17 | """ 18 | Pointer analysis; mapping a variable to where its value is defined. 19 | """ 20 | 21 | import logging 22 | 23 | from . import js_operators 24 | from .value_filters import get_node_computed_value, display_values 25 | from . import node as _node 26 | 27 | 28 | """ 29 | In the following and if not stated otherwise, 30 | - node: Node 31 | Current node. 32 | - identifiers: list 33 | List of Identifier nodes whose values we aim at computing. 34 | - operator: None or str (e.g., '+='). 35 | """ 36 | 37 | 38 | def get_node_path(begin_node, destination_node, path): 39 | """ 40 | Find the path between begin_node and destination_node. 41 | ------- 42 | Parameters: 43 | - begin_node: Node 44 | Entry point, origin. 45 | - destination_node: Node 46 | Descendant of begin_node. Destination point. 47 | - path: list 48 | Path between begin_node and destination_node. 
49 | Ex: [0, 0, 1] <=> begin_node.children[0].children[0].children[1] = destination_node. 50 | """ 51 | 52 | if begin_node.id == destination_node.id: 53 | return True 54 | 55 | for i, _ in enumerate(begin_node.children): 56 | path.append(i) # Child number i 57 | found = get_node_path(begin_node.children[i], destination_node, path) 58 | if found: 59 | return True 60 | del path[-1] 61 | return False 62 | 63 | 64 | def find_node(var, begin_node, path): 65 | """ Find the node whose path from begin_node is given. """ 66 | 67 | logging.debug('Trying to find the node symmetric from %s using the following path %s from %s', 68 | var.name, path, begin_node.name) 69 | while path: 70 | child_nb = path.pop(0) 71 | try: 72 | begin_node = begin_node.children[child_nb] 73 | except IndexError: # Case Asymmetric mapping, e.g., Array or Object mapped to an Identifier 74 | return begin_node, None 75 | 76 | if not path: # begin_node is already the node we are looking for 77 | return begin_node, None 78 | 79 | # Case Asymmetric mapping, e.g., Identifier mapped to an Array or else 80 | logging.debug('Asymmetric mapping case') 81 | if begin_node.name in ('ArrayExpression', 'ObjectExpression', 'ObjectPattern', 'NewExpression'): 82 | value = begin_node 83 | logging.debug('The value corresponds to node %s', value.name) 84 | return None, value 85 | 86 | return begin_node, None 87 | 88 | 89 | def get_member_expression(node): 90 | """ Returns: 91 | - if a MemberExpression node ascendant was found; 92 | - the furthest MemberExpression ascendant (if True) or node. 93 | - if we are in a window.node or this.node situation. """ 94 | 95 | if node.parent.name != 'MemberExpression': 96 | return False, node, False 97 | 98 | while node.parent.name == 'MemberExpression': 99 | if node.parent.children[0].name == 'ThisExpression'\ 100 | or get_node_computed_value(node.parent.children[0]) in _node.GLOBAL_VAR: 101 | return False, node, True 102 | node = node.parent 103 | return True, node, False 104 | 105 | 106 | def map_var2value(node, identifiers, operator=None): 107 | """ 108 | Map identifier nodes to their corresponding Literal/Identifier values. 109 | 110 | ------- 111 | Parameters: 112 | - node: Node 113 | Entry point, either a VariableDeclaration or AssignmentExpression node. 114 | Therefore: node.children[0] => Identifier = considered variable; 115 | node.children[1] => Identifier/Literal = corresponding value 116 | - identifiers: list 117 | List of Identifier nodes to map to their values. 118 | 119 | Trick: Symmetry between AST left-hand side (declaration) and right-hand side (value). 
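        Illustrative example (not from the original sources): for the declaration
        `var [a, b] = [1, 2]`, the pattern on the left mirrors the ArrayExpression
        on the right, so the path computed from the left-hand side to the
        Identifier `a` (child 0) is replayed on the right-hand side to reach its
        value 1.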
120 | """ 121 | 122 | if node.name != 'VariableDeclarator' and node.name != 'AssignmentExpression' \ 123 | and node.name != 'Property': 124 | # Could be called on other Nodes because of assignment_expr_df which calculates DD on 125 | # right-hand side elements which may not be variable declarations/assignments anymore 126 | return 127 | 128 | var = node.children[0] 129 | init = node.children[1] 130 | 131 | for decl in identifiers: 132 | # Compute the value for each decl, as it might have changed 133 | logging.debug('Computing a value for the variable %s with id %s', 134 | decl.attributes['name'], decl.id) 135 | 136 | decl.set_update_value(True) # Will be updated when printed in display_temp 137 | member_expr, decl, this_window = get_member_expression(decl) 138 | 139 | path = list() 140 | get_node_path(var, decl, path) 141 | if this_window: 142 | path.pop() # We jump over the MemberExpression parent to keep the symmetry 143 | 144 | if isinstance(init, _node.Identifier) and isinstance(init.value, _node.Node): 145 | try: 146 | logging.debug('The variable %s was initialized with the Identifier %s which already' 147 | ' has a value', decl.attributes['name'], init.attributes['name']) 148 | except KeyError: 149 | logging.debug('The variable %s was initialized with the Identifier %s which already' 150 | ' has a value', decl.name, init.name) 151 | value_node, value = find_node(var, init.value, path) 152 | else: 153 | if isinstance(decl, _node.Identifier): 154 | logging.debug('The variable %s was not initialized with an Identifier or ' 155 | 'it does not already have a value', decl.attributes['name']) 156 | else: 157 | logging.debug('The %s %s was not initialized with an Identifier or ' 158 | 'it does not already have a value', decl.name, decl.attributes) 159 | value_node, value = find_node(var, init, path) 160 | if value_node is not None: 161 | logging.debug('Got the node %s', value_node.name) 162 | 163 | if value is None: 164 | if isinstance(decl, _node.Identifier): 165 | logging.debug('Calculating the value of the variable %s', decl.attributes['name']) 166 | else: 167 | logging.debug('Calculating the value') 168 | if operator is None: 169 | logging.debug('Fetching the value') 170 | # We compute the value ourselves 171 | value = get_node_computed_value(value_node, initial_node=decl) 172 | if isinstance(decl, _node.Identifier): 173 | decl.set_code(node) # Add code 174 | 175 | else: 176 | logging.debug('Found the %s operator, computing the value ourselves', operator) 177 | # We compute the value ourselves: decl operator value_node 178 | value = js_operators.compute_operators(operator, decl, value_node, 179 | initial_node=decl) 180 | if isinstance(decl, _node.Identifier): 181 | decl.set_code(node) # Add code 182 | 183 | else: 184 | decl.set_code(node) # Add code 185 | 186 | if not member_expr: # Standard case, assign the value to the Identifier node 187 | logging.debug('Assigning the value %s to %s', value, decl.attributes['name']) 188 | decl.set_value(value) 189 | if isinstance(value_node, _node.FunctionExpression): 190 | fun_name = decl 191 | if value_node.fun_intern_name is not None: 192 | logging.debug('The variable %s refers to the (Arrow)FunctionExpresion %s', 193 | fun_name.attributes['name'], 194 | value_node.fun_intern_name.attributes['name']) 195 | else: 196 | logging.debug('The variable %s refers to an anonymous (Arrow)FunctionExpresion', 197 | fun_name.attributes['name']) 198 | value_node.set_fun_name(fun_name) 199 | else: 200 | display_values(decl) # Displays values 201 | else: # 
MemberExpression case 202 | logging.debug('MemberExpression case') 203 | literal_value = update_member_expression(decl, initial_node=decl) 204 | if isinstance(literal_value, _node.Value): # Everything is fine, can store value 205 | logging.debug('The object was defined, set the value of its property') 206 | literal_value.set_value(value) # Modifies value of the node referencing the MemExpr 207 | literal_value.set_provenance_rec(value_node) # Updates provenance 208 | display_values(literal_value) # Displays values 209 | else: # The object is probably a built-in object therefore no handle to get its prop 210 | logging.debug('The object was not defined, stored its property and set its value') 211 | obj, all_prop = define_obj_properties(decl, value, initial_node=decl) 212 | obj.set_value(all_prop) 213 | obj.set_provenance_rec(value_node) # Updates provenance 214 | display_values(obj) 215 | 216 | 217 | def compute_update_expression(node, identifier): 218 | """ Evaluates an UpdateExpression node. """ 219 | 220 | identifier.set_update_value(True) # Will be updated when printed in display_temp 221 | operator = node.attributes['operator'] 222 | value = js_operators.compute_operators(operator, identifier, 0) 223 | identifier.set_value(value) 224 | identifier.set_code(node.parent) 225 | 226 | 227 | def update_member_expression(member_expression_node, initial_node): 228 | """ If a MemberExpression is modified (i.e., left-hand side of an assignment), 229 | modifies the value of the node referencing the MemberExpression. """ 230 | 231 | literal_value = js_operators.compute_member_expression(member_expression_node, 232 | initial_node=initial_node, compute=False) 233 | return literal_value 234 | 235 | 236 | def search_properties(node, tab): 237 | """ Searches the Identifier/Literal nodes properties of a MemberExpression node. """ 238 | 239 | if node.name in ('Identifier', 'Literal'): 240 | if get_node_computed_value(node) not in _node.GLOBAL_VAR: # do nothing if window &co 241 | tab.append(node) # store left member as not window &co 242 | 243 | for child in node.children: 244 | search_properties(child, tab) 245 | 246 | 247 | def define_obj_properties(member_expression_node, value, initial_node): 248 | """ Defines the properties of a built-in object. Returns the object + its properties. 
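    Illustrative example (not from the original sources): for an assignment such as
    `wx.foo.bar = 1` where the object `wx` has no known definition, the function
    returns the Identifier node for `wx` together with the nested property dict
    {'foo': {'bar': 1}}, which the caller then stores as the value of `wx`.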
""" 249 | 250 | properties = [] 251 | search_properties(member_expression_node, properties) # Got all prop 252 | 253 | obj = properties[0] 254 | obj_init = get_node_computed_value(obj, initial_node=initial_node) 255 | # The obj may already have some properties 256 | properties = properties[1:] 257 | properties_value = [get_node_computed_value(prop, 258 | initial_node=initial_node) for prop in properties] 259 | 260 | # Good for debugging to see dict content, but cannot be used as loses link to variables 261 | # if isinstance(value, _node.Node): 262 | # if value.name in ('ObjectExpression', 'ObjectPattern'): 263 | # value = js_operators.compute_object_expr(value) 264 | 265 | if isinstance(obj_init, dict): # the obj already have properties 266 | all_prop = obj_init # initialize obj with its existing properties 267 | elif isinstance(obj_init, str): # the obj was previously defined with value obj_init 268 | all_prop = {obj_init: {}} # store its previous value as a property to keep it 269 | else: 270 | all_prop = {} # initialize with empty dict 271 | previous_prop = all_prop 272 | for i in range(len(properties_value) - 1): 273 | prop = properties_value[i] 274 | if prop not in previous_prop or not isinstance(previous_prop[prop], dict): 275 | previous_prop[prop] = {} # previous_prop[prop] does not already exist 276 | previous_prop = previous_prop[prop] 277 | previous_prop[properties_value[-1]] = value # prop0.prop1.prop2... = value 278 | 279 | return obj, all_prop 280 | -------------------------------------------------------------------------------- /pdg_js/scope.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2021 Aurore Fass 2 | # 3 | # This program is free software: you can redistribute it and/or modify 4 | # it under the terms of the GNU Affero General Public License as published 5 | # by the Free Software Foundation, either version 3 of the License, or 6 | # (at your option) any later version. 7 | # 8 | # This program is distributed in the hope that it will be useful, 9 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 10 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 11 | # GNU Affero General Public License for more details. 12 | # 13 | # You should have received a copy of the GNU Affero General Public License 14 | # along with this program. If not, see . 15 | 16 | 17 | """ 18 | Definition of class Scope to handle JS scoping rules. 19 | """ 20 | 21 | import copy 22 | 23 | 24 | class Scope: 25 | """ To apply JS scoping rules. 
""" 26 | 27 | def __init__(self, name=''): 28 | self.name = name 29 | self.var_list = [] 30 | self.var_if2_list = [] # Specific to if constructs with 2 possible variables at the end 31 | self.unknown_var = set() # Unknown variable in a given scope 32 | self.function = None 33 | self.bloc = False # Indicates if we are in a block statement 34 | self.need_to_recompute_var_list = True 35 | self.id_name_list = set() 36 | 37 | def set_name(self, name): 38 | self.name = name 39 | 40 | def set_var_list(self, var_list): 41 | self.var_list = var_list 42 | self.need_to_recompute_var_list = True 43 | 44 | def set_var_if2_list(self, var_if2_list): 45 | self.var_if2_list = var_if2_list 46 | 47 | def set_unknown_var(self, unknown_var): 48 | self.unknown_var = unknown_var 49 | 50 | def set_function(self, function): 51 | self.function = function 52 | 53 | def add_var(self, identifier_node): 54 | self.var_list.append(identifier_node) 55 | self.need_to_recompute_var_list = True 56 | self.var_if2_list.append(None) 57 | 58 | def add_unknown_var(self, unknown): 59 | self.unknown_var.add(unknown) # Set avoids duplicates 60 | 61 | def remove_unknown_var(self, unknown): 62 | self.unknown_var.remove(unknown) 63 | 64 | def update_var(self, index, identifier_node): 65 | self.var_list[index] = identifier_node 66 | self.need_to_recompute_var_list = True 67 | self.var_if2_list[index] = None 68 | 69 | def update_var_if2(self, index, identifier_node_list): 70 | self.var_if2_list[index] = identifier_node_list 71 | 72 | def add_var_if2(self, index, identifier_node): 73 | if not isinstance(self.var_if2_list[index], list): 74 | self.var_if2_list[index] = [] 75 | self.var_if2_list[index].append(identifier_node) 76 | 77 | def is_equal(self, var_list2): 78 | if self.var_list == var_list2.var_list and self.var_if2_list == var_list2.var_if2_list: 79 | return True 80 | return False 81 | 82 | def copy_scope(self): 83 | scope = Scope() 84 | scope.set_name(copy.copy(self.name)) 85 | scope.set_var_list(copy.copy(self.var_list)) 86 | scope.set_var_if2_list(copy.copy(self.var_if2_list)) 87 | scope.set_unknown_var(copy.copy(self.unknown_var)) 88 | scope.set_function(copy.copy(self.function)) 89 | return scope 90 | 91 | def get_pos_identifier(self, identifier_node): 92 | tmp_list = None 93 | if self.need_to_recompute_var_list: 94 | tmp_list = [elt.attributes['name'] for elt in self.var_list] 95 | self.id_name_list = set(tmp_list) 96 | self.need_to_recompute_var_list = False 97 | var_name = identifier_node.attributes['name'] 98 | if var_name in self.id_name_list: 99 | if tmp_list is None: 100 | tmp_list = [elt.attributes['name'] for elt in self.var_list] 101 | return tmp_list.index(var_name) # Position of identifier_node in var_list 102 | return None # None if it is not in the list 103 | 104 | def set_in_bloc(self, bloc): 105 | self.bloc = bloc 106 | -------------------------------------------------------------------------------- /pdg_js/utility_df.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2021 Aurore Fass 2 | # 3 | # This program is free software: you can redistribute it and/or modify 4 | # it under the terms of the GNU Affero General Public License as published 5 | # by the Free Software Foundation, either version 3 of the License, or 6 | # (at your option) any later version. 7 | # 8 | # This program is distributed in the hope that it will be useful, 9 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 10 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the 11 | # GNU Affero General Public License for more details. 12 | # 13 | # You should have received a copy of the GNU Affero General Public License 14 | # along with this program. If not, see . 15 | 16 | 17 | """ 18 | Utility file, stores shared information. 19 | """ 20 | 21 | import sys 22 | import resource 23 | import timeit 24 | import logging 25 | import signal 26 | import traceback 27 | 28 | sys.setrecursionlimit(100000) 29 | 30 | 31 | TEST = False 32 | 33 | if TEST: # To test, e.g., the examples 34 | PDG_EXCEPT = True # To print the exceptions encountered while building the PDG 35 | LIMIT_SIZE = 10000 # To avoid list/str values with over 10,000 characters 36 | LIMIT_RETRAVERSE = 1 # If function called on itself, then max times to avoid infinite recursion 37 | LIMIT_LOOP = 5 # If iterating through a loop, then max times to avoid infinite loops 38 | DISPLAY_VAR = True # To display variable values 39 | CHECK_JSON = True # Builds the JS code from the AST, to check for possible bugs in the AST 40 | 41 | NUM_WORKERS = 1 42 | 43 | else: # To run with multiprocessing 44 | PDG_EXCEPT = False # To ignore (pass) the exceptions encountered while building the PDG 45 | LIMIT_SIZE = 10000 # To avoid list/str values with over 10,000 characters 46 | LIMIT_RETRAVERSE = 1 # If function called on itself, then max times to avoid infinite recursion 47 | LIMIT_LOOP = 1 # If iterating through a loop, then max times to avoid infinite loops 48 | DISPLAY_VAR = False # To not display variable values 49 | CHECK_JSON = False # To not build the JS code from the AST 50 | 51 | NUM_WORKERS = 1 # CHANGE THIS ONE 52 | 53 | 54 | class UpperThresholdFilter(logging.Filter): 55 | """ 56 | This allows us to set an upper threshold for the log levels since the setLevel method only 57 | sets a lower one 58 | """ 59 | 60 | def __init__(self, threshold, *args, **kwargs): 61 | self._threshold = threshold 62 | super(UpperThresholdFilter, self).__init__(*args, **kwargs) 63 | 64 | def filter(self, rec): 65 | return rec.levelno <= self._threshold 66 | 67 | 68 | logging.basicConfig(format='%(levelname)s: %(filename)s: %(message)s', level=logging.CRITICAL) 69 | # logging.basicConfig(filename='pdg.log', format='%(levelname)s: %(filename)s: %(message)s', 70 | # level=logging.DEBUG) 71 | # LOGGER = logging.getLogger() 72 | # LOGGER.addFilter(UpperThresholdFilter(logging.CRITICAL)) 73 | 74 | 75 | def micro_benchmark(message, elapsed_time): 76 | """ Micro benchmarks. """ 77 | logging.info('%s %s%s', message, str(elapsed_time), 's') 78 | print('CURRENT STATE %s %s%s' % (message, str(elapsed_time), 's')) 79 | return timeit.default_timer() 80 | 81 | 82 | class Timeout: 83 | """ Timeout class using ALARM signal. """ 84 | 85 | class Timeout(Exception): 86 | """ Timeout class throwing an exception. """ 87 | 88 | def __init__(self, sec): 89 | self.sec = sec 90 | 91 | def __enter__(self): 92 | signal.signal(signal.SIGALRM, self.raise_timeout) 93 | signal.alarm(self.sec) 94 | 95 | def __exit__(self, *args): 96 | signal.alarm(0) # disable alarm 97 | 98 | def raise_timeout(self, *args): 99 | traceback.print_stack(limit=100) 100 | raise Timeout.Timeout() 101 | 102 | 103 | def limit_memory(maxsize): 104 | """ Limiting the memory usage to maxsize (in bytes), soft limit. 
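    Illustrative usage (not from the original sources): limit_memory(20 * 10 ** 9)
    caps the soft RLIMIT_AS limit of the current process at roughly 20 GB while
    leaving the hard limit unchanged.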
""" 105 | 106 | soft, hard = resource.getrlimit(resource.RLIMIT_AS) 107 | resource.setrlimit(resource.RLIMIT_AS, (maxsize, hard)) 108 | -------------------------------------------------------------------------------- /pdg_js/value_filters.py: -------------------------------------------------------------------------------- 1 | # Copyright (C) 2021 Aurore Fass 2 | # 3 | # This program is free software: you can redistribute it and/or modify 4 | # it under the terms of the GNU Affero General Public License as published 5 | # by the Free Software Foundation, either version 3 of the License, or 6 | # (at your option) any later version. 7 | # 8 | # This program is distributed in the hope that it will be useful, 9 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 10 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 11 | # GNU Affero General Public License for more details. 12 | # 13 | # You should have received a copy of the GNU Affero General Public License 14 | # along with this program. If not, see . 15 | 16 | 17 | """ Prints variables with their corresponding value. And logs whether an insecure API was used. """ 18 | 19 | import logging 20 | from . import node as _node 21 | from .js_operators import get_node_computed_value, get_node_value 22 | from . import utility_df 23 | 24 | INSECURE = ['document.write'] 25 | DISPLAY_VAR = utility_df.DISPLAY_VAR # To display the variables' value or not 26 | 27 | 28 | def is_insecure_there(value): 29 | """ Checks if value is part of an insecure API. """ 30 | 31 | for insecure in INSECURE: 32 | if insecure in value: 33 | logging.debug('Found a call to %s', insecure) 34 | 35 | 36 | def display_values(var, keep_none=True, check_insecure=True, recompute=False): 37 | """ Print var = its value and checks whether the value is part of an insecure API. 
""" 38 | 39 | if not DISPLAY_VAR: # We do not want the values printed during large-scale analyses 40 | return 41 | 42 | if recompute: # If we store ALL value sometimes we need to recompute them as could have changed 43 | # Currently not executed, check if set_value in get_node_computed_value 44 | value = get_node_value(var) 45 | var.set_value(value) 46 | else: 47 | value = var.value # We store value so as not to compute it AGAIN 48 | if isinstance(value, _node.Node) or value is None: # Only if necessary 49 | value = get_node_computed_value(var, keep_none=keep_none) # Gets variable value 50 | 51 | if isinstance(var, _node.Identifier): 52 | variable = get_node_value(var) 53 | print('\t' + variable + ' = ' + str(value)) # Prints variable = value 54 | 55 | elif var.name in _node.CALL_EXPR + ['ReturnStatement']: 56 | print('\t' + var.name + ' = ' + str(value)) # Prints variable = value) 57 | 58 | if isinstance(value, _node.Node): 59 | print('\t' + value.name, value.attributes, value.id) 60 | 61 | elif isinstance(value, str) and check_insecure: 62 | is_insecure_there(value) # Checks for usage of insecure APIs 63 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | graphviz==0.20 2 | lxml==4.9.0 3 | -------------------------------------------------------------------------------- /taint_mini/__init__.py: -------------------------------------------------------------------------------- 1 | from .storage import * 2 | from .wxml import * 3 | from .wxjs import * 4 | from .taintmini import * 5 | -------------------------------------------------------------------------------- /taint_mini/storage.py: -------------------------------------------------------------------------------- 1 | class Storage: 2 | instance = None 3 | node = None 4 | results = None 5 | app_path = None 6 | page_path = None 7 | config = None 8 | 9 | def __init__(self, _node, _app_path, _path_path, _config): 10 | self.node = _node 11 | # data structure: { 12 | # [page_name]: [ 13 | # { method: [method_name], 14 | # source: [source_name], 15 | # sink: [sink_name] 16 | # }, 17 | # ] 18 | # } 19 | self.results = dict() 20 | self.app_path = _app_path 21 | self.page_path = _path_path 22 | self.config = _config 23 | 24 | def get_node(self): 25 | return self.node 26 | 27 | def get_results(self): 28 | return self.results 29 | 30 | def get_app_path(self): 31 | return self.app_path 32 | 33 | def get_page_path(self): 34 | return self.page_path 35 | 36 | def get_config(self): 37 | return self.config 38 | 39 | @staticmethod 40 | def init(_node, _app_path, _page_path, _config): 41 | Storage.instance = Storage(_node, _app_path, _page_path, _config) 42 | 43 | @staticmethod 44 | def get_instance(): 45 | return Storage.instance 46 | -------------------------------------------------------------------------------- /taint_mini/taintmini.py: -------------------------------------------------------------------------------- 1 | import os 2 | import time 3 | from .wxjs import gen_pdg, handle_wxjs 4 | from .wxml import handle_wxml 5 | from .storage import Storage 6 | import multiprocessing as mp 7 | 8 | 9 | def filter_results(results, config): 10 | # no filters, just return 11 | if ("sources" not in config or len(config["sources"]) == 0) and \ 12 | ("sinks" not in config or len(config["sinks"]) == 0): 13 | return results 14 | 15 | filtered = {} 16 | for page in results: 17 | filtered[page] = [] 18 | for flow in results[page]: 19 | # 
filter source 20 | if "sources" in config and len(config["sources"]) > 0: 21 | if "sinks" in config and len(config["sinks"]) > 0: 22 | # apply source and sink filter 23 | if flow['source'] in config["sources"] and flow['sink'] in config["sinks"]: 24 | filtered[page].append(flow) 25 | # handle double binding in source 26 | if "[double_binding]" in config["sources"] and "[data from" in flow['source'] \ 27 | and flow['sink'] in config["sinks"]: 28 | filtered[page].append(flow) 29 | else: 30 | # no sink filter, just apply source filter 31 | if flow['sink'] in config["sinks"]: 32 | filtered[page].append(flow) 33 | else: 34 | # no source filter, apply sink filter 35 | if "sinks" in config and len(config["sinks"]) > 0: 36 | # apply sink filter 37 | if flow['sink'] in config["sinks"]: 38 | filtered[page].append(flow) 39 | # remove empty entries 40 | if len(filtered[page]) == 0: 41 | filtered.pop(page) 42 | return filtered 43 | 44 | 45 | def analyze_worker(app_path, page_path, results_path, config, queue): 46 | # generate pdg first 47 | r = gen_pdg(os.path.join(app_path, "pages", f"{page_path}.js"), results_path) 48 | # init shared storage (per process) 49 | Storage.init(r, app_path, page_path, config) 50 | # analyze double binding 51 | handle_wxml(os.path.join(app_path, "pages", f"{page_path}.wxml")) 52 | # analyze data flow 53 | handle_wxjs(r) 54 | # retrieve results 55 | results = Storage.get_instance().get_results() 56 | # filter results 57 | filtered = filter_results(results, config) 58 | # send results 59 | queue.put(filtered) 60 | 61 | 62 | def analyze_listener(result_path, queue): 63 | with open(result_path, "w") as f: 64 | f.write("page_name | page_method | ident | source | sink\n") 65 | while True: 66 | message = queue.get() 67 | if message == "kill": 68 | break 69 | if isinstance(message, dict): 70 | for page in message: 71 | for flow in message[page]: 72 | f.write(f"{page} | {flow['method']} | {flow['ident']} | {flow['source']} | {flow['sink']}\n") 73 | f.flush() 74 | f.flush() 75 | 76 | 77 | def obtain_valid_page(files): 78 | sub_pages = set() 79 | for f in files: 80 | sub_pages.add(str.split(f, ".")[0]) 81 | for f in list(sub_pages): 82 | if f"{f}.js" not in files or f"{f}.wxml" not in files: 83 | sub_pages.remove(f) 84 | return sub_pages 85 | 86 | 87 | def retrieve_pages(app_path): 88 | pages = set() 89 | for root, dirs, files in os.walk(os.path.join(app_path, "pages/")): 90 | for s in obtain_valid_page(files): 91 | pages.add(f"{root[len(os.path.join(app_path, 'pages/')):]}/{s}") 92 | return pages 93 | 94 | 95 | def analyze_mini_program(app_path, results_path, config, workers, bench): 96 | if not os.path.exists(app_path): 97 | print("[main] invalid app path") 98 | 99 | # obtain pages 100 | pages = retrieve_pages(app_path) 101 | if len(pages) == 0: 102 | print(f"[main] no page found") 103 | return 104 | 105 | # prepare output path 106 | if not os.path.exists(results_path): 107 | os.mkdir(results_path) 108 | elif os.path.isfile(results_path): 109 | print(f"[main] error: invalid output path") 110 | return 111 | 112 | manager = mp.Manager() 113 | queue = manager.Queue() 114 | pool = mp.Pool(workers if workers is not None else mp.cpu_count()) 115 | 116 | # put listener to pool first 117 | pool.apply_async(analyze_listener, (os.path.join(results_path, f"{os.path.basename(app_path)}-result.csv"), queue)) 118 | 119 | bench_out = None 120 | if bench: 121 | bench_out = open(os.path.join(results_path, f"{os.path.basename(app_path)}-bench.csv"), "w") 122 | bench_out.write("page|start|end\n") 123 
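    # Producer/consumer set-up: the listener task above drains the queue and writes
    # one CSV row per reported flow, while the per-page workers below each run
    # analyze_worker and put their filtered results on the queue; the "kill" message
    # sent once every page has been collected terminates the listener.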
| 124 | # execute workers 125 | workers = dict() 126 | for p in pages: 127 | workers[p] = dict() 128 | workers[p]["job"] = pool.apply_async(analyze_worker, (app_path, p, results_path, config, queue)) 129 | if bench: 130 | workers[p]["begin_time"] = int(time.time()) 131 | 132 | # collect results 133 | for p in pages: 134 | try: 135 | workers[p]["job"].get() 136 | except Exception as e: 137 | print(f"[main] critical error: {e}") 138 | finally: 139 | if bench: 140 | workers[p]["end_time"] = int(time.time()) 141 | 142 | queue.put("kill") 143 | pool.close() 144 | pool.join() 145 | 146 | if bench and bench_out is not None: 147 | for p in pages: 148 | bench_out.write(f"{p}|{workers[p]['begin_time']}|{workers[p]['end_time']}\n") 149 | bench_out.close() 150 | -------------------------------------------------------------------------------- /taint_mini/wxjs.py: -------------------------------------------------------------------------------- 1 | from collections import deque 2 | from pdg_js import node as _node 3 | from pdg_js.build_pdg import get_data_flow 4 | from .storage import Storage 5 | 6 | 7 | def gen_pdg(file_path, results_path): 8 | return get_data_flow(file_path, benchmarks=dict(), alt_json_path=f"{results_path}/intermediate-data/") 9 | 10 | 11 | def handle_wxjs(r): 12 | results = Storage.get_instance().get_results() 13 | results[Storage.get_instance().get_page_path()] = list() 14 | find_page_methods_node(r) 15 | 16 | 17 | def find_page_methods_node(r): 18 | for child in r.children: 19 | if child.name == "ExpressionStatement": 20 | if len(child.children) > 0 \ 21 | and child.children[0].name == "CallExpression" \ 22 | and child.children[0].children[0].attributes["name"] == "Page": 23 | # found page expression 24 | for method_node in child.children[0].children[1].children: 25 | if method_node.attributes["value"]["type"] == "FunctionExpression": 26 | # handle node 27 | method_name = method_node.children[0].attributes['name'] 28 | print( 29 | f"[page method] got page method, method name: {method_name}") 30 | try: 31 | dfs_search(method_node, method_name) 32 | except Exception as e: 33 | print(f"[wxjs] error in searching method {method_name}: {e}") 34 | 35 | 36 | def find_nearest_call_expr_node(node): 37 | if node is None or (hasattr(node, "name") and isinstance(node, _node.ValueExpr) and node.name == "CallExpression"): 38 | return node # nearest enclosing CallExpression, or None once the AST root has been passed 39 | return find_nearest_call_expr_node(node.parent) 40 | 41 | 42 | def obtain_callee_from_call_expr(node): 43 | if len(node.children[0].children) == 0 and node.children[0].attributes["name"] != "Page": 44 | return node.children[0].attributes["name"] 45 | return ".".join([i.attributes["name"] if "name" in i.attributes else "" for i in node.children[0].children]) 46 | 47 | 48 | def obtain_var_decl_callee(node): 49 | return ".".join([i.attributes["name"] for i in node.children[0].children]) 50 | 51 | 52 | def obtain_value_expr_callee(node): 53 | return ".".join([i.attributes["name"] for i in node.children]) 54 | 55 | 56 | def obtain_data_flow_sink(dep): 57 | # check if the dependence node has CallExpression parent 58 | if isinstance(dep.extremity.parent, _node.ValueExpr): 59 | return obtain_value_expr_callee(dep.extremity.parent.children[0]) 60 | return None 61 | 62 | 63 | def handle_data_parent_node(node): 64 | source = check_immediate_data_dep_parent(node) 65 | # if no known pattern match, fall back to general search 66 | if source is None: 67 | call_expr_node = find_nearest_call_expr_node(node) 68 | source = 
obtain_callee_from_call_expr(call_expr_node) 69 | print(f"[taint source] got nearest callee (source): {source}") 70 | 71 | # obtain sink 72 | sink = [] 73 | for child in node.data_dep_children: 74 | s = obtain_callee_from_call_expr(find_nearest_call_expr_node(child.extremity)) 75 | if s is not None: 76 | print(f"[taint sink] got data flow sink: {s}") 77 | sink.append(s) 78 | 79 | print(f"[flow path] data identifier: {node.attributes['name']}, " 80 | f"from source: {source if source is not None else 'None'}, " 81 | f"to sink: {','.join(map(str, sink))}") 82 | 83 | 84 | def is_parent_var_decl_or_assign_expr(node): 85 | return isinstance(node.parent, _node.Node) and \ 86 | hasattr(node.parent, "name") and \ 87 | (node.parent.name == "VariableDeclarator" or node.parent.name == "AssignmentExpression") 88 | 89 | 90 | def check_immediate_data_dep_parent(node): 91 | # check the data dep parent node is assignment or var decl 92 | # this check suitable for var_decl -> further usage 93 | source = None 94 | if is_parent_var_decl_or_assign_expr(node): 95 | # variable declaration or assignment, check the call expr 96 | if len(node.parent.children) > 1 and isinstance(node.parent.children[1], _node.ValueExpr): 97 | # obtain callee if parent is call expr 98 | if hasattr(node.parent.children[1], "name") and node.parent.children[1].name == "CallExpression": 99 | source = obtain_callee_from_call_expr(node.parent.children[1]) 100 | 101 | # obtain callee if parent is var decl 102 | if source is None: 103 | source = obtain_var_decl_callee(node.parent.children[1]) 104 | print(f"[taint source] got data flow source: {source}, identifier: {node.attributes['name']}") 105 | return source 106 | 107 | 108 | def is_page_method_parameter(node): 109 | if not isinstance(node, _node.Identifier): 110 | return False 111 | # in AST tree, ident -> FunctionExpr -> Property -> ObjectExpr 112 | # -> CallExpr <- Ident (Page) 113 | try: 114 | if node.parent.parent.parent.parent \ 115 | .children[0].attributes["name"] == "Page": 116 | return True 117 | except IndexError: 118 | return False 119 | except AttributeError: 120 | return False 121 | except KeyError: 122 | return False 123 | 124 | 125 | def get_input_name(value): 126 | return value[value.rindex(".") + 1:] if isinstance(value, str) and "detail.value" in value else None 127 | 128 | 129 | def handle_page_method_parameter(node, _n): 130 | # handle double binding values 131 | if not isinstance(node, _node.Identifier) or not isinstance(_n, _node.Identifier): 132 | return None 133 | # key is double_binding_values in ident node 134 | # omit it since false-negatives 135 | # if "double_binding_values" not in _n.attributes: 136 | # return None 137 | sources = set() 138 | # handle form double binding (input) 139 | # pattern: e.detail.value.[id] 140 | if isinstance(node.value, dict): 141 | for i in node.value: 142 | if isinstance(node.value[i], str) and "detail.value" in node.value[i]: 143 | input_name = get_input_name(node.value[i]) 144 | if input_name is None or input_name not in _n.attributes["double_binding_values"]: 145 | continue 146 | sources.add(f"[data from double binding: {input_name}, " 147 | f"type: {_n.attributes['double_binding_values'][input_name]}]") 148 | elif isinstance(node.value, str) and "detail.value" in node.value: 149 | input_name = get_input_name(node.value) 150 | if input_name is not None and input_name in _n.attributes["double_binding_values"]: 151 | sources.add(f"[data from double binding: {input_name}, " 152 | f"type: 
{_n.attributes['double_binding_values'][input_name]}]") 153 | 154 | # if no double binding found, fall back to general resolve 155 | if len(sources) == 0: 156 | sources.add(f"[data from page parameter: {node.value}]") 157 | return sources 158 | 159 | 160 | def handle_data_dep_parents(node): 161 | """ 162 | @return set of sources 163 | """ 164 | # check immediate data dep parent node first 165 | source = check_immediate_data_dep_parent(node) 166 | if source is not None: 167 | return {source} 168 | 169 | # no source found, fall back to general search 170 | source = obtain_callee_from_call_expr(find_nearest_call_expr_node(node)) 171 | if source is not None and source != "": 172 | return {source} 173 | 174 | sources = set() 175 | # no call expr found, search from provenance parents 176 | for n in node.provenance_parents_set: 177 | # check ident 178 | if isinstance(n, _node.Identifier): 179 | # check if it's page method parameter first 180 | if is_page_method_parameter(n): 181 | # is page method parameter, handle double binding 182 | # notice here should analyze the original node, 183 | # not the provenance parent node 184 | r = handle_page_method_parameter(node, n) 185 | if r is not None: 186 | sources.update(r) 187 | continue 188 | 189 | # search for source from var decl or assignment expr 190 | r = check_immediate_data_dep_parent(n) 191 | if r is None: 192 | # no results found, fall back to general search 193 | r = obtain_callee_from_call_expr(find_nearest_call_expr_node(n)) 194 | 195 | # still no results 196 | if r is None or r == "": 197 | continue 198 | # found source, add to set 199 | sources.add(r) 200 | # normal node, don't handle it 201 | if isinstance(n, _node.Node): 202 | continue 203 | # value expr, don't handle it 204 | if isinstance(n, _node.ValueExpr): 205 | continue 206 | # end for 207 | return sources 208 | 209 | 210 | def handle_data_child_node(node, method_name): 211 | if hasattr(node, "data_dep_children") and len(node.data_dep_children) > 0: 212 | # this node has data dep children (intermediate node), won't handle it 213 | return 214 | 215 | # no more children, it's the last node of the data flow 216 | # resolve sink api if the parent node is call expr 217 | sink = obtain_callee_from_call_expr(find_nearest_call_expr_node(node)) 218 | if sink == "": 219 | print(f"[taint sink] no sink api resolved, passing...") 220 | return 221 | print(f"[taint sink] got data flow sink: {sink}, resolving data flow source") 222 | 223 | # resolve data source 224 | sources = set() 225 | data_dep_parent_nodes = node.data_dep_parents 226 | for n in data_dep_parent_nodes: 227 | s = handle_data_dep_parents(n.extremity) 228 | if s is not None: 229 | sources.update(s) 230 | 231 | if len(sources): 232 | print(f"[taint source] resolve data sources: {', '.join(sources)}") 233 | else: 234 | print(f"[taint source] no valid source found") 235 | 236 | # flow path 237 | if len(sources): 238 | print(f"[flow path] data identifier: {node.attributes['name']}, " 239 | f"from source: {', '.join(sources)}, " 240 | f"to sink: {sink}") 241 | results = Storage.get_instance().get_results() 242 | for s in sources: 243 | results[Storage.get_instance().get_page_path()].append({ 244 | "method": method_name, 245 | "ident": node.attributes['name'], 246 | "source": s, 247 | "sink": sink 248 | }) 249 | 250 | 251 | def handle_identifier_node(node, method_name): 252 | # if hasattr(node, "data_dep_children") and len(node.data_dep_children) > 0: 253 | # print("[handle ident] got data flow parent node") 254 | # 
handle_data_parent_node(node) 255 | 256 | # search backwards (from children) 257 | if hasattr(node, "data_dep_parents") and len(node.data_dep_parents) > 0: 258 | print("[handle ident] got data flow child node") 259 | # omit backwards search 260 | handle_data_child_node(node, method_name) 261 | 262 | 263 | def dfs_visit(node, method_name): 264 | if not isinstance(node, _node.Identifier): 265 | # print("normal node, passing") 266 | return 267 | 268 | handle_identifier_node(node, method_name) 269 | 270 | 271 | def dfs_search(r, n): 272 | stack = deque() 273 | stack.append(r) 274 | 275 | visited = [] 276 | 277 | while stack: 278 | v = stack.pop() 279 | if v in visited: 280 | continue 281 | 282 | # node is not visited 283 | visited.append(v) 284 | dfs_visit(v, n) 285 | 286 | # visit its children 287 | children = v.children 288 | for i in reversed(children): 289 | if i not in visited: 290 | stack.append(i) 291 | -------------------------------------------------------------------------------- /taint_mini/wxml.py: -------------------------------------------------------------------------------- 1 | from lxml.html import parse 2 | from .storage import Storage 3 | 4 | 5 | def handle_wxml(file): 6 | try: 7 | wxml_html_root = parse(file) 8 | visit_wxml_tree(wxml_html_root) 9 | except Exception as e: 10 | print(f"[wxml] got error: {e}") 11 | 12 | 13 | def find_page_method_node(root, name): 14 | for child in root.children: 15 | if child.name == "ExpressionStatement": 16 | if len(child.children) > 0 \ 17 | and child.children[0].name == "CallExpression" \ 18 | and child.children[0].children[0].attributes["name"] == "Page": 19 | # found page expression 20 | for method_node in child.children[0].children[1].children: 21 | if method_node.attributes["value"]["type"] == "FunctionExpression": 22 | # handle node 23 | if method_node.children[0].attributes["name"] == name: 24 | return method_node 25 | 26 | 27 | def tag_properties_to_page_method_param_ident_node(node, p): 28 | # Function [1] -> FunctionExpr [0] -> Ident 29 | node.children[1].children[0].attributes["double_binding_values"] = p["inputs"] 30 | 31 | 32 | def handle_form_properties(p): 33 | root = Storage.get_instance().get_node() 34 | node = find_page_method_node(root, p["bind_submit"]) 35 | if node is not None: # the bound submit handler may not be defined as a Page method 36 | tag_properties_to_page_method_param_ident_node(node, p) 37 | 38 | def handle_wxml_form(element): 39 | visited_elements_in_form = [] 40 | form_properties = dict() 41 | # mapping: key name -> type 42 | form_properties["inputs"] = dict() 43 | 44 | for e in element.iter(): 45 | visited_elements_in_form.append(e) 46 | 47 | # handle form bind:submit 48 | if e.tag == "g-form" or e.tag == "form": 49 | if hasattr(e, "attrib") and "bind:submit" in e.attrib: 50 | form_properties["bind_submit"] = e.attrib["bind:submit"] 51 | continue 52 | 53 | # handle input properties 54 | if (e.tag == "g-input" or e.tag == "input") and hasattr(e, "attrib") \ 55 | and ("name" in e.attrib or "id" in e.attrib): 56 | # handle password 57 | if "password" in e.attrib or ("type" in e.attrib and e.attrib["type"] == "safe-password"): 58 | form_properties["inputs"][e.attrib["name"] if "name" in e.attrib else e.attrib["id"]] = "password" 59 | continue 60 | # handle normal input 61 | if "type" in e.attrib: 62 | form_properties["inputs"][e.attrib["name"] if "name" in e.attrib else e.attrib["id"]] = e.attrib["type"] 63 | continue 64 | 65 | # handle the properties 66 | if form_properties.get("bind_submit") is not None: 67 | handle_form_properties(form_properties) 68 | # return the visited elements in this 
form 69 | return visited_elements_in_form 70 | 71 | 72 | def handle_wxml_element(element): 73 | pass 74 | 75 | 76 | def visit_wxml_tree(r): 77 | visited = [] 78 | 79 | def visit_node(v): 80 | visited.append(v) 81 | # handle form element 82 | # as a form may have many child input elements 83 | if hasattr(v, "tag") and (v.tag == "g-form" or v.tag == "form"): 84 | # multiple elements are visited in handling form element 85 | visited.extend(handle_wxml_form(v)) 86 | return 87 | 88 | # handle normal xml element 89 | handle_wxml_element(v) 90 | 91 | # iter all the elements 92 | for i in r.iter(): 93 | if i not in visited: 94 | visit_node(i) 95 | 96 | 97 | --------------------------------------------------------------------------------
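Illustrative usage sketch (not part of the repository): filter_results above takes the per-page results dict built during analysis and a config dict with optional "sources" and "sinks" lists; the special "sources" entry "[double_binding]" keeps flows whose source string contains "[data from", i.e. double-binding sources. The snippet below shows a minimal, hypothetical invocation; the config values and flow records are invented for illustration, and the import path assumes filter_results is defined in taint_mini/taintmini.py.

# illustrative sketch only -- hypothetical config values and flow records
from taint_mini.taintmini import filter_results  # assumed module path

config = {
    "sources": ["wx.getLocation", "[double_binding]"],  # hypothetical source list
    "sinks": ["wx.request"],                            # hypothetical sink list
}
results = {
    "pages/index/index": [
        {"method": "onLoad", "ident": "loc",
         "source": "wx.getLocation", "sink": "wx.request"},
        {"method": "onSubmit", "ident": "v",
         "source": "[data from double binding: phone, type: number]", "sink": "wx.request"},
    ],
}
filtered = filter_results(results, config)
# both flows pass the filter: the first matches the explicit source list, the second
# matches via the "[double_binding]" entry combined with the "[data from" substring check
print(filtered)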