├── .github
│   └── ISSUE_TEMPLATE
│       └── bug-report.md
├── LICENSE
├── README.md
├── config.json
├── figure
│   └── taint-mini.svg
├── main.py
├── pdg_js
│   ├── LICENSE
│   ├── README.md
│   ├── __init__.py
│   ├── build_ast.py
│   ├── build_pdg.py
│   ├── control_flow.py
│   ├── data_flow.py
│   ├── display_graph.py
│   ├── extended_ast.py
│   ├── js_operators.py
│   ├── js_reserved.py
│   ├── node.py
│   ├── package-lock.json
│   ├── package.json
│   ├── parser.js
│   ├── pointer_analysis.py
│   ├── scope.py
│   ├── utility_df.py
│   └── value_filters.py
├── requirements.txt
└── taint_mini
    ├── __init__.py
    ├── storage.py
    ├── taintmini.py
    ├── wxjs.py
    └── wxml.py
/.github/ISSUE_TEMPLATE/bug-report.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: Bug report
3 | about: Create a report to help us improve
4 | title: ''
5 | labels: bug
6 | assignees: chaowangsec
7 |
8 | ---
9 |
10 | **Describe the bug**
11 | A clear and concise description of what the bug is.
12 |
13 | **To Reproduce**
14 | Steps to reproduce the behavior:
15 | 1. Configure the environment '....'
16 | 2. Type commands '....'
17 | 3. Run for a while '....'
18 | 4. Exception raised
19 |
20 | **Expected behavior**
21 | A clear and concise description of what you expected to happen.
22 |
23 | **Screenshots**
24 | If applicable, add screenshots to help explain your problem.
25 |
26 | **Environment (please complete the following information):**
27 | - OS: [e.g. Debian]
28 | - Version: [e.g. bookworm]
29 | - Python version: [e.g. 3.7]
30 | - Other environment:
31 |
32 |
33 | **Command line arguments**
34 |
35 |
36 | **Exception traceback (if applicable)**
37 |
38 |
39 | **Additional context**
40 | Add any other context about the problem here.
41 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | GNU AFFERO GENERAL PUBLIC LICENSE
2 | Version 3, 19 November 2007
3 |
4 | Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
5 | Everyone is permitted to copy and distribute verbatim copies
6 | of this license document, but changing it is not allowed.
7 |
8 | Preamble
9 |
10 | The GNU Affero General Public License is a free, copyleft license for
11 | software and other kinds of works, specifically designed to ensure
12 | cooperation with the community in the case of network server software.
13 |
14 | The licenses for most software and other practical works are designed
15 | to take away your freedom to share and change the works. By contrast,
16 | our General Public Licenses are intended to guarantee your freedom to
17 | share and change all versions of a program--to make sure it remains free
18 | software for all its users.
19 |
20 | When we speak of free software, we are referring to freedom, not
21 | price. Our General Public Licenses are designed to make sure that you
22 | have the freedom to distribute copies of free software (and charge for
23 | them if you wish), that you receive source code or can get it if you
24 | want it, that you can change the software or use pieces of it in new
25 | free programs, and that you know you can do these things.
26 |
27 | Developers that use our General Public Licenses protect your rights
28 | with two steps: (1) assert copyright on the software, and (2) offer
29 | you this License which gives you legal permission to copy, distribute
30 | and/or modify the software.
31 |
32 | A secondary benefit of defending all users' freedom is that
33 | improvements made in alternate versions of the program, if they
34 | receive widespread use, become available for other developers to
35 | incorporate. Many developers of free software are heartened and
36 | encouraged by the resulting cooperation. However, in the case of
37 | software used on network servers, this result may fail to come about.
38 | The GNU General Public License permits making a modified version and
39 | letting the public access it on a server without ever releasing its
40 | source code to the public.
41 |
42 | The GNU Affero General Public License is designed specifically to
43 | ensure that, in such cases, the modified source code becomes available
44 | to the community. It requires the operator of a network server to
45 | provide the source code of the modified version running there to the
46 | users of that server. Therefore, public use of a modified version, on
47 | a publicly accessible server, gives the public access to the source
48 | code of the modified version.
49 |
50 | An older license, called the Affero General Public License and
51 | published by Affero, was designed to accomplish similar goals. This is
52 | a different license, not a version of the Affero GPL, but Affero has
53 | released a new version of the Affero GPL which permits relicensing under
54 | this license.
55 |
56 | The precise terms and conditions for copying, distribution and
57 | modification follow.
58 |
59 | TERMS AND CONDITIONS
60 |
61 | 0. Definitions.
62 |
63 | "This License" refers to version 3 of the GNU Affero General Public License.
64 |
65 | "Copyright" also means copyright-like laws that apply to other kinds of
66 | works, such as semiconductor masks.
67 |
68 | "The Program" refers to any copyrightable work licensed under this
69 | License. Each licensee is addressed as "you". "Licensees" and
70 | "recipients" may be individuals or organizations.
71 |
72 | To "modify" a work means to copy from or adapt all or part of the work
73 | in a fashion requiring copyright permission, other than the making of an
74 | exact copy. The resulting work is called a "modified version" of the
75 | earlier work or a work "based on" the earlier work.
76 |
77 | A "covered work" means either the unmodified Program or a work based
78 | on the Program.
79 |
80 | To "propagate" a work means to do anything with it that, without
81 | permission, would make you directly or secondarily liable for
82 | infringement under applicable copyright law, except executing it on a
83 | computer or modifying a private copy. Propagation includes copying,
84 | distribution (with or without modification), making available to the
85 | public, and in some countries other activities as well.
86 |
87 | To "convey" a work means any kind of propagation that enables other
88 | parties to make or receive copies. Mere interaction with a user through
89 | a computer network, with no transfer of a copy, is not conveying.
90 |
91 | An interactive user interface displays "Appropriate Legal Notices"
92 | to the extent that it includes a convenient and prominently visible
93 | feature that (1) displays an appropriate copyright notice, and (2)
94 | tells the user that there is no warranty for the work (except to the
95 | extent that warranties are provided), that licensees may convey the
96 | work under this License, and how to view a copy of this License. If
97 | the interface presents a list of user commands or options, such as a
98 | menu, a prominent item in the list meets this criterion.
99 |
100 | 1. Source Code.
101 |
102 | The "source code" for a work means the preferred form of the work
103 | for making modifications to it. "Object code" means any non-source
104 | form of a work.
105 |
106 | A "Standard Interface" means an interface that either is an official
107 | standard defined by a recognized standards body, or, in the case of
108 | interfaces specified for a particular programming language, one that
109 | is widely used among developers working in that language.
110 |
111 | The "System Libraries" of an executable work include anything, other
112 | than the work as a whole, that (a) is included in the normal form of
113 | packaging a Major Component, but which is not part of that Major
114 | Component, and (b) serves only to enable use of the work with that
115 | Major Component, or to implement a Standard Interface for which an
116 | implementation is available to the public in source code form. A
117 | "Major Component", in this context, means a major essential component
118 | (kernel, window system, and so on) of the specific operating system
119 | (if any) on which the executable work runs, or a compiler used to
120 | produce the work, or an object code interpreter used to run it.
121 |
122 | The "Corresponding Source" for a work in object code form means all
123 | the source code needed to generate, install, and (for an executable
124 | work) run the object code and to modify the work, including scripts to
125 | control those activities. However, it does not include the work's
126 | System Libraries, or general-purpose tools or generally available free
127 | programs which are used unmodified in performing those activities but
128 | which are not part of the work. For example, Corresponding Source
129 | includes interface definition files associated with source files for
130 | the work, and the source code for shared libraries and dynamically
131 | linked subprograms that the work is specifically designed to require,
132 | such as by intimate data communication or control flow between those
133 | subprograms and other parts of the work.
134 |
135 | The Corresponding Source need not include anything that users
136 | can regenerate automatically from other parts of the Corresponding
137 | Source.
138 |
139 | The Corresponding Source for a work in source code form is that
140 | same work.
141 |
142 | 2. Basic Permissions.
143 |
144 | All rights granted under this License are granted for the term of
145 | copyright on the Program, and are irrevocable provided the stated
146 | conditions are met. This License explicitly affirms your unlimited
147 | permission to run the unmodified Program. The output from running a
148 | covered work is covered by this License only if the output, given its
149 | content, constitutes a covered work. This License acknowledges your
150 | rights of fair use or other equivalent, as provided by copyright law.
151 |
152 | You may make, run and propagate covered works that you do not
153 | convey, without conditions so long as your license otherwise remains
154 | in force. You may convey covered works to others for the sole purpose
155 | of having them make modifications exclusively for you, or provide you
156 | with facilities for running those works, provided that you comply with
157 | the terms of this License in conveying all material for which you do
158 | not control copyright. Those thus making or running the covered works
159 | for you must do so exclusively on your behalf, under your direction
160 | and control, on terms that prohibit them from making any copies of
161 | your copyrighted material outside their relationship with you.
162 |
163 | Conveying under any other circumstances is permitted solely under
164 | the conditions stated below. Sublicensing is not allowed; section 10
165 | makes it unnecessary.
166 |
167 | 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
168 |
169 | No covered work shall be deemed part of an effective technological
170 | measure under any applicable law fulfilling obligations under article
171 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or
172 | similar laws prohibiting or restricting circumvention of such
173 | measures.
174 |
175 | When you convey a covered work, you waive any legal power to forbid
176 | circumvention of technological measures to the extent such circumvention
177 | is effected by exercising rights under this License with respect to
178 | the covered work, and you disclaim any intention to limit operation or
179 | modification of the work as a means of enforcing, against the work's
180 | users, your or third parties' legal rights to forbid circumvention of
181 | technological measures.
182 |
183 | 4. Conveying Verbatim Copies.
184 |
185 | You may convey verbatim copies of the Program's source code as you
186 | receive it, in any medium, provided that you conspicuously and
187 | appropriately publish on each copy an appropriate copyright notice;
188 | keep intact all notices stating that this License and any
189 | non-permissive terms added in accord with section 7 apply to the code;
190 | keep intact all notices of the absence of any warranty; and give all
191 | recipients a copy of this License along with the Program.
192 |
193 | You may charge any price or no price for each copy that you convey,
194 | and you may offer support or warranty protection for a fee.
195 |
196 | 5. Conveying Modified Source Versions.
197 |
198 | You may convey a work based on the Program, or the modifications to
199 | produce it from the Program, in the form of source code under the
200 | terms of section 4, provided that you also meet all of these conditions:
201 |
202 | a) The work must carry prominent notices stating that you modified
203 | it, and giving a relevant date.
204 |
205 | b) The work must carry prominent notices stating that it is
206 | released under this License and any conditions added under section
207 | 7. This requirement modifies the requirement in section 4 to
208 | "keep intact all notices".
209 |
210 | c) You must license the entire work, as a whole, under this
211 | License to anyone who comes into possession of a copy. This
212 | License will therefore apply, along with any applicable section 7
213 | additional terms, to the whole of the work, and all its parts,
214 | regardless of how they are packaged. This License gives no
215 | permission to license the work in any other way, but it does not
216 | invalidate such permission if you have separately received it.
217 |
218 | d) If the work has interactive user interfaces, each must display
219 | Appropriate Legal Notices; however, if the Program has interactive
220 | interfaces that do not display Appropriate Legal Notices, your
221 | work need not make them do so.
222 |
223 | A compilation of a covered work with other separate and independent
224 | works, which are not by their nature extensions of the covered work,
225 | and which are not combined with it such as to form a larger program,
226 | in or on a volume of a storage or distribution medium, is called an
227 | "aggregate" if the compilation and its resulting copyright are not
228 | used to limit the access or legal rights of the compilation's users
229 | beyond what the individual works permit. Inclusion of a covered work
230 | in an aggregate does not cause this License to apply to the other
231 | parts of the aggregate.
232 |
233 | 6. Conveying Non-Source Forms.
234 |
235 | You may convey a covered work in object code form under the terms
236 | of sections 4 and 5, provided that you also convey the
237 | machine-readable Corresponding Source under the terms of this License,
238 | in one of these ways:
239 |
240 | a) Convey the object code in, or embodied in, a physical product
241 | (including a physical distribution medium), accompanied by the
242 | Corresponding Source fixed on a durable physical medium
243 | customarily used for software interchange.
244 |
245 | b) Convey the object code in, or embodied in, a physical product
246 | (including a physical distribution medium), accompanied by a
247 | written offer, valid for at least three years and valid for as
248 | long as you offer spare parts or customer support for that product
249 | model, to give anyone who possesses the object code either (1) a
250 | copy of the Corresponding Source for all the software in the
251 | product that is covered by this License, on a durable physical
252 | medium customarily used for software interchange, for a price no
253 | more than your reasonable cost of physically performing this
254 | conveying of source, or (2) access to copy the
255 | Corresponding Source from a network server at no charge.
256 |
257 | c) Convey individual copies of the object code with a copy of the
258 | written offer to provide the Corresponding Source. This
259 | alternative is allowed only occasionally and noncommercially, and
260 | only if you received the object code with such an offer, in accord
261 | with subsection 6b.
262 |
263 | d) Convey the object code by offering access from a designated
264 | place (gratis or for a charge), and offer equivalent access to the
265 | Corresponding Source in the same way through the same place at no
266 | further charge. You need not require recipients to copy the
267 | Corresponding Source along with the object code. If the place to
268 | copy the object code is a network server, the Corresponding Source
269 | may be on a different server (operated by you or a third party)
270 | that supports equivalent copying facilities, provided you maintain
271 | clear directions next to the object code saying where to find the
272 | Corresponding Source. Regardless of what server hosts the
273 | Corresponding Source, you remain obligated to ensure that it is
274 | available for as long as needed to satisfy these requirements.
275 |
276 | e) Convey the object code using peer-to-peer transmission, provided
277 | you inform other peers where the object code and Corresponding
278 | Source of the work are being offered to the general public at no
279 | charge under subsection 6d.
280 |
281 | A separable portion of the object code, whose source code is excluded
282 | from the Corresponding Source as a System Library, need not be
283 | included in conveying the object code work.
284 |
285 | A "User Product" is either (1) a "consumer product", which means any
286 | tangible personal property which is normally used for personal, family,
287 | or household purposes, or (2) anything designed or sold for incorporation
288 | into a dwelling. In determining whether a product is a consumer product,
289 | doubtful cases shall be resolved in favor of coverage. For a particular
290 | product received by a particular user, "normally used" refers to a
291 | typical or common use of that class of product, regardless of the status
292 | of the particular user or of the way in which the particular user
293 | actually uses, or expects or is expected to use, the product. A product
294 | is a consumer product regardless of whether the product has substantial
295 | commercial, industrial or non-consumer uses, unless such uses represent
296 | the only significant mode of use of the product.
297 |
298 | "Installation Information" for a User Product means any methods,
299 | procedures, authorization keys, or other information required to install
300 | and execute modified versions of a covered work in that User Product from
301 | a modified version of its Corresponding Source. The information must
302 | suffice to ensure that the continued functioning of the modified object
303 | code is in no case prevented or interfered with solely because
304 | modification has been made.
305 |
306 | If you convey an object code work under this section in, or with, or
307 | specifically for use in, a User Product, and the conveying occurs as
308 | part of a transaction in which the right of possession and use of the
309 | User Product is transferred to the recipient in perpetuity or for a
310 | fixed term (regardless of how the transaction is characterized), the
311 | Corresponding Source conveyed under this section must be accompanied
312 | by the Installation Information. But this requirement does not apply
313 | if neither you nor any third party retains the ability to install
314 | modified object code on the User Product (for example, the work has
315 | been installed in ROM).
316 |
317 | The requirement to provide Installation Information does not include a
318 | requirement to continue to provide support service, warranty, or updates
319 | for a work that has been modified or installed by the recipient, or for
320 | the User Product in which it has been modified or installed. Access to a
321 | network may be denied when the modification itself materially and
322 | adversely affects the operation of the network or violates the rules and
323 | protocols for communication across the network.
324 |
325 | Corresponding Source conveyed, and Installation Information provided,
326 | in accord with this section must be in a format that is publicly
327 | documented (and with an implementation available to the public in
328 | source code form), and must require no special password or key for
329 | unpacking, reading or copying.
330 |
331 | 7. Additional Terms.
332 |
333 | "Additional permissions" are terms that supplement the terms of this
334 | License by making exceptions from one or more of its conditions.
335 | Additional permissions that are applicable to the entire Program shall
336 | be treated as though they were included in this License, to the extent
337 | that they are valid under applicable law. If additional permissions
338 | apply only to part of the Program, that part may be used separately
339 | under those permissions, but the entire Program remains governed by
340 | this License without regard to the additional permissions.
341 |
342 | When you convey a copy of a covered work, you may at your option
343 | remove any additional permissions from that copy, or from any part of
344 | it. (Additional permissions may be written to require their own
345 | removal in certain cases when you modify the work.) You may place
346 | additional permissions on material, added by you to a covered work,
347 | for which you have or can give appropriate copyright permission.
348 |
349 | Notwithstanding any other provision of this License, for material you
350 | add to a covered work, you may (if authorized by the copyright holders of
351 | that material) supplement the terms of this License with terms:
352 |
353 | a) Disclaiming warranty or limiting liability differently from the
354 | terms of sections 15 and 16 of this License; or
355 |
356 | b) Requiring preservation of specified reasonable legal notices or
357 | author attributions in that material or in the Appropriate Legal
358 | Notices displayed by works containing it; or
359 |
360 | c) Prohibiting misrepresentation of the origin of that material, or
361 | requiring that modified versions of such material be marked in
362 | reasonable ways as different from the original version; or
363 |
364 | d) Limiting the use for publicity purposes of names of licensors or
365 | authors of the material; or
366 |
367 | e) Declining to grant rights under trademark law for use of some
368 | trade names, trademarks, or service marks; or
369 |
370 | f) Requiring indemnification of licensors and authors of that
371 | material by anyone who conveys the material (or modified versions of
372 | it) with contractual assumptions of liability to the recipient, for
373 | any liability that these contractual assumptions directly impose on
374 | those licensors and authors.
375 |
376 | All other non-permissive additional terms are considered "further
377 | restrictions" within the meaning of section 10. If the Program as you
378 | received it, or any part of it, contains a notice stating that it is
379 | governed by this License along with a term that is a further
380 | restriction, you may remove that term. If a license document contains
381 | a further restriction but permits relicensing or conveying under this
382 | License, you may add to a covered work material governed by the terms
383 | of that license document, provided that the further restriction does
384 | not survive such relicensing or conveying.
385 |
386 | If you add terms to a covered work in accord with this section, you
387 | must place, in the relevant source files, a statement of the
388 | additional terms that apply to those files, or a notice indicating
389 | where to find the applicable terms.
390 |
391 | Additional terms, permissive or non-permissive, may be stated in the
392 | form of a separately written license, or stated as exceptions;
393 | the above requirements apply either way.
394 |
395 | 8. Termination.
396 |
397 | You may not propagate or modify a covered work except as expressly
398 | provided under this License. Any attempt otherwise to propagate or
399 | modify it is void, and will automatically terminate your rights under
400 | this License (including any patent licenses granted under the third
401 | paragraph of section 11).
402 |
403 | However, if you cease all violation of this License, then your
404 | license from a particular copyright holder is reinstated (a)
405 | provisionally, unless and until the copyright holder explicitly and
406 | finally terminates your license, and (b) permanently, if the copyright
407 | holder fails to notify you of the violation by some reasonable means
408 | prior to 60 days after the cessation.
409 |
410 | Moreover, your license from a particular copyright holder is
411 | reinstated permanently if the copyright holder notifies you of the
412 | violation by some reasonable means, this is the first time you have
413 | received notice of violation of this License (for any work) from that
414 | copyright holder, and you cure the violation prior to 30 days after
415 | your receipt of the notice.
416 |
417 | Termination of your rights under this section does not terminate the
418 | licenses of parties who have received copies or rights from you under
419 | this License. If your rights have been terminated and not permanently
420 | reinstated, you do not qualify to receive new licenses for the same
421 | material under section 10.
422 |
423 | 9. Acceptance Not Required for Having Copies.
424 |
425 | You are not required to accept this License in order to receive or
426 | run a copy of the Program. Ancillary propagation of a covered work
427 | occurring solely as a consequence of using peer-to-peer transmission
428 | to receive a copy likewise does not require acceptance. However,
429 | nothing other than this License grants you permission to propagate or
430 | modify any covered work. These actions infringe copyright if you do
431 | not accept this License. Therefore, by modifying or propagating a
432 | covered work, you indicate your acceptance of this License to do so.
433 |
434 | 10. Automatic Licensing of Downstream Recipients.
435 |
436 | Each time you convey a covered work, the recipient automatically
437 | receives a license from the original licensors, to run, modify and
438 | propagate that work, subject to this License. You are not responsible
439 | for enforcing compliance by third parties with this License.
440 |
441 | An "entity transaction" is a transaction transferring control of an
442 | organization, or substantially all assets of one, or subdividing an
443 | organization, or merging organizations. If propagation of a covered
444 | work results from an entity transaction, each party to that
445 | transaction who receives a copy of the work also receives whatever
446 | licenses to the work the party's predecessor in interest had or could
447 | give under the previous paragraph, plus a right to possession of the
448 | Corresponding Source of the work from the predecessor in interest, if
449 | the predecessor has it or can get it with reasonable efforts.
450 |
451 | You may not impose any further restrictions on the exercise of the
452 | rights granted or affirmed under this License. For example, you may
453 | not impose a license fee, royalty, or other charge for exercise of
454 | rights granted under this License, and you may not initiate litigation
455 | (including a cross-claim or counterclaim in a lawsuit) alleging that
456 | any patent claim is infringed by making, using, selling, offering for
457 | sale, or importing the Program or any portion of it.
458 |
459 | 11. Patents.
460 |
461 | A "contributor" is a copyright holder who authorizes use under this
462 | License of the Program or a work on which the Program is based. The
463 | work thus licensed is called the contributor's "contributor version".
464 |
465 | A contributor's "essential patent claims" are all patent claims
466 | owned or controlled by the contributor, whether already acquired or
467 | hereafter acquired, that would be infringed by some manner, permitted
468 | by this License, of making, using, or selling its contributor version,
469 | but do not include claims that would be infringed only as a
470 | consequence of further modification of the contributor version. For
471 | purposes of this definition, "control" includes the right to grant
472 | patent sublicenses in a manner consistent with the requirements of
473 | this License.
474 |
475 | Each contributor grants you a non-exclusive, worldwide, royalty-free
476 | patent license under the contributor's essential patent claims, to
477 | make, use, sell, offer for sale, import and otherwise run, modify and
478 | propagate the contents of its contributor version.
479 |
480 | In the following three paragraphs, a "patent license" is any express
481 | agreement or commitment, however denominated, not to enforce a patent
482 | (such as an express permission to practice a patent or covenant not to
483 | sue for patent infringement). To "grant" such a patent license to a
484 | party means to make such an agreement or commitment not to enforce a
485 | patent against the party.
486 |
487 | If you convey a covered work, knowingly relying on a patent license,
488 | and the Corresponding Source of the work is not available for anyone
489 | to copy, free of charge and under the terms of this License, through a
490 | publicly available network server or other readily accessible means,
491 | then you must either (1) cause the Corresponding Source to be so
492 | available, or (2) arrange to deprive yourself of the benefit of the
493 | patent license for this particular work, or (3) arrange, in a manner
494 | consistent with the requirements of this License, to extend the patent
495 | license to downstream recipients. "Knowingly relying" means you have
496 | actual knowledge that, but for the patent license, your conveying the
497 | covered work in a country, or your recipient's use of the covered work
498 | in a country, would infringe one or more identifiable patents in that
499 | country that you have reason to believe are valid.
500 |
501 | If, pursuant to or in connection with a single transaction or
502 | arrangement, you convey, or propagate by procuring conveyance of, a
503 | covered work, and grant a patent license to some of the parties
504 | receiving the covered work authorizing them to use, propagate, modify
505 | or convey a specific copy of the covered work, then the patent license
506 | you grant is automatically extended to all recipients of the covered
507 | work and works based on it.
508 |
509 | A patent license is "discriminatory" if it does not include within
510 | the scope of its coverage, prohibits the exercise of, or is
511 | conditioned on the non-exercise of one or more of the rights that are
512 | specifically granted under this License. You may not convey a covered
513 | work if you are a party to an arrangement with a third party that is
514 | in the business of distributing software, under which you make payment
515 | to the third party based on the extent of your activity of conveying
516 | the work, and under which the third party grants, to any of the
517 | parties who would receive the covered work from you, a discriminatory
518 | patent license (a) in connection with copies of the covered work
519 | conveyed by you (or copies made from those copies), or (b) primarily
520 | for and in connection with specific products or compilations that
521 | contain the covered work, unless you entered into that arrangement,
522 | or that patent license was granted, prior to 28 March 2007.
523 |
524 | Nothing in this License shall be construed as excluding or limiting
525 | any implied license or other defenses to infringement that may
526 | otherwise be available to you under applicable patent law.
527 |
528 | 12. No Surrender of Others' Freedom.
529 |
530 | If conditions are imposed on you (whether by court order, agreement or
531 | otherwise) that contradict the conditions of this License, they do not
532 | excuse you from the conditions of this License. If you cannot convey a
533 | covered work so as to satisfy simultaneously your obligations under this
534 | License and any other pertinent obligations, then as a consequence you may
535 | not convey it at all. For example, if you agree to terms that obligate you
536 | to collect a royalty for further conveying from those to whom you convey
537 | the Program, the only way you could satisfy both those terms and this
538 | License would be to refrain entirely from conveying the Program.
539 |
540 | 13. Remote Network Interaction; Use with the GNU General Public License.
541 |
542 | Notwithstanding any other provision of this License, if you modify the
543 | Program, your modified version must prominently offer all users
544 | interacting with it remotely through a computer network (if your version
545 | supports such interaction) an opportunity to receive the Corresponding
546 | Source of your version by providing access to the Corresponding Source
547 | from a network server at no charge, through some standard or customary
548 | means of facilitating copying of software. This Corresponding Source
549 | shall include the Corresponding Source for any work covered by version 3
550 | of the GNU General Public License that is incorporated pursuant to the
551 | following paragraph.
552 |
553 | Notwithstanding any other provision of this License, you have
554 | permission to link or combine any covered work with a work licensed
555 | under version 3 of the GNU General Public License into a single
556 | combined work, and to convey the resulting work. The terms of this
557 | License will continue to apply to the part which is the covered work,
558 | but the work with which it is combined will remain governed by version
559 | 3 of the GNU General Public License.
560 |
561 | 14. Revised Versions of this License.
562 |
563 | The Free Software Foundation may publish revised and/or new versions of
564 | the GNU Affero General Public License from time to time. Such new versions
565 | will be similar in spirit to the present version, but may differ in detail to
566 | address new problems or concerns.
567 |
568 | Each version is given a distinguishing version number. If the
569 | Program specifies that a certain numbered version of the GNU Affero General
570 | Public License "or any later version" applies to it, you have the
571 | option of following the terms and conditions either of that numbered
572 | version or of any later version published by the Free Software
573 | Foundation. If the Program does not specify a version number of the
574 | GNU Affero General Public License, you may choose any version ever published
575 | by the Free Software Foundation.
576 |
577 | If the Program specifies that a proxy can decide which future
578 | versions of the GNU Affero General Public License can be used, that proxy's
579 | public statement of acceptance of a version permanently authorizes you
580 | to choose that version for the Program.
581 |
582 | Later license versions may give you additional or different
583 | permissions. However, no additional obligations are imposed on any
584 | author or copyright holder as a result of your choosing to follow a
585 | later version.
586 |
587 | 15. Disclaimer of Warranty.
588 |
589 | THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
590 | APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
591 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
592 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
593 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
594 | PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
595 | IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
596 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
597 |
598 | 16. Limitation of Liability.
599 |
600 | IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
601 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
602 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
603 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
604 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
605 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
606 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
607 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
608 | SUCH DAMAGES.
609 |
610 | 17. Interpretation of Sections 15 and 16.
611 |
612 | If the disclaimer of warranty and limitation of liability provided
613 | above cannot be given local legal effect according to their terms,
614 | reviewing courts shall apply local law that most closely approximates
615 | an absolute waiver of all civil liability in connection with the
616 | Program, unless a warranty or assumption of liability accompanies a
617 | copy of the Program in return for a fee.
618 |
619 | END OF TERMS AND CONDITIONS
620 |
621 | How to Apply These Terms to Your New Programs
622 |
623 | If you develop a new program, and you want it to be of the greatest
624 | possible use to the public, the best way to achieve this is to make it
625 | free software which everyone can redistribute and change under these terms.
626 |
627 | To do so, attach the following notices to the program. It is safest
628 | to attach them to the start of each source file to most effectively
629 | state the exclusion of warranty; and each file should have at least
630 | the "copyright" line and a pointer to where the full notice is found.
631 |
632 | <one line to give the program's name and a brief idea of what it does.>
633 | Copyright (C) <year>  <name of author>
634 |
635 | This program is free software: you can redistribute it and/or modify
636 | it under the terms of the GNU Affero General Public License as published
637 | by the Free Software Foundation, either version 3 of the License, or
638 | (at your option) any later version.
639 |
640 | This program is distributed in the hope that it will be useful,
641 | but WITHOUT ANY WARRANTY; without even the implied warranty of
642 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
643 | GNU Affero General Public License for more details.
644 |
645 | You should have received a copy of the GNU Affero General Public License
646 | along with this program. If not, see <https://www.gnu.org/licenses/>.
647 |
648 | Also add information on how to contact you by electronic and paper mail.
649 |
650 | If your software can interact with users remotely through a computer
651 | network, you should also make sure that it provides a way for users to
652 | get its source. For example, if your program is a web application, its
653 | interface could display a "Source" link that leads users to an archive
654 | of the code. There are many ways you could offer source, and different
655 | solutions will be better for different programs; see section 13 for the
656 | specific requirements.
657 |
658 | You should also get your employer (if you work as a programmer) or school,
659 | if any, to sign a "copyright disclaimer" for the program, if necessary.
660 | For more information on this, and how to apply and follow the GNU AGPL, see
661 | <https://www.gnu.org/licenses/>.
662 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # TaintMini
2 |
3 | TaintMini is a framework for detecting flows of sensitive data in Mini-Programs with static taint analysis. It is built on a novel universal data flow graph approach that captures data flows
4 | within and across mini-programs.
5 |
6 | ![TaintMini overview](figure/taint-mini.svg)
7 |
8 | We implemented TaintMini based on `pdg_js` (from [DoubleX](https://github.com/Aurore54F/DoubleX) by [Aurore Fass](https://aurore54f.github.io/) *et al*.). For more implementation details, please refer to our [paper](https://chaowang.dev/publications/icse23.pdf) and the [DoubleX paper](https://swag.cispa.saarland/papers/fass2021doublex.pdf).
9 |
10 | ## Table of contents
11 |
12 | - [TaintMini](#taintmini)
13 | - [Table of contents](#table-of-contents)
14 | - [Prerequisites](#prerequisites)
15 | - [Environment](#environment)
16 | - [Dependencies](#dependencies)
17 | - [Pre-processing](#pre-processing)
18 | - [Usage](#usage)
19 | - [Config](#config)
20 | - [Examples](#examples)
21 | - [Single MiniProgram](#single-miniprogram)
22 | - [Multiple MiniPrograms](#multiple-miniprograms)
23 | - [Citation](#citation)
24 | - [License](#license)
25 |
26 | ## Prerequisites
27 |
28 | ### Environment
29 |
30 | For optimal performance, we recommend allocating at least 4 cores and 16 GiB of memory to run the tool.
31 | Additionally, for best IO performance during analysis, we recommend using SSDs rather than hard disk drives, due to the large number of small files (less than one page size) that Mini-Programs typically have.
32 | As a reference, we used 16 vCPUs of Intel Xeon Silver 4314, 128 GiB of 3200 MHz DDR4 memory, and 2 TiB of NVMe SSD (700 KIOPS) as the host for building and validating our artifact evaluation submission.
33 |
34 | ### Dependencies
35 |
36 | Install Node.js dependencies for `pdg_js` first.
37 |
38 | ```bash
39 | # make sure node.js and npm are installed
40 | node --version && cd pdg_js && npm i
41 | ```
42 |
43 | Install the Python requirements.
44 |
45 | ```bash
46 | # install requirements
47 | pip install -r requirements.txt
48 | ```
49 |
50 | ### Pre-processing
51 |
52 | TaintMini operates on unpacked WeChat Mini-Programs, necessitating the use of a WeChat Mini-Program unpacking tool in advance.
53 | Please note that we are unable to provide such a tool directly due to potential legal implications.
54 | We recommend seeking it out on external websites.
55 |
56 | ## Usage
57 |
58 | ```
59 | usage: taint-mini [-h] -i path [-o path] [-c path] [-j number] [-b]
60 |
61 | optional arguments:
62 | -h, --help show this help message and exit
63 | -i path, --input path
64 | path of input mini program(s). Single mini program directory or index files will both be fine.
65 | -o path, --output path
66 | path of output results. The output file will be stored outside of the mini program directories.
67 | -c path, --config path
68 | path of config file. See default config file for example. Leave the field empty to include all results.
69 | -j number, --jobs number
70 | number of workers.
71 | -b, --bench enable benchmark data log. Default: False
72 | ```
73 |
74 | Results will be written to the directory provided by the `-o/--output` flag.
75 | Result files are named `$(basename <mini program path>)-result.csv`,
76 | along with `$(basename <mini program path>)-bench.csv` if the `-b/--bench` option is present.
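For example, a run on a hypothetical unpacked mini-program directory (real directories are usually named after the `wx*` app id) would produce files like these; paths below are illustrative only:

```bash
# hypothetical input directory named wx0123abcd; adjust paths to your setup
python main.py -i /path/to/wx0123abcd -o ./results -b
# expected output files:
#   ./results/wx0123abcd-result.csv   # detected source-to-sink flows
#   ./results/wx0123abcd-bench.csv    # benchmark log (only with -b/--bench)
```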
77 |
78 | ## Config
79 |
80 | The config file is a JSON-formatted file with two fields, `sources` and `sinks`:
81 |
82 | - `sources` is an array of the source APIs to include. Note the special value `[double_binding]`, which indicates data flows originating from `WXML`.
83 | - `sinks` is an array of the sink APIs to include.
84 |
85 | For an example, please refer to the default `config.json` file, or see the sketch below.
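A minimal sketch of writing and passing a custom config (the file name `my-config.json` is illustrative; the API values are taken from the default `config.json` in this repository):

```bash
# write a custom config; the values below mirror entries from the default config.json
cat > my-config.json <<'EOF'
{
  "sources": ["wx.getStorageSync", "[double_binding]"],
  "sinks": ["wx.navigateToMiniProgram"]
}
EOF
# pass it with -c; omitting -c includes all sources and sinks
python main.py -i /path/to/miniprogram -o ./results -c my-config.json
```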
86 |
87 | ## Examples
88 |
89 | ### Single MiniProgram
90 |
91 | Analyze a single MiniProgram, include all sources and sinks, enable multi-processing (all available CPU cores), and skip benchmark logging.
92 |
93 | ```bash
94 | python main.py -i /path/to/miniprogram -o ./results -j $(nproc)
95 | ```
96 |
97 | ### Multiple MiniPrograms
98 |
99 | Analyze multiple MiniPrograms, include all sources and sinks, enable multi-processing (all available CPU cores), and log benchmark data.
100 |
101 | ```bash
102 | # generate index
103 | find /path/to/miniprograms -maxdepth 1 -type d -name "wx*" > index.txt
104 | # start analysis
105 | python main.py -i ./index.txt -o ./results -j $(nproc) -b
106 | ```
107 |
108 | ## Citation
109 |
110 | If you find TaintMini useful, please consider citing our paper and DoubleX:
111 |
112 | ```plaintext
113 | @inproceedings{wang2023taintmini,
114 | title={TAINTMINI: Detecting Flow of Sensitive Data in Mini-Programs with Static Taint Analysis},
115 | author={Wang, Chao and Ko, Ronny and Zhang, Yue and Yang, Yuqing and Lin, Zhiqiang},
116 | booktitle={Proceedings of the 45th International Conference on Software Engineering},
117 | year={2023}
118 | }
119 |
120 | @inproceedings{fass2021doublex,
121 | author="Aurore Fass and Doli{\`e}re Francis Som{\'e} and Michael Backes and Ben Stock",
122 | title="{\textsc{DoubleX}: Statically Detecting Vulnerable Data Flows in Browser Extensions at Scale}",
123 | booktitle="ACM CCS",
124 | year="2021"
125 | }
126 | ```
127 |
128 | ## License
129 |
130 | This project is licensed under the terms of the AGPLv3 license.
131 |
132 | * **pdg_js** is credited to [**DoubleX**](https://github.com/Aurore54F/DoubleX/)
133 |
134 |
135 |
--------------------------------------------------------------------------------
/config.json:
--------------------------------------------------------------------------------
1 | {
2 | "sources": ["wx.getStorage", "wx.getStorageSync", "[double_binding]"],
3 | "sinks": ["wx.setStorage", "wx.setStorageSync", "wx.navigateTo", "wx.navigateToMiniProgram"]
4 | }
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
1 | import json
2 | from taint_mini import taintmini
3 | import argparse
4 | import os
5 |
6 |
7 | def main():
8 | parser = argparse.ArgumentParser(prog="taint-mini",
9 | formatter_class=argparse.RawTextHelpFormatter)
10 |
11 | parser.add_argument("-i", "--input", dest="input", metavar="path", type=str, required=True,
12 | help="path of input mini program(s)."
13 | " Single mini program directory or index files will both be fine.")
14 | parser.add_argument("-o", "--output", dest="output", metavar="path", type=str, default="results",
15 | help="path of output results."
16 | " The output file will be stored outside of the mini program directories.")
17 | parser.add_argument("-c", "--config", dest="config", metavar="path", type=str,
18 | help="path of config file."
19 | " See default config file for example. Leave the field empty to include all results.")
20 | parser.add_argument("-j", "--jobs", dest="workers", metavar="number", type=int, default=None,
21 | help="number of workers.")
22 | parser.add_argument("-b", "--bench", dest="bench", action="store_true",
23 | help="enable benchmark data log."
24 | " Default: False")
25 |
26 | args = parser.parse_args()
27 | input_path = args.input
28 | output_path = args.output
29 | config_path = args.config
30 | workers = args.workers
31 | bench = args.bench
32 |
33 | # test config
34 | config = None
35 | if config_path is None:
36 | # no config given, include all sources and sinks
37 | config = dict()
38 | else:
39 | try:
40 | config = json.load(open(config_path))
41 | except FileNotFoundError:
42 | print(f"[main] error: config not found")
43 | exit(-1)
44 |
45 | # test input_path
46 | if os.path.exists(input_path):
47 | if os.path.isfile(input_path):
48 | # handle index files
49 | with open(input_path) as f:
50 | for i in f.readlines():
51 | taintmini.analyze_mini_program(i.strip(), output_path, config, workers, bench)
52 | elif os.path.isdir(input_path):
53 | # handle single mini program
54 | taintmini.analyze_mini_program(input_path, output_path, config, workers, bench)
55 | else:
56 | print(f"[main] error: invalid input path")
57 |
58 |
59 | if __name__ == "__main__":
60 | main()
61 |
--------------------------------------------------------------------------------
/pdg_js/README.md:
--------------------------------------------------------------------------------
1 | # pdg_js
2 |
3 | Statically building the enhanced AST (with control and data flow, as well as pointer analysis information) for JavaScript inputs (sometimes referred to as PDG).
4 |
5 |
6 | ## Setup (if not already done for DoubleX)
7 |
8 | ```
9 | install python3 # (tested with 3.7.3 and 3.7.4)
10 |
11 | install nodejs
12 | install npm
13 | cd src/pdg_js
14 | npm install esprima # (tested with 4.0.1)
15 | npm install escodegen # (tested with 1.14.2 and 2.0.0)
16 | cd ..
17 | ```
18 |
19 | To install graphviz (only for drawing graphs, not yet documented, please open an issue if interested)
20 | ```
21 | pip3 install graphviz
22 | On MacOS: install brew and then brew install graphviz
23 | On Linux: sudo apt-get install graphviz
24 | ```
25 |
26 | ## Usage
27 |
28 | ### PDG Generation - Multiprocessing
29 |
30 | Let's consider a directory `EXTENSIONS` containing several extensions' folders. For each extension, the corresponding folder contains *.js files for each component. We would like to generate the PDGs (= ASTs enhanced with control and data flow, and pointer analysis) of each file. For each extension, the corresponding PDG will be stored in the folder `PDG`.
31 | To generate these PDGs, launch the following shell command from the `pdg_js` folder location:
32 | ```
33 | $ python3 -c "from build_pdg import store_extension_pdg_folder; store_extension_pdg_folder('EXTENSIONS')"
34 | ```
35 |
36 | The corresponding PDGs will be stored in `EXTENSIONS/<extension>/PDG`.
37 |
38 | Currently, we are using 1 CPU, but you can change that by modifying the variable NUM\_WORKERS in `pdg_js/utility_df.py` (**line 51**).
39 |
40 |
41 | ### Single PDG Generation
42 |
43 | To generate the PDG of a specific *.js file, launch the following python3 commands from the `pdg_js` folder location:
44 | ```
45 | >>> from build_pdg import get_data_flow
46 | >>> pdg = get_data_flow('INPUT_FILE', benchmarks=dict())
47 | ```
48 |
49 | By default, the corresponding PDG will not be stored. To store it in an **existing** PDG\_PATH folder, call:
50 | ```
51 | $ python3 -c "from build_pdg import get_data_flow; get_data_flow('INPUT_FILE', benchmarks=dict(), store_pdgs='PDG_PATH')"
52 | ```
53 |
54 |
55 | Note that we added a timeout of 10 min for the data flow/pointer analysis (cf. line 149 of `pdg_js/build_pdg.py`), and a memory limit of 20GB (cf. line 115 of `pdg_js/build_pdg.py`).
--------------------------------------------------------------------------------
/pdg_js/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OSUSecLab/TaintMini/bbf0af5801c9b40f95dff82a040a11e9433a8ed5/pdg_js/__init__.py
--------------------------------------------------------------------------------
/pdg_js/build_ast.py:
--------------------------------------------------------------------------------
1 | # Copyright (C) 2021 Aurore Fass
2 | #
3 | # This program is free software: you can redistribute it and/or modify
4 | # it under the terms of the GNU Affero General Public License as published
5 | # by the Free Software Foundation, either version 3 of the License, or
6 | # (at your option) any later version.
7 | #
8 | # This program is distributed in the hope that it will be useful,
9 | # but WITHOUT ANY WARRANTY; without even the implied warranty of
10 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
11 | # GNU Affero General Public License for more details.
12 | #
13 | # You should have received a copy of the GNU Affero General Public License
14 | # along with this program. If not, see <https://www.gnu.org/licenses/>.
15 |
16 |
17 | """
18 | From JS source code to an Esprima AST exported in JSON.
19 | From JSON to ExtendedAst and Node objects.
20 | From Node objects to JSON.
21 | From JSON to JS source code using Escodegen.
22 | """
23 |
24 | # Note: improved from HideNoSeek (bugs correction + semantic information to the nodes)
25 |
26 |
27 | import logging
28 | import json
29 | import os
30 | import subprocess
31 |
32 | from . import node as _node
33 | from . import extended_ast as _extended_ast
34 |
35 | SRC_PATH = os.path.abspath(os.path.join(os.path.dirname(__file__)))
36 |
37 |
38 | def get_extended_ast(input_file, json_path, remove_json=False):
39 | """
40 | JavaScript AST production.
41 |
42 | -------
43 | Parameters:
44 | - input_file: str
45 | Path of the file to produce an AST from.
46 | - json_path: str
47 | Path of the JSON file to temporary store the AST in.
48 | - remove_json: bool
49 | Indicates whether to remove the JSON file containing the Esprima AST.
50 | Default: False.
51 |
52 | -------
53 | Returns:
54 | - ExtendedAst
55 | The extended AST (i.e., contains type, filename, body, sourceType, range, comments,
56 | tokens, and possibly leadingComments) of input_file.
57 | - None if an error occurred.
58 | """
59 |
60 | try:
61 | produce_ast = subprocess.run(['node', os.path.join(SRC_PATH, 'parser.js'),
62 | input_file, json_path],
63 | stdout=subprocess.PIPE, check=True)
64 | except subprocess.CalledProcessError:
65 | logging.critical('Esprima parsing error for %s', input_file)
66 | return None
67 |
68 | if produce_ast.returncode == 0:
69 |
70 | with open(json_path) as json_data:
71 | esprima_ast = json.loads(json_data.read())
72 | if remove_json:
73 | os.remove(json_path)
74 |
75 | extended_ast = _extended_ast.ExtendedAst()
76 | extended_ast.filename = input_file
77 | extended_ast.set_type(esprima_ast['type'])
78 | extended_ast.set_body(esprima_ast['body'])
79 | extended_ast.set_source_type(esprima_ast['sourceType'])
80 | extended_ast.set_range(esprima_ast['range'])
81 | extended_ast.set_tokens(esprima_ast['tokens'])
82 | extended_ast.set_comments(esprima_ast['comments'])
83 | if 'leadingComments' in esprima_ast:
84 | extended_ast.set_leading_comments(esprima_ast['leadingComments'])
85 |
86 | return extended_ast
87 |
88 | logging.critical('Esprima could not produce an AST for %s', input_file)
89 | return None
90 |
91 |
92 | def indent(depth_dict):
93 | """ Indentation size. """
94 | return '\t' * depth_dict
95 |
96 |
97 | def brace(key):
98 | """ Write a word between braces. """
99 | return '|<' + key + '>'
100 |
101 |
102 | def print_dict(depth_dict, key, value, max_depth, delete_leaf):
103 | """ Print the content of a dict with specific indentation and braces for the keys. """
104 | if depth_dict <= max_depth:
105 | print('%s%s' % (indent(depth_dict), brace(key)))
106 | beautiful_print_ast(value, depth=depth_dict + 1, max_depth=max_depth,
107 | delete_leaf=delete_leaf)
108 |
109 |
110 | def print_value(depth_dict, key, value, max_depth, delete_leaf):
111 | """ Print a dict value with respect to the indentation. """
112 | if depth_dict <= max_depth:
113 | if all(dont_consider != key for dont_consider in delete_leaf):
114 | print(indent(depth_dict) + "| %s = %s" % (key, value))
115 |
116 |
117 | def beautiful_print_ast(ast, delete_leaf, depth=0, max_depth=2 ** 63):
118 | """
119 | Walking through an AST and printing it beautifully
120 |
121 | -------
122 | Parameters:
123 | - ast: dict
124 | Contains an Esprima AST of a JS file, i.e., get_extended_ast(<input_file>, <json_path>)
125 | output or get_extended_ast(<input_file>, <json_path>).get_ast() output.
126 | - depth: int
127 | Initial depth of the tree. Default: 0.
128 | - max_depth: int
129 | Indicates the depth up to which the AST is printed. Default: 2**63.
130 | - delete_leaf: list
131 | Contains the leaf that should not be printed (e.g. 'range'). Default: [''],
132 | beware it is mutable.
133 | """
134 |
135 | for k, v in ast.items(): # Because need k everywhere
136 | if isinstance(v, dict):
137 | print_dict(depth, k, v, max_depth, delete_leaf)
138 | elif isinstance(v, list):
139 | if not v:
140 | print_value(depth, k, v, max_depth, delete_leaf)
141 | for el in v:
142 | if isinstance(el, dict):
143 | print_dict(depth, k, el, max_depth, delete_leaf)
144 | else:
145 | print_value(depth, k, el, max_depth, delete_leaf)
146 | else:
147 | print_value(depth, k, v, max_depth, delete_leaf)
148 |
149 |
150 | def create_node(dico, node_body, parent_node, cond=False, filename=''):
151 | """ Node creation. """
152 |
153 | if dico is None: # Not a Node, but needed a construct to store, e.g., [, a] = array
154 | node = _node.Node(name='None', parent=parent_node)
155 | parent_node.set_child(node)
156 | node.set_body(node_body)
157 | if cond:
158 | node.set_body_list(True)
159 | node.filename = filename
160 |
161 | elif 'type' in dico:
162 | if dico['type'] == 'FunctionDeclaration':
163 | node = _node.FunctionDeclaration(name=dico['type'], parent=parent_node)
164 | elif dico['type'] == 'FunctionExpression' or dico['type'] == 'ArrowFunctionExpression':
165 | node = _node.FunctionExpression(name=dico['type'], parent=parent_node)
166 | elif dico['type'] == 'ReturnStatement':
167 | node = _node.ReturnStatement(name=dico['type'], parent=parent_node)
168 | elif dico['type'] in _node.STATEMENTS:
169 | node = _node.Statement(name=dico['type'], parent=parent_node)
170 | elif dico['type'] in _node.VALUE_EXPR:
171 | node = _node.ValueExpr(name=dico['type'], parent=parent_node)
172 | elif dico['type'] == 'Identifier':
173 | node = _node.Identifier(name=dico['type'], parent=parent_node)
174 | else:
175 | node = _node.Node(name=dico['type'], parent=parent_node)
176 |
177 | if not node.is_comment(): # Otherwise comments are children and it is getting messy!
178 | parent_node.set_child(node)
179 | node.set_body(node_body)
180 | if cond:
181 | node.set_body_list(True) # Some attributes are stored in a list even when they
182 |                 # are alone. If we do not respect the initial syntax, Escodegen cannot build the
183 | # JS code back.
184 | node.filename = filename
185 | ast_to_ast_nodes(dico, node)
186 |
187 |
188 | def ast_to_ast_nodes(ast, ast_nodes=_node.Node('Program')):
189 | """
190 | Convert an AST to Node objects.
191 |
192 | -------
193 | Parameters:
194 | - ast: dict
195 |         Output of get_extended_ast(<input_file>, <json_path>).get_ast().
196 | - ast_nodes: Node
197 |         Current Node to be built. Default: ast_nodes=Node('Program'). Beware: always call the
198 |         function with this argument passed explicitly, otherwise the previously built value
199 |         will be reused (because the default parameter is mutable).
200 |
201 | -------
202 | Returns:
203 | - Node
204 |         The AST as a Node object.
205 | """
206 |
207 | if 'filename' in ast:
208 | filename = ast['filename']
209 | ast_nodes.set_attribute('filename', filename)
210 | else:
211 | filename = ''
212 |
213 | for k in ast:
214 | if k == 'filename' or k == 'loc' or k == 'range' or k == 'value' \
215 | or (k != 'type' and not isinstance(ast[k], list)
216 | and not isinstance(ast[k], dict)) or k == 'regex':
217 | ast_nodes.set_attribute(k, ast[k]) # range is a list but stored as attributes
218 | if isinstance(ast[k], dict):
219 | if k == 'range': # Case leadingComments as range: {0: begin, 1: end}
220 | ast_nodes.set_attribute(k, ast[k])
221 | else:
222 | create_node(dico=ast[k], node_body=k, parent_node=ast_nodes, filename=filename)
223 | elif isinstance(ast[k], list):
224 | if not ast[k]: # Case with empty list, e.g. params: []
225 | ast_nodes.set_attribute(k, ast[k])
226 | for el in ast[k]:
227 | if isinstance(el, dict):
228 | create_node(dico=el, node_body=k, parent_node=ast_nodes, cond=True,
229 | filename=filename)
230 | elif el is None: # Case [None, {stuff about a}] for [, a] = array
231 | create_node(dico=el, node_body=k, parent_node=ast_nodes, cond=True,
232 | filename=filename)
233 | return ast_nodes
234 |
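# --- Illustrative sketch (added for clarity, not part of the original file) ---
# Because the ast_nodes default parameter is mutable, callers should always pass
# a fresh Node('Program') explicitly, as build_pdg.get_data_flow does:
def _example_ast_to_nodes(ast):
    """ Hypothetical helper: converts an Esprima AST dict into a Node tree. """
    return ast_to_ast_nodes(ast, ast_nodes=_node.Node('Program'))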
235 |
236 | def print_ast_nodes(ast_nodes):
237 | """
238 | Print the Nodes of ast_nodes with their properties.
239 | Debug function.
240 |
241 | -------
242 | Parameters:
243 | - ast_nodes: Node
244 |         Output of ast_to_ast_nodes(<ast>, ast_nodes=Node('Program')).
245 | """
246 |
247 | for child in ast_nodes.children:
248 | print('Parent: ' + child.parent.name)
249 | print('Child: ' + child.name)
250 | print('Id: ' + str(child.id))
251 | print('Attributes:')
252 | print(child.attributes)
253 | print('Body: ' + str(child.body))
254 | print('Body_list: ' + str(child.body_list))
255 | print('Is-leaf: ' + str(child.is_leaf()))
256 | print('-----------------------')
257 | print_ast_nodes(child)
258 |
259 |
260 | def build_json(ast_nodes, dico):
261 | """
262 | Convert an AST format Node objects to JSON format.
263 |
264 | -------
265 | Parameters:
266 | - ast_nodes: Node
267 |         Output of ast_to_ast_nodes(<ast>, ast_nodes=Node('Program')).
268 | - dico: dict
269 | Current dict to be built.
270 |
271 | -------
272 | Returns:
273 | - dict
274 |         The AST in JSON format.
275 | """
276 |
277 | if ast_nodes.name != 'None': # Nothing interesting in the None Node
278 | dico['type'] = ast_nodes.name
279 | if len(ast_nodes.children) >= 1:
280 | for child in ast_nodes.children:
281 | dico2 = {}
282 | if child.body_list:
283 | if child.body not in dico:
284 | dico[child.body] = [] # Some attributes just have to be stored in a list.
285 | build_json(child, dico2)
286 | if not dico2: # Case [, a] = array -> [None, {stuff about a}] (None and not {})
287 |                     dico2 = None  # Unsure whether an empty dict could sometimes be legitimate here
288 | logging.warning('Transformed {} into None for Escodegen; was it legitimate?')
289 | dico[child.body].append(dico2)
290 | else:
291 | build_json(child, dico2)
292 | dico[child.body] = dico2
293 | elif ast_nodes.body_list == 'special':
294 | dico[ast_nodes.body] = []
295 | else:
296 | pass
297 | for att in ast_nodes.attributes:
298 | dico[att] = ast_nodes.attributes[att]
299 | return dico
300 |
301 |
302 | def save_json(ast_nodes, json_path):
303 | """
304 |     Stores an AST (as Node objects) in a JSON file.
305 |
306 | -------
307 | Parameters:
308 | - ast_nodes: Node
309 |         Output of ast_to_ast_nodes(<ast>, ast_nodes=Node('Program')).
310 | - json_path: str
311 | Path of the JSON file to store the AST in.
312 | """
313 |
314 | data = build_json(ast_nodes, dico={})
315 | with open(json_path, 'w') as json_data:
316 | json.dump(data, json_data, indent=4)
317 |
318 |
319 | def get_code(json_path, code_path='1', remove_json=True, test=False):
320 | """
321 | Convert JSON format back to JavaScript code.
322 |
323 | -------
324 | Parameters:
325 | - json_path: str
326 | Path of the JSON file to build the code from.
327 | - code_path: str
328 |         Path of the file to store the code in. If '1', the code is displayed on stdout instead.
329 | - remove_json: bool
330 | Indicates whether to remove or not the JSON file containing the Esprima AST.
331 | Default: True.
332 | - test: bool
333 | Indicates whether we are in test mode. Default: False.
334 | """
335 |
336 | try:
337 | code = subprocess.run(['node', os.path.join(SRC_PATH, 'generate_js.js'),
338 | json_path, code_path],
339 | stdout=subprocess.PIPE, check=True)
340 | except subprocess.CalledProcessError:
341 |         logging.exception('Something went wrong while generating the code from the AST for %s', json_path)
342 | return None
343 |
344 | if remove_json:
345 | os.remove(json_path)
346 | if code.returncode != 0:
347 |         logging.error('Something went wrong while converting the AST back to JS code for %s', json_path)
348 | return None
349 |
350 | if code_path == '1':
351 | if test:
352 | print((code.stdout.decode('utf-8')).replace('\n', ''))
353 | return (code.stdout.decode('utf-8')).replace('\n', '')
354 |
355 | return code_path
356 |
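# --- Illustrative sketch (added for clarity, not part of the original file) ---
# Debugging roundtrip as done by build_pdg when CHECK_JSON is enabled: dump the
# Node-based AST to JSON and let Escodegen regenerate the JS code. The JSON path
# below is hypothetical.
def _example_roundtrip(ast_nodes, json_path='/tmp/ast-back.json'):
    """ Hypothetical helper: Node AST -> JSON -> regenerated JS source. """
    save_json(ast_nodes, json_path)
    return get_code(json_path, code_path='1', remove_json=True)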
--------------------------------------------------------------------------------
/pdg_js/build_pdg.py:
--------------------------------------------------------------------------------
1 | # Copyright (C) 2021 Aurore Fass
2 | # Copyright (C) 2022 Anonymous
3 | #
4 | # This program is free software: you can redistribute it and/or modify
5 | # it under the terms of the GNU Affero General Public License as published
6 | # by the Free Software Foundation, either version 3 of the License, or
7 | # (at your option) any later version.
8 | #
9 | # This program is distributed in the hope that it will be useful,
10 | # but WITHOUT ANY WARRANTY; without even the implied warranty of
11 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
12 | # GNU Affero General Public License for more details.
13 | #
14 | # You should have received a copy of the GNU Affero General Public License
15 | # along with this program. If not, see <https://www.gnu.org/licenses/>.
16 |
17 |
18 | """
19 | Generation and storage of JavaScript PDGs. Possibility for multiprocessing (NUM_WORKERS
20 | defined in utility_df.py).
21 | """
22 |
23 | import os
24 | import pickle
25 | import logging
26 | import timeit
27 | import json
28 | from multiprocessing import Process, Queue
29 |
30 | from . import node as _node
31 | from . import build_ast
32 | from . import utility_df
33 | from . import control_flow
34 | from . import data_flow
35 | from . import scope as _scope
36 | from . import display_graph
37 |
38 | # Builds the JS code from the AST, or not, to check for possible bugs in the AST building process.
39 | CHECK_JSON = utility_df.CHECK_JSON
40 |
41 |
42 | def pickle_dump_process(dfg_nodes, store_pdg):
43 | """ Call to pickle.dump """
44 | pickle.dump(dfg_nodes, open(store_pdg, 'wb'))
45 |
46 |
47 | def function_hoisting(node, entry):
48 | """ Hoists FunctionDeclaration at the beginning of a basic block = Function bloc. """
49 |
50 |     # Avoids problems if a function is called before being defined
51 | for child in node.children:
52 | if child.name == 'FunctionDeclaration':
53 | child.adopt_child(step_daddy=entry) # Sets new parent and deletes old one
54 | function_hoisting(child, entry=child) # New basic block = FunctionDeclaration = child
55 | elif child.name == 'FunctionExpression':
56 | function_hoisting(child, entry=child) # New basic block = FunctionExpression = child
57 | else:
58 | function_hoisting(child, entry=entry) # Current basic block = entry
59 |
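# --- Illustrative sketch (added for clarity, not part of the original file) ---
# Hoisting matters for JS such as "foo(); function foo() {}", where the call site
# appears before the declaration; without hoisting, the data-flow pass would flag
# 'foo' as unknown. The canonical call, mirroring get_data_flow below:
def _example_hoist(ast_nodes):
    """ Hypothetical helper: hoists FunctionDeclarations in place. """
    function_hoisting(ast_nodes, ast_nodes)  # The root is also the entry basic block
    return ast_nodes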
60 |
61 | def traverse(node):
62 | """ Debug function, traverse node. """
63 |
64 | for child in node.children:
65 | print(child.name)
66 | traverse(child)
67 |
68 |
69 | def get_data_flow_process(js_path, benchmarks, store_pdgs):
70 | """ Call to get_data_flow. """
71 |
72 | try:
73 | # save_path_pdg = js_path.split(".")[0]
74 | get_data_flow(input_file=js_path, benchmarks=benchmarks, store_pdgs=store_pdgs,
75 | beautiful_print=False, check_json=False, save_path_pdg=False)
76 | except Exception as e:
77 | print(e)
78 | raise e
79 |
80 |
81 | def get_data_flow(input_file, benchmarks, store_pdgs=None, check_var=False, beautiful_print=False,
82 | save_path_ast=False, save_path_cfg=False, save_path_pdg=False,
83 | check_json=CHECK_JSON, alt_json_path=None):
84 | """
85 | Builds the PDG: enhances the AST with CF, DF, and pointer analysis for a given file.
86 |
87 | -------
88 | Parameters:
89 | - input_file: str
90 | Path of the file to analyze.
91 | - benchmarks: dict
92 |         Contains the different micro-benchmark timings; should be empty when passed in.
93 | - store_pdgs: str
94 | Path of the folder to store the PDG in.
95 |         Or None to continue without storing it.
96 | - check_var: bool
97 | Returns the unknown variables (not the PDG).
98 | - save_path_ast / cfg / pdg:
99 |         False --> neither produces nor stores the graphical representation;
100 | None --> produces + displays the graphical representation;
101 | Valid-path --> produces + stores the graphical representation under the name Valid-path.
102 | - beautiful_print: bool
103 | Whether to beautiful print the AST or not.
104 | - check_json: bool
105 | Builds the JS code from the AST, or not, to check for bugs in the AST building process.
106 |
107 | -------
108 | Returns:
109 | - Node
110 | PDG of the file.
111 |     - or None if the PDG could not be built.
112 | - or list of unknown variables if check_var is True.
113 | """
114 |
115 | start = timeit.default_timer()
116 | utility_df.limit_memory(20*10**9) # Limiting the memory usage to 20GB
117 | if input_file.endswith('.js'):
118 | esprima_json = input_file.replace('.js', '.json')
119 | else:
120 | esprima_json = input_file + '.json'
121 |
122 | if alt_json_path is not None:
123 | if not os.path.exists(alt_json_path):
124 | os.mkdir(alt_json_path)
125 | esprima_json = os.path.join(alt_json_path, esprima_json[1:])
126 | extended_ast = build_ast.get_extended_ast(input_file, esprima_json)
127 |
128 | benchmarks['errors'] = []
129 |
130 | if extended_ast is not None:
131 | benchmarks['got AST'] = timeit.default_timer() - start
132 | start = utility_df.micro_benchmark('Successfully got Esprima AST in',
133 | timeit.default_timer() - start)
134 | ast = extended_ast.get_ast()
135 | if beautiful_print:
136 | build_ast.beautiful_print_ast(ast, delete_leaf=[])
137 | ast_nodes = build_ast.ast_to_ast_nodes(ast, ast_nodes=_node.Node('Program'))
138 | function_hoisting(ast_nodes, ast_nodes) # Hoists FunDecl at a basic block's beginning
139 |
140 | benchmarks['AST'] = timeit.default_timer() - start
141 | start = utility_df.micro_benchmark('Successfully produced the AST in',
142 | timeit.default_timer() - start)
143 | if save_path_ast is not False:
144 | display_graph.draw_ast(ast_nodes, attributes=True, save_path=save_path_ast)
145 |
146 | cfg_nodes = control_flow.control_flow(ast_nodes)
147 | benchmarks['CFG'] = timeit.default_timer() - start
148 | start = utility_df.micro_benchmark('Successfully produced the CFG in',
149 | timeit.default_timer() - start)
150 | if save_path_cfg is not False:
151 | display_graph.draw_cfg(cfg_nodes, attributes=True, save_path=save_path_cfg)
152 |
153 | unknown_var = []
154 | try:
155 | with utility_df.Timeout(600): # Tries to produce DF within 10 minutes
156 | scopes = [_scope.Scope('Global')]
157 | dfg_nodes, scopes = data_flow.df_scoping(cfg_nodes, scopes=scopes,
158 | id_list=[], entry=1)
159 | # This may have to be added if we want to make the fake hoisting work
160 | # dfg_nodes = data_flow.df_scoping(dfg_nodes, scopes=scopes, id_list=[], entry=1)[0]
161 | except utility_df.Timeout.Timeout:
162 | logging.critical('Building the PDG timed out for %s', input_file)
163 | benchmarks['errors'].append('pdg-timeout')
164 | return _node.Node('Program') # Empty PDG to avoid trying to get the children of None
165 |
166 | # except MemoryError: # Catching it will catch ALL memory errors,
167 | # while we just want to avoid getting over our 20GB limit
168 | # logging.critical('Too much memory used for %s', input_file)
169 | # return _node.Node('Program') # Empty PDG to avoid trying to get the children of None
170 |
171 | benchmarks['PDG'] = timeit.default_timer() - start
172 | utility_df.micro_benchmark('Successfully produced the PDG in',
173 | timeit.default_timer() - start)
174 | if save_path_pdg is not False:
175 | display_graph.draw_pdg(dfg_nodes, attributes=True, save_path=save_path_pdg)
176 |
177 | if check_json: # Looking for possible bugs when building the AST / json doc in build_ast
178 | my_json = esprima_json.replace('.json', '-back.json')
179 | build_ast.save_json(dfg_nodes, my_json)
180 | print(build_ast.get_code(my_json))
181 |
182 | if check_var:
183 | for scope in scopes:
184 | for unknown in scope.unknown_var:
185 | if not unknown.data_dep_parents:
186 |                         # If it has data dependencies, it is not unknown; this can happen because
187 |                         # of FunctionDeclaration hoisting: after the second run, it is known
188 | logging.warning('The variable %s is not declared in the scope %s',
189 | unknown.attributes['name'], scope.name)
190 | unknown_var.append(unknown)
191 | return unknown_var
192 |
193 | if store_pdgs is not None:
194 | store_pdg = os.path.join(store_pdgs, os.path.basename(input_file.replace('.js', '')))
195 | pickle_dump_process(dfg_nodes, store_pdg)
196 | json_analysis = os.path.join(store_pdgs, os.path.basename(esprima_json))
197 | with open(json_analysis, 'w') as json_data:
198 | json.dump(benchmarks, json_data, indent=4, sort_keys=False, default=default,
199 | skipkeys=True)
200 | return dfg_nodes
201 | benchmarks['errors'].append('parsing-error')
202 | return _node.Node('ParsingError') # Empty PDG to avoid trying to get the children of None
203 |
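# --- Illustrative sketch (added for clarity, not part of the original file) ---
# Minimal call pattern for get_data_flow, mirroring get_data_flow_process above;
# the input path is hypothetical, and benchmarks starts empty and gets filled
# with timings and errors during the call.
def _example_build_pdg(js_path='/tmp/example.js'):
    """ Hypothetical helper: builds and returns the PDG of a single JS file. """
    benchmarks = {}
    pdg = get_data_flow(input_file=js_path, benchmarks=benchmarks, store_pdgs=None)
    return pdg, benchmarks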
204 |
205 | def default(o):
206 | """ To avoid TypeError, conversion of problematic objects into str. """
207 |
208 | return str(o)
209 |
210 |
211 | def handle_one_pdg(root, js, store_pdgs):
212 | """ Stores the PDG of js located in root, in store_pdgs. """
213 |
214 | benchmarks = dict()
215 | if js.endswith('.js'):
216 | print(os.path.join(store_pdgs, js.replace('.js', '')))
217 | js_path = os.path.join(root, js)
218 | if not os.path.isfile(js_path):
219 | logging.error('The path %s does not exist', js_path)
220 | return False
221 |         # Some PDGs lead to a segfault; a separate process avoids killing the current one
222 | p = Process(target=get_data_flow_process, args=(js_path, benchmarks, store_pdgs))
223 | p.start()
224 | p.join()
225 | if p.exitcode != 0:
226 | logging.critical('Something wrong occurred with %s PDG generation', js_path)
227 | return False
228 | return True
229 |
230 |
231 | def worker(my_queue):
232 | """ Worker """
233 |
234 | while True:
235 | try:
236 | root, js, store_pdgs = my_queue.get(timeout=2)
237 | handle_one_pdg(root, js, store_pdgs)
238 | except Exception as e:
239 | logging.exception(e)
240 | break
241 |
242 |
243 | def store_pdg_folder(folder_js):
244 | """
245 | Stores the PDGs of the JS files from folder_js.
246 |
247 | -------
248 | Parameter:
249 | - folder_js: str
250 | Path of the folder containing the files to get the PDG of.
251 | """
252 |
253 | start = timeit.default_timer()
254 |
255 | my_queue = Queue()
256 | workers = list()
257 |
258 | if not os.path.exists(folder_js):
259 | logging.exception('The path %s does not exist', folder_js)
260 | return
261 | store_pdgs = os.path.join(folder_js, 'PDG')
262 | if not os.path.exists(store_pdgs):
263 | os.makedirs(store_pdgs)
264 |
265 | for root, _, files in os.walk(folder_js):
266 | for js in files:
267 | my_queue.put([root, js, store_pdgs])
268 |
269 | for _ in range(utility_df.NUM_WORKERS):
270 | p = Process(target=worker, args=(my_queue,))
271 | p.start()
272 | print("Starting process")
273 | workers.append(p)
274 |
275 | for w in workers:
276 | w.join()
277 |
278 | utility_df.micro_benchmark('Total elapsed time:', timeit.default_timer() - start)
279 |
280 |
281 | def store_extension_pdg_folder(extensions_path):
282 | """ Stores the PDGs of all JS files contained in all extensions_path's folders. TO CALL"""
283 |
284 | start = timeit.default_timer()
285 |
286 | my_queue = Queue()
287 | workers = list()
288 |
289 | for extension_folder in os.listdir(extensions_path):
290 | extension_path = os.path.join(extensions_path, extension_folder)
291 | if os.path.isdir(extension_path):
292 | extension_pdg_path = os.path.join(extension_path, 'PDG')
293 | if not os.path.exists(extension_pdg_path):
294 | os.makedirs(extension_pdg_path)
295 | for component in os.listdir(extension_path):
296 | # To handle only files not handled yet
297 | # if not os.path.isfile(os.path.join(extension_pdg_path,
298 | # os.path.basename(component).replace('.js',
299 | # ''))):
300 | my_queue.put([extension_path, component, extension_pdg_path])
301 |
302 | for _ in range(utility_df.NUM_WORKERS):
303 | p = Process(target=worker, args=(my_queue,))
304 | p.start()
305 | print("Starting process")
306 | workers.append(p)
307 |
308 | for w in workers:
309 | w.join()
310 |
311 | utility_df.micro_benchmark('Total elapsed time:', timeit.default_timer() - start)
312 |
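# --- Illustrative sketch (added for clarity, not part of the original file) ---
# Typical batch entry points, with hypothetical paths: store_pdg_folder expects a
# folder of JS files, store_extension_pdg_folder a folder of extension folders.
def _example_batch():
    """ Hypothetical helper: builds PDGs for one folder and for a whole corpus. """
    store_pdg_folder('/tmp/js-folder')             # PDGs stored in /tmp/js-folder/PDG
    store_extension_pdg_folder('/tmp/extensions')  # one PDG subfolder per extension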
--------------------------------------------------------------------------------
/pdg_js/control_flow.py:
--------------------------------------------------------------------------------
1 | # Copyright (C) 2021 Aurore Fass
2 | #
3 | # This program is free software: you can redistribute it and/or modify
4 | # it under the terms of the GNU Affero General Public License as published
5 | # by the Free Software Foundation, either version 3 of the License, or
6 | # (at your option) any later version.
7 | #
8 | # This program is distributed in the hope that it will be useful,
9 | # but WITHOUT ANY WARRANTY; without even the implied warranty of
10 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
11 | # GNU Affero General Public License for more details.
12 | #
13 | # You should have received a copy of the GNU Affero General Public License
14 | # along with this program. If not, see <https://www.gnu.org/licenses/>.
15 |
16 |
17 | """
18 | Adds control flow to the AST.
19 | """
20 |
21 | # Note: slightly improved from HideNoSeek
22 |
23 |
24 | from . import node as _node
25 |
26 |
27 | def link_expression(node, node_parent):
28 | """ Non-statement node. """
29 | if node.is_comment():
30 | pass
31 | else:
32 | node_parent.set_statement_dependency(extremity=node)
33 | return node
34 |
35 |
36 | def epsilon_statement_cf(node):
37 | """ Non-conditional statements. """
38 | for child in node.children:
39 | if isinstance(child, _node.Statement):
40 | node.set_control_dependency(extremity=child, label='e')
41 | else:
42 | link_expression(node=child, node_parent=node)
43 |
44 |
45 | def do_while_cf(node):
46 | """ DoWhileStatement. """
47 | # Element 0: body (Statement)
48 | # Element 1: test (Expression)
49 | node.set_control_dependency(extremity=node.children[0], label=True)
50 | link_expression(node=node.children[1], node_parent=node)
51 |
52 |
53 | def for_cf(node):
54 | """ ForStatement. """
55 | # Element 0: init
56 | # Element 1: test (Expression)
57 | # Element 2: update (Expression)
58 | # Element 3: body (Statement)
59 | """ ForOfStatement. """
60 | # Element 0: left
61 | # Element 1: right
62 | # Element 2: body (Statement)
63 | i = 0
64 | for child in node.children:
65 | if child.body != 'body':
66 | link_expression(node=child, node_parent=node)
67 | elif not child.is_comment():
68 | node.set_control_dependency(extremity=child, label=True)
69 | i += 1
70 |
71 |
72 | def if_cf(node):
73 | """ IfStatement. """
74 | # Element 0: test (Expression)
75 | # Element 1: consequent (Statement)
76 | # Element 2: alternate (Statement)
77 | link_expression(node=node.children[0], node_parent=node)
78 | if len(node.children) > 1: # Not sure why, but can happen...
79 | node.set_control_dependency(extremity=node.children[1], label=True)
80 | if len(node.children) > 2:
81 | if node.children[2].is_comment():
82 | pass
83 | else:
84 | node.set_control_dependency(extremity=node.children[2], label=False)
85 |
86 |
87 | def try_cf(node):
88 | """ TryStatement. """
89 | # Element 0: block (Statement)
90 | # Element 1: handler (Statement) / finalizer (Statement)
91 | # Element 2: finalizer (Statement)
92 | node.set_control_dependency(extremity=node.children[0], label=True)
93 | if node.children[1].body == 'handler':
94 | node.set_control_dependency(extremity=node.children[1], label=False)
95 | else: # finalizer
96 | node.set_control_dependency(extremity=node.children[1], label='e')
97 | if len(node.children) > 2:
98 | if node.children[2].body == 'finalizer':
99 | node.set_control_dependency(extremity=node.children[2], label='e')
100 |
101 |
102 | def while_cf(node):
103 | """ WhileStatement. """
104 | # Element 0: test (Expression)
105 | # Element 1: body (Statement)
106 | link_expression(node=node.children[0], node_parent=node)
107 | node.set_control_dependency(extremity=node.children[1], label=True)
108 |
109 |
110 | def switch_cf(node):
111 | """ SwitchStatement. """
112 | # Element 0: discriminant
113 | # Element 1: cases (SwitchCase)
114 |
115 | switch_cases = node.children
116 | link_expression(node=switch_cases[0], node_parent=node)
117 | if len(switch_cases) > 1:
118 |         # SwitchStatement -> e -> SwitchCase for the first one
119 | node.set_control_dependency(extremity=switch_cases[1], label='e')
120 | switch_case_cf(switch_cases[1])
121 | for i in range(2, len(switch_cases)):
122 | if switch_cases[i].is_comment():
123 | pass
124 | else:
125 | # SwitchCase -> False -> SwitchCase for the other ones
126 | switch_cases[i - 1].set_control_dependency(extremity=switch_cases[i], label=False)
127 | if i != len(switch_cases) - 1:
128 | switch_case_cf(switch_cases[i])
129 |             else:  # The last case is executed by default, i.e., without checking a condition first
130 | switch_case_cf(switch_cases[i], last=True)
131 | # Otherwise, we could just have a switch(something) {}
132 |
133 |
134 | def switch_case_cf(node, last=False):
135 | """ SwitchCase. """
136 | # Element 0: test
137 | # Element 1: consequent (Statement)
138 | nb_child = len(node.children)
139 | if nb_child > 1:
140 |         if not last:  # All cases but the last must satisfy a condition to enter the branch
141 | link_expression(node=node.children[0], node_parent=node)
142 | j = 1
143 | else:
144 | j = 0
145 | for i in range(j, nb_child):
146 | if node.children[i].is_comment():
147 | pass
148 | else:
149 | node.set_control_dependency(extremity=node.children[i], label=True)
150 | elif nb_child == 1:
151 | node.set_control_dependency(extremity=node.children[0], label=True)
152 |
153 |
154 | def conditional_statement_cf(node):
155 | """ For the conditional nodes. """
156 | if node.name == 'DoWhileStatement':
157 | do_while_cf(node)
158 | elif node.name == 'ForStatement' or node.name == 'ForOfStatement'\
159 | or node.name == 'ForInStatement':
160 | for_cf(node)
161 | elif node.name == 'IfStatement' or node.name == 'ConditionalExpression':
162 | if_cf(node)
163 | elif node.name == 'WhileStatement':
164 | while_cf(node)
165 | elif node.name == 'TryStatement':
166 | try_cf(node)
167 | elif node.name == 'SwitchStatement':
168 | switch_cf(node)
169 | elif node.name == 'SwitchCase':
170 | pass # Already handled in SwitchStatement
171 |
172 |
173 | def control_flow(ast_nodes):
174 | """
175 | Enhance the AST by adding statement and control dependencies to each Node.
176 |
177 | -------
178 | Parameters:
179 | - ast_nodes: Node
180 | Output of ast_to_ast_nodes(, ast_nodes=Node('Program')).
181 |
182 | -------
183 | Returns:
184 | - Node
185 | With statement and control dependencies added.
186 | """
187 |
188 | for child in ast_nodes.children:
189 | if child.name in _node.EPSILON or child.name in _node.UNSTRUCTURED:
190 | epsilon_statement_cf(child)
191 | elif child.name in _node.CONDITIONAL:
192 | conditional_statement_cf(child)
193 | else:
194 | for grandchild in child.children:
195 | link_expression(node=grandchild, node_parent=child)
196 | control_flow(child)
197 | return ast_nodes
198 |
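# --- Illustrative sketch (added for clarity, not part of the original file) ---
# End-to-end example, assuming a hypothetical JS file on disk: parse it with
# build_ast, convert it to Node objects, add control flow, and list the CF edges.
def _example_cf_edges(js_path='/tmp/example.js'):
    """ Hypothetical helper: prints the control-flow edges of a JS file. """
    from . import build_ast  # local import; only needed for this example
    extended_ast = build_ast.get_extended_ast(js_path, js_path + '.json')
    if extended_ast is None:
        return
    cfg_nodes = control_flow(build_ast.ast_to_ast_nodes(extended_ast.get_ast(),
                                                        ast_nodes=_node.Node('Program')))

    def walk(node):
        for dep in getattr(node, 'control_dep_children', []):
            print('%s -(%s)-> %s' % (node.name, dep.label, dep.extremity.name))
        for child in node.children:
            walk(child)
    walk(cfg_nodes)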
--------------------------------------------------------------------------------
/pdg_js/display_graph.py:
--------------------------------------------------------------------------------
1 | # Copyright (C) 2021 Aurore Fass
2 | #
3 | # This program is free software: you can redistribute it and/or modify
4 | # it under the terms of the GNU Affero General Public License as published
5 | # by the Free Software Foundation, either version 3 of the License, or
6 | # (at your option) any later version.
7 | #
8 | # This program is distributed in the hope that it will be useful,
9 | # but WITHOUT ANY WARRANTY; without even the implied warranty of
10 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
11 | # GNU Affero General Public License for more details.
12 | #
13 | # You should have received a copy of the GNU Affero General Public License
14 | # along with this program. If not, see <https://www.gnu.org/licenses/>.
15 |
16 | # Additional permission under GNU GPL version 3 section 7
17 | #
18 | # If you modify this Program, or any covered work, by linking or combining it with
19 | # graphviz (or a modified version of that library), containing parts covered by the
20 | # terms of The Common Public License, the licensors of this Program grant you
21 | # additional permission to convey the resulting work.
22 |
23 |
24 | """
25 | Display graphs (AST, CFG, PDG) using the graphviz library.
26 | """
27 |
28 | import graphviz
29 |
30 | from . import node as _node
31 |
32 |
33 | def append_leaf_attr(node, graph):
34 | """
35 | Append the leaf's attribute to the graph in graphviz format.
36 |
37 | -------
38 | Parameters:
39 | - node: Node
40 | Node.
41 | - graph: Digraph/Graph
42 | Graph object. Be careful it is mutable.
43 | """
44 |
45 | if node.is_leaf():
46 | leaf_id = str(node.id) + 'leaf_'
47 | graph.attr('node', style='filled', color='lightgoldenrodyellow',
48 | fillcolor='lightgoldenrodyellow')
49 | graph.attr('edge', color='orange')
50 | got_attr, node_attributes = node.get_node_attributes()
51 | if got_attr: # Got attributes
52 | leaf_attr = str(node_attributes)
53 | graph.node(leaf_id, leaf_attr)
54 | graph.edge(str(node.id), leaf_id)
55 |
56 |
57 | def produce_ast(ast_nodes, attributes, graph=graphviz.Graph(comment='AST representation')):
58 | """
59 | Produce an AST in graphviz format.
60 |
61 | -------
62 | Parameters:
63 | - ast_nodes: Node
64 |         Output of ast_to_ast_nodes(<ast>, ast_nodes=Node('Program')).
65 | - graph: Graph
66 | Graph object. Be careful it is mutable.
67 | - attributes: bool
68 | Whether to display the leaf attributes or not.
69 |
70 | -------
71 | Returns:
72 | - graph
73 | graphviz formatted graph.
74 | """
75 |
76 | graph.attr('node', color='black', style='filled', fillcolor='white')
77 | graph.attr('edge', color='black')
78 | graph.node(str(ast_nodes.id), ast_nodes.name)
79 | for child in ast_nodes.children:
80 | graph.attr('node', color='black', style='filled', fillcolor='white')
81 | graph.attr('edge', color='black')
82 | graph.edge(str(ast_nodes.id), str(child.id))
83 | produce_ast(child, attributes, graph)
84 | if attributes:
85 | append_leaf_attr(child, graph)
86 | return graph
87 |
88 |
89 | def draw_ast(ast_nodes, attributes=False, save_path=None):
90 | """
91 | Plot an AST.
92 |
93 | -------
94 | Parameters:
95 | - ast_nodes: Node
96 |         Output of ast_to_ast_nodes(<ast>, ast_nodes=Node('Program')).
97 | - save_path: str
98 | Path of the file to store the AST in.
99 | - attributes: bool
100 | Whether to display the leaf attributes or not. Default: False.
101 | """
102 |
103 | dot = produce_ast(ast_nodes, attributes)
104 | if save_path is None:
105 | dot.view()
106 | else:
107 | dot.render(save_path, view=False)
108 | graphviz.render(filepath=save_path, engine='dot', format='eps')
109 | dot.clear()
110 |
111 |
112 | def cfg_type_node(child):
113 | """ Different form according to statement node or not. """
114 |
115 | if isinstance(child, _node.Statement) or child.is_comment():
116 | return ['box', 'red', 'lightpink']
117 | return ['ellipse', 'blue', 'lightblue2']
118 |
119 |
120 | def produce_cfg_one_child(child, data_flow, attributes,
121 | graph=graphviz.Digraph(comment='Control flow representation')):
122 | """
123 | Produce a CFG in graphviz format.
124 |
125 | -------
126 | Parameters:
127 | - child: Node
128 | Node to begin with.
129 | - data_flow: bool
130 |         Whether to display the data flow or not.
131 | - attributes: bool
132 | Whether to display the leaf attributes or not.
133 | - graph: Digraph
134 | Graph object. Be careful it is mutable.
135 |
136 | -------
137 | Returns:
138 | - graph
139 | graphviz formatted graph.
140 | """
141 |
142 | type_node = cfg_type_node(child)
143 | graph.attr('node', shape=type_node[0], style='filled', color=type_node[2],
144 | fillcolor=type_node[2])
145 | graph.attr('edge', color=type_node[1])
146 | graph.node(str(child.id), child.name)
147 |
148 | for child_statement_dep in child.statement_dep_children:
149 | child_statement = child_statement_dep.extremity
150 | type_node = cfg_type_node(child_statement)
151 | graph.attr('node', shape=type_node[0], color=type_node[2], fillcolor=type_node[2])
152 | graph.attr('edge', color=type_node[1])
153 | graph.edge(str(child.id), str(child_statement.id), label=child_statement_dep.label)
154 | produce_cfg_one_child(child_statement, data_flow=data_flow, attributes=attributes,
155 | graph=graph)
156 | if attributes:
157 | append_leaf_attr(child_statement, graph)
158 |
159 | if isinstance(child, _node.Statement):
160 | for child_cf_dep in child.control_dep_children:
161 | child_cf = child_cf_dep.extremity
162 | type_node = cfg_type_node(child_cf)
163 | graph.attr('node', shape=type_node[0], color=type_node[2], fillcolor=type_node[2])
164 | graph.attr('edge', color=type_node[1])
165 | graph.edge(str(child.id), str(child_cf.id), label=str(child_cf_dep.label))
166 | produce_cfg_one_child(child_cf, data_flow=data_flow, attributes=attributes, graph=graph)
167 | if attributes:
168 | append_leaf_attr(child_cf, graph)
169 |
170 | if data_flow:
171 | graph.attr('edge', color='green')
172 | if isinstance(child, _node.Identifier):
173 | for child_data_dep in child.data_dep_children:
174 | child_data = child_data_dep.extremity
175 | type_node = cfg_type_node(child)
176 | graph.attr('node', shape=type_node[0], color=type_node[2], fillcolor=type_node[2])
177 | graph.edge(str(child.id), str(child_data.id), label=child_data_dep.label)
178 |                 # No recursive call here, as data/statement dependencies already cover the same nodes
179 | # logging.info("Data dependency on the variable " + child_data.attributes['name'])
180 | graph.attr('edge', color='seagreen')
181 | if hasattr(child, 'fun_param_parents'): # Function parameters flow
182 | for child_param in child.fun_param_parents:
183 | type_node = cfg_type_node(child)
184 | graph.attr('node', shape=type_node[0], color=type_node[2], fillcolor=type_node[2])
185 | graph.edge(str(child.id), str(child_param.id), label='param')
186 |
187 | return graph
188 |
189 |
190 | def draw_cfg(cfg_nodes, attributes=False, save_path=None):
191 | """
192 | Plot a CFG.
193 |
194 | -------
195 | Parameters:
196 | - cfg_nodes: Node
197 |         Output of control_flow(ast_to_ast_nodes(<ast>, ast_nodes=Node('Program'))), i.e., the CFG.
198 | - save_path: str
199 | Path of the file to store the CFG in.
200 | - attributes: bool
201 | Whether to display the leaf attributes or not. Default: False.
202 | """
203 |
204 | dot = graphviz.Digraph()
205 | for child in cfg_nodes.children:
206 | dot = produce_cfg_one_child(child=child, data_flow=False, attributes=attributes)
207 | if save_path is None:
208 | dot.view()
209 | else:
210 | dot.render(save_path, view=False)
211 | graphviz.render(filepath=save_path, engine='dot', format='eps')
212 | dot.clear()
213 |
214 |
215 | def draw_pdg(dfg_nodes, attributes=False, save_path=None):
216 | """
217 | Plot a PDG.
218 |
219 | -------
220 | Parameters:
221 | - dfg_nodes: Node
222 |         The PDG, i.e., the output of the PDG construction in build_pdg.get_data_flow.
223 | - save_path: str
224 | Path of the file to store the PDG in.
225 | - attributes: bool
226 | Whether to display the leaf attributes or not. Default: False.
227 | """
228 |
229 | dot = graphviz.Digraph()
230 | for child in dfg_nodes.children:
231 | dot = produce_cfg_one_child(child=child, data_flow=True, attributes=attributes)
232 | if save_path is None:
233 | dot.view()
234 | else:
235 | dot.render(save_path, view=False)
236 | graphviz.render(filepath=save_path, engine='dot', format='eps')
237 | dot.clear()
238 |
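# --- Illustrative sketch (added for clarity, not part of the original file) ---
# save_path=None opens an interactive viewer, whereas a path renders the graph to
# disk (graphviz source plus an EPS export). The path below is hypothetical.
def _example_draw(pdg_nodes):
    """ Hypothetical helper: renders a PDG with its leaf attributes to /tmp/pdg. """
    draw_pdg(pdg_nodes, attributes=True, save_path='/tmp/pdg')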
--------------------------------------------------------------------------------
/pdg_js/extended_ast.py:
--------------------------------------------------------------------------------
1 | # Copyright (C) 2021 Aurore Fass
2 | #
3 | # This program is free software: you can redistribute it and/or modify
4 | # it under the terms of the GNU Affero General Public License as published
5 | # by the Free Software Foundation, either version 3 of the License, or
6 | # (at your option) any later version.
7 | #
8 | # This program is distributed in the hope that it will be useful,
9 | # but WITHOUT ANY WARRANTY; without even the implied warranty of
10 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
11 | # GNU Affero General Public License for more details.
12 | #
13 | # You should have received a copy of the GNU Affero General Public License
14 | # along with this program. If not, see <https://www.gnu.org/licenses/>.
15 |
16 |
17 | """
18 | Definition of the class ExtendedAst: corresponds to the output of Esprima's parse function
19 | with the arguments: {range: true, loc: true, tokens: true, tolerant: true, comment: true}.
20 | """
21 |
22 | # Note: slightly improved from HideNoSeek
23 |
24 |
25 | class ExtendedAst:
26 | """ Stores the Esprima formatted AST into python objects. """
27 |
28 | def __init__(self):
29 | self.type = None
30 | self.filename = ''
31 | self.body = []
32 | self.source_type = None
33 | self.range = []
34 | self.comments = []
35 | self.tokens = []
36 | self.leading_comments = []
37 |
38 | def get_type(self):
39 | return self.type
40 |
41 | def set_type(self, root):
42 | self.type = root
43 |
44 | def get_body(self):
45 | return self.body
46 |
47 | def set_body(self, body):
48 | self.body = body
49 |
50 | def get_extended_ast(self):
51 | return {'type': self.get_type(), 'body': self.get_body(),
52 | 'sourceType': self.get_source_type(), 'range': self.get_range(),
53 | 'comments': self.get_comments(), 'tokens': self.get_tokens(),
54 | 'filename': self.filename,
55 | 'leadingComments': self.get_leading_comments()}
56 |
57 | def get_ast(self):
58 | return {'type': self.get_type(), 'body': self.get_body(), 'filename': self.filename}
59 |
60 | def get_source_type(self):
61 | return self.source_type
62 |
63 | def set_source_type(self, source_type):
64 | self.source_type = source_type
65 |
66 | def get_range(self):
67 | return self.range
68 |
69 | def set_range(self, ast_range):
70 | self.range = ast_range
71 |
72 | def get_comments(self):
73 | return self.comments
74 |
75 | def set_comments(self, comments):
76 | self.comments = comments
77 |
78 | def get_tokens(self):
79 | return self.tokens
80 |
81 | def set_tokens(self, tokens):
82 | self.tokens = tokens
83 |
84 | def get_leading_comments(self):
85 | return self.leading_comments
86 |
87 | def set_leading_comments(self, leading_comments):
88 | self.leading_comments = leading_comments
89 |
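# --- Illustrative sketch (added for clarity, not part of the original file) ---
# build_ast.get_extended_ast fills an ExtendedAst from Esprima's JSON output; a
# minimal manual equivalent, given an already parsed Esprima dict, could be:
def _example_from_esprima(esprima_ast, filename=''):
    """ Hypothetical helper: wraps an Esprima dict into an ExtendedAst object. """
    extended_ast = ExtendedAst()
    extended_ast.filename = filename
    extended_ast.set_type(esprima_ast['type'])
    extended_ast.set_body(esprima_ast['body'])
    extended_ast.set_source_type(esprima_ast['sourceType'])
    extended_ast.set_range(esprima_ast['range'])
    extended_ast.set_comments(esprima_ast['comments'])
    extended_ast.set_tokens(esprima_ast['tokens'])
    return extended_ast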
--------------------------------------------------------------------------------
/pdg_js/js_operators.py:
--------------------------------------------------------------------------------
1 | # Copyright (C) 2021 Aurore Fass
2 | #
3 | # This program is free software: you can redistribute it and/or modify
4 | # it under the terms of the GNU Affero General Public License as published
5 | # by the Free Software Foundation, either version 3 of the License, or
6 | # (at your option) any later version.
7 | #
8 | # This program is distributed in the hope that it will be useful,
9 | # but WITHOUT ANY WARRANTY; without even the implied warranty of
10 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
11 | # GNU Affero General Public License for more details.
12 | #
13 | # You should have received a copy of the GNU Affero General Public License
14 | # along with this program. If not, see <https://www.gnu.org/licenses/>.
15 |
16 |
17 | """
18 | Operator computation for pointer analysis; computes the values of variables.
19 | """
20 |
21 | import logging
22 |
23 | from . import node as _node
24 |
25 | """
26 | In the following,
27 | - node: Node
28 | Current node.
29 | - initial_node: Node
30 |     Node that we leveraged to compute the value of node (for provenance purposes).
31 | """
32 |
33 |
34 | def get_node_value(node, initial_node=None, recdepth=0, recvisited=None):
35 | """ Gets the value of node, depending on its type. """
36 |
37 | if recvisited is None:
38 | recvisited = set()
39 |
40 | if isinstance(node, _node.ValueExpr):
41 | if node.value is not None: # Special case if node references a Node whose value changed
42 | return node.value
43 |
44 | got_attr, node_attributes = node.get_node_attributes()
45 | if got_attr: # Got attributes, returns the value
46 | return node_attributes
47 |
48 | logging.debug('Getting the value from %s', node.name)
49 |
50 | if node.name == 'UnaryExpression':
51 | return compute_unary_expression(node, initial_node=initial_node,
52 | recdepth=recdepth + 1, recvisited=recvisited)
53 | if node.name in ('BinaryExpression', 'LogicalExpression'):
54 | return compute_binary_expression(node, initial_node=initial_node,
55 | recdepth=recdepth + 1, recvisited=recvisited)
56 | if node.name == 'ArrayExpression':
57 | return node
58 | if node.name in ('ObjectExpression', 'ObjectPattern'):
59 | return node
60 | if node.name == 'MemberExpression':
61 | return compute_member_expression(node, initial_node=initial_node,
62 | recdepth=recdepth + 1, recvisited=recvisited)
63 | if node.name == 'ThisExpression':
64 | return 'this'
65 | if isinstance(node, _node.FunctionExpression):
66 | return compute_function_expression(node)
67 | if node.name == 'CallExpression' and isinstance(node.children[0], _node.FunctionExpression):
68 | return node.children[0].fun_name # Function called; mapping to the function name if any
69 | if node.name in _node.CALL_EXPR:
70 | return compute_call_expression(node, initial_node=initial_node,
71 | recdepth=recdepth + 1, recvisited=recvisited)
72 | if node.name == 'ReturnStatement' or node.name == 'BlockStatement':
73 | if node.children:
74 | return get_node_computed_value(node.children[0], initial_node=initial_node,
75 | recdepth=recdepth + 1, recvisited=recvisited)
76 | return None
77 | if node.name == 'TemplateLiteral':
78 | return compute_template_literal(node, initial_node=initial_node,
79 | recdepth=recdepth + 1, recvisited=recvisited)
80 | if node.name == 'ConditionalExpression':
81 | return compute_conditional_expression(node, initial_node=initial_node,
82 | recdepth=recdepth + 1, recvisited=recvisited)
83 | if node.name == 'AssignmentExpression':
84 | return compute_assignment_expression(node, initial_node=initial_node,
85 | recdepth=recdepth + 1, recvisited=recvisited)
86 | if node.name == 'UpdateExpression':
87 | return get_node_computed_value(node.children[0], initial_node=initial_node,
88 | recdepth=recdepth + 1, recvisited=recvisited)
89 |
90 | for child in node.children:
91 | get_node_computed_value(child, initial_node=initial_node,
92 | recdepth=recdepth + 1, recvisited=recvisited)
93 |
94 | logging.warning('Could not get the value of the node %s, whose attributes are %s',
95 | node.name, node.attributes)
96 |
97 | return None
98 |
99 |
100 | def get_node_computed_value(node, initial_node=None, keep_none=False, recdepth=0, recvisited=None):
101 | """ Computes the value of node, depending on its type. """
102 |
103 | if recvisited is None:
104 | recvisited = set()
105 |
106 | logging.debug("Visiting node: %s", node.attributes)
107 |
108 | if node in recvisited:
109 | if isinstance(node, _node.Value):
110 | logging.debug("Revisiting node: %s %s (value: %s)", node.attributes, initial_node,
111 | node.value)
112 | return node.value
113 | logging.debug("Revisiting node: %s %s (none)", node.attributes, initial_node)
114 | return None
115 | recvisited.add(node)
116 | if recdepth > 1000:
117 | logging.debug("Recursion depth for get_node_computed_value exceeded: %s", node.attributes)
118 | if hasattr(node, "value"):
119 | return node.value
120 | return None
121 |
122 | value = None
123 | if isinstance(initial_node, _node.Value):
124 | logging.debug('%s is depending on %s', initial_node.attributes, node.attributes)
125 | initial_node.set_provenance(node)
126 |
127 | if isinstance(node, _node.Value): # if we already know the value
128 | value = node.value # might be directly a value (int/str) or a Node referring to the value
129 | logging.debug('Computing the value of an %s node, got %s', node.name, value)
130 |
131 | if isinstance(value, _node.Node): # node.value is a Node
132 | # computing actual value
133 | if node.value != node:
134 | value = get_node_computed_value(node.value, initial_node=initial_node,
135 | recdepth=recdepth + 1, recvisited=recvisited)
136 | logging.debug('Its value is a node, computed it and got %s', value)
137 |
138 | if value is None and not keep_none: # node is not an Identifier or is None
139 | # keep_none True is just for display_temp, to avoid having an Identifier variable with
140 | # None value being equal to the variable because of the call to get_node_value on itself
141 | value = get_node_value(node, initial_node=initial_node,
142 | recdepth=recdepth + 1, recvisited=recvisited)
143 | logging.debug('The value should be computed, got %s', value)
144 |
145 | if isinstance(node, _node.Value) and node.name not in _node.CALL_EXPR:
146 | # Do not store value for CallExpr as could have changed and should be recomputed
147 | node.set_value(value) # Stores the value so as not to compute it again
148 |
149 | return value
150 |
151 |
152 | def compute_operators(operator, node_a, node_b, initial_node=None, recdepth=0, recvisited=None):
153 | """ Evaluates node_a operator node_b. """
154 |
155 | if isinstance(node_a, _node.Node): # Standard case
156 | if isinstance(node_a, _node.Identifier):
157 | # If it is an Identifier, it should have a value, possibly None.
158 | # But the value should not be the Identifier's name.
159 | a = get_node_computed_value(node_a, initial_node=initial_node, keep_none=True,
160 | recdepth=recdepth + 1, recvisited=recvisited)
161 | else:
162 | a = get_node_computed_value(node_a, initial_node=initial_node,
163 | recdepth=recdepth + 1, recvisited=recvisited)
164 | else: # Specific to compute_binary_expression
165 | a = node_a # node_a may not be a Node but already a computed result
166 | if isinstance(node_b, _node.Node): # Standard case
167 | if isinstance(node_b, _node.Identifier):
168 | b = get_node_computed_value(node_b, initial_node=initial_node, keep_none=True,
169 | recdepth=recdepth + 1, recvisited=recvisited)
170 | else:
171 | b = get_node_computed_value(node_b, initial_node=initial_node,
172 | recdepth=recdepth + 1, recvisited=recvisited)
173 | else: # Specific to compute_binary_expression
174 | b = node_b # node_b may not be a Node but already a computed result
175 |
176 | if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
177 | if operator in ('+=', '+') and (isinstance(a, str) or isinstance(b, str)):
178 | return operator_plus(a, b)
179 | if a is None or b is None:
180 | return None
181 | if (not isinstance(a, str) or isinstance(a, str) and not '.' in a)\
182 | and (not isinstance(b, str) or isinstance(b, str) and not '.' in b):
183 | # So that if MemExpr could not be fully computed we do not take any hasty decisions
184 | # For ex: data.message.split(-).1.toUpperCase() == POST is undecidable for us
185 | # But abc == abc is not
186 | pass
187 | else:
188 | logging.warning('Unable to compute %s %s %s', a, operator, b)
189 | return None
190 |
191 | try:
192 | if operator in ('+=', '+'):
193 | return operator_plus(a, b)
194 | if operator in ('-=', '-'):
195 | return operator_minus(a, b)
196 | if operator in ('*=', '*'):
197 | return operator_asterisk(a, b)
198 | if operator in ('/=', '/'):
199 | return operator_slash(a, b)
200 | if operator in ('**=', '**'):
201 | return operator_2asterisk(a, b)
202 | if operator in ('%=', '%'):
203 | return operator_modulo(a, b)
204 | if operator == '++':
205 | return operator_plus_plus(a)
206 | if operator == '--':
207 | return operator_minus_minus(a)
208 | if operator in ('==', '==='):
209 | return operator_equal(a, b)
210 | if operator in ('!=', '!=='):
211 | return operator_different(a, b)
212 | if operator == '!':
213 | return operator_not(a)
214 | if operator == '>=':
215 | return operator_bigger_equal(a, b)
216 | if operator == '>':
217 | return operator_bigger(a, b)
218 | if operator == '<=':
219 | return operator_smaller_equal(a, b)
220 | if operator == '<':
221 | return operator_smaller(a, b)
222 | if operator == '&&':
223 | return operator_and(a, b)
224 | if operator == '||':
225 | return operator_or(a, b)
226 | if operator in ('&', '>>', '>>>', '<<', '^', '|', '&=', '>>=', '>>>=', '<<=', '^=', '|=',
227 | 'in', 'instanceof'):
228 | logging.warning('Currently not handling the operator %s', operator)
229 | return None
230 |
231 | except TypeError:
232 | logging.warning('Type problem, could not compute %s %s %s', a, operator, b)
233 | return None
234 |
235 | logging.error('Unknown operator %s', operator)
236 | return None
237 |
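# --- Illustrative sketch (added for clarity, not part of the original file) ---
# The operator helpers below mimic JavaScript semantics on already computed
# values; for instance, '+' concatenates as soon as one operand is a string:
def _example_operators():
    """ Hypothetical helper: sanity checks for a few operator evaluations. """
    assert operator_plus(2, 3) == 5
    assert operator_plus('id_', 42) == 'id_42'  # JS-style string coercion
    assert operator_equal('abc', 'abc') is True
    assert operator_slash(1, 0) is None         # division by zero is refused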
238 |
239 | def compute_unary_expression(node, initial_node, recdepth=0, recvisited=None):
240 | """ Evaluates an UnaryExpression node. """
241 |
242 | compute_unary = get_node_computed_value(node.children[0], initial_node=initial_node,
243 | recdepth=recdepth + 1, recvisited=recvisited)
244 | if compute_unary is None:
245 | return None
246 | if isinstance(compute_unary, bool):
247 | return not compute_unary
248 | if isinstance(compute_unary, (int, float)):
249 | return - compute_unary
250 | if isinstance(compute_unary, str): # So as not to lose the current compute_unary value
251 | return node.attributes['operator'] + compute_unary # Adds the UnaryOp before value
252 |
253 | logging.warning('Could not compute the unary operation %s on %s',
254 | node.attributes['operator'], compute_unary)
255 | return None
256 |
257 |
258 | def compute_binary_expression(node, initial_node, recdepth=0, recvisited=None):
259 | """ Evaluates a BinaryExpression node. """
260 |
261 | operator = node.attributes['operator']
262 | node_a = node.children[0]
263 | node_b = node.children[1]
264 |
265 | # node_a operator node_b
266 | return compute_operators(operator, node_a, node_b, initial_node=initial_node,
267 | recdepth=recdepth, recvisited=recvisited)
268 |
269 |
270 | def compute_member_expression(node, initial_node, compute=True, recdepth=0, recvisited=None):
271 | """ Evaluates a MemberExpression node. """
272 |
273 | obj = node.children[0]
274 | prop = node.children[1]
275 | prop_value = get_node_computed_value(prop, initial_node=initial_node, recdepth=recdepth + 1,
276 | recvisited=recvisited) # Computes the value
277 | obj_value = get_node_computed_value(obj, initial_node=initial_node,
278 | recdepth=recdepth + 1, recvisited=recvisited)
279 | if obj.name == 'ThisExpression' or obj_value in _node.GLOBAL_VAR:
280 | return prop_value
281 |
282 | if not isinstance(obj_value, _node.Node):
283 | # Specific case if we changed an Array/Object type
284 | # var my_array = [[1]]; my_array[0] = 18; e = my_array[0][0]; -> e undefined hence None
285 | # If ArrayExpression or ObjectExpression, we are trying to access an element that does not
286 | # exist anymore, will be displayed as .prop
287 | # Otherwise: obj.prop
288 | if isinstance(obj_value, list): # Special case for TaggedTemplateExpression
289 | if isinstance(prop_value, int):
290 | try:
291 | return obj_value[prop_value] # Params passed in obj_value, cf. data_flow
292 | except IndexError as e:
293 | logging.exception(e)
294 | logging.exception('Could not get the property %s of %s', prop_value, obj_value)
295 | return None
296 | elif isinstance(obj_value, dict): # Special case for already defined objects with new prop
297 | if prop_value in obj_value:
298 | return obj_value[prop_value] # ex: localStorage.firstTime
299 | return None
300 | return display_member_expression_value(node, '', initial_node=initial_node)[0:-1]
301 |
302 | # obj_value.prop_value or obj_value[prop_value]
303 | if obj_value.name == 'Literal' or obj_value.name == 'Identifier':
304 | member_expression_value = obj_value # We already have the value
305 | else:
306 | if isinstance(prop_value, str): # obj_value.prop_value -> prop_value str = object property
307 | obj_prop_list = []
308 | search_object_property(obj_value, prop_value, obj_prop_list)
309 | if obj_prop_list: # Stores all matches
310 | member_expression_value = None
311 | for obj_prop in obj_prop_list:
312 | member_expression_value, worked = get_property_value(obj_prop,
313 | initial_node=initial_node,
314 | recdepth=recdepth + 1,
315 | recvisited=recvisited)
316 | if worked: # Takes the first one that is working
317 | break
318 | else:
319 | member_expression_value = None
320 | logging.warning('Could not get the property %s of the %s with value %s',
321 | prop_value, obj.name, obj_value)
322 | elif isinstance(prop_value, int): # obj_value[prop_value] -> prop_value int = array index
323 | if len(obj_value.children) > prop_value:
324 | member_expression_value = obj_value.children[prop_value] # We fetch the value
325 | else:
326 | member_expression_value = display_member_expression_value\
327 | (node, '', initial_node=initial_node)[0:-1]
328 | else:
329 | logging.error('Expected an str or an int, got a %s', type(prop_value))
330 | member_expression_value = None
331 |
332 | if compute and isinstance(member_expression_value, _node.Node):
333 | # Computes the value
334 | return get_node_computed_value(member_expression_value, initial_node=initial_node,
335 | recdepth=recdepth + 1)
336 |
337 | return member_expression_value # Returns the node referencing the value
338 |
339 |
340 | def search_object_property(node, prop, found_list):
341 | """ Search in an object definition where a given property (-> prop = str) is defined.
342 | Storing all the matches in case the first one is not the right one, e.g.,
343 | var obj = {
344 | f1: function(a) {obj.f2(1)},
345 | f2: function(a) {}
346 | };
347 | obj.f2();
348 | By looking for f2, the 1st match is wrong and will lead to an error, the 2nd one is correct."""
349 |
350 | if 'name' in node.attributes:
351 | if isinstance(prop, str):
352 | if node.attributes['name'] == prop:
353 | # prop is already the value
354 | found_list.append(node)
355 | elif 'value' in node.attributes:
356 | if isinstance(prop, str):
357 | if node.attributes['value'] == prop:
358 | # prop is already the value
359 | found_list.append(node)
360 |
361 | for child in node.children:
362 | search_object_property(child, prop, found_list)
363 |
364 |
365 | def get_property_value(node, initial_node, recdepth=0, recvisited=None):
366 | """ Get the value of an object's property. """
367 |
368 | if (isinstance(node, _node.Identifier) or node.name == 'Literal')\
369 | and node.parent.name == 'Property':
370 | prop_value = node.parent.children[1]
371 | if prop_value.name == 'Literal':
372 | return prop_value, True
373 | return get_node_computed_value(prop_value, initial_node=initial_node, recdepth=recdepth + 1,
374 | recvisited=recvisited), True
375 |
376 | logging.warning('Trying to get the property value of %s whose parent is %s',
377 | node.name, node.parent.name)
378 | return None, False
379 |
380 |
381 | def compute_function_expression(node):
382 | """ Computes a (Arrow)FunctionExpression node. """
383 |
384 | fun_name = node.fun_name
385 | if fun_name is not None:
386 | return fun_name # Mapping to the function's name if any
387 | return node # Otherwise mapping to the FunExpr handler
388 |
389 |
390 | def compute_call_expression(node, initial_node, recdepth=0, recvisited=None):
391 | """ Gets the value of CallExpression with parameters. """
392 |
393 | if isinstance(initial_node, _node.Value):
394 | initial_node.set_provenance(node)
395 |
396 | callee = node.children[0]
397 | params = '('
398 |
399 | for arg in range(1, len(node.children)):
400 | # Computes the value of the arguments: a.b...(arg1, arg2...)
401 | params += str(get_node_computed_value(node.children[arg], initial_node=initial_node,
402 | recdepth=recdepth + 1, recvisited=recvisited))
403 | if arg < len(node.children) - 1:
404 | params += ', '
405 |
406 | params += ')'
407 |
408 | if isinstance(callee, _node.Identifier):
409 | return str(get_node_computed_value(callee, initial_node=initial_node,
410 | recdepth=recdepth + 1, recvisited=recvisited)) + params
411 |
412 | if callee.name == 'MemberExpression':
413 | value = display_member_expression_value(callee, '', initial_node=initial_node)
414 | value = value[0:-1] + params
415 | return value
416 | # return compute_member_expression(callee) + params # To test if problems here
417 |
418 | if callee.name in _node.CALL_EXPR:
419 | if get_node_computed_value(callee, initial_node=initial_node, recdepth=recdepth + 1,
420 | recvisited=recvisited) is None or params is None:
421 | return None
422 | return get_node_computed_value(callee, initial_node=initial_node,
423 | recdepth=recdepth + 1, recvisited=recvisited) + params
424 |
425 | if callee.name == 'LogicalExpression': # a || b, if a not False a otherwise b
426 | if get_node_computed_value(callee.children[0], initial_node=initial_node,
427 | recdepth=recdepth + 1, recvisited=recvisited) is False:
428 | return get_node_computed_value(callee.children[1], initial_node=initial_node,
429 | recdepth=recdepth + 1, recvisited=recvisited)
430 | return get_node_computed_value(callee.children[0], initial_node=initial_node,
431 | recdepth=recdepth + 1, recvisited=recvisited)
432 |
433 | logging.error('Got a CallExpression on %s with attributes %s and id %s',
434 | callee.name, callee.attributes, callee.id)
435 | return None
436 |
437 |
438 | def compute_template_literal(node, initial_node, recdepth=0, recvisited=None):
439 | """ Gets the value of TemplateLiteral. """
440 |
441 |     template_element = []  # It seems that a TemplateElement is similar to a Literal and comes first,
442 |     expressions = []  # whereas Expressions have to be computed and come at the end
443 | template_literal = ''
444 |
445 | for child in node.children:
446 | if child.name == 'TemplateElement': # Either that
447 | template_element.append(child)
448 | else: # Or Expressions
449 | expressions.append(child)
450 |
451 | len_template_element = len(template_element)
452 | len_expressions = len(expressions)
453 |
454 | if len_template_element != len_expressions + 1:
455 |         logging.error('Unexpected %s with %s TemplateElements and %s Expressions', node.name,
456 | len_template_element, len_expressions)
457 | return None
458 |
459 | for i in range(len_expressions):
460 | # Will concatenate: 1 TemplateElement, 1 Expr, ..., 1 TemplateElement
461 | template_literal += str(get_node_computed_value(template_element[i],
462 | initial_node=initial_node,
463 | recdepth=recdepth + 1,
464 | recvisited=recvisited)) \
465 | + str(get_node_computed_value(expressions[i],
466 | initial_node=initial_node,
467 | recdepth=recdepth + 1,
468 | recvisited=recvisited))
469 | template_literal += str(get_node_computed_value(template_element[len_template_element - 1],
470 | initial_node=initial_node,
471 | recdepth=recdepth + 1,
472 | recvisited=recvisited))
473 |
474 | return template_literal
475 |
476 |
477 | def display_member_expression_value(node, value, initial_node):
478 | """ Displays the value of elements from a MemberExpression. """
479 |
480 | for child in node.children:
481 | if child.name == 'MemberExpression':
482 | value = display_member_expression_value(child, value, initial_node=initial_node)
483 | else:
484 | value += str(get_node_computed_value(child, initial_node=initial_node)) + '.'
485 | return value
486 |
487 |
488 | def compute_object_expr(node, initial_node):
489 | """ For debug: displays the content of an ObjectExpression. """
490 |
491 | node_value = '{'
492 |
493 | for prop in node.children:
494 | key = prop.children[0]
495 | key_value = get_node_computed_value(key, initial_node=initial_node)
496 | value = prop.children[1]
497 | value_value = get_node_computed_value(value, initial_node=initial_node)
498 |
499 | prop_value = str(key_value) + ': ' + str(value_value)
500 | node_value += '\n\t' + prop_value
501 |
502 | node_value += '\n}'
503 | return node_value
504 |
505 |
506 | def compute_conditional_expression(node, initial_node, recdepth=0, recvisited=None):
507 | """ Gets the value of a ConditionalExpression. """
508 |
509 | test = get_node_computed_value(node.children[0], initial_node=initial_node,
510 | recdepth=recdepth + 1, recvisited=recvisited)
511 | consequent = get_node_computed_value(node.children[1], initial_node=initial_node,
512 | recdepth=recdepth + 1, recvisited=recvisited)
513 | alternate = get_node_computed_value(node.children[2], initial_node=initial_node,
514 | recdepth=recdepth + 1, recvisited=recvisited)
515 | if not isinstance(test, bool):
516 |         test = None  # So that test is either True, False or None
517 | if test is None:
518 | return [alternate, consequent]
519 | if test:
520 | return consequent
521 | return alternate
522 |
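To make the three outcomes above concrete, here is a minimal, self-contained sketch of the same decision logic, with plain Python values standing in for the computed test, consequent and alternate (all inputs are hypothetical).

```python
# Sketch of compute_conditional_expression's decision logic on plain values.
def resolve_conditional(test, consequent, alternate):
    if not isinstance(test, bool):
        test = None  # anything not statically True/False is treated as unknown
    if test is None:
        return [alternate, consequent]  # both outcomes are kept when the test is unknown
    return consequent if test else alternate

assert resolve_conditional(True, "a", "b") == "a"
assert resolve_conditional(False, "a", "b") == "b"
assert resolve_conditional("x > 0", "a", "b") == ["b", "a"]  # unknown test: both values
```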
523 |
524 | def compute_assignment_expression(node, initial_node, recdepth=0, recvisited=None):
525 | """ Computes the value of an AssignmentExpression node. """
526 |
527 |     var = node.children[0]  # a = b = value: the value assigned to a is read from b, which was just assigned
528 | if isinstance(var, _node.Value) and var.value is not None:
529 | return var.value
530 | return get_node_computed_value(var, initial_node=initial_node,
531 | recdepth=recdepth + 1, recvisited=recvisited)
532 |
533 |
534 | def operator_plus(a, b):
535 | """ Evaluates a + b. """
536 | if isinstance(a, str) or isinstance(b, str):
537 | return str(a) + str(b)
538 | return a + b
539 |
540 |
541 | def operator_minus(a, b):
542 | """ Evaluates a - b. """
543 | return a - b
544 |
545 |
546 | def operator_asterisk(a, b):
547 | """ Evaluates a * b. """
548 | return a * b
549 |
550 |
551 | def operator_slash(a, b):
552 | """ Evaluates a / b. """
553 | if b == 0:
554 | logging.warning('Trying to compute %s / %s', a, b)
555 | return None
556 | return a / b
557 |
558 |
559 | def operator_2asterisk(a, b):
560 | """ Evaluates a ** b. """
561 | return a ** b
562 |
563 |
564 | def operator_modulo(a, b):
565 | """ Evaluates a % b. """
566 | return a % b
567 |
568 |
569 | def operator_plus_plus(a):
570 | """ Evaluates a++. """
571 | return a + 1
572 |
573 |
574 | def operator_minus_minus(a):
575 | """ Evaluates a--. """
576 | return a - 1
577 |
578 |
579 | def operator_equal(a, b):
580 | """ Evaluates a == b. """
581 | return a == b
582 |
583 |
584 | def operator_different(a, b):
585 | """ Evaluates a != b. """
586 | return a != b
587 |
588 |
589 | def operator_not(a):
590 | """ Evaluates !a. """
591 | return not a
592 |
593 |
594 | def operator_bigger_equal(a, b):
595 | """ Evaluates a >= b. """
596 | return a >= b
597 |
598 |
599 | def operator_bigger(a, b):
600 | """ Evaluates a > b. """
601 | return a > b
602 |
603 |
604 | def operator_smaller_equal(a, b):
605 | """ Evaluates a <= b. """
606 | return a <= b
607 |
608 |
609 | def operator_smaller(a, b):
610 | """ Evaluates a < b. """
611 | return a < b
612 |
613 |
614 | def operator_and(a, b):
615 | """ Evaluates a and b. """
616 | return a and b
617 |
618 |
619 | def operator_or(a, b):
620 | """ Evaluates a or b. """
621 | return a or b
622 |
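These helpers mirror JavaScript operator semantics rather than Python's; for instance, '+' falls back to string concatenation as soon as one operand is a string. A minimal, self-contained sketch of that behaviour (it re-states the same logic as operator_plus so it can run standalone):

```python
# JS-like '+' semantics: string concatenation wins if either operand is a string.
def js_plus(a, b):
    if isinstance(a, str) or isinstance(b, str):
        return str(a) + str(b)
    return a + b

assert js_plus(1, 2) == 3
assert js_plus("1", 2) == "12"   # '1' + 2 === '12' in JavaScript
assert js_plus(1.5, 2) == 3.5
```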
--------------------------------------------------------------------------------
/pdg_js/js_reserved.py:
--------------------------------------------------------------------------------
1 | # Copyright (C) 2021 Aurore Fass
2 | #
3 | # This program is free software: you can redistribute it and/or modify
4 | # it under the terms of the GNU Affero General Public License as published
5 | # by the Free Software Foundation, either version 3 of the License, or
6 | # (at your option) any later version.
7 | #
8 | # This program is distributed in the hope that it will be useful,
9 | # but WITHOUT ANY WARRANTY; without even the implied warranty of
10 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
11 | # GNU Affero General Public License for more details.
12 | #
13 | # You should have received a copy of the GNU Affero General Public License
14 | # along with this program. If not, see <https://www.gnu.org/licenses/>.
15 |
16 |
17 | """
18 | JavaScript reserved keywords or words known by the interpreter.
19 | """
20 |
21 | # Note: slightly improved from HideNoSeek (browser extension keywords)
22 |
23 |
24 | RESERVED_WORDS = ["abstract", "arguments", "await", "boolean", "break", "byte", "case", "catch",
25 | "char", "class", "const", "continue", "debugger", "default", "delete", "do",
26 | "double", "else", "enum", "eval", "export", "extends", "false", "final",
27 | "finally", "float", "for", "function", "goto", "if", "implements", "import", "in",
28 | "instanceof", "int", "interface", "let", "long", "native", "new", "null",
29 | "package", "private", "protected", "public", "return", "short", "static", "super",
30 | "switch", "synchronized", "this", "throw", "throws", "transient", "true", "try",
31 | "typeof", "var", "void", "volatile", "while", "with", "yield", "Array",
32 | "Date", "eval", "function", "hasOwnProperty", "Infinity", "isFinite", "isNaN",
33 | "isPrototypeOf", "length", "Math", "NaN", "name", "Number", "Object", "prototype",
34 | "String", "toString", "undefined", "valueOf", "getClass", "java", "JavaArray",
35 | "javaClass", "JavaObject", "JavaPackage", "alert", "all", "anchor", "anchors",
36 | "area", "assign", "blur", "button", "checkbox", "clearInterval", "clearTimeout",
37 | "clientInformation", "close", "closed", "confirm", "constructor", "crypto",
38 | "decodeURI", "decodeURIComponent", "defaultStatus", "document", "element",
39 | "elements", "embed", "embeds", "encodeURI", "encodeURIComponent", "escape",
40 | "event", "fileUpload", "focus", "form", "forms", "frame", "innerHeight",
41 | "innerWidth", "layer", "layers", "link", "location", "mimeTypes", "navigate",
42 | "navigator", "frames", "frameRate", "hidden", "history", "image", "images",
43 | "offscreenBuffering", "open", "opener", "option", "outerHeight", "outerWidth",
44 | "packages", "pageXOffset", "pageYOffset", "parent", "parseFloat", "parseInt",
45 | "password", "pkcs11", "plugin", "prompt", "propertyIsEnum", "radio", "reset",
46 | "screenX", "screenY", "scroll", "secure", "select", "self", "setInterval",
47 | "setTimeout", "status", "submit", "taint", "text", "textarea", "top", "unescape",
48 | "untaint", "window", "onblur", "onclick", "onerror", "onfocus", "onkeydown",
49 | "onkeypress", "onkeyup", "onmouseover", "onload", "onmouseup", "onmousedown",
50 | "onsubmit",
51 | "define", "exports", "require", "each", "ActiveXObject", "console", "module",
52 | "Error", "TypeError", "RangeError", "RegExp", "Symbol", "Set"]
53 |
54 |
55 | BROWSER_EXTENSIONS = ['addEventListener', 'browser', 'chrome', 'localStorage', 'postMessage',
56 | 'Promise', 'JSON', 'XMLHttpRequest', '$', 'screen', 'CryptoJS']
57 |
58 | KNOWN_WORDS_LOWER = [word.lower() for word in RESERVED_WORDS + BROWSER_EXTENSIONS]
59 |
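KNOWN_WORDS_LOWER enables case-insensitive lookups against both lists. A small usage sketch, assuming the repository root is on PYTHONPATH and that importing pdg_js has no heavy side effects:

```python
# Case-insensitive check against the reserved/known word lists.
from pdg_js.js_reserved import KNOWN_WORDS_LOWER

def is_known_word(identifier_name):
    return identifier_name.lower() in KNOWN_WORDS_LOWER

assert is_known_word("setTimeout")
assert is_known_word("XMLHttpRequest")
assert not is_known_word("myCustomVariable")
```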
--------------------------------------------------------------------------------
/pdg_js/node.py:
--------------------------------------------------------------------------------
1 | # Copyright (C) 2021 Aurore Fass
2 | #
3 | # This program is free software: you can redistribute it and/or modify
4 | # it under the terms of the GNU Affero General Public License as published
5 | # by the Free Software Foundation, either version 3 of the License, or
6 | # (at your option) any later version.
7 | #
8 | # This program is distributed in the hope that it will be useful,
9 | # but WITHOUT ANY WARRANTY; without even the implied warranty of
10 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
11 | # GNU Affero General Public License for more details.
12 | #
13 | # You should have received a copy of the GNU Affero General Public License
14 | # along with this program. If not, see <https://www.gnu.org/licenses/>.
15 |
16 |
17 | """
18 | Definition of classes:
19 | - Dependence;
20 | - Node;
21 | - Value;
22 | - Identifier(Node, Value);
23 | - ValueExpr(Node, Value);
24 | - Statement(Node);
25 | - ReturnStatement(Statement, Value);
26 | - Function;
27 | - FunctionDeclaration(Statement, Function);
28 | - FunctionExpression(Node, Function)
29 | """
30 |
31 | # Note: going significantly beyond the node structure of HideNoSeek:
32 | # semantic information to the nodes, which have different properties, e.g., DF on Identifier,
33 | # parameter flows, value handling, provenance tracking, etc
34 |
35 |
36 | import logging
37 | import random
38 |
39 | from . import utility_df
40 |
41 | EXPRESSIONS = ['AssignmentExpression', 'ArrayExpression', 'ArrowFunctionExpression',
42 | 'AwaitExpression', 'BinaryExpression', 'CallExpression', 'ClassExpression',
43 | 'ConditionalExpression', 'FunctionExpression', 'LogicalExpression',
44 | 'MemberExpression', 'NewExpression', 'ObjectExpression', 'SequenceExpression',
45 | 'TaggedTemplateExpression', 'ThisExpression', 'UnaryExpression', 'UpdateExpression',
46 | 'YieldExpression']
47 |
48 | EPSILON = ['BlockStatement', 'DebuggerStatement', 'EmptyStatement',
49 | 'ExpressionStatement', 'LabeledStatement', 'ReturnStatement',
50 | 'ThrowStatement', 'WithStatement', 'CatchClause', 'VariableDeclaration',
51 | 'FunctionDeclaration', 'ClassDeclaration']
52 |
53 | CONDITIONAL = ['DoWhileStatement', 'ForStatement', 'ForOfStatement', 'ForInStatement',
54 | 'IfStatement', 'SwitchCase', 'SwitchStatement', 'TryStatement',
55 | 'WhileStatement', 'ConditionalExpression']
56 |
57 | UNSTRUCTURED = ['BreakStatement', 'ContinueStatement']
58 |
59 | STATEMENTS = EPSILON + CONDITIONAL + UNSTRUCTURED
60 | CALL_EXPR = ['CallExpression', 'TaggedTemplateExpression', 'NewExpression']
61 | VALUE_EXPR = ['Literal', 'ArrayExpression', 'ObjectExpression', 'ObjectPattern'] + CALL_EXPR
62 | COMMENTS = ['Line', 'Block']
63 |
64 | GLOBAL_VAR = ['window', 'this', 'self', 'top', 'global', 'that']
65 |
66 | LIMIT_SIZE = utility_df.LIMIT_SIZE  # To avoid list values with over 10,000 characters
67 |
68 |
69 | class Dependence:
70 | """ For control, data, comment, and statement dependencies. """
71 |
72 | def __init__(self, dependency_type, extremity, label, nearest_statement=None):
73 | self.type = dependency_type
74 | self.extremity = extremity
75 | self.nearest_statement = nearest_statement
76 | self.label = label
77 |
78 |
79 | class Node:
80 | """ Defines a Node that is used in the AST. """
81 |
82 |     id = random.randint(0, 2**32)  # To limit id collisions between 2 ASTs from separate processes
83 |
84 | def __init__(self, name, parent=None):
85 | self.name = name
86 | self.id = Node.id
87 | Node.id += 1
88 | self.filename = ''
89 | self.attributes = {}
90 | self.body = None
91 | self.body_list = False
92 | self.parent = parent
93 | self.children = []
94 | self.statement_dep_parents = []
95 | self.statement_dep_children = [] # Between Statement and their non-Statement descendants
96 |
97 | def is_leaf(self):
98 | return not self.children
99 |
100 | def set_attribute(self, attribute_type, node_attribute):
101 | self.attributes[attribute_type] = node_attribute
102 |
103 | def set_body(self, body):
104 | self.body = body
105 |
106 | def set_body_list(self, bool_body_list):
107 | self.body_list = bool_body_list
108 |
109 | def set_parent(self, parent):
110 | self.parent = parent
111 |
112 | def set_child(self, child):
113 | self.children.append(child)
114 |
115 | def adopt_child(self, step_daddy): # child = self changes parent
116 | old_parent = self.parent
117 | old_parent.children.remove(self) # Old parent does not point to the child anymore
118 | step_daddy.children.insert(0, self) # New parent points to the child
119 | self.set_parent(step_daddy) # The child points to its new parent
120 |
121 | def set_statement_dependency(self, extremity):
122 | self.statement_dep_children.append(Dependence('statement dependency', extremity, ''))
123 | extremity.statement_dep_parents.append(Dependence('statement dependency', self, ''))
124 |
125 | # def set_comment_dependency(self, extremity):
126 | # self.statement_dep_children.append(Dependence('comment dependency', extremity, 'c'))
127 | # extremity.statement_dep_parents.append(Dependence('comment dependency', self, 'c'))
128 |
129 | def is_comment(self):
130 | if self.name in COMMENTS:
131 | return True
132 | return False
133 |
134 | def get_node_attributes(self):
135 | """ Get the attributes regex, value or name of a node. """
136 | node_attribute = self.attributes
137 | if 'regex' in node_attribute:
138 | regex = node_attribute['regex']
139 | if isinstance(regex, dict) and 'pattern' in regex:
140 | return True, '/' + str(regex['pattern']) + '/'
141 | if 'value' in node_attribute:
142 | value = node_attribute['value']
143 | if isinstance(value, dict) and 'raw' in value:
144 | return True, value['raw']
145 | return True, node_attribute['value']
146 | if 'name' in node_attribute:
147 | return True, node_attribute['name']
148 |         return False, None  # Returning just None was a problem in get_node_value, as the value itself could be None
149 |
150 | def get_line(self):
151 | """ Gets the line number where a given node is defined. """
152 | try:
153 | line_begin = self.attributes['loc']['start']['line']
154 | line_end = self.attributes['loc']['end']['line']
155 | return str(line_begin) + ' - ' + str(line_end)
156 | except KeyError:
157 | return None
158 |
159 | def get_file(self):
160 | parent = self
161 | while True:
162 | if parent is not None and parent.parent:
163 | parent = parent.parent
164 | else:
165 | break
166 | if parent is not None:
167 | if "filename" in parent.attributes:
168 | return parent.attributes["filename"]
169 | return ''
170 |
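To illustrate the parent/child bookkeeping, the following hedged sketch wires a tiny AST fragment by hand. It assumes the repository root is on PYTHONPATH and a Unix-like environment (utility_df imports resource); the node names and attribute values are made up for the example.

```python
# Building a minimal AST fragment by hand with the Node/Identifier classes.
from pdg_js import node as _node

program = _node.Node('Program')
decl = _node.Node('VariableDeclaration', parent=program)
program.set_child(decl)

ident = _node.Identifier('Identifier', parent=decl)
ident.set_attribute('name', 'x')
decl.set_child(ident)

assert not program.is_leaf() and ident.is_leaf()
assert ident.attributes['name'] == 'x'
assert ident.get_node_attributes() == (True, 'x')   # resolves the 'name' attribute
```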
171 |
172 | def literal_type(literal_node):
173 | """ Gets the type of a Literal node. """
174 |
175 | if 'value' in literal_node.attributes:
176 | literal = literal_node.attributes['value']
177 | if isinstance(literal, str):
178 | return 'String'
179 |         if isinstance(literal, bool):  # Checked before int, as a Python bool is also an int
180 |             return 'Bool'
181 |         if isinstance(literal, int):
182 |             return 'Int'
183 |         if isinstance(literal, float):
184 |             return 'Numeric'
185 | if literal == 'null' or literal is None:
186 | return 'Null'
187 | if 'regex' in literal_node.attributes:
188 | return 'RegExp'
189 | logging.error('The literal %s has an unknown type', literal_node.attributes['raw'])
190 | return None
191 |
192 |
193 | def shorten_value_list(value_list, value_list_shortened, counter=0):
194 |     """ When a value is a list, shorten it so that it keeps at most LIMIT_SIZE characters. """
195 |
196 | for el in value_list:
197 | if isinstance(el, list):
198 | value_list_shortened.append([])
199 | counter = shorten_value_list(el, value_list_shortened[-1], counter)
200 | if counter >= LIMIT_SIZE:
201 | return counter
202 | elif isinstance(el, str):
203 | counter += len(el)
204 | if counter < LIMIT_SIZE:
205 | value_list_shortened.append(el)
206 | else:
207 | counter += len(str(el))
208 | if counter < LIMIT_SIZE:
209 | value_list_shortened.append(el)
210 | return counter
211 |
212 |
213 | def shorten_value_dict(value_dict, value_dict_shortened, counter=0, visited=None):
214 |     """ When a value is a dict, shorten it so that it keeps at most LIMIT_SIZE characters. """
215 |
216 | if visited is None:
217 | visited = set()
218 | if id(value_dict) in visited:
219 | return counter
220 | visited.add(id(value_dict))
221 |
222 | for k, v in value_dict.items():
223 | if isinstance(k, str):
224 | counter += len(k)
225 | if isinstance(v, list):
226 | value_dict_shortened[k] = []
227 | counter = shorten_value_list(v, value_dict_shortened[k], counter)
228 | if counter >= LIMIT_SIZE:
229 | return counter
230 | elif isinstance(v, dict):
231 | value_dict_shortened[k] = {}
232 | if id(v) in visited:
233 | return counter
234 | counter = shorten_value_dict(v, value_dict_shortened[k], counter, visited)
235 | if counter >= LIMIT_SIZE:
236 | return counter
237 | elif isinstance(v, str):
238 | counter += len(v)
239 | if counter < LIMIT_SIZE:
240 | value_dict_shortened[k] = v
241 | else:
242 | counter += len(str(v))
243 | if counter < LIMIT_SIZE:
244 | value_dict_shortened[k] = v
245 | return counter
246 |
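A small sketch of the shortening behaviour, under the same import assumptions as above: once the running character count reaches LIMIT_SIZE (10,000 by default in utility_df), the remaining elements are dropped.

```python
# Demonstrating how nested values are trimmed once the character budget is spent.
from pdg_js.node import shorten_value_list, LIMIT_SIZE

big = ['a' * 6000, 'b' * 6000, 'c']   # over 12,000 characters in total
shortened = []
counter = shorten_value_list(big, shortened)

assert counter >= LIMIT_SIZE          # the budget was exhausted...
assert shortened == ['a' * 6000]      # ...so only the first element was kept
```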
247 |
248 | class Value:
249 | """ To store the value of a specific node. """
250 |
251 | def __init__(self):
252 | self.value = None
253 | self.update_value = True
254 | self.provenance_children = []
255 | self.provenance_parents = []
256 | self.provenance_children_set = set()
257 | self.provenance_parents_set = set()
258 | self.seen_provenance = set()
259 |
260 | def set_value(self, value):
261 | if isinstance(value, list): # To shorten value if over LIMIT_SIZE characters
262 | value_shortened = []
263 | counter = shorten_value_list(value, value_shortened)
264 | if counter >= LIMIT_SIZE:
265 | value = value_shortened
266 | logging.warning('Shortened the value of %s %s', self.name, self.attributes)
267 | elif isinstance(value, dict): # To shorten value if over LIMIT_SIZE characters
268 | value_shortened = {}
269 | counter = shorten_value_dict(value, value_shortened)
270 | if counter >= LIMIT_SIZE:
271 | value = value_shortened
272 | logging.warning('Shortened the value of %s %s', self.name, self.attributes)
273 | elif isinstance(value, str): # To shorten value if over LIMIT_SIZE characters
274 | value = value[:LIMIT_SIZE]
275 | self.value = value
276 |
277 | def set_update_value(self, update_value):
278 | self.update_value = update_value
279 |
280 | def set_provenance_dd(self, extremity): # Set Node provenance, set_data_dependency case
281 | # self is the origin of the DD while extremity is the destination of the DD
282 | if extremity.provenance_children:
283 | for child in extremity.provenance_children:
284 | if child not in self.provenance_children_set:
285 | self.provenance_children_set.add(child)
286 | self.provenance_children.append(child)
287 | else:
288 | if extremity not in self.provenance_children_set:
289 | self.provenance_children_set.add(extremity)
290 | self.provenance_children.append(extremity)
291 | if self.provenance_parents:
292 | for parent in self.provenance_parents:
293 | if parent not in extremity.provenance_parents_set:
294 | extremity.provenance_parents_set.add(parent)
295 | extremity.provenance_parents.append(parent)
296 | else:
297 | if self not in extremity.provenance_parents_set:
298 | extremity.provenance_parents_set.add(self)
299 | extremity.provenance_parents.append(self)
300 |
301 | def set_provenance(self, extremity): # Set Node provenance, computed value case
302 | """
303 | a.b = c
304 | """
305 | if extremity in self.seen_provenance:
306 |             return  # Already considered this extremity; avoids reprocessing it
307 | self.seen_provenance.add(extremity)
308 | # extremity was leveraged to compute the value of self
309 |         if not isinstance(extremity, Node):  # e.g., extremity is None
310 | if self not in self.provenance_parents_set:
311 | self.provenance_parents_set.add(self)
312 | self.provenance_parents.append(self)
313 | elif isinstance(extremity, Value):
314 | if extremity.provenance_parents:
315 | for parent in extremity.provenance_parents:
316 | if parent not in self.provenance_parents_set:
317 | self.provenance_parents_set.add(parent)
318 | self.provenance_parents.append(parent)
319 | else:
320 | if extremity not in self.provenance_parents_set:
321 | self.provenance_parents_set.add(extremity)
322 | self.provenance_parents.append(extremity)
323 | if self.provenance_children:
324 | for child in self.provenance_children:
325 | if child not in extremity.provenance_children_set:
326 | extremity.provenance_children_set.add(child)
327 | extremity.provenance_children.append(child)
328 | else:
329 | if self not in extremity.provenance_children_set:
330 | extremity.provenance_children_set.add(self)
331 | extremity.provenance_children.append(self)
332 | elif isinstance(extremity, Node): # Otherwise very restrictive
333 | self.provenance_parents_set.add(extremity)
334 | self.provenance_parents.append(extremity)
335 | for extremity_child in extremity.children: # Not necessarily useful
336 | self.set_provenance(extremity_child)
337 |
338 | def set_provenance_rec(self, extremity):
339 | self.set_provenance(extremity)
340 | for child in extremity.children:
341 | self.set_provenance_rec(child)
342 |
343 |
344 | class Identifier(Node, Value):
345 | """ Identifier Nodes. DD is on Identifier nodes. """
346 |
347 | def __init__(self, name, parent):
348 | Node.__init__(self, name, parent)
349 | Value.__init__(self)
350 | self.code = None
351 | self.fun = None
352 | self.data_dep_parents = []
353 | self.data_dep_children = []
354 |
355 | def set_code(self, code):
356 | self.code = code
357 |
358 | def set_fun(self, fun): # The Identifier node refers to a function ('s name)
359 | self.fun = fun
360 |
361 | def set_data_dependency(self, extremity, nearest_statement=None):
362 | if extremity not in [el.extremity for el in self.data_dep_children]: # Avoids duplicates
363 | self.data_dep_children.append(Dependence('data dependency', extremity, 'data',
364 | nearest_statement))
365 | extremity.data_dep_parents.append(Dependence('data dependency', self, 'data',
366 | nearest_statement))
367 | self.set_provenance_dd(extremity) # Stored provenance
368 |
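A minimal sketch of how a data-flow edge is recorded between two Identifier nodes (same import assumptions as above; the nodes are constructed by hand rather than parsed from real code):

```python
# Recording a data flow from a definition of `x` to a later use of `x`.
from pdg_js import node as _node

def_x = _node.Identifier('Identifier', parent=None)
def_x.set_attribute('name', 'x')
use_x = _node.Identifier('Identifier', parent=None)
use_x.set_attribute('name', 'x')

def_x.set_data_dependency(use_x)      # definition --data--> use

assert use_x in [dep.extremity for dep in def_x.data_dep_children]
assert def_x in [dep.extremity for dep in use_x.data_dep_parents]
```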
369 |
370 | class ValueExpr(Node, Value):
371 | """ Nodes from VALUE_EXPR which therefore have a value that should be stored. """
372 |
373 | def __init__(self, name, parent):
374 | Node.__init__(self, name, parent)
375 | Value.__init__(self)
376 |
377 |
378 | class Statement(Node):
379 | """ Statement Nodes, see STATEMENTS. """
380 |
381 | def __init__(self, name, parent):
382 | Node.__init__(self, name, parent)
383 | self.control_dep_parents = []
384 | self.control_dep_children = []
385 |
386 | def set_control_dependency(self, extremity, label):
387 | self.control_dep_children.append(Dependence('control dependency', extremity, label))
388 | try:
389 | extremity.control_dep_parents.append(Dependence('control dependency', self, label))
390 | except AttributeError as e:
391 | logging.debug('Unable to build a CF to go up the tree: %s', e)
392 |
393 | def remove_control_dependency(self, extremity):
394 | for i, _ in enumerate(self.control_dep_children):
395 | elt = self.control_dep_children[i]
396 | if elt.extremity.id == extremity.id:
397 | del self.control_dep_children[i]
398 | try:
399 | del extremity.control_dep_parents[i]
400 | except AttributeError as e:
401 | logging.debug('No CF going up the tree to delete: %s', e)
402 |
403 |
404 | class ReturnStatement(Statement, Value):
405 | """ ReturnStatement Node. It is a Statement that also has the attributes of a Value. """
406 |
407 | def __init__(self, name, parent):
408 | Statement.__init__(self, name, parent)
409 | Value.__init__(self)
410 |
411 |
412 | class Function:
413 | """ To store function related information. """
414 |
415 | def __init__(self):
416 | self.fun_name = None
417 | self.fun_params = []
418 | self.fun_return = []
419 | self.retraverse = False # Indicates if we are traversing a given node again
420 | self.called = False
421 |
422 | def set_fun_name(self, fun_name):
423 | self.fun_name = fun_name
424 | fun_name.set_fun(self) # Identifier fun_name has a handler to the function declaration self
425 |
426 | def add_fun_param(self, fun_param):
427 | self.fun_params.append(fun_param)
428 |
429 | def add_fun_return(self, fun_return):
430 | # if fun_return.id not in [el.id for el in self.fun_return]: # Avoids duplicates
431 | # Duplicates are okay, because we only consider the last return value from the list
432 | return_id_list = [el.id for el in self.fun_return]
433 | if not return_id_list:
434 | self.fun_return.append(fun_return)
435 | elif fun_return.id != return_id_list[-1]: # Avoids duplicates if already considered one
436 | self.fun_return.append(fun_return)
437 |
438 | def set_retraverse(self):
439 | self.retraverse = True
440 |
441 | def call_function(self):
442 | self.called = True
443 |
444 |
445 | class FunctionDeclaration(Statement, Function):
446 | """ FunctionDeclaration Node. It is a Statement that also has the attributes of a Function. """
447 |
448 | def __init__(self, name, parent):
449 | Statement.__init__(self, name, parent)
450 | Function.__init__(self)
451 |
452 |
453 | class FunctionExpression(Node, Function):
454 | """ FunctionExpression and ArrowFunctionExpression Nodes. Have the attributes of a Function. """
455 |
456 | def __init__(self, name, parent):
457 | Node.__init__(self, name, parent)
458 | Function.__init__(self)
459 | self.fun_intern_name = None
460 |
461 | def set_fun_intern_name(self, fun_intern_name):
462 | self.fun_intern_name = fun_intern_name # Name used if FunExpr referenced inside itself
463 | fun_intern_name.set_fun(self) # fun_intern_name has a handler to the function declaration
464 |
--------------------------------------------------------------------------------
/pdg_js/package-lock.json:
--------------------------------------------------------------------------------
1 | {
2 | "name": "pdg_js",
3 | "lockfileVersion": 2,
4 | "requires": true,
5 | "packages": {
6 | "": {
7 | "dependencies": {
8 | "escodegen": "^2.0.0",
9 | "esprima": "^4.0.1"
10 | }
11 | },
12 | "node_modules/deep-is": {
13 | "version": "0.1.4",
14 | "resolved": "https://registry.npmjs.org/deep-is/-/deep-is-0.1.4.tgz",
15 | "integrity": "sha512-oIPzksmTg4/MriiaYGO+okXDT7ztn/w3Eptv/+gSIdMdKsJo0u4CfYNFJPy+4SKMuCqGw2wxnA+URMg3t8a/bQ=="
16 | },
17 | "node_modules/escodegen": {
18 | "version": "2.0.0",
19 | "resolved": "https://registry.npmjs.org/escodegen/-/escodegen-2.0.0.tgz",
20 | "integrity": "sha512-mmHKys/C8BFUGI+MAWNcSYoORYLMdPzjrknd2Vc+bUsjN5bXcr8EhrNB+UTqfL1y3I9c4fw2ihgtMPQLBRiQxw==",
21 | "dependencies": {
22 | "esprima": "^4.0.1",
23 | "estraverse": "^5.2.0",
24 | "esutils": "^2.0.2",
25 | "optionator": "^0.8.1"
26 | },
27 | "bin": {
28 | "escodegen": "bin/escodegen.js",
29 | "esgenerate": "bin/esgenerate.js"
30 | },
31 | "engines": {
32 | "node": ">=6.0"
33 | },
34 | "optionalDependencies": {
35 | "source-map": "~0.6.1"
36 | }
37 | },
38 | "node_modules/esprima": {
39 | "version": "4.0.1",
40 | "resolved": "https://registry.npmjs.org/esprima/-/esprima-4.0.1.tgz",
41 | "integrity": "sha512-eGuFFw7Upda+g4p+QHvnW0RyTX/SVeJBDM/gCtMARO0cLuT2HcEKnTPvhjV6aGeqrCB/sbNop0Kszm0jsaWU4A==",
42 | "bin": {
43 | "esparse": "bin/esparse.js",
44 | "esvalidate": "bin/esvalidate.js"
45 | },
46 | "engines": {
47 | "node": ">=4"
48 | }
49 | },
50 | "node_modules/estraverse": {
51 | "version": "5.3.0",
52 | "resolved": "https://registry.npmjs.org/estraverse/-/estraverse-5.3.0.tgz",
53 | "integrity": "sha512-MMdARuVEQziNTeJD8DgMqmhwR11BRQ/cBP+pLtYdSTnf3MIO8fFeiINEbX36ZdNlfU/7A9f3gUw49B3oQsvwBA==",
54 | "engines": {
55 | "node": ">=4.0"
56 | }
57 | },
58 | "node_modules/esutils": {
59 | "version": "2.0.3",
60 | "resolved": "https://registry.npmjs.org/esutils/-/esutils-2.0.3.tgz",
61 | "integrity": "sha512-kVscqXk4OCp68SZ0dkgEKVi6/8ij300KBWTJq32P/dYeWTSwK41WyTxalN1eRmA5Z9UU/LX9D7FWSmV9SAYx6g==",
62 | "engines": {
63 | "node": ">=0.10.0"
64 | }
65 | },
66 | "node_modules/fast-levenshtein": {
67 | "version": "2.0.6",
68 | "resolved": "https://registry.npmjs.org/fast-levenshtein/-/fast-levenshtein-2.0.6.tgz",
69 | "integrity": "sha1-PYpcZog6FqMMqGQ+hR8Zuqd5eRc="
70 | },
71 | "node_modules/levn": {
72 | "version": "0.3.0",
73 | "resolved": "https://registry.npmjs.org/levn/-/levn-0.3.0.tgz",
74 | "integrity": "sha1-OwmSTt+fCDwEkP3UwLxEIeBHZO4=",
75 | "dependencies": {
76 | "prelude-ls": "~1.1.2",
77 | "type-check": "~0.3.2"
78 | },
79 | "engines": {
80 | "node": ">= 0.8.0"
81 | }
82 | },
83 | "node_modules/optionator": {
84 | "version": "0.8.3",
85 | "resolved": "https://registry.npmjs.org/optionator/-/optionator-0.8.3.tgz",
86 | "integrity": "sha512-+IW9pACdk3XWmmTXG8m3upGUJst5XRGzxMRjXzAuJ1XnIFNvfhjjIuYkDvysnPQ7qzqVzLt78BCruntqRhWQbA==",
87 | "dependencies": {
88 | "deep-is": "~0.1.3",
89 | "fast-levenshtein": "~2.0.6",
90 | "levn": "~0.3.0",
91 | "prelude-ls": "~1.1.2",
92 | "type-check": "~0.3.2",
93 | "word-wrap": "~1.2.3"
94 | },
95 | "engines": {
96 | "node": ">= 0.8.0"
97 | }
98 | },
99 | "node_modules/prelude-ls": {
100 | "version": "1.1.2",
101 | "resolved": "https://registry.npmjs.org/prelude-ls/-/prelude-ls-1.1.2.tgz",
102 | "integrity": "sha1-IZMqVJ9eUv/ZqCf1cOBL5iqX2lQ=",
103 | "engines": {
104 | "node": ">= 0.8.0"
105 | }
106 | },
107 | "node_modules/source-map": {
108 | "version": "0.6.1",
109 | "resolved": "https://registry.npmjs.org/source-map/-/source-map-0.6.1.tgz",
110 | "integrity": "sha512-UjgapumWlbMhkBgzT7Ykc5YXUT46F0iKu8SGXq0bcwP5dz/h0Plj6enJqjz1Zbq2l5WaqYnrVbwWOWMyF3F47g==",
111 | "optional": true,
112 | "engines": {
113 | "node": ">=0.10.0"
114 | }
115 | },
116 | "node_modules/type-check": {
117 | "version": "0.3.2",
118 | "resolved": "https://registry.npmjs.org/type-check/-/type-check-0.3.2.tgz",
119 | "integrity": "sha1-WITKtRLPHTVeP7eE8wgEsrUg23I=",
120 | "dependencies": {
121 | "prelude-ls": "~1.1.2"
122 | },
123 | "engines": {
124 | "node": ">= 0.8.0"
125 | }
126 | },
127 | "node_modules/word-wrap": {
128 | "version": "1.2.3",
129 | "resolved": "https://registry.npmjs.org/word-wrap/-/word-wrap-1.2.3.tgz",
130 | "integrity": "sha512-Hz/mrNwitNRh/HUAtM/VT/5VH+ygD6DV7mYKZAtHOrbs8U7lvPS6xf7EJKMF0uW1KJCl0H701g3ZGus+muE5vQ==",
131 | "engines": {
132 | "node": ">=0.10.0"
133 | }
134 | }
135 | },
136 | "dependencies": {
137 | "deep-is": {
138 | "version": "0.1.4",
139 | "resolved": "https://registry.npmjs.org/deep-is/-/deep-is-0.1.4.tgz",
140 | "integrity": "sha512-oIPzksmTg4/MriiaYGO+okXDT7ztn/w3Eptv/+gSIdMdKsJo0u4CfYNFJPy+4SKMuCqGw2wxnA+URMg3t8a/bQ=="
141 | },
142 | "escodegen": {
143 | "version": "2.0.0",
144 | "resolved": "https://registry.npmjs.org/escodegen/-/escodegen-2.0.0.tgz",
145 | "integrity": "sha512-mmHKys/C8BFUGI+MAWNcSYoORYLMdPzjrknd2Vc+bUsjN5bXcr8EhrNB+UTqfL1y3I9c4fw2ihgtMPQLBRiQxw==",
146 | "requires": {
147 | "esprima": "^4.0.1",
148 | "estraverse": "^5.2.0",
149 | "esutils": "^2.0.2",
150 | "optionator": "^0.8.1",
151 | "source-map": "~0.6.1"
152 | }
153 | },
154 | "esprima": {
155 | "version": "4.0.1",
156 | "resolved": "https://registry.npmjs.org/esprima/-/esprima-4.0.1.tgz",
157 | "integrity": "sha512-eGuFFw7Upda+g4p+QHvnW0RyTX/SVeJBDM/gCtMARO0cLuT2HcEKnTPvhjV6aGeqrCB/sbNop0Kszm0jsaWU4A=="
158 | },
159 | "estraverse": {
160 | "version": "5.3.0",
161 | "resolved": "https://registry.npmjs.org/estraverse/-/estraverse-5.3.0.tgz",
162 | "integrity": "sha512-MMdARuVEQziNTeJD8DgMqmhwR11BRQ/cBP+pLtYdSTnf3MIO8fFeiINEbX36ZdNlfU/7A9f3gUw49B3oQsvwBA=="
163 | },
164 | "esutils": {
165 | "version": "2.0.3",
166 | "resolved": "https://registry.npmjs.org/esutils/-/esutils-2.0.3.tgz",
167 | "integrity": "sha512-kVscqXk4OCp68SZ0dkgEKVi6/8ij300KBWTJq32P/dYeWTSwK41WyTxalN1eRmA5Z9UU/LX9D7FWSmV9SAYx6g=="
168 | },
169 | "fast-levenshtein": {
170 | "version": "2.0.6",
171 | "resolved": "https://registry.npmjs.org/fast-levenshtein/-/fast-levenshtein-2.0.6.tgz",
172 | "integrity": "sha1-PYpcZog6FqMMqGQ+hR8Zuqd5eRc="
173 | },
174 | "levn": {
175 | "version": "0.3.0",
176 | "resolved": "https://registry.npmjs.org/levn/-/levn-0.3.0.tgz",
177 | "integrity": "sha1-OwmSTt+fCDwEkP3UwLxEIeBHZO4=",
178 | "requires": {
179 | "prelude-ls": "~1.1.2",
180 | "type-check": "~0.3.2"
181 | }
182 | },
183 | "optionator": {
184 | "version": "0.8.3",
185 | "resolved": "https://registry.npmjs.org/optionator/-/optionator-0.8.3.tgz",
186 | "integrity": "sha512-+IW9pACdk3XWmmTXG8m3upGUJst5XRGzxMRjXzAuJ1XnIFNvfhjjIuYkDvysnPQ7qzqVzLt78BCruntqRhWQbA==",
187 | "requires": {
188 | "deep-is": "~0.1.3",
189 | "fast-levenshtein": "~2.0.6",
190 | "levn": "~0.3.0",
191 | "prelude-ls": "~1.1.2",
192 | "type-check": "~0.3.2",
193 | "word-wrap": "~1.2.3"
194 | }
195 | },
196 | "prelude-ls": {
197 | "version": "1.1.2",
198 | "resolved": "https://registry.npmjs.org/prelude-ls/-/prelude-ls-1.1.2.tgz",
199 | "integrity": "sha1-IZMqVJ9eUv/ZqCf1cOBL5iqX2lQ="
200 | },
201 | "source-map": {
202 | "version": "0.6.1",
203 | "resolved": "https://registry.npmjs.org/source-map/-/source-map-0.6.1.tgz",
204 | "integrity": "sha512-UjgapumWlbMhkBgzT7Ykc5YXUT46F0iKu8SGXq0bcwP5dz/h0Plj6enJqjz1Zbq2l5WaqYnrVbwWOWMyF3F47g==",
205 | "optional": true
206 | },
207 | "type-check": {
208 | "version": "0.3.2",
209 | "resolved": "https://registry.npmjs.org/type-check/-/type-check-0.3.2.tgz",
210 | "integrity": "sha1-WITKtRLPHTVeP7eE8wgEsrUg23I=",
211 | "requires": {
212 | "prelude-ls": "~1.1.2"
213 | }
214 | },
215 | "word-wrap": {
216 | "version": "1.2.3",
217 | "resolved": "https://registry.npmjs.org/word-wrap/-/word-wrap-1.2.3.tgz",
218 | "integrity": "sha512-Hz/mrNwitNRh/HUAtM/VT/5VH+ygD6DV7mYKZAtHOrbs8U7lvPS6xf7EJKMF0uW1KJCl0H701g3ZGus+muE5vQ=="
219 | }
220 | }
221 | }
222 |
--------------------------------------------------------------------------------
/pdg_js/package.json:
--------------------------------------------------------------------------------
1 | {
2 | "dependencies": {
3 | "escodegen": "^2.0.0",
4 | "esprima": "^4.0.1"
5 | }
6 | }
7 |
--------------------------------------------------------------------------------
/pdg_js/parser.js:
--------------------------------------------------------------------------------
1 | // Copyright (C) 2021 Aurore Fass
2 | // Copyright (C) 2022 Anonymous
3 | //
4 | // This program is free software: you can redistribute it and/or modify
5 | // it under the terms of the GNU Affero General Public License as published
6 | // by the Free Software Foundation, either version 3 of the License, or
7 | // (at your option) any later version.
8 | //
9 | // This program is distributed in the hope that it will be useful,
10 | // but WITHOUT ANY WARRANTY; without even the implied warranty of
11 | // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
12 | // GNU Affero General Public License for more details.
13 | //
14 | // You should have received a copy of the GNU Affero General Public License
15 | // along with this program. If not, see <https://www.gnu.org/licenses/>.
16 |
17 |
18 | // Conversion of a JS file into its Esprima AST.
19 |
20 |
21 | module.exports = {
22 | js2ast: js2ast,
23 | };
24 |
25 |
26 | var esprima = require("esprima");
27 | var es = require("escodegen");
28 | var fs = require("fs");
29 | var path = require("path");
30 | var process = require("process");
31 |
32 |
33 | /**
34 | * Extraction of the AST of an input JS file using Esprima.
35 | *
36 | * @param js
37 | * @param json_path
38 | * @returns {*}
39 | */
40 | function js2ast(js, json_path) {
41 | var text = fs.readFileSync(js).toString('utf-8');
42 | try {
43 | var ast = esprima.parseModule(text, {
44 | range: true,
45 | loc: true,
46 | tokens: true,
47 | tolerant: true,
48 | comment: true
49 | });
50 | } catch(e) {
51 | console.error(js, e);
52 | process.exit(1);
53 | }
54 |
55 | // Attaching comments is a separate step for Escodegen
56 | ast = es.attachComments(ast, ast.comments, ast.tokens);
57 |
58 | fs.mkdirSync(path.dirname(json_path), {recursive: true});
59 | fs.writeFile(json_path, JSON.stringify(ast), function (err) {
60 | if (err) {
61 | console.error(err);
62 | }
63 | });
64 |
65 | return ast;
66 | }
67 |
68 | js2ast(process.argv[2], process.argv[3]);
69 |
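Since the script reads the JS file from argv[2] and writes the AST to the JSON path given in argv[3], it can be driven from Python with a plain subprocess call. The sketch below is only an illustration; it is not necessarily how build_ast.py actually invokes the parser, and the paths are hypothetical.

```python
# Illustration: running parser.js from Python and loading the resulting AST.
import json
import subprocess

def js_to_esprima_ast(js_path, json_path, parser="pdg_js/parser.js"):
    # node parser.js <input.js> <output.json>
    subprocess.run(["node", parser, js_path, json_path], check=True)
    with open(json_path, encoding="utf-8") as fh:
        return json.load(fh)

# ast = js_to_esprima_ast("example.js", "/tmp/example.json")
# print(ast["type"])   # "Program" for a successfully parsed module
```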
--------------------------------------------------------------------------------
/pdg_js/pointer_analysis.py:
--------------------------------------------------------------------------------
1 | # Copyright (C) 2021 Aurore Fass
2 | #
3 | # This program is free software: you can redistribute it and/or modify
4 | # it under the terms of the GNU Affero General Public License as published
5 | # by the Free Software Foundation, either version 3 of the License, or
6 | # (at your option) any later version.
7 | #
8 | # This program is distributed in the hope that it will be useful,
9 | # but WITHOUT ANY WARRANTY; without even the implied warranty of
10 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
11 | # GNU Affero General Public License for more details.
12 | #
13 | # You should have received a copy of the GNU Affero General Public License
14 | # along with this program. If not, see <https://www.gnu.org/licenses/>.
15 |
16 |
17 | """
18 | Pointer analysis; mapping a variable to where its value is defined.
19 | """
20 |
21 | import logging
22 |
23 | from . import js_operators
24 | from .value_filters import get_node_computed_value, display_values
25 | from . import node as _node
26 |
27 |
28 | """
29 | In the following and if not stated otherwise,
30 | - node: Node
31 | Current node.
32 | - identifiers: list
33 | List of Identifier nodes whose values we aim at computing.
34 | - operator: None or str (e.g., '+=').
35 | """
36 |
37 |
38 | def get_node_path(begin_node, destination_node, path):
39 | """
40 | Find the path between begin_node and destination_node.
41 | -------
42 | Parameters:
43 | - begin_node: Node
44 | Entry point, origin.
45 | - destination_node: Node
46 | Descendant of begin_node. Destination point.
47 | - path: list
48 | Path between begin_node and destination_node.
49 | Ex: [0, 0, 1] <=> begin_node.children[0].children[0].children[1] = destination_node.
50 | """
51 |
52 | if begin_node.id == destination_node.id:
53 | return True
54 |
55 | for i, _ in enumerate(begin_node.children):
56 | path.append(i) # Child number i
57 | found = get_node_path(begin_node.children[i], destination_node, path)
58 | if found:
59 | return True
60 | del path[-1]
61 | return False
62 |
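A small sketch of the path convention on a hand-built tree (same import assumptions as above): the returned path is the list of child indices from the origin down to the destination.

```python
# Path convention: [1, 0] <=> root.children[1].children[0] is the destination.
from pdg_js import node as _node
from pdg_js.pointer_analysis import get_node_path

root = _node.Node('ObjectExpression')
prop0 = _node.Node('Property', parent=root)
prop1 = _node.Node('Property', parent=root)
root.set_child(prop0)
root.set_child(prop1)
key1 = _node.Node('Identifier', parent=prop1)
prop1.set_child(key1)

path = []
assert get_node_path(root, key1, path)
assert path == [1, 0]
```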
63 |
64 | def find_node(var, begin_node, path):
65 | """ Find the node whose path from begin_node is given. """
66 |
67 |     logging.debug('Trying to find the node symmetric to %s, following the path %s from %s',
68 | var.name, path, begin_node.name)
69 | while path:
70 | child_nb = path.pop(0)
71 | try:
72 | begin_node = begin_node.children[child_nb]
73 | except IndexError: # Case Asymmetric mapping, e.g., Array or Object mapped to an Identifier
74 | return begin_node, None
75 |
76 | if not path: # begin_node is already the node we are looking for
77 | return begin_node, None
78 |
79 | # Case Asymmetric mapping, e.g., Identifier mapped to an Array or else
80 | logging.debug('Asymmetric mapping case')
81 | if begin_node.name in ('ArrayExpression', 'ObjectExpression', 'ObjectPattern', 'NewExpression'):
82 | value = begin_node
83 | logging.debug('The value corresponds to node %s', value.name)
84 | return None, value
85 |
86 | return begin_node, None
87 |
88 |
89 | def get_member_expression(node):
90 | """ Returns:
91 | - if a MemberExpression node ascendant was found;
92 | - the furthest MemberExpression ascendant (if True) or node.
93 | - if we are in a window.node or this.node situation. """
94 |
95 | if node.parent.name != 'MemberExpression':
96 | return False, node, False
97 |
98 | while node.parent.name == 'MemberExpression':
99 | if node.parent.children[0].name == 'ThisExpression'\
100 | or get_node_computed_value(node.parent.children[0]) in _node.GLOBAL_VAR:
101 | return False, node, True
102 | node = node.parent
103 | return True, node, False
104 |
105 |
106 | def map_var2value(node, identifiers, operator=None):
107 | """
108 | Map identifier nodes to their corresponding Literal/Identifier values.
109 |
110 | -------
111 | Parameters:
112 | - node: Node
113 | Entry point, either a VariableDeclaration or AssignmentExpression node.
114 | Therefore: node.children[0] => Identifier = considered variable;
115 | node.children[1] => Identifier/Literal = corresponding value
116 | - identifiers: list
117 | List of Identifier nodes to map to their values.
118 |
119 | Trick: Symmetry between AST left-hand side (declaration) and right-hand side (value).
120 | """
121 |
122 | if node.name != 'VariableDeclarator' and node.name != 'AssignmentExpression' \
123 | and node.name != 'Property':
124 | # Could be called on other Nodes because of assignment_expr_df which calculates DD on
125 | # right-hand side elements which may not be variable declarations/assignments anymore
126 | return
127 |
128 | var = node.children[0]
129 | init = node.children[1]
130 |
131 | for decl in identifiers:
132 | # Compute the value for each decl, as it might have changed
133 | logging.debug('Computing a value for the variable %s with id %s',
134 | decl.attributes['name'], decl.id)
135 |
136 | decl.set_update_value(True) # Will be updated when printed in display_temp
137 | member_expr, decl, this_window = get_member_expression(decl)
138 |
139 | path = list()
140 | get_node_path(var, decl, path)
141 | if this_window:
142 | path.pop() # We jump over the MemberExpression parent to keep the symmetry
143 |
144 | if isinstance(init, _node.Identifier) and isinstance(init.value, _node.Node):
145 | try:
146 | logging.debug('The variable %s was initialized with the Identifier %s which already'
147 | ' has a value', decl.attributes['name'], init.attributes['name'])
148 | except KeyError:
149 | logging.debug('The variable %s was initialized with the Identifier %s which already'
150 | ' has a value', decl.name, init.name)
151 | value_node, value = find_node(var, init.value, path)
152 | else:
153 | if isinstance(decl, _node.Identifier):
154 | logging.debug('The variable %s was not initialized with an Identifier or '
155 | 'it does not already have a value', decl.attributes['name'])
156 | else:
157 | logging.debug('The %s %s was not initialized with an Identifier or '
158 | 'it does not already have a value', decl.name, decl.attributes)
159 | value_node, value = find_node(var, init, path)
160 | if value_node is not None:
161 | logging.debug('Got the node %s', value_node.name)
162 |
163 | if value is None:
164 | if isinstance(decl, _node.Identifier):
165 | logging.debug('Calculating the value of the variable %s', decl.attributes['name'])
166 | else:
167 | logging.debug('Calculating the value')
168 | if operator is None:
169 | logging.debug('Fetching the value')
170 | # We compute the value ourselves
171 | value = get_node_computed_value(value_node, initial_node=decl)
172 | if isinstance(decl, _node.Identifier):
173 | decl.set_code(node) # Add code
174 |
175 | else:
176 | logging.debug('Found the %s operator, computing the value ourselves', operator)
177 | # We compute the value ourselves: decl operator value_node
178 | value = js_operators.compute_operators(operator, decl, value_node,
179 | initial_node=decl)
180 | if isinstance(decl, _node.Identifier):
181 | decl.set_code(node) # Add code
182 |
183 | else:
184 | decl.set_code(node) # Add code
185 |
186 | if not member_expr: # Standard case, assign the value to the Identifier node
187 | logging.debug('Assigning the value %s to %s', value, decl.attributes['name'])
188 | decl.set_value(value)
189 | if isinstance(value_node, _node.FunctionExpression):
190 | fun_name = decl
191 | if value_node.fun_intern_name is not None:
192 |                     logging.debug('The variable %s refers to the (Arrow)FunctionExpression %s',
193 | fun_name.attributes['name'],
194 | value_node.fun_intern_name.attributes['name'])
195 | else:
196 |                     logging.debug('The variable %s refers to an anonymous (Arrow)FunctionExpression',
197 | fun_name.attributes['name'])
198 | value_node.set_fun_name(fun_name)
199 | else:
200 | display_values(decl) # Displays values
201 | else: # MemberExpression case
202 | logging.debug('MemberExpression case')
203 | literal_value = update_member_expression(decl, initial_node=decl)
204 | if isinstance(literal_value, _node.Value): # Everything is fine, can store value
205 | logging.debug('The object was defined, set the value of its property')
206 | literal_value.set_value(value) # Modifies value of the node referencing the MemExpr
207 | literal_value.set_provenance_rec(value_node) # Updates provenance
208 | display_values(literal_value) # Displays values
209 | else: # The object is probably a built-in object therefore no handle to get its prop
210 |             logging.debug('The object was not defined; storing its property and setting its value')
211 | obj, all_prop = define_obj_properties(decl, value, initial_node=decl)
212 | obj.set_value(all_prop)
213 | obj.set_provenance_rec(value_node) # Updates provenance
214 | display_values(obj)
215 |
216 |
217 | def compute_update_expression(node, identifier):
218 | """ Evaluates an UpdateExpression node. """
219 |
220 | identifier.set_update_value(True) # Will be updated when printed in display_temp
221 | operator = node.attributes['operator']
222 | value = js_operators.compute_operators(operator, identifier, 0)
223 | identifier.set_value(value)
224 | identifier.set_code(node.parent)
225 |
226 |
227 | def update_member_expression(member_expression_node, initial_node):
228 | """ If a MemberExpression is modified (i.e., left-hand side of an assignment),
229 | modifies the value of the node referencing the MemberExpression. """
230 |
231 | literal_value = js_operators.compute_member_expression(member_expression_node,
232 | initial_node=initial_node, compute=False)
233 | return literal_value
234 |
235 |
236 | def search_properties(node, tab):
237 | """ Searches the Identifier/Literal nodes properties of a MemberExpression node. """
238 |
239 | if node.name in ('Identifier', 'Literal'):
240 | if get_node_computed_value(node) not in _node.GLOBAL_VAR: # do nothing if window &co
241 | tab.append(node) # store left member as not window &co
242 |
243 | for child in node.children:
244 | search_properties(child, tab)
245 |
246 |
247 | def define_obj_properties(member_expression_node, value, initial_node):
248 | """ Defines the properties of a built-in object. Returns the object + its properties. """
249 |
250 | properties = []
251 | search_properties(member_expression_node, properties) # Got all prop
252 |
253 | obj = properties[0]
254 | obj_init = get_node_computed_value(obj, initial_node=initial_node)
255 | # The obj may already have some properties
256 | properties = properties[1:]
257 | properties_value = [get_node_computed_value(prop,
258 | initial_node=initial_node) for prop in properties]
259 |
260 |     # Good for debugging to see the dict content, but cannot be used as it loses the link to variables
261 | # if isinstance(value, _node.Node):
262 | # if value.name in ('ObjectExpression', 'ObjectPattern'):
263 | # value = js_operators.compute_object_expr(value)
264 |
265 |     if isinstance(obj_init, dict):  # the obj already has properties
266 | all_prop = obj_init # initialize obj with its existing properties
267 | elif isinstance(obj_init, str): # the obj was previously defined with value obj_init
268 | all_prop = {obj_init: {}} # store its previous value as a property to keep it
269 | else:
270 | all_prop = {} # initialize with empty dict
271 | previous_prop = all_prop
272 | for i in range(len(properties_value) - 1):
273 | prop = properties_value[i]
274 | if prop not in previous_prop or not isinstance(previous_prop[prop], dict):
275 | previous_prop[prop] = {} # previous_prop[prop] does not already exist
276 | previous_prop = previous_prop[prop]
277 | previous_prop[properties_value[-1]] = value # prop0.prop1.prop2... = value
278 |
279 | return obj, all_prop
280 |
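The nested-dict construction at the end of define_obj_properties can be pictured in isolation. The following self-contained sketch (hypothetical property names, not using the repository's Node classes) reproduces that loop:

```python
# Storing obj.prop0.prop1 = value as nested dicts when the object has no dedicated node.
def nest_properties(existing, properties, value):
    all_prop = existing if isinstance(existing, dict) else {}
    previous = all_prop
    for prop in properties[:-1]:
        if prop not in previous or not isinstance(previous[prop], dict):
            previous[prop] = {}          # create the intermediate level if missing
        previous = previous[prop]
    previous[properties[-1]] = value     # prop0.prop1... = value
    return all_prop

# e.g. window.config.theme = 'dark' on a previously unknown `config` object
assert nest_properties({}, ['config', 'theme'], 'dark') == {'config': {'theme': 'dark'}}
```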
--------------------------------------------------------------------------------
/pdg_js/scope.py:
--------------------------------------------------------------------------------
1 | # Copyright (C) 2021 Aurore Fass
2 | #
3 | # This program is free software: you can redistribute it and/or modify
4 | # it under the terms of the GNU Affero General Public License as published
5 | # by the Free Software Foundation, either version 3 of the License, or
6 | # (at your option) any later version.
7 | #
8 | # This program is distributed in the hope that it will be useful,
9 | # but WITHOUT ANY WARRANTY; without even the implied warranty of
10 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
11 | # GNU Affero General Public License for more details.
12 | #
13 | # You should have received a copy of the GNU Affero General Public License
14 | # along with this program. If not, see <https://www.gnu.org/licenses/>.
15 |
16 |
17 | """
18 | Definition of class Scope to handle JS scoping rules.
19 | """
20 |
21 | import copy
22 |
23 |
24 | class Scope:
25 | """ To apply JS scoping rules. """
26 |
27 | def __init__(self, name=''):
28 | self.name = name
29 | self.var_list = []
30 | self.var_if2_list = [] # Specific to if constructs with 2 possible variables at the end
31 | self.unknown_var = set() # Unknown variable in a given scope
32 | self.function = None
33 | self.bloc = False # Indicates if we are in a block statement
34 | self.need_to_recompute_var_list = True
35 | self.id_name_list = set()
36 |
37 | def set_name(self, name):
38 | self.name = name
39 |
40 | def set_var_list(self, var_list):
41 | self.var_list = var_list
42 | self.need_to_recompute_var_list = True
43 |
44 | def set_var_if2_list(self, var_if2_list):
45 | self.var_if2_list = var_if2_list
46 |
47 | def set_unknown_var(self, unknown_var):
48 | self.unknown_var = unknown_var
49 |
50 | def set_function(self, function):
51 | self.function = function
52 |
53 | def add_var(self, identifier_node):
54 | self.var_list.append(identifier_node)
55 | self.need_to_recompute_var_list = True
56 | self.var_if2_list.append(None)
57 |
58 | def add_unknown_var(self, unknown):
59 | self.unknown_var.add(unknown) # Set avoids duplicates
60 |
61 | def remove_unknown_var(self, unknown):
62 | self.unknown_var.remove(unknown)
63 |
64 | def update_var(self, index, identifier_node):
65 | self.var_list[index] = identifier_node
66 | self.need_to_recompute_var_list = True
67 | self.var_if2_list[index] = None
68 |
69 | def update_var_if2(self, index, identifier_node_list):
70 | self.var_if2_list[index] = identifier_node_list
71 |
72 | def add_var_if2(self, index, identifier_node):
73 | if not isinstance(self.var_if2_list[index], list):
74 | self.var_if2_list[index] = []
75 | self.var_if2_list[index].append(identifier_node)
76 |
77 | def is_equal(self, var_list2):
78 | if self.var_list == var_list2.var_list and self.var_if2_list == var_list2.var_if2_list:
79 | return True
80 | return False
81 |
82 | def copy_scope(self):
83 | scope = Scope()
84 | scope.set_name(copy.copy(self.name))
85 | scope.set_var_list(copy.copy(self.var_list))
86 | scope.set_var_if2_list(copy.copy(self.var_if2_list))
87 | scope.set_unknown_var(copy.copy(self.unknown_var))
88 | scope.set_function(copy.copy(self.function))
89 | return scope
90 |
91 | def get_pos_identifier(self, identifier_node):
92 | tmp_list = None
93 | if self.need_to_recompute_var_list:
94 | tmp_list = [elt.attributes['name'] for elt in self.var_list]
95 | self.id_name_list = set(tmp_list)
96 | self.need_to_recompute_var_list = False
97 | var_name = identifier_node.attributes['name']
98 | if var_name in self.id_name_list:
99 | if tmp_list is None:
100 | tmp_list = [elt.attributes['name'] for elt in self.var_list]
101 | return tmp_list.index(var_name) # Position of identifier_node in var_list
102 | return None # None if it is not in the list
103 |
104 | def set_in_bloc(self, bloc):
105 | self.bloc = bloc
106 |
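A short usage sketch of the scope bookkeeping, under the same import assumptions as above (the Identifier nodes are built by hand for the example):

```python
# Declaring a variable in a scope and resolving identifier positions by name.
from pdg_js import node as _node
from pdg_js.scope import Scope

scope = Scope('function foo')

x_decl = _node.Identifier('Identifier', parent=None)
x_decl.set_attribute('name', 'x')
scope.add_var(x_decl)                          # 'x' is now declared in this scope

x_use = _node.Identifier('Identifier', parent=None)
x_use.set_attribute('name', 'x')
y_use = _node.Identifier('Identifier', parent=None)
y_use.set_attribute('name', 'y')

assert scope.get_pos_identifier(x_use) == 0    # position of 'x' in var_list
assert scope.get_pos_identifier(y_use) is None # 'y' is unknown in this scope
```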
--------------------------------------------------------------------------------
/pdg_js/utility_df.py:
--------------------------------------------------------------------------------
1 | # Copyright (C) 2021 Aurore Fass
2 | #
3 | # This program is free software: you can redistribute it and/or modify
4 | # it under the terms of the GNU Affero General Public License as published
5 | # by the Free Software Foundation, either version 3 of the License, or
6 | # (at your option) any later version.
7 | #
8 | # This program is distributed in the hope that it will be useful,
9 | # but WITHOUT ANY WARRANTY; without even the implied warranty of
10 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
11 | # GNU Affero General Public License for more details.
12 | #
13 | # You should have received a copy of the GNU Affero General Public License
14 | # along with this program. If not, see <https://www.gnu.org/licenses/>.
15 |
16 |
17 | """
18 | Utility file, stores shared information.
19 | """
20 |
21 | import sys
22 | import resource
23 | import timeit
24 | import logging
25 | import signal
26 | import traceback
27 |
28 | sys.setrecursionlimit(100000)
29 |
30 |
31 | TEST = False
32 |
33 | if TEST: # To test, e.g., the examples
34 | PDG_EXCEPT = True # To print the exceptions encountered while building the PDG
35 | LIMIT_SIZE = 10000 # To avoid list/str values with over 10,000 characters
36 | LIMIT_RETRAVERSE = 1 # If function called on itself, then max times to avoid infinite recursion
37 | LIMIT_LOOP = 5 # If iterating through a loop, then max times to avoid infinite loops
38 | DISPLAY_VAR = True # To display variable values
39 | CHECK_JSON = True # Builds the JS code from the AST, to check for possible bugs in the AST
40 |
41 | NUM_WORKERS = 1
42 |
43 | else: # To run with multiprocessing
44 | PDG_EXCEPT = False # To ignore (pass) the exceptions encountered while building the PDG
45 | LIMIT_SIZE = 10000 # To avoid list/str values with over 10,000 characters
46 | LIMIT_RETRAVERSE = 1 # If function called on itself, then max times to avoid infinite recursion
47 | LIMIT_LOOP = 1 # If iterating through a loop, then max times to avoid infinite loops
48 | DISPLAY_VAR = False # To not display variable values
49 | CHECK_JSON = False # To not build the JS code from the AST
50 |
51 | NUM_WORKERS = 1 # CHANGE THIS ONE
52 |
53 |
54 | class UpperThresholdFilter(logging.Filter):
55 | """
56 | This allows us to set an upper threshold for the log levels since the setLevel method only
57 | sets a lower one
58 | """
59 |
60 | def __init__(self, threshold, *args, **kwargs):
61 | self._threshold = threshold
62 | super(UpperThresholdFilter, self).__init__(*args, **kwargs)
63 |
64 | def filter(self, rec):
65 | return rec.levelno <= self._threshold
66 |
67 |
68 | logging.basicConfig(format='%(levelname)s: %(filename)s: %(message)s', level=logging.CRITICAL)
69 | # logging.basicConfig(filename='pdg.log', format='%(levelname)s: %(filename)s: %(message)s',
70 | # level=logging.DEBUG)
71 | # LOGGER = logging.getLogger()
72 | # LOGGER.addFilter(UpperThresholdFilter(logging.CRITICAL))
73 |
74 |
75 | def micro_benchmark(message, elapsed_time):
76 | """ Micro benchmarks. """
77 | logging.info('%s %s%s', message, str(elapsed_time), 's')
78 | print('CURRENT STATE %s %s%s' % (message, str(elapsed_time), 's'))
79 | return timeit.default_timer()
80 |
81 |
82 | class Timeout:
83 | """ Timeout class using ALARM signal. """
84 |
85 | class Timeout(Exception):
86 | """ Timeout class throwing an exception. """
87 |
88 | def __init__(self, sec):
89 | self.sec = sec
90 |
91 | def __enter__(self):
92 | signal.signal(signal.SIGALRM, self.raise_timeout)
93 | signal.alarm(self.sec)
94 |
95 | def __exit__(self, *args):
96 | signal.alarm(0) # disable alarm
97 |
98 | def raise_timeout(self, *args):
99 | traceback.print_stack(limit=100)
100 | raise Timeout.Timeout()
101 |
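A hedged usage sketch of the Timeout context manager (Unix only, since it relies on SIGALRM; the slow step below is a stand-in for an expensive analysis such as PDG construction). Note that raise_timeout also prints a stack trace before raising.

```python
# Bounding a long-running step with the ALARM-based Timeout class.
import time
from pdg_js.utility_df import Timeout

def slow_step():
    time.sleep(10)   # stand-in for a long-running analysis step

try:
    with Timeout(1):         # give the step at most 1 second
        slow_step()
        finished = True
except Timeout.Timeout:
    finished = False         # the alarm fired, give up on this input

assert finished is False
```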
102 |
103 | def limit_memory(maxsize):
104 | """ Limiting the memory usage to maxsize (in bytes), soft limit. """
105 |
106 | soft, hard = resource.getrlimit(resource.RLIMIT_AS)
107 | resource.setrlimit(resource.RLIMIT_AS, (maxsize, hard))
108 |
--------------------------------------------------------------------------------
/pdg_js/value_filters.py:
--------------------------------------------------------------------------------
1 | # Copyright (C) 2021 Aurore Fass
2 | #
3 | # This program is free software: you can redistribute it and/or modify
4 | # it under the terms of the GNU Affero General Public License as published
5 | # by the Free Software Foundation, either version 3 of the License, or
6 | # (at your option) any later version.
7 | #
8 | # This program is distributed in the hope that it will be useful,
9 | # but WITHOUT ANY WARRANTY; without even the implied warranty of
10 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
11 | # GNU Affero General Public License for more details.
12 | #
13 | # You should have received a copy of the GNU Affero General Public License
14 | # along with this program. If not, see <https://www.gnu.org/licenses/>.
15 |
16 |
17 | """ Prints variables with their corresponding value. And logs whether an insecure API was used. """
18 |
19 | import logging
20 | from . import node as _node
21 | from .js_operators import get_node_computed_value, get_node_value
22 | from . import utility_df
23 |
24 | INSECURE = ['document.write']
25 | DISPLAY_VAR = utility_df.DISPLAY_VAR # To display the variables' value or not
26 |
27 |
28 | def is_insecure_there(value):
29 | """ Checks if value is part of an insecure API. """
30 |
31 | for insecure in INSECURE:
32 | if insecure in value:
33 | logging.debug('Found a call to %s', insecure)
34 |
35 |
36 | def display_values(var, keep_none=True, check_insecure=True, recompute=False):
37 | """ Print var = its value and checks whether the value is part of an insecure API. """
38 |
39 | if not DISPLAY_VAR: # We do not want the values printed during large-scale analyses
40 | return
41 |
42 |     if recompute:  # If we store ALL values, they sometimes need to be recomputed as they could have changed
43 |         # Currently not executed; check whether set_value is called in get_node_computed_value
44 | value = get_node_value(var)
45 | var.set_value(value)
46 | else:
47 | value = var.value # We store value so as not to compute it AGAIN
48 | if isinstance(value, _node.Node) or value is None: # Only if necessary
49 | value = get_node_computed_value(var, keep_none=keep_none) # Gets variable value
50 |
51 | if isinstance(var, _node.Identifier):
52 | variable = get_node_value(var)
53 | print('\t' + variable + ' = ' + str(value)) # Prints variable = value
54 |
55 | elif var.name in _node.CALL_EXPR + ['ReturnStatement']:
56 |         print('\t' + var.name + ' = ' + str(value))  # Prints variable = value
57 |
58 | if isinstance(value, _node.Node):
59 | print('\t' + value.name, value.attributes, value.id)
60 |
61 | elif isinstance(value, str) and check_insecure:
62 | is_insecure_there(value) # Checks for usage of insecure APIs
63 |
--------------------------------------------------------------------------------
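Illustrative sketch (not repository code): how the `DISPLAY_VAR` switch is meant to be used. Since `value_filters` copies `utility_df.DISPLAY_VAR` at import time, the flag has to be set before the import, or patched on the module afterwards.

```python
# Sketch: enabling variable-value printing for debugging sessions.
from pdg_js import utility_df

utility_df.DISPLAY_VAR = True        # must happen before value_filters is imported

from pdg_js import value_filters

value_filters.DISPLAY_VAR = True     # or patch the copied flag after import
value_filters.is_insecure_there('document.write(untrusted)')  # records a debug log if DEBUG logging is enabled
```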
/requirements.txt:
--------------------------------------------------------------------------------
1 | graphviz==0.20
2 | lxml==4.9.0
3 |
--------------------------------------------------------------------------------
/taint_mini/__init__.py:
--------------------------------------------------------------------------------
1 | from .storage import *
2 | from .wxml import *
3 | from .wxjs import *
4 | from .taintmini import *
5 |
--------------------------------------------------------------------------------
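Since the wildcard imports re-export the submodules' public names at package level, a driver can call `analyze_mini_program` directly. A sketch with placeholder paths and an empty filter config:

```python
# Sketch (placeholder paths): invoking the analysis through the package API.
from taint_mini import analyze_mini_program

if __name__ == '__main__':
    analyze_mini_program(
        app_path='unpacked-mini-program',      # directory that contains pages/
        results_path='results',                # where the per-app CSV is written
        config={'sources': [], 'sinks': []},   # empty lists: report all flows
        workers=4,                             # number of analysis processes
        bench=False)                           # True also writes per-page timings
```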
/taint_mini/storage.py:
--------------------------------------------------------------------------------
1 | class Storage:
2 | instance = None
3 | node = None
4 | results = None
5 | app_path = None
6 | page_path = None
7 | config = None
8 |
9 |     def __init__(self, _node, _app_path, _page_path, _config):
10 | self.node = _node
11 |         # data structure: {
12 |         #     [page_name]: [
13 |         #         { method: [method_name],
14 |         #           ident: [identifier_name],
15 |         #           source: [source_name],
16 |         #           sink: [sink_name] },
17 |         #     ]
18 |         # }
19 | self.results = dict()
20 | self.app_path = _app_path
21 |         self.page_path = _page_path
22 | self.config = _config
23 |
24 | def get_node(self):
25 | return self.node
26 |
27 | def get_results(self):
28 | return self.results
29 |
30 | def get_app_path(self):
31 | return self.app_path
32 |
33 | def get_page_path(self):
34 | return self.page_path
35 |
36 | def get_config(self):
37 | return self.config
38 |
39 | @staticmethod
40 | def init(_node, _app_path, _page_path, _config):
41 | Storage.instance = Storage(_node, _app_path, _page_path, _config)
42 |
43 | @staticmethod
44 | def get_instance():
45 | return Storage.instance
46 |
--------------------------------------------------------------------------------
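`Storage` is a process-local singleton that `analyze_worker` (in taintmini.py) re-initializes for every page it analyzes. A small lifecycle sketch with placeholder arguments:

```python
# Sketch: per-process lifecycle of the Storage singleton.
from taint_mini.storage import Storage

pdg_root = None  # placeholder; in practice this is the PDG root returned by gen_pdg
Storage.init(pdg_root, 'unpacked-app', 'index/index', {'sources': [], 'sinks': []})

store = Storage.get_instance()
store.get_results()['index/index'] = []   # handle_wxjs seeds each page entry like this
print(store.get_page_path(), store.get_results())
```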
/taint_mini/taintmini.py:
--------------------------------------------------------------------------------
1 | import os
2 | import time
3 | from .wxjs import gen_pdg, handle_wxjs
4 | from .wxml import handle_wxml
5 | from .storage import Storage
6 | import multiprocessing as mp
7 |
8 |
9 | def filter_results(results, config):
10 | # no filters, just return
11 | if ("sources" not in config or len(config["sources"]) == 0) and \
12 | ("sinks" not in config or len(config["sinks"]) == 0):
13 | return results
14 |
15 | filtered = {}
16 | for page in results:
17 | filtered[page] = []
18 | for flow in results[page]:
19 | # filter source
20 | if "sources" in config and len(config["sources"]) > 0:
21 | if "sinks" in config and len(config["sinks"]) > 0:
22 | # apply source and sink filter
23 | if flow['source'] in config["sources"] and flow['sink'] in config["sinks"]:
24 | filtered[page].append(flow)
25 | # handle double binding in source
26 |                     elif "[double_binding]" in config["sources"] and "[data from" in flow['source'] \
27 | and flow['sink'] in config["sinks"]:
28 | filtered[page].append(flow)
29 |                 else:
30 |                     # no sink filter, just apply the source filter
31 |                     if flow['source'] in config["sources"]:
32 |                         filtered[page].append(flow)
33 | else:
34 | # no source filter, apply sink filter
35 | if "sinks" in config and len(config["sinks"]) > 0:
36 | # apply sink filter
37 | if flow['sink'] in config["sinks"]:
38 | filtered[page].append(flow)
39 | # remove empty entries
40 | if len(filtered[page]) == 0:
41 | filtered.pop(page)
42 | return filtered
43 |
44 |
45 | def analyze_worker(app_path, page_path, results_path, config, queue):
46 | # generate pdg first
47 | r = gen_pdg(os.path.join(app_path, "pages", f"{page_path}.js"), results_path)
48 | # init shared storage (per process)
49 | Storage.init(r, app_path, page_path, config)
50 | # analyze double binding
51 | handle_wxml(os.path.join(app_path, "pages", f"{page_path}.wxml"))
52 | # analyze data flow
53 | handle_wxjs(r)
54 | # retrieve results
55 | results = Storage.get_instance().get_results()
56 | # filter results
57 | filtered = filter_results(results, config)
58 | # send results
59 | queue.put(filtered)
60 |
61 |
62 | def analyze_listener(result_path, queue):
63 | with open(result_path, "w") as f:
64 | f.write("page_name | page_method | ident | source | sink\n")
65 | while True:
66 | message = queue.get()
67 | if message == "kill":
68 | break
69 | if isinstance(message, dict):
70 | for page in message:
71 | for flow in message[page]:
72 | f.write(f"{page} | {flow['method']} | {flow['ident']} | {flow['source']} | {flow['sink']}\n")
73 | f.flush()
74 | f.flush()
75 |
76 |
77 | def obtain_valid_page(files):
78 | sub_pages = set()
79 | for f in files:
80 |         sub_pages.add(f.split(".")[0])
81 | for f in list(sub_pages):
82 | if f"{f}.js" not in files or f"{f}.wxml" not in files:
83 | sub_pages.remove(f)
84 | return sub_pages
85 |
86 |
87 | def retrieve_pages(app_path):
88 | pages = set()
89 | for root, dirs, files in os.walk(os.path.join(app_path, "pages/")):
90 | for s in obtain_valid_page(files):
91 | pages.add(f"{root[len(os.path.join(app_path, 'pages/')):]}/{s}")
92 | return pages
93 |
94 |
95 | def analyze_mini_program(app_path, results_path, config, workers, bench):
96 | if not os.path.exists(app_path):
97 |         print("[main] invalid app path")
98 |         return
99 | # obtain pages
100 | pages = retrieve_pages(app_path)
101 | if len(pages) == 0:
102 | print(f"[main] no page found")
103 | return
104 |
105 | # prepare output path
106 | if not os.path.exists(results_path):
107 | os.mkdir(results_path)
108 | elif os.path.isfile(results_path):
109 | print(f"[main] error: invalid output path")
110 | return
111 |
112 | manager = mp.Manager()
113 | queue = manager.Queue()
114 |     pool = mp.Pool((workers if workers is not None else mp.cpu_count()) + 1)  # reserve one slot for the listener so workers are not starved
115 |
116 | # put listener to pool first
117 | pool.apply_async(analyze_listener, (os.path.join(results_path, f"{os.path.basename(app_path)}-result.csv"), queue))
118 |
119 | bench_out = None
120 | if bench:
121 | bench_out = open(os.path.join(results_path, f"{os.path.basename(app_path)}-bench.csv"), "w")
122 | bench_out.write("page|start|end\n")
123 |
124 | # execute workers
125 | workers = dict()
126 | for p in pages:
127 | workers[p] = dict()
128 | workers[p]["job"] = pool.apply_async(analyze_worker, (app_path, p, results_path, config, queue))
129 | if bench:
130 | workers[p]["begin_time"] = int(time.time())
131 |
132 | # collect results
133 | for p in pages:
134 | try:
135 | workers[p]["job"].get()
136 | except Exception as e:
137 | print(f"[main] critical error: {e}")
138 | finally:
139 | if bench:
140 | workers[p]["end_time"] = int(time.time())
141 |
142 | queue.put("kill")
143 | pool.close()
144 | pool.join()
145 |
146 | if bench and bench_out is not None:
147 | for p in pages:
148 | bench_out.write(f"{p}|{workers[p]['begin_time']}|{workers[p]['end_time']}\n")
149 | bench_out.close()
150 |
--------------------------------------------------------------------------------
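A self-contained example of how `filter_results` narrows the collected flows; the page name and flow entries below are made up for illustration.

```python
# Sketch (made-up flow data): keeping only flows that end in a configured sink.
from taint_mini.taintmini import filter_results

results = {'index/index': [
    {'method': 'onSubmit', 'ident': 'e',
     'source': '[data from double binding: passwd, type: password]',
     'sink': 'wx.request'},
    {'method': 'onLoad', 'ident': 'res',
     'source': 'wx.getLocation', 'sink': 'console.log'},
]}
config = {'sources': [], 'sinks': ['wx.request']}   # no source filter, one sink filter

print(filter_results(results, config))
# -> {'index/index': [<the wx.request flow only>]}
```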
/taint_mini/wxjs.py:
--------------------------------------------------------------------------------
1 | from collections import deque
2 | from pdg_js import node as _node
3 | from pdg_js.build_pdg import get_data_flow
4 | from .storage import Storage
5 |
6 |
7 | def gen_pdg(file_path, results_path):
8 | return get_data_flow(file_path, benchmarks=dict(), alt_json_path=f"{results_path}/intermediate-data/")
9 |
10 |
11 | def handle_wxjs(r):
12 | results = Storage.get_instance().get_results()
13 | results[Storage.get_instance().get_page_path()] = list()
14 | find_page_methods_node(r)
15 |
16 |
17 | def find_page_methods_node(r):
18 | for child in r.children:
19 | if child.name == "ExpressionStatement":
20 | if len(child.children) > 0 \
21 | and child.children[0].name == "CallExpression" \
22 |                     and child.children[0].children[0].attributes.get("name") == "Page":
23 | # found page expression
24 | for method_node in child.children[0].children[1].children:
25 | if method_node.attributes["value"]["type"] == "FunctionExpression":
26 | # handle node
27 | method_name = method_node.children[0].attributes['name']
28 | print(
29 | f"[page method] got page method, method name: {method_name}")
30 | try:
31 | dfs_search(method_node, method_name)
32 | except Exception as e:
33 | print(f"[wxjs] error in searching method {method_name}: {e}")
34 |
35 |
36 | def find_nearest_call_expr_node(node):
37 |     if node is None or (hasattr(node, "name") and isinstance(node, _node.ValueExpr) and node.name == "CallExpression"):
38 |         return node  # the enclosing CallExpression, or None once the AST root has been passed
39 |     return find_nearest_call_expr_node(node.parent)
40 |
41 |
42 | def obtain_callee_from_call_expr(node):
43 |     if node is None or (len(node.children[0].children) == 0 and node.children[0].attributes["name"] != "Page"):
44 |         return node.children[0].attributes["name"] if node is not None else None
45 | return ".".join([i.attributes["name"] if "name" in i.attributes else "" for i in node.children[0].children])
46 |
47 |
48 | def obtain_var_decl_callee(node):
49 | return ".".join([i.attributes["name"] for i in node.children[0].children])
50 |
51 |
52 | def obtain_value_expr_callee(node):
53 | return ".".join([i.attributes["name"] for i in node.children])
54 |
55 |
56 | def obtain_data_flow_sink(dep):
57 | # check if the dependence node has CallExpression parent
58 | if isinstance(dep.extremity.parent, _node.ValueExpr):
59 | return obtain_value_expr_callee(dep.extremity.parent.children[0])
60 | return None
61 |
62 |
63 | def handle_data_parent_node(node):
64 | source = check_immediate_data_dep_parent(node)
65 | # if no known pattern match, fall back to general search
66 | if source is None:
67 | call_expr_node = find_nearest_call_expr_node(node)
68 | source = obtain_callee_from_call_expr(call_expr_node)
69 | print(f"[taint source] got nearest callee (source): {source}")
70 |
71 | # obtain sink
72 | sink = []
73 | for child in node.data_dep_children:
74 | s = obtain_callee_from_call_expr(find_nearest_call_expr_node(child.extremity))
75 | if s is not None:
76 | print(f"[taint sink] got data flow sink: {s}")
77 | sink.append(s)
78 |
79 | print(f"[flow path] data identifier: {node.attributes['name']}, "
80 | f"from source: {source if source is not None else 'None'}, "
81 | f"to sink: {','.join(map(str, sink))}")
82 |
83 |
84 | def is_parent_var_decl_or_assign_expr(node):
85 | return isinstance(node.parent, _node.Node) and \
86 | hasattr(node.parent, "name") and \
87 | (node.parent.name == "VariableDeclarator" or node.parent.name == "AssignmentExpression")
88 |
89 |
90 | def check_immediate_data_dep_parent(node):
91 | # check the data dep parent node is assignment or var decl
92 | # this check suitable for var_decl -> further usage
93 | source = None
94 | if is_parent_var_decl_or_assign_expr(node):
95 | # variable declaration or assignment, check the call expr
96 | if len(node.parent.children) > 1 and isinstance(node.parent.children[1], _node.ValueExpr):
97 | # obtain callee if parent is call expr
98 | if hasattr(node.parent.children[1], "name") and node.parent.children[1].name == "CallExpression":
99 | source = obtain_callee_from_call_expr(node.parent.children[1])
100 |
101 | # obtain callee if parent is var decl
102 | if source is None:
103 | source = obtain_var_decl_callee(node.parent.children[1])
104 | print(f"[taint source] got data flow source: {source}, identifier: {node.attributes['name']}")
105 | return source
106 |
107 |
108 | def is_page_method_parameter(node):
109 | if not isinstance(node, _node.Identifier):
110 | return False
111 | # in AST tree, ident -> FunctionExpr -> Property -> ObjectExpr
112 | # -> CallExpr <- Ident (Page)
113 | try:
114 | if node.parent.parent.parent.parent \
115 | .children[0].attributes["name"] == "Page":
116 | return True
117 | except IndexError:
118 | return False
119 | except AttributeError:
120 | return False
121 | except KeyError:
122 | return False
123 |
124 |
125 | def get_input_name(value):
126 | return value[value.rindex(".") + 1:] if isinstance(value, str) and "detail.value" in value else None
127 |
128 |
129 | def handle_page_method_parameter(node, _n):
130 | # handle double binding values
131 | if not isinstance(node, _node.Identifier) or not isinstance(_n, _node.Identifier):
132 | return None
133 |     # the key is double_binding_values on the ident node
134 |     # the early-return check below is omitted since it caused false negatives
135 | # if "double_binding_values" not in _n.attributes:
136 | # return None
137 | sources = set()
138 | # handle form double binding (input)
139 | # pattern: e.detail.value.[id]
140 | if isinstance(node.value, dict):
141 | for i in node.value:
142 | if isinstance(node.value[i], str) and "detail.value" in node.value[i]:
143 | input_name = get_input_name(node.value[i])
144 |                 if input_name is None or input_name not in _n.attributes.get("double_binding_values", {}):
145 | continue
146 | sources.add(f"[data from double binding: {input_name}, "
147 | f"type: {_n.attributes['double_binding_values'][input_name]}]")
148 | elif isinstance(node.value, str) and "detail.value" in node.value:
149 | input_name = get_input_name(node.value)
150 |         if input_name is not None and input_name in _n.attributes.get("double_binding_values", {}):
151 | sources.add(f"[data from double binding: {input_name}, "
152 | f"type: {_n.attributes['double_binding_values'][input_name]}]")
153 |
154 | # if no double binding found, fall back to general resolve
155 | if len(sources) == 0:
156 | sources.add(f"[data from page parameter: {node.value}]")
157 | return sources
158 |
159 |
160 | def handle_data_dep_parents(node):
161 | """
162 | @return set of sources
163 | """
164 | # check immediate data dep parent node first
165 | source = check_immediate_data_dep_parent(node)
166 | if source is not None:
167 | return {source}
168 |
169 | # no source found, fall back to general search
170 | source = obtain_callee_from_call_expr(find_nearest_call_expr_node(node))
171 | if source is not None and source != "":
172 | return {source}
173 |
174 | sources = set()
175 | # no call expr found, search from provenance parents
176 | for n in node.provenance_parents_set:
177 | # check ident
178 | if isinstance(n, _node.Identifier):
179 | # check if it's page method parameter first
180 | if is_page_method_parameter(n):
181 | # is page method parameter, handle double binding
182 | # notice here should analyze the original node,
183 | # not the provenance parent node
184 | r = handle_page_method_parameter(node, n)
185 | if r is not None:
186 | sources.update(r)
187 | continue
188 |
189 | # search for source from var decl or assignment expr
190 | r = check_immediate_data_dep_parent(n)
191 | if r is None:
192 | # no results found, fall back to general search
193 | r = obtain_callee_from_call_expr(find_nearest_call_expr_node(n))
194 |
195 | # still no results
196 | if r is None or r == "":
197 | continue
198 | # found source, add to set
199 | sources.add(r)
200 | # normal node, don't handle it
201 | if isinstance(n, _node.Node):
202 | continue
203 | # value expr, don't handle it
204 | if isinstance(n, _node.ValueExpr):
205 | continue
206 | # end for
207 | return sources
208 |
209 |
210 | def handle_data_child_node(node, method_name):
211 | if hasattr(node, "data_dep_children") and len(node.data_dep_children) > 0:
212 | # this node has data dep children (intermediate node), won't handle it
213 | return
214 |
215 | # no more children, it's the last node of the data flow
216 | # resolve sink api if the parent node is call expr
217 | sink = obtain_callee_from_call_expr(find_nearest_call_expr_node(node))
218 |     if not sink:  # covers both None and ""
219 |         print("[taint sink] no sink api resolved, passing...")
220 | return
221 | print(f"[taint sink] got data flow sink: {sink}, resolving data flow source")
222 |
223 | # resolve data source
224 | sources = set()
225 | data_dep_parent_nodes = node.data_dep_parents
226 | for n in data_dep_parent_nodes:
227 | s = handle_data_dep_parents(n.extremity)
228 | if s is not None:
229 | sources.update(s)
230 |
231 | if len(sources):
232 | print(f"[taint source] resolve data sources: {', '.join(sources)}")
233 | else:
234 | print(f"[taint source] no valid source found")
235 |
236 | # flow path
237 | if len(sources):
238 | print(f"[flow path] data identifier: {node.attributes['name']}, "
239 | f"from source: {', '.join(sources)}, "
240 | f"to sink: {sink}")
241 | results = Storage.get_instance().get_results()
242 | for s in sources:
243 | results[Storage.get_instance().get_page_path()].append({
244 | "method": method_name,
245 | "ident": node.attributes['name'],
246 | "source": s,
247 | "sink": sink
248 | })
249 |
250 |
251 | def handle_identifier_node(node, method_name):
252 | # if hasattr(node, "data_dep_children") and len(node.data_dep_children) > 0:
253 | # print("[handle ident] got data flow parent node")
254 | # handle_data_parent_node(node)
255 |
256 | # search backwards (from children)
257 | if hasattr(node, "data_dep_parents") and len(node.data_dep_parents) > 0:
258 | print("[handle ident] got data flow child node")
259 | # omit backwards search
260 | handle_data_child_node(node, method_name)
261 |
262 |
263 | def dfs_visit(node, method_name):
264 | if not isinstance(node, _node.Identifier):
265 | # print("normal node, passing")
266 | return
267 |
268 | handle_identifier_node(node, method_name)
269 |
270 |
271 | def dfs_search(r, n):
272 | stack = deque()
273 | stack.append(r)
274 |
275 | visited = []
276 |
277 | while stack:
278 | v = stack.pop()
279 | if v in visited:
280 | continue
281 |
282 | # node is not visited
283 | visited.append(v)
284 | dfs_visit(v, n)
285 |
286 | # visit its children
287 | children = v.children
288 | for i in reversed(children):
289 | if i not in visited:
290 | stack.append(i)
291 |
--------------------------------------------------------------------------------
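The per-page pipeline that `analyze_worker` runs, written out as a standalone sketch (paths are placeholders):

```python
# Sketch (placeholder paths): analyzing a single page outside the worker pool.
from taint_mini.wxjs import gen_pdg, handle_wxjs
from taint_mini.wxml import handle_wxml
from taint_mini.storage import Storage

app_path, page, out = 'unpacked-app', 'index/index', 'results'
pdg_root = gen_pdg(f'{app_path}/pages/{page}.js', out)   # build the PDG of the page script
Storage.init(pdg_root, app_path, page, {'sources': [], 'sinks': []})
handle_wxml(f'{app_path}/pages/{page}.wxml')             # tag double-binding info from the markup
handle_wxjs(pdg_root)                                    # walk the Page() methods and record flows
print(Storage.get_instance().get_results())
```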
/taint_mini/wxml.py:
--------------------------------------------------------------------------------
1 | from lxml.html import parse
2 | from .storage import Storage
3 |
4 |
5 | def handle_wxml(file):
6 | try:
7 | wxml_html_root = parse(file)
8 | visit_wxml_tree(wxml_html_root)
9 | except Exception as e:
10 | print(f"[wxml] got error: {e}")
11 |
12 |
13 | def find_page_method_node(root, name):
14 | for child in root.children:
15 | if child.name == "ExpressionStatement":
16 | if len(child.children) > 0 \
17 | and child.children[0].name == "CallExpression" \
18 |                 and child.children[0].children[0].attributes.get("name") == "Page":
19 | # found page expression
20 | for method_node in child.children[0].children[1].children:
21 | if method_node.attributes["value"]["type"] == "FunctionExpression":
22 | # handle node
23 | if method_node.children[0].attributes["name"] == name:
24 | return method_node
25 |
26 |
27 | def tag_properties_to_page_method_param_ident_node(node, p):
28 | # Function [1] -> FunctionExpr [0] -> Ident
29 | node.children[1].children[0].attributes["double_binding_values"] = p["inputs"]
30 |
31 |
32 | def handle_form_properties(p):
33 | root = Storage.get_instance().get_node()
34 | node = find_page_method_node(root, p["bind_submit"])
35 | tag_properties_to_page_method_param_ident_node(node, p)
36 |
37 |
38 | def handle_wxml_form(element):
39 | visited_elements_in_form = []
40 | form_properties = dict()
41 | # mapping: key name -> type
42 | form_properties["inputs"] = dict()
43 |
44 | for e in element.iter():
45 | visited_elements_in_form.append(e)
46 |
47 | # handle form bind:submit
48 | if e.tag == "g-form" or e.tag == "form":
49 | if hasattr(e, "attrib") and "bind:submit" in e.attrib:
50 | form_properties["bind_submit"] = e.attrib["bind:submit"]
51 | continue
52 |
53 | # handle input properties
54 | if (e.tag == "g-input" or e.tag == "input") and hasattr(e, "attrib") \
55 | and ("name" in e.attrib or "id" in e.attrib):
56 | # handle password
57 | if "password" in e.attrib or ("type" in e.attrib and e.attrib["type"] == "safe-password"):
58 | form_properties["inputs"][e.attrib["name"] if "name" in e.attrib else e.attrib["id"]] = "password"
59 | continue
60 | # handle normal input
61 | if "type" in e.attrib:
62 | form_properties["inputs"][e.attrib["name"] if "name" in e.attrib else e.attrib["id"]] = e.attrib["type"]
63 | continue
64 |
65 | # handle the properties
66 |     if form_properties.get("bind_submit") is not None:
67 | handle_form_properties(form_properties)
68 | # return the visited elements in this form
69 | return visited_elements_in_form
70 |
71 |
72 | def handle_wxml_element(element):
73 | pass
74 |
75 |
76 | def visit_wxml_tree(r):
77 | visited = []
78 |
79 | def visit_node(v):
80 | visited.append(v)
81 | # handle form element
82 | # as a form may have many child input elements
83 | if hasattr(v, "tag") and (v.tag == "g-form" or v.tag == "form"):
84 | # multiple elements are visited in handling form element
85 | visited.extend(handle_wxml_form(v))
86 | return
87 |
88 | # handle normal xml element
89 | handle_wxml_element(v)
90 |
91 | # iter all the elements
92 | for i in r.iter():
93 | if i not in visited:
94 | visit_node(i)
95 |
96 |
97 |
--------------------------------------------------------------------------------
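For reference, the kind of markup `handle_wxml_form` is written to recognize (illustrative only): a `<form>` with `bind:submit` and named inputs. For this example, the page method named `onSubmit` would be tagged with `double_binding_values == {'user': 'text', 'passwd': 'password'}`.

```python
# Illustrative WXML snippet matching the patterns handled above.
WXML_EXAMPLE = """
<form bind:submit="onSubmit">
  <input name="user" type="text" />
  <input name="passwd" password="true" />
  <button form-type="submit">submit</button>
</form>
"""
```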