├── LICENSE
├── MANIFEST.in
├── README.md
├── __init__.py
├── demos
│   ├── list1.py
│   ├── list2.py
│   ├── psydiff1.py
│   └── psydiff2.py
├── diff.css
├── htmlize.py
├── improve_ast.py
├── lists.py
├── nav.js
├── parameters.py
├── psydiff.py
├── setup.py
└── utils.py

/LICENSE:
--------------------------------------------------------------------------------
1 | GNU GENERAL PUBLIC LICENSE 2 | Version 3, 29 June 2007 3 | 4 | Copyright (C) 2007 Free Software Foundation, Inc. 5 | Everyone is permitted to copy and distribute verbatim copies 6 | of this license document, but changing it is not allowed. 7 | 8 | Preamble 9 | 10 | The GNU General Public License is a free, copyleft license for 11 | software and other kinds of works. 12 | 13 | The licenses for most software and other practical works are designed 14 | to take away your freedom to share and change the works. By contrast, 15 | the GNU General Public License is intended to guarantee your freedom to 16 | share and change all versions of a program--to make sure it remains free 17 | software for all its users. We, the Free Software Foundation, use the 18 | GNU General Public License for most of our software; it applies also to 19 | any other work released this way by its authors. You can apply it to 20 | your programs, too. 21 | 22 | When we speak of free software, we are referring to freedom, not 23 | price. Our General Public Licenses are designed to make sure that you 24 | have the freedom to distribute copies of free software (and charge for 25 | them if you wish), that you receive source code or can get it if you 26 | want it, that you can change the software or use pieces of it in new 27 | free programs, and that you know you can do these things. 28 | 29 | To protect your rights, we need to prevent others from denying you 30 | these rights or asking you to surrender the rights.
Therefore, you have 31 | certain responsibilities if you distribute copies of the software, or if 32 | you modify it: responsibilities to respect the freedom of others. 33 | 34 | For example, if you distribute copies of such a program, whether 35 | gratis or for a fee, you must pass on to the recipients the same 36 | freedoms that you received. You must make sure that they, too, receive 37 | or can get the source code. And you must show them these terms so they 38 | know their rights. 39 | 40 | Developers that use the GNU GPL protect your rights with two steps: 41 | (1) assert copyright on the software, and (2) offer you this License 42 | giving you legal permission to copy, distribute and/or modify it. 43 | 44 | For the developers' and authors' protection, the GPL clearly explains 45 | that there is no warranty for this free software. For both users' and 46 | authors' sake, the GPL requires that modified versions be marked as 47 | changed, so that their problems will not be attributed erroneously to 48 | authors of previous versions. 49 | 50 | Some devices are designed to deny users access to install or run 51 | modified versions of the software inside them, although the manufacturer 52 | can do so. This is fundamentally incompatible with the aim of 53 | protecting users' freedom to change the software. The systematic 54 | pattern of such abuse occurs in the area of products for individuals to 55 | use, which is precisely where it is most unacceptable. Therefore, we 56 | have designed this version of the GPL to prohibit the practice for those 57 | products. If such problems arise substantially in other domains, we 58 | stand ready to extend this provision to those domains in future versions 59 | of the GPL, as needed to protect the freedom of users. 60 | 61 | Finally, every program is threatened constantly by software patents. 
62 | States should not allow patents to restrict development and use of 63 | software on general-purpose computers, but in those that do, we wish to 64 | avoid the special danger that patents applied to a free program could 65 | make it effectively proprietary. To prevent this, the GPL assures that 66 | patents cannot be used to render the program non-free. 67 | 68 | The precise terms and conditions for copying, distribution and 69 | modification follow. 70 | 71 | TERMS AND CONDITIONS 72 | 73 | 0. Definitions. 74 | 75 | "This License" refers to version 3 of the GNU General Public License. 76 | 77 | "Copyright" also means copyright-like laws that apply to other kinds of 78 | works, such as semiconductor masks. 79 | 80 | "The Program" refers to any copyrightable work licensed under this 81 | License. Each licensee is addressed as "you". "Licensees" and 82 | "recipients" may be individuals or organizations. 83 | 84 | To "modify" a work means to copy from or adapt all or part of the work 85 | in a fashion requiring copyright permission, other than the making of an 86 | exact copy. The resulting work is called a "modified version" of the 87 | earlier work or a work "based on" the earlier work. 88 | 89 | A "covered work" means either the unmodified Program or a work based 90 | on the Program. 91 | 92 | To "propagate" a work means to do anything with it that, without 93 | permission, would make you directly or secondarily liable for 94 | infringement under applicable copyright law, except executing it on a 95 | computer or modifying a private copy. Propagation includes copying, 96 | distribution (with or without modification), making available to the 97 | public, and in some countries other activities as well. 98 | 99 | To "convey" a work means any kind of propagation that enables other 100 | parties to make or receive copies. Mere interaction with a user through 101 | a computer network, with no transfer of a copy, is not conveying. 
102 | 103 | An interactive user interface displays "Appropriate Legal Notices" 104 | to the extent that it includes a convenient and prominently visible 105 | feature that (1) displays an appropriate copyright notice, and (2) 106 | tells the user that there is no warranty for the work (except to the 107 | extent that warranties are provided), that licensees may convey the 108 | work under this License, and how to view a copy of this License. If 109 | the interface presents a list of user commands or options, such as a 110 | menu, a prominent item in the list meets this criterion. 111 | 112 | 1. Source Code. 113 | 114 | The "source code" for a work means the preferred form of the work 115 | for making modifications to it. "Object code" means any non-source 116 | form of a work. 117 | 118 | A "Standard Interface" means an interface that either is an official 119 | standard defined by a recognized standards body, or, in the case of 120 | interfaces specified for a particular programming language, one that 121 | is widely used among developers working in that language. 122 | 123 | The "System Libraries" of an executable work include anything, other 124 | than the work as a whole, that (a) is included in the normal form of 125 | packaging a Major Component, but which is not part of that Major 126 | Component, and (b) serves only to enable use of the work with that 127 | Major Component, or to implement a Standard Interface for which an 128 | implementation is available to the public in source code form. A 129 | "Major Component", in this context, means a major essential component 130 | (kernel, window system, and so on) of the specific operating system 131 | (if any) on which the executable work runs, or a compiler used to 132 | produce the work, or an object code interpreter used to run it. 
133 | 134 | The "Corresponding Source" for a work in object code form means all 135 | the source code needed to generate, install, and (for an executable 136 | work) run the object code and to modify the work, including scripts to 137 | control those activities. However, it does not include the work's 138 | System Libraries, or general-purpose tools or generally available free 139 | programs which are used unmodified in performing those activities but 140 | which are not part of the work. For example, Corresponding Source 141 | includes interface definition files associated with source files for 142 | the work, and the source code for shared libraries and dynamically 143 | linked subprograms that the work is specifically designed to require, 144 | such as by intimate data communication or control flow between those 145 | subprograms and other parts of the work. 146 | 147 | The Corresponding Source need not include anything that users 148 | can regenerate automatically from other parts of the Corresponding 149 | Source. 150 | 151 | The Corresponding Source for a work in source code form is that 152 | same work. 153 | 154 | 2. Basic Permissions. 155 | 156 | All rights granted under this License are granted for the term of 157 | copyright on the Program, and are irrevocable provided the stated 158 | conditions are met. This License explicitly affirms your unlimited 159 | permission to run the unmodified Program. The output from running a 160 | covered work is covered by this License only if the output, given its 161 | content, constitutes a covered work. This License acknowledges your 162 | rights of fair use or other equivalent, as provided by copyright law. 163 | 164 | You may make, run and propagate covered works that you do not 165 | convey, without conditions so long as your license otherwise remains 166 | in force. 
You may convey covered works to others for the sole purpose 167 | of having them make modifications exclusively for you, or provide you 168 | with facilities for running those works, provided that you comply with 169 | the terms of this License in conveying all material for which you do 170 | not control copyright. Those thus making or running the covered works 171 | for you must do so exclusively on your behalf, under your direction 172 | and control, on terms that prohibit them from making any copies of 173 | your copyrighted material outside their relationship with you. 174 | 175 | Conveying under any other circumstances is permitted solely under 176 | the conditions stated below. Sublicensing is not allowed; section 10 177 | makes it unnecessary. 178 | 179 | 3. Protecting Users' Legal Rights From Anti-Circumvention Law. 180 | 181 | No covered work shall be deemed part of an effective technological 182 | measure under any applicable law fulfilling obligations under article 183 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or 184 | similar laws prohibiting or restricting circumvention of such 185 | measures. 186 | 187 | When you convey a covered work, you waive any legal power to forbid 188 | circumvention of technological measures to the extent such circumvention 189 | is effected by exercising rights under this License with respect to 190 | the covered work, and you disclaim any intention to limit operation or 191 | modification of the work as a means of enforcing, against the work's 192 | users, your or third parties' legal rights to forbid circumvention of 193 | technological measures. 194 | 195 | 4. Conveying Verbatim Copies. 
196 | 197 | You may convey verbatim copies of the Program's source code as you 198 | receive it, in any medium, provided that you conspicuously and 199 | appropriately publish on each copy an appropriate copyright notice; 200 | keep intact all notices stating that this License and any 201 | non-permissive terms added in accord with section 7 apply to the code; 202 | keep intact all notices of the absence of any warranty; and give all 203 | recipients a copy of this License along with the Program. 204 | 205 | You may charge any price or no price for each copy that you convey, 206 | and you may offer support or warranty protection for a fee. 207 | 208 | 5. Conveying Modified Source Versions. 209 | 210 | You may convey a work based on the Program, or the modifications to 211 | produce it from the Program, in the form of source code under the 212 | terms of section 4, provided that you also meet all of these conditions: 213 | 214 | a) The work must carry prominent notices stating that you modified 215 | it, and giving a relevant date. 216 | 217 | b) The work must carry prominent notices stating that it is 218 | released under this License and any conditions added under section 219 | 7. This requirement modifies the requirement in section 4 to 220 | "keep intact all notices". 221 | 222 | c) You must license the entire work, as a whole, under this 223 | License to anyone who comes into possession of a copy. This 224 | License will therefore apply, along with any applicable section 7 225 | additional terms, to the whole of the work, and all its parts, 226 | regardless of how they are packaged. This License gives no 227 | permission to license the work in any other way, but it does not 228 | invalidate such permission if you have separately received it. 
229 | 230 | d) If the work has interactive user interfaces, each must display 231 | Appropriate Legal Notices; however, if the Program has interactive 232 | interfaces that do not display Appropriate Legal Notices, your 233 | work need not make them do so. 234 | 235 | A compilation of a covered work with other separate and independent 236 | works, which are not by their nature extensions of the covered work, 237 | and which are not combined with it such as to form a larger program, 238 | in or on a volume of a storage or distribution medium, is called an 239 | "aggregate" if the compilation and its resulting copyright are not 240 | used to limit the access or legal rights of the compilation's users 241 | beyond what the individual works permit. Inclusion of a covered work 242 | in an aggregate does not cause this License to apply to the other 243 | parts of the aggregate. 244 | 245 | 6. Conveying Non-Source Forms. 246 | 247 | You may convey a covered work in object code form under the terms 248 | of sections 4 and 5, provided that you also convey the 249 | machine-readable Corresponding Source under the terms of this License, 250 | in one of these ways: 251 | 252 | a) Convey the object code in, or embodied in, a physical product 253 | (including a physical distribution medium), accompanied by the 254 | Corresponding Source fixed on a durable physical medium 255 | customarily used for software interchange. 
256 | 257 | b) Convey the object code in, or embodied in, a physical product 258 | (including a physical distribution medium), accompanied by a 259 | written offer, valid for at least three years and valid for as 260 | long as you offer spare parts or customer support for that product 261 | model, to give anyone who possesses the object code either (1) a 262 | copy of the Corresponding Source for all the software in the 263 | product that is covered by this License, on a durable physical 264 | medium customarily used for software interchange, for a price no 265 | more than your reasonable cost of physically performing this 266 | conveying of source, or (2) access to copy the 267 | Corresponding Source from a network server at no charge. 268 | 269 | c) Convey individual copies of the object code with a copy of the 270 | written offer to provide the Corresponding Source. This 271 | alternative is allowed only occasionally and noncommercially, and 272 | only if you received the object code with such an offer, in accord 273 | with subsection 6b. 274 | 275 | d) Convey the object code by offering access from a designated 276 | place (gratis or for a charge), and offer equivalent access to the 277 | Corresponding Source in the same way through the same place at no 278 | further charge. You need not require recipients to copy the 279 | Corresponding Source along with the object code. If the place to 280 | copy the object code is a network server, the Corresponding Source 281 | may be on a different server (operated by you or a third party) 282 | that supports equivalent copying facilities, provided you maintain 283 | clear directions next to the object code saying where to find the 284 | Corresponding Source. Regardless of what server hosts the 285 | Corresponding Source, you remain obligated to ensure that it is 286 | available for as long as needed to satisfy these requirements. 
287 | 288 | e) Convey the object code using peer-to-peer transmission, provided 289 | you inform other peers where the object code and Corresponding 290 | Source of the work are being offered to the general public at no 291 | charge under subsection 6d. 292 | 293 | A separable portion of the object code, whose source code is excluded 294 | from the Corresponding Source as a System Library, need not be 295 | included in conveying the object code work. 296 | 297 | A "User Product" is either (1) a "consumer product", which means any 298 | tangible personal property which is normally used for personal, family, 299 | or household purposes, or (2) anything designed or sold for incorporation 300 | into a dwelling. In determining whether a product is a consumer product, 301 | doubtful cases shall be resolved in favor of coverage. For a particular 302 | product received by a particular user, "normally used" refers to a 303 | typical or common use of that class of product, regardless of the status 304 | of the particular user or of the way in which the particular user 305 | actually uses, or expects or is expected to use, the product. A product 306 | is a consumer product regardless of whether the product has substantial 307 | commercial, industrial or non-consumer uses, unless such uses represent 308 | the only significant mode of use of the product. 309 | 310 | "Installation Information" for a User Product means any methods, 311 | procedures, authorization keys, or other information required to install 312 | and execute modified versions of a covered work in that User Product from 313 | a modified version of its Corresponding Source. The information must 314 | suffice to ensure that the continued functioning of the modified object 315 | code is in no case prevented or interfered with solely because 316 | modification has been made. 
317 | 318 | If you convey an object code work under this section in, or with, or 319 | specifically for use in, a User Product, and the conveying occurs as 320 | part of a transaction in which the right of possession and use of the 321 | User Product is transferred to the recipient in perpetuity or for a 322 | fixed term (regardless of how the transaction is characterized), the 323 | Corresponding Source conveyed under this section must be accompanied 324 | by the Installation Information. But this requirement does not apply 325 | if neither you nor any third party retains the ability to install 326 | modified object code on the User Product (for example, the work has 327 | been installed in ROM). 328 | 329 | The requirement to provide Installation Information does not include a 330 | requirement to continue to provide support service, warranty, or updates 331 | for a work that has been modified or installed by the recipient, or for 332 | the User Product in which it has been modified or installed. Access to a 333 | network may be denied when the modification itself materially and 334 | adversely affects the operation of the network or violates the rules and 335 | protocols for communication across the network. 336 | 337 | Corresponding Source conveyed, and Installation Information provided, 338 | in accord with this section must be in a format that is publicly 339 | documented (and with an implementation available to the public in 340 | source code form), and must require no special password or key for 341 | unpacking, reading or copying. 342 | 343 | 7. Additional Terms. 344 | 345 | "Additional permissions" are terms that supplement the terms of this 346 | License by making exceptions from one or more of its conditions. 347 | Additional permissions that are applicable to the entire Program shall 348 | be treated as though they were included in this License, to the extent 349 | that they are valid under applicable law. 
If additional permissions 350 | apply only to part of the Program, that part may be used separately 351 | under those permissions, but the entire Program remains governed by 352 | this License without regard to the additional permissions. 353 | 354 | When you convey a copy of a covered work, you may at your option 355 | remove any additional permissions from that copy, or from any part of 356 | it. (Additional permissions may be written to require their own 357 | removal in certain cases when you modify the work.) You may place 358 | additional permissions on material, added by you to a covered work, 359 | for which you have or can give appropriate copyright permission. 360 | 361 | Notwithstanding any other provision of this License, for material you 362 | add to a covered work, you may (if authorized by the copyright holders of 363 | that material) supplement the terms of this License with terms: 364 | 365 | a) Disclaiming warranty or limiting liability differently from the 366 | terms of sections 15 and 16 of this License; or 367 | 368 | b) Requiring preservation of specified reasonable legal notices or 369 | author attributions in that material or in the Appropriate Legal 370 | Notices displayed by works containing it; or 371 | 372 | c) Prohibiting misrepresentation of the origin of that material, or 373 | requiring that modified versions of such material be marked in 374 | reasonable ways as different from the original version; or 375 | 376 | d) Limiting the use for publicity purposes of names of licensors or 377 | authors of the material; or 378 | 379 | e) Declining to grant rights under trademark law for use of some 380 | trade names, trademarks, or service marks; or 381 | 382 | f) Requiring indemnification of licensors and authors of that 383 | material by anyone who conveys the material (or modified versions of 384 | it) with contractual assumptions of liability to the recipient, for 385 | any liability that these contractual assumptions directly impose on 
386 | those licensors and authors. 387 | 388 | All other non-permissive additional terms are considered "further 389 | restrictions" within the meaning of section 10. If the Program as you 390 | received it, or any part of it, contains a notice stating that it is 391 | governed by this License along with a term that is a further 392 | restriction, you may remove that term. If a license document contains 393 | a further restriction but permits relicensing or conveying under this 394 | License, you may add to a covered work material governed by the terms 395 | of that license document, provided that the further restriction does 396 | not survive such relicensing or conveying. 397 | 398 | If you add terms to a covered work in accord with this section, you 399 | must place, in the relevant source files, a statement of the 400 | additional terms that apply to those files, or a notice indicating 401 | where to find the applicable terms. 402 | 403 | Additional terms, permissive or non-permissive, may be stated in the 404 | form of a separately written license, or stated as exceptions; 405 | the above requirements apply either way. 406 | 407 | 8. Termination. 408 | 409 | You may not propagate or modify a covered work except as expressly 410 | provided under this License. Any attempt otherwise to propagate or 411 | modify it is void, and will automatically terminate your rights under 412 | this License (including any patent licenses granted under the third 413 | paragraph of section 11). 414 | 415 | However, if you cease all violation of this License, then your 416 | license from a particular copyright holder is reinstated (a) 417 | provisionally, unless and until the copyright holder explicitly and 418 | finally terminates your license, and (b) permanently, if the copyright 419 | holder fails to notify you of the violation by some reasonable means 420 | prior to 60 days after the cessation. 
421 | 422 | Moreover, your license from a particular copyright holder is 423 | reinstated permanently if the copyright holder notifies you of the 424 | violation by some reasonable means, this is the first time you have 425 | received notice of violation of this License (for any work) from that 426 | copyright holder, and you cure the violation prior to 30 days after 427 | your receipt of the notice. 428 | 429 | Termination of your rights under this section does not terminate the 430 | licenses of parties who have received copies or rights from you under 431 | this License. If your rights have been terminated and not permanently 432 | reinstated, you do not qualify to receive new licenses for the same 433 | material under section 10. 434 | 435 | 9. Acceptance Not Required for Having Copies. 436 | 437 | You are not required to accept this License in order to receive or 438 | run a copy of the Program. Ancillary propagation of a covered work 439 | occurring solely as a consequence of using peer-to-peer transmission 440 | to receive a copy likewise does not require acceptance. However, 441 | nothing other than this License grants you permission to propagate or 442 | modify any covered work. These actions infringe copyright if you do 443 | not accept this License. Therefore, by modifying or propagating a 444 | covered work, you indicate your acceptance of this License to do so. 445 | 446 | 10. Automatic Licensing of Downstream Recipients. 447 | 448 | Each time you convey a covered work, the recipient automatically 449 | receives a license from the original licensors, to run, modify and 450 | propagate that work, subject to this License. You are not responsible 451 | for enforcing compliance by third parties with this License. 452 | 453 | An "entity transaction" is a transaction transferring control of an 454 | organization, or substantially all assets of one, or subdividing an 455 | organization, or merging organizations. 
If propagation of a covered 456 | work results from an entity transaction, each party to that 457 | transaction who receives a copy of the work also receives whatever 458 | licenses to the work the party's predecessor in interest had or could 459 | give under the previous paragraph, plus a right to possession of the 460 | Corresponding Source of the work from the predecessor in interest, if 461 | the predecessor has it or can get it with reasonable efforts. 462 | 463 | You may not impose any further restrictions on the exercise of the 464 | rights granted or affirmed under this License. For example, you may 465 | not impose a license fee, royalty, or other charge for exercise of 466 | rights granted under this License, and you may not initiate litigation 467 | (including a cross-claim or counterclaim in a lawsuit) alleging that 468 | any patent claim is infringed by making, using, selling, offering for 469 | sale, or importing the Program or any portion of it. 470 | 471 | 11. Patents. 472 | 473 | A "contributor" is a copyright holder who authorizes use under this 474 | License of the Program or a work on which the Program is based. The 475 | work thus licensed is called the contributor's "contributor version". 476 | 477 | A contributor's "essential patent claims" are all patent claims 478 | owned or controlled by the contributor, whether already acquired or 479 | hereafter acquired, that would be infringed by some manner, permitted 480 | by this License, of making, using, or selling its contributor version, 481 | but do not include claims that would be infringed only as a 482 | consequence of further modification of the contributor version. For 483 | purposes of this definition, "control" includes the right to grant 484 | patent sublicenses in a manner consistent with the requirements of 485 | this License. 
486 | 487 | Each contributor grants you a non-exclusive, worldwide, royalty-free 488 | patent license under the contributor's essential patent claims, to 489 | make, use, sell, offer for sale, import and otherwise run, modify and 490 | propagate the contents of its contributor version. 491 | 492 | In the following three paragraphs, a "patent license" is any express 493 | agreement or commitment, however denominated, not to enforce a patent 494 | (such as an express permission to practice a patent or covenant not to 495 | sue for patent infringement). To "grant" such a patent license to a 496 | party means to make such an agreement or commitment not to enforce a 497 | patent against the party. 498 | 499 | If you convey a covered work, knowingly relying on a patent license, 500 | and the Corresponding Source of the work is not available for anyone 501 | to copy, free of charge and under the terms of this License, through a 502 | publicly available network server or other readily accessible means, 503 | then you must either (1) cause the Corresponding Source to be so 504 | available, or (2) arrange to deprive yourself of the benefit of the 505 | patent license for this particular work, or (3) arrange, in a manner 506 | consistent with the requirements of this License, to extend the patent 507 | license to downstream recipients. "Knowingly relying" means you have 508 | actual knowledge that, but for the patent license, your conveying the 509 | covered work in a country, or your recipient's use of the covered work 510 | in a country, would infringe one or more identifiable patents in that 511 | country that you have reason to believe are valid. 
512 | 513 | If, pursuant to or in connection with a single transaction or 514 | arrangement, you convey, or propagate by procuring conveyance of, a 515 | covered work, and grant a patent license to some of the parties 516 | receiving the covered work authorizing them to use, propagate, modify 517 | or convey a specific copy of the covered work, then the patent license 518 | you grant is automatically extended to all recipients of the covered 519 | work and works based on it. 520 | 521 | A patent license is "discriminatory" if it does not include within 522 | the scope of its coverage, prohibits the exercise of, or is 523 | conditioned on the non-exercise of one or more of the rights that are 524 | specifically granted under this License. You may not convey a covered 525 | work if you are a party to an arrangement with a third party that is 526 | in the business of distributing software, under which you make payment 527 | to the third party based on the extent of your activity of conveying 528 | the work, and under which the third party grants, to any of the 529 | parties who would receive the covered work from you, a discriminatory 530 | patent license (a) in connection with copies of the covered work 531 | conveyed by you (or copies made from those copies), or (b) primarily 532 | for and in connection with specific products or compilations that 533 | contain the covered work, unless you entered into that arrangement, 534 | or that patent license was granted, prior to 28 March 2007. 535 | 536 | Nothing in this License shall be construed as excluding or limiting 537 | any implied license or other defenses to infringement that may 538 | otherwise be available to you under applicable patent law. 539 | 540 | 12. No Surrender of Others' Freedom. 541 | 542 | If conditions are imposed on you (whether by court order, agreement or 543 | otherwise) that contradict the conditions of this License, they do not 544 | excuse you from the conditions of this License. 
If you cannot convey a 545 | covered work so as to satisfy simultaneously your obligations under this 546 | License and any other pertinent obligations, then as a consequence you may 547 | not convey it at all. For example, if you agree to terms that obligate you 548 | to collect a royalty for further conveying from those to whom you convey 549 | the Program, the only way you could satisfy both those terms and this 550 | License would be to refrain entirely from conveying the Program. 551 | 552 | 13. Use with the GNU Affero General Public License. 553 | 554 | Notwithstanding any other provision of this License, you have 555 | permission to link or combine any covered work with a work licensed 556 | under version 3 of the GNU Affero General Public License into a single 557 | combined work, and to convey the resulting work. The terms of this 558 | License will continue to apply to the part which is the covered work, 559 | but the special requirements of the GNU Affero General Public License, 560 | section 13, concerning interaction through a network will apply to the 561 | combination as such. 562 | 563 | 14. Revised Versions of this License. 564 | 565 | The Free Software Foundation may publish revised and/or new versions of 566 | the GNU General Public License from time to time. Such new versions will 567 | be similar in spirit to the present version, but may differ in detail to 568 | address new problems or concerns. 569 | 570 | Each version is given a distinguishing version number. If the 571 | Program specifies that a certain numbered version of the GNU General 572 | Public License "or any later version" applies to it, you have the 573 | option of following the terms and conditions either of that numbered 574 | version or of any later version published by the Free Software 575 | Foundation. If the Program does not specify a version number of the 576 | GNU General Public License, you may choose any version ever published 577 | by the Free Software Foundation. 
578 | 579 | If the Program specifies that a proxy can decide which future 580 | versions of the GNU General Public License can be used, that proxy's 581 | public statement of acceptance of a version permanently authorizes you 582 | to choose that version for the Program. 583 | 584 | Later license versions may give you additional or different 585 | permissions. However, no additional obligations are imposed on any 586 | author or copyright holder as a result of your choosing to follow a 587 | later version. 588 | 589 | 15. Disclaimer of Warranty. 590 | 591 | THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY 592 | APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT 593 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY 594 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, 595 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 596 | PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM 597 | IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF 598 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 599 | 600 | 16. Limitation of Liability. 601 | 602 | IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING 603 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS 604 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY 605 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE 606 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF 607 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD 608 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), 609 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF 610 | SUCH DAMAGES. 611 | 612 | 17. Interpretation of Sections 15 and 16. 
613 | 614 | If the disclaimer of warranty and limitation of liability provided 615 | above cannot be given local legal effect according to their terms, 616 | reviewing courts shall apply local law that most closely approximates 617 | an absolute waiver of all civil liability in connection with the 618 | Program, unless a warranty or assumption of liability accompanies a 619 | copy of the Program in return for a fee. 620 | 621 | END OF TERMS AND CONDITIONS 622 | 623 | How to Apply These Terms to Your New Programs 624 | 625 | If you develop a new program, and you want it to be of the greatest 626 | possible use to the public, the best way to achieve this is to make it 627 | free software which everyone can redistribute and change under these terms. 628 | 629 | To do so, attach the following notices to the program. It is safest 630 | to attach them to the start of each source file to most effectively 631 | state the exclusion of warranty; and each file should have at least 632 | the "copyright" line and a pointer to where the full notice is found. 633 | 634 | 635 | Copyright (C) 636 | 637 | This program is free software: you can redistribute it and/or modify 638 | it under the terms of the GNU General Public License as published by 639 | the Free Software Foundation, either version 3 of the License, or 640 | (at your option) any later version. 641 | 642 | This program is distributed in the hope that it will be useful, 643 | but WITHOUT ANY WARRANTY; without even the implied warranty of 644 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 645 | GNU General Public License for more details. 646 | 647 | You should have received a copy of the GNU General Public License 648 | along with this program. If not, see . 649 | 650 | Also add information on how to contact you by electronic and paper mail. 
651 | 652 | If the program does terminal interaction, make it output a short 653 | notice like this when it starts in an interactive mode: 654 | 655 | Copyright (C) 656 | This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. 657 | This is free software, and you are welcome to redistribute it 658 | under certain conditions; type `show c' for details. 659 | 660 | The hypothetical commands `show w' and `show c' should show the appropriate 661 | parts of the General Public License. Of course, your program's commands 662 | might be different; for a GUI interface, you would use an "about box". 663 | 664 | You should also get your employer (if you work as a programmer) or school, 665 | if any, to sign a "copyright disclaimer" for the program, if necessary. 666 | For more information on this, and how to apply and follow the GNU GPL, see 667 | . 668 | 669 | The GNU General Public License does not permit incorporating your program 670 | into proprietary programs. If your program is a subroutine library, you 671 | may consider it more useful to permit linking proprietary applications with 672 | the library. If this is what you want to do, use the GNU Lesser General 673 | Public License instead of this License. But first, please read 674 | . 675 | -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | include nav.js 2 | include diff.css 3 | 4 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | psydiff 2 | ======= 3 | 4 | *a structural comparison tool for Python* 5 | 6 | Psydiff is a structural differencer for Python. It parses Python into ASTs, 7 | compares them, and then generates interactive HTML.
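To give a rough feel for what "structural" comparison means, here is a minimal sketch that uses only the standard `ast` module. It is not Psydiff's algorithm: it only compares top-level definitions by name, and the helper names `top_level_defs` and `compare_sources` are made up for this illustration. Psydiff itself goes further, with a cost-based diff of subtrees and detection of moved code.

```python
import ast


def top_level_defs(source):
    """Map each top-level function/class name to a dump of its AST."""
    tree = ast.parse(source)
    return {
        node.name: ast.dump(node)  # dump omits line/column info by default
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.ClassDef))
    }


def compare_sources(old_source, new_source):
    """Classify top-level definitions (hypothetical helper, not Psydiff's API)."""
    old_defs = top_level_defs(old_source)
    new_defs = top_level_defs(new_source)
    result = {}
    for name in sorted(set(old_defs) | set(new_defs)):
        if name not in new_defs:
            result[name] = "deleted"
        elif name not in old_defs:
            result[name] = "inserted"
        elif old_defs[name] != new_defs[name]:
            result[name] = "modified"
        else:
            result[name] = "unchanged"
    return result


OLD = '''
def length(ls):
    # recursive version
    return 0 if not ls else 1 + length(ls[1:])

def cmap(f, ls):
    return [f(x) for x in ls]
'''

NEW = '''
def length(ls):
    return 0 if not ls else 1 + length(ls[1:])   # same code, new comment

def maplist(f, ls):
    return [f(x) for x in ls]
'''

if __name__ == "__main__":
    for name, status in compare_sources(OLD, NEW).items():
        print(name, status)
```

Because comments and layout never reach the AST, `length` is reported as unchanged even though its text differs. A line-based diff would flag it. The rename of `cmap` to `maplist` shows up here as a deletion plus an insertion, which is the kind of case a real structural differencer tries to recognize as a rename or move instead.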
8 | 9 | 10 | ### Demo 11 | 12 | 13 | 14 | A demo of Psydiff's output (Psydiff diffing itself over a recent big change) can 15 | be found here: 16 | 17 | http://www.yinwang.org/resources/pydiff1-pydiff2.html 18 | 19 | 20 | 21 | ### Installation 22 | 23 | 1. Copy the whole directory to somewhere in your file system 24 | 2. Add that directory to your system's PATH variable 25 | 26 | 27 | 28 | ### Usage 29 | 30 | Just run psydiff.py from the command line: 31 | 32 | ./psydiff.py demos/list1.py demos/list2.py 33 | 34 | This will generate an HTML file named list1-list2.html in the current directory. 35 | You can then open this file in your browser and browse around the code. 36 | 37 | The HTML is a standalone entity (CSS styles and JavaScript embedded). You can 38 | put it anywhere you like and still be able to view it. 39 | 40 | 41 | 42 | ### Contact 43 | 44 | Yin Wang (yinwang0@gmail.com) 45 | 46 | 47 | 48 | #### License (GPLv3) 49 | 50 | psydiff - a structural comparison tool for Python 51 | 52 | Copyright (c) 2011-2014 Yin Wang 53 | 54 | This program is free software: you can redistribute it and/or modify 55 | it under the terms of the GNU General Public License as published by 56 | the Free Software Foundation, either version 3 of the License, or 57 | (at your option) any later version. 58 | 59 | This program is distributed in the hope that it will be useful, 60 | but WITHOUT ANY WARRANTY; without even the implied warranty of 61 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 62 | GNU General Public License for more details. 63 | 64 | You should have received a copy of the GNU General Public License 65 | along with this program. If not, see .
66 | -------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yinwang0/psydiff/2e5a8767519cdd34c7ab4ea07e2e29075e8df47d/__init__.py -------------------------------------------------------------------------------- /demos/list1.py: -------------------------------------------------------------------------------- 1 | ################################################################## 2 | # Demo for an AST-based diffing tool 3 | # author: Yin Wang (yinwang0@gmail.com) 4 | ################################################################## 5 | 6 | ################################################################## 7 | # Features: 8 | # - Detect insertion, deletion and modification of code 9 | # - Detect refactoring (renamed or moved code) 10 | # - Assess similarity of code 11 | # - Ignore comments and whitespaces 12 | # 13 | ################################################################### 14 | # Usage: 15 | # 16 | # - Mouseover any framed elements to show information 17 | # 18 | # - Click on Blue or White elements to match the other side. 19 | # Once matched, the two sides will be locked into that 20 | # position until next match. 
21 | # 22 | #################################################################### 23 | # Legend of colors: 24 | # 25 | # - Red : deleted 26 | # - Green : inserted 27 | # - Blue : modified (mouse over to show percentage of change) 28 | # - White : unchanged or moved 29 | # 30 | ################################################################### 31 | 32 | 33 | 34 | 35 | class Nil: 36 | def __repr__(this): 37 | return "()" 38 | 39 | nil = Nil() # singleton instance of Nil 40 | 41 | 42 | 43 | class Cons: 44 | def __init__(this, first, rest): 45 | this.first = first 46 | this.rest = rest 47 | def __repr__(this): 48 | if (this.rest == nil): 49 | return "(" + repr(this.first) + ")" 50 | elif (IS(this.rest, Cons)): 51 | s = repr(this.rest) 52 | return "(" + repr(this.first) + " " + s[1:-1] + ")" 53 | else: 54 | return "(" + repr(this.first) + " . " + repr(this.rest) + ")" 55 | 56 | 57 | 58 | 59 | def foldl(f, x, ls): 60 | if ls == nil: 61 | return x 62 | else: 63 | return foldl(f, f(x, ls.first), ls.rest) 64 | 65 | 66 | 67 | 68 | def length(ls): 69 | if ls == nil: 70 | return 0 71 | else: 72 | return 1 + length(ls.rest) 73 | 74 | 75 | 76 | 77 | def atomAssoc(u, v): 78 | return Cons(Cons(u, v), nil) 79 | 80 | 81 | 82 | 83 | def mkList(pylist): 84 | if (pylist == []): 85 | return nil 86 | else: 87 | return Cons(pylist[0], mkList(pylist[1:])) 88 | 89 | 90 | 91 | 92 | def toList(ls): 93 | ret = [] 94 | while ls <> nil: 95 | ret.append(ls.first) 96 | ls = ls.rest 97 | return ret 98 | 99 | 100 | 101 | 102 | def ext(x, v, s): 103 | return Cons(Cons(x, v), s) 104 | 105 | 106 | 107 | 108 | def append(ls1, ls2): 109 | if (ls1 == nil): 110 | return ls2 111 | else: 112 | return append(ls1.rest, Cons(ls1.first, ls2)) 113 | 114 | 115 | 116 | 117 | def assq(x, s): 118 | while s <> nil: 119 | if x == s.first.first: 120 | return s.first 121 | else: 122 | s = s.rest 123 | return None 124 | 125 | # if (s == nil): 126 | # return None 127 | # elif (x == s.first.first): 128 | # return s.first 
129 | # else: 130 | # return assq(x, s.rest) 131 | 132 | 133 | # lookup is unchanged, but it is moved in relative 134 | # position to other functions. 135 | def lookup(x, s): 136 | p = assq(x, s) 137 | if p <> None: 138 | return p.snd 139 | else: 140 | return None 141 | 142 | 143 | 144 | # cmap was renamed to maplist, but the function 145 | # has been modified significantly since renaming. 146 | # Thus we no longer consider them to be the same 147 | # function. 148 | def cmap(f, ls): 149 | if (ls == nil): 150 | return nil 151 | else: 152 | return Cons(f(ls.first), cmap(f, ls.rest)) 153 | 154 | 155 | # reverse is unchanged 156 | def reverse(ls): 157 | ret = nil 158 | while ls <> nil: 159 | ret = Cons(ls.first, ret) 160 | ls = ls.rest 161 | return ret 162 | 163 | 164 | 165 | # cfilter was renamed to filterlist, but the 166 | # function has been modified significantly since 167 | # renaming. Thus we no longer consider them to be 168 | # the same function. 169 | def cfilter(f, ls): 170 | ret = nil 171 | while ls <> nil: 172 | if f(ls.first): 173 | ret = Cons(ls.first, ret) 174 | ls = ls.rest 175 | return reverse(ret) 176 | 177 | # if (ls == nil): 178 | # return nil 179 | # elif f(ls.first): 180 | # return Cons(ls.first, cfilter(f, ls.rest)) 181 | # else: 182 | # return cfilter(f, ls.rest) 183 | 184 | -------------------------------------------------------------------------------- /demos/list2.py: -------------------------------------------------------------------------------- 1 | ################################################################## 2 | # Demo for an AST-based diffing tool 3 | # author: Yin Wang (yinwang0@gmail.com) 4 | ################################################################## 5 | 6 | ################################################################## 7 | # Features: 8 | # - Detect insertion, deletion and modification of code 9 | # - Detect refactoring (renamed or moved code) 10 | # - Assess similarity of code 11 | # - Ignore comments and 
whitespaces 12 | # 13 | ################################################################### 14 | # Usage: 15 | # 16 | # - Mouseover any framed elements to show information 17 | # 18 | # - Click on Blue or White elements to match the other side. 19 | # Once matched, the two sides will be locked into that 20 | # position until next match. 21 | # 22 | #################################################################### 23 | # Legend of colors: 24 | # 25 | # - Red : deleted 26 | # - Green : inserted 27 | # - Blue : modified (mouse over to show percentage of change) 28 | # - White : unchanged or moved 29 | # 30 | ################################################################### 31 | 32 | 33 | 34 | 35 | 36 | class PairIterator: 37 | def __init__(self, p): 38 | self.p = p 39 | def next(self): 40 | if self.p == nil: 41 | raise StopIteration 42 | ret = self.p.fst 43 | self.p = self.p.snd 44 | return ret 45 | 46 | 47 | 48 | 49 | class Nil: 50 | def __repr__(self): 51 | return "()" 52 | def __iter__(self): 53 | return PairIterator(self) 54 | 55 | nil = Nil() # singleton instance of Nil 56 | 57 | 58 | 59 | 60 | class Pair: 61 | def __init__(self, fst, snd): 62 | self.fst = fst 63 | self.snd = snd 64 | def __repr__(self): 65 | if (self.snd == nil): 66 | return "(" + repr(self.fst) + ")" 67 | elif (isinstance(self.snd, Pair)): 68 | s = repr(self.snd) 69 | return "(" + repr(self.fst) + " " + s[1:-1] + ")" 70 | else: 71 | return "(" + repr(self.fst) + " . 
" + repr(self.snd) + ")" 72 | def __iter__(self): 73 | return PairIterator(self) 74 | 75 | 76 | 77 | 78 | def foldl(f, x, ls): 79 | ret = x 80 | for y in ls: 81 | ret = f(ret, y) 82 | return ret 83 | 84 | 85 | 86 | 87 | def length(ls): 88 | ret = 0 89 | for x in ls: 90 | ret = ret + 1 91 | return ret 92 | 93 | 94 | 95 | 96 | def assoc(u, v): 97 | return Pair(Pair(u, v), nil) 98 | 99 | 100 | 101 | 102 | def slist(pylist): 103 | ret = nil 104 | for i in xrange(len(pylist)): 105 | ret = Pair(pylist[len(pylist)-i-1], ret) 106 | return ret 107 | 108 | 109 | 110 | 111 | def pylist(ls): 112 | ret = [] 113 | for x in ls: 114 | ret.append(x) 115 | return ret 116 | 117 | 118 | 119 | # maplist was renamed from cmap, but the function 120 | # has been modified significantly since renaming. 121 | # Thus we no longer consider them to be the same 122 | # function. 123 | def maplist(f, ls): 124 | ret = nil 125 | for x in ls: 126 | ret = Pair(f(x), ret) 127 | return reverse(ret) 128 | 129 | 130 | 131 | # filterlist was renamed from cfilter, but the 132 | # function has been modified significantly since 133 | # renaming. Thus we no longer consider them to be 134 | # the same function. 
135 | def filterlist(f, ls): 136 | ret = nil 137 | for x in ls: 138 | if f(x): 139 | ret = Pair(x, ret) 140 | return reverse(ret) 141 | 142 | 143 | 144 | def reverse(ls): 145 | ret = nil 146 | while ls <> nil: 147 | ret = Cons(ls.first, ret) 148 | ls = ls.rest 149 | return ret 150 | 151 | # def reverse(ls): 152 | # ret = nil 153 | # for x in ls: 154 | # ret = Pair(x, ret) 155 | # return ret 156 | 157 | 158 | 159 | 160 | def append(*lists, **kw): 161 | def append1(ls1, ls2): 162 | ret = ls2 163 | for x in ls1: 164 | ret = Pair(x, ret) 165 | return ret 166 | return foldl(append1, nil, slist(lists)) 167 | 168 | 169 | 170 | 171 | def assq(x, s): 172 | for p in s: 173 | if x == p.fst: 174 | return p 175 | return None 176 | 177 | 178 | 179 | 180 | def ziplist(ls1, ls2): 181 | ret = nil 182 | while ls1 <> nil and ls2 <> nil: 183 | ret = Pair(Pair(ls1.fst, ls2.fst), ret) 184 | ls1 = ls1.snd 185 | ls2 = ls2.snd 186 | return reverse(ret) 187 | 188 | 189 | 190 | 191 | # building association lists 192 | def ext(x, v, s): 193 | return Pair(Pair(x, v), s) 194 | 195 | 196 | 197 | 198 | def lookup(x, s): 199 | p = assq(x, s) 200 | if p <> None: 201 | return p.snd 202 | else: 203 | return None 204 | 205 | 206 | -------------------------------------------------------------------------------- /demos/psydiff1.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import re 3 | from ast import * 4 | from lists import * 5 | 6 | 7 | 8 | 9 | #------------------------------------------------------------- 10 | # global parameters 11 | #------------------------------------------------------------- 12 | 13 | DEBUG = False 14 | # sys.setrecursionlimit(10000) 15 | 16 | 17 | MOVE_RATIO = 0.2 18 | MOVE_SIZE = 10 19 | MOVE_ROUND = 5 20 | 21 | FRAME_DEPTH = 1 22 | FRAME_SIZE = 20 23 | 24 | NAME_PENALTY = 1 25 | IF_PENALTY = 1 26 | ASSIGN_PENALTY = 1 27 | 28 | 29 | #------------------------------------------------------------- 30 | # utilities 31 | 
#------------------------------------------------------------- 32 | 33 | IS = isinstance 34 | 35 | 36 | def debug(*args): 37 | if DEBUG: 38 | print args 39 | 40 | 41 | def dot(): 42 | sys.stdout.write('.') 43 | 44 | 45 | def isAlpha(c): 46 | return (c == '_' 47 | or ('0' <= c <= '9') 48 | or ('a' <= c <= 'z') 49 | or ('A' <= c <= 'Z')) 50 | 51 | 52 | def div(m, n): 53 | if n == 0: 54 | return m 55 | else: 56 | return m/float(n) 57 | 58 | 59 | # for debugging 60 | def ps(s): 61 | v = parse(s).body[0] 62 | if IS(v, Expr): 63 | return v.value 64 | else: 65 | return v 66 | 67 | 68 | def sz(s): 69 | return nodeSize(parse(s), True) - 1 70 | 71 | 72 | def dp(s): 73 | return dump(parse(s)) 74 | 75 | 76 | def run(name, closure=True, debug=False): 77 | fullname1 = name + '1.py' 78 | fullname2 = name + '2.py' 79 | 80 | global DEBUG 81 | olddebug = DEBUG 82 | DEBUG = debug 83 | 84 | diff(fullname1, fullname2, closure) 85 | 86 | DEBUG = olddebug 87 | 88 | 89 | def demo(): 90 | run('demo') 91 | 92 | 93 | def go(): 94 | run('heavy') 95 | 96 | 97 | def pf(): 98 | import cProfile 99 | cProfile.run("run('heavy')", sort="cumulative") 100 | 101 | 102 | 103 | 104 | #------------------------ file system support ----------------------- 105 | 106 | def pyFileName(filename): 107 | try: 108 | start = filename.rindex('/') + 1 109 | except ValueError: 110 | start = 0 111 | end = filename.rindex('.py') 112 | return filename[start:end] 113 | 114 | 115 | ## file system support 116 | def parseFile(filename): 117 | f = open(filename, 'r'); 118 | lines = f.read() 119 | ast = parse(lines) 120 | improveAST(ast, lines, filename, 'left') 121 | return ast 122 | 123 | 124 | 125 | 126 | #------------------------------------------------------------- 127 | # tests and operations on AST nodes 128 | #------------------------------------------------------------- 129 | 130 | # get list of fields from a node 131 | def nodeFields(node): 132 | ret = [] 133 | for field in node._fields: 134 | if field <> 'ctx' and 
hasattr(node, field): 135 | ret.append(getattr(node, field)) 136 | return ret 137 | 138 | 139 | 140 | # get full source text where the node is from 141 | def nodeSource(node): 142 | if hasattr(node, 'nodeSource'): 143 | return node.nodeSource 144 | else: 145 | return None 146 | 147 | 148 | 149 | # utility for getting exact source code part of the node 150 | def src(node): 151 | return node.nodeSource[node.nodeStart : node.nodeEnd] 152 | 153 | 154 | 155 | def nodeStart(node): 156 | if (hasattr(node, 'nodeStart')): 157 | return node.nodeStart 158 | else: 159 | return 0 160 | 161 | 162 | 163 | def nodeEnd(node): 164 | return node.nodeEnd 165 | 166 | 167 | 168 | def isAtom(x): 169 | return type(x) in [int, str, bool, float] 170 | 171 | 172 | 173 | def isDef(node): 174 | return IS(node, FunctionDef) or IS(node, ClassDef) 175 | 176 | 177 | 178 | # whether a node is a "frame" which can contain others and be 179 | # labeled 180 | def isFrame(node): 181 | return type(node) in [ClassDef, FunctionDef, Import, ImportFrom] 182 | 183 | 184 | 185 | def isEmptyContainer(node): 186 | 187 | if IS(node, List) and node.elts == []: 188 | return True 189 | if IS(node, Tuple) and node.elts == []: 190 | return True 191 | if IS(node, Dict) and node.keys == []: 192 | return True 193 | 194 | return False 195 | 196 | 197 | def sameDef(node1, node2): 198 | if IS(node1, FunctionDef) and IS(node2, FunctionDef): 199 | return node1.name == node2.name 200 | elif IS(node1, ClassDef) and IS(node2, ClassDef): 201 | return node1.name == node2.name 202 | else: 203 | return False 204 | 205 | 206 | def differentDef(node1, node2): 207 | if isDef(node1) and isDef(node2): 208 | return node1.name <> node2.name 209 | return False 210 | 211 | 212 | # decide whether it is reasonable to consider two nodes to be 213 | # moves of each other 214 | def canMove(node1, node2, c): 215 | return (sameDef(node1, node2) or 216 | c <= (nodeSize(node1) + nodeSize(node2)) * MOVE_RATIO) 217 | 218 | 219 | # whether the node is 
considered deleted or inserted because 220 | # the other party matches a substructure of it. 221 | def nodeFramed(node, changes): 222 | for c in changes: 223 | if (c.isFrame and (node == c.orig or node == c.cur)): 224 | return True 225 | return False 226 | 227 | 228 | 229 | # helper for turning nested if statements into sequences, 230 | # otherwise we will be trapped in the nested structure and find 231 | # too many differences 232 | def serializeIf(node): 233 | if IS(node, If): 234 | if not hasattr(node, 'nodeEnd'): 235 | print "has no end:", node 236 | 237 | newif = If(node.test, node.body, []) 238 | newif.lineno = node.lineno 239 | newif.col_offset = node.col_offset 240 | newif.nodeStart = node.nodeStart 241 | newif.nodeEnd = node.nodeEnd 242 | newif.nodeSource = node.nodeSource 243 | newif.fileName = node.fileName 244 | return [newif] + serializeIf(node.orelse) 245 | elif IS(node, list): 246 | ret = [] 247 | for n in node: 248 | ret += serializeIf(n) 249 | return ret 250 | else: 251 | return [node] 252 | 253 | 254 | def nodeName(node): 255 | if IS(node, Name): 256 | return node.id 257 | elif IS(node, FunctionDef) or IS(node, ClassDef): 258 | return node.name 259 | else: 260 | return None 261 | 262 | 263 | def attr2str(node): 264 | if IS(node, Attribute): 265 | vName = attr2str(node.value) 266 | if vName <> None: 267 | return vName + "." 
+ node.attr 268 | else: 269 | return None 270 | elif IS(node, Name): 271 | return node.id 272 | else: 273 | return None 274 | 275 | 276 | ### utility for counting size of terms 277 | def nodeSize(node, test=False): 278 | 279 | if not test and hasattr(node, 'nodeSize'): 280 | ret = node.nodeSize 281 | 282 | elif IS(node, list): 283 | ret = sum(map(lambda x: nodeSize(x, test), node)) 284 | 285 | elif isAtom(node): 286 | ret = 1 287 | 288 | elif IS(node, Name): 289 | ret = 1 290 | 291 | elif IS(node, Num): 292 | ret = 1 293 | 294 | elif IS(node, Str): 295 | ret = 1 296 | 297 | elif IS(node, Expr): 298 | ret = nodeSize(node.value, test) 299 | 300 | elif IS(node, AST): 301 | ret = 1 + sum(map(lambda x: nodeSize(x, test), nodeFields(node))) 302 | 303 | else: 304 | ret = 0 305 | 306 | if test: 307 | print "node:", node, "size=", ret 308 | 309 | if IS(node, AST): 310 | node.nodeSize = ret 311 | 312 | return ret 313 | 314 | 315 | 316 | 317 | #------------------------------- types ------------------------------ 318 | # global storage of running stats 319 | class Stat: 320 | def __init__(self): 321 | pass 322 | 323 | stat = Stat() 324 | 325 | 326 | 327 | # The difference between nodes are stored as a Change structure. 
328 | class Change: 329 | def __init__(self, orig, cur, cost, isFrame=False): 330 | self.orig = orig 331 | self.cur = cur 332 | if orig == None: 333 | self.cost = nodeSize(cur) 334 | elif cur == None: 335 | self.cost = nodeSize(orig) 336 | elif cost == 'all': 337 | self.cost = nodeSize(orig) + nodeSize(cur) 338 | else: 339 | self.cost = cost 340 | self.isFrame = isFrame 341 | def __repr__(self): 342 | fr = "F" if self.isFrame else "-" 343 | def hole(x): 344 | if x == None: 345 | return "[]" 346 | else: 347 | return x 348 | return ("(C:" + str(hole(self.orig)) + ":" + str(hole(self.cur)) 349 | + ":" + str(self.cost) + ":" + str(self.similarity()) 350 | + ":" + fr + ")") 351 | def similarity(self): 352 | total = nodeSize(self.orig) + nodeSize(self.cur) 353 | return 1 - div(self.cost, total) 354 | 355 | 356 | 357 | # Three major kinds of changes: 358 | # * modification 359 | # * deletion 360 | # *insertion 361 | def modifyNode(node1, node2, cost): 362 | return loner(Change(node1, node2, cost)) 363 | 364 | def delNode(node): 365 | return loner(Change(node, None, nodeSize(node))) 366 | 367 | def insNode(node): 368 | return loner(Change(None, node, nodeSize(node))) 369 | 370 | 371 | 372 | # general cache table for acceleration 373 | class Cache: 374 | def __init__(self): 375 | self.table = {} 376 | def __repr__(self): 377 | return "Cache:" + str(self.table) 378 | def __len__(self): 379 | return len(self.table) 380 | def put(self, key, value): 381 | self.table[key] = value 382 | def get(self, key): 383 | if self.table.has_key(key): 384 | return self.table[key] 385 | else: 386 | return None 387 | 388 | 389 | 390 | # 2-D array table for memoization of dynamic programming 391 | def createTable(x, y): 392 | table = [] 393 | for i in range(x+1): 394 | table.append([None] * (y+1)) 395 | return table 396 | 397 | def tableLookup(t, x, y): 398 | return t[x][y] 399 | 400 | def tablePut(t, x, y, v): 401 | t[x][y] = v 402 | 403 | 404 | 405 | 406 | 407 | 
#------------------------------------------------------------- 408 | # string distance function 409 | #------------------------------------------------------------- 410 | 411 | ### diff cache for AST nodes 412 | strDistCache = Cache() 413 | def clearStrDistCache(): 414 | global strDistCache 415 | strDistCache = Cache() 416 | 417 | 418 | ### string distance function 419 | def strDist(s1, s2): 420 | cached = strDistCache.get((s1, s2)) 421 | if cached <> None: 422 | return cached 423 | 424 | if len(s1) > 100 or len(s2) > 100: 425 | if s1 <> s2: 426 | return 2.0 427 | else: 428 | return 0 429 | 430 | table = createTable(len(s1), len(s2)) 431 | d = dist1(table, s1, s2) 432 | ret = div(2*d, len(s1) + len(s2)) 433 | 434 | strDistCache.put((s1, s2), ret) 435 | return ret 436 | 437 | 438 | # the main dynamic programming part 439 | # similar to the structure of diffList 440 | def dist1(table, s1, s2): 441 | def memo(v): 442 | tablePut(table, len(s1), len(s2), v) 443 | return v 444 | 445 | cached = tableLookup(table, len(s1), len(s2)) 446 | if (cached <> None): 447 | return cached 448 | 449 | if s1 == '': 450 | return memo(len(s2)) 451 | elif s2 == '': 452 | return memo(len(s1)) 453 | else: 454 | if s1[0] == s2[0]: 455 | d0 = 0 456 | elif s1[0].lower() == s2[0].lower(): 457 | d0 = 1 458 | else: 459 | d0 = 2 460 | 461 | d0 = d0 + dist1(table, s1[1:], s2[1:]) 462 | d1 = 1 + dist1(table, s1[1:], s2) 463 | d2 = 1 + dist1(table, s1, s2[1:]) 464 | return memo(min(d0, d1, d2)) 465 | 466 | 467 | 468 | 469 | #------------------------------------------------------------- 470 | # diff of nodes 471 | #------------------------------------------------------------- 472 | 473 | stat.diffCount = 0 474 | def diffNode(node1, node2, env1, env2, depth, move): 475 | 476 | # try substructural diff 477 | def trysub((changes, cost)): 478 | if not move: 479 | return (changes, cost) 480 | elif canMove(node1, node2, cost): 481 | return (changes, cost) 482 | else: 483 | mc1 = diffSubNode(node1, node2, 
env1, env2, depth, move) 484 | if mc1 <> None: 485 | return mc1 486 | else: 487 | return (changes, cost) 488 | 489 | if IS(node1, list) and not IS(node2, list): 490 | return diffNode(node1, [node2], env1, env2, depth, move) 491 | 492 | if not IS(node1, list) and IS(node2, list): 493 | return diffNode([node1], node2, env1, env2, depth, move) 494 | 495 | if (IS(node1, list) and IS(node2, list)): 496 | node1 = serializeIf(node1) 497 | node2 = serializeIf(node2) 498 | table = createTable(len(node1), len(node2)) 499 | return diffList(table, node1, node2, env1, env2, 0, move) 500 | 501 | # statistics 502 | stat.diffCount += 1 503 | if stat.diffCount % 1000 == 0: 504 | dot() 505 | 506 | if node1 == node2: 507 | return (modifyNode(node1, node2, 0), 0) 508 | 509 | if IS(node1, Num) and IS(node2, Num): 510 | if node1.n == node2.n: 511 | return (modifyNode(node1, node2, 0), 0) 512 | else: 513 | return (modifyNode(node1, node2, 1), 1) 514 | 515 | if IS(node1, Str) and IS(node2, Str): 516 | cost = strDist(node1.s, node2.s) 517 | return (modifyNode(node1, node2, cost), cost) 518 | 519 | if (IS(node1, Name) and IS(node2, Name)): 520 | v1 = lookup(node1.id, env1) 521 | v2 = lookup(node2.id, env2) 522 | if v1 <> v2 or (v1 == None and v2 == None): 523 | cost = strDist(node1.id, node2.id) 524 | return (modifyNode(node1, node2, cost), cost) 525 | else: # same variable 526 | return (modifyNode(node1, node2, 0), 0) 527 | 528 | if (IS(node1, Attribute) and IS(node2, Name) or 529 | IS(node1, Name) and IS(node2, Attribute) or 530 | IS(node1, Attribute) and IS(node2, Attribute)): 531 | s1 = attr2str(node1) 532 | s2 = attr2str(node2) 533 | if s1 <> None and s2 <> None: 534 | cost = strDist(s1, s2) 535 | return (modifyNode(node1, node2, cost), cost) 536 | # else fall through for things like f(x).y vs x.y 537 | 538 | # if (IS(node1, ClassDef) and IS(node2, ClassDef)): 539 | # (m1, c1) = diffNode(node1.bases, node2.bases, env1, env2, depth, move) 540 | # (m2, c2) = diffNode(node1.body, 
node2.body, env1, env2, depth, move) 541 | # (m3, c3) = diffNode(node1.decorator_list, node2.decorator_list, 542 | # env1, env2, depth, move) 543 | # changes = append(m1, m2, m3) 544 | # cost = c1 + c2 + c3 + strDist(node1.name, node2.name) 545 | # return trysub((changes, cost)) 546 | 547 | # if (IS(node1, FunctionDef) and IS(node2, FunctionDef)): 548 | # return trysub(diffFunctionDef(node1, node2, 549 | # env1, env2, depth, move)) 550 | 551 | # if (IS(node1, Assign) and IS(node2, Assign)): 552 | # (m1, c1) = diffNode(node1.targets, node2.targets, 553 | # env1, env2, depth, move) 554 | # (m2, c2) = diffNode(node1.value, node2.value, 555 | # env1, env2, depth, move) 556 | # return (append(m1, m2), c1 * ASSIGN_PENALTY + c2) 557 | 558 | # # flatten nested if nodes 559 | # if IS(node1, If) and IS(node2, If): 560 | # seq1 = serializeIf(node1) 561 | # seq2 = serializeIf(node2) 562 | # if len(seq1) > 1 and len(seq2) > 1: 563 | # return diffNode(seq1, seq2, env1, env2, depth, move) 564 | # else: 565 | # (m0, c0) = diffNode(node1.test, node2.test, env1, env2, depth, move) 566 | # (m1, c1) = diffNode(node1.body, node2.body, env1, env2, depth, move) 567 | # (m2, c2) = diffNode(node1.orelse, node2.orelse, env1, env2, depth, move) 568 | # changes = append(m0, m1, m2) 569 | # cost = c0 * IF_PENALTY + c1 + c2 570 | # return trysub((changes, cost)) 571 | 572 | if IS(node1, Module) and IS(node2, Module): 573 | return diffNode(node1.body, node2.body, env1, env2, depth, move) 574 | 575 | # other AST nodes 576 | if (IS(node1, AST) and IS(node2, AST) and 577 | type(node1) == type(node2)): 578 | 579 | fs1 = nodeFields(node1) 580 | fs2 = nodeFields(node2) 581 | changes, cost = nil, 0 582 | 583 | for i in xrange(len(fs1)): 584 | (m, c) = diffNode(fs1[i], fs2[i], env1, env2, depth, move) 585 | changes = append(m, changes) 586 | cost += c 587 | 588 | return trysub((changes, cost)) 589 | 590 | if (type(node1) == type(node2) and 591 | isEmptyContainer(node1) and isEmptyContainer(node2)): 592 
        return (modifyNode(node1, node2, 0), 0)

    # all unmatched types and unequal values
    return trysub((append(delNode(node1), insNode(node2)),
                   nodeSize(node1) + nodeSize(node2)))



###################### diff of a FunctionDef #####################

# separate out because it is too long

def diffFunctionDef(node1, node2, env1, env2, depth, move):

    # positionals
    len1 = len(node1.args.args)
    len2 = len(node2.args.args)

    if len1 < len2:
        minlen = len1
        rest = node2.args.args[minlen:]
    else:
        minlen = len2
        rest = node1.args.args[minlen:]

    ma = nil
    for i in xrange(minlen):
        a1 = node1.args.args[i]
        a2 = node2.args.args[i]
        if IS(a1, Name) and IS(a2, Name) and a1.id <> a2.id:
            env1 = ext(a1.id, a2, env1)
            env2 = ext(a2.id, a2, env2)
        (m1, c1) = diffNode(a1, a2, env1, env2, depth, move)
        ma = append(m1, ma)

    # handle rest of the positionals
    ca = 0
    if rest <> []:
        if len1 < len2:
            for arg in rest:
                ma = append(insNode(arg), ma)
                ca += nodeSize(arg)
        else:
            for arg in rest:
                ma = append(delNode(arg), ma)
                ca += nodeSize(arg)

    # vararg
    va1 = node1.varargName
    va2 = node2.varargName
    if va1 <> None and va2 <> None:
        if va1.id <> va2.id:
            env1 = ext(va1.id, va2, env1)
            env2 = ext(va2.id, va2, env2)
        cost = strDist(va1.id, va2.id)
        ma = append(modifyNode(va1, va2, cost), ma)
        ca += cost
    elif va1 <> None or va2 <> None:
        cost = nodeSize(va1) if va1 <> None else nodeSize(va2)
        ma = append(modifyNode(va1, va2, cost), ma)
        ca += cost

    # kwarg
    ka1 = node1.kwargName
    ka2 = node2.kwargName
    if ka1 <> None and ka2 <> None:
        if ka1.id <> ka2.id:
            env1 = ext(ka1.id, ka2, env1)
            env2 = ext(ka2.id, ka2, env2)
        cost = strDist(ka1.id, ka2.id)
        ma = append(modifyNode(ka1, ka2, cost), ma)
        ca += cost
    elif ka1 <> None or ka2 <> None:
        cost = nodeSize(ka1) if ka1 <> None else nodeSize(ka2)
        ma = append(modifyNode(ka1, ka2, cost), ma)
        ca += cost

    # defaults and body
    (md, cd) = diffNode(node1.args.defaults, node2.args.defaults,
                        env1, env2, depth, move)
    (mb, cb) = diffNode(node1.body, node2.body, env1, env2, depth, move)

    # sum up cost. penalize functions with different names.
    cost = ca + cd + cb + strDist(node1.name, node2.name)
    if node1.name <> node2.name:
        cost = cost * NAME_PENALTY

    return (append(ma, md, mb), cost)



########################## diff of a list ##########################

# diffList is the main part of dynamic programming

def diffList(table, ls1, ls2, env1, env2, depth, move):

    def memo(v):
        tablePut(table, len(ls1), len(ls2), v)
        return v

    def guess(table, ls1, ls2, env1, env2):
        (m0, c0) = diffNode(ls1[0], ls2[0], env1, env2, depth, move)
        (m1, c1) = diffList(table, ls1[1:], ls2[1:], env1, env2, depth, move)
        cost1 = c1 + c0

        if ((isFrame(ls1[0]) and
             isFrame(ls2[0]) and
             not nodeFramed(ls1[0], m0) and
             not nodeFramed(ls2[0], m0))):
            frameChange = modifyNode(ls1[0], ls2[0], c0)
        else:
            frameChange = nil

        # short cut 1 (func and classes with same names)
        if canMove(ls1[0], ls2[0], c0):
            return (append(frameChange, m0, m1), cost1)

        else:  # do more work
            (m2, c2) = diffList(table, ls1[1:], ls2, env1, env2, depth, move)
            (m3, c3) = diffList(table, ls1, ls2[1:], env1, env2, depth, move)
            cost2 = c2 + nodeSize(ls1[0])
            cost3 = c3 + nodeSize(ls2[0])

            if (not differentDef(ls1[0], ls2[0]) and
                cost1 <= cost2 and cost1 <= cost3):
                return (append(frameChange, m0,
                               m1), cost1)
            elif (cost2 <= cost3):
                return (append(delNode(ls1[0]), m2), cost2)
            else:
                return (append(insNode(ls2[0]), m3), cost3)

    # cache look up
    cached = tableLookup(table, len(ls1), len(ls2))
    if (cached <> None):
        return cached

    if (ls1 == [] and ls2 == []):
        return memo((nil, 0))

    elif (ls1 <> [] and ls2 <> []):
        return memo(guess(table, ls1, ls2, env1, env2))

    elif ls1 == []:
        d = nil
        for n in ls2:
            d = append(insNode(n), d)
        return memo((d, nodeSize(ls2)))

    else:  # ls2 == []:
        d = nil
        for n in ls1:
            d = append(delNode(n), d)
        return memo((d, nodeSize(ls1)))



###################### diff into a subnode #######################

# Subnode diff is only used in the moving phase. There is no
# need to compare the substructure of two nodes in the first
# run, because they will be reconsidered if we just consider
# them to be complete deletions and insertions.

def diffSubNode(node1, node2, env1, env2, depth, move):

    if (depth >= FRAME_DEPTH or
        nodeSize(node1) < FRAME_SIZE or
        nodeSize(node2) < FRAME_SIZE):
        return None

    if IS(node1, AST) and IS(node2, AST):

        if nodeSize(node1) == nodeSize(node2):
            return None

        if IS(node1, Expr):
            node1 = node1.value

        if IS(node2, Expr):
            node2 = node2.value

        if (nodeSize(node1) < nodeSize(node2)):
            for f in nodeFields(node2):
                (m0, c0) = diffNode(node1, f, env1, env2, depth+1, move)
                if canMove(node1, f, c0):
                    if not IS(f, list):
                        m1 = modifyNode(node1, f, c0)
                    else:
                        m1 = nil
                    framecost = nodeSize(node2) - nodeSize(node1)
                    m2 = loner(Change(None, node2, framecost, True))
                    return (append(m2, m1, m0), c0 + framecost)

        if (nodeSize(node1) > nodeSize(node2)):
            for f in nodeFields(node1):
                (m0, c0) = diffNode(f, node2, env1, env2, depth+1, move)
                if canMove(f, node2, c0):
                    framecost = nodeSize(node1) - nodeSize(node2)
                    if not IS(f, list):
                        m1 = modifyNode(f, node2, c0)
                    else:
                        m1 = nil
                    m2 = loner(Change(node1, None, framecost, True))
                    return (append(m2, m1, m0), c0 + framecost)

    return None



##########################################################################
##                          move detection
##########################################################################
def moveCandidate(node):
    return (isDef(node) or nodeSize(node) >= MOVE_SIZE)


stat.moveCount = 0
stat.moveSavings = 0
def getmoves(ds, round=0):

    dels = pylist(filterlist(lambda p: (p.cur == None and
                                        moveCandidate(p.orig) and
                                        not p.isFrame),
                             ds))
    adds = pylist(filterlist(lambda p: (p.orig == None and
                                        moveCandidate(p.cur) and
                                        not p.isFrame),
                             ds))

    # print "dels=", dels
    # print "adds=", adds

    matched = []
    newChanges, total = nil, 0

    print("\n[getmoves #%d] %d * %d = %d pairs of nodes to consider ..."
          % (round, len(dels), len(adds), len(dels) * len(adds)))

    for d0 in dels:
        for a0 in adds:
            (node1, node2) = (d0.orig, a0.cur)
            (changes, cost) = diffNode(node1, node2, nil, nil, 0, True)
            nterms = nodeSize(node1) + nodeSize(node2)

            if (canMove(node1, node2, cost) or
                nodeFramed(node1, changes) or
                nodeFramed(node2, changes)):

                matched.append(d0)
                matched.append(a0)
                adds.remove(a0)
                newChanges = append(changes, newChanges)
                total += cost

                if (not nodeFramed(node1, changes) and
                    not nodeFramed(node2, changes) and
                    isDef(node1) and isDef(node2)):
                    newChanges = append(modifyNode(node1, node2, cost),
                                        newChanges)

                stat.moveSavings += nterms
                stat.moveCount += 1
                if stat.moveCount % 1000 == 0:
                    dot()

                break

    print("\n\t%d matched pairs found with %d new changes."
          % (len(pylist(matched)), len(pylist(newChanges))))

    # print "matches=", matched
    # print "newChanges=", newChanges

    return (matched, newChanges, total)



# Get moves repeatedly because new moves may introduce new
# deletions and insertions.

def closure(res):
    (changes, cost) = res
    matched = None
    moveround = 1

    while moveround <= MOVE_ROUND and matched <> []:
        (matched, newChanges, c) = getmoves(changes, moveround)
        moveround += 1
        # print "matched:", matched
        # print "changes:", changes
        changes = filterlist(lambda c: c not in matched, changes)
        changes = append(newChanges, changes)
        savings = sum(map(lambda p: nodeSize(p.orig) + nodeSize(p.cur), matched))
        cost = cost + c - savings
    return (changes, cost)



#-------------------------------------------------------------
#                   improvements to the AST
#-------------------------------------------------------------

allNodes1 = set()
allNodes2 = set()

def improveNode(node, s, idxmap, filename, side):

    if IS(node, list):
        for n in node:
            improveNode(n, s, idxmap, filename, side)

    elif IS(node, AST):

        if side == 'left':
            allNodes1.add(node)
        else:
            allNodes2.add(node)

        findNodeStart(node, s, idxmap)
        findNodeEnd(node, s, idxmap)
        addMissingNames(node, s, idxmap)

        node.nodeSource = s
        node.fileName = filename

        for f in nodeFields(node):
            improveNode(f, s, idxmap, filename, side)



def improveAST(node, s, filename, side):
    idxmap = buildIndexMap(s)
    improveNode(node, s, idxmap, filename, side)



#-------------------------------------------------------------
#            finding start and end index of nodes
#-------------------------------------------------------------

def findNodeStart(node, s, idxmap):

    if hasattr(node, 'nodeStart'):
        return node.nodeStart

    elif IS(node, list):
        ret = findNodeStart(node[0], s, idxmap)

    elif IS(node, Module):
        ret = findNodeStart(node.body[0],
                            s, idxmap)

    elif IS(node, BinOp):
        leftstart = findNodeStart(node.left, s, idxmap)
        if leftstart <> None:
            ret = leftstart
        else:
            ret = mapIdx(idxmap, node.lineno, node.col_offset)

    elif hasattr(node, 'lineno'):
        if node.col_offset >= 0:
            ret = mapIdx(idxmap, node.lineno, node.col_offset)
        else:  # special case for """ strings
            i = mapIdx(idxmap, node.lineno, node.col_offset)
            while i > 0 and i+2 < len(s) and s[i:i+3] <> '"""':
                i -= 1
            ret = i
    else:
        ret = None

    if ret == None and hasattr(node, 'lineno'):
        raise TypeError("got None for node that has lineno", node)

    if IS(node, AST) and ret <> None:
        node.nodeStart = ret

    return ret



def findNodeEnd(node, s, idxmap):

    if hasattr(node, 'nodeEnd'):
        return node.nodeEnd

    elif IS(node, list):
        ret = findNodeEnd(node[-1], s, idxmap)

    elif IS(node, Module):
        ret = findNodeEnd(node.body[-1], s, idxmap)

    elif IS(node, Expr):
        ret = findNodeEnd(node.value, s, idxmap)

    elif IS(node, Str):
        i = findNodeStart(node, s, idxmap)
        if i+2 < len(s) and s[i:i+3] == '"""':
            q = '"""'
            i += 3
        elif s[i] == '"':
            q = '"'
            i += 1
        elif s[i] == "'":
            q = "'"
            i += 1
        else:
            print "illegal:", i, s[i]
        ret = endSeq(s, q, i)

    elif IS(node, Name):
        ret = findNodeStart(node, s, idxmap) + len(node.id)

    elif IS(node, Attribute):
        ret = endSeq(s, node.attr, findNodeEnd(node.value, s, idxmap))

    elif IS(node, FunctionDef):
        # addMissingNames(node, s, idxmap)
        # ret = findNodeEnd(node.nameName, s, idxmap)
        ret = findNodeEnd(node.body, s, idxmap)

    elif IS(node, Lambda):
        ret = findNodeEnd(node.body, s, idxmap)

    elif IS(node, ClassDef):
        # addMissingNames(node, s, idxmap)
        # ret = findNodeEnd(node.nameName, s, idxmap)
        ret = findNodeEnd(node.body, s, idxmap)

    elif IS(node, Call):
        ret = matchParen(s, '(', ')', findNodeEnd(node.func, s, idxmap))

    elif IS(node, Yield):
        ret = findNodeEnd(node.value, s, idxmap)

    elif IS(node, Return):
        if node.value <> None:
            ret = findNodeEnd(node.value, s, idxmap)
        else:
            ret = findNodeStart(node, s, idxmap) + len('return')

    elif IS(node, Print):
        ret = startSeq(s, '\n', findNodeStart(node, s, idxmap))

    elif (IS(node, For) or
          IS(node, While) or
          IS(node, If) or
          IS(node, IfExp)):
        if node.orelse <> []:
            ret = findNodeEnd(node.orelse, s, idxmap)
        else:
            ret = findNodeEnd(node.body, s, idxmap)

    elif IS(node, Assign) or IS(node, AugAssign):
        ret = findNodeEnd(node.value, s, idxmap)

    elif IS(node, BinOp):
        ret = findNodeEnd(node.right, s, idxmap)

    elif IS(node, BoolOp):
        ret = findNodeEnd(node.values[-1], s, idxmap)

    elif IS(node, Compare):
        ret = findNodeEnd(node.comparators[-1], s, idxmap)

    elif IS(node, UnaryOp):
        ret = findNodeEnd(node.operand, s, idxmap)

    elif IS(node, Num):
        ret = findNodeStart(node, s, idxmap) + len(str(node.n))

    elif IS(node, List):
        ret = matchParen(s, '[', ']', findNodeStart(node, s, idxmap))

    elif IS(node, Subscript):
        ret = matchParen(s, '[', ']', findNodeStart(node, s, idxmap))

    elif IS(node, Tuple):
        ret = findNodeEnd(node.elts[-1], s, idxmap)

    elif IS(node, Dict):
        ret = matchParen(s, '{', '}', findNodeStart(node, s, idxmap))

    elif IS(node, TryExcept):
        if node.orelse <> []:
            ret = findNodeEnd(node.orelse, s, idxmap)
        elif node.handlers <> []:
            ret = findNodeEnd(node.handlers, s, idxmap)
        else:
            ret = findNodeEnd(node.body, s, idxmap)

    elif IS(node, ExceptHandler):
        ret = findNodeEnd(node.body, s, idxmap)

    elif IS(node, Pass):
        ret = findNodeStart(node, s, idxmap) + len('pass')

    elif IS(node, Break):
        ret = findNodeStart(node, s, idxmap) + len('break')

    elif IS(node, Continue):
        ret = findNodeStart(node, s, idxmap) + len('continue')

    elif IS(node, Global):
        ret = startSeq(s, '\n', findNodeStart(node, s, idxmap))

    elif IS(node, Import):
        ret = findNodeStart(node, s, idxmap) + len('import')

    elif IS(node, ImportFrom):
        ret = findNodeStart(node, s, idxmap) + len('from')

    else:
        # print "[findNodeEnd] unrecognized node:", node, "type:", type(node)
        start = findNodeStart(node, s, idxmap)
        if start <> None:
            ret = start + 3
        else:
            ret = None

    if ret == None and hasattr(node, 'lineno'):
        raise TypeError("got None for node that has lineno", node)

    if IS(node, AST) and ret <> None:
        node.nodeEnd = ret

    return ret



#-------------------------------------------------------------
#                    adding missing Names
#-------------------------------------------------------------

def addMissingNames(node, s, idxmap):

    if hasattr(node, 'extraAttribute'):
        return

    if IS(node, list):
        for n in node:
            addMissingNames(n, s, idxmap)

    elif IS(node, ClassDef):
        start = findNodeStart(node, s, idxmap) + len('class')
        node.nameName = str2Name(s, start, idxmap)
        node._fields += ('nameName',)

    elif IS(node, FunctionDef):
        start = findNodeStart(node, s, idxmap) + len('def')
        node.nameName = str2Name(s, start, idxmap)
        node._fields += ('nameName',)

        if node.args.vararg <> None:
            if len(node.args.args) > 0:
                vstart = findNodeEnd(node.args.args[-1], s, idxmap)
            else:
                vstart = findNodeEnd(node.nameName, s, idxmap)
            vname = str2Name(s, vstart, idxmap)
            node.varargName = vname
        else:
            node.varargName = None
        node._fields += ('varargName',)

        if node.args.kwarg <> None:
            if len(node.args.args) > 0:
                kstart = findNodeEnd(node.args.args[-1], s, idxmap)
            else:
                kstart = findNodeEnd(node.varargName, s, idxmap)
            kname = str2Name(s, kstart, idxmap)
            node.kwargName = kname
        else:
            node.kwargName = None
        node._fields += ('kwargName',)

    elif IS(node, Attribute):
        start = findNodeEnd(node.value, s, idxmap)
        name = str2Name(s, start, idxmap)
        node.attrName = name
        node._fields = ('value', 'attrName')  # remove attr for node size accuracy

    elif IS(node, Compare):
        node.opsName = convertOps(node.ops, s,
                                  findNodeStart(node, s, idxmap), idxmap)
        node._fields += ('opsName',)

    elif (IS(node, BoolOp) or
          IS(node, BinOp) or
          IS(node, UnaryOp) or
          IS(node, AugAssign)):
        if hasattr(node, 'left'):
            start = findNodeEnd(node.left, s, idxmap)
        else:
            start = findNodeStart(node, s, idxmap)
        ops = convertOps([node.op], s, start, idxmap)
        node.opName = ops[0]
        node._fields += ('opName',)

    elif IS(node, Import):
        nameNames = []
        next = findNodeStart(node, s, idxmap) + len('import')
        name = str2Name(s, next, idxmap)
        while name <> None and next < len(s) and s[next] <> '\n':
            nameNames.append(name)
            next = name.nodeEnd
            name = str2Name(s, next, idxmap)
        node.nameNames = nameNames
        node._fields += ('nameNames',)

    node.extraAttribute = True


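# Illustration only, not part of psydiff: the position helpers in this
# module turn the (lineno, col_offset) pairs reported by the ast module
# into flat offsets in the source string, and back again. The sketch
# below shows that conversion in isolation. The names build_line_starts,
# to_offset and to_line_col are hypothetical, the inverse lookup uses
# bisect rather than a linear scan, and the code is written for a current
# Python interpreter rather than the Python 2 dialect used in this file.

```python
import bisect


def build_line_starts(text):
    """Return the offset at which each line of `text` begins (line 1 first)."""
    starts = [0]
    for idx, ch in enumerate(text):
        if ch == '\n':
            starts.append(idx + 1)
    return starts


def to_offset(starts, line, col):
    """Convert a 1-based line number and 0-based column to a flat offset."""
    return starts[line - 1] + col


def to_line_col(starts, offset):
    """Convert a flat offset back to (1-based line, 0-based column)."""
    # bisect_right counts the line starts that are <= offset,
    # which is exactly the 1-based line number.
    line = bisect.bisect_right(starts, offset)
    return line, offset - starts[line - 1]


source = "x = 1\ny = x + 2\n"
starts = build_line_starts(source)
offset = to_offset(starts, 2, 4)   # the 'x' on the second line
print(starts, offset, source[offset], to_line_col(starts, offset))
```

# Keeping one table of line starts per file means each conversion is a
# lookup instead of a rescan of the source text.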
#-------------------------------------------------------------
#          utilities used by improve AST functions
#-------------------------------------------------------------

# find a sequence in a string s, returning the start point
def startSeq(s, pat, start):
    try:
        return s.index(pat, start)
    except ValueError:
        return len(s)



# find a sequence in a string s, returning the end point
def endSeq(s, pat, start):
    try:
        return s.index(pat, start) + len(pat)
    except ValueError:
        return len(s)



# find matching close paren from start
def matchParen(s, open, close, start):
    # check the bound before indexing, so that a missing open paren
    # falls through to the len(s) case instead of raising IndexError
    while start < len(s) and s[start] <> open:
        start += 1
    if start >= len(s):
        return len(s)

    left = 1
    i = start + 1
    while left > 0 and i < len(s):
        if s[i] == open:
            left += 1
        elif s[i] == close:
            left -= 1
        i += 1
    return i



# build table for lineno <-> index conversion
def buildIndexMap(s):
    line = 0
    col = 0
    idx = 0
    idxmap = [0]
    while idx < len(s):
        if s[idx] == '\n':
            idxmap.append(idx + 1)
            line += 1
        idx += 1
    return idxmap



# convert (line, col) to offset index
def mapIdx(idxmap, line, col):
    return idxmap[line-1] + col



# convert offset index into (line, col)
def mapLineCol(idxmap, idx):
    line = 0
    for start in idxmap:
        if idx < start:
            break
        line += 1
    col = idx - idxmap[line-1]
    return (line, col)



# convert string to Name
def str2Name(s, start, idxmap):
    i = start
    while i < len(s) and not isAlpha(s[i]):
        i += 1
    startIdx = i
    ret = []
    while i < len(s) and isAlpha(s[i]):
        ret.append(s[i])
        i += 1
    endIdx = i
    id1 = ''.join(ret)

    if id1 == '':
        return None
    else:
        name = Name(id1, None)
        name.nodeStart = startIdx
        name.nodeEnd = endIdx
        name.lineno, name.col_offset = mapLineCol(idxmap, startIdx)
        return name



def convertOps(ops, s, start, idxmap):
    syms = map(lambda op: opsMap[type(op)], ops)
    i = start
    j = 0
    ret = []
    while i < len(s) and j < len(syms):
        oplen = len(syms[j])
        if s[i:i+oplen] == syms[j]:
            opName = Name(syms[j], None)
            opName.nodeStart = i
            opName.nodeEnd = i+oplen
            opName.lineno, opName.col_offset = mapLineCol(idxmap, i)
            ret.append(opName)
            j += 1
            i = opName.nodeEnd
        else:
            i += 1
    return ret


# lookup table for operators for convertOps
opsMap = {
    # compare:
    Eq     : '==',
    NotEq  : '<>',
    Lt     : '<',
    LtE    : '<=',
    Gt     : '>',
    GtE    : '>=',
    In     : 'in',
    NotIn  : 'not in',

    # BoolOp
    Or     : 'or',
    And    : 'and',
    Not    : 'not',

    # BinOp
    Add    : '+',
    Sub    : '-',
    Mult   : '*',
    Div    : '/',
    Mod    : '%',

    # UnaryOp
    USub   : '-',
    UAdd   : '+',
}



#-------------------------------------------------------------
#                        HTML generation
#-------------------------------------------------------------


#-------------------- types and utilities ----------------------

class Tag:
    def __init__(self, tag, idx, start=-1):
        self.tag = tag
        self.idx = idx
        self.start = start
    def __repr__(self):
        return "tag:" + str(self.tag) + ":" + str(self.idx)



# escape for HTML
def escape(s):
    s = s.replace('"', '&quot;')
    s = s.replace("'", '&#39;')
    s = s.replace("<", '&lt;')
    s = s.replace(">", '&gt;')
    return s



uidCount = -1
uidHash = {}
def clearUID():
    global uidCount, uidHash
    uidCount = -1
    uidHash = {}


def uid(node):
    if uidHash.has_key(node):
        return uidHash[node]

    global uidCount
    uidCount += 1
    uidHash[node] = str(uidCount)
    return str(uidCount)



def lineId(lineno):
    return 'L' + str(lineno)


def qs(s):
    return "'" + s + "'"



#-------------------- main HTML generating function ------------------

def genHTML(text, changes, side):
    ltags = lineTags(text)
    ctags = changeTags(text, changes, side)
    ktags = keywordTags(side)
    body = applyTags(text, ltags + ctags + ktags, side)

    out = []
    out.append('\n')
    out.append('\n')
    out.append('\n')
    out.append('\n')
    out.append('\n')
    out.append('\n')
    out.append('\n')

    out.append('
\n')
    out.append(body)
    out.append('
\n')

    # out.append('\n')
    # out.append('\n')

    return ''.join(out)



# put the tags generated by changeTags into the text and create HTML
def applyTags(s, tags, side):
    tags = sorted(tags, key = lambda t: (t.idx, -t.start))
    curr = 0
    out = []
    for t in tags:
        while curr < t.idx and curr < len(s):
            out.append(escape(s[curr]))
            curr += 1
        out.append(t.tag)

    while curr < len(s):
        out.append(escape(s[curr]))
        curr += 1
    return ''.join(out)



#--------------------- tag generation functions ----------------------

def changeTags(s, changes, side):
    tags = []
    for r in changes:
        key = r.orig if side == 'left' else r.cur
        if hasattr(key, 'lineno'):
            start = nodeStart(key)
            if IS(key, FunctionDef):
                end = start + len('def')
            elif IS(key, ClassDef):
                end = start + len('class')
            else:
                end = nodeEnd(key)

            if r.orig <> None and r.cur <> None:
                # for change and move
                tags.append(Tag(linkTagStart(r, side), start))
                tags.append(Tag("", end, start))
            else:
                # for deletion and insertion
                tags.append(Tag(spanStart(r), start))
                tags.append(Tag('', end, start))

    return tags



def lineTags(s):
    out = []
    lineno = 1
    curr = 0
    while curr < len(s):
        if curr == 0 or s[curr-1] == '\n':
            out.append(Tag('
', curr)) 1505 | out.append(Tag('' + str(lineno) + ' ', curr)) 1506 | if s[curr] == '\n': 1507 | out.append(Tag('
', curr))
            lineno += 1
        curr += 1
    out.append(Tag('', curr))
    return out



def keywordTags(side):
    tags = []
    allNodes = allNodes1 if side == 'left' else allNodes2
    for node in allNodes:
        if type(node) in keywordMap:
            kw = keywordMap[type(node)]
            start = nodeStart(node)
            if src(node)[:len(kw)] == kw:
                startTag = (Tag('', start))
                tags.append(startTag)
                endTag = Tag('', start + len(kw), start)
                tags.append(endTag)
    return tags


def spanStart(diff):
    if diff.cur == None:
        cls = "deletion"
    else:
        cls = "insertion"
    text = escape(describeChange(diff))
    return ''



def linkTagStart(diff, side):
    if side == 'left':
        me, other = diff.orig, diff.cur
    else:
        me, other = diff.cur, diff.orig

    text = escape(describeChange(diff))
    if diff.cost > 0:
        cls = "change"
    else:
        cls = "move"

    return ('')


keywordMap = {
    FunctionDef : 'def',
    ClassDef    : 'class',
    For         : 'for',
    While       : 'while',
    If          : 'if',
    With        : 'with',
    Return      : 'return',
    Yield       : 'yield',
    Global      : 'global',
    Raise       : 'raise',
    Pass        : 'pass',
    TryExcept   : 'try',
    TryFinally  : 'try',
}



# human readable description of node

def describeNode(node):

    def code(s):
        return "'" + s + "'"

    def short(node):
        if IS(node, Module):
            ret = "module"
        elif IS(node, Import):
            ret = "import statement"
        elif IS(node, Name):
            ret = code(node.id)
        elif IS(node, Attribute):
            ret = code(short(node.value) + "."
                       + short(node.attrName))
        elif IS(node, FunctionDef):
            ret = "function " + code(node.name)
        elif IS(node, ClassDef):
            ret = "class " + code(node.name)
        elif IS(node, Call):
            ret = "call to " + code(short(node.func))
        elif IS(node, Assign):
            ret = "assignment"
        elif IS(node, If):
            ret = "if statement"
        elif IS(node, While):
            ret = "while loop"
        elif IS(node, For):
            ret = "for loop"
        elif IS(node, Yield):
            ret = "yield"
        elif IS(node, TryExcept) or IS(node, TryFinally):
            ret = "try statement"
        elif IS(node, Compare):
            ret = "comparison " + src(node)
        elif IS(node, Return):
            ret = "return " + short(node.value)
        elif IS(node, Print):
            # parenthesize the conditional so that "print " and the value
            # list are always included, as printAst does below
            ret = ("print " +
                   (short(node.dest) + ", " if (node.dest!=None) else "") +
                   printList(node.values))
        elif IS(node, Expr):
            ret = "expression " + short(node.value)
        elif IS(node, Num):
            ret = str(node.n)
        elif IS(node, Str):
            if len(node.s) > 20:
                ret = "string " + code(node.s[:20]) + "..."
            else:
                ret = "string " + code(node.s)
        elif IS(node, Tuple):
            ret = "tuple (" + src(node) + ")"
        elif IS(node, BinOp):
            ret = (short(node.left) + " " +
                   node.opName.id + " " + short(node.right))
        elif IS(node, BoolOp):
            ret = src(node)
        elif IS(node, UnaryOp):
            ret = node.opName.id + " " + short(node.operand)
        elif IS(node, Pass):
            ret = "pass"
        elif IS(node, list):
            ret = map(short, node)
        else:
            ret = str(type(node))
        return ret

    ret = short(node)
    if hasattr(node, 'lineno'):
        # strip any " (line N)" suffix left by nested descriptions; the
        # parentheses are escaped so that they match literally
        ret = re.sub(" *\\(line [0-9]+\\)", '', ret)
        return ret + " (line " + str(node.lineno) + ")"
    else:
        return ret



# describe a change in a human readable fashion
def describeChange(diff):

    ratio = diff.similarity()
    sim = str(ratio)

    if ratio == 1.0:
        sim = " (unchanged)"
    else:
        sim = " (similarity %.1f%%)" % (ratio * 100)

    if diff.isFrame:
        wrap = "wrap "
    else:
        wrap = ""

    if diff.cur == None:
        ret = wrap + describeNode(diff.orig) + " deleted"
    elif diff.orig == None:
        ret = wrap + describeNode(diff.cur) + " inserted"
    elif nodeName(diff.orig) <> nodeName(diff.cur):
        ret = (describeNode(diff.orig) +
               " renamed to " + describeNode(diff.cur) + sim)
    elif diff.cost == 0 and diff.orig.lineno <> diff.cur.lineno:
        ret = (describeNode(diff.orig) +
               " moved to " + describeNode(diff.cur) + sim)
    elif diff.cost == 0:
        ret = describeNode(diff.orig) + " unchanged"
    else:
        ret = (describeNode(diff.orig) +
               " changed to " + describeNode(diff.cur) + sim)

    return ret



#-------------------------------------------------------------
#                     main HTML based command
#-------------------------------------------------------------

def diff(file1, file2, move=True):

    import time
    print("\nJob started at %s, %s\n" % (time.ctime(), time.tzname[0]))
    startTime = time.time()
    checkpoint(startTime)

    cleanUp()

    # base file names
    baseName1 = pyFileName(file1)
    baseName2 = pyFileName(file2)

    # get AST of file1
    f1 = open(file1, 'r')
    lines1 = f1.read()
    f1.close()
    node1 = parse(lines1)
    improveAST(node1, lines1, file1, 'left')

    # get AST of file2
    f2 = open(file2, 'r')
    lines2 = f2.read()
    f2.close()
    node2 = parse(lines2)
    improveAST(node2, lines2, file2, 'right')


    print("[parse] finished in %s. Now start to diff." % sec2min(checkpoint()))

    # get the changes

    (changes, cost) = diffNode(node1, node2, nil, nil, 0, False)

    print ("\n[diff] processed %d nodes in %s."
           % (stat.diffCount, sec2min(checkpoint())))

    if move:
        # print "changes:", changes
        (changes, cost) = closure((changes, cost))

        print("\n[closure] finished in %s."
              % sec2min(checkpoint()))



    #---------------------- print final stats ---------------------
    size1 = nodeSize(node1)
    size2 = nodeSize(node2)
    total = size1 + size2

    report = ""
    report += ("\n--------------------- summary -----------------------") + "\n"
    report += ("- total changes (chars): %d" % cost) + "\n"
    report += ("- total code size: %d (left: %d right: %d)"
               % (total, size1, size2)) + "\n"
    report += ("- total moved pieces: %d" % stat.moveCount) + "\n"
    report += ("- percentage of change: %.1f%%"
               % (div(cost, total) * 100)) + "\n"
    report += ("-----------------------------------------------------") + "\n"

    print report


    #---------------------- generation HTML ---------------------
    # write left file
    leftChanges = filterlist(lambda p: p.orig <> None, changes)
    html1 = genHTML(lines1, leftChanges, 'left')

    outname1 = baseName1 + '.html'
    outfile1 = open(outname1, 'w')
    outfile1.write(html1)
    outfile1.write('
')
    outfile1.write(report)
    outfile1.write('
')
    outfile1.write('\n')
    outfile1.write('\n')
    outfile1.close()


    # write right file
    rightChanges = filterlist(lambda p: p.cur <> None, changes)
    html2 = genHTML(lines2, rightChanges, 'right')

    outname2 = baseName2 + '.html'
    outfile2 = open(outname2, 'w')
    outfile2.write(html2)
    outfile2.write('
')
    outfile2.write(report)
    outfile2.write('
')
    outfile2.write('\n')
    outfile2.write('\n')
    outfile2.close()


    # write frame file
    framename = baseName1 + "-" + baseName2 + ".html"
    framefile = open(framename, 'w')
    framefile.write('\n')
    framefile.write('\n')
    framefile.write('\n')
    framefile.write('\n')
    framefile.close()

    dur = time.time() - startTime
    print("\n[summary] Job finished at %s, %s" %
          (time.ctime(), time.tzname[0]))
    print("\n\tTotal duration: %s" % sec2min(dur))



def cleanUp():
    clearStrDistCache()
    clearUID()

    global allNodes1, allNodes2
    allNodes1 = set()
    allNodes2 = set()

    stat.diffCount = 0
    stat.moveCount = 0
    stat.moveSavings = 0



def sec2min(s):
    if s < 60:
        return ("%.1f seconds" % s)
    else:
        return ("%.1f minutes" % div(s, 60))



lastCheckpoint = None
def checkpoint(init=None):
    import time
    global lastCheckpoint
    if init <> None:
        lastCheckpoint = init
        return None
    else:
        dur = time.time() - lastCheckpoint
        lastCheckpoint = time.time()
        return dur



#-------------------------------------------------------------
#                      text-based interfaces
#-------------------------------------------------------------

## text-based main command
def printDiff(file1, file2):
    (m, c) = diffFile(file1, file2)
    print "----------", file1, "<<<", c, ">>>", file2, "-----------"

    ms = pylist(m)
    ms = sorted(ms, key=lambda d: nodeStart(d.orig))
    print "\n-------------------- changes(", len(ms), ")---------------------- "
    for m0 in ms:
        print m0

    print "\n------------------- end ----------------------- "



def diffFile(file1, file2):
    node1 = parseFile(file1)
    node2 = parseFile(file2)
    return closure(diffNode(node1, node2, nil, nil, 0, False))



# printing support for debugging use
def iter_fields(node):
    """Iterate over all existing fields, excluding 'ctx'."""
    for field in node._fields:
        try:
            if field <> 'ctx':
                yield field, getattr(node, field)
        except AttributeError:
            pass


def dump(node, annotate_fields=True, include_attributes=False):
    def _format(node):
        if isinstance(node, AST):
            fields = [(a, _format(b)) for a, b in iter_fields(node)]
            rv = '%s(%s' % (node.__class__.__name__, ', '.join(
                ('%s=%s' % field for field in fields)
                if annotate_fields else
                (b for a, b in fields)
            ))
            if include_attributes and node._attributes:
                rv += fields and ', ' or ' '
                rv += ', '.join('%s=%s' % (a, _format(getattr(node, a)))
                                for a in node._attributes)
            return rv + ')'
        elif isinstance(node, list):
            return '[%s]' % ', '.join(_format(x) for x in node)
        return repr(node)
    if not isinstance(node, AST):
        raise TypeError('expected AST, got %r' % node.__class__.__name__)
    return _format(node)

def printList(ls):
    if (ls == None or ls == []):
        return ""
    elif (len(ls) == 1):
        return str(ls[0])
    else:
        return str(ls)



# for debugging use
def printAst(node):
    if (IS(node, Module)):
        ret = "module:" + str(node.body)
    elif (IS(node, Name)):
        ret = str(node.id)
    elif (IS(node, Attribute)):
        if hasattr(node, 'attrName'):
            ret = str(node.value) + "." + str(node.attrName)
        else:
            ret = str(node.value) + "." + str(node.attr)
    elif (IS(node, FunctionDef)):
        if hasattr(node, 'nameName'):
            ret = "fun:" + str(node.nameName)
        else:
            ret = "fun:" + str(node.name)
    elif (IS(node, ClassDef)):
        ret = "class:" + str(node.name)
    elif (IS(node, Call)):
        ret = "call:" + str(node.func) + ":(" + printList(node.args) + ")"
    elif (IS(node, Assign)):
        ret = "(" + printList(node.targets) + " <- " + printAst(node.value) + ")"
    elif (IS(node, If)):
        ret = "if " + str(node.test) + ":" + printList(node.body) + ":" + printList(node.orelse)
    elif (IS(node, Compare)):
        ret = str(node.left) + ":" + printList(node.ops) + ":" + printList(node.comparators)
    elif (IS(node, Return)):
        ret = "return " + repr(node.value)
    elif (IS(node, Print)):
        ret = "print(" + (str(node.dest) + ", " if (node.dest!=None) else "") + printList(node.values) + ")"
    elif (IS(node, Expr)):
        ret = "expr:" + str(node.value)
    elif (IS(node, Num)):
        ret = "num:" + str(node.n)
    elif (IS(node, Str)):
        ret = 'str:"' + str(node.s) + '"'
    elif (IS(node, BinOp)):
        ret = str(node.left) + " " + str(node.op) + " " + str(node.right)
    elif (IS(node, Add)):
        ret = '+'
    elif (IS(node, Mult)):
        ret = '*'
    elif IS(node, NotEq):
        ret = '<>'
    elif (IS(node, Eq)):
        ret = '=='
    elif (IS(node, Pass)):
        ret = "pass"
    elif IS(node,list):
        ret = printList(node)
    else:
        ret = str(type(node))

    if hasattr(node, 'lineno'):
        return re.sub("@[0-9]+", '', ret) + "@" + str(node.lineno)
    elif hasattr(node, 'nodeStart'):
        return re.sub("@[0-9]+", '', ret) + "%" + str(nodeStart(node))
    else:
        return ret


def installPrinter():
    import inspect, ast
    for name, obj in inspect.getmembers(ast):
        if (inspect.isclass(obj) and not (obj == AST)):
1984 | obj.__repr__ = printAst 1985 | 1986 | installPrinter() 1987 | 1988 | # demo 1989 | # diff('demos/demo1.py', 'demos/demo2.py') 1990 | -------------------------------------------------------------------------------- /demos/psydiff2.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import sys 4 | import time 5 | import cProfile 6 | 7 | from ast import * 8 | 9 | from parameters import * 10 | from improve_ast import * 11 | from htmlize import * 12 | from utils import * 13 | 14 | 15 | 16 | #------------------------------- types ------------------------------ 17 | class Stat: 18 | "storage for stat counters" 19 | def __init__(self): 20 | self.reset() 21 | 22 | def reset(self): 23 | self.diff_count = 0 24 | self.move_count = 0 25 | self.move_savings = 0 26 | 27 | def add_moves(self, nterms): 28 | self.move_savings += nterms 29 | self.move_count += 1 30 | if self.move_count % 1000 == 0: 31 | dot() 32 | def add_diff(self): 33 | self.diff_count += 1 34 | if self.diff_count % 1000 == 0: 35 | dot() 36 | 37 | stat = Stat() 38 | 39 | 40 | 41 | # The difference between two nodes is stored as a Change structure.
42 | class Change: 43 | def __init__(self, orig, cur, cost, is_frame=False): 44 | self.orig = orig 45 | self.cur = cur 46 | if orig == None: 47 | self.cost = node_size(cur) 48 | elif cur == None: 49 | self.cost = node_size(orig) 50 | elif cost == 'all': 51 | self.cost = node_size(orig) + node_size(cur) 52 | else: 53 | self.cost = cost 54 | self.is_frame = is_frame 55 | def __repr__(self): 56 | fr = "F" if self.is_frame else "-" 57 | def hole(x): 58 | return [] if x==None else x 59 | return ("(C:" + str(hole(self.orig)) + ":" + str(hole(self.cur)) 60 | + ":" + str(self.cost) + ":" + str(self.similarity()) 61 | + ":" + fr + ")") 62 | def similarity(self): 63 | total = node_size(self.orig) + node_size(self.cur) 64 | return 1 - div(self.cost, total) 65 | 66 | 67 | 68 | # Three major kinds of changes: 69 | # * modification 70 | # * deletion 71 | # * insertion 72 | def mod_node(node1, node2, cost): 73 | return [Change(node1, node2, cost)] 74 | 75 | def del_node(node): 76 | return [Change(node, None, node_size(node))] 77 | 78 | def ins_node(node): 79 | return [Change(None, node, node_size(node))] 80 | 81 | 82 | # 2-D array table for memoization of dynamic programming 83 | def create_table(x, y): 84 | table = [] 85 | for i in xrange(x+1): 86 | table.append([None] * (y+1)) 87 | return table 88 | 89 | def table_lookup(t, x, y): 90 | return t[x][y] 91 | 92 | def table_put(t, x, y, v): 93 | t[x][y] = v 94 | 95 | 96 | 97 | 98 | 99 | #------------------------------------------------------------- 100 | # string distance function 101 | #------------------------------------------------------------- 102 | 103 | ### diff cache for AST nodes 104 | str_dist_cache = {} 105 | 106 | 107 | ### string distance function 108 | def str_dist(s1, s2): 109 | cached = str_dist_cache.get((s1, s2)) 110 | if cached is not None: 111 | return cached 112 | 113 | if len(s1) > 100 or len(s2) > 100: 114 | if s1 != s2: 115 | return 2.0 116 | else: 117 | return 0 118 | 119 | table = create_table(len(s1), 
len(s2)) 120 | d = dist1(table, s1, s2) 121 | ret = div(2*d, len(s1) + len(s2)) 122 | 123 | str_dist_cache[(s1, s2)]=ret 124 | return ret 125 | 126 | 127 | # the main dynamic programming part 128 | # similar to the structure of diff_list 129 | def dist1(table, s1, s2): 130 | def memo(v): 131 | table_put(table, len(s1), len(s2), v) 132 | return v 133 | 134 | cached = table_lookup(table, len(s1), len(s2)) 135 | if cached is not None: 136 | return cached 137 | 138 | if s1 == '': 139 | return memo(len(s2)) 140 | elif s2 == '': 141 | return memo(len(s1)) 142 | else: 143 | if s1[0] == s2[0]: 144 | d0 = 0 145 | elif s1[0].lower() == s2[0].lower(): 146 | d0 = 1 147 | else: 148 | d0 = 2 149 | 150 | d0 = d0 + dist1(table, s1[1:], s2[1:]) 151 | d1 = 1 + dist1(table, s1[1:], s2) 152 | d2 = 1 + dist1(table, s1, s2[1:]) 153 | return memo(min(d0, d1, d2)) 154 | 155 | 156 | 157 | 158 | #------------------------------------------------------------- 159 | # diff of nodes 160 | #------------------------------------------------------------- 161 | 162 | def diff_node(node1, node2, depth, move): 163 | 164 | # try substructural diff 165 | def trysub((changes, cost)): 166 | if not move: 167 | return (changes, cost) 168 | elif can_move(node1, node2, cost): 169 | return (changes, cost) 170 | else: 171 | mc1 = diff_subnode(node1, node2, depth, move) 172 | if mc1 is not None: 173 | return mc1 174 | else: 175 | return (changes, cost) 176 | 177 | if isinstance(node1, list) and not isinstance(node2, list): 178 | node2 = [node2] 179 | 180 | if not isinstance(node1, list) and isinstance(node2, list): 181 | node1 = [node1] 182 | 183 | if isinstance(node1, list) and isinstance(node2, list): 184 | node1 = serialize_if(node1) 185 | node2 = serialize_if(node2) 186 | table = create_table(len(node1), len(node2)) 187 | return diff_list(table, node1, node2, 0, move) 188 | 189 | # statistics 190 | stat.add_diff() 191 | 192 | if node1 == node2: 193 | return (mod_node(node1, node2, 0), 0) 194 | 195 | if 
isinstance(node1, Num) and isinstance(node2, Num): 196 | if node1.n == node2.n: 197 | return (mod_node(node1, node2, 0), 0) 198 | else: 199 | return (mod_node(node1, node2, 1), 1) 200 | 201 | if isinstance(node1, Str) and isinstance(node2, Str): 202 | cost = str_dist(node1.s, node2.s) 203 | return (mod_node(node1, node2, cost), cost) 204 | 205 | if (isinstance(node1, Name) and isinstance(node2, Name)): 206 | cost = str_dist(node1.id, node2.id) 207 | return (mod_node(node1, node2, cost), cost) 208 | 209 | if (isinstance(node1, Attribute) and isinstance(node2, Name) or 210 | isinstance(node1, Name) and isinstance(node2, Attribute) or 211 | isinstance(node1, Attribute) and isinstance(node2, Attribute)): 212 | s1 = attr_to_str(node1) 213 | s2 = attr_to_str(node2) 214 | if s1 <> None and s2 <> None: 215 | cost = str_dist(s1, s2) 216 | return (mod_node(node1, node2, cost), cost) 217 | # else fall through for things like f(x).y vs x.y 218 | 219 | if isinstance(node1, Module) and isinstance(node2, Module): 220 | return diff_node(node1.body, node2.body, depth, move) 221 | 222 | # other AST nodes 223 | if (isinstance(node1, AST) and isinstance(node2, AST) and 224 | type(node1) == type(node2)): 225 | 226 | fs1 = node_fields(node1) 227 | fs2 = node_fields(node2) 228 | changes, cost = [], 0 229 | 230 | for i in xrange(len(fs1)): 231 | (m, c) = diff_node(fs1[i], fs2[i], depth, move) 232 | changes = m + changes 233 | cost += c 234 | 235 | return trysub((changes, cost)) 236 | 237 | if (type(node1) == type(node2) and 238 | is_empty_container(node1) and is_empty_container(node2)): 239 | return (mod_node(node1, node2, 0), 0) 240 | 241 | # all unmatched types and unequal values 242 | return trysub((del_node(node1) + ins_node(node2), 243 | node_size(node1) + node_size(node2))) 244 | 245 | 246 | 247 | ########################## diff of a list ########################## 248 | 249 | # diff_list is the main part of dynamic programming 250 | 251 | def diff_list(table, ls1, ls2, depth, 
move): 252 | 253 | def memo(v): 254 | table_put(table, len(ls1), len(ls2), v) 255 | return v 256 | 257 | def guess(table, ls1, ls2): 258 | (m0, c0) = diff_node(ls1[0], ls2[0], depth, move) 259 | (m1, c1) = diff_list(table, ls1[1:], ls2[1:], depth, move) 260 | cost1 = c1 + c0 261 | 262 | if ((is_frame(ls1[0]) and 263 | is_frame(ls2[0]) and 264 | not nodeFramed(ls1[0], m0) and 265 | not nodeFramed(ls2[0], m0))): 266 | frame_change = mod_node(ls1[0], ls2[0], c0) 267 | else: 268 | frame_change = [] 269 | 270 | # short cut 1 (func and classes with same names) 271 | if can_move(ls1[0], ls2[0], c0): 272 | return (frame_change + m0 + m1, cost1) 273 | 274 | else: # do more work 275 | (m2, c2) = diff_list(table, ls1[1:], ls2, depth, move) 276 | (m3, c3) = diff_list(table, ls1, ls2[1:], depth, move) 277 | cost2 = c2 + node_size(ls1[0]) 278 | cost3 = c3 + node_size(ls2[0]) 279 | 280 | if (not different_def(ls1[0], ls2[0]) and 281 | cost1 <= cost2 and cost1 <= cost3): 282 | return (frame_change + m0 + m1, cost1) 283 | elif (cost2 <= cost3): 284 | return (del_node(ls1[0]) + m2, cost2) 285 | else: 286 | return (ins_node(ls2[0]) + m3, cost3) 287 | 288 | # cache look up 289 | cached = table_lookup(table, len(ls1), len(ls2)) 290 | if cached is not None: 291 | return cached 292 | 293 | if (ls1 == [] and ls2 == []): 294 | return memo(([], 0)) 295 | 296 | elif (ls1 <> [] and ls2 <> []): 297 | return memo(guess(table, ls1, ls2)) 298 | 299 | elif ls1 == []: 300 | d = [] 301 | for n in ls2: 302 | d = ins_node(n) + d 303 | return memo((d, node_size(ls2))) 304 | 305 | else: # ls2 == []: 306 | d = [] 307 | for n in ls1: 308 | d = del_node(n) + d 309 | return memo((d, node_size(ls1))) 310 | 311 | 312 | 313 | 314 | ###################### diff into a subnode ####################### 315 | 316 | # Subnode diff is only used in the moving phase. 
There is no 317 | # need to compare the substructure of two nodes in the first 318 | # run, because they will be reconsidered if we just consider 319 | # them to be complete deletion and insertions. 320 | 321 | def diff_subnode(node1, node2, depth, move): 322 | 323 | if (depth >= FRAME_DEPTH or 324 | node_size(node1) < FRAME_SIZE or 325 | node_size(node2) < FRAME_SIZE): 326 | return None 327 | 328 | if isinstance(node1, AST) and isinstance(node2, AST): 329 | 330 | if node_size(node1) == node_size(node2): 331 | return None 332 | 333 | if isinstance(node1, Expr): 334 | node1 = node1.value 335 | 336 | if isinstance(node2, Expr): 337 | node2 = node2.value 338 | 339 | if (node_size(node1) < node_size(node2)): 340 | for f in node_fields(node2): 341 | (m0, c0) = diff_node(node1, f, depth+1, move) 342 | if can_move(node1, f, c0): 343 | if not isinstance(f, list): 344 | m1 = mod_node(node1, f, c0) 345 | else: 346 | m1 = [] 347 | framecost = node_size(node2) - node_size(node1) 348 | m2 = [Change(None, node2, framecost, True)] 349 | return (m2 + m1 + m0, c0 + framecost) 350 | 351 | if (node_size(node1) > node_size(node2)): 352 | for f in node_fields(node1): 353 | (m0, c0) = diff_node(f, node2, depth+1, move) 354 | if can_move(f, node2, c0): 355 | framecost = node_size(node1) - node_size(node2) 356 | if not isinstance(f, list): 357 | m1 = mod_node(f, node2, c0) 358 | else: 359 | m1 = [] 360 | m2 = [Change(node1, None, framecost, True)] 361 | return (m2 + m1 + m0, c0 + framecost) 362 | 363 | return None 364 | 365 | 366 | 367 | 368 | 369 | ########################################################################## 370 | ## move detection 371 | ########################################################################## 372 | def move_candidate(node): 373 | return (is_def(node) or node_size(node) >= MOVE_SIZE) 374 | 375 | 376 | def get_moves(ds, round=0): 377 | 378 | dels = filter(lambda p: (p.cur == None and 379 | move_candidate(p.orig) and 380 | not p.is_frame), 381 | ds) 382 | 
adds = filter(lambda p: (p.orig == None and 383 | move_candidate(p.cur) and 384 | not p.is_frame), 385 | ds) 386 | 387 | # print "dels=", dels 388 | # print "adds=", adds 389 | 390 | matched = [] 391 | newChanges, total = [], 0 392 | 393 | print("\n[move #%d] %d * %d = %d pairs of nodes to consider ..." 394 | % (round, len(dels), len(adds), len(dels) * len(adds))) 395 | 396 | for d0 in dels: 397 | for a0 in adds: 398 | (node1, node2) = (d0.orig, a0.cur) 399 | (changes, cost) = diff_node(node1, node2, 0, True) 400 | nterms = node_size(node1) + node_size(node2) 401 | 402 | if (can_move(node1, node2, cost) or 403 | nodeFramed(node1, changes) or 404 | nodeFramed(node2, changes)): 405 | 406 | matched.append(d0) 407 | matched.append(a0) 408 | adds.remove(a0) 409 | newChanges = changes + newChanges 410 | total += cost 411 | 412 | if (not nodeFramed(node1, changes) and 413 | not nodeFramed(node2, changes) and 414 | is_def(node1) and is_def(node2)): 415 | newChanges = mod_node(node1, node2, cost) + newChanges 416 | stat.add_moves(nterms) 417 | break 418 | 419 | print("\n\t%d matched pairs found with %d new changes." 420 | % (len(matched), len(newChanges))) 421 | 422 | # print "matches=", matched 423 | # print "newChanges=", newChanges 424 | 425 | return (matched, newChanges, total) 426 | 427 | 428 | 429 | # Get moves repeatedly because new moves may introduce new 430 | # deletions and insertions. 
431 | 432 | def find_all_moves(res): 433 | (changes, cost) = res 434 | matched = None 435 | moveround = 1 436 | 437 | while moveround <= MOVE_ROUND and matched <> []: 438 | (matched, newChanges, c) = get_moves(changes, moveround) 439 | moveround += 1 440 | # print "matched:", matched 441 | # print "changes:", changes 442 | changes = filter(lambda c: c not in matched, changes) 443 | changes = newChanges + changes 444 | savings = sum(map(lambda p: node_size(p.orig) + node_size(p.cur), matched)) 445 | cost = cost + c - savings 446 | return (changes, cost) 447 | 448 | 449 | 450 | 451 | 452 | 453 | 454 | 455 | #------------------------------------------------------------- 456 | # main HTML based command 457 | #------------------------------------------------------------- 458 | 459 | def diff(file1, file2, move=True): 460 | 461 | print("\nJob started at %s, %s\n" % (time.ctime(), time.tzname[0])) 462 | start_time = time.time() 463 | checkpoint(start_time) 464 | 465 | cleanup() 466 | 467 | # base files names 468 | base1 = base_name(file1) 469 | base2 = base_name(file2) 470 | 471 | # get AST of file1 472 | f1 = open(file1, 'r'); 473 | lines1 = f1.read() 474 | f1.close() 475 | node1 = parse(lines1) 476 | improve_ast(node1, lines1, file1, 'left') 477 | 478 | # get AST of file2 479 | f2 = open(file2, 'r'); 480 | lines2 = f2.read() 481 | f2.close() 482 | node2 = parse(lines2) 483 | improve_ast(node2, lines2, file2, 'right') 484 | 485 | 486 | print("[parse] finished in %s. Now start to diff." % sec_to_min(checkpoint())) 487 | 488 | # get the changes 489 | 490 | (changes, cost) = diff_node(node1, node2, 0, False) 491 | 492 | print ("\n[diff] processed %d nodes in %s." 493 | % (stat.diff_count, sec_to_min(checkpoint()))) 494 | 495 | if move: 496 | (changes, cost) = find_all_moves((changes, cost)) 497 | 498 | print("\nfinished in %s." 
% sec_to_min(checkpoint())) 499 | 500 | 501 | 502 | #---------------------- print final stats --------------------- 503 | size1 = node_size(node1) 504 | size2 = node_size(node2) 505 | total = size1 + size2 506 | 507 | report = "" 508 | report += ("\n--------------------- summary -----------------------") + "\n" 509 | report += ("- total changes (chars): %d" % cost) + "\n" 510 | report += ("- total code size: %d (left: %d right: %d)" 511 | % (total, size1, size2)) + "\n" 512 | report += ("- total moved pieces: %d" % stat.move_count) + "\n" 513 | report += ("- percentage of change: %.1f%%" 514 | % (div(cost, total) * 100)) + "\n" 515 | report += ("-----------------------------------------------------") + "\n" 516 | 517 | print report 518 | 519 | 520 | #---------------------- generation HTML --------------------- 521 | 522 | htmlize(changes, file1, file2, lines1, lines2) 523 | 524 | dur = time.time() - start_time 525 | print("\n[summary] Job finished at %s, %s" % 526 | (time.ctime(), time.tzname[0])) 527 | print("\n\tTotal duration: %s" % sec_to_min(dur)) 528 | 529 | 530 | 531 | 532 | def cleanup(): 533 | str_dist_cache.clear() 534 | clear_uid() 535 | 536 | global allNodes1, allNodes2 537 | allNodes1 = set() 538 | allNodes2 = set() 539 | 540 | stat.reset() 541 | 542 | 543 | 544 | def sec_to_min(s): 545 | if s < 60: 546 | return ("%.1f seconds" % s) 547 | else: 548 | return ("%.1f minutes" % div(s, 60)) 549 | 550 | 551 | 552 | last_checkpoint = None 553 | def checkpoint(init=None): 554 | import time 555 | global last_checkpoint 556 | if init <> None: 557 | last_checkpoint = init 558 | return None 559 | else: 560 | dur = time.time() - last_checkpoint 561 | last_checkpoint = time.time() 562 | return dur 563 | 564 | 565 | 566 | 567 | #------------------------------------------------------------- 568 | # text-based interfaces 569 | #------------------------------------------------------------- 570 | 571 | ## print the diffs as text 572 | def print_diff(file1, file2): 573 | 
(m, c) = diff_file(file1, file2) 574 | print "----------", file1, "<<<", c, ">>>", file2, "-----------" 575 | 576 | ms = m 577 | ms = sorted(ms, key=lambda d: node_start(d.orig)) 578 | print "\n-------------------- changes(", len(ms), ")---------------------- " 579 | for m0 in ms: 580 | print m0 581 | 582 | print "\n------------------- end ----------------------- " 583 | 584 | 585 | 586 | 587 | def diff_file(file1, file2): 588 | node1 = parse_file(file1) 589 | node2 = parse_file(file2) 590 | return find_all_moves(diff_node(node1, node2, 0, False)) 591 | 592 | 593 | ## if run under command line 594 | ## pydiff.py file1.py file2.py 595 | if len(sys.argv) == 3: 596 | file1 = sys.argv[1] 597 | file2 = sys.argv[2] 598 | diff(file1, file2) 599 | -------------------------------------------------------------------------------- /diff.css: -------------------------------------------------------------------------------- 1 | .d { /* deleted */ 2 | border: solid 1px #CC929A; 3 | border-radius: 3px; 4 | background-color: #FCBFBA; 5 | } 6 | 7 | .i { /* inserted */ 8 | border: solid 1px #73BE73; 9 | border-radius: 3px; 10 | background-color: #98FB98; 11 | } 12 | 13 | .c { /* changed */ 14 | border: solid 1px #8AADB8; 15 | background-color: LightBlue; 16 | border-radius: 3px; 17 | cursor: pointer; 18 | } 19 | 20 | .m { /* moved */ 21 | border: solid 1px #A9A9A9; 22 | border-radius: 3px; 23 | cursor: crosshair; 24 | } 25 | 26 | .mc { 27 | border: solid 1px LightPink; 28 | background-color: LightBlue; 29 | cursor: pointer; 30 | } 31 | 32 | .u { /* unchanged */ 33 | border: solid 1px #A9A9A9; 34 | border-radius: 4px; 35 | cursor: crosshair; 36 | } 37 | 38 | span.lineno { 39 | color: lightgrey; 40 | -webkit-user-select: none; 41 | -moz-user-select: none; 42 | } 43 | 44 | span.keyword { 45 | /* color: #007070; */ 46 | font-weight: 700; 47 | } 48 | 49 | div.line { 50 | } 51 | 52 | div.src { 53 | width:48%; 54 | height:98%; 55 | overflow:scroll; 56 | float:left; 57 | padding:0.5%; 58 | 
border: solid 2px LightGrey; 59 | border-radius: 5px; 60 | } 61 | 62 | 63 | div.stats { 64 | border: solid 1px grey; 65 | z-index: 1000; 66 | width: 80%; 67 | padding-left: 5%; 68 | } 69 | 70 | pre.stats { 71 | color: grey; 72 | -webkit-user-select: none; 73 | -moz-user-select: none; 74 | } 75 | 76 | pre { 77 | line-height: 200%; 78 | } 79 | 80 | p { 81 | line-height: 200%; 82 | } 83 | 84 | ::-webkit-scrollbar { 85 | width: 10px; 86 | } 87 | 88 | ::-webkit-scrollbar-track { 89 | -webkit-box-shadow: inset 0 0 6px rgba(0,0,0,0.3); 90 | border-radius: 10px; 91 | } 92 | 93 | ::-webkit-scrollbar-thumb { 94 | border-radius: 10px; 95 | -webkit-box-shadow: inset 0 0 6px rgba(0,0,0,0.5); 96 | } 97 | -------------------------------------------------------------------------------- /htmlize.py: -------------------------------------------------------------------------------- 1 | #------------------------------------------------------------- 2 | # HTML generation 3 | #------------------------------------------------------------- 4 | 5 | import os 6 | 7 | from parameters import * 8 | from ast import * 9 | from utils import * 10 | 11 | 12 | #-------------------- types and utilities ---------------------- 13 | 14 | class Tag: 15 | def __init__(self, tag, idx, start=-1): 16 | self.tag = tag 17 | self.idx = idx 18 | self.start = start 19 | def __repr__(self): 20 | return "tag:" + str(self.tag) + ":" + str(self.idx) 21 | 22 | 23 | 24 | # escape for HTML 25 | def escape(s): 26 | s = s.replace('"', '&quot;') 27 | s = s.replace("'", '&#39;') 28 | s = s.replace("<", '&lt;') 29 | s = s.replace(">", '&gt;') 30 | return s 31 | 32 | 33 | 34 | uid_count = -1 35 | uid_hash = {} 36 | def clear_uid(): 37 | global uid_count, uid_hash 38 | uid_count = -1 39 | uid_hash = {} 40 | 41 | 42 | def uid(node): 43 | if node in uid_hash: 44 | return uid_hash[node] 45 | 46 | global uid_count 47 | uid_count += 1 48 | uid_hash[node] = str(uid_count) 49 | return str(uid_count) 50 | 51 | 52 | def html_header(): 53 | 54 |
install_path = get_install_path() 55 | 56 | js_filename = ''.join([install_path, 'nav.js']) 57 | js_file = open(js_filename, 'r') 58 | js_text = js_file.read() 59 | js_file.close() 60 | 61 | css_filename = ''.join([install_path, 'diff.css']) 62 | css_file = open(css_filename, 'r') 63 | css_text = css_file.read() 64 | css_file.close() 65 | 66 | out = [] 67 | out.append('\n') 68 | out.append('\n') 69 | out.append('\n') 70 | 71 | out.append('\n') 74 | 75 | out.append('\n') 78 | 79 | out.append('\n') 80 | out.append('\n') 81 | return ''.join(out) 82 | 83 | 84 | def html_footer(): 85 | out = [] 86 | out.append('\n') 87 | out.append('\n') 88 | return ''.join(out) 89 | 90 | 91 | def write_html(text, side): 92 | out = [] 93 | out.append('
') 94 | out.append('
')
 95 |     if side == 'left':
 96 |         out.append('')
 97 |     else:
 98 |         out.append('')
 99 | 
100 |     out.append(text)
101 |     out.append('
') 102 | out.append('
') 103 | return ''.join(out) 104 | 105 | 106 | def htmlize(changes, file1, file2, text1, text2): 107 | tags1 = change_tags(changes, 'left') 108 | tags2 = change_tags(changes, 'right') 109 | tagged_text1 = apply_tags(text1, tags1) 110 | tagged_text2 = apply_tags(text2, tags2) 111 | 112 | outname = base_name(file1) + '-' + base_name(file2) + '.html' 113 | outfile = open(outname, 'w') 114 | outfile.write(html_header()) 115 | outfile.write(write_html(tagged_text1, 'left')) 116 | outfile.write(write_html(tagged_text2, 'right')) 117 | outfile.write(html_footer()) 118 | outfile.close() 119 | 120 | 121 | 122 | # put the tags generated by change_tags into the text and create HTML 123 | def apply_tags(s, tags): 124 | tags = sorted(tags, key = lambda t: (t.idx, -t.start)) 125 | curr = 0 126 | out = [] 127 | for t in tags: 128 | while curr < t.idx and curr < len(s): 129 | out.append(escape(s[curr])) 130 | curr += 1 131 | out.append(t.tag) 132 | 133 | while curr < len(s): 134 | out.append(escape(s[curr])) 135 | curr += 1 136 | return ''.join(out) 137 | 138 | 139 | 140 | 141 | #--------------------- tag generation functions ---------------------- 142 | 143 | def change_tags(changes, side): 144 | tags = [] 145 | for c in changes: 146 | key = c.orig if side == 'left' else c.cur 147 | if hasattr(key, 'lineno'): 148 | start = node_start(key) 149 | end = node_end(key) 150 | 151 | if c.orig != None and c.cur != None: 152 | # for change and move 153 | tags.append(Tag(link_start(c, side), start)) 154 | tags.append(Tag("", end, start)) 155 | else: 156 | # for deletion and insertion 157 | tags.append(Tag(span_start(c), start)) 158 | tags.append(Tag('', end, start)) 159 | 160 | return tags 161 | 162 | 163 | def change_class(change): 164 | if (change.cur == None): 165 | return 'd' 166 | elif (change.orig == None): 167 | return 'i' 168 | elif (change.cost > 0): 169 | return 'c' 170 | else: 171 | return 'u' 172 | 173 | 174 | def span_start(change): 175 | return '' 176 | 177 | 178 | def 
link_start(change, side): 179 | cls = change_class(change) 180 | 181 | if side == 'left': 182 | me, other = change.orig, change.cur 183 | else: 184 | me, other = change.cur, change.orig 185 | 186 | return ('') 190 | 191 | def qs(s): 192 | return "'" + s + "'" 193 | -------------------------------------------------------------------------------- /improve_ast.py: -------------------------------------------------------------------------------- 1 | #------------------------------------------------------------- 2 | # improvements to the AST 3 | #------------------------------------------------------------- 4 | 5 | import sys 6 | 7 | from ast import * 8 | from utils import * 9 | from parameters import * 10 | 11 | # Is it Python 3? 12 | python3 = (sys.version_info.major == 3) 13 | 14 | allNodes1 = set() 15 | allNodes2 = set() 16 | 17 | def improve_node(node, s, idxmap, filename, side): 18 | 19 | if isinstance(node, list): 20 | for n in node: 21 | improve_node(n, s, idxmap, filename, side) 22 | 23 | elif isinstance(node, AST): 24 | 25 | if side == 'left': 26 | allNodes1.add(node) 27 | else: 28 | allNodes2.add(node) 29 | 30 | find_node_start(node, s, idxmap) 31 | find_node_end(node, s, idxmap) 32 | add_missing_names(node, s, idxmap) 33 | 34 | node.node_source = s 35 | node.fileName = filename 36 | 37 | for f in node_fields(node): 38 | improve_node(f, s, idxmap, filename, side) 39 | 40 | 41 | 42 | def improve_ast(node, s, filename, side): 43 | idxmap = build_index_map(s) 44 | improve_node(node, s, idxmap, filename, side) 45 | 46 | 47 | 48 | 49 | #------------------------------------------------------------- 50 | # finding start and end index of nodes 51 | #------------------------------------------------------------- 52 | 53 | def find_node_start(node, s, idxmap): 54 | ret = None # default value 55 | 56 | if hasattr(node, 'node_start'): 57 | ret = node.node_start 58 | 59 | elif isinstance(node, list): 60 | if node != []: 61 | ret = find_node_start(node[0], s, idxmap) 62 | 63 
| elif isinstance(node, Module): 64 | if node.body != []: 65 | ret = find_node_start(node.body[0], s, idxmap) 66 | 67 | elif isinstance(node, BinOp): 68 | leftstart = find_node_start(node.left, s, idxmap) 69 | if leftstart != None: 70 | ret = leftstart 71 | else: 72 | ret = map_idx(idxmap, node.lineno, node.col_offset) 73 | 74 | elif hasattr(node, 'lineno'): 75 | if node.col_offset >= 0: 76 | ret = map_idx(idxmap, node.lineno, node.col_offset) 77 | else: # special case for """ strings 78 | i = map_idx(idxmap, node.lineno, node.col_offset) 79 | while i > 0 and i+2 < len(s) and s[i:i+3] != '"""': 80 | i -= 1 81 | ret = i 82 | else: 83 | return None 84 | 85 | if ret == None and hasattr(node, 'lineno'): 86 | raise TypeError("got None for node that has lineno", node) 87 | 88 | if isinstance(node, AST) and ret != None: 89 | node.node_start = ret 90 | 91 | return ret 92 | 93 | 94 | 95 | 96 | def find_node_end(node, s, idxmap): 97 | 98 | the_end = None 99 | 100 | if hasattr(node, 'node_end'): 101 | return node.node_end 102 | 103 | elif isinstance(node, list): 104 | if node != []: 105 | the_end = find_node_end(node[-1], s, idxmap) 106 | 107 | elif isinstance(node, Module): 108 | if node.body != []: 109 | the_end = find_node_end(node.body[-1], s, idxmap) 110 | 111 | elif isinstance(node, Expr): 112 | the_end = find_node_end(node.value, s, idxmap) 113 | 114 | elif isinstance(node, Str): 115 | i = find_node_start(node, s, idxmap) 116 | if i+2 < len(s) and s[i:i+3] == '"""': 117 | q = '"""' 118 | i += 3 119 | elif s[i] == '"': 120 | q = '"' 121 | i += 1 122 | elif s[i] == "'": 123 | q = "'" 124 | i += 1 125 | else: 126 | # print "illegal:", i, s[i] 127 | q = '' 128 | 129 | if q != '': 130 | the_end = end_seq(s, q, i) 131 | 132 | elif isinstance(node, Name): 133 | the_end = find_node_start(node, s, idxmap) + len(node.id) 134 | 135 | elif isinstance(node, Attribute): 136 | the_end = end_seq(s, node.attr, find_node_end(node.value, s, idxmap)) 137 | 138 | elif isinstance(node, 
FunctionDef): 139 | the_end = find_node_end(node.body, s, idxmap) 140 | 141 | elif isinstance(node, Lambda): 142 | the_end = find_node_end(node.body, s, idxmap) 143 | 144 | elif isinstance(node, ClassDef): 145 | the_end = find_node_end(node.body, s, idxmap) 146 | 147 | # print will be a Call in Python 3 148 | elif not python3 and isinstance(node, Print): 149 | the_end = start_seq(s, '\n', find_node_start(node, s, idxmap)) 150 | 151 | elif isinstance(node, Call): 152 | start = find_node_end(node.func, s, idxmap) 153 | if start != None: 154 | the_end = match_paren(s, '(', ')', start) 155 | 156 | elif isinstance(node, Yield): 157 | the_end = find_node_end(node.value, s, idxmap) 158 | 159 | elif isinstance(node, Return): 160 | if node.value != None: 161 | the_end = find_node_end(node.value, s, idxmap) 162 | else: 163 | the_end = find_node_start(node, s, idxmap) + len('return') 164 | 165 | elif (isinstance(node, For) or 166 | isinstance(node, While) or 167 | isinstance(node, If) or 168 | isinstance(node, IfExp)): 169 | if node.orelse != []: 170 | the_end = find_node_end(node.orelse, s, idxmap) 171 | else: 172 | the_end = find_node_end(node.body, s, idxmap) 173 | 174 | elif isinstance(node, Assign) or isinstance(node, AugAssign): 175 | the_end = find_node_end(node.value, s, idxmap) 176 | 177 | elif isinstance(node, BinOp): 178 | the_end = find_node_end(node.right, s, idxmap) 179 | 180 | elif isinstance(node, BoolOp): 181 | the_end = find_node_end(node.values[-1], s, idxmap) 182 | 183 | elif isinstance(node, Compare): 184 | the_end = find_node_end(node.comparators[-1], s, idxmap) 185 | 186 | elif isinstance(node, UnaryOp): 187 | the_end = find_node_end(node.operand, s, idxmap) 188 | 189 | elif isinstance(node, Num): 190 | the_end = find_node_start(node, s, idxmap) + len(str(node.n)) 191 | 192 | elif isinstance(node, List): 193 | the_end = match_paren(s, '[', ']', find_node_start(node, s, idxmap)); 194 | 195 | elif isinstance(node, Subscript): 196 | the_end = 
match_paren(s, '[', ']', find_node_start(node, s, idxmap)); 197 | 198 | elif isinstance(node, Tuple): 199 | if node.elts != []: 200 | the_end = find_node_end(node.elts[-1], s, idxmap) 201 | 202 | elif isinstance(node, Dict): 203 | the_end = match_paren(s, '{', '}', find_node_start(node, s, idxmap)); 204 | 205 | elif ((not python3 and isinstance(node, TryExcept)) or 206 | (python3 and isinstance(node, Try))): 207 | if node.orelse != []: 208 | the_end = find_node_end(node.orelse, s, idxmap) 209 | elif node.handlers != []: 210 | the_end = find_node_end(node.handlers, s, idxmap) 211 | else: 212 | the_end = find_node_end(node.body, s, idxmap) 213 | 214 | elif isinstance(node, ExceptHandler): 215 | the_end = find_node_end(node.body, s, idxmap) 216 | 217 | elif isinstance(node, Pass): 218 | the_end = find_node_start(node, s, idxmap) + len('pass') 219 | 220 | elif isinstance(node, Break): 221 | the_end = find_node_start(node, s, idxmap) + len('break') 222 | 223 | elif isinstance(node, Continue): 224 | the_end = find_node_start(node, s, idxmap) + len('continue') 225 | 226 | elif isinstance(node, Global): 227 | the_end = start_seq(s, '\n', find_node_start(node, s, idxmap)) 228 | 229 | elif isinstance(node, Import): 230 | the_end = find_node_start(node, s, idxmap) + len('import') 231 | 232 | elif isinstance(node, ImportFrom): 233 | the_end = find_node_start(node, s, idxmap) + len('from') 234 | 235 | else: # can't determine node end, set to 3 chars after start 236 | start = find_node_start(node, s, idxmap) 237 | if start != None: 238 | the_end = start + 3 239 | 240 | if isinstance(node, AST) and the_end != None: 241 | node.node_end = the_end 242 | 243 | return the_end 244 | 245 | 246 | 247 | 248 | #------------------------------------------------------------- 249 | # adding missing Names 250 | #------------------------------------------------------------- 251 | 252 | def add_missing_names(node, s, idxmap): 253 | 254 | if hasattr(node, 'extraAttribute'): 255 | return 256 | 257 
| if isinstance(node, list): 258 | for n in node: 259 | add_missing_names(n, s, idxmap) 260 | 261 | elif isinstance(node, ClassDef): 262 | start = find_node_start(node, s, idxmap) + len('class') 263 | if start != None: 264 | node.name_node = str_to_name(s, start, idxmap) 265 | node._fields += ('name_node',) 266 | 267 | elif isinstance(node, FunctionDef): 268 | start = find_node_start(node, s, idxmap) + len('def') 269 | if start != None: 270 | node.name_node = str_to_name(s, start, idxmap) 271 | node._fields += ('name_node',) 272 | 273 | # keyword_start = find_node_start(node, s, idxmap) 274 | # node.keyword_node = str_to_name(s, keyword_start, idxmap) 275 | # node._fields += ('keyword_node',) 276 | 277 | if node.args.vararg != None: 278 | if len(node.args.args) > 0: 279 | vstart = find_node_end(node.args.args[-1], s, idxmap) 280 | else: 281 | vstart = find_node_end(node.name_node, s, idxmap) 282 | if vstart != None: 283 | vname = str_to_name(s, vstart, idxmap) 284 | node.vararg_name = vname 285 | else: 286 | node.vararg_name = None 287 | node._fields += ('vararg_name',) 288 | 289 | if node.args.kwarg != None: 290 | if len(node.args.args) > 0: 291 | kstart = find_node_end(node.args.args[-1], s, idxmap) 292 | else: 293 | kstart = find_node_end(node.vararg_name, s, idxmap) 294 | if kstart: 295 | kname = str_to_name(s, kstart, idxmap) 296 | node.kwarg_name = kname 297 | else: 298 | node.kwarg_name = None 299 | node._fields += ('kwarg_name',) 300 | 301 | elif isinstance(node, Attribute): 302 | start = find_node_end(node.value, s, idxmap) 303 | if start != None: 304 | name = str_to_name(s, start, idxmap) 305 | node.attr_name = name 306 | node._fields = ('value', 'attr_name') # remove attr for node size accuracy 307 | 308 | elif isinstance(node, Compare): 309 | start = find_node_start(node, s, idxmap) 310 | if start != None: 311 | node.opsName = convert_ops(node.ops, s, start, idxmap) 312 | node._fields += ('opsName',) 313 | 314 | elif (isinstance(node, BoolOp) or 315 | 
isinstance(node, BinOp) or 316 | isinstance(node, UnaryOp) or 317 | isinstance(node, AugAssign)): 318 | if hasattr(node, 'left'): 319 | start = find_node_end(node.left, s, idxmap) 320 | else: 321 | start = find_node_start(node, s, idxmap) 322 | if start != None: 323 | ops = convert_ops([node.op], s, start, idxmap) 324 | else: 325 | ops = [] 326 | if ops != []: 327 | node.op_node = ops[0] 328 | node._fields += ('op_node',) 329 | 330 | elif isinstance(node, Import): 331 | name_nodes = [] 332 | next = find_node_start(node, s, idxmap) + len('import') 333 | if next != None: 334 | name = str_to_name(s, next, idxmap) 335 | while name != None and next < len(s) and s[next] != '\n': 336 | name_nodes.append(name) 337 | next = name.node_end 338 | name = str_to_name(s, next, idxmap) 339 | node.name_nodes = name_nodes 340 | node._fields += ('name_nodes',) 341 | 342 | node.extraAttribute = True 343 | 344 | 345 | 346 | #------------------------------------------------------------- 347 | # utilities used by improve AST functions 348 | #------------------------------------------------------------- 349 | 350 | # find a sequence in a string s, returning the start point 351 | def start_seq(s, pat, start): 352 | try: 353 | return s.index(pat, start) 354 | except ValueError: 355 | return len(s) 356 | 357 | 358 | 359 | # find a sequence in a string s, returning the end point 360 | def end_seq(s, pat, start): 361 | try: 362 | return s.index(pat, start) + len(pat) 363 | except ValueError: 364 | return len(s) 365 | 366 | 367 | 368 | # find matching close paren from start 369 | def match_paren(s, open, close, start): 370 | while start < len(s) and s[start] != open: 371 | start += 1 372 | if start >= len(s): 373 | return len(s) 374 | 375 | left = 1 376 | i = start + 1 377 | while left > 0 and i < len(s): 378 | if s[i] == open: 379 | left += 1 380 | elif s[i] == close: 381 | left -= 1 382 | i += 1 383 | return i 384 | 385 | 386 | 387 | # build table for lineno <-> index conversion 388 | def 
build_index_map(s): 389 | line = 0 390 | col = 0 391 | idx = 0 392 | idxmap = [0] 393 | while idx < len(s): 394 | if s[idx] == '\n': 395 | idxmap.append(idx + 1) 396 | line += 1 397 | idx += 1 398 | return idxmap 399 | 400 | 401 | 402 | # convert (line, col) to offset index 403 | def map_idx(idxmap, line, col): 404 | return idxmap[line-1] + col 405 | 406 | 407 | 408 | # convert offset index into (line, col) 409 | def map_line_col(idxmap, idx): 410 | line = 0 411 | for start in idxmap: 412 | if idx < start: 413 | break 414 | line += 1 415 | col = idx - idxmap[line-1] 416 | return (line, col) 417 | 418 | 419 | 420 | # convert string to Name 421 | def str_to_name(s, start, idxmap): 422 | i = start; 423 | while i < len(s) and not is_alpha(s[i]): 424 | i = i + 1 425 | name_start = i 426 | 427 | ret = [] 428 | while i < len(s) and is_alpha(s[i]): 429 | ret.append(s[i]) 430 | i += 1 431 | name_end = i 432 | 433 | id1 = ''.join(ret) 434 | if id1 == '': 435 | return None 436 | else: 437 | name = Name(id1, None) 438 | name.node_start = name_start 439 | name.node_end = name_end 440 | name.lineno, name.col_offset = map_line_col(idxmap, name_start) 441 | return name 442 | 443 | 444 | 445 | def convert_ops(ops, s, start, idxmap): 446 | syms = [] 447 | for op in ops: 448 | if type(op) in ops_map: 449 | syms.append(ops_map[type(op)]) 450 | else: 451 | print("[WARNING] operator %s is missing from ops_map, " \ 452 | "please report the bug on GitHub" % op) 453 | 454 | i = start 455 | j = 0 456 | ret = [] 457 | while i < len(s) and j < len(syms): 458 | oplen = len(syms[j]) 459 | if s[i:i+oplen] == syms[j]: 460 | op_node = Name(syms[j], None) 461 | op_node.node_start = i 462 | op_node.node_end = i+oplen 463 | op_node.lineno, op_node.col_offset = map_line_col(idxmap, i) 464 | ret.append(op_node) 465 | j += 1 466 | i = op_node.node_end 467 | else: 468 | i += 1 469 | return ret 470 | 471 | 472 | # lookup table for operators for convert_ops 473 | ops_map = { 474 | # compare: 475 | Eq : 
'==', 476 | NotEq : '!=', 477 | LtE : '<=', 478 | Lt : '<', 479 | GtE : '>=', 480 | Gt : '>', 481 | NotIn : 'not in', 482 | In : 'in', 483 | IsNot : 'is not', 484 | Is : 'is', 485 | 486 | # BoolOp 487 | Or : 'or', 488 | And : 'and', 489 | Not : 'not', 490 | Invert : '~', 491 | 492 | # bit operators 493 | BitOr : '|', 494 | BitAnd : '&', 495 | BitXor : '^', 496 | RShift : '>>', 497 | LShift : '<<', 498 | 499 | 500 | # BinOp 501 | Add : '+', 502 | Sub : '-', 503 | Mult : '*', 504 | Div : '/', 505 | FloorDiv : '//', 506 | Mod : '%', 507 | Pow : '**', 508 | 509 | # UnaryOp 510 | USub : '-', 511 | UAdd : '+', 512 | } 513 | -------------------------------------------------------------------------------- /lists.py: -------------------------------------------------------------------------------- 1 | 2 | 3 | #################################################################### 4 | ## lists 5 | #################################################################### 6 | class PairIterator: 7 | def __init__(self, p): 8 | self.p = p 9 | def next(self): 10 | if self.p == nil: 11 | raise StopIteration 12 | ret = self.p.fst 13 | self.p = self.p.snd 14 | return ret 15 | 16 | class Nil: 17 | def __repr__(self): 18 | return "()" 19 | def __iter__(self): 20 | return PairIterator(self) 21 | 22 | nil = Nil() 23 | 24 | class Pair: 25 | def __init__(self, fst, snd): 26 | self.fst = fst 27 | self.snd = snd 28 | def __repr__(self): 29 | if (self.snd == nil): 30 | return "(" + repr(self.fst) + ")" 31 | elif (isinstance(self.snd, Pair)): 32 | s = repr(self.snd) 33 | return "(" + repr(self.fst) + " " + s[1:-1] + ")" 34 | else: 35 | return "(" + repr(self.fst) + " . 
" + repr(self.snd) + ")" 36 | def __iter__(self): 37 | return PairIterator(self) 38 | def __eq__(self, other): 39 | if not isinstance(other, Pair): 40 | return False 41 | else: 42 | return self.fst == other.fst and self.snd == other.snd 43 | 44 | 45 | 46 | def loner(u): 47 | return Pair(u, nil) 48 | 49 | 50 | def foldl(f, x, ls): 51 | ret = x 52 | for y in ls: 53 | ret = f(ret, y) 54 | return ret 55 | 56 | def length(ls): 57 | ret = 0 58 | for x in ls: 59 | ret = ret + 1 60 | return ret 61 | 62 | def remove(x, ls): 63 | ret = nil 64 | for y in ls: 65 | if x <> y: 66 | ret = Pair(y, ret) 67 | return reverse(ret) 68 | 69 | def assoc(u, v): 70 | return Pair(Pair(u, v), nil) 71 | 72 | def slist(pylist): 73 | ret = nil 74 | for i in xrange(len(pylist)): 75 | ret = Pair(pylist[len(pylist)-i-1], ret) 76 | return ret 77 | 78 | def pylist(ls): 79 | ret = [] 80 | for x in ls: 81 | ret.append(x) 82 | return ret 83 | 84 | 85 | def maplist(f, ls): 86 | ret = nil 87 | for x in ls: 88 | ret = Pair(f(x), ret) 89 | return reverse(ret) 90 | 91 | 92 | def reverse(ls): 93 | ret = nil 94 | for x in ls: 95 | ret = Pair(x, ret) 96 | return ret 97 | 98 | 99 | def filterlist(f, ls): 100 | ret = nil 101 | for x in ls: 102 | if f(x): 103 | ret = Pair(x, ret) 104 | return reverse(ret) 105 | 106 | 107 | # def append(*lists): 108 | # ret = nil 109 | # i = 0 110 | # while i < len(lists): 111 | # ls = lists[i] 112 | # while ls <> nil: 113 | # ret = Pair(ls.fst, ret) 114 | # ls = ls.snd 115 | # i += 1 116 | # return ret 117 | 118 | 119 | def append(*lists): 120 | def append1(ls1, ls2): 121 | ret = ls2 122 | for x in ls1: 123 | ret = Pair(x, ret) 124 | return ret 125 | return foldl(append1, nil, slist(lists)) 126 | 127 | 128 | def assq(x, s): 129 | for p in s: 130 | if x == p.fst: 131 | return p 132 | return None 133 | 134 | 135 | def ziplist(ls1, ls2): 136 | ret = nil 137 | while ls1 <> nil and ls2 <> nil: 138 | ret = Pair(Pair(ls1.fst, ls2.fst), ret) 139 | ls1 = ls1.snd 140 | ls2 = ls2.snd 141 | 
return reverse(ret) 142 | 143 | 144 | # building association lists 145 | def ext(x, v, s): 146 | return Pair(Pair(x, v), s) 147 | 148 | 149 | def lookup(x, s): 150 | p = assq(x, s) 151 | if p <> None: 152 | return p.snd 153 | else: 154 | return None 155 | 156 | -------------------------------------------------------------------------------- /nav.js: -------------------------------------------------------------------------------- 1 | // convenience function for document.getElementById(). 2 | window['$']=function(a){return document.getElementById(a)}; 3 | 4 | 5 | /////////////////////// debug flag //////////////////////// 6 | var debug = false; 7 | 8 | 9 | /////////////////////// adjustable parameters ////////////////// 10 | var minStep = 10; 11 | var nSteps = 30; 12 | var stepInterval = 10; 13 | var blockRange = 5; // how far consider one page blocked 14 | var nodeHLColor = '#C9B0A9'; 15 | var lineHLColor = '#FFFF66'; 16 | var lineBlockedColor = '#E9AB17'; 17 | var bgColor = ''; 18 | var bodyBlockedColor = '#FAF0E6'; 19 | 20 | 21 | ///////////////////////// globals //////////////////////// 22 | var eventCount = { 'left' : 0, 'right' : 0}; 23 | var moving = false; 24 | var matchId1 = 'leftstart'; 25 | var matchId2 = 'rightstart'; 26 | var matchLineId1 = -1; 27 | var matchLineId2 = -1; 28 | var cTimeout; 29 | 30 | 31 | ///////////////////////// utilities /////////////////////// 32 | 33 | // No Math.sign() in JS? 
34 | function sign(x) { 35 | if (x > 0) { 36 | return 1; 37 | } else if (x < 0) { 38 | return -1; 39 | } else { 40 | return 0; 41 | } 42 | } 43 | 44 | 45 | function log(msg) { 46 | if (debug) { 47 | console.log(msg); 48 | } 49 | } 50 | 51 | 52 | 53 | function elementPosition(id) { 54 | var obj = $(id); 55 | var curleft = 0, curtop = 0; 56 | 57 | if (obj && obj.offsetParent) { 58 | curleft = obj.offsetLeft; 59 | curtop = obj.offsetTop; 60 | 61 | while (obj = obj.offsetParent) { 62 | curleft += obj.offsetLeft; 63 | curtop += obj.offsetTop; 64 | } 65 | } 66 | 67 | return { x: curleft, y: curtop }; 68 | } 69 | 70 | 71 | /* 72 | * Scroll the window to relative position, detecting blocking positions. 73 | */ 74 | function scrollWithBlockCheck(container, distX, distY) { 75 | var oldTop = container.scrollTop; 76 | var oldLeft = container.scrollLeft; 77 | 78 | container.scrollTop += distY; // the ONLY place for actual scrolling 79 | container.scrollLeft += distX; 80 | 81 | var actualX = container.scrollLeft - oldLeft; 82 | var actualY = container.scrollTop - oldTop; 83 | log("distY=" + distY + ", actualY=" + actualY); 84 | log("distX=" + distX + ", actualX=" + actualX); 85 | 86 | // extra leeway here because Chrome scrolling is horribly inaccurate 87 | if ((Math.abs(distX) > blockRange && actualX === 0) 88 | || (Math.abs(distY) > blockRange && actualY === 0)) { 89 | log("blocked"); 90 | container.style.backgroundColor = bodyBlockedColor; 91 | return true; 92 | } else { 93 | eventCount[container.id] += 1; 94 | container.style.backgroundColor = bgColor; 95 | return false; 96 | } 97 | } 98 | 99 | 100 | function getContainer(elm) { 101 | while (elm && elm.tagName !== 'DIV') { 102 | elm = elm.parentElement || elm.parentNode; 103 | } 104 | return elm; 105 | } 106 | 107 | 108 | /* 109 | * timed animation function for scrolling the current window 110 | */ 111 | function matchWindow(linkId, targetId, n) 112 | { 113 | moving = true; 114 | 115 | var link = $(linkId); 116 | var target = 
$(targetId); 117 | var linkContainer = getContainer(link); 118 | var targetContainer = getContainer(target); 119 | 120 | var linkPos = elementPosition(linkId).y - linkContainer.scrollTop; 121 | var targetPos = elementPosition(targetId).y - targetContainer.scrollTop; 122 | var distY = targetPos - linkPos; 123 | var distX = linkContainer.scrollLeft - targetContainer.scrollLeft; 124 | 125 | 126 | log("matching window... " + n + " distY=" + distY + " distX=" + distX); 127 | 128 | if (distY === 0 && distX === 0) { 129 | clearTimeout(cTimeout); 130 | moving = false; 131 | } else if (n <= 1) { 132 | scrollWithBlockCheck(targetContainer, distX, distY); 133 | moving = false; 134 | } else { 135 | var stepSize = Math.floor(Math.abs(distY) / n); 136 | actualMinStep = Math.min(minStep, Math.abs(distY)); 137 | if (Math.abs(stepSize) < minStep) { 138 | var step = actualMinStep * sign(distY); 139 | } else { 140 | var step = stepSize * sign(distY); 141 | } 142 | var blocked = scrollWithBlockCheck(targetContainer, distX, step); 143 | var rest = Math.floor(distY / step) - 1; 144 | log("blocked?" 
+ blocked + ", rest steps=" + rest); 145 | if (!blocked) { 146 | cTimeout = setTimeout(function () { 147 | return matchWindow(linkId, targetId, rest); 148 | }, stepInterval); 149 | } else { 150 | clearTimeout(cTimeout); 151 | moving = false; 152 | } 153 | } 154 | } 155 | 156 | 157 | function showArrow(linkId, targetId) 158 | { 159 | var link = $(linkId); 160 | var target = $(targetId); 161 | var linkContainer = getContainer(link); 162 | var targetContainer = getContainer(target); 163 | 164 | var linkPos = elementPosition(linkId).y - linkContainer.scrollTop; 165 | var targetPos = elementPosition(targetId).y - targetContainer.scrollTop; 166 | var distY = targetPos - linkPos; 167 | var distX = linkContainer.scrollLeft - targetContainer.scrollLeft; 168 | 169 | 170 | log("targetPos = " + targetPos); 171 | } 172 | 173 | 174 | ////////////////////////// highlighting ///////////////////////////// 175 | 176 | var highlighted = [] 177 | function putHighlight(id, color) { 178 | var elm = $(id); 179 | if (elm !== null) { 180 | elm.style.backgroundColor = color; 181 | if (color !== bgColor) { 182 | highlighted.push(id); 183 | } 184 | } 185 | } 186 | 187 | 188 | function clearHighlight() { 189 | for (i = 0; i < highlighted.length; i += 1) { 190 | putHighlight(highlighted[i], bgColor); 191 | } 192 | highlighted = []; 193 | } 194 | 195 | 196 | 197 | /* 198 | * Highlight the link, target nodes and their lines, 199 | * then start animation to move the other window to match. 
200 | */ 201 | function highlight(me, linkId, targetId) 202 | { 203 | if (me.id === 'left') { 204 | matchId1 = linkId; 205 | matchId2 = targetId; 206 | } else { 207 | matchId1 = targetId; 208 | matchId2 = linkId; 209 | } 210 | 211 | clearHighlight(); 212 | 213 | putHighlight(linkId, nodeHLColor); 214 | putHighlight(targetId, nodeHLColor); 215 | } 216 | 217 | 218 | function instantMoveOtherWindow (me) { 219 | log("me=" + me.id + ", eventcount=" + eventCount[me.id]); 220 | log("matchId1=" + matchId1 + ", matchId2=" + matchId2); 221 | 222 | me.style.backgroundColor = bgColor; 223 | 224 | if (!moving && eventCount[me.id] === 0) { 225 | if (me.id === 'left') { 226 | matchWindow(matchId1, matchId2, 1); 227 | } else { 228 | matchWindow(matchId2, matchId1, 1); 229 | } 230 | } 231 | if (eventCount[me.id] > 0) { 232 | eventCount[me.id] -= 1; 233 | } 234 | } 235 | 236 | 237 | function getTarget(x){ 238 | x = x || window.event; 239 | return x.target || x.srcElement; 240 | } 241 | 242 | 243 | window.onload = 244 | function (e) { 245 | var tags = document.getElementsByTagName("A") 246 | for (var i = 0; i < tags.length; i++) { 247 | tags[i].onmouseover = 248 | function (e) { 249 | var t = getTarget(e) 250 | var lid = t.id 251 | var tid = t.getAttribute('tid') 252 | var container = getContainer(t) 253 | highlight(container, lid, tid) 254 | showArrow(lid, tid) 255 | } 256 | tags[i].onclick = 257 | function (e) { 258 | var t = getTarget(e) 259 | var lid = t.id 260 | var tid = t.getAttribute('tid') 261 | var container = getContainer(t) 262 | highlight(container, lid, tid) 263 | matchWindow(lid, tid, nSteps) 264 | } 265 | } 266 | 267 | tags = document.getElementsByTagName("DIV") 268 | for (var i = 0; i < tags.length; i++) { 269 | tags[i].onscroll = 270 | function (e) { 271 | instantMoveOtherWindow(getTarget(e)) 272 | } 273 | } 274 | 275 | } 276 | -------------------------------------------------------------------------------- /parameters.py: 
-------------------------------------------------------------------------------- 1 | #------------------------------------------------------------- 2 | # global parameters 3 | #------------------------------------------------------------- 4 | 5 | DEBUG = False 6 | # sys.setrecursionlimit(10000) 7 | 8 | 9 | MOVE_RATIO = 0.2 10 | MOVE_SIZE = 10 11 | MOVE_ROUND = 5 12 | 13 | FRAME_DEPTH = 1 14 | FRAME_SIZE = 20 15 | 16 | allNodes1 = set() 17 | allNodes2 = set() 18 | 19 | -------------------------------------------------------------------------------- /psydiff.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import sys 4 | import time 5 | import cProfile 6 | 7 | from ast import * 8 | 9 | from parameters import * 10 | from improve_ast import * 11 | from htmlize import * 12 | from utils import * 13 | 14 | 15 | 16 | #------------------------------- types ------------------------------ 17 | class Stat: 18 | "storage for stat counters" 19 | def __init__(self): 20 | self.reset() 21 | 22 | def reset(self): 23 | self.diff_count = 0 24 | self.move_count = 0 25 | self.move_savings = 0 26 | 27 | def add_moves(self, nterms): 28 | self.move_savings += nterms 29 | self.move_count +=1 30 | if self.move_count % 1000 == 0: 31 | dot() 32 | def add_diff(self): 33 | self.diff_count += 1 34 | if self.diff_count % 1000 == 0: 35 | dot() 36 | 37 | stat = Stat() 38 | 39 | 40 | 41 | # The differences between nodes are stored as Change structures. 
42 | class Change: 43 | def __init__(self, orig, cur, cost, is_frame=False): 44 | self.orig = orig 45 | self.cur = cur 46 | if orig is None: 47 | self.cost = node_size(cur) 48 | elif cur is None: 49 | self.cost = node_size(orig) 50 | elif cost == 'all': 51 | self.cost = node_size(orig) + node_size(cur) 52 | else: 53 | self.cost = cost 54 | self.is_frame = is_frame 55 | def __repr__(self): 56 | fr = "F" if self.is_frame else "-" 57 | def hole(x): 58 | return [] if x==None else x 59 | return ("(C:" + str(hole(self.orig)) + ":" + str(hole(self.cur)) 60 | + ":" + str(self.cost) + ":" + str(self.similarity()) 61 | + ":" + fr + ")") 62 | def similarity(self): 63 | total = node_size(self.orig) + node_size(self.cur) 64 | return 1 - div(self.cost, total) 65 | 66 | 67 | 68 | # Three major kinds of changes: 69 | # * modification 70 | # * deletion 71 | # * insertion 72 | def mod_node(node1, node2, cost): 73 | return Change(node1, node2, cost) 74 | 75 | def del_node(node): 76 | return Change(node, None, node_size(node)) 77 | 78 | def ins_node(node): 79 | return Change(None, node, node_size(node)) 80 | 81 | 82 | # 2-D array table for memoization of dynamic programming 83 | def create_table(x, y): 84 | table = [] 85 | for i in range(x+1): 86 | table.append([None] * (y+1)) 87 | return table 88 | 89 | def table_lookup(t, x, y): 90 | return t[x][y] 91 | 92 | def table_put(t, x, y, v): 93 | t[x][y] = v 94 | 95 | 96 | 97 | 98 | 99 | #------------------------------------------------------------- 100 | # string distance function 101 | #------------------------------------------------------------- 102 | 103 | ### diff cache for AST nodes 104 | str_dist_cache = {} 105 | 106 | 107 | ### string distance function 108 | def str_dist(s1, s2): 109 | cached = str_dist_cache.get((s1, s2)) 110 | if cached is not None: 111 | return cached 112 | 113 | if len(s1) > 100 or len(s2) > 100: 114 | if s1 != s2: 115 | return 2.0 116 | else: 117 | return 0 118 | 119 | table = create_table(len(s1), len(s2)) 
120 | d = dist1(table, s1, s2) 121 | ret = div(2*d, len(s1) + len(s2)) 122 | 123 | str_dist_cache[(s1, s2)]=ret 124 | return ret 125 | 126 | 127 | # the main dynamic programming part 128 | # similar to the structure of diff_list 129 | def dist1(table, s1, s2): 130 | def memo(v): 131 | table_put(table, len(s1), len(s2), v) 132 | return v 133 | 134 | cached = table_lookup(table, len(s1), len(s2)) 135 | if cached is not None: 136 | return cached 137 | 138 | if s1 == '': 139 | return memo(len(s2)) 140 | elif s2 == '': 141 | return memo(len(s1)) 142 | else: 143 | if s1[0] == s2[0]: 144 | d0 = 0 145 | elif s1[0].lower() == s2[0].lower(): 146 | d0 = 1 147 | else: 148 | d0 = 2 149 | 150 | d0 = d0 + dist1(table, s1[1:], s2[1:]) 151 | d1 = 1 + dist1(table, s1[1:], s2) 152 | d2 = 1 + dist1(table, s1, s2[1:]) 153 | return memo(min(d0, d1, d2)) 154 | 155 | 156 | 157 | 158 | #------------------------------------------------------------- 159 | # diff of nodes 160 | #------------------------------------------------------------- 161 | 162 | def diff_node(node1, node2, depth, move): 163 | 164 | # try substructural diff 165 | def trysub(cc): 166 | (changes, cost) = cc 167 | if not move: 168 | return (changes, cost) 169 | elif can_move(node1, node2, cost): 170 | return (changes, cost) 171 | else: 172 | mc1 = diff_subnode(node1, node2, depth, move) 173 | if mc1 is not None: 174 | return mc1 175 | else: 176 | return (changes, cost) 177 | 178 | if isinstance(node1, list) and not isinstance(node2, list): 179 | node2 = [node2] 180 | 181 | if not isinstance(node1, list) and isinstance(node2, list): 182 | node1 = [node1] 183 | 184 | if isinstance(node1, list) and isinstance(node2, list): 185 | node1 = serialize_if(node1) 186 | node2 = serialize_if(node2) 187 | table = create_table(len(node1), len(node2)) 188 | return diff_list(table, node1, node2, 0, move) 189 | 190 | # statistics 191 | stat.add_diff() 192 | 193 | if node1 == node2: 194 | return ([mod_node(node1, node2, 0)], 0) 195 | 196 | 
if isinstance(node1, Num) and isinstance(node2, Num): 197 | if node1.n == node2.n: 198 | return ([mod_node(node1, node2, 0)], 0) 199 | else: 200 | return ([mod_node(node1, node2, 1)], 1) 201 | 202 | if isinstance(node1, Str) and isinstance(node2, Str): 203 | cost = str_dist(node1.s, node2.s) 204 | return ([mod_node(node1, node2, cost)], cost) 205 | 206 | if (isinstance(node1, Name) and isinstance(node2, Name)): 207 | cost = str_dist(node1.id, node2.id) 208 | return ([mod_node(node1, node2, cost)], cost) 209 | 210 | if (isinstance(node1, Attribute) and isinstance(node2, Name) or 211 | isinstance(node1, Name) and isinstance(node2, Attribute) or 212 | isinstance(node1, Attribute) and isinstance(node2, Attribute)): 213 | s1 = attr_to_str(node1) 214 | s2 = attr_to_str(node2) 215 | if s1 is not None and s2 is not None: 216 | cost = str_dist(s1, s2) 217 | return ([mod_node(node1, node2, cost)], cost) 218 | # else fall through for things like f(x).y vs x.y 219 | 220 | if isinstance(node1, Module) and isinstance(node2, Module): 221 | return diff_node(node1.body, node2.body, depth, move) 222 | 223 | # same type of other AST nodes 224 | if (isinstance(node1, AST) and isinstance(node2, AST) and 225 | type(node1) == type(node2)): 226 | 227 | fs1 = node_fields(node1) 228 | fs2 = node_fields(node2) 229 | changes, cost = [], 0 230 | min_len = min(len(fs1), len(fs2)) 231 | 232 | for i in range(min_len): 233 | (m, c) = diff_node(fs1[i], fs2[i], depth, move) 234 | changes = m + changes 235 | cost += c 236 | 237 | # find all moves local to the node 238 | return find_moves((changes, cost)) 239 | 240 | if (type(node1) == type(node2) and 241 | is_empty_container(node1) and is_empty_container(node2)): 242 | return ([mod_node(node1, node2, 0)], 0) 243 | 244 | # all unmatched types and unequal values 245 | return trysub(([del_node(node1), ins_node(node2)], 246 | node_size(node1) + node_size(node2))) 247 | 248 | 249 | 250 | ########################## diff of a list 
########################## 251 | 252 | # diff_list is the main part of dynamic programming 253 | 254 | def diff_list(table, ls1, ls2, depth, move): 255 | 256 | def memo(v): 257 | table_put(table, len(ls1), len(ls2), v) 258 | return v 259 | 260 | def guess(table, ls1, ls2): 261 | (m0, c0) = diff_node(ls1[0], ls2[0], depth, move) 262 | (m1, c1) = diff_list(table, ls1[1:], ls2[1:], depth, move) 263 | cost1 = c1 + c0 264 | 265 | if ((is_frame(ls1[0]) and 266 | is_frame(ls2[0]) and 267 | not node_framed(ls1[0], m0) and 268 | not node_framed(ls2[0], m0))): 269 | frame_change = [mod_node(ls1[0], ls2[0], c0)] 270 | else: 271 | frame_change = [] 272 | 273 | # short cut 1 (func and classes with same names) 274 | if can_move(ls1[0], ls2[0], c0): 275 | return (frame_change + m0 + m1, cost1) 276 | 277 | else: # do more work 278 | (m2, c2) = diff_list(table, ls1[1:], ls2, depth, move) 279 | (m3, c3) = diff_list(table, ls1, ls2[1:], depth, move) 280 | cost2 = c2 + node_size(ls1[0]) 281 | cost3 = c3 + node_size(ls2[0]) 282 | 283 | if (not different_def(ls1[0], ls2[0]) and 284 | cost1 <= cost2 and cost1 <= cost3): 285 | return (frame_change + m0 + m1, cost1) 286 | elif (cost2 <= cost3): 287 | return ([del_node(ls1[0])] + m2, cost2) 288 | else: 289 | return ([ins_node(ls2[0])] + m3, cost3) 290 | 291 | # cache look up 292 | cached = table_lookup(table, len(ls1), len(ls2)) 293 | if cached is not None: 294 | return cached 295 | 296 | if (ls1 == [] and ls2 == []): 297 | return memo(([], 0)) 298 | 299 | elif (ls1 != [] and ls2 != []): 300 | return memo(guess(table, ls1, ls2)) 301 | 302 | elif ls1 == []: 303 | d = [] 304 | for n in ls2: 305 | d = [ins_node(n)] + d 306 | return memo((d, node_size(ls2))) 307 | 308 | else: # ls2 == []: 309 | d = [] 310 | for n in ls1: 311 | d = [del_node(n)] + d 312 | return memo((d, node_size(ls1))) 313 | 314 | 315 | 316 | 317 | ###################### diff into a subnode ####################### 318 | 319 | # Subnode diff is only used in the moving phase. 
There is no 320 | # need to compare the substructure of two nodes in the first 321 | # run, because they will be reconsidered if we just consider 322 | # them to be complete deletion and insertions. 323 | 324 | def diff_subnode(node1, node2, depth, move): 325 | 326 | if (depth >= FRAME_DEPTH or 327 | node_size(node1) < FRAME_SIZE or 328 | node_size(node2) < FRAME_SIZE): 329 | return None 330 | 331 | if isinstance(node1, AST) and isinstance(node2, AST): 332 | 333 | if node_size(node1) == node_size(node2): 334 | return None 335 | 336 | if isinstance(node1, Expr): 337 | node1 = node1.value 338 | 339 | if isinstance(node2, Expr): 340 | node2 = node2.value 341 | 342 | if (node_size(node1) < node_size(node2)): 343 | for f in node_fields(node2): 344 | (m0, c0) = diff_node(node1, f, depth+1, move) 345 | if can_move(node1, f, c0): 346 | if not isinstance(f, list): 347 | m1 = [mod_node(node1, f, c0)] 348 | else: 349 | m1 = [] 350 | framecost = node_size(node2) - node_size(node1) 351 | m2 = [Change(None, node2, framecost, True)] 352 | return (m2 + m1 + m0, c0 + framecost) 353 | 354 | if (node_size(node1) > node_size(node2)): 355 | for f in node_fields(node1): 356 | (m0, c0) = diff_node(f, node2, depth+1, move) 357 | if can_move(f, node2, c0): 358 | framecost = node_size(node1) - node_size(node2) 359 | if not isinstance(f, list): 360 | m1 = [mod_node(f, node2, c0)] 361 | else: 362 | m1 = [] 363 | m2 = [Change(node1, None, framecost, True)] 364 | return (m2 + m1 + m0, c0 + framecost) 365 | 366 | return None 367 | 368 | 369 | 370 | 371 | ########################################################################## 372 | ## move detection 373 | ########################################################################## 374 | def move_candidate(node): 375 | return (is_def(node) or node_size(node) >= MOVE_SIZE) 376 | 377 | 378 | def match_up(changes, round=0): 379 | 380 | deletions = lfilter(lambda p: (p.cur is None and 381 | move_candidate(p.orig) and 382 | not p.is_frame), 383 | 
changes) 384 | 385 | insertions = lfilter(lambda p: (p.orig is None and 386 | move_candidate(p.cur) and 387 | not p.is_frame), 388 | changes) 389 | 390 | matched = [] 391 | new_changes = [] 392 | total = 0 393 | 394 | # find definition with the same names first 395 | for d0 in deletions: 396 | for a0 in insertions: 397 | (node1, node2) = (d0.orig, a0.cur) 398 | if same_def(node1, node2): 399 | matched.append(d0) 400 | matched.append(a0) 401 | deletions.remove(d0) 402 | insertions.remove(a0) 403 | 404 | (changes, cost) = diff_node(node1, node2, 0, True) 405 | nterms = node_size(node1) + node_size(node2) 406 | new_changes.extend(changes) 407 | total += cost 408 | 409 | if (not node_framed(node1, changes) and 410 | not node_framed(node2, changes) and 411 | is_def(node1) and is_def(node2)): 412 | new_changes.append(mod_node(node1, node2, cost)) 413 | stat.add_moves(nterms) 414 | break 415 | 416 | 417 | # match the rest of the deltas 418 | for d0 in deletions: 419 | for a0 in insertions: 420 | (node1, node2) = (d0.orig, a0.cur) 421 | (changes, cost) = diff_node(node1, node2, 0, True) 422 | nterms = node_size(node1) + node_size(node2) 423 | 424 | if (cost <= (node_size(node1) + node_size(node2)) * MOVE_RATIO or 425 | node_framed(node1, changes) or 426 | node_framed(node2, changes)): 427 | 428 | matched.append(d0) 429 | matched.append(a0) 430 | insertions.remove(a0) 431 | new_changes.extend(changes) 432 | total += cost 433 | 434 | if (not node_framed(node1, changes) and 435 | not node_framed(node2, changes) and 436 | is_def(node1) and is_def(node2)): 437 | new_changes.append(mod_node(node1, node2, cost)) 438 | stat.add_moves(nterms) 439 | break 440 | 441 | return (matched, new_changes, total) 442 | 443 | 444 | 445 | # Get moves repeatedly because new moves may introduce new 446 | # deletions and insertions. 

def find_moves(res):
    (changes, cost) = res
    matched = None
    move_round = 1

    while move_round <= MOVE_ROUND and matched != []:
        (matched, new_changes, c) = match_up(changes, move_round)
        move_round += 1
        changes = lfilter(lambda ch: ch not in matched, changes)
        changes.extend(new_changes)
        savings = sum(map(lambda p: node_size(p.orig) + node_size(p.cur), matched))
        cost = cost + c - savings
    return (changes, cost)




#-------------------------------------------------------------
# main diff command
#-------------------------------------------------------------

def diff(file1, file2, move=True):

    print("File 1: %s" % file1)
    print("File 2: %s" % file2)
    print("Start time: %s, %s" % (time.ctime(), time.tzname[0]))
    start_time = time.time()
    checkpoint(start_time)

    cleanup()

    # base file names
    base1 = base_name(file1)
    base2 = base_name(file2)

    # get AST of file1 (the with-block closes the file even on error)
    with open(file1, 'r') as f1:
        lines1 = f1.read()

    try:
        node1 = parse(lines1)
    except Exception:
        print('file %s cannot be parsed' % file1)
        sys.exit(-1)

    improve_ast(node1, lines1, file1, 'left')

    # get AST of file2
    with open(file2, 'r') as f2:
        lines2 = f2.read()

    try:
        node2 = parse(lines2)
    except Exception:
        print('file %s cannot be parsed' % file2)
        sys.exit(-1)

    improve_ast(node2, lines2, file2, 'right')

    print("Parse finished in %s. Now start to diff." % sec_to_min(checkpoint()))

    # get the changes

    (changes, cost) = diff_node(node1, node2, 0, False)

    print("\n[diff] processed %d nodes in %s."
          % (stat.diff_count, sec_to_min(checkpoint())))


    #---------------------- print final stats ---------------------
    size1 = node_size(node1)
    size2 = node_size(node2)
    total = size1 + size2

    report = ""
    report += ("\n--------------------- summary -----------------------") + "\n"
    report += ("- total changes (chars): %d" % cost) + "\n"
    report += ("- total code size: %d (left: %d right: %d)"
               % (total, size1, size2)) + "\n"
    report += ("- total moved pieces: %d" % stat.move_count) + "\n"
    report += ("- percentage of change: %.1f%%"
               % (div(cost, total) * 100)) + "\n"
    report += ("-----------------------------------------------------") + "\n"

    print(report)


    #---------------------- generation HTML ---------------------

    htmlize(changes, file1, file2, lines1, lines2)

    dur = time.time() - start_time
    print("\n[summary] Job finished at %s, %s" %
          (time.ctime(), time.tzname[0]))
    print("\n\tTotal duration: %s" % sec_to_min(dur))




def cleanup():
    str_dist_cache.clear()
    clear_uid()

    global allNodes1, allNodes2
    allNodes1 = set()
    allNodes2 = set()

    stat.reset()



def sec_to_min(s):
    if s < 60:
        return ("%.1f seconds" % s)
    else:
        return ("%.1f minutes" % div(s, 60))



last_checkpoint = None
def checkpoint(init=None):
    import time
    global last_checkpoint
    if init is not None:
        last_checkpoint = init
        return None
    else:
        dur = time.time() - last_checkpoint
        last_checkpoint = time.time()
        return dur




#-------------------------------------------------------------
# text-based interfaces
#-------------------------------------------------------------

## print the diffs as text
def
print_diff(file1, file2):
    (m, c) = diff_file(file1, file2)
    print("----------", file1, "<<<", c, ">>>", file2, "-----------")

    ms = m
    ms = sorted(ms, key=lambda d: node_start(d.orig))
    print("\n-------------------- changes(", len(ms), ")---------------------- ")
    for m0 in ms:
        print(m0)

    print("\n------------------- end ----------------------- ")




def diff_file(file1, file2):
    node1 = parse_file(file1)
    node2 = parse_file(file2)
    # pass the move flag explicitly, as every other diff_node call does
    return diff_node(node1, node2, 0, False)


def main():
    ## if run under command line
    ## psydiff.py file1.py file2.py
    if len(sys.argv) == 3:
        file1 = sys.argv[1]
        file2 = sys.argv[2]
        diff(file1, file2)
    else:
        print("usage: psydiff.py file1.py file2.py")


if __name__ == '__main__':
    main()

--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
from setuptools import setup


setup(
    name='psydiff',
    version='0.1',
    author='Yin Wang',
    description=('A structural differencer for Python.
'
                 'Parses Python into ASTs, compares them, '
                 'and generates interactive HTML.'),
    packages=['psydiff'],
    package_dir={'psydiff': '.'},
    package_data={'psydiff': ['diff.css', 'nav.js']},
    entry_points={'console_scripts': ['psydiff = psydiff.psydiff:main']},
    license='GNU GPLv3',
    url='https://github.com/yinwang0/psydiff',
    classifiers=[
        'Development Status :: 3 - Alpha',
        'Environment :: Console',
        'Environment :: Web Environment',
        'Intended Audience :: Developers',
        'License :: OSI Approved'
        ' :: GNU General Public License v3 or later (GPLv3+)',
        'Operating System :: OS Independent',
        'Programming Language :: Python :: 2',
        'Programming Language :: Python :: 3',
        'Topic :: Software Development',
        'Topic :: Utilities'
    ]
)

--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
#-------------------------------------------------------------
# tests and operations on AST nodes
#-------------------------------------------------------------
import os
import sys
import re
import cProfile

from ast import *
from parameters import *


# get list of fields from a node
def node_fields(node):
    ret = []
    for field in node._fields:
        if field != 'ctx' and hasattr(node, field):
            ret.append(getattr(node, field))
    return ret



# get full source text where the node is from
def node_source(node):
    if hasattr(node, 'node_source'):
        return node.node_source
    else:
        return None



# utility for getting exact source code part of the node
def src(node):
    return node.node_source[node.node_start : node.node_end]



def node_start(node):
    if (hasattr(node, 'node_start')):
        return node.node_start
    else:
        return 0



def node_end(node):
    if hasattr(node, 'node_end'):
        return node.node_end
    else:
        return None


def is_atom(x):
    return type(x) in [int, str, bool, float]



def is_def(node):
    return isinstance(node, FunctionDef) or isinstance(node, ClassDef)



# whether a node is a "frame" which can contain others and be
# labeled
def is_frame(node):
    return type(node) in [ClassDef, FunctionDef, Import, ImportFrom]



def is_empty_container(node):
    if isinstance(node, List) and node.elts == []:
        return True
    if isinstance(node, Tuple) and node.elts == []:
        return True
    if isinstance(node, Dict) and node.keys == []:
        return True

    return False


def same_def(node1, node2):
    if isinstance(node1, FunctionDef) and isinstance(node2, FunctionDef):
        return node1.name == node2.name
    elif isinstance(node1, ClassDef) and isinstance(node2, ClassDef):
        return node1.name == node2.name
    else:
        return False


def different_def(node1, node2):
    if is_def(node1) and is_def(node2):
        return node1.name != node2.name
    return False


# decide whether it is reasonable to consider two nodes to be
# moves of each other
def can_move(node1, node2, cost):
    return (same_def(node1, node2) or
            cost <= (node_size(node1) + node_size(node2)) * MOVE_RATIO)


# whether the node is considered deleted or inserted because
# the other party matches a substructure of it.
def node_framed(node, changes):
    for c in changes:
        if (c.is_frame and (node == c.orig or node == c.cur)):
            return True
    return False



# helper for turning nested if statements into sequences,
# otherwise we will be trapped in the nested structure and find
# too many differences
def serialize_if(node):
    if isinstance(node, If):
        if not hasattr(node, 'node_end'):
            print("has no end:", node)

        newif = If(node.test, node.body, [])
        newif.lineno = node.lineno
        newif.col_offset = node.col_offset
        newif.node_start = node.node_start
        newif.node_end = node.node_end
        newif.node_source = node.node_source
        newif.fileName = node.fileName
        return [newif] + serialize_if(node.orelse)
    elif isinstance(node, list):
        ret = []
        for n in node:
            ret += serialize_if(n)
        return ret
    else:
        return [node]


def node_name(node):
    if isinstance(node, Name):
        return node.id
    elif isinstance(node, FunctionDef) or isinstance(node, ClassDef):
        return node.name
    else:
        return None


def attr_to_str(node):
    if isinstance(node, Attribute):
        vName = attr_to_str(node.value)
        if vName is not None:
            return vName + "."
                   + node.attr
        else:
            return None
    elif isinstance(node, Name):
        return node.id
    else:
        return None


### utility for counting size of terms
def node_size(node, test=False):

    if not test and hasattr(node, 'node_size'):
        ret = node.node_size

    elif isinstance(node, list):
        ret = sum(map(lambda x: node_size(x, test), node))

    elif is_atom(node):
        ret = 1

    elif isinstance(node, Name):
        ret = 1

    elif isinstance(node, Num):
        ret = 1

    elif isinstance(node, Str):
        ret = 1

    elif isinstance(node, Expr):
        ret = node_size(node.value, test)

    elif isinstance(node, AST):
        ret = 1 + sum(map(lambda x: node_size(x, test), node_fields(node)))

    else:
        ret = 0

    if test:
        print("node:", node, "size=", ret)

    if isinstance(node, AST):
        node.node_size = ret

    return ret



#-------------------------------------------------------------
# utilities
#-------------------------------------------------------------
def debug(*args):
    if DEBUG:
        print(args)


def dot():
    sys.stdout.write('.')
    sys.stdout.flush()


def is_alpha(c):
    return (c == '_'
            or ('0' <= c <= '9')
            or ('a' <= c <= 'z')
            or ('A' <= c <= 'Z'))


def div(m, n):
    if n == 0:
        return m
    else:
        return m/float(n)


# for debugging
def ps(s):
    v = parse(s).body[0]
    if isinstance(v, Expr):
        return v.value
    else:
        return v


def sz(s):
    return node_size(parse(s), True) - 1


def dp(s):
    return dump(parse(s))


def run(name, closure=True, debug=False):
    fullname1 = name + '1.py'
    fullname2 = name + '2.py'

    global DEBUG
    olddebug =
DEBUG
    DEBUG = debug

    diff(fullname1, fullname2, closure)

    DEBUG = olddebug


def demo():
    run('demo')


def go():
    run('heavy')


def pf():
    cProfile.run("run('heavy')", sort="cumulative")




#------------------------ file system support -----------------------

def base_name(filename):
    try:
        start = filename.rindex('/') + 1
    except ValueError:
        start = 0

    try:
        end = filename.rindex('.py')
    except ValueError:
        # no '.py' suffix: keep the whole name instead of returning ''
        end = len(filename)
    return filename[start:end]


## file system support
def parse_file(filename):
    # the with-block closes the file once it has been read
    with open(filename, 'r') as f:
        lines = f.read()
    ast = parse(lines)
    improve_ast(ast, lines, filename, 'left')
    return ast


def get_install_path():
    exec_name = os.path.abspath(__file__)
    path = exec_name.rindex(os.sep) + 1
    return exec_name[:path]


def lfilter(f, ls):
    return list(filter(f, ls))

--------------------------------------------------------------------------------
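
As a rough illustration of how the size-based threshold in `can_move` plays out, the sketch below recomputes term sizes for small snippets and applies the same inequality. It is not part of psydiff: `term_size`, `within_move_budget`, and `ASSUMED_MOVE_RATIO` are hypothetical names, the ratio is a placeholder for whatever `parameters.py` defines, and the size rules are a simplified, uncached variant of `node_size` that assumes Python 3.8+ (where literals parse as `ast.Constant`).

```python
import ast

# Placeholder threshold for illustration only; the real value is
# defined in parameters.py and may differ.
ASSUMED_MOVE_RATIO = 0.2


def term_size(node):
    """Count terms in the spirit of node_size, without caching."""
    if isinstance(node, list):
        return sum(term_size(n) for n in node)
    if type(node) in (int, str, bool, float):
        return 1
    if isinstance(node, (ast.Name, ast.Constant)):
        return 1
    if isinstance(node, ast.Expr):
        # expression statements count as their wrapped value
        return term_size(node.value)
    if isinstance(node, ast.AST):
        fields = [getattr(node, f) for f in node._fields
                  if f != 'ctx' and hasattr(node, f)]
        return 1 + sum(term_size(f) for f in fields)
    return 0


def within_move_budget(size1, size2, cost, ratio=ASSUMED_MOVE_RATIO):
    """Same inequality as the cost branch of can_move."""
    return cost <= (size1 + size2) * ratio


if __name__ == "__main__":
    left = ast.parse("total = price * count").body[0]
    right = ast.parse("total = price * amount").body[0]
    s1, s2 = term_size(left), term_size(right)
    # a one-term edit is cheap relative to the combined size
    print(s1, s2, within_move_budget(s1, s2, cost=1))
```

Under these assumptions, two nodes only count as a move when the edit cost is a small fraction of their combined size, which is why tiny nodes are filtered out earlier by `move_candidate`.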