├── .gitignore ├── .pre-commit-config.yaml ├── CHANGELOG.md ├── LICENSE ├── README.md ├── faint ├── __init__.py ├── __main__.py ├── branch_structure.py ├── cli.py ├── diff_trees.py ├── extract_comments.py ├── join_comments.py └── utils.py ├── pyproject.toml ├── requirements.txt └── tests ├── __init__.py ├── cases ├── happy_path │ ├── __init__.py │ ├── comments.json │ ├── for_diff.py │ ├── no_comments.py │ └── with_comments.py └── unknown_problem_1 │ ├── __init__.py │ ├── comments.json │ ├── no_comments.py │ └── with_comments.py ├── test_diff.py ├── test_extract.py ├── test_join.py └── utils.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Rust 2 | debug/ 3 | target/ 4 | Cargo.lock 5 | 6 | # Python 7 | tree-sitter-python/ 8 | build/ 9 | dist/ 10 | venv/ 11 | .venv/ 12 | main_no_comments.py 13 | just_comments.json 14 | with_comments_main.py 15 | main_with_comments.py 16 | out.py 17 | __pycache__ 18 | -------------------------------------------------------------------------------- /.pre-commit-config.yaml: -------------------------------------------------------------------------------- 1 | repos: 2 | - repo: https://github.com/pre-commit/pre-commit-hooks 3 | rev: v4.6.0 4 | hooks: 5 | - id: end-of-file-fixer 6 | - id: check-yaml 7 | 8 | - repo: https://github.com/charliermarsh/ruff-pre-commit 9 | rev: v0.4.5 10 | hooks: 11 | - id: ruff 12 | args: [--fix] 13 | - id: ruff-format 14 | 15 | - repo: https://github.com/pre-commit/mirrors-mypy 16 | rev: v1.10.0 17 | hooks: 18 | - id: mypy 19 | -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # Changelog 2 | 3 | ## Thought 4 | 5 | - The usage of `TreeSitter` in theory can help to identify the correct place for a comment. 6 | The problem is that it has different attributes for every supported programming language. 7 | - Even if the `Python` scripts are just a `PoC`, I want to write some tests that will allow me to catch edge cases. 8 | 9 | For now, all tests are done on `Python` files, as `TreeSitter` might work differently for other formats. 10 | 11 | ## Newest update 12 | 13 | ## 03.06.2024 14 | 15 | In progress with creating a demo. 16 | 17 | ## 18.05.2024 18 | 19 | While extracting comments we should add info on whether they are alone on a line or next to code. 20 | 21 | ## 15.05.2024 22 | 23 | There is a bug: if there is a multi-line comment, then in diff_tree only one line is attached. 24 | 25 | Also, the code works only on files in this repo (because we use commits from the current repo). 26 | We also need to handle the situation where there is no repo. 27 | 28 | ## 10.05.2024 29 | 30 | I won't be able to rewrite any of this in Rust as I planned initially. 31 | Instead I will focus on delivering a working flow in Python. 32 | 33 | I need to restructure my scripts and prepare them for pip install. 34 | 35 | ## 04.05.2024 36 | 37 | At first I was worried that the JSON needed to keep only the comment and minimal data. 38 | But with this limitation it might be really hard to have a good data structure. 39 | 40 | The new idea for the data structure is to keep the whole "branch" of the AST (Abstract Syntax Tree) next to the comment. 41 | In each file there will always be only unique "branches" like `class -> function -> variable`. It can be nested 42 | any number of times but at the end there will be a comment. That alone should tell us where the comment should be placed.
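A minimal sketch of what one such entry could look like (the values are made up for illustration, and the field names only loosely mirror the later `branch_structure.py` experiment; the exact schema is still open):

```python
# Hypothetical "branch" entry: the comment itself plus the chain of enclosing
# AST nodes, innermost first, which should be enough to place it again.
branch_entry = {
    "start": {"row": 15, "column": 8},
    "text": "# Pass inline",
    "branch": ["pass", "__init__", "A"],  # statement -> function -> class
}
```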
43 | 44 | ## 03.05.2024 45 | 46 | To address the problem of possibly double-injecting and double-extracting comments, I think the JSON needs to be a "DB" with all comments. 47 | We always need to assume that the JSON contains all comments, and we can just apply them to the code no matter whether a given comment is already there or not. 48 | To do so it might be helpful to think about a better format structure that would keep not a row in code but something semantic. 49 | Also, when the join algorithm is running it should first gather all the comments, throw them into the JSON, resolve duplicates and then apply the comments. 50 | 51 | ## 29.04.2024 52 | 53 | I ran `extract_comments.py` and `join_comments.py` one after another on different Python source files to find edge cases. 54 | If the source file is not the same after the two scripts, there might be an edge case. 55 | 56 | So far I found an edge case related to two variables defined in the global scope, which are under the `module` parent. 57 | 58 | ## 26.04.2024 59 | 60 | Another day without progress; I have no idea how to solve the problem of the many edge cases. 61 | 62 | ## 25.04.2024 63 | 64 | I wrote an extension that would work with a PREFIX. However, my conclusion is that it still does not solve most of the problems. 65 | Also, I am overthinking the situation where the user turns off comments, writes something and turns them on again (in a loop). 66 | 67 | For now I need to focus on finishing the base idea and writing down all needed features. 68 | 69 | ## 24.04.2024 70 | 71 | The double join problem made me think that we need to have some kind of control over the state. 72 | We can try to determine it on the fly or we can save it somewhere. 73 | In both cases the problem is that the user might mix the already saved comments with new ones, etc. 74 | It will always create problems like comments that are partially loaded or modified. 75 | 76 | To prevent that, I think we can mark the comments with a prefix like `# LOCAL: this is comment`. 77 | This way you see if the comments are loaded or not. You never delete comments that are not meant for deletion. 78 | You can have commands that resolve any problems better and more easily. They should also be easier to parse. 79 | 80 | ## 23.04.2024 81 | 82 | Running `join_comments.py` twice will modify the file twice. It needs to be solved. 83 | 84 | ## 19.04.2024 85 | 86 | The main idea is ready. The diff algorithm for sure needs tweaks and more tests, but it's something to fix later. 87 | Now I need a working flow. For this I will create a CLI tool with options to choose from. 88 | Along the way the implementation of the algorithms will probably change, as well as the JSON format and things I haven't thought about yet. 89 | 90 | ## 12.04.2024 91 | 92 | After the refactor and the first test case I will probably continue with this approach. Now it is time to create tests for other files. 93 | 94 | ## 08.04.2024 95 | 96 | After many hours I decided to check how `linting` tools manage tests. 97 | This won't only be helpful in the `PoC` but especially in the final solution, where the tests will be crucial. 98 | 99 | I decided to check the test code for `black` and `isort`. It is almost terrifying. 100 | 101 | ## 07.04.2024 102 | 103 | After a small refactor of `extract_comments.py` and `join_comments.py`, `diff_tree.py` also needs one. 104 | The biggest problem with these scripts is how to handle the files, so it will be easy to test, use, and put into the flow. 105 | 106 | ## 03.04.2024 107 | 108 | I need to rethink some design solutions. Nothing important today.
109 | 110 | ## 31.03.2024 111 | 112 | This is an Easter commit. So I can only offer an :egg:. 113 | 114 | ## 29.03.2024 115 | 116 | So I am not very happy with the current algorithm, but with some adjustments it should work well enough for a proof of concept. 117 | After the tweaks I would like to prepare a demo of how it can work in action. 118 | 119 | ## 24.03.2024 120 | 121 | To summarize a small research session - we use the `LCS` algorithm to find the longest common subsequence and then we see what is missing or what is new in the newer file. 122 | I will probably try to use the old file with the comments in it, and just ignore their absence in the newer one. 123 | 124 | ## 23.03.2024 125 | 126 | Some research on the most popular options to show the diff. 127 | It turns out that it is mostly `Longest Common Subsequence` (LCS). 128 | With that knowledge I want to test algorithms that will allow me to identify the same nodes between code versions. 129 | 130 | ## 22.03.2024 131 | 132 | Learning more about how to diff two `AST`s. 133 | I want a solution that will allow me to support users that forgot or intentionally did not load the comments into the code. 134 | In this situation I need to take the last commits and parse through all commits up to the current one, tracking the position of nodes. 135 | 136 | If all research fails, I will probably build a workflow that will inform the user about unresolved comments. 137 | 138 | ## 21.03.2024 139 | 140 | The flow is working: you can use `main.py` to remove comments and `join_them.py` to place them in the correct place again. 141 | Next I want to check if I can use `TreeSitter` to find how the code changed between commits. 142 | 143 | ## 18.03.2024 144 | 145 | In progress with modifying `join_them` and `main` so we remove empty lines after comments: 146 | 147 | ```python 148 | def foo(): 149 | # comment 150 | # comment 151 | ... 152 | ``` 153 | 154 | Will be: 155 | 156 | ```python 157 | def foo(): 158 | ... 159 | 160 | ``` 161 | 162 | Not: 163 | 164 | ```python 165 | def foo(): 166 | 167 | 168 | ... 169 | ``` 170 | 171 | ## 17.03.2024 172 | 173 | `join_them.py` fixed. 174 | 175 | Time to design the flow with the happy path and create a little demo. 176 | 177 | ## 15.03.2024 178 | 179 | The `join_them.py` must be fixed. 180 | 181 | ## 14.03.2024 182 | 183 | Research in progress on different comment standards. 184 | For example [TODO comments](https://github.com/stsewd/tree-sitter-comment). 185 | That basically means that we could create some special syntax for our comments, 186 | or wrap this tool around `TODO comments`. 187 | 188 | ## 13.03.2024 189 | 190 | I found [diffsitter](https://github.com/afnanenayet/diffsitter) which may help me 191 | to find corresponding nodes between commits. 192 | 193 | Not yet decided whether I will use it as a part of the system or learn how it works and implement part of that myself. 194 | 195 | ## 11.03.2024 196 | 197 | Today only some small tests without new results. 198 | 199 | ## 09.03.2024 200 | 201 | Day spent reading the Tree Sitter documentation. 202 | The new idea is to build the flow around a git hook. 203 | So the flow would go: 204 | 205 | ``` 206 | | Load comments | 207 | V 208 | | Write code and comments | 209 | V 210 | | Save file | 211 | V 212 | | Run comment remover | 213 | V 214 | | Comments saved in file | 215 | | with commit id attached | 216 | ``` 217 | 218 | Now, this flow is the happy path. We load the comments back into the same file from which we removed them. 219 | Problems begin if someone does not load the comments back right away. 220 | Then we can try to get the commit in which they were removed, 221 | load the old tree, load the current tree and find the corresponding nodes. 222 | Then we can apply the old comments to the new file.
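This recovery path is roughly what later became `main_between_commits` in `faint/diff_trees.py`. A rough sketch of the idea, reusing the helper names from `faint/diff_trees.py` and `faint/utils.py` (the wrapper function `recover_comments` is hypothetical and only illustrates the flow):

```python
from pathlib import Path

from faint.diff_trees import (
    apply_missing_comments,
    find_missing_comments,
    get_serialized_tree_bytes,
)
from faint.utils import get_file_bytes_by_commit_sha, load_language


def recover_comments(file: Path, commit_sha: str) -> list[str]:
    # Parse the file as it looked in the old commit (still containing comments)
    # and as it looks now (without them), then diff the two serialized trees.
    parser = load_language()
    old_tree = get_serialized_tree_bytes(get_file_bytes_by_commit_sha(file, commit_sha), parser)
    new_tree = get_serialized_tree_bytes(file.read_bytes(), parser)

    # Find comments whose anchor nodes still exist and re-insert them into the current lines.
    missing = find_missing_comments(old_tree, new_tree)
    current_lines = file.read_text(encoding="utf-8").splitlines(keepends=True)
    return apply_missing_comments(current_lines, missing)
```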
223 | 224 | As an addition, we can create CLI tool messages that inform about hanging comments or other problems. 225 | 226 | ## 08.03.2024 227 | 228 | At this moment comments are stored by their byte position in the file. 229 | The better approach I am trying to implement is to attach them via their connection with a node or a line in the code. 230 | In the version where we can store them by the corresponding node, we can then push them into the tree in a similar way. 231 | The way to decide which node to attach them to will be to determine whether they sit next to code on the same line or above another node. 232 | 233 | ## 06.03.2024 234 | 235 | In `poc` there are two programs: `main.py`, which removes comments from a file 236 | and puts them into a json file, and `join_them.py`, which merges the two files together again. 237 | 238 | Now I need more research to decide how to handle the flow. 239 | 240 | ## 03.03.2024 241 | 242 | So we have a working `comment remover`. Just a few lines of code and we can remove comments from a file. 243 | The final solution will probably be in `Rust`, but the `PoCs` I will do in `Python`. 244 | 245 | ## 02.03.2024 246 | 247 | So after more research I think that it may work similarly to a formatter or linter. 248 | You could write comments as normal, and on save it could collect all new comments and extract them to a new file. 249 | It would give the advantage of not changing the flow of writing comments. 250 | The thing is that after collecting them we need to display them to the user but also allow the user to edit them. 251 | We can then have a toggle mechanism that would load comments into the file and then extract them, but it would mess up git on every toggle. 252 | 253 | The project needs to start, so probably the first step will be to do some PoC with tools like TreeSitter and check if we can use existing CI/CD tools. 254 | 255 | ## Initial plan 256 | 257 | This is what most of our comments will look like: 258 | 259 | ```python 260 | def main(): 261 | print("Hello World!") # This is my comment 262 | ``` 263 | 264 | The problem is that sometimes we want to include some comments or notes that should not go to a public repository. 265 | 266 | Let's call them **Comments of shame**, but basically I mean private notes and thoughts. 267 | 268 | My initial plan is to keep them in a separate file: 269 | 270 | ```python 271 | def main(): 272 | print("Hello World!") 273 | ``` 274 | 275 | And maybe something like: 276 | 277 | ```bash 278 | <1st: print> # This is my comment 279 | ``` 280 | 281 | The syntax is just to demonstrate the idea. 282 | 283 | ## Approach 284 | 285 | There are many options. The most difficult part will be to keep comments in the right place when code is modified or refactored. 286 | There will probably be some dangling comments and notes.
287 | 288 | My smart ideas are to: 289 | 290 | - Use **TreeSitter**, or basically keep comments attached to a Code Tree Element 291 | - Force **LSP** to display them as `errors` or something 292 | - 293 | 294 | Difficult parts are: 295 | 296 | - Abandoned comments 297 | - How to enter note/comment editing 298 | - How to keep track of a refactored function (moved from one place to another) 299 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Cvaniak 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Faint Comments 2 | 3 | This repository is my attempt at the [100 commits competition](https://100commitow.pl/). 4 | 5 | > :warning: This project is a PoC and still needs to be polished. Always pin the version. 6 | > Make a backup or use version control before applying any command. 7 | 8 | ## What is it? 9 | 10 | This library keeps your code comments in a separate file and allows you to put them back in. 11 | 12 | ## Why 13 | 14 | Comments are a [code smell](https://refactoring.guru/pl/smells/comments). But sometimes: 15 | 16 | - you just want to have your own note that should never appear in Version Control. 17 | - you have code that needs some comments, but the comments must always be removed before deploying to production. 18 | - you want to give someone instructions on what to do in certain files. 19 | 20 | In all these cases it would be handy to keep all comments 21 | in a separate file that could potentially be _git ignored_ 22 | but at the same time applied whenever you want to read them. 23 | 24 | Also, I thought I could learn some cool algorithms and tools. 25 | 26 | --- 27 | 28 | ## Demo 29 | 30 | Imagine you have your code: 31 | 32 | ```python 33 | # TODO: this name must be changed 34 | def foo(): 35 | ... 36 | i = 0x5f3759df - ( i >> 1 ) # what the quack? 37 | ... 38 | ``` 39 | 40 | and maybe you do not think it is good to keep your comments in code. 41 | You run `faint extract <file_name>.py` and as a result you get: 42 | 43 | ```python 44 | def foo(): 45 | ... 46 | i = 0x5f3759df - ( i >> 1 ) 47 | ...
48 | 49 | ``` 50 | 51 | and `JSON` file: 52 | 53 | ```json 54 | { 55 | "comments": [ 56 | { 57 | "start": { 58 | "row": 4, 59 | "column": 33 60 | }, 61 | "text": "# what the quack?", 62 | "is_inline": true 63 | }, 64 | { 65 | "start": { 66 | "row": 1, 67 | "column": 0 68 | }, 69 | "text": "# TODO: this name must be changed", 70 | "is_inline": false 71 | } 72 | ], 73 | "deleted_lines": [1], 74 | "file_metadata": { 75 | "commit_sha": "sha_of_commit", 76 | "file_name": "path/to/file/<file_name>.py", 77 | "file_sha": "hash_of_the_file" 78 | } 79 | } 80 | ``` 81 | 82 | and this file is kept as `comments_<file_name>.json`. 83 | At any moment you can run `faint join <file_name>.py` and you will get the original file back. 84 | 85 | But this tool also tries to handle the situation when you modify the file before the `join` command, and more. 86 | 87 | ## How to install 88 | 89 | You can `pip` install: 90 | 91 | ```bash 92 | pip3 install faint 93 | ``` 94 | 95 | ## How to use 96 | 97 | You can check what is available via `faint --help` or just `faint`. 98 | 99 | `faint` is made of two main subcommands: 100 | 101 | - `faint extract <path_to_file>` which removes comments from the code and places them in a separate `JSON` file. 102 | - `faint join <path_to_file>` which applies comments from the `JSON` file (if it exists) in the corresponding places. 103 | 104 | By default (not changeable yet) `JSON` files are named `comments_<file_name>.json`. 105 | So you can add a line like `comments_*.json` to `.gitignore`. 106 | 107 | ### Workflow 108 | 109 | First use `extract` on the file. Then you can: 110 | 111 | - Use `join` and not modify the file 112 | - Use `join` and modify the file 113 | - Continue to modify the file and then use `join` 114 | 115 | Then: 116 | 117 | - If you apply `join` on a file that exactly matches the file after `extract`, 118 | it should be fast and simple and you should get all comments back in place. 119 | - If you apply `join` and the file follows the structure closely enough, you should also 120 | get all comments back in the right place. 121 | 122 | In both cases `extract` should then work as at the beginning. 123 | 124 | The current version does not allow for: 125 | 126 | - `extract` on an already `extract`ed file (it will discard all comments in the `JSON`) 127 | - `join` on an already `join`ed file (it will place the comments twice) 128 | 129 | So if you want to add any new comments you should first `join` the comments and then extract them. 130 | 131 | ## Supported languages 132 | 133 | This tool uses `TreeSitter`. For every language the `AST` (Abstract Syntax Tree) has different nodes. 134 | For this reason every language needs to be covered separately. 135 | 136 | - [x] Python 137 | 138 | ## Tests 139 | 140 | This tool will have many edge cases to cover. The tests for now do not need to pass any threshold, but are more a record of edge cases to cover in the future. 141 | 142 | ## TODO 143 | 144 | - [x] Make it pip installable 145 | - [ ] Fix double join 146 | - [ ] When extracting, compare with current `JSON` 147 | - [ ] Show abandoned comments 148 | - [ ] Show deleted comments 149 | - [ ] When `join` finds an abandoned comment, inform the user. Then suggest a `--force` flag. 150 | - [ ] Allow for a path to a subdirectory 151 | - [ ] Figure out how to install TreeSitter per language 152 | - [ ] Add Git hook 153 | - [ ] Handle a new comment on an already `extract`ed file 154 | - [ ] Create `join-extract` command 155 | - [ ] Write the path to the file as relative to the repo 156 | 157 | ## Future plans 158 | 159 | ## Summary of 100 commits challenge 160 | 161 | It is really hard to be consistent in anything.
But being consistent is one thing; keeping a project going through various changes is another. 162 | Having strict rules that require you to add something to your project might be beneficial. It keeps you thinking about new changes and helps you remember what is already there. However, it also has downsides. For me, it was challenging to complete larger tasks. 163 | In the end, I am happy that I tried to finish this challenge and create the tool that had been in the back of my mind for a long time. I applied changes as needed. 164 | 165 | -------------------------------------------------------------------------------- /faint/__init__.py: -------------------------------------------------------------------------------- 1 | """Tool to manage your comments in code!""" 2 | 3 | __version__ = "0.1" 4 | -------------------------------------------------------------------------------- /faint/__main__.py: -------------------------------------------------------------------------------- 1 | from faint import cli 2 | 3 | 4 | def main(): 5 | cli.app() 6 | 7 | 8 | if __name__ == "__main__": 9 | main() 10 | -------------------------------------------------------------------------------- /faint/branch_structure.py: -------------------------------------------------------------------------------- 1 | import bisect 2 | import json 3 | from dataclasses import asdict, dataclass 4 | from pathlib import Path 5 | from typing import List 6 | 7 | import git 8 | from tree_sitter import Node, Tree 9 | from utils import get_tree 10 | 11 | 12 | @dataclass 13 | class Position: 14 | row: int 15 | column: int 16 | 17 | 18 | @dataclass 19 | class Comment: 20 | start: Position 21 | text: str 22 | branch: list[str] 23 | 24 | 25 | @dataclass 26 | class CommentsStruct: 27 | comments: List[Comment] 28 | deleted_lines: List[int] 29 | commit_sha: str 30 | file_name: str 31 | 32 | 33 | def collect_comment_nodes(tree: Tree): 34 | comment_nodes = [] 35 | 36 | def _collect_comment_ranges(node: Node): 37 | if node.type == "comment": 38 | branch = [] 39 | curr = node 40 | print(curr, tree.root_node) 41 | while curr is not None and curr != tree.root_node: 42 | print(curr) 43 | branch.append(str(curr.text, encoding="utf8")) 44 | # branch.append(curr.sexp) 45 | curr = curr.parent 46 | comment_nodes.append( 47 | ( 48 | Comment( 49 | start=Position(*node.start_point), 50 | text=str(node.text, encoding="utf-8"), 51 | branch=branch, 52 | ), 53 | node, 54 | ) 55 | ) 56 | else: 57 | for child in node.children: 58 | _collect_comment_ranges(child) 59 | 60 | _collect_comment_ranges(tree.root_node) 61 | 62 | return comment_nodes 63 | 64 | 65 | def remove_comments(tree, lines): 66 | # Collect ranges for comments 67 | comments_nodes: list[tuple[Comment, Node]] = collect_comment_nodes(tree) 68 | 69 | deleted_lines = [] 70 | 71 | # Remove comments by replacing them with spaces (to preserve formatting) 72 | for comment, node in reversed(comments_nodes): # Reverse to avoid offset issues 73 | start_p = comment.start 74 | 75 | # comment = Comment( 76 | # start=Position(*start_p), text=lines[start_p[0]][start_p[1] : end_p[1]] 77 | # ) 78 | 79 | # TODO: here was edge case, needs to be watched for more 80 | if ( 81 | node.parent is not None 82 | and node.parent.start_point[0] != start_p.row 83 | and node.prev_sibling is not None 84 | and node.prev_sibling.start_point[0] != start_p.row 85 | ) or (node.prev_sibling is None): 86 | lines[start_p.row] = "" 87 | bisect.insort(deleted_lines, start_p.row) 88 | else: 89 | lines[start_p.row] = lines[start_p.row][: 
start_p.column].rstrip() + "\n" 90 | 91 | lines = [x for x in lines if x != ""] 92 | 93 | return [comment for comment, _ in reversed(comments_nodes)], deleted_lines, lines 94 | 95 | 96 | def get_commit_sha(): 97 | repo = git.Repo(search_parent_directories=True) 98 | sha = repo.head.object.hexsha 99 | return sha 100 | 101 | 102 | def extract_comments(input_file_path, source_code, lines): 103 | tree = get_tree(source_code) 104 | comments, deleted_lines, new_lines = remove_comments(tree, lines) 105 | 106 | out_data = CommentsStruct(comments, deleted_lines, get_commit_sha(), input_file_path) 107 | 108 | return new_lines, out_data 109 | 110 | 111 | # Example of reading, 112 | # processing, and writing the file 113 | # also three line comment 114 | def main(input_file_path: Path, output_file_path: Path, output_comments_file_path: Path): 115 | with open(input_file_path, "r", encoding="utf-8") as file: 116 | source_code = file.read() 117 | 118 | with open(input_file_path, "r", encoding="utf-8") as file: 119 | lines = file.readlines() 120 | 121 | new_lines, out_data = extract_comments(str(input_file_path), source_code, lines) 122 | 123 | with open(output_file_path, "w", encoding="utf-8") as file: 124 | file.writelines(new_lines) 125 | 126 | with open(output_comments_file_path, "w", encoding="utf-8") as file: 127 | json.dump(asdict(out_data), file, indent=4) 128 | 129 | 130 | if __name__ == "__main__": 131 | input_file_path = Path("./tests/cases/happy_path/with_comments.py") 132 | output_file_path = Path("./tests/cases/happy_path/no_comments.py") 133 | output_comments_file_path = Path("./tests/cases/happy_path/comments.json") 134 | main(input_file_path, output_file_path, output_comments_file_path) 135 | -------------------------------------------------------------------------------- /faint/cli.py: -------------------------------------------------------------------------------- 1 | from pathlib import Path 2 | from typing import Annotated 3 | 4 | import typer 5 | 6 | from faint import diff_trees, extract_comments, join_comments 7 | from faint.utils import compare_files, get_tree 8 | 9 | app = typer.Typer(no_args_is_help=True) 10 | 11 | FaintFile = Annotated[ 12 | Path, 13 | typer.Argument( 14 | exists=True, 15 | file_okay=True, 16 | dir_okay=False, 17 | writable=False, 18 | readable=True, 19 | resolve_path=True, 20 | ), 21 | ] 22 | 23 | 24 | @app.command() 25 | def extract(file: FaintFile): 26 | """ 27 | Extract comments from choosen file 28 | """ 29 | json_file = file.with_name(f"comments_{file.stem}.json") 30 | 31 | extract_comments.main(file.absolute(), file.absolute(), json_file) 32 | print(f"{file.stem} extracted to {json_file}") 33 | print("Some stats could be shown here") 34 | 35 | 36 | @app.command() 37 | def join(file: FaintFile): 38 | """ 39 | Join comments with source file 40 | """ 41 | json_file = file.with_name(f"comments_{file.stem}.json") 42 | if not json_file.exists(): 43 | raise typer.BadParameter("File does not exist.") 44 | 45 | if compare_files(json_file, file): 46 | join_comments.main(file.absolute(), file.absolute(), json_file) 47 | print(f"{file.stem} is joined with {json_file.stem}") 48 | print("Some stats could be shown here") 49 | else: 50 | diff_trees.main_between_commits(file, json_file) 51 | print("Difficult case") 52 | 53 | 54 | @app.command() 55 | def list_comments(file: FaintFile): 56 | """ 57 | List all comments existing in code 58 | """ 59 | with open(file, "r", encoding="utf-8") as f: 60 | source_code = f.read() 61 | tree = get_tree(source_code) 62 | 63 | 
comment_nodes = extract_comments.collect_comment_nodes(tree) 64 | 65 | for comment in comment_nodes: 66 | print(f"line {comment.start_point[0]}:\n{comment.text.decode('utf8')}") 67 | 68 | 69 | if __name__ == "__main__": 70 | app() 71 | -------------------------------------------------------------------------------- /faint/diff_trees.py: -------------------------------------------------------------------------------- 1 | import json 2 | from dataclasses import dataclass 3 | from pathlib import Path 4 | from typing import List, Optional, cast 5 | 6 | from tree_sitter import Node, Parser 7 | 8 | from faint.utils import get_file_bytes_by_commit_sha, load_language 9 | 10 | 11 | @dataclass 12 | class LeafNode: 13 | text: str 14 | line: int 15 | comment: bool = False 16 | marked: bool = False 17 | alone: bool = True 18 | node: Optional["LeafNode"] = None 19 | column: int = 0 20 | below_comment: Optional["LeafNode"] = None 21 | 22 | def __eq__(self, other): 23 | return self.text == other.text 24 | 25 | 26 | @dataclass 27 | class MissingComments: 28 | comment: LeafNode 29 | target_node: Optional[LeafNode] = None 30 | 31 | 32 | def serialize_tree(node: Node, leaf_nodes: list[LeafNode]) -> None: 33 | # It contains children -> it is "leaf" 34 | if node.child_count > 0: 35 | for child in node.children: 36 | serialize_tree(child, leaf_nodes) 37 | 38 | else: 39 | x = LeafNode(node.text.decode("utf-8"), node.start_point[0], column=node.start_point[1]) 40 | 41 | if node.type == "comment": 42 | x.comment = True 43 | if leaf_nodes: 44 | # If code is in the same line with comment 45 | if leaf_nodes[-1].line == x.line: 46 | x.alone = False 47 | x.node = leaf_nodes[-1] 48 | leaf_nodes[-1].marked = True 49 | leaf_nodes[-1].node = x 50 | 51 | # If previous node is also comment we group them 52 | elif leaf_nodes[-1].comment: 53 | x.below_comment = leaf_nodes[-1] 54 | 55 | else: 56 | # if previous leaf is comment we mark this node and attach comment node 57 | if leaf_nodes and leaf_nodes[-1].comment and leaf_nodes[-1].alone: 58 | x.marked = True 59 | leaf_nodes[-1].node = x 60 | x.node = leaf_nodes[-1] 61 | 62 | leaf_nodes.append(x) 63 | 64 | 65 | def get_serialized_tree_bytes(file: bytes, parser: Parser) -> list[LeafNode]: 66 | tree = parser.parse(file) 67 | 68 | serialized_tree: list[LeafNode] = [] 69 | serialize_tree(tree.root_node, serialized_tree) 70 | 71 | return serialized_tree 72 | 73 | 74 | def lcs(tree_a: list[LeafNode], tree_b: list[LeafNode]) -> list[list[int]]: 75 | m, n = len(tree_a), len(tree_b) 76 | matrix = [[0] * (n + 1) for _ in range(m + 1)] 77 | for i in range(1, m + 1): 78 | for j in range(1, n + 1): 79 | if tree_a[i - 1] == tree_b[j - 1]: 80 | matrix[i][j] = matrix[i - 1][j - 1] + 1 81 | else: 82 | matrix[i][j] = max(matrix[i][j - 1], matrix[i - 1][j]) 83 | return matrix 84 | 85 | 86 | def backtrack( 87 | matrix: list[list[int]], 88 | tree_a: list[LeafNode], 89 | tree_b: list[LeafNode], 90 | i: int, 91 | j: int, 92 | ) -> list[MissingComments]: 93 | # Terminate 94 | if i == 0 or j == 0: 95 | return [] 96 | 97 | # If we have this node in both trees 98 | elif tree_a[i - 1] == tree_b[j - 1]: 99 | added = backtrack(matrix, tree_a, tree_b, i - 1, j - 1) 100 | if tree_a[i - 1].marked: 101 | if tree_a[i - 1].node is not None: 102 | val = cast(LeafNode, tree_a[i - 1].node) 103 | added.append(MissingComments(target_node=tree_b[j - 1], comment=val)) 104 | return added 105 | 106 | else: 107 | # If this node is only in newer tree 108 | if matrix[i][j - 1] > matrix[i - 1][j]: 109 | added = 
backtrack(matrix, tree_a, tree_b, i, j - 1) 110 | # If this node is only in old tree 111 | else: 112 | added = backtrack(matrix, tree_a, tree_b, i - 1, j) 113 | # This is node that have comment but we do not know where to put it 114 | if tree_a[i - 1].marked: 115 | if tree_a[i - 1].node is not None: 116 | val = cast(LeafNode, tree_a[i - 1].node) 117 | added.append(MissingComments(target_node=tree_b[j - 1], comment=val)) 118 | return added 119 | 120 | 121 | def backtrack_add_remove(matrix, tree_a, tree_b, i, j): 122 | if i == 0 or j == 0: 123 | return [], [], [] 124 | elif tree_a[i - 1] == tree_b[j - 1]: 125 | added, removed, common = backtrack_add_remove(matrix, tree_a, tree_b, i - 1, j - 1) 126 | common.append(tree_a[i - 1]) 127 | return added, removed, common 128 | else: 129 | if matrix[i][j - 1] > matrix[i - 1][j]: 130 | added, removed, common = backtrack_add_remove(matrix, tree_a, tree_b, i, j - 1) 131 | added.append(tree_b[j - 1]) 132 | else: 133 | added, removed, common = backtrack_add_remove(matrix, tree_a, tree_b, i - 1, j) 134 | removed.append(tree_a[i - 1]) 135 | return added, removed, common 136 | 137 | 138 | def display_diff(added: List[MissingComments]): 139 | # added, removed, common = backtrack( 140 | 141 | for item in added: 142 | if item.target_node is not None: 143 | if item.comment.alone: 144 | print(f"line: {item.target_node.line}\n{item.comment.text}\n{item.target_node.text}") 145 | else: 146 | print(f"line: {item.target_node.line}\n{item.target_node.text} {item.comment.text}") 147 | else: 148 | print(f"abandoned: {item.comment.text}") 149 | print() 150 | 151 | 152 | # NOTE: Remember it is modified in place 153 | def apply_missing_comments(content: list[str], diffs: list[MissingComments]): 154 | # We apply from last line to first 155 | # So we do not move 156 | shift = 0 157 | 158 | for item in diffs: 159 | # This is abandoned 160 | if item.target_node is None: 161 | continue 162 | 163 | if item.comment.alone: 164 | # Apply possibly grouped comments 165 | lines = [] 166 | curr = item.comment 167 | while curr.below_comment: 168 | lines.append(curr) 169 | curr = curr.below_comment 170 | lines.append(curr) 171 | 172 | row = item.target_node.line 173 | for line in reversed(lines): 174 | x = line.text 175 | if x[-1] != "\n": 176 | x = x + "\n" 177 | content.insert(row + shift, " " * line.column + x) 178 | shift += 1 179 | 180 | else: 181 | if len(content) <= item.target_node.line + shift: 182 | # Should not happen 183 | print( 184 | "Target node have line higher that file length", 185 | item.target_node.line, 186 | shift, 187 | item.comment.text, 188 | ) 189 | continue 190 | 191 | x = content[item.target_node.line + shift][:-1] 192 | content[item.target_node.line + shift] = x + " " + item.comment.text + "\n" 193 | 194 | return content 195 | 196 | 197 | def find_missing_comments(tree_a: list[LeafNode], tree_b: list[LeafNode]) -> list[MissingComments]: 198 | lcs_sequence = lcs(tree_a, tree_b) 199 | 200 | added = backtrack( 201 | lcs_sequence, 202 | tree_a, 203 | tree_b, 204 | len(tree_a), 205 | len(tree_b), 206 | ) 207 | return added 208 | 209 | 210 | def main(file_in_1: Path, file_in_2: Path, file_out: Path): 211 | parser = load_language() 212 | 213 | with open(file_in_1, "rb") as file: 214 | file_bytes_1 = file.read() 215 | with open(file_in_2, "rb") as file: 216 | file_bytes_2 = file.read() 217 | 218 | tree1 = get_serialized_tree_bytes(file_bytes_1, parser) 219 | tree2 = get_serialized_tree_bytes(file_bytes_2, parser) 220 | 221 | added = find_missing_comments(tree1, tree2) 
222 | 223 | display_diff(added) 224 | with open(file_in_2, "r") as file: 225 | origin_file_data = file.readlines() 226 | 227 | content = apply_missing_comments(origin_file_data, added) 228 | 229 | with open(file_out, "w", encoding="utf-8") as output_file: 230 | output_file.writelines(content) 231 | 232 | 233 | def main_between_commits(file: Path, json_file: Path): 234 | parser = load_language() 235 | with open(json_file, "r", encoding="utf-8") as f: 236 | comments_data = json.load(f) 237 | 238 | original_file = get_file_bytes_by_commit_sha(file, comments_data["file_metadata"]["commit_sha"]) 239 | with open(file, "rb") as f: 240 | file_bytes = f.read() 241 | 242 | tree1 = get_serialized_tree_bytes(original_file, parser) 243 | tree2 = get_serialized_tree_bytes(file_bytes, parser) 244 | 245 | added = find_missing_comments(tree1, tree2) 246 | 247 | with open(file, "r") as f: 248 | origin_file_data = f.readlines() 249 | 250 | content = apply_missing_comments(origin_file_data, added) 251 | 252 | with open(file, "w", encoding="utf-8") as f: 253 | f.writelines(content) 254 | 255 | 256 | if __name__ == "__main__": 257 | file_a = Path("./tests/cases/happy_path/a.py") 258 | file_b = Path("./tests/cases/happy_path/b.py") 259 | file_out = Path("./tests/cases/happy_path/out.py") 260 | main(file_a, file_b, file_out) 261 | -------------------------------------------------------------------------------- /faint/extract_comments.py: -------------------------------------------------------------------------------- 1 | import bisect 2 | import json 3 | from dataclasses import asdict, dataclass 4 | from pathlib import Path 5 | from typing import List, Optional 6 | 7 | from tree_sitter import Node 8 | 9 | from faint.utils import get_commit_sha, get_lines_hash, get_tree 10 | 11 | 12 | @dataclass 13 | class Position: 14 | row: int 15 | column: int 16 | 17 | 18 | @dataclass 19 | class Comment: 20 | start: Position 21 | text: str 22 | is_inline: bool 23 | 24 | 25 | @dataclass 26 | class FileMetadata: 27 | commit_sha: str 28 | file_name: str 29 | file_sha: str 30 | 31 | 32 | @dataclass 33 | class CommentsStruct: 34 | comments: List[Comment] 35 | deleted_lines: List[int] 36 | file_metadata: Optional[FileMetadata] = None 37 | 38 | 39 | def collect_comment_nodes(tree): 40 | comment_nodes = [] 41 | 42 | def _collect_comment_ranges(node: Node): 43 | if node.type == "comment": 44 | comment_nodes.append(node) 45 | else: 46 | for child in node.children: 47 | _collect_comment_ranges(child) 48 | 49 | _collect_comment_ranges(tree.root_node) 50 | 51 | return comment_nodes 52 | 53 | 54 | def remove_comments(tree, lines): 55 | # Collect ranges for comments 56 | comment_nodes = collect_comment_nodes(tree) 57 | 58 | comments = [] 59 | deleted_lines = [] 60 | 61 | # Remove comments by replacing them with spaces (to preserve formatting) 62 | for node in reversed(comment_nodes): # Reverse to avoid offset issues 63 | start_p, end_p = node.start_point, node.end_point 64 | is_inline = False 65 | text = lines[start_p[0]][start_p[1] : end_p[1]] 66 | 67 | # TODO: here was edge case, needs to be watched for more 68 | if (node.parent.start_point[0] != start_p[0] and node.prev_sibling.start_point[0] != start_p[0]) or ( 69 | node.prev_sibling is None 70 | ): 71 | lines[start_p[0]] = "" 72 | bisect.insort(deleted_lines, start_p[0]) 73 | else: 74 | lines[start_p[0]] = lines[start_p[0]][: start_p[1]].rstrip() + "\n" 75 | is_inline = True 76 | 77 | comment = Comment( 78 | start=Position(*start_p), 79 | text=text, 80 | is_inline=is_inline, 81 | ) 82 | 
comments.append(comment) 83 | 84 | lines = [x for x in lines if x != ""] 85 | 86 | return comments, deleted_lines, lines 87 | 88 | 89 | def extract_comments(input_file_path, source_code, lines): 90 | tree = get_tree(source_code) 91 | comments, deleted_lines, new_lines = remove_comments(tree, lines) 92 | 93 | metadata = FileMetadata(get_commit_sha(), input_file_path, get_lines_hash(lines)) 94 | 95 | out_data = CommentsStruct(comments, deleted_lines, metadata) 96 | 97 | return new_lines, out_data 98 | 99 | 100 | # Example of reading, 101 | # processing, and writing the file 102 | # also three line comment 103 | def main(input_file_path: Path, output_file_path: Path, output_comments_file_path: Path): 104 | with open(input_file_path, "r", encoding="utf-8") as file: 105 | source_code = file.read() 106 | 107 | with open(input_file_path, "r", encoding="utf-8") as file: 108 | lines = file.readlines() 109 | 110 | new_lines, out_data = extract_comments(str(input_file_path), source_code, lines) 111 | 112 | with open(output_file_path, "w", encoding="utf-8") as file: 113 | file.writelines(new_lines) 114 | 115 | with open(output_comments_file_path, "w", encoding="utf-8") as file: 116 | json.dump(asdict(out_data), file, indent=4) 117 | 118 | 119 | if __name__ == "__main__": 120 | input_file_path = Path("./tests/cases/happy_path/with_comments.py") 121 | output_file_path = Path("./tests/cases/happy_path/no_comments.py") 122 | output_comments_file_path = Path("./tests/cases/happy_path/comments.json") 123 | main(input_file_path, output_file_path, output_comments_file_path) 124 | -------------------------------------------------------------------------------- /faint/join_comments.py: -------------------------------------------------------------------------------- 1 | import json 2 | from pathlib import Path 3 | 4 | 5 | def apply_comments_to_file(comments_data, lines): 6 | # NOTE: it modify original lines. Would need deep copy but might not be necessary 7 | adjusted_lines = lines 8 | 9 | for line_to_append in comments_data["deleted_lines"]: 10 | adjusted_lines.insert(line_to_append, "") 11 | 12 | for comment in comments_data["comments"]: 13 | tmp = comment["start"] 14 | line_number, column = tmp["row"], tmp["column"] 15 | comment_text = comment["text"] 16 | is_inline = comment["is_inline"] 17 | 18 | # NOTE: line is empty 19 | if not is_inline: 20 | adjusted_lines[line_number] = " " * column + comment_text + "\n" 21 | 22 | # NOTE: apply on right side of the code. Removes trailing spaces and apply exactly 2 spaces. 
23 | elif len(adjusted_lines[line_number][:-1]) <= column: 24 | adjusted_lines[line_number] = adjusted_lines[line_number][:-1].rstrip() + " " + comment_text + "\n" 25 | 26 | else: 27 | raise ValueError("It should not happen.") 28 | 29 | return adjusted_lines 30 | 31 | 32 | def main(source_file_path: Path, output_file_path: Path, json_comments: Path): 33 | with open(json_comments, "r", encoding="utf-8") as json_file: 34 | comments_data = json.load(json_file) 35 | 36 | with open(source_file_path, "r", encoding="utf-8") as file: 37 | lines = file.readlines() 38 | 39 | done = apply_comments_to_file(comments_data, lines) 40 | 41 | with open(output_file_path, "w", encoding="utf-8") as output_file: 42 | output_file.writelines(done) 43 | 44 | 45 | if __name__ == "__main__": 46 | source_file_path = Path("main_no_comments.py") 47 | output_file_path = Path("main_with_comments.py") 48 | json_comments = Path("just_comments.json") 49 | main(source_file_path, output_file_path, json_comments) 50 | -------------------------------------------------------------------------------- /faint/utils.py: -------------------------------------------------------------------------------- 1 | import hashlib 2 | import json 3 | from pathlib import Path 4 | from typing import List 5 | 6 | import git 7 | import tree_sitter_python as tspython 8 | from tree_sitter import Language, Parser 9 | 10 | 11 | def load_language(): 12 | PY_LANGUAGE = Language(tspython.language()) 13 | parser = Parser() 14 | parser.language = PY_LANGUAGE 15 | return parser 16 | 17 | 18 | # Load the language library (Adjust the path to your compiled language library) 19 | def get_tree(source_code): 20 | parser = load_language() 21 | 22 | tree = parser.parse(bytes(source_code, "utf8")) 23 | return tree 24 | 25 | 26 | def get_file_hash(file: Path): 27 | with open(file, "rb") as f: 28 | digest = hashlib.file_digest(f, "sha256") 29 | return digest.hexdigest() 30 | 31 | 32 | def get_lines_hash(lines: List[str]): 33 | joined_lines = "".join(lines).encode("utf-8") 34 | hasher = hashlib.sha256() 35 | hasher.update(joined_lines) 36 | 37 | return hasher.hexdigest() 38 | 39 | 40 | def compare_files(json_file: Path, file: Path) -> bool: 41 | file_hash = get_file_hash(file) 42 | with open(json_file, "r", encoding="utf-8") as f: 43 | comments_data = json.load(f) 44 | 45 | return file_hash == comments_data["file_metadata"]["file_sha"] 46 | 47 | 48 | def get_file_bytes_by_commit_sha(file: Path, commit_sha: str) -> bytes: 49 | repo = git.Repo(search_parent_directories=True) 50 | 51 | # Get the commit 52 | commit = repo.commit(commit_sha) 53 | 54 | # Get the file content at the specific commit 55 | 56 | repo_path = Path(repo.working_tree_dir).resolve() 57 | file_path = Path(file).resolve() 58 | 59 | file_content = commit.tree / str(file_path.relative_to(repo_path)) 60 | 61 | # Print the content 62 | # return file_content.data_stream.read().decode("utf-8") 63 | return file_content.data_stream.read() 64 | 65 | 66 | def get_commit_sha(): 67 | repo = git.Repo(search_parent_directories=True) 68 | sha = repo.head.object.hexsha 69 | return sha 70 | -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [build-system] 2 | requires = ["flit_core >=3.2,<4"] 3 | build-backend = "flit_core.buildapi" 4 | 5 | [project] 6 | name = "faint" 7 | authors = [{ name = "Cvaniak", email = "igna.cwaniak@gmail.com" }] 8 | readme = "README.md" 9 | license = { file = 
"LICENSE" } 10 | classifiers = ["License :: OSI Approved :: MIT License"] 11 | dynamic = ["version", "description"] 12 | requires-python = ">=3.7" 13 | dependencies = [ 14 | "typer==0.12.3", 15 | "tree-sitter-python==0.21.0", 16 | "tree-sitter==0.22.3", 17 | "GitPython==3.1.43", 18 | ] 19 | 20 | 21 | [project.scripts] 22 | faint = "faint.__main__:main" 23 | 24 | [tool.ruff] 25 | extend-exclude = ["tests/cases"] 26 | line-length = 120 27 | 28 | [tool.ruff.lint] 29 | exclude = ["tests/cases/*"] 30 | select = ["E", "F", "I", "B", "A"] 31 | 32 | [tool.ruff.format] 33 | exclude = ["tests/cases/*"] 34 | 35 | [tool.mypy] 36 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | tree-sitter 2 | gitpython 3 | pytest 4 | tree-sitter-python 5 | 6 | typer 7 | pdbpp 8 | debugpy 9 | 10 | ruff 11 | isort 12 | flake8 13 | mypy 14 | pre-commit 15 | -------------------------------------------------------------------------------- /tests/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Cvaniak/Consistent/abd834c2559df0978ef4d031404362246bd38e2b/tests/__init__.py -------------------------------------------------------------------------------- /tests/cases/happy_path/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Cvaniak/Consistent/abd834c2559df0978ef4d031404362246bd38e2b/tests/cases/happy_path/__init__.py -------------------------------------------------------------------------------- /tests/cases/happy_path/comments.json: -------------------------------------------------------------------------------- 1 | { 2 | "comments": [ 3 | { 4 | "start": { 5 | "row": 18, 6 | "column": 0 7 | }, 8 | "text": "# zys above", 9 | "is_inline": false 10 | }, 11 | { 12 | "start": { 13 | "row": 15, 14 | "column": 14 15 | }, 16 | "text": "# Pass inline", 17 | "is_inline": true 18 | }, 19 | { 20 | "start": { 21 | "row": 13, 22 | "column": 4 23 | }, 24 | "text": "# Comment above", 25 | "is_inline": false 26 | }, 27 | { 28 | "start": { 29 | "row": 12, 30 | "column": 10 31 | }, 32 | "text": "# class A inline", 33 | "is_inline": true 34 | }, 35 | { 36 | "start": { 37 | "row": 8, 38 | "column": 0 39 | }, 40 | "text": "# fox above", 41 | "is_inline": false 42 | }, 43 | { 44 | "start": { 45 | "row": 4, 46 | "column": 12 47 | }, 48 | "text": "# tricky inline", 49 | "is_inline": true 50 | }, 51 | { 52 | "start": { 53 | "row": 0, 54 | "column": 0 55 | }, 56 | "text": "# Test above", 57 | "is_inline": false 58 | } 59 | ], 60 | "deleted_lines": [ 61 | 0, 62 | 8, 63 | 13, 64 | 18 65 | ], 66 | "file_metadata": { 67 | "commit_sha": "fe85b31f7a1bae98ab05fda142a263b1b3bdefcb", 68 | "file_name": "/home/cvaniak/Code/Cvaniak/Consistent/tests/cases/happy_path/with_comments.py", 69 | "file_sha": "ca43efba412cb8d653bca85bc735847140cec4fee867195ae8970e2c538874f9" 70 | } 71 | } 72 | -------------------------------------------------------------------------------- /tests/cases/happy_path/for_diff.py: -------------------------------------------------------------------------------- 1 | # Test above 2 | def foo(): ... 3 | 4 | 5 | def bar(): # tricky inline 6 | a = 10 7 | pass 8 | ... 9 | 10 | 11 | # this will be abandoned 12 | def this_will_be_abandoned(): 13 | pass 14 | 15 | 16 | # fox above 17 | def fox(): ... 
18 | 19 | 20 | class A: # class A inline 21 | # Comment above 22 | def __init__(self) -> None: 23 | pass # Pass inline 24 | 25 | 26 | # zys above 27 | def zys(): ... 28 | -------------------------------------------------------------------------------- /tests/cases/happy_path/no_comments.py: -------------------------------------------------------------------------------- 1 | def foo(): ... 2 | 3 | 4 | def bar(): 5 | ... 6 | 7 | 8 | def fox(): ... 9 | 10 | 11 | class A: 12 | def __init__(self) -> None: 13 | pass 14 | 15 | 16 | def zys(): ... 17 | -------------------------------------------------------------------------------- /tests/cases/happy_path/with_comments.py: -------------------------------------------------------------------------------- 1 | # Test above 2 | def foo(): ... 3 | 4 | 5 | def bar(): # tricky inline 6 | ... 7 | 8 | 9 | # fox above 10 | def fox(): ... 11 | 12 | 13 | class A: # class A inline 14 | # Comment above 15 | def __init__(self) -> None: 16 | pass # Pass inline 17 | 18 | 19 | # zys above 20 | def zys(): ... 21 | -------------------------------------------------------------------------------- /tests/cases/unknown_problem_1/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Cvaniak/Consistent/abd834c2559df0978ef4d031404362246bd38e2b/tests/cases/unknown_problem_1/__init__.py -------------------------------------------------------------------------------- /tests/cases/unknown_problem_1/comments.json: -------------------------------------------------------------------------------- 1 | { 2 | "comments": [ 3 | { 4 | "start": { 5 | "row": 2, 6 | "column": 40 7 | }, 8 | "text": "# broken symlink pointing to itself", 9 | "is_inline": true 10 | }, 11 | { 12 | "start": { 13 | "row": 1, 14 | "column": 30 15 | }, 16 | "text": "# fix for bpo-35306", 17 | "is_inline": true 18 | }, 19 | { 20 | "start": { 21 | "row": 0, 22 | "column": 26 23 | }, 24 | "text": "# drive exists but is not accessible", 25 | "is_inline": true 26 | } 27 | ], 28 | "deleted_lines": [], 29 | "file_metadata": { 30 | "commit_sha": "fe85b31f7a1bae98ab05fda142a263b1b3bdefcb", 31 | "file_name": "/home/cvaniak/Code/Cvaniak/Consistent/tests/cases/unknown_problem_1/with_comments.py", 32 | "file_sha": "f61ead675ffbc0c81d0e3c76bc3912688838608cccacb2d612850bf1337aaf37" 33 | } 34 | } 35 | -------------------------------------------------------------------------------- /tests/cases/unknown_problem_1/no_comments.py: -------------------------------------------------------------------------------- 1 | _WINERROR_NOT_READY = 21 2 | _WINERROR_INVALID_NAME = 123 3 | _WINERROR_CANT_RESOLVE_FILENAME = 1921 4 | 5 | def test(): 6 | ... 7 | -------------------------------------------------------------------------------- /tests/cases/unknown_problem_1/with_comments.py: -------------------------------------------------------------------------------- 1 | _WINERROR_NOT_READY = 21 # drive exists but is not accessible 2 | _WINERROR_INVALID_NAME = 123 # fix for bpo-35306 3 | _WINERROR_CANT_RESOLVE_FILENAME = 1921 # broken symlink pointing to itself 4 | 5 | def test(): 6 | ... 
7 | -------------------------------------------------------------------------------- /tests/test_diff.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | 3 | from faint.diff_trees import ( 4 | apply_missing_comments, 5 | find_missing_comments, 6 | get_serialized_tree_bytes, 7 | ) 8 | from faint.utils import load_language 9 | from tests.utils import get_lines_from_file 10 | 11 | 12 | @pytest.fixture 13 | def parser(): 14 | parser = load_language() 15 | return parser 16 | 17 | 18 | @pytest.fixture 19 | def old_tree(path, parser): 20 | old_file = path + "for_diff.py" 21 | with open(old_file, "rb") as f: 22 | file_bytes = f.read() 23 | return get_serialized_tree_bytes(file_bytes, parser) 24 | 25 | 26 | @pytest.fixture 27 | def new_tree(path, parser): 28 | new_file = path + "no_comments.py" 29 | with open(new_file, "rb") as f: 30 | file_bytes = f.read() 31 | return get_serialized_tree_bytes(file_bytes, parser) 32 | 33 | 34 | class TestFindMissingComments: 35 | @pytest.mark.parametrize("path", ["./tests/cases/happy_path/"]) 36 | def test_happy_path(self, path: str, old_tree, new_tree): 37 | # given 38 | 39 | # when 40 | missing = find_missing_comments(old_tree, new_tree) 41 | 42 | # then 43 | # TODO: check if missing is valid 44 | assert missing 45 | 46 | 47 | class TestApplyMissingComments: 48 | @pytest.mark.parametrize("path", ["./tests/cases/happy_path/"]) 49 | def test_happy_path(self, path: str, old_tree, new_tree): 50 | # given 51 | missing = find_missing_comments(old_tree, new_tree) 52 | 53 | new_file_nc = path + "no_comments.py" 54 | new_lines_nc = get_lines_from_file(new_file_nc) 55 | 56 | new_file_c = path + "with_comments.py" 57 | expected_lines = get_lines_from_file(new_file_c) 58 | 59 | # when 60 | new_lines = apply_missing_comments(new_lines_nc, missing) 61 | 62 | # then 63 | assert new_lines == expected_lines 64 | -------------------------------------------------------------------------------- /tests/test_extract.py: -------------------------------------------------------------------------------- 1 | from dataclasses import asdict 2 | 3 | import pytest 4 | 5 | from faint.extract_comments import extract_comments 6 | from tests.utils import get_bytes_from_file, get_lines_from_file, load_json 7 | 8 | 9 | @pytest.mark.parametrize( 10 | "path,deleted_lines", 11 | [ 12 | ("./tests/cases/happy_path/", [0, 8, 13, 18]), 13 | ("./tests/cases/unknown_problem_1/", []), 14 | ], 15 | ) 16 | def test_happy_path(path: str, deleted_lines: list[int]): 17 | # given 18 | file_in, file_out = path + "with_comments.py", path + "no_comments.py" 19 | file_json = path + "comments.json" 20 | lines = get_lines_from_file(file_in) 21 | source_code = get_bytes_from_file(file_in) 22 | out_file = get_lines_from_file(file_out) 23 | data_out = load_json(file_json) 24 | 25 | # when 26 | new_lines, data = extract_comments(file_in, source_code, lines) 27 | 28 | # then 29 | assert new_lines == out_file 30 | assert asdict(data)["comments"] == data_out["comments"] 31 | assert data.deleted_lines == deleted_lines 32 | -------------------------------------------------------------------------------- /tests/test_join.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | 3 | from faint.join_comments import apply_comments_to_file 4 | from tests.utils import get_lines_from_file, load_json 5 | 6 | 7 | @pytest.mark.parametrize("path", ["./tests/cases/happy_path/"]) 8 | class TestJoin: 9 | def test_happy_path(self, path: str): 10 | # 
given 11 | file_in, file_out = path + "no_comments.py", path + "with_comments.py" 12 | file_json = path + "comments.json" 13 | lines_in = get_lines_from_file(file_in) 14 | lines_out = get_lines_from_file(file_out) 15 | comments_data = load_json(file_json) 16 | 17 | # when 18 | lines_with_applied_comments = apply_comments_to_file(comments_data, lines_in) 19 | 20 | # then 21 | assert lines_with_applied_comments == lines_out 22 | 23 | def test_double_join(self, path: str): 24 | # given 25 | file_in, file_out = path + "no_comments.py", path + "with_comments.py" 26 | file_json = path + "comments.json" 27 | lines_in = get_lines_from_file(file_in) 28 | lines_out = get_lines_from_file(file_out) 29 | comments_data = load_json(file_json) 30 | 31 | # when 32 | lines_with_applied_comments = apply_comments_to_file(comments_data, lines_in) 33 | lines_double_applied_comments = apply_comments_to_file(comments_data, lines_with_applied_comments) 34 | 35 | # then 36 | assert lines_double_applied_comments == lines_out 37 | -------------------------------------------------------------------------------- /tests/utils.py: -------------------------------------------------------------------------------- 1 | import json 2 | 3 | 4 | def get_bytes_from_file(file_path): 5 | with open(file_path, "r", encoding="utf-8") as file: 6 | source_code = file.read() 7 | return source_code 8 | 9 | 10 | def get_lines_from_file(file_path): 11 | with open(file_path, "r", encoding="utf-8") as file: 12 | lines = file.readlines() 13 | return lines 14 | 15 | 16 | def load_json(file_path): 17 | with open(file_path, "r", encoding="utf-8") as file: 18 | data_out = json.load(file) 19 | return data_out 20 | --------------------------------------------------------------------------------