├── .gitignore ├── .pre-commit-config.yaml ├── CHANGELOG.md ├── LICENSE ├── README.md ├── faint ├── __init__.py ├── __main__.py ├── branch_structure.py ├── cli.py ├── diff_trees.py ├── extract_comments.py ├── join_comments.py └── utils.py ├── pyproject.toml ├── requirements.txt └── tests ├── __init__.py ├── cases ├── happy_path │ ├── __init__.py │ ├── comments.json │ ├── for_diff.py │ ├── no_comments.py │ └── with_comments.py └── unknown_problem_1 │ ├── __init__.py │ ├── comments.json │ ├── no_comments.py │ └── with_comments.py ├── test_diff.py ├── test_extract.py ├── test_join.py └── utils.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Rust 2 | debug/ 3 | target/ 4 | Cargo.lock 5 | 6 | # Python 7 | tree-sitter-python/ 8 | build/ 9 | dist/ 10 | venv/ 11 | .venv/ 12 | main_no_comments.py 13 | just_comments.json 14 | with_comments_main.py 15 | main_with_comments.py 16 | out.py 17 | __pycache__ 18 | -------------------------------------------------------------------------------- /.pre-commit-config.yaml: -------------------------------------------------------------------------------- 1 | repos: 2 | - repo: https://github.com/pre-commit/pre-commit-hooks 3 | rev: v4.6.0 4 | hooks: 5 | - id: end-of-file-fixer 6 | - id: check-yaml 7 | 8 | - repo: https://github.com/charliermarsh/ruff-pre-commit 9 | rev: v0.4.5 10 | hooks: 11 | - id: ruff 12 | args: [--fix] 13 | - id: ruff-format 14 | 15 | - repo: https://github.com/pre-commit/mirrors-mypy 16 | rev: v1.10.0 17 | hooks: 18 | - id: mypy 19 | -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # Changelog 2 | 3 | ## Thought 4 | 5 | - The usage of `TreeSitter` in theory can help to identify the correct place for a comment. 6 | The problem is that it has different attributes for every supported programming language. 7 | - Even if the `Python` scripts are just a `PoC`, I want to write some tests that will allow me to catch edge cases. 8 | 9 | For now, all tests are done on `Python` files, as `TreeSitter` might work differently for other formats. 10 | 11 | ## Newest update 12 | 13 | ## 03.06.2024 14 | 15 | In progress with creating a demo. 16 | 17 | ## 18.05.2024 18 | 19 | While extracting comments we should add info on whether they are alone on a line or next to code. 20 | 21 | ## 15.05.2024 22 | 23 | There is a bug: if there is a multi-line comment, then in diff_tree only one line is attached. 24 | 25 | Also, the code works only on files in this repo (because we use commits from the current repo). 26 | We also need to handle the situation where there is no repo. 27 | 28 | ## 10.05.2024 29 | 30 | I won't be able to rewrite any of this in Rust as I planned initially. 31 | Instead I will focus on delivering a working flow in Python. 32 | 33 | I need to restructure my scripts and prepare them for pip install. 34 | 35 | ## 04.05.2024 36 | 37 | At first I was worried that the JSON needed to keep only the comment and minimal data. 38 | But with this limitation it might be really hard to have a good data structure. 39 | 40 | The new idea for the data structure is to keep the whole "branch" of the AST (Abstract Syntax Tree) next to the comment. 41 | In each file there will always be only unique "branches" like `class -> function -> variable`. It can be nested 42 | any number of times but at the end there will be a comment. That alone should tell us where the comment should be placed.
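A minimal sketch of what one such entry could look like (the values are made up for illustration, and the field names only loosely mirror the later `branch_structure.py` experiment; the exact schema is still open):

```python
# Hypothetical "branch" entry: the comment itself plus the chain of enclosing
# AST nodes, innermost first, which should be enough to place it again.
branch_entry = {
    "start": {"row": 15, "column": 8},
    "text": "# Pass inline",
    "branch": ["pass", "__init__", "A"],  # statement -> function -> class
}
```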
43 | 44 | ## 03.05.2024 45 | 46 | To address the problem of possibly double-injecting and double-extracting comments, I think the JSON needs to be a "DB" with all comments. 47 | We always need to assume that the JSON contains all comments, and we can just apply them to the code no matter whether a given comment is already there or not. 48 | To do so it might be helpful to think about a better format structure that would keep not a row in code but something semantic. 49 | Also, when the join algorithm is running it should first gather all the comments, throw them into the JSON, resolve duplicates and then apply the comments. 50 | 51 | ## 29.04.2024 52 | 53 | I ran `extract_comments.py` and `join_comments.py` one after another on different Python source files to find edge cases. 54 | If the source file is not the same after the two scripts, there might be an edge case. 55 | 56 | So far I found an edge case related to two variables defined in the global scope, which are under the `module` parent. 57 | 58 | ## 26.04.2024 59 | 60 | Another day without progress; I have no idea how to solve the problem of the many edge cases. 61 | 62 | ## 25.04.2024 63 | 64 | I wrote an extension that would work with a PREFIX. However, my conclusion is that it still does not solve most of the problems. 65 | Also, I am overthinking the situation where the user turns off comments, writes something and turns them on again (in a loop). 66 | 67 | For now I need to focus on finishing the base idea and writing down all needed features. 68 | 69 | ## 24.04.2024 70 | 71 | The double join problem made me think that we need to have some kind of control over the state. 72 | We can try to determine it on the fly or we can save it somewhere. 73 | In both cases the problem is that the user might mix the already saved comments with new ones, etc. 74 | It will always create problems like comments that are partially loaded or modified. 75 | 76 | To prevent that, I think we can mark the comments with a prefix like `# LOCAL: this is comment`. 77 | This way you see if the comments are loaded or not. You never delete comments that are not meant for deletion. 78 | You can have commands that resolve any problems better and more easily. They should also be easier to parse. 79 | 80 | ## 23.04.2024 81 | 82 | Running `join_comments.py` twice will modify the file twice. It needs to be solved. 83 | 84 | ## 19.04.2024 85 | 86 | The main idea is ready. The diff algorithm for sure needs tweaks and more tests, but it's something to fix later. 87 | Now I need a working flow. For this I will create a CLI tool with options to choose from. 88 | Along the way the implementation of the algorithms will probably change, as well as the JSON format and things I haven't thought about yet. 89 | 90 | ## 12.04.2024 91 | 92 | After the refactor and the first test case I will probably continue with this approach. Now it is time to create tests for other files. 93 | 94 | ## 08.04.2024 95 | 96 | After many hours I decided to check how `linting` tools manage tests. 97 | This won't only be helpful in the `PoC` but especially in the final solution, where the tests will be crucial. 98 | 99 | I decided to check the test code for `black` and `isort`. It is almost terrifying. 100 | 101 | ## 07.04.2024 102 | 103 | After a small refactor of `extract_comments.py` and `join_comments.py`, `diff_tree.py` also needs one. 104 | The biggest problem with these scripts is how to handle the files, so it will be easy to test, use, and put into the flow. 105 | 106 | ## 03.04.2024 107 | 108 | I need to rethink some design solutions. Nothing important today.
109 | 110 | ## 31.03.2024 111 | 112 | This is an Easter commit. So I can only offer an :egg:. 113 | 114 | ## 29.03.2024 115 | 116 | So I am not very happy with the current algorithm, but with some adjustments it should work well enough for a proof of concept. 117 | After the tweaks I would like to prepare a demo of how it can work in action. 118 | 119 | ## 24.03.2024 120 | 121 | To summarize a small research session - we use the `LCS` algorithm to find the longest common subsequence and then we see what is missing or what is new in the newer file. 122 | I will probably try to use the old file with the comments in it, and just ignore their absence in the newer one. 123 | 124 | ## 23.03.2024 125 | 126 | Some research on the most popular options to show the diff. 127 | It turns out that it is mostly `Longest Common Subsequence` (LCS). 128 | With that knowledge I want to test algorithms that will allow me to identify the same nodes between code versions. 129 | 130 | ## 22.03.2024 131 | 132 | Learning more about how to diff two `AST`s. 133 | I want a solution that will allow me to support users that forgot or intentionally did not load the comments into the code. 134 | In this situation I need to take the last commits and parse through all commits up to the current one, tracking the position of nodes. 135 | 136 | If all research fails, I will probably build a workflow that will inform the user about unresolved comments. 137 | 138 | ## 21.03.2024 139 | 140 | The flow is working: you can use `main.py` to remove comments and `join_them.py` to place them in the correct place again. 141 | Next I want to check if I can use `TreeSitter` to find how the code changed between commits. 142 | 143 | ## 18.03.2024 144 | 145 | In progress with modifying `join_them` and `main` so we remove empty lines after comments: 146 | 147 | ```python 148 | def foo(): 149 | # comment 150 | # comment 151 | ... 152 | ``` 153 | 154 | Will be: 155 | 156 | ```python 157 | def foo(): 158 | ... 159 | 160 | ``` 161 | 162 | Not: 163 | 164 | ```python 165 | def foo(): 166 | 167 | 168 | ... 169 | ``` 170 | 171 | ## 17.03.2024 172 | 173 | `join_them.py` fixed. 174 | 175 | Time to design the flow with the happy path and create a little demo. 176 | 177 | ## 15.03.2024 178 | 179 | The `join_them.py` must be fixed. 180 | 181 | ## 14.03.2024 182 | 183 | Research in progress on different comment standards. 184 | For example [TODO comments](https://github.com/stsewd/tree-sitter-comment). 185 | That basically means that we could create some special syntax for our comments, 186 | or wrap this tool around `TODO comments`. 187 | 188 | ## 13.03.2024 189 | 190 | I found [diffsitter](https://github.com/afnanenayet/diffsitter) which may help me 191 | to find corresponding nodes between commits. 192 | 193 | Not yet decided whether I will use it as a part of the system or learn how it works and implement part of that myself. 194 | 195 | ## 11.03.2024 196 | 197 | Today only some small tests without new results. 198 | 199 | ## 09.03.2024 200 | 201 | Day spent reading the Tree Sitter documentation. 202 | The new idea is to build the flow around a git hook. 203 | So the flow would go: 204 | 205 | ``` 206 | | Load comments | 207 | V 208 | | Write code and comments | 209 | V 210 | | Save file | 211 | V 212 | | Run comment remover | 213 | V 214 | | Comments saved in file | 215 | | with commit id attached | 216 | ``` 217 | 218 | Now, this flow is the happy path. We load the comments back into the same file from which we removed them. 219 | Problems begin if someone does not load the comments back right away. 220 | Then we can try to get the commit in which they were removed, 221 | load the old tree, load the current tree and find the corresponding nodes. 222 | Then we can apply the old comments to the new file.
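This recovery path is roughly what later became `main_between_commits` in `faint/diff_trees.py`. A rough sketch of the idea, reusing the helper names from `faint/diff_trees.py` and `faint/utils.py` (the wrapper function `recover_comments` is hypothetical and only illustrates the flow):

```python
from pathlib import Path

from faint.diff_trees import (
    apply_missing_comments,
    find_missing_comments,
    get_serialized_tree_bytes,
)
from faint.utils import get_file_bytes_by_commit_sha, load_language


def recover_comments(file: Path, commit_sha: str) -> list[str]:
    # Parse the file as it looked in the old commit (still containing comments)
    # and as it looks now (without them), then diff the two serialized trees.
    parser = load_language()
    old_tree = get_serialized_tree_bytes(get_file_bytes_by_commit_sha(file, commit_sha), parser)
    new_tree = get_serialized_tree_bytes(file.read_bytes(), parser)

    # Find comments whose anchor nodes still exist and re-insert them into the current lines.
    missing = find_missing_comments(old_tree, new_tree)
    current_lines = file.read_text(encoding="utf-8").splitlines(keepends=True)
    return apply_missing_comments(current_lines, missing)
```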
223 | 224 | As an addition, we can create CLI tool messages that inform about hanging comments or other problems. 225 | 226 | ## 08.03.2024 227 | 228 | At this moment comments are stored by their byte position in the file. 229 | The better approach I am trying to implement is to attach them via their connection with a node or a line in the code. 230 | In the version where we can store them by the corresponding node, we can then push them into the tree in a similar way. 231 | The way to decide which node to attach them to will be to determine whether they sit next to code on the same line or above another node. 232 | 233 | ## 06.03.2024 234 | 235 | In `poc` there are two programs: `main.py`, which removes comments from a file 236 | and puts them into a json file, and `join_them.py`, which merges the two files together again. 237 | 238 | Now I need more research to decide how to handle the flow. 239 | 240 | ## 03.03.2024 241 | 242 | So we have a working `comment remover`. Just a few lines of code and we can remove comments from a file. 243 | The final solution will probably be in `Rust`, but the `PoCs` I will do in `Python`. 244 | 245 | ## 02.03.2024 246 | 247 | So after more research I think that it may work similarly to a formatter or linter. 248 | You could write comments as normal, and on save it could collect all new comments and extract them to a new file. 249 | It would give the advantage of not changing the flow of writing comments. 250 | The thing is that after collecting them we need to display them to the user but also allow the user to edit them. 251 | We can then have a toggle mechanism that would load comments into the file and then extract them, but it would mess up git on every toggle. 252 | 253 | The project needs to start, so probably the first step will be to do some PoC with tools like TreeSitter and check if we can use existing CI/CD tools. 254 | 255 | ## Initial plan 256 | 257 | This is what most of our comments will look like: 258 | 259 | ```python 260 | def main(): 261 | print("Hello World!") # This is my comment 262 | ``` 263 | 264 | The problem is that sometimes we want to include some comments or notes that should not go to a public repository. 265 | 266 | Let's call them **Comments of shame**, but basically I mean private notes and thoughts. 267 | 268 | My initial plan is to keep them in a separate file: 269 | 270 | ```python 271 | def main(): 272 | print("Hello World!") 273 | ``` 274 | 275 | And maybe something like: 276 | 277 | ```bash 278 | <1st: print> # This is my comment 279 | ``` 280 | 281 | The syntax is just to demonstrate the idea. 282 | 283 | ## Approach 284 | 285 | There are many options. The most difficult part will be to keep comments in the right place when code is modified or refactored. 286 | There will probably be some dangling comments and notes.
287 | 288 | My smart ideas are to: 289 | 290 | - Use **TreeSitter**, or basically keep comments attached to a Code Tree Element 291 | - Force **LSP** to display them as `errors` or something 292 | - 293 | 294 | Difficult parts are: 295 | 296 | - Abandoned comments 297 | - How to enter note/comment editing 298 | - How to keep track of a refactored function (moved from one place to another) 299 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Cvaniak 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Faint Comments 2 | 3 | This repository is my attempt at the [100 commits competition](https://100commitow.pl/). 4 | 5 | > :warning: This project is a PoC and still needs to be polished. Always pin the version. 6 | > Make a backup or use version control before applying any command. 7 | 8 | ## What is it? 9 | 10 | This library keeps your code comments in a separate file and allows you to put them back in. 11 | 12 | ## Why 13 | 14 | Comments are a [code smell](https://refactoring.guru/pl/smells/comments). But sometimes: 15 | 16 | - you just want to have your own note that should never appear in Version Control. 17 | - you have code that needs some comments, but the comments must always be removed before deploying to production. 18 | - you want to give someone instructions on what to do in certain files. 19 | 20 | In all these cases it would be handy to keep all comments 21 | in a separate file that could potentially be _git ignored_ 22 | but at the same time applied whenever you want to read them. 23 | 24 | Also, I thought I could learn some cool algorithms and tools. 25 | 26 | --- 27 | 28 | ## Demo 29 | 30 | Imagine you have your code: 31 | 32 | ```python 33 | # TODO: this name must be changed 34 | def foo(): 35 | ... 36 | i = 0x5f3759df - ( i >> 1 ) # what the quack? 37 | ... 38 | ``` 39 | 40 | and maybe you do not think it is good to keep your comments in code. 41 | You run `faint extract <file_name>.py` and as a result you get: 42 | 43 | ```python 44 | def foo(): 45 | ... 46 | i = 0x5f3759df - ( i >> 1 ) 47 | ...
48 | 49 | ``` 50 | 51 | and `JSON` file: 52 | 53 | ```json 54 | { 55 | "comments": [ 56 | { 57 | "start": { 58 | "row": 4, 59 | "column": 33 60 | }, 61 | "text": "# what the quack?", 62 | "is_inline": true 63 | }, 64 | { 65 | "start": { 66 | "row": 1, 67 | "column": 0 68 | }, 69 | "text": "# TODO: this name must be changed", 70 | "is_inline": false 71 | } 72 | ], 73 | "deleted_lines": [1], 74 | "file_metadata": { 75 | "commit_sha": "sha_of_commit", 76 | "file_name": "path/to/file/<file_name>.py", 77 | "file_sha": "hash_of_the_file" 78 | } 79 | } 80 | ``` 81 | 82 | and this file is kept as `comments_<file_name>.json`. 83 | At any moment you can run `faint join <file_name>.py` and you will get the original file back. 84 | 85 | But this tool also tries to handle the situation when you modify the file before the `join` command, and more. 86 | 87 | ## How to install 88 | 89 | You can `pip` install: 90 | 91 | ```bash 92 | pip3 install faint 93 | ``` 94 | 95 | ## How to use 96 | 97 | You can check what is available via `faint --help` or just `faint`. 98 | 99 | `faint` is made of two main subcommands: 100 | 101 | - `faint extract <path_to_file>` which removes comments from the code and places them in a separate `JSON` file. 102 | - `faint join <path_to_file>` which applies comments from the `JSON` file (if it exists) in the corresponding places. 103 | 104 | By default (not changeable yet) `JSON` files are named `comments_<file_name>.json`. 105 | So you can add a line like `comments_*.json` to `.gitignore`. 106 | 107 | ### Workflow 108 | 109 | First use `extract` on the file. Then you can: 110 | 111 | - Use `join` and not modify the file 112 | - Use `join` and modify the file 113 | - Continue to modify the file and then use `join` 114 | 115 | Then: 116 | 117 | - If you apply `join` on a file that exactly matches the file after `extract`, 118 | it should be fast and simple and you should get all comments back in place. 119 | - If you apply `join` and the file follows the structure closely enough, you should also 120 | get all comments back in the right place. 121 | 122 | In both cases `extract` should then work as at the beginning. 123 | 124 | The current version does not allow for: 125 | 126 | - `extract` on an already `extract`ed file (it will discard all comments in the `JSON`) 127 | - `join` on an already `join`ed file (it will place the comments twice) 128 | 129 | So if you want to add any new comments you should first `join` the comments and then extract them. 130 | 131 | ## Supported languages 132 | 133 | This tool uses `TreeSitter`. For every language the `AST` (Abstract Syntax Tree) has different nodes. 134 | For this reason every language needs to be covered separately. 135 | 136 | - [x] Python 137 | 138 | ## Tests 139 | 140 | This tool will have many edge cases to cover. The tests for now do not need to pass any threshold, but are more a record of edge cases to cover in the future. 141 | 142 | ## TODO 143 | 144 | - [x] Make it pip installable 145 | - [ ] Fix double join 146 | - [ ] When extracting, compare with current `JSON` 147 | - [ ] Show abandoned comments 148 | - [ ] Show deleted comments 149 | - [ ] When `join` finds an abandoned comment, inform the user. Then suggest a `--force` flag. 150 | - [ ] Allow for a path to a subdirectory 151 | - [ ] Figure out how to install TreeSitter per language 152 | - [ ] Add Git hook 153 | - [ ] Handle a new comment on an already `extract`ed file 154 | - [ ] Create `join-extract` command 155 | - [ ] Write the path to the file as relative to the repo 156 | 157 | ## Future plans 158 | 159 | ## Summary of 100 commits challenge 160 | 161 | It is really hard to be consistent in anything.
But being consistent is one thing; keeping a project going through various changes is another. 162 | Having strict rules that require you to add something to your project might be beneficial. It keeps you thinking about new changes and helps you remember what is already there. However, it also has downsides. For me, it was challenging to complete larger tasks. 163 | In the end, I am happy that I tried to finish this challenge and create the tool that had been in the back of my mind for a long time. I applied changes as needed. 164 | 165 | -------------------------------------------------------------------------------- /faint/__init__.py: -------------------------------------------------------------------------------- 1 | """Tool to manage your comments in code!""" 2 | 3 | __version__ = "0.1" 4 | -------------------------------------------------------------------------------- /faint/__main__.py: -------------------------------------------------------------------------------- 1 | from faint import cli 2 | 3 | 4 | def main(): 5 | cli.app() 6 | 7 | 8 | if __name__ == "__main__": 9 | main() 10 | -------------------------------------------------------------------------------- /faint/branch_structure.py: -------------------------------------------------------------------------------- 1 | import bisect 2 | import json 3 | from dataclasses import asdict, dataclass 4 | from pathlib import Path 5 | from typing import List 6 | 7 | import git 8 | from tree_sitter import Node, Tree 9 | from utils import get_tree 10 | 11 | 12 | @dataclass 13 | class Position: 14 | row: int 15 | column: int 16 | 17 | 18 | @dataclass 19 | class Comment: 20 | start: Position 21 | text: str 22 | branch: list[str] 23 | 24 | 25 | @dataclass 26 | class CommentsStruct: 27 | comments: List[Comment] 28 | deleted_lines: List[int] 29 | commit_sha: str 30 | file_name: str 31 | 32 | 33 | def collect_comment_nodes(tree: Tree): 34 | comment_nodes = [] 35 | 36 | def _collect_comment_ranges(node: Node): 37 | if node.type == "comment": 38 | branch = [] 39 | curr = node 40 | print(curr, tree.root_node) 41 | while curr is not None and curr != tree.root_node: 42 | print(curr) 43 | branch.append(str(curr.text, encoding="utf8")) 44 | # branch.append(curr.sexp) 45 | curr = curr.parent 46 | comment_nodes.append( 47 | ( 48 | Comment( 49 | start=Position(*node.start_point), 50 | text=str(node.text, encoding="utf-8"), 51 | branch=branch, 52 | ), 53 | node, 54 | ) 55 | ) 56 | else: 57 | for child in node.children: 58 | _collect_comment_ranges(child) 59 | 60 | _collect_comment_ranges(tree.root_node) 61 | 62 | return comment_nodes 63 | 64 | 65 | def remove_comments(tree, lines): 66 | # Collect ranges for comments 67 | comments_nodes: list[tuple[Comment, Node]] = collect_comment_nodes(tree) 68 | 69 | deleted_lines = [] 70 | 71 | # Remove comments by replacing them with spaces (to preserve formatting) 72 | for comment, node in reversed(comments_nodes): # Reverse to avoid offset issues 73 | start_p = comment.start 74 | 75 | # comment = Comment( 76 | # start=Position(*start_p), text=lines[start_p[0]][start_p[1] : end_p[1]] 77 | # ) 78 | 79 | # TODO: here was edge case, needs to be watched for more 80 | if ( 81 | node.parent is not None 82 | and node.parent.start_point[0] != start_p.row 83 | and node.prev_sibling is not None 84 | and node.prev_sibling.start_point[0] != start_p.row 85 | ) or (node.prev_sibling is None): 86 | lines[start_p.row] = "" 87 | bisect.insort(deleted_lines, start_p.row) 88 | else: 89 | lines[start_p.row] = lines[start_p.row][: 
start_p.column].rstrip() + "\n" 90 | 91 | lines = [x for x in lines if x != ""] 92 | 93 | return [comment for comment, _ in reversed(comments_nodes)], deleted_lines, lines 94 | 95 | 96 | def get_commit_sha(): 97 | repo = git.Repo(search_parent_directories=True) 98 | sha = repo.head.object.hexsha 99 | return sha 100 | 101 | 102 | def extract_comments(input_file_path, source_code, lines): 103 | tree = get_tree(source_code) 104 | comments, deleted_lines, new_lines = remove_comments(tree, lines) 105 | 106 | out_data = CommentsStruct(comments, deleted_lines, get_commit_sha(), input_file_path) 107 | 108 | return new_lines, out_data 109 | 110 | 111 | # Example of reading, 112 | # processing, and writing the file 113 | # also three line comment 114 | def main(input_file_path: Path, output_file_path: Path, output_comments_file_path: Path): 115 | with open(input_file_path, "r", encoding="utf-8") as file: 116 | source_code = file.read() 117 | 118 | with open(input_file_path, "r", encoding="utf-8") as file: 119 | lines = file.readlines() 120 | 121 | new_lines, out_data = extract_comments(str(input_file_path), source_code, lines) 122 | 123 | with open(output_file_path, "w", encoding="utf-8") as file: 124 | file.writelines(new_lines) 125 | 126 | with open(output_comments_file_path, "w", encoding="utf-8") as file: 127 | json.dump(asdict(out_data), file, indent=4) 128 | 129 | 130 | if __name__ == "__main__": 131 | input_file_path = Path("./tests/cases/happy_path/with_comments.py") 132 | output_file_path = Path("./tests/cases/happy_path/no_comments.py") 133 | output_comments_file_path = Path("./tests/cases/happy_path/comments.json") 134 | main(input_file_path, output_file_path, output_comments_file_path) 135 | -------------------------------------------------------------------------------- /faint/cli.py: -------------------------------------------------------------------------------- 1 | from pathlib import Path 2 | from typing import Annotated 3 | 4 | import typer 5 | 6 | from faint import diff_trees, extract_comments, join_comments 7 | from faint.utils import compare_files, get_tree 8 | 9 | app = typer.Typer(no_args_is_help=True) 10 | 11 | FaintFile = Annotated[ 12 | Path, 13 | typer.Argument( 14 | exists=True, 15 | file_okay=True, 16 | dir_okay=False, 17 | writable=False, 18 | readable=True, 19 | resolve_path=True, 20 | ), 21 | ] 22 | 23 | 24 | @app.command() 25 | def extract(file: FaintFile): 26 | """ 27 | Extract comments from choosen file 28 | """ 29 | json_file = file.with_name(f"comments_{file.stem}.json") 30 | 31 | extract_comments.main(file.absolute(), file.absolute(), json_file) 32 | print(f"{file.stem} extracted to {json_file}") 33 | print("Some stats could be shown here") 34 | 35 | 36 | @app.command() 37 | def join(file: FaintFile): 38 | """ 39 | Join comments with source file 40 | """ 41 | json_file = file.with_name(f"comments_{file.stem}.json") 42 | if not json_file.exists(): 43 | raise typer.BadParameter("File does not exist.") 44 | 45 | if compare_files(json_file, file): 46 | join_comments.main(file.absolute(), file.absolute(), json_file) 47 | print(f"{file.stem} is joined with {json_file.stem}") 48 | print("Some stats could be shown here") 49 | else: 50 | diff_trees.main_between_commits(file, json_file) 51 | print("Difficult case") 52 | 53 | 54 | @app.command() 55 | def list_comments(file: FaintFile): 56 | """ 57 | List all comments existing in code 58 | """ 59 | with open(file, "r", encoding="utf-8") as f: 60 | source_code = f.read() 61 | tree = get_tree(source_code) 62 | 63 | 
comment_nodes = extract_comments.collect_comment_nodes(tree) 64 | 65 | for comment in comment_nodes: 66 | print(f"line {comment.start_point[0]}:\n{comment.text.decode('utf8')}") 67 | 68 | 69 | if __name__ == "__main__": 70 | app() 71 | -------------------------------------------------------------------------------- /faint/diff_trees.py: -------------------------------------------------------------------------------- 1 | import json 2 | from dataclasses import dataclass 3 | from pathlib import Path 4 | from typing import List, Optional, cast 5 | 6 | from tree_sitter import Node, Parser 7 | 8 | from faint.utils import get_file_bytes_by_commit_sha, load_language 9 | 10 | 11 | @dataclass 12 | class LeafNode: 13 | text: str 14 | line: int 15 | comment: bool = False 16 | marked: bool = False 17 | alone: bool = True 18 | node: Optional["LeafNode"] = None 19 | column: int = 0 20 | below_comment: Optional["LeafNode"] = None 21 | 22 | def __eq__(self, other): 23 | return self.text == other.text 24 | 25 | 26 | @dataclass 27 | class MissingComments: 28 | comment: LeafNode 29 | target_node: Optional[LeafNode] = None 30 | 31 | 32 | def serialize_tree(node: Node, leaf_nodes: list[LeafNode]) -> None: 33 | # It contains children -> it is "leaf" 34 | if node.child_count > 0: 35 | for child in node.children: 36 | serialize_tree(child, leaf_nodes) 37 | 38 | else: 39 | x = LeafNode(node.text.decode("utf-8"), node.start_point[0], column=node.start_point[1]) 40 | 41 | if node.type == "comment": 42 | x.comment = True 43 | if leaf_nodes: 44 | # If code is in the same line with comment 45 | if leaf_nodes[-1].line == x.line: 46 | x.alone = False 47 | x.node = leaf_nodes[-1] 48 | leaf_nodes[-1].marked = True 49 | leaf_nodes[-1].node = x 50 | 51 | # If previous node is also comment we group them 52 | elif leaf_nodes[-1].comment: 53 | x.below_comment = leaf_nodes[-1] 54 | 55 | else: 56 | # if previous leaf is comment we mark this node and attach comment node 57 | if leaf_nodes and leaf_nodes[-1].comment and leaf_nodes[-1].alone: 58 | x.marked = True 59 | leaf_nodes[-1].node = x 60 | x.node = leaf_nodes[-1] 61 | 62 | leaf_nodes.append(x) 63 | 64 | 65 | def get_serialized_tree_bytes(file: bytes, parser: Parser) -> list[LeafNode]: 66 | tree = parser.parse(file) 67 | 68 | serialized_tree: list[LeafNode] = [] 69 | serialize_tree(tree.root_node, serialized_tree) 70 | 71 | return serialized_tree 72 | 73 | 74 | def lcs(tree_a: list[LeafNode], tree_b: list[LeafNode]) -> list[list[int]]: 75 | m, n = len(tree_a), len(tree_b) 76 | matrix = [[0] * (n + 1) for _ in range(m + 1)] 77 | for i in range(1, m + 1): 78 | for j in range(1, n + 1): 79 | if tree_a[i - 1] == tree_b[j - 1]: 80 | matrix[i][j] = matrix[i - 1][j - 1] + 1 81 | else: 82 | matrix[i][j] = max(matrix[i][j - 1], matrix[i - 1][j]) 83 | return matrix 84 | 85 | 86 | def backtrack( 87 | matrix: list[list[int]], 88 | tree_a: list[LeafNode], 89 | tree_b: list[LeafNode], 90 | i: int, 91 | j: int, 92 | ) -> list[MissingComments]: 93 | # Terminate 94 | if i == 0 or j == 0: 95 | return [] 96 | 97 | # If we have this node in both trees 98 | elif tree_a[i - 1] == tree_b[j - 1]: 99 | added = backtrack(matrix, tree_a, tree_b, i - 1, j - 1) 100 | if tree_a[i - 1].marked: 101 | if tree_a[i - 1].node is not None: 102 | val = cast(LeafNode, tree_a[i - 1].node) 103 | added.append(MissingComments(target_node=tree_b[j - 1], comment=val)) 104 | return added 105 | 106 | else: 107 | # If this node is only in newer tree 108 | if matrix[i][j - 1] > matrix[i - 1][j]: 109 | added = 
backtrack(matrix, tree_a, tree_b, i, j - 1) 110 | # If this node is only in old tree 111 | else: 112 | added = backtrack(matrix, tree_a, tree_b, i - 1, j) 113 | # This is node that have comment but we do not know where to put it 114 | if tree_a[i - 1].marked: 115 | if tree_a[i - 1].node is not None: 116 | val = cast(LeafNode, tree_a[i - 1].node) 117 | added.append(MissingComments(target_node=tree_b[j - 1], comment=val)) 118 | return added 119 | 120 | 121 | def backtrack_add_remove(matrix, tree_a, tree_b, i, j): 122 | if i == 0 or j == 0: 123 | return [], [], [] 124 | elif tree_a[i - 1] == tree_b[j - 1]: 125 | added, removed, common = backtrack_add_remove(matrix, tree_a, tree_b, i - 1, j - 1) 126 | common.append(tree_a[i - 1]) 127 | return added, removed, common 128 | else: 129 | if matrix[i][j - 1] > matrix[i - 1][j]: 130 | added, removed, common = backtrack_add_remove(matrix, tree_a, tree_b, i, j - 1) 131 | added.append(tree_b[j - 1]) 132 | else: 133 | added, removed, common = backtrack_add_remove(matrix, tree_a, tree_b, i - 1, j) 134 | removed.append(tree_a[i - 1]) 135 | return added, removed, common 136 | 137 | 138 | def display_diff(added: List[MissingComments]): 139 | # added, removed, common = backtrack( 140 | 141 | for item in added: 142 | if item.target_node is not None: 143 | if item.comment.alone: 144 | print(f"line: {item.target_node.line}\n{item.comment.text}\n{item.target_node.text}") 145 | else: 146 | print(f"line: {item.target_node.line}\n{item.target_node.text} {item.comment.text}") 147 | else: 148 | print(f"abandoned: {item.comment.text}") 149 | print() 150 | 151 | 152 | # NOTE: Remember it is modified in place 153 | def apply_missing_comments(content: list[str], diffs: list[MissingComments]): 154 | # We apply from last line to first 155 | # So we do not move 156 | shift = 0 157 | 158 | for item in diffs: 159 | # This is abandoned 160 | if item.target_node is None: 161 | continue 162 | 163 | if item.comment.alone: 164 | # Apply possibly grouped comments 165 | lines = [] 166 | curr = item.comment 167 | while curr.below_comment: 168 | lines.append(curr) 169 | curr = curr.below_comment 170 | lines.append(curr) 171 | 172 | row = item.target_node.line 173 | for line in reversed(lines): 174 | x = line.text 175 | if x[-1] != "\n": 176 | x = x + "\n" 177 | content.insert(row + shift, " " * line.column + x) 178 | shift += 1 179 | 180 | else: 181 | if len(content) <= item.target_node.line + shift: 182 | # Should not happen 183 | print( 184 | "Target node have line higher that file length", 185 | item.target_node.line, 186 | shift, 187 | item.comment.text, 188 | ) 189 | continue 190 | 191 | x = content[item.target_node.line + shift][:-1] 192 | content[item.target_node.line + shift] = x + " " + item.comment.text + "\n" 193 | 194 | return content 195 | 196 | 197 | def find_missing_comments(tree_a: list[LeafNode], tree_b: list[LeafNode]) -> list[MissingComments]: 198 | lcs_sequence = lcs(tree_a, tree_b) 199 | 200 | added = backtrack( 201 | lcs_sequence, 202 | tree_a, 203 | tree_b, 204 | len(tree_a), 205 | len(tree_b), 206 | ) 207 | return added 208 | 209 | 210 | def main(file_in_1: Path, file_in_2: Path, file_out: Path): 211 | parser = load_language() 212 | 213 | with open(file_in_1, "rb") as file: 214 | file_bytes_1 = file.read() 215 | with open(file_in_2, "rb") as file: 216 | file_bytes_2 = file.read() 217 | 218 | tree1 = get_serialized_tree_bytes(file_bytes_1, parser) 219 | tree2 = get_serialized_tree_bytes(file_bytes_2, parser) 220 | 221 | added = find_missing_comments(tree1, tree2) 
222 | 223 | display_diff(added) 224 | with open(file_in_2, "r") as file: 225 | origin_file_data = file.readlines() 226 | 227 | content = apply_missing_comments(origin_file_data, added) 228 | 229 | with open(file_out, "w", encoding="utf-8") as output_file: 230 | output_file.writelines(content) 231 | 232 | 233 | def main_between_commits(file: Path, json_file: Path): 234 | parser = load_language() 235 | with open(json_file, "r", encoding="utf-8") as f: 236 | comments_data = json.load(f) 237 | 238 | original_file = get_file_bytes_by_commit_sha(file, comments_data["file_metadata"]["commit_sha"]) 239 | with open(file, "rb") as f: 240 | file_bytes = f.read() 241 | 242 | tree1 = get_serialized_tree_bytes(original_file, parser) 243 | tree2 = get_serialized_tree_bytes(file_bytes, parser) 244 | 245 | added = find_missing_comments(tree1, tree2) 246 | 247 | with open(file, "r") as f: 248 | origin_file_data = f.readlines() 249 | 250 | content = apply_missing_comments(origin_file_data, added) 251 | 252 | with open(file, "w", encoding="utf-8") as f: 253 | f.writelines(content) 254 | 255 | 256 | if __name__ == "__main__": 257 | file_a = Path("./tests/cases/happy_path/a.py") 258 | file_b = Path("./tests/cases/happy_path/b.py") 259 | file_out = Path("./tests/cases/happy_path/out.py") 260 | main(file_a, file_b, file_out) 261 | -------------------------------------------------------------------------------- /faint/extract_comments.py: -------------------------------------------------------------------------------- 1 | import bisect 2 | import json 3 | from dataclasses import asdict, dataclass 4 | from pathlib import Path 5 | from typing import List, Optional 6 | 7 | from tree_sitter import Node 8 | 9 | from faint.utils import get_commit_sha, get_lines_hash, get_tree 10 | 11 | 12 | @dataclass 13 | class Position: 14 | row: int 15 | column: int 16 | 17 | 18 | @dataclass 19 | class Comment: 20 | start: Position 21 | text: str 22 | is_inline: bool 23 | 24 | 25 | @dataclass 26 | class FileMetadata: 27 | commit_sha: str 28 | file_name: str 29 | file_sha: str 30 | 31 | 32 | @dataclass 33 | class CommentsStruct: 34 | comments: List[Comment] 35 | deleted_lines: List[int] 36 | file_metadata: Optional[FileMetadata] = None 37 | 38 | 39 | def collect_comment_nodes(tree): 40 | comment_nodes = [] 41 | 42 | def _collect_comment_ranges(node: Node): 43 | if node.type == "comment": 44 | comment_nodes.append(node) 45 | else: 46 | for child in node.children: 47 | _collect_comment_ranges(child) 48 | 49 | _collect_comment_ranges(tree.root_node) 50 | 51 | return comment_nodes 52 | 53 | 54 | def remove_comments(tree, lines): 55 | # Collect ranges for comments 56 | comment_nodes = collect_comment_nodes(tree) 57 | 58 | comments = [] 59 | deleted_lines = [] 60 | 61 | # Remove comments by replacing them with spaces (to preserve formatting) 62 | for node in reversed(comment_nodes): # Reverse to avoid offset issues 63 | start_p, end_p = node.start_point, node.end_point 64 | is_inline = False 65 | text = lines[start_p[0]][start_p[1] : end_p[1]] 66 | 67 | # TODO: here was edge case, needs to be watched for more 68 | if (node.parent.start_point[0] != start_p[0] and node.prev_sibling.start_point[0] != start_p[0]) or ( 69 | node.prev_sibling is None 70 | ): 71 | lines[start_p[0]] = "" 72 | bisect.insort(deleted_lines, start_p[0]) 73 | else: 74 | lines[start_p[0]] = lines[start_p[0]][: start_p[1]].rstrip() + "\n" 75 | is_inline = True 76 | 77 | comment = Comment( 78 | start=Position(*start_p), 79 | text=text, 80 | is_inline=is_inline, 81 | ) 82 | 
comments.append(comment) 83 | 84 | lines = [x for x in lines if x != ""] 85 | 86 | return comments, deleted_lines, lines 87 | 88 | 89 | def extract_comments(input_file_path, source_code, lines): 90 | tree = get_tree(source_code) 91 | comments, deleted_lines, new_lines = remove_comments(tree, lines) 92 | 93 | metadata = FileMetadata(get_commit_sha(), input_file_path, get_lines_hash(lines)) 94 | 95 | out_data = CommentsStruct(comments, deleted_lines, metadata) 96 | 97 | return new_lines, out_data 98 | 99 | 100 | # Example of reading, 101 | # processing, and writing the file 102 | # also three line comment 103 | def main(input_file_path: Path, output_file_path: Path, output_comments_file_path: Path): 104 | with open(input_file_path, "r", encoding="utf-8") as file: 105 | source_code = file.read() 106 | 107 | with open(input_file_path, "r", encoding="utf-8") as file: 108 | lines = file.readlines() 109 | 110 | new_lines, out_data = extract_comments(str(input_file_path), source_code, lines) 111 | 112 | with open(output_file_path, "w", encoding="utf-8") as file: 113 | file.writelines(new_lines) 114 | 115 | with open(output_comments_file_path, "w", encoding="utf-8") as file: 116 | json.dump(asdict(out_data), file, indent=4) 117 | 118 | 119 | if __name__ == "__main__": 120 | input_file_path = Path("./tests/cases/happy_path/with_comments.py") 121 | output_file_path = Path("./tests/cases/happy_path/no_comments.py") 122 | output_comments_file_path = Path("./tests/cases/happy_path/comments.json") 123 | main(input_file_path, output_file_path, output_comments_file_path) 124 | -------------------------------------------------------------------------------- /faint/join_comments.py: -------------------------------------------------------------------------------- 1 | import json 2 | from pathlib import Path 3 | 4 | 5 | def apply_comments_to_file(comments_data, lines): 6 | # NOTE: it modify original lines. Would need deep copy but might not be necessary 7 | adjusted_lines = lines 8 | 9 | for line_to_append in comments_data["deleted_lines"]: 10 | adjusted_lines.insert(line_to_append, "") 11 | 12 | for comment in comments_data["comments"]: 13 | tmp = comment["start"] 14 | line_number, column = tmp["row"], tmp["column"] 15 | comment_text = comment["text"] 16 | is_inline = comment["is_inline"] 17 | 18 | # NOTE: line is empty 19 | if not is_inline: 20 | adjusted_lines[line_number] = " " * column + comment_text + "\n" 21 | 22 | # NOTE: apply on right side of the code. Removes trailing spaces and apply exactly 2 spaces. 
23 | elif len(adjusted_lines[line_number][:-1]) <= column: 24 | adjusted_lines[line_number] = adjusted_lines[line_number][:-1].rstrip() + " " + comment_text + "\n" 25 | 26 | else: 27 | raise ValueError("It should not happen.") 28 | 29 | return adjusted_lines 30 | 31 | 32 | def main(source_file_path: Path, output_file_path: Path, json_comments: Path): 33 | with open(json_comments, "r", encoding="utf-8") as json_file: 34 | comments_data = json.load(json_file) 35 | 36 | with open(source_file_path, "r", encoding="utf-8") as file: 37 | lines = file.readlines() 38 | 39 | done = apply_comments_to_file(comments_data, lines) 40 | 41 | with open(output_file_path, "w", encoding="utf-8") as output_file: 42 | output_file.writelines(done) 43 | 44 | 45 | if __name__ == "__main__": 46 | source_file_path = Path("main_no_comments.py") 47 | output_file_path = Path("main_with_comments.py") 48 | json_comments = Path("just_comments.json") 49 | main(source_file_path, output_file_path, json_comments) 50 | -------------------------------------------------------------------------------- /faint/utils.py: -------------------------------------------------------------------------------- 1 | import hashlib 2 | import json 3 | from pathlib import Path 4 | from typing import List 5 | 6 | import git 7 | import tree_sitter_python as tspython 8 | from tree_sitter import Language, Parser 9 | 10 | 11 | def load_language(): 12 | PY_LANGUAGE = Language(tspython.language()) 13 | parser = Parser() 14 | parser.language = PY_LANGUAGE 15 | return parser 16 | 17 | 18 | # Load the language library (Adjust the path to your compiled language library) 19 | def get_tree(source_code): 20 | parser = load_language() 21 | 22 | tree = parser.parse(bytes(source_code, "utf8")) 23 | return tree 24 | 25 | 26 | def get_file_hash(file: Path): 27 | with open(file, "rb") as f: 28 | digest = hashlib.file_digest(f, "sha256") 29 | return digest.hexdigest() 30 | 31 | 32 | def get_lines_hash(lines: List[str]): 33 | joined_lines = "".join(lines).encode("utf-8") 34 | hasher = hashlib.sha256() 35 | hasher.update(joined_lines) 36 | 37 | return hasher.hexdigest() 38 | 39 | 40 | def compare_files(json_file: Path, file: Path) -> bool: 41 | file_hash = get_file_hash(file) 42 | with open(json_file, "r", encoding="utf-8") as f: 43 | comments_data = json.load(f) 44 | 45 | return file_hash == comments_data["file_metadata"]["file_sha"] 46 | 47 | 48 | def get_file_bytes_by_commit_sha(file: Path, commit_sha: str) -> bytes: 49 | repo = git.Repo(search_parent_directories=True) 50 | 51 | # Get the commit 52 | commit = repo.commit(commit_sha) 53 | 54 | # Get the file content at the specific commit 55 | 56 | repo_path = Path(repo.working_tree_dir).resolve() 57 | file_path = Path(file).resolve() 58 | 59 | file_content = commit.tree / str(file_path.relative_to(repo_path)) 60 | 61 | # Print the content 62 | # return file_content.data_stream.read().decode("utf-8") 63 | return file_content.data_stream.read() 64 | 65 | 66 | def get_commit_sha(): 67 | repo = git.Repo(search_parent_directories=True) 68 | sha = repo.head.object.hexsha 69 | return sha 70 | -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [build-system] 2 | requires = ["flit_core >=3.2,<4"] 3 | build-backend = "flit_core.buildapi" 4 | 5 | [project] 6 | name = "faint" 7 | authors = [{ name = "Cvaniak", email = "igna.cwaniak@gmail.com" }] 8 | readme = "README.md" 9 | license = { file = 
"LICENSE" } 10 | classifiers = ["License :: OSI Approved :: MIT License"] 11 | dynamic = ["version", "description"] 12 | requires-python = ">=3.7" 13 | dependencies = [ 14 | "typer==0.12.3", 15 | "tree-sitter-python==0.21.0", 16 | "tree-sitter==0.22.3", 17 | "GitPython==3.1.43", 18 | ] 19 | 20 | 21 | [project.scripts] 22 | faint = "faint.__main__:main" 23 | 24 | [tool.ruff] 25 | extend-exclude = ["tests/cases"] 26 | line-length = 120 27 | 28 | [tool.ruff.lint] 29 | exclude = ["tests/cases/*"] 30 | select = ["E", "F", "I", "B", "A"] 31 | 32 | [tool.ruff.format] 33 | exclude = ["tests/cases/*"] 34 | 35 | [tool.mypy] 36 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | tree-sitter 2 | gitpython 3 | pytest 4 | tree-sitter-python 5 | 6 | typer 7 | pdbpp 8 | debugpy 9 | 10 | ruff 11 | isort 12 | flake8 13 | mypy 14 | pre-commit 15 | -------------------------------------------------------------------------------- /tests/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Cvaniak/Consistent/abd834c2559df0978ef4d031404362246bd38e2b/tests/__init__.py -------------------------------------------------------------------------------- /tests/cases/happy_path/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Cvaniak/Consistent/abd834c2559df0978ef4d031404362246bd38e2b/tests/cases/happy_path/__init__.py -------------------------------------------------------------------------------- /tests/cases/happy_path/comments.json: -------------------------------------------------------------------------------- 1 | { 2 | "comments": [ 3 | { 4 | "start": { 5 | "row": 18, 6 | "column": 0 7 | }, 8 | "text": "# zys above", 9 | "is_inline": false 10 | }, 11 | { 12 | "start": { 13 | "row": 15, 14 | "column": 14 15 | }, 16 | "text": "# Pass inline", 17 | "is_inline": true 18 | }, 19 | { 20 | "start": { 21 | "row": 13, 22 | "column": 4 23 | }, 24 | "text": "# Comment above", 25 | "is_inline": false 26 | }, 27 | { 28 | "start": { 29 | "row": 12, 30 | "column": 10 31 | }, 32 | "text": "# class A inline", 33 | "is_inline": true 34 | }, 35 | { 36 | "start": { 37 | "row": 8, 38 | "column": 0 39 | }, 40 | "text": "# fox above", 41 | "is_inline": false 42 | }, 43 | { 44 | "start": { 45 | "row": 4, 46 | "column": 12 47 | }, 48 | "text": "# tricky inline", 49 | "is_inline": true 50 | }, 51 | { 52 | "start": { 53 | "row": 0, 54 | "column": 0 55 | }, 56 | "text": "# Test above", 57 | "is_inline": false 58 | } 59 | ], 60 | "deleted_lines": [ 61 | 0, 62 | 8, 63 | 13, 64 | 18 65 | ], 66 | "file_metadata": { 67 | "commit_sha": "fe85b31f7a1bae98ab05fda142a263b1b3bdefcb", 68 | "file_name": "/home/cvaniak/Code/Cvaniak/Consistent/tests/cases/happy_path/with_comments.py", 69 | "file_sha": "ca43efba412cb8d653bca85bc735847140cec4fee867195ae8970e2c538874f9" 70 | } 71 | } 72 | -------------------------------------------------------------------------------- /tests/cases/happy_path/for_diff.py: -------------------------------------------------------------------------------- 1 | # Test above 2 | def foo(): ... 3 | 4 | 5 | def bar(): # tricky inline 6 | a = 10 7 | pass 8 | ... 9 | 10 | 11 | # this will be abandoned 12 | def this_will_be_abandoned(): 13 | pass 14 | 15 | 16 | # fox above 17 | def fox(): ... 
18 | 19 | 20 | class A: # class A inline 21 | # Comment above 22 | def __init__(self) -> None: 23 | pass # Pass inline 24 | 25 | 26 | # zys above 27 | def zys(): ... 28 | -------------------------------------------------------------------------------- /tests/cases/happy_path/no_comments.py: -------------------------------------------------------------------------------- 1 | def foo(): ... 2 | 3 | 4 | def bar(): 5 | ... 6 | 7 | 8 | def fox(): ... 9 | 10 | 11 | class A: 12 | def __init__(self) -> None: 13 | pass 14 | 15 | 16 | def zys(): ... 17 | -------------------------------------------------------------------------------- /tests/cases/happy_path/with_comments.py: -------------------------------------------------------------------------------- 1 | # Test above 2 | def foo(): ... 3 | 4 | 5 | def bar(): # tricky inline 6 | ... 7 | 8 | 9 | # fox above 10 | def fox(): ... 11 | 12 | 13 | class A: # class A inline 14 | # Comment above 15 | def __init__(self) -> None: 16 | pass # Pass inline 17 | 18 | 19 | # zys above 20 | def zys(): ... 21 | -------------------------------------------------------------------------------- /tests/cases/unknown_problem_1/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Cvaniak/Consistent/abd834c2559df0978ef4d031404362246bd38e2b/tests/cases/unknown_problem_1/__init__.py -------------------------------------------------------------------------------- /tests/cases/unknown_problem_1/comments.json: -------------------------------------------------------------------------------- 1 | { 2 | "comments": [ 3 | { 4 | "start": { 5 | "row": 2, 6 | "column": 40 7 | }, 8 | "text": "# broken symlink pointing to itself", 9 | "is_inline": true 10 | }, 11 | { 12 | "start": { 13 | "row": 1, 14 | "column": 30 15 | }, 16 | "text": "# fix for bpo-35306", 17 | "is_inline": true 18 | }, 19 | { 20 | "start": { 21 | "row": 0, 22 | "column": 26 23 | }, 24 | "text": "# drive exists but is not accessible", 25 | "is_inline": true 26 | } 27 | ], 28 | "deleted_lines": [], 29 | "file_metadata": { 30 | "commit_sha": "fe85b31f7a1bae98ab05fda142a263b1b3bdefcb", 31 | "file_name": "/home/cvaniak/Code/Cvaniak/Consistent/tests/cases/unknown_problem_1/with_comments.py", 32 | "file_sha": "f61ead675ffbc0c81d0e3c76bc3912688838608cccacb2d612850bf1337aaf37" 33 | } 34 | } 35 | -------------------------------------------------------------------------------- /tests/cases/unknown_problem_1/no_comments.py: -------------------------------------------------------------------------------- 1 | _WINERROR_NOT_READY = 21 2 | _WINERROR_INVALID_NAME = 123 3 | _WINERROR_CANT_RESOLVE_FILENAME = 1921 4 | 5 | def test(): 6 | ... 7 | -------------------------------------------------------------------------------- /tests/cases/unknown_problem_1/with_comments.py: -------------------------------------------------------------------------------- 1 | _WINERROR_NOT_READY = 21 # drive exists but is not accessible 2 | _WINERROR_INVALID_NAME = 123 # fix for bpo-35306 3 | _WINERROR_CANT_RESOLVE_FILENAME = 1921 # broken symlink pointing to itself 4 | 5 | def test(): 6 | ... 
7 | -------------------------------------------------------------------------------- /tests/test_diff.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | 3 | from faint.diff_trees import ( 4 | apply_missing_comments, 5 | find_missing_comments, 6 | get_serialized_tree_bytes, 7 | ) 8 | from faint.utils import load_language 9 | from tests.utils import get_lines_from_file 10 | 11 | 12 | @pytest.fixture 13 | def parser(): 14 | parser = load_language() 15 | return parser 16 | 17 | 18 | @pytest.fixture 19 | def old_tree(path, parser): 20 | old_file = path + "for_diff.py" 21 | with open(old_file, "rb") as f: 22 | file_bytes = f.read() 23 | return get_serialized_tree_bytes(file_bytes, parser) 24 | 25 | 26 | @pytest.fixture 27 | def new_tree(path, parser): 28 | new_file = path + "no_comments.py" 29 | with open(new_file, "rb") as f: 30 | file_bytes = f.read() 31 | return get_serialized_tree_bytes(file_bytes, parser) 32 | 33 | 34 | class TestFindMissingComments: 35 | @pytest.mark.parametrize("path", ["./tests/cases/happy_path/"]) 36 | def test_happy_path(self, path: str, old_tree, new_tree): 37 | # given 38 | 39 | # when 40 | missing = find_missing_comments(old_tree, new_tree) 41 | 42 | # then 43 | # TODO: check if missing is valid 44 | assert missing 45 | 46 | 47 | class TestApplyMissingComments: 48 | @pytest.mark.parametrize("path", ["./tests/cases/happy_path/"]) 49 | def test_happy_path(self, path: str, old_tree, new_tree): 50 | # given 51 | missing = find_missing_comments(old_tree, new_tree) 52 | 53 | new_file_nc = path + "no_comments.py" 54 | new_lines_nc = get_lines_from_file(new_file_nc) 55 | 56 | new_file_c = path + "with_comments.py" 57 | expected_lines = get_lines_from_file(new_file_c) 58 | 59 | # when 60 | new_lines = apply_missing_comments(new_lines_nc, missing) 61 | 62 | # then 63 | assert new_lines == expected_lines 64 | -------------------------------------------------------------------------------- /tests/test_extract.py: -------------------------------------------------------------------------------- 1 | from dataclasses import asdict 2 | 3 | import pytest 4 | 5 | from faint.extract_comments import extract_comments 6 | from tests.utils import get_bytes_from_file, get_lines_from_file, load_json 7 | 8 | 9 | @pytest.mark.parametrize( 10 | "path,deleted_lines", 11 | [ 12 | ("./tests/cases/happy_path/", [0, 8, 13, 18]), 13 | ("./tests/cases/unknown_problem_1/", []), 14 | ], 15 | ) 16 | def test_happy_path(path: str, deleted_lines: list[int]): 17 | # given 18 | file_in, file_out = path + "with_comments.py", path + "no_comments.py" 19 | file_json = path + "comments.json" 20 | lines = get_lines_from_file(file_in) 21 | source_code = get_bytes_from_file(file_in) 22 | out_file = get_lines_from_file(file_out) 23 | data_out = load_json(file_json) 24 | 25 | # when 26 | new_lines, data = extract_comments(file_in, source_code, lines) 27 | 28 | # then 29 | assert new_lines == out_file 30 | assert asdict(data)["comments"] == data_out["comments"] 31 | assert data.deleted_lines == deleted_lines 32 | -------------------------------------------------------------------------------- /tests/test_join.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | 3 | from faint.join_comments import apply_comments_to_file 4 | from tests.utils import get_lines_from_file, load_json 5 | 6 | 7 | @pytest.mark.parametrize("path", ["./tests/cases/happy_path/"]) 8 | class TestJoin: 9 | def test_happy_path(self, path: str): 10 | # 
given 11 | file_in, file_out = path + "no_comments.py", path + "with_comments.py" 12 | file_json = path + "comments.json" 13 | lines_in = get_lines_from_file(file_in) 14 | lines_out = get_lines_from_file(file_out) 15 | comments_data = load_json(file_json) 16 | 17 | # when 18 | lines_with_applied_comments = apply_comments_to_file(comments_data, lines_in) 19 | 20 | # then 21 | assert lines_with_applied_comments == lines_out 22 | 23 | def test_double_join(self, path: str): 24 | # given 25 | file_in, file_out = path + "no_comments.py", path + "with_comments.py" 26 | file_json = path + "comments.json" 27 | lines_in = get_lines_from_file(file_in) 28 | lines_out = get_lines_from_file(file_out) 29 | comments_data = load_json(file_json) 30 | 31 | # when 32 | lines_with_applied_comments = apply_comments_to_file(comments_data, lines_in) 33 | lines_double_applied_comments = apply_comments_to_file(comments_data, lines_with_applied_comments) 34 | 35 | # then 36 | assert lines_double_applied_comments == lines_out 37 | -------------------------------------------------------------------------------- /tests/utils.py: -------------------------------------------------------------------------------- 1 | import json 2 | 3 | 4 | def get_bytes_from_file(file_path): 5 | with open(file_path, "r", encoding="utf-8") as file: 6 | source_code = file.read() 7 | return source_code 8 | 9 | 10 | def get_lines_from_file(file_path): 11 | with open(file_path, "r", encoding="utf-8") as file: 12 | lines = file.readlines() 13 | return lines 14 | 15 | 16 | def load_json(file_path): 17 | with open(file_path, "r", encoding="utf-8") as file: 18 | data_out = json.load(file) 19 | return data_out 20 | --------------------------------------------------------------------------------