├── .gitignore
├── img
    ├── demo.jpg
    └── multi_text_blocks_error.jpg
├── testcase
    ├── paper.docx
    ├── paper_badcase.docx
    └── processed_paper.docx
├── sort_reference
    ├── __init__.py
    ├── __main__.py
    └── SortReference.py
├── setup.py
├── README.md
└── LICENSE


/.gitignore:
--------------------------------------------------------------------------------
.vscode
build
dist
test
sort_reference.egg-info
*.pyc


--------------------------------------------------------------------------------
/img/demo.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Casxt/SortReference/HEAD/img/demo.jpg


--------------------------------------------------------------------------------
/testcase/paper.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Casxt/SortReference/HEAD/testcase/paper.docx


--------------------------------------------------------------------------------
/testcase/paper_badcase.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Casxt/SortReference/HEAD/testcase/paper_badcase.docx


--------------------------------------------------------------------------------
/testcase/processed_paper.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Casxt/SortReference/HEAD/testcase/processed_paper.docx


--------------------------------------------------------------------------------
/img/multi_text_blocks_error.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Casxt/SortReference/HEAD/img/multi_text_blocks_error.jpg


--------------------------------------------------------------------------------
/sort_reference/__init__.py:
--------------------------------------------------------------------------------

from .SortReference import ExtraTextBlock, ExtraSimpleReference, ReorderReference, ReplaceSimpleReference


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
import setuptools

with open("README.md", "r", encoding="utf-8") as fh:
    long_description = fh.read()

setuptools.setup(
    name="sort-reference",
    version="0.1.2",
    author="casxt",
    author_email="maple@forer.cn",
    description="Sort reference by cited order in docx file",
    long_description=long_description,
    long_description_content_type='text/markdown',
    url="https://github.com/Casxt/SortReference",
    packages=["sort_reference"],
    classifiers = [
        'Development Status :: 3 - Alpha',
        'Intended Audience :: Developers',
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: Apache Software License",
        "Operating System :: OS Independent",
    ],
    python_requires='>=3.6',
    install_requires=['lxml>=4.8.0'],
)


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# SortReference

Renumber references in the order in which they are first cited.

![demo](https://raw.githubusercontent.com/Casxt/SortReference/main/img/demo.jpg "demo")

## Supported scenarios

Make sure that `[n]` is the only citation format used in the Word document. Forms such as `[1-3]` or `[1,2,3]` must be rewritten as `[1][2][3]`.

Make sure that every string of the form `[n]` in the Word document is a citation. Otherwise such strings are also counted as citations and rewritten incorrectly. If the document contains any, first replace the strings that are not citations with another format, for example ``, and change them back after the program has finished.

## Usage

First make sure that Python 3.6 or later is installed.

1. Install the package with pip

```
python -m pip install sort_reference
```

2. Specify the input and output files

```
python -m sort_reference [input] [output]
```
For example: `python -m sort_reference testcase/paper.docx testcase/processed_paper.docx`

3. Reorder the reference list manually

The reference list at the end of the document cannot yet be sorted by the program. Sort it by hand after the program has finished.

## Troubleshooting

### 1. AssertionError: multi text blocks edit not support yet

![multi_text_blocks_error](https://raw.githubusercontent.com/Casxt/SortReference/main/img/multi_text_blocks_error.jpg?raw=true "multi_text_blocks_error")

**Suppose the error message reads `total 2 text blocks in snippet: '12'`. Search the whole document for every `[12]`, delete the number inside it, and type it again.**

This error occurs because Word has stored the characters of a citation in several separate text blocks. Specifically, one occurrence of `[12]` has been split by Word into the two strings `[1` and `2]`, which only look contiguous when displayed. Deleting the characters and typing them again ensures that the newly typed characters end up in the same text block.

`testcase/paper_badcase.docx` reproduces this error; typing `12` again is enough to resolve it.

### 2. "Error! Reference source not found" appears when the paper is exported to PDF

Before exporting, press `ctrl+a` and then `ctrl+F11` to disable field updates for the whole document. After exporting, press `ctrl+a` and then `ctrl+shift+F11` to enable them again. Many tutorials are available online. In theory this error is not caused by this program.

### Other errors

Please submit an issue or PR and attach the source file. If you receive no response, the report may not have been seen; in that case you can send an email to maple@forer.cn.


--------------------------------------------------------------------------------
/sort_reference/__main__.py:
--------------------------------------------------------------------------------
from .SortReference import ExtraTextBlock, ExtraSimpleReference, ReorderReference, ReplaceSimpleReference
import sys
import zipfile
from lxml import etree
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("input", type=str, help="input file")
parser.add_argument("output", type=str, help="output file")
parser.add_argument("-v", dest="verbose", action="store_true", help="print verbose output")
args = parser.parse_args()

from pathlib import Path
input_path = Path(args.input)
output_path = Path(args.output)
if not input_path.exists():
    print(f"{args.input} does not exist.")
    # exit with a non-zero status so callers can detect the failure
    sys.exit(1)

if str(output_path.resolve().absolute()) == str(input_path.resolve().absolute()):
    print("can not output to input file.")
    sys.exit(1)

# a .docx file is a zip archive; the body text lives in word/document.xml
archive = zipfile.ZipFile(args.input, 'r')

with archive.open('word/document.xml') as document:
    document_xml = document.read()
print("Scanning references...")
root = etree.fromstring(document_xml)
texts = ExtraTextBlock(root)
refs, ref_texts = ExtraSimpleReference(texts)
if args.verbose:
    # use a separate loop name so that `texts`, which is needed below, is not overwritten
    for ref, ref_elems in zip(refs, ref_texts):
        content = "".join([t.text for t in ref_elems])
        print(f"Found reference [{ref}] in snippet '{content}'")
print(f"Scanning references complete, total {len(set(refs))} references found in {len(refs)} places.")

print("Analyzing references...")
ref_count, replace_map = ReorderReference(refs)
for ref, count in ref_count.items():
    if count == 1:
        print(f"WARNING: Reference [{ref}] is used in just one place, which means it may appear only in your Reference section.")
change_ref_count = 0
change_place_count = 0
for old_id, new_id in replace_map.items():
    if old_id != new_id:
        change_ref_count += 1
        change_place_count += ref_count[old_id]
        if args.verbose:
            print(f"reference [{old_id}] now changes to [{new_id}]")
print(f"Analyzing references complete, reorder {change_ref_count} references, {change_place_count} places will be changed.")

print("Reordering references...")
ReplaceSimpleReference(texts, replace_map)

# `compresslevel` is not passed: the argument requires Python 3.7+, and an
# archive opened for reading reports None (the default) for it anyway
with zipfile.ZipFile(args.output, 'w', compression=zipfile.ZIP_DEFLATED) as dest:
    for file in archive.filelist:
        if file.filename == "word/document.xml":
            # write the modified document body
            data = etree.tostring(root, encoding='UTF-8', standalone=True)
        else:
            # copy every other archive member unchanged
            data = archive.read(file.filename)
        dest.writestr(file.filename, data)
print("Reordering references succeeded!")
print("Don't forget to edit the Reference section manually.")


--------------------------------------------------------------------------------
/sort_reference/SortReference.py:
--------------------------------------------------------------------------------

def ExtraTextBlock(root):
    """Extract all <w:t> elements in depth-first (document) order."""
    w_namespace = root.nsmap['w']
    wt_tag = "{%s}t" % (w_namespace)
    wt_elems = []
    stack = [root]
    while len(stack) > 0:
        t = stack.pop()
        if t.tag == wt_tag:
            wt_elems.append(t)
        else:
            # push children in reverse so that they are popped in document order
            for i in range(len(t) - 1, -1, -1):
                stack.append(t[i])
    return wt_elems


def ExtraSimpleReference(wt_elems):
    """
    Extract references in [num] format.
    state 1: wait for '['
    state 2: wait for a digit or ']'
    """
    ref_texts = []
    state = 1
    temp_ref = []
    temp_ref_str = ""
    refs = []
    for t in wt_elems:
        # an empty <w:t/> element has text None
        for c in (t.text or ""):
            if state == 1:
                if c == '[':
                    state = 2
                    temp_ref.append(t)
                else:
                    pass
            elif state == 2:
                if t is not temp_ref[-1]:
                    temp_ref.append(t)
                if ord('0') <= ord(c) <= ord('9'):
                    temp_ref_str = temp_ref_str + c
                elif c == ']':
                    state = 1
                    # an empty "[]" holds no reference number, so skip it
                    if temp_ref_str:
                        ref_texts.append(temp_ref)
                        refs.append(temp_ref_str)
                    temp_ref = []
                    temp_ref_str = ""
                else:
                    temp_ref = []
                    temp_ref_str = ""
                    state = 1
    return refs, ref_texts

def ReorderReference(ref_order):
    """
    Count how often each reference is used and
    assign new ids in order of first use.
    """
    count = {}
    new_id = {}
    for ref in ref_order:
        if ref in count:
            count[ref] += 1
        else:
            count[ref] = 1
            new_id[ref] = str(len(count))
    return count, new_id


def ReplaceSimpleReference(wt_elems, replace_map):
    """
    Replace each [num] in place with its new id from replace_map.
    state 1: wait for '['
    state 2: wait for a digit or ']'
    """
    state = 1
    temp_ref = []
    temp_ref_str = ""
    for t in wt_elems:
        p = 0
        # an empty <w:t/> element has text None
        while p < len(t.text or ""):
            c = t.text[p]
            if state == 1:
                if c == '[':
                    state = 2
                else:
                    pass
            elif state == 2:
                if ord('0') <= ord(c) <= ord('9'):
                    temp_ref_str = temp_ref_str + c
                    # record [element, start, end] of the digits inside each text block
                    if len(temp_ref) == 0 or t is not temp_ref[-1][0]:
                        temp_ref.append([t, p, p])
                    else:
                        temp_ref[-1][2] = p
                elif c == ']':
                    # an empty "[]" holds no reference number, so leave it untouched
                    if temp_ref_str:
                        assert len(temp_ref) == 1, f"""multi text blocks edit not support yet,
total {len(temp_ref)} text blocks in snippet: '{''.join([t[0].text for t in temp_ref])}',
to solve this error, delete and rewrite this snippet could be useful."""
                        temp_t, s, e = temp_ref[-1][0], temp_ref[-1][1], temp_ref[-1][2]
                        new_idx = replace_map[temp_ref_str]
                        # update `p` if `p` still in current text
                        if t is temp_t:
                            p = p + len(new_idx) - len(temp_ref_str)
                        temp_t.text = temp_t.text[0:s] + new_idx + temp_t.text[e+1:]
                    state = 1
                    temp_ref = []
                    temp_ref_str = ""
                else:
                    temp_ref = []
                    temp_ref_str = ""
                    state = 1
            p += 1


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.

"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.

"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).

"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.

"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."

"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:

(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and

(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and

(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and

(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.

You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work.

To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright [yyyy] [name of copyright owner]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


--------------------------------------------------------------------------------
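
The rule stated in the README, numbering references by the order in which they are first cited, can be sketched on a plain string as follows. This is only a minimal illustration under stated assumptions, not the package's implementation: the helper name `renumber_citations` is hypothetical, and it applies a regular expression to a plain string, whereas the package walks the `<w:t>` elements of `word/document.xml` with a small state machine.

```python
import re


def renumber_citations(text):
    """Renumber [n] citations by order of first appearance (hypothetical helper)."""
    new_id = {}

    def replace(match):
        old = match.group(1)
        # The first time a number is seen it receives the next free index.
        if old not in new_id:
            new_id[old] = str(len(new_id) + 1)
        return "[" + new_id[old] + "]"

    # Only ASCII digits between brackets count as a citation, as in the package.
    return re.sub(r"\[([0-9]+)\]", replace, text), new_id


if __name__ == "__main__":
    new_text, mapping = renumber_citations("See [3] and [1][3], then [2].")
    print(new_text)  # See [1] and [2][1], then [3].
    print(mapping)   # {'3': '1', '1': '2', '2': '3'}
```

The returned mapping plays the same role as `replace_map` in `__main__.py`: it can be used to reorder the reference list by hand once the in-text citations have been rewritten.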