├── .gitignore ├── LICENSE ├── README.md ├── checker.py ├── checker_custom_path.py ├── compress.py ├── cookie.tmpl.py ├── genInfo.py ├── main.py ├── requirements.txt ├── start.cmd ├── utils ├── README.md ├── db.text.json └── x-gallery-metadata.user.js └── writeInfo.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # test module 个人习惯,测试目录去掉 10 | /test/ 11 | 12 | # Distribution / packaging 13 | .Python 14 | build/ 15 | develop-eggs/ 16 | dist/ 17 | downloads/ 18 | eggs/ 19 | .eggs/ 20 | lib/ 21 | lib64/ 22 | parts/ 23 | sdist/ 24 | var/ 25 | wheels/ 26 | pip-wheel-metadata/ 27 | share/python-wheels/ 28 | *.egg-info/ 29 | .installed.cfg 30 | *.egg 31 | MANIFEST 32 | 33 | # PyInstaller 34 | # Usually these files are written by a python script from a template 35 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
36 | *.manifest 37 | *.spec 38 | .idea/ 39 | 40 | # Installer logs 41 | pip-log.txt 42 | pip-delete-this-directory.txt 43 | 44 | # Unit test / coverage reports 45 | htmlcov/ 46 | .tox/ 47 | .nox/ 48 | .coverage 49 | .coverage.* 50 | .cache 51 | nosetests.xml 52 | coverage.xml 53 | *.cover 54 | .hypothesis/ 55 | .pytest_cache/ 56 | 57 | # Translations 58 | *.mo 59 | *.pot 60 | 61 | # Django stuff: 62 | *.log 63 | local_settings.py 64 | db.sqlite3 65 | 66 | # Flask stuff: 67 | instance/ 68 | .webassets-cache 69 | 70 | # Scrapy stuff: 71 | .scrapy 72 | 73 | # Sphinx documentation 74 | docs/_build/ 75 | 76 | # PyBuilder 77 | target/ 78 | 79 | # Jupyter Notebook 80 | .ipynb_checkpoints 81 | 82 | # IPython 83 | profile_default/ 84 | ipython_config.py 85 | 86 | # pyenv 87 | .python-version 88 | 89 | # celery beat schedule file 90 | celerybeat-schedule 91 | 92 | # SageMath parsed files 93 | *.sage.py 94 | 95 | # Environments 96 | .env 97 | .venv 98 | env/ 99 | venv/ 100 | ENV/ 101 | env.bak/ 102 | venv.bak/ 103 | 104 | # Spyder project settings 105 | .spyderproject 106 | .spyproject 107 | 108 | # Rope project settings 109 | .ropeproject 110 | 111 | # mkdocs documentation 112 | /site 113 | 114 | # mypy 115 | .mypy_cache/ 116 | .dmypy.json 117 | dmypy.json 118 | 119 | # Pyre type checker 120 | .pyre/ 121 | 122 | out/ 123 | work/ 124 | cookie.py 125 | 126 | # outputs 127 | *.json 128 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 S4kura0ne 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons 
to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # hentaiTagger4calibre 2 | A tag converter for calibre 3 | 4 | --- 5 | 6 | 7 | ### 注意 8 | 9 | 为了提升阅读体验,我的新本子站已经迁移到了 [LANraragi](https://github.com/Difegue/LANraragi). 因此这个repo将不那么频繁地进行维护。 10 | 11 | LANraragi的优点: 12 | 13 | - Calibre-web 在线阅读时需要 **加载整个cbz文件**, LANraragi 支持 **服务端解压加载,传输单张图片给浏览器**. 14 | - 支持**直接输入 e-hentai 网址** 下载本子, 支持 **自动从 e-hentai 和 n-hentai 下载标签标题信息**. 15 | - 更好的**标签管理**, 尤其适合本子 **有许多标签** 的情形, calibre-web里存在太多标签会使得标签系统失去作用. 16 | - **不更改文件的hash值**, 因此下载的问价你可以直接从 e-hentai 服务器溯源,或是作为种子文件再次上传. 17 | - 不需要学会python的用法/魔改此脚本. 18 | 19 | 这个脚本的优点: 20 | 21 | - 支持嵌入 **eze 的 info.json**, 意味着不想LANraragi将信息单独存放在它自己的数据库中, **所有的** 元信息 **都和本子在一起**. 22 | - 支持检查画廊更新,有时候有些画廊会,每周更新,这个脚本可以方便地将其捞出来. 23 | - **精准地导入元数据**, 当多个汉化组同时汉化一本本子时,LANraragi自带的搜刮器可能会下错翻译组的信息. 24 | - **兼容** 包括 calibre and LANraragi 在内的所有阅读方案. 25 | - 我自己写的,因此遇到新的需求可以直接修改. 26 | 27 | 28 | ### Notice 29 | 30 | I have migrated to [LANraragi](https://github.com/Difegue/LANraragi) due to its better web reading experience. So this repo is maintained in a less frequent status. 
31 |
32 | Advantages of LANraragi:
33 |
34 | - Calibre-web requires **loading the whole cbz file**, while LANraragi decompresses the cbz file **on the server side**.
35 | - Supports direct download **by entering the e-hentai URL**, and supports **automatically scraping the metadata from e-hentai and n-hentai**.
36 | - Better **tag management**, designed especially for comic files with **lots of tags**; in calibre-web, too many tags make the whole tag system unusable.
37 | - **Does not modify the hash of the archive**, which means that, using that hash, the archive can be found more easily on the e-hentai server or be uploaded again as a BitTorrent file.
38 | - No need to learn Python to run this script.
39 |
40 | Advantages of this script:
41 |
42 | - Supports embedding **eze info.json**, which means **all** metadata is stored **with the cbz file**, unlike LANraragi, which keeps metadata in its own separate database.
43 | - Supports checking for updates. Sometimes a gallery has new images uploaded, and this script can help find the outdated archives.
44 | - **More precise metadata import**: when importing with LANraragi, some metadata may be downloaded from the wrong gallery, for example when multiple translation groups are translating the same gallery.
45 | - **Compatible** with all solutions, such as calibre and LANraragi.
46 | - Written by myself, so it is easier to modify when new requirements come up.
47 |
48 | ### Introduction
49 |
50 | This simple Python 3 app converts the metadata in a zip archive downloaded from e-hentai or exhentai to a format that calibre can recognize.
51 |
52 | ### Requirements
53 |
54 | - A Windows machine; Linux is not tested.
55 | - The calibre plugin [Embed Comic Metadata](https://github.com/dickloraine/EmbedComicMetadata) must be installed.
56 | - p7zip or 7zip in PATH.
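Whether 7-Zip is already reachable on PATH can be checked from Python before running the pipeline. The following is a minimal sketch and not part of the repository: `find_tools` is a hypothetical helper name, and the candidate names other than `7z` (the command that `compress.py` invokes) are assumptions about how p7zip or 7-Zip may be installed.

```python
import shutil


def find_tools(names=("7z", "7za", "7zr")):
    """Map each candidate executable name to its full path, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    found = find_tools()
    for name, path in found.items():
        print(f"{name}: {path or 'not found on PATH'}")
    if found["7z"] is None:
        # compress.py builds a command line that starts with `7z`.
        print("`7z` is missing; install 7-Zip/p7zip or adjust PATH.")
```

If `7z` is reported as missing, the install commands for the platform in use are the next step.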
57 |
58 | ```bash
59 | sudo apt install python p7zip-full # on windows: choco install python 7zip
60 | pip install -r requirements.txt
61 | ```
62 |
63 | ### Usage
64 |
65 | - Download a zip archive from e-hentai or exhentai.
66 | - Use [this script](https://raw.githubusercontent.com/dnsev-h/x/master/builds/x-gallery-metadata.user.js) to get the metadata in the form of an info.json file (from https://dnsev-h.github.io/x/), and add it to the zip file.
67 | - Delete all intermediate and final outputs such as `inf.json`, `ser.json`, and `out/`.
68 | - Uncompress the zip file and move the extracted folder into the `work/` subfolder.
69 |
70 | The work folder should look like this:
71 |
72 | ```
73 | │  1_info.py
74 | │  2_compress.cmd
75 | │  3_zipNote.py
76 | │
77 | └─work
78 |     ├─comic1
79 |     │      1.png
80 |     │      2.png
81 |     │      info.json
82 |     │
83 |     └─comic2
84 |             1.png
85 |             2.png
86 |             info.json
87 | ```
88 |
89 | - Make sure all the requirements are satisfied.
90 | - Run `python main.py`.
91 |
92 | The final cbz files should appear in the `out/` subfolder.
93 |
94 |
95 |
96 | ### Checker
97 |
98 | Specify your cbz path in `checker_custom_path.py` and run it. It helps you check whether your books are up to date.
99 |
100 | ~~`checker.py` checks whether the books recorded in `inf.json` are all still visible. It is useful for tracking ongoing comics, since their galleries get replaced and become invisible.
~~ 101 | -------------------------------------------------------------------------------- /checker.py: -------------------------------------------------------------------------------- 1 | import requests 2 | from pathlib import Path 3 | import json 4 | import time 5 | from cookie import cookie 6 | 7 | cwd = Path.cwd() 8 | infPath = cwd / 'inf.json' 9 | chgPath = cwd / 'chg.json' 10 | errPath = cwd / 'err.json' 11 | 12 | infStore = json.loads(infPath.read_text(encoding='UTF-8')) 13 | 14 | proxies = { 15 | 'http': 'http://127.0.0.1:7890', 16 | 'https': 'http://127.0.0.1:7890', 17 | } 18 | 19 | def getPage(url, s): 20 | r = s.get(url)#, proxies=proxies) 21 | # print(r.text) 22 | return r.text 23 | 24 | def stillThere(url, s): 25 | return getPage(url, s).find('Visible:Yes') != -1 26 | 27 | changed = [] 28 | error = [] 29 | 30 | counter = 0 31 | 32 | s = requests.Session() 33 | requests.utils.add_dict_to_cookiejar(s.cookies, cookie) 34 | s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36' 35 | 36 | for i in infStore: 37 | try: 38 | counter += 1 39 | if counter < 0: # when error occured, change this to continue 40 | continue 41 | url = infStore[i]["Web"] 42 | print(f'{counter}/{len(infStore)}') 43 | if not stillThere(url, s): 44 | print(url) 45 | # print(stillThere(url)) 46 | changed.append(url) 47 | # time.sleep(1) 48 | except: 49 | print(f'err:{0}', url) 50 | error.append(url) 51 | 52 | print(changed) 53 | print(error) 54 | chgPath.write_text(json.dumps(changed, ensure_ascii=False, indent=2), encoding='UTF-8') 55 | errPath.write_text(json.dumps(error, ensure_ascii=False, indent=2), encoding='UTF-8') 56 | -------------------------------------------------------------------------------- /checker_custom_path.py: -------------------------------------------------------------------------------- 1 | import requests 2 | from pathlib import Path 3 | import json 4 | import time 5 | import 
zipfile
6 | import pprint
7 | from xml.dom.minidom import parseString
8 | from cookie import cookie
9 |
10 | cbzPath = Path('\\\\server\\share')
11 | # cbzPath = Path('D:\\books')
12 |
13 | cwd = Path.cwd()
14 | chgPath = cwd / 'chg.json'
15 | errPath = cwd / 'err.json'
16 |
17 | cbzList = list(cbzPath.glob('**/*.cbz'))
18 |
19 | print(f'Found {len(cbzList)} files.')
20 |
21 | urls = []
22 | changed = []
23 | error = []
24 |
25 | for filePath in cbzList:
26 |     try:
27 |         with zipfile.ZipFile(filePath) as zipobj:  # close the archive after reading
28 |             xmlobj = zipobj.read('ComicInfo.xml').decode()
29 |         dom = parseString(xmlobj)
30 |         urls.append(dom.getElementsByTagName('Web')[0].childNodes[0].data)
31 |     except Exception:
32 |         print('err:', filePath)
33 |         error.append(str(filePath))  # Path objects are not JSON serializable
34 |
35 | proxies = {
36 |     'http': 'http://127.0.0.1:7890',
37 |     'https': 'http://127.0.0.1:7890',
38 | }
39 |
40 | def getPage(url, s):
41 |     r = s.get(url)#, proxies=proxies)
42 |     # print(r.text)
43 |     return r.text
44 |
45 | def stillThere(url, s):
46 |     return getPage(url, s).find('Visible:Yes') != -1
47 |
48 |
49 | counter = 0
50 |
51 | s = requests.Session()
52 | requests.utils.add_dict_to_cookiejar(s.cookies, cookie)
53 | s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'
54 |
55 | for url in urls:
56 |     try:
57 |         counter += 1
58 |         if counter < 0: # after an error, raise this threshold to skip already-checked items and resume
59 |             continue
60 |         print(f'{counter}/{len(urls)}')
61 |         if not stillThere(url, s):
62 |             print(url)
63 |             # print(stillThere(url))
64 |             changed.append(url)
65 |         # time.sleep(1)
66 |     except Exception:
67 |         print('err:', url)
68 |         error.append(url)
69 |
70 | print('changed:')
71 | pprint.pprint(changed)
72 | print('error:')
73 | pprint.pprint(error)
74 | chgPath.write_text(json.dumps(changed, ensure_ascii=False, indent=2), encoding='UTF-8')
75 | errPath.write_text(json.dumps(error, ensure_ascii=False, indent=2), encoding='UTF-8')
76 |
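The extraction step in `checker_custom_path.py` (read `ComicInfo.xml` from each cbz and take the text of its `Web` element) can be isolated as shown below. This is a hedged sketch rather than the repository's code: `read_web_url` is a hypothetical name, and it swaps `minidom` for `xml.etree.ElementTree`, assuming `ComicInfo.xml` declares no default XML namespace.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional


def read_web_url(cbz_path: Path) -> Optional[str]:
    """Return the text of the first <Web> element in ComicInfo.xml, or None."""
    # The context manager closes the archive even if reading fails.
    with zipfile.ZipFile(cbz_path) as archive:
        try:
            raw = archive.read("ComicInfo.xml")
        except KeyError:
            # The archive has no ComicInfo.xml member.
            return None
    root = ET.fromstring(raw)
    # iter() searches the whole tree, like getElementsByTagName does.
    node = next(root.iter("Web"), None)
    if node is None or not node.text:
        return None
    return node.text.strip()
```

Returning `None` instead of raising lets the caller decide whether a missing URL counts as an error, and returning plain strings keeps the result safe to pass to `json.dumps`.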
--------------------------------------------------------------------------------
/compress.py:
--------------------------------------------------------------------------------
1 | import subprocess
2 | import multiprocessing
3 | import sys
4 |
5 | corecount = multiprocessing.cpu_count()
6 |
7 | def compress(dirName, verbose = False):
8 |     command = f'7z a -r -scsUTF-8 -sccUTF-8 -mx6 -mmt{corecount} "out/{dirName}.zip" "./work/{dirName}/*"&&7z t "out/{dirName}.zip"'
9 |     print(command)
10 |     try:
11 |         subprocess.run(command, shell=True, check=True)
12 |     except Exception:  # e.g. CalledProcessError when 7z exits with a non-zero code
13 |         return [False, sys.exc_info()]
14 |     return [True]
15 |
--------------------------------------------------------------------------------
/cookie.tmpl.py:
--------------------------------------------------------------------------------
1 | cookie = {
2 |     'ipb_member_id': '',
3 |     'ipb_pass_hash': '',
4 |     'igneous': '',
5 | }
--------------------------------------------------------------------------------
/genInfo.py:
--------------------------------------------------------------------------------
1 | from pathlib import Path
2 | import json
3 | import pprint
4 | import pycountry
5 | import re
6 | import sys
7 | import urllib.parse
8 |
9 | pp = pprint.PrettyPrinter(indent=2)
10 |
11 | cwd = Path.cwd()
12 | utilsPath = cwd / 'utils'
13 | transPath = utilsPath / 'db.text.json'
14 | trans = json.loads(transPath.read_text(encoding='UTF-8'))
15 |
16 | def getCore(st):  # strip bracketed segments to leave the core title
17 |     t1 = re.sub(u"\\「.*?\\」|\\(.*?\\)|\\（.*?）|\\{.*?}|\\[.*?]|\\【.*?】", "", st).strip()
18 |     if t1 == '':  # nothing left: keep the content of 「」
19 |         t1 = re.sub(u"\\(.*?\\)|\\（.*?）|\\{.*?}|\\[.*?]|\\【.*?】", "", st).strip()
20 |     if t1 == '':  # still nothing left: also keep the content of ()
21 |         t1 = re.sub(u"\\（.*?）|\\{.*?}|\\[.*?]|\\【.*?】", "", st).strip()
22 |     return t1
23 |
24 | def getSeries(st):
25 |     core = getCore(st)
26 |     iss = 1.0
27 |     ser = core
28 |
29 |     # ①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳
30 |     replaceList = [
31 |         ('①', 1),
32 |         ('②', 2),
33 |         ('③', 3),
34 |         ('④', 4),
35 |         ('⑤', 5),
36 |         ('⑥', 6),
37 |         ('⑦', 7),
38 |         ('⑧', 8),
39 |         ('⑨',
9), 40 | ('⑩', 10), 41 | ('⑪', 11), 42 | ('⑫', 12), 43 | ('⑬', 13), 44 | ('⑭', 14), 45 | ('⑮', 15), 46 | ('⑯', 16), 47 | ('⑰', 17), 48 | ('⑱', 18), 49 | ('⑲', 19), 50 | ('⑳', 20), 51 | ] 52 | for char in replaceList: 53 | core = core.replace(char[0], str(char[1])) 54 | 55 | # 2020年10月号 56 | # 2021月2号 57 | if '月' in core[-4:] and '号' in core[-4:]: 58 | while (ser[-1] != ' '): 59 | ser = ser[:-1] 60 | ser = ser.strip() 61 | return [ser, iss] 62 | 63 | # Artist Galleries ::: 64 | if core[:16].lower() == 'artist galleries': 65 | ser = core[16:].strip().strip(':').strip() 66 | return [ser, iss] 67 | 68 | # 从本子了解汉化教程 69 | if core[:9] == '从本子了解汉化教程': 70 | ser = '从本子了解汉化教程' 71 | if core[9:][0].isdigit(): 72 | iss = float(core[9:][0]) 73 | return [ser, iss] 74 | 75 | # 美羽ちゃんとベランダXX 76 | if core == '美羽ちゃんとベランダXX': 77 | return [ser, iss] 78 | 79 | # 1階 80 | if core[-1] == '階' and core[-2].isdigit(): 81 | count = -2 82 | while core[count - 1].isdigit(): 83 | count = count - 1 84 | ser = core[:count].strip() 85 | iss = float(core[count:-1]) 86 | return [ser, iss] 87 | # 1 88 | # 01 89 | # 1.5 90 | if core[-1].isdigit(): 91 | count = -1 92 | while core[count - 1].isdigit() or (core[count - 1] == '.' and core[count - 2].isdigit()): 93 | count = count - 1 94 | ser = core[:count].strip() 95 | iss = float(core[count:]) 96 | # 01 97 | # vol.1 98 | # Vol,01 99 | if ser[-1:] == '#' or ser[-1:] == '.' 
or ser[-1:] == ',': 100 | ser = ser[:-1].strip() 101 | # vol 1 102 | if ser[-3:].lower() == 'vol': 103 | ser = ser[:-3].strip() 104 | # LEVEL:1 105 | if ser[-6:].lower() == 'level:': 106 | ser = ser[:-6].strip() 107 | # 1+2 108 | # 1-2 109 | if ser[-1:] == '+' or ser[-1:] == '-': 110 | iss = 1.0 111 | ser = core 112 | 113 | # roman numerals 114 | # Ⅰ 115 | # Ⅱ 116 | # Ⅲ 117 | # Ⅳ 118 | # Ⅴ 119 | # Ⅵ 120 | # Ⅶ 121 | # Ⅷ 122 | # Ⅸ 123 | # Ⅹ 124 | # Ⅺ 125 | # Ⅻ 126 | # XIII 127 | # XIV 128 | # XV 129 | rn = [ 130 | ['Ⅰ', 1.0], 131 | ['Ⅱ', 2.0], 132 | ['Ⅲ', 3.0], 133 | ['Ⅳ', 4.0], 134 | ['Ⅴ', 5.0], 135 | ['Ⅵ', 6.0], 136 | ['Ⅶ', 7.0], 137 | ['Ⅷ', 8.0], 138 | ['Ⅸ', 9.0], 139 | ['Ⅹ', 10.0], 140 | ['Ⅺ', 11.0], 141 | ['Ⅻ', 12.0], 142 | ['XIII', 13.0], 143 | ['VIII', 8.0], 144 | ['XIV', 14.0], 145 | ['XII', 12.0], 146 | ['VII', 7.0], 147 | ['III', 3.0], 148 | ['XV', 15.0], 149 | ['XI', 11.0], 150 | ['VI', 6.0], 151 | ['IX', 9.0], 152 | ['IV', 4.0], 153 | ['II', 2.0], 154 | ['X', 10.0], 155 | ['V', 5.0], 156 | ['I', 1.0], 157 | ] 158 | for r in rn: 159 | l = len(r[0]) 160 | if core[-l:] == r[0]: 161 | ser = core[:-l].strip() 162 | iss = r[1] 163 | return [ser, iss] 164 | 165 | # 援助交配 166 | if core[:4] == '援助交配': 167 | ser = '援助交配' 168 | return [ser, iss] 169 | 170 | # ネコぱら01 おまけ本 171 | if core == 'ネコぱら01 おまけ本': 172 | ser = 'ネコぱら' 173 | return [ser, iss] 174 | 175 | # Arknights Character Fan Art Gallery 176 | if core[:35].lower() == 'arknights character fan art gallery': 177 | ser = 'Arknights Character Fan Art Gallery' 178 | return [ser, iss] 179 | 180 | return [ser, iss] 181 | 182 | def gett(index, st): 183 | if st.lower() in trans['data'][index]['data']: 184 | return trans['data'][index]['data'][st]['name'] 185 | else: 186 | return None 187 | 188 | def trasgroup(d): 189 | res = [] 190 | for i in d: 191 | a = gett(i[0], i[1]) 192 | if a != None: 193 | res.append(a) 194 | return res 195 | 196 | def genInfo(dir, verbose = False): 197 | try: 198 | infoPath = dir / 'info.json' 
199 | infoText = infoPath.read_text(encoding='UTF-8') 200 | infoJson = json.loads(infoText) 201 | except: 202 | return [False, sys.exc_info()] 203 | 204 | info = {} 205 | info['Title'] = infoJson['gallery_info']['title_original'] or infoJson['gallery_info']['title'] 206 | info['Genre'] = infoJson['gallery_info']['category'] 207 | info['Language'] = infoJson['gallery_info']['language'] 208 | info['UploadDate'] = infoJson['gallery_info']['upload_date'] 209 | info['Year'] = info['UploadDate'][0] 210 | info['Month'] = info['UploadDate'][1] 211 | info['Day'] = info['UploadDate'][2] 212 | info['PageCount'] = infoJson['gallery_info_full']['image_count'] 213 | info['Rating'] = infoJson['gallery_info_full']['rating']['average'] 214 | info['Publisher'] = urllib.parse.unquote(infoJson['gallery_info_full']['uploader']) 215 | 216 | if infoJson['gallery_info']['source']['site'] == 'exhentai': 217 | info['Web'] = f'https://exhentai.org/g/{infoJson["gallery_info"]["source"]["gid"]}/{infoJson["gallery_info"]["source"]["token"]}/' 218 | elif infoJson['gallery_info']['source']['site'] == 'e-hentai': 219 | info['Web'] = f'https://e-hentai.org/g/{infoJson["gallery_info"]["source"]["gid"]}/{infoJson["gallery_info"]["source"]["token"]}/' 220 | elif infoJson['gallery_info']['source']['site'] == 'acg18': 221 | info['Web'] = f'https://acg18.moe/{infoJson["gallery_info"]["source"]["gid"]}.html' 222 | 223 | info['Imprint'] = re.match(r'^(?:\()(.+?)(?:\))', infoJson['gallery_info']['title']) 224 | if(info['Imprint'] != None): 225 | info['Imprint'] = info['Imprint'].group(1) 226 | else: 227 | info['Imprint'] = infoJson['gallery_info_full']['source_site'] 228 | 229 | # begin tags 230 | info['tags'] = [] 231 | 232 | info['tags'].append(info['Genre']) 233 | transtags = [[1, info['Genre']]] 234 | 235 | keywords = [ 236 | ['language', 2], 237 | ['parody', 3], 238 | ['character', 4], 239 | ['male', 7], 240 | ['female', 8], 241 | ['misc', 9], 242 | ] 243 | 244 | for typ in keywords: 245 | if typ[0] in 
infoJson['gallery_info']['tags']:
246 |             for tag in infoJson['gallery_info']['tags'][typ[0]]:
247 |                 info['tags'].append(tag)
248 |                 transtags.append([typ[1], tag])
249 |
250 |     rtagInTitle = re.findall(r'\[(.+?)\]|\((.+?)\)|【(.+?)】|（(.+?)）', infoJson['gallery_info']['title'])  # one 4-tuple per match
251 |     tagInTitle = []
252 |     for x in rtagInTitle:
253 |         tagInTitle += list(x)
254 |
255 |     info['tags'] = trasgroup(transtags) + tagInTitle + info['tags']
256 |
257 |     info['tags'] = list(dict.fromkeys(info['tags']))  # de-duplicate, keeping order
258 |
259 |     if '' in info['tags']:
260 |         info['tags'].remove('')
261 |
262 |     # end tags
263 |
264 |     # begin writer
265 |     info['writer'] = []
266 |     transwris = []
267 |     if 'group' in infoJson['gallery_info']['tags']:
268 |         for t in infoJson['gallery_info']['tags']['group']:
269 |             info['writer'].append(t)
270 |             transwris.append([5, t])
271 |     if 'artist' in infoJson['gallery_info']['tags']:
272 |         for t in infoJson['gallery_info']['tags']['artist']:
273 |             info['writer'].append(t)
274 |             transwris.append([6, t])
275 |     tg = trasgroup(transwris)
276 |     ltg = [x.lower() for x in tg]
277 |     awrite = []
278 |     for x in info['writer']:
279 |         if x.lower() not in ltg:
280 |             awrite.append(x)
281 |     info['writer'] = tg + awrite
282 |     info['writer'] = list(dict.fromkeys(info['writer']))
283 |     # end writer
284 |
285 |     # begin characters
286 |     info['characters'] = []
287 |     transchars = []
288 |     if 'character' in infoJson['gallery_info']['tags']:
289 |         for t in infoJson['gallery_info']['tags']['character']:
290 |             info['characters'].append(t)
291 |             transchars.append([4, t])
292 |     tg = trasgroup(transchars)
293 |     ltg = [x.lower() for x in tg]
294 |     achar = []
295 |     for x in info['characters']:
296 |         if x.lower() not in ltg:
297 |             achar.append(x)
298 |     info['characters'] = tg + achar
299 |     info['characters'] = list(dict.fromkeys(info['characters']))
300 |     # end characters
301 |
302 |     # begin series
303 |     info['coreTitle'] = getCore(info['Title'])
304 |     info['series'], info['issue'] = getSeries(info['coreTitle'])
305 | # [Pixiv] 306 | # [pixiv] 307 | # [Pixiv Fanbox] 308 | if info['Title'][1:6].lower() == 'pixiv': 309 | info['series'], info['issue'] = ['Pixiv', 1.0] 310 | # [Twitter] 311 | if info['Title'][1:8].lower() == 'twitter': 312 | info['series'], info['issue'] = ['Twitter', 1.0] 313 | # Karorfulmix♥EX 314 | if info['series'] == 'Karorfulmix♥EX': 315 | info['series'] = 'KARORFUL MIX EX' 316 | 317 | cau = ['-', '-', ':', ':', '~', ']', '[', '(', ')', '「', '」', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '+'] 318 | cauFlag = False 319 | for c in cau: 320 | if c in info['series']: 321 | cauFlag = True 322 | if cauFlag: 323 | info['coreTitle'] = f"[CAUTION]{info['coreTitle']}" 324 | 325 | #end series 326 | 327 | if info['Genre'] == 'non-h': 328 | info['AgeRating'] = 'Teen' 329 | else: 330 | info['AgeRating'] = 'Adults Only 18+' 331 | 332 | if info['Genre'] in ['doujinshi', 'manga']: 333 | info['Manga'] = 'Yes' 334 | else: 335 | info['Manga'] = 'No' 336 | 337 | info['Writer'] = ', '.join(str(p) for p in info['writer']) 338 | info['Characters'] = ', '.join(str(p) for p in info['characters']) 339 | info['LanguageISO'] = pycountry.languages.get(name=info['Language']).alpha_2 340 | info['Comments'] = f'''

Web: {info['Web']}

Rating: {info['Rating']}, {infoJson['gallery_info_full']['rating']['count']}

PageCount: {info['PageCount']}

Genre: {info['Genre']}

Imprint: {info['Imprint']}

AgeRating: {info['AgeRating']}

UploadDate: {info['UploadDate']}

''' 341 | 342 | if verbose: 343 | pp.pprint(info) 344 | 345 | return [True, info] 346 | -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | from genInfo import genInfo 2 | from pathlib import Path 3 | from writeInfo import writeInfo 4 | import json 5 | 6 | verbose = False 7 | infoOnly = False 8 | 9 | succeeded = [] 10 | failed = [] 11 | 12 | cwd = Path.cwd() 13 | work = cwd / 'work' 14 | infPath = cwd / 'inf.json' 15 | serPath = cwd / 'ser.json' 16 | 17 | if infPath.exists(): 18 | infStore = json.loads(infPath.read_text(encoding='UTF-8')) 19 | else: 20 | infStore = {} 21 | 22 | if serPath.exists(): 23 | serStore = json.loads(serPath.read_text(encoding='UTF-8')) 24 | else: 25 | serStore = {} 26 | 27 | dirList = [x for x in work.iterdir() if x.is_dir()] 28 | 29 | for curDirIndex in range(len(dirList)): 30 | curDir = dirList[curDirIndex] 31 | print(f'===== start processing {curDirIndex+1}/{len(dirList)} =====') 32 | print(f' path: {curDir}') 33 | 34 | if curDir.name in infStore: 35 | print('from inf.json') 36 | info = infStore[curDir.name] 37 | else: 38 | gr = genInfo(curDir, verbose) 39 | if not gr[0]: 40 | print(f'===== fail generating {curDirIndex+1}/{len(dirList)} =====\n') 41 | failed.append([curDir.name, gr[1]]) 42 | continue 43 | info = gr[1] 44 | 45 | if curDir.name in serStore: 46 | print('from ser.json') 47 | info['series'], info['issue'] = serStore[curDir.name][:2] 48 | 49 | serStore[curDir.name] = [info['series'], info['issue'], info['coreTitle'], info['Web']] 50 | infStore[curDir.name] = info 51 | 52 | if not infoOnly: 53 | wr = writeInfo(curDir.name, info, verbose) 54 | if(not wr[0]): 55 | print(f'===== fail writing {curDirIndex+1}/{len(dirList)} =====\n') 56 | failed.append([curDir.name, wr[1]]) 57 | continue 58 | print(f'===== finish processing {curDirIndex+1}/{len(dirList)} =====\n') 59 | succeeded.append(curDir.name) 60 | 61 | 
infPath.write_text(json.dumps(infStore, ensure_ascii=False, indent=2, sort_keys=True), encoding='UTF-8') 62 | serPath.write_text(json.dumps(serStore, ensure_ascii=False, indent=2, sort_keys=True), encoding='UTF-8') 63 | 64 | result = { 65 | 'succeeded_count': len(succeeded), 66 | 'failed_count': len(failed), 67 | } 68 | print(failed) 69 | print(result) 70 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | pycountry 2 | requests 3 | -------------------------------------------------------------------------------- /start.cmd: -------------------------------------------------------------------------------- 1 | python main.py 2 | pause -------------------------------------------------------------------------------- /utils/README.md: -------------------------------------------------------------------------------- 1 | ### x-gallery-metadata.user.js 2 | 3 | Downloaded from [https://raw.githubusercontent.com/dnsev-h/x/master/builds/x-gallery-metadata.user.js](https://raw.githubusercontent.com/dnsev-h/x/master/builds/x-gallery-metadata.user.js), v1.2.4, at 2021-02-11 4 | 5 | ### db.text.json 6 | 7 | Downloaded from [https://github.com/EhTagTranslation/Database/releases](https://github.com/EhTagTranslation/Database/releases), [`ceaeb72`](https://github.com/EhTagTranslation/Database/compare/efe2dee6f44474b7cc68245bad751bdba7dc3400...ceaeb72c3c548d39ac381f9ab9b81f1f40a4387a), at 2021-02-11 8 | 9 | -------------------------------------------------------------------------------- /utils/x-gallery-metadata.user.js: -------------------------------------------------------------------------------- 1 | // ==UserScript== 2 | // @name x/gallery-metadata 3 | // @version 1.2.4 4 | // @author dnsev-h 5 | // @namespace dnsev-h 6 | // @description Download metadata JSON files for galleries 7 | // @run-at document-start 8 | // @include https://exhentai.org/* 9 
| // @include https://e-hentai.org/* 10 | // @icon  11 | // @icon64  12 | // @homepage https://dnsev-h.github.io/x/ 13 | // @supportURL https://github.com/dnsev-h/x/issues 14 | // @updateURL https://raw.githubusercontent.com/dnsev-h/x/master/builds/x-gallery-metadata.meta.js 15 | // @downloadURL https://raw.githubusercontent.com/dnsev-h/x/master/builds/x-gallery-metadata.user.js 16 | // ==/UserScript== 17 | (function(){function r(e,n,t){function o(i,f){if(!n[i]){if(!e[i]){var c="function"==typeof require&&require;if(!f&&c)return c(i,!0);if(u)return u(i,!0);var a=new Error("Cannot find module '"+i+"'");throw a.code="MODULE_NOT_FOUND",a}var p=n[i]={exports:{}};e[i][0].call(p.exports,function(r){var n=e[i][1][r];return o(n||r)},p,p.exports,r,e,n,t)}return n[i].exports}for(var u="function"==typeof require&&require,i=0;idiv"); 257 | if (node === null) { return null; } 258 | 259 | let url = getCssUrl(node.style.backgroundImage); 260 | if (url !== null) { return url; } 261 | 262 | const img = node.querySelector("img[src]"); 263 | return (img !== null ? img.getAttribute("src") : null); 264 | } 265 | 266 | function getCategory(html) { 267 | const node = html.querySelector("#gdc>div[onclick]"); 268 | if (node === null) { return null; } 269 | 270 | const pattern = /['"].*?\/\/.+?\/(.*?)(\?.*?)?(#.*?)?['"]/; 271 | const match = pattern.exec(node.getAttribute("onclick") || ""); 272 | return (match !== null ? match[1] : null); 273 | } 274 | 275 | function getUploader(html) { 276 | const node = html.querySelector("#gdn>a"); 277 | if (node === null) { return null; } 278 | 279 | const pattern = /^.*?\/\/.+?\/(.*?)(\?.*?)?(#.*?)?$/; 280 | const match = pattern.exec(node.getAttribute("href") || ""); 281 | return (match !== null ? 
(match[1].split("/")[1] || "") : null); 282 | } 283 | 284 | function getRatingCount(html) { 285 | const node = html.querySelector("#rating_count"); 286 | if (node === null) { return null; } 287 | 288 | const value = parseInt(node.textContent.trim(), 10); 289 | return (Number.isNaN(value) ? null : value); 290 | } 291 | 292 | function getRatingAverage(html) { 293 | const node = html.querySelector("#rating_label"); 294 | if (node === null) { return null; } 295 | 296 | const pattern = /average:\s*([0-9\.]+)/i; 297 | const match = pattern.exec(node.textContent); 298 | if (match === null) { return null; } 299 | 300 | const value = parseFloat(match[1]); 301 | return (Number.isNaN(value) ? null : value); 302 | } 303 | 304 | function getFavoriteCount(html) { 305 | const node = html.querySelector("#favcount"); 306 | if (node === null) { return null; } 307 | 308 | const pattern = /\s*([0-9]+|once)/i; 309 | const match = pattern.exec(node.textContent); 310 | if (match === null) { return null; } 311 | 312 | const match1 = match[1]; 313 | return (match1.toLowerCase() === "once" ? 1 : parseInt(match1, 10)); 314 | } 315 | 316 | function getFavoriteCategory(html) { 317 | const node = html.querySelector("#fav>div.i"); 318 | if (node === null) { return null; } 319 | 320 | const title = node.getAttribute("title") || ""; 321 | const pattern = /background-position\s*:\s*\d+(?:px)?\s+(-?\d+)(?:px)/; 322 | const match = pattern.exec(node.getAttribute("style") || ""); 323 | const index = (match !== null) ? 324 | Math.floor((Math.abs(parseInt(match[1], 10)) - 2) / 19) : 325 | -1; 326 | 327 | return { index, title }; 328 | } 329 | 330 | function getThumbnailSize(html) { 331 | const nodes = html.querySelectorAll("#gdo4>.nosel"); 332 | if (nodes.length < 2) { return null; } 333 | return (nodes[0].classList.contains("ths") ? 
"normal" : "large"); 334 | } 335 | 336 | function getThumbnailRows(html) { 337 | const nodes = html.querySelectorAll("#gdo2>.nosel"); 338 | if (nodes.length === 0) { return null; } 339 | 340 | const pattern = /\s*([0-9]+)/; 341 | for (const node of nodes) { 342 | if (node.classList.contains("ths")) { 343 | const match = pattern.exec(node.textContent); 344 | if (match !== null) { 345 | return parseInt(match[1], 10); 346 | } 347 | } 348 | } 349 | 350 | return null; 351 | } 352 | 353 | function getTags(html) { 354 | const pattern = /(.+):/; 355 | const groups = html.querySelectorAll("#taglist tr"); 356 | const tags = {}; 357 | 358 | for (const group of groups) { 359 | const tds = group.querySelectorAll("td"); 360 | if (tds.length === 0) { continue; } 361 | 362 | const match = pattern.exec(tds[0].textContent); 363 | const namespace = (match !== null ? match[1].trim() : ""); 364 | 365 | let namespaceTags; 366 | if (tags.hasOwnProperty(namespace)) { 367 | namespaceTags = tags[namespace]; 368 | } else { 369 | namespaceTags = []; 370 | tags[namespace] = namespaceTags; 371 | } 372 | 373 | const tagDivs = tds[tds.length - 1].querySelectorAll("div"); 374 | for (const div of tagDivs) { 375 | const link = div.querySelector("a"); 376 | if (link === null) { continue; } 377 | 378 | const tag = link.textContent.trim(); 379 | namespaceTags.push(tag); 380 | } 381 | } 382 | 383 | return tags; 384 | } 385 | 386 | function getDetailsNodes(html) { 387 | return html.querySelectorAll("#gdd tr"); 388 | } 389 | 390 | function getDateUploaded(detailsNodes) { 391 | if (detailsNodes.length <= 0) { return null; } 392 | const node = detailsNodes[0].querySelector(".gdt2"); 393 | return (node !== null ? 
getTimestamp(node.textContent) : null); 394 | } 395 | 396 | function getVisibleInfo(detailsNodes) { 397 | let visible = true; 398 | let visibleReason = null; 399 | 400 | if (detailsNodes.length > 2) { 401 | const node = detailsNodes[2].querySelector(".gdt2"); 402 | if (node !== null) { 403 | const pattern = /no\s+\((.+?)\)/i; 404 | const match = pattern.exec(node.textContent); 405 | if (match !== null) { 406 | visible = false; 407 | visibleReason = match[1].trim(); 408 | } 409 | } 410 | } 411 | 412 | return { visible, visibleReason }; 413 | } 414 | 415 | function getLanguageInfo(detailsNodes) { 416 | let language = null; 417 | let translated = false; 418 | 419 | if (detailsNodes.length > 3) { 420 | const node = detailsNodes[3].querySelector(".gdt2"); 421 | if (node !== null) { 422 | const textNode = node.firstChild; 423 | if (textNode !== null && textNode.nodeType === Node.TEXT_NODE) { 424 | language = textNode.nodeValue.trim(); 425 | } 426 | 427 | const trNode = node.querySelector(".halp"); 428 | translated = (trNode !== null && trNode.textContent.trim().toLowerCase() === "tr"); 429 | } 430 | } 431 | 432 | return { language, translated }; 433 | } 434 | 435 | function getApproximateTotalFileSize(detailsNodes) { 436 | if (detailsNodes.length <= 4) { return null; } 437 | 438 | const node = detailsNodes[4].querySelector(".gdt2"); 439 | if (node === null) { return null; } 440 | 441 | const pattern = /([0-9\.]+)\s*(\w+)/i; 442 | const match = pattern.exec(node.textContent); 443 | return (match !== null ? utils.getBytesSizeFromLabel(match[1], match[2]) : null); 444 | } 445 | 446 | function getFileCount(detailsNodes) { 447 | if (detailsNodes.length <= 5) { return null; } 448 | 449 | const node = detailsNodes[5].querySelector(".gdt2"); 450 | if (node === null) { return null; } 451 | 452 | const pattern = /([0-9,]+)\s*pages/i; 453 | const match = pattern.exec(node.textContent); 454 | return (match !== null ? 
parseInt(match[1].replace(/,/g, ""), 10) : null); 455 | } 456 | 457 | function getParent(detailsNodes) { 458 | if (detailsNodes.length <= 1) { return null; } 459 | 460 | const node = detailsNodes[1].querySelector(".gdt2>a"); 461 | if (node === null) { return null; } 462 | 463 | const info = utils.getGalleryIdentifierAndPageFromUrl(node.getAttribute("href") || ""); 464 | return (info !== null ? info.identifier : null); 465 | } 466 | 467 | function getNewerVersions(html) { 468 | const results = []; 469 | const nodes = html.querySelectorAll("#gnd>a"); 470 | 471 | for (const node of nodes) { 472 | const info = utils.getGalleryIdentifierAndPageFromUrl(node.getAttribute("href") || ""); 473 | if (info === null) { continue; } 474 | 475 | const galleryInfo = { 476 | identifier: info.identifier, 477 | name: node.textContent.trim(), 478 | dateUploaded: null 479 | }; 480 | 481 | if (node.nextSibling !== null) { 482 | galleryInfo.dateUploaded = getTimestamp(node.nextSibling.textContent); 483 | } 484 | 485 | results.push(galleryInfo); 486 | } 487 | 488 | return results; 489 | } 490 | 491 | function getTorrentCount(html) { 492 | const nodes = html.querySelectorAll("#gd5 .g2>a"); 493 | const pattern = /\btorrent\s+download\s*\(\s*(\d+)\s*\)/i; 494 | for (const node of nodes) { 495 | const match = pattern.exec(node.textContent); 496 | if (match !== null) { 497 | return parseInt(match[1], 10); 498 | } 499 | } 500 | 501 | return null; 502 | } 503 | 504 | function getArchiverKey(html) { 505 | const nodes = html.querySelectorAll("#gd5 .g2>a"); 506 | const pattern = /\barchive\s+download\b/i; 507 | for (const node of nodes) { 508 | const match = pattern.exec(node.textContent); 509 | if (match !== null) { 510 | const pattern2 = /&or=([^'"]*)['"]/; 511 | const match2 = pattern2.exec(node.getAttribute("onclick") || ""); 512 | return (match2 !== null ? 
match2[1] : null); 513 | } 514 | } 515 | 516 | return null; 517 | } 518 | 519 | function populateGalleryInfoFromHtml(info, html) { 520 | // General 521 | info.title = getTitle(html); 522 | info.titleOriginal = getTitleOriginal(html); 523 | info.mainThumbnailUrl = getMainThumbnailUrl(html); 524 | info.category = getCategory(html); 525 | info.uploader = getUploader(html); 526 | 527 | info.ratingCount = getRatingCount(html); 528 | info.ratingAverage = getRatingAverage(html); 529 | 530 | info.favoriteCount = getFavoriteCount(html); 531 | info.favoriteCategory = getFavoriteCategory(html); 532 | 533 | info.thumbnailSize = getThumbnailSize(html); 534 | info.thumbnailRows = getThumbnailRows(html); 535 | 536 | info.newerVersions = getNewerVersions(html); 537 | 538 | info.torrentCount = getTorrentCount(html); 539 | info.archiverKey = getArchiverKey(html); 540 | 541 | // Details 542 | const detailsNodes = getDetailsNodes(html); 543 | 544 | info.dateUploaded = getDateUploaded(detailsNodes); 545 | 546 | info.parent = getParent(detailsNodes); 547 | 548 | const visibleInfo = getVisibleInfo(detailsNodes); 549 | info.visible = visibleInfo.visible; 550 | info.visibleReason = visibleInfo.visibleReason; 551 | 552 | const languageInfo = getLanguageInfo(detailsNodes); 553 | info.language = languageInfo.language; 554 | info.translated = languageInfo.translated; 555 | 556 | info.approximateTotalFileSize = getApproximateTotalFileSize(detailsNodes); 557 | 558 | info.fileCount = getFileCount(detailsNodes); 559 | 560 | // Tags 561 | info.tags = getTags(html); 562 | info.tagsHaveNamespace = true; 563 | } 564 | 565 | function getFromHtml(html, url) { 566 | const link = html.querySelector(".ptt td.ptds>a[href],.ptt td.ptdd>a[href]"); 567 | if (link === null) { return null; } 568 | 569 | const idPage = utils.getGalleryIdentifierAndPageFromUrl(link.getAttribute("href") || ""); 570 | if (idPage === null) { return null; } 571 | 572 | const info = new types.GalleryInfo(); 573 | info.identifier = 
idPage.identifier; 574 | info.currentPage = idPage.page; 575 | info.source = "html"; 576 | populateGalleryInfoFromHtml(info, html); 577 | info.sourceSite = utils.getSourceSiteFromUrl(url); 578 | info.dateGenerated = Date.now(); 579 | return info; 580 | } 581 | 582 | 583 | module.exports = getFromHtml; 584 | 585 | },{"./types":4,"./utils":5}],4:[function(require,module,exports){ 586 | "use strict"; 587 | 588 | const GalleryIdentifier = require("../gallery-identifier").GalleryIdentifier; 589 | 590 | 591 | class GalleryInfo { 592 | constructor() { 593 | this.identifier = null; 594 | this.title = null; 595 | this.titleOriginal = null; 596 | this.dateUploaded = null; 597 | this.category = null; 598 | this.uploader = null; 599 | this.ratingAverage = null; 600 | this.ratingCount = null; 601 | this.favoriteCategory = null; 602 | this.favoriteCount = null; 603 | this.mainThumbnailUrl = null; 604 | this.thumbnailSize = null; 605 | this.thumbnailRows = null; 606 | this.fileCount = null; 607 | this.approximateTotalFileSize = null; 608 | this.visible = true; 609 | this.visibleReason = null; 610 | this.language = null; 611 | this.translated = null; 612 | this.archiverKey = null; 613 | this.torrentCount = null; 614 | this.tags = null; 615 | this.tagsHaveNamespace = null; 616 | this.currentPage = null; 617 | this.parent = null; 618 | this.newerVersions = null; 619 | this.source = null; 620 | this.sourceSite = null; 621 | this.dateGenerated = null; 622 | } 623 | } 624 | 625 | 626 | module.exports = { 627 | GalleryIdentifier, 628 | GalleryInfo 629 | }; 630 | 631 | },{"../gallery-identifier":1}],5:[function(require,module,exports){ 632 | "use strict"; 633 | 634 | const types = require("./types"); 635 | 636 | const sizeLabelToBytesPrefixes = [ "b", "kb", "mb", "gb" ]; 637 | 638 | 639 | function getGalleryPageFromUrl(url) { 640 | const match = /\?(?:(|[\w\W]*?&)p=([\+\-]?\d+))?/.exec(url); 641 | if (match !== null && match[2]) { 642 | const page = parseInt(match[2], 10); 643 | if
(!Number.isNaN(page)) { return page; } 644 | } 645 | return null; 646 | } 647 | 648 | function getGalleryIdentifierAndPageFromUrl(url) { 649 | const identifier = types.GalleryIdentifier.createFromUrl(url); 650 | if (identifier === null) { return null; } 651 | 652 | const page = getGalleryPageFromUrl(url); 653 | return { identifier, page }; 654 | } 655 | 656 | function getBytesSizeFromLabel(number, label) { 657 | let i = sizeLabelToBytesPrefixes.indexOf(label.toLowerCase()); 658 | if (i < 0) { i = 0; } 659 | return Math.floor(parseFloat(number) * Math.pow(1024, i)); 660 | } 661 | 662 | function getSourceSiteFromUrl(url) { 663 | const pattern = /^(?:(?:[a-z][a-z0-9\+\-\.]*:\/*|\/{2,})([^\/]*))?(\/?[\w\W]*)$/i; 664 | const match = pattern.exec(url); 665 | 666 | if (match !== null && match[1]) { 667 | const host = match[1].toLowerCase(); 668 | if (host.indexOf("exhentai") >= 0) { return "exhentai"; } 669 | if (host.indexOf("e-hentai") >= 0) { return "e-hentai"; } 670 | } 671 | 672 | return null; 673 | } 674 | 675 | 676 | module.exports = { 677 | getGalleryIdentifierAndPageFromUrl, 678 | getBytesSizeFromLabel, 679 | getSourceSiteFromUrl 680 | }; 681 | 682 | },{"./types":4}],6:[function(require,module,exports){ 683 | "use strict"; 684 | 685 | const apiStyle = require("./style"); 686 | const style = require("../style"); 687 | 688 | 689 | function insertStylesheet() { 690 | const id = "x-gallery-links-right-sidebar"; 691 | if (style.hasStylesheet(id)) { return; } 692 | 693 | const src = require("./style/gallery-right-sidebar.css"); 694 | style.addStylesheet(src, id); 695 | } 696 | 697 | function getGroupContainer(parent) { 698 | const id = "x-gallery-links-right-sidebar-container"; 699 | let node = parent.querySelector(`.${id}`); 700 | if (node === null) { 701 | node = document.createElement("div"); 702 | node.className = `g2 gsp ${id}`; 703 | parent.appendChild(node); 704 | 705 | const p = parent.parentNode; 706 | if (p !== null) { 707 | 
p.classList.add("x-gallery-links-right-sidebar-contains-container"); 708 | } 709 | } 710 | return node; 711 | } 712 | 713 | function createLink(label, order) { 714 | const parent = document.querySelector("#gd5"); 715 | if (parent === null) { 716 | return { link: null, linkContainer: null }; 717 | } 718 | 719 | // Style 720 | insertStylesheet(); 721 | 722 | // Container 723 | const linkGroup = getGroupContainer(parent); 724 | const linkContainer = document.createElement("div"); 725 | linkContainer.className = "x-gallery-links-right-sidebar-entry"; 726 | if (typeof(order) === "number" && !Number.isNaN(order)) { 727 | linkContainer.style.order = `${order}`; 728 | } 729 | 730 | const img = document.createElement("img"); 731 | img.src = apiStyle.getArrowIconUrl(); 732 | linkContainer.appendChild(img); 733 | 734 | linkContainer.appendChild(document.createTextNode(" ")); 735 | 736 | const link = document.createElement("a"); 737 | link.textContent = label; 738 | linkContainer.appendChild(link); 739 | 740 | linkGroup.appendChild(linkContainer); 741 | 742 | return { link, linkContainer }; 743 | } 744 | 745 | 746 | module.exports = { 747 | createLink 748 | }; 749 | 750 | },{"../style":13,"./style":8,"./style/gallery-right-sidebar.css":9}],7:[function(require,module,exports){ 751 | "use strict"; 752 | 753 | const overrideAttributeName = "data-x-override-page-type"; 754 | 755 | 756 | function setOverride(value) { 757 | if (value) { 758 | document.documentElement.setAttribute(overrideAttributeName, value); 759 | } else { 760 | document.documentElement.removeAttribute(overrideAttributeName); 761 | } 762 | } 763 | 764 | function getOverride() { 765 | const value = document.documentElement.getAttribute(overrideAttributeName); 766 | return value ? 
value : null; 767 | } 768 | 769 | function get(doc, location) { 770 | const overrideType = getOverride(); 771 | if (overrideType !== null) { 772 | return overrideType; 773 | } 774 | 775 | if (doc.querySelector("#searchbox") !== null) { 776 | return "search"; 777 | } 778 | if (doc.querySelector("input[name=favcat]") !== null) { 779 | return "favorites"; 780 | } 781 | if (doc.querySelector("#i1>h1") !== null) { 782 | return "image"; 783 | } 784 | if (doc.querySelector(".gm h1#gn") !== null) { 785 | return "gallery"; 786 | } 787 | if (doc.querySelector("#profile_outer") !== null) { 788 | return "settings"; 789 | } 790 | if (doc.querySelector("#torrentinfo") !== null) { 791 | return "torrentInfo"; 792 | } 793 | 794 | let n = doc.querySelector("body>.d>p"); 795 | if ( 796 | (n !== null && /gallery\s+has\s+been\s+removed/i.test(n.textContent)) || 797 | doc.querySelector(".eze_dgallery_table") !== null) { // eze resurrection 798 | return "deletedGallery"; 799 | } 800 | 801 | n = doc.querySelector("img[src]"); 802 | if (n !== null && location !== null) { 803 | const p = location.pathname; 804 | if ( 805 | n.getAttribute("src") === location.href && 806 | p.substr(0, 3) !== "/t/" && 807 | p.substr(0, 5) !== "/img/") { 808 | return "panda"; 809 | } 810 | } 811 | 812 | // Unknown 813 | return null; 814 | } 815 | 816 | 817 | module.exports = { 818 | get, 819 | getOverride, 820 | setOverride 821 | }; 822 | 823 | },{}],8:[function(require,module,exports){ 824 | "use strict"; 825 | 826 | function isDark() { 827 | return ( 828 | window.location.hostname.indexOf("exhentai") >= 0 || 829 | document.documentElement.classList.contains("x-force-dark")); 830 | } 831 | 832 | function setDocumentDarkFlag() { 833 | document.documentElement.classList.toggle("x-is-dark", isDark()); 834 | } 835 | 836 | function getArrowIconUrl() { 837 | return (isDark() ? 
"https://exhentai.org/img/mr.gif" : "https://ehgt.org/g/mr.gif"); 838 | } 839 | 840 | 841 | module.exports = { 842 | isDark, 843 | setDocumentDarkFlag, 844 | getArrowIconUrl 845 | }; 846 | 847 | },{}],9:[function(require,module,exports){ 848 | module.exports = ".x-gallery-links-right-sidebar-container{margin-top:-25px;padding-bottom:0;display:flex;flex-direction:column}.x-gallery-links-right-sidebar-entry{margin-top:25px}div#gright.x-gallery-links-right-sidebar-contains-container{overflow-x:hidden;overflow-y:auto}"; 849 | },{}],10:[function(require,module,exports){ 850 | "use strict"; 851 | 852 | const ready = require("../ready"); 853 | const pageType = require("../api/page-type"); 854 | const windowMessage = require("../window-message"); 855 | const getFromHtml = require("../api/gallery-info/get-from-html"); 856 | const queryString = require("../query-string"); 857 | const GalleryIdentifier = require("../api/gallery-identifier").GalleryIdentifier; 858 | const toCommonJson = require("../api/gallery-info/common-json").toCommonJson; 859 | 860 | let downloadDataUrl = null; 861 | 862 | 863 | function setupGalleryPage() { 864 | createGalleryPageDownloadLink(); 865 | 866 | windowMessage.registerCommand("galleryInfoRequest", (e) => { 867 | const data = getFromHtml(document, window.location.href); 868 | if (data === null) { return; } 869 | windowMessage.post(e.source, "galleryInfoResponse", toCommonJson(data)); 870 | }); 871 | } 872 | 873 | function createGalleryPageDownloadLink() { 874 | const galleryRightSidebar = require("../api/gallery-right-sidebar"); 875 | const link = galleryRightSidebar.createLink("Metadata JSON", 0).link; 876 | if (link === null) { return; } 877 | 878 | link.setAttribute("download", "info.json"); 879 | link.href = "#"; 880 | 881 | link.addEventListener("click", onDownloadLinkClicked, false); 882 | link.addEventListener("auxclick", onDownloadLinkClicked, false); 883 | } 884 | 885 | function getGalleryInfo() { 886 | try { 887 | return 
getFromHtml(document, window.location.href); 888 | } catch (e) { 889 | console.error(e); 890 | return null; 891 | } 892 | } 893 | 894 | function createDownloadDataUrl(info) { 895 | const infoString = JSON.stringify(info, null, " "); 896 | const blob = new Blob([ infoString ], { type: "application/json" }); 897 | return URL.createObjectURL(blob); 898 | } 899 | 900 | function onDownloadLinkClicked(e) { 901 | /* jshint -W040 */ 902 | if (downloadDataUrl === null) { 903 | const info = getGalleryInfo(); 904 | if (info === null) { 905 | console.error("Failed to create download data"); 906 | e.preventDefault(); 907 | e.stopPropagation(); 908 | return false; 909 | } 910 | 911 | downloadDataUrl = createDownloadDataUrl(toCommonJson(info)); 912 | this.setAttribute("href", downloadDataUrl); 913 | } 914 | /* jshint +W040 */ 915 | } 916 | 917 | 918 | function setupTorrentPage() { 919 | if (!window.opener) { return; } 920 | 921 | const identifier = getGalleryIdentifierFromTorrentPageUrl(window.location.href); 922 | if (identifier === null) { return; } 923 | 924 | windowMessage.registerCommand("galleryInfoResponse", (e, info) => { 925 | if (downloadDataUrl !== null || !isValidInfo(info, identifier)) { return; } 926 | downloadDataUrl = createDownloadDataUrl(info); 927 | createTorrentPageDownloadLinks(downloadDataUrl); 928 | }); 929 | windowMessage.post(window.opener, "galleryInfoRequest"); 930 | } 931 | 932 | function getGalleryIdentifierFromTorrentPageUrl(url) { 933 | const params = queryString.getUrlParameters(url); 934 | if (!params.hasOwnProperty("gid") || !params.hasOwnProperty("t")) { return null; } 935 | 936 | const id = parseInt(params.gid, 10); 937 | if (Number.isNaN(id)) { return null; } 938 | 939 | return new GalleryIdentifier(id, params.t); 940 | } 941 | 942 | function isValidInfo(info, identifier) { 943 | const g = info.gallery; 944 | return ( 945 | g !== null && typeof(g) === "object" && 946 | g.gid === identifier.id && 947 | g.token === identifier.token); 948 | } 949 
| 950 | function createTorrentPageDownloadLinks(url) { 951 | const tables = document.querySelectorAll("#torrentinfo form table>tbody"); 952 | for (const table of tables) { 953 | const torrentLink = table.querySelector("tr:nth-of-type(3)>td"); 954 | if (torrentLink === null) { continue; } 955 | 956 | const text = torrentLink.textContent; 957 | const whitespace = /^\s*/.exec(text)[0]; 958 | const torrentFileName = text.trim().replace(/\.[^\.]*$/, ""); 959 | 960 | const row = document.createElement("tr"); 961 | 962 | const cell = document.createElement("td"); 963 | cell.setAttribute("colspan", "5"); 964 | 965 | if (whitespace.length > 0) { 966 | cell.appendChild(document.createTextNode(whitespace)); 967 | } 968 | 969 | const link = document.createElement("a"); 970 | link.setAttribute("download", `${torrentFileName}.info.json`); 971 | link.href = url; 972 | link.textContent = "Metadata JSON"; 973 | cell.appendChild(link); 974 | 975 | row.appendChild(cell); 976 | table.appendChild(row); 977 | } 978 | } 979 | 980 | 981 | function main() { 982 | const currentPageType = pageType.get(document, location); 983 | 984 | switch (currentPageType) { 985 | case "gallery": 986 | setupGalleryPage(); 987 | break; 988 | case "torrentInfo": 989 | setupTorrentPage(); 990 | break; 991 | } 992 | } 993 | 994 | 995 | ready.onReady(main); 996 | 997 | },{"../api/gallery-identifier":1,"../api/gallery-info/common-json":2,"../api/gallery-info/get-from-html":3,"../api/gallery-right-sidebar":6,"../api/page-type":7,"../query-string":11,"../ready":12,"../window-message":14}],11:[function(require,module,exports){ 998 | "use strict"; 999 | 1000 | function getUrlParameters(url) { 1001 | const result = {}; 1002 | const match = /^([^#\?]*)(\?[^#]*)?(#[\w\W]*)?$/.exec(url); 1003 | if (match !== null && match[2] && match[2].length > 1) { 1004 | const pattern = /([^=]*)(?:=([\w\W]*))?/; 1005 | for (const part of match[2].substr(1).split("&")) { 1006 | if (part.length === 0) { continue; } 1007 | const match2 
= pattern.exec(part); 1008 | const value = match2[2]; 1009 | result[decodeURIComponent(match2[1])] = (value !== undefined ? decodeURIComponent(value) : null); 1010 | } 1011 | } 1012 | return result; 1013 | } 1014 | 1015 | function removeQueryParameter(url, parameterName) { 1016 | return url.replace( 1017 | new RegExp(`([&\\?])${parameterName}(?:(?:=[^&]*)?(&|$))`), 1018 | (m0, m1, m2) => (m1 === "?" && m2 ? "?" : m2)); 1019 | } 1020 | 1021 | 1022 | module.exports = { 1023 | getUrlParameters, 1024 | removeQueryParameter 1025 | }; 1026 | 1027 | },{}],12:[function(require,module,exports){ 1028 | "use strict"; 1029 | 1030 | let isReadyValue = false; 1031 | let callbacks = null; 1032 | let checkIntervalId = null; 1033 | const checkIntervalRate = 250; 1034 | 1035 | 1036 | function isHooked() { 1037 | return callbacks !== null; 1038 | } 1039 | 1040 | function hook() { 1041 | callbacks = []; 1042 | window.addEventListener("load", checkIfReady, false); 1043 | window.addEventListener("DOMContentLoaded", checkIfReady, false); 1044 | document.addEventListener("readystatechange", checkIfReady, false); 1045 | checkIntervalId = setInterval(checkIfReady, checkIntervalRate); 1046 | } 1047 | 1048 | function unhook() { 1049 | const cbs = callbacks; 1050 | 1051 | callbacks = null; 1052 | window.removeEventListener("load", checkIfReady, false); 1053 | window.removeEventListener("DOMContentLoaded", checkIfReady, false); 1054 | document.removeEventListener("readystatechange", checkIfReady, false); 1055 | clearInterval(checkIntervalId); 1056 | checkIntervalId = null; 1057 | 1058 | invoke(cbs); 1059 | } 1060 | 1061 | function invoke(callbacks) { 1062 | for (let cb of callbacks) { 1063 | try { 1064 | cb(); 1065 | } 1066 | catch (e) { 1067 | console.error(e); 1068 | } 1069 | } 1070 | } 1071 | 1072 | function isReady() { 1073 | if (isReadyValue) { return true; } 1074 | 1075 | if (document.readyState === "interactive" || document.readyState === "complete") { 1076 | if (isHooked()) { unhook(); 
} 1077 | isReadyValue = true; 1078 | return true; 1079 | } 1080 | return false; 1081 | } 1082 | 1083 | function checkIfReady() { 1084 | isReady(); 1085 | } 1086 | 1087 | 1088 | function onReady(callback) { 1089 | if (isReady()) { 1090 | callback(); 1091 | return; 1092 | } 1093 | 1094 | if (!isHooked()) { hook(); } 1095 | 1096 | callbacks.push(callback); 1097 | } 1098 | 1099 | 1100 | module.exports = { 1101 | onReady: onReady, 1102 | get isReady() { return isReady(); } 1103 | }; 1104 | 1105 | },{}],13:[function(require,module,exports){ 1106 | "use strict"; 1107 | 1108 | let apiStyle = null; 1109 | 1110 | 1111 | function getId(id) { 1112 | return `${id}-stylesheet`; 1113 | } 1114 | 1115 | function getStylesheet(id) { 1116 | return document.getElementById(getId(id)); 1117 | } 1118 | 1119 | function hasStylesheet(id) { 1120 | return !!getStylesheet(id); 1121 | } 1122 | 1123 | function addStylesheet(source, id) { 1124 | if (apiStyle === null) { apiStyle = require("./api/style"); } 1125 | apiStyle.setDocumentDarkFlag(); 1126 | 1127 | const style = document.createElement("style"); 1128 | style.textContent = source; 1129 | if (typeof(id) === "string") { 1130 | style.id = getId(id); 1131 | } 1132 | document.head.appendChild(style); 1133 | return style; 1134 | } 1135 | 1136 | 1137 | module.exports = { 1138 | hasStylesheet, 1139 | getStylesheet, 1140 | addStylesheet 1141 | }; 1142 | 1143 | },{"./api/style":8}],14:[function(require,module,exports){ 1144 | "use strict"; 1145 | 1146 | let commands = null; 1147 | 1148 | 1149 | function registerCommand(commandName, callback) { 1150 | if (commands === null) { 1151 | commands = {}; 1152 | window.addEventListener("message", onWindowMessage, false); 1153 | } 1154 | 1155 | commands[commandName] = callback; 1156 | } 1157 | 1158 | function post(targetWindow, commandName, data) { 1159 | targetWindow.postMessage({ 1160 | xData: { command: commandName, data: data } 1161 | }, window.location.origin); 1162 | } 1163 | 1164 | function 
onWindowMessage(e) { 1165 | if (e.origin !== window.origin) { return; } 1166 | 1167 | let data = e.data; 1168 | if (data === null || typeof(data) !== "object") { return; } 1169 | 1170 | data = data.xData; 1171 | if (data === null || typeof(data) !== "object") { return; } 1172 | if (typeof(data.command) !== "string") { return; } 1173 | 1174 | const callback = commands[data.command]; 1175 | if (typeof(callback) !== "function") { return; } 1176 | 1177 | callback(e, data.data); 1178 | } 1179 | 1180 | 1181 | module.exports = { 1182 | registerCommand, 1183 | post 1184 | }; 1185 | 1186 | },{}]},{},[10]) 1187 | //# sourceMappingURL=data:application/json;charset=utf-8;base64, 1188 | -------------------------------------------------------------------------------- /writeInfo.py: -------------------------------------------------------------------------------- 1 | from compress import compress 2 | from pathlib import Path 3 | import json 4 | import math 5 | import pprint 6 | import sys 7 | import zipfile 8 | 9 | cwd = Path.cwd() 10 | work = cwd / 'work' 11 | out = cwd / 'out' 12 | 13 | pp = pprint.PrettyPrinter(indent=2) 14 | 15 | def writeInfo(fileStem, info, verbose): 16 | if verbose: 17 | pp.pprint(info) 18 | try: 19 | xmlData = f''' 20 | 21 | <![CDATA[{info['Title']}]]> 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | ''' # 39 | xmlDataPath = work / fileStem / 'ComicInfo.xml' 40 | xmlDataPath.write_text(xmlData, encoding='UTF-8') 41 | 42 | cr = compress(fileStem, verbose) 43 | if(not cr[0]): 44 | return [False, cr[1]] 45 | 46 | jsonData = json.loads('{"ComicBookInfo/1.0": {}}') 47 | 48 | jsonData['ComicBookInfo/1.0']['comments'] = info['Comments'] 49 | jsonData['ComicBookInfo/1.0']['credits'] = list(map(lambda x: {'person': x, 'role': 'Writer'}, info['writer'])) 50 | jsonData['ComicBookInfo/1.0']['genre'] = info['Genre'] 51 | jsonData['ComicBookInfo/1.0']['issue'] = info['issue'] 52 | jsonData['ComicBookInfo/1.0']['language'] = 
info['LanguageISO'] 53 | jsonData['ComicBookInfo/1.0']['publicationMonth'] = info['Month'] 54 | jsonData['ComicBookInfo/1.0']['publicationYear'] = info['Year'] 55 | jsonData['ComicBookInfo/1.0']['publisher'] = info['Publisher'] 56 | jsonData['ComicBookInfo/1.0']['rating'] = math.floor(info['Rating']*2) or 1 57 | jsonData['ComicBookInfo/1.0']['series'] = info['series'] 58 | jsonData['ComicBookInfo/1.0']['tags'] = info['tags'] 59 | jsonData['ComicBookInfo/1.0']['title'] = info['Title'] 60 | 61 | zipNote = json.dumps(jsonData, ensure_ascii=False, sort_keys=True).encode('utf-8') 62 | print(f'zip note size: {len(zipNote)} bytes/65535 bytes') 63 | f = out / f'{fileStem}.zip' 64 | fzip = zipfile.ZipFile(f, 'a', compression=zipfile.ZIP_DEFLATED, compresslevel=6) 65 | fzip.comment = zipNote 66 | fzip.close() 67 | newName = out / f'{fileStem}.cbz' 68 | f.rename(newName) 69 | except: 70 | return [False, sys.exc_info()] 71 | return [True] --------------------------------------------------------------------------------
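Note on the ZIP comment step in writeInfo.py: the function prints the note size against 65535 bytes but does not enforce that limit. The sketch below shows one way the same step could be guarded, assuming the archive already exists on disk. The helper names `build_zip_note` and `set_zip_comment` are hypothetical and are not part of this repository; only `json.dumps` and `zipfile.ZipFile` from the standard library are used.

```python
import json
import zipfile
from pathlib import Path

# The ZIP end-of-central-directory record stores the comment length in a
# 16-bit field, so a longer note cannot be stored intact.
MAX_ZIP_COMMENT = 65535


def build_zip_note(info: dict) -> bytes:
    """Serialize a ComicBookInfo/1.0 payload with the same json.dumps options as writeInfo (hypothetical helper)."""
    payload = {"ComicBookInfo/1.0": info}
    return json.dumps(payload, ensure_ascii=False, sort_keys=True).encode("utf-8")


def set_zip_comment(archive: Path, note: bytes) -> None:
    """Attach note as the archive comment, refusing notes that exceed the limit (hypothetical helper)."""
    if len(note) > MAX_ZIP_COMMENT:
        raise ValueError(f"zip note is {len(note)} bytes, limit is {MAX_ZIP_COMMENT}")
    # The context manager closes the archive even if the assignment fails.
    with zipfile.ZipFile(archive, "a") as fzip:
        fzip.comment = note
```

Under these assumptions, writeInfo could call `set_zip_comment(out / f'{fileStem}.zip', zipNote)` before renaming the file to `.cbz`; whether an oversized note should raise or be trimmed is a design choice left open here.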