├── .gitignore
├── LICENSE
├── README.md
├── checker.py
├── checker_custom_path.py
├── compress.py
├── cookie.tmpl.py
├── genInfo.py
├── main.py
├── requirements.txt
├── start.cmd
├── utils
│   ├── README.md
│   ├── db.text.json
│   └── x-gallery-metadata.user.js
└── writeInfo.py
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # test module: personal habit, keep the test directory out of the repo
10 | /test/
11 |
12 | # Distribution / packaging
13 | .Python
14 | build/
15 | develop-eggs/
16 | dist/
17 | downloads/
18 | eggs/
19 | .eggs/
20 | lib/
21 | lib64/
22 | parts/
23 | sdist/
24 | var/
25 | wheels/
26 | pip-wheel-metadata/
27 | share/python-wheels/
28 | *.egg-info/
29 | .installed.cfg
30 | *.egg
31 | MANIFEST
32 |
33 | # PyInstaller
34 | # Usually these files are written by a python script from a template
35 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
36 | *.manifest
37 | *.spec
38 | .idea/
39 |
40 | # Installer logs
41 | pip-log.txt
42 | pip-delete-this-directory.txt
43 |
44 | # Unit test / coverage reports
45 | htmlcov/
46 | .tox/
47 | .nox/
48 | .coverage
49 | .coverage.*
50 | .cache
51 | nosetests.xml
52 | coverage.xml
53 | *.cover
54 | .hypothesis/
55 | .pytest_cache/
56 |
57 | # Translations
58 | *.mo
59 | *.pot
60 |
61 | # Django stuff:
62 | *.log
63 | local_settings.py
64 | db.sqlite3
65 |
66 | # Flask stuff:
67 | instance/
68 | .webassets-cache
69 |
70 | # Scrapy stuff:
71 | .scrapy
72 |
73 | # Sphinx documentation
74 | docs/_build/
75 |
76 | # PyBuilder
77 | target/
78 |
79 | # Jupyter Notebook
80 | .ipynb_checkpoints
81 |
82 | # IPython
83 | profile_default/
84 | ipython_config.py
85 |
86 | # pyenv
87 | .python-version
88 |
89 | # celery beat schedule file
90 | celerybeat-schedule
91 |
92 | # SageMath parsed files
93 | *.sage.py
94 |
95 | # Environments
96 | .env
97 | .venv
98 | env/
99 | venv/
100 | ENV/
101 | env.bak/
102 | venv.bak/
103 |
104 | # Spyder project settings
105 | .spyderproject
106 | .spyproject
107 |
108 | # Rope project settings
109 | .ropeproject
110 |
111 | # mkdocs documentation
112 | /site
113 |
114 | # mypy
115 | .mypy_cache/
116 | .dmypy.json
117 | dmypy.json
118 |
119 | # Pyre type checker
120 | .pyre/
121 |
122 | out/
123 | work/
124 | cookie.py
125 |
126 | # outputs
127 | *.json
128 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2020 S4kura0ne
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # hentaiTagger4calibre
2 | A tag converter for calibre
3 |
4 | ---
5 |
6 |
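7 | ### Notice
8 |
9 | To improve the reading experience, I have moved my new collection to [LANraragi](https://github.com/Difegue/LANraragi), so this repo will be maintained less frequently.
10 |
11 | Advantages of LANraragi:
12 |
13 | - For online reading, Calibre-web needs to **load the entire cbz file**, whereas LANraragi **extracts the archive on the server and sends individual images to the browser**.
14 | - Supports downloading galleries **by entering an e-hentai URL directly**, and **automatically fetches tags and titles from e-hentai and nhentai**.
15 | - Better **tag management**, especially when galleries **have many tags**; too many tags in calibre-web render the tag system useless.
16 | - **Does not change the file hash**, so downloaded files can be traced directly back to the e-hentai server, or re-uploaded as torrents.
17 | - No need to learn Python or hack this script.
18 |
19 | Advantages of this script:
20 |
21 | - Supports embedding **eze's info.json**, so unlike LANraragi, which stores metadata separately in its own database, **all** metadata **stays with the archive**.
22 | - Supports checking galleries for updates; some galleries are updated weekly, and this script makes it easy to pick them out.
23 | - **Imports metadata precisely**: when several translation groups translate the same work at the same time, LANraragi's built-in scraper may fetch the wrong group's metadata.
24 | - **Compatible** with every reading setup, including calibre and LANraragi.
25 | - I wrote it myself, so I can modify it directly when new needs come up.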
26 |
27 |
28 | ### Notice
29 |
30 | I have migrated to [LANraragi](https://github.com/Difegue/LANraragi) because of its better web reading experience, so this repo is now maintained less frequently.
31 |
32 | Advantages of LANraragi:
33 |
34 | - Calibre-web has to **load the whole cbz file** for online reading, while LANraragi **decompresses cbz files server-side** and sends single images to the browser.
35 | - Supports downloading directly **by entering the e-hentai URL**, and **automatically scrapes metadata from e-hentai and nhentai**.
36 | - Better **tag management**, especially for comics with **lots of tags**; in calibre-web, too many tags make the whole tag system unusable.
37 | - **Does not modify the archive's hash**, so the archive can be traced back to the e-hentai server by that hash, or re-uploaded as a BitTorrent file.
38 | - No need to learn Python to run or modify this script.
39 |
40 | Advantages of this script:
41 |
42 | - Supports embedding **eze's info.json**, so **all** metadata stays **with the cbz file**, unlike LANraragi, which keeps metadata in its own separate database.
43 | - Supports checking for updates. Some galleries get new images uploaded over time (some weekly), and this script helps find these outdated archives.
44 | - **More precise metadata import**: LANraragi's built-in scraper may download metadata from the wrong gallery, for example when multiple translation groups are translating the same work.
45 | - **Compatible** with all reading setups, including calibre and LANraragi.
46 | - Written by myself, so it is easy to modify when new requirements come up.
47 |
48 | ### Introduction
49 |
50 | This simple Python 3 app converts the metadata in zip archives downloaded from e-hentai or exhentai into a format that calibre can recognize.
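calibre's Embed Comic Metadata plugin can read metadata from a `ComicInfo.xml` entry inside the archive, which is also the file `checker_custom_path.py` reads the `Web` URL from. `writeInfo.py` is not included in this listing, so the following is only a minimal sketch of that conversion step, assuming ComicRack-style element names that mirror the keys `genInfo()` builds (`Title`, `Web`, `LanguageISO`, `AgeRating`, `Manga`, ...). The helper `build_comic_info` and its field mapping are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping (assumption): ComicInfo.xml element -> key in the dict returned by genInfo()
FIELDS = [
    ("Title", "Title"),
    ("Series", "series"),
    ("Number", "issue"),
    ("Writer", "Writer"),
    ("Genre", "Genre"),
    ("Web", "Web"),
    ("LanguageISO", "LanguageISO"),
    ("Manga", "Manga"),
    ("AgeRating", "AgeRating"),
]


def build_comic_info(info: dict) -> bytes:
    """Serialize selected fields of an info dict into ComicInfo.xml bytes."""
    root = ET.Element("ComicInfo")
    for element, key in FIELDS:
        value = info.get(key)
        if value is None or value == "":
            continue  # skip empty fields rather than writing empty elements
        if isinstance(value, float) and value.is_integer():
            value = int(value)  # issue 1.0 is written as "1"
        ET.SubElement(root, element).text = str(value)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

The resulting bytes could then be stored in an archive with `zipfile.ZipFile(path, "a").writestr("ComicInfo.xml", data)`.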
51 |
52 | ### Requirements
53 |
54 | - A Windows machine (Linux is not tested).
55 | - The calibre plugin [Embed Comic Metadata](https://github.com/dickloraine/EmbedComicMetadata) must be installed.
56 | - p7zip or 7-Zip available in `PATH`.
57 |
58 | ```bash
59 | sudo apt install python3 python3-pip p7zip-full  # on Windows: choco install python 7zip
60 | pip install -r requirements.txt
61 | ```
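Before running anything, it can help to confirm these prerequisites. A minimal sketch (the helper name `missing_requirements` is hypothetical) that checks for the `7z` executable the repo's `compress.py` shells out to, and for the two packages listed in `requirements.txt`:

```python
import importlib.util
import shutil


def missing_requirements() -> list:
    """Return the prerequisites from this README that cannot be found."""
    missing = []
    # compress.py runs an executable literally named `7z`
    if shutil.which("7z") is None:
        missing.append("7z")
    # third-party modules listed in requirements.txt
    for module in ("pycountry", "requests"):
        if importlib.util.find_spec(module) is None:
            missing.append(module)
    return missing


if __name__ == "__main__":
    print(missing_requirements() or "all prerequisites found")
```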
62 |
63 | ### Usage
64 |
65 | - Download a zip archive of a gallery from e-hentai or exhentai.
66 | - Use [this userscript](https://raw.githubusercontent.com/dnsev-h/x/master/builds/x-gallery-metadata.user.js) (from https://dnsev-h.github.io/x/; a copy is kept in `utils/`) to download the gallery metadata as `info.json`, and add it to the zip file.
67 | - Delete all intermediate and final outputs from previous runs, such as `inf.json`, `ser.json` and `out/`.
68 | - Extract the zip file and move the extracted folder into the `work/` subfolder.
69 |
70 | The project folder should look like this:
71 |
72 | ```
73 | │  main.py
74 | │  genInfo.py
75 | │  writeInfo.py
76 | │
77 | └─work
78 |     ├─comic1
79 |     │      1.png
80 |     │      2.png
81 |     │      info.json
82 |     │
83 |     └─comic2
84 |             1.png
85 |             2.png
86 |             info.json
87 | ```
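`main.py` processes every sub-directory of `work/`, and `genInfo()` reports a failure for any folder whose `info.json` cannot be read. A small pre-flight check like this sketch (the helper name `folders_missing_info` is hypothetical) lists such folders before a run:

```python
from pathlib import Path
from typing import List


def folders_missing_info(work: Path) -> List[str]:
    """Return the names of gallery folders under work/ that lack an info.json."""
    return sorted(
        d.name
        for d in work.iterdir()
        if d.is_dir() and not (d / "info.json").is_file()
    )
```

From the project folder, `folders_missing_info(Path("work"))` would return `["comic2"]` if only `comic2` were missing its `info.json`.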
88 |
89 | - Make sure all the requirements are satisfied.
90 | - Run `python main.py` (on Windows, `start.cmd` does the same).
91 |
92 | The final cbz files should appear in the `out/` subfolder.
93 |
94 |
95 |
96 | ### Checker
97 |
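98 | Copy `cookie.tmpl.py` to `cookie.py` and fill in your e-hentai cookies, then set your cbz library path (`cbzPath`) in `checker_custom_path.py` and run it. It reads the `Web` URL from the `ComicInfo.xml` inside each cbz file and checks whether that gallery is still visible, which tells you whether your books are up to date. Changed URLs are saved to `chg.json` and failures to `err.json`.
99 |
100 | ~~`checker.py` checks whether the galleries recorded in `inf.json` are still visible. It is useful for tracking ongoing comics, since their galleries get replaced and become invisible.~~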
101 |
--------------------------------------------------------------------------------
/checker.py:
--------------------------------------------------------------------------------
1 | import requests
2 | from pathlib import Path
3 | import json
4 | import time
5 | from cookie import cookie
6 |
7 | cwd = Path.cwd()
8 | infPath = cwd / 'inf.json'
9 | chgPath = cwd / 'chg.json'
10 | errPath = cwd / 'err.json'
11 |
12 | infStore = json.loads(infPath.read_text(encoding='UTF-8'))
13 |
14 | proxies = {
15 |     'http': 'http://127.0.0.1:7890',
16 |     'https': 'http://127.0.0.1:7890',
17 | }
18 |
19 | def getPage(url, s):
20 |     r = s.get(url)  # pass proxies=proxies here to route requests through the local proxy
21 |     # print(r.text)
22 |     return r.text
23 |
24 | def stillThere(url, s):
25 |     return getPage(url, s).find('<td class="gdt1">Visible:</td><td class="gdt2">Yes</td>') != -1
26 |
27 | changed = []
28 | error = []
29 |
30 | counter = 0
31 |
32 | s = requests.Session()
33 | requests.utils.add_dict_to_cookiejar(s.cookies, cookie)
34 | s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'
35 |
36 | for i in infStore:
37 |     try:
38 |         counter += 1
39 |         if counter < 0:  # to resume after an error, change 0 to the entry number to restart from
40 |             continue
41 |         url = infStore[i]["Web"]
42 |         print(f'{counter}/{len(infStore)}')
43 |         if not stillThere(url, s):
44 |             print(url)
45 |             # print(stillThere(url))
46 |             changed.append(url)
47 |         # time.sleep(1)
48 |     except Exception as e:
49 |         print(f'err: {e!r}', url)
50 |         error.append(url)
51 |
52 | print(changed)
53 | print(error)
54 | chgPath.write_text(json.dumps(changed, ensure_ascii=False, indent=2), encoding='UTF-8')
55 | errPath.write_text(json.dumps(error, ensure_ascii=False, indent=2), encoding='UTF-8')
56 |
--------------------------------------------------------------------------------
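`stillThere()` does a plain substring search for the gallery's "Visible: Yes" row, so an error page or a login wall is counted as "changed" too. A slightly more tolerant sketch, assuming the row is rendered as `<td class="gdt1">Visible:</td><td class="gdt2">...</td>` (the userscript in `utils/` reads the same details table through its `.gdt2` cells), parses the value with a regex and reports `No (Replaced)` style reasons; the function name `visibility` is hypothetical:

```python
import re
from typing import Optional, Tuple

# Assumed markup of the "Visible:" row in the gallery details table
VISIBLE_ROW = re.compile(
    r'<td class="gdt1">\s*Visible:\s*</td>\s*<td class="gdt2">\s*(.*?)\s*</td>',
    re.IGNORECASE | re.DOTALL,
)


def visibility(html: str) -> Tuple[Optional[bool], Optional[str]]:
    """Return (visible, reason); (None, None) when the row is missing, e.g. on an error page."""
    match = VISIBLE_ROW.search(html)
    if match is None:
        return None, None
    value = match.group(1)
    if value.lower() == "yes":
        return True, None
    # hidden galleries show e.g. "No (Replaced)"
    reason = re.search(r"no\s+\((.+?)\)", value, re.IGNORECASE)
    return False, (reason.group(1).strip() if reason else value)
```

Returning `None` separately lets a caller put unreadable pages into `err.json` instead of `chg.json`.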
/checker_custom_path.py:
--------------------------------------------------------------------------------
1 | import requests
2 | from pathlib import Path
3 | import json
4 | import time
5 | import zipfile
6 | import pprint
7 | from xml.dom.minidom import parseString
8 | from cookie import cookie
9 |
10 | cbzPath = Path('\\\\server\\share')
11 | # cbzPath = Path('D:\\books')
12 |
13 | cwd = Path.cwd()
14 | chgPath = cwd / 'chg.json'
15 | errPath = cwd / 'err.json'
16 |
17 | cbzList = list(cbzPath.glob('**/*.cbz'))
18 |
19 | print(f'Found {len(cbzList)} files.')
20 |
21 | urls = []
22 | changed = []
23 | error = []
24 |
25 | for filePath in cbzList:
26 |     try:
27 |         with zipfile.ZipFile(filePath) as zipobj:
28 |             xmlobj = zipobj.read('ComicInfo.xml').decode()
29 |         dom = parseString(xmlobj)
30 |         urls.append(dom.getElementsByTagName('Web')[0].childNodes[0].data)
31 |     except Exception as e:
32 |         print(f'err: {e!r}', filePath)
33 |         error.append(str(filePath))  # str(): Path objects are not JSON serializable
34 |
35 | proxies = {
36 |     'http': 'http://127.0.0.1:7890',
37 |     'https': 'http://127.0.0.1:7890',
38 | }
39 |
40 | def getPage(url, s):
41 |     r = s.get(url)  # pass proxies=proxies here to route requests through the local proxy
42 |     # print(r.text)
43 |     return r.text
44 |
45 | def stillThere(url, s):
46 |     return getPage(url, s).find('<td class="gdt1">Visible:</td><td class="gdt2">Yes</td>') != -1
47 |
48 |
49 | counter = 0
50 |
51 | s = requests.Session()
52 | requests.utils.add_dict_to_cookiejar(s.cookies, cookie)
53 | s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'
54 |
55 | for url in urls:
56 |     try:
57 |         counter += 1
58 |         if counter < 0:  # to resume after an error, change 0 to the entry number to restart from
59 |             continue
60 |         print(f'{counter}/{len(urls)}')
61 |         if not stillThere(url, s):
62 |             print(url)
63 |             # print(stillThere(url))
64 |             changed.append(url)
65 |         # time.sleep(1)
66 |     except Exception as e:
67 |         print(f'err: {e!r}', url)
68 |         error.append(url)
69 |
70 | print('changed:')
71 | pprint.pprint(changed)
72 | print('error:')
73 | pprint.pprint(error)
74 | chgPath.write_text(json.dumps(changed, ensure_ascii=False, indent=2), encoding='UTF-8')
75 | errPath.write_text(json.dumps(error, ensure_ascii=False, indent=2), encoding='UTF-8')
76 |
--------------------------------------------------------------------------------
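The first loop above pulls the `Web` URL out of each archive's `ComicInfo.xml`. The same lookup as a reusable helper might look like the sketch below (the name `read_web_url` is hypothetical); it closes the archive and returns `None` instead of raising when the file or the element is missing:

```python
import zipfile
from typing import Optional
from xml.dom.minidom import parseString


def read_web_url(archive) -> Optional[str]:
    """Return the text of the first <Web> element in ComicInfo.xml, or None if absent."""
    with zipfile.ZipFile(archive) as zf:
        try:
            xml_text = zf.read("ComicInfo.xml").decode("utf-8")
        except KeyError:  # ZipFile.read raises KeyError for a missing member
            return None
    nodes = parseString(xml_text).getElementsByTagName("Web")
    if not nodes or nodes[0].firstChild is None:
        return None
    return nodes[0].firstChild.data.strip()
```

`archive` can be a path or any seekable file object, which makes the helper easy to test with in-memory zips.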
/compress.py:
--------------------------------------------------------------------------------
1 | import subprocess
2 | import multiprocessing
3 | import sys
4 |
5 | corecount = multiprocessing.cpu_count()
6 |
7 | def compress(dirName, verbose=False):
8 |     command = f'7z a -r -scsUTF-8 -sccUTF-8 -mx6 -mmt{corecount} "out/{dirName}.zip" "./work/{dirName}/*" && 7z t "out/{dirName}.zip"'
9 |     print(command)
10 |     try:
11 |         subprocess.run(command, shell=True, check=True)
12 |     except Exception:  # e.g. CalledProcessError when 7z exits with a non-zero code
13 |         return [False, sys.exc_info()]
14 |     return [True]
15 |
--------------------------------------------------------------------------------
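`compress()` builds one shell string and chains the archive and test steps with `&&`. A sketch of the same two 7-Zip invocations passed as argument lists instead, with no shell involved (the function names `build_7z_commands` and `run_7z` are hypothetical; the switches are copied from the command above):

```python
import os
import subprocess
from typing import List


def build_7z_commands(dir_name: str, cores: int) -> List[List[str]]:
    """Return the archive and test commands from compress.py as argument lists."""
    archive = os.path.join("out", f"{dir_name}.zip")
    source = os.path.join("work", dir_name, "*")  # 7z expands the wildcard itself
    return [
        ["7z", "a", "-r", "-scsUTF-8", "-sccUTF-8", "-mx6", f"-mmt{cores}", archive, source],
        ["7z", "t", archive],
    ]


def run_7z(dir_name: str, cores: int = os.cpu_count() or 1) -> bool:
    """Run both commands without a shell; stop at the first non-zero exit code."""
    for args in build_7z_commands(dir_name, cores):
        if subprocess.run(args).returncode != 0:
            return False
    return True
```

Passing a list avoids shell quoting problems: gallery folder names often contain characters such as `&`, `"` or `%` that `cmd.exe` or `sh` would interpret.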
/cookie.tmpl.py:
--------------------------------------------------------------------------------
1 | cookie = {
2 |     'ipb_member_id': '',
3 |     'ipb_pass_hash': '',
4 |     'igneous': '',
5 | }
--------------------------------------------------------------------------------
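Both checkers do `from cookie import cookie`, so this template has to be copied to `cookie.py` (which `.gitignore` keeps out of the repo) and filled in. A tiny hypothetical guard such as the one below could run before the checks start, to fail early when a field was left blank:

```python
from typing import Dict, List

# Keys taken from cookie.tmpl.py
REQUIRED_KEYS = ("ipb_member_id", "ipb_pass_hash", "igneous")


def missing_cookie_fields(cookie: Dict[str, str]) -> List[str]:
    """Return the template keys that are absent or still empty."""
    return [key for key in REQUIRED_KEYS if not cookie.get(key, "").strip()]
```

For example, `if missing_cookie_fields(cookie): raise SystemExit("fill in cookie.py first")` right after the import.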
/genInfo.py:
--------------------------------------------------------------------------------
1 | from pathlib import Path
2 | import json
3 | import pprint
4 | import pycountry
5 | import re
6 | import sys
7 | import urllib.parse
8 |
9 | pp = pprint.PrettyPrinter(indent=2)
10 |
11 | cwd = Path.cwd()
12 | utilsPath = cwd / 'utils'
13 | transPath = utilsPath / 'db.text.json'
14 | trans = json.loads(transPath.read_text(encoding='UTF-8'))
15 |
16 | def getCore(st):
17 |     t1 = re.sub(u"\\「.*?\\」|\\(.*?\\)|\\（.*?）|\\{.*?}|\\[.*?]|\\【.*?】", "", st).strip()
18 |     if t1 == '':
19 |         t1 = re.sub(u"\\(.*?\\)|\\（.*?）|\\{.*?}|\\[.*?]|\\【.*?】", "", st).strip()
20 |     if t1 == '':
21 |         t1 = re.sub(u"\\（.*?）|\\{.*?}|\\[.*?]|\\【.*?】", "", st).strip()
22 |     return t1
23 |
24 | def getSeries(st):
25 |     core = getCore(st)
26 |     iss = 1.0
27 |     ser = core
28 |
29 |     # ①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳
30 |     replaceList = [
31 |         ('①', 1),
32 |         ('②', 2),
33 |         ('③', 3),
34 |         ('④', 4),
35 |         ('⑤', 5),
36 |         ('⑥', 6),
37 |         ('⑦', 7),
38 |         ('⑧', 8),
39 |         ('⑨', 9),
40 |         ('⑩', 10),
41 |         ('⑪', 11),
42 |         ('⑫', 12),
43 |         ('⑬', 13),
44 |         ('⑭', 14),
45 |         ('⑮', 15),
46 |         ('⑯', 16),
47 |         ('⑰', 17),
48 |         ('⑱', 18),
49 |         ('⑲', 19),
50 |         ('⑳', 20),
51 |     ]
52 |     for char in replaceList:
53 |         core = core.replace(char[0], str(char[1]))
54 |
55 |     # 2020年10月号
56 |     # 2021月2号
57 |     if '月' in core[-4:] and '号' in core[-4:]:
58 |         # drop the last space-separated token (the "...月号" issue label)
59 |         ser = ser.rsplit(' ', 1)[0]
60 |         ser = ser.strip()
61 |         return [ser, iss]
62 |
63 |     # Artist Galleries :::
64 |     if core[:16].lower() == 'artist galleries':
65 |         ser = core[16:].strip().strip(':').strip()
66 |         return [ser, iss]
67 |
68 |     # 从本子了解汉化教程
69 |     if core[:9] == '从本子了解汉化教程':
70 |         ser = '从本子了解汉化教程'
71 |         if core[9:10].isdigit():
72 |             iss = float(core[9])
73 |         return [ser, iss]
74 |
75 |     # 美羽ちゃんとベランダXX
76 |     if core == '美羽ちゃんとベランダXX':
77 |         return [ser, iss]
78 |
79 |     # 1階
80 |     if core[-1:] == '階' and core[-2:-1].isdigit():
81 |         count = -2
82 |         while len(core) + count > 0 and core[count - 1].isdigit():
83 |             count = count - 1
84 |         ser = core[:count].strip()
85 |         iss = float(core[count:-1])
86 |         return [ser, iss]
87 |     # 1
88 |     # 01
89 |     # 1.5
90 |     if core[-1:].isdigit():
91 |         count = -1
92 |         while len(core) + count > 0 and (core[count - 1].isdigit() or (core[count - 1] == '.' and len(core) + count > 1 and core[count - 2].isdigit())):
93 |             count = count - 1
94 |         ser = core[:count].strip()
95 |         iss = float(core[count:])
96 |         # 01
97 |         # vol.1
98 |         # Vol,01
99 |         if ser[-1:] == '#' or ser[-1:] == '.' or ser[-1:] == ',':
100 |             ser = ser[:-1].strip()
101 |         # vol 1
102 |         if ser[-3:].lower() == 'vol':
103 |             ser = ser[:-3].strip()
104 |         # LEVEL:1
105 |         if ser[-6:].lower() == 'level:':
106 |             ser = ser[:-6].strip()
107 |         # 1+2
108 |         # 1-2
109 |         if ser[-1:] == '+' or ser[-1:] == '-':
110 |             iss = 1.0
111 |             ser = core
112 |
113 |     # roman numerals
114 |     # Ⅰ
115 |     # Ⅱ
116 |     # Ⅲ
117 |     # Ⅳ
118 |     # Ⅴ
119 |     # Ⅵ
120 |     # Ⅶ
121 |     # Ⅷ
122 |     # Ⅸ
123 |     # Ⅹ
124 |     # Ⅺ
125 |     # Ⅻ
126 |     # XIII
127 |     # XIV
128 |     # XV
129 |     rn = [
130 |         ['Ⅰ', 1.0],
131 |         ['Ⅱ', 2.0],
132 |         ['Ⅲ', 3.0],
133 |         ['Ⅳ', 4.0],
134 |         ['Ⅴ', 5.0],
135 |         ['Ⅵ', 6.0],
136 |         ['Ⅶ', 7.0],
137 |         ['Ⅷ', 8.0],
138 |         ['Ⅸ', 9.0],
139 |         ['Ⅹ', 10.0],
140 |         ['Ⅺ', 11.0],
141 |         ['Ⅻ', 12.0],
142 |         ['XIII', 13.0],
143 |         ['VIII', 8.0],
144 |         ['XIV', 14.0],
145 |         ['XII', 12.0],
146 |         ['VII', 7.0],
147 |         ['III', 3.0],
148 |         ['XV', 15.0],
149 |         ['XI', 11.0],
150 |         ['VI', 6.0],
151 |         ['IX', 9.0],
152 |         ['IV', 4.0],
153 |         ['II', 2.0],
154 |         ['X', 10.0],
155 |         ['V', 5.0],
156 |         ['I', 1.0],
157 |     ]
158 |     for r in rn:
159 |         l = len(r[0])
160 |         if core[-l:] == r[0]:
161 |             ser = core[:-l].strip()
162 |             iss = r[1]
163 |             return [ser, iss]
164 |
165 |     # 援助交配
166 |     if core[:4] == '援助交配':
167 |         ser = '援助交配'
168 |         return [ser, iss]
169 |
170 |     # ネコぱら01 おまけ本
171 |     if core == 'ネコぱら01 おまけ本':
172 |         ser = 'ネコぱら'
173 |         return [ser, iss]
174 |
175 |     # Arknights Character Fan Art Gallery
176 |     if core[:35].lower() == 'arknights character fan art gallery':
177 |         ser = 'Arknights Character Fan Art Gallery'
178 |         return [ser, iss]
179 |
180 |     return [ser, iss]
181 |
182 | def gett(index, st):
183 |     if st.lower() in trans['data'][index]['data']:
184 |         return trans['data'][index]['data'][st.lower()]['name']
185 |     else:
186 |         return None
187 |
188 | def trasgroup(d):
189 |     res = []
190 |     for i in d:
191 |         a = gett(i[0], i[1])
192 |         if a is not None:
193 |             res.append(a)
194 |     return res
195 |
196 | def genInfo(dir, verbose = False):
197 |     try:
198 |         infoPath = dir / 'info.json'
199 |         infoText = infoPath.read_text(encoding='UTF-8')
200 |         infoJson = json.loads(infoText)
201 |     except Exception:
202 |         return [False, sys.exc_info()]
203 |
204 |     info = {}
205 |     info['Title'] = infoJson['gallery_info']['title_original'] or infoJson['gallery_info']['title']
206 |     info['Genre'] = infoJson['gallery_info']['category']
207 |     info['Language'] = infoJson['gallery_info']['language']
208 |     info['UploadDate'] = infoJson['gallery_info']['upload_date']
209 |     info['Year'] = info['UploadDate'][0]
210 |     info['Month'] = info['UploadDate'][1]
211 |     info['Day'] = info['UploadDate'][2]
212 |     info['PageCount'] = infoJson['gallery_info_full']['image_count']
213 |     info['Rating'] = infoJson['gallery_info_full']['rating']['average']
214 |     info['Publisher'] = urllib.parse.unquote(infoJson['gallery_info_full']['uploader'])
215 |
216 |     if infoJson['gallery_info']['source']['site'] == 'exhentai':
217 |         info['Web'] = f'https://exhentai.org/g/{infoJson["gallery_info"]["source"]["gid"]}/{infoJson["gallery_info"]["source"]["token"]}/'
218 |     elif infoJson['gallery_info']['source']['site'] == 'e-hentai':
219 |         info['Web'] = f'https://e-hentai.org/g/{infoJson["gallery_info"]["source"]["gid"]}/{infoJson["gallery_info"]["source"]["token"]}/'
220 |     elif infoJson['gallery_info']['source']['site'] == 'acg18':
221 |         info['Web'] = f'https://acg18.moe/{infoJson["gallery_info"]["source"]["gid"]}.html'
222 |
223 |     info['Imprint'] = re.match(r'^(?:\()(.+?)(?:\))', infoJson['gallery_info']['title'])
224 |     if info['Imprint'] is not None:
225 |         info['Imprint'] = info['Imprint'].group(1)
226 |     else:
227 |         info['Imprint'] = infoJson['gallery_info_full']['source_site']
228 |
229 |     # begin tags
230 |     info['tags'] = []
231 |
232 |     info['tags'].append(info['Genre'])
233 |     transtags = [[1, info['Genre']]]
234 |
235 |     keywords = [
236 |         ['language', 2],
237 |         ['parody', 3],
238 |         ['character', 4],
239 |         ['male', 7],
240 |         ['female', 8],
241 |         ['misc', 9],
242 |     ]
243 |
244 |     for typ in keywords:
245 |         if typ[0] in infoJson['gallery_info']['tags']:
246 |             for tag in infoJson['gallery_info']['tags'][typ[0]]:
247 |                 info['tags'].append(tag)
248 |                 transtags.append([typ[1], tag])
249 |
250 |     rtagInTitle = re.findall(r'\[(.+?)\]|\((.+?)\)|【(.+?)】|（(.+?)）', infoJson['gallery_info']['title'])
251 |     tagInTitle = []
252 |     for x in rtagInTitle:
253 |         tagInTitle += list(x)
254 |
255 |     info['tags'] = trasgroup(transtags) + tagInTitle + info['tags']
256 |
257 |     info['tags'] = list(dict.fromkeys(info['tags']))
258 |
259 |     if '' in info['tags']:
260 |         info['tags'].remove('')
261 |
262 |     # end tags
263 |
264 |     # begin writer
265 |     info['writer'] = []
266 |     transwris = []
267 |     if 'group' in infoJson['gallery_info']['tags']:
268 |         for t in infoJson['gallery_info']['tags']['group']:
269 |             info['writer'].append(t)
270 |             transwris.append([5, t])
271 |     if 'artist' in infoJson['gallery_info']['tags']:
272 |         for t in infoJson['gallery_info']['tags']['artist']:
273 |             info['writer'].append(t)
274 |             transwris.append([6, t])
275 |     tg = trasgroup(transwris)
276 |     ltg = [x.lower() for x in tg]
277 |     awrite = []
278 |     for x in info['writer']:
279 |         if x.lower() not in ltg:
280 |             awrite.append(x)
281 |     info['writer'] = tg + awrite
282 |     info['writer'] = list(dict.fromkeys(info['writer']))
283 |     # end writer
284 |
285 |     # begin characters
286 |     info['characters'] = []
287 |     transchars = []
288 |     if 'character' in infoJson['gallery_info']['tags']:
289 |         for t in infoJson['gallery_info']['tags']['character']:
290 |             info['characters'].append(t)
291 |             transchars.append([4, t])
292 |     tg = trasgroup(transchars)
293 |     ltg = [x.lower() for x in tg]
294 |     achar = []
295 |     for x in info['characters']:
296 |         if x.lower() not in ltg:
297 |             achar.append(x)
298 |     info['characters'] = tg + achar
299 |     info['characters'] = list(dict.fromkeys(info['characters']))
300 |     # end characters
301 |
302 |     # begin series
303 |     info['coreTitle'] = getCore(info['Title'])
304 |     info['series'], info['issue'] = getSeries(info['coreTitle'])
305 |     # [Pixiv]
306 |     # [pixiv]
307 |     # [Pixiv Fanbox]
308 |     if info['Title'][1:6].lower() == 'pixiv':
309 |         info['series'], info['issue'] = ['Pixiv', 1.0]
310 |     # [Twitter]
311 |     if info['Title'][1:8].lower() == 'twitter':
312 |         info['series'], info['issue'] = ['Twitter', 1.0]
313 |     # Karorfulmix♥EX
314 |     if info['series'] == 'Karorfulmix♥EX':
315 |         info['series'] = 'KARORFUL MIX EX'
316 |
317 |     cau = ['-', '-', ':', ':', '~', ']', '[', '(', ')', '「', '」', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '+']
318 |     cauFlag = False
319 |     for c in cau:
320 |         if c in info['series']:
321 |             cauFlag = True
322 |     if cauFlag:
323 |         info['coreTitle'] = f"[CAUTION]{info['coreTitle']}"
324 |
325 |     # end series
326 |
327 |     if info['Genre'] == 'non-h':
328 |         info['AgeRating'] = 'Teen'
329 |     else:
330 |         info['AgeRating'] = 'Adults Only 18+'
331 |
332 |     if info['Genre'] in ['doujinshi', 'manga']:
333 |         info['Manga'] = 'Yes'
334 |     else:
335 |         info['Manga'] = 'No'
336 |
337 |     info['Writer'] = ', '.join(str(p) for p in info['writer'])
338 |     info['Characters'] = ', '.join(str(p) for p in info['characters'])
339 |     info['LanguageISO'] = pycountry.languages.get(name=info['Language']).alpha_2
340 |     info['Comments'] = f'''Web: {info['Web']}
Rating: {info['Rating']}, {infoJson['gallery_info_full']['rating']['count']}
PageCount: {info['PageCount']}
Genre: {info['Genre']}
Imprint: {info['Imprint']}
AgeRating: {info['AgeRating']}
UploadDate: {info['UploadDate']}
'''
341 |
342 |     if verbose:
343 |         pp.pprint(info)
344 |
345 |     return [True, info]
346 |
--------------------------------------------------------------------------------
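`getCore()` strips bracketed segments (circles, events, languages) from a title, and `getSeries()` then peels a trailing number off the remainder as the issue. The sketch below condenses just those two steps into a simplified regex version to show the idea. It is not a drop-in replacement: it skips the special cases handled above (circled digits, `階`, roman numerals, `vol`/`LEVEL:` prefixes, `1+2` ranges), and the names `core_title` and `split_series` are hypothetical:

```python
import re
from typing import Tuple

# Bracket pairs removed from titles, including full-width （） and 【】
BRACKETS = re.compile(r"「.*?」|\(.*?\)|（.*?）|\{.*?\}|\[.*?\]|【.*?】")
# A trailing issue number such as "2", "02" or "1.5", optionally after "#", "." or ","
TRAILING_NUMBER = re.compile(r"^(.*?)[\s#.,]*(\d+(?:\.\d+)?)$")


def core_title(title: str) -> str:
    """Remove every bracketed segment and surrounding whitespace."""
    return BRACKETS.sub("", title).strip()


def split_series(core: str) -> Tuple[str, float]:
    """Split 'Name 02' into ('Name', 2.0); titles without a trailing number are issue 1.0."""
    match = TRAILING_NUMBER.match(core)
    if match is None or not match.group(1).strip():
        return core, 1.0
    return match.group(1).strip(), float(match.group(2))
```

For example, `(C97) [Circle (Artist)] Some Title 2 [Chinese] [DL版]` reduces to the core `Some Title 2`, which splits into series `Some Title` and issue `2.0`.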
/main.py:
--------------------------------------------------------------------------------
1 | from genInfo import genInfo
2 | from pathlib import Path
3 | from writeInfo import writeInfo
4 | import json
5 |
6 | verbose = False
7 | infoOnly = False
8 |
9 | succeeded = []
10 | failed = []
11 |
12 | cwd = Path.cwd()
13 | work = cwd / 'work'
14 | infPath = cwd / 'inf.json'
15 | serPath = cwd / 'ser.json'
16 |
17 | if infPath.exists():
18 | infStore = json.loads(infPath.read_text(encoding='UTF-8'))
19 | else:
20 | infStore = {}
21 |
22 | if serPath.exists():
23 | serStore = json.loads(serPath.read_text(encoding='UTF-8'))
24 | else:
25 | serStore = {}
26 |
27 | dirList = [x for x in work.iterdir() if x.is_dir()]
28 |
29 | for curDirIndex in range(len(dirList)):
30 | curDir = dirList[curDirIndex]
31 | print(f'===== start processing {curDirIndex+1}/{len(dirList)} =====')
32 | print(f' path: {curDir}')
33 |
34 | if curDir.name in infStore:
35 | print('from inf.json')
36 | info = infStore[curDir.name]
37 | else:
38 | gr = genInfo(curDir, verbose)
39 | if not gr[0]:
40 | print(f'===== fail generating {curDirIndex+1}/{len(dirList)} =====\n')
41 | failed.append([curDir.name, gr[1]])
42 | continue
43 | info = gr[1]
44 |
45 | if curDir.name in serStore:
46 | print('from ser.json')
47 | info['series'], info['issue'] = serStore[curDir.name][:2]
48 |
49 | serStore[curDir.name] = [info['series'], info['issue'], info['coreTitle'], info['Web']]
50 | infStore[curDir.name] = info
51 |
52 | if not infoOnly:
53 | wr = writeInfo(curDir.name, info, verbose)
54 | if(not wr[0]):
55 | print(f'===== fail writing {curDirIndex+1}/{len(dirList)} =====\n')
56 | failed.append([curDir.name, wr[1]])
57 | continue
58 | print(f'===== finish processing {curDirIndex+1}/{len(dirList)} =====\n')
59 | succeeded.append(curDir.name)
60 |
61 | infPath.write_text(json.dumps(infStore, ensure_ascii=False, indent=2, sort_keys=True), encoding='UTF-8')
62 | serPath.write_text(json.dumps(serStore, ensure_ascii=False, indent=2, sort_keys=True), encoding='UTF-8')
63 |
64 | result = {
65 | 'succeeded_count': len(succeeded),
66 | 'failed_count': len(failed),
67 | }
68 | print(failed)
69 | print(result)
70 |
--------------------------------------------------------------------------------
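`main.py` caches its results in `inf.json` and `ser.json`, and on the next run any `[series, issue]` pair already stored in `ser.json` overrides what `getSeries()` guessed. That makes `ser.json` the place to correct a misparsed series by hand. The hypothetical helper below performs the same edit from Python, keeping the row layout `[series, issue, coreTitle, Web]` that `main.py` writes:

```python
import json
from pathlib import Path


def set_series(ser_path: Path, dir_name: str, series: str, issue: float) -> None:
    """Overwrite the series/issue that main.py will reuse for one work/ folder."""
    store = json.loads(ser_path.read_text(encoding="UTF-8")) if ser_path.exists() else {}
    # rows are [series, issue, coreTitle, Web]; main.py only reads the first two back
    entry = store.get(dir_name, [None, None, "", ""])
    store[dir_name] = [series, issue] + entry[2:]
    ser_path.write_text(
        json.dumps(store, ensure_ascii=False, indent=2, sort_keys=True), encoding="UTF-8"
    )
```

Because `main.py` applies the override even when the rest of the info comes from `inf.json`, the corrected series takes effect on the next run without deleting the cache.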
/requirements.txt:
--------------------------------------------------------------------------------
1 | pycountry
2 | requests
3 |
--------------------------------------------------------------------------------
/start.cmd:
--------------------------------------------------------------------------------
1 | python main.py
2 | pause
--------------------------------------------------------------------------------
/utils/README.md:
--------------------------------------------------------------------------------
1 | ### x-gallery-metadata.user.js
2 |
3 | Downloaded from [https://raw.githubusercontent.com/dnsev-h/x/master/builds/x-gallery-metadata.user.js](https://raw.githubusercontent.com/dnsev-h/x/master/builds/x-gallery-metadata.user.js), v1.2.4, on 2021-02-11
4 |
5 | ### db.text.json
6 |
7 | Downloaded from [https://github.com/EhTagTranslation/Database/releases](https://github.com/EhTagTranslation/Database/releases), [`ceaeb72`](https://github.com/EhTagTranslation/Database/compare/efe2dee6f44474b7cc68245bad751bdba7dc3400...ceaeb72c3c548d39ac381f9ab9b81f1f40a4387a), on 2021-02-11
8 |
9 |
--------------------------------------------------------------------------------
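`genInfo.py` looks tags up in `db.text.json` as `trans['data'][index]['data'][tag]['name']`, where `index` selects a namespace (per the indices hard-coded in `genInfo.py`: `1` category, `2` language, `3` parody, `4` character, `5` group, `6` artist, `7` male, `8` female, `9` misc). A minimal sketch of that lookup against a tiny inline stand-in for the database (the sample entries are made up for illustration, and `translate` is a hypothetical name), normalizing the key to lower case for both the membership test and the lookup:

```python
from typing import Optional

# Tiny stand-in shaped like db.text.json: data[index]["data"][tag]["name"] (sample values are made up)
SAMPLE_DB = {
    "data": [
        {"data": {}},                                  # 0: not used by genInfo.py
        {"data": {"doujinshi": {"name": "同人志"}}},   # 1: category
        {"data": {"chinese": {"name": "汉语"}}},       # 2: language
    ]
}


def translate(db: dict, index: int, tag: str) -> Optional[str]:
    """Return the translated name of `tag` in namespace `index`, or None if untranslated."""
    entry = db["data"][index]["data"].get(tag.lower())
    return entry["name"] if entry is not None else None
```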
/utils/x-gallery-metadata.user.js:
--------------------------------------------------------------------------------
1 | // ==UserScript==
2 | // @name x/gallery-metadata
3 | // @version 1.2.4
4 | // @author dnsev-h
5 | // @namespace dnsev-h
6 | // @description Download metadata JSON files for galleries
7 | // @run-at document-start
8 | // @include https://exhentai.org/*
9 | // @include https://e-hentai.org/*
10 | // @icon 
11 | // @icon64 
12 | // @homepage https://dnsev-h.github.io/x/
13 | // @supportURL https://github.com/dnsev-h/x/issues
14 | // @updateURL https://raw.githubusercontent.com/dnsev-h/x/master/builds/x-gallery-metadata.meta.js
15 | // @downloadURL https://raw.githubusercontent.com/dnsev-h/x/master/builds/x-gallery-metadata.user.js
16 | // ==/UserScript==
17 | (function(){function r(e,n,t){function o(i,f){if(!n[i]){if(!e[i]){var c="function"==typeof require&&require;if(!f&&c)return c(i,!0);if(u)return u(i,!0);var a=new Error("Cannot find module '"+i+"'");throw a.code="MODULE_NOT_FOUND",a}var p=n[i]={exports:{}};e[i][0].call(p.exports,function(r){var n=e[i][1][r];return o(n||r)},p,p.exports,r,e,n,t)}return n[i].exports}for(var u="function"==typeof require&&require,i=0;idiv");
257 | if (node === null) { return null; }
258 |
259 | let url = getCssUrl(node.style.backgroundImage);
260 | if (url !== null) { return url; }
261 |
262 | const img = node.querySelector("img[src]");
263 | return (img !== null ? img.getAttribute("src") : null);
264 | }
265 |
266 | function getCategory(html) {
267 | const node = html.querySelector("#gdc>div[onclick]");
268 | if (node === null) { return null; }
269 |
270 | const pattern = /['"].*?\/\/.+?\/(.*?)(\?.*?)?(#.*?)?['"]/;
271 | const match = pattern.exec(node.getAttribute("onclick") || "");
272 | return (match !== null ? match[1] : null);
273 | }
274 |
275 | function getUploader(html) {
276 | const node = html.querySelector("#gdn>a");
277 | if (node === null) { return null; }
278 |
279 | const pattern = /^.*?\/\/.+?\/(.*?)(\?.*?)?(#.*?)?$/;
280 | const match = pattern.exec(node.getAttribute("href") || "");
281 | return (match !== null ? (match[1].split("/")[1] || "") : null);
282 | }
283 |
284 | function getRatingCount(html) {
285 | const node = html.querySelector("#rating_count");
286 | if (node === null) { return null; }
287 |
288 | const value = parseInt(node.textContent.trim(), 10);
289 | return (Number.isNaN(value) ? null : value);
290 | }
291 |
292 | function getRatingAverage(html) {
293 | const node = html.querySelector("#rating_label");
294 | if (node === null) { return null; }
295 |
296 | const pattern = /average:\s*([0-9\.]+)/i;
297 | const match = pattern.exec(node.textContent);
298 | if (match === null) { return null; }
299 |
300 | const value = parseFloat(match[1]);
301 | return (Number.isNaN(value) ? null : value);
302 | }
303 |
304 | function getFavoriteCount(html) {
305 | const node = html.querySelector("#favcount");
306 | if (node === null) { return null; }
307 |
308 | const pattern = /\s*([0-9]+|once)/i;
309 | const match = pattern.exec(node.textContent);
310 | if (match === null) { return null; }
311 |
312 | const match1 = match[1];
313 | return (match1.toLowerCase() === "once" ? 1 : parseInt(match1, 10));
314 | }
315 |
316 | function getFavoriteCategory(html) {
317 | const node = html.querySelector("#fav>div.i");
318 | if (node === null) { return null; }
319 |
320 | const title = node.getAttribute("title") || "";
321 | const pattern = /background-position\s*:\s*\d+(?:px)?\s+(-?\d+)(?:px)/;
322 | const match = pattern.exec(node.getAttribute("style") || "");
323 | const index = (match !== null) ?
324 | Math.floor((Math.abs(parseInt(match[1], 10)) - 2) / 19) :
325 | -1;
326 |
327 | return { index, title };
328 | }
329 |
330 | function getThumbnailSize(html) {
331 | const nodes = html.querySelectorAll("#gdo4>.nosel");
332 | if (nodes.length < 2) { return null; }
333 | return (nodes[0].classList.contains("ths") ? "normal" : "large");
334 | }
335 |
336 | function getThumbnailRows(html) {
337 | const nodes = html.querySelectorAll("#gdo2>.nosel");
338 | if (nodes.length === 0) { return null; }
339 |
340 | const pattern = /\s*([0-9]+)/;
341 | for (const node of nodes) {
342 | if (node.classList.contains("ths")) {
343 | const match = pattern.exec(node.textContent);
344 | if (match !== null) {
345 | return parseInt(match[1], 10);
346 | }
347 | }
348 | }
349 |
350 | return null;
351 | }
352 |
353 | function getTags(html) {
354 | const pattern = /(.+):/;
355 | const groups = html.querySelectorAll("#taglist tr");
356 | const tags = {};
357 |
358 | for (const group of groups) {
359 | const tds = group.querySelectorAll("td");
360 | if (tds.length === 0) { continue; }
361 |
362 | const match = pattern.exec(tds[0].textContent);
363 | const namespace = (match !== null ? match[1].trim() : "");
364 |
365 | let namespaceTags;
366 | if (tags.hasOwnProperty(namespace)) {
367 | namespaceTags = tags[namespace];
368 | } else {
369 | namespaceTags = [];
370 | tags[namespace] = namespaceTags;
371 | }
372 |
373 | const tagDivs = tds[tds.length - 1].querySelectorAll("div");
374 | for (const div of tagDivs) {
375 | const link = div.querySelector("a");
376 | if (link === null) { continue; }
377 |
378 | const tag = link.textContent.trim();
379 | namespaceTags.push(tag);
380 | }
381 | }
382 |
383 | return tags;
384 | }
385 |
386 | function getDetailsNodes(html) {
387 | return html.querySelectorAll("#gdd tr");
388 | }
389 |
390 | function getDateUploaded(detailsNodes) {
391 | if (detailsNodes.length <= 0) { return null; }
392 | const node = detailsNodes[0].querySelector(".gdt2");
393 | return (node !== null ? getTimestamp(node.textContent) : null);
394 | }
395 |
396 | function getVisibleInfo(detailsNodes) {
397 | let visible = true;
398 | let visibleReason = null;
399 |
400 | if (detailsNodes.length > 2) {
401 | const node = detailsNodes[2].querySelector(".gdt2");
402 | if (node !== null) {
403 | const pattern = /no\s+\((.+?)\)/i;
404 | const match = pattern.exec(node.textContent);
405 | if (match !== null) {
406 | visible = false;
407 | visibleReason = match[1].trim();
408 | }
409 | }
410 | }
411 |
412 | return { visible, visibleReason };
413 | }
414 |
415 | function getLanguageInfo(detailsNodes) {
416 | let language = null;
417 | let translated = false;
418 |
419 | if (detailsNodes.length > 3) {
420 | const node = detailsNodes[3].querySelector(".gdt2");
421 | if (node !== null) {
422 | const textNode = node.firstChild;
423 | if (textNode !== null && textNode.nodeType === Node.TEXT_NODE) {
424 | language = textNode.nodeValue.trim();
425 | }
426 |
427 | const trNode = node.querySelector(".halp");
428 | translated = (trNode !== null && trNode.textContent.trim().toLowerCase() === "tr");
429 | }
430 | }
431 |
432 | return { language, translated };
433 | }
434 |
435 | function getApproximateTotalFileSize(detailsNodes) {
436 | if (detailsNodes.length <= 4) { return null; }
437 |
438 | const node = detailsNodes[4].querySelector(".gdt2");
439 | if (node === null) { return null; }
440 |
441 | const pattern = /([0-9\.]+)\s*(\w+)/i;
442 | const match = pattern.exec(node.textContent);
443 | return (match !== null ? utils.getBytesSizeFromLabel(match[1], match[2]) : null);
444 | }
445 |
446 | function getFileCount(detailsNodes) {
447 | if (detailsNodes.length <= 5) { return null; }
448 |
449 | const node = detailsNodes[5].querySelector(".gdt2");
450 | if (node === null) { return null; }
451 |
452 | const pattern = /([0-9,]+)\s*pages/i;
453 | const match = pattern.exec(node.textContent);
454 | return (match !== null ? parseInt(match[1].replace(/,/g, ""), 10) : null);
455 | }
456 |
457 | function getParent(detailsNodes) {
458 | if (detailsNodes.length <= 1) { return null; }
459 |
460 | const node = detailsNodes[1].querySelector(".gdt2>a");
461 | if (node === null) { return null; }
462 |
463 | const info = utils.getGalleryIdentifierAndPageFromUrl(node.getAttribute("href") || "");
464 | return (info !== null ? info.identifier : null);
465 | }
466 |
467 | function getNewerVersions(html) {
468 | const results = [];
469 | const nodes = html.querySelectorAll("#gnd>a");
470 |
471 | for (const node of nodes) {
472 | const info = utils.getGalleryIdentifierAndPageFromUrl(node.getAttribute("href") || "");
473 | if (info === null) { continue; }
474 |
475 | const galleryInfo = {
476 | identifier: info.identifier,
477 | name: node.textContent.trim(),
478 | dateUploaded: null
479 | };
480 |
481 | if (node.nextSibling !== null) {
482 | galleryInfo.dateUploaded = getTimestamp(node.nextSibling.textContent);
483 | }
484 |
485 | results.push(galleryInfo);
486 | }
487 |
488 | return results;
489 | }
490 |
491 | function getTorrentCount(html) {
492 | const nodes = html.querySelectorAll("#gd5 .g2>a");
493 | const pattern = /\btorrent\s+download\s*\(\s*(\d+)\s*\)/i;
494 | for (const node of nodes) {
495 | const match = pattern.exec(node.textContent);
496 | if (match !== null) {
497 | return parseInt(match[1], 10);
498 | }
499 | }
500 |
501 | return null;
502 | }
503 |
504 | function getArchiverKey(html) {
505 | const nodes = html.querySelectorAll("#gd5 .g2>a");
506 | const pattern = /\barchive\s+download\b/i;
507 | for (const node of nodes) {
508 | const match = pattern.exec(node.textContent);
509 | if (match !== null) {
510 | const pattern2 = /&or=([^'"]*)['"]/;
511 | const match2 = pattern2.exec(node.getAttribute("onclick") || "");
512 | return (match2 !== null ? match2[1] : null);
513 | }
514 | }
515 |
516 | return null;
517 | }
518 |
519 | function populateGalleryInfoFromHtml(info, html) {
520 | // General
521 | info.title = getTitle(html);
522 | info.titleOriginal = getTitleOriginal(html);
523 | info.mainThumbnailUrl = getMainThumbnailUrl(html);
524 | info.category = getCategory(html);
525 | info.uploader = getUploader(html);
526 |
527 | info.ratingCount = getRatingCount(html);
528 | info.ratingAverage = getRatingAverage(html);
529 |
530 | info.favoriteCount = getFavoriteCount(html);
531 | info.favoriteCategory = getFavoriteCategory(html);
532 |
533 | info.thumbnailSize = getThumbnailSize(html);
534 | info.thumbnailRows = getThumbnailRows(html);
535 |
536 | info.newerVersions = getNewerVersions(html);
537 |
538 | info.torrentCount = getTorrentCount(html);
539 | info.archiverKey = getArchiverKey(html);
540 |
541 | // Details
542 | const detailsNodes = getDetailsNodes(html);
543 |
544 | info.dateUploaded = getDateUploaded(detailsNodes);
545 |
546 | info.parent = getParent(detailsNodes);
547 |
548 | const visibleInfo = getVisibleInfo(detailsNodes);
549 | info.visible = visibleInfo.visible;
550 | info.visibleReason = visibleInfo.visibleReason;
551 |
552 | const languageInfo = getLanguageInfo(detailsNodes);
553 | info.language = languageInfo.language;
554 | info.translated = languageInfo.translated;
555 |
556 | info.approximateTotalFileSize = getApproximateTotalFileSize(detailsNodes);
557 |
558 | info.fileCount = getFileCount(detailsNodes);
559 |
560 | // Tags
561 | info.tags = getTags(html);
562 | info.tagsHaveNamespace = true;
563 | }
564 |
565 | function getFromHtml(html, url) {
566 | const link = html.querySelector(".ptt td.ptds>a[href],.ptt td.ptdd>a[href]");
567 | if (link === null) { return null; }
568 |
569 | const idPage = utils.getGalleryIdentifierAndPageFromUrl(link.getAttribute("href") || "");
570 | if (idPage === null) { return null; }
571 |
572 | const info = new types.GalleryInfo();
573 | info.identifier = idPage.identifier;
574 | info.currentPage = idPage.page;
575 | info.source = "html";
576 | populateGalleryInfoFromHtml(info, html);
577 | info.sourceSite = utils.getSourceSiteFromUrl(url);
578 | info.dateGenerated = Date.now();
579 | return info;
580 | }
581 |
582 |
583 | module.exports = getFromHtml;
584 |
585 | },{"./types":4,"./utils":5}],4:[function(require,module,exports){
586 | "use strict";
587 |
588 | const GalleryIdentifier = require("../gallery-identifier").GalleryIdentifier;
589 |
590 |
591 | class GalleryInfo {
592 | constructor() {
593 | this.identifier = null;
594 | this.title = null;
595 | this.titleOriginal = null;
596 | this.dateUploaded = null;
597 | this.category = null;
598 | this.uploader = null;
599 | this.ratingAverage = null;
600 | this.ratingCount = null;
601 | this.favoriteCategory = null;
602 | this.favoriteCount = null;
603 | this.mainThumbnailUrl = null;
604 | this.thumbnailSize = null;
605 | this.thumbnailRows = null;
606 | this.fileCount = null;
607 | this.approximateTotalFileSize = null;
608 | this.visible = true;
609 | this.visibleReason = null;
610 | this.language = null;
611 | this.translated = null;
612 | this.archiverKey = null;
613 | this.torrentCount = null;
614 | this.tags = null;
615 | this.tagsHaveNamespace = null;
616 | this.currentPage = null;
617 | this.parent = null;
618 | this.newerVersions = null;
619 | this.source = null;
620 | this.sourceSite = null;
621 | this.dateGenerated = null;
622 | }
623 | }
624 |
625 |
626 | module.exports = {
627 | GalleryIdentifier,
628 | GalleryInfo
629 | };
630 |
631 | },{"../gallery-identifier":1}],5:[function(require,module,exports){
632 | "use strict";
633 |
634 | const types = require("./types");
635 |
636 | const sizeLabelToBytesPrefixes = [ "b", "kb", "mb", "gb" ];
637 |
638 |
639 | function getGalleryPageFromUrl(url) {
640 | const match = /\?(?:(|[\w\W]*?&)p=([\+\-]?\d+))?/.exec(url);
641 | if (match !== null && match[2]) { // match[2] is the p= value; match[1] is the query text before it
642 | const page = parseInt(match[2], 10);
643 | if (!Number.isNaN(page)) { return page; }
644 | }
645 | return null;
646 | }
647 |
648 | function getGalleryIdentifierAndPageFromUrl(url) {
649 | const identifier = types.GalleryIdentifier.createFromUrl(url);
650 | if (identifier === null) { return null; }
651 |
652 | const page = getGalleryPageFromUrl(url);
653 | return { identifier, page };
654 | }
655 |
656 | function getBytesSizeFromLabel(number, label) {
657 | let i = sizeLabelToBytesPrefixes.indexOf(label.toLowerCase());
658 | if (i < 0) { i = 0; }
659 | return Math.floor(parseFloat(number) * Math.pow(1024, i));
660 | }
661 |
662 | function getSourceSiteFromUrl(url) {
663 | const pattern = /^(?:(?:[a-z][a-z0-9\+\-\.]*:\/*|\/{2,})([^\/]*))?(\/?[\w\W]*)$/i;
664 | const match = pattern.exec(url);
665 |
666 | if (match !== null && match[1]) {
667 | const host = match[1].toLowerCase();
668 | if (host.indexOf("exhentai") >= 0) { return "exhentai"; }
669 | if (host.indexOf("e-hentai") >= 0) { return "e-hentai"; }
670 | }
671 |
672 | return null;
673 | }
674 |
675 |
676 | module.exports = {
677 | getGalleryIdentifierAndPageFromUrl,
678 | getBytesSizeFromLabel,
679 | getSourceSiteFromUrl
680 | };
681 |
682 | },{"./types":4}],6:[function(require,module,exports){
683 | "use strict";
684 |
685 | const apiStyle = require("./style");
686 | const style = require("../style");
687 |
688 |
689 | function insertStylesheet() {
690 | const id = "x-gallery-links-right-sidebar";
691 | if (style.hasStylesheet(id)) { return; }
692 |
693 | const src = require("./style/gallery-right-sidebar.css");
694 | style.addStylesheet(src, id);
695 | }
696 |
697 | function getGroupContainer(parent) {
698 | const id = "x-gallery-links-right-sidebar-container";
699 | let node = parent.querySelector(`.${id}`);
700 | if (node === null) {
701 | node = document.createElement("div");
702 | node.className = `g2 gsp ${id}`;
703 | parent.appendChild(node);
704 |
705 | const p = parent.parentNode;
706 | if (p !== null) {
707 | p.classList.add("x-gallery-links-right-sidebar-contains-container");
708 | }
709 | }
710 | return node;
711 | }
712 |
713 | function createLink(label, order) {
714 | const parent = document.querySelector("#gd5");
715 | if (parent === null) {
716 | return { link: null, linkContainer: null };
717 | }
718 |
719 | // Style
720 | insertStylesheet();
721 |
722 | // Container
723 | const linkGroup = getGroupContainer(parent);
724 | const linkContainer = document.createElement("div");
725 | linkContainer.className = "x-gallery-links-right-sidebar-entry";
726 | if (typeof(order) === "number" && !Number.isNaN(order)) {
727 | linkContainer.style.order = `${order}`;
728 | }
729 |
730 | const img = document.createElement("img");
731 | img.src = apiStyle.getArrowIconUrl();
732 | linkContainer.appendChild(img);
733 |
734 | linkContainer.appendChild(document.createTextNode(" "));
735 |
736 | const link = document.createElement("a");
737 | link.textContent = label;
738 | linkContainer.appendChild(link);
739 |
740 | linkGroup.appendChild(linkContainer);
741 |
742 | return { link, linkContainer };
743 | }
744 |
745 |
746 | module.exports = {
747 | createLink
748 | };
749 |
750 | },{"../style":13,"./style":8,"./style/gallery-right-sidebar.css":9}],7:[function(require,module,exports){
751 | "use strict";
752 |
753 | const overrideAttributeName = "data-x-override-page-type";
754 |
755 |
756 | function setOverride(value) {
757 | if (value) {
758 | document.documentElement.setAttribute(overrideAttributeName, value);
759 | } else {
760 | document.documentElement.removeAttribute(overrideAttributeName);
761 | }
762 | }
763 |
764 | function getOverride() {
765 | const value = document.documentElement.getAttribute(overrideAttributeName);
766 | return value ? value : null;
767 | }
768 |
769 | function get(doc, location) {
770 | const overrideType = getOverride();
771 | if (overrideType !== null) {
772 | return overrideType;
773 | }
774 |
775 | if (doc.querySelector("#searchbox") !== null) {
776 | return "search";
777 | }
778 | if (doc.querySelector("input[name=favcat]") !== null) {
779 | return "favorites";
780 | }
781 | if (doc.querySelector("#i1>h1") !== null) {
782 | return "image";
783 | }
784 | if (doc.querySelector(".gm h1#gn") !== null) {
785 | return "gallery";
786 | }
787 | if (doc.querySelector("#profile_outer") !== null) {
788 | return "settings";
789 | }
790 | if (doc.querySelector("#torrentinfo") !== null) {
791 | return "torrentInfo";
792 | }
793 |
794 | let n = doc.querySelector("body>.d>p");
795 | if (
796 | (n !== null && /gallery\s+has\s+been\s+removed/i.test(n.textContent)) ||
797 | doc.querySelector(".eze_dgallery_table") !== null) { // eze resurrection
798 | return "deletedGallery";
799 | }
800 |
801 | n = doc.querySelector("img[src]");
802 | if (n !== null && location !== null) {
803 | const p = location.pathname;
804 | if (
805 | n.getAttribute("src") === location.href &&
806 | p.substr(0, 3) !== "/t/" &&
807 | p.substr(0, 5) !== "/img/") {
808 | return "panda";
809 | }
810 | }
811 |
812 | // Unknown
813 | return null;
814 | }
815 |
816 |
817 | module.exports = {
818 | get,
819 | getOverride,
820 | setOverride
821 | };
822 |
823 | },{}],8:[function(require,module,exports){
824 | "use strict";
825 |
826 | function isDark() {
827 | return (
828 | window.location.hostname.indexOf("exhentai") >= 0 ||
829 | document.documentElement.classList.contains("x-force-dark"));
830 | }
831 |
832 | function setDocumentDarkFlag() {
833 | document.documentElement.classList.toggle("x-is-dark", isDark());
834 | }
835 |
836 | function getArrowIconUrl() {
837 | return (isDark() ? "https://exhentai.org/img/mr.gif" : "https://ehgt.org/g/mr.gif");
838 | }
839 |
840 |
841 | module.exports = {
842 | isDark,
843 | setDocumentDarkFlag,
844 | getArrowIconUrl
845 | };
846 |
847 | },{}],9:[function(require,module,exports){
848 | module.exports = ".x-gallery-links-right-sidebar-container{margin-top:-25px;padding-bottom:0;display:flex;flex-direction:column}.x-gallery-links-right-sidebar-entry{margin-top:25px}div#gright.x-gallery-links-right-sidebar-contains-container{overflow-x:hidden;overflow-y:auto}";
849 | },{}],10:[function(require,module,exports){
850 | "use strict";
851 |
852 | const ready = require("../ready");
853 | const pageType = require("../api/page-type");
854 | const windowMessage = require("../window-message");
855 | const getFromHtml = require("../api/gallery-info/get-from-html");
856 | const queryString = require("../query-string");
857 | const GalleryIdentifier = require("../api/gallery-identifier").GalleryIdentifier;
858 | const toCommonJson = require("../api/gallery-info/common-json").toCommonJson;
859 |
860 | let downloadDataUrl = null;
861 |
862 |
863 | function setupGalleryPage() {
864 | createGalleryPageDownloadLink();
865 |
866 | windowMessage.registerCommand("galleryInfoRequest", (e) => {
867 | const data = getFromHtml(document, window.location.href);
868 | if (data === null) { return; }
869 | windowMessage.post(e.source, "galleryInfoResponse", toCommonJson(data));
870 | });
871 | }
872 |
873 | function createGalleryPageDownloadLink() {
874 | const galleryRightSidebar = require("../api/gallery-right-sidebar");
875 | const link = galleryRightSidebar.createLink("Metadata JSON", 0).link;
876 | if (link === null) { return; }
877 |
878 | link.setAttribute("download", "info.json");
879 | link.href = "#";
880 |
881 | link.addEventListener("click", onDownloadLinkClicked, false);
882 | link.addEventListener("auxclick", onDownloadLinkClicked, false);
883 | }
884 |
885 | function getGalleryInfo() {
886 | try {
887 | return getFromHtml(document, window.location.href);
888 | } catch (e) {
889 | console.error(e);
890 | return null;
891 | }
892 | }
893 |
894 | function createDownloadDataUrl(info) {
895 | const infoString = JSON.stringify(info, null, " ");
896 | const blob = new Blob([ infoString ], { type: "application/json" });
897 | return URL.createObjectURL(blob);
898 | }
899 |
900 | function onDownloadLinkClicked(e) {
901 | /* jshint -W040 */
902 | if (downloadDataUrl === null) {
903 | const info = getGalleryInfo();
904 | if (info === null) {
905 | console.error("Failed to create download data");
906 | e.preventDefault();
907 | e.stopPropagation();
908 | return false;
909 | }
910 |
911 | downloadDataUrl = createDownloadDataUrl(toCommonJson(info));
912 | this.setAttribute("href", downloadDataUrl);
913 | }
914 | /* jshint +W040 */
915 | }
916 |
917 |
918 | function setupTorrentPage() {
919 | if (!window.opener) { return; }
920 |
921 | const identifier = getGalleryIdentifierFromTorrentPageUrl(window.location.href);
922 | if (identifier === null) { return; }
923 |
924 | windowMessage.registerCommand("galleryInfoResponse", (e, info) => {
925 | if (downloadDataUrl !== null || !isValidInfo(info, identifier)) { return; }
926 | downloadDataUrl = createDownloadDataUrl(info);
927 | createTorrentPageDownloadLinks(downloadDataUrl);
928 | });
929 | windowMessage.post(window.opener, "galleryInfoRequest");
930 | }
931 |
932 | function getGalleryIdentifierFromTorrentPageUrl(url) {
933 | const params = queryString.getUrlParameters(url);
934 | if (!params.hasOwnProperty("gid") || !params.hasOwnProperty("t")) { return null; }
935 |
936 | const id = parseInt(params.gid, 10);
937 | if (Number.isNaN(id)) { return null; }
938 |
939 | return new GalleryIdentifier(id, params.t);
940 | }
941 |
942 | function isValidInfo(info, identifier) {
943 | const g = (info !== null && typeof(info) === "object") ? info.gallery : null;
944 | return (
945 | g !== null && typeof(g) === "object" &&
946 | g.gid === identifier.id &&
947 | g.token === identifier.token);
948 | }
949 |
950 | function createTorrentPageDownloadLinks(url) {
951 | const tables = document.querySelectorAll("#torrentinfo form table>tbody");
952 | for (const table of tables) {
953 | const torrentLink = table.querySelector("tr:nth-of-type(3)>td");
954 | if (torrentLink === null) { continue; }
955 |
956 | const text = torrentLink.textContent;
957 | const whitespace = /^\s*/.exec(text)[0];
958 | const torrentFileName = text.trim().replace(/\.[^\.]*$/, "");
959 |
960 | const row = document.createElement("tr");
961 |
962 | const cell = document.createElement("td");
963 | cell.setAttribute("colspan", "5");
964 |
965 | if (whitespace.length > 0) {
966 | cell.appendChild(document.createTextNode(whitespace));
967 | }
968 |
969 | const link = document.createElement("a");
970 | link.setAttribute("download", `${torrentFileName}.info.json`);
971 | link.href = url;
972 | link.textContent = "Metadata JSON";
973 | cell.appendChild(link);
974 |
975 | row.appendChild(cell);
976 | table.appendChild(row);
977 | }
978 | }
979 |
980 |
981 | function main() {
982 | const currentPageType = pageType.get(document, location);
983 |
984 | switch (currentPageType) {
985 | case "gallery":
986 | setupGalleryPage();
987 | break;
988 | case "torrentInfo":
989 | setupTorrentPage();
990 | break;
991 | }
992 | }
993 |
994 |
995 | ready.onReady(main);
996 |
997 | },{"../api/gallery-identifier":1,"../api/gallery-info/common-json":2,"../api/gallery-info/get-from-html":3,"../api/gallery-right-sidebar":6,"../api/page-type":7,"../query-string":11,"../ready":12,"../window-message":14}],11:[function(require,module,exports){
998 | "use strict";
999 |
1000 | function getUrlParameters(url) {
1001 | const result = {};
1002 | const match = /^([^#\?]*)(\?[^#]*)?(#[\w\W]*)?$/.exec(url);
1003 | if (match !== null && match[2] && match[2].length > 1) {
1004 | const pattern = /([^=]*)(?:=([\w\W]*))?/;
1005 | for (const part of match[2].substr(1).split("&")) {
1006 | if (part.length === 0) { continue; }
1007 | const match2 = pattern.exec(part);
1008 | const value = match2[2];
1009 | result[decodeURIComponent(match2[1])] = (value !== undefined ? decodeURIComponent(value) : null);
1010 | }
1011 | }
1012 | return result;
1013 | }
1014 |
1015 | function removeQueryParameter(url, parameterName) {
1016 | return url.replace(
1017 | new RegExp(`([&\\?])${parameterName}(?:(?:=[^&]*)?(&|$))`),
1018 | (m0, m1, m2) => (m1 === "?" && m2 ? "?" : m2));
1019 | }
1020 |
1021 |
1022 | module.exports = {
1023 | getUrlParameters,
1024 | removeQueryParameter
1025 | };
1026 |
1027 | },{}],12:[function(require,module,exports){
1028 | "use strict";
1029 |
1030 | let isReadyValue = false;
1031 | let callbacks = null;
1032 | let checkIntervalId = null;
1033 | const checkIntervalRate = 250;
1034 |
1035 |
1036 | function isHooked() {
1037 | return callbacks !== null;
1038 | }
1039 |
1040 | function hook() {
1041 | callbacks = [];
1042 | window.addEventListener("load", checkIfReady, false);
1043 | window.addEventListener("DOMContentLoaded", checkIfReady, false);
1044 | document.addEventListener("readystatechange", checkIfReady, false);
1045 | checkIntervalId = setInterval(checkIfReady, checkIntervalRate);
1046 | }
1047 |
1048 | function unhook() {
1049 | const cbs = callbacks;
1050 |
1051 | callbacks = null;
1052 | window.removeEventListener("load", checkIfReady, false);
1053 | window.removeEventListener("DOMContentLoaded", checkIfReady, false);
1054 | document.removeEventListener("readystatechange", checkIfReady, false);
1055 | clearInterval(checkIntervalId);
1056 | checkIntervalId = null;
1057 |
1058 | invoke(cbs);
1059 | }
1060 |
1061 | function invoke(callbacks) {
1062 | for (let cb of callbacks) {
1063 | try {
1064 | cb();
1065 | }
1066 | catch (e) {
1067 | console.error(e);
1068 | }
1069 | }
1070 | }
1071 |
1072 | function isReady() {
1073 | if (isReadyValue) { return true; }
1074 |
1075 | if (document.readyState === "interactive" || document.readyState === "complete") {
1076 | if (isHooked()) { unhook(); }
1077 | isReadyValue = true;
1078 | return true;
1079 | }
1080 | return false;
1081 | }
1082 |
1083 | function checkIfReady() {
1084 | isReady();
1085 | }
1086 |
1087 |
1088 | function onReady(callback) {
1089 | if (isReady()) {
1090 | callback();
1091 | return;
1092 | }
1093 |
1094 | if (!isHooked()) { hook(); }
1095 |
1096 | callbacks.push(callback);
1097 | }
1098 |
1099 |
1100 | module.exports = {
1101 | onReady: onReady,
1102 | get isReady() { return isReady(); }
1103 | };
1104 |
1105 | },{}],13:[function(require,module,exports){
1106 | "use strict";
1107 |
1108 | let apiStyle = null;
1109 |
1110 |
1111 | function getId(id) {
1112 | return `${id}-stylesheet`;
1113 | }
1114 |
1115 | function getStylesheet(id) {
1116 | return document.getElementById(getId(id));
1117 | }
1118 |
1119 | function hasStylesheet(id) {
1120 | return !!getStylesheet(id);
1121 | }
1122 |
1123 | function addStylesheet(source, id) {
1124 | if (apiStyle === null) { apiStyle = require("./api/style"); }
1125 | apiStyle.setDocumentDarkFlag();
1126 |
1127 | const style = document.createElement("style");
1128 | style.textContent = source;
1129 | if (typeof(id) === "string") {
1130 | style.id = getId(id);
1131 | }
1132 | document.head.appendChild(style);
1133 | return style;
1134 | }
1135 |
1136 |
1137 | module.exports = {
1138 | hasStylesheet,
1139 | getStylesheet,
1140 | addStylesheet
1141 | };
1142 |
1143 | },{"./api/style":8}],14:[function(require,module,exports){
1144 | "use strict";
1145 |
1146 | let commands = null;
1147 |
1148 |
1149 | function registerCommand(commandName, callback) {
1150 | if (commands === null) {
1151 | commands = {};
1152 | window.addEventListener("message", onWindowMessage, false);
1153 | }
1154 |
1155 | commands[commandName] = callback;
1156 | }
1157 |
1158 | function post(targetWindow, commandName, data) {
1159 | targetWindow.postMessage({
1160 | xData: { command: commandName, data: data }
1161 | }, window.location.origin);
1162 | }
1163 |
1164 | function onWindowMessage(e) {
1165 | if (e.origin !== window.origin) { return; }
1166 |
1167 | let data = e.data;
1168 | if (data === null || typeof(data) !== "object") { return; }
1169 |
1170 | data = data.xData;
1171 | if (data === null || typeof(data) !== "object") { return; }
1172 | if (typeof(data.command) !== "string") { return; }
1173 |
1174 | const callback = commands[data.command];
1175 | if (typeof(callback) !== "function") { return; }
1176 |
1177 | callback(e, data.data);
1178 | }
1179 |
1180 |
1181 | module.exports = {
1182 | registerCommand,
1183 | post
1184 | };
1185 |
1186 | },{}]},{},[10])
1187 |
--------------------------------------------------------------------------------
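The size and page helpers in the userscript above are small enough to port directly. The sketch below is a hypothetical Python port (`bytes_from_label`, `approximate_total_file_size` and `gallery_page_from_url` are illustrative names, not part of this repository) that mirrors `getBytesSizeFromLabel`, the regex step of `getApproximateTotalFileSize`, and `getGalleryPageFromUrl` reading the `p=` value from capture group 2. It assumes the same label list (`b`, `kb`, `mb`, `gb`) and 1024-based multipliers as the JavaScript.

```python
import math
import re

# Mirrors sizeLabelToBytesPrefixes: the list index is the power of 1024
SIZE_PREFIXES = ["b", "kb", "mb", "gb"]


def bytes_from_label(number: str, label: str) -> int:
    """Port of getBytesSizeFromLabel; unknown labels fall back to plain bytes."""
    label = label.lower()
    power = SIZE_PREFIXES.index(label) if label in SIZE_PREFIXES else 0
    return math.floor(float(number) * 1024 ** power)


def approximate_total_file_size(text: str):
    """Port of the regex step in getApproximateTotalFileSize; None if nothing matches."""
    match = re.search(r"([0-9.]+)\s*(\w+)", text)
    if match is None:
        return None
    return bytes_from_label(match.group(1), match.group(2))


def gallery_page_from_url(url: str):
    """Port of getGalleryPageFromUrl: the page number is capture group 2."""
    match = re.search(r"\?(?:(|[\w\W]*?&)p=([+\-]?\d+))?", url)
    if match is not None and match.group(2):
        return int(match.group(2))
    return None


print(approximate_total_file_size("1.5 KB"))  # 1536
print(gallery_page_from_url("https://e-hentai.org/g/1/abc/?foo=1&p=3"))  # 3
```

As in the JavaScript, a label outside that list (for example `MiB`) falls back to a multiplier of 1, so `approximate_total_file_size("41.52 MiB")` yields 41 rather than a mebibyte count.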
/writeInfo.py:
--------------------------------------------------------------------------------
1 | from compress import compress
2 | from pathlib import Path
3 | import json
4 | import math
5 | import pprint
6 | import sys
7 | import zipfile
8 |
9 | cwd = Path.cwd()
10 | work = cwd / 'work'
11 | out = cwd / 'out'
12 |
13 | pp = pprint.PrettyPrinter(indent=2)
14 |
15 | def writeInfo(fileStem, info, verbose):
16 |     if verbose:
17 |         pp.pprint(info)
18 |     try:
19 |         xmlData = f'''
20 |
21 |
22 |
23 |
24 |
25 |
26 |
27 |
28 |
29 |
30 |
31 |
32 |
33 |
34 |
35 |
36 |
37 |
38 | ''' #
39 |         xmlDataPath = work / fileStem / 'ComicInfo.xml'
40 |         xmlDataPath.write_text(xmlData, encoding='UTF-8')
41 |
42 |         cr = compress(fileStem, verbose)
43 |         if not cr[0]:
44 |             return [False, cr[1]]
45 |
46 |         # ComicBookInfo/1.0 metadata, stored as the archive's ZIP comment
47 |         jsonData = {
48 |             'ComicBookInfo/1.0': {
49 |                 'comments': info['Comments'],
50 |                 'credits': [{'person': x, 'role': 'Writer'} for x in info['writer']],
51 |                 'genre': info['Genre'],
52 |                 'issue': info['issue'],
53 |                 'language': info['LanguageISO'],
54 |                 'publicationMonth': info['Month'],
55 |                 'publicationYear': info['Year'],
56 |                 'publisher': info['Publisher'],
57 |                 # Rating doubled and floored; a result of 0 becomes 1
58 |                 'rating': math.floor(info['Rating'] * 2) or 1,
59 |                 'series': info['series'],
60 |                 'tags': info['tags'],
61 |                 'title': info['Title'],
62 |             }
63 |         }
64 |
65 |         zipNote = json.dumps(jsonData, ensure_ascii=False, sort_keys=True).encode('utf-8')
66 |         # A ZIP comment holds at most 65535 bytes; zipfile truncates longer ones
67 |         print(f'zip note size: {len(zipNote)} bytes/65535 bytes')
68 |         f = out / f'{fileStem}.zip'
69 |         # The context manager writes the comment and closes the file before the rename
70 |         with zipfile.ZipFile(f, 'a', compression=zipfile.ZIP_DEFLATED, compresslevel=6) as fzip:
71 |             fzip.comment = zipNote
72 |         newName = out / f'{fileStem}.cbz'
73 |         f.rename(newName)
74 |     except Exception:
75 |         return [False, sys.exc_info()]
76 |     return [True]
--------------------------------------------------------------------------------
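`writeInfo` stores the ComicBookInfo/1.0 JSON as the archive comment, and its printed size line hints at the catch: the ZIP format gives that comment a 16-bit length, and Python's `zipfile` truncates a longer comment with a warning, which would leave invalid JSON behind. Below is a minimal sketch of a guarded variant; `build_zip_note` and `read_zip_note` are hypothetical helpers, and raising `ValueError` instead of writing a truncated note is an assumed policy, not what `writeInfo` currently does. It round-trips through an in-memory archive so it runs without the `work`/`out` directories.

```python
import io
import json
import zipfile

# The end-of-central-directory record stores the comment length in 16 bits
ZIP_MAX_COMMENT = 65535


def build_zip_note(fields: dict) -> bytes:
    """Serialize ComicBookInfo/1.0 fields the way writeInfo does, refusing oversize notes."""
    note = json.dumps({"ComicBookInfo/1.0": fields}, ensure_ascii=False, sort_keys=True).encode("utf-8")
    if len(note) > ZIP_MAX_COMMENT:
        # zipfile would truncate this (with a warning), leaving unparseable JSON
        raise ValueError(f"zip note is {len(note)} bytes; the limit is {ZIP_MAX_COMMENT}")
    return note


def read_zip_note(data: bytes) -> dict:
    """Parse the ComicBookInfo/1.0 block back out of an archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return json.loads(zf.comment.decode("utf-8"))["ComicBookInfo/1.0"]


# Round trip: write one placeholder page plus the note, then read the note back
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("001.jpg", b"placeholder bytes")
    zf.comment = build_zip_note({"title": "Example", "tags": ["a", "b"]})

print(read_zip_note(buffer.getvalue()))
```

Checking the size before assignment keeps the failure explicit; shortening `tags` or `comments` until the note fits would be an alternative, but which fields to trim is a decision for this project.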