├── images
├── demo.png
└── logo.png
├── .github
└── ISSUE_TEMPLATE
│ ├── feature_request.md
│ └── bug_report.md
├── LICENSE
├── README.md
└── hako2epub.py
/images/demo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/quantrancse/hako2epub/HEAD/images/demo.png
--------------------------------------------------------------------------------
/images/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/quantrancse/hako2epub/HEAD/images/logo.png
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/feature_request.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: Feature request
3 | about: Suggest an idea for this project
4 | title: ''
5 | labels: ''
6 | assignees: ''
7 |
8 | ---
9 |
10 | **Is your feature request related to a problem? Please describe.**
11 | A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
12 |
13 | **Describe the solution you'd like**
14 | A clear and concise description of what you want to happen.
15 |
16 | **Describe alternatives you've considered**
17 | A clear and concise description of any alternative solutions or features you've considered.
18 |
19 | **Additional context**
20 | Add any other context or screenshots about the feature request here.
21 |
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/bug_report.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: Bug report
3 | about: Create a report to help us improve
4 | title: ''
5 | labels: ''
6 | assignees: ''
7 |
8 | ---
9 |
10 | **Describe the bug**
11 | A clear and concise description of what the bug is.
12 |
13 | **To Reproduce**
14 | Steps to reproduce the behavior:
15 | 1. Go to '...'
16 | 2. Click on '....'
17 | 3. Scroll down to '....'
18 | 4. See error
19 |
20 | **Expected behavior**
21 | A clear and concise description of what you expected to happen.
22 |
23 | **Screenshots**
24 | If applicable, add screenshots to help explain your problem.
25 |
26 | **Desktop (please complete the following information):**
27 | - OS: [e.g. iOS]
28 | - Version [e.g. 22]
29 |
30 | **Additional context**
31 | Add any other context about the problem here.
32 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2021 Tran Trung Quan
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ![hako2epub logo](images/logo.png)
2 |
3 |
4 |
5 |
6 | # hako2epub
7 |
8 |
9 | A tool to download light novels from ln.hako.vn in epub file format for offline reading.
10 |
11 |
12 | [Download](#getting-started)
13 | ·
14 | [Screenshots](#screenshots)
15 | ·
16 | [Script Usage](#usage)
17 |
18 |
19 |
20 | ## Notes
21 | It's recommended to use [**1.1.1.1 Cloudflare WARP**](https://one.one.one.one/) when downloading novels for better performance and reliability.
22 |
23 |
24 | ## Table of Contents
25 |
26 | - [Table of Contents](#table-of-contents)
27 | - [About The Project](#about-the-project)
28 | - [Features](#features)
29 | - [Getting Started](#getting-started)
30 | - [Prerequisites](#prerequisites)
31 | - [Usage](#usage)
32 | - [Notes](#notes)
33 | - [Screenshots](#screenshots)
34 | - [Issues](#issues)
35 | - [Contributing](#contributing)
36 | - [License](#license)
37 | - [Contact](#contact)
38 | - [Acknowledgements](#acknowledgements)
39 |
40 |
41 | ## About The Project
42 |
43 | A tool to download light novels from [ln.hako.vn](https://ln.hako.vn) in epub file format for offline reading.
44 |
45 | **_Notes:_**
46 | * _This tool is a personal standalone project; it is not affiliated with the [ln.hako.vn](https://ln.hako.vn) administrators._
47 | * _If possible, please support the original light novel, the hako website, and the light novel translation authors._
48 | * _This tool is for non-commercial purposes only._
49 |
50 | ### Features
51 | * Works with [docln.net](https://docln.net/) and [docln.sbs](https://docln.sbs/).
52 | * Automatically checks and switches to a working URL.
53 | * Supports all kinds of novels (Truyện dịch, Sáng tác, AI Dịch).
54 | * Supports images.
55 | * Supports navigation and a table of contents.
56 | * Shows notes directly in the light novel content.
57 | * Download all volumes or a single volume of a light novel.
58 | * Download specific chapters of a light novel.
59 | * Update all downloaded light novels or a single one.
60 | * Update new volumes.
61 | * Update new chapters.
62 | * Multiprocessing support to speed up downloads.
63 | * Automatically detects already-downloaded light novels in the directory.
64 | * Automatically checks for a new tool version.
65 |
66 |
67 | ## Getting Started
68 |
69 | For most users: download the executable below, then run it and follow the instructions.
70 |
71 | **Windows**: [hako2epub.exe](https://github.com/quantrancse/hako2epub/releases/download/v2.0.6/hako2epub.exe)
72 |
73 | ### Prerequisites
74 |
75 | * python >= 3.9
76 | * ebooklib
77 | * requests
78 | * bs4
79 | * pillow
80 | * tqdm
81 | * questionary
82 | * argparse (part of the Python standard library; no separate install needed)
83 | ```sh
84 | pip install ebooklib requests bs4 pillow tqdm questionary
85 | ```
86 |
87 | ### Usage
88 | ```text
89 | usage: hako2epub.py [-h] [-v] [-c ln_url] [-u [ln_url]] [ln_url]
90 |
91 | A tool to download light novels from https://ln.hako.vn in epub file format for offline reading.
92 |
93 | positional arguments:
94 | ln_url url to the light novel page
95 |
96 | options:
97 | -h, --help show this help message and exit
98 | -v, --version show program's version number and exit
99 | -c, --chapter ln_url download specific chapters of a light novel
100 | -u, --update [ln_url] update all/single light novel
101 | ```
102 | * Download a light novel
103 | ```sh
104 | python hako2epub.py light_novel_url
105 | ```
106 | * Download specific chapters of a light novel
107 | ```sh
108 | python hako2epub.py -c light_novel_url
109 | ```
110 | * Update all downloaded light novels
111 | ```sh
112 | python hako2epub.py -u
113 | ```
114 | * Update a single downloaded light novel
115 | ```sh
116 | python hako2epub.py -u light_novel_url
117 | ```
118 | ### Notes
119 | * After processing 190 requests in a row, the program pauses for 120 seconds (2 minutes) to avoid being blocked for spamming. Please be patient if it hangs.
120 | * Light novels are downloaded into the same folder as the program.
121 | * Download information is saved to the `ln_info.json` file in the same folder as the program (see the layout sketch below).
122 | * When downloading specific chapters of a light novel, enter the full chapter names at the "from ... to ..." prompt.
123 | * If you update a volume that contains specific chapters, only new chapters after the current latest chapter are added.
124 | * Keep the program and the `ln_info.json` file in the same folder as your downloaded light novels for easier management.
125 |
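126 | A minimal sketch of the `ln_info.json` layout (names and values are placeholders):
127 |
128 | ```json
129 | {
130 |     "ln_list": [
131 |         {
132 |             "ln_name": "Example Light Novel",
133 |             "ln_url": "https://ln.hako.vn/truyen/...",
134 |             "num_vol": 1,
135 |             "vol_list": [
136 |                 { "vol_name": "Volume 1", "num_chapter": 2, "chapter_list": ["Chapter 1", "Chapter 2"] }
137 |             ]
138 |         }
139 |     ]
140 | }
141 | ```
142 |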
126 | ## Screenshots
127 | ![Demo](images/demo.png)
128 |
129 |
130 | ## Issues
131 |
132 | * I have only tested the tool on some of my favorite light novels.
133 | * Sometimes the tool cannot get images from certain image hosts.
134 | * Sometimes you have to wait (usually under 10 seconds) to download or update a light novel (often only for the first light novel in the list). If the wait is much longer, use a VPN (1.1.1.1 Cloudflare WARP) to avoid it.
135 | * If you update a light novel that has been renamed, the whole light novel will be downloaded again. To avoid this, manually rename the epub file path to the new light novel name, matching the current naming format exactly, and also rename the light novel in the `ln_info.json` file.
136 |
137 |
138 | ## Contributing
139 |
140 | Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.
141 |
142 | 1. Fork the Project
143 | 2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
144 | 3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
145 | 4. Push to the Branch (`git push origin feature/AmazingFeature`)
146 | 5. Open a Pull Request
147 |
148 |
149 | ## License
150 |
151 | Distributed under the MIT License. See [LICENSE](LICENSE) for more information.
152 |
153 |
154 | ## Contact
155 |
156 | * **Author** - [@quantrancse](https://quantrancse.github.io)
157 |
158 |
159 | ## Acknowledgements
160 | * [EbookLib](https://github.com/aerkalov/ebooklib)
161 |
--------------------------------------------------------------------------------
/hako2epub.py:
--------------------------------------------------------------------------------
1 | """
2 | hako2epub - A tool to download light novels from ln.hako.vn in EPUB format.
3 |
4 | This tool allows users to download light novels, specific chapters, and update existing downloads.
5 |
6 | Features:
7 | - Download all/single volume of a light novel
8 | - Download specific chapters of a light novel
9 | - Update all/single downloaded light novel
10 | - Support images and navigation
11 | - Support multiprocessing to speed up downloads
12 | """
13 |
14 | import argparse
15 | import json
16 | import re
17 | import time
18 | import logging
19 | from io import BytesIO
20 | from multiprocessing.dummy import Pool as ThreadPool
21 | from os import mkdir
22 | from os.path import isdir, isfile, join
23 | from typing import Dict, List, Optional, Tuple, Any
24 | from dataclasses import dataclass, field
25 |
26 | import questionary
27 | import requests
28 | import tqdm
29 | from bs4 import BeautifulSoup
30 | from ebooklib import epub
31 | from PIL import Image
32 |
33 | # Configure logging
34 | logging.basicConfig(
35 | level=logging.INFO,
36 | format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
37 | )
38 | logger = logging.getLogger(__name__)
39 |
40 | # Constants
41 | DOMAINS = ['ln.hako.vn', 'docln.net', 'docln.sbs']
42 | SLEEP_TIME = 30
43 | LINE_SIZE = 80
44 | THREAD_NUM = 8
45 | HEADERS = {
46 | 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.97 Safari/537.36'
47 | }
48 | TOOL_VERSION = '2.0.6'
49 | HTML_PARSER = 'html.parser'
50 |
51 | # Session for requests
52 | session = requests.Session()
53 |
54 |
55 | @dataclass
56 | class Chapter:
57 | """Represents a chapter in a light novel."""
58 | name: str
59 | url: str
60 |
61 |
62 | @dataclass
63 | class Volume:
64 | """Represents a volume in a light novel."""
65 | url: str = ''
66 | name: str = ''
67 | cover_img: str = ''
68 | num_chapters: int = 0
69 | chapters: Dict[str, str] = field(default_factory=dict) # name -> url
70 |
71 |
72 | @dataclass
73 | class LightNovel:
74 | """Represents a light novel with all its information."""
75 | name: str = ''
76 | url: str = ''
77 | num_volumes: int = 0
78 | author: str = ''
79 | summary: str = ''
80 | series_info: str = ''
81 | fact_item: str = ''
82 | volumes: List[Volume] = field(default_factory=list)
83 |
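84 | # Example (a sketch; names and URLs are hypothetical):
85 | #   vol = Volume(name='Tập 01', url='https://ln.hako.vn/...', num_chapters=1,
86 | #                chapters={'Chương 1': 'https://ln.hako.vn/...'})
87 | #   ln = LightNovel(name='Example', url='https://ln.hako.vn/...', volumes=[vol])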
84 |
85 | class ColorCodes:
86 | """ANSI color codes for terminal output."""
87 | HEADER = '\033[95m'
88 |     OKBLUE = '\033[94m'
89 | OKCYAN = '\033[96m'
90 | OKGREEN = '\033[92m'
91 | OKORANGE = '\033[93m'
92 |     FAIL = '\033[91m'
93 | ENDC = '\033[0m'
94 | BOLD = '\033[1m'
95 | UNDERLINE = '\033[4m'
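96 |     # Usage example: print(f'{ColorCodes.OKGREEN} DONE {ColorCodes.ENDC}') renders " DONE " in green,
97 |     # which is the pattern OutputFormatter uses below.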
96 |
97 |
98 | class NetworkManager:
99 | """Handles network requests with retry logic."""
100 |
101 | @staticmethod
102 | def check_available_request(url: str, stream: bool = False) -> requests.Response:
103 | """
104 | Check if a request to the given URL is available and handle retries.
105 |
106 | Args:
107 | url: The URL to request
108 | stream: Whether to stream the response
109 |
110 | Returns:
111 | The response object
112 |
113 | Raises:
114 | requests.RequestException: If the request fails after retries
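115 |
116 |         Example (a sketch; the path is hypothetical):
117 |             response = NetworkManager.check_available_request('ln.hako.vn/truyen/123')
118 |             # "https://" is prepended automatically and mirror domains are tried in order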
115 | """
116 | if not url.startswith("http"):
117 | url = "https://" + url
118 |
119 | # Try each domain in order until one works
120 | original_url = url
121 | domains_to_try = DOMAINS[:] if DOMAINS else ["ln.hako.vn"]
122 |
123 | # Extract path from URL for domain replacement
124 | path = url
125 | for domain in DOMAINS:
126 | if f"https://{domain}" in url:
127 | path = url.split(f"https://{domain}", 1)[1]
128 | break
129 | elif f"http://{domain}" in url:
130 | path = url.split(f"http://{domain}", 1)[1]
131 | break
132 |
133 | last_exception = None
134 |
135 | # Try each domain
136 | for domain in domains_to_try:
137 | # Construct URL with current domain
138 | if any(f"https://{old_domain}" in original_url or f"http://{old_domain}" in original_url for old_domain in DOMAINS):
139 | url = f"https://{domain}{path}"
140 | else:
141 | url = original_url
142 |
143 | # Update headers with referer
144 | headers = HEADERS.copy()
145 | headers['Referer'] = f'https://{domain}'
146 |
147 | retry_count = 0
148 | max_retries = 3
149 | while retry_count < max_retries:
150 | try:
151 | response = session.get(
152 | url, stream=stream, headers=headers, timeout=30)
153 |                 if response.status_code in range(200, 300):
154 | return response
155 | elif response.status_code == 404:
156 | # Don't retry on 404
157 | break
158 | else:
159 | # Retry on other status codes
160 | retry_count += 1
161 | if retry_count < max_retries:
162 | logger.debug(
163 | f"Request to {url} failed with status {response.status_code}. "
164 | f"Retrying in {SLEEP_TIME}s... (Attempt {retry_count}/{max_retries})"
165 | )
166 | time.sleep(SLEEP_TIME)
167 | except requests.RequestException as e:
168 | retry_count += 1
169 | last_exception = e
170 | if retry_count < max_retries:
171 | logger.debug(
172 | f"Request to {url} failed with exception: {e}. "
173 | f"Retrying in {SLEEP_TIME}s... (Attempt {retry_count}/{max_retries})"
174 | )
175 | time.sleep(SLEEP_TIME)
176 |
177 | # If we get here, this domain failed. Try the next one.
178 | logger.debug(f"Domain {domain} failed, trying next domain...")
179 |
180 | # If all domains failed, raise the last exception
181 | if last_exception:
182 | raise last_exception
183 | else:
184 | # Create a generic exception if we don't have one
185 | raise requests.RequestException(
186 | f"Failed to get response from {original_url} using any domain")
187 |
188 |
189 | class TextUtils:
190 | """Utility functions for text processing."""
191 |
192 | @staticmethod
193 | def format_text(text: str) -> str:
194 | """
195 | Format text by stripping and replacing newlines.
196 |
197 | Args:
198 | text: The text to format
199 |
200 | Returns:
201 | The formatted text
202 | """
203 | return text.strip().replace('\n', '')
204 |
205 | @staticmethod
206 | def format_filename(name: str) -> str:
207 | """
208 | Format filename by removing special characters and limiting length.
209 |
210 | Args:
211 | name: The name to format
212 |
213 | Returns:
214 | The formatted filename
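215 |
216 |         Example (a sketch):
217 |             format_filename('Vol 1: Hello?') -> 'Vol-1-Hello'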
215 | """
216 | special_chars = ['?', '!', '.', ':', '\\',
217 | '/', '<', '>', '|', '*', '"', ',']
218 | for char in special_chars:
219 | name = name.replace(char, '')
220 | name = name.replace(' ', '-')
221 | if len(name) > 100:
222 | name = name[:100]
223 | return name
224 |
225 | @staticmethod
226 | def reformat_url(base_url: str, url: str) -> str:
227 | """
228 | Reformat URL to use the primary domain.
229 |
230 | Args:
231 | base_url: The base URL
232 | url: The URL to reformat
233 |
234 | Returns:
235 | The reformatted URL
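236 |
237 |         Example (a sketch; the path is hypothetical):
238 |             reformat_url('...', '/truyen/123') -> 'ln.hako.vn/truyen/123'
239 |             (no scheme is added here; check_available_request prepends "https://" later)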
236 | """
237 | # Extract domain from base_url
238 | domain = DOMAINS[0] if DOMAINS else "ln.hako.vn"
239 |
240 | # If URL already starts with a domain, replace it with the primary domain
241 | if url.startswith("/"):
242 | return domain + url
243 | else:
244 | # Handle full URLs by replacing the domain
245 | for old_domain in DOMAINS:
246 | if url.startswith(f"https://{old_domain}") or url.startswith(f"http://{old_domain}"):
247 | path = url.split(old_domain, 1)[1]
248 | return f"https://{domain}{path}"
249 | # If no known domain found, just return the URL as is
250 | return url
251 |
252 |
253 | class ImageManager:
254 | """Handles image processing and downloading."""
255 |
256 | @staticmethod
257 | def get_image(image_url: str) -> Optional[Image.Image]:
258 | """
259 | Get image from URL.
260 |
261 | Args:
262 | image_url: The image URL
263 |
264 | Returns:
265 | The image object or None if failed
266 | """
267 | if 'imgur.com' in image_url and '.' not in image_url[-5:]:
268 | image_url += '.jpg'
269 |
270 | try:
271 | response = NetworkManager.check_available_request(
272 | image_url, stream=True)
273 | image = Image.open(response.raw).convert('RGB')
274 | return image
275 | except Exception as e:
276 | logger.error(f"Cannot get image: {image_url} - Error: {e}")
277 | return None
278 |
279 |
280 | class OutputFormatter:
281 | """Handles formatted output to the terminal."""
282 |
283 | @staticmethod
284 | def print_formatted(name: str = '', info: str = '', info_style: str = 'bold fg:orange', prefix: str = '! ') -> None:
285 | """
286 | Print formatted output using questionary.
287 |
288 | Args:
289 | name: The name to print
290 | info: The info to print
291 | info_style: The style for the info
292 | prefix: The prefix for the output
293 | """
294 | questionary.print(prefix, style='bold fg:gray', end='')
295 | questionary.print(name, style='bold fg:white', end='')
296 | questionary.print(info, style=info_style)
297 |
298 | @staticmethod
299 | def print_success(message: str, item_name: str = '') -> None:
300 | """Print a success message."""
301 | if item_name:
302 | print(
303 | f'{message} {ColorCodes.OKCYAN}{item_name}{ColorCodes.ENDC}: [{ColorCodes.OKGREEN} DONE {ColorCodes.ENDC}]')
304 | else:
305 | print(f'{message}: [{ColorCodes.OKGREEN} DONE {ColorCodes.ENDC}]')
306 |
307 | @staticmethod
308 | def print_error(message: str, item_name: str = '') -> None:
309 | """Print an error message."""
310 | if item_name:
311 | print(
312 | f'{message} {ColorCodes.OKCYAN}{item_name}{ColorCodes.ENDC}: [{ColorCodes.FAIL} FAIL {ColorCodes.ENDC}]')
313 | else:
314 | print(f'{message}: [{ColorCodes.FAIL} FAIL {ColorCodes.ENDC}]')
315 |
316 |
317 | class UpdateManager:
318 | """Handles updating of existing light novels."""
319 |
320 | def __init__(self, json_file: str = 'ln_info.json'):
321 | self.json_file = json_file
322 |
323 | def check_updates(self, ln_url: str = 'all') -> None:
324 | """
325 | Check for updates for light novels.
326 |
327 | Args:
328 | ln_url: The light novel URL or 'all' for all novels
329 | """
330 | try:
331 | if not isfile(self.json_file):
332 | logger.warning('Cannot find ln_info.json file!')
333 | return
334 |
335 | with open(self.json_file, 'r', encoding='utf-8') as file:
336 | data = json.load(file)
337 |
338 | ln_list = data.get('ln_list', [])
339 | for ln_data in ln_list:
340 | if ln_url == 'all':
341 | self._check_update_single(ln_data)
342 | elif ln_url == ln_data.get('ln_url'):
343 | self._check_update_single(ln_data, 'updatevol')
344 |
345 | except FileNotFoundError:
346 | logger.error('ln_info.json file not found!')
347 | except json.JSONDecodeError as e:
348 | logger.error(f'Error parsing ln_info.json: {e}')
349 | except Exception as e:
350 | logger.error(f'Error processing ln_info.json: {e}')
351 |
352 | def _check_update_single(self, ln_data: Dict[str, Any], mode: str = '') -> None:
353 | """
354 | Check for updates for a single light novel.
355 |
356 | Args:
357 | ln_data: The light novel data
358 | mode: The update mode
359 | """
360 | ln_name = ln_data.get('ln_name', 'Unknown')
361 | OutputFormatter.print_formatted('Checking update: ', ln_name)
362 | ln_url = ln_data.get('ln_url')
363 |
364 | try:
365 | response = NetworkManager.check_available_request(ln_url)
366 | soup = BeautifulSoup(response.text, HTML_PARSER)
367 |
368 | # Create new light novel object with updated info
369 | new_ln = self._get_updated_ln_info(ln_url, soup)
370 |
371 | if mode == 'updatevol':
372 | self._update_volumes(ln_data, new_ln)
373 | else:
374 | self._update_light_novel(ln_data, new_ln)
375 |
376 | OutputFormatter.print_success('Update', ln_name)
377 | print('-' * LINE_SIZE)
378 |
379 | except requests.RequestException as e:
380 | logger.error(f'Network error while checking light novel info: {e}')
381 | OutputFormatter.print_error('Update', ln_name)
382 | print('Error: Network error while checking light novel info!')
383 | print('-' * LINE_SIZE)
384 | except Exception as e:
385 | logger.error(f'Error checking light novel info: {e}')
386 | OutputFormatter.print_error('Update', ln_name)
387 | print('Error: Cannot check light novel info!')
388 | print('-' * LINE_SIZE)
389 |
390 | def _get_updated_ln_info(self, ln_url: str, soup: BeautifulSoup) -> LightNovel:
391 | """
392 | Get updated light novel information.
393 |
394 | Args:
395 | ln_url: The light novel URL
396 | soup: The parsed HTML
397 |
398 | Returns:
399 | Updated light novel information
400 | """
401 | ln = LightNovel()
402 | ln.url = ln_url
403 |
404 | # Get name
405 | name_element = soup.find('span', 'series-name')
406 | ln.name = TextUtils.format_text(
407 | name_element.text) if name_element else "Unknown Light Novel"
408 |
409 | # Get series info
410 | series_info = soup.find('div', 'series-information')
411 | if series_info:
412 | # Clean up anchor tags
413 | for a in soup.find_all('a'):
414 | try:
415 | del a[':href']
416 | except KeyError:
417 | pass
418 | ln.series_info = str(series_info)
419 |
420 | # Extract author
421 | info_items = series_info.find_all('div', 'info-item')
422 | if info_items:
423 | author_div = info_items[0].find(
424 | 'a') if len(info_items) > 0 else None
425 | if author_div:
426 | ln.author = TextUtils.format_text(author_div.text)
427 | elif len(info_items) > 1:
428 | author_div = info_items[1].find('a')
429 | if author_div:
430 | ln.author = TextUtils.format_text(author_div.text)
431 |
432 | # Get summary
433 | summary_content = soup.find('div', 'summary-content')
434 | if summary_content:
435 |             ln.summary = '<h4>Tóm tắt</h4>' + str(summary_content)
436 |
437 | # Get fact item
438 | fact_item = soup.find('div', 'fact-item')
439 | if fact_item:
440 | ln.fact_item = str(fact_item)
441 |
442 | # Get volumes
443 | volume_sections = soup.find_all('section', 'volume-list')
444 | ln.num_volumes = len(volume_sections)
445 |
446 | for volume_section in volume_sections:
447 | volume = Volume()
448 |
449 | # Get volume name
450 | name_element = volume_section.find('span', 'sect-title')
451 | volume.name = TextUtils.format_text(
452 | name_element.text) if name_element else "Unknown Volume"
453 |
454 | # Get volume URL
455 | cover_element = volume_section.find('div', 'volume-cover')
456 | if cover_element:
457 | a_tag = cover_element.find('a')
458 | if a_tag and a_tag.get('href'):
459 | volume.url = TextUtils.reformat_url(
460 | ln_url, a_tag.get('href'))
461 |
462 | # Get volume details
463 | try:
464 | vol_response = NetworkManager.check_available_request(
465 | volume.url)
466 | vol_soup = BeautifulSoup(
467 | vol_response.text, HTML_PARSER)
468 |
469 | # Get cover image
470 | cover_element = vol_soup.find('div', 'series-cover')
471 | if cover_element:
472 | img_element = cover_element.find(
473 | 'div', 'img-in-ratio')
474 | if img_element and img_element.get('style'):
475 | style = img_element.get('style')
476 | if len(style) > 25:
477 | volume.cover_img = style[23:-2]
478 |
479 | # Get chapters
480 | chapter_list_element = vol_soup.find(
481 | 'ul', 'list-chapters')
482 | if chapter_list_element:
483 | chapter_items = chapter_list_element.find_all('li')
484 | volume.num_chapters = len(chapter_items)
485 |
486 | for chapter_item in chapter_items:
487 | a_tag = chapter_item.find('a')
488 | if a_tag:
489 | chapter_name = TextUtils.format_text(
490 | a_tag.text)
491 | chapter_url = TextUtils.reformat_url(
492 | volume.url, a_tag.get('href'))
493 | volume.chapters[chapter_name] = chapter_url
494 | except Exception as e:
495 | logger.error(f"Error getting volume details: {e}")
496 |
497 | ln.volumes.append(volume)
498 |
499 | return ln
500 |
501 | def _update_volumes(self, old_ln: Dict[str, Any], new_ln: LightNovel) -> None:
502 | """
503 | Update volumes for a light novel.
504 |
505 | Args:
506 | old_ln: The old light novel data
507 | new_ln: The new light novel data
508 | """
509 | old_volume_names = [vol.get('vol_name')
510 | for vol in old_ln.get('vol_list', [])]
511 | new_volume_names = [vol.name for vol in new_ln.volumes]
512 |
513 | existed_prefix = 'Existed: '
514 | new_prefix = 'New: '
515 |
516 | volume_titles = [existed_prefix + name for name in old_volume_names]
517 | all_existed_volumes = f'All existed volumes ({len(old_volume_names)} volumes)'
518 |
519 | all_volumes = ''
520 |
521 | if old_volume_names != new_volume_names:
522 | new_volume_titles = [
523 | new_prefix + name for name in new_volume_names if name not in old_volume_names]
524 | volume_titles += new_volume_titles
525 | all_volumes = f'All volumes ({len(volume_titles)} volumes)'
526 | volume_titles.insert(0, all_existed_volumes)
527 | volume_titles.insert(
528 | 0, questionary.Choice(all_volumes, checked=True))
529 | else:
530 | volume_titles.insert(0, questionary.Choice(
531 | all_existed_volumes, checked=True))
532 |
533 | selected_volumes = questionary.checkbox(
534 | 'Select volumes to update:', choices=volume_titles).ask()
535 |
536 | if selected_volumes:
537 | if all_volumes in selected_volumes:
538 | self._update_light_novel(old_ln, new_ln)
539 | elif all_existed_volumes in selected_volumes:
540 | for volume in new_ln.volumes:
541 | if volume.name in old_volume_names:
542 | self._update_chapters(new_ln, volume, old_ln)
543 | else:
544 | new_volume_names_selected = [
545 | vol[len(new_prefix):] for vol in selected_volumes if new_prefix in vol]
546 | old_volume_names_selected = [
547 | vol[len(existed_prefix):] for vol in selected_volumes if existed_prefix in vol]
548 |
549 | for volume in new_ln.volumes:
550 | if volume.name in old_volume_names_selected:
551 | self._update_chapters(new_ln, volume, old_ln)
552 | elif volume.name in new_volume_names_selected:
553 | self._update_new_volume(new_ln, volume)
554 |
555 | def _update_light_novel(self, old_ln: Dict[str, Any], new_ln: LightNovel) -> None:
556 | """
557 | Update a light novel.
558 |
559 | Args:
560 | old_ln: The old light novel data
561 | new_ln: The new light novel data
562 | """
563 | old_volume_names = [vol.get('vol_name')
564 | for vol in old_ln.get('vol_list', [])]
565 |
566 | for volume in new_ln.volumes:
567 | if volume.name not in old_volume_names:
568 | self._update_new_volume(new_ln, volume)
569 | else:
570 | self._update_chapters(new_ln, volume, old_ln)
571 |
572 | def _update_new_volume(self, ln: LightNovel, volume: Volume) -> None:
573 | """
574 | Update a new volume.
575 |
576 | Args:
577 | ln: The light novel data
578 | volume: The volume to update
579 | """
580 | OutputFormatter.print_formatted(
581 | 'Updating volume: ', volume.name, info_style='bold fg:cyan')
582 |
583 | # Create a temporary light novel with just this volume
584 | temp_ln = LightNovel(
585 | name=ln.name,
586 | url=ln.url,
587 | author=ln.author,
588 | summary=ln.summary,
589 | series_info=ln.series_info,
590 | fact_item=ln.fact_item,
591 | volumes=[volume]
592 | )
593 |
594 | epub_engine = EpubEngine()
595 | epub_engine.create_epub(temp_ln)
596 | OutputFormatter.print_success('Updating volume', volume.name)
597 | print('-' * LINE_SIZE)
598 |
599 | def _update_chapters(self, new_ln: LightNovel, volume: Volume, old_ln: Dict[str, Any]) -> None:
600 | """
601 | Update new chapters in a volume.
602 |
603 | Args:
604 | new_ln: The new light novel data
605 | volume: The volume to update
606 | old_ln: The old light novel data
607 | """
608 | OutputFormatter.print_formatted(
609 | 'Checking volume: ', volume.name, info_style='bold fg:cyan')
610 |
611 | for old_volume in old_ln.get('vol_list', []):
612 | if volume.name == old_volume.get('vol_name'):
613 | new_chapter_names = list(volume.chapters.keys())
614 | old_chapter_names = old_volume.get('chapter_list', [])
615 | volume_chapter_names = []
616 |
617 | for i in range(len(old_chapter_names)):
618 | if old_chapter_names[i] in new_chapter_names:
619 | volume_chapter_names = new_chapter_names[new_chapter_names.index(
620 | old_chapter_names[i]):]
621 | break
622 |
623 | # Remove chapters that already exist or are not in the update range
624 | for chapter_name in list(volume.chapters.keys()):
625 | if chapter_name in old_chapter_names or chapter_name not in volume_chapter_names:
626 | volume.chapters.pop(chapter_name, None)
627 |
628 | if volume.chapters:
629 | OutputFormatter.print_formatted(
630 | 'Updating volume: ', volume.name, info_style='bold fg:cyan')
631 | epub_engine = EpubEngine()
632 | epub_engine.update_epub(new_ln, volume)
633 | OutputFormatter.print_success('Updating', volume.name)
634 |
635 | OutputFormatter.print_success('Checking volume', volume.name)
636 | print('-' * LINE_SIZE)
637 |
638 | def update_json(self, ln: LightNovel) -> None:
639 | """
640 | Update the JSON file with light novel information.
641 |
642 | Args:
643 | ln: The light novel data
644 | """
645 | try:
646 | print('Updating ln_info.json...', end='\r')
647 |
648 | if not isfile(self.json_file):
649 | self._create_json(ln)
650 | return
651 |
652 | with open(self.json_file, 'r', encoding='utf-8') as file:
653 | data = json.load(file)
654 |
655 | ln_urls = [item.get('ln_url') for item in data.get('ln_list', [])]
656 |
657 | if ln.url not in ln_urls:
658 | # Add new light novel
659 | new_ln_data = {
660 | 'ln_name': ln.name,
661 | 'ln_url': ln.url,
662 | 'num_vol': ln.num_volumes,
663 | 'vol_list': [{
664 | 'vol_name': volume.name,
665 | 'num_chapter': volume.num_chapters,
666 | 'chapter_list': list(volume.chapters.keys())
667 | } for volume in ln.volumes]
668 | }
669 | data['ln_list'].append(new_ln_data)
670 | else:
671 | # Update existing light novel
672 | for i, ln_item in enumerate(data.get('ln_list', [])):
673 | if ln.url == ln_item.get('ln_url'):
674 | if ln.name != ln_item.get('ln_name'):
675 | data['ln_list'][i]['ln_name'] = ln.name
676 |
677 | existing_volume_names = [
678 | vol.get('vol_name') for vol in ln_item.get('vol_list', [])]
679 |
680 | for volume in ln.volumes:
681 | if volume.name not in existing_volume_names:
682 | # Add new volume
683 | new_volume = {
684 | 'vol_name': volume.name,
685 | 'num_chapter': volume.num_chapters,
686 | 'chapter_list': list(volume.chapters.keys())
687 | }
688 | data['ln_list'][i]['vol_list'].append(
689 | new_volume)
690 | else:
691 | # Update existing volume chapters
692 | for j, vol_item in enumerate(ln_item.get('vol_list', [])):
693 | if volume.name == vol_item.get('vol_name'):
694 | for chapter_name in volume.chapters.keys():
695 | if chapter_name not in vol_item.get('chapter_list', []):
696 | data['ln_list'][i]['vol_list'][j]['chapter_list'].append(
697 | chapter_name)
698 |
699 | with open(self.json_file, 'w', encoding='utf-8') as file:
700 | json.dump(data, file, indent=4, ensure_ascii=False)
701 |
702 | OutputFormatter.print_success('Updating ln_info.json')
703 | print('-' * LINE_SIZE)
704 |
705 | except FileNotFoundError:
706 | logger.error('ln_info.json file not found!')
707 | OutputFormatter.print_error('Updating ln_info.json')
708 | print('Error: ln_info.json file not found!')
709 | print('-' * LINE_SIZE)
710 | except json.JSONDecodeError as e:
711 | logger.error(f'Error parsing ln_info.json: {e}')
712 | OutputFormatter.print_error('Updating ln_info.json')
713 | print('Error: Invalid JSON in ln_info.json!')
714 | print('-' * LINE_SIZE)
715 | except Exception as e:
716 | logger.error(f'Error updating ln_info.json: {e}')
717 | OutputFormatter.print_error('Updating ln_info.json')
718 | print('Error: Cannot update ln_info.json!')
719 | print('-' * LINE_SIZE)
720 |
721 | def _create_json(self, ln: LightNovel) -> None:
722 | """
723 | Create a new JSON file with light novel information.
724 |
725 | Args:
726 | ln: The light novel data
727 | """
728 | try:
729 | print('Creating ln_info.json...', end='\r')
730 |
731 | data = {
732 | 'ln_list': [{
733 | 'ln_name': ln.name,
734 | 'ln_url': ln.url,
735 | 'num_vol': ln.num_volumes,
736 | 'vol_list': [{
737 | 'vol_name': volume.name,
738 | 'num_chapter': volume.num_chapters,
739 | 'chapter_list': list(volume.chapters.keys())
740 | } for volume in ln.volumes]
741 | }]
742 | }
743 |
744 | with open(self.json_file, 'w', encoding='utf-8') as file:
745 | json.dump(data, file, indent=4, ensure_ascii=False)
746 |
747 | OutputFormatter.print_success('Creating ln_info.json')
748 | print('-' * LINE_SIZE)
749 | except Exception as e:
750 | logger.error(f'Error creating ln_info.json: {e}')
751 | OutputFormatter.print_error('Creating ln_info.json')
752 | print('Error: Cannot create ln_info.json!')
753 | print('-' * LINE_SIZE)
754 |
755 |
756 | class EpubEngine:
757 | """Class for creating and managing EPUB files."""
758 |
759 | def __init__(self, json_file: str = 'ln_info.json'):
760 | self.json_file = json_file
761 | self.book = None
762 | self.light_novel = None
763 | self.volume = None
764 |
765 | def make_cover_image(self) -> Optional[epub.EpubItem]:
766 | """
767 | Create a cover image for the EPUB.
768 |
769 | Returns:
770 | The cover image item or None if failed
771 | """
772 | try:
773 | print('Making cover image...', end='\r')
774 | image = ImageManager.get_image(self.volume.cover_img)
775 | if image is None:
776 | raise Exception("Failed to get cover image")
777 |
778 | buffer = BytesIO()
779 | image.save(buffer, 'jpeg')
780 | image_data = buffer.getvalue()
781 |
782 | cover_image = epub.EpubItem(
783 | file_name='cover_image.jpeg',
784 | media_type='image/jpeg',
785 | content=image_data
786 | )
787 | OutputFormatter.print_success('Making cover image')
788 | return cover_image
789 | except Exception as e:
790 | logger.error(f'Error making cover image: {e}')
791 | OutputFormatter.print_error('Making cover image')
792 | print('Error: Cannot get cover image!')
793 | print('-' * LINE_SIZE)
794 | return None
795 |
796 | def set_metadata(self, title: str, author: str, lang: str = 'vi') -> None:
797 | """
798 | Set metadata for the EPUB book.
799 |
800 | Args:
801 | title: The book title
802 | author: The book author
803 | lang: The book language
804 | """
805 | self.book.set_title(title)
806 | self.book.set_language(lang)
807 | self.book.add_author(author)
808 |
809 | def make_intro_page(self) -> epub.EpubHtml:
810 | """
811 | Create an introduction page for the EPUB.
812 |
813 | Returns:
814 | The introduction page
815 | """
816 | print('Making intro page...', end='\r')
817 | github_url = 'https://github.com/quantrancse/hako2epub'
818 |
819 | intro_html = ''
820 |
821 | cover_image = self.make_cover_image()
822 | if cover_image:
823 | self.book.add_item(cover_image)
824 |             intro_html += '<img src="cover_image.jpeg" alt="cover"/>'
825 |
826 |         intro_html += f'''
827 |             <div>
828 |                 <h1>{self.light_novel.name}</h1>
829 |                 <h3>{self.volume.name}</h3>
830 |             </div>
831 |         '''
832 |
833 | intro_html += self.light_novel.series_info
834 | intro_html += self.light_novel.fact_item
835 |         intro_html += '<hr/>'
836 |
837 | if ':class' in intro_html:
838 | intro_html = intro_html.replace(
839 | '"":class="{ \'fade-in\': more }" ""', '')
840 |
841 | OutputFormatter.print_success('Making intro page')
842 | return epub.EpubHtml(
843 | uid='intro',
844 | file_name='intro.xhtml',
845 | title='Intro',
846 | content=intro_html,
847 | )
848 |
849 | def make_chapters(self, start_index: int = 0) -> None:
850 | """
851 | Create chapters for the EPUB.
852 |
853 | Args:
854 | start_index: Starting chapter index
855 | """
856 | chapter_data = []
857 | for i, (name, url) in enumerate(self.volume.chapters.items(), start_index):
858 | chapter_data.append((i, name, url))
859 |
860 | pool = ThreadPool(THREAD_NUM)
861 | contents = []
862 | try:
863 |             print(
864 |                 '[THE PROCESS WILL PAUSE WHEN IT GETS BLOCKED. PLEASE BE PATIENT IF IT HANGS]')
865 | contents = list(tqdm.tqdm(pool.imap_unordered(self._make_chapter_content, chapter_data),
866 | total=len(chapter_data),
867 | desc='Making chapter contents: '))
868 |             contents = sorted((c for c in contents if c), key=lambda x: x[0])
869 |             contents = [content[1] for content in contents]
870 | except Exception as e:
871 | logger.error(f'Error making chapter contents: {e}')
872 | finally:
873 | pool.close()
874 | pool.join()
875 |
876 | for content in contents:
877 | if content: # Only add if content was successfully created
878 | self.book.add_item(content)
879 | self.book.spine.append(content)
880 | self.book.toc.append(content)
881 |
882 | def _make_chapter_content(self, chapter_data: Tuple[int, str, str]) -> Optional[Tuple[int, epub.EpubHtml]]:
883 | """
884 | Create content for a chapter.
885 |
886 | Args:
887 | chapter_data: Tuple of (index, name, url)
888 |
889 | Returns:
890 | Tuple of (index, chapter content) or None if failed
891 | """
892 | try:
893 | index, name, url = chapter_data
894 |
895 | response = NetworkManager.check_available_request(url)
896 | soup = BeautifulSoup(response.text, HTML_PARSER)
897 |
898 | filename = f'chap_{index + 1}.xhtml'
899 |
900 | # Get chapter title
901 | title_element = soup.find('div', 'title-top')
902 | chapter_title = title_element.find(
903 | 'h4').text if title_element and title_element.find('h4') else f'Chapter {index + 1}'
904 |
905 |             content = f'<h4 align="center">{chapter_title}</h4>'
906 |
907 | # Get chapter content
908 | content_div = soup.find('div', id='chapter-content')
909 | if content_div:
910 | content += self._process_images(content_div, index + 1)
911 |
912 | # Get notes
913 | notes = self._get_chapter_notes(soup)
914 | content = self._replace_notes(content, notes)
915 |
916 | epub_content = epub.EpubHtml(
917 | uid=str(index + 1),
918 | title=chapter_title,
919 | file_name=filename,
920 | content=content
921 | )
922 |
923 | return (index, epub_content)
924 |
925 | except requests.RequestException as e:
926 | logger.error(
927 | f'Network error while getting chapter contents: {e} - URL: {url}')
928 | OutputFormatter.print_error('Making chapter contents')
929 | print(
930 | f'Error: Network error while getting chapter contents! {url}')
931 | print('-' * LINE_SIZE)
932 | return None
933 | except Exception as e:
934 | logger.error(f'Error getting chapter contents: {e} - URL: {url}')
935 | OutputFormatter.print_error('Making chapter contents')
936 | print(f'Error: Cannot get chapter contents! {url}')
937 | print('-' * LINE_SIZE)
938 | return None
939 |
940 | def _process_images(self, content_div: BeautifulSoup, chapter_id: int) -> str:
941 | """
942 | Process images in chapter content.
943 |
944 | Args:
945 | content_div: The chapter content div
946 | chapter_id: The chapter ID
947 |
948 | Returns:
949 | The processed content with images
950 | """
951 |         # Remove unwanted elements
952 |         for flex_div in content_div.find_all('div', class_='flex'): flex_div.decompose()
953 | for element in content_div.find_all('p', {'target': '__blank'}):
954 | element.decompose()
955 |
956 | img_tags = content_div.find_all('img')
957 | content = str(content_div)
958 |
959 | if img_tags:
960 | for i, img_tag in enumerate(img_tags):
961 | img_url = img_tag.get('src')
962 | if img_url and "chapter-banners" not in img_url:
963 | try:
964 | image = ImageManager.get_image(img_url)
965 | if image is None:
966 | continue
967 |
968 | buffer = BytesIO()
969 | image.save(buffer, 'jpeg')
970 | image_data = buffer.getvalue()
971 |
972 | img_path = f'images/chapter_{chapter_id}/image_{i}.jpeg'
973 | image_item = epub.EpubItem(
974 | file_name=img_path,
975 | media_type='image/jpeg',
976 | content=image_data
977 | )
978 |
979 | self.book.add_item(image_item)
980 |
981 | old_path = f'src="{img_url}'
982 | new_path = f'style="display: block;margin-left: auto;margin-right: auto;" src="{img_path}'
983 | content = content.replace(old_path, new_path)
984 | except Exception as e:
985 | logger.error(
986 | f'Error processing chapter image: {e} - Chapter ID: {chapter_id}')
987 | print(
988 | f'Error: Cannot get chapter images! {chapter_id}')
989 | print('-' * LINE_SIZE)
990 | return content
991 |
992 | def _get_chapter_notes(self, soup: BeautifulSoup) -> Dict[str, str]:
993 | """
994 | Get notes from chapter content.
995 |
996 | Args:
997 | soup: The chapter content soup
998 |
999 | Returns:
1000 | Dictionary of notes
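1001 |
1002 |         Example (a sketch; the note id is hypothetical):
1003 |             {'[note12345]': '(Note: translator comment text)'}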
1001 | """
1002 | notes = {}
1003 | note_divs = soup.find_all('div', id=re.compile("^note"))
1004 | for div in note_divs:
1005 | note_id = div.get('id')
1006 | if note_id:
1007 | note_tag = f'[{note_id}]'
1008 | content_span = div.find('span', class_='note-content_real')
1009 | if content_span:
1010 | note_content = content_span.text
1011 | note_text = f'(Note: {note_content})'
1012 | notes[note_tag] = note_text
1013 | return notes
1014 |
1015 | def _replace_notes(self, content: str, notes: Dict[str, str]) -> str:
1016 | """
1017 | Replace note tags in chapter content.
1018 |
1019 | Args:
1020 | content: The chapter content
1021 | notes: Dictionary of notes
1022 |
1023 | Returns:
1024 | The processed chapter content
1025 | """
1026 | for note_tag, note_text in notes.items():
1027 | content = content.replace(note_tag, note_text)
1028 | return content
1029 |
1030 | def bind_epub_book(self) -> None:
1031 | """
1032 | Bind all components into an EPUB book.
1033 | """
1034 | intro_page = self.make_intro_page()
1035 | self.book.add_item(intro_page)
1036 |
1037 | try:
1038 | response = NetworkManager.check_available_request(
1039 | self.volume.cover_img, stream=True)
1040 | self.book.set_cover('cover.jpeg', response.content)
1041 | except requests.RequestException as e:
1042 | logger.error(f'Network error while setting cover image: {e}')
1043 | print('Error: Network error while setting cover image!')
1044 | print('-' * LINE_SIZE)
1045 | except Exception as e:
1046 | logger.error(f'Error setting cover image: {e}')
1047 | print('Error: Cannot set cover image!')
1048 | print('-' * LINE_SIZE)
1049 |
1050 | self.book.spine = ['cover', intro_page, 'nav']
1051 |
1052 | self.make_chapters()
1053 | self.book.add_item(epub.EpubNcx())
1054 | self.book.add_item(epub.EpubNav())
1055 |
1056 | filename = TextUtils.format_filename(
1057 | f'{self.volume.name}-{self.light_novel.name}') + '.epub'
1058 | self.set_metadata(filename, self.light_novel.author)
1059 |
1060 | folder_name = TextUtils.format_filename(self.light_novel.name)
1061 | if not isdir(folder_name):
1062 | mkdir(folder_name)
1063 |
1064 | filepath = join(folder_name, filename)
1065 |
1066 | try:
1067 | epub.write_epub(filepath, self.book, {})
1068 | except Exception as e:
1069 | logger.error(f'Error writing epub file: {e}')
1070 | print('Error: Cannot write epub file!')
1071 | print('-' * LINE_SIZE)
1072 |
1073 | def create_epub(self, ln: LightNovel) -> None:
1074 | """
1075 | Create EPUB files for all volumes.
1076 |
1077 | Args:
1078 | ln: The light novel data
1079 | """
1080 | self.light_novel = ln
1081 | for volume in ln.volumes:
1082 | OutputFormatter.print_formatted(
1083 | 'Processing volume: ', volume.name, info_style='bold fg:cyan')
1084 | self.book = epub.EpubBook()
1085 | self.volume = volume
1086 | self.bind_epub_book()
1087 | OutputFormatter.print_success('Processing', volume.name)
1088 | print('-' * LINE_SIZE)
1089 | self._save_json(ln)
1090 |
1091 | def update_epub(self, ln: LightNovel, volume: Volume) -> None:
1092 | """
1093 | Update an existing EPUB file.
1094 |
1095 | Args:
1096 | ln: The light novel data
1097 | volume: The volume to update
1098 | """
1099 | filename = TextUtils.format_filename(
1100 | f'{volume.name}-{ln.name}') + '.epub'
1101 | folder_name = TextUtils.format_filename(ln.name)
1102 | filepath = join(folder_name, filename)
1103 |
1104 | if isfile(filepath):
1105 | try:
1106 | self.book = epub.read_epub(filepath)
1107 | except Exception as e:
1108 | logger.error(f'Error reading epub file: {e}')
1109 | print('Error: Cannot read epub file!')
1110 | print('-' * LINE_SIZE)
1111 | return
1112 |
1113 | existing_chapters = [item.file_name for item in self.book.get_items()
1114 | if item.file_name.startswith('chap')]
1115 |
1116 | self.light_novel = ln
1117 | self.volume = volume
1118 | self.make_chapters(len(existing_chapters))
1119 |
1120 | # Remove old TOC
1121 | # Create a copy to avoid modification during iteration
1122 | for item in self.book.items[:]:
1123 | if item.file_name == 'toc.ncx':
1124 | self.book.items.remove(item)
1125 |
1126 | self.book.add_item(epub.EpubNcx())
1127 |
1128 | try:
1129 | epub.write_epub(filepath, self.book, {})
1130 | except Exception as e:
1131 | logger.error(f'Error writing epub file: {e}')
1132 | print('Error: Cannot write epub file!')
1133 | print('-' * LINE_SIZE)
1134 |
1135 | self._save_json(ln)
1136 | else:
1137 | print('Cannot find the old light novel path!')
1138 | print('Creating the new one...')
1139 | self.create_epub(ln)
1140 |
1141 | def _save_json(self, ln: LightNovel) -> None:
1142 | """
1143 | Save light novel information to JSON.
1144 |
1145 | Args:
1146 | ln: The light novel data
1147 | """
1148 | update_manager = UpdateManager(self.json_file)
1149 | update_manager.update_json(ln)
1150 |
1151 |
1152 | class LightNovelManager:
1153 | """Manages light novel operations."""
1154 |
1155 | def __init__(self):
1156 | self.json_file = 'ln_info.json'
1157 |
1158 | def _check_domains(self) -> None:
1159 | """Check which domains are accessible."""
1160 | global DOMAINS
1161 | accessible_domains = []
1162 |
1163 | # Always put the primary domain first, then check others
1164 | primary_domain = DOMAINS[0] if DOMAINS else "ln.hako.vn"
1165 |
1166 | # Check primary domain first
1167 | try:
1168 | response = session.get(f"https://{primary_domain}", timeout=10)
1169 | response.raise_for_status()
1170 | accessible_domains.append(primary_domain)
1171 | logger.debug(f"Primary domain {primary_domain} is accessible")
1172 | except requests.RequestException as e:
1173 | logger.debug(
1174 | f"Primary domain {primary_domain} is not accessible: {e}")
1175 |
1176 | # Check other domains
1177 | for domain in DOMAINS[1:]: # Skip the primary domain
1178 | try:
1179 | response = session.get(f"https://{domain}", timeout=10)
1180 | response.raise_for_status()
1181 | accessible_domains.append(domain)
1182 | logger.debug(f"Domain {domain} is accessible")
1183 | except requests.RequestException as e:
1184 | logger.debug(f"Domain {domain} is not accessible: {e}")
1185 |
1186 | DOMAINS = accessible_domains
1187 |
1188 | if not DOMAINS:
1189 | logger.error("No domains are accessible. Exiting.")
1190 | print(
1191 | "Error: No domains are accessible. Please check your internet connection.")
1192 | exit(1)
1193 | else:
1194 | logger.debug(f"Accessible domains: {DOMAINS}")
1195 |
1196 | def _check_for_updates(self) -> None:
1197 | """Check for tool updates."""
1198 | try:
1199 | release_api = 'https://api.github.com/repos/quantrancse/hako2epub/releases/latest'
1200 | response = requests.get(release_api, headers=HEADERS, timeout=5)
1201 | response.raise_for_status()
1202 | data = response.json()
1203 | latest_release = data['tag_name'][1:]
1204 |
1205 | if TOOL_VERSION != latest_release:
1206 | OutputFormatter.print_formatted(
1207 | 'Current tool version: ', TOOL_VERSION, info_style='bold fg:red')
1208 | OutputFormatter.print_formatted(
1209 | 'Latest tool version: ', latest_release, info_style='bold fg:green')
1210 | OutputFormatter.print_formatted(
1211 | 'Please upgrade the tool at: ', 'https://github.com/quantrancse/hako2epub', info_style='bold fg:cyan')
1212 | print('-' * LINE_SIZE)
1213 | except requests.RequestException as e:
1214 | logger.error(f"Failed to check for updates: {e}")
1215 | except KeyError as e:
1216 | logger.error(f"Failed to parse update response: {e}")
1217 | except Exception as e:
1218 | logger.error(f"Unexpected error while checking for updates: {e}")
1219 |
1220 | def _validate_url(self, url: str) -> bool:
1221 | """
1222 | Check if a URL is valid.
1223 |
1224 | Args:
1225 | url: The URL to check
1226 |
1227 | Returns:
1228 | True if valid, False otherwise
1229 | """
1230 | if not any(domain in url for domain in DOMAINS):
1231 | print('Invalid url. Please try again.')
1232 | return False
1233 | return True
1234 |
1235 | def _update_json_file(self) -> None:
1236 | """Update the JSON file by removing entries for deleted novels."""
1237 | try:
1238 | if not isfile(self.json_file):
1239 | return
1240 |
1241 | with open(self.json_file, 'r', encoding='utf-8') as file:
1242 | data = json.load(file)
1243 |
1244 | updated_data = data.copy()
1245 |
1246 | for ln_entry in data.get('ln_list', []):
1247 | ln_name = ln_entry.get('ln_name')
1248 | if not ln_name:
1249 | continue
1250 |
1251 | folder_name = TextUtils.format_filename(ln_name)
1252 | if not isdir(folder_name):
1253 | # Remove entry if folder doesn't exist
1254 | updated_data['ln_list'] = [entry for entry in updated_data['ln_list']
1255 | if entry.get('ln_name') != ln_name]
1256 | else:
1257 | # Check volumes
1258 | updated_volumes = ln_entry.get('vol_list', []).copy()
1259 | for volume_entry in ln_entry.get('vol_list', []):
1260 | volume_name = volume_entry.get('vol_name')
1261 | if not volume_name:
1262 | continue
1263 |
1264 | epub_name = TextUtils.format_filename(
1265 | f'{volume_name}-{ln_name}') + '.epub'
1266 | epub_path = join(folder_name, epub_name)
1267 | if not isfile(epub_path):
1268 | # Remove volume if EPUB doesn't exist
1269 | updated_volumes = [vol for vol in updated_volumes
1270 | if vol.get('vol_name') != volume_name]
1271 |
1272 | # Update the volume list
1273 | for entry in updated_data['ln_list']:
1274 | if ln_entry.get('ln_url') == entry.get('ln_url'):
1275 | entry['vol_list'] = updated_volumes
1276 |
1277 | # Save updated data
1278 | with open(self.json_file, 'w', encoding='utf-8') as file:
1279 | json.dump(updated_data, file, indent=4, ensure_ascii=False)
1280 |
1281 | except FileNotFoundError:
1282 | logger.warning('ln_info.json file not found!')
1283 | except json.JSONDecodeError as e:
1284 | logger.error(f'Error parsing ln_info.json: {e}')
1285 | except Exception as e:
1286 | logger.error(f'Error processing ln_info.json: {e}')
1287 |
1288 | def start(self, ln_url: str, mode: str) -> None:
1289 | """
1290 | Start the light novel manager.
1291 |
1292 | Args:
1293 | ln_url: The light novel URL
1294 | mode: The mode (default, chapter, update, update_all)
1295 | """
1296 | # Check domains and tool updates
1297 | self._check_domains()
1298 | self._check_for_updates()
1299 | self._update_json_file()
1300 |
1301 | if ln_url and self._validate_url(ln_url):
1302 | if mode == 'update':
1303 | update_manager = UpdateManager()
1304 | update_manager.check_updates(ln_url)
1305 | elif mode == 'chapter':
1306 | self._download_chapters(ln_url)
1307 | else:
1308 | self._download_light_novel(ln_url)
1309 | elif mode == 'update_all':
1310 | update_manager = UpdateManager()
1311 | update_manager.check_updates()
1312 | else:
1313 | print('Please provide a valid URL or use update mode.')
1314 |
1315 | def _download_light_novel(self, ln_url: str) -> None:
1316 | """
1317 | Download a light novel.
1318 |
1319 | Args:
1320 | ln_url: The light novel URL
1321 | """
1322 | try:
1323 | response = NetworkManager.check_available_request(ln_url)
1324 | soup = BeautifulSoup(response.text, HTML_PARSER)
1325 |
1326 | if not soup.find('section', 'volume-list'):
1327 | print('Invalid url. Please try again.')
1328 | return
1329 |
1330 | # Create light novel object
1331 | ln = self._parse_light_novel(ln_url, soup)
1332 |
1333 | if ln.volumes:
1334 | epub_engine = EpubEngine()
1335 | epub_engine.create_epub(ln)
1336 |
1337 | except requests.RequestException as e:
1338 | logger.error(f'Network error while checking light novel url: {e}')
1339 | print('Error: Network error while checking light novel url!')
1340 | print('-' * LINE_SIZE)
1341 | except Exception as e:
1342 | logger.error(f'Error checking light novel url: {e}')
1343 | print('Error: Cannot check light novel url!')
1344 | print('-' * LINE_SIZE)
1345 |
1346 | def _download_chapters(self, ln_url: str) -> None:
1347 | """
1348 | Download specific chapters of a light novel.
1349 |
1350 | Args:
1351 | ln_url: The light novel URL
1352 | """
1353 | try:
1354 | response = NetworkManager.check_available_request(ln_url)
1355 | soup = BeautifulSoup(response.text, HTML_PARSER)
1356 |
1357 | if not soup.find('section', 'volume-list'):
1358 | print('Invalid url. Please try again.')
1359 | return
1360 |
1361 | # Create light novel object
1362 | ln = self._parse_light_novel(ln_url, soup, 'chapter')
1363 |
1364 | if ln.volumes:
1365 | epub_engine = EpubEngine()
1366 | epub_engine.create_epub(ln)
1367 |
1368 | except requests.RequestException as e:
1369 | logger.error(f'Network error while checking light novel url: {e}')
1370 | print('Error: Network error while checking light novel url!')
1371 | print('-' * LINE_SIZE)
1372 | except Exception as e:
1373 | logger.error(f'Error checking light novel url: {e}')
1374 | print('Error: Cannot check light novel url!')
1375 | print('-' * LINE_SIZE)
1376 |
1377 | def _parse_light_novel(self, ln_url: str, soup: BeautifulSoup, mode: str = '') -> LightNovel:
1378 | """
1379 | Parse light novel information from HTML.
1380 |
1381 | Args:
1382 | ln_url: The light novel URL
1383 | soup: The parsed HTML
1384 | mode: The mode
1385 |
1386 | Returns:
1387 | The light novel object
1388 | """
1389 | ln = LightNovel()
1390 | ln.url = ln_url
1391 |
1392 | # Get name
1393 | name_element = soup.find('span', 'series-name')
1394 | ln.name = TextUtils.format_text(
1395 | name_element.text) if name_element else "Unknown Light Novel"
1396 | OutputFormatter.print_formatted('Novel: ', ln.name)
1397 |
1398 | # Get series info
1399 | series_info = soup.find('div', 'series-information')
1400 | if series_info:
1401 | # Clean up anchor tags
1402 | for a in soup.find_all('a'):
1403 | try:
1404 | del a[':href']
1405 | except KeyError:
1406 | pass
1407 | ln.series_info = str(series_info)
1408 |
1409 | # Extract author
1410 | info_items = series_info.find_all('div', 'info-item')
1411 | if info_items:
1412 | author_div = info_items[0].find(
1413 | 'a') if len(info_items) > 0 else None
1414 | if author_div:
1415 | ln.author = TextUtils.format_text(author_div.text)
1416 | elif len(info_items) > 1:
1417 | author_div = info_items[1].find('a')
1418 | if author_div:
1419 | ln.author = TextUtils.format_text(author_div.text)
1420 |
1421 | # Get summary
1422 | summary_content = soup.find('div', 'summary-content')
1423 | if summary_content:
1424 |             ln.summary = '<h4>Tóm tắt</h4>' + str(summary_content)
1425 |
1426 | # Get fact item
1427 | fact_item = soup.find('div', 'fact-item')
1428 | if fact_item:
1429 | ln.fact_item = str(fact_item)
1430 |
1431 | # Get volumes
1432 | volume_sections = soup.find_all('section', 'volume-list')
1433 | ln.num_volumes = len(volume_sections)
1434 |
1435 | if mode == 'chapter':
1436 | # For chapter mode, select a single volume
1437 | volume_titles = []
1438 | for volume_section in volume_sections:
1439 | title_element = volume_section.find('span', 'sect-title')
1440 | if title_element:
1441 | volume_titles.append(
1442 | TextUtils.format_text(title_element.text))
1443 |
1444 | if volume_titles:
1445 | selected_title = questionary.select(
1446 |                     'Select a volume to download:', choices=volume_titles, use_shortcuts=True).ask()
1447 |
1448 | if selected_title:
1449 | # Find the selected volume
1450 | for volume_section in volume_sections:
1451 | title_element = volume_section.find(
1452 | 'span', 'sect-title')
1453 | if title_element and TextUtils.format_text(title_element.text) == selected_title:
1454 | volume = self._parse_volume(ln_url, volume_section)
1455 | if volume:
1456 | # For chapter mode, filter chapters
1457 | self._select_chapters(volume)
1458 | ln.volumes.append(volume)
1459 | break
1460 | else:
1461 | # For normal mode, select multiple volumes
1462 | volume_titles = []
1463 | for volume_section in volume_sections:
1464 | title_element = volume_section.find('span', 'sect-title')
1465 | if title_element:
1466 | volume_titles.append(
1467 | TextUtils.format_text(title_element.text))
1468 |
1469 | if volume_titles:
1470 | all_volumes_text = f'All volumes ({len(volume_titles)} volumes)'
1471 | volume_titles.insert(0, questionary.Choice(
1472 | all_volumes_text, checked=True))
1473 |
1474 | selected_titles = questionary.checkbox(
1475 | 'Select volumes to download:', choices=volume_titles).ask()
1476 |
1477 | if selected_titles:
1478 | if all_volumes_text in selected_titles:
1479 | # Download all volumes
1480 | for volume_section in volume_sections:
1481 | volume = self._parse_volume(ln_url, volume_section)
1482 | if volume:
1483 | ln.volumes.append(volume)
1484 | else:
1485 | # Download selected volumes
1486 | selected_titles = [
1487 | title for title in selected_titles if title != all_volumes_text]
1488 | for volume_section in volume_sections:
1489 | title_element = volume_section.find(
1490 | 'span', 'sect-title')
1491 | if title_element and TextUtils.format_text(title_element.text) in selected_titles:
1492 | volume = self._parse_volume(
1493 | ln_url, volume_section)
1494 | if volume:
1495 | ln.volumes.append(volume)
1496 |
1497 | return ln
1498 |
1499 | def _parse_volume(self, ln_url: str, volume_section: BeautifulSoup) -> Optional[Volume]:
1500 | """
1501 | Parse volume information from HTML section.
1502 |
1503 | Args:
1504 | ln_url: The light novel URL
1505 | volume_section: The volume section HTML
1506 |
1507 | Returns:
1508 | The volume object or None if failed
1509 | """
1510 | volume = Volume()
1511 |
1512 | # Get volume name
1513 | name_element = volume_section.find('span', 'sect-title')
1514 | volume.name = TextUtils.format_text(
1515 | name_element.text) if name_element else "Unknown Volume"
1516 |
1517 | # Get volume URL
1518 | cover_element = volume_section.find('div', 'volume-cover')
1519 | if cover_element:
1520 | a_tag = cover_element.find('a')
1521 | if a_tag and a_tag.get('href'):
1522 | volume.url = TextUtils.reformat_url(ln_url, a_tag.get('href'))
1523 |
1524 | # Get volume details
1525 | try:
1526 | response = NetworkManager.check_available_request(
1527 | volume.url)
1528 | soup = BeautifulSoup(response.text, HTML_PARSER)
1529 |
1530 | # Get cover image
1531 | cover_div = soup.find('div', 'series-cover')
1532 | if cover_div:
1533 | img_element = cover_div.find('div', 'img-in-ratio')
1534 | if img_element and img_element.get('style'):
1535 | style = img_element.get('style')
1536 | if len(style) > 25:
1537 | volume.cover_img = style[23:-2]
1538 |
1539 | # Get chapters
1540 | chapter_list = soup.find('ul', 'list-chapters')
1541 | if chapter_list:
1542 | chapter_items = chapter_list.find_all('li')
1543 | volume.num_chapters = len(chapter_items)
1544 |
1545 | for chapter_item in chapter_items:
1546 | a_tag = chapter_item.find('a')
1547 | if a_tag:
1548 | chapter_name = TextUtils.format_text(
1549 | a_tag.text)
1550 | chapter_url = TextUtils.reformat_url(
1551 | volume.url, a_tag.get('href'))
1552 | volume.chapters[chapter_name] = chapter_url
1553 | except Exception as e:
1554 | logger.error(f"Error getting volume details: {e}")
1555 |
1556 | return volume
1557 |
1558 | def _select_chapters(self, volume: Volume) -> None:
1559 | """
1560 | Let user select specific chapters to download.
1561 |
1562 | Args:
1563 | volume: The volume to select chapters from
1564 | """
1565 | if not volume.chapters:
1566 | return
1567 |
1568 | chapter_names = list(volume.chapters.keys())
1569 | from_chapter = questionary.text('Enter from chapter name:').ask()
1570 | to_chapter = questionary.text('Enter to chapter name:').ask()
1571 |
1572 | if from_chapter not in chapter_names or to_chapter not in chapter_names:
1573 | print('Invalid input chapter!')
1574 | volume.chapters = {}
1575 | else:
1576 | from_index = chapter_names.index(from_chapter)
1577 | to_index = chapter_names.index(to_chapter)
1578 |
1579 | if to_index < from_index:
1580 | from_index, to_index = to_index, from_index
1581 |
1582 | selected_names = chapter_names[from_index:to_index+1]
1583 | volume.chapters = {
1584 | name: volume.chapters[name] for name in selected_names
1585 | }
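1586 |         # e.g. entering the exact names 'Chương 1' and 'Chương 3' at the prompts
1587 |         # keeps chapters 1 through 3 of the volume (sketch; names are hypothetical)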
1586 |
1587 |
1588 | def main():
1589 | """Main entry point for the application."""
1590 | parser = argparse.ArgumentParser(
1591 | description='A tool to download light novels from https://ln.hako.vn in epub file format for offline reading.')
1592 | parser.add_argument('-v', '--version', action='version',
1593 | version=f'hako2epub v{TOOL_VERSION}')
1594 | parser.add_argument('ln_url', type=str, nargs='?',
1595 | default='',
1596 | help='url to the light novel page')
1597 | parser.add_argument('-c', '--chapter', type=str, metavar='ln_url',
1598 | help='download specific chapters of a light novel')
1599 | parser.add_argument('-u', '--update', type=str, metavar='ln_url', nargs='?', default=argparse.SUPPRESS,
1600 | help='update all/single light novel')
1601 |
1602 | args = parser.parse_args()
1603 | manager = LightNovelManager()
1604 |
1605 | if args.chapter:
1606 | manager.start(args.chapter, 'chapter')
1607 | elif 'update' in args:
1608 | if args.update:
1609 | manager.start(args.update, 'update')
1610 | else:
1611 | manager.start('', 'update_all')
1612 | else:
1613 | manager.start(args.ln_url, 'default')
1614 |
1615 |
1616 | if __name__ == '__main__':
1617 | main()
1618 |
--------------------------------------------------------------------------------