├── images ├── demo.png └── logo.png ├── .github └── ISSUE_TEMPLATE │ ├── feature_request.md │ └── bug_report.md ├── LICENSE ├── README.md └── hako2epub.py /images/demo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/quantrancse/hako2epub/HEAD/images/demo.png -------------------------------------------------------------------------------- /images/logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/quantrancse/hako2epub/HEAD/images/logo.png -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/feature_request.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Feature request 3 | about: Suggest an idea for this project 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Is your feature request related to a problem? Please describe.** 11 | A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] 12 | 13 | **Describe the solution you'd like** 14 | A clear and concise description of what you want to happen. 15 | 16 | **Describe alternatives you've considered** 17 | A clear and concise description of any alternative solutions or features you've considered. 18 | 19 | **Additional context** 20 | Add any other context or screenshots about the feature request here. 21 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/bug_report.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Bug report 3 | about: Create a report to help us improve 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Describe the bug** 11 | A clear and concise description of what the bug is. 12 | 13 | **To Reproduce** 14 | Steps to reproduce the behavior: 15 | 1. Go to '...' 16 | 2. Click on '....' 17 | 3. Scroll down to '....' 18 | 4. See error 19 | 20 | **Expected behavior** 21 | A clear and concise description of what you expected to happen. 22 | 23 | **Screenshots** 24 | If applicable, add screenshots to help explain your problem. 25 | 26 | **Desktop (please complete the following information):** 27 | - OS: [e.g. iOS] 28 | - Version [e.g. 22] 29 | 30 | **Additional context** 31 | Add any other context about the problem here. 32 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2021 Tran Trung Quan 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 |
3 |

4 | Logo 5 | 6 |

hako2epub

7 | 8 |

9 | A tool to download light novels from ln.hako.vn in epub file format for offline reading. 10 |
11 |
12 | Download 13 | · 14 | Screenshots 15 | · 16 | Script Usage 17 |

18 |

19 | 20 | ## Notes 21 | It's recommended to use [**1.1.1.1 Cloudflare WARP**](https://one.one.one.one/) when downloading novels for better performance and reliability. 22 | 23 | 24 | ## Table of Contents 25 | 26 | - [Table of Contents](#table-of-contents) 27 | - [About The Project](#about-the-project) 28 | - [Features](#features) 29 | - [Getting Started](#getting-started) 30 | - [Prerequisites](#prerequisites) 31 | - [Usage](#usage) 32 | - [Notes](#notes) 33 | - [Screenshots](#screenshots) 34 | - [Issues](#issues) 35 | - [Contributing](#contributing) 36 | - [License](#license) 37 | - [Contact](#contact) 38 | - [Acknowledgements](#acknowledgements) 39 | 40 | 41 | ## About The Project 42 | 43 | A tool to download light novels from [ln.hako.vn](https://ln.hako.vn) in epub file format for offline reading. 44 | 45 | **_Notes:_** 46 | * _This tool is a personal standalone project, it does not have any related to [ln.hako.vn](https://ln.hako.vn) administrators._ 47 | * _If possible, please support the original light novel, hako website, and light novel translation authors._ 48 | * _This tool is for non-commercial purpose only._ 49 | 50 | ### Features 51 | * Working with [docln.net](https://docln.net/) and [docln.sbs](https://docln.sbs/). 52 | * Auto check and switch to working URL. 53 | * Support all kind of novels (Truyện dịch, Sáng tác, AI Dịch). 54 | * Support images. 55 | * Support navigation and table of contents. 56 | * Notes are shown directly in the light novel content. 57 | * Download all/single volume of a light novel. 58 | * Download specific chapters of a light novel. 59 | * Update all/single downloaded light novel. 60 | * Update new volumes. 61 | * Update new chapters. 62 | * Support multiprocessing to speed up. 63 | * Auto get current downloaded light novel in the directory. 64 | * Auto checking the new tool version. 65 | 66 | 67 | ## Getting Started 68 | 69 | For normal user, download the execution file below. Run and follow the instructions. 70 | 71 | **Windows**: [hako2epub.exe](https://github.com/quantrancse/hako2epub/releases/download/v2.0.6/hako2epub.exe) 72 | 73 | ### Prerequisites 74 | 75 | * python >= 3.9 76 | * ebooklib 77 | * requests 78 | * bs4 79 | * pillow 80 | * tqdm 81 | * questionary 82 | * argparse 83 | ```sh 84 | pip install ebooklib requests bs4 pillow argparse tqdm questionary 85 | ``` 86 | 87 | ### Usage 88 | ```text 89 | usage: hako2epub.py [-h] [-v] [-c ln_url] [-u [ln_url]] [ln_url] 90 | 91 | A tool to download light novels from https://ln.hako.vn in epub file format for offline reading. 92 | 93 | positional arguments: 94 | ln_url url to the light novel page 95 | 96 | options: 97 | -h, --help show this help message and exit 98 | -v, --version show program's version number and exit 99 | -c, --chapter ln_url download specific chapters of a light novel 100 | -u, --update [ln_url] update all/single light novel 101 | ``` 102 | * Download a light novel 103 | ```sh 104 | python hako2epub.py light_novel_url 105 | ``` 106 | * Download specific chapters of light novel 107 | ```sh 108 | python hako2epub.py -c light_novel_url 109 | ``` 110 | * Update all downloaded light novels 111 | ```sh 112 | python hako2epub.py -u 113 | ``` 114 | * Update a single downloaded light novel 115 | ```sh 116 | python hako2epub.py -u light_novel_url 117 | ``` 118 | ### Notes 119 | * After processing 190 requests at a time, the program will pause for 120 seconds (2 mins) to avoid spam blocking. Please be patient if it hangs. 120 | * Light novel will be downloaded into the same folder as the program. 121 | * Downloaded information will be saved into `ln_info.json` file located in the same folder as the program. 122 | * If you download specific chapters of a light novel, please enter the full name of the chapter in the "from ... to ..." prompt. 123 | * If you update the volume which contains specific chapters, only new chapters after the current latest chapter will be added. 124 | * Try to keep the program and `ln_info.json` file at the same folder with your downloaded light novels for efficiently management. 125 | 126 | ## Screenshots 127 | ![Demo](images/demo.png) 128 | 129 | 130 | ## Issues 131 | 132 | * I only tested on some of my favorite light novels. 133 | * Sometimes the tool can not get images from some image hosts. 134 | * Sometimes you have to wait (most cases are under 10 seconds) to download or update the light novels (maybe only the first light novel in the list). If you are over that time, you should use a VPN (1.1.1.1 Cloudflare WARP) to avoid this. 135 | * If you update the light novel that was renamed, it will download the whole light novel again. To avoid this, please manually rename the path of the epub file to the new light novel name exactly like the current name format. Also rename the light novel in the `ln_info.json` file. 136 | 137 | 138 | ## Contributing 139 | 140 | Contributions are what make the open source community such an amazing place to be learn, inspire, and create. Any contributions you make are **greatly appreciated**. 141 | 142 | 1. Fork the Project 143 | 2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`) 144 | 3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`) 145 | 4. Push to the Branch (`git push origin feature/AmazingFeature`) 146 | 5. Open a Pull Request 147 | 148 | 149 | ## License 150 | 151 | Distributed under the MIT License. See [LICENSE][license-url] for more information. 152 | 153 | 154 | ## Contact 155 | 156 | * **Author** - [@quantrancse](https://quantrancse.github.io) 157 | 158 | 159 | ## Acknowledgements 160 | * [EbookLib](https://github.com/aerkalov/ebooklib) 161 | -------------------------------------------------------------------------------- /hako2epub.py: -------------------------------------------------------------------------------- 1 | """ 2 | hako2epub - A tool to download light novels from ln.hako.vn in EPUB format. 3 | 4 | This tool allows users to download light novels, specific chapters, and update existing downloads. 5 | 6 | Features: 7 | - Download all/single volume of a light novel 8 | - Download specific chapters of a light novel 9 | - Update all/single downloaded light novel 10 | - Support images and navigation 11 | - Support multiprocessing to speed up downloads 12 | """ 13 | 14 | import argparse 15 | import json 16 | import re 17 | import time 18 | import logging 19 | from io import BytesIO 20 | from multiprocessing.dummy import Pool as ThreadPool 21 | from os import mkdir 22 | from os.path import isdir, isfile, join 23 | from typing import Dict, List, Optional, Tuple, Any 24 | from dataclasses import dataclass, field 25 | 26 | import questionary 27 | import requests 28 | import tqdm 29 | from bs4 import BeautifulSoup 30 | from ebooklib import epub 31 | from PIL import Image 32 | 33 | # Configure logging 34 | logging.basicConfig( 35 | level=logging.INFO, 36 | format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' 37 | ) 38 | logger = logging.getLogger(__name__) 39 | 40 | # Constants 41 | DOMAINS = ['ln.hako.vn', 'docln.net', 'docln.sbs'] 42 | SLEEP_TIME = 30 43 | LINE_SIZE = 80 44 | THREAD_NUM = 8 45 | HEADERS = { 46 | 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.97 Safari/537.36' 47 | } 48 | TOOL_VERSION = '2.0.6' 49 | HTML_PARSER = 'html.parser' 50 | 51 | # Session for requests 52 | session = requests.Session() 53 | 54 | 55 | @dataclass 56 | class Chapter: 57 | """Represents a chapter in a light novel.""" 58 | name: str 59 | url: str 60 | 61 | 62 | @dataclass 63 | class Volume: 64 | """Represents a volume in a light novel.""" 65 | url: str = '' 66 | name: str = '' 67 | cover_img: str = '' 68 | num_chapters: int = 0 69 | chapters: Dict[str, str] = field(default_factory=dict) # name -> url 70 | 71 | 72 | @dataclass 73 | class LightNovel: 74 | """Represents a light novel with all its information.""" 75 | name: str = '' 76 | url: str = '' 77 | num_volumes: int = 0 78 | author: str = '' 79 | summary: str = '' 80 | series_info: str = '' 81 | fact_item: str = '' 82 | volumes: List[Volume] = field(default_factory=list) 83 | 84 | 85 | class ColorCodes: 86 | """ANSI color codes for terminal output.""" 87 | HEADER = '\033[95m' 88 | OKBLUE = '\03[94m' 89 | OKCYAN = '\033[96m' 90 | OKGREEN = '\033[92m' 91 | OKORANGE = '\033[93m' 92 | FAIL = '\03[91m' 93 | ENDC = '\033[0m' 94 | BOLD = '\033[1m' 95 | UNDERLINE = '\033[4m' 96 | 97 | 98 | class NetworkManager: 99 | """Handles network requests with retry logic.""" 100 | 101 | @staticmethod 102 | def check_available_request(url: str, stream: bool = False) -> requests.Response: 103 | """ 104 | Check if a request to the given URL is available and handle retries. 105 | 106 | Args: 107 | url: The URL to request 108 | stream: Whether to stream the response 109 | 110 | Returns: 111 | The response object 112 | 113 | Raises: 114 | requests.RequestException: If the request fails after retries 115 | """ 116 | if not url.startswith("http"): 117 | url = "https://" + url 118 | 119 | # Try each domain in order until one works 120 | original_url = url 121 | domains_to_try = DOMAINS[:] if DOMAINS else ["ln.hako.vn"] 122 | 123 | # Extract path from URL for domain replacement 124 | path = url 125 | for domain in DOMAINS: 126 | if f"https://{domain}" in url: 127 | path = url.split(f"https://{domain}", 1)[1] 128 | break 129 | elif f"http://{domain}" in url: 130 | path = url.split(f"http://{domain}", 1)[1] 131 | break 132 | 133 | last_exception = None 134 | 135 | # Try each domain 136 | for domain in domains_to_try: 137 | # Construct URL with current domain 138 | if any(f"https://{old_domain}" in original_url or f"http://{old_domain}" in original_url for old_domain in DOMAINS): 139 | url = f"https://{domain}{path}" 140 | else: 141 | url = original_url 142 | 143 | # Update headers with referer 144 | headers = HEADERS.copy() 145 | headers['Referer'] = f'https://{domain}' 146 | 147 | retry_count = 0 148 | max_retries = 3 149 | while retry_count < max_retries: 150 | try: 151 | response = session.get( 152 | url, stream=stream, headers=headers, timeout=30) 153 | if response.status_code in range(200, 299): 154 | return response 155 | elif response.status_code == 404: 156 | # Don't retry on 404 157 | break 158 | else: 159 | # Retry on other status codes 160 | retry_count += 1 161 | if retry_count < max_retries: 162 | logger.debug( 163 | f"Request to {url} failed with status {response.status_code}. " 164 | f"Retrying in {SLEEP_TIME}s... (Attempt {retry_count}/{max_retries})" 165 | ) 166 | time.sleep(SLEEP_TIME) 167 | except requests.RequestException as e: 168 | retry_count += 1 169 | last_exception = e 170 | if retry_count < max_retries: 171 | logger.debug( 172 | f"Request to {url} failed with exception: {e}. " 173 | f"Retrying in {SLEEP_TIME}s... (Attempt {retry_count}/{max_retries})" 174 | ) 175 | time.sleep(SLEEP_TIME) 176 | 177 | # If we get here, this domain failed. Try the next one. 178 | logger.debug(f"Domain {domain} failed, trying next domain...") 179 | 180 | # If all domains failed, raise the last exception 181 | if last_exception: 182 | raise last_exception 183 | else: 184 | # Create a generic exception if we don't have one 185 | raise requests.RequestException( 186 | f"Failed to get response from {original_url} using any domain") 187 | 188 | 189 | class TextUtils: 190 | """Utility functions for text processing.""" 191 | 192 | @staticmethod 193 | def format_text(text: str) -> str: 194 | """ 195 | Format text by stripping and replacing newlines. 196 | 197 | Args: 198 | text: The text to format 199 | 200 | Returns: 201 | The formatted text 202 | """ 203 | return text.strip().replace('\n', '') 204 | 205 | @staticmethod 206 | def format_filename(name: str) -> str: 207 | """ 208 | Format filename by removing special characters and limiting length. 209 | 210 | Args: 211 | name: The name to format 212 | 213 | Returns: 214 | The formatted filename 215 | """ 216 | special_chars = ['?', '!', '.', ':', '\\', 217 | '/', '<', '>', '|', '*', '"', ','] 218 | for char in special_chars: 219 | name = name.replace(char, '') 220 | name = name.replace(' ', '-') 221 | if len(name) > 100: 222 | name = name[:100] 223 | return name 224 | 225 | @staticmethod 226 | def reformat_url(base_url: str, url: str) -> str: 227 | """ 228 | Reformat URL to use the primary domain. 229 | 230 | Args: 231 | base_url: The base URL 232 | url: The URL to reformat 233 | 234 | Returns: 235 | The reformatted URL 236 | """ 237 | # Extract domain from base_url 238 | domain = DOMAINS[0] if DOMAINS else "ln.hako.vn" 239 | 240 | # If URL already starts with a domain, replace it with the primary domain 241 | if url.startswith("/"): 242 | return domain + url 243 | else: 244 | # Handle full URLs by replacing the domain 245 | for old_domain in DOMAINS: 246 | if url.startswith(f"https://{old_domain}") or url.startswith(f"http://{old_domain}"): 247 | path = url.split(old_domain, 1)[1] 248 | return f"https://{domain}{path}" 249 | # If no known domain found, just return the URL as is 250 | return url 251 | 252 | 253 | class ImageManager: 254 | """Handles image processing and downloading.""" 255 | 256 | @staticmethod 257 | def get_image(image_url: str) -> Optional[Image.Image]: 258 | """ 259 | Get image from URL. 260 | 261 | Args: 262 | image_url: The image URL 263 | 264 | Returns: 265 | The image object or None if failed 266 | """ 267 | if 'imgur.com' in image_url and '.' not in image_url[-5:]: 268 | image_url += '.jpg' 269 | 270 | try: 271 | response = NetworkManager.check_available_request( 272 | image_url, stream=True) 273 | image = Image.open(response.raw).convert('RGB') 274 | return image 275 | except Exception as e: 276 | logger.error(f"Cannot get image: {image_url} - Error: {e}") 277 | return None 278 | 279 | 280 | class OutputFormatter: 281 | """Handles formatted output to the terminal.""" 282 | 283 | @staticmethod 284 | def print_formatted(name: str = '', info: str = '', info_style: str = 'bold fg:orange', prefix: str = '! ') -> None: 285 | """ 286 | Print formatted output using questionary. 287 | 288 | Args: 289 | name: The name to print 290 | info: The info to print 291 | info_style: The style for the info 292 | prefix: The prefix for the output 293 | """ 294 | questionary.print(prefix, style='bold fg:gray', end='') 295 | questionary.print(name, style='bold fg:white', end='') 296 | questionary.print(info, style=info_style) 297 | 298 | @staticmethod 299 | def print_success(message: str, item_name: str = '') -> None: 300 | """Print a success message.""" 301 | if item_name: 302 | print( 303 | f'{message} {ColorCodes.OKCYAN}{item_name}{ColorCodes.ENDC}: [{ColorCodes.OKGREEN} DONE {ColorCodes.ENDC}]') 304 | else: 305 | print(f'{message}: [{ColorCodes.OKGREEN} DONE {ColorCodes.ENDC}]') 306 | 307 | @staticmethod 308 | def print_error(message: str, item_name: str = '') -> None: 309 | """Print an error message.""" 310 | if item_name: 311 | print( 312 | f'{message} {ColorCodes.OKCYAN}{item_name}{ColorCodes.ENDC}: [{ColorCodes.FAIL} FAIL {ColorCodes.ENDC}]') 313 | else: 314 | print(f'{message}: [{ColorCodes.FAIL} FAIL {ColorCodes.ENDC}]') 315 | 316 | 317 | class UpdateManager: 318 | """Handles updating of existing light novels.""" 319 | 320 | def __init__(self, json_file: str = 'ln_info.json'): 321 | self.json_file = json_file 322 | 323 | def check_updates(self, ln_url: str = 'all') -> None: 324 | """ 325 | Check for updates for light novels. 326 | 327 | Args: 328 | ln_url: The light novel URL or 'all' for all novels 329 | """ 330 | try: 331 | if not isfile(self.json_file): 332 | logger.warning('Cannot find ln_info.json file!') 333 | return 334 | 335 | with open(self.json_file, 'r', encoding='utf-8') as file: 336 | data = json.load(file) 337 | 338 | ln_list = data.get('ln_list', []) 339 | for ln_data in ln_list: 340 | if ln_url == 'all': 341 | self._check_update_single(ln_data) 342 | elif ln_url == ln_data.get('ln_url'): 343 | self._check_update_single(ln_data, 'updatevol') 344 | 345 | except FileNotFoundError: 346 | logger.error('ln_info.json file not found!') 347 | except json.JSONDecodeError as e: 348 | logger.error(f'Error parsing ln_info.json: {e}') 349 | except Exception as e: 350 | logger.error(f'Error processing ln_info.json: {e}') 351 | 352 | def _check_update_single(self, ln_data: Dict[str, Any], mode: str = '') -> None: 353 | """ 354 | Check for updates for a single light novel. 355 | 356 | Args: 357 | ln_data: The light novel data 358 | mode: The update mode 359 | """ 360 | ln_name = ln_data.get('ln_name', 'Unknown') 361 | OutputFormatter.print_formatted('Checking update: ', ln_name) 362 | ln_url = ln_data.get('ln_url') 363 | 364 | try: 365 | response = NetworkManager.check_available_request(ln_url) 366 | soup = BeautifulSoup(response.text, HTML_PARSER) 367 | 368 | # Create new light novel object with updated info 369 | new_ln = self._get_updated_ln_info(ln_url, soup) 370 | 371 | if mode == 'updatevol': 372 | self._update_volumes(ln_data, new_ln) 373 | else: 374 | self._update_light_novel(ln_data, new_ln) 375 | 376 | OutputFormatter.print_success('Update', ln_name) 377 | print('-' * LINE_SIZE) 378 | 379 | except requests.RequestException as e: 380 | logger.error(f'Network error while checking light novel info: {e}') 381 | OutputFormatter.print_error('Update', ln_name) 382 | print('Error: Network error while checking light novel info!') 383 | print('-' * LINE_SIZE) 384 | except Exception as e: 385 | logger.error(f'Error checking light novel info: {e}') 386 | OutputFormatter.print_error('Update', ln_name) 387 | print('Error: Cannot check light novel info!') 388 | print('-' * LINE_SIZE) 389 | 390 | def _get_updated_ln_info(self, ln_url: str, soup: BeautifulSoup) -> LightNovel: 391 | """ 392 | Get updated light novel information. 393 | 394 | Args: 395 | ln_url: The light novel URL 396 | soup: The parsed HTML 397 | 398 | Returns: 399 | Updated light novel information 400 | """ 401 | ln = LightNovel() 402 | ln.url = ln_url 403 | 404 | # Get name 405 | name_element = soup.find('span', 'series-name') 406 | ln.name = TextUtils.format_text( 407 | name_element.text) if name_element else "Unknown Light Novel" 408 | 409 | # Get series info 410 | series_info = soup.find('div', 'series-information') 411 | if series_info: 412 | # Clean up anchor tags 413 | for a in soup.find_all('a'): 414 | try: 415 | del a[':href'] 416 | except KeyError: 417 | pass 418 | ln.series_info = str(series_info) 419 | 420 | # Extract author 421 | info_items = series_info.find_all('div', 'info-item') 422 | if info_items: 423 | author_div = info_items[0].find( 424 | 'a') if len(info_items) > 0 else None 425 | if author_div: 426 | ln.author = TextUtils.format_text(author_div.text) 427 | elif len(info_items) > 1: 428 | author_div = info_items[1].find('a') 429 | if author_div: 430 | ln.author = TextUtils.format_text(author_div.text) 431 | 432 | # Get summary 433 | summary_content = soup.find('div', 'summary-content') 434 | if summary_content: 435 | ln.summary = '

Tóm tắt

' + str(summary_content) 436 | 437 | # Get fact item 438 | fact_item = soup.find('div', 'fact-item') 439 | if fact_item: 440 | ln.fact_item = str(fact_item) 441 | 442 | # Get volumes 443 | volume_sections = soup.find_all('section', 'volume-list') 444 | ln.num_volumes = len(volume_sections) 445 | 446 | for volume_section in volume_sections: 447 | volume = Volume() 448 | 449 | # Get volume name 450 | name_element = volume_section.find('span', 'sect-title') 451 | volume.name = TextUtils.format_text( 452 | name_element.text) if name_element else "Unknown Volume" 453 | 454 | # Get volume URL 455 | cover_element = volume_section.find('div', 'volume-cover') 456 | if cover_element: 457 | a_tag = cover_element.find('a') 458 | if a_tag and a_tag.get('href'): 459 | volume.url = TextUtils.reformat_url( 460 | ln_url, a_tag.get('href')) 461 | 462 | # Get volume details 463 | try: 464 | vol_response = NetworkManager.check_available_request( 465 | volume.url) 466 | vol_soup = BeautifulSoup( 467 | vol_response.text, HTML_PARSER) 468 | 469 | # Get cover image 470 | cover_element = vol_soup.find('div', 'series-cover') 471 | if cover_element: 472 | img_element = cover_element.find( 473 | 'div', 'img-in-ratio') 474 | if img_element and img_element.get('style'): 475 | style = img_element.get('style') 476 | if len(style) > 25: 477 | volume.cover_img = style[23:-2] 478 | 479 | # Get chapters 480 | chapter_list_element = vol_soup.find( 481 | 'ul', 'list-chapters') 482 | if chapter_list_element: 483 | chapter_items = chapter_list_element.find_all('li') 484 | volume.num_chapters = len(chapter_items) 485 | 486 | for chapter_item in chapter_items: 487 | a_tag = chapter_item.find('a') 488 | if a_tag: 489 | chapter_name = TextUtils.format_text( 490 | a_tag.text) 491 | chapter_url = TextUtils.reformat_url( 492 | volume.url, a_tag.get('href')) 493 | volume.chapters[chapter_name] = chapter_url 494 | except Exception as e: 495 | logger.error(f"Error getting volume details: {e}") 496 | 497 | ln.volumes.append(volume) 498 | 499 | return ln 500 | 501 | def _update_volumes(self, old_ln: Dict[str, Any], new_ln: LightNovel) -> None: 502 | """ 503 | Update volumes for a light novel. 504 | 505 | Args: 506 | old_ln: The old light novel data 507 | new_ln: The new light novel data 508 | """ 509 | old_volume_names = [vol.get('vol_name') 510 | for vol in old_ln.get('vol_list', [])] 511 | new_volume_names = [vol.name for vol in new_ln.volumes] 512 | 513 | existed_prefix = 'Existed: ' 514 | new_prefix = 'New: ' 515 | 516 | volume_titles = [existed_prefix + name for name in old_volume_names] 517 | all_existed_volumes = f'All existed volumes ({len(old_volume_names)} volumes)' 518 | 519 | all_volumes = '' 520 | 521 | if old_volume_names != new_volume_names: 522 | new_volume_titles = [ 523 | new_prefix + name for name in new_volume_names if name not in old_volume_names] 524 | volume_titles += new_volume_titles 525 | all_volumes = f'All volumes ({len(volume_titles)} volumes)' 526 | volume_titles.insert(0, all_existed_volumes) 527 | volume_titles.insert( 528 | 0, questionary.Choice(all_volumes, checked=True)) 529 | else: 530 | volume_titles.insert(0, questionary.Choice( 531 | all_existed_volumes, checked=True)) 532 | 533 | selected_volumes = questionary.checkbox( 534 | 'Select volumes to update:', choices=volume_titles).ask() 535 | 536 | if selected_volumes: 537 | if all_volumes in selected_volumes: 538 | self._update_light_novel(old_ln, new_ln) 539 | elif all_existed_volumes in selected_volumes: 540 | for volume in new_ln.volumes: 541 | if volume.name in old_volume_names: 542 | self._update_chapters(new_ln, volume, old_ln) 543 | else: 544 | new_volume_names_selected = [ 545 | vol[len(new_prefix):] for vol in selected_volumes if new_prefix in vol] 546 | old_volume_names_selected = [ 547 | vol[len(existed_prefix):] for vol in selected_volumes if existed_prefix in vol] 548 | 549 | for volume in new_ln.volumes: 550 | if volume.name in old_volume_names_selected: 551 | self._update_chapters(new_ln, volume, old_ln) 552 | elif volume.name in new_volume_names_selected: 553 | self._update_new_volume(new_ln, volume) 554 | 555 | def _update_light_novel(self, old_ln: Dict[str, Any], new_ln: LightNovel) -> None: 556 | """ 557 | Update a light novel. 558 | 559 | Args: 560 | old_ln: The old light novel data 561 | new_ln: The new light novel data 562 | """ 563 | old_volume_names = [vol.get('vol_name') 564 | for vol in old_ln.get('vol_list', [])] 565 | 566 | for volume in new_ln.volumes: 567 | if volume.name not in old_volume_names: 568 | self._update_new_volume(new_ln, volume) 569 | else: 570 | self._update_chapters(new_ln, volume, old_ln) 571 | 572 | def _update_new_volume(self, ln: LightNovel, volume: Volume) -> None: 573 | """ 574 | Update a new volume. 575 | 576 | Args: 577 | ln: The light novel data 578 | volume: The volume to update 579 | """ 580 | OutputFormatter.print_formatted( 581 | 'Updating volume: ', volume.name, info_style='bold fg:cyan') 582 | 583 | # Create a temporary light novel with just this volume 584 | temp_ln = LightNovel( 585 | name=ln.name, 586 | url=ln.url, 587 | author=ln.author, 588 | summary=ln.summary, 589 | series_info=ln.series_info, 590 | fact_item=ln.fact_item, 591 | volumes=[volume] 592 | ) 593 | 594 | epub_engine = EpubEngine() 595 | epub_engine.create_epub(temp_ln) 596 | OutputFormatter.print_success('Updating volume', volume.name) 597 | print('-' * LINE_SIZE) 598 | 599 | def _update_chapters(self, new_ln: LightNovel, volume: Volume, old_ln: Dict[str, Any]) -> None: 600 | """ 601 | Update new chapters in a volume. 602 | 603 | Args: 604 | new_ln: The new light novel data 605 | volume: The volume to update 606 | old_ln: The old light novel data 607 | """ 608 | OutputFormatter.print_formatted( 609 | 'Checking volume: ', volume.name, info_style='bold fg:cyan') 610 | 611 | for old_volume in old_ln.get('vol_list', []): 612 | if volume.name == old_volume.get('vol_name'): 613 | new_chapter_names = list(volume.chapters.keys()) 614 | old_chapter_names = old_volume.get('chapter_list', []) 615 | volume_chapter_names = [] 616 | 617 | for i in range(len(old_chapter_names)): 618 | if old_chapter_names[i] in new_chapter_names: 619 | volume_chapter_names = new_chapter_names[new_chapter_names.index( 620 | old_chapter_names[i]):] 621 | break 622 | 623 | # Remove chapters that already exist or are not in the update range 624 | for chapter_name in list(volume.chapters.keys()): 625 | if chapter_name in old_chapter_names or chapter_name not in volume_chapter_names: 626 | volume.chapters.pop(chapter_name, None) 627 | 628 | if volume.chapters: 629 | OutputFormatter.print_formatted( 630 | 'Updating volume: ', volume.name, info_style='bold fg:cyan') 631 | epub_engine = EpubEngine() 632 | epub_engine.update_epub(new_ln, volume) 633 | OutputFormatter.print_success('Updating', volume.name) 634 | 635 | OutputFormatter.print_success('Checking volume', volume.name) 636 | print('-' * LINE_SIZE) 637 | 638 | def update_json(self, ln: LightNovel) -> None: 639 | """ 640 | Update the JSON file with light novel information. 641 | 642 | Args: 643 | ln: The light novel data 644 | """ 645 | try: 646 | print('Updating ln_info.json...', end='\r') 647 | 648 | if not isfile(self.json_file): 649 | self._create_json(ln) 650 | return 651 | 652 | with open(self.json_file, 'r', encoding='utf-8') as file: 653 | data = json.load(file) 654 | 655 | ln_urls = [item.get('ln_url') for item in data.get('ln_list', [])] 656 | 657 | if ln.url not in ln_urls: 658 | # Add new light novel 659 | new_ln_data = { 660 | 'ln_name': ln.name, 661 | 'ln_url': ln.url, 662 | 'num_vol': ln.num_volumes, 663 | 'vol_list': [{ 664 | 'vol_name': volume.name, 665 | 'num_chapter': volume.num_chapters, 666 | 'chapter_list': list(volume.chapters.keys()) 667 | } for volume in ln.volumes] 668 | } 669 | data['ln_list'].append(new_ln_data) 670 | else: 671 | # Update existing light novel 672 | for i, ln_item in enumerate(data.get('ln_list', [])): 673 | if ln.url == ln_item.get('ln_url'): 674 | if ln.name != ln_item.get('ln_name'): 675 | data['ln_list'][i]['ln_name'] = ln.name 676 | 677 | existing_volume_names = [ 678 | vol.get('vol_name') for vol in ln_item.get('vol_list', [])] 679 | 680 | for volume in ln.volumes: 681 | if volume.name not in existing_volume_names: 682 | # Add new volume 683 | new_volume = { 684 | 'vol_name': volume.name, 685 | 'num_chapter': volume.num_chapters, 686 | 'chapter_list': list(volume.chapters.keys()) 687 | } 688 | data['ln_list'][i]['vol_list'].append( 689 | new_volume) 690 | else: 691 | # Update existing volume chapters 692 | for j, vol_item in enumerate(ln_item.get('vol_list', [])): 693 | if volume.name == vol_item.get('vol_name'): 694 | for chapter_name in volume.chapters.keys(): 695 | if chapter_name not in vol_item.get('chapter_list', []): 696 | data['ln_list'][i]['vol_list'][j]['chapter_list'].append( 697 | chapter_name) 698 | 699 | with open(self.json_file, 'w', encoding='utf-8') as file: 700 | json.dump(data, file, indent=4, ensure_ascii=False) 701 | 702 | OutputFormatter.print_success('Updating ln_info.json') 703 | print('-' * LINE_SIZE) 704 | 705 | except FileNotFoundError: 706 | logger.error('ln_info.json file not found!') 707 | OutputFormatter.print_error('Updating ln_info.json') 708 | print('Error: ln_info.json file not found!') 709 | print('-' * LINE_SIZE) 710 | except json.JSONDecodeError as e: 711 | logger.error(f'Error parsing ln_info.json: {e}') 712 | OutputFormatter.print_error('Updating ln_info.json') 713 | print('Error: Invalid JSON in ln_info.json!') 714 | print('-' * LINE_SIZE) 715 | except Exception as e: 716 | logger.error(f'Error updating ln_info.json: {e}') 717 | OutputFormatter.print_error('Updating ln_info.json') 718 | print('Error: Cannot update ln_info.json!') 719 | print('-' * LINE_SIZE) 720 | 721 | def _create_json(self, ln: LightNovel) -> None: 722 | """ 723 | Create a new JSON file with light novel information. 724 | 725 | Args: 726 | ln: The light novel data 727 | """ 728 | try: 729 | print('Creating ln_info.json...', end='\r') 730 | 731 | data = { 732 | 'ln_list': [{ 733 | 'ln_name': ln.name, 734 | 'ln_url': ln.url, 735 | 'num_vol': ln.num_volumes, 736 | 'vol_list': [{ 737 | 'vol_name': volume.name, 738 | 'num_chapter': volume.num_chapters, 739 | 'chapter_list': list(volume.chapters.keys()) 740 | } for volume in ln.volumes] 741 | }] 742 | } 743 | 744 | with open(self.json_file, 'w', encoding='utf-8') as file: 745 | json.dump(data, file, indent=4, ensure_ascii=False) 746 | 747 | OutputFormatter.print_success('Creating ln_info.json') 748 | print('-' * LINE_SIZE) 749 | except Exception as e: 750 | logger.error(f'Error creating ln_info.json: {e}') 751 | OutputFormatter.print_error('Creating ln_info.json') 752 | print('Error: Cannot create ln_info.json!') 753 | print('-' * LINE_SIZE) 754 | 755 | 756 | class EpubEngine: 757 | """Class for creating and managing EPUB files.""" 758 | 759 | def __init__(self, json_file: str = 'ln_info.json'): 760 | self.json_file = json_file 761 | self.book = None 762 | self.light_novel = None 763 | self.volume = None 764 | 765 | def make_cover_image(self) -> Optional[epub.EpubItem]: 766 | """ 767 | Create a cover image for the EPUB. 768 | 769 | Returns: 770 | The cover image item or None if failed 771 | """ 772 | try: 773 | print('Making cover image...', end='\r') 774 | image = ImageManager.get_image(self.volume.cover_img) 775 | if image is None: 776 | raise Exception("Failed to get cover image") 777 | 778 | buffer = BytesIO() 779 | image.save(buffer, 'jpeg') 780 | image_data = buffer.getvalue() 781 | 782 | cover_image = epub.EpubItem( 783 | file_name='cover_image.jpeg', 784 | media_type='image/jpeg', 785 | content=image_data 786 | ) 787 | OutputFormatter.print_success('Making cover image') 788 | return cover_image 789 | except Exception as e: 790 | logger.error(f'Error making cover image: {e}') 791 | OutputFormatter.print_error('Making cover image') 792 | print('Error: Cannot get cover image!') 793 | print('-' * LINE_SIZE) 794 | return None 795 | 796 | def set_metadata(self, title: str, author: str, lang: str = 'vi') -> None: 797 | """ 798 | Set metadata for the EPUB book. 799 | 800 | Args: 801 | title: The book title 802 | author: The book author 803 | lang: The book language 804 | """ 805 | self.book.set_title(title) 806 | self.book.set_language(lang) 807 | self.book.add_author(author) 808 | 809 | def make_intro_page(self) -> epub.EpubHtml: 810 | """ 811 | Create an introduction page for the EPUB. 812 | 813 | Returns: 814 | The introduction page 815 | """ 816 | print('Making intro page...', end='\r') 817 | github_url = 'https://github.com/quantrancse/hako2epub' 818 | 819 | intro_html = '
' 820 | 821 | cover_image = self.make_cover_image() 822 | if cover_image: 823 | self.book.add_item(cover_image) 824 | intro_html += f'' 825 | 826 | intro_html += f''' 827 |
828 |

{self.light_novel.name}

829 |

{self.volume.name}

830 |
831 | ''' 832 | 833 | intro_html += self.light_novel.series_info 834 | intro_html += self.light_novel.fact_item 835 | intro_html += '
' 836 | 837 | if ':class' in intro_html: 838 | intro_html = intro_html.replace( 839 | '"":class="{ \'fade-in\': more }" ""', '') 840 | 841 | OutputFormatter.print_success('Making intro page') 842 | return epub.EpubHtml( 843 | uid='intro', 844 | file_name='intro.xhtml', 845 | title='Intro', 846 | content=intro_html, 847 | ) 848 | 849 | def make_chapters(self, start_index: int = 0) -> None: 850 | """ 851 | Create chapters for the EPUB. 852 | 853 | Args: 854 | start_index: Starting chapter index 855 | """ 856 | chapter_data = [] 857 | for i, (name, url) in enumerate(self.volume.chapters.items(), start_index): 858 | chapter_data.append((i, name, url)) 859 | 860 | pool = ThreadPool(THREAD_NUM) 861 | contents = [] 862 | try: 863 | print( 864 | '[THE PROCESS WILL BE PAUSE WHEN IT GETTING BLOCK. PLEASE BE PATIENT IF IT HANGS]') 865 | contents = list(tqdm.tqdm(pool.imap_unordered(self._make_chapter_content, chapter_data), 866 | total=len(chapter_data), 867 | desc='Making chapter contents: ')) 868 | contents.sort(key=lambda x: x[0]) 869 | contents = [content[1] for content in contents if content] 870 | except Exception as e: 871 | logger.error(f'Error making chapter contents: {e}') 872 | finally: 873 | pool.close() 874 | pool.join() 875 | 876 | for content in contents: 877 | if content: # Only add if content was successfully created 878 | self.book.add_item(content) 879 | self.book.spine.append(content) 880 | self.book.toc.append(content) 881 | 882 | def _make_chapter_content(self, chapter_data: Tuple[int, str, str]) -> Optional[Tuple[int, epub.EpubHtml]]: 883 | """ 884 | Create content for a chapter. 885 | 886 | Args: 887 | chapter_data: Tuple of (index, name, url) 888 | 889 | Returns: 890 | Tuple of (index, chapter content) or None if failed 891 | """ 892 | try: 893 | index, name, url = chapter_data 894 | 895 | response = NetworkManager.check_available_request(url) 896 | soup = BeautifulSoup(response.text, HTML_PARSER) 897 | 898 | filename = f'chap_{index + 1}.xhtml' 899 | 900 | # Get chapter title 901 | title_element = soup.find('div', 'title-top') 902 | chapter_title = title_element.find( 903 | 'h4').text if title_element and title_element.find('h4') else f'Chapter {index + 1}' 904 | 905 | content = f'

{chapter_title}

' 906 | 907 | # Get chapter content 908 | content_div = soup.find('div', id='chapter-content') 909 | if content_div: 910 | content += self._process_images(content_div, index + 1) 911 | 912 | # Get notes 913 | notes = self._get_chapter_notes(soup) 914 | content = self._replace_notes(content, notes) 915 | 916 | epub_content = epub.EpubHtml( 917 | uid=str(index + 1), 918 | title=chapter_title, 919 | file_name=filename, 920 | content=content 921 | ) 922 | 923 | return (index, epub_content) 924 | 925 | except requests.RequestException as e: 926 | logger.error( 927 | f'Network error while getting chapter contents: {e} - URL: {url}') 928 | OutputFormatter.print_error('Making chapter contents') 929 | print( 930 | f'Error: Network error while getting chapter contents! {url}') 931 | print('-' * LINE_SIZE) 932 | return None 933 | except Exception as e: 934 | logger.error(f'Error getting chapter contents: {e} - URL: {url}') 935 | OutputFormatter.print_error('Making chapter contents') 936 | print(f'Error: Cannot get chapter contents! {url}') 937 | print('-' * LINE_SIZE) 938 | return None 939 | 940 | def _process_images(self, content_div: BeautifulSoup, chapter_id: int) -> str: 941 | """ 942 | Process images in chapter content. 943 | 944 | Args: 945 | content_div: The chapter content div 946 | chapter_id: The chapter ID 947 | 948 | Returns: 949 | The processed content with images 950 | """ 951 | # Remove unwanted elements 952 | content_div.find('div', class_='flex') 953 | for element in content_div.find_all('p', {'target': '__blank'}): 954 | element.decompose() 955 | 956 | img_tags = content_div.find_all('img') 957 | content = str(content_div) 958 | 959 | if img_tags: 960 | for i, img_tag in enumerate(img_tags): 961 | img_url = img_tag.get('src') 962 | if img_url and "chapter-banners" not in img_url: 963 | try: 964 | image = ImageManager.get_image(img_url) 965 | if image is None: 966 | continue 967 | 968 | buffer = BytesIO() 969 | image.save(buffer, 'jpeg') 970 | image_data = buffer.getvalue() 971 | 972 | img_path = f'images/chapter_{chapter_id}/image_{i}.jpeg' 973 | image_item = epub.EpubItem( 974 | file_name=img_path, 975 | media_type='image/jpeg', 976 | content=image_data 977 | ) 978 | 979 | self.book.add_item(image_item) 980 | 981 | old_path = f'src="{img_url}' 982 | new_path = f'style="display: block;margin-left: auto;margin-right: auto;" src="{img_path}' 983 | content = content.replace(old_path, new_path) 984 | except Exception as e: 985 | logger.error( 986 | f'Error processing chapter image: {e} - Chapter ID: {chapter_id}') 987 | print( 988 | f'Error: Cannot get chapter images! {chapter_id}') 989 | print('-' * LINE_SIZE) 990 | return content 991 | 992 | def _get_chapter_notes(self, soup: BeautifulSoup) -> Dict[str, str]: 993 | """ 994 | Get notes from chapter content. 995 | 996 | Args: 997 | soup: The chapter content soup 998 | 999 | Returns: 1000 | Dictionary of notes 1001 | """ 1002 | notes = {} 1003 | note_divs = soup.find_all('div', id=re.compile("^note")) 1004 | for div in note_divs: 1005 | note_id = div.get('id') 1006 | if note_id: 1007 | note_tag = f'[{note_id}]' 1008 | content_span = div.find('span', class_='note-content_real') 1009 | if content_span: 1010 | note_content = content_span.text 1011 | note_text = f'(Note: {note_content})' 1012 | notes[note_tag] = note_text 1013 | return notes 1014 | 1015 | def _replace_notes(self, content: str, notes: Dict[str, str]) -> str: 1016 | """ 1017 | Replace note tags in chapter content. 1018 | 1019 | Args: 1020 | content: The chapter content 1021 | notes: Dictionary of notes 1022 | 1023 | Returns: 1024 | The processed chapter content 1025 | """ 1026 | for note_tag, note_text in notes.items(): 1027 | content = content.replace(note_tag, note_text) 1028 | return content 1029 | 1030 | def bind_epub_book(self) -> None: 1031 | """ 1032 | Bind all components into an EPUB book. 1033 | """ 1034 | intro_page = self.make_intro_page() 1035 | self.book.add_item(intro_page) 1036 | 1037 | try: 1038 | response = NetworkManager.check_available_request( 1039 | self.volume.cover_img, stream=True) 1040 | self.book.set_cover('cover.jpeg', response.content) 1041 | except requests.RequestException as e: 1042 | logger.error(f'Network error while setting cover image: {e}') 1043 | print('Error: Network error while setting cover image!') 1044 | print('-' * LINE_SIZE) 1045 | except Exception as e: 1046 | logger.error(f'Error setting cover image: {e}') 1047 | print('Error: Cannot set cover image!') 1048 | print('-' * LINE_SIZE) 1049 | 1050 | self.book.spine = ['cover', intro_page, 'nav'] 1051 | 1052 | self.make_chapters() 1053 | self.book.add_item(epub.EpubNcx()) 1054 | self.book.add_item(epub.EpubNav()) 1055 | 1056 | filename = TextUtils.format_filename( 1057 | f'{self.volume.name}-{self.light_novel.name}') + '.epub' 1058 | self.set_metadata(filename, self.light_novel.author) 1059 | 1060 | folder_name = TextUtils.format_filename(self.light_novel.name) 1061 | if not isdir(folder_name): 1062 | mkdir(folder_name) 1063 | 1064 | filepath = join(folder_name, filename) 1065 | 1066 | try: 1067 | epub.write_epub(filepath, self.book, {}) 1068 | except Exception as e: 1069 | logger.error(f'Error writing epub file: {e}') 1070 | print('Error: Cannot write epub file!') 1071 | print('-' * LINE_SIZE) 1072 | 1073 | def create_epub(self, ln: LightNovel) -> None: 1074 | """ 1075 | Create EPUB files for all volumes. 1076 | 1077 | Args: 1078 | ln: The light novel data 1079 | """ 1080 | self.light_novel = ln 1081 | for volume in ln.volumes: 1082 | OutputFormatter.print_formatted( 1083 | 'Processing volume: ', volume.name, info_style='bold fg:cyan') 1084 | self.book = epub.EpubBook() 1085 | self.volume = volume 1086 | self.bind_epub_book() 1087 | OutputFormatter.print_success('Processing', volume.name) 1088 | print('-' * LINE_SIZE) 1089 | self._save_json(ln) 1090 | 1091 | def update_epub(self, ln: LightNovel, volume: Volume) -> None: 1092 | """ 1093 | Update an existing EPUB file. 1094 | 1095 | Args: 1096 | ln: The light novel data 1097 | volume: The volume to update 1098 | """ 1099 | filename = TextUtils.format_filename( 1100 | f'{volume.name}-{ln.name}') + '.epub' 1101 | folder_name = TextUtils.format_filename(ln.name) 1102 | filepath = join(folder_name, filename) 1103 | 1104 | if isfile(filepath): 1105 | try: 1106 | self.book = epub.read_epub(filepath) 1107 | except Exception as e: 1108 | logger.error(f'Error reading epub file: {e}') 1109 | print('Error: Cannot read epub file!') 1110 | print('-' * LINE_SIZE) 1111 | return 1112 | 1113 | existing_chapters = [item.file_name for item in self.book.get_items() 1114 | if item.file_name.startswith('chap')] 1115 | 1116 | self.light_novel = ln 1117 | self.volume = volume 1118 | self.make_chapters(len(existing_chapters)) 1119 | 1120 | # Remove old TOC 1121 | # Create a copy to avoid modification during iteration 1122 | for item in self.book.items[:]: 1123 | if item.file_name == 'toc.ncx': 1124 | self.book.items.remove(item) 1125 | 1126 | self.book.add_item(epub.EpubNcx()) 1127 | 1128 | try: 1129 | epub.write_epub(filepath, self.book, {}) 1130 | except Exception as e: 1131 | logger.error(f'Error writing epub file: {e}') 1132 | print('Error: Cannot write epub file!') 1133 | print('-' * LINE_SIZE) 1134 | 1135 | self._save_json(ln) 1136 | else: 1137 | print('Cannot find the old light novel path!') 1138 | print('Creating the new one...') 1139 | self.create_epub(ln) 1140 | 1141 | def _save_json(self, ln: LightNovel) -> None: 1142 | """ 1143 | Save light novel information to JSON. 1144 | 1145 | Args: 1146 | ln: The light novel data 1147 | """ 1148 | update_manager = UpdateManager(self.json_file) 1149 | update_manager.update_json(ln) 1150 | 1151 | 1152 | class LightNovelManager: 1153 | """Manages light novel operations.""" 1154 | 1155 | def __init__(self): 1156 | self.json_file = 'ln_info.json' 1157 | 1158 | def _check_domains(self) -> None: 1159 | """Check which domains are accessible.""" 1160 | global DOMAINS 1161 | accessible_domains = [] 1162 | 1163 | # Always put the primary domain first, then check others 1164 | primary_domain = DOMAINS[0] if DOMAINS else "ln.hako.vn" 1165 | 1166 | # Check primary domain first 1167 | try: 1168 | response = session.get(f"https://{primary_domain}", timeout=10) 1169 | response.raise_for_status() 1170 | accessible_domains.append(primary_domain) 1171 | logger.debug(f"Primary domain {primary_domain} is accessible") 1172 | except requests.RequestException as e: 1173 | logger.debug( 1174 | f"Primary domain {primary_domain} is not accessible: {e}") 1175 | 1176 | # Check other domains 1177 | for domain in DOMAINS[1:]: # Skip the primary domain 1178 | try: 1179 | response = session.get(f"https://{domain}", timeout=10) 1180 | response.raise_for_status() 1181 | accessible_domains.append(domain) 1182 | logger.debug(f"Domain {domain} is accessible") 1183 | except requests.RequestException as e: 1184 | logger.debug(f"Domain {domain} is not accessible: {e}") 1185 | 1186 | DOMAINS = accessible_domains 1187 | 1188 | if not DOMAINS: 1189 | logger.error("No domains are accessible. Exiting.") 1190 | print( 1191 | "Error: No domains are accessible. Please check your internet connection.") 1192 | exit(1) 1193 | else: 1194 | logger.debug(f"Accessible domains: {DOMAINS}") 1195 | 1196 | def _check_for_updates(self) -> None: 1197 | """Check for tool updates.""" 1198 | try: 1199 | release_api = 'https://api.github.com/repos/quantrancse/hako2epub/releases/latest' 1200 | response = requests.get(release_api, headers=HEADERS, timeout=5) 1201 | response.raise_for_status() 1202 | data = response.json() 1203 | latest_release = data['tag_name'][1:] 1204 | 1205 | if TOOL_VERSION != latest_release: 1206 | OutputFormatter.print_formatted( 1207 | 'Current tool version: ', TOOL_VERSION, info_style='bold fg:red') 1208 | OutputFormatter.print_formatted( 1209 | 'Latest tool version: ', latest_release, info_style='bold fg:green') 1210 | OutputFormatter.print_formatted( 1211 | 'Please upgrade the tool at: ', 'https://github.com/quantrancse/hako2epub', info_style='bold fg:cyan') 1212 | print('-' * LINE_SIZE) 1213 | except requests.RequestException as e: 1214 | logger.error(f"Failed to check for updates: {e}") 1215 | except KeyError as e: 1216 | logger.error(f"Failed to parse update response: {e}") 1217 | except Exception as e: 1218 | logger.error(f"Unexpected error while checking for updates: {e}") 1219 | 1220 | def _validate_url(self, url: str) -> bool: 1221 | """ 1222 | Check if a URL is valid. 1223 | 1224 | Args: 1225 | url: The URL to check 1226 | 1227 | Returns: 1228 | True if valid, False otherwise 1229 | """ 1230 | if not any(domain in url for domain in DOMAINS): 1231 | print('Invalid url. Please try again.') 1232 | return False 1233 | return True 1234 | 1235 | def _update_json_file(self) -> None: 1236 | """Update the JSON file by removing entries for deleted novels.""" 1237 | try: 1238 | if not isfile(self.json_file): 1239 | return 1240 | 1241 | with open(self.json_file, 'r', encoding='utf-8') as file: 1242 | data = json.load(file) 1243 | 1244 | updated_data = data.copy() 1245 | 1246 | for ln_entry in data.get('ln_list', []): 1247 | ln_name = ln_entry.get('ln_name') 1248 | if not ln_name: 1249 | continue 1250 | 1251 | folder_name = TextUtils.format_filename(ln_name) 1252 | if not isdir(folder_name): 1253 | # Remove entry if folder doesn't exist 1254 | updated_data['ln_list'] = [entry for entry in updated_data['ln_list'] 1255 | if entry.get('ln_name') != ln_name] 1256 | else: 1257 | # Check volumes 1258 | updated_volumes = ln_entry.get('vol_list', []).copy() 1259 | for volume_entry in ln_entry.get('vol_list', []): 1260 | volume_name = volume_entry.get('vol_name') 1261 | if not volume_name: 1262 | continue 1263 | 1264 | epub_name = TextUtils.format_filename( 1265 | f'{volume_name}-{ln_name}') + '.epub' 1266 | epub_path = join(folder_name, epub_name) 1267 | if not isfile(epub_path): 1268 | # Remove volume if EPUB doesn't exist 1269 | updated_volumes = [vol for vol in updated_volumes 1270 | if vol.get('vol_name') != volume_name] 1271 | 1272 | # Update the volume list 1273 | for entry in updated_data['ln_list']: 1274 | if ln_entry.get('ln_url') == entry.get('ln_url'): 1275 | entry['vol_list'] = updated_volumes 1276 | 1277 | # Save updated data 1278 | with open(self.json_file, 'w', encoding='utf-8') as file: 1279 | json.dump(updated_data, file, indent=4, ensure_ascii=False) 1280 | 1281 | except FileNotFoundError: 1282 | logger.warning('ln_info.json file not found!') 1283 | except json.JSONDecodeError as e: 1284 | logger.error(f'Error parsing ln_info.json: {e}') 1285 | except Exception as e: 1286 | logger.error(f'Error processing ln_info.json: {e}') 1287 | 1288 | def start(self, ln_url: str, mode: str) -> None: 1289 | """ 1290 | Start the light novel manager. 1291 | 1292 | Args: 1293 | ln_url: The light novel URL 1294 | mode: The mode (default, chapter, update, update_all) 1295 | """ 1296 | # Check domains and tool updates 1297 | self._check_domains() 1298 | self._check_for_updates() 1299 | self._update_json_file() 1300 | 1301 | if ln_url and self._validate_url(ln_url): 1302 | if mode == 'update': 1303 | update_manager = UpdateManager() 1304 | update_manager.check_updates(ln_url) 1305 | elif mode == 'chapter': 1306 | self._download_chapters(ln_url) 1307 | else: 1308 | self._download_light_novel(ln_url) 1309 | elif mode == 'update_all': 1310 | update_manager = UpdateManager() 1311 | update_manager.check_updates() 1312 | else: 1313 | print('Please provide a valid URL or use update mode.') 1314 | 1315 | def _download_light_novel(self, ln_url: str) -> None: 1316 | """ 1317 | Download a light novel. 1318 | 1319 | Args: 1320 | ln_url: The light novel URL 1321 | """ 1322 | try: 1323 | response = NetworkManager.check_available_request(ln_url) 1324 | soup = BeautifulSoup(response.text, HTML_PARSER) 1325 | 1326 | if not soup.find('section', 'volume-list'): 1327 | print('Invalid url. Please try again.') 1328 | return 1329 | 1330 | # Create light novel object 1331 | ln = self._parse_light_novel(ln_url, soup) 1332 | 1333 | if ln.volumes: 1334 | epub_engine = EpubEngine() 1335 | epub_engine.create_epub(ln) 1336 | 1337 | except requests.RequestException as e: 1338 | logger.error(f'Network error while checking light novel url: {e}') 1339 | print('Error: Network error while checking light novel url!') 1340 | print('-' * LINE_SIZE) 1341 | except Exception as e: 1342 | logger.error(f'Error checking light novel url: {e}') 1343 | print('Error: Cannot check light novel url!') 1344 | print('-' * LINE_SIZE) 1345 | 1346 | def _download_chapters(self, ln_url: str) -> None: 1347 | """ 1348 | Download specific chapters of a light novel. 1349 | 1350 | Args: 1351 | ln_url: The light novel URL 1352 | """ 1353 | try: 1354 | response = NetworkManager.check_available_request(ln_url) 1355 | soup = BeautifulSoup(response.text, HTML_PARSER) 1356 | 1357 | if not soup.find('section', 'volume-list'): 1358 | print('Invalid url. Please try again.') 1359 | return 1360 | 1361 | # Create light novel object 1362 | ln = self._parse_light_novel(ln_url, soup, 'chapter') 1363 | 1364 | if ln.volumes: 1365 | epub_engine = EpubEngine() 1366 | epub_engine.create_epub(ln) 1367 | 1368 | except requests.RequestException as e: 1369 | logger.error(f'Network error while checking light novel url: {e}') 1370 | print('Error: Network error while checking light novel url!') 1371 | print('-' * LINE_SIZE) 1372 | except Exception as e: 1373 | logger.error(f'Error checking light novel url: {e}') 1374 | print('Error: Cannot check light novel url!') 1375 | print('-' * LINE_SIZE) 1376 | 1377 | def _parse_light_novel(self, ln_url: str, soup: BeautifulSoup, mode: str = '') -> LightNovel: 1378 | """ 1379 | Parse light novel information from HTML. 1380 | 1381 | Args: 1382 | ln_url: The light novel URL 1383 | soup: The parsed HTML 1384 | mode: The mode 1385 | 1386 | Returns: 1387 | The light novel object 1388 | """ 1389 | ln = LightNovel() 1390 | ln.url = ln_url 1391 | 1392 | # Get name 1393 | name_element = soup.find('span', 'series-name') 1394 | ln.name = TextUtils.format_text( 1395 | name_element.text) if name_element else "Unknown Light Novel" 1396 | OutputFormatter.print_formatted('Novel: ', ln.name) 1397 | 1398 | # Get series info 1399 | series_info = soup.find('div', 'series-information') 1400 | if series_info: 1401 | # Clean up anchor tags 1402 | for a in soup.find_all('a'): 1403 | try: 1404 | del a[':href'] 1405 | except KeyError: 1406 | pass 1407 | ln.series_info = str(series_info) 1408 | 1409 | # Extract author 1410 | info_items = series_info.find_all('div', 'info-item') 1411 | if info_items: 1412 | author_div = info_items[0].find( 1413 | 'a') if len(info_items) > 0 else None 1414 | if author_div: 1415 | ln.author = TextUtils.format_text(author_div.text) 1416 | elif len(info_items) > 1: 1417 | author_div = info_items[1].find('a') 1418 | if author_div: 1419 | ln.author = TextUtils.format_text(author_div.text) 1420 | 1421 | # Get summary 1422 | summary_content = soup.find('div', 'summary-content') 1423 | if summary_content: 1424 | ln.summary = '

Tóm tắt

' + str(summary_content) 1425 | 1426 | # Get fact item 1427 | fact_item = soup.find('div', 'fact-item') 1428 | if fact_item: 1429 | ln.fact_item = str(fact_item) 1430 | 1431 | # Get volumes 1432 | volume_sections = soup.find_all('section', 'volume-list') 1433 | ln.num_volumes = len(volume_sections) 1434 | 1435 | if mode == 'chapter': 1436 | # For chapter mode, select a single volume 1437 | volume_titles = [] 1438 | for volume_section in volume_sections: 1439 | title_element = volume_section.find('span', 'sect-title') 1440 | if title_element: 1441 | volume_titles.append( 1442 | TextUtils.format_text(title_element.text)) 1443 | 1444 | if volume_titles: 1445 | selected_title = questionary.select( 1446 | 'Select volumes to download:', choices=volume_titles, use_shortcuts=True).ask() 1447 | 1448 | if selected_title: 1449 | # Find the selected volume 1450 | for volume_section in volume_sections: 1451 | title_element = volume_section.find( 1452 | 'span', 'sect-title') 1453 | if title_element and TextUtils.format_text(title_element.text) == selected_title: 1454 | volume = self._parse_volume(ln_url, volume_section) 1455 | if volume: 1456 | # For chapter mode, filter chapters 1457 | self._select_chapters(volume) 1458 | ln.volumes.append(volume) 1459 | break 1460 | else: 1461 | # For normal mode, select multiple volumes 1462 | volume_titles = [] 1463 | for volume_section in volume_sections: 1464 | title_element = volume_section.find('span', 'sect-title') 1465 | if title_element: 1466 | volume_titles.append( 1467 | TextUtils.format_text(title_element.text)) 1468 | 1469 | if volume_titles: 1470 | all_volumes_text = f'All volumes ({len(volume_titles)} volumes)' 1471 | volume_titles.insert(0, questionary.Choice( 1472 | all_volumes_text, checked=True)) 1473 | 1474 | selected_titles = questionary.checkbox( 1475 | 'Select volumes to download:', choices=volume_titles).ask() 1476 | 1477 | if selected_titles: 1478 | if all_volumes_text in selected_titles: 1479 | # Download all volumes 1480 | for volume_section in volume_sections: 1481 | volume = self._parse_volume(ln_url, volume_section) 1482 | if volume: 1483 | ln.volumes.append(volume) 1484 | else: 1485 | # Download selected volumes 1486 | selected_titles = [ 1487 | title for title in selected_titles if title != all_volumes_text] 1488 | for volume_section in volume_sections: 1489 | title_element = volume_section.find( 1490 | 'span', 'sect-title') 1491 | if title_element and TextUtils.format_text(title_element.text) in selected_titles: 1492 | volume = self._parse_volume( 1493 | ln_url, volume_section) 1494 | if volume: 1495 | ln.volumes.append(volume) 1496 | 1497 | return ln 1498 | 1499 | def _parse_volume(self, ln_url: str, volume_section: BeautifulSoup) -> Optional[Volume]: 1500 | """ 1501 | Parse volume information from HTML section. 1502 | 1503 | Args: 1504 | ln_url: The light novel URL 1505 | volume_section: The volume section HTML 1506 | 1507 | Returns: 1508 | The volume object or None if failed 1509 | """ 1510 | volume = Volume() 1511 | 1512 | # Get volume name 1513 | name_element = volume_section.find('span', 'sect-title') 1514 | volume.name = TextUtils.format_text( 1515 | name_element.text) if name_element else "Unknown Volume" 1516 | 1517 | # Get volume URL 1518 | cover_element = volume_section.find('div', 'volume-cover') 1519 | if cover_element: 1520 | a_tag = cover_element.find('a') 1521 | if a_tag and a_tag.get('href'): 1522 | volume.url = TextUtils.reformat_url(ln_url, a_tag.get('href')) 1523 | 1524 | # Get volume details 1525 | try: 1526 | response = NetworkManager.check_available_request( 1527 | volume.url) 1528 | soup = BeautifulSoup(response.text, HTML_PARSER) 1529 | 1530 | # Get cover image 1531 | cover_div = soup.find('div', 'series-cover') 1532 | if cover_div: 1533 | img_element = cover_div.find('div', 'img-in-ratio') 1534 | if img_element and img_element.get('style'): 1535 | style = img_element.get('style') 1536 | if len(style) > 25: 1537 | volume.cover_img = style[23:-2] 1538 | 1539 | # Get chapters 1540 | chapter_list = soup.find('ul', 'list-chapters') 1541 | if chapter_list: 1542 | chapter_items = chapter_list.find_all('li') 1543 | volume.num_chapters = len(chapter_items) 1544 | 1545 | for chapter_item in chapter_items: 1546 | a_tag = chapter_item.find('a') 1547 | if a_tag: 1548 | chapter_name = TextUtils.format_text( 1549 | a_tag.text) 1550 | chapter_url = TextUtils.reformat_url( 1551 | volume.url, a_tag.get('href')) 1552 | volume.chapters[chapter_name] = chapter_url 1553 | except Exception as e: 1554 | logger.error(f"Error getting volume details: {e}") 1555 | 1556 | return volume 1557 | 1558 | def _select_chapters(self, volume: Volume) -> None: 1559 | """ 1560 | Let user select specific chapters to download. 1561 | 1562 | Args: 1563 | volume: The volume to select chapters from 1564 | """ 1565 | if not volume.chapters: 1566 | return 1567 | 1568 | chapter_names = list(volume.chapters.keys()) 1569 | from_chapter = questionary.text('Enter from chapter name:').ask() 1570 | to_chapter = questionary.text('Enter to chapter name:').ask() 1571 | 1572 | if from_chapter not in chapter_names or to_chapter not in chapter_names: 1573 | print('Invalid input chapter!') 1574 | volume.chapters = {} 1575 | else: 1576 | from_index = chapter_names.index(from_chapter) 1577 | to_index = chapter_names.index(to_chapter) 1578 | 1579 | if to_index < from_index: 1580 | from_index, to_index = to_index, from_index 1581 | 1582 | selected_names = chapter_names[from_index:to_index+1] 1583 | volume.chapters = { 1584 | name: volume.chapters[name] for name in selected_names 1585 | } 1586 | 1587 | 1588 | def main(): 1589 | """Main entry point for the application.""" 1590 | parser = argparse.ArgumentParser( 1591 | description='A tool to download light novels from https://ln.hako.vn in epub file format for offline reading.') 1592 | parser.add_argument('-v', '--version', action='version', 1593 | version=f'hako2epub v{TOOL_VERSION}') 1594 | parser.add_argument('ln_url', type=str, nargs='?', 1595 | default='', 1596 | help='url to the light novel page') 1597 | parser.add_argument('-c', '--chapter', type=str, metavar='ln_url', 1598 | help='download specific chapters of a light novel') 1599 | parser.add_argument('-u', '--update', type=str, metavar='ln_url', nargs='?', default=argparse.SUPPRESS, 1600 | help='update all/single light novel') 1601 | 1602 | args = parser.parse_args() 1603 | manager = LightNovelManager() 1604 | 1605 | if args.chapter: 1606 | manager.start(args.chapter, 'chapter') 1607 | elif 'update' in args: 1608 | if args.update: 1609 | manager.start(args.update, 'update') 1610 | else: 1611 | manager.start('', 'update_all') 1612 | else: 1613 | manager.start(args.ln_url, 'default') 1614 | 1615 | 1616 | if __name__ == '__main__': 1617 | main() 1618 | --------------------------------------------------------------------------------