├── .github
│   └── workflows
│       └── python-package.yml
├── LICENSE
├── README.md
├── config.yml
└── main.py


/.github/workflows/python-package.yml:
--------------------------------------------------------------------------------
 1 | # This workflow will install Python dependencies, run tests and lint with a variety of Python versions
 2 | # For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python
 3 | 
 4 | name: Python package
 5 | 
 6 | on:
 7 |   push:
 8 |     branches: [ "main" ]
 9 |   pull_request:
10 |     branches: [ "main" ]
11 | 
12 | jobs:
13 |   build:
14 | 
15 |     runs-on: ubuntu-latest
16 |     strategy:
17 |       fail-fast: false
18 |       matrix:
19 |         python-version: ["3.9", "3.10", "3.11"]
20 | 
21 |     steps:
22 |     - uses: actions/checkout@v4
23 |     - name: Set up Python ${{ matrix.python-version }}
24 |       uses: actions/setup-python@v5
25 |       with:
26 |         python-version: ${{ matrix.python-version }}
27 |     - name: Install dependencies
28 |       run: |
29 |         python -m pip install --upgrade pip
30 |         python -m pip install flake8 pytest
31 |         if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
32 |     - name: Lint with flake8
33 |       run: |
34 |         # stop the build if there are Python syntax errors or undefined names
35 |         flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
36 |         # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
37 |         flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
38 |     - name: Test with pytest
39 |       run: |
40 |         pytest
41 | 
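The `Test with pytest` step runs `pytest` at the repository root, but the tree above contains no test files, and pytest exits with a non-zero status (5) when it collects no tests. The following is a minimal sketch of a test module the step could pick up. The file name `test_sorting.py` is an assumption, and the helper is copied from `main.py` rather than imported, because importing `main.py` also requires `pillow`, `pyyaml`, and `jmcomic`, which this workflow does not install.

```python
# test_sorting.py (hypothetical file name, not part of the repository)
import os


def sorted_numeric_filenames(file_list):
    """Copy of the helper in main.py: sort names by the digits they contain."""
    def extract_number(s):
        name, _ = os.path.splitext(s)
        return int(''.join(filter(str.isdigit, name)) or 0)
    return sorted(file_list, key=extract_number)


def test_numeric_order():
    # "10.jpg" must come after "2.jpg", unlike plain string sorting.
    names = ["10.jpg", "2.jpg", "1.jpg"]
    assert sorted_numeric_filenames(names) == ["1.jpg", "2.jpg", "10.jpg"]


def test_names_without_digits_sort_first():
    # A name with no digits is treated as 0 by the helper.
    assert sorted_numeric_filenames(["3.png", "cover.png"]) == ["cover.png", "3.png"]
```

With a file like this in the repository root, `pytest` collects two tests and the step can pass without any third-party dependency.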


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2023 salikx
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | ## Thanks to the following project
  2 | [![Readme Card](https://github-readme-stats.vercel.app/api/pin/?username=hect0x7&repo=JMComic-Crawler-Python)](https://github.com/hect0x7/JMComic-Crawler-Python)
  3 | 
  4 | 
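  5 | # 📄 Batch Image-to-PDF Conversion Script
  6 | 
  7 | This project merges images stored in per-chapter folders into a single PDF file. It automatically walks every folder under a specified directory and aims to keep memory usage low.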
  8 | 
  9 | ---
 10 | 
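 11 | ## 📌 Features
 12 | 
 13 | * Walks the specified root directory, processes the images in its subfolders, and generates a PDF for each.
 14 | * Supported image formats: `JPG` / `JPEG` / `PNG` / `WEBP` / `BMP`.
 15 | * Sorts subdirectories and images in numeric order so that pages come out in the right sequence.
 16 | * Detects PDFs that already exist to avoid converting the same folder twice.
 17 | * Error handling with log messages: automatically skips problematic images and empty subdirectories.
 18 | * Memory optimization: images are processed one at a time through a generator rather than loaded all at once.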
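The format filter in the list above can be sketched as a case-insensitive extension check. The helper name `is_supported_image` is hypothetical and used only for illustration; the set of extensions is the one listed above.

```python
import os

# Extensions taken from the feature list above.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def is_supported_image(filename):
    """Return True if the file name ends in a supported image extension."""
    return os.path.splitext(filename)[1].lower() in ALLOWED_EXTENSIONS


candidates = ["1.JPG", "2.webp", "notes.txt", "cover.gif"]
print([name for name in candidates if is_supported_image(name)])  # ['1.JPG', '2.webp']
```

Lower-casing the extension means `1.JPG` and `1.jpg` are treated the same way.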
 19 | 
 20 | ---
 21 | 
 22 | ## 📂 Example directory structure
 23 | 
 24 | ```
 25 | root_directory/
 26 | ├── 001/
 27 | │   ├── 1.jpg
 28 | │   ├── 2.jpg
 29 | │   └── …
 30 | ├── 002/
 31 | │   ├── 1.png
 32 | │   ├── 2.png
 33 | │   └── …
 34 | ├── Chapter3/
 35 | │   ├── 01.webp
 36 | │   ├── 02.webp
 37 | │   └── …
 38 | └── main.py
 39 | ```
 40 | 
 41 | The generated PDFs are saved under `root_directory`:
 42 | 
 43 | ```
 44 | root_directory/
 45 | ├── 001.pdf
 46 | ├── 002.pdf
 47 | ├── Chapter3.pdf
 48 | └── …
 49 | ```
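Because each PDF is named after its folder and saved next to it, checking whether a folder still needs converting comes down to looking for a sibling `<folder>.pdf`. The sketch below illustrates that check. The function name `folders_without_pdf` is hypothetical, and the demonstration uses a throwaway temporary directory rather than real data.

```python
import os
import tempfile


def folders_without_pdf(base_dir):
    """Return the names of subfolders of base_dir that have no sibling <name>.pdf yet."""
    pending = []
    for entry in os.scandir(base_dir):
        pdf_path = os.path.join(base_dir, f"{entry.name}.pdf")
        if entry.is_dir() and not os.path.exists(pdf_path):
            pending.append(entry.name)
    return sorted(pending)


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "001"))
    os.mkdir(os.path.join(root, "002"))
    open(os.path.join(root, "002.pdf"), "wb").close()  # pretend 002 was converted earlier
    print(folders_without_pdf(root))  # ['001']
```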
 50 | 
 51 | ---
 52 | 
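 53 | ## ⚙️ Dependencies
 54 | 
 55 | Make sure the following dependencies are installed:
 56 | 
 57 | ```bash
 58 | pip install pillow pyyaml jmcomic
 59 | ```
 60 | 
 61 | > **Note:** `jmcomic` is used to load the configuration file. Make sure it is installed before use, or replace it with your own configuration-loading logic.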
 62 | 
 63 | ---
 64 | 
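 65 | ## 📄 Usage
 66 | 
 67 | 1. **Clone or download the code.**
 68 | 
 69 | 2. **Make sure the configuration file exists and is set up correctly:**
 70 | 
 71 | The path to the configuration file must be specified in the script: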
 72 | 
 73 | ```python
 74 | config_path = "D:/18comic_down/code/config.yml"
 75 | ```
 76 | 
 77 | The configuration file must contain the root directory setting:
 78 | 
 79 | ```yaml
 80 | dir_rule:
 81 |   base_dir: "D:/your_base_directory"
 82 | ```
 83 | 
 84 | 3. **Run the script:**
 85 | 
 86 | ```bash
 87 | python main.py
 88 | ```
 89 | 
 90 | > To customize parameters or paths, edit `config_path` and the related parameters.
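Since the only setting the conversion needs from the configuration is `dir_rule.base_dir`, it can help to validate that key explicitly and fail with a clear message. The sketch below assumes the YAML file has already been parsed into a dictionary (for example with `yaml.safe_load`); the helper name `get_base_dir` is hypothetical.

```python
def get_base_dir(config):
    """Extract dir_rule.base_dir from an already-parsed config mapping."""
    try:
        base_dir = config["dir_rule"]["base_dir"]
    except (KeyError, TypeError) as exc:
        # KeyError: a key is missing; TypeError: config or dir_rule is not a mapping.
        raise ValueError("config must define dir_rule.base_dir") from exc
    if not isinstance(base_dir, str) or not base_dir.strip():
        raise ValueError("dir_rule.base_dir must be a non-empty string")
    return base_dir


# The dict below stands in for the result of parsing the YAML file.
print(get_base_dir({"dir_rule": {"base_dir": "D:/your_base_directory"}}))
```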
 91 | 
 92 | ---
 93 | 
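 94 | ## 🔧 Parameters
 95 | 
 96 | | Parameter     | Description                                    | Notes       |
 97 | | ------------- | ---------------------------------------------- | ----------- |
 98 | | `config_path` | Path to the configuration file                 | YAML format |
 99 | | `base_dir`    | Root directory that contains the image folders | Must exist  |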
100 | 
101 | ---
102 | 
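103 | ## 🚩 Implementation details
104 | 
105 | * **Memory optimization:**
106 | 
107 |   * Images are opened one at a time through a generator, so memory is only used while an image is being processed, which helps keep peak memory usage down.
108 | * **Sorting rules:**
109 | 
110 |   * Subfolders with purely numeric names are sorted numerically; non-numeric folders come last.
111 |   * Image files are sorted by the digits in their file names so that the PDF pages are assembled in the right order.
112 | * **Error handling:**
113 | 
114 |   * Skips subdirectories that cannot be read, with a warning.
115 |   * Skips image files that cannot be read or are corrupted.
116 |   * Checks whether the target PDF already exists to avoid converting it again.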
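The subfolder ordering described above can be sketched with a sort key that maps purely numeric names to their integer value and everything else to infinity. The first key mirrors the rule as stated; the second is an illustrative variant (not taken from the script) that additionally orders non-numeric names alphabetically instead of leaving them in their input order.

```python
def subdir_sort_key(name):
    """Numeric names sort by value; any other name sorts after all of them."""
    return int(name) if name.isdigit() else float("inf")


def subdir_sort_key_with_tiebreak(name):
    """Variant: numeric names first by value, then other names alphabetically."""
    return (0, int(name), "") if name.isdigit() else (1, 0, name)


print(sorted(["10", "Chapter3", "2", "001"], key=subdir_sort_key))
# ['001', '2', '10', 'Chapter3']
print(sorted(["b", "2", "a", "10"], key=subdir_sort_key_with_tiebreak))
# ['2', '10', 'a', 'b']
```

With the first key, all non-numeric names share the same key value, so Python's stable sort keeps them in whatever order the directory listing returned them.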
117 | 
118 | ---
119 | 
120 | ## 📑 Sample log output
121 | 
122 | ```
123 | 📄 转换中：001
124 | 开始生成PDF：D:\your_base_directory\001.pdf
125 | ✅ 成功生成PDF：D:\your_base_directory\001.pdf
126 | 处理完成，耗时 5.23 秒
127 | 
128 | 跳过已有PDF：002.pdf
129 | 
130 | 📄 转换中：Chapter3
131 | 开始生成PDF：D:\your_base_directory\Chapter3.pdf
132 | ✅ 成功生成PDF：D:\your_base_directory\Chapter3.pdf
133 | 处理完成，耗时 8.47 秒
134 | ```
135 | 
136 | ---
137 | 
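138 | ## ❓ FAQ
139 | 
140 | 1. **The configuration file fails to load:**
141 | 
142 |    * Check that the `config_path` path is correct.
143 |    * Make sure `config.yml` exists and is well-formed.
144 | 
145 | 2. **No image files found:**
146 | 
147 |    * Make sure the subdirectories contain images in a supported format.
148 |    * Check the subdirectory names: purely numeric names such as `001` and `002` are sorted numerically, and other names are processed last.
149 | 
150 | 3. **High memory usage:**
151 | 
152 |    * The script already uses a generator to reduce memory usage. If problems persist, check the image resolution or try splitting the image folders.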
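One way to "split the image folders" without moving any files is to split the sorted list of image paths into fixed-size batches and write one PDF per batch. The sketch below shows only the batching step. The helper name `chunked` and the batch size are assumptions, and how each batch is then saved (for example as `001_part1.pdf`) is left open.

```python
def chunked(paths, size):
    """Yield successive lists of at most `size` items from `paths`."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch = []
    for path in paths:
        batch.append(path)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


pages = [f"{i}.jpg" for i in range(1, 6)]
for part, batch in enumerate(chunked(pages, 2), start=1):
    print(f"part {part}: {batch}")
```

Only one batch of paths is held at a time, so the number of images opened for a single PDF is bounded by `size`.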
153 | 
154 | ---
155 | 
156 | ## 📬 Contact
157 | 
158 | If you have questions or suggestions, please open an issue in the repository.
159 | 


--------------------------------------------------------------------------------
/config.yml:
--------------------------------------------------------------------------------
 1 | # Configuration for the GitHub Actions download script
 2 | version: '2.0'
 3 | 
 4 | dir_rule:
 5 |   base_dir: D:/18comic_down/books
 6 |   rule: Bd_Atitle_Pindex
 7 | 
 8 | client:
 9 |   domain:
10 |     - 18comic.vip
11 |     - 18comic.org
12 | 
13 | download:
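14 |   cache: true # Skip files that already exist on disk instead of downloading them again
15 |   image:
16 |     decode: true # JMComic's original images are scrambled; whether to restore them
17 |     suffix: .jpg # Convert all images to .jpg
18 |   threading:
19 |     # batch_count: number of threads used to batch-download the images of a chapter
20 |     # Higher value: faster downloads, higher hardware requirements, more load on JMComic
21 |     # Lower value: slower downloads, lower hardware requirements, less load on JMComic
22 |     # PS: a JMComic web page usually requests 50 images at a time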
23 |     batch_count: 45


--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | import time
  3 | import yaml
  4 | from PIL import Image
  5 | import jmcomic
  6 | 
  7 | def sorted_numeric_filenames(file_list):
  8 |     """Sort file names by the digits they contain."""
  9 |     def extract_number(s):
 10 |         name, _ = os.path.splitext(s)
 11 |         return int(''.join(filter(str.isdigit, name)) or 0)
 12 |     return sorted(file_list, key=extract_number)
 13 | 
 14 | def convert_images_to_pdf(input_folder, output_path, pdf_name):
 15 |     start_time = time.time()
 16 |     allowed_extensions = {'.jpg', '.jpeg', '.png', '.webp', '.bmp'}
 17 |     output_path = os.path.normpath(output_path)
 18 |     os.makedirs(output_path, exist_ok=True)
 19 |     pdf_full_path = os.path.join(output_path, f"{os.path.splitext(pdf_name)[0]}.pdf")
 20 | 
 21 |     image_iterator = []
 22 | 
 23 |     # Collect the subdirectories and sort them (numeric names first)
 24 |     try:
 25 |         subdirs = sorted(
 26 |             [d for d in os.listdir(input_folder) if os.path.isdir(os.path.join(input_folder, d))],
 27 |             key=lambda x: int(x) if x.isdigit() else float('inf')
 28 |         )
 29 |     except Exception as e:
 30 |         print(f"错误：无法读取目录 {input_folder}，原因：{e}")
 31 |         return
 32 | 
 33 |     for subdir in subdirs:
 34 |         subdir_path = os.path.join(input_folder, subdir)
 35 |         try:
 36 |             files = [f for f in os.listdir(subdir_path)
 37 |                      if os.path.isfile(os.path.join(subdir_path, f)) and os.path.splitext(f)[1].lower() in allowed_extensions]
 38 |             files = sorted_numeric_filenames(files)
 39 |             for f in files:
 40 |                 image_iterator.append(os.path.join(subdir_path, f))
 41 |         except Exception as e:
 42 |             print(f"警告：读取子目录失败 {subdir_path}，原因：{e}")
 43 | 
 44 |     if not image_iterator:
 45 |         print("错误：未找到任何图片文件")
 46 |         return
 47 | 
 48 |     try:
 49 |         def open_image(path):
 50 |             img = Image.open(path)
 51 |             if img.mode != 'RGB':
 52 |                 img = img.convert('RGB')
 53 |             return img
 54 | 
 55 |         # Open images lazily through a generator; the first image is the PDF's base page
 56 |         image_iter = (open_image(p) for p in image_iterator)
 57 |         first_image = next(image_iter, None)
 58 | 
 59 |         if not first_image:
 60 |             print("错误：没有有效图片可生成PDF")
 61 |             return
 62 | 
 63 |         print(f"开始生成PDF：{pdf_full_path}")
 64 |         first_image.save(
 65 |             pdf_full_path,
 66 |             "PDF",
 67 |             save_all=True,
 68 |             append_images=list(image_iter),  # opens every remaining image before saving
 69 |             optimize=True
 70 |         )
 71 |         print(f"✅ 成功生成PDF：{pdf_full_path}")
 72 | 
 73 |     except Exception as e:
 74 |         print(f"❌ 生成PDF失败：{e}")
 75 | 
 76 |     print(f"处理完成，耗时 {time.time() - start_time:.2f} 秒")
 77 | 
 78 | def main():
 79 |     config_path = "D:/18comic_down/code/config.yml"
 80 |     try:
 81 |         option = jmcomic.JmOption.from_file(config_path)
 82 |         with open(config_path, "r", encoding="utf-8") as f:
 83 |             config = yaml.safe_load(f)
 84 |             base_dir = config["dir_rule"]["base_dir"]
 85 |     except Exception as e:
 86 |         print(f"加载配置失败：{e}")
 87 |         return
 88 | 
 89 |     if not os.path.exists(base_dir):
 90 |         print(f"错误：根目录不存在 {base_dir}")
 91 |         return
 92 | 
 93 |     for entry in os.scandir(base_dir):
 94 |         if entry.is_dir():
 95 |             pdf_name = f"{entry.name}.pdf"
 96 |             pdf_path = os.path.join(base_dir, pdf_name)
 97 |             if os.path.exists(pdf_path):
 98 |                 print(f"跳过已有PDF：{pdf_name}")
 99 |                 continue
100 | 
101 |             print(f"\n📄 转换中：{entry.name}")
102 |             convert_images_to_pdf(
103 |                 input_folder=entry.path,
104 |                 output_path=base_dir,
105 |                 pdf_name=entry.name
106 |             )
107 | 
108 | if __name__ == "__main__":
109 |     main()
110 | 
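`main()` hard-codes `config_path` to an absolute Windows path, so the script only runs unmodified on a machine with that exact layout. The sketch below shows one possible alternative: use an environment variable if it is set, and otherwise look for `config.yml` next to the script. The function name `resolve_config_path` and the variable name `PDF_CONFIG_PATH` are hypothetical and not part of the repository.

```python
import os
from pathlib import Path


def resolve_config_path(script_file, env=os.environ):
    """Pick the config path: environment override first, then config.yml beside the script."""
    override = env.get("PDF_CONFIG_PATH")  # hypothetical variable name
    if override:
        return Path(override)
    return Path(script_file).resolve().parent / "config.yml"


# Inside main.py this would be called as resolve_config_path(__file__).
print(resolve_config_path("main.py", env={}))
```

Passing `env` as a parameter keeps the function easy to test without touching the real process environment.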


--------------------------------------------------------------------------------