├── .gitignore
├── LICENSE
├── README.md
├── assets
    ├── image-1.png
    ├── image-20250112190109165.png
    ├── image-20250112192705855.png
    ├── image-20250112192810066.png
    ├── image-20250112192921071.png
    ├── image-20250112193112943.png
    ├── image-20250112193244857.png
    ├── image-20250112193320521.png
    ├── image-20250112193821807.png
    ├── image-20250112193847388.png
    ├── image-20250112193938194.png
    ├── image-20250112194025460.png
    ├── image-20250112194142442.png
    ├── image-20250112194229557.png
    ├── image-20250112194256356.png
    ├── image.png
    └── image1.png
├── csdn2md_node.js
├── csdn2md_tampermonkey.js
├── csdn2md_tampermonkey_v2.js
├── examples
    ├── README.md
    ├── case1.html
    ├── case1.md
    ├── case1.pdf
    ├── case2.html
    ├── case2.md
    └── case2.pdf
└── test.js

/.gitignore:
--------------------------------------------------------------------------------
1 | node_modules
2 | node_modules/*
3 | package-lock.json
4 | package.json

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | # PolyForm Strict License 1.0.0
2 |
3 | <https://polyformproject.org/licenses/strict/1.0.0>
4 |
5 | ## Acceptance
6 |
7 | In order to get any license under these terms, you must agree
8 | to them as both strict obligations and conditions to all
9 | your licenses.
10 |
11 | ## Copyright License
12 |
13 | The licensor grants you a copyright license for the software
14 | to do everything you might do with the software that would
15 | otherwise infringe the licensor's copyright in it for any
16 | permitted purpose, other than distributing the software or
17 | making changes or new works based on the software.
18 |
19 | ## Patent License
20 |
21 | The licensor grants you a patent license for the software that
22 | covers patent claims the licensor can license, or becomes able
23 | to license, that you would infringe by using the software.
24 |
25 | ## Noncommercial Purposes
26 |
27 | Any noncommercial purpose is a permitted purpose.
28 |
29 | ## Personal Uses
30 |
31 | Personal use for research, experiment, and testing for
32 | the benefit of public knowledge, personal study, private
33 | entertainment, hobby projects, amateur pursuits, or religious
34 | observance, without any anticipated commercial application,
35 | is use for a permitted purpose.
36 |
37 | ## Noncommercial Organizations
38 |
39 | Use by any charitable organization, educational institution,
40 | public research organization, public safety or health
41 | organization, environmental protection organization,
42 | or government institution is use for a permitted purpose
43 | regardless of the source of funding or obligations resulting
44 | from the funding.
45 |
46 | ## Fair Use
47 |
48 | You may have "fair use" rights for the software under the
49 | law. These terms do not limit them.
50 |
51 | ## No Other Rights
52 |
53 | These terms do not allow you to sublicense or transfer any of
54 | your licenses to anyone else, or prevent the licensor from
55 | granting licenses to anyone else. These terms do not imply
56 | any other licenses.
57 |
58 | ## Patent Defense
59 |
60 | If you make any written claim that the software infringes or
61 | contributes to infringement of any patent, your patent license
62 | for the software granted under these terms ends immediately. If
63 | your company makes such a claim, your patent license ends
64 | immediately for work on behalf of your company.
65 |
66 | ## Violations
67 |
68 | The first time you are notified in writing that you have
69 | violated any of these terms, or done anything with the software
70 | not covered by your licenses, your licenses can nonetheless
71 | continue if you come into full compliance with these terms,
72 | and take practical steps to correct past violations, within
73 | 32 days of receiving notice. Otherwise, all your licenses
74 | end immediately.
75 |
76 | ## No Liability
77 |
78 | ***As far as the law allows, the software comes as is, without
79 | any warranty or condition, and the licensor will not be liable
80 | to you for any damages arising out of these terms or the use
81 | or nature of the software, under any kind of legal claim.***
82 |
83 | ## Definitions
84 |
85 | The **licensor** is the individual or entity offering these
86 | terms, and the **software** is the software the licensor makes
87 | available under these terms.
88 |
89 | **You** refers to the individual or entity agreeing to these
90 | terms.
91 |
92 | **Your company** is any legal entity, sole proprietorship,
93 | or other kind of organization that you work for, plus all
94 | organizations that have control over, are under the control of,
95 | or are under common control with that organization. **Control**
96 | means ownership of substantially all the assets of an entity,
97 | or the power to direct its management and policies by vote,
98 | contract, or otherwise. Control can be direct or indirect.
99 |
100 | **Your licenses** are all the licenses granted to you for the
101 | software under these terms.
102 |
103 | **Use** means anything you do with the software requiring one
104 | of your licenses.
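105 |

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # csdn2md - Batch-download CSDN articles and columns (Tampermonkey userscript supported)
2 |
3 | ![GitHub license](https://img.shields.io/github/license/Qalxry/csdn2md)
4 | ![GitHub stars](https://img.shields.io/github/stars/Qalxry/csdn2md)
5 | ![GitHub forks](https://img.shields.io/github/forks/Qalxry/csdn2md)
6 | ![GitHub issues](https://img.shields.io/github/issues/Qalxry/csdn2md)
7 |
8 | ## Introduction
9 |
10 | `csdn2md` is a tool for ***batch-downloading CSDN articles and columns and converting them to Markdown files***. It comes in two versions: a **Tampermonkey userscript** and a Node.js script.
11 |
12 | ![Interface](./assets/image1.png)
13 |
14 | After downloading the Markdown files, you can convert them to other formats with [Pandoc](https://pandoc.org/), or open them in [Typora](https://typora.io/) and export them to other formats (PDF, DOCX, etc.).
15 |
16 | > Motivation:
17 | >
18 | > - Many existing CSDN article downloaders no longer work, and those that still do only partially support CSDN's article formatting.
19 | > - KaTeX formulas in particular: the author wants formulas saved as LaTeX rather than as images or garbled text.
20 | > - The author also wants dedicated, complete support for every feature of the CSDN editors.
21 | >
22 | > The code in this repository supports essentially all features of CSDN's rich-text editor and Markdown editor, and the results are good. Feel free to try it.
23 |
24 | The project provides two versions:
25 |
26 | - **Tampermonkey userscript version**: suited to general users; easy to use, and supports downloading directly from column pages and article pages.
27 | - **Node.js version**: suited to developers and technical users; provides the basic conversion functions and can be extended into a crawler. (Note that this version is ***essentially unmaintained***; prefer the userscript.)
28 |
29 | Userscript download: [csdn2md - 批量下载CSDN文章为Markdown](https://greasyfork.org/en/scripts/523540-csdn2md-%E6%89%B9%E9%87%8F%E4%B8%8B%E8%BD%BDcsdn%E6%96%87%E7%AB%A0%E4%B8%BAmarkdown)
30 |
31 | > [!WARNING]
32 | >
33 | > This repository was last tested on `2025.5.29`.
34 | >
35 | > Note that the CSDN page structure may change at any time, which can break the script.
36 |
37 | ------
38 |
39 | ## Q&A
40 |
41 | ### 1. I installed the userscript but do not see a download button
42 |
43 | Your browser may be blocking the Tampermonkey extension. Enable developer mode on the browser's extensions page; in this situation Tampermonkey shows “Please enable developer mode to allow userscript injection.”
44 |
45 | See the tutorial at [https://blog.csdn.net/m0_57703994/article/details/143798922](https://blog.csdn.net/m0_57703994/article/details/143798922) to resolve this.
46 |
47 | ![Image](https://github.com/user-attachments/assets/732074a6-c143-4a89-95a9-79f8403ad9e7)
48 |
49 |
50 | ### 2. 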
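Clicking download shows the prompt "A userscript wants to access a cross-origin resource"
51 |
52 | This is because the script needs to access CSDN pages to fetch the article content.
53 | Click "Always allow domain"; otherwise the script cannot work properly.
54 |
55 | ![alt text](./assets/image.png)
56 |
57 | > If you clicked the wrong option or the prompt timed out, open Tampermonkey, click "Dashboard", double-click "csdn2md - 批量下载CSDN文章为Markdown" under "Installed Userscripts", click "Settings", and check whether "User domain blacklist" under "XHR Security" contains any entries. If it does, delete them.
58 | >
59 | > ![alt text](./assets/image-1.png)
60 |
61 | ## Features
62 |
63 | ### Tampermonkey userscript version
64 |
65 | - **Batch download**: download a single article or every article in a column.
66 | - **High-fidelity conversion**: the converted Markdown preserves the original formatting and syntax features of the CSDN editors as far as possible, including but not limited to:
67 |   - **Math**: KaTeX inline formulas and formula blocks.
68 |   - **Multimedia**: images and Bilibili video embeds.
69 |   - **Code**: inline code and code blocks.
70 |   - **Lists**: ordered lists, unordered lists, task lists, and definition lists.
71 |   - **Formatting**: bold, italic, strikethrough, underline, highlight, and content alignment (left, center, right).
72 |   - **Other**: table of contents, footnotes, blockquotes, links, keyboard keys (kbd), tables, superscript and subscript, Gantt charts, UML diagrams, flowcharts, and more.
73 |
74 | ### Node.js version
75 |
76 | - **Basic functionality**: provides only the basic HTML-to-Markdown function, which performs the core content conversion.
77 | - **Extensibility**: suitable as the basis for a crawler; add higher-level functions as needed to implement more features.
78 |
79 | ## Installation and Usage
80 |
81 | ### Tampermonkey userscript version
82 |
83 | If you only want to download a few articles or columns, use the userscript version. It is simple and convenient.
84 |
85 | 1. **Install the Tampermonkey extension**:
86 |    - Go to the [Tampermonkey website](https://www.tampermonkey.net/) and install the extension for your browser.
87 |
88 | 2. **Install the `csdn2md` script**:
89 |    - Visit [the csdn2md script page on Greasy Fork](https://greasyfork.org/en/scripts/523540-csdn2md-%E6%89%B9%E9%87%8F%E4%B8%8B%E8%BD%BDcsdn%E6%96%87%E7%AB%A0%E4%B8%BAmarkdown) and install the `csdn2md` script.
90 |    - After installation, refresh the page. When you visit a CSDN article or column page, a download button appears in the bottom-right corner.
91 |
92 | 3. **Download with the script**:
93 |    - Open the CSDN article or column page you want to download and click the download button in the bottom-right corner to save the content as Markdown files.
94 |    - Opening the downloaded Markdown files in [Typora](https://typora.io/) is recommended for the best rendering.
95 |
96 | 4. **Notes**:
97 |    - Because CSDN's front-end pages may change, the script may fail to parse them correctly (working as of 2025.1.12). Check that the downloaded Markdown files match your expectations.
98 |
99 | ### Node.js version
100 |
101 | If, for academic purposes, you want to crawl a large number of CSDN articles and convert them in bulk, you can write your own crawler based on the Node.js script in this repository.
102 |
103 | > (Crawling unauthorized content is strictly prohibited. ***This repository does not provide bulk crawling functionality***; it is for learning, reference, and personal use only, and the author accepts no legal liability.)
104 |
105 | 1. **Install Node.js**:
106 |    - Go to the [Node.js website](https://nodejs.org/) and download and install the latest stable version of Node.js.
107 |
108 | 2. 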
**Clone the repository and install dependencies**:
109 |
110 |    ```bash
111 |    git clone https://github.com/Qalxry/csdn2md.git
112 |    cd csdn2md
113 |    npm install jsdom --save
114 |    ```
115 |
116 | 3. **Use the Node.js script**:
117 |    - Edit `csdn2md_node.js` to modify the example usage or add custom functionality as needed.
118 |    - Run the script:
119 |
120 |      ```bash
121 |      node csdn2md_node.js
122 |      ```
123 |
124 |    - The example code reads the HTML files in the `examples` directory and generates the corresponding Markdown files.
125 |
126 |    - See the examples in the code for details.
127 |
128 | ## Comparison
129 |
130 | CSDN has two editors, and the pages they produce differ somewhat. This project supports article pages from both editors, and the converted output is nearly identical to the original page.
131 |
132 | ### CSDN rich-text editor conversion results
133 |
134 | Left: the CSDN web page. Right: the converted Markdown viewed in Typora with the default theme:
135 |
136 | ![image-20250112192705855](./assets/image-20250112192705855.png)
137 |
138 | ![image-20250112192810066](./assets/image-20250112192810066.png)
139 |
140 | ![image-20250112192921071](./assets/image-20250112192921071.png)
141 |
142 | ![image-20250112193112943](./assets/image-20250112193112943.png)
143 |
144 | ![image-20250112193244857](./assets/image-20250112193244857.png)
145 |
146 | ![image-20250112193320521](./assets/image-20250112193320521.png)
147 |
148 | ### CSDN Markdown editor conversion results
149 |
150 | Left: the CSDN web page. Right: the converted Markdown viewed in Typora with the default theme:
151 |
152 | ![image-20250112193821807](./assets/image-20250112193821807.png)
153 |
154 | ![image-20250112193847388](./assets/image-20250112193847388.png)
155 |
156 | ![image-20250112193938194](./assets/image-20250112193938194.png)
157 |
158 | ![image-20250112194025460](./assets/image-20250112194025460.png)
159 |
160 | ![image-20250112194142442](./assets/image-20250112194142442.png)
161 |
162 | ![image-20250112194229557](./assets/image-20250112194229557.png)
163 |
164 | ![image-20250112194256356](./assets/image-20250112194256356.png)
165 |
166 | ## Examples
167 |
168 | The `examples` directory contains HTML-to-Markdown conversion examples:
169 |
170 | - `case1.html` and `case1.md`: the HTML content of example 1 and its converted Markdown file.
171 | - `case2.html` and `case2.md`: the HTML content of example 2 and its converted Markdown file.
172 |
173 | ## File Structure
174 |
175 | ```
176 | csdn2md/
177 | ├── csdn2md_node.js # Main script of the Node.js version
178 | ├── 
csdn2md_tampermonkey.js # Tampermonkey userscript
179 | ├── examples # Example files
180 | ├── LICENSE # Project license
181 | └── README.md # Project README
182 | ```
183 |
184 | ## License
185 |
186 | This project is licensed under the [PolyForm Strict License 1.0.0](https://polyformproject.org/licenses/strict/1.0.0/). **Commercial use is prohibited; the project is for learning and personal use only.**
187 |
188 | ## Usage Restrictions
189 |
190 | - **Legal risk**: CSDN publishes a `robots.txt` crawler policy, and crawling its content without permission may carry legal risk. Use this tool with care and do not use it commercially. Crawling unauthorized content is strictly prohibited. ***This repository does not provide bulk crawling functionality***; it is for learning, reference, and personal use only, and the author accepts no legal liability.
191 | - **Educational use**: this repository is for learning and exchange only and must not be used for any commercial purpose. The author accepts no legal liability arising from use of this tool.
192 |
193 | ## Contributing / Feedback
194 |
195 | If you have suggestions or run into problems, please open an Issue. The author will reply as soon as time permits.
196 |
197 | **Disclaimer**: before using this tool, make sure you have the legal right to use the content involved. The author is not responsible for any legal issues resulting from use of this tool.
198 |
199 | [![Star History Chart](https://api.star-history.com/svg?repos=Qalxry/csdn2md&type=Date)](https://star-history.com/#Qalxry/csdn2md&Date)
200 |

--------------------------------------------------------------------------------
/assets/image-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Qalxry/csdn2md/f9748f4d446b8a488506e3b5cb08bf802081d716/assets/image-1.png

--------------------------------------------------------------------------------
/assets/image-20250112190109165.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Qalxry/csdn2md/f9748f4d446b8a488506e3b5cb08bf802081d716/assets/image-20250112190109165.png

--------------------------------------------------------------------------------
/assets/image-20250112192705855.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Qalxry/csdn2md/f9748f4d446b8a488506e3b5cb08bf802081d716/assets/image-20250112192705855.png

--------------------------------------------------------------------------------
/assets/image-20250112192810066.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Qalxry/csdn2md/f9748f4d446b8a488506e3b5cb08bf802081d716/assets/image-20250112192810066.png

--------------------------------------------------------------------------------
/assets/image-20250112192921071.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Qalxry/csdn2md/f9748f4d446b8a488506e3b5cb08bf802081d716/assets/image-20250112192921071.png

--------------------------------------------------------------------------------
/assets/image-20250112193112943.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Qalxry/csdn2md/f9748f4d446b8a488506e3b5cb08bf802081d716/assets/image-20250112193112943.png

--------------------------------------------------------------------------------
/assets/image-20250112193244857.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Qalxry/csdn2md/f9748f4d446b8a488506e3b5cb08bf802081d716/assets/image-20250112193244857.png

--------------------------------------------------------------------------------
/assets/image-20250112193320521.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Qalxry/csdn2md/f9748f4d446b8a488506e3b5cb08bf802081d716/assets/image-20250112193320521.png

--------------------------------------------------------------------------------
/assets/image-20250112193821807.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Qalxry/csdn2md/f9748f4d446b8a488506e3b5cb08bf802081d716/assets/image-20250112193821807.png

--------------------------------------------------------------------------------
/assets/image-20250112193847388.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Qalxry/csdn2md/f9748f4d446b8a488506e3b5cb08bf802081d716/assets/image-20250112193847388.png

--------------------------------------------------------------------------------
/assets/image-20250112193938194.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Qalxry/csdn2md/f9748f4d446b8a488506e3b5cb08bf802081d716/assets/image-20250112193938194.png

--------------------------------------------------------------------------------
/assets/image-20250112194025460.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Qalxry/csdn2md/f9748f4d446b8a488506e3b5cb08bf802081d716/assets/image-20250112194025460.png

--------------------------------------------------------------------------------
/assets/image-20250112194142442.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Qalxry/csdn2md/f9748f4d446b8a488506e3b5cb08bf802081d716/assets/image-20250112194142442.png

--------------------------------------------------------------------------------
/assets/image-20250112194229557.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Qalxry/csdn2md/f9748f4d446b8a488506e3b5cb08bf802081d716/assets/image-20250112194229557.png

--------------------------------------------------------------------------------
/assets/image-20250112194256356.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Qalxry/csdn2md/f9748f4d446b8a488506e3b5cb08bf802081d716/assets/image-20250112194256356.png

--------------------------------------------------------------------------------
/assets/image.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Qalxry/csdn2md/f9748f4d446b8a488506e3b5cb08bf802081d716/assets/image.png

--------------------------------------------------------------------------------
/assets/image1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Qalxry/csdn2md/f9748f4d446b8a488506e3b5cb08bf802081d716/assets/image1.png

--------------------------------------------------------------------------------
/csdn2md_node.js:
--------------------------------------------------------------------------------
1 | // Description: Convert CSDN HTML to Markdown using Node.js.
2 | // Author: ShizuriYuki
3 | // Date: 2025-01-11
4 | // Last Update: 2025-01-11
5 | // License: Polyform Strict License 1.0.0 (https://polyformproject.org/licenses/strict/1.0.0/)
6 | // Version: 1.0.1
7 | // SupportURL: https://github.com/Qalxry/csdn2md
8 |
9 | const { JSDOM } = require("jsdom");
10 | const { TextEncoder } = require("util");
11 | const fs = require("fs");
12 |
13 | /**
14 |  * Convert an SVG image to a Base64-encoded string.
15 |  * @param {string} svgText - Text content of the SVG image.
16 |  * @returns {string} - The Base64-encoded string.
17 |  */
18 | function svgToBase64(svgText) {
19 |     const uint8Array = new TextEncoder().encode(svgText);
20 |     const binaryString = uint8Array.reduce((data, byte) => data + String.fromCharCode(byte), '');
21 |     return btoa(binaryString);
22 | }
23 |
24 | /**
25 |  * Shrink HTML content by removing redundant whitespace and line breaks.
26 |  * @param {string} html - The input HTML string.
27 |  * @returns {string} - The shrunk HTML string.
28 |  */
29 | function shrinkHtml(html) {
30 |     return html
31 |         .replace(/>\s+</g, '><') // Remove whitespace between tags
32 |         .replace(/\s{2,}/g, ' ') // Collapse runs of whitespace into a single space
33 |         .replace(/^\s+|\s+$/g, ''); // Trim leading and trailing whitespace
34 | }
35 |
36 | /**
37 |  * Remove special characters from a string: runs of two or more whitespace characters and invisible Unicode characters (zero-width, directional, and similar).
38 |  * @param {string} str - The input string.
39 |  * @returns {string} - The cleaned string.
40 |  */
41 | function clearSpecialChars(str) {
42 |     return str.replace(/[\s]{2,}/g, "").replace(/[\u200B-\u200F\u202A-\u202E\u2060-\u206F\uFEFF\u00AD\u034F\u061C\u180E\u2800\u3164\uFFA0\uFFF9-\uFFFB]/g, "");
43 | }
44 |
45 | /**
46 |  * Convert CSDN HTML to Markdown.
47 |  * @param {string} html - HTML content of a CSDN article; it must contain a div element with id="content_views".
48 |  * @returns {string} - The Markdown text.
49 |  */
50 | function htmlToMarkdown(html) {
51 |     const htype_map = {
52 |         一级标题: 1, // "Level 1 heading"
53 |         二级标题: 2, // "Level 2 heading"
54 |         三级标题: 3, // "Level 3 heading"
55 |         四级标题: 4, // "Level 4 heading"
56 |         五级标题: 5, // "Level 5 heading"
57 |         六级标题: 6, // "Level 6 heading"
58 |     };
59 |
60 |     // Create a DOM parser
61 |     const document = new JSDOM(html).window.document;
62 |     const content = document.getElementById("content_views");
63 |
64 |     let markdown = "";
65 |
66 |     // Helper intended to escape special Markdown characters; escaping is currently disabled, so it only trims the text
67 |     const escapeMarkdown = (text) => {
68 |         // return text.replace(/([\\`*_\{\}\[\]()#+\-.!])/g, "\\$1").trim();
69 |         return text.trim();
70 |     };
71 |
72 |     /**
73 |      * Recursively process a DOM node and convert it to Markdown.
74 |      * @param {Node} node - The current DOM node.
75 |      * @param {number} listLevel - The current list nesting level.
76 |      * @returns {string} - The Markdown string for the node.
77 |      */
78 |     function processNode(node, listLevel = 0) {
79 |         let result = "";
80 |         const ELEMENT_NODE = 1;
81 |         const TEXT_NODE = 3;
82 |         const COMMENT_NODE = 8;
83 |         switch (node.nodeType) {
84 |             case ELEMENT_NODE:
85 |                 switch (node.tagName.toLowerCase()) {
86 |                     case "h1":
87 |                     case "h2":
88 |                     case "h3":
89 |                     case "h4":
90 |                     case "h5":
91 |                     case "h6":
92 |                         {
93 |                             // Decode the URL-encoded id, e.g. %E4%B8%80%E7%BA%A7%E6%A0%87%E9%A2%98 -> 一级标题; fall back to the tag's own level when the id is not a key of htype_map
94 |                             if (node.getAttribute("id")) {
95 |                                 const htype = decodeURIComponent(node.getAttribute("id"));
96 |                                 result += `${"#".repeat(htype_map[htype] ?? Number(node.tagName[1]))} ${node.textContent.trim()}\n\n`;
97 |                             }
98 |                             else {
99 |                                 const htype = Number(node.tagName[1]);
100 |                                 result += `${"#".repeat(htype)} ${node.textContent.trim()}\n\n`;
101 |                             }
102 |                         }
103 |                         break;
104 |                     case "p":
105 |                         {
106 |                             const style = node.getAttribute("style");
107 |                             if (node.getAttribute("id") === "main-toc") {
108 |                                 result += `**目录**\n\n[TOC]\n\n`; // "目录" means "Table of contents"
109 |                                 break;
110 |                             }
111 |                             let text = processChildren(node, listLevel);
112 |                             if (style) {
113 |                                 if (style.includes("padding-left")) {
114 |                                     break;
115 |                                 }
116 |                                 if
(style.includes("text-align:center")) { 117 | text = `
<div style="text-align:center;">${text}</div>
\n\n`; 118 | } else if (style.includes("text-align:right")) { 119 | text = `
<div style="text-align:right;">${text}</div>
\n\n`; 120 | } else if (style.includes("text-align:justify")) { 121 | text = `
<div style="text-align:justify;">${text}</div>
\n\n`; 122 | } else { 123 | text += "\n\n"; 124 | } 125 | } else { 126 | text += "\n\n"; 127 | } 128 | result += text; 129 | } 130 | break; 131 | case "strong": 132 | case "b": 133 | result += ` **${processChildren(node, listLevel).trim()}** `; 134 | break; 135 | case "em": 136 | case "i": 137 | result += ` *${processChildren(node, listLevel).trim()}* `; 138 | break; 139 | case "u": 140 | result += ` ${processChildren(node, listLevel).trim()} `; 141 | break; 142 | case "s": 143 | case "strike": 144 | result += ` ~~${processChildren(node, listLevel).trim()}~~ `; 145 | break; 146 | case "a": 147 | { 148 | const node_class = node.getAttribute("class"); 149 | if (node_class && node_class.includes("footnote-backref")) { 150 | break; 151 | } 152 | const href = node.getAttribute("href") || ""; 153 | const text = processChildren(node, listLevel); 154 | result += ` [${text}](${href}) `; 155 | } 156 | break; 157 | case "img": 158 | { 159 | const src = node.getAttribute("src") || ""; 160 | const alt = node.getAttribute("alt") || ""; 161 | const cls = node.getAttribute("class") || ""; 162 | const width = node.getAttribute("width") || ""; 163 | const height = node.getAttribute("height") || ""; 164 | if (cls.includes("mathcode")) { 165 | result += `$$\n${alt}\n$$`; 166 | } else { 167 | if (src.includes('#pic_center')) { 168 | result += '\n\n'; 169 | } else { 170 | result += ' '; 171 | } 172 | if (width && height) { 173 | // result += `${alt}`; 174 | result += `${alt}`; 175 | } else { 176 | result += `![${alt}](${src})`; 177 | } 178 | } 179 | } 180 | break; 181 | case "ul": 182 | result += processList(node, listLevel, false); 183 | break; 184 | case "ol": 185 | result += processList(node, listLevel, true); 186 | break; 187 | case "blockquote": 188 | { 189 | const text = processChildren(node, listLevel) 190 | .trim() 191 | .split("\n") 192 | .map((line) => (line ? 
`> ${line}` : "> ")) 193 | .join("\n"); 194 | result += `${text}\n\n`; 195 | } 196 | break; 197 | case "pre": 198 | { 199 | const codeNode = node.querySelector("code"); 200 | if (codeNode) { 201 | const className = codeNode.className || ""; 202 | let language = ""; 203 | // 新版本的代码块,class 含有 language-xxx 204 | if (className.includes("language-")) { 205 | // const languageMatch = className.match(/language-(\w+)/); 206 | // language = languageMatch ? languageMatch[0] : ""; 207 | const languageMatch = className.split(" "); 208 | // 找到第一个 language- 开头的字符串 209 | for (const item of languageMatch) { 210 | if (item.startsWith("language-")) { 211 | language = item; 212 | break; 213 | } 214 | } 215 | language = language.replace("language-", ""); 216 | } 217 | // 老版本的代码块 218 | else if (className.startsWith("hljs")) { 219 | const languageMatch = className.split(" "); 220 | language = languageMatch ? languageMatch[1] : ""; 221 | } 222 | result += `\`\`\`${language}\n${processCodeBlock(codeNode)}\`\`\`\n\n`; 223 | } else { 224 | console.warn("Code block without element:", node.outerHTML); 225 | const codeText = node.textContent.replace(/^\s+|\s+$/g, ""); 226 | result += `\`\`\`\n${codeText}\n\`\`\`\n\n`; 227 | } 228 | } 229 | break; 230 | case "code": 231 | { 232 | const codeText = node.textContent; 233 | result += ` \`${codeText}\` `; 234 | } 235 | break; 236 | case "hr": 237 | if (node.getAttribute("id") !== "hr-toc") { 238 | result += `---\n\n`; 239 | } 240 | break; 241 | case "br": 242 | result += ` \n`; 243 | break; 244 | case "table": 245 | result += processTable(node) + "\n\n"; 246 | break; 247 | // case 'iframe': 248 | // { 249 | // const src = node.getAttribute('src') || ''; 250 | // const iframeHTML = node.outerHTML.replace('>', ' style="width: 100%; aspect-ratio: 2;">'); // Ensure proper closing 251 | // result += `${iframeHTML}\n\n`; 252 | // } 253 | // break; 254 | case "div": 255 | { 256 | const className = node.getAttribute("class") || ""; 257 | if 
(className.includes("csdn-video-box")) { 258 | // Handle video boxes or other specific divs 259 | // result += `
${processChildren(node, listLevel)}
\n\n`; 260 | 261 | // 不递归处理了,直接在这里进行解析 262 | const iframe = node.querySelector("iframe"); 263 | const src = iframe.getAttribute("src") || ""; 264 | const title = node.querySelector("p").textContent || ""; 265 | const iframeHTML = iframe.outerHTML.replace( 266 | ">", 267 | ' style="width: 100%; aspect-ratio: 2;">' 268 | ); // Ensure video box is full width 269 | result += `
${title}${iframeHTML}
\n\n`; 270 | } else if (className.includes("toc")) { 271 | const customTitle = node.querySelector("h4").textContent || ""; 272 | result += `**${customTitle}**\n\n[TOC]\n\n`; 273 | } else { 274 | result += processChildren(node, listLevel); 275 | } 276 | } 277 | break; 278 | case "span": 279 | { 280 | const node_class = node.getAttribute("class"); 281 | if (node_class) { 282 | if (node_class.includes("katex--inline")) { 283 | // class="katex-mathml" 284 | const mathml = clearSpecialChars(node.querySelector(".katex-mathml").textContent); 285 | const katex_html = clearSpecialChars(node.querySelector(".katex-html").textContent); 286 | // result += ` $${mathml.replace(katex_html, "")}$ `; 287 | 288 | if (mathml.startsWith(katex_html)) { 289 | result += ` $${mathml.replace(katex_html, "")}$ `; 290 | } else { 291 | // 字符串切片,去掉 mathml 开头等同长度的 katex_html,注意不能用 replace,因为 katex_html 里的字符顺序可能会变 292 | result += ` $${mathml.slice(katex_html.length)}$ `; 293 | } 294 | break; 295 | } else if (node_class.includes("katex--display")) { 296 | const mathml = clearSpecialChars(node.querySelector(".katex-mathml").textContent); 297 | const katex_html = clearSpecialChars(node.querySelector(".katex-html").textContent); 298 | // result += `$$\n${mathml.replace(katex_html, "")}\n$$\n\n`; 299 | 300 | if (mathml.startsWith(katex_html)) { 301 | result += `$$\n${mathml.replace(katex_html, "")}\n$$\n\n`; 302 | } else { 303 | // 字符串切片,去掉 mathml 开头等同长度的 katex_html,注意不能用 replace,因为 katex_html 里的字符顺序可能会变 304 | result += `$$\n${mathml.slice(katex_html.length)}\n$$\n\n`; 305 | } 306 | break; 307 | } 308 | } 309 | const style = node.getAttribute("style") || ""; 310 | if (style.includes("background-color") || style.includes("color")) { 311 | result += `${processChildren(node, listLevel)}`; 312 | } else { 313 | result += processChildren(node, listLevel); 314 | } 315 | } 316 | break; 317 | case "kbd": 318 | result += ` ${node.textContent} `; 319 | break 320 | case "mark": 321 | result += ` 
${processChildren(node, listLevel)} `; 322 | break; 323 | case "sub": 324 | result += `${processChildren(node, listLevel)}`; 325 | break; 326 | case "sup": 327 | { 328 | const node_class = node.getAttribute("class"); 329 | if (node_class && node_class.includes("footnote-ref")) { 330 | result += `[^${node.textContent}]`; 331 | } else { 332 | result += `${processChildren(node, listLevel)}`; 333 | } 334 | } 335 | break; 336 | case "svg": 337 | { 338 | const style = node.getAttribute("style"); 339 | if (style && style.includes("display: none")) { 340 | break; 341 | } 342 | // 必须为 foreignObject 里的 div 添加属性 xmlns="http://www.w3.org/1999/xhtml" ,否则 typora 无法识别 343 | const foreignObjects = node.querySelectorAll('foreignObject'); 344 | for (const foreignObject of foreignObjects) { 345 | const divs = foreignObject.querySelectorAll('div'); 346 | divs.forEach(div => { 347 | div.setAttribute('xmlns', 'http://www.w3.org/1999/xhtml'); 348 | }); 349 | } 350 | // 检查是否有 style 标签存在于 svg 元素内,如果有,则需要将 svg 元素转换为 img 元素,用 Base64 编码的方式显示。否则直接返回 svg 元素 351 | if (node.querySelector("style")) { 352 | const base64 = svgToBase64(node.outerHTML); 353 | // result += `SVG Image`; 354 | result += `![SVG Image](data:image/svg+xml;base64,${base64})\n\n`; 355 | } else { 356 | result += `
${node.outerHTML}
\n\n`; 357 | } 358 | } 359 | break; 360 | case "section": // 这个是注脚的内容 361 | { 362 | const node_class = node.getAttribute("class"); 363 | if (node_class && node_class.includes("footnotes")) { 364 | result += processFootnotes(node); 365 | } 366 | } 367 | break; 368 | case "input": 369 | // 仅处理 checkbox 类型的 input 元素 370 | if (node.getAttribute("type") === "checkbox") { 371 | result += `[${node.checked ? "x" : " "}] `; 372 | } 373 | break; 374 | case 'dl': 375 | // 自定义列表,懒得解析了,直接用 html 吧 376 | result += `${shrinkHtml(node.outerHTML)}\n\n`; 377 | break; 378 | default: 379 | result += processChildren(node, listLevel); 380 | result += "\n\n"; 381 | break; 382 | } 383 | break; 384 | case TEXT_NODE: 385 | result += escapeMarkdown(node.textContent); 386 | break; 387 | case COMMENT_NODE: 388 | // Ignore comments 389 | break; 390 | default: 391 | break; 392 | } 393 | 394 | return result; 395 | } 396 | 397 | /** 398 | * 处理给定节点的子节点。 399 | * @param {Node} node - 父节点。 400 | * @param {number} listLevel - 当前列表嵌套级别。 401 | * @returns {string} - 子节点拼接后的 Markdown 字符串。 402 | */ 403 | function processChildren(node, listLevel) { 404 | let text = ""; 405 | node.childNodes.forEach((child) => { 406 | text += processNode(child, listLevel); 407 | }); 408 | return text; 409 | } 410 | 411 | /** 412 | * 处理列表元素 (