├── .gitignore ├── README.md ├── app ├── __init__.py ├── main.py ├── requirements.txt ├── schemas.py └── utils.py ├── requirements.txt ├── setup.py ├── static ├── image.png ├── p1.png ├── p2.png ├── p3.png └── p4.png ├── templates └── index.html └── tools ├── __init__.py ├── card.png ├── card_extractor.py ├── html2pdf.py ├── html2pic.py ├── html2pic2.py ├── llm_caller.py ├── llm_prompt.py ├── output.png ├── output_images ├── card_1.png ├── card_1_card_boundary.png ├── card_1_connected.png ├── card_1_content_regions.png ├── card_1_thresh.png └── page_1.png ├── output_selenium.png ├── pdf2card.py ├── prompt_config.py └── selenium2img.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Python相关忽略 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | *.so 6 | .Python 7 | build/ 8 | develop-eggs/ 9 | dist/ 10 | downloads/ 11 | eggs/ 12 | .eggs/ 13 | lib/ 14 | lib64/ 15 | parts/ 16 | sdist/ 17 | var/ 18 | wheels/ 19 | *.egg-info/ 20 | .installed.cfg 21 | *.egg 22 | MANIFEST 23 | 24 | # 虚拟环境 25 | .env 26 | .venv 27 | env/ 28 | venv/ 29 | ENV/ 30 | env.bak/ 31 | venv.bak/ 32 | .python-version 33 | 34 | # uv相关文件 35 | .uv/ 36 | .uvroot 37 | 38 | # 项目输出文件 39 | output/ 40 | # static/ 41 | # *.png 42 | *.pdf 43 | *.html 44 | !templates/*.html 45 | 46 | # IDE相关 47 | .idea/ 48 | .vscode/ 49 | *.swp 50 | *.swo 51 | .DS_Store 52 | 53 | # 日志文件 54 | *.log 55 | logs/ 56 | 57 | # 本地配置 58 | tools/.env 59 | 60 | # 测试相关 61 | .coverage 62 | htmlcov/ 63 | .pytest_cache/ 64 | nosetests.xml 65 | coverage.xml 66 | *.cover 67 | 68 | # 临时文件 69 | temp/ 70 | tmp/ 71 | .temp/ 72 | .tmp/ 73 | 74 | # Chrome WebDriver 75 | chromedriver* 76 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # 智能卡片工坊 (Smart Card Workshop) 2 | 3 | 一款基于AI的内容转换工具,可以将文本、网页内容或HTML代码转换为精美的卡片图像。 4 | 5 | 在线体验:http://14.103.128.59:8000/ 6 | 7 | ![智能卡片工坊界面](./static/image.png) 8 | 9 | ## 应用截图 10 | 11 | 12 | ### 卡片生成效果 13 |
14 | 卡片效果 15 | 卡片效果 16 | 卡片效果 17 | 卡片效果 18 |
19 | 20 | ### 演示视频 21 | [观看B站演示视频](https://www.bilibili.com/video/BV1kQRSY9EzQ/) 22 | 23 | ## 功能特点 24 | 25 | - **多种输入方式**: 26 | - 直接输入需求文本,AI自动生成HTML 27 | - 粘贴已有HTML代码 28 | - 智能总结长文本内容 29 | - 抓取并总结网页内容 30 | 31 | - **基于AI的内容生成**: 32 | - 使用先进的大语言模型处理内容 33 | - 自动提取关键信息生成结构化卡片 34 | - 支持多种模型选择 35 | 36 | - **精美卡片导出**: 37 | - 生成适合移动设备的HTML卡片 38 | - 导出为PNG图片格式 39 | - 实时预览编辑效果 40 | 41 | ## 技术栈 42 | 43 | - **后端**:FastAPI (Python) 44 | - **前端**:Bootstrap 5, JavaScript 45 | - **AI模型**:通过ARK平台接入大语言模型 46 | - **图像处理**:Selenium WebDriver, OpenCV 47 | - **网页抓取**:Jina API 48 | 49 | ## 安装指南 50 | 51 | ### 使用uv包管理器安装 52 | 53 | [uv](https://github.com/astral-sh/uv) 是一个快速且现代的Python包管理器,推荐使用它来安装项目依赖。 54 | 55 | #### 安装uv 56 | 57 | ```bash 58 | # 使用pip安装uv 59 | pip install uv 60 | 61 | # 或者使用pipx安装(推荐) 62 | pipx install uv 63 | ``` 64 | 65 | #### 克隆项目并安装依赖 66 | 67 | ```bash 68 | # 克隆项目 69 | git clone https://github.com/datawhalechina/smart-card-workshop.git 70 | cd smart-card-workshop 71 | 72 | # 创建虚拟环境并安装依赖 73 | uv venv 74 | uv pip install -r app/requirements.txt 75 | ``` 76 | 77 | ### 环境变量配置 78 | 79 | 在`tools`目录下创建一个`.env`文件,添加以下配置: 80 | 81 | ARK指的是火山平台,你需要完成两步可以得到你的可用api_key 82 | 1. 创建api_key 83 | https://console.volcengine.com/ark/region:ark+cn-beijing/apiKey?apikey=%7B%7D 84 | 85 | 2. 创建模型推理接入点,记得选择deepseekv3 0324版本 86 | https://console.volcengine.com/ark/region:ark+cn-beijing/endpoint?config=%7B%7D 87 | 88 | Jina api需要再下面的网页获取token,如果不用网页获取功能可以随便填一个key哦 89 | https://jina.ai/zh-CN/ 90 | 91 | ``` 92 | # ARK平台API密钥 93 | ARK_API_KEY="your_ark_api_key_here" 94 | 95 | # Jina API密钥(用于网页抓取) 96 | JINA_API_KEY="your_jina_api_key_here" 97 | ``` 98 | 99 | ## 运行应用 100 | 101 | ```bash 102 | # 激活虚拟环境 103 | # Windows 104 | .venv\Scripts\activate 105 | # Linux/MacOS 106 | source .venv/bin/activate 107 | 108 | # 启动应用 109 | uvicorn app.main:app --reload 110 | ``` 111 | 112 | 然后在浏览器中访问 `http://localhost:8000` 即可使用应用。 113 | 114 | ## 使用指南 115 | 116 | 1. **需求生成**:输入您需要的卡片内容描述,AI将生成相应的HTML卡片 117 | 2. **智能总结**:输入长文本,自动提取关键信息并生成摘要 118 | 3. **总结网页**:输入网页URL,抓取并总结网页内容 119 | 4. **粘贴HTML**:直接粘贴您已有的HTML代码生成卡片 120 | 121 | ## 系统要求 122 | 123 | - Python 3.8+ 124 | - Google Chrome浏览器 (用于Selenium渲染) 125 | - 网络连接 (用于API调用) 126 | 127 | ## 许可证 128 | 129 | [Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)](https://creativecommons.org/licenses/by-nc/4.0/) 130 | 131 | 本项目采用CC BY-NC 4.0协议,您可以自由地: 132 | - 共享 — 在任何媒介以任何形式复制、发行本作品 133 | - 演绎 — 修改、转换或以本作品为基础进行创作 134 | 135 | 惟须遵守下列条件: 136 | - 署名 — 您必须给出适当的署名,提供指向本许可协议的链接,同时标明是否对原始作品作了修改 137 | - 非商业性使用 — 您不得将本作品用于商业目的 138 | 139 | 详细许可条款请查看[完整法律文本](https://creativecommons.org/licenses/by-nc/4.0/legalcode.zh-Hans)。 140 | 141 | ## 贡献指南 142 | 143 | 欢迎贡献代码、报告问题或提供改进建议。请遵循以下步骤: 144 | 145 | 1. Fork本项目 146 | 2. 创建您的特性分支 (`git checkout -b feature/amazing-feature`) 147 | 3. 提交您的更改 (`git commit -m '添加一些很棒的功能'`) 148 | 4. 将您的更改推送到分支 (`git push origin feature/amazing-feature`) 149 | 5. 
提交Pull Request 150 | 151 | ## 联系方式 152 | 153 | 如有任何问题或建议,请通过以下方式联系我们: 154 | 155 | - 邮箱:bard0wang@foxmail.com 156 | - GitHub Issues:[提交问题](https://github.com/Bald0Wang/Smart_Card_Workshop/issues) 157 | -------------------------------------------------------------------------------- /app/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datawhalechina/smart-card-workshop/fa1a010c3da706684d77578648e750f59f60da1b/app/__init__.py -------------------------------------------------------------------------------- /app/main.py: -------------------------------------------------------------------------------- 1 | from fastapi import FastAPI, Request, HTTPException 2 | from fastapi.responses import HTMLResponse, FileResponse, JSONResponse 3 | from fastapi.staticfiles import StaticFiles 4 | from fastapi.templating import Jinja2Templates 5 | from pydantic import BaseModel, Field 6 | from typing import Optional, List 7 | import os 8 | import uuid 9 | import logging 10 | import cv2 11 | import numpy as np 12 | from enum import Enum 13 | # from tools.pdf2card import pdf_to_images, extract_card_from_image 14 | # from tools.html2pdf import html_to_pdf 15 | from tools.selenium2img import html_to_image 16 | from tools.card_extractor import extract_card_from_image 17 | import asyncio 18 | from dotenv import load_dotenv 19 | import requests 20 | # Import functions from llm_prompt.py 21 | from tools.llm_prompt import call_ark_llm, extract_html_from_response 22 | from tools.prompt_config import SYSTEM_PROMPT_WEB_DESIGNER, USER_PROMPT_WEB_DESIGNER, SYSTEM_PROMPT_SUMMARIZE_2MD 23 | from tools.llm_caller import generate_content_with_llm 24 | import os 25 | 26 | # 配置日志 27 | logging.basicConfig(level=logging.INFO) 28 | logger = logging.getLogger(__name__) 29 | 30 | # Load environment variables from .env file with correct path 31 | env_path = os.path.join(os.path.dirname(__file__), "..", "tools", ".env") 32 | load_dotenv(env_path) 33 | 34 | # Jina API Key for web content extraction 35 | JINA_API_URL = "https://r.jina.ai/" 36 | JINA_API_KEY = os.getenv("JINA_API_KEY") 37 | 38 | # Log environment variable loading 39 | logger.info(f"Loading environment variables from: {env_path}") 40 | logger.info(f"JINA_API_KEY loaded: {'Yes' if JINA_API_KEY else 'No'}") 41 | 42 | app = FastAPI() 43 | 44 | 45 | # Constants 46 | OUTPUT_DIR = "output" 47 | STATIC_DIR = "static" 48 | TEMPLATES_DIR = "templates" 49 | 50 | # Ensure output and static directories exist 51 | os.makedirs(OUTPUT_DIR, exist_ok=True) 52 | app_static_dir = os.path.join(os.path.dirname(__file__), STATIC_DIR) 53 | os.makedirs(app_static_dir, exist_ok=True) 54 | app.mount(f"/{STATIC_DIR}", StaticFiles(directory=app_static_dir), name=STATIC_DIR) 55 | 56 | # Setup templates 57 | # Adjust the path to point to the root templates directory, relative to main.py 58 | templates = Jinja2Templates(directory=os.path.join(os.path.dirname(__file__), "..", TEMPLATES_DIR)) 59 | 60 | class MarkdownRequest(BaseModel): 61 | markdown: str 62 | style: Optional[str] = Field(default="default") 63 | file_id: Optional[str] = None 64 | 65 | class GenerationMode(str, Enum): 66 | DIRECT = "direct" 67 | PROMPT = "prompt" 68 | PASTE = "paste" 69 | 70 | class GenerationRequest(BaseModel): 71 | mode: GenerationMode 72 | prompt: Optional[str] = None 73 | template: Optional[str] = None 74 | html_input: Optional[str] = None 75 | style: Optional[str] = Field(default="default") 76 | model: Optional[str] = None 77 | temperature: 
Optional[float] = 0.7 78 | 79 | class GenerationResponseData(BaseModel): 80 | """API的数据响应结构,用于返回生成内容的状态和信息""" 81 | file_id: str 82 | success: bool 83 | html_path: Optional[str] = None 84 | image_path: Optional[str] = None 85 | card_path: Optional[str] = None 86 | raw_llm_response: Optional[str] = None 87 | message: Optional[str] = None 88 | 89 | class SummarizeRequest(BaseModel): 90 | content: str 91 | model: Optional[str] = None 92 | 93 | class SummarizeResponse(BaseModel): 94 | summary: str 95 | success: bool 96 | message: Optional[str] = None 97 | 98 | class WebFetchRequest(BaseModel): 99 | url: str 100 | 101 | class WebFetchResponse(BaseModel): 102 | content: str 103 | success: bool 104 | message: Optional[str] = None 105 | 106 | class GenerationResponse(BaseModel): 107 | file_id: str 108 | html_url: str 109 | pdf_url: str 110 | image_url: Optional[str] = None 111 | 112 | def generate_html_from_markdown(markdown_content: str, style: str, file_id: str, request: Request): 113 | """Generates HTML from Markdown content and saves it.""" 114 | # Render Markdown to basic HTML 115 | basic_html_content = markdown_content 116 | # Use Jinja2 template to wrap HTML content, include CSS based on style 117 | full_html = templates.get_template("card_template.html").render( 118 | {"request": request, "content": basic_html_content, "style": style} 119 | ) 120 | html_path = os.path.join(OUTPUT_DIR, f"{file_id}.html") 121 | with open(html_path, "w", encoding="utf-8") as f: 122 | f.write(full_html) 123 | logger.info(f"HTML file generated: {html_path}") 124 | return html_path 125 | 126 | async def generate_card(payload: GenerationRequest) -> GenerationResponseData: 127 | """根据提供的请求负载生成卡片。""" 128 | # 生成唯一的文件ID 129 | file_id = str(uuid.uuid4()) 130 | llm_raw_response = "" 131 | html_path = "" 132 | 133 | # 根据生成模式处理 134 | if payload.mode == GenerationMode.PROMPT: 135 | # PROMPT模式:首先调用LLM转换Markdown到HTML 136 | if not payload.prompt: 137 | raise HTTPException(status_code=400, detail="PROMPT模式需要提供prompt") 138 | 139 | logger.info(f"处理PROMPT模式 - file_id: {file_id}") 140 | 141 | # 合并用户提示和设计师提示 142 | combined_prompt = USER_PROMPT_WEB_DESIGNER + payload.prompt 143 | 144 | # 保存原始提示到文件 145 | prompt_path = os.path.join(OUTPUT_DIR, f"{file_id}_prompt.txt") 146 | with open(prompt_path, "w", encoding="utf-8") as f: 147 | f.write(payload.prompt) 148 | 149 | try: 150 | # 调用LLM生成内容 151 | logger.info(f"使用模型 '{payload.model or 'default'}' 调用LLM") 152 | 153 | # 使用直接从llm_prompt.py导入的call_ark_llm函数 154 | model_to_use = payload.model or "deepseek-v3-250324" # 默认模型 155 | temperature_to_use = payload.temperature or 0.7 # 默认温度 156 | 157 | # 使用同步调用,放到异步线程中执行 158 | llm_raw_response = await asyncio.to_thread( 159 | call_ark_llm, 160 | prompt=combined_prompt, 161 | model_id=model_to_use, 162 | temperature=temperature_to_use 163 | ) 164 | 165 | # 从LLM响应中提取HTML 166 | html_content = extract_html_from_response(llm_raw_response) 167 | 168 | # 保存提取的HTML到文件 169 | html_path = os.path.join(OUTPUT_DIR, f"{file_id}.html") 170 | with open(html_path, "w", encoding="utf-8") as f: 171 | f.write(html_content) 172 | logger.info(f"HTML内容已保存到: {html_path}") 173 | 174 | except Exception as e: 175 | logger.error(f"通过LLM生成内容时发生错误: {e}", exc_info=True) 176 | raise HTTPException(status_code=500, detail=f"LLM调用期间发生内部服务器错误: {str(e)}") 177 | 178 | elif payload.mode == GenerationMode.PASTE: 179 | # PASTE模式:直接使用请求中提供的HTML内容 180 | if not payload.html_input: 181 | raise HTTPException(status_code=400, detail="PASTE模式需要提供HTML输入") 182 | logger.info(f"处理PASTE模式 - file_id: 
{file_id}") 183 | html_path = os.path.join(OUTPUT_DIR, f"{file_id}.html") 184 | with open(html_path, "w", encoding="utf-8") as f: 185 | f.write(payload.html_input) 186 | logger.info(f"HTML文件已直接保存: {html_path}") 187 | else: 188 | # 无效的生成模式 189 | raise HTTPException(status_code=400, detail="无效的生成模式") 190 | 191 | # 直接从HTML生成图像 192 | image_path = os.path.join(OUTPUT_DIR, f"{file_id}.png") 193 | card_image_path = os.path.join(OUTPUT_DIR, f"{file_id}_card.png") 194 | 195 | # 使用Selenium从HTML生成图像 196 | logger.info(f"使用Selenium从HTML生成图像: {html_path} -> {image_path}") 197 | image_success = html_to_image(html_path, image_path, width=1200) 198 | 199 | if not image_success: 200 | logger.error(f"从HTML生成图像失败: {html_path}") 201 | raise HTTPException(status_code=500, detail="生成图像失败") 202 | 203 | # 从生成的图像提取卡片 204 | logger.info(f"从图像提取卡片: {image_path} -> {card_image_path}") 205 | card_success = extract_card_from_image(image_path, card_image_path, min_area=500, debug=True) 206 | 207 | if not card_success: 208 | logger.warning(f"卡片提取失败,使用原始图像作为备选: {image_path} -> {card_image_path}") 209 | # 如果提取失败,使用原始图像作为备用 210 | import shutil 211 | shutil.copy(image_path, card_image_path) 212 | 213 | # 构造API URL 214 | html_url = f"/api/download-html/{file_id}" 215 | image_url = f"/api/download-image/{file_id}" 216 | 217 | # 构造响应数据 218 | response_data = GenerationResponseData( 219 | file_id=file_id, 220 | success=True, 221 | html_path=html_url, 222 | image_path=image_url, 223 | card_path=image_url, 224 | raw_llm_response=llm_raw_response if payload.mode == GenerationMode.PROMPT else None, 225 | message="卡片生成成功" 226 | ) 227 | 228 | return response_data 229 | 230 | @app.post("/api/generate") 231 | async def generate_files(payload: GenerationRequest): 232 | """接收生成请求,根据模式处理,并返回文件URL。""" 233 | response_data = await generate_card(payload) 234 | return response_data 235 | 236 | @app.get("/") 237 | async def read_root(request: Request): 238 | """Serves the main HTML page.""" 239 | return templates.TemplateResponse("index.html", {"request": request}) 240 | 241 | @app.get("/api/download-html/{file_id}") 242 | async def download_html(file_id: str): 243 | """Serves the generated HTML file.""" 244 | file_path = os.path.join(OUTPUT_DIR, f"{file_id}.html") 245 | if not os.path.exists(file_path): 246 | raise HTTPException(status_code=404, detail="HTML file not found") 247 | return FileResponse(file_path, media_type='text/html', filename=f"{file_id}.html") 248 | 249 | @app.get("/api/download-image/{file_id}") 250 | async def download_image(file_id: str): 251 | """Serves the generated card image file.""" 252 | image_path = os.path.join(OUTPUT_DIR, f"{file_id}.png") 253 | card_image_path = os.path.join(OUTPUT_DIR, f"{file_id}_card.png") 254 | 255 | if not os.path.exists(card_image_path): 256 | logger.warning(f"Card image file {card_image_path} not found. 
Attempting to regenerate.") 257 | html_path = os.path.join(OUTPUT_DIR, f"{file_id}.html") 258 | 259 | if not os.path.exists(html_path): 260 | raise HTTPException(status_code=404, detail="HTML file not found, cannot regenerate image") 261 | 262 | # Generate image directly from HTML using Selenium 263 | logger.info(f"Generating image from HTML {html_path} using Selenium") 264 | if not html_to_image(html_path, image_path, width=1200): 265 | raise HTTPException(status_code=500, detail="Failed to generate image from HTML") 266 | 267 | # Extract card from the generated image 268 | logger.info(f"Extracting card from {image_path} to {card_image_path}") 269 | if not extract_card_from_image(image_path, card_image_path, min_area=500, debug=True): 270 | logger.warning("Card extraction failed, using the full image instead") 271 | # If extraction fails, use the original image as fallback 272 | if os.path.exists(image_path): 273 | import shutil 274 | shutil.copy(image_path, card_image_path) 275 | else: 276 | raise HTTPException(status_code=500, detail="Failed to generate card image") 277 | 278 | if not os.path.exists(card_image_path): 279 | raise HTTPException(status_code=404, detail="Card image file not found") 280 | 281 | return FileResponse(card_image_path, media_type='image/png', filename=f"{file_id}_card.png") 282 | 283 | @app.post("/api/summarize", response_model=SummarizeResponse) 284 | async def summarize_content(summarize_req: SummarizeRequest): 285 | """ 286 | 接收用户内容并生成智能总结 287 | """ 288 | try: 289 | content = summarize_req.content 290 | 291 | if not content: 292 | return SummarizeResponse( 293 | summary="", 294 | success=False, 295 | message="请提供需要总结的内容" 296 | ) 297 | 298 | # 构造总结提示词 299 | summarize_prompt = f"""请对以下内容进行简洁明了的总结,突出关键信息,保持语言简练: 300 | 301 | {content} 302 | 303 | 总结: 304 | """ 305 | 306 | # 调用LLM生成总结 307 | summary = await generate_content_with_llm( 308 | prompt=summarize_prompt, 309 | sys_prompt=SYSTEM_PROMPT_SUMMARIZE_2MD, 310 | model=summarize_req.model, 311 | temperature=0.5 # 使用较低的温度以获得更一致的总结 312 | ) 313 | 314 | return SummarizeResponse( 315 | summary=summary.strip(), 316 | success=True 317 | ) 318 | except Exception as e: 319 | logger.error(f"内容总结失败: {str(e)}") 320 | return SummarizeResponse( 321 | summary="", 322 | success=False, 323 | message=f"总结生成失败: {str(e)}" 324 | ) 325 | 326 | 327 | @app.post("/api/fetch-web", response_model=WebFetchResponse) 328 | async def fetch_web_content(fetch_req: WebFetchRequest): 329 | """ 330 | 使用Jina服务获取网页内容 331 | """ 332 | try: 333 | url = fetch_req.url.strip() 334 | 335 | if not url: 336 | return WebFetchResponse( 337 | content="", 338 | success=False, 339 | message="请提供有效的URL" 340 | ) 341 | 342 | # 构建Jina API请求 343 | jina_url = f"{JINA_API_URL}{url}" 344 | 345 | # 检查是否成功获取API密钥 346 | if not JINA_API_KEY: 347 | logger.error("JINA_API_KEY environment variable not found") 348 | return WebFetchResponse( 349 | content="", 350 | success=False, 351 | message="Jina API密钥未配置,请检查环境变量" 352 | ) 353 | 354 | headers = {'Authorization': f'Bearer {JINA_API_KEY}'} 355 | 356 | logger.info(f"Fetching web content from: {url}") 357 | 358 | # 使用requests获取内容 359 | response = requests.get(jina_url, headers=headers) 360 | response.raise_for_status() # 会抛出异常如果请求失败 361 | 362 | # 提取网页内容 363 | content = response.text 364 | 365 | return WebFetchResponse( 366 | content=content, 367 | success=True 368 | ) 369 | except Exception as e: 370 | logger.error(f"获取网页内容失败: {str(e)}") 371 | return WebFetchResponse( 372 | content="", 373 | success=False, 374 | message=f"获取网页内容失败: 
{str(e)}" 375 | ) 376 | 377 | 378 | # --- Main Execution --- 379 | if __name__ == "__main__": 380 | import uvicorn 381 | if not os.path.exists(OUTPUT_DIR): 382 | os.makedirs(OUTPUT_DIR) 383 | # Startup warnings moved to llm_caller or can be checked here if needed 384 | # Check API keys availability using os.getenv again if critical for startup 385 | if not os.getenv("ARK_API_KEY") and not os.getenv("DEEPSEEK_API_KEY"): 386 | logger.warning("CRITICAL: Neither ARK_API_KEY nor DEEPSEEK_API_KEY environment variables are set. PROMPT mode WILL fail.") 387 | 388 | uvicorn.run(app, host="0.0.0.0", port=8000) -------------------------------------------------------------------------------- /app/requirements.txt: -------------------------------------------------------------------------------- 1 | fastapi 2 | uvicorn[standard] 3 | httpx 4 | jinja2 5 | pyppeteer 6 | opencv-python 7 | numpy 8 | openai 9 | python-dotenv # Added for loading .env files 10 | selenium # Added for browser automation 11 | # webdriver-manager # Removed as WebDriver is now handled directly 12 | requests # Added for HTTP requests to Jina API 13 | -------------------------------------------------------------------------------- /app/schemas.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datawhalechina/smart-card-workshop/fa1a010c3da706684d77578648e750f59f60da1b/app/schemas.py -------------------------------------------------------------------------------- /app/utils.py: -------------------------------------------------------------------------------- 1 | import os 2 | import shutil 3 | 4 | def cleanup_old_files(directory: str, max_age_hours: int = 24): 5 | """清理旧文件""" 6 | # 实现可以根据需要添加 7 | pass -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | fastapi 2 | uvicorn[standard] 3 | httpx 4 | html2image 5 | pyppeteer 6 | jinja2 7 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | """ 3 | 智能卡片工坊环境配置脚本 4 | 此脚本可在不同平台上支持快速设置uv环境并安装依赖 5 | """ 6 | 7 | import os 8 | import subprocess 9 | import sys 10 | import platform 11 | 12 | # 检测操作系统 13 | IS_WINDOWS = platform.system() == "Windows" 14 | VENV_DIR = ".venv" 15 | VENV_BIN = os.path.join(VENV_DIR, "Scripts" if IS_WINDOWS else "bin") 16 | VENV_PYTHON = os.path.join(VENV_BIN, "python") 17 | VENV_UV = os.path.join(VENV_BIN, "uv") 18 | 19 | def run_command(cmd, desc=None, check=True): 20 | """执行命令并输出状态""" 21 | if desc: 22 | print(f">>> {desc}...") 23 | print(f"$ {' '.join(cmd)}") 24 | result = subprocess.run(cmd, check=check) 25 | return result.returncode == 0 26 | 27 | def check_uv(): 28 | """检查uv是否已安装""" 29 | try: 30 | subprocess.run(["uv", "--version"], stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=False) 31 | return True 32 | except FileNotFoundError: 33 | return False 34 | 35 | def install_deps_with_uv(): 36 | """使用uv安装项目依赖""" 37 | # 创建虚拟环境 38 | run_command(["uv", "venv", VENV_DIR], "创建虚拟环境") 39 | 40 | # 安装依赖 41 | uv_pip = [VENV_UV, "pip"] 42 | run_command(uv_pip + ["install", "-r", "app/requirements.txt"], "安装项目依赖") 43 | 44 | # 安装开发依赖(如果存在) 45 | dev_reqs = "app/requirements-dev.txt" 46 | if os.path.exists(dev_reqs): 47 | run_command(uv_pip + ["install", "-r", dev_reqs], "安装开发依赖") 48 | 49 | print("\n✅ 环境设置完成!") 50 | 
print(f"使用以下命令激活环境:\n" 51 | f"{'> ' if IS_WINDOWS else '$ '}" 52 | f"{os.path.join(VENV_DIR, 'Scripts', 'activate') if IS_WINDOWS else f'source {os.path.join(VENV_DIR, 'bin', 'activate')}'}") 53 | print("然后启动应用:\n" 54 | f"$ cd app && uvicorn main:app --reload") 55 | 56 | def install_with_pip(): 57 | """使用标准pip设置虚拟环境并安装依赖""" 58 | # 创建虚拟环境 59 | run_command([sys.executable, "-m", "venv", VENV_DIR], "创建虚拟环境") 60 | 61 | # 安装依赖 62 | pip_cmd = [VENV_PYTHON, "-m", "pip"] 63 | run_command(pip_cmd + ["install", "--upgrade", "pip"], "升级pip") 64 | run_command(pip_cmd + ["install", "-r", "app/requirements.txt"], "安装项目依赖") 65 | 66 | # 安装开发依赖(如果存在) 67 | dev_reqs = "app/requirements-dev.txt" 68 | if os.path.exists(dev_reqs): 69 | run_command(pip_cmd + ["install", "-r", dev_reqs], "安装开发依赖") 70 | 71 | print("\n✅ 环境设置完成!") 72 | print(f"使用以下命令激活环境:\n" 73 | f"{'> ' if IS_WINDOWS else '$ '}" 74 | f"{os.path.join(VENV_DIR, 'Scripts', 'activate') if IS_WINDOWS else f'source {os.path.join(VENV_DIR, 'bin', 'activate')}'}") 75 | print("然后启动应用:\n" 76 | f"$ cd app && uvicorn main:app --reload") 77 | 78 | def main(): 79 | """主函数""" 80 | print("=== 智能卡片工坊环境设置 ===") 81 | print(f"操作系统: {platform.system()} {platform.release()}") 82 | print(f"Python版本: {platform.python_version()}") 83 | 84 | if check_uv(): 85 | print("检测到uv,将使用uv安装依赖") 86 | install_deps_with_uv() 87 | else: 88 | print("未检测到uv,将使用pip安装依赖") 89 | print("推荐使用uv以获得更快的依赖安装体验,运行: pip install uv") 90 | install_with_pip() 91 | 92 | if __name__ == "__main__": 93 | main() 94 | -------------------------------------------------------------------------------- /static/image.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datawhalechina/smart-card-workshop/fa1a010c3da706684d77578648e750f59f60da1b/static/image.png -------------------------------------------------------------------------------- /static/p1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datawhalechina/smart-card-workshop/fa1a010c3da706684d77578648e750f59f60da1b/static/p1.png -------------------------------------------------------------------------------- /static/p2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datawhalechina/smart-card-workshop/fa1a010c3da706684d77578648e750f59f60da1b/static/p2.png -------------------------------------------------------------------------------- /static/p3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datawhalechina/smart-card-workshop/fa1a010c3da706684d77578648e750f59f60da1b/static/p3.png -------------------------------------------------------------------------------- /static/p4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datawhalechina/smart-card-workshop/fa1a010c3da706684d77578648e750f59f60da1b/static/p4.png -------------------------------------------------------------------------------- /templates/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 智能卡片工坊 8 | 9 | 10 | 11 | 226 | 227 | 228 | 229 |
230 |
231 |

232 | 233 | 智能卡片工坊 234 |

235 |

AI内容转卡片专家

236 |
237 |
238 | 239 |
240 |
241 |
242 |
243 |
244 |
生成HTML
245 |
246 |
247 | 265 | 266 |
267 |
268 |
269 | 270 | 272 |
273 |
274 | 275 | 279 |
280 | 281 |
282 | 283 |
284 |
285 | 286 | 288 |
289 | 290 | 291 | 296 | 297 | 298 |
299 | 300 |
301 |
302 | 303 | 305 |
306 | 307 | 308 | 313 | 314 | 319 | 320 | 321 |
322 | 323 |
324 |
325 | 326 | 328 |
329 | 330 |
331 |
332 | 333 |
334 |
335 | Loading... 336 |
337 |

正在生成中,请稍候...

338 |
339 | 340 | 341 |
342 |
LLM原始响应:
343 |
344 |
345 |
346 |
347 | 348 | 360 |
361 | 362 |
363 |
364 |
365 |
卡片预览
366 |
367 |
368 | 369 |
370 |
371 |
372 |
373 |
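上面的 index.html 模板本质上是对 app/main.py 暴露的 /api/generate、/api/summarize、/api/fetch-web 等接口的封装。为便于理解接口契约,这里补充一个绕过前端、直接调用 /api/generate 并下载卡片图片的最小示例草图(编者补充,非仓库原文):请求与响应字段取自 app/main.py 中的 GenerationRequest 与 GenerationResponseData 定义,其中服务地址、提示词内容与保存文件名均为演示假设。

```python
# 最小调用草图:假设服务已按 README 以 `uvicorn app.main:app` 启动在本地 8000 端口
import requests

BASE = "http://localhost:8000"

payload = {
    "mode": "prompt",          # GenerationMode 的三种取值:"direct" / "prompt" / "paste"
    "prompt": "生成一张介绍 FastAPI 的知识卡片",  # 示例提示词,仅作演示
    "style": "default",
    "temperature": 0.7,        # 省略 model 字段时,后端使用默认模型
}

# LLM 生成可能较慢,这里给一个宽松的超时(取值为假设)
resp = requests.post(f"{BASE}/api/generate", json=payload, timeout=600)
resp.raise_for_status()
data = resp.json()             # 对应 GenerationResponseData
print(data["file_id"], data["message"])  # 成功时 message 为"卡片生成成功"

# card_path 形如 /api/download-image/{file_id},返回提取后的卡片 PNG
img = requests.get(f"{BASE}{data['card_path']}", timeout=60)
img.raise_for_status()
with open("card.png", "wb") as f:  # 保存文件名为演示假设
    f.write(img.content)
```

PASTE 模式同理,把现成的 HTML 字符串放入 html_input 字段即可;/api/summarize 与 /api/fetch-web 的请求体也只需相应的 content / url 字段。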
--------------------------------------------------------------------------------
/tools/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/smart-card-workshop/fa1a010c3da706684d77578648e750f59f60da1b/tools/__init__.py
--------------------------------------------------------------------------------
/tools/card.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/smart-card-workshop/fa1a010c3da706684d77578648e750f59f60da1b/tools/card.png
--------------------------------------------------------------------------------
/tools/card_extractor.py:
--------------------------------------------------------------------------------
1 | import cv2
2 | import numpy as np
3 | import os
4 | 
5 | def extract_card_from_image(image_path, output_path, min_area=500, debug=False):
6 |     """
7 |     从图片中提取包含所有文字/内容块的完整卡片区域,优先识别最大内容块。
8 | 
9 |     参数:
10 |         image_path: 输入图片路径
11 |         output_path: 输出卡片图片路径
12 |         min_area: 最小文字块面积(像素)
13 |         debug: 是否保存调试图片
14 |     """
15 |     # 读取图片
16 |     img = cv2.imread(image_path)
17 |     if img is None:
18 |         print(f"无法读取图片: {image_path}")
19 |         return False
20 | 
21 |     original = img.copy()
22 |     height, width = img.shape[:2]
23 | 
24 |     # 假设输入图像来自更高分辨率的源,设置缩放因子
25 |     # 如果图像本身就是原始分辨率,可以设为1
26 |     scale_factor = 2  # 根据实际情况调整
27 | 
28 |     # 根据缩放因子调整参数
29 |     scaled_min_area = min_area * (scale_factor ** 2)
30 | 
31 |     # 转换为灰度图
32 |     gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
33 | 
34 |     # 预处理:高斯模糊 + 自适应阈值
35 |     blur_kernel_size = max(3, int(3 * scale_factor))
36 |     if blur_kernel_size % 2 == 0: blur_kernel_size += 1
37 |     blurred = cv2.GaussianBlur(gray, (blur_kernel_size, blur_kernel_size), 0)
38 | 
39 |     block_size = max(11, int(11 * scale_factor))
40 |     if block_size % 2 == 0: block_size += 1
41 |     thresh = cv2.adaptiveThreshold(blurred, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
42 |                                    cv2.THRESH_BINARY_INV, block_size, 2)
43 | 
44 |     if debug:
45 |         cv2.imwrite(output_path.replace('.png', '_thresh.png'), thresh)
46 | 
47 |     # 形态学操作连接文字区域
48 |     kernel_h_size = max(15, int(15 * scale_factor))
49 |     kernel_v_size = max(15, int(15 * scale_factor))
50 |     kernel_h = np.ones((1, kernel_h_size), np.uint8)
51 |     kernel_v = np.ones((kernel_v_size, 1), np.uint8)
52 |     connected = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel_h)
53 |     connected = cv2.morphologyEx(connected, cv2.MORPH_CLOSE, kernel_v)
54 | 
55 |     if debug:
56 |         cv2.imwrite(output_path.replace('.png', '_connected.png'), connected)
57 | 
58 |     # 查找轮廓
59 |     contours, _ = cv2.findContours(connected, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
60 | 
61 |     # --- 新思路:找到最大的轮廓作为主要卡片区域 ---
62 |     largest_contour = None
63 |     max_area = 0
64 |     all_contours_bbox = []  # 存储所有有效轮廓的边界框
65 | 
66 |     if contours:
67 |         # 过滤掉面积过小的轮廓,并记录边界框
68 |         valid_contours = []
69 |         for contour in contours:
70 |             area = cv2.contourArea(contour)
71 |             if area >= scaled_min_area:
72 |                 valid_contours.append(contour)
73 |                 all_contours_bbox.append(cv2.boundingRect(contour))
74 |                 if area > max_area:
75 |                     max_area = area
76 |                     largest_contour = contour
77 | 
78 |     if largest_contour is not None:
79 |         # 获取最大轮廓的边界框
80 |         card_x, card_y, card_w, card_h = cv2.boundingRect(largest_contour)
81 | 
82 |         if debug:
83 |             debug_img = original.copy()
84 |             cv2.drawContours(debug_img, [largest_contour], -1, (0, 255, 0), 3)
85 |             cv2.rectangle(debug_img, (card_x, card_y), (card_x + card_w, card_y + card_h), (0, 0, 255), 2)
86 | 
cv2.imwrite(output_path.replace('.png', '_largest_contour.png'), debug_img) 87 | 88 | # --- 微调边界以包含所有检测到的内容 --- 89 | if all_contours_bbox: 90 | min_x, min_y = card_x, card_y 91 | max_x, max_y = card_x + card_w, card_y + card_h 92 | 93 | for x_c, y_c, w_c, h_c in all_contours_bbox: 94 | min_x = min(min_x, x_c) 95 | min_y = min(min_y, y_c) 96 | max_x = max(max_x, x_c + w_c) 97 | max_y = max(max_y, y_c + h_c) 98 | 99 | # 更新卡片边界以包含所有内容 100 | card_x, card_y = min_x, min_y 101 | card_w = max_x - min_x 102 | card_h = max_y - min_y 103 | 104 | # --- 添加边距 --- 105 | # 根据卡片尺寸动态添加边距 106 | padding_h = int(width * 0.015) # 水平边距 1.5% 107 | padding_v = int(height * 0.015) # 垂直边距 1.5% 108 | 109 | x_start = max(0, card_x - padding_h) 110 | y_start = max(0, card_y - padding_v) 111 | x_end = min(width, card_x + card_w + padding_h) 112 | y_end = min(height, card_y + card_h + padding_v) 113 | 114 | # 提取卡片区域 115 | card = original[y_start:y_end, x_start:x_end] 116 | 117 | # 保存结果 118 | cv2.imwrite(output_path, card) 119 | print(f"卡片已提取保存到 {output_path} (基于最大内容块)") 120 | return True 121 | else: 122 | print("未找到足够大的内容轮廓") 123 | else: 124 | print("未找到任何轮廓") 125 | 126 | # 如果主要方法失败,尝试使用备用方法(如果需要,可以重新启用) 127 | print("主要方法失败,尝试备用方法...") 128 | 129 | # --- MSER备用方法 --- 130 | mser = cv2.MSER_create() 131 | regions, _ = mser.detectRegions(gray) 132 | if regions: 133 | # ... (此处省略MSER逻辑,与之前类似,如果需要可以恢复) ... 134 | # 如果MSER成功,保存并返回True 135 | pass 136 | 137 | # --- Otsu备用方法 --- 138 | _, otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU) 139 | kernel = np.ones((5, 5), np.uint8) 140 | morph = cv2.morphologyEx(otsu, cv2.MORPH_CLOSE, kernel) 141 | otsu_contours, _ = cv2.findContours(morph, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) 142 | if otsu_contours: 143 | # ... (此处省略Otsu逻辑,与之前类似,如果需要可以恢复) ... 144 | # 如果Otsu成功,保存并返回True 145 | pass 146 | 147 | # 所有方法失败 148 | print("所有方法均无法提取卡片内容区域,返回原始图像并进行超分处理") 149 | # 使用线性插值将原始图像放大3倍 150 | h, w = original.shape[:2] 151 | resized_original = cv2.resize(original, (w * 3, h * 3), interpolation=cv2.INTER_LINEAR) 152 | cv2.imwrite(output_path, resized_original) 153 | return False 154 | 155 | # 示例用法 156 | if __name__ == "__main__": 157 | # 测试函数 158 | input_image = "path/to/your/input_image.png" # 替换为实际图片路径 159 | output_image = "path/to/your/output_card.png" # 替换为实际输出路径 160 | 161 | if os.path.exists(input_image): 162 | extract_card_from_image(input_image, output_image, debug=True) 163 | else: 164 | print(f"测试图片不存在: {input_image}") 165 | -------------------------------------------------------------------------------- /tools/html2pdf.py: -------------------------------------------------------------------------------- 1 | import pdfkit 2 | 3 | def html_to_pdf(html_path, output_path, options=None): 4 | """ 5 | 将HTML转换为PDF 6 | 7 | 参数: 8 | html_path: HTML文件路径或HTML字符串 9 | output_path: 输出PDF路径(如'output.pdf') 10 | options: 可选配置字典 11 | """ 12 | default_options = { 13 | 'encoding': "UTF-8", 14 | 'quiet': '', 15 | 'enable-local-file-access': None # 允许访问本地文件 16 | } 17 | 18 | if options: 19 | default_options.update(options) 20 | 21 | try: 22 | # 判断输入是文件还是HTML字符串 23 | if html_path.endswith('.html'): 24 | pdfkit.from_file(html_path, output_path, options=default_options) 25 | else: 26 | pdfkit.from_string(html_path, output_path, options=default_options) 27 | 28 | print(f"PDF已成功保存到 {output_path}") 29 | except Exception as e: 30 | print(f"转换失败: {str(e)}") 31 | 32 | # 使用示例 33 | html_to_pdf('./1c6ccb00-f117-4cc0-af04-90b008c2744c.html', 'output_weasyprint.pdf')# 或直接使用HTML字符串 34 | # html_to_pdf('
<h1>Hello World</h1>
', 'output.pdf') 35 | 36 | -------------------------------------------------------------------------------- /tools/html2pic.py: -------------------------------------------------------------------------------- 1 | import imgkit 2 | 3 | def html_to_image(html_path, output_path, options=None): 4 | """ 5 | 将HTML文件转换为图片 6 | 7 | 参数: 8 | html_path: HTML文件路径或HTML字符串 9 | output_path: 输出图片路径(如'output.png') 10 | options: 可选配置字典 11 | """ 12 | default_options = { 13 | 'encoding': "UTF-8", 14 | 'enable-local-file-access': None, 15 | 'custom-header': [ 16 | ('Accept-Language', 'zh-CN,zh;q=0.9') 17 | ], 18 | 'quiet': None, 19 | 'quality':100, 20 | 'disable-smart-width': None, 21 | 'width': 500, 22 | 'minimum-font-size': 16, 23 | 'zoom': 100, 24 | } 25 | 26 | if options: 27 | default_options.update(options) 28 | 29 | try: 30 | imgkit.from_file(html_path, output_path, options=default_options) 31 | print(f"图片已成功保存到 {output_path},已优化中文字体和图像清晰度") 32 | except Exception as e: 33 | print(f"转换失败: {str(e)}") 34 | 35 | # 使用示例 36 | html_to_image('./1c6ccb00-f117-4cc0-af04-90b008c2744c.html', 'output.png') -------------------------------------------------------------------------------- /tools/html2pic2.py: -------------------------------------------------------------------------------- 1 | from selenium import webdriver 2 | from selenium.webdriver.chrome.options import Options 3 | import os 4 | 5 | def html_to_image_selenium(html_path, output_path): 6 | """ 7 | 使用selenium将HTML转换为图片,并根据内容自动调整尺寸 8 | 9 | 参数: 10 | html_path: HTML文件路径或URL 11 | output_path: 输出图片路径 12 | """ 13 | chrome_options = Options() 14 | chrome_options.add_argument('--headless') 15 | chrome_options.add_argument('--disable-gpu') 16 | chrome_options.add_argument('--no-sandbox') 17 | 18 | try: 19 | driver = webdriver.Chrome(options=chrome_options) 20 | driver.get(f'file://{os.path.abspath(html_path)}' if html_path.endswith('.html') else html_path) 21 | 22 | # 获取内容尺寸,而不是整个body 23 | width = driver.execute_script("return Math.max(document.documentElement.scrollWidth, document.documentElement.clientWidth);") 24 | height = driver.execute_script("return Math.max(document.documentElement.scrollHeight, document.documentElement.clientHeight);") 25 | 26 | # 分析内容是否需要额外边距 27 | has_card = driver.execute_script("return document.querySelector('.card') !== null;") 28 | if has_card: 29 | # 如果有卡片元素,添加一些额外边距 30 | width += 40 31 | height += 40 32 | 33 | driver.set_window_size(width, height) 34 | driver.save_screenshot(output_path) 35 | print(f"图片已成功保存到 {output_path},尺寸: {width}x{height}") 36 | except Exception as e: 37 | print(f"转换失败: {str(e)}") 38 | finally: 39 | if 'driver' in locals(): 40 | driver.quit() 41 | 42 | # 使用示例 43 | html_to_image_selenium('./1c6ccb00-f117-4cc0-af04-90b008c2744c.html', 'output_selenium.png') -------------------------------------------------------------------------------- /tools/llm_caller.py: -------------------------------------------------------------------------------- 1 | import os 2 | import asyncio 3 | import logging 4 | import httpx 5 | from openai import OpenAI 6 | from dotenv import load_dotenv 7 | from fastapi import HTTPException # Re-import HTTPException if needed for raising errors 8 | load_dotenv() 9 | # Configure logging (can inherit from main app or configure separately) 10 | logger = logging.getLogger(__name__) 11 | # Ensure logger is configured if this module is run independently or before main app config 12 | if not logger.hasHandlers(): 13 | logging.basicConfig(level=logging.INFO) 14 | 15 | # Load environment variables 
directly (assuming dotenv is called in the main script) 16 | # LLM Configuration (Original - can be fallback or replaced) 17 | DEEPSEEK_API_KEY = os.getenv("DEEPSEEK_API_KEY") 18 | DEEPSEEK_API_URL = os.getenv("DEEPSEEK_API_URL", "https://api.deepseek.com/v1/chat/completions") # Provide default URL 19 | 20 | # LLM Configuration (New - Ark Platform via OpenAI client) 21 | ARK_API_KEY = os.getenv("ARK_API_KEY") 22 | ARK_BASE_URL = os.getenv("ARK_BASE_URL", "https://ark.cn-beijing.volces.com/api/v3") # Provide default URL 23 | 24 | # Default model IDs (can be overridden by caller) 25 | DEFAULT_ARK_MODEL = "deepseek-v3-250324" 26 | DEFAULT_ORIGINAL_MODEL = "ep-m-20250330105359-r7wqp" # Or "deepseek-chat" if preferred 27 | 28 | # --- Internal LLM Call Functions --- 29 | 30 | async def _call_original_llm(prompt: str, model: str, temperature: float) -> str: 31 | """Internal function to call the LLM using the original httpx method.""" 32 | if not DEEPSEEK_API_KEY: 33 | logger.error("DEEPSEEK_API_KEY environment variable not set for original LLM call.") 34 | # Raise HTTPException directly if this function needs to interact with FastAPI error handling 35 | # Or return an error indicator / raise a custom exception 36 | raise HTTPException(status_code=500, detail="LLM API key not configured (original method)") 37 | 38 | headers = { 39 | "Authorization": f"Bearer {DEEPSEEK_API_KEY}", 40 | "Content-Type": "application/json", 41 | } 42 | payload = { 43 | "model": model, 44 | "messages": [{"role": "user", "content": prompt}], 45 | "temperature": temperature, 46 | } 47 | 48 | async with httpx.AsyncClient(timeout=60.0) as client: 49 | try: 50 | response = await client.post(DEEPSEEK_API_URL, headers=headers, json=payload) 51 | response.raise_for_status() 52 | data = response.json() 53 | if data.get('choices') and len(data['choices']) > 0: 54 | return data['choices'][0]['message']['content'] 55 | else: 56 | logger.error(f"LLM API response missing expected data: {data}") 57 | raise HTTPException(status_code=500, detail="Invalid LLM API response (original method)") 58 | except httpx.RequestError as e: 59 | logger.error(f"Error calling original LLM API: {e}") 60 | raise HTTPException(status_code=500, detail=f"Failed to connect to LLM API (original method): {e}") 61 | except httpx.HTTPStatusError as e: 62 | logger.error(f"Original LLM API request failed: {e.response.status_code} - {e.response.text}") 63 | raise HTTPException(status_code=e.response.status_code, detail=f"LLM API error (original method): {e.response.text}") 64 | 65 | async def _call_ark_llm(prompt: str, model: str, temperature: float, sys_prompt: str = None) -> str: 66 | """Internal function to call the Ark LLM platform using the OpenAI client.""" 67 | if not ARK_API_KEY: 68 | logger.error("ARK_API_KEY environment variable not set.") 69 | raise HTTPException(status_code=500, detail="LLM API key not configured (Ark method)") 70 | 71 | try: 72 | client = OpenAI( 73 | api_key=ARK_API_KEY, 74 | base_url=ARK_BASE_URL, 75 | timeout=1800.0, # 1800 seconds = 30 minutes 76 | ) 77 | 78 | logger.info(f"Sending request to Ark LLM. 
Model: {model}, Temperature: {temperature}") 79 | 80 | messages = [] 81 | if sys_prompt: 82 | messages.append({"role": "system", "content": sys_prompt}) 83 | messages.append({"role": "user", "content": prompt}) 84 | 85 | response = await asyncio.to_thread( 86 | client.chat.completions.create, 87 | model=model, # User specifies the Ark model ID here 88 | messages=messages, 89 | temperature=temperature 90 | ) 91 | 92 | if hasattr(response.choices[0].message, 'reasoning_content'): 93 | logger.info(f"LLM Reasoning Content: {response.choices[0].message.reasoning_content}") 94 | 95 | return response.choices[0].message.content 96 | 97 | except Exception as e: # Catch potential OpenAI client errors 98 | logger.error(f"Error calling Ark LLM API: {e}", exc_info=True) 99 | raise HTTPException(status_code=500, detail=f"Failed to call LLM API (Ark method): {e}") 100 | 101 | 102 | # --- Public Function --- 103 | 104 | async def generate_content_with_llm(prompt: str, model: str | None = None, temperature: float = 0.7, sys_prompt: str = None) -> str: 105 | """ 106 | Generates content using the appropriate LLM based on available API keys. 107 | 108 | Args: 109 | prompt: The input prompt for the LLM. 110 | model: The specific model ID to use. If None, uses defaults based on API key. 111 | temperature: The generation temperature. 112 | sys_prompt: Optional system prompt to use for the LLM request. 113 | 114 | Returns: 115 | The generated content string. 116 | 117 | Raises: 118 | HTTPException: If API keys are missing or API calls fail. 119 | """ 120 | if ARK_API_KEY: 121 | logger.info("ARK_API_KEY found, using Ark LLM method.") 122 | effective_model = model or DEFAULT_ARK_MODEL 123 | logger.info(f"Calling Ark LLM with model: {effective_model}") 124 | return await _call_ark_llm(prompt, effective_model, temperature, sys_prompt) 125 | elif DEEPSEEK_API_KEY: 126 | logger.warning("ARK_API_KEY not found, falling back to original LLM method.") 127 | effective_model = model or DEFAULT_ORIGINAL_MODEL 128 | logger.info(f"Calling original LLM with model: {effective_model}") 129 | # Original method doesn't support system prompt yet 130 | return await _call_original_llm(prompt, effective_model, temperature) 131 | else: 132 | logger.error("Neither ARK_API_KEY nor DEEPSEEK_API_KEY are set.") 133 | raise HTTPException(status_code=500, detail="No LLM API Key configured.") 134 | -------------------------------------------------------------------------------- /tools/llm_prompt.py: -------------------------------------------------------------------------------- 1 | import os 2 | import logging 3 | from openai import OpenAI 4 | from dotenv import load_dotenv 5 | from .prompt_config import SYSTEM_PROMPT_WEB_DESIGNER, USER_PROMPT_WEB_DESIGNER, SYSTEM_PROMPT_SUMMARIZE_2MD 6 | import re 7 | 8 | # Load environment variables from a .env file if present 9 | load_dotenv() 10 | 11 | # Configure logging 12 | logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s') 13 | logger = logging.getLogger(__name__) 14 | 15 | 16 | 17 | def call_ark_llm(prompt: str, sys_prompt:str = SYSTEM_PROMPT_WEB_DESIGNER,model_id: str = "deepseek-v3-250324", temperature: float = 0.7) -> str: 18 | """ 19 | Calls the Ark platform LLM using the OpenAI client library. 20 | 21 | Args: 22 | prompt (str): The user prompt to send to the LLM. 23 | model_id (str): The model ID to use (e.g., "deepseek-r1-250120"). 24 | Defaults to "deepseek-r1-250120". 25 | temperature (float): Controls randomness. Lower is more deterministic. 
26 |                      Defaults to 0.7.
27 | 
28 |     Returns:
29 |         str: The content of the LLM's response message.
30 | 
31 |     Raises:
32 |         ValueError: If the ARK_API_KEY environment variable is not set.
33 |         Exception: If the API call fails.
34 |     """
35 |     api_key = os.environ.get("ARK_API_KEY")
36 |     # Use the provided base_url or default to the one in the example
37 |     base_url = os.environ.get("ARK_BASE_URL", "https://ark.cn-beijing.volces.com/api/v3")
38 | 
39 |     if not api_key:
40 |         logger.error("ARK_API_KEY environment variable not found.")
41 |         raise ValueError("ARK_API_KEY environment variable must be set.")
42 | 
43 |     try:
44 |         client = OpenAI(
45 |             api_key=api_key,
46 |             base_url=base_url,
47 |             # Set a long timeout as recommended for potentially long-running models
48 |             timeout=1800.0,  # 1800 seconds = 30 minutes
49 |         )
50 | 
51 |         logger.info(f"Sending request to Ark LLM. Model: {model_id}, Temperature: {temperature}")
52 |         response = client.chat.completions.create(
53 |             model=model_id,
54 |             messages=[
55 |                 # You can add a system prompt here if needed:
56 |                 {"role": "system", "content": sys_prompt},
57 |                 {"role": "user", "content": prompt}
58 |             ],
59 |             temperature=temperature,
60 |             # Add other parameters like max_tokens if necessary
61 |             # max_tokens=1024,
62 |         )
63 | 
64 |         message = response.choices[0].message
65 | 
66 |         # Check for and log reasoning content if the model provides it
67 |         if hasattr(message, 'reasoning_content') and message.reasoning_content:
68 |             logger.info("LLM Reasoning Content Detected:")
69 |             # You might want to print this or handle it differently
70 |             print("--- Reasoning Content ---")
71 |             print(message.reasoning_content)
72 |             print("--- End Reasoning Content ---")
73 | 
74 | 
75 |         content = message.content
76 |         logger.info("Successfully received response from Ark LLM.")
77 |         return content
78 | 
79 |     except Exception as e:
80 |         logger.error(f"Error during Ark LLM API call: {e}", exc_info=True)
81 |         # Re-raise the exception to be handled by the caller
82 |         raise Exception(f"Failed to get response from Ark LLM: {e}")
83 | 
84 | def extract_html_from_response(response_text):
85 |     """
86 |     从LLM响应中提取HTML内容
87 | 
88 |     参数:
89 |         response_text: LLM返回的完整文本
90 | 
91 |     返回:
92 |         提取出的HTML内容,如果没有找到则返回原始文本
93 |     """
94 |     # 尝试匹配完整的HTML文档(从 <html> 或 <!DOCTYPE html> 开始到 </html> 结束)
95 |     html_pattern = r'(?:<html[^>]*>|<!DOCTYPE html[^>]*>)[\s\S]*?</html>'
96 |     match = re.search(html_pattern, response_text, re.IGNORECASE)
97 | 
98 |     if match:
99 |         return match.group(0)
100 | 
101 |     # 如果没有找到完整HTML,尝试匹配被代码块包围的HTML
102 |     code_block_pattern = r'```(?:html)?\s*((?:<html[^>]*>|<!DOCTYPE html[^>]*>)[\s\S]*?</html>)\s*```'
103 |     match = re.search(code_block_pattern, response_text, re.IGNORECASE)
104 | 
105 |     if match:
106 |         return match.group(1)
107 | 
108 |     # 如果仍然没有找到,尝试匹配任何HTML片段
109 |     html_fragment_pattern = r'<[^>]+>[\s\S]*?</[^>]+>'
110 |     match = re.search(html_fragment_pattern, response_text)
111 | 
112 |     if match:
113 |         return match.group(0)
114 | 
115 |     # 如果所有模式都没有匹配,返回原始文本
116 |     return response_text
117 | 
118 | # --- Example Usage ---
119 | if __name__ == "__main__":
120 |     # Make sure you have a .env file in the same directory
121 |     # with your ARK_API_KEY, like:
122 |     # ARK_API_KEY="your_actual_ark_api_key_here"
123 |     # ARK_BASE_URL="https://ark.cn-beijing.volces.com/api/v3" # Optional, if different
124 | 
125 |     # Define your prompt
126 |     my_prompt = USER_PROMPT_WEB_DESIGNER + "解释一下什么是大型语言模型 (LLM),并举例说明其应用。"
127 |     specific_model = "deepseek-v3-250324" # Or another model you have access to
128 | 
129 |     try:
130 |         # Call the function with your prompt
131 |         llm_response = call_ark_llm(prompt=my_prompt, model_id=specific_model)
132 | 
133 |         # 提取HTML内容
134 |         html_content = extract_html_from_response(llm_response)
135 | 
136 |         # 将HTML内容保存到文件
137 |         html_file_path = "llm_generated.html"
138 |         with open(html_file_path, "w", encoding="utf-8") as f:
139 |             f.write(html_content)
140 |         print(f"HTML内容已保存到: {html_file_path}")
141 | 
142 |         # Print the result
143 |         print("\n--- LLM Response ---")
144 |         print(llm_response)
145 | 
146 |         print("--- End LLM Response ---")
147 | 
148 |     except ValueError as ve:
149 |         print(f"Configuration Error: {ve}")
150 |     except Exception as ex:
151 |         print(f"An error occurred: {ex}")
--------------------------------------------------------------------------------
/tools/output.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/smart-card-workshop/fa1a010c3da706684d77578648e750f59f60da1b/tools/output.png
--------------------------------------------------------------------------------
/tools/output_images/card_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/smart-card-workshop/fa1a010c3da706684d77578648e750f59f60da1b/tools/output_images/card_1.png
--------------------------------------------------------------------------------
/tools/output_images/card_1_card_boundary.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/smart-card-workshop/fa1a010c3da706684d77578648e750f59f60da1b/tools/output_images/card_1_card_boundary.png
--------------------------------------------------------------------------------
/tools/output_images/card_1_connected.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/smart-card-workshop/fa1a010c3da706684d77578648e750f59f60da1b/tools/output_images/card_1_connected.png
--------------------------------------------------------------------------------
/tools/output_images/card_1_content_regions.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/smart-card-workshop/fa1a010c3da706684d77578648e750f59f60da1b/tools/output_images/card_1_content_regions.png -------------------------------------------------------------------------------- /tools/output_images/card_1_thresh.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datawhalechina/smart-card-workshop/fa1a010c3da706684d77578648e750f59f60da1b/tools/output_images/card_1_thresh.png -------------------------------------------------------------------------------- /tools/output_images/page_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datawhalechina/smart-card-workshop/fa1a010c3da706684d77578648e750f59f60da1b/tools/output_images/page_1.png -------------------------------------------------------------------------------- /tools/output_selenium.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datawhalechina/smart-card-workshop/fa1a010c3da706684d77578648e750f59f60da1b/tools/output_selenium.png -------------------------------------------------------------------------------- /tools/pdf2card.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import numpy as np 3 | from pdf2image import convert_from_path 4 | import os 5 | from PIL import Image 6 | from card_extractor import extract_card_from_image # 导入从单独文件中拆分出的函数 7 | import concurrent.futures 8 | import time # For timing comparisons 9 | 10 | # Define a helper function to check if a row is white 11 | def is_white_row(row, threshold=245): 12 | """Checks if a row in an image is predominantly white.""" 13 | # If the row's mean color value is higher than the threshold, consider it white. 14 | # Using np.all might be faster if strict white is expected, but mean is robust to noise. 15 | return np.mean(row) > threshold 16 | 17 | # Define the image processing logic for a single page 18 | def process_page_image(args): 19 | """Processes a single page image: scales (optional), removes borders.""" 20 | i, img_pil, total_pages, scale_factor = args 21 | try: 22 | # Convert PIL Image to NumPy array for processing 23 | img_np = np.array(img_pil) 24 | 25 | # --- Optional Scaling (kept scale_factor=1 as per original code) --- 26 | if scale_factor != 1: 27 | try: 28 | # print(f"Applying {scale_factor}x linear interpolation scaling to page {i+1}...") # Can be verbose 29 | h, w = img_np.shape[:2] 30 | new_h, new_w = int(h * scale_factor), int(w * scale_factor) 31 | img_np = cv2.resize(img_np, (new_w, new_h), interpolation=cv2.INTER_LINEAR) 32 | # print(f"Scaling complete for page {i+1}.") 33 | except Exception as scale_error: 34 | print(f"\nWarning: Scaling failed for page {i+1} ({scale_error}). 
Using original resolution.") 35 | # --- End Scaling --- 36 | 37 | # Find the top border (skip for the first page) 38 | top_crop = 0 39 | if i > 0: # Only crop top if not the first page 40 | for y in range(img_np.shape[0]): 41 | if not is_white_row(img_np[y]): 42 | top_crop = y 43 | break 44 | # Add a small safety margin if needed, e.g., top_crop = max(0, top_crop - 2) 45 | 46 | # Find the bottom border (skip for the last page) 47 | bottom_crop = img_np.shape[0] 48 | if i < total_pages - 1: # Only crop bottom if not the last page 49 | for y in range(img_np.shape[0] - 1, top_crop -1, -1): # Search down to top_crop 50 | if not is_white_row(img_np[y]): 51 | bottom_crop = y + 1 # Crop below the last content row 52 | break 53 | # Add a small safety margin if needed, e.g., bottom_crop = min(img_np.shape[0], bottom_crop + 2) 54 | 55 | 56 | # Crop the image 57 | # print(f"Page {i+1}: Original Shape: {img_np.shape}, Cropping: top={top_crop}, bottom={bottom_crop}") # Debugging info 58 | if top_crop >= bottom_crop: 59 | print(f"Warning: Page {i+1} cropping resulted in empty image (top={top_crop}, bottom={bottom_crop}). Skipping crop.") 60 | cropped_img_np = img_np # Keep original if crop is invalid 61 | else: 62 | cropped_img_np = img_np[top_crop:bottom_crop] 63 | # print(f"Page {i+1}: Cropped Shape: {cropped_img_np.shape}") # Debugging info 64 | 65 | # Return index and processed NumPy array (avoids PIL conversion overhead here) 66 | return i, cropped_img_np 67 | 68 | except Exception as e: 69 | print(f"Error processing page {i+1}: {e}") 70 | # Return original image data if processing fails 71 | return i, np.array(img_pil) 72 | 73 | 74 | def pdf_to_images_optimized(pdf_path, output_folder, dpi=1000): 75 | """ 76 | Converts PDF pages to a single vertically stitched image, removing borders between pages. 77 | Optimized for speed using parallel processing. 78 | 79 | Args: 80 | pdf_path (str): Path to the PDF file. 81 | output_folder (str): Folder to save the output image. 82 | dpi (int): Resolution for PDF conversion. IMPORTANT: Higher DPI significantly impacts 83 | performance and memory. 300 is often sufficient, 1000 is very high. 84 | Consider lowering this value (e.g., 150-300) for speed. Default is 1000 for compatibility. 85 | 86 | Returns: 87 | str: Path to the combined image, or None if conversion fails. 
88 | """ 89 | if not os.path.exists(output_folder): 90 | os.makedirs(output_folder) 91 | 92 | try: 93 | start_time = time.time() 94 | print(f"Starting PDF conversion (DPI={dpi})...") 95 | # Use thread_count for potential speedup in pdf2image/poppler 96 | # Use more threads if your system can handle it (e.g., os.cpu_count()) 97 | images_pil = convert_from_path(pdf_path, dpi=dpi, thread_count=4) # Adjust thread_count as needed 98 | conversion_time = time.time() - start_time 99 | print(f"PDF conversion finished in {conversion_time:.2f} seconds, got {len(images_pil)} pages.") 100 | 101 | if not images_pil: 102 | print("No images were generated from the PDF.") 103 | return None 104 | 105 | processed_results = {} 106 | total_pages = len(images_pil) 107 | scale_factor = 1 # Keep scale factor as 1 as per original requirement 108 | 109 | print("Starting parallel image processing...") 110 | start_processing_time = time.time() 111 | # Use ThreadPoolExecutor for parallel processing of pages 112 | # Adjust max_workers based on your CPU cores 113 | with concurrent.futures.ThreadPoolExecutor(max_workers=os.cpu_count()) as executor: 114 | # Prepare arguments for each task 115 | tasks_args = [(i, images_pil[i], total_pages, scale_factor) for i in range(total_pages)] 116 | # Map the processing function to the arguments 117 | futures = {executor.submit(process_page_image, arg): i for i, arg in enumerate(tasks_args)} 118 | 119 | for future in concurrent.futures.as_completed(futures): 120 | original_index = futures[future] 121 | try: 122 | # Get the result (index, processed_image_np) 123 | idx, processed_image_np = future.result() 124 | processed_results[idx] = processed_image_np # Store with original index 125 | print(f" Processed page {idx + 1}/{total_pages}") 126 | except Exception as exc: 127 | print(f'Page {original_index + 1} generated an exception: {exc}') 128 | # Optionally handle failure, e.g., use original image 129 | processed_results[original_index] = np.array(images_pil[original_index]) 130 | 131 | 132 | processing_time = time.time() - start_processing_time 133 | print(f"Parallel image processing finished in {processing_time:.2f} seconds.") 134 | 135 | # Ensure results are sorted by original page order 136 | sorted_processed_images_np = [processed_results[i] for i in range(total_pages) if i in processed_results] 137 | 138 | if not sorted_processed_images_np: 139 | print("Image processing failed for all pages.") 140 | return None 141 | 142 | # Calculate final image dimensions from NumPy arrays 143 | widths = [img_np.shape[1] for img_np in sorted_processed_images_np] 144 | heights = [img_np.shape[0] for img_np in sorted_processed_images_np] 145 | max_width = max(widths) if widths else 0 146 | total_height = sum(heights) if heights else 0 147 | 148 | if max_width == 0 or total_height == 0: 149 | print("Calculated final image dimensions are zero. Cannot proceed.") 150 | return None 151 | 152 | print(f"Creating final combined image (Width: {max_width}, Height: {total_height})...") 153 | start_combining_time = time.time() 154 | 155 | # Create the combined image using PIL 156 | # Ensure 3 channels (RGB) for the final image. Find the mode from the first valid image. 
 157 |         output_mode = 'RGB'
 158 |         if sorted_processed_images_np[0].ndim == 3:
 159 |             channels = sorted_processed_images_np[0].shape[2]
 160 |             if channels == 4:
 161 |                 output_mode = 'RGBA'  # Keep alpha if present
 162 |             elif channels == 1:
 163 |                 output_mode = 'L'  # Grayscale
 164 |         elif sorted_processed_images_np[0].ndim == 2:
 165 |             output_mode = 'L'  # Grayscale
 166 | 
 167 |         # If any page is in color, promote the output mode accordingly
 168 |         for img_np in sorted_processed_images_np:
 169 |             if img_np.ndim == 3 and img_np.shape[2] == 3:
 170 |                 output_mode = 'RGB'
 171 |                 break
 172 |             if img_np.ndim == 3 and img_np.shape[2] == 4:
 173 |                 output_mode = 'RGBA'  # Note alpha if found
 174 |                 # Don't break yet; a later page may still force RGB
 175 | 
 176 |         # Mixing modes across pages is error-prone, and pasting RGBA onto an RGB
 177 |         # canvas drops transparency anyway. So unless every page is grayscale,
 178 |         # flatten the final canvas to RGB.
 179 |         final_mode = 'RGB' if output_mode != 'L' else 'L'
 180 | 
 181 | 
 182 |         # Create a blank canvas with a white background
 183 |         combined_image = Image.new(final_mode, (max_width, total_height), color='white')
 184 | 
 185 |         current_y = 0
 186 |         for img_np in sorted_processed_images_np:
 187 |             # Convert the NumPy array back to a PIL Image for pasting
 188 |             img_pil_to_paste = Image.fromarray(img_np)
 189 | 
 190 |             # Ensure the image matches the combined image's mode before pasting
 191 |             if img_pil_to_paste.mode != combined_image.mode:
 192 |                 img_pil_to_paste = img_pil_to_paste.convert(combined_image.mode)
 193 | 
 194 |             # Center pages that are narrower than the canvas
 195 |             paste_x = (max_width - img_pil_to_paste.width) // 2
 196 |             combined_image.paste(img_pil_to_paste, (paste_x, current_y))
 197 |             current_y += img_pil_to_paste.height  # Advance by the actual pasted height
 198 | 
 199 |         combining_time = time.time() - start_combining_time
 200 |         print(f"Image combination finished in {combining_time:.2f} seconds.")
 201 | 
 202 |         # Save the combined image
 203 |         pdf_filename = os.path.splitext(os.path.basename(pdf_path))[0]
 204 |         combined_image_path = os.path.join(output_folder, f'{pdf_filename}_combined_optimized.png')
 205 |         combined_image.save(combined_image_path, 'PNG')
 206 |         total_time = time.time() - start_time
 207 |         print(f"Optimized processing complete. Total time: {total_time:.2f} seconds.")
 208 |         print(f"Combined image saved to: {combined_image_path}")
 209 | 
 210 |         return combined_image_path
 211 | 
 212 |     except Exception as e:
 213 |         import traceback
 214 |         print(f"Error in optimized PDF to combined image conversion: {e}")
 215 |         traceback.print_exc()  # Print a detailed traceback
 216 |         return None
 217 | 
 218 | # --- Main execution block: run the optimized pipeline end to end ---
 219 | if __name__ == "__main__":
 220 |     # 1. Convert the PDF to a single combined image using the optimized function
 221 |     pdf_path = "./output_weasyprint.pdf"  # Ensure this PDF exists
 222 |     output_folder = "output_images_optimized"
 223 |     # 300 DPI is a good balance of quality and speed for most PDFs;
 224 |     # raise it only if fine text in the output looks blurry.
 225 |     image_path = pdf_to_images_optimized(pdf_path, output_folder, dpi=300)
 226 | 
 227 |     # 2. Extract card from the combined image (if successful)
 228 |     if image_path:
 229 |         card_output_path = os.path.join(output_folder, 'card_extracted.png')
 230 |         print(f"\nExtracting card from: {image_path}")
 231 |         try:
 232 |             extract_card_from_image(
 233 |                 image_path,
 234 |                 card_output_path,
 235 |                 min_area=500,  # Adjust as needed
 236 |                 debug=True  # Enable debug output to visualize the extraction steps
 237 |             )
 238 |             print(f"Card extraction attempted. Output expected at: {card_output_path}")
 239 |         except Exception as extract_error:
 240 |             print(f"Error during card extraction: {extract_error}")
 241 |     else:
 242 |         print("\nPDF conversion failed, skipping card extraction.")
--------------------------------------------------------------------------------
/tools/prompt_config.py:
--------------------------------------------------------------------------------
 1 | """
 2 | Configuration file for LLM system prompts.
 3 | """
 4 | 
 5 | # System prompt for the Ark LLM (Web Designer Role)
 6 | SYSTEM_PROMPT_WEB_DESIGNER = """
 7 | ## Role
 8 | You are a professional web designer and front-end development expert who can quickly produce attractive, responsive HTML card pages from a requirement. Cards must fit a phone screen, using the iPhone 15 dimensions as the baseline.
 9 | 
 10 | ## Core skills
 11 | 1. Generate a complete HTML5 page structure from user requirements
 12 | 2. Master modern CSS layout techniques (Flexbox/Grid)
 13 | 3. Apply color-matching and UI design principles
 14 | 4. Implement responsive designs that adapt to different devices
 15 | 5. Be familiar with common design styles (minimalist/skeuomorphic/glassmorphism, etc.)
 16 | 
 17 | ## Knowledge base
 18 | - The latest HTML5/CSS3 standards
 19 | - Design guidelines of mainstream UI frameworks
 20 | - WCAG accessibility standards
 21 | - Fundamentals of color psychology
 22 | - Typography principles
 23 | 
 24 | ## Output requirements
 25 | 1. Always generate an HTML card page sized for a phone screen, with the card width hard-coded to 393px
 26 | 2. Provide the complete HTML file code
 27 | 3. Include inline CSS styles
 28 | 4. Use semantic tags
 29 | 5. Add the necessary meta tags
 30 | 6. Keep the code clean and well formatted
 31 | 7. Follow W3C standards
 32 | 8. Output only the HTML code and nothing else!!! Generate exactly one complete HTML block; do not output multiple blocks.
 33 | 9. Include the mark © 2025 Deepseek & BreaklmLab in the footer
 34 | 
 35 | ## Interaction
 36 | Ask the user to provide:
 37 | 1. Purpose of the page (corporate site/personal blog/product showcase, etc.)
 38 | 2. Desired design style
 39 | 3. Main content sections to include
 40 | 4. Brand/preferred colors (optional)
 41 | 5. Any other special requirements
 42 | """
 43 | 
 44 | SYSTEM_PROMPT_SUMMARIZE_2MD = """
 45 | ## Role
 46 | You are a professional text-analysis expert who extracts key information from complex content and produces structured summaries.
 47 | 
 48 | ## Core skills
 49 | 1. Accurately identify the core arguments and key details of a text
 50 | 2. Automatically segment the text into logical sections and extract their main ideas
 51 | 3. Condense heavily while preserving the original meaning
 52 | 4. Produce well-formed Markdown output
 53 | 5. Adapt the summarization style to the content type
 54 | 
 55 | ## Knowledge base
 56 | - Information extraction techniques
 57 | - Natural language processing
 58 | - Structured writing conventions
 59 | - Characteristics of various text types (news/papers/reports, etc.)
 60 | - Methods for identifying key information
 61 | 
 62 | ## Title writing
 63 | - Generate one Xiaohongshu-style viral title within the article
 64 | - The title must reflect the core of the content
 65 | - The title should feel internet-savvy and grab the reader's attention
 66 | 
 67 | ## Output requirements
 68 | 1. Use Markdown heading levels
 69 | 2. Include 3-5 core points
 70 | 3. Keep each point to at most 2 sentences
 71 | 4. Preserve key data/facts
 72 | 5. Keep the total length under 30% of the original
 73 | 6. Output nothing beyond the main content
 74 | 
 75 | ## Interaction
 76 | Ask the user to provide:
 77 | 1. The text to summarize
 78 | 2. The desired summary depth (brief/detailed)
 79 | 3. Specific points of focus (optional)
 80 | 4. Whether to keep examples/quotations (optional)
 81 | """
 82 | 
 83 | # You can add other system prompts here as needed
 84 | # SYSTEM_PROMPT_OTHER = """..."""
 85 | USER_PROMPT_WEB_DESIGNER = """
 86 | Now, based on the information the user provides, generate an HTML card page that meets the requirements. The card must fit a phone screen, using the iPhone 15 dimensions as the baseline.
 87 | """
--------------------------------------------------------------------------------
/tools/selenium2img.py:
--------------------------------------------------------------------------------
 1 | from selenium import webdriver
 2 | from selenium.webdriver.chrome.options import Options
 3 | import time
 4 | import os
 5 | from .card_extractor import extract_card_from_image
 6 | 
 7 | def html_to_image(html_path, output_path, width=393, height=None):
 8 |     """
 9 |     Renders an HTML file to an image using Selenium and Chrome, emulating a mobile device.
 10 | 
 11 |     Parameters:
 12 |         html_path: Path to an HTML file or a URL
 13 |         output_path: Path to save the output image
 14 |         width: Width of the viewport in pixels (default: 393px - iPhone 15 width)
 15 |         height: Height of the viewport in pixels (None for auto: dynamically calculated from the content)
 16 |     """
 17 |     # Configure Chrome options
 18 |     chrome_options = Options()
 19 |     chrome_options.add_argument("--headless")  # Run in headless mode
 20 |     chrome_options.add_argument("--disable-gpu")
 21 |     # No explicit window size or user agent is needed; mobile emulation handles both
 22 |     # chrome_options.add_argument(f"--window-size={width},{height or 852}")
 23 |     # chrome_options.add_argument("--user-agent=Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.0 Mobile/15E148 Safari/604.1")
 24 |     chrome_options.add_argument("--hide-scrollbars")
 25 |     chrome_options.add_argument("--no-sandbox")
 26 |     chrome_options.add_argument("--disable-dev-shm-usage")
 27 |     chrome_options.add_argument("--force-device-scale-factor=2")  # Keep high-DPI rendering
 28 | 
 29 |     # Define mobile emulation settings for iPhone 15
 30 |     mobile_emulation = {
 31 |         "deviceMetrics": {
 32 |             "width": width,
 33 |             "height": height or 852,  # Default to the iPhone 15 height when not dynamic
 34 |             "pixelRatio": 3.0  # iPhone 15 has a 3x pixel ratio
 35 |         },
 36 |         "userAgent": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1"
 37 |     }
 38 |     chrome_options.add_experimental_option("mobileEmulation", mobile_emulation)
 39 | 
 40 |     try:
 41 |         # Initialize the driver with mobile emulation options
 42 |         driver = webdriver.Chrome(options=chrome_options)
 43 | 
 44 |         # Convert to an absolute file:// URL if it's a local file
 45 |         if not html_path.startswith('http'):
 46 |             html_path = 'file://' + os.path.abspath(html_path)
 47 | 
 48 |         # Load the page
 49 |         driver.get(html_path)
 50 | 
 51 |         # Wait for page rendering and any JavaScript execution
 52 |         time.sleep(2)
 53 | 
 54 |         # Dynamically calculate the height if needed
 55 |         if height is None:
 56 |             # Calculate the full page height
 57 |             calculated_height = driver.execute_script("""
 58 |                 return Math.max(
 59 |                     document.body.scrollHeight,
 60 |                     document.documentElement.scrollHeight,
 61 |                     document.body.offsetHeight,
 62 |                     document.documentElement.offsetHeight,
 63 |                     document.body.clientHeight,
 64 |                     document.documentElement.clientHeight
 65 |                 );
 66 |             """)
 67 |             print(f"Auto-detected content height: {calculated_height}px")
 68 | 
 69 |             # Update the mobile emulation height and re-initialize the driver
 70 |             # Note: re-initializing is necessary because the mobileEmulation height is fixed at launch
 71 |             driver.quit()  # Close the current driver
 72 |             mobile_emulation["deviceMetrics"]["height"] = calculated_height
 73 |             chrome_options.add_experimental_option("mobileEmulation", mobile_emulation)
 74 |             driver = webdriver.Chrome(options=chrome_options)  # Re-launch with the correct height
 75 |             driver.get(html_path)  # Reload the page
 76 |             time.sleep(1)  # Wait for the reload
 77 | 
 78 |         # Capture the screenshot
 79 |         driver.save_screenshot(output_path)
 80 |         print(f"Image saved to {output_path}")
 81 |         return True
 82 | 
 83 |     except Exception as e:
 84 |         print(f"Error converting HTML to image: {e}")
 85 |         return False
 86 | 
 87 |     finally:
 88 |         if 'driver' in locals():
 89 |             driver.quit()
 90 | 
 91 | # Only run the example code when this file is executed directly, not when imported
 92 | if __name__ == "__main__":
 93 |     # Example usage
 94 |     html_to_image('./96a32822-aebe-4f5a-a2c6-2bed830af5f9.html', 'output.png')
 95 |     # Also extract the card region from the rendered screenshot
 96 |     extract_card_from_image('output.png', 'card.png')
--------------------------------------------------------------------------------
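
Taken together, `selenium2img.py` and `card_extractor.py` form the two-stage HTML-to-card pipeline: render the page at a 393px-wide mobile viewport, then crop the card region out of the screenshot. Below is a minimal sketch of composing the two from outside the `tools` package; it assumes the repository root is on `PYTHONPATH` and that Chrome/ChromeDriver are installed. The module name `pipeline_sketch`, the input file `input.html`, and the intermediate screenshot path are illustrative placeholders, not files in this repository.

```python
# pipeline_sketch.py -- illustrative only; assumes the repo root is on PYTHONPATH
# and that Chrome + ChromeDriver are available. File names below are placeholders.
from tools.selenium2img import html_to_image
from tools.card_extractor import extract_card_from_image


def render_card(html_path: str, card_path: str) -> bool:
    """Render an HTML card at iPhone 15 width, then crop the card region out."""
    screenshot_path = "full_page.png"  # intermediate full-page screenshot (placeholder name)

    # Stage 1: headless Chrome renders the page at a 393px-wide mobile viewport,
    # auto-sizing the viewport height to the content (height=None).
    if not html_to_image(html_path, screenshot_path):
        return False

    # Stage 2: OpenCV-based extraction isolates the card from the screenshot.
    extract_card_from_image(screenshot_path, card_path)
    return True


if __name__ == "__main__":
    render_card("input.html", "card.png")  # input.html is a placeholder
```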