├── .gitignore
├── LICENSE
├── README.md
├── YC 项目.md
├── companies.csv
├── demo.ipynb
├── markdown-img
│   ├── 33447fe34423edcf8aea3d91155a201.png
│   ├── 6d1306b25bbef0f3ffb24e36c79cbcb.png
│   ├── 71bdc5ee84dc6da5c8534bbeff765ba.png
│   ├── 7344e9dcdc45d631ad66a11a1d7b05b.png
│   ├── 8d7ad8961df57af31d635ad8e2e691b.png
│   ├── AgentUniverse.png
│   ├── f25752adc9ac83fc7465e0762895cc0.png
│   ├── image-20241007181136044.png
│   ├── image-20241007181508053.png
│   ├── image-20241007181624006.png
│   ├── image-20241007181627977.png
│   ├── image-20241007182235402.png
│   ├── image-20241007182241902.png
│   ├── image-20241007182349251.png
│   ├── img_v3_02fe_d19a1ccb-ed0c-4f20-b26c-fbc19a9f206g.jpg
│   └── 流程图.jpg
├── package.json
├── poster_html
│   ├── LinkenDin.png
│   ├── example_logo
│   │   └── example.png
│   ├── poster.html
│   ├── style2.css
│   ├── 汉仪润圆-65W.ttf
│   └── 特工宇宙.png
├── requirements.txt
└── screenshot.js

/.gitignore:
--------------------------------------------------------------------------------
# Ignore .history
/.history/

# Ignore the node_modules directory
/node_modules/

# Optionally ignore package-lock.json
/package-lock.json

# Ignore the Chrome folder
/Chrome/

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2024 Agent Universe

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# 🌟 YC_Poster: Scraping YC Projects, Cleaning the Data, and Generating Visual Posters

## Overview

This project automatically collects information on the companies in the YC S24 batch. It then cleans the data, extracts the key information, and presents each company as a poster.

A web crawler collects the company information from the YC website. A large language model (LLM) then cleans the data and extracts the key points.

Finally, the information is rendered as HTML and captured with Puppeteer screenshots to produce the visual output.

## **Analyses Based on This Project**

### Full information on the YC S24 AI projects (spreadsheet):

📊 Full spreadsheet: https://agentuniverse.feishu.cn/wiki/HosvwLWT9ifN7lkidDAcHjxqnsf?from=from_copylink

### YC S24 AI project map (downloadable)

🌐 Full map: https://agentuniverse.feishu.cn/wiki/RLUPw94FWiMSGSkSJKTc94djnof

📑 Individual projects: https://agentuniverse.feishu.cn/wiki/L0C1wj2k4iiXAMkuCNrcHwLtnPb

### YC project analysis:

📚 Super Roundup丨A Detailed Review of 200+ YC S24 AI Projects: https://mp.weixin.qq.com/s/jaKksNweXtbB4MXBUs9JhQ

### More public resources:

📖 Public resources from 「特工宇宙」 (Agent Universe): https://agentuniverse.feishu.cn/wiki/ISlvw7QTIi8kq8kMOYqczJ1inFh

## 📁 Project Structure

```text
.
├── Chrome          # Local Chrome files used during crawling (git-ignored)
├── demo.ipynb      # Jupyter Notebook demonstrating the core functionality
├── LICENSE         # Open-source license
├── markdown-img    # Image assets for the Markdown documents
├── package.json    # Node.js dependency configuration
├── poster_html     # HTML version of the poster
├── README.md       # Project documentation
├── screenshot.js   # Puppeteer screenshot script
└── YC 项目.md      # Detailed tutorial
```

### Overall Flowchart

![Flowchart](markdown-img/流程图.jpg)

## 🔧 Features

1. **Project list crawling**: automatically crawls the information on all S24 companies from the YC website.
2. **Data cleaning and extraction**: uses an LLM to process each company's details and extract key information such as the background, the problem, and the solution.
3. **Poster generation**: batch-renders the company information with an HTML template and Puppeteer, then automatically generates the poster images.

## 🚀 Installation and Usage

### 1️⃣ Clone the repository

```bash
git clone https://github.com/Agent-Universe/YC_Poster.git
cd YC_Poster
```

### 2️⃣ Install dependencies

#### 🐍 Create the Python environment and install the dependencies

```bash
# Create the environment (conda is recommended)
conda create -n yc_poster python=3.8

# Activate the environment
conda activate yc_poster

# Install the dependencies
pip install -r requirements.txt
```

#### 🛠️ Install the Node.js dependencies

```bash
# After installing Node.js and npm
npm install

# The project only depends on puppeteer, so `npm install puppeteer` also works
```

### 🖥️ Project Demo

[demo.ipynb](demo.ipynb) covers the full workflow:

- crawling the YC company list and each company's details
- extracting information with the LLM
- generating the HTML posters

For now it shows only the core logic and basic code. For the detailed tutorial, see [YC 项目.md](<YC 项目.md>).

#### Poster generation

After the HTML posters have been generated, run the following command to produce the images:

```bash
node screenshot.js
```

## 📜 License

This project is open-sourced under the MIT License. See the LICENSE file for details.

## 📞 Contact the Author

WeChat: jamiu99

Follow our WeChat official account: 特工宇宙 (Agent Universe)

![AgentUniverse](markdown-img/AgentUniverse.png)

--------------------------------------------------------------------------------
/YC 项目.md:
--------------------------------------------------------------------------------
# YC Projects

## 1. Collecting the Raw YC Project Information

> Collect information on the YC S24 companies by crawling and similar methods.

### 1.1 Getting the Full Project List

#### Approach:

1. Target websites and goal
   1. https://www.ycombinator.com/companies?batch=s24
   2. https://www.ycombinator.com/launches?batch=S2024
   3. Collect the name and detail-page link of every project tagged S24
2. Implementation (several options are possible)
   1. Use Selenium to visit the target website
      1. Simulate scrolling, wait until everything has loaded, then save the full rendered HTML
         1. Alternatively, scroll manually and run a script in the console to save the rendered HTML. This is simpler and avoids proxy problems.
      2. Parse the saved HTML with BeautifulSoup
      3. Find the relevant tags with the browser developer tools, then extract them in bulk and save the results

         ![image-20241007181136044](markdown-img/image-20241007181136044.png)
      4. Trade-offs
         1. Pros: what you see is what you get, and it is easy to start with
         2. Cons: you only get the rendered data, so "hidden" data in the network traffic is hard to see, and memory usage can be high
   2. Alternative 1: after the page has fully rendered, write a JS script directly in the developer tools console to extract the content. The extraction logic is the same as above; only the language differs.

      ```js
      // Reference code: run in the DevTools console after the whole list has loaded.
      // The class names below are generated by the site's build and may change; check them in the developer tools first.

      // Download a string as a CSV file
      function downloadCSV(csvContent, filename) {
        const blob = new Blob([csvContent], { type: 'text/csv;charset=utf-8;' });
        const link = document.createElement("a");
        const url = URL.createObjectURL(blob);
        link.setAttribute("href", url);
        link.setAttribute("download", filename);
        document.body.appendChild(link);
        link.click();
        document.body.removeChild(link);
      }

      // Quote a field when needed, so names such as "Diode Computers, Inc." do not break the CSV columns
      function toCSVField(value) {
        const text = String(value);
        return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
      }

      // Extract the links and names and convert them to CSV
      function extractLinksAndNamesToCSV() {
        const rows = [];

        // Get all <a> tags with the class '_company_86jzd_338'
        const links = document.querySelectorAll('a._company_86jzd_338');

        links.forEach(link => {
          const url = link.href; // resolved absolute URL (getAttribute('href') returns a relative path)
          const nameElement = link.querySelector('span._coName_86jzd_453'); // the span holding the company name
          const name = nameElement ? nameElement.innerText.trim() : '';
          rows.push([url, name]);
        });

        // Build the CSV content: header first, then one line per company
        let csvContent = "URL,Company Name\n";
        rows.forEach(row => {
          csvContent += row.map(toCSVField).join(",") + "\n";
        });

        // Trigger the download
        downloadCSV(csvContent, 'companies.csv');
      }

      // Run the function
      extractLinksAndNamesToCSV();
      ```

      ![71bdc5ee84dc6da5c8534bbeff765ba](markdown-img/71bdc5ee84dc6da5c8534bbeff765ba.png)
   3. Alternative 2: use a man-in-the-middle proxy or the developer tools to find the request API, and fetch the data sent to the frontend directly.

      ![7344e9dcdc45d631ad66a11a1d7b05b](markdown-img/7344e9dcdc45d631ad66a11a1d7b05b.png)

#### Additional Notes:

1. Notes on the main approach
   1. When accessing the target website with Selenium, you need to set the proxy explicitly.
   2. Selenium is used here only to obtain the fully scrolled and rendered HTML. You can therefore scroll to the bottom of the page manually in a browser and save the HTML with a JS script instead. This avoids driving Selenium from code and configuring a proxy.

      ![f25752adc9ac83fc7465e0762895cc0](markdown-img/f25752adc9ac83fc7465e0762895cc0.png)

      ```js
      // Reference code: save the currently rendered page as ycs24.html
      var renderedHTML = document.documentElement.outerHTML;
      var blob = new Blob([renderedHTML], { type: 'text/html' });
      var link = document.createElement('a');
      link.href = URL.createObjectURL(blob);
      link.download = 'ycs24.html';
      link.click();
      ```

### 1.2 Getting the Data for Each Project

#### Approach:

1. Target websites and goal:
   1. The detail-page link of each project
   2. The project details: the launch article, the company profile card, and the founder profile cards
2. Implementation:
   1. This is similar to getting the full project list. The list has 200+ projects, though, so doing everything by hand is impractical; automate as much as possible.
   2. Use Selenium to visit each page (same process as for the project list)
      1. Scroll to the bottom so that everything is rendered
      2. Parse the saved HTML with BeautifulSoup
      3. Find the relevant tags with the browser developer tools, then extract them in bulk and save the results

         ![image-20241007181508053](markdown-img/image-20241007181508053.png)
   3. Alternative: use a man-in-the-middle proxy or the developer tools to find the request API, then send the requests in bulk to get the data.

#### Additional Notes:

1. Notes on the main approach
   1. Every project's article is formatted differently, which makes bulk extraction difficult. You can skip it here and handle it in the next part with jina-ai/reader.
   2. A proxy needs to be set here as well.
   3. Some content takes a while to load, so you need to wait for it.
   4. You can use multiple threads and a proxy pool to speed things up.
   5. Requests sent too frequently will be rejected, so log each ERROR and retry those pages later.

## 2. Data Cleaning and Information Extraction (with an LLM)

> Core goal: use the information-extraction ability of an LLM on each crawled article. Extract the project's background and problem (the industry pain point) and its solution, then summarize the whole project.

### 1. Optimizing the LLM Input

This step is optional. It exists only because extracting the article body in the previous step is difficult.

It mainly relies on https://github.com/jina-ai/reader

This project converts complex HTML pages into Markdown that an LLM can read easily, keeping only the main content.

There are three ways to use it:

1. Use the library locally and convert the HTML with the rules it defines
   1. Everything runs locally, so you can process the saved HTML directly and adjust the rules to fit your pages better
   2. For local deployment, consider these two projects:
      1. https://github.com/intergalacticalvariable/reader
      2. https://github.com/hargup/reader
2. (Recommended) Use their free service: prefix the full URL you want to read with `https://r.jina.ai/`. For example, for https://www.ycombinator.com/companies/codeviz, request https://r.jina.ai/https://www.ycombinator.com/companies/codeviz. It is easy to use and covers most cases.

   ![6d1306b25bbef0f3ffb24e36c79cbcb](markdown-img/6d1306b25bbef0f3ffb24e36c79cbcb.png)
3. jina.ai is also exploring end-to-end processing with LLMs. It has open-sourced two small models that use an LLM to clean up LLM input, and they are worth trying:
   1. https://huggingface.co/jinaai/reader-lm-0.5b
   2. https://huggingface.co/jinaai/reader-lm-1.5b

   ![image-20241007181627977](markdown-img/image-20241007181627977.png)

### 2. Extracting Information with an LLM

The idea behind this step is straightforward: use a prompt that makes the LLM return formatted output containing the summary, the problem or pain point, and the solution.

In practice, though, LLM output is not fully reliable, so the prompt and the structured output can be optimized in many ways.

1. Prompt optimization: structure the prompt and separate its parts with clear delimiters to improve output quality noticeably. For example, wrap each part in its own tag:

   ```
   <task>
   {task}
   </task>
   <rules>
   {output rules}
   </rules>
   <article>
   {article content}
   </article>
   ```
2. Structured output
   1. Findings from experiments (https://arxiv.org/abs/2408.02442):
      1. Forcing an LLM to output strict JSON may degrade its performance
      2. Different LLMs prefer different formats
   2. Possible solutions:
      1. Constrain the output through the prompt alone. This works, but it is close to the worst option: reasoning performance drops and the output is still unstable.
      2. For each LLM, choose the structured-output format it prefers, to keep the performance loss as small as possible.
      3. For locally deployed LLMs, use constrained decoding (for example, a grammar that masks invalid tokens during generation) so that the LLM strictly follows the format.
      4. Some LLM APIs provide a structured-output option.
      5. Let the LLM answer freely first, then convert the answer into the target format with a small model that reliably produces strict output.
      6. Enforce the format with a library or framework, such as pydantic.

## 3. Making the Posters

> Build image posters from the collected YC project information.
>
> There are many ways to generate posters in bulk with code, such as fastposter, but they make a good adaptive layout hard to achieve. Writing HTML, rendering it, and saving a screenshot is a cleaner approach. Plugins that convert design drafts into frontend code also speed up writing the HTML considerably.

### 1. Building an Adaptive Template

Steps:

1. Create the prototype template in MasterGo/Figma
   1. Note: while designing, use auto layout inside containers wherever possible so that the layout adapts to its content. To simplify the later HTML rendering and screenshots, fix the width and let the height adapt.
   2. See the following file:
      1. https://mastergo.com/goto/DSUMXlWq?page_id=M&file=134481327511593

   ![img_v3_02fe_d19a1ccb-ed0c-4f20-b26c-fbc19a9f206g](markdown-img/img_v3_02fe_d19a1ccb-ed0c-4f20-b26c-fbc19a9f206g.jpg)
2. Use the `网易海报 D2C` (NetEase Poster D2C) plugin to quickly convert the UI design into frontend code

   ![image-20241007182241902](markdown-img/image-20241007182241902.png)

   ![33447fe34423edcf8aea3d91155a201](markdown-img/33447fe34423edcf8aea3d91155a201.png)
3. Open the page in a browser to preview it
   1. Use device emulation to check the layout at a fixed resolution (right-click → Inspect → toggle device emulation → set the resolution)

   ![image-20241007182349251](markdown-img/image-20241007182349251.png)
4. Parts of the conversion may have minor problems that need manual fixes. Some image links are temporary and must be replaced with local resources.

   ![8d7ad8961df57af31d635ad8e2e691b](markdown-img/8d7ad8961df57af31d635ad8e2e691b.png)
5. Font files must be added manually.

### 2. Inserting the Information in Bulk and Rendering the Images

1. Write code that reads the previously saved information in bulk and replaces the corresponding content in the HTML
2. Use the puppeteer library to write JS code that renders each HTML file and saves it as an image

   ```js
   // Reference code
   const puppeteer = require('puppeteer');
   const fs = require('fs');
   const path = require('path');
   const { pathToFileURL } = require('url');

   (async () => {
     const browser = await puppeteer.launch();
     const page = await browser.newPage();

     // Resolve both folders relative to this script, so it works from any working directory
     const inputDir = path.join(__dirname, 'html_output');
     const outputDir = path.join(__dirname, 'screenshots');
     if (!fs.existsSync(outputDir)) {
       fs.mkdirSync(outputDir, { recursive: true });
     }

     const htmlFiles = fs.readdirSync(inputDir).filter(file => file.endsWith('.html'));

     for (const htmlFile of htmlFiles) {
       const filePath = path.join(inputDir, htmlFile);

       // Fix the width at 720 px before loading, so the layout is computed at the final width
       await page.setViewport({ width: 720, height: 800, deviceScaleFactor: 3 });
       await page.goto(pathToFileURL(filePath).href, { waitUntil: 'networkidle2' });
       // Wait for web fonts (e.g. the poster's custom font) to finish loading
       await page.evaluate(async () => { await document.fonts.ready; });

       // Measure the rendered height of <body>
       const bodyHandle = await page.$('body');
       const { height } = await bodyHandle.boundingBox();
       await bodyHandle.dispose();

       // Resize the viewport to the content height; deviceScaleFactor: 3 gives a sharper image
       await page.setViewport({ width: 720, height: Math.ceil(height), deviceScaleFactor: 3 });

       // Take the screenshot and save it
       const screenshotPath = path.join(outputDir, `${path.parse(htmlFile).name}.png`);
       await page.screenshot({ path: screenshotPath, fullPage: true });
     }

     await browser.close();
   })();
   ```

   Note:

   ```js
   await page.setViewport({ width: 720, height: Math.ceil(height), deviceScaleFactor: 3 });
   ```

   `deviceScaleFactor` defaults to 1, which can make the image look blurry; set it higher for a sharper image.

--------------------------------------------------------------------------------
/companies.csv:
--------------------------------------------------------------------------------
URL,Company Name
https://www.ycombinator.com/companies/outerport,Outerport
https://www.ycombinator.com/companies/dreamrp,DreamRP
https://www.ycombinator.com/companies/melty,Melty
https://www.ycombinator.com/companies/syntra,Syntra
https://www.ycombinator.com/companies/pathpilot,PathPilot
https://www.ycombinator.com/companies/standard-data,Standard Data
https://www.ycombinator.com/companies/kairo-health,Kairo Health
https://www.ycombinator.com/companies/midship,Midship
https://www.ycombinator.com/companies/soma-lab,Soma Lab
https://www.ycombinator.com/companies/merlin-ai,Merlin AI
https://www.ycombinator.com/companies/freestyle,Freestyle
https://www.ycombinator.com/companies/arva-ai,Arva AI
https://www.ycombinator.com/companies/affil-ai,Affil.ai
https://www.ycombinator.com/companies/orgorg,OrgOrg
https://www.ycombinator.com/companies/patched,Patched
https://www.ycombinator.com/companies/village-labs,Village Labs
https://www.ycombinator.com/companies/mica-ai,Mica AI
https://www.ycombinator.com/companies/formula-insight,Formula Insight
https://www.ycombinator.com/companies/overlap,Overlap
https://www.ycombinator.com/companies/capitol-ai,Capitol AI
https://www.ycombinator.com/companies/kart-ai,Kart AI
https://www.ycombinator.com/companies/void,Void
https://www.ycombinator.com/companies/anglera,Anglera
https://www.ycombinator.com/companies/xtraffic,XTraffic
https://www.ycombinator.com/companies/felafax,Felafax
https://www.ycombinator.com/companies/henry-2,Henry
https://www.ycombinator.com/companies/hey-revia,Hey Revia
https://www.ycombinator.com/companies/ficra,Ficra
https://www.ycombinator.com/companies/rowboat-labs,RowBoat Labs
https://www.ycombinator.com/companies/pipeshift,Pipeshift
https://www.ycombinator.com/companies/vera-health,Vera Health
https://www.ycombinator.com/companies/pharos,Pharos
https://www.ycombinator.com/companies/et-al,et al.
https://www.ycombinator.com/companies/keet,Keet
https://www.ycombinator.com/companies/terra-2,Terra
https://www.ycombinator.com/companies/ai-sell,AI Sell
https://www.ycombinator.com/companies/diode-computers-inc,"Diode Computers, Inc."
https://www.ycombinator.com/companies/surebright,SureBright
https://www.ycombinator.com/companies/usul,Usul
https://www.ycombinator.com/companies/saldor,Saldor
https://www.ycombinator.com/companies/bucket-robotics,Bucket Robotics
https://www.ycombinator.com/companies/oway,Oway
https://www.ycombinator.com/companies/elayne,Elayne
https://www.ycombinator.com/companies/blaze-2,Blaze
https://www.ycombinator.com/companies/pap,pap!
https://www.ycombinator.com/companies/mindely,Mindely
https://www.ycombinator.com/companies/tradeflow,TradeFlow
https://www.ycombinator.com/companies/voker,Voker
https://www.ycombinator.com/companies/rastro,Rastro
https://www.ycombinator.com/companies/quetzal,Quetzal
https://www.ycombinator.com/companies/weavel,Weavel
https://www.ycombinator.com/companies/moreta,Moreta
https://www.ycombinator.com/companies/focus-buddy,Focus Buddy
https://www.ycombinator.com/companies/simplifine,Simplifine
https://www.ycombinator.com/companies/sepal-ai,Sepal AI
https://www.ycombinator.com/companies/assembly-hoa,Assembly HOA
https://www.ycombinator.com/companies/autarc,autarc
https://www.ycombinator.com/companies/omnidock,omnidock
https://www.ycombinator.com/companies/polymet,Polymet
https://www.ycombinator.com/companies/kontigo,Kontigo
https://www.ycombinator.com/companies/remo,Remo
https://www.ycombinator.com/companies/propaya,Propaya
https://www.ycombinator.com/companies/vendra,Vendra
https://www.ycombinator.com/companies/argil,Argil
https://www.ycombinator.com/companies/zeropath,ZeroPath
https://www.ycombinator.com/companies/actionbase,Actionbase
https://www.ycombinator.com/companies/corpdaq,Corpdaq
https://www.ycombinator.com/companies/lumen-orbit,Lumen Orbit
https://www.ycombinator.com/companies/dataleap,Dataleap
https://www.ycombinator.com/companies/zimi,Zimi
https://www.ycombinator.com/companies/thyme,Thyme
https://www.ycombinator.com/companies/ionworks,Ionworks
https://www.ycombinator.com/companies/rescript,Rescript
https://www.ycombinator.com/companies/zigma-by-nextui,Zigma - by NextUI
https://www.ycombinator.com/companies/helium,Helium
https://www.ycombinator.com/companies/ai-2,&AI
https://www.ycombinator.com/companies/haystack-software,Haystack Software
https://www.ycombinator.com/companies/finosu,Finosu
https://www.ycombinator.com/companies/odo,Odo
https://www.ycombinator.com/companies/baseline-ai,Baseline AI
https://www.ycombinator.com/companies/zephr,Zephr
https://www.ycombinator.com/companies/winford-ai,Winford AI
https://www.ycombinator.com/companies/cartage,Cartage
https://www.ycombinator.com/companies/paasa,Paasa
https://www.ycombinator.com/companies/blast,Blast
https://www.ycombinator.com/companies/lilac-labs,Lilac Labs
https://www.ycombinator.com/companies/palmier,Palmier
https://www.ycombinator.com/companies/simple-ai,Simple AI
https://www.ycombinator.com/companies/sensei,Sensei
https://www.ycombinator.com/companies/guardian-rf,Guardian RF
https://www.ycombinator.com/companies/laminar-ai,Laminar AI
https://www.ycombinator.com/companies/cerulion,Cerulion
https://www.ycombinator.com/companies/presti-ai,Presti AI
https://www.ycombinator.com/companies/the-forecasting-company,The Forecasting Company
https://www.ycombinator.com/companies/praxos,Praxos
https://www.ycombinator.com/companies/exa-laboratories,Exa Laboratories
https://www.ycombinator.com/companies/schemeflow,SchemeFlow
https://www.ycombinator.com/companies/brighterway,Brighterway
https://www.ycombinator.com/companies/mineflow,Mineflow
https://www.ycombinator.com/companies/soff,Soff
https://www.ycombinator.com/companies/parley,Parley
https://www.ycombinator.com/companies/intryc,Intryc
https://www.ycombinator.com/companies/pinnacle,Pinnacle
https://www.ycombinator.com/companies/ember-robotics,Ember Robotics
https://www.ycombinator.com/companies/thunder-compute,Thunder Compute
https://www.ycombinator.com/companies/zeit-ai,Zeit AI
https://www.ycombinator.com/companies/remade,Remade
https://www.ycombinator.com/companies/distro,Distro
https://www.ycombinator.com/companies/modern-realty,Modern Realty
https://www.ycombinator.com/companies/david-ai,David AI
https://www.ycombinator.com/companies/beebettor,BeeBettor
https://www.ycombinator.com/companies/guardian-ai,Guardian AI
https://www.ycombinator.com/companies/stempad,Stempad
https://www.ycombinator.com/companies/pax,Pax
https://www.ycombinator.com/companies/conductor-quantum,Conductor Quantum
https://www.ycombinator.com/companies/autumn-labs,Autumn Labs
https://www.ycombinator.com/companies/opslane,Opslane
https://www.ycombinator.com/companies/asterisk,Asterisk
https://www.ycombinator.com/companies/biocartesian,Biocartesian
https://www.ycombinator.com/companies/seals-ai,Seals AI
https://www.ycombinator.com/companies/deepsilicon,deepsilicon
https://www.ycombinator.com/companies/drillbit,Drillbit
https://www.ycombinator.com/companies/lumenary,Lumenary
https://www.ycombinator.com/companies/dimely,Dimely
https://www.ycombinator.com/companies/redouble-ai,Redouble AI
https://www.ycombinator.com/companies/planbase,Planbase
https://www.ycombinator.com/companies/unriddle,Unriddle
https://www.ycombinator.com/companies/hestus-inc,"Hestus, Inc."
https://www.ycombinator.com/companies/magicode,MagiCode
https://www.ycombinator.com/companies/finny-ai,FINNY AI
https://www.ycombinator.com/companies/focal-2,Focal
https://www.ycombinator.com/companies/1849-bio,1849 bio
https://www.ycombinator.com/companies/reactwise,ReactWise
https://www.ycombinator.com/companies/olive-legal,Olive Legal
https://www.ycombinator.com/companies/saturn,Saturn
https://www.ycombinator.com/companies/clearly-ai,Clearly AI
https://www.ycombinator.com/companies/expand-ai,expand.ai
https://www.ycombinator.com/companies/overstand-labs,Overstand Labs
https://www.ycombinator.com/companies/promi,Promi
https://www.ycombinator.com/companies/ontra-mobility,Ontra Mobility
https://www.ycombinator.com/companies/fortress,Fortress
https://www.ycombinator.com/companies/panora,Panora
https://www.ycombinator.com/companies/manaflow,Manaflow
https://www.ycombinator.com/companies/conveo,Conveo
https://www.ycombinator.com/companies/parity,Parity
https://www.ycombinator.com/companies/kastle,Kastle
https://www.ycombinator.com/companies/kopra-bio,Kopra Bio
https://www.ycombinator.com/companies/anthrogen,Anthrogen
https://www.ycombinator.com/companies/benchify,Benchify
https://www.ycombinator.com/companies/fazeshift,Fazeshift
https://www.ycombinator.com/companies/taxgpt,TaxGPT
https://www.ycombinator.com/companies/substrate,Substrate
https://www.ycombinator.com/companies/zenbase-ai,Zenbase AI
https://www.ycombinator.com/companies/taxo,Taxo
https://www.ycombinator.com/companies/mito-health,Mito Health
https://www.ycombinator.com/companies/lighthouz-ai,Lighthouz AI
https://www.ycombinator.com/companies/modus,Modus
https://www.ycombinator.com/companies/dmodel,dmodel
https://www.ycombinator.com/companies/ledgerup,LedgerUp
https://www.ycombinator.com/companies/plume,Plume
https://www.ycombinator.com/companies/pulse-ai,Pulse AI
https://www.ycombinator.com/companies/autopallet-robotics,AutoPallet Robotics
https://www.ycombinator.com/companies/stormy-ai,stormy.ai
https://www.ycombinator.com/companies/merse-2,Merse
https://www.ycombinator.com/companies/aminoanalytica,AminoAnalytica
https://www.ycombinator.com/companies/gauge,Gauge
https://www.ycombinator.com/companies/weave-robotics,Weave Robotics
https://www.ycombinator.com/companies/tandem-2,Tandem
https://www.ycombinator.com/companies/answergrid,AnswerGrid
https://www.ycombinator.com/companies/tivara,Tivara
https://www.ycombinator.com/companies/camfer,camfer
https://www.ycombinator.com/companies/minusx,MinusX
https://www.ycombinator.com/companies/snowpilot,Snowpilot
https://www.ycombinator.com/companies/passage,Passage
https://www.ycombinator.com/companies/parahelp,Parahelp
https://www.ycombinator.com/companies/codeviz,CodeViz
https://www.ycombinator.com/companies/neuralize,Neuralize
https://www.ycombinator.com/companies/ares-industries,Ares Industries
https://www.ycombinator.com/companies/dodo,Dodo
https://www.ycombinator.com/companies/driftly,Driftly
https://www.ycombinator.com/companies/elevate-2,Elevate
https://www.ycombinator.com/companies/offstream,Offstream
https://www.ycombinator.com/companies/storia-ai,Storia AI
https://www.ycombinator.com/companies/deepsim-inc,"DeepSim, Inc."
https://www.ycombinator.com/companies/reworks,reworks
https://www.ycombinator.com/companies/zuni,Zuni
https://www.ycombinator.com/companies/proxis,Proxis
https://www.ycombinator.com/companies/rewbi,Rewbi
https://www.ycombinator.com/companies/willow,Willow
https://www.ycombinator.com/companies/aviary,Aviary
https://www.ycombinator.com/companies/networkocean,NetworkOcean
https://www.ycombinator.com/companies/silurian,Silurian
https://www.ycombinator.com/companies/kura,Kura
https://www.ycombinator.com/companies/unbound-security,Unbound Security
https://www.ycombinator.com/companies/retrofix-ai,RetroFix AI
https://www.ycombinator.com/companies/azalea-robotics-corporation,Azalea Robotics Corporation
https://www.ycombinator.com/companies/decisional-ai,Decisional AI
https://www.ycombinator.com/companies/moonglow,Moonglow
https://www.ycombinator.com/companies/ideate-xyz,ideate.xyz
https://www.ycombinator.com/companies/firstwork,FirstWork
https://www.ycombinator.com/companies/sorcerer,Sorcerer
https://www.ycombinator.com/companies/wordware,Wordware
https://www.ycombinator.com/companies/maitai,Maitai
https://www.ycombinator.com/companies/corgi,Corgi
https://www.ycombinator.com/companies/phonely,Phonely
https://www.ycombinator.com/companies/videogen,VideoGen
https://www.ycombinator.com/companies/clara,Clara
https://www.ycombinator.com/companies/stack-auth,Stack Auth
https://www.ycombinator.com/companies/patchwork-technologies,Patchwork Technologies
https://www.ycombinator.com/companies/tabular,Tabular
https://www.ycombinator.com/companies/callback,Callback
https://www.ycombinator.com/companies/flyflow,Flyflow
https://www.ycombinator.com/companies/bayesline,Bayesline
https://www.ycombinator.com/companies/digitalcarbon,DigitalCarbon
https://www.ycombinator.com/companies/random-labs,Random Labs
https://www.ycombinator.com/companies/unsloth-ai,Unsloth AI
https://www.ycombinator.com/companies/codes-health,Codes Health
https://www.ycombinator.com/companies/entangl,Entangl
https://www.ycombinator.com/companies/claimsorted,ClaimSorted
https://www.ycombinator.com/companies/mdhub,mdhub
https://www.ycombinator.com/companies/hamming-ai,Hamming AI
https://www.ycombinator.com/companies/genie,Genie
https://www.ycombinator.com/companies/prohostai,ProhostAI
https://www.ycombinator.com/companies/educato-ai,Educato AI
https://www.ycombinator.com/companies/spaceium-inc,Spaceium Inc
https://www.ycombinator.com/companies/comfy-deploy,Comfy Deploy
https://www.ycombinator.com/companies/synnax,Synnax
https://www.ycombinator.com/companies/theseus,Theseus
https://www.ycombinator.com/companies/saphira-ai,Saphira AI
https://www.ycombinator.com/companies/glasskube,Glasskube
https://www.ycombinator.com/companies/mem0,Mem0
https://www.ycombinator.com/companies/cheers-2,Cheers
https://www.ycombinator.com/companies/ply-health,Ply Health
https://www.ycombinator.com/companies/superfilter,Superfilter
https://www.ycombinator.com/companies/superunit,Superunit
https://www.ycombinator.com/companies/undermind,Undermind
https://www.ycombinator.com/companies/domu-technology-inc,Domu Technology Inc.
239 | https://www.ycombinator.com/companies/apten,Apten 240 | https://www.ycombinator.com/companies/spherecast,Spherecast 241 | https://www.ycombinator.com/companies/miru-ml,Miru 242 | https://www.ycombinator.com/companies/cardlift,CardLift 243 | https://www.ycombinator.com/companies/abel-police,Abel Police 244 | https://www.ycombinator.com/companies/ultra,Ultra 245 | https://www.ycombinator.com/companies/angstrom-ai,Ångström AI 246 | https://www.ycombinator.com/companies/coval,Coval 247 | https://www.ycombinator.com/companies/evolvere-biosciences,Evolvere BioSciences 248 | https://www.ycombinator.com/companies/spur,Spur 249 | https://www.ycombinator.com/companies/ligo-biosciences,Ligo Biosciences 250 | https://www.ycombinator.com/companies/acx,ACX 251 | https://www.ycombinator.com/companies/cracked,Cracked 252 | https://www.ycombinator.com/companies/fuse-2,Fuse 253 | https://www.ycombinator.com/companies/central,Central 254 | https://www.ycombinator.com/companies/simplex,Simplex 255 | https://www.ycombinator.com/companies/poka-labs,Poka Labs 256 | -------------------------------------------------------------------------------- /demo.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 前提说明\n", 8 | "这是一个 demo 代码,主要展示整体流程,不包含批量处理等\n", 9 | "\n", 10 | "可自行进行完整补充\n", 11 | "\n", 12 | "## 目录\n", 13 | "- 爬取 YC S24 项目信息\n", 14 | " - 爬取项目总列表\n", 15 | " - 读取爬取后的列表文件,爬取项目完整信息\n", 16 | " - 爬取文章主体外的其他信息,如项目名片和创始人名片\n", 17 | " - 使用 jina.ai 获取文章主体内容\n", 18 | "- 使用 LLM 进行信息抽取\n", 19 | "- 生成海报的html" 20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": 3, 25 | "metadata": {}, 26 | "outputs": [], 27 | "source": [ 28 | "# 一些配置信息\n", 29 | "\n", 30 | "## 请替换成你的ChromeDriver路径\n", 31 | "chrome_driver_path = \"./Chrome/ChromeDriver/chromedriver\"\n", 32 | "chrome_driver_path_bin = \"./Chrome/chrome-linux64/chrome\"\n", 33 | "\n", 34 | "## 
请替换成你的代理地址和端口\n", 35 | "proxy = \"http://localhost:10809\"\n", 36 | "\n", 37 | "\n", 38 | "## 大模型API地址和API_key\n", 39 | "Base_url = \"https://open.bigmodel.cn/api/paas/v4\"\n", 40 | "API_key = \"你的API_key\"\n", 41 | "\n" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "## 爬取 YC S24 项目信息" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": {}, 54 | "source": [ 55 | "### 爬取项目总列表" 56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": 4, 61 | "metadata": {}, 62 | "outputs": [ 63 | { 64 | "name": "stdout", 65 | "output_type": "stream", 66 | "text": [ 67 | "Data has been written to companies.csv\n" 68 | ] 69 | } 70 | ], 71 | "source": [ 72 | "import csv\n", 73 | "import json\n", 74 | "import os\n", 75 | "import time\n", 76 | "from openai import OpenAI\n", 77 | "from bs4 import BeautifulSoup\n", 78 | "from selenium import webdriver\n", 79 | "from selenium.webdriver.chrome.service import Service\n", 80 | "from selenium.webdriver.common.by import By\n", 81 | "from selenium.webdriver.support.ui import WebDriverWait\n", 82 | "from selenium.webdriver.support import expected_conditions as EC\n", 83 | "\n", 84 | "# 个人的一些配置信息\n", 85 | "os.environ.pop('all_proxy', None)\n", 86 | "os.environ.pop('ALL_PROXY', None)\n", 87 | "\n", 88 | "# 启动 Chrome 浏览器\n", 89 | "options = webdriver.ChromeOptions()\n", 90 | "# options.add_argument(\"--headless\") # 使用无头模式\n", 91 | "options.binary_location = chrome_driver_path_bin # 设置 Chrome 二进制文件路径\n", 92 | "\n", 93 | "# 配置代理\n", 94 | "options.add_argument(f'--proxy-server={proxy}')\n", 95 | "\n", 96 | "service = Service(chrome_driver_path)\n", 97 | "driver = webdriver.Chrome(service=service, options=options)\n", 98 | "\n", 99 | "\n", 100 | "url = \"https://www.ycombinator.com/companies?batch=s24\"\n", 101 | "driver.get(url)\n", 102 | "\n", 103 | "# 获取页面高度并滚动到底\n", 104 | "last_height = driver.execute_script(\"return document.body.scrollHeight\")\n", 105 | "while 
True:\n", 106 | "    driver.execute_script(\"window.scrollTo(0, document.body.scrollHeight);\")\n", 107 | "    time.sleep(5)  # wait for newly loaded items to render\n", 108 | "\n", 109 | "    new_height = driver.execute_script(\"return document.body.scrollHeight\")\n", 110 | "    if new_height == last_height:  # height unchanged: the bottom has been reached\n", 111 | "        break\n", 112 | "    last_height = new_height\n", 113 | "\n", 114 | "# Get the page HTML\n", 115 | "html = driver.page_source\n", 116 | "\n", 117 | "# Parse the HTML with BeautifulSoup\n", 118 | "soup = BeautifulSoup(html, \"html.parser\")\n", 119 | "\n", 120 | "# Extract company names and links (the hashed class names below look build-generated and may change)\n", 121 | "rows = []\n", 122 | "for link in soup.find_all('a', class_='_company_86jzd_338'):\n", 123 | "    url = link.get('href')\n", 124 | "    if url:  # href is present\n", 125 | "        full_url = \"https://www.ycombinator.com\" + url\n", 126 | "    else:\n", 127 | "        full_url = ''  # fall back to an empty string when href is missing\n", 128 | "\n", 129 | "    name_element = link.find('span', class_='_coName_86jzd_453')\n", 130 | "    company_name = name_element.get_text() if name_element else ''\n", 131 | "\n", 132 | "    # Add the link and company name as one row\n", 133 | "    rows.append([full_url, company_name])\n", 134 | "\n", 135 | "# Save the data to a CSV file\n", 136 | "csv_filename = \"companies.csv\"\n", 137 | "with open(csv_filename, mode='w', newline='', encoding='utf-8') as file:\n", 138 | "    writer = csv.writer(file)\n", 139 | "    writer.writerow([\"URL\", \"Company Name\"])  # header row\n", 140 | "    writer.writerows(rows)  # data rows\n", 141 | "\n", 142 | "# Close the browser\n", 143 | "driver.quit()\n", 144 | "\n", 145 | "print(f\"Data has been written to {csv_filename}\")" 146 | ] 147 | }, 148 | { 149 | "cell_type": "markdown", 150 | "metadata": {}, 151 | "source": [ 152 | "### Read the scraped list and scrape each project's full details (this demo handles a single project)" 153 | ] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "metadata": {}, 158 | "source": [ 159 | "#### Scrape the information outside the main article, such as the company card and founder cards" 160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": 5, 165 | "metadata": {}, 166 | "outputs": [ 167 | { 168 | "name": "stdout", 169 | "output_type": "stream", 170 |
"text": [ 171 | "
Void
\n", 172 | "Company Info:\n", 173 | "[{'company_name': 'Void',\n", 174 | " 'founded': '2024',\n", 175 | " 'group_partner_link': 'https://www.ycombinator.com/people/jared-friedman',\n", 176 | " 'group_partner_name': 'Jared Friedman',\n", 177 | " 'location': 'San Francisco',\n", 178 | " 'logo': 'https://bookface-images.s3.us-west-2.amazonaws.com/logos/c3f60489646b8949075e4fdc612cbb8365cc1720.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIAQC4NIECAAIVN5AAU%2F20241007%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20241007T122407Z&X-Amz-Expires=2227&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEM%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLXdlc3QtMiJGMEQCICinVSBXaXDl%2Bn7PXDLE7WXz1LALZZ65wSS4InIjYHryAiBAB7i938pMcJLQ3AsO50eH7xLZXlyzL8EYM7B5qMaHXSrlAwgoEAAaDDAwNjIwMTgxMTA3MiIMi3whU1kFaq%2Fcb9CDKsIDXAqO%2Bzp7mB6Ii93JnjiXBPleLNPhG1tvPc6RF866X7fsLHuYpfPC8H9rLXP9X%2BHfnBo9qLTUbRufqaY0IPnB%2BdswveGzzN3gRGzPWbk6L8Mg30eY2pGQFTSas5VwDuoh%2BZu7ONQajxhCc1%2FkliXrdQer%2Fx8t4WmBW3esWeaDW0LGuIYg2VsApWj4PbXQP46%2Bw0EQ7Yij5LocgFbnwto2eFq97KMNcPIxjQhDfJCi0h23fWBTgFbLnYNrFYSe7pdzjZuf%2F4GHaxZLZVXPTov7RC%2F0Q3PbTtyrPmbGPOFKcPBqZM0B4rYwzMnIhRhM%2FOmf6xjdIxsbH%2BO2GJ%2FJX%2Fpf4o%2BpjoF1%2Fsvg3uiy1%2FOaQDPzEZxD8ivmQHrNNujV%2F%2Fd93Jvwvg2G4ah6e7p8dYvLpFTAKk%2BAhrt%2BjcQ9QJUQFIaayTTT4xt9HFTRJWECSK24HzgV7Q949XKDx%2FJhELdJI2FIOj6Fgoy4wRJnNOwOGMlmMMGy4gZMPkDxlNOnLmqLx9wxXfNoPxh4ZYnh5xUiwa3pTbkdrI0SyP7rau3%2FimkFwOz7pezLmivKG%2B9%2B5Zg2Iaqtbe8YMpxYwtpj35ZzGHoFMLuOjrgGOqYBVFyXlWElbzNZBAEBSM7R68u4flYB8vU0CwKKssJKOZrdoB2hJjCzow5hAeRRz7nP4U8lfbwYStZsbFwF8js4bEO%2FNhET7yhN7F1fm6hfZgROqUnmXqB%2B1lc5ACBOF1vjanxEW85bzffNbTjypFAGI7QaEueyPgUvOlbv6YpZMo%2F2dxit7eLfQBQbSjOZ194OhXRXZCdKnT8%2F%2F7t%2BcUWmUK%2BkCQ%2B1Sw%3D%3D&X-Amz-SignedHeaders=host&X-Amz-Signature=1064f9a6b92b27db258eb955d351f2edf78f64963efa9b1d768ec8b4d4e6e817',\n", 179 | " 'team_size': '2'},\n", 180 | " {'linkedin': 'https://linkedin.com/in/andrew-pareles',\n", 181 | " 'name': 'Andrew Pareles',\n", 182 | " 'photo_url': 
'https://bookface-images.s3.us-west-2.amazonaws.com/avatars/a4377934cc3196fe2791dbb8733156403da0d9d8.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIAQC4NIECAAIVN5AAU%2F20241007%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20241007T122407Z&X-Amz-Expires=2227&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEM%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLXdlc3QtMiJGMEQCICinVSBXaXDl%2Bn7PXDLE7WXz1LALZZ65wSS4InIjYHryAiBAB7i938pMcJLQ3AsO50eH7xLZXlyzL8EYM7B5qMaHXSrlAwgoEAAaDDAwNjIwMTgxMTA3MiIMi3whU1kFaq%2Fcb9CDKsIDXAqO%2Bzp7mB6Ii93JnjiXBPleLNPhG1tvPc6RF866X7fsLHuYpfPC8H9rLXP9X%2BHfnBo9qLTUbRufqaY0IPnB%2BdswveGzzN3gRGzPWbk6L8Mg30eY2pGQFTSas5VwDuoh%2BZu7ONQajxhCc1%2FkliXrdQer%2Fx8t4WmBW3esWeaDW0LGuIYg2VsApWj4PbXQP46%2Bw0EQ7Yij5LocgFbnwto2eFq97KMNcPIxjQhDfJCi0h23fWBTgFbLnYNrFYSe7pdzjZuf%2F4GHaxZLZVXPTov7RC%2F0Q3PbTtyrPmbGPOFKcPBqZM0B4rYwzMnIhRhM%2FOmf6xjdIxsbH%2BO2GJ%2FJX%2Fpf4o%2BpjoF1%2Fsvg3uiy1%2FOaQDPzEZxD8ivmQHrNNujV%2F%2Fd93Jvwvg2G4ah6e7p8dYvLpFTAKk%2BAhrt%2BjcQ9QJUQFIaayTTT4xt9HFTRJWECSK24HzgV7Q949XKDx%2FJhELdJI2FIOj6Fgoy4wRJnNOwOGMlmMMGy4gZMPkDxlNOnLmqLx9wxXfNoPxh4ZYnh5xUiwa3pTbkdrI0SyP7rau3%2FimkFwOz7pezLmivKG%2B9%2B5Zg2Iaqtbe8YMpxYwtpj35ZzGHoFMLuOjrgGOqYBVFyXlWElbzNZBAEBSM7R68u4flYB8vU0CwKKssJKOZrdoB2hJjCzow5hAeRRz7nP4U8lfbwYStZsbFwF8js4bEO%2FNhET7yhN7F1fm6hfZgROqUnmXqB%2B1lc5ACBOF1vjanxEW85bzffNbTjypFAGI7QaEueyPgUvOlbv6YpZMo%2F2dxit7eLfQBQbSjOZ194OhXRXZCdKnT8%2F%2F7t%2BcUWmUK%2BkCQ%2B1Sw%3D%3D&X-Amz-SignedHeaders=host&X-Amz-Signature=5ee676b939874af828e021a27a8c8d9954ad6d95ea9ff9946f9fbbb290d652de'},\n", 183 | " {'linkedin': 'https://linkedin.com/in/mathew-pareles/',\n", 184 | " 'name': 'Mathew Pareles',\n", 185 | " 'photo_url': 
'https://bookface-images.s3.us-west-2.amazonaws.com/avatars/090db01fe52b67b04c6c2c8aed12b70c02276098.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIAQC4NIECAAIVN5AAU%2F20241007%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20241007T122407Z&X-Amz-Expires=2227&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEM%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLXdlc3QtMiJGMEQCICinVSBXaXDl%2Bn7PXDLE7WXz1LALZZ65wSS4InIjYHryAiBAB7i938pMcJLQ3AsO50eH7xLZXlyzL8EYM7B5qMaHXSrlAwgoEAAaDDAwNjIwMTgxMTA3MiIMi3whU1kFaq%2Fcb9CDKsIDXAqO%2Bzp7mB6Ii93JnjiXBPleLNPhG1tvPc6RF866X7fsLHuYpfPC8H9rLXP9X%2BHfnBo9qLTUbRufqaY0IPnB%2BdswveGzzN3gRGzPWbk6L8Mg30eY2pGQFTSas5VwDuoh%2BZu7ONQajxhCc1%2FkliXrdQer%2Fx8t4WmBW3esWeaDW0LGuIYg2VsApWj4PbXQP46%2Bw0EQ7Yij5LocgFbnwto2eFq97KMNcPIxjQhDfJCi0h23fWBTgFbLnYNrFYSe7pdzjZuf%2F4GHaxZLZVXPTov7RC%2F0Q3PbTtyrPmbGPOFKcPBqZM0B4rYwzMnIhRhM%2FOmf6xjdIxsbH%2BO2GJ%2FJX%2Fpf4o%2BpjoF1%2Fsvg3uiy1%2FOaQDPzEZxD8ivmQHrNNujV%2F%2Fd93Jvwvg2G4ah6e7p8dYvLpFTAKk%2BAhrt%2BjcQ9QJUQFIaayTTT4xt9HFTRJWECSK24HzgV7Q949XKDx%2FJhELdJI2FIOj6Fgoy4wRJnNOwOGMlmMMGy4gZMPkDxlNOnLmqLx9wxXfNoPxh4ZYnh5xUiwa3pTbkdrI0SyP7rau3%2FimkFwOz7pezLmivKG%2B9%2B5Zg2Iaqtbe8YMpxYwtpj35ZzGHoFMLuOjrgGOqYBVFyXlWElbzNZBAEBSM7R68u4flYB8vU0CwKKssJKOZrdoB2hJjCzow5hAeRRz7nP4U8lfbwYStZsbFwF8js4bEO%2FNhET7yhN7F1fm6hfZgROqUnmXqB%2B1lc5ACBOF1vjanxEW85bzffNbTjypFAGI7QaEueyPgUvOlbv6YpZMo%2F2dxit7eLfQBQbSjOZ194OhXRXZCdKnT8%2F%2F7t%2BcUWmUK%2BkCQ%2B1Sw%3D%3D&X-Amz-SignedHeaders=host&X-Amz-Signature=36b66c0d21385a065a0b76f8d2e740a0aa59b2f404c3baaaaee00f5c483470b6'}]\n", 186 | "\n", 187 | "Founders Info:\n", 188 | "[{'company_name': 'Void',\n", 189 | " 'founded': '2024',\n", 190 | " 'group_partner_link': 'https://www.ycombinator.com/people/jared-friedman',\n", 191 | " 'group_partner_name': 'Jared Friedman',\n", 192 | " 'location': 'San Francisco',\n", 193 | " 'logo': 
'https://bookface-images.s3.us-west-2.amazonaws.com/logos/c3f60489646b8949075e4fdc612cbb8365cc1720.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIAQC4NIECAAIVN5AAU%2F20241007%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20241007T122407Z&X-Amz-Expires=2227&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEM%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLXdlc3QtMiJGMEQCICinVSBXaXDl%2Bn7PXDLE7WXz1LALZZ65wSS4InIjYHryAiBAB7i938pMcJLQ3AsO50eH7xLZXlyzL8EYM7B5qMaHXSrlAwgoEAAaDDAwNjIwMTgxMTA3MiIMi3whU1kFaq%2Fcb9CDKsIDXAqO%2Bzp7mB6Ii93JnjiXBPleLNPhG1tvPc6RF866X7fsLHuYpfPC8H9rLXP9X%2BHfnBo9qLTUbRufqaY0IPnB%2BdswveGzzN3gRGzPWbk6L8Mg30eY2pGQFTSas5VwDuoh%2BZu7ONQajxhCc1%2FkliXrdQer%2Fx8t4WmBW3esWeaDW0LGuIYg2VsApWj4PbXQP46%2Bw0EQ7Yij5LocgFbnwto2eFq97KMNcPIxjQhDfJCi0h23fWBTgFbLnYNrFYSe7pdzjZuf%2F4GHaxZLZVXPTov7RC%2F0Q3PbTtyrPmbGPOFKcPBqZM0B4rYwzMnIhRhM%2FOmf6xjdIxsbH%2BO2GJ%2FJX%2Fpf4o%2BpjoF1%2Fsvg3uiy1%2FOaQDPzEZxD8ivmQHrNNujV%2F%2Fd93Jvwvg2G4ah6e7p8dYvLpFTAKk%2BAhrt%2BjcQ9QJUQFIaayTTT4xt9HFTRJWECSK24HzgV7Q949XKDx%2FJhELdJI2FIOj6Fgoy4wRJnNOwOGMlmMMGy4gZMPkDxlNOnLmqLx9wxXfNoPxh4ZYnh5xUiwa3pTbkdrI0SyP7rau3%2FimkFwOz7pezLmivKG%2B9%2B5Zg2Iaqtbe8YMpxYwtpj35ZzGHoFMLuOjrgGOqYBVFyXlWElbzNZBAEBSM7R68u4flYB8vU0CwKKssJKOZrdoB2hJjCzow5hAeRRz7nP4U8lfbwYStZsbFwF8js4bEO%2FNhET7yhN7F1fm6hfZgROqUnmXqB%2B1lc5ACBOF1vjanxEW85bzffNbTjypFAGI7QaEueyPgUvOlbv6YpZMo%2F2dxit7eLfQBQbSjOZ194OhXRXZCdKnT8%2F%2F7t%2BcUWmUK%2BkCQ%2B1Sw%3D%3D&X-Amz-SignedHeaders=host&X-Amz-Signature=1064f9a6b92b27db258eb955d351f2edf78f64963efa9b1d768ec8b4d4e6e817',\n", 194 | " 'team_size': '2'},\n", 195 | " {'linkedin': 'https://linkedin.com/in/andrew-pareles',\n", 196 | " 'name': 'Andrew Pareles',\n", 197 | " 'photo_url': 
'https://bookface-images.s3.us-west-2.amazonaws.com/avatars/a4377934cc3196fe2791dbb8733156403da0d9d8.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIAQC4NIECAAIVN5AAU%2F20241007%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20241007T122407Z&X-Amz-Expires=2227&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEM%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLXdlc3QtMiJGMEQCICinVSBXaXDl%2Bn7PXDLE7WXz1LALZZ65wSS4InIjYHryAiBAB7i938pMcJLQ3AsO50eH7xLZXlyzL8EYM7B5qMaHXSrlAwgoEAAaDDAwNjIwMTgxMTA3MiIMi3whU1kFaq%2Fcb9CDKsIDXAqO%2Bzp7mB6Ii93JnjiXBPleLNPhG1tvPc6RF866X7fsLHuYpfPC8H9rLXP9X%2BHfnBo9qLTUbRufqaY0IPnB%2BdswveGzzN3gRGzPWbk6L8Mg30eY2pGQFTSas5VwDuoh%2BZu7ONQajxhCc1%2FkliXrdQer%2Fx8t4WmBW3esWeaDW0LGuIYg2VsApWj4PbXQP46%2Bw0EQ7Yij5LocgFbnwto2eFq97KMNcPIxjQhDfJCi0h23fWBTgFbLnYNrFYSe7pdzjZuf%2F4GHaxZLZVXPTov7RC%2F0Q3PbTtyrPmbGPOFKcPBqZM0B4rYwzMnIhRhM%2FOmf6xjdIxsbH%2BO2GJ%2FJX%2Fpf4o%2BpjoF1%2Fsvg3uiy1%2FOaQDPzEZxD8ivmQHrNNujV%2F%2Fd93Jvwvg2G4ah6e7p8dYvLpFTAKk%2BAhrt%2BjcQ9QJUQFIaayTTT4xt9HFTRJWECSK24HzgV7Q949XKDx%2FJhELdJI2FIOj6Fgoy4wRJnNOwOGMlmMMGy4gZMPkDxlNOnLmqLx9wxXfNoPxh4ZYnh5xUiwa3pTbkdrI0SyP7rau3%2FimkFwOz7pezLmivKG%2B9%2B5Zg2Iaqtbe8YMpxYwtpj35ZzGHoFMLuOjrgGOqYBVFyXlWElbzNZBAEBSM7R68u4flYB8vU0CwKKssJKOZrdoB2hJjCzow5hAeRRz7nP4U8lfbwYStZsbFwF8js4bEO%2FNhET7yhN7F1fm6hfZgROqUnmXqB%2B1lc5ACBOF1vjanxEW85bzffNbTjypFAGI7QaEueyPgUvOlbv6YpZMo%2F2dxit7eLfQBQbSjOZ194OhXRXZCdKnT8%2F%2F7t%2BcUWmUK%2BkCQ%2B1Sw%3D%3D&X-Amz-SignedHeaders=host&X-Amz-Signature=5ee676b939874af828e021a27a8c8d9954ad6d95ea9ff9946f9fbbb290d652de'},\n", 198 | " {'linkedin': 'https://linkedin.com/in/mathew-pareles/',\n", 199 | " 'name': 'Mathew Pareles',\n", 200 | " 'photo_url': 
'https://bookface-images.s3.us-west-2.amazonaws.com/avatars/090db01fe52b67b04c6c2c8aed12b70c02276098.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIAQC4NIECAAIVN5AAU%2F20241007%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20241007T122407Z&X-Amz-Expires=2227&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEM%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLXdlc3QtMiJGMEQCICinVSBXaXDl%2Bn7PXDLE7WXz1LALZZ65wSS4InIjYHryAiBAB7i938pMcJLQ3AsO50eH7xLZXlyzL8EYM7B5qMaHXSrlAwgoEAAaDDAwNjIwMTgxMTA3MiIMi3whU1kFaq%2Fcb9CDKsIDXAqO%2Bzp7mB6Ii93JnjiXBPleLNPhG1tvPc6RF866X7fsLHuYpfPC8H9rLXP9X%2BHfnBo9qLTUbRufqaY0IPnB%2BdswveGzzN3gRGzPWbk6L8Mg30eY2pGQFTSas5VwDuoh%2BZu7ONQajxhCc1%2FkliXrdQer%2Fx8t4WmBW3esWeaDW0LGuIYg2VsApWj4PbXQP46%2Bw0EQ7Yij5LocgFbnwto2eFq97KMNcPIxjQhDfJCi0h23fWBTgFbLnYNrFYSe7pdzjZuf%2F4GHaxZLZVXPTov7RC%2F0Q3PbTtyrPmbGPOFKcPBqZM0B4rYwzMnIhRhM%2FOmf6xjdIxsbH%2BO2GJ%2FJX%2Fpf4o%2BpjoF1%2Fsvg3uiy1%2FOaQDPzEZxD8ivmQHrNNujV%2F%2Fd93Jvwvg2G4ah6e7p8dYvLpFTAKk%2BAhrt%2BjcQ9QJUQFIaayTTT4xt9HFTRJWECSK24HzgV7Q949XKDx%2FJhELdJI2FIOj6Fgoy4wRJnNOwOGMlmMMGy4gZMPkDxlNOnLmqLx9wxXfNoPxh4ZYnh5xUiwa3pTbkdrI0SyP7rau3%2FimkFwOz7pezLmivKG%2B9%2B5Zg2Iaqtbe8YMpxYwtpj35ZzGHoFMLuOjrgGOqYBVFyXlWElbzNZBAEBSM7R68u4flYB8vU0CwKKssJKOZrdoB2hJjCzow5hAeRRz7nP4U8lfbwYStZsbFwF8js4bEO%2FNhET7yhN7F1fm6hfZgROqUnmXqB%2B1lc5ACBOF1vjanxEW85bzffNbTjypFAGI7QaEueyPgUvOlbv6YpZMo%2F2dxit7eLfQBQbSjOZ194OhXRXZCdKnT8%2F%2F7t%2BcUWmUK%2BkCQ%2B1Sw%3D%3D&X-Amz-SignedHeaders=host&X-Amz-Signature=36b66c0d21385a065a0b76f8d2e740a0aa59b2f404c3baaaaee00f5c483470b6'}]\n" 201 | ] 202 | } 203 | ], 204 | "source": [ 205 | "from pprint import pprint\n", 206 | "\n", 207 | "# 启动 Chrome 浏览器\n", 208 | "options = webdriver.ChromeOptions()\n", 209 | "# options.add_argument(\"--headless\") # 使用无头模式\n", 210 | "options.binary_location = chrome_driver_path_bin # 设置 Chrome 二进制文件路径\n", 211 | "\n", 212 | "# 配置代理\n", 213 | "options.add_argument(f'--proxy-server={proxy}')\n", 214 | "\n", 215 | "service = Service(chrome_driver_path)\n", 216 | "driver = 
webdriver.Chrome(service=service, options=options)\n", 217 | "\n", 218 | "project_detail_url = \"https://www.ycombinator.com/companies/void\"\n", 219 | "driver.get(project_detail_url)\n", 220 | "\n", 221 | "# Explicitly wait until the company card (class ycdc-card) is present before reading the page\n", 222 | "WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.CLASS_NAME, 'ycdc-card')))\n", 223 | "\n", 224 | "detail_html = driver.page_source\n", 225 | "\n", 226 | "driver.quit()\n", 227 | "\n", 228 | "soup = BeautifulSoup(detail_html, 'html.parser')\n", 229 | "\n", 230 | "founders_info = []  # company card entries first, then one entry per founder\n", 231 | "company_cards = soup.find_all('div', class_='ycdc-card space-y-1.5 sm:w-[300px]')\n", 232 | "founders_cards = soup.find_all('div', class_='shrink-0 space-y-1.5 rounded-md border-[1px] border-[#999] bg-[#FDFDF7] p-6 sm:w-[300px]')\n", 233 | "\n", 234 | "for card in company_cards:\n", 235 | "    # Logo\n", 236 | "    img = card.find('img')['src'] if card.find('img') else '无图片'\n", 237 | "\n", 238 | "    # Company name\n", 239 | "    company_name_div = card.find('div', class_='text-lg font-bold', recursive=False)\n", 240 | "    print(company_name_div)  # debug: show the matched element\n", 241 | "    company_name = company_name_div.text.strip() if company_name_div else '无公司名'\n", 242 | "\n", 243 | "    # Company details: each label is followed by the element holding its value\n", 244 | "    founded = card.find(string='Founded:').find_next().text.strip() if card.find(string='Founded:') else '无成立年份'\n", 245 | "    team_size = card.find(string='Team Size:').find_next().text.strip() if card.find(string='Team Size:') else '无团队规模'\n", 246 | "    location = card.find(string='Location:').find_next().text.strip() if card.find(string='Location:') else '无位置'\n", 247 | "    group_partner = card.find(string='Group Partner:').find_next() if card.find(string='Group Partner:') else None\n", 248 | "    group_partner_name = group_partner.text.strip() if group_partner else '无合伙人'\n", 249 | "    group_partner_link = group_partner.get('href', '无合伙人链接') if group_partner else '无合伙人链接'\n", 250 | "\n", 251 | "    # Store the company info in a dict\n", 252 | "    company_data = {\n", 253 | "        'logo': img,\n", 254 | "        'company_name': company_name,\n", 255 | "
        'founded': founded,\n", 256 | "        'team_size': team_size,\n", 257 | "        'location': location,\n", 258 | "        'group_partner_name': group_partner_name,\n", 259 | "        'group_partner_link': group_partner_link\n", 260 | "    }\n", 261 | "\n", 262 | "    # Append the dict to the list\n", 263 | "    founders_info.append(company_data)\n", 264 | "\n", 265 | "for card in founders_cards:\n", 266 | "    # Photo\n", 267 | "    photo_url = card.find('img')['src']\n", 268 | "\n", 269 | "    # Name\n", 270 | "    name = card.find('div', class_='font-bold').text.strip()\n", 271 | "\n", 272 | "    # LinkedIn link (not every founder card has one)\n", 273 | "    linkedin_tag = card.find('a', href=True, title='LinkedIn profile')\n", 274 | "    linkedin_link = linkedin_tag['href'] if linkedin_tag else '无LinkedIn链接'\n", 275 | "    # Store the founder info in a dict\n", 276 | "    founder_data = {\n", 277 | "        'photo_url': photo_url,\n", 278 | "        'name': name,\n", 279 | "        'linkedin': linkedin_link\n", 280 | "    }\n", 281 | "\n", 282 | "    # Append the dict to the list\n", 283 | "    founders_info.append(founder_data)\n", 284 | "\n", 285 | "print(\"Company Info:\")\n", 286 | "pprint(founders_info[0])\n", 287 | "\n", 288 | "print(\"\\nFounders Info:\")\n", 289 | "pprint(founders_info[1:])" 290 | ] 291 | }, 292 | { 293 | "cell_type": "markdown", 294 | "metadata": {}, 295 | "source": [ 296 | "#### Fetch the main article content with jina.ai" 297 | ] 298 | }, 299 | { 300 | "cell_type": "code", 301 | "execution_count": 6, 302 | "metadata": {}, 303 | "outputs": [ 304 | { 305 | "name": "stdout", 306 | "output_type": "stream", 307 | "text": [ 308 | "Scraping https://r.jina.ai/https://www.ycombinator.com/companies/void...\n", 309 | "Scraped Content: Title: Void: The open source Cursor alternative | Y Combinator\n", 310 | "\n", 311 | "URL Source: https://www.ycombinator.com/companies/void\n", 312 | "\n", 313 | "Markdown Content:\n", 314 | "![Image 1](https://bookface-images.s3.amazonaws.com/small_logos/0ab88e3a4fc8224ae094d26b68f966408fe4cf3f.png)\n", 315 | "\n", 316 | "### The open source Cursor alternative\n", 317 | "\n", 318 | "Void is an open source AI code editor.
It provides developers with the AI features of Cursor, GitHub Copilot, and more, without sending their code to an external API.\n", 319 | "\n", 320 | "![Image 2: Void](https://bookface-images.s3.us-west-2.amazonaws.com/logos/c3f60489646b8949075e4fdc612cbb8365cc1720.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIAQC4NIECAFIKN3J4A%2F20241007%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20241007T122413Z&X-Amz-Expires=2245&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEM%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLXdlc3QtMiJHMEUCIFYomW1JngLDk%2BB%2BqOyokIQZ3sLbrNKywrUKDp6NV3ZAAiEAthPerfTfr4AuX5zIs8fK3oOVy%2BgY980bN6TXR%2Bh4FLYq5QMIKBAAGgwwMDYyMDE4MTEwNzIiDG5xYZwSUMs8xMd9uCrCA8e79InPYD6HdZxW%2BTD1CovzIl6gRm42Pw0hhldCx20BZGesTkcmgWZ%2FRI55vGzs4Db3NvEit9lqIZHvDD9gmipcdNOVj5pZ9jHnPbc5SUi5WUunfsS4%2BeG9MQYmAz9Ib6gfTEG9FZlfFiiY5dS7N%2BOmwzjM82R2xNitnK0XL%2BBBsLffcqFA4s0wV0Axo7AaAsR53ZyEicL5oWMwEOoKVKwCrBH8Dgiax%2FsNDeWDvSe3TA0oN1jZUiqFTvmQEsJOyom9UkDmgPlBjjTf5UMz2YEBKvtfD7uacR3havUY1UwhaJzN%2B77wOKmiQfiZ4XKeblaekj8lTzdJCVZrFmf3dtq26%2Fo7RYyjO5xRHJmfvB4Ydx2zfeUxJWHnBgmpEcHRlRiJa7D1LEjPsW%2FwmloV%2FmdRDgmIxTIn%2FifXgHQINj7CK4XxUWHK340HkQI%2FetQXeGwhvqbyeJjyTaJtOHM8zHt7Q7IzgF9UB8gb%2FPSunK22p4GMnn2dU6BexvQr3SGgNFnRY8Bqj5uc21sTvnJ%2FxlNqNxHwoh5F%2BCY%2FIGaP%2FXOZ%2BY393HLvsWgVUT4ZmrDq%2B1cyrXpiSxe6ojs%2BiOzbG7bS2zDTjo64BjqlATD0wjqkNyLjA8fQ%2BKCCMz8plYsqryaK0YaNv6hiDB3KaAYLIW1%2Bb8jyJ5740gODN575tteBSb%2FsYCo9zohx0GFJByhe65Zl5MjPLbUsi2P1dp5lsQ3pLI%2BI7OgoV521X7bOwdh3PQ%2BUpCc1YX5TP%2FJHMKWtfIEsIcUwCYjRG8RFwDxCDtc%2FgLv5BbiQLHRL%2F%2BcyiG9INq%2Fz%2BXcs5z7%2B37ee%2B45HtQ%3D%3D&X-Amz-SignedHeaders=host&X-Amz-Signature=59d62ed27d146c7b1aabf18c76b4b9f1a498cfff36c4d8f637f3916aed609faf)\n", 321 | "\n", 322 | "Void\n", 323 | "\n", 324 | "Founded:2024\n", 325 | "\n", 326 | "Team Size:2\n", 327 | "\n", 328 | "Location:San Francisco\n", 329 | "\n", 330 | "### Active Founders\n", 331 | "\n", 332 | "### Andrew Pareles, Founder\n", 333 | "\n", 334 | "Andrew is the CEO of Void. 
Before starting Void, he bootstrapped an edtech company, did quantum computing research @ Johns Hopkins APL, and studied computer science @ Cornell.\n", 335 | "\n", 336 | "![Image 3: Andrew Pareles](https://bookface-images.s3.us-west-2.amazonaws.com/avatars/a4377934cc3196fe2791dbb8733156403da0d9d8.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIAQC4NIECAFIKN3J4A%2F20241007%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20241007T122413Z&X-Amz-Expires=2245&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEM%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLXdlc3QtMiJHMEUCIFYomW1JngLDk%2BB%2BqOyokIQZ3sLbrNKywrUKDp6NV3ZAAiEAthPerfTfr4AuX5zIs8fK3oOVy%2BgY980bN6TXR%2Bh4FLYq5QMIKBAAGgwwMDYyMDE4MTEwNzIiDG5xYZwSUMs8xMd9uCrCA8e79InPYD6HdZxW%2BTD1CovzIl6gRm42Pw0hhldCx20BZGesTkcmgWZ%2FRI55vGzs4Db3NvEit9lqIZHvDD9gmipcdNOVj5pZ9jHnPbc5SUi5WUunfsS4%2BeG9MQYmAz9Ib6gfTEG9FZlfFiiY5dS7N%2BOmwzjM82R2xNitnK0XL%2BBBsLffcqFA4s0wV0Axo7AaAsR53ZyEicL5oWMwEOoKVKwCrBH8Dgiax%2FsNDeWDvSe3TA0oN1jZUiqFTvmQEsJOyom9UkDmgPlBjjTf5UMz2YEBKvtfD7uacR3havUY1UwhaJzN%2B77wOKmiQfiZ4XKeblaekj8lTzdJCVZrFmf3dtq26%2Fo7RYyjO5xRHJmfvB4Ydx2zfeUxJWHnBgmpEcHRlRiJa7D1LEjPsW%2FwmloV%2FmdRDgmIxTIn%2FifXgHQINj7CK4XxUWHK340HkQI%2FetQXeGwhvqbyeJjyTaJtOHM8zHt7Q7IzgF9UB8gb%2FPSunK22p4GMnn2dU6BexvQr3SGgNFnRY8Bqj5uc21sTvnJ%2FxlNqNxHwoh5F%2BCY%2FIGaP%2FXOZ%2BY393HLvsWgVUT4ZmrDq%2B1cyrXpiSxe6ojs%2BiOzbG7bS2zDTjo64BjqlATD0wjqkNyLjA8fQ%2BKCCMz8plYsqryaK0YaNv6hiDB3KaAYLIW1%2Bb8jyJ5740gODN575tteBSb%2FsYCo9zohx0GFJByhe65Zl5MjPLbUsi2P1dp5lsQ3pLI%2BI7OgoV521X7bOwdh3PQ%2BUpCc1YX5TP%2FJHMKWtfIEsIcUwCYjRG8RFwDxCDtc%2FgLv5BbiQLHRL%2F%2BcyiG9INq%2Fz%2BXcs5z7%2B37ee%2B45HtQ%3D%3D&X-Amz-SignedHeaders=host&X-Amz-Signature=335128d8b1f951f02cc54149a83977b50b0905a5f3288af75626528cb6ddaa79)\n", 337 | "\n", 338 | "### Mathew Pareles, Founder\n", 339 | "\n", 340 | "Mathew is the CTO of Void, the open-source AI code editor. 
He studied physics and computer science @Cornell, and conducted machine learning research with a former string theorist @Harvard.\n", 341 | "\n", 342 | "![Image 4: Mathew Pareles](https://bookface-images.s3.us-west-2.amazonaws.com/avatars/090db01fe52b67b04c6c2c8aed12b70c02276098.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIAQC4NIECAFIKN3J4A%2F20241007%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20241007T122413Z&X-Amz-Expires=2245&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEM%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLXdlc3QtMiJHMEUCIFYomW1JngLDk%2BB%2BqOyokIQZ3sLbrNKywrUKDp6NV3ZAAiEAthPerfTfr4AuX5zIs8fK3oOVy%2BgY980bN6TXR%2Bh4FLYq5QMIKBAAGgwwMDYyMDE4MTEwNzIiDG5xYZwSUMs8xMd9uCrCA8e79InPYD6HdZxW%2BTD1CovzIl6gRm42Pw0hhldCx20BZGesTkcmgWZ%2FRI55vGzs4Db3NvEit9lqIZHvDD9gmipcdNOVj5pZ9jHnPbc5SUi5WUunfsS4%2BeG9MQYmAz9Ib6gfTEG9FZlfFiiY5dS7N%2BOmwzjM82R2xNitnK0XL%2BBBsLffcqFA4s0wV0Axo7AaAsR53ZyEicL5oWMwEOoKVKwCrBH8Dgiax%2FsNDeWDvSe3TA0oN1jZUiqFTvmQEsJOyom9UkDmgPlBjjTf5UMz2YEBKvtfD7uacR3havUY1UwhaJzN%2B77wOKmiQfiZ4XKeblaekj8lTzdJCVZrFmf3dtq26%2Fo7RYyjO5xRHJmfvB4Ydx2zfeUxJWHnBgmpEcHRlRiJa7D1LEjPsW%2FwmloV%2FmdRDgmIxTIn%2FifXgHQINj7CK4XxUWHK340HkQI%2FetQXeGwhvqbyeJjyTaJtOHM8zHt7Q7IzgF9UB8gb%2FPSunK22p4GMnn2dU6BexvQr3SGgNFnRY8Bqj5uc21sTvnJ%2FxlNqNxHwoh5F%2BCY%2FIGaP%2FXOZ%2BY393HLvsWgVUT4ZmrDq%2B1cyrXpiSxe6ojs%2BiOzbG7bS2zDTjo64BjqlATD0wjqkNyLjA8fQ%2BKCCMz8plYsqryaK0YaNv6hiDB3KaAYLIW1%2Bb8jyJ5740gODN575tteBSb%2FsYCo9zohx0GFJByhe65Zl5MjPLbUsi2P1dp5lsQ3pLI%2BI7OgoV521X7bOwdh3PQ%2BUpCc1YX5TP%2FJHMKWtfIEsIcUwCYjRG8RFwDxCDtc%2FgLv5BbiQLHRL%2F%2BcyiG9INq%2Fz%2BXcs5z7%2B37ee%2B45HtQ%3D%3D&X-Amz-SignedHeaders=host&X-Amz-Signature=3186d4fc480a4d85e546882c2227ff318424b6b1f03b2809d68625ea4fee9593)\n", 343 | "\n", 344 | "### Company Launches\n", 345 | "\n", 346 | "[### Void: The open source Cursor alternative](https://www.ycombinator.com/launches/Lrh-void-the-open-source-cursor-alternative)\n", 347 | "\n", 348 | "**_TL;DR:_** [Void](http://voideditor.com/) is the open source Cursor 
alternative. It offers the AI features of an editor like Cursor, while allowing you to host your own AI model locally and keep your data private. Check us out on [GitHub](http://github.com/voideditor/void)!\n", 349 | "\n", 350 | "![Image 5](https://www.ycombinator.com/media/?type=post&id=84053&key=user_uploads/1324771/56eae9a0-49fd-499f-b101-3e15e81914a9)\n", 351 | "\n", 352 | "About us\n", 353 | "--------\n", 354 | "\n", 355 | "Hi everyone, we’re [Mat](https://linkedin.com/in/mathew-pareles/) and [Andrew](https://linkedin.com/in/andrew-pareles), and we’re the team behind Void. Mat’s been prompting Transformers since before GPT3 became mainstream, and Andrew did quantum computing research at JHU APL. We’re best friends who have been programming together since we were 8.\n", 356 | "\n", 357 | "![Image 6](https://www.ycombinator.com/media/?type=post&id=84053&key=user_uploads/1324771/30b62998-29ed-447f-883b-f0875c3d1ffa)\n", 358 | "\n", 359 | "_left to right: Mat (CTO), Andrew (CEO)_\n", 360 | "\n", 361 | "❌ The problem: the leading AI IDE is closed source\n", 362 | "--------------------------------------------------\n", 363 | "\n", 364 | "Cursor is a closed-source AI editor, which means you need to send your private data through Cursor’s backend every time you use it. This leads to obvious privacy concerns. It’s also expensive for developers, and it means one individual has full control over a powerful AI model.\n", 365 | "\n", 366 | "We think the future of AI should be open. You should be able to use whatever model you like and not be locked into using a privatized commercial one that’s built to collect your data.\n", 367 | "\n", 368 | "✅ The solution: open source\n", 369 | "---------------------------\n", 370 | "\n", 371 | "The solution: create an open-source Cursor alternative. 
This way, you can choose to self-host or communicate directly with any model you like, anywhere you like.\n", 372 | "\n", 373 | "There are lots of benefits to open-sourcing besides privacy, too: on Void, you can access community-made AI features beyond the ones that Cursor offers, and gain full access to all your prompts.\n", 374 | "\n", 375 | "Key features\n", 376 | "------------\n", 377 | "\n", 378 | "We’re building a lot. Here are the features we’re most excited about:\n", 379 | "\n", 380 | "* **File system awareness:** Void can answer questions about your entire codebase. It builds an index of all your files and uses it automatically, so you don’t manually select relevant files.\n", 381 | "* **View and edit underlying prompts:** Void lets you open up the hood and view/edit every single prompt in your chat history. This is really important if you want to understand how the LLM got its answer, and it’s lacking in most closed-source tools.\n", 382 | "* **Fast edits across 1000s of lines:** Even if your file is thousands of lines long, Void applies changes almost instantly. We prompt the AI to make changes the way a human would instead of rewriting the whole file.\n", 383 | "\n", 384 | "![Image 7](https://www.ycombinator.com/media/?type=post&id=84053&key=user_uploads/1324771/48310318-059a-4844-8d75-a0c71e2a8385)\n", 385 | "\n", 386 | "* **Full privacy / data control:** All of Void’s prompt-building is done on your own computer. This means you can self-host your own model and have your data never leave your network. 
Or, you can connect directly to Claude, GPT, or Gemini, and not worry about sending your company data through a middle layer of communication.\n", 387 | "\n", 388 | "We think the future of AI is open, and we’re excited about building it.\n", 389 | "\n", 390 | "**🙏 Asks**\n", 391 | "-----------\n", 392 | "\n", 393 | "* Check out the [**Void editor**](http://voideditor.com/) and star us on GitHub!\n", 394 | "* If you’re interested in helping us build, join our [Discord](https://discord.gg/PspNkKG5wt) or shoot us an email. We rely on contributions, even small ones, and we’ll help you get up and running as a contributor!\n", 395 | "* Please share this post! Help spread the word that we’re building an open source Cursor alternative.\n", 396 | "\n", 397 | "#### YC Sign Photo\n", 398 | "\n", 399 | "![Image 8: YC Sign Photo](https://bookface-images.s3.us-west-2.amazonaws.com/attachments/7bd912674ec2956ed721dc02fec7cfd0b2ba7de0.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIAQC4NIECAFIKN3J4A%2F20241007%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20241007T122413Z&X-Amz-Expires=2245&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEM%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLXdlc3QtMiJHMEUCIFYomW1JngLDk%2BB%2BqOyokIQZ3sLbrNKywrUKDp6NV3ZAAiEAthPerfTfr4AuX5zIs8fK3oOVy%2BgY980bN6TXR%2Bh4FLYq5QMIKBAAGgwwMDYyMDE4MTEwNzIiDG5xYZwSUMs8xMd9uCrCA8e79InPYD6HdZxW%2BTD1CovzIl6gRm42Pw0hhldCx20BZGesTkcmgWZ%2FRI55vGzs4Db3NvEit9lqIZHvDD9gmipcdNOVj5pZ9jHnPbc5SUi5WUunfsS4%2BeG9MQYmAz9Ib6gfTEG9FZlfFiiY5dS7N%2BOmwzjM82R2xNitnK0XL%2BBBsLffcqFA4s0wV0Axo7AaAsR53ZyEicL5oWMwEOoKVKwCrBH8Dgiax%2FsNDeWDvSe3TA0oN1jZUiqFTvmQEsJOyom9UkDmgPlBjjTf5UMz2YEBKvtfD7uacR3havUY1UwhaJzN%2B77wOKmiQfiZ4XKeblaekj8lTzdJCVZrFmf3dtq26%2Fo7RYyjO5xRHJmfvB4Ydx2zfeUxJWHnBgmpEcHRlRiJa7D1LEjPsW%2FwmloV%2FmdRDgmIxTIn%2FifXgHQINj7CK4XxUWHK340HkQI%2FetQXeGwhvqbyeJjyTaJtOHM8zHt7Q7IzgF9UB8gb%2FPSunK22p4GMnn2dU6BexvQr3SGgNFnRY8Bqj5uc21sTvnJ%2FxlNqNxHwoh5F%2BCY%2FIGaP%2FXOZ%2BY393HLvsWgVUT4ZmrDq%2B1cyrXpiSxe6ojs%2BiOzbG7bS2zDTjo64BjqlATD0wjq
kNyLjA8fQ%2BKCCMz8plYsqryaK0YaNv6hiDB3KaAYLIW1%2Bb8jyJ5740gODN575tteBSb%2FsYCo9zohx0GFJByhe65Zl5MjPLbUsi2P1dp5lsQ3pLI%2BI7OgoV521X7bOwdh3PQ%2BUpCc1YX5TP%2FJHMKWtfIEsIcUwCYjRG8RFwDxCDtc%2FgLv5BbiQLHRL%2F%2BcyiG9INq%2Fz%2BXcs5z7%2B37ee%2B45HtQ%3D%3D&X-Amz-SignedHeaders=host&X-Amz-Signature=4219384ec473f47922a4288c9871ac18ba0374f6e68f7a297779a08372595ae9)\n", 400 | "\n" 401 | ] 402 | } 403 | ], 404 | "source": [ 405 | "import requests\n", 406 | "from bs4 import BeautifulSoup\n", 407 | "\n", 408 | "# A single project page\n", 409 | "link = project_detail_url\n", 410 | "base_url = 'https://r.jina.ai/'\n", 411 | "full_url = base_url + link  # prefixing a URL with r.jina.ai returns the page content as Markdown\n", 412 | "\n", 413 | "# Proxy settings\n", 414 | "proxies = {\n", 415 | "    \"http\": proxy,\n", 416 | "    \"https\": proxy,\n", 417 | "}\n", 418 | "\n", 419 | "# Fetch the text content of a single URL\n", 420 | "def scrape_content(full_url):\n", 421 | "    try:\n", 422 | "        response = requests.get(full_url, proxies=proxies, timeout=30)\n", 423 | "        response.raise_for_status()  # raise HTTPError for 4xx/5xx responses\n", 424 | "        soup = BeautifulSoup(response.content, 'html.parser')\n", 425 | "        return soup.get_text()  # return the page's plain text\n", 426 | "    except requests.exceptions.RequestException as e:\n", 427 | "        print(f\"Error fetching {full_url}: {e}\")\n", 428 | "        return \"Error\"  # sentinel value when the request fails\n", 429 | "\n", 430 | "# Fetch the single URL\n", 431 | "print(f\"Scraping {full_url}...\")\n", 432 | "scraped_content = scrape_content(full_url)\n", 433 | "print(f\"Scraped Content: {scraped_content}\")\n" 434 | ] 435 | }, 436 | { 437 | "cell_type": "markdown", 438 | "metadata": {}, 439 | "source": [ 440 | "## Summarize with an LLM" 441 | ] 442 | }, 443 | { 444 | "cell_type": "code", 445 | "execution_count": 11, 446 | "metadata": {}, 447 | "outputs": [ 448 | { 449 | "name": "stdout", 450 | "output_type": "stream", 451 | "text": [ 452 | "{\"一句话总结\": \"Void是一个开源的AI代码编辑器,旨在为开发者提供Cursor、GitHub Copilot等AI功能,同时保持代码的隐私性。\", \"Problem\": \"现有的AI IDE如Cursor是闭源的,这导致开发者在使用这些工具时需要将代码发送到外部API,引发隐私担忧,同时使用成本高,且权力集中在单个实体手中。\", \"Solution\": 
\"Void通过开源的方式解决了这些问题,开发者可以选择自行托管AI模型,保持数据的私密性,也可以直接连接到Claude、GPT或Gemini等模型,而无需担心数据通过中间层进行通信。\"}\n", 453 | "{'一句话总结': 'Void是一个开源的AI代码编辑器,旨在为开发者提供Cursor、GitHub Copilot等AI功能,同时保持代码的隐私性。', 'Problem': '现有的AI IDE如Cursor是闭源的,这导致开发者在使用这些工具时需要将代码发送到外部API,引发隐私担忧,同时使用成本高,且权力集中在单个实体手中。', 'Solution': 'Void通过开源的方式解决了这些问题,开发者可以选择自行托管AI模型,保持数据的私密性,也可以直接连接到Claude、GPT或Gemini等模型,而无需担心数据通过中间层进行通信。'}\n" 454 | ] 455 | } 456 | ], 457 | "source": [ 458 | "extract_client = OpenAI(base_url=Base_url, api_key=API_key)\n", 459 | "\n", 460 | "def call_llm(system_prompt: str, text: str):\n", 461 | "    \"\"\"Call the LLM with a system prompt and one user message, and return the reply text.\"\"\"\n", 462 | "    response = extract_client.chat.completions.create(\n", 463 | "        messages=[\n", 464 | "            {\"role\": \"system\", \"content\": system_prompt},\n", 465 | "            {\"role\": \"user\", \"content\": text},\n", 466 | "        ],\n", 467 | "        model=\"glm-4-plus\",\n", 468 | "        max_tokens=4096,\n", 469 | "        temperature=0.8,\n", 470 | "    )\n", 471 | "    return response.choices[0].message.content\n", 472 | "\n", 473 | "rule = {\"一句话总结\": \"通过xxx实现xxx\",\n", 474 | "        \"Problem\": \"xxx行业存在xxx问题,导致xxx\",\n", 475 | "        \"Solution\": \"xxx解决方案\"}\n", 476 | "\n", 477 | "\n", 478 | "system_prompt = \"你是一个资深的投资人,你有超强的信息整理能力和深刻的洞察力,你总是能深刻且清晰的挖掘出项目中的痛点和背景,以及解决方案。\"\n", 479 | "query_prompt = f\"\"\"\n", 480 | "
\n", 481 | "{scraped_content}\n", 482 | "
\n", 483 | "\n", 484 | "\n", 485 | "本次任务是从文档中总结出项目的信息,包括\"一句话总结\",\"项目提出的背景/问题/行业痛点\",\"项目的解决方案\"\n", 486 | "\n", 487 | "\n", 488 | "你的回答应当简洁而深刻,并且使用中文回答\n", 489 | "你的输出应当为json格式,并且避免```json 此类格式性内容\n", 490 | "直接输出json即可,格式严格参考 {json.dumps(rule, ensure_ascii=False)}\n", 491 | "\n", 492 | "\"\"\"\n", 493 | "\n", 494 | "llm_summary = call_llm(system_prompt, query_prompt)\n", 495 | "\n", 496 | "# Print the raw reply, then parse it as JSON\n", 497 | "print(llm_summary)\n", 498 | "\n", 499 | "parsed_summary = json.loads(llm_summary.strip().removeprefix('```json').removesuffix('```'))  # tolerate a stray ```json fence (Python 3.9+)\n", 500 | "\n", 501 | "# Copy the expected fields into a new dict (missing keys become empty strings)\n", 502 | "summary_dict = {\n", 503 | "    \"一句话总结\": parsed_summary.get(\"一句话总结\", \"\"),\n", 504 | "    \"Problem\": parsed_summary.get(\"Problem\", \"\"),\n", 505 | "    \"Solution\": parsed_summary.get(\"Solution\", \"\")\n", 506 | "}\n", 507 | "\n", 508 | "print(summary_dict)" 509 | ] 510 | }, 511 | { 512 | "cell_type": "markdown", 513 | "metadata": {}, 514 | "source": [ 515 | "## Generate the poster" 516 | ] 517 | }, 518 | { 519 | "cell_type": "markdown", 520 | "metadata": {}, 521 | "source": [ 522 | "### Generate the HTML version of the poster" 523 | ] 524 | }, 525 | { 526 | "cell_type": "code", 527 | "execution_count": 12, 528 | "metadata": {}, 529 | "outputs": [ 530 | { 531 | "name": "stdout", 532 | "output_type": "stream", 533 | "text": [ 534 | "HTML has been saved to poster_html/poster.html\n" 535 | ] 536 | } 537 | ], 538 | "source": [ 539 | "def generate_html(founders_info, summary_dict, big_logo_path):\n", 540 | "    html_template = \"\"\"\n", 541 | "\n", 542 | "\n", 543 | " \n", 544 | " \n", 545 | " \n", 548 | " Document\n", 549 | " \n", 550 | " \n", 559 | " \n", 560 | " \n", 561 | "
\n", 562 | "
\n", 563 | " \n", 566 | "
\n", 567 | " {company_name}
成立年份:{established_year}
团队规模:{team_size}
地理位置:{location}
\n", 568 | "
\n", 569 | "
\n", 570 | "
\n", 571 | "
\n", 572 | " 🌟 {summary}\n", 573 | "
\n", 574 | "
\n", 575 | "
\n", 576 | "
\n", 577 | "
🤔 Problemm
\n", 578 | "
\n", 579 | " {problem}\n", 580 | "
\n", 581 | "
\n", 582 | "
\n", 583 | "
🧐 Solutionn
\n", 584 | "
\n", 585 | " {solution}\n", 586 | "
\n", 587 | "
\n", 588 | "
\n", 589 | " YC S2024 项目整理 | 特工宇宙\n", 593 | "
\n", 594 | "
\n", 595 | " \n", 596 | "\"\"\".format(\n", 597 | " big_logo_path=big_logo_path,\n", 598 | " company_name=founders_info[0]['company_name'],\n", 599 | " established_year=founders_info[0]['founded'],\n", 600 | " team_size=founders_info[0]['team_size'],\n", 601 | " location=founders_info[0]['location'],\n", 602 | " summary=summary_dict['一句话总结'],\n", 603 | " problem=summary_dict[\"Problem\"],\n", 604 | " solution=summary_dict[\"Solution\"]\n", 605 | " )\n", 606 | " return html_template\n", 607 | "\n", 608 | "big_logo_path = './example_logo/example.png'\n", 609 | "poster_html = generate_html(founders_info,summary_dict,big_logo_path)\n", 610 | "# 保存 HTML 到文件\n", 611 | "output_dir = \"poster_html\"\n", 612 | "os.makedirs(output_dir, exist_ok=True)\n", 613 | "output_file_path = os.path.join(output_dir, \"poster.html\")\n", 614 | "\n", 615 | "with open(output_file_path, \"w\", encoding=\"utf-8\") as file:\n", 616 | " file.write(poster_html)\n", 617 | "\n", 618 | "print(f\"HTML has been saved to {output_file_path}\")" 619 | ] 620 | }, 621 | { 622 | "cell_type": "code", 623 | "execution_count": null, 624 | "metadata": {}, 625 | "outputs": [], 626 | "source": [ 627 | "def generate_html(logo_base64, agent_logo_base64, font_base64,company_name, established_year, team_size, location, summary, problem, solution):\n", 628 | " html_template = \"\"\"\n", 629 | "\n", 630 | "\n", 631 | " \n", 632 | " \n", 633 | " \n", 634 | " Document\n", 635 | " \n", 649 | " \n", 650 | " \n", 651 | "
\n", 652 | "
\n", 653 | " \n", 656 | "
\n", 657 | " {company_name}\n", 658 | " \n", 659 | "
成立年份:{established_year}\n", 660 | "
团队规模:{team_size}\n", 661 | "
地理位置:{location}\n", 662 | "
\n", 663 | "
\n", 664 | "
\n", 665 | "
\n", 666 | "
\n", 667 | " {summary}\n", 668 | " \n", 669 | "
\n", 670 | "
\n", 671 | "
\n", 672 | "
\n", 673 | "
 \n", 674 | " 🤔 Problem\n", 675 | " m\n", 676 | "
\n", 677 | "
\n", 678 | " {problem}\n", 679 | "
\n", 680 | "
\n", 681 | "
\n", 682 | "
 \n", 683 | " 🧐 Solution\n", 684 | " n\n", 685 | "
\n", 686 | "
\n", 687 | " {solution}\n", 688 | "
\n", 689 | "
\n", 690 | "
\n", 691 | " YC S2024 项目整理 | 特工宇宙\n", 692 | " \n", 695 | "
\n", 696 | "
 \n", 697 | " \n", 698 | "\"\"\".format(\n", 699 | " logo_base64=logo_base64,\n", 700 | " agent_logo_base64=agent_logo_base64,\n", 701 | " font_base64=font_base64,\n", 702 | " company_name=company_name,\n", 703 | " established_year=established_year,\n", 704 | " team_size=team_size,\n", 705 | " location=location,\n", 706 | " summary=summary,\n", 707 | " problem=problem,\n", 708 | " solution=solution\n", 709 | " )\n", 710 | " return html_template\n", 711 | "\n", 712 | "logo_base64 = \"your company logo as a base64 string\"\n", 713 | "agent_logo_base64 = \"Agent Universe (特工宇宙) logo as a base64 string\"\n", 714 | "font_base64 = \"font file as a base64 string\"\n", 715 | "established_year = \"\"\n", 716 | "team_size = \"\"\n", 717 | "location = \"\"\n", 718 | "summary = \"\"\n", 719 | "problem = \"\"\n", 720 | "solution = \"\"\n", 721 | "def handler(args: Args[Input])->Output:  # Args/Input/Output are not defined in this notebook; they are expected from the hosting runtime\n", 722 | " logo_base64 = args.input.logo_base64\n", 723 | " agent_logo_base64 = args.input.agent_logo_base64\n", 724 | " font_base64 = args.input.font_base64\n", 725 | " company_name = args.input.company_name\n", 726 | " established_year = args.input.established_year\n", 727 | " team_size = args.input.team_size\n", 728 | " location = args.input.location\n", 729 | " summary = args.input.summary\n", 730 | " problem = args.input.problem\n", 731 | " solution = args.input.solution\n", 732 | " html_content = generate_html(logo_base64, agent_logo_base64, font_base64,company_name, established_year, team_size, location, summary, problem, solution) \n", 733 | " \n", 734 | " return {\"html_content\":html_content}\n", 735 | "\n" 736 | ] 737 | }, 738 | { 739 | "cell_type": "markdown", 740 | "metadata": {}, 741 | "source": [ 742 | "### Generate the poster image\n", 743 | "\n", 744 | "This step is done in Node.js.\n", 745 | "\n", 746 | "Run screenshot.js (requires puppeteer to be installed)." 747 | ] 748 | }, 749 | { 750 | "cell_type": "code", 751 | "execution_count": 6, 752 | "metadata": {}, 753 | "outputs": [ 754 | { 755 | "name": "stderr", 756 | "output_type": "stream", 757 | "text": [ 758 | 
"/home/jamiu/miniconda3/envs/all38/lib/python3.8/site-packages/pyppeteer/util.py:29: RuntimeWarning: coroutine 'run_browser' was never awaited\n", 759 | " gc.collect()\n", 760 | "RuntimeWarning: Enable tracemalloc to get the object allocation traceback\n" 761 | ] 762 | }, 763 | { 764 | "ename": "BrowserError", 765 | "evalue": "Browser closed unexpectedly:\n", 766 | "output_type": "error", 767 | "traceback": [ 768 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 769 | "\u001b[0;31mBrowserError\u001b[0m Traceback (most recent call last)", 770 | "Cell \u001b[0;32mIn[6], line 51\u001b[0m\n\u001b[1;32m 48\u001b[0m \u001b[38;5;28;01mawait\u001b[39;00m browser\u001b[38;5;241m.\u001b[39mclose()\n\u001b[1;32m 50\u001b[0m \u001b[38;5;66;03m# 运行截图程序\u001b[39;00m\n\u001b[0;32m---> 51\u001b[0m \u001b[43mcapture_screenshots\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n", 771 | "Cell \u001b[0;32mIn[6], line 14\u001b[0m, in \u001b[0;36mcapture_screenshots\u001b[0;34m()\u001b[0m\n\u001b[1;32m 12\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mcapture_screenshots\u001b[39m():\n\u001b[1;32m 13\u001b[0m \u001b[38;5;66;03m# 调用异步函数\u001b[39;00m\n\u001b[0;32m---> 14\u001b[0m \u001b[43masyncio\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget_event_loop\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrun_until_complete\u001b[49m\u001b[43m(\u001b[49m\u001b[43mrun_browser\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n", 772 | "File \u001b[0;32m~/miniconda3/envs/all38/lib/python3.8/site-packages/nest_asyncio.py:98\u001b[0m, in \u001b[0;36m_patch_loop..run_until_complete\u001b[0;34m(self, future)\u001b[0m\n\u001b[1;32m 95\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m f\u001b[38;5;241m.\u001b[39mdone():\n\u001b[1;32m 96\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m 
\u001b[38;5;167;01mRuntimeError\u001b[39;00m(\n\u001b[1;32m 97\u001b[0m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mEvent loop stopped before Future completed.\u001b[39m\u001b[38;5;124m'\u001b[39m)\n\u001b[0;32m---> 98\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mf\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mresult\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n", 773 | "File \u001b[0;32m~/miniconda3/envs/all38/lib/python3.8/asyncio/futures.py:178\u001b[0m, in \u001b[0;36mFuture.result\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 176\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m__log_traceback \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mFalse\u001b[39;00m\n\u001b[1;32m 177\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_exception \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m--> 178\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_exception\n\u001b[1;32m 179\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_result\n", 774 | "File \u001b[0;32m~/miniconda3/envs/all38/lib/python3.8/asyncio/tasks.py:280\u001b[0m, in \u001b[0;36mTask.__step\u001b[0;34m(***failed resolving arguments***)\u001b[0m\n\u001b[1;32m 276\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m 277\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m exc \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[1;32m 278\u001b[0m \u001b[38;5;66;03m# We use the `send` method directly, because coroutines\u001b[39;00m\n\u001b[1;32m 279\u001b[0m \u001b[38;5;66;03m# don't have `__iter__` and `__next__` methods.\u001b[39;00m\n\u001b[0;32m--> 280\u001b[0m result \u001b[38;5;241m=\u001b[39m 
\u001b[43mcoro\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43msend\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43;01mNone\u001b[39;49;00m\u001b[43m)\u001b[49m\n\u001b[1;32m 281\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 282\u001b[0m result \u001b[38;5;241m=\u001b[39m coro\u001b[38;5;241m.\u001b[39mthrow(exc)\n", 775 | "Cell \u001b[0;32mIn[6], line 18\u001b[0m, in \u001b[0;36mrun_browser\u001b[0;34m()\u001b[0m\n\u001b[1;32m 16\u001b[0m \u001b[38;5;28;01masync\u001b[39;00m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mrun_browser\u001b[39m():\n\u001b[1;32m 17\u001b[0m \u001b[38;5;66;03m# 启动浏览器,指定自定义 Chrome 路径\u001b[39;00m\n\u001b[0;32m---> 18\u001b[0m browser \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mawait\u001b[39;00m launch(executablePath\u001b[38;5;241m=\u001b[39mchrome_driver_path_bin, headless\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m)\n\u001b[1;32m 19\u001b[0m page \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mawait\u001b[39;00m browser\u001b[38;5;241m.\u001b[39mnewPage()\n\u001b[1;32m 21\u001b[0m output_dir \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mposter_png\u001b[39m\u001b[38;5;124m'\u001b[39m\n", 776 | "File \u001b[0;32m~/miniconda3/envs/all38/lib/python3.8/site-packages/pyppeteer/launcher.py:307\u001b[0m, in \u001b[0;36mlaunch\u001b[0;34m(options, **kwargs)\u001b[0m\n\u001b[1;32m 239\u001b[0m \u001b[38;5;28;01masync\u001b[39;00m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mlaunch\u001b[39m(options: \u001b[38;5;28mdict\u001b[39m \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs: Any) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m Browser:\n\u001b[1;32m 240\u001b[0m \u001b[38;5;250m \u001b[39m\u001b[38;5;124;03m\"\"\"Start chrome process and return :class:`~pyppeteer.browser.Browser`.\u001b[39;00m\n\u001b[1;32m 241\u001b[0m \u001b[38;5;124;03m This function is a shortcut to 
:meth:`Launcher(options, **kwargs).launch`.\u001b[39;00m\n\u001b[1;32m 242\u001b[0m \u001b[38;5;124;03m Available options are:\u001b[39;00m\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 305\u001b[0m \u001b[38;5;124;03m option with extreme caution.\u001b[39;00m\n\u001b[1;32m 306\u001b[0m \u001b[38;5;124;03m \"\"\"\u001b[39;00m\n\u001b[0;32m--> 307\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;01mawait\u001b[39;00m Launcher(options, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs)\u001b[38;5;241m.\u001b[39mlaunch()\n", 777 | "File \u001b[0;32m~/miniconda3/envs/all38/lib/python3.8/site-packages/pyppeteer/launcher.py:168\u001b[0m, in \u001b[0;36mLauncher.launch\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 165\u001b[0m signal\u001b[38;5;241m.\u001b[39msignal(signal\u001b[38;5;241m.\u001b[39mSIGHUP, _close_process)\n\u001b[1;32m 167\u001b[0m connectionDelay \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mslowMo\n\u001b[0;32m--> 168\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mbrowserWSEndpoint \u001b[38;5;241m=\u001b[39m \u001b[43mget_ws_endpoint\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43murl\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 169\u001b[0m logger\u001b[38;5;241m.\u001b[39minfo(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mBrowser listening on: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mbrowserWSEndpoint\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m'\u001b[39m)\n\u001b[1;32m 170\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mconnection \u001b[38;5;241m=\u001b[39m Connection(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mbrowserWSEndpoint, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_loop, connectionDelay, )\n", 778 | "File 
\u001b[0;32m~/miniconda3/envs/all38/lib/python3.8/site-packages/pyppeteer/launcher.py:227\u001b[0m, in \u001b[0;36mget_ws_endpoint\u001b[0;34m(url)\u001b[0m\n\u001b[1;32m 225\u001b[0m \u001b[38;5;28;01mwhile\u001b[39;00m (\u001b[38;5;28;01mTrue\u001b[39;00m):\n\u001b[1;32m 226\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m time\u001b[38;5;241m.\u001b[39mtime() \u001b[38;5;241m>\u001b[39m timeout:\n\u001b[0;32m--> 227\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m BrowserError(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mBrowser closed unexpectedly:\u001b[39m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;124m'\u001b[39m)\n\u001b[1;32m 228\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m 229\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m urlopen(url) \u001b[38;5;28;01mas\u001b[39;00m f:\n", 779 | "\u001b[0;31mBrowserError\u001b[0m: Browser closed unexpectedly:\n" 780 | ] 781 | } 782 | ], 783 | "source": [] 784 | } 785 | ], 786 | "metadata": { 787 | "kernelspec": { 788 | "display_name": "all38", 789 | "language": "python", 790 | "name": "python3" 791 | }, 792 | "language_info": { 793 | "codemirror_mode": { 794 | "name": "ipython", 795 | "version": 3 796 | }, 797 | "file_extension": ".py", 798 | "mimetype": "text/x-python", 799 | "name": "python", 800 | "nbconvert_exporter": "python", 801 | "pygments_lexer": "ipython3", 802 | "version": "3.8.19" 803 | } 804 | }, 805 | "nbformat": 4, 806 | "nbformat_minor": 2 807 | } 808 | -------------------------------------------------------------------------------- /markdown-img/33447fe34423edcf8aea3d91155a201.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Agent-Universe/YC_Poster/34d4a8e499fccfeee02193e1fadf85417b968cf3/markdown-img/33447fe34423edcf8aea3d91155a201.png -------------------------------------------------------------------------------- /markdown-img/6d1306b25bbef0f3ffb24e36c79cbcb.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Agent-Universe/YC_Poster/34d4a8e499fccfeee02193e1fadf85417b968cf3/markdown-img/6d1306b25bbef0f3ffb24e36c79cbcb.png -------------------------------------------------------------------------------- /markdown-img/71bdc5ee84dc6da5c8534bbeff765ba.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Agent-Universe/YC_Poster/34d4a8e499fccfeee02193e1fadf85417b968cf3/markdown-img/71bdc5ee84dc6da5c8534bbeff765ba.png -------------------------------------------------------------------------------- /markdown-img/7344e9dcdc45d631ad66a11a1d7b05b.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Agent-Universe/YC_Poster/34d4a8e499fccfeee02193e1fadf85417b968cf3/markdown-img/7344e9dcdc45d631ad66a11a1d7b05b.png -------------------------------------------------------------------------------- /markdown-img/8d7ad8961df57af31d635ad8e2e691b.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Agent-Universe/YC_Poster/34d4a8e499fccfeee02193e1fadf85417b968cf3/markdown-img/8d7ad8961df57af31d635ad8e2e691b.png -------------------------------------------------------------------------------- /markdown-img/AgentUniverse.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Agent-Universe/YC_Poster/34d4a8e499fccfeee02193e1fadf85417b968cf3/markdown-img/AgentUniverse.png -------------------------------------------------------------------------------- /markdown-img/f25752adc9ac83fc7465e0762895cc0.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Agent-Universe/YC_Poster/34d4a8e499fccfeee02193e1fadf85417b968cf3/markdown-img/f25752adc9ac83fc7465e0762895cc0.png -------------------------------------------------------------------------------- /markdown-img/image-20241007181136044.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Agent-Universe/YC_Poster/34d4a8e499fccfeee02193e1fadf85417b968cf3/markdown-img/image-20241007181136044.png -------------------------------------------------------------------------------- /markdown-img/image-20241007181508053.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Agent-Universe/YC_Poster/34d4a8e499fccfeee02193e1fadf85417b968cf3/markdown-img/image-20241007181508053.png -------------------------------------------------------------------------------- /markdown-img/image-20241007181624006.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Agent-Universe/YC_Poster/34d4a8e499fccfeee02193e1fadf85417b968cf3/markdown-img/image-20241007181624006.png -------------------------------------------------------------------------------- /markdown-img/image-20241007181627977.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Agent-Universe/YC_Poster/34d4a8e499fccfeee02193e1fadf85417b968cf3/markdown-img/image-20241007181627977.png -------------------------------------------------------------------------------- /markdown-img/image-20241007182235402.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Agent-Universe/YC_Poster/34d4a8e499fccfeee02193e1fadf85417b968cf3/markdown-img/image-20241007182235402.png -------------------------------------------------------------------------------- 
/markdown-img/image-20241007182241902.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Agent-Universe/YC_Poster/34d4a8e499fccfeee02193e1fadf85417b968cf3/markdown-img/image-20241007182241902.png -------------------------------------------------------------------------------- /markdown-img/image-20241007182349251.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Agent-Universe/YC_Poster/34d4a8e499fccfeee02193e1fadf85417b968cf3/markdown-img/image-20241007182349251.png -------------------------------------------------------------------------------- /markdown-img/img_v3_02fe_d19a1ccb-ed0c-4f20-b26c-fbc19a9f206g.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Agent-Universe/YC_Poster/34d4a8e499fccfeee02193e1fadf85417b968cf3/markdown-img/img_v3_02fe_d19a1ccb-ed0c-4f20-b26c-fbc19a9f206g.jpg -------------------------------------------------------------------------------- /markdown-img/流程图.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Agent-Universe/YC_Poster/34d4a8e499fccfeee02193e1fadf85417b968cf3/markdown-img/流程图.jpg -------------------------------------------------------------------------------- /package.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "yc", 3 | "version": "1.0.0", 4 | "main": "screenshot.js", 5 | "scripts": { 6 | "test": "echo \"Error: no test specified\" && exit 1" 7 | }, 8 | "author": "", 9 | "license": "ISC", 10 | "description": "", 11 | "dependencies": { 12 | "puppeteer": "^23.5.0" 13 | } 14 | } 15 | -------------------------------------------------------------------------------- /poster_html/LinkenDin.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Agent-Universe/YC_Poster/34d4a8e499fccfeee02193e1fadf85417b968cf3/poster_html/LinkenDin.png -------------------------------------------------------------------------------- /poster_html/example_logo/example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Agent-Universe/YC_Poster/34d4a8e499fccfeee02193e1fadf85417b968cf3/poster_html/example_logo/example.png -------------------------------------------------------------------------------- /poster_html/poster.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 9 | Document 10 | 11 | 20 | 21 | 22 |
23 |
24 | 27 |
28 | Void
成立年份:2024
团队规模:2
地理位置:San Francisco
29 |
30 |
31 |
32 |
33 | 🌟 Void是一个开源的AI代码编辑器,旨在为开发者提供Cursor、GitHub Copilot等AI功能,同时保持代码的隐私性。 34 |
35 |
36 |
37 |
38 |
🤔 Problemm
39 |
40 | 现有的AI IDE如Cursor是闭源的,这导致开发者在使用这些工具时需要将代码发送到外部API,引发隐私担忧,同时使用成本高,且权力集中在单个实体手中。 41 |
42 |
43 |
44 |
🧐 Solutionn
45 |
46 | Void通过开源的方式解决了这些问题,开发者可以选择自行托管AI模型,保持数据的私密性,也可以直接连接到Claude、GPT或Gemini等模型,而无需担心数据通过中间层进行通信。 47 |
48 |
49 |
50 | YC S2024 项目整理 | 特工宇宙 54 |
55 |
56 | 57 | -------------------------------------------------------------------------------- /poster_html/style2.css: -------------------------------------------------------------------------------- 1 | html, 2 | body, 3 | div, 4 | span, 5 | applet, 6 | object, 7 | iframe, 8 | h1, 9 | h2, 10 | h3, 11 | h4, 12 | h5, 13 | h6, 14 | p, 15 | blockquote, 16 | pre, 17 | a, 18 | abbr, 19 | acronym, 20 | address, 21 | big, 22 | cite, 23 | code, 24 | del, 25 | dfn, 26 | em, 27 | img, 28 | ins, 29 | kbd, 30 | q, 31 | s, 32 | samp, 33 | small, 34 | strike, 35 | strong, 36 | sub, 37 | sup, 38 | tt, 39 | var, 40 | b, 41 | u, 42 | i, 43 | center, 44 | dl, 45 | dt, 46 | dd, 47 | ol, 48 | ul, 49 | li, 50 | fieldset, 51 | form, 52 | label, 53 | legend, 54 | table, 55 | caption, 56 | tbody, 57 | tfoot, 58 | thead, 59 | tr, 60 | th, 61 | td, 62 | article, 63 | aside, 64 | canvas, 65 | details, 66 | embed, 67 | figure, 68 | figcaption, 69 | footer, 70 | header, 71 | hgroup, 72 | menu, 73 | nav, 74 | output, 75 | ruby, 76 | section, 77 | summary, 78 | time, 79 | mark, 80 | audio, 81 | video { 82 | margin: 0; 83 | padding: 0; 84 | border: 0; 85 | font-size: 100%; 86 | font: inherit; 87 | vertical-align: baseline; 88 | } 89 | * { 90 | box-sizing: border-box; 91 | } 92 | .flex-row { 93 | display: flex; 94 | flex-direction: row; 95 | } 96 | .flex-col { 97 | display: flex; 98 | flex-direction: column; 99 | } 100 | .justify-start { 101 | display: flex; 102 | justify-content: flex-start; 103 | } 104 | .justify-center { 105 | display: flex; 106 | justify-content: center; 107 | } 108 | .justify-end { 109 | display: flex; 110 | justify-content: flex-end; 111 | } 112 | .justify-between { 113 | display: flex; 114 | justify-content: space-between; 115 | } 116 | .items-start { 117 | display: flex; 118 | align-items: flex-start; 119 | } 120 | .items-end { 121 | display: flex; 122 | align-items: flex-end; 123 | } 124 | .items-center { 125 | display: flex; 126 | align-items: center; 127 | } 128 | 
.no-shrink { 129 | flex-shrink: 0; 130 | } 131 | 132 | .music_9_57 { 133 | width: 100%; 134 | display: flex; 135 | flex-direction: column; 136 | justify-content: flex-start; 137 | align-items: flex-start; 138 | padding-top: 31px; 139 | padding-right: 20px; 140 | padding-left: 20px; 141 | overflow: hidden; 142 | background-color: #f5f5ee; 143 | position: relative; 144 | height: auto; 145 | min-height: 100%; 146 | } 147 | 148 | .music_9_58 { 149 | display: flex; 150 | flex-direction: row; 151 | justify-content: center; 152 | align-items: center; 153 | padding-bottom: 10px; 154 | gap: 40px; 155 | flex-shrink: 0; 156 | align-self: stretch; 157 | overflow: hidden; 158 | line-height: 37px; 159 | position: relative; 160 | } 161 | 162 | .music_9_59 { 163 | /* width: 264px; 164 | height: 148px; */ 165 | flex-shrink: 0; 166 | object-fit: cover; 167 | } 168 | 169 | .music_9_60 { 170 | flex-grow: 1; 171 | width: 0; 172 | } 173 | 174 | .music_9_60_0_8 { 175 | font-size: 36px; 176 | font-family: HYRunYuan; 177 | line-height: 37px; 178 | font-weight: 400; 179 | color: #3d3d3d; 180 | } 181 | 182 | .music_9_60_8_44 { 183 | font-size: 20px; 184 | font-family: HYRunYuan; 185 | line-height: 37px; 186 | font-weight: 400; 187 | color: #3d3d3d; 188 | } 189 | 190 | .music_9_66 { 191 | display: flex; 192 | flex-direction: column; 193 | justify-content: flex-start; 194 | align-items: flex-start; 195 | padding-bottom: 20px; 196 | gap: 12px; 197 | flex-shrink: 0; 198 | align-self: stretch; 199 | overflow: hidden; 200 | line-height: 29px; 201 | position: relative; 202 | } 203 | 204 | .music_9_67 { 205 | flex-shrink: 0; 206 | align-self: stretch; 207 | } 208 | 209 | .music_9_67_0_52 { 210 | font-size: 18px; 211 | font-family: HYRunYuan; 212 | line-height: 29px; 213 | font-weight: 400; 214 | color: #3d3d3d; 215 | margin-left: auto; 216 | margin-right: auto; 217 | } 218 | 219 | .music_9_67_52_53 { 220 | font-size: 24px; 221 | font-family: HYRunYuan; 222 | line-height: 29px; 223 | font-weight: 
400; 224 | -webkit-background-clip: text; 225 | background-clip: text; 226 | -webkit-text-fill-color: transparent; 227 | text-fill-color: transparent; 228 | margin-left: auto; 229 | margin-right: auto; 230 | } 231 | 232 | .music_9_71 { 233 | width: 680px; 234 | height: 0; 235 | flex-shrink: 0; 236 | transform: rotate(0deg); 237 | border: 1px solid #b3b3b3; 238 | } 239 | 240 | .music_12_285 { 241 | display: flex; 242 | flex-direction: row; 243 | justify-content: flex-start; 244 | align-items: center; 245 | padding-bottom: 20px; 246 | gap: 30px; 247 | flex-shrink: 0; 248 | align-self: stretch; 249 | overflow: hidden; 250 | position: relative; 251 | } 252 | 253 | .music_12_286 { 254 | display: flex; 255 | flex-direction: row; 256 | justify-content: center; 257 | align-items: center; 258 | flex-shrink: 0; 259 | overflow: hidden; 260 | border-radius: 100px; 261 | position: relative; 262 | } 263 | 264 | .music_12_287 { 265 | width: 100px; 266 | height: 100px; 267 | flex-shrink: 0; 268 | object-fit: cover; 269 | } 270 | 271 | .music_12_288 { 272 | height: 100px; 273 | display: flex; 274 | flex-direction: column; 275 | justify-content: center; 276 | align-items: flex-start; 277 | gap: 13px; 278 | flex-grow: 1; 279 | width: 0; 280 | overflow: hidden; 281 | position: relative; 282 | } 283 | 284 | .music_12_299 { 285 | flex-shrink: 0; 286 | align-self: stretch; 287 | font-size: 30px; 288 | font-family: HYRunYuan; 289 | font-weight: 400; 290 | line-height: 45px; 291 | color: #3d3d3d; 292 | white-space: pre; 293 | height: 45px; 294 | margin-top: -11.5px; 295 | margin-bottom: -11.5px; 296 | } 297 | 298 | .music_12_289 { 299 | display: flex; 300 | flex-direction: row; 301 | justify-content: flex-start; 302 | align-items: center; 303 | gap: 10px; 304 | flex-shrink: 0; 305 | align-self: stretch; 306 | overflow: hidden; 307 | font-size: 18px; 308 | font-family: HYRunYuan; 309 | font-weight: 400; 310 | line-height: 22px; 311 | color: #3d3d3d; 312 | position: relative; 313 | } 314 | 
315 | .music_12_295 { 316 | width: 24px; 317 | display: flex; 318 | flex-direction: column; 319 | justify-content: flex-start; 320 | align-items: flex-start; 321 | padding-top: 10px; 322 | padding-bottom: 3px; 323 | flex-shrink: 0; 324 | overflow: hidden; 325 | position: relative; 326 | } 327 | 328 | .music_12_296 { 329 | width: 24px; 330 | height: 29px; 331 | flex-shrink: 0; 332 | margin-top: -0.5px; 333 | margin-right: 0; 334 | margin-bottom: -0.5px; 335 | margin-left: 0; 336 | } 337 | 338 | .music_12_290 { 339 | flex-grow: 1; 340 | width: 0; 341 | } 342 | 343 | .music_12_316 { 344 | display: flex; 345 | flex-direction: row; 346 | justify-content: flex-start; 347 | align-items: center; 348 | padding-bottom: 20px; 349 | gap: 30px; 350 | flex-shrink: 0; 351 | align-self: stretch; 352 | overflow: hidden; 353 | position: relative; 354 | } 355 | 356 | .music_12_331 { 357 | display: flex; 358 | flex-direction: row; 359 | justify-content: center; 360 | align-items: center; 361 | flex-shrink: 0; 362 | overflow: hidden; 363 | border-radius: 100px; 364 | position: relative; 365 | } 366 | 367 | .music_9_89 { 368 | width: 100px; 369 | height: 100px; 370 | flex-shrink: 0; 371 | object-fit: cover; 372 | } 373 | 374 | .music_12_317 { 375 | height: 100px; 376 | display: flex; 377 | flex-direction: column; 378 | justify-content: center; 379 | align-items: flex-start; 380 | gap: 13px; 381 | flex-grow: 1; 382 | width: 0; 383 | overflow: hidden; 384 | position: relative; 385 | } 386 | 387 | .music_12_327 { 388 | flex-shrink: 0; 389 | align-self: stretch; 390 | font-size: 30px; 391 | font-family: HYRunYuan; 392 | font-weight: 400; 393 | line-height: 45px; 394 | color: #3d3d3d; 395 | white-space: pre; 396 | height: 45px; 397 | margin-top: -11.5px; 398 | margin-bottom: -11.5px; 399 | } 400 | 401 | .music_12_318 { 402 | display: flex; 403 | flex-direction: row; 404 | justify-content: flex-start; 405 | align-items: center; 406 | gap: 10px; 407 | flex-shrink: 0; 408 | align-self: stretch; 
409 | overflow: hidden; 410 | font-size: 18px; 411 | font-family: HYRunYuan; 412 | font-weight: 400; 413 | line-height: 22px; 414 | color: #3d3d3d; 415 | position: relative; 416 | } 417 | 418 | .music_12_319 { 419 | width: 24px; 420 | display: flex; 421 | flex-direction: column; 422 | justify-content: flex-start; 423 | align-items: flex-start; 424 | padding-top: 10px; 425 | padding-bottom: 3px; 426 | flex-shrink: 0; 427 | overflow: hidden; 428 | position: relative; 429 | } 430 | 431 | .music_12_320 { 432 | width: 24px; 433 | height: 29px; 434 | flex-shrink: 0; 435 | margin-top: -0.5px; 436 | margin-right: 0; 437 | margin-bottom: -0.5px; 438 | margin-left: 0; 439 | } 440 | 441 | .music_12_323 { 442 | flex-grow: 1; 443 | width: 0; 444 | } 445 | 446 | .music_9_003 { 447 | display: flex; 448 | flex-direction: column; 449 | justify-content: center; 450 | align-items: flex-start; 451 | padding-top: 12px; 452 | padding-bottom: 20px; 453 | gap: 15px; 454 | flex-shrink: 0; 455 | align-self: stretch; 456 | overflow: hidden; 457 | position: relative; 458 | } 459 | 460 | .music_9_004 { 461 | flex-shrink: 0; 462 | align-self: stretch; 463 | line-height: 48px; 464 | white-space: pre; 465 | height: 48px; 466 | margin-top: -13px; 467 | margin-bottom: -13px; 468 | } 469 | 470 | .music_9_004_0_9 { 471 | font-size: 24px; 472 | font-family: HYRunYuan; 473 | line-height: 22px; 474 | font-weight: 400; 475 | color: #3d3d3d; 476 | margin-left: auto; 477 | margin-right: auto; 478 | } 479 | 480 | .music_9_004_9_10 { 481 | font-size: 32px; 482 | font-family: HYRunYuan; 483 | line-height: 22px; 484 | font-weight: 400; 485 | -webkit-background-clip: text; 486 | background-clip: text; 487 | -webkit-text-fill-color: transparent; 488 | text-fill-color: transparent; 489 | margin-left: auto; 490 | margin-right: auto; 491 | } 492 | 493 | .music_9_008 { 494 | width: 680px; 495 | display: flex; 496 | flex-direction: row; 497 | justify-content: flex-start; 498 | align-items: center; 499 | gap: 10px; 
  flex-shrink: 0;
  overflow: hidden;
  line-height: 28px;
  padding-top: 0;
  padding-right: 0;
  padding-bottom: 0;
  padding-left: 0;
}

.music_9_009_0_99 {
  font-size: 18px;
  font-family: HYRunYuan;
  line-height: 28px;
  font-weight: 400;
  color: #3d3d3d;
}

.music_9_009_99_101 {
  font-size: 18px;
  font-family: HYRunYuan;
  line-height: 28px;
  font-weight: 400;
  -webkit-background-clip: text;
  background-clip: text;
  -webkit-text-fill-color: transparent;
  text-fill-color: transparent;
}

.music_9_013 {
  display: flex;
  flex-direction: column;
  justify-content: center;
  align-items: flex-start;
  padding-top: 12px;
  gap: 15px;
  flex-shrink: 0;
  align-self: stretch;
  overflow: hidden;
  position: relative;
}

.music_9_014 {
  flex-shrink: 0;
  align-self: stretch;
  line-height: 48px;
  white-space: pre;
  height: 48px;
  margin-top: -13px;
  margin-bottom: -13px;
}

.music_9_014_0_10 {
  font-size: 24px;
  font-family: HYRunYuan;
  line-height: 22px;
  font-weight: 400;
  color: #3d3d3d;
  margin-left: auto;
  margin-right: auto;
}

.music_9_014_10_11 {
  font-size: 32px;
  font-family: HYRunYuan;
  line-height: 22px;
  font-weight: 400;
  -webkit-background-clip: text;
  background-clip: text;
  -webkit-text-fill-color: transparent;
  text-fill-color: transparent;
  margin-left: auto;
  margin-right: auto;
}

.music_9_018 {
  width: 680px;
  display: flex;
  flex-direction: row;
  justify-content: flex-start;
  align-items: center;
  gap: 10px;
  flex-shrink: 0;
  overflow: hidden;
  line-height: 28px;
  padding-top: 0;
  padding-right: 0;
  padding-bottom: 0;
  padding-left: 0;
}

.music_9_019_0_129 {
  font-size: 18px;
  font-family: HYRunYuan;
  line-height: 28px;
  font-weight: 400;
  color: #3d3d3d;
}

.music_9_019_129_131 {
  font-size: 18px;
  font-family: HYRunYuan;
  line-height: 28px;
  font-weight: 400;
  -webkit-background-clip: text;
  background-clip: text;
  -webkit-text-fill-color: transparent;
  text-fill-color: transparent;
}

.music_9_023 {
  height: 87px;
  display: flex;
  flex-direction: row;
  justify-content: flex-start;
  align-items: flex-end;
  padding-top: 12px;
  padding-bottom: 20px;
  flex-shrink: 0;
  align-self: stretch;
  overflow: hidden;
  font-size: 15px;
  font-family: HYRunYuan;
  font-weight: 400;
  line-height: 22px;
  color: #3d3d3d;
  white-space: pre;
  position: relative;
}

.music_9_024 {
  flex-grow: 1;
  width: 0;
}

.music_9_028 {
  width: 100px;
  height: 26px;
  position: relative;
  flex-shrink: 0;
  margin-top: 0;
  margin-right: 0px;
  margin-bottom: 0;
  margin-left: 0;
}
--------------------------------------------------------------------------------
/poster_html/汉仪润圆-65W.ttf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Agent-Universe/YC_Poster/34d4a8e499fccfeee02193e1fadf85417b968cf3/poster_html/汉仪润圆-65W.ttf
--------------------------------------------------------------------------------
/poster_html/特工宇宙.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Agent-Universe/YC_Poster/34d4a8e499fccfeee02193e1fadf85417b968cf3/poster_html/特工宇宙.png
--------------------------------------------------------------------------------
/requirements.txt:
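--------------------------------------------------------------------------------
selenium
beautifulsoup4
openai
webdriver-manager
requests
ipykernel
--------------------------------------------------------------------------------
/screenshot.js:
--------------------------------------------------------------------------------
const puppeteer = require('puppeteer');
const fs = require('fs');
const path = require('path');
const { pathToFileURL } = require('url');

// Posters are rendered at a fixed 720 CSS px width and 3x pixel density.
const POSTER_WIDTH = 720;
const SCALE = 3;

(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();

    // Resolve both directories relative to this script so that reading the
    // HTML files and building their paths agree, whatever the working directory.
    const htmlDir = path.join(__dirname, 'poster_html');
    const outputDir = path.join(__dirname, 'poster_png');
    fs.mkdirSync(outputDir, { recursive: true });

    const htmlFiles = fs.readdirSync(htmlDir).filter(file => file.endsWith('.html'));

    for (const htmlFile of htmlFiles) {
      const filePath = path.join(htmlDir, htmlFile);

      // Lay the page out at the target width. A height of 0 resets the viewport
      // height to the browser default (it does not mean "auto"), so use a
      // positive placeholder; the real height is measured below.
      await page.setViewport({ width: POSTER_WIDTH, height: 800 });

      // pathToFileURL handles Windows paths and non-ASCII file names.
      await page.goto(pathToFileURL(filePath).href, { waitUntil: 'networkidle2' });
      // Wait until web fonts (e.g. HYRunYuan) have loaded before measuring.
      await page.evaluate(async () => { await document.fonts.ready; });

      // Measure the rendered height of <body> at this width.
      const bodyHandle = await page.$('body');
      const box = await bodyHandle.boundingBox(); // null if <body> is not rendered
      await bodyHandle.dispose();
      const height = Math.max(1, Math.ceil(box ? box.height : 0));

      // Resize the viewport to the content height and render at 3x density.
      await page.setViewport({ width: POSTER_WIDTH, height, deviceScaleFactor: SCALE });

      // Capture the whole page and save it as <html file name>.png.
      const screenshotPath = path.join(outputDir, `${path.parse(htmlFile).name}.png`);
      await page.screenshot({ path: screenshotPath, fullPage: true });
    }
  } finally {
    // Always close the browser, even if one poster fails to render.
    await browser.close();
  }
})().catch(err => {
  console.error(err);
  process.exitCode = 1;
});
--------------------------------------------------------------------------------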
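`screenshot.js` renders every poster with a 720 CSS px viewport and `deviceScaleFactor: 3`, so each image is drawn at three device pixels per CSS pixel, and each PNG is named after its HTML source. The sketch below reproduces that naming and sizing arithmetic in a small pure function, so the output can be predicted without launching a browser. `planScreenshot` is a hypothetical helper, not part of the repository, and it assumes the measured `<body>` height is the content height; with `fullPage: true` the actual PNG can be taller if the document extends past `<body>` (for example, because of body margins).

```javascript
const path = require('path');

// Hypothetical helper (not part of the repository): mirrors the file naming
// and viewport sizing used in screenshot.js.
function planScreenshot(htmlFile, outputDir, bodyHeight, { cssWidth = 720, deviceScaleFactor = 3 } = {}) {
  // screenshot.js names each PNG after its HTML file, minus the extension.
  const pngPath = path.join(outputDir, `${path.parse(htmlFile).name}.png`);
  // The viewport height is the measured <body> height rounded up (Math.ceil),
  // and device pixels = CSS pixels x deviceScaleFactor.
  const cssHeight = Math.ceil(bodyHeight);
  return {
    pngPath,
    widthPx: cssWidth * deviceScaleFactor,
    heightPx: cssHeight * deviceScaleFactor,
  };
}

// Example: a poster whose <body> measures 1013.5 CSS px tall.
const plan = planScreenshot('poster.html', 'poster_png', 1013.5);
console.log(plan.pngPath, plan.widthPx, plan.heightPx);
// On POSIX systems this prints: poster_png/poster.png 2160 3042
```

Keeping the arithmetic in a pure function makes the 3x scaling explicit: a 720 px wide poster always comes out 2160 px wide, and the height is three times the rounded-up body height.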