├── .gitignore
├── LICENSE
├── README.md
├── README_zh.md
├── arxiv_bot
│   ├── __init__.py
│   ├── agent.py
│   ├── ai_service.py
│   ├── config.py
│   ├── fetcher.py
│   ├── messenger.py
│   └── storage.py
├── assert
│   └── ARIES.webp
├── config.yaml
├── main.py
└── requirements.txt

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
# Python
__pycache__/
*.py[cod]
*.egg-info/

.DS_Store

# Environment
.env

# Storage
paper_history.json

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
Copyright (c) 2025-present

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# ♈️ Aries: ArXiv Research Intelligent Efficient Summary

## ArXiv Papers to Feishu

<div align="center">
  <img src="assert/ARIES.webp" alt="ARIES Logo">
</div>

[中文文档](README_zh.md)

---

## 🎉 Introduction

A tool that automatically fetches the latest LLM-related papers from arXiv and pushes them to group chats through Feishu bots. It uses DeepSeek AI for intelligent filtering and summarization, helping you stay up to date with the latest research developments.

---

## ✨ Features

- 🤖 **Auto Paper Fetching**: Scrapes the latest LLM-related papers from arXiv.
- 🧠 **Smart Filtering & Summarization**: Uses DeepSeek AI for high-quality filtering and summarization.
- 📱 **Multi-bot Support**: Configure multiple Feishu bots to push to different groups.
- ⏰ **Scheduled Tasks**: Pushes at 9:00 daily by default; times are customizable.
- ⚙️ **Flexible Configuration**: Customize paper types, filtering rules, and push behavior via the config file.
- 📊 **History Tracking**: Pushed papers are recorded in `paper_history.json` to avoid duplicates.

---

## 🚀 Quick Start

1. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
2. Configure environment variables:
   - Create a `.env` file and fill it in as follows:
     ```
     DEEPSEEK_API_KEY=your_deepseek_api_key
     WEBHOOK_URL_1=https://your_first_webhook_url
     WEBHOOK_URL_2=https://your_second_webhook_url
     ```
3. Edit `config.yaml` to customize paper types and filtering rules.
4. Run the script:
   ```bash
   python main.py
   ```

---

## ⚙️ Configuration Guide

### Environment Variables

- `DEEPSEEK_API_KEY`: **Required**, used to call the DeepSeek API.
- `WEBHOOK_URL_[n]`: **Required**, Feishu bot webhook URLs; multiple can be configured.

### Configuration Details

- **`paper_types`**: Settings for each paper type.
  - **`enabled`**: `true` to enable the type, `false` to disable it.
  - **`search_query`**: arXiv search query; supports logical operators and keyword combinations.
  - **`keywords`**: Keyword list used to filter papers.
  - **`prompt`**: Prompt used by DeepSeek to judge paper relevance; auto-generated from `search_query` and `keywords` if omitted, and can be overridden.
  - **`max_papers`**: Maximum number of papers per push (default 5).

- **`general`**: Global configuration.
  - **`max_search_results`**: Maximum number of papers returned per search.
  - **`schedule_time`**: Daily scheduled run time(s), in 24-hour format.

---

## ❗ Important Notes

1. Make sure your DeepSeek API key has sufficient quota.
2. Feishu webhook URLs are obtained from the bot settings of a Feishu group.
3. Deploying on a server is recommended for continuous operation.
4. New paper types can be added, and existing ones adjusted, via `config.yaml`.

---

## ❓ FAQ

1. **API calls fail**
   - Check that the DeepSeek API key is correct.
2. **Message push fails**
   - Verify that the webhook URL is valid.
3. **Test push**
   - Uncomment `agent.run()` in the `main` function to trigger one immediate push.

---

## 📝 TODO List

- 📚 Paper Collection & Management
  - [x] Automatic arXiv paper fetching
  - [ ] Paper history storage
  - [ ] Related paper correlation analysis
  - [ ] Paper archiving system

- 🔍 Intelligent Paper Processing
  - [x] Auto Summary: paper summarization
  - [ ] Auto Review: paper review generation
  - [ ] Auto Survey: field survey generation

- 📢 Multi-platform Distribution
  - [x] Feishu bot integration
  - [ ] WeChat bot integration
  - [ ] Xiaohongshu content publishing

---

## 📄 License

This project is licensed under the [MIT License](https://opensource.org/license/mit).
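
## 📦 Example Configuration

The options described in the Configuration Guide fit together as in this minimal `config.yaml` sketch (the `agents` type name and its values are illustrative; `prompt`, `title`, and `max_papers` are filled in with defaults when omitted):

```yaml
paper_types:
  agents:                        # illustrative type name
    enabled: true
    search_query: "cat:cs.AI AND ('LLM agents' OR 'tool use')"
    keywords:
      - LLM agents
      - tool use
    max_papers: 5                # optional, defaults to 5

general:
  max_search_results: 100
  schedule_time: "09:00"         # 24-hour format; a list of times also works
```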
--------------------------------------------------------------------------------
/README_zh.md:
--------------------------------------------------------------------------------
# ♈️ Aries: ArXiv Research Intelligent Efficient Summary

## ArXiv 论文推送到飞书

<div align="center">
  <img src="assert/ARIES.webp" alt="ARIES Logo">
</div>

[English](README.md)

---

## 介绍

一个自动获取 arXiv 最新 LLM 相关论文,并通过飞书机器人推送的工具。该机器人使用 DeepSeek AI 对论文进行智能筛选和总结,帮助您快速了解最新研究动态。

---

## 功能特点

- 🤖 **自动获取最新论文**:从 arXiv 抓取最新的 LLM 相关论文。
- 🧠 **智能筛选与总结**:使用 DeepSeek AI 提供高质量的筛选和总结。
- 📱 **支持多机器人推送**:可配置多个飞书机器人,分别推送到不同群组。
- ⏰ **定时任务支持**:默认每天早上 9 点推送,支持自定义。
- ⚙️ **灵活配置**:通过配置文件自定义论文类型、筛选规则和推送方式。
- 📊 **历史记录**:推送过的论文会记录在 `paper_history.json` 文件中,下次推送时会跳过。

---

## 快速开始

1. 安装依赖:
   ```bash
   pip install -r requirements.txt
   ```
2. 配置环境变量:
   - 创建一个 `.env` 文件,按照下方样例填写:
     ```
     DEEPSEEK_API_KEY=your_deepseek_api_key
     WEBHOOK_URL_1=https://your_first_webhook_url
     WEBHOOK_URL_2=https://your_second_webhook_url
     ```
3. 配置 `config.yaml`,自定义论文类型和筛选规则。
4. 启动脚本:
   ```bash
   python main.py
   ```

---

## 配置说明

### 环境变量配置

- `DEEPSEEK_API_KEY`:**必填**,用于调用 DeepSeek API。
- `WEBHOOK_URL_[n]`:**必填**,飞书机器人的 Webhook 地址,可配置多个。

### 配置文件说明

- **`paper_types`**:定义每种论文类型的具体设置。
  - **`enabled`**:启用状态,`true` 表示启用,`false` 表示禁用。
  - **`search_query`**:在 arXiv 上使用的搜索查询,支持逻辑条件和关键词组合。
  - **`keywords`**:用于筛选论文的关键词列表。
  - **`prompt`**:用于让 DeepSeek 判断论文相关性的提示词,默认根据 `search_query` 和 `keywords` 自动生成,用户可以自行修改。
  - **`max_papers`**:单次推送的最大论文数量(默认 5,可自行修改)。

- **`general`**:全局配置。
  - **`max_search_results`**:搜索返回的最大论文数量。
  - **`schedule_time`**:每天定时任务运行时间(24 小时制)。

---

## 注意事项

1. 确保 DeepSeek API Key 额度充足。
2. 飞书 Webhook 地址需要在飞书群中通过机器人设置获取。
3. 建议部署在服务器上持续运行。
4. 可通过编辑 `config.yaml` 添加新的论文类型或调整现有配置。

---

## 常见问题

1. **API 调用失败**
   - 请检查 DeepSeek API Key 是否正确。
2. **消息推送失败**
   - 请确认 Webhook 地址是否有效。
3. **测试推送**
   - 可取消 `main` 函数中 `agent.run()` 的注释,直接运行一次推送。

---

## 📝 待办事项

- 📚 文章收集与管理
  - [x] arXiv 论文自动抓取
  - [ ] 文章历史记忆存储
  - [ ] 相关论文关联分析
  - [ ] 论文分类归档系统

- 🔍 文章智能处理
  - [x] Auto Summary: 论文自动摘要
  - [ ] Auto Review: 论文自动点评
  - [ ] Auto Survey: 领域综述生成

- 📢 多平台推送
  - [x] 飞书机器人推送
  - [ ] 微信机器人集成
  - [ ] 小红书内容发布

---

## License

本项目使用 [MIT License](https://opensource.org/license/mit)。

--------------------------------------------------------------------------------
/arxiv_bot/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LAMDASZ-ML/Aries/fb266eddd4e021f1f7db26853dd5ea303e671663/arxiv_bot/__init__.py

--------------------------------------------------------------------------------
/arxiv_bot/agent.py:
--------------------------------------------------------------------------------
from typing import List, Dict
from .config import Config
from .ai_service import AIService
from .fetcher import PaperFetcher
from .messenger import FeishuMessenger

class ArxivPaperAgent:
    def __init__(self, config_path: str = "config.yaml"):
        self.config = Config(config_path)
        self.ai_service = AIService(self.config.api_key)
        self.fetcher = PaperFetcher(self.ai_service, self.config)
        self.messenger = FeishuMessenger(self.config.webhook_urls)

    def run(self):
        """运行主流程"""
        for paper_type, type_config in self.config.config['paper_types'].items():
            if not type_config['enabled']:
                continue

            papers = self.fetcher.fetch_papers(paper_type)
            if not papers:
                continue

            summaries = self._process_papers(papers)
            self.messenger.send_message(summaries, paper_type, type_config)

    def _process_papers(self, papers: List[Dict]) -> List[Dict]:
        summaries = []
        for paper in papers:
            summary = self.ai_service.summarize(
                paper['abstract'],
                self.config.get_general_config()['summary_prompt']
            )
            summaries.append({
                'title': paper['title'],
                'summary': summary,
                'url': paper['url']
            })
        return summaries

--------------------------------------------------------------------------------
/arxiv_bot/ai_service.py:
--------------------------------------------------------------------------------
import requests
from typing import Dict
import json

class AIService:
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.api_url = "https://api.deepseek.com/chat/completions"

    def _call_api(self, prompt: str) -> str:
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json"
        }

        data = {
            "model": "deepseek-chat",
            "messages": [
                {"role": "system", "content": "你是一个专业的学术论文助手,善于总结和分析论文内容。"},
                {"role": "user", "content": prompt}
            ],
            "temperature": 0.7
        }

        try:
            response = requests.post(self.api_url, headers=headers, json=data)
            response.raise_for_status()
            result = response.json()

            if 'choices' in result and len(result['choices']) > 0:
                output = result['choices'][0]['message']['content']
                return output
            else:
                print(f"API response format error: {json.dumps(result, ensure_ascii=False)}")
                return ""

        except Exception as e:
            print(f"Unexpected error: {e}")
            return ""

    def check_relevance(self, title: str, abstract: str, prompt_template: str) -> bool:
        prompt = prompt_template.format(title=title, abstract=abstract)
        answer = self._call_api(prompt).strip().lower()
        return not ("否" in answer or "no" in answer)

    def summarize(self, abstract: str, prompt_template: str) -> str:
        prompt = prompt_template.format(abstract=abstract)
        result = self._call_api(prompt)
        if not result:
            return "抱歉,生成摘要时出现错误。"
        return result

--------------------------------------------------------------------------------
/arxiv_bot/config.py:
--------------------------------------------------------------------------------
from typing import Dict, Any
import os
import yaml
from dotenv import load_dotenv, dotenv_values

load_dotenv(override=True)

class Config:
    def __init__(self, config_path: str = "config.yaml"):
        self._load_config(config_path)
        self._load_env_vars()

    def _load_config(self, config_path: str):
        with open(config_path, 'r', encoding='utf-8') as f:
            self.config = yaml.safe_load(f)
        for paper_type, details in self.config['paper_types'].items():
            if 'title' not in details:
                details['title'] = f"今日{paper_type}论文更新".upper()
            if 'prompt' not in details:
                details['prompt'] = self._generate_prompt(paper_type, details)
            if 'max_papers' not in details:
                details['max_papers'] = 5

        if 'summary_prompt' not in self.config['general']:
            self.config['general']['summary_prompt'] = "请根据摘要用一句话总结这篇文章的核心内容: {abstract}"

    def _generate_prompt(self, paper_type: str, details: Dict[str, Any]) -> str:
        """
        根据 paper_type 和 config.yaml 中的配置动态生成 prompt。
        """
        keywords = "、".join(details.get('keywords', []))
        search_query = details.get('search_query', '未定义搜索条件')

        prompt = (
            f"请反思批判性地判断这篇论文是否与以下主题相关:{keywords}。\n\n"
            f"标题: {{title}}\n"
            f"摘要: {{abstract}}\n\n"
            f"请只回答\"是\"或\"否\"。如果论文主要研究与以下关键词相关的主题:{keywords},或符合搜索条件:{search_query},回答\"是\";\n"
            f"否则请回答\"否\"。"
        )
        return prompt

    def _load_env_vars(self):
        self.webhook_urls = []
        i = 1
        while True:
            webhook_url = os.getenv(f'WEBHOOK_URL_{i}')
            if webhook_url:
                self.webhook_urls.append(webhook_url)
                i += 1
            else:
                break

        if not self.webhook_urls:
            raise ValueError("No webhook URLs found in environment variables")

        env_vars = dotenv_values('.env')
        self.api_key = env_vars.get('DEEPSEEK_API_KEY')
        if not self.api_key:
            raise ValueError("DEEPSEEK_API_KEY not found in environment variables")

    def get_paper_type_config(self, paper_type: str) -> Dict[str, Any]:
        return self.config['paper_types'][paper_type]

    def get_general_config(self) -> Dict[str, Any]:
        return self.config['general']

--------------------------------------------------------------------------------
/arxiv_bot/fetcher.py:
--------------------------------------------------------------------------------
import arxiv
from typing import List, Dict
from tqdm import tqdm
from .storage import PaperStorage
import time

class PaperFetcher:
    def __init__(self, ai_service, config):
        self.ai_service = ai_service
        self.config = config
        self.storage = PaperStorage()

    def fetch_papers(self, paper_type: str):
        total_start = time.time()

        type_config = self.config.get_paper_type_config(paper_type)
        general_config = self.config.get_general_config()

        print(f"\n📚 开始获取 {paper_type} 类型的论文...")
        print(f"🔍 搜索查询: {type_config['search_query']}")
        print(f"📋 目标论文数量: {type_config['max_papers']}\n")

        papers = []
        max_papers = type_config['max_papers']
        max_results_per_request = general_config['max_search_results']
        search_query = type_config['search_query']
        max_attempts = 3
        retry_delay = 5  # 重试等待时间(秒)

        search = arxiv.Search(
            query=search_query,
            max_results=general_config['max_search_results'] * max_attempts,
            sort_by=arxiv.SortCriterion.SubmittedDate
        )

        results_iterator = iter(search.results())
        current_batch = 0

        while len(papers) < max_papers:
            try:
                print(f"\n📥 正在获取第 {current_batch + 1} 批论文")

                # 获取当前批次的论文
                current_papers = []
                for _ in range(max_results_per_request):
                    try:
                        current_papers.append(next(results_iterator))
                    except StopIteration:
                        print("\n📢 没有更多论文结果")
                        break

                if not current_papers:
                    break

                # 获取历史记录中最新的和最旧的论文ID
                latest_paper_id, oldest_paper_id = self.storage.get_latest_and_oldest_paper_id(paper_type)
                # 过滤已经推送过的论文
                if latest_paper_id:
                    current_papers = [r for r in current_papers if self._is_valid_paper(r.entry_id, latest_paper_id, oldest_paper_id)]

                relevant_count = 0
                for result in tqdm(current_papers, desc="🔍 正在分析论文相关性", unit="篇"):
                    if len(papers) >= max_papers:
                        break

                    if self.storage.is_paper_exists(paper_type, result.entry_id):
                        continue

                    if self._is_relevant_paper(result.title, result.summary, type_config):
                        paper = {
                            'title': result.title,
                            'abstract': result.summary,
                            'url': result.entry_id
                        }
                        papers.append(paper)
                        self.storage.add_paper(paper_type, result.entry_id)
                        relevant_count += 1

                print(f"📊 其中相关论文: {relevant_count} 篇")
                current_batch += 1

            except Exception as e:
                print(f"\n❌ 发生错误: {str(e)}")
                time.sleep(retry_delay)
                continue

        total_time = time.time() - total_start
        print(f"\n✅ 论文获取完成!")
        print(f"⏱️ 总耗时: {total_time:.2f} 秒")
        print(f"📝 共获取相关论文: {len(papers)} 篇\n")

        return papers

    def _is_relevant_paper(self, title: str, abstract: str, type_config: Dict) -> bool:
        # 关键词检查(两侧统一转为小写再比较,避免大小写不一致导致漏匹配)
        for keyword in type_config['keywords']:
            if keyword.lower() in title.lower() or keyword.lower() in abstract.lower():
                return True

        # AI 相关性检查
        return self.ai_service.check_relevance(title, abstract, type_config['prompt'])

    def _is_valid_paper(self, current_id: str, latest_id: str, oldest_id: str) -> bool:
        """
        比较论文ID,判断当前论文是否比历史中最新的论文更新,或比最旧的论文更旧
        arXiv ID 格式例如: 2403.12345v1
        """
        current_version = float(current_id.split('/')[-1].split('v')[0])
        latest_version = float(latest_id)
        oldest_version = float(oldest_id)
        return current_version > latest_version or current_version < oldest_version

--------------------------------------------------------------------------------
/arxiv_bot/messenger.py:
--------------------------------------------------------------------------------
import requests
from datetime import datetime
from typing import List, Dict

class FeishuMessenger:
    def __init__(self, webhook_urls: List[str]):
        self.webhook_urls = webhook_urls

    def send_message(self, summaries: List[Dict], paper_type: str, type_config: Dict) -> bool:
        message = self._format_message(summaries, paper_type, type_config)
        return self._send_to_webhooks(message)

    def _format_message(self, summaries: List[Dict], paper_type: str, type_config: Dict) -> Dict:
        return {
            "msg_type": "post",
            "content": {
                "post": {
                    "zh_cn": {
                        "title": f"{type_config['title']} - {datetime.now().strftime('%Y-%m-%d')}",
                        "content": [
                            [{
                                "tag": "text",
                                "text": f"📑 {paper['title']}\n"
                                        f"💡 总结: {paper['summary']}\n"
                                        f"🔗 链接: {paper['url']}\n\n"
                            }] for paper in summaries
                        ]
                    }
                }
            }
        }

    def _send_to_webhooks(self, message: Dict) -> bool:
        results = []
        for webhook_url in self.webhook_urls:
            try:
                response = requests.post(webhook_url, json=message)
                success = response.status_code == 200
                results.append(success)
                print(f"Sent to webhook {webhook_url}: {'Success' if success else 'Failed'}")
            except Exception as e:
                print(f"Error sending to webhook {webhook_url}: {e}")
                results.append(False)

        return any(results)

--------------------------------------------------------------------------------
/arxiv_bot/storage.py:
--------------------------------------------------------------------------------
import json
import os
from typing import Set, Dict
from datetime import datetime

class PaperStorage:
    def __init__(self, storage_file: str = "paper_history.json"):
        self.storage_file = storage_file
        self.paper_history = self._load_history()

    def _load_history(self) -> Dict[str, Set[str]]:
        if os.path.exists(self.storage_file):
            with open(self.storage_file, 'r', encoding='utf-8') as f:
                history_dict = json.load(f)
                # 将列表转换为集合以提高查找效率
                return {k: set(v) for k, v in history_dict.items()}
        return {}

    def _save_history(self):
        # 将集合转换回列表以便JSON序列化
        history_dict = {k: list(v) for k, v in self.paper_history.items()}
        with open(self.storage_file, 'w', encoding='utf-8') as f:
            json.dump(history_dict, f, ensure_ascii=False, indent=2)

    def is_paper_exists(self, paper_type: str, paper_url: str) -> bool:
        return paper_url in self.paper_history.get(paper_type, set())

    def add_paper(self, paper_type: str, paper_url: str):
        if paper_type not in self.paper_history:
            self.paper_history[paper_type] = set()
        self.paper_history[paper_type].add(paper_url)
        self._save_history()

    def get_latest_and_oldest_paper_id(self, paper_type: str) -> tuple:
        """
        获取指定类型中最新的和最旧的论文ID
        """
        papers = self.paper_history.get(paper_type, set())
        if not papers:
            return float('-inf'), float('inf')

        try:
            paper_ids = []
            for pid in papers:
                id_part = pid.split('/')[-1].split('v')[0]
                paper_ids.append((pid, float(id_part)))

            latest = max(paper_ids, key=lambda x: x[1])
            oldest = min(paper_ids, key=lambda x: x[1])
            return latest[1], oldest[1]
        except (ValueError, IndexError):
            # 历史记录中存在无法解析的ID时,退回默认值,视为没有历史约束
            return float('-inf'), float('inf')

--------------------------------------------------------------------------------
/assert/ARIES.webp:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LAMDASZ-ML/Aries/fb266eddd4e021f1f7db26853dd5ea303e671663/assert/ARIES.webp
--------------------------------------------------------------------------------
/config.yaml:
--------------------------------------------------------------------------------
paper_types:
  reasoning:
    enabled: true
    search_query: "cat:cs.AI AND (LLM OR 'Large Language Model Reasoning' OR 'LLM Reasoning' OR 'Neuro-Symbolic' OR 'Fast and Slow Thinking')"
    keywords:
      - reasoning
      - fast and slow thinking
      - Neuro-Symbolic
      - LLM Reasoning
      - Large Language Model Reasoning
  mllm:
    enabled: true
    search_query: "cat:cs.AI AND (LLM OR 'Multimodal Large Language Models')"
    keywords:
      - multimodal Large Language Models
      - cross-modal Large Language Models
      - MLLM

general:
  max_search_results: 100
  schedule_time:
    - "09:00"
    - "18:00"

--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
import schedule
import time
from arxiv_bot.agent import ArxivPaperAgent

def main():
    agent = ArxivPaperAgent()
    # agent.run()  # 取消注释以立即运行一次

    schedule_times = agent.config.get_general_config()['schedule_time']
    if isinstance(schedule_times, str):
        schedule_times = [schedule_times]

    for schedule_time in schedule_times:
        schedule.every().day.at(schedule_time).do(agent.run)
        print(f"已设置定时任务:每天 {schedule_time} 运行")

    while True:
        schedule.run_pending()
        time.sleep(60)

if __name__ == "__main__":
    main()

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
arxiv
requests
python-dotenv
schedule
tqdm
PyYAML
--------------------------------------------------------------------------------
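
The Feishu message structure built in `arxiv_bot/messenger.py` above can be sanity-checked without a real webhook. The sketch below mirrors that `post` payload logic in a standalone function (the paper data is illustrative, not from a real push):

```python
from datetime import datetime

def format_feishu_post(summaries, title):
    """Build a Feishu 'post' payload, mirroring messenger.py's _format_message."""
    return {
        "msg_type": "post",
        "content": {
            "post": {
                "zh_cn": {
                    "title": f"{title} - {datetime.now().strftime('%Y-%m-%d')}",
                    "content": [
                        [{
                            "tag": "text",
                            "text": f"📑 {p['title']}\n"
                                    f"💡 总结: {p['summary']}\n"
                                    f"🔗 链接: {p['url']}\n\n",
                        }]
                        for p in summaries  # one paragraph block per paper
                    ],
                }
            }
        },
    }

# Build a payload with a single illustrative paper
payload = format_feishu_post(
    [{"title": "Sample Paper", "summary": "One-line summary.",
      "url": "https://arxiv.org/abs/2403.00001"}],
    "今日论文更新",
)
print(payload["msg_type"])  # → post
```

Posting `payload` as JSON to a group's custom-bot webhook (as `_send_to_webhooks` does) should then render one text block per paper.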