├── .gitignore ├── LICENSE ├── README.md ├── cutoff_time2.txt ├── env.txt ├── main.py ├── requirements.txt ├── rss.ipynb ├── twitter_list.txt └── twitter_list_debug.txt /.gitignore: -------------------------------------------------------------------------------- 1 | # Ignore .env file 2 | .env 3 | *.venv* 4 | .ipynb* 5 | log* 6 | *debug* 7 | *example* 8 | cutoff_time.txt 9 | 10 | # Include env.txt file 11 | !env.txt 12 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 
29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 
61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | Given a list of Twitter IDs, periodically fetch tweet updates and send them to your own Telegram Bot. 3 | 4 | Use jupyter-lab to send Twitter updates from a list of users to your Telegram bot. 5 | 6 | 7 | ## About This Fork 8 | 9 | - This project is forked from [@joeseesun](https://github.com/joeseesun/AIGC_Telegram_Bot); many thanks to the original author for the momentum and inspiration!
10 | - I'm interested in this project because I believe that in the near future users will be able to access all publicly available online content, and everyone will have their own algorithm to filter and organize information across the entire web. Paid content could also be traded through a convenient marketplace. 11 | - Information will flow with remarkable efficiency; the various internet platforms will be just algorithm companies that process one particular kind of information, and users will no longer need these platforms' recommendation services. 12 | - In the future, a video could be published on a single host, declared open for the major video sites to crawl, with the corresponding username provided, and thereby be published across the whole web. An article or an audio clip could be published the same way. 13 | - Web-wide publishing means every user on the web can consume the content, in whatever way they prefer: for example, viewing a video's summary and key frames before watching it, finding a collection of articles similar to a given one, adding AI-generated illustrations to an article before reading it, or switching freely among text, audio, video, Q&A interaction, and other modalities. Information consumption will become highly personalized. 14 | 15 | ## What Changed 16 | - Moved the code into jupyter-lab, so ideas can be tried out quickly 17 | - Added a file that stores a timestamp used to decide which content needs to be fetched; on the first run, content from the past hour is fetched 18 | - Added async functions that request multiple RSS feeds concurrently, reducing wait time 19 | - Added support for setting a Telegram API proxy address; see [here](https://blog.orii.xyz/202301/%E4%BD%BF%E7%94%A8cloudflare-Worker%E4%BB%A3%E7%90%86telegram-bot-api/) 20 | 21 | 22 | ## Todo 23 | - rsshub may support time-related request parameters; including the time in the request could reduce data transfer 24 | - Since rsshub offers a huge amount of subscribable content, a feature for categorizing subscriptions is needed, ideally with a local web dashboard and a feed-management module 25 | - Sending subscribed content to Telegram is only one way to consume it; it could also be fed into an AI model for summarization first 26 | 27 | ## If You See a Connection Timeout 28 | - First run `ping api.telegram.org` to check connectivity 29 | - If the ping fails, try a system-wide VPN and enable Clash's enhanced (TUN) mode 30 | - Alternatively, use a Telegram API proxy address 31 | 32 | 33 | ## Usage 34 | 35 | 1. Create a Telegram bot and get its Token 36 | - Open https://t.me/botfather and send /start 37 | - Follow the prompts: first enter the bot's name, then the desired ID (must end with bot), e.g. telegram_rss_bot 38 | - After creation you will receive a Token that looks like: 5987500169:AAEBqLx7OWmK6ne9pIfHhrgMktDmq_VcsSQ 39 | 40 | 2. Get your own Telegram ID 41 | Open https://t.me/userinfobot and send /start to get your ID, which looks like: 1293676963 42 | 43 | 44 | 3. Set the Token and Telegram ID 45 | 46 | - Put the Token and Telegram ID into env.txt, then rename env.txt to ".env" 47 | - If you need a Telegram API proxy address, you can also set it in TELEGRAM_API_BASE_URL, in case the network cannot reach Telegram directly 48 | - If you have your own RSS server, e.g. a self-hosted rsshub instance, you can set its address in RSS_BASE_URL 49 | 50 | 4. Rename cutoff_time2.txt to cutoff_time.txt; it is used to store the timestamp 51 | 52 | 5. Create a venv and install the dependencies 53 | ``` 54 | python3 -m venv .venv_bot 55 | source .venv_bot/bin/activate 56 | pip install -r requirements.txt 57 | ``` 58 | 59 | 6. Run the program 60 | ``` 61 | jupyter-lab 62 | ``` 63 | then open rss.ipynb 64 | 65 | 7. To stop the program, press Enter in the input box that appears at the end 66 | 67 | 68 | ## Want to Customize Who You Follow?
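For illustration, a twitter_list.txt could look like the following (the IDs and names here are hypothetical placeholders, not taken from the repo):

```
elonmusk,Elon Musk
sama,Sam Altman
karpathy
```

main.py splits each line on the comma: the first field is the Twitter ID, and the optional second field is the display name; when the name is omitted, the ID itself is used.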
69 | Edit twitter_list.txt: one Twitter ID per line; after a comma you may add a display name (customizable, optional) 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | -------------------------------------------------------------------------------- /cutoff_time2.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mobilestack/AIGC_Telegram_Bot/c1026ddf49af4680b0ea96a2a10f29157acf8252/cutoff_time2.txt -------------------------------------------------------------------------------- /env.txt: -------------------------------------------------------------------------------- 1 | # ① Create a Telegram bot and get its Token 2 | # Open https://t.me/botfather and send /start 3 | # Follow the prompts: first enter the bot's name, then the desired ID (must end with bot), e.g. telegram_rss_bot 4 | # After creation you will receive a Token that looks like: 5987500169:AAEBqLx7OWmK6ne9pIfHhrgMktDmq_VcsSQ 5 | 6 | # ② Get your own Telegram ID 7 | # Open https://t.me/userinfobot and send /start to get your ID, which looks like: 1293676963 8 | 9 | TOKEN=replace with your bot Token 10 | target_chat_id=replace with your Telegram ID 11 | 12 | # in case you cannot connect to telegram api 13 | TELEGRAM_API_BASE_URL=https://yourproxy.com/bot 14 | 15 | # if you have your own rsshub address, can put it here 16 | RSS_BASE_URL= 17 | 18 | -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | import time 3 | import feedparser 4 | import requests 5 | from telegram import Bot 6 | from datetime import datetime 7 | from bs4 import BeautifulSoup 8 | from translate import Translator 9 | from dotenv import load_dotenv 10 | import os 11 | import asyncio 12 | 13 | # Set the source and target languages 14 | source_language = "en" 15 | target_language = "zh" 16 | translator = Translator(from_lang=source_language, to_lang=target_language) 17 | 18 | load_dotenv() # Load environment variables from .env file 19 | 20 | TOKEN = os.getenv("TOKEN") 21 | target_chat_id = os.getenv("target_chat_id") 22 |
TELEGRAM_API_BASE_URL=os.getenv("TELEGRAM_API_BASE_URL", "https://api.telegram.org/bot") 23 | 24 | bot = Bot( 25 | token=TOKEN, 26 | base_url=TELEGRAM_API_BASE_URL, 27 | ) 28 | 29 | rss_list = [] 30 | # Read the user list file 31 | with open("twitter_list.txt", "r", encoding="utf-8") as file: 32 | lines = file.readlines() 33 | 34 | # Iterate over each line of the file 35 | for line in lines: 36 | info = line.strip().split(',') 37 | 38 | # Extract the Twitter ID and nickname 39 | twitter_id = info[0].strip() 40 | nickname = info[1].strip() if len(info) > 1 else twitter_id 41 | 42 | # Build the RSS URL 43 | url = f"http://rss.qiaomu.pro/twitter/user/{twitter_id}" 44 | 45 | # Append to rss_list 46 | rss_list.append({ 47 | "name": nickname, 48 | "url": url 49 | }) 50 | 51 | # print(f"rss list is {rss_list}") 52 | 53 | def get_latest_twitter_updates(rss_url, last_item_link): 54 | response = requests.get(rss_url) 55 | rss_content = response.content 56 | feed = feedparser.parse(rss_content) 57 | 58 | latest_items = [] 59 | for entry in feed["entries"]: 60 | if entry["link"] == last_item_link: 61 | break 62 | latest_items.append(entry) 63 | 64 | 65 | # print(f"content list is {latest_items}") 66 | 67 | return latest_items 68 | 69 | async def send_update_to_telegram(items): 70 | 71 | # print(f"in async send, item length is {len(items)}") 72 | 73 | for item in items: 74 | author = item["author"] 75 | title = item["title"] 76 | 77 | # print(f"author is {author}, title is {title}") 78 | 79 | description_html = item["description"] 80 | soup = BeautifulSoup(description_html, 'html.parser') 81 | 82 | # Convert div with class rsshub-quote 83 | rsshub_quotes = soup.find_all('div', class_='rsshub-quote') 84 | for rsshub_quote in rsshub_quotes: 85 | rsshub_quote.string = f"\n> {rsshub_quote.get_text(separator=' ', strip=True)}\n\n" 86 | 87 | for br in soup.find_all('br'): 88 | br.replace_with('\n') 89 | 90 | description = "\n".join(soup.stripped_strings) 91 | description_zh = translator.translate(description) 92 | 93 | 94 | # Get and send images from the text
95 | images = soup.find_all('img', src=True) 96 | # Handle images: send each one separately 97 | for img in images: 98 | await asyncio.to_thread(bot.send_photo, chat_id=target_chat_id, photo=img['src']) 99 | # Handle videos: send each one separately 100 | videos = soup.find_all('video', src=True) 101 | for video in videos: 102 | video_url = video.get("src") 103 | await asyncio.to_thread(bot.send_video, chat_id=target_chat_id, video=video_url) 104 | 105 | pub_date_parsed = datetime.strptime(item["published"], "%a, %d %b %Y %H:%M:%S %Z") 106 | pub_date = pub_date_parsed.strftime("%Y-%m-%d %H:%M:%S") 107 | link = item["link"] 108 | 109 | message = ( 110 | f"From {author}:\n\n" 111 | f"Published: {pub_date}\n\n" 112 | f"{description}\n\n" 113 | f"{description_zh}\n\n" 114 | f"Link: {link}" 115 | ) 116 | 117 | # print(f"message to be sent is {message}") 118 | 119 | await asyncio.to_thread( 120 | bot.send_message, 121 | chat_id=target_chat_id, 122 | text=message, 123 | timeout=100, 124 | ) # Do not use parse_mode="HTML" 125 | 126 | last_links = [None] * len(rss_list) 127 | interval = 600 # in seconds; adjust how often the RSS feeds are checked as needed 128 | 129 | async def main(): 130 | while True: 131 | for index, rss_source in enumerate(rss_list): 132 | latest_items = get_latest_twitter_updates(rss_source["url"], last_links[index]) 133 | 134 | # print(f"latest_items length is {len(latest_items)}") 135 | 136 | 137 | if latest_items: 138 | last_links[index] = latest_items[0]["link"] 139 | await send_update_to_telegram(latest_items[::-1]) # Send tweets from oldest to newest 140 | 141 | await asyncio.sleep(interval) 142 | 143 | if __name__ == "__main__": 144 | asyncio.run(main()) 145 | 146 | 147 | 148 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | aiofiles==22.1.0 2 | aiohttp==3.8.4 3 | aiosignal==1.3.1 4 | aiosqlite==0.19.0 5 | anyio==3.6.2 6 | appnope==0.1.3 7 | APScheduler==3.6.3 8 | argon2-cffi==21.3.0 9 | argon2-cffi-bindings==21.2.0 10
| arrow==1.2.3 11 | asttokens==2.2.1 12 | async-timeout==4.0.2 13 | attrs==23.1.0 14 | Babel==2.12.1 15 | backcall==0.2.0 16 | beautifulsoup4==4.9.3 17 | bleach==6.0.0 18 | cachetools==4.2.2 19 | certifi==2022.12.7 20 | cffi==1.15.1 21 | charset-normalizer==2.0.12 22 | click==8.1.3 23 | comm==0.1.3 24 | debugpy==1.6.7 25 | decorator==5.1.1 26 | defusedxml==0.7.1 27 | executing==1.2.0 28 | fastjsonschema==2.16.3 29 | feedparser==6.0.8 30 | fqdn==1.5.1 31 | frozenlist==1.3.3 32 | idna==3.4 33 | ipykernel==6.22.0 34 | ipython==8.12.0 35 | ipython-genutils==0.2.0 36 | isoduration==20.11.0 37 | jedi==0.18.2 38 | Jinja2==3.1.2 39 | json5==0.9.11 40 | jsonpointer==2.3 41 | jsonschema==4.17.3 42 | jupyter-events==0.6.3 43 | jupyter-ydoc==0.2.4 44 | jupyter_client==8.2.0 45 | jupyter_core==5.3.0 46 | jupyter_server==2.5.0 47 | jupyter_server_fileid==0.9.0 48 | jupyter_server_terminals==0.4.4 49 | jupyter_server_ydoc==0.8.0 50 | jupyterlab==3.6.3 51 | jupyterlab-pygments==0.2.2 52 | jupyterlab_server==2.22.1 53 | libretranslatepy==2.1.1 54 | lxml==4.9.2 55 | MarkupSafe==2.1.2 56 | matplotlib-inline==0.1.6 57 | mistune==2.0.5 58 | multidict==6.0.4 59 | nbclassic==0.5.5 60 | nbclient==0.7.3 61 | nbconvert==7.3.1 62 | nbformat==5.8.0 63 | nest-asyncio==1.5.6 64 | notebook==6.5.4 65 | notebook_shim==0.2.2 66 | packaging==23.1 67 | pandocfilters==1.5.0 68 | parso==0.8.3 69 | pexpect==4.8.0 70 | pickleshare==0.7.5 71 | platformdirs==3.2.0 72 | prometheus-client==0.16.0 73 | prompt-toolkit==3.0.38 74 | psutil==5.9.4 75 | ptyprocess==0.7.0 76 | pure-eval==0.2.2 77 | pycparser==2.21 78 | Pygments==2.15.0 79 | pyrsistent==0.19.3 80 | python-dateutil==2.8.2 81 | python-dotenv==1.0.0 82 | python-json-logger==2.0.7 83 | python-telegram-bot==13.7 84 | pytz==2023.3 85 | pytz-deprecation-shim==0.1.0.post0 86 | PyYAML==6.0 87 | pyzmq==25.0.2 88 | requests==2.28.2 89 | rfc3339-validator==0.1.4 90 | rfc3986-validator==0.1.1 91 | Send2Trash==1.8.0 92 | sgmllib3k==1.0.0 93 | six==1.16.0 94 | 
sniffio==1.3.0 95 | soupsieve==2.4.1 96 | stack-data==0.6.2 97 | terminado==0.17.1 98 | tinycss2==1.2.1 99 | tornado==6.2 100 | traitlets==5.9.0 101 | translate==3.6.1 102 | tzdata==2023.3 103 | tzlocal==4.3 104 | uri-template==1.2.0 105 | urllib3==1.26.15 106 | wcwidth==0.2.6 107 | webcolors==1.13 108 | webencodings==0.5.1 109 | websocket-client==1.5.1 110 | y-py==0.5.9 111 | yarl==1.8.2 112 | ypy-websocket==0.8.2 113 | -------------------------------------------------------------------------------- /rss.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "59749f2a-888b-477a-9d36-57e3865a7dab", 6 | "metadata": {}, 7 | "source": [ 8 | "# Telegram Bot Message From Interested Twitter Users" 9 | ] 10 | }, 11 | { 12 | "cell_type": "markdown", 13 | "id": "06550f52-03bc-4975-b84a-481c543444b4", 14 | "metadata": {}, 15 | "source": [ 16 | "## Setup Environment" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 1, 22 | "id": "a9bc0df0-1f64-44a2-93cb-9aae71887cb5", 23 | "metadata": { 24 | "tags": [] 25 | }, 26 | "outputs": [], 27 | "source": [ 28 | "import time\n", 29 | "import feedparser\n", 30 | "import requests\n", 31 | "from telegram import Bot\n", 32 | "from datetime import datetime,timedelta\n", 33 | "from bs4 import BeautifulSoup\n", 34 | "from translate import Translator\n", 35 | "from dotenv import load_dotenv\n", 36 | "import os\n", 37 | "import asyncio\n", 38 | "import aiohttp\n", 39 | "from typing import List, Dict\n", 40 | "from dateutil import tz, parser" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "id": "95c270ff-eaa4-486c-88a7-8798e7d70876", 46 | "metadata": {}, 47 | "source": [ 48 | "### Environment Variable" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 2, 54 | "id": "e43f969f-7408-4fd8-a1a1-fe67dae62074", 55 | "metadata": { 56 | "tags": [] 57 | }, 58 | "outputs": [], 59 | "source": [ 60 | 
"load_dotenv() # Load environment variables from .env file\n", 61 | "\n", 62 | "TOKEN = os.getenv(\"TOKEN\")\n", 63 | "target_chat_id = os.getenv(\"target_chat_id\")\n", 64 | "TELEGRAM_API_BASE_URL=os.getenv(\"TELEGRAM_API_BASE_URL\", \"https://api.telegram.org/bot\")\n", 65 | "RSS_BASE_URL=os.getenv(\"RSS_BASE_URL\", \"http://rsshub.app\")" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "id": "3090c386-2560-4d77-81a4-797b45794b3a", 71 | "metadata": {}, 72 | "source": [ 73 | "### Paths, Constants, Globals" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": 3, 79 | "id": "9ae74887-8c56-4e55-b663-88ce4a1d92b7", 80 | "metadata": { 81 | "tags": [] 82 | }, 83 | "outputs": [], 84 | "source": [ 85 | "# at least specify the username, name optional\n", 86 | "TWITTER_USER_LIST_FILE=\"twitter_list.txt\"\n", 87 | "\n", 88 | "# here to save the cutoff time\n", 89 | "CUTOFF_TIME_FILE=\"cutoff_time.txt\"\n", 90 | "\n", 91 | "# here to save the log file\n", 92 | "LOG_FILE=\"log_file.txt\"\n" 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "id": "bce4f1e8-6f8b-4a88-9250-2fead553178b", 98 | "metadata": {}, 99 | "source": [ 100 | "- Globals" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": 4, 106 | "id": "e1f9a7dd-c09e-43c6-b938-7c3c40ff583f", 107 | "metadata": { 108 | "tags": [] 109 | }, 110 | "outputs": [], 111 | "source": [ 112 | "# the newest time from fetched twitter entries\n", 113 | "# use this to filter newer ones or fetch newer ones \n", 114 | "newest_time_str = \"\"\n", 115 | "\n", 116 | "# wait one hour between polling rounds\n", 117 | "wait_interval = 3600 # in seconds\n", 118 | "\n", 119 | "# we are running async\n", 120 | "loop = asyncio.get_event_loop()\n" 121 | ] 122 | }, 123 | { 124 | "cell_type": "markdown", 125 | "id": "c4b4d440-ba16-4c06-b467-34dafde5cd2c", 126 | "metadata": {}, 127 | "source": [ 128 | "## Read Input" 129 | ] 130 | }, 131 | { 132 | "cell_type": "markdown", 133 | "id": "5bcb55d8-c69f-4371-ab8b-ec7d7c9cd79d",
134 | "metadata": {}, 135 | "source": [ 136 | "### Read Twitter User List" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": 5, 142 | "id": "c260a4f8-d0f6-48d4-9117-26ccba9eca4e", 143 | "metadata": { 144 | "tags": [] 145 | }, 146 | "outputs": [], 147 | "source": [ 148 | "def read_twitter_user_url_list():\n", 149 | " with open(TWITTER_USER_LIST_FILE, \"r\", encoding=\"utf-8\") as file:\n", 150 | " lines = file.readlines()\n", 151 | " \n", 152 | " url_list = []\n", 153 | " for line in lines:\n", 154 | " info = line.strip().split(',')\n", 155 | " twitter_id = info[0].strip()\n", 156 | " url = f\"{RSS_BASE_URL}/twitter/user/{twitter_id}\"\n", 157 | " url_list.append(url)\n", 158 | " \n", 159 | " print(f\"interested users: {len(url_list)}\\n\")\n", 160 | "\n", 161 | " return url_list" 162 | ] 163 | }, 164 | { 165 | "cell_type": "markdown", 166 | "id": "04c70ecd-e6f8-4722-9eae-d8f1d137ce20", 167 | "metadata": {}, 168 | "source": [ 169 | "## Helper Function" 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "id": "710cb404-2681-4998-a66d-dc3e7b7bc4ac", 175 | "metadata": {}, 176 | "source": [ 177 | "### Cutoff Time Helper Function" 178 | ] 179 | }, 180 | { 181 | "cell_type": "code", 182 | "execution_count": 6, 183 | "id": "e4aabe42-af9f-4a7d-a371-b0df6ecf3e5c", 184 | "metadata": { 185 | "tags": [] 186 | }, 187 | "outputs": [], 188 | "source": [ 189 | "def read_cutoff_time():\n", 190 | " \"\"\"\n", 191 | " record the newest time at the bottom\n", 192 | " return: datetime object\n", 193 | " \"\"\"\n", 194 | " try:\n", 195 | " with open(CUTOFF_TIME_FILE, \"r\") as f:\n", 196 | " lines = f.readlines()\n", 197 | " except FileNotFoundError:\n", 198 | " return use_yesterday_as_cutoff()\n", 199 | " \n", 200 | " if len(lines) == 0:\n", 201 | " return use_yesterday_as_cutoff()\n", 202 | " \n", 203 | " # in case you opened this file and hit some enters\n", 204 | " stripped_lines = [line for line in lines if len(line.strip()) > 0]\n", 205 | " if 
len(stripped_lines) == 0:\n", 206 | " return use_yesterday_as_cutoff()\n", 207 | " \n", 208 | " cutoff_time = stripped_lines[-1]\n", 209 | " \n", 210 | " # print(f\"read, cutoff time is {cutoff_time}\")\n", 211 | " \n", 212 | " try:\n", 213 | " # must use the format we defined, strictly\n", 214 | " time_converted = parser.parse(cutoff_time)\n", 215 | " except:\n", 216 | " raise\n", 217 | " \n", 218 | " # print(f\"read, time converted is {time_converted}\")\n", 219 | " \n", 220 | " return time_converted\n", 221 | "\n", 222 | "def use_yesterday_as_cutoff():\n", 223 | " \"\"\"\n", 224 | " On the first run, fetch content from some time back: one hour ago by default; one day ago, etc., can be customized\n", 225 | " Take care of timezone for international twitter users\n", 226 | " \"\"\"\n", 227 | " local_tz = tz.tzlocal()\n", 228 | " now = datetime.now(local_tz)\n", 229 | " # fetch contents from 1 hour ago\n", 230 | " # or 1 day ago, etc\n", 231 | " cutoff_start = now - timedelta(hours=1)\n", 232 | " write_cutoff_time(cutoff_start)\n", 233 | " return cutoff_start\n", 234 | "\n", 235 | "def write_cutoff_time(cutoff_time):\n", 236 | " \"\"\"\n", 237 | " time: str or datetime object\n", 238 | " return: None\n", 239 | " writes time_str to the file\n", 240 | " \"\"\"\n", 241 | " if isinstance(cutoff_time, str):\n", 242 | " # test if format is correct\n", 243 | " try:\n", 244 | " # if is str and with correct format\n", 245 | " # print(f\"cutoff time in write, is str, is {cutoff_time}\")\n", 246 | " time_converted = parser.parse(cutoff_time)\n", 247 | " except:\n", 248 | " raise\n", 249 | " \n", 250 | " time_str = cutoff_time\n", 251 | " \n", 252 | " elif isinstance(cutoff_time, datetime):\n", 253 | " # must be timezone aware\n", 254 | " # already checked this, able to print out timezone, if input has tz\n", 255 | " TIME_RECORD_FORMAT=\"%Y-%m-%d %H:%M:%S %Z\"\n", 256 | " time_str = cutoff_time.strftime(TIME_RECORD_FORMAT)\n", 257 | " else:\n", 258 | " raise TypeError(\"not str or datetime.datetime\")\n", 259 | " \n", 260 | " # print(f\"writing, time str is {time_str}\")\n",
" \n", 262 | " # overwrite everything in the file\n", 263 | " with open(CUTOFF_TIME_FILE, \"w\") as f:\n", 264 | " f.write(time_str)\n" 265 | ] 266 | }, 267 | { 268 | "cell_type": "markdown", 269 | "id": "843740d1-c34f-4e2f-bd2a-9fd46227939e", 270 | "metadata": {}, 271 | "source": [ 272 | "### Time Format Helper Function" 273 | ] 274 | }, 275 | { 276 | "cell_type": "code", 277 | "execution_count": 7, 278 | "id": "9cfb64ab-9461-4cbf-ac91-b7a4da925dab", 279 | "metadata": { 280 | "tags": [] 281 | }, 282 | "outputs": [], 283 | "source": [ 284 | "def twitter_rss_time_converter(datetime_string:str) -> datetime:\n", 285 | " aware_datetime = parser.parse(datetime_string)\n", 286 | " return aware_datetime" 287 | ] 288 | }, 289 | { 290 | "cell_type": "markdown", 291 | "id": "d1f87441-e373-46a3-9aaf-06214455eb90", 292 | "metadata": {}, 293 | "source": [ 294 | "### Filter Function" 295 | ] 296 | }, 297 | { 298 | "cell_type": "code", 299 | "execution_count": 8, 300 | "id": "632d8a8c-06ed-488a-afd2-aa29aedbf3a8", 301 | "metadata": { 302 | "tags": [] 303 | }, 304 | "outputs": [], 305 | "source": [ 306 | "def filter_sort_twitter_entries(entries):\n", 307 | " cutoff_time = read_cutoff_time()\n", 308 | "\n", 309 | " # filtered_entries = list(filter(lambda x: twitter_rss_time_converter(x['published']) > cutoff_time, entries))\n", 310 | " # or simpler\n", 311 | " \n", 312 | " filtered_entries = [x for x in entries if twitter_rss_time_converter(x['published']) > cutoff_time]\n", 313 | " \n", 314 | " print(f\"Cutoff Time: {cutoff_time}\\n\")\n", 315 | " # print([x.published for x in filtered_entries])\n", 316 | " if len(filtered_entries) > 0:\n", 317 | " print(f\"After Filter, {len(filtered_entries)} items will be sent to bot.\\n\")\n", 318 | " \n", 319 | " sorted_results = sorted(filtered_entries, key=lambda x: twitter_rss_time_converter(x['published']), reverse=True)\n", 320 | " return sorted_results\n" 321 | ] 322 | }, 323 | { 324 | "cell_type": "markdown", 325 | "id": 
"03592da4-4679-42d7-aa76-c4fb809f03d7", 326 | "metadata": {}, 327 | "source": [ 328 | "### Logging" 329 | ] 330 | }, 331 | { 332 | "cell_type": "code", 333 | "execution_count": 9, 334 | "id": "7ce844df-1c72-45a0-ad10-5f03ce40cc1d", 335 | "metadata": {}, 336 | "outputs": [], 337 | "source": [ 338 | "def log_time():\n", 339 | " \"\"\"\n", 340 | " todo, might log more info\n", 341 | " \"\"\"\n", 342 | " with open(LOG_FILE, \"a\") as f:\n", 343 | " f.write(f\"now is {datetime.now()}; newest cutoff time is {newest_time_str}\\n\")\n" 344 | ] 345 | }, 346 | { 347 | "cell_type": "markdown", 348 | "id": "03c9826d-179e-4e10-9037-ddeddfb325cb", 349 | "metadata": {}, 350 | "source": [ 351 | "## Async Fetching URLs" 352 | ] 353 | }, 354 | { 355 | "cell_type": "markdown", 356 | "id": "7405810b-948c-4cfd-9530-0b45d1539639", 357 | "metadata": {}, 358 | "source": [ 359 | "### Async Fetch" 360 | ] 361 | }, 362 | { 363 | "cell_type": "code", 364 | "execution_count": 10, 365 | "id": "9ce46e92-ba1c-4761-a091-4727e8d9c00d", 366 | "metadata": { 367 | "tags": [] 368 | }, 369 | "outputs": [ 370 | { 371 | "name": "stdout", 372 | "output_type": "stream", 373 | "text": [ 374 | "IPython autoawait is `on`, and set to use `asyncio`\n" 375 | ] 376 | } 377 | ], 378 | "source": [ 379 | "# with this, able to run event loop in Jupyter\n", 380 | "%autoawait\n", 381 | "\n", 382 | "async def fetch(session, url):\n", 383 | " async with session.get(url) as response:\n", 384 | " return await response.text()\n", 385 | "\n", 386 | "async def fetch_all(urls):\n", 387 | " async with aiohttp.ClientSession() as session:\n", 388 | " tasks = []\n", 389 | " for url in urls:\n", 390 | " task = asyncio.ensure_future(fetch(session, url))\n", 391 | " tasks.append(task)\n", 392 | " responses = await asyncio.gather(*tasks)\n", 393 | " return responses\n", 394 | " \n", 395 | "async def fetch_twitter_entries():\n", 396 | " # print(f\"async, start fetching\\n\")\n", 397 | " urls = read_twitter_user_url_list()\n", 398 | " 
\n", 399 | " print(f\"Fetching ...\\n\")\n", 400 | "\n", 401 | " responses = await fetch_all(urls)\n", 402 | " # print(f\"After Async Running, Fetched Users: {len(responses)}\\n\")\n", 403 | " # print(f\"Fetched Users: {len(responses)}\\n\")\n", 404 | "\n", 405 | " entries = []\n", 406 | " for i, response in enumerate(responses):\n", 407 | " feed = feedparser.parse(response)\n", 408 | " entries += feed.entries\n", 409 | " \n", 410 | " return entries\n", 411 | "\n" 412 | ] 413 | }, 414 | { 415 | "cell_type": "markdown", 416 | "id": "ea05c8f4-a146-4ed9-8cd5-04e08f619f57", 417 | "metadata": {}, 418 | "source": [ 419 | "## Format Telegram Bot Messages" 420 | ] 421 | }, 422 | { 423 | "cell_type": "markdown", 424 | "id": "db5eb933-7b3a-4af6-8b25-684798741170", 425 | "metadata": {}, 426 | "source": [ 427 | "### Bot Message Formatting" 428 | ] 429 | }, 430 | { 431 | "cell_type": "code", 432 | "execution_count": 11, 433 | "id": "cc7d0dfe-835b-44e9-b95f-f3fa1be2fa15", 434 | "metadata": { 435 | "tags": [] 436 | }, 437 | "outputs": [], 438 | "source": [ 439 | "def bot_message_from_entrie(item):\n", 440 | " author = item[\"author\"]\n", 441 | " title = item[\"title\"]\n", 442 | " link = item[\"link\"]\n", 443 | " pub_date_parsed = parser.parse(item[\"published\"])\n", 444 | " description = parse_html_from_rss(item[\"description\"])\n", 445 | "\n", 446 | " message = (\n", 447 | " f\"{author} {pub_date_parsed}\\n\"\n", 448 | " f\"{description}\\n\" \n", 449 | " f\"{link}\"\n", 450 | " )\n", 451 | " \n", 452 | " return message\n", 453 | "\n", 454 | "def parse_html_from_rss(description_html):\n", 455 | " soup = BeautifulSoup(description_html, 'html.parser')\n", 456 | " # Convert div with class rsshub-quote\n", 457 | " rsshub_quotes = soup.find_all('div', class_='rsshub-quote')\n", 458 | " for rsshub_quote in rsshub_quotes:\n", 459 | " rsshub_quote.string = f\"\\n> {rsshub_quote.get_text(separator=' ', strip=True)}\\n\\n\"\n", 460 | "\n", 461 | " for br in soup.find_all('br'):\n", 
462 | " br.replace_with('\\n')\n", 463 | "\n", 464 | " description = \"\\n\".join(soup.stripped_strings)\n", 465 | " \n", 466 | " return description\n" 467 | ] 468 | }, 469 | { 470 | "cell_type": "markdown", 471 | "id": "c8643f16-c753-47d8-b4c4-ff9ac7c0785c", 472 | "metadata": {}, 473 | "source": [ 474 | "## Filter Content and Send Messages" 475 | ] 476 | }, 477 | { 478 | "cell_type": "markdown", 479 | "id": "c7bc8ffc-4cb3-4261-acf1-9c60ae1f686f", 480 | "metadata": {}, 481 | "source": [ 482 | "### Filter Twitter Entries" 483 | ] 484 | }, 485 | { 486 | "cell_type": "code", 487 | "execution_count": 12, 488 | "id": "7cbb5b8a-4f27-4569-a07f-c47a28ae55f5", 489 | "metadata": { 490 | "tags": [] 491 | }, 492 | "outputs": [], 493 | "source": [ 494 | "async def telegram_message_list_to_send():\n", 495 | " \n", 496 | " # todo, maybe log twitter account as well\n", 497 | " log_time()\n", 498 | " \n", 499 | " entries = await fetch_twitter_entries()\n", 500 | " print(f\"Fetched twitter: {len(entries)}\\n\")\n", 501 | " \n", 502 | " filtered_entries = filter_sort_twitter_entries(entries)\n", 503 | " \n", 504 | " if len(filtered_entries) > 0:\n", 505 | " global newest_time_str\n", 506 | " newest_time_str = filtered_entries[0]['published']\n", 507 | " message_list = [bot_message_from_entrie(x) for x in filtered_entries]\n", 508 | " return message_list\n", 509 | " else:\n", 510 | " # print(f\"no new twitter entry\")\n", 511 | " return []\n" 512 | ] 513 | }, 514 | { 515 | "cell_type": "markdown", 516 | "id": "e7bda862-9b93-40b4-8351-506b40381586", 517 | "metadata": {}, 518 | "source": [ 519 | "### Send Bot Message" 520 | ] 521 | }, 522 | { 523 | "cell_type": "code", 524 | "execution_count": 13, 525 | "id": "32e8f495-562d-4d0e-9b73-d3e335f69098", 526 | "metadata": { 527 | "tags": [] 528 | }, 529 | "outputs": [], 530 | "source": [ 531 | "async def send_to_telegram_bot():\n", 532 | " bot = Bot(\n", 533 | " token=TOKEN,\n", 534 | " base_url=TELEGRAM_API_BASE_URL,\n", 535 | " )\n", 536 | 
" \n", 537 | " ml = await telegram_message_list_to_send()\n", 538 | " \n", 539 | " sleep_time_msg = f\"Now sleep time, next run will be after {wait_interval} seconds.\\n\"\n", 540 | " \n", 541 | " if len(ml) == 0:\n", 542 | " # nothing to do\n", 543 | " print(f\"No Messages to Send.\\n\")\n", 544 | " print(sleep_time_msg)\n", 545 | " return\n", 546 | " \n", 547 | " print(f\"Sending Telegram Bot Messages\\n\")\n", 548 | " \n", 549 | " # todo, record to log file and later send to AI\n", 550 | "\n", 551 | " global newest_time_str\n", 552 | " \n", 553 | " try:\n", 554 | " for message in ml:\n", 555 | " bot.send_message(\n", 556 | " chat_id=target_chat_id, \n", 557 | " text=message,\n", 558 | " timeout=10,\n", 559 | " ) \n", 560 | " \n", 561 | " # update cutoff time\n", 562 | " write_cutoff_time(newest_time_str)\n", 563 | " print(f\"Cutoff time updated to: {newest_time_str}\\n\")\n", 564 | " print(sleep_time_msg)\n", 565 | "\n", 566 | " except:\n", 567 | " raise\n", 568 | " \n" 569 | ] 570 | }, 571 | { 572 | "cell_type": "markdown", 573 | "id": "d35e539b-490a-4eb3-83c8-f205380a6029", 574 | "metadata": {}, 575 | "source": [ 576 | "## Task Management" 577 | ] 578 | }, 579 | { 580 | "cell_type": "markdown", 581 | "id": "2b435f83-7b53-4439-bb02-a1a526acef92", 582 | "metadata": {}, 583 | "source": [ 584 | "### Start Task" 585 | ] 586 | }, 587 | { 588 | "cell_type": "code", 589 | "execution_count": 14, 590 | "id": "bf780865-6fa7-4216-ab55-9a8746a91f9e", 591 | "metadata": { 592 | "tags": [] 593 | }, 594 | "outputs": [], 595 | "source": [ 596 | "# Here in Jupyter-lab, do not use asyncio.run\n", 597 | "# this will have conflict with Jupyter\n", 598 | "# use %autoawait is the solution\n", 599 | "\n", 600 | "cancel_event = asyncio.Event()\n", 601 | "\n", 602 | "async def main(cancel_event): \n", 603 | " try:\n", 604 | " while not cancel_event.is_set():\n", 605 | " await send_to_telegram_bot()\n", 606 | " await asyncio.sleep(wait_interval)\n", 607 | " except 
asyncio.CancelledError:\n", 608 | " print(\"Coroutine cancelled.\")\n", 609 | " finally:\n", 610 | " print(\"Coroutine stopped. Program finished.\")\n" 611 | ] 612 | }, 613 | { 614 | "cell_type": "markdown", 615 | "id": "ff625652-e2a2-420a-9803-b1a95491e94a", 616 | "metadata": {}, 617 | "source": [ 618 | "### Cancel Task" 619 | ] 620 | }, 621 | { 622 | "cell_type": "markdown", 623 | "id": "7429e592-fd38-4c34-814f-8f0fab737f7d", 624 | "metadata": {}, 625 | "source": [ 626 | "- Press Enter in the input box that appears below to stop the program\n", 627 | "- Alternatively, type any characters and then press Enter; that also stops it" 628 | ] 629 | }, 630 | { 631 | "cell_type": "code", 632 | "execution_count": 15, 633 | "id": "477ed0ac-7c86-4d3f-ac87-15252b59dcd4", 634 | "metadata": { 635 | "tags": [] 636 | }, 637 | "outputs": [ 638 | { 639 | "name": "stdout", 640 | "output_type": "stream", 641 | "text": [ 642 | "Fetching Content ...\n", 643 | "\n", 644 | "Fetched Users: 111\n", 645 | "\n", 646 | "Fetched Entries: 1980\n", 647 | "\n", 648 | "Cutoff Time: 2023-04-18 11:04:09+00:00\n", 649 | "\n", 650 | "After Filter, 417 items will be sent to bot.\n", 651 | "\n", 652 | "Sending Telegram Bot Messages\n", 653 | "\n", 654 | "Cutoff time updated to: Wed, 19 Apr 2023 06:23:25 GMT\n", 655 | "\n", 656 | "Now sleep time, next run will be after 3600 seconds.\n", 657 | "\n" 658 | ] 659 | }, 660 | { 661 | "name": "stdin", 662 | "output_type": "stream", 663 | "text": [ 664 | " \n" 665 | ] 666 | }, 667 | { 668 | "name": "stdout", 669 | "output_type": "stream", 670 | "text": [ 671 | "Coroutine cancelled.\n", 672 | "Coroutine stopped. 
程序已结束.\n" 673 | ] 674 | } 675 | ], 676 | "source": [ 677 | "async def cancel_on_keypress(task):\n", 678 | " # print(\"Press Enter to cancel the task.\")\n", 679 | " await asyncio.to_thread(input)\n", 680 | " task.cancel()\n", 681 | "\n", 682 | "task = asyncio.create_task(main(cancel_event))\n", 683 | "cancel_task = asyncio.create_task(cancel_on_keypress(task))\n", 684 | "\n", 685 | "try:\n", 686 | " await asyncio.gather(task, cancel_task, return_exceptions=True)\n", 687 | "except asyncio.CancelledError:\n", 688 | " pass" 689 | ] 690 | } 691 | ], 692 | "metadata": { 693 | "kernelspec": { 694 | "display_name": "Python 3 (ipykernel)", 695 | "language": "python", 696 | "name": "python3" 697 | }, 698 | "language_info": { 699 | "codemirror_mode": { 700 | "name": "ipython", 701 | "version": 3 702 | }, 703 | "file_extension": ".py", 704 | "mimetype": "text/x-python", 705 | "name": "python", 706 | "nbconvert_exporter": "python", 707 | "pygments_lexer": "ipython3", 708 | "version": "3.11.3" 709 | }, 710 | "toc-autonumbering": true, 711 | "toc-showmarkdowntxt": false 712 | }, 713 | "nbformat": 4, 714 | "nbformat_minor": 5 715 | } 716 | -------------------------------------------------------------------------------- /twitter_list.txt: -------------------------------------------------------------------------------- 1 | FinanceYF5,Will 2 | sama,Sam Altman 3 | AlecRad,Alec Radford 4 | gdb,Greg Brockman 5 | ilyasut,Ilya Sutskever 6 | woj_zaremba,Wojciech Zaremba 7 | johnschulman2,John Schulman 8 | janleike,Jan Leike 9 | karpathy,Andrej Karpathy 10 | miramurati,Mira Murati 11 | lilianweng,Lilian Weng 12 | merettm,Jakub Pachocki 13 | OfficialLoganK,Logan.GPT 14 | bobmcgrewai,BOB MCGREW 15 | dswillner,DAVE WILLNER 16 | markchen90,MARK CHEN 17 | c_berner,CHRISTOPHER BERNER 18 | Miles_Brundage,MILES BRUNDAGE 19 | longouyang,LONG OUYANG 20 | crlfq,Ivan Zhang 21 | nickfrosst,Nick Frosst 22 | ashVaswani,Ashish Vaswani 23 | NoamShazeer,Noam Shazeer 24 | nikiparmar09,Niki Parmar 25 | 
kyosu,Jakob Uszkoreit 26 | aidangomezzz,Aidan Gomez 27 | lukaszkaiser,Lukasz Kaiser 28 | ilblackdragon,Illia Polosukhin 29 | dan_defr,Daniel De Freitas 30 | ThoppilanRomal,Romal Thoppilan 31 | annadgoldie,Anna Goldie 32 | colinraffel,Colin Raffel 33 | sharan0909,Sharan Narang 34 | jacob_devlin,Jacob Devlin 35 | Azaliamirh,Azalia Mirhoseini 36 | ylecun,Yann LeCun 37 | JeffDean,Jeff Dean 38 | DrJimFan,Jim Fan 39 | oran_ge,orange.ai 40 | dotey,宝玉 41 | bearbig,Bear Liu 42 | BaibanbaoNet,白板报 43 | wanglei001,Kenshin 44 | OwenYoungZh,Owen 45 | thinkingjimmy,JimmyWong 46 | decohack,viggo 47 | op7418,歸藏 48 | xicilion,响马 49 | WuPingJu,P.J. Wu 吳秉儒 50 | yetone,yetone 51 | nash_su,nash_su 52 | novoreorx,Reorx 53 | nishuang,倪爽 54 | hzlzh,自力hzlzh 55 | mr_easonyang,Eason Yang 56 | luoleiorg,luolei 57 | yihong0618,yihong0618 58 | jesselaunz,紐村遁一子 59 | Cydiar404,𝗖𝘆𝗱𝗶𝗮𝗿 60 | lxfater,铁锤人 61 | mtrainier2020,雷尼尔 62 | mranti,Michael Anti 63 | JourneymanChina,Journeyman 64 | lewangdev,lewang 65 | vikingmute,Viking 66 | vista8,向阳乔木 67 | browserdotsys 68 | two_dukes 69 | StabilityAI 70 | AndyChenML 71 | Meta_RealityLabs 72 | BrownUniversity 73 | mezaoptimizer 74 | DannyDriess 75 | GoogleAI 76 | sohamxsarkar 77 | brickroad7 78 | SamWolfstone 79 | _jasonwei 80 | MatthewJBar 81 | MJBPredictions 82 | natalia__coelho 83 | dwarkesh_sp 84 | MoritzW42 85 | andriy_mulyar 86 | nearcyan 87 | full_stack_dl 88 | geoffreyhinton 89 | woj_zaremba 90 | charles_irl 91 | weights_biases 92 | Redwood_Neuro 93 | atroyn 94 | trychroma 95 | maithra_raghu 96 | Samaya_AI 97 | entirelyuseles 98 | realGeorgeHotz 99 | DhruvBatraDB 100 | digi_literacy 101 | johnjnay 102 | CodeXStanford 103 | ggerganov 104 | mckaywrigley 105 | CodewandAI 106 | ChatbotUI 107 | ESYudkowsky 108 | _akhaliq 109 | Gradio_HuggingFace 110 | ilyasut 111 | connerruhl -------------------------------------------------------------------------------- /twitter_list_debug.txt: 
-------------------------------------------------------------------------------- 1 | OfficialLoganK,Logan.GPT 2 | bobmcgrewai,BOB MCGREW 3 | dswillner,DAVE WILLNER 4 | markchen90,MARK CHEN 5 | c_berner,CHRISTOPHER BERNER 6 | Miles_Brundage,MILES BRUNDAGE 7 | longouyang,LONG OUYANG 8 | crlfq,Ivan Zhang 9 | nickfrosst,Nick Frosst 10 | ashVaswani,Ashish Vaswani 11 | NoamShazeer,Noam Shazeer 12 | nikiparmar09,Niki Parmar 13 | kyosu,Jakob Uszkoreit 14 | aidangomezzz,Aidan Gomez --------------------------------------------------------------------------------