├── .gitignore ├── LICENSE ├── README.md ├── cutoff_time2.txt ├── env.txt ├── main.py ├── requirements.txt ├── rss.ipynb ├── twitter_list.txt └── twitter_list_debug.txt /.gitignore: -------------------------------------------------------------------------------- 1 | # Ignore .env file 2 | .env 3 | *.venv* 4 | .ipynb* 5 | log* 6 | *debug* 7 | *example* 8 | cutoff_time.txt 9 | 10 | # Include env.txt file 11 | !env.txt 12 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 
29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 
61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | Given a list of Twitter IDs, periodically fetch tweet updates and send them to your own Telegram Bot. 3 | 4 | Use jupyter-lab to send Twitter updates from a list of users to your Telegram bot. 5 | 6 | 7 | ## About This Fork 8 | 9 | - This project is forked from [@joeseesun](https://github.com/joeseesun/AIGC_Telegram_Bot); many thanks to the original author for the momentum and inspiration!
10 | - I'm interested in this project because I believe that in the near future users will be able to access all publicly available online content, and everyone will have their own algorithm to filter and organize information across the entire web. Paid content could also be traded through a convenient marketplace. 11 | - Information will flow with remarkable efficiency; the various internet platforms will be just algorithm companies that process one particular kind of information, and users will no longer need these platforms' recommendation services. 12 | - In the future, a video could be published on a single host, declared open for the major video sites to crawl, with the corresponding username provided, and thereby be published across the whole web. An article or an audio clip could be published the same way. 13 | - Web-wide publishing means every user on the web can consume the content, in whatever way they prefer: for example, viewing a video's summary and key frames before watching it, finding a collection of articles similar to a given one, adding AI-generated illustrations to an article before reading it, or switching freely among text, audio, video, Q&A interaction, and other modalities. Information consumption will become highly personalized. 14 | 15 | ## What Changed 16 | - Moved the code into jupyter-lab, so ideas can be tried out quickly 17 | - Added a file that stores a timestamp used to decide which content needs to be fetched; on the first run, content from the past hour is fetched 18 | - Added async functions that request multiple RSS feeds concurrently, reducing wait time 19 | - Added support for setting a Telegram API proxy address; see [here](https://blog.orii.xyz/202301/%E4%BD%BF%E7%94%A8cloudflare-Worker%E4%BB%A3%E7%90%86telegram-bot-api/) 20 | 21 | 22 | ## Todo 23 | - rsshub may support time-related request parameters; including the time in the request could reduce data transfer 24 | - Since rsshub offers a huge amount of subscribable content, a feature for categorizing subscriptions is needed, ideally with a local web dashboard and a feed-management module 25 | - Sending subscribed content to Telegram is only one way to consume it; it could also be fed into an AI model for summarization first 26 | 27 | ## If You See a Connection Timeout 28 | - First run `ping api.telegram.org` to check connectivity 29 | - If the ping fails, try a system-wide VPN and enable Clash's enhanced (TUN) mode 30 | - Alternatively, use a Telegram API proxy address 31 | 32 | 33 | ## Usage 34 | 35 | 1. Create a Telegram bot and get its Token 36 | - Open https://t.me/botfather and send /start 37 | - Follow the prompts: first enter the bot's name, then the desired ID (must end with bot), e.g. telegram_rss_bot 38 | - After creation you will receive a Token that looks like: 5987500169:AAEBqLx7OWmK6ne9pIfHhrgMktDmq_VcsSQ 39 | 40 | 2. Get your own Telegram ID 41 | Open https://t.me/userinfobot and send /start to get your ID, which looks like: 1293676963 42 | 43 | 44 | 3. Set the Token and Telegram ID 45 | 46 | - Put the Token and Telegram ID into env.txt, then rename env.txt to ".env" 47 | - If you need a Telegram API proxy address, you can also set it in TELEGRAM_API_BASE_URL, in case the network cannot reach Telegram directly 48 | - If you have your own RSS server, e.g. a self-hosted rsshub instance, you can set its address in RSS_BASE_URL 49 | 50 | 4. Rename cutoff_time2.txt to cutoff_time.txt; it is used to store the timestamp 51 | 52 | 5. Create a venv and install the dependencies 53 | ``` 54 | python3 -m venv .venv_bot 55 | source .venv_bot/bin/activate 56 | pip install -r requirements.txt 57 | ``` 58 | 59 | 6. Run the program 60 | ``` 61 | jupyter-lab 62 | ``` 63 | then open rss.ipynb 64 | 65 | 7. To stop the program, press Enter in the input box that appears at the end 66 | 67 | 68 | ## Want to Customize Who You Follow?
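For illustration, a twitter_list.txt could look like the following (the IDs and names here are hypothetical placeholders, not taken from the repo):

```
elonmusk,Elon Musk
sama,Sam Altman
karpathy
```

main.py splits each line on the comma: the first field is the Twitter ID, and the optional second field is the display name; when the name is omitted, the ID itself is used.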
69 | Edit twitter_list.txt: one Twitter ID per line; after a comma you may add a display name (customizable, optional) 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | -------------------------------------------------------------------------------- /cutoff_time2.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mobilestack/AIGC_Telegram_Bot/c1026ddf49af4680b0ea96a2a10f29157acf8252/cutoff_time2.txt -------------------------------------------------------------------------------- /env.txt: -------------------------------------------------------------------------------- 1 | # ① Create a Telegram bot and get its Token 2 | # Open https://t.me/botfather and send /start 3 | # Follow the prompts: first enter the bot's name, then the desired ID (must end with bot), e.g. telegram_rss_bot 4 | # After creation you will receive a Token that looks like: 5987500169:AAEBqLx7OWmK6ne9pIfHhrgMktDmq_VcsSQ 5 | 6 | # ② Get your own Telegram ID 7 | # Open https://t.me/userinfobot and send /start to get your ID, which looks like: 1293676963 8 | 9 | TOKEN=replace with your bot Token 10 | target_chat_id=replace with your Telegram ID 11 | 12 | # in case you cannot connect to telegram api 13 | TELEGRAM_API_BASE_URL=https://yourproxy.com/bot 14 | 15 | # if you have your own rsshub address, can put it here 16 | RSS_BASE_URL= 17 | 18 | -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | import time 3 | import feedparser 4 | import requests 5 | from telegram import Bot 6 | from datetime import datetime 7 | from bs4 import BeautifulSoup 8 | from translate import Translator 9 | from dotenv import load_dotenv 10 | import os 11 | import asyncio 12 | 13 | # Set the source and target languages 14 | source_language = "en" 15 | target_language = "zh" 16 | translator = Translator(from_lang=source_language, to_lang=target_language) 17 | 18 | load_dotenv() # Load environment variables from .env file 19 | 20 | TOKEN = os.getenv("TOKEN") 21 | target_chat_id = os.getenv("target_chat_id") 22 |
TELEGRAM_API_BASE_URL=os.getenv("TELEGRAM_API_BASE_URL", "https://api.telegram.org/bot") 23 | 24 | bot = Bot( 25 | token=TOKEN, 26 | base_url=TELEGRAM_API_BASE_URL, 27 | ) 28 | 29 | rss_list = [] 30 | # Read the user list file 31 | with open("twitter_list.txt", "r", encoding="utf-8") as file: 32 | lines = file.readlines() 33 | 34 | # Iterate over each line of the file 35 | for line in lines: 36 | info = line.strip().split(',') 37 | 38 | # Extract the Twitter ID and nickname 39 | twitter_id = info[0].strip() 40 | nickname = info[1].strip() if len(info) > 1 else twitter_id 41 | 42 | # Build the RSS URL 43 | url = f"http://rss.qiaomu.pro/twitter/user/{twitter_id}" 44 | 45 | # Append to rss_list 46 | rss_list.append({ 47 | "name": nickname, 48 | "url": url 49 | }) 50 | 51 | # print(f"rss list is {rss_list}") 52 | 53 | def get_latest_twitter_updates(rss_url, last_item_link): 54 | response = requests.get(rss_url) 55 | rss_content = response.content 56 | feed = feedparser.parse(rss_content) 57 | 58 | latest_items = [] 59 | for entry in feed["entries"]: 60 | if entry["link"] == last_item_link: 61 | break 62 | latest_items.append(entry) 63 | 64 | 65 | # print(f"content list is {latest_items}") 66 | 67 | return latest_items 68 | 69 | async def send_update_to_telegram(items): 70 | 71 | # print(f"in async send, item length is {len(items)}") 72 | 73 | for item in items: 74 | author = item["author"] 75 | title = item["title"] 76 | 77 | # print(f"author is {author}, title is {title}") 78 | 79 | description_html = item["description"] 80 | soup = BeautifulSoup(description_html, 'html.parser') 81 | 82 | # Convert div with class rsshub-quote 83 | rsshub_quotes = soup.find_all('div', class_='rsshub-quote') 84 | for rsshub_quote in rsshub_quotes: 85 | rsshub_quote.string = f"\n> {rsshub_quote.get_text(separator=' ', strip=True)}\n\n" 86 | 87 | for br in soup.find_all('br'): 88 | br.replace_with('\n') 89 | 90 | description = "\n".join(soup.stripped_strings) 91 | description_zh = translator.translate(description) 92 | 93 | 94 | # Get and send images from the text
95 | images = soup.find_all('img', src=True) 96 | # Handle images: send each one separately 97 | for img in images: 98 | await asyncio.to_thread(bot.send_photo, chat_id=target_chat_id, photo=img['src']) 99 | # Handle videos: send each one separately 100 | videos = soup.find_all('video', src=True) 101 | for video in videos: 102 | video_url = video.get("src") 103 | await asyncio.to_thread(bot.send_video, chat_id=target_chat_id, video=video_url) 104 | 105 | pub_date_parsed = datetime.strptime(item["published"], "%a, %d %b %Y %H:%M:%S %Z") 106 | pub_date = pub_date_parsed.strftime("%Y-%m-%d %H:%M:%S") 107 | link = item["link"] 108 | 109 | message = ( 110 | f"From {author}:\n\n" 111 | f"Published: {pub_date}\n\n" 112 | f"{description}\n\n" 113 | f"{description_zh}\n\n" 114 | f"Link: {link}" 115 | ) 116 | 117 | # print(f"message to be sent is {message}") 118 | 119 | await asyncio.to_thread( 120 | bot.send_message, 121 | chat_id=target_chat_id, 122 | text=message, 123 | timeout=100, 124 | ) # Do not use parse_mode="HTML" 125 | 126 | last_links = [None] * len(rss_list) 127 | interval = 600 # in seconds; adjust how often the RSS feeds are checked as needed 128 | 129 | async def main(): 130 | while True: 131 | for index, rss_source in enumerate(rss_list): 132 | latest_items = get_latest_twitter_updates(rss_source["url"], last_links[index]) 133 | 134 | # print(f"latest_items length is {len(latest_items)}") 135 | 136 | 137 | if latest_items: 138 | last_links[index] = latest_items[0]["link"] 139 | await send_update_to_telegram(latest_items[::-1]) # Send tweets from oldest to newest 140 | 141 | await asyncio.sleep(interval) 142 | 143 | if __name__ == "__main__": 144 | asyncio.run(main()) 145 | 146 | 147 | 148 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | aiofiles==22.1.0 2 | aiohttp==3.8.4 3 | aiosignal==1.3.1 4 | aiosqlite==0.19.0 5 | anyio==3.6.2 6 | appnope==0.1.3 7 | APScheduler==3.6.3 8 | argon2-cffi==21.3.0 9 | argon2-cffi-bindings==21.2.0 10
| arrow==1.2.3 11 | asttokens==2.2.1 12 | async-timeout==4.0.2 13 | attrs==23.1.0 14 | Babel==2.12.1 15 | backcall==0.2.0 16 | beautifulsoup4==4.9.3 17 | bleach==6.0.0 18 | cachetools==4.2.2 19 | certifi==2022.12.7 20 | cffi==1.15.1 21 | charset-normalizer==2.0.12 22 | click==8.1.3 23 | comm==0.1.3 24 | debugpy==1.6.7 25 | decorator==5.1.1 26 | defusedxml==0.7.1 27 | executing==1.2.0 28 | fastjsonschema==2.16.3 29 | feedparser==6.0.8 30 | fqdn==1.5.1 31 | frozenlist==1.3.3 32 | idna==3.4 33 | ipykernel==6.22.0 34 | ipython==8.12.0 35 | ipython-genutils==0.2.0 36 | isoduration==20.11.0 37 | jedi==0.18.2 38 | Jinja2==3.1.2 39 | json5==0.9.11 40 | jsonpointer==2.3 41 | jsonschema==4.17.3 42 | jupyter-events==0.6.3 43 | jupyter-ydoc==0.2.4 44 | jupyter_client==8.2.0 45 | jupyter_core==5.3.0 46 | jupyter_server==2.5.0 47 | jupyter_server_fileid==0.9.0 48 | jupyter_server_terminals==0.4.4 49 | jupyter_server_ydoc==0.8.0 50 | jupyterlab==3.6.3 51 | jupyterlab-pygments==0.2.2 52 | jupyterlab_server==2.22.1 53 | libretranslatepy==2.1.1 54 | lxml==4.9.2 55 | MarkupSafe==2.1.2 56 | matplotlib-inline==0.1.6 57 | mistune==2.0.5 58 | multidict==6.0.4 59 | nbclassic==0.5.5 60 | nbclient==0.7.3 61 | nbconvert==7.3.1 62 | nbformat==5.8.0 63 | nest-asyncio==1.5.6 64 | notebook==6.5.4 65 | notebook_shim==0.2.2 66 | packaging==23.1 67 | pandocfilters==1.5.0 68 | parso==0.8.3 69 | pexpect==4.8.0 70 | pickleshare==0.7.5 71 | platformdirs==3.2.0 72 | prometheus-client==0.16.0 73 | prompt-toolkit==3.0.38 74 | psutil==5.9.4 75 | ptyprocess==0.7.0 76 | pure-eval==0.2.2 77 | pycparser==2.21 78 | Pygments==2.15.0 79 | pyrsistent==0.19.3 80 | python-dateutil==2.8.2 81 | python-dotenv==1.0.0 82 | python-json-logger==2.0.7 83 | python-telegram-bot==13.7 84 | pytz==2023.3 85 | pytz-deprecation-shim==0.1.0.post0 86 | PyYAML==6.0 87 | pyzmq==25.0.2 88 | requests==2.28.2 89 | rfc3339-validator==0.1.4 90 | rfc3986-validator==0.1.1 91 | Send2Trash==1.8.0 92 | sgmllib3k==1.0.0 93 | six==1.16.0 94 | 
sniffio==1.3.0 95 | soupsieve==2.4.1 96 | stack-data==0.6.2 97 | terminado==0.17.1 98 | tinycss2==1.2.1 99 | tornado==6.2 100 | traitlets==5.9.0 101 | translate==3.6.1 102 | tzdata==2023.3 103 | tzlocal==4.3 104 | uri-template==1.2.0 105 | urllib3==1.26.15 106 | wcwidth==0.2.6 107 | webcolors==1.13 108 | webencodings==0.5.1 109 | websocket-client==1.5.1 110 | y-py==0.5.9 111 | yarl==1.8.2 112 | ypy-websocket==0.8.2 113 | -------------------------------------------------------------------------------- /rss.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "59749f2a-888b-477a-9d36-57e3865a7dab", 6 | "metadata": {}, 7 | "source": [ 8 | "# Telegram Bot Message From Interested Twitter Users" 9 | ] 10 | }, 11 | { 12 | "cell_type": "markdown", 13 | "id": "06550f52-03bc-4975-b84a-481c543444b4", 14 | "metadata": {}, 15 | "source": [ 16 | "## Setup Environment" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 1, 22 | "id": "a9bc0df0-1f64-44a2-93cb-9aae71887cb5", 23 | "metadata": { 24 | "tags": [] 25 | }, 26 | "outputs": [], 27 | "source": [ 28 | "import time\n", 29 | "import feedparser\n", 30 | "import requests\n", 31 | "from telegram import Bot\n", 32 | "from datetime import datetime,timedelta\n", 33 | "from bs4 import BeautifulSoup\n", 34 | "from translate import Translator\n", 35 | "from dotenv import load_dotenv\n", 36 | "import os\n", 37 | "import asyncio\n", 38 | "import aiohttp\n", 39 | "from typing import List, Dict\n", 40 | "from dateutil import tz, parser" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "id": "95c270ff-eaa4-486c-88a7-8798e7d70876", 46 | "metadata": {}, 47 | "source": [ 48 | "### Environment Variable" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 2, 54 | "id": "e43f969f-7408-4fd8-a1a1-fe67dae62074", 55 | "metadata": { 56 | "tags": [] 57 | }, 58 | "outputs": [], 59 | "source": [ 60 | 
"load_dotenv() # Load environment variables from .env file\n", 61 | "\n", 62 | "TOKEN = os.getenv(\"TOKEN\")\n", 63 | "target_chat_id = os.getenv(\"target_chat_id\")\n", 64 | "TELEGRAM_API_BASE_URL=os.getenv(\"TELEGRAM_API_BASE_URL\", \"https://api.telegram.org/bot\")\n", 65 | "RSS_BASE_URL=os.getenv(\"RSS_BASE_URL\", \"http://rsshub.app\")" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "id": "3090c386-2560-4d77-81a4-797b45794b3a", 71 | "metadata": {}, 72 | "source": [ 73 | "### Paths, Constants, Globals" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": 3, 79 | "id": "9ae74887-8c56-4e55-b663-88ce4a1d92b7", 80 | "metadata": { 81 | "tags": [] 82 | }, 83 | "outputs": [], 84 | "source": [ 85 | "# at least specify the username, name optional\n", 86 | "TWITTER_USER_LIST_FILE=\"twitter_list.txt\"\n", 87 | "\n", 88 | "# here to save the cutoff time\n", 89 | "CUTOFF_TIME_FILE=\"cutoff_time.txt\"\n", 90 | "\n", 91 | "# here to save the log file\n", 92 | "LOG_FILE=\"log_file.txt\"\n" 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "id": "bce4f1e8-6f8b-4a88-9250-2fead553178b", 98 | "metadata": {}, 99 | "source": [ 100 | "- Globals" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": 4, 106 | "id": "e1f9a7dd-c09e-43c6-b938-7c3c40ff583f", 107 | "metadata": { 108 | "tags": [] 109 | }, 110 | "outputs": [], 111 | "source": [ 112 | "# the newest time from fetched twitter entries\n", 113 | "# use this to filter newer ones or fetch newer ones \n", 114 | "newest_time_str = \"\"\n", 115 | "\n", 116 | "# wait one hour between polling rounds\n", 117 | "wait_interval = 3600 # in seconds\n", 118 | "\n", 119 | "# we are running async\n", 120 | "loop = asyncio.get_event_loop()\n" 121 | ] 122 | }, 123 | { 124 | "cell_type": "markdown", 125 | "id": "c4b4d440-ba16-4c06-b467-34dafde5cd2c", 126 | "metadata": {}, 127 | "source": [ 128 | "## Read Input" 129 | ] 130 | }, 131 | { 132 | "cell_type": "markdown", 133 | "id": "5bcb55d8-c69f-4371-ab8b-ec7d7c9cd79d",
134 | "metadata": {}, 135 | "source": [ 136 | "### Read Twitter User List" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": 5, 142 | "id": "c260a4f8-d0f6-48d4-9117-26ccba9eca4e", 143 | "metadata": { 144 | "tags": [] 145 | }, 146 | "outputs": [], 147 | "source": [ 148 | "def read_twitter_user_url_list():\n", 149 | " with open(TWITTER_USER_LIST_FILE, \"r\", encoding=\"utf-8\") as file:\n", 150 | " lines = file.readlines()\n", 151 | " \n", 152 | " url_list = []\n", 153 | " for line in lines:\n", 154 | " info = line.strip().split(',')\n", 155 | " twitter_id = info[0].strip()\n", 156 | " url = f\"{RSS_BASE_URL}/twitter/user/{twitter_id}\"\n", 157 | " url_list.append(url)\n", 158 | " \n", 159 | " print(f\"interested users: {len(url_list)}\\n\")\n", 160 | "\n", 161 | " return url_list" 162 | ] 163 | }, 164 | { 165 | "cell_type": "markdown", 166 | "id": "04c70ecd-e6f8-4722-9eae-d8f1d137ce20", 167 | "metadata": {}, 168 | "source": [ 169 | "## Helper Function" 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "id": "710cb404-2681-4998-a66d-dc3e7b7bc4ac", 175 | "metadata": {}, 176 | "source": [ 177 | "### Cutoff Time Helper Function" 178 | ] 179 | }, 180 | { 181 | "cell_type": "code", 182 | "execution_count": 6, 183 | "id": "e4aabe42-af9f-4a7d-a371-b0df6ecf3e5c", 184 | "metadata": { 185 | "tags": [] 186 | }, 187 | "outputs": [], 188 | "source": [ 189 | "def read_cutoff_time():\n", 190 | " \"\"\"\n", 191 | " record the newest time at the bottom\n", 192 | " return: datetime object\n", 193 | " \"\"\"\n", 194 | " try:\n", 195 | " with open(CUTOFF_TIME_FILE, \"r\") as f:\n", 196 | " lines = f.readlines()\n", 197 | " except FileNotFoundError:\n", 198 | " return use_yesterday_as_cutoff()\n", 199 | " \n", 200 | " if len(lines) == 0:\n", 201 | " return use_yesterday_as_cutoff()\n", 202 | " \n", 203 | " # in case you opened this file and hit some enters\n", 204 | " stripped_lines = [line for line in lines if len(line.strip()) > 0]\n", 205 | " if 
len(stripped_lines) == 0:\n", 206 | " return use_yesterday_as_cutoff()\n", 207 | " \n", 208 | " cutoff_time = stripped_lines[-1]\n", 209 | " \n", 210 | " # print(f\"read, cutoff time is {cutoff_time}\")\n", 211 | " \n", 212 | " try:\n", 213 | " # must use the format we defined, strictly\n", 214 | " time_converted = parser.parse(cutoff_time)\n", 215 | " except:\n", 216 | " raise\n", 217 | " \n", 218 | " # print(f\"read, time converted is {time_converted}\")\n", 219 | " \n", 220 | " return time_converted\n", 221 | "\n", 222 | "def use_yesterday_as_cutoff():\n", 223 | " \"\"\"\n", 224 | " On the first run, fetch content from some time back: one hour ago by default; one day ago, etc., can be customized\n", 225 | " Take care of timezone for international twitter users\n", 226 | " \"\"\"\n", 227 | " local_tz = tz.tzlocal()\n", 228 | " now = datetime.now(local_tz)\n", 229 | " # fetch contents from 1 hour ago\n", 230 | " # or 1 day ago, etc\n", 231 | " cutoff_start = now - timedelta(hours=1)\n", 232 | " write_cutoff_time(cutoff_start)\n", 233 | " return cutoff_start\n", 234 | "\n", 235 | "def write_cutoff_time(cutoff_time):\n", 236 | " \"\"\"\n", 237 | " time: str or datetime object\n", 238 | " return: None\n", 239 | " writes time_str to the file\n", 240 | " \"\"\"\n", 241 | " if isinstance(cutoff_time, str):\n", 242 | " # test if format is correct\n", 243 | " try:\n", 244 | " # if is str and with correct format\n", 245 | " # print(f\"cutoff time in write, is str, is {cutoff_time}\")\n", 246 | " time_converted = parser.parse(cutoff_time)\n", 247 | " except:\n", 248 | " raise\n", 249 | " \n", 250 | " time_str = cutoff_time\n", 251 | " \n", 252 | " elif isinstance(cutoff_time, datetime):\n", 253 | " # must be timezone aware\n", 254 | " # already checked this, able to print out timezone, if input has tz\n", 255 | " TIME_RECORD_FORMAT=\"%Y-%m-%d %H:%M:%S %Z\"\n", 256 | " time_str = cutoff_time.strftime(TIME_RECORD_FORMAT)\n", 257 | " else:\n", 258 | " raise TypeError(\"not str or datetime.datetime\")\n", 259 | " \n", 260 | " # print(f\"writing, time str is {time_str}\")\n",
" \n", 262 | " # overwrite everything in the file\n", 263 | " with open(CUTOFF_TIME_FILE, \"w\") as f:\n", 264 | " f.write(time_str)\n" 265 | ] 266 | }, 267 | { 268 | "cell_type": "markdown", 269 | "id": "843740d1-c34f-4e2f-bd2a-9fd46227939e", 270 | "metadata": {}, 271 | "source": [ 272 | "### Time Format Helper Function" 273 | ] 274 | }, 275 | { 276 | "cell_type": "code", 277 | "execution_count": 7, 278 | "id": "9cfb64ab-9461-4cbf-ac91-b7a4da925dab", 279 | "metadata": { 280 | "tags": [] 281 | }, 282 | "outputs": [], 283 | "source": [ 284 | "def twitter_rss_time_converter(datetime_string:str) -> datetime:\n", 285 | " aware_datetime = parser.parse(datetime_string)\n", 286 | " return aware_datetime" 287 | ] 288 | }, 289 | { 290 | "cell_type": "markdown", 291 | "id": "d1f87441-e373-46a3-9aaf-06214455eb90", 292 | "metadata": {}, 293 | "source": [ 294 | "### Filter Function" 295 | ] 296 | }, 297 | { 298 | "cell_type": "code", 299 | "execution_count": 8, 300 | "id": "632d8a8c-06ed-488a-afd2-aa29aedbf3a8", 301 | "metadata": { 302 | "tags": [] 303 | }, 304 | "outputs": [], 305 | "source": [ 306 | "def filter_sort_twitter_entries(entries):\n", 307 | " cutoff_time = read_cutoff_time()\n", 308 | "\n", 309 | " # filtered_entries = list(filter(lambda x: twitter_rss_time_converter(x['published']) > cutoff_time, entries))\n", 310 | " # or simpler\n", 311 | " \n", 312 | " filtered_entries = [x for x in entries if twitter_rss_time_converter(x['published']) > cutoff_time]\n", 313 | " \n", 314 | " print(f\"Cutoff Time: {cutoff_time}\\n\")\n", 315 | " # print([x.published for x in filtered_entries])\n", 316 | " if len(filtered_entries) > 0:\n", 317 | " print(f\"After Filter, {len(filtered_entries)} items will be sent to bot.\\n\")\n", 318 | " \n", 319 | " sorted_results = sorted(filtered_entries, key=lambda x: twitter_rss_time_converter(x['published']), reverse=True)\n", 320 | " return sorted_results\n" 321 | ] 322 | }, 323 | { 324 | "cell_type": "markdown", 325 | "id": 
"03592da4-4679-42d7-aa76-c4fb809f03d7", 326 | "metadata": {}, 327 | "source": [ 328 | "### Logging" 329 | ] 330 | }, 331 | { 332 | "cell_type": "code", 333 | "execution_count": 9, 334 | "id": "7ce844df-1c72-45a0-ad10-5f03ce40cc1d", 335 | "metadata": {}, 336 | "outputs": [], 337 | "source": [ 338 | "def log_time():\n", 339 | " \"\"\"\n", 340 | " todo, might log more info\n", 341 | " \"\"\"\n", 342 | " with open(LOG_FILE, \"a\") as f:\n", 343 | " f.write(f\"now is {datetime.now()}; newest cutoff time is {newest_time_str}\\n\")\n" 344 | ] 345 | }, 346 | { 347 | "cell_type": "markdown", 348 | "id": "03c9826d-179e-4e10-9037-ddeddfb325cb", 349 | "metadata": {}, 350 | "source": [ 351 | "## Async Fetching URLs" 352 | ] 353 | }, 354 | { 355 | "cell_type": "markdown", 356 | "id": "7405810b-948c-4cfd-9530-0b45d1539639", 357 | "metadata": {}, 358 | "source": [ 359 | "### Async Fetch" 360 | ] 361 | }, 362 | { 363 | "cell_type": "code", 364 | "execution_count": 10, 365 | "id": "9ce46e92-ba1c-4761-a091-4727e8d9c00d", 366 | "metadata": { 367 | "tags": [] 368 | }, 369 | "outputs": [ 370 | { 371 | "name": "stdout", 372 | "output_type": "stream", 373 | "text": [ 374 | "IPython autoawait is `on`, and set to use `asyncio`\n" 375 | ] 376 | } 377 | ], 378 | "source": [ 379 | "# with this, able to run event loop in Jupyter\n", 380 | "%autoawait\n", 381 | "\n", 382 | "async def fetch(session, url):\n", 383 | " async with session.get(url) as response:\n", 384 | " return await response.text()\n", 385 | "\n", 386 | "async def fetch_all(urls):\n", 387 | " async with aiohttp.ClientSession() as session:\n", 388 | " tasks = []\n", 389 | " for url in urls:\n", 390 | " task = asyncio.ensure_future(fetch(session, url))\n", 391 | " tasks.append(task)\n", 392 | " responses = await asyncio.gather(*tasks)\n", 393 | " return responses\n", 394 | " \n", 395 | "async def fetch_twitter_entries():\n", 396 | " # print(f\"async, start fetching\\n\")\n", 397 | " urls = read_twitter_user_url_list()\n", 398 | " 
\n", 399 | " print(f\"Fetching ...\\n\")\n", 400 | "\n", 401 | " responses = await fetch_all(urls)\n", 402 | " # print(f\"After Async Running, Fetched Users: {len(responses)}\\n\")\n", 403 | " # print(f\"Fetched Users: {len(responses)}\\n\")\n", 404 | "\n", 405 | " entries = []\n", 406 | " for i, response in enumerate(responses):\n", 407 | " feed = feedparser.parse(response)\n", 408 | " entries += feed.entries\n", 409 | " \n", 410 | " return entries\n", 411 | "\n" 412 | ] 413 | }, 414 | { 415 | "cell_type": "markdown", 416 | "id": "ea05c8f4-a146-4ed9-8cd5-04e08f619f57", 417 | "metadata": {}, 418 | "source": [ 419 | "## Format Telegram Bot Messages" 420 | ] 421 | }, 422 | { 423 | "cell_type": "markdown", 424 | "id": "db5eb933-7b3a-4af6-8b25-684798741170", 425 | "metadata": {}, 426 | "source": [ 427 | "### Bot Message Formatting" 428 | ] 429 | }, 430 | { 431 | "cell_type": "code", 432 | "execution_count": 11, 433 | "id": "cc7d0dfe-835b-44e9-b95f-f3fa1be2fa15", 434 | "metadata": { 435 | "tags": [] 436 | }, 437 | "outputs": [], 438 | "source": [ 439 | "def bot_message_from_entrie(item):\n", 440 | " author = item[\"author\"]\n", 441 | " title = item[\"title\"]\n", 442 | " link = item[\"link\"]\n", 443 | " pub_date_parsed = parser.parse(item[\"published\"])\n", 444 | " description = parse_html_from_rss(item[\"description\"])\n", 445 | "\n", 446 | " message = (\n", 447 | " f\"{author} {pub_date_parsed}\\n\"\n", 448 | " f\"{description}\\n\" \n", 449 | " f\"{link}\"\n", 450 | " )\n", 451 | " \n", 452 | " return message\n", 453 | "\n", 454 | "def parse_html_from_rss(description_html):\n", 455 | " soup = BeautifulSoup(description_html, 'html.parser')\n", 456 | " # Convert div with class rsshub-quote\n", 457 | " rsshub_quotes = soup.find_all('div', class_='rsshub-quote')\n", 458 | " for rsshub_quote in rsshub_quotes:\n", 459 | " rsshub_quote.string = f\"\\n> {rsshub_quote.get_text(separator=' ', strip=True)}\\n\\n\"\n", 460 | "\n", 461 | " for br in soup.find_all('br'):\n", 
462 | " br.replace_with('\\n')\n", 463 | "\n", 464 | " description = \"\\n\".join(soup.stripped_strings)\n", 465 | " \n", 466 | " return description\n" 467 | ] 468 | }, 469 | { 470 | "cell_type": "markdown", 471 | "id": "c8643f16-c753-47d8-b4c4-ff9ac7c0785c", 472 | "metadata": {}, 473 | "source": [ 474 | "## Filter Content and Send Messages" 475 | ] 476 | }, 477 | { 478 | "cell_type": "markdown", 479 | "id": "c7bc8ffc-4cb3-4261-acf1-9c60ae1f686f", 480 | "metadata": {}, 481 | "source": [ 482 | "### Filter Twitter Entries" 483 | ] 484 | }, 485 | { 486 | "cell_type": "code", 487 | "execution_count": 12, 488 | "id": "7cbb5b8a-4f27-4569-a07f-c47a28ae55f5", 489 | "metadata": { 490 | "tags": [] 491 | }, 492 | "outputs": [], 493 | "source": [ 494 | "async def telegram_message_list_to_send():\n", 495 | " \n", 496 | " # todo, maybe log twitter account as well\n", 497 | " log_time()\n", 498 | " \n", 499 | " entries = await fetch_twitter_entries()\n", 500 | " print(f\"Fetched twitter: {len(entries)}\\n\")\n", 501 | " \n", 502 | " filtered_entries = filter_sort_twitter_entries(entries)\n", 503 | " \n", 504 | " if len(filtered_entries) > 0:\n", 505 | " global newest_time_str\n", 506 | " newest_time_str = filtered_entries[0]['published']\n", 507 | " message_list = [bot_message_from_entrie(x) for x in filtered_entries]\n", 508 | " return message_list\n", 509 | " else:\n", 510 | " # print(f\"no new twitter entry\")\n", 511 | " return []\n" 512 | ] 513 | }, 514 | { 515 | "cell_type": "markdown", 516 | "id": "e7bda862-9b93-40b4-8351-506b40381586", 517 | "metadata": {}, 518 | "source": [ 519 | "### Send Bot Message" 520 | ] 521 | }, 522 | { 523 | "cell_type": "code", 524 | "execution_count": 13, 525 | "id": "32e8f495-562d-4d0e-9b73-d3e335f69098", 526 | "metadata": { 527 | "tags": [] 528 | }, 529 | "outputs": [], 530 | "source": [ 531 | "async def send_to_telegram_bot():\n", 532 | " bot = Bot(\n", 533 | " token=TOKEN,\n", 534 | " base_url=TELEGRAM_API_BASE_URL,\n", 535 | " )\n", 536 | 
" \n", 537 | " ml = await telegram_message_list_to_send()\n", 538 | " \n", 539 | " sleep_time_msg = f\"Now sleep time, next run will be after {wait_interval} seconds.\\n\"\n", 540 | " \n", 541 | " if len(ml) == 0:\n", 542 | " # nothing to do\n", 543 | " print(f\"No Messages to Send.\\n\")\n", 544 | " print(sleep_time_msg)\n", 545 | " return\n", 546 | " \n", 547 | " print(f\"Sending Telegram Bot Messages\\n\")\n", 548 | " \n", 549 | " # todo, record to log file and later send to AI\n", 550 | "\n", 551 | " global newest_time_str\n", 552 | " \n", 553 | " try:\n", 554 | " for message in ml:\n", 555 | " bot.send_message(\n", 556 | " chat_id=target_chat_id, \n", 557 | " text=message,\n", 558 | " timeout=10,\n", 559 | " ) \n", 560 | " \n", 561 | " # update cutoff time\n", 562 | " write_cutoff_time(newest_time_str)\n", 563 | " print(f\"Cutoff time updated to: {newest_time_str}\\n\")\n", 564 | " print(sleep_time_msg)\n", 565 | "\n", 566 | " except:\n", 567 | " raise\n", 568 | " \n" 569 | ] 570 | }, 571 | { 572 | "cell_type": "markdown", 573 | "id": "d35e539b-490a-4eb3-83c8-f205380a6029", 574 | "metadata": {}, 575 | "source": [ 576 | "## Task Management" 577 | ] 578 | }, 579 | { 580 | "cell_type": "markdown", 581 | "id": "2b435f83-7b53-4439-bb02-a1a526acef92", 582 | "metadata": {}, 583 | "source": [ 584 | "### Start Task" 585 | ] 586 | }, 587 | { 588 | "cell_type": "code", 589 | "execution_count": 14, 590 | "id": "bf780865-6fa7-4216-ab55-9a8746a91f9e", 591 | "metadata": { 592 | "tags": [] 593 | }, 594 | "outputs": [], 595 | "source": [ 596 | "# Here in Jupyter-lab, do not use asyncio.run\n", 597 | "# this will have conflict with Jupyter\n", 598 | "# use %autoawait is the solution\n", 599 | "\n", 600 | "cancel_event = asyncio.Event()\n", 601 | "\n", 602 | "async def main(cancel_event): \n", 603 | " try:\n", 604 | " while not cancel_event.is_set():\n", 605 | " await send_to_telegram_bot()\n", 606 | " await asyncio.sleep(wait_interval)\n", 607 | " except 
asyncio.CancelledError:\n", 608 | " print(\"Coroutine cancelled.\")\n", 609 | " finally:\n", 610 | " print(\"Coroutine stopped. Program finished.\")\n" 611 | ] 612 | }, 613 | { 614 | "cell_type": "markdown", 615 | "id": "ff625652-e2a2-420a-9803-b1a95491e94a", 616 | "metadata": {}, 617 | "source": [ 618 | "### Cancel Task" 619 | ] 620 | }, 621 | { 622 | "cell_type": "markdown", 623 | "id": "7429e592-fd38-4c34-814f-8f0fab737f7d", 624 | "metadata": {}, 625 | "source": [ 626 | "- Press Enter in the input box that appears below to stop the program\n", 627 | "- Alternatively, type any characters and then press Enter; that also stops it" 628 | ] 629 | }, 630 | { 631 | "cell_type": "code", 632 | "execution_count": 15, 633 | "id": "477ed0ac-7c86-4d3f-ac87-15252b59dcd4", 634 | "metadata": { 635 | "tags": [] 636 | }, 637 | "outputs": [ 638 | { 639 | "name": "stdout", 640 | "output_type": "stream", 641 | "text": [ 642 | "Fetching Content ...\n", 643 | "\n", 644 | "Fetched Users: 111\n", 645 | "\n", 646 | "Fetched Entries: 1980\n", 647 | "\n", 648 | "Cutoff Time: 2023-04-18 11:04:09+00:00\n", 649 | "\n", 650 | "After Filter, 417 items will be sent to bot.\n", 651 | "\n", 652 | "Sending Telegram Bot Messages\n", 653 | "\n", 654 | "Cutoff time updated to: Wed, 19 Apr 2023 06:23:25 GMT\n", 655 | "\n", 656 | "Now sleep time, next run will be after 3600 seconds.\n", 657 | "\n" 658 | ] 659 | }, 660 | { 661 | "name": "stdin", 662 | "output_type": "stream", 663 | "text": [ 664 | " \n" 665 | ] 666 | }, 667 | { 668 | "name": "stdout", 669 | "output_type": "stream", 670 | "text": [ 671 | "Coroutine cancelled.\n", 672 | "Coroutine stopped. 
程序已结束.\n" 673 | ] 674 | } 675 | ], 676 | "source": [ 677 | "async def cancel_on_keypress(task):\n", 678 | " # print(\"Press Enter to cancel the task.\")\n", 679 | " await asyncio.to_thread(input)\n", 680 | " task.cancel()\n", 681 | "\n", 682 | "task = asyncio.create_task(main(cancel_event))\n", 683 | "cancel_task = asyncio.create_task(cancel_on_keypress(task))\n", 684 | "\n", 685 | "try:\n", 686 | " await asyncio.gather(task, cancel_task, return_exceptions=True)\n", 687 | "except asyncio.CancelledError:\n", 688 | " pass" 689 | ] 690 | } 691 | ], 692 | "metadata": { 693 | "kernelspec": { 694 | "display_name": "Python 3 (ipykernel)", 695 | "language": "python", 696 | "name": "python3" 697 | }, 698 | "language_info": { 699 | "codemirror_mode": { 700 | "name": "ipython", 701 | "version": 3 702 | }, 703 | "file_extension": ".py", 704 | "mimetype": "text/x-python", 705 | "name": "python", 706 | "nbconvert_exporter": "python", 707 | "pygments_lexer": "ipython3", 708 | "version": "3.11.3" 709 | }, 710 | "toc-autonumbering": true, 711 | "toc-showmarkdowntxt": false 712 | }, 713 | "nbformat": 4, 714 | "nbformat_minor": 5 715 | } 716 | -------------------------------------------------------------------------------- /twitter_list.txt: -------------------------------------------------------------------------------- 1 | FinanceYF5,Will 2 | sama,Sam Altman 3 | AlecRad,Alec Radford 4 | gdb,Greg Brockman 5 | ilyasut,Ilya Sutskever 6 | woj_zaremba,Wojciech Zaremba 7 | johnschulman2,John Schulman 8 | janleike,Jan Leike 9 | karpathy,Andrej Karpathy 10 | miramurati,Mira Murati 11 | lilianweng,Lilian Weng 12 | merettm,Jakub Pachocki 13 | OfficialLoganK,Logan.GPT 14 | bobmcgrewai,BOB MCGREW 15 | dswillner,DAVE WILLNER 16 | markchen90,MARK CHEN 17 | c_berner,CHRISTOPHER BERNER 18 | Miles_Brundage,MILES BRUNDAGE 19 | longouyang,LONG OUYANG 20 | crlfq,Ivan Zhang 21 | nickfrosst,Nick Frosst 22 | ashVaswani,Ashish Vaswani 23 | NoamShazeer,Noam Shazeer 24 | nikiparmar09,Niki Parmar 25 | 
kyosu,Jakob Uszkoreit 26 | aidangomezzz,Aidan Gomez 27 | lukaszkaiser,Lukasz Kaiser 28 | ilblackdragon,Illia Polosukhin 29 | dan_defr,Daniel De Freitas 30 | ThoppilanRomal,Romal Thoppilan 31 | annadgoldie,Anna Goldie 32 | colinraffel,Colin Raffel 33 | sharan0909,Sharan Narang 34 | jacob_devlin,Jacob Devlin 35 | Azaliamirh,Azalia Mirhoseini 36 | ylecun,Yann LeCun 37 | JeffDean,Jeff Dean 38 | DrJimFan,Jim Fan 39 | oran_ge,orange.ai 40 | dotey,宝玉 41 | bearbig,Bear Liu 42 | BaibanbaoNet,白板报 43 | wanglei001,Kenshin 44 | OwenYoungZh,Owen 45 | thinkingjimmy,JimmyWong 46 | decohack,viggo 47 | op7418,歸藏 48 | xicilion,响马 49 | WuPingJu,P.J. Wu 吳秉儒 50 | yetone,yetone 51 | nash_su,nash_su 52 | novoreorx,Reorx 53 | nishuang,倪爽 54 | hzlzh,自力hzlzh 55 | mr_easonyang,Eason Yang 56 | luoleiorg,luolei 57 | yihong0618,yihong0618 58 | jesselaunz,紐村遁一子 59 | Cydiar404,𝗖𝘆𝗱𝗶𝗮𝗿 60 | lxfater,铁锤人 61 | mtrainier2020,雷尼尔 62 | mranti,Michael Anti 63 | JourneymanChina,Journeyman 64 | lewangdev,lewang 65 | vikingmute,Viking 66 | vista8,向阳乔木 67 | browserdotsys 68 | two_dukes 69 | StabilityAI 70 | AndyChenML 71 | Meta_RealityLabs 72 | BrownUniversity 73 | mezaoptimizer 74 | DannyDriess 75 | GoogleAI 76 | sohamxsarkar 77 | brickroad7 78 | SamWolfstone 79 | _jasonwei 80 | MatthewJBar 81 | MJBPredictions 82 | natalia__coelho 83 | dwarkesh_sp 84 | MoritzW42 85 | andriy_mulyar 86 | nearcyan 87 | full_stack_dl 88 | geoffreyhinton 89 | woj_zaremba 90 | charles_irl 91 | weights_biases 92 | Redwood_Neuro 93 | atroyn 94 | trychroma 95 | maithra_raghu 96 | Samaya_AI 97 | entirelyuseles 98 | realGeorgeHotz 99 | DhruvBatraDB 100 | digi_literacy 101 | johnjnay 102 | CodeXStanford 103 | ggerganov 104 | mckaywrigley 105 | CodewandAI 106 | ChatbotUI 107 | ESYudkowsky 108 | _akhaliq 109 | Gradio_HuggingFace 110 | ilyasut 111 | connerruhl -------------------------------------------------------------------------------- /twitter_list_debug.txt: 
-------------------------------------------------------------------------------- 1 | OfficialLoganK,Logan.GPT 2 | bobmcgrewai,BOB MCGREW 3 | dswillner,DAVE WILLNER 4 | markchen90,MARK CHEN 5 | c_berner,CHRISTOPHER BERNER 6 | Miles_Brundage,MILES BRUNDAGE 7 | longouyang,LONG OUYANG 8 | crlfq,Ivan Zhang 9 | nickfrosst,Nick Frosst 10 | ashVaswani,Ashish Vaswani 11 | NoamShazeer,Noam Shazeer 12 | nikiparmar09,Niki Parmar 13 | kyosu,Jakob Uszkoreit 14 | aidangomezzz,Aidan Gomez --------------------------------------------------------------------------------