├── .dockerignore
├── Dockerfile
├── LICENSE
├── README.md
├── README_ja.md
├── docker-compose.yml
├── img
│   ├── codescraper-1.png
│   └── codescraper-2.png
├── master
│   ├── log
│   │   └── .gitignore
│   ├── master_post.py
│   ├── plugins
│   │   ├── __init__.py
│   │   ├── edit_conf_db.py
│   │   └── getCommand.py
│   ├── run.py
│   ├── search_api.py
│   ├── settings
│   │   └── .gitignore
│   ├── slackbot_settings.py.sample
│   └── startbot.sh
└── requirements
/.dockerignore:
--------------------------------------------------------------------------------
1 | .git/
2 | master/__pycache__
3 | master/plugins/__pycache__
4 | img/*
5 | LICENSE
6 | README*
7 |
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM python:3.7-alpine
2 | MAINTAINER blue1616
3 |
4 | COPY requirements /root/requirements
5 |
6 | RUN apk upgrade --no-cache && \
7 | apk add --no-cache build-base && \
8 | apk add --no-cache libxml2-dev libxslt-dev && \
9 | pip install -r /root/requirements && \
10 | apk del build-base
11 |
12 | RUN addgroup -g 1000 codescraper && \
13 | adduser -D -u 1000 -G codescraper codescraper && \
14 | mkdir -p /home/codescraper/
15 |
16 | COPY ./master /home/codescraper/master
17 | RUN chown -R codescraper:codescraper /home/codescraper && \
18 | chmod +x /home/codescraper/master/startbot.sh
19 |
20 | USER codescraper
21 | WORKDIR /home/codescraper/master
22 |
23 | #CMD ["python", "/home/codescraper/master/run.py"]
24 | CMD ["/home/codescraper/master/startbot.sh"]
25 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2018 blueblue
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # CodeScraper
2 |
3 | ## Description
4 | CodeScraper is a Slackbot that searches sites such as GitHub, GitLab, and Pastebin for pre-registered keywords and notifies you when it finds new results.
5 |
6 | The following functions are currently implemented.
7 |
8 | |Function|Description|Notes|
9 | |---|---|---|
10 | |github|Search GitHub and find new repositories||
11 | |gist|Search GitHub and find new Gists||
12 | |github_code|Run a GitHub code search<br>Results depend on GitHub's indexing|GitHub API token is required|
13 | |gitlab|Search GitLab and find new projects||
14 | |gitlab_snippet|Scrape new GitLab Snippets posts|Regular expressions are supported|
15 | |google_custom|Search with Google Custom Search and find web pages|Create a search engine and set your Engine ID and API token<br>The free Google API is limited to 100 requests per day|
16 | |pastebin|Scrape Pastebin with the Scraping API|Pastebin PRO account (paid) is required<br>Regular expressions are supported|
17 | |rss_feed|Get RSS feeds<br>Feeds can be filtered by specific words||
18 | |twitter|Search Twitter and find new tweets||
19 |
20 | ## Requirements
21 | CodeScraper is designed to run with Docker.
22 |
23 | Otherwise you need MongoDB, Python 3, and the following libraries:
24 | - slackbot (https://github.com/lins05/slackbot)
25 | - lxml
26 | - crontab
27 | - feedparser
28 | - python-dateutil
29 | - pymongo
30 | - pyquery
31 |
32 | ## Install
33 | Using Docker
34 |
35 | ```sh
36 | docker-compose build
37 | ```
38 |
39 | Without using Docker
40 |
41 | ```sh
42 | pip3 install -r requirements
43 | ```
44 |
45 | ## Usage
46 | ### Run CodeScraper
47 | CodeScraper needs a slackbot_settings.py file.
48 | Create it from the sample and edit it.
49 |
50 | ```sh
51 | mv ./master/slackbot_settings.py.sample ./master/slackbot_settings.py
52 | vim ./master/slackbot_settings.py
53 | ```
54 |
55 | Edit your config file.
56 | - Required settings
57 |   - API_TOKEN (Line 6)
58 |     - Log in to Slack and access [here](https://my.slack.com/services/new/bot) to create your Slackbot.
59 |     - Set your Slackbot API token.
60 |   - channels (Line 17-19)
61 |     - Set the Slack channels that your Slackbot joins.
62 |     - At least one channel must be listed.
63 |   - Enable Targets (Line 90-99)
64 |     - Select whether to enable each search target (True|False).
65 |     - Separate settings are required to activate the following targets:
66 |       - github_code: github_access_token must be set
67 |       - pastebin: a Pastebin PRO account (paid) and a static IP are required. You have to whitelist your static IP to use the Scraping API
68 |       - google_custom: google_custom_api_key and google_custom_search_engine_id must be set
69 |   - default_channel (Line 22-30)
70 |     - The channel that receives the search results of each target.
71 |     - It can be changed for each search keyword.
72 |     - If a string not listed in 'channels' is set, the first channel in 'channels' is used.
73 |
74 | - Optional settings
75 |   - github_access_token (Line 81)
76 |     - Required if github_code is enabled
77 |     - Get it from [here](https://github.com/settings/tokens)
78 |   - google_custom_api_key (Line 86)
79 |     - Required if google_custom is enabled
80 |     - Get it from [here](https://console.developers.google.com/)
81 |   - google_custom_search_engine_id (Line 87)
82 |     - Required if google_custom is enabled
83 |     - Get it from [here](https://console.developers.google.com/)
84 |   - Interval (Line 102-113)
85 |     - Set the search schedule for each search target
86 |     - Use crontab format
87 |   - default_settings (Line 33-77)
88 |     - Default settings of each target
89 |     - The contents of each item are shown in the table below
90 |
91 | |Item|Description|
92 | |---|---|
93 | |Enable|Whether the keyword is enabled (True or False)|
94 | |SearchLevel|Search range for github (1 to 4) or github_code (1 or 2). Larger numbers give more results|
95 | |Time_Range|Search time range in days for github or gist. Items created within the set number of days before the search are included|
96 | |Expire_date|Keyword expiration. The expiration date is the set number of days after the registration date. Expired keywords are disabled|
97 | |Exclude_list|Notification exclusion list. You do not need to set this because the scripts update it automatically|
98 | |Channel|The channel to be notified|
99 |
100 | Run with docker-compose.
101 |
102 | ```sh
103 | docker-compose up -d
104 | ```
105 |
106 | After a successful launch, Slack receives the following notification:
107 | > ---CodeScraper Slackbot Started---
108 | >
109 | > github : SUCCESS : Started
110 | > gist : SUCCESS : Started
111 | > gitlab : SUCCESS : Started
112 |
113 | ### CodeScraper Commands
114 | Search keywords are managed through commands sent to the Slackbot.
115 |
116 | To start, the following command displays the help:
117 |
118 | ```
119 | @{Slackbot name} help:
120 | ```
121 |
122 | ```
123 | Command Format is Following:
124 | {Command}: {target}; {arg1}; {arg2}; ...
125 |
126 | Command List:
127 |
128 | 'setKeyword: target; [word]' Add [word] as New Search Keyword with Default Settings.
129 | (abbreviation=setK:)
130 | 'removeKeyword: target; [index]' Remove the Search Keyword indicated by [index].
131 | (abbreviation=removeK:)
132 | 'enableKeyword: target; [index]' Enable the Search Keyword indicated by [index].
133 | (abbreviation=enableK:)
134 | 'disableKeyword: target; [index]' Disable the Search Keyword indicated by [index].
135 | (abbreviation=disableK:)
136 | 'setSearchLevel: target; [index]' Set Search Level of Github Search (1:easily 2:) indicated by [index]. It is used in github and github_code.
137 | (abbreviation=setSL:)
138 | 'setExpireDate: target; [index]; [expiration date]' Set a Expiration Date of the Keyword indicated by [index]. [expiration date] Format is YYYY-mm-dd.
139 | (abbreviation=setED:)
140 | 'setChannel: target; [index];[channel]' Set channel to notify the Search Keyword's result.
141 | (abbreviation=setC:)
142 | 'getKeyword: target;' Listing Enabled Search Keywords.
143 | (abbreviation=getK:)
144 | 'getAllKeyword: target;' Listing All Search Keyword (include Disabled Keywords).
145 | (abbreviation=getAllK:)
146 | 'getSearchSetting: target; [index]' Show Setting of the Search Keyword indicated by [index].
147 | (abbreviation=getSS:)
148 |
149 | 'reMatchTest: target; [index]; [text]' Check whether the pattern indicated by [index] in [target] matches [text]. If set pattern to Pastebin ID, check the contents of pastebin.
150 | 'setFeed: [name]; [url]' Add RSS Feed to [url] as [name].
151 | (abbreviation=setF:)
152 | 'setFeedFilter: [name]; [filter]' Add new RSS Feed Filter. Notify only contains filter words.
153 | (abbreviation=setFF:)
154 | 'editFeedFilter: [name]; [index]; filter' Edit Feed Filter indicated by [index] in RSS Feed of [name].
155 | (abbreviation=editFF:)
156 | 'removeFeedFilter: [name]; [index];' Remove Feed Filter indicated by [index] in RSS Feed of [name].
157 | (abbreviation=removeFF:)
158 | 'setTwitterQuery: [query]; ([users];)' Set [query] with Default Settings. If set [users], notify only from these users.
159 | (abbreviation=setTQ:)
160 | 'editTwitterQuery: [index]; [query]; ([users];)' Edit Twitter Query indicated by [index].
161 | (abbreviation=editTQ:)
162 | 'addUserTwitterQuery: [index]; [users];' Add User to Twitter Query indicated by [index]. That query notify only from these users.
163 | (abbreviation=addUserTQ:)
164 | 'removeTwitterQuery: [index];' Remove Twitter Query indicated by [index].
165 | (abbreviation=removeTQ:)
166 |
167 | 'help:' Show this Message.
168 |
169 | Target:
170 | github
171 | gist
172 | github_code
173 | gitlab
174 | gitlab_snippet (Use RE match)
175 | google_custom
176 | pastebin (Use RE match)
177 | rss_feed
178 | twitter
179 | ```
180 |
181 | Register keywords by sending commands to the Slackbot.
182 |
183 | The commands are listed below.
184 | See the 'help:' command for specific usage.
185 |
186 | |Command|Description|Search targets|
187 | |---|---|---|
188 | |setKeyword:|Register new search keyword|github, gist, github_code, gitlab, gitlab_snippet, google_custom, pastebin|
189 | |removeKeyword:|Remove specified search keyword|github, gist, github_code, gitlab, gitlab_snippet, google_custom, pastebin, twitter|
190 | |enableKeyword:|Enable specified search keyword|all|
191 | |disableKeyword:|Disable specified search keyword|all|
192 | |setSearchLevel:|Set Search Range of specified search keyword|github, github_code|
193 | |setExpireDate:|Set expiration date of specified search keyword|github, gist, github_code, gitlab, gitlab_snippet, google_custom, pastebin|
194 | |setChannel:|Set channel to notify of specified search keyword|all|
195 | |getKeyword:|Display lists of Enabled Keywords|all|
196 | |getAllKeyword:|Display lists of all registered keyword|all|
197 | |getSearchSetting:|Display current settings of specified search keyword|all|
198 | |setFeed:|Set new feed to rss_feed|-|
199 | |setFeedFilter:|Set new Feed Filter|-|
200 | |editFeedFilter:|Edit specified Feed Filter|-|
201 | |removeFeedFilter:|Remove specified Feed Filter|-|
202 | |setTwitterQuery:|Set new twitter search query|-|
203 | |editTwitterQuery:|Edit specified twitter search query|-|
204 | |addUserTwitterQuery:|Add user criteria to specified twitter search query|-|
205 | |removeTwitterQuery:|Remove specified twitter search query|-|
206 | |help:|Display help|-|
207 |
208 | An index is assigned to each search keyword. To change a keyword's settings, specify its index.
209 | The index of each keyword is shown when the keyword is registered and in the output of the 'getKeyword:' command.
210 |
211 | 
212 |
213 | Notification when a new GitHub repository is found
214 |
215 | 
216 |
217 | ### Notices
218 | - Pastebin
219 |   - A Pastebin PRO account (paid) is required.
220 |   - Access [here](https://pastebin.com/doc_scraping_api) and whitelist the static IP of the host that runs CodeScraper.
221 |   - A static IP is required.
222 | - Pastebin, Gitlab Snippet
223 |   - Keywords containing symbols are searched as regular expressions (e.g. `[a-z1-7]{16}\.onion`, `example.com`). These searches are case sensitive.
224 |   - Keywords with no symbols are not case sensitive.
225 |   - Long regular expressions and patterns that take a long time to process can put load on the CPU, so avoid them.
226 | - Google Custom Search
227 |   - The free Google API has a limit of 100 requests per day.
228 |   - If you set the search interval to every 2 hours, you can register at most 8 search keywords (12 * 8 = 96 requests). Be aware that the search frequency limits the number of keywords you can register.
229 |
230 | ## Author
231 | [blueblue](https://twitter.com/piedpiper1616)
232 |
--------------------------------------------------------------------------------
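The Google Custom Search note in the README (100 free requests per day, 12 searches per day at a 2-hour interval, at most 8 keywords) can be checked with a few lines of Python. This is a minimal sketch, assuming each registered keyword costs exactly one API request per search run; the function name `max_keywords` is hypothetical and not part of CodeScraper.

```python
def max_keywords(daily_limit: int, interval_hours: int) -> int:
    """Largest keyword count whose daily request total stays within daily_limit.

    Assumption: every keyword issues one request per scheduled search run.
    """
    searches_per_day = 24 // interval_hours  # e.g. every 2 hours -> 12 runs
    return daily_limit // searches_per_day


# README example: free tier of 100 requests per day, search every 2 hours.
print(max_keywords(100, 2))
```

With the README's numbers this gives 8 keywords (12 * 8 = 96 requests); a ninth keyword would need 108 requests per day.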
/README_ja.md:
--------------------------------------------------------------------------------
1 | # CodeScraper
2 |
3 | ## Description
4 | CodeScraper is a Slackbot that searches sites such as Github, Gitlab, and Pastebin for pre-registered keywords and notifies you when it finds new results.
5 |
6 | The following functions are currently implemented.
7 |
8 | |Name|Description|Notes|
9 | |---|---|---|
10 | |github|Searches for new Github repositories||
11 | |gist|Searches for new Github Gist posts||
12 | |github_code|Runs a Github code search<br>Accuracy depends on Github's index|Github API is required|
13 | |gitlab|Searches Gitlab||
14 | |gitlab_snippet|Scrapes new Gitlab Snippets posts|Keywords can be registered as regular expressions|
15 | |google_custom|Searches with Google Custom Search|You must create a search engine beforehand and set its Engine ID and API Token<br>The free tier is limited to 100 requests per day|
16 | |pastebin|Scrapes Pastebin with the Pastebin Scraping API|A Pastebin PRO Account (paid) is required<br>Keywords can be registered as regular expressions|
17 | |rss_feed|Gets RSS feeds<br>Feeds to notify can be filtered by specific words||
18 | |twitter|Searches Twitter||
19 |
20 |
21 | ## Requirements
22 | CodeScraper is designed to run with Docker.
23 |
24 | Otherwise you need MongoDB, Python 3, and the following libraries:
25 | - slackbot (https://github.com/lins05/slackbot)
26 | - lxml
27 | - crontab
28 | - feedparser
29 | - python-dateutil
30 | - pymongo
31 | - pyquery
32 |
33 | ## Install
34 | Using Docker
35 |
36 | ```sh
37 | docker-compose build
38 | ```
39 |
40 | Without using Docker
41 |
42 | ```sh
43 | pip3 install -r requirements
44 | ```
45 |
46 | ## Usage
47 | ### Run CodeScraper
48 | CodeScraper needs a slackbot_settings.py file.
49 | Create this file first.
50 |
51 | ```sh
52 | mv ./master/slackbot_settings.py.sample ./master/slackbot_settings.py
53 | vim ./master/slackbot_settings.py
54 | ```
55 |
56 | Edit the configuration file.
57 | - Required settings
58 |   - API_TOKEN (Line 6)
59 |     - Log in to Slack, access [here](https://my.slack.com/services/new/bot), and create a bot
60 |     - Enter the API token of the Slack bot you created
61 |   - channels (Line 17-19)
62 |     - Specify the Slack channels the Slack bot should notify
63 |     - Have the Slack bot created above join the channels listed here beforehand
64 |     - At least one channel must be listed
65 |   - Enable Targets (Line 90-99)
66 |     - Select the search targets to enable (True|False)
67 |     - The following targets require additional settings to enable
68 |       - github_code: github_access_token must be set
69 |       - pastebin: a Pastebin PRO account (paid) and a static IP are required. After purchase, configure Pastebin to whitelist the static IP that will use the Scraping API
70 |       - google_custom: google_custom_api_key and google_custom_search_engine_id must be set
71 |   - default_channel (Line 22-30)
72 |     - The channel that receives the search results of each target
73 |     - It can also be changed for each search keyword
74 |     - If a string not listed in channels is set, the first channel in channels is used
75 |
76 | - Optional settings
77 |   - github_access_token (Line 81)
78 |     - Required if github_code is enabled
79 |     - Get it from [here](https://github.com/settings/tokens)
80 |   - google_custom_api_key (Line 86)
81 |     - Required if google_custom is enabled
82 |     - Get it from [here](https://console.developers.google.com/)
83 |   - google_custom_search_engine_id (Line 87)
84 |     - Required if google_custom is enabled
85 |     - Get it from [here](https://console.developers.google.com/)
86 |   - Interval (Line 102-113)
87 |     - Set the search schedule for each search target
88 |     - Use crontab format
89 |   - default_settings (Line 33-77)
90 |     - Default settings used when registering a keyword for each search target
91 |     - The contents of each item are shown in the table below
92 |
93 | |Item|Description|
94 | |---|---|
95 | |Enable|Whether the keyword is enabled (True or False)|
96 | |SearchLevel|Search range for the github (1 to 4) and github_code (1 or 2) targets. Larger numbers give more results|
97 | |Time_Range|Number of days in the search range for the github and gist targets. Items created within the set number of days before the search date are searched|
98 | |Expire_date|Keyword expiration. The expiration date is the set number of days after the registration date. Expired keywords are disabled automatically|
99 | |Channel|The channel to notify. No change is needed if default_channel is set|
100 |
101 | Run with docker-compose.
102 |
103 | ```sh
104 | docker-compose up -d
105 | ```
106 |
107 | After a successful launch, Slack receives a notification like the following:
108 | > ---CodeScraper Slackbot Started---
109 | >
110 | > github : SUCCESS : Started
111 | > gist : SUCCESS : Started
112 | > gitlab : SUCCESS : Started
113 |
114 | ### CodeScraper Commands
115 | Search keywords are managed through commands sent to the Slack bot.
116 |
117 | To start, sending the following command to the Slack bot displays the command help:
118 |
119 | ```
120 | @{Slackbot name} help:
121 | ```
122 |
123 | ```
124 | Command Format is Following:
125 | {Command}: {target}; {arg1}; {arg2}; ...
126 |
127 | Command List:
128 |
129 | 'setKeyword: target; [word]' Add [word] as New Search Keyword with Default Settings.
130 | (abbreviation=setK:)
131 | 'removeKeyword: target; [index]'tRemove the Search Keyword indicated by [index].
132 | (abbreviation=removeK:)
133 | 'enableKeyword: target; [index]' Enable the Search Keyword indicated by [index].
134 | (abbreviation=enableK:)
135 | 'disableKeyword: target; [index]' Disable the Search Keyword indicated by [index].
136 | (abbreviation=disableK:)
137 | 'setSearchLevel: target; [index]' Set Search Level of Github Search (1:easily 2:) indicated by [index]. It is used in github and github_code.
138 | (abbreviation=setSL:)
139 | 'setExpireDate: target; [index]; [expiration date]' Set a Expiration Date of the Keyword indicated by [index]. [expiration date] Format is YYYY-mm-dd.
140 | (abbreviation=setED:)
141 | 'setChannel: target; [index];[channel]' Set channel to notify the Search Keyword's result.
142 | (abbreviation=setC:)
143 | 'getKeyword: target;' Listing Enabled Search Keywords.
144 | (abbreviation=getK:)
145 | 'getAllKeyword: target;' Listing All Search Keyword (include Disabled Keywords).
146 | (abbreviation=getAllK:)
147 | 'getSearchSetting: target; [index]' Show Setting of the Search Keyword indicated by [index].
148 | (abbreviation=getSS:)
149 |
150 | 'reMatchTest: target; [index]; [text]' Check wheaer the pattern indicated by [index] in [target] matches [text]. If set pattern to Pastebin ID, check the contens of pastebin.
151 | 'setFeed: [name]; [url]' Add RSS Feed to [url] as [name].
152 | (abbreviation=setF:)
153 | 'setFeedFilter: [name]; [filter]' Add new RSS Feed Filter. Notily only contains filter words.
154 | (abbreviation=setFF:)
155 | 'editFeedFilter: [name]; [index]; filter' Edit Feed Filter indicated by [index] in RSS Feed of [name].
156 | (abbreviation=editFF:)
157 | 'removeFeedFilter: [name]; [index];' Remove Feed Filter indicated by [index] in RSS Feed of [name].
158 | (abbreviation=removeFF:)
159 | 'setTwitterQuery: [query]; ([users];)' Set [query] with Default Settings. If set [users], notify only from these users.
160 | (abbreviation=setTQ:)
161 | 'editTwitterQuery: [index]; [query]; ([users];)' Edit Twitter Query indicated by [index].
162 | (abbreviation=editTQ:)
163 | 'addUserTwitterQuery: [index]; [users];' Add User to Twitter Query indicated by [index]. That query notify only from these users.
164 | (abbreviation=addUserTQ:)
165 | 'removeTwitterQuery: [index];' Remove Twitter Query indicated by [index].
166 | (abbreviation=removeTQ:)
167 |
168 | 'help:' Show this Message.
169 |
170 | Target:
171 | github
172 | gist
173 | github_code
174 | gitlab
175 | gitlab_snippet (Use RE match)
176 | google_custom
177 | pastebin (Use RE match)
178 | rss_feed
179 | twitter
180 | ```
181 |
182 | Register keywords by sending commands to the Slack bot.
183 |
184 | The commands are as follows.
185 | See the help: command for specific usage.
186 |
187 | |Command|Description|Valid search targets|
188 | |---|---|---|
189 | |setKeyword:|Registers a new keyword|github, gist, github_code, gitlab, gitlab_snippet, google_custom, pastebin|
190 | |removeKeyword:|Removes the specified keyword|github, gist, github_code, gitlab, gitlab_snippet, google_custom, pastebin, twitter|
191 | |enableKeyword:|Enables the specified keyword|all|
192 | |disableKeyword:|Disables the specified keyword|all|
193 | |setSearchLevel:|Sets the search range of the specified keyword|github, github_code|
194 | |setExpireDate:|Sets the expiration date of the specified keyword|github, gist, github_code, gitlab, gitlab_snippet, google_custom, pastebin|
195 | |setChannel:|Sets the Slack channel that is notified for the specified keyword|all|
196 | |getKeyword:|Lists the registered search keywords that are enabled|all|
197 | |getAllKeyword:|Lists all registered search keywords|all|
198 | |getSearchSetting:|Shows the current settings of the specified keyword|all|
199 | |setFeed:|Registers a new feed in rss_feed|-|
200 | |setFeedFilter:|Sets a notification filter for a feed|-|
201 | |editFeedFilter:|Edits a notification filter of a feed|-|
202 | |removeFeedFilter:|Removes a notification filter of a feed|-|
203 | |setTwitterQuery:|Registers a new search query in twitter|-|
204 | |editTwitterQuery:|Edits a twitter search query|-|
205 | |addUserTwitterQuery:|Adds a user condition to a twitter search query|-|
206 | |removeTwitterQuery:|Removes a twitter search query|-|
207 | |help:|Displays help|-|
208 |
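209 | An index is assigned to each registered keyword. To change a keyword's settings, specify its index.
210 | The index of each keyword is shown when the keyword is registered and in the output of the getKeyword: command.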
211 |
212 | 
213 |
214 | Notification when a new Github repository is found
215 |
216 | 
217 |
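218 | ### Notices
219 | - Pastebin
220 |   - A Pastebin PRO Account (paid) is required
221 |   - After purchase, whitelist the IP of the host that runs CodeScraper on [this page](https://pastebin.com/doc_scraping_api)
222 |   - A static IP is required because of the whitelist registration
223 | - Pastebin, Gitlab Snippet
224 |   - Keywords containing symbols are searched as regular expressions (e.g. `[a-z1-7]{16}\.onion`, `example.com`). These searches are case sensitive
225 |   - Keywords without symbols are matched as plain keywords, ignoring case
226 |   - Long regular expressions and patterns that take a long time to process can put load on the CPU, so avoid them
227 | - Google Custom Search
228 |   - The number of requests available to a free account is limited to 100 per day
229 |   - If you search every 2 hours, 12 searches run per day. With 9 registered keywords, that is 12 * 9 = 108 requests, which exceeds the limit. Be aware that the search frequency limits the number of keywords you can register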
230 |
231 | ## Author
232 | [blueblue](https://twitter.com/piedpiper1616)
233 |
--------------------------------------------------------------------------------
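The README notes for Pastebin and Gitlab Snippet say that keywords containing symbols are treated as case-sensitive regular expressions, while keywords without symbols are matched ignoring case. The matching code itself is not part of this listing, so the following is only a sketch of that rule. It assumes "symbol" means any character other than a letter, digit, underscore, or whitespace; `has_symbols` and `keyword_matches` are hypothetical names.

```python
import re


def has_symbols(keyword: str) -> bool:
    # Assumption: a "symbol" is anything that is not a word character or whitespace.
    return re.search(r"[^\w\s]", keyword) is not None


def keyword_matches(keyword: str, text: str) -> bool:
    """Apply the README rule: regex (case sensitive) or plain match (case insensitive)."""
    if has_symbols(keyword):
        # Treated as a regular expression, so an unescaped '.' matches any character.
        return re.search(keyword, text) is not None
    # Plain keyword: compare lowercase forms to ignore case.
    return keyword.lower() in text.lower()


print(keyword_matches("password", "My PASSWORD list"))
print(keyword_matches(r"[a-z1-7]{16}\.onion", "abcdefghijklmnop.onion"))
```

One consequence of this rule is that a keyword such as `example.com` is a regular expression, so its dot also matches other characters; escape it as `example\.com` if only the literal domain should match.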
/docker-compose.yml:
--------------------------------------------------------------------------------
1 | version: "2"
2 | services:
3 |   codescraper:
4 |     image: codescraper:alpine
5 |     build: .
6 |     restart: always
7 |     environment:
8 |       DB_HOST: cs-db
9 |       DB_PORT: 27017
10 |     links:
11 |       - cs-db
12 |     volumes:
13 |       - $PWD/master/slackbot_settings.py:/home/codescraper/master/slackbot_settings.py
14 |   cs-db:
15 |     image: mongo
16 |     restart: always
17 |     volumes:
18 |       - db-contents:/data/db
19 | volumes:
20 |   db-contents:
21 |     driver: local
22 |
--------------------------------------------------------------------------------
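The compose file passes the MongoDB location to the bot through the `DB_HOST` and `DB_PORT` environment variables. How `run.py` reads them is not shown in this listing, so the snippet below is only a sketch of one way to do it. The helper name `db_address` and the fallback values `localhost` and `27017` are assumptions.

```python
import os


def db_address(env=os.environ):
    """Return (host, port) for MongoDB from the environment.

    Assumption: fall back to localhost:27017 when the variables are unset,
    for example when running outside docker-compose.
    """
    host = env.get("DB_HOST", "localhost")
    # Environment values are always strings, so convert the port explicitly.
    port = int(env.get("DB_PORT", "27017"))
    return host, port


host, port = db_address({"DB_HOST": "cs-db", "DB_PORT": "27017"})
print(host, port)
```

The resulting pair could then be handed to `setDB(host, port, db_name)` in `plugins/edit_conf_db.py`, where `db_name` is whatever database name the settings file provides.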
/img/codescraper-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/blue1616/CodeScraper/5e27c51ea645c7fb48fc3caaf290a75c8ee067ec/img/codescraper-1.png
--------------------------------------------------------------------------------
/img/codescraper-2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/blue1616/CodeScraper/5e27c51ea645c7fb48fc3caaf290a75c8ee067ec/img/codescraper-2.png
--------------------------------------------------------------------------------
/master/log/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/blue1616/CodeScraper/5e27c51ea645c7fb48fc3caaf290a75c8ee067ec/master/log/.gitignore
--------------------------------------------------------------------------------
/master/master_post.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 |
3 | from slacker import Slacker
4 | import slackbot_settings
5 | import traceback
6 |
7 | def postNewPoCFound(word, repos, channel):
8 |     url = 'https://github.com'
9 |     slack = Slacker(slackbot_settings.API_TOKEN)
10 |     try:
11 |         slack.chat.post_message(
12 |             channel,
13 |             'New Code Found about `' + word + '` at _github_',
14 |             as_user=True
15 |         )
16 |         message = ''
17 |         for r in repos:
18 |             message += url + '/' + r + '/\n'
19 |         slack.chat.post_message(
20 |             channel,
21 |             message,
22 |             as_user=True
23 |         )
24 |     except Exception:
25 |         print("Could not send slack notification.")
26 |         print(traceback.format_exc())
27 |
28 | def postAnyData(word, channel):
29 |     slack = Slacker(slackbot_settings.API_TOKEN)
30 |     try:
31 |         slack.chat.post_message(
32 |             channel,
33 |             word,
34 |             as_user=True
35 |         )
36 |     except Exception:
37 |         print("Could not send slack notification.")
38 |         print(traceback.format_exc())
39 |
40 | #if __name__ == '__main__':
41 | # slack = Slacker(slackbot_settings.API_TOKEN)
42 | # slack.chat.post_message(
43 | # 'bot_test',
44 | # 'Hello. I\'m Master',
45 | # as_user=True
46 | # )
47 |
--------------------------------------------------------------------------------
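`postNewPoCFound` builds its second Slack message by joining one GitHub URL per repository. Pulling that step into a pure function makes it possible to check the formatting without a Slack token. This is a sketch only: `build_repo_message` is a hypothetical helper, not a function in the repository, and it assumes each entry in `repos` is an `owner/name` string.

```python
def build_repo_message(repos, base_url="https://github.com"):
    """Return one URL per line, in the same format postNewPoCFound sends."""
    # Each repository becomes "<base_url>/<owner>/<name>/" followed by a newline.
    return "".join(f"{base_url}/{repo}/\n" for repo in repos)


print(build_repo_message(["octocat/hello-world", "example/poc"]), end="")
```

An empty `repos` list yields an empty string, so a caller may want to skip the second `post_message` call in that case.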
/master/plugins/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/blue1616/CodeScraper/5e27c51ea645c7fb48fc3caaf290a75c8ee067ec/master/plugins/__init__.py
--------------------------------------------------------------------------------
/master/plugins/edit_conf_db.py:
--------------------------------------------------------------------------------
1 | import json
2 | import datetime
3 | import shutil
4 | import time
5 | import re
6 | import os.path
7 | from pymongo import MongoClient
8 |
9 | client = None
10 | db = None
11 | collection = None
12 | collection_channel = None
13 |
14 | modules = [
15 | 'github',
16 | 'gist',
17 | 'github_code',
18 | 'gitlab',
19 | 'gitlab_snippet',
20 | 'google_custom',
21 | 'pastebin',
22 | 'rss_feed',
23 | 'twitter'
24 | ]
25 |
26 | github_settings = {
27 | 'Target':'github',
28 | 'Enable':True,
29 | 'SearchLevel':1,
30 | 'Time_Range':1,
31 | 'Expire_date':1,
32 | 'Exclude_list':[],
33 | 'Channel':'None',
34 | '__MOD_ENABLE__' : True,
35 | '__INITIAL__' : True,
36 | '__SAFETY__': 0
37 | }
38 |
39 | gist_settings = {
40 | 'Target':'gist',
41 | 'Enable':True,
42 | 'Time_Range':1,
43 | 'Expire_date':1,
44 | 'Exclude_list':[],
45 | 'Channel':'None',
46 | '__MOD_ENABLE__' : True,
47 | '__INITIAL__' : True,
48 | '__SAFETY__': 0
49 | }
50 |
51 | github_code_settings = {
52 | 'Target':'github_code',
53 | 'Enable':True,
54 | 'SearchLevel':1,
55 | 'Expire_date':1,
56 | 'Exclude_list':[],
57 | 'Channel':'None',
58 | '__MOD_ENABLE__' : True,
59 | '__INITIAL__' : True,
60 | '__SAFETY__': 0
61 | }
62 |
63 | gitlab_settings = {
64 | 'Target':'gitlab',
65 | 'Enable':True,
66 | 'Expire_date':1,
67 | 'Exclude_list':[],
68 | 'Channel':'None',
69 | '__MOD_ENABLE__' : True,
70 | '__INITIAL__' : True,
71 | '__SAFETY__': 0
72 | }
73 |
74 | gitlab_snippet_settings = {
75 | 'Target':'gitlab_snippet',
76 | 'Enable':True,
77 | 'Expire_date':1,
78 | 'Exclude_list':[],
79 | 'Channel':'None',
80 | '__MOD_ENABLE__' : True,
81 | '__INITIAL__' : True,
82 | '__SAFETY__': 0
83 | }
84 |
85 | pastebin_settings = {
86 | 'Target':'pastebin',
87 | 'Enable':True,
88 | 'Expire_date':1,
89 | 'Exclude_list':[],
90 | 'Channel':'None',
91 | '__MOD_ENABLE__' : True,
92 | '__INITIAL__' : True,
93 | '__SAFETY__': 0
94 | }
95 |
96 | google_custom_settings = {
97 | 'Target':'google_custom',
98 | 'Enable':True,
99 | 'Expire_date':1,
100 | 'Exclude_list':[],
101 | 'Channel':'None',
102 | '__MOD_ENABLE__' : True,
103 | '__INITIAL__' : True,
104 | '__SAFETY__': 0
105 | }
106 |
107 | rss_feed_settings = {
108 | 'Target':'rss_feed',
109 | 'Enable' : True,
110 | 'Name' : 'None',
111 | 'URL' : 'None',
112 | 'Filters' : [],
113 | 'Channel' : 'None',
114 | 'Last_Post' : {'title':'None', 'link':'None', 'timestamp':'1970-01-01 00:00:00'},
115 | '__MOD_ENABLE__' : True,
116 | '__INITIAL__' : True,
117 | '__SAFETY__': 0
118 | }
119 |
120 | twitter_settings = {
121 | 'Target':'twitter',
122 | 'Enable' : True,
123 | 'Query' : 'None',
124 | 'Users' : [],
125 | 'Channel' : 'None',
126 | 'Last_Post' : {'user':'None', 'text':'None', 'id':'None', 'link':'None'},
127 | '__MOD_ENABLE__' : True,
128 | '__INITIAL__' : True,
129 | '__SAFETY__': 0
130 | }
131 |
132 | setting_set = {
133 | 'github':github_settings,
134 | 'gist':gist_settings,
135 | 'github_code':github_code_settings,
136 | 'gitlab':gitlab_settings,
137 | 'gitlab_snippet':gitlab_snippet_settings,
138 | 'google_custom':google_custom_settings,
139 | 'pastebin':pastebin_settings,
140 | 'rss_feed':rss_feed_settings,
141 | 'twitter':twitter_settings
142 | }
143 |
144 | def setDB(DB_HOST, DB_PORT, DB_NAME):
145 |     global client
146 |     global db
147 |     global collection
148 |     global collection_channel
149 |     client = MongoClient(DB_HOST, DB_PORT)
150 |     db = client[DB_NAME]
151 |     collection = db['keywords']
152 |     collection_channel = db['channels']
153 |
154 |
155 | def setUsingChannels(channels):
156 |     collection_channel.remove()
157 |     if type(channels) == list:
158 |         collection_channel.insert({'channels': channels})
159 |         return True
160 |     else:
161 |         return False
162 |
163 | def getUsingChannels():
164 |     channels = collection_channel.find()[0]
165 |     return channels['channels']
166 |
167 | def setDefaultSettings(target, default_dict):
168 |     if target in modules:
169 |         channels = getUsingChannels()
170 |         setter = setting_set[target]
171 |         for k in default_dict.keys():
172 |             if k in setter.keys():
173 |                 if type(default_dict[k]) != type(setter[k]):
174 |                     return False
175 |                 if k == 'SearchLevel' and default_dict['SearchLevel'] not in [1,2,3,4]:
176 |                     return False
177 |                 if k == 'Channel' and default_dict['Channel'] not in channels:
178 |                     default_dict['Channel'] = channels[0]
179 |                 setter[k] = default_dict[k]
180 |         setter['Index'] = 0
181 |         setter['KEY'] = '__DEFAULT_SETTING__'
182 |         collection.update({
183 |             'Target':target,
184 |             'KEY':'__DEFAULT_SETTING__'
185 |         }, setter, upsert=True)
186 |         return True
187 |     else:
188 |         return None
189 |
190 | def isEnable(target):
191 |     if target in modules:
192 |         data = collection.find({'Target':target, 'KEY':'__DEFAULT_SETTING__'})
193 |         if data.count() != 0 and data[0]['__MOD_ENABLE__'] == True:
194 |             return True
195 |         else:
196 |             return False
197 |     else:
198 |         return None
199 |
200 | def disable(target):
201 |     if target in modules:
202 |         data = collection.find({'Target':target, 'KEY':'__DEFAULT_SETTING__'})
203 |         if data.count() != 0:
204 |             updatedata = data[0]
205 |             updatedata['__MOD_ENABLE__'] = False
206 |             collection.update({
207 |                 'Target':target,
208 |                 'KEY':'__DEFAULT_SETTING__'
209 |             }, updatedata)
210 |             return True
211 |         else:
212 |             return False
213 |     else:
214 |         return None
215 |
216 | def setNewKeyword(target, word):
217 |     if target in modules:
218 |         default = collection.find({'Target':target, 'KEY':'__DEFAULT_SETTING__'})[0]
219 |         default['KEY'] = word
220 |         del default['__MOD_ENABLE__'], default['_id'], default['__SAFETY__']
221 |         if '__SEARCHEDPASTES__' in default.keys():
222 |             del default['__SEARCHEDPASTES__']
223 |         if 'Expire_date' in default:
224 |             today = datetime.date.today()
225 |             limit = (today + datetime.timedelta(int(default['Expire_date']))).strftime('%Y-%m-%d')
226 |             default['Expire_date'] = limit
227 |         replacedata = collection.find({'Target':target, 'KEY':word})
228 |         if replacedata.count() == 0:
229 |             index = collection.find({'Target':target}).sort('Index', -1)[0]['Index'] + 1
230 |             default['Index'] = index
231 |             collection.insert(default)
232 |             return index
233 |         else:
234 |             index = collection.find({'Target':target, 'KEY':word})[0]['Index']
235 |             default['Index'] = index
236 |             collection.update({
237 |                 'Target':target,
238 |                 'KEY':word
239 |             }, default)
240 |             return index * -1
241 |     else:
242 |         return None
243 |
244 | def removeKeyword(target, index):
245 | ret = None
246 | if target != 'rss_feed' and target in modules:
247 | data = collection.find({'Target':target, 'Index':index})
248 | if data.count() != 0:
249 | key = data[0]['KEY']
250 | collection.remove({'Target':target, 'Index':index})
251 | ret = key
252 | return ret
253 |
254 | def getAllState():
255 | result = {}
256 | for m in modules:
257 | default = collection.find({'Target':m, 'KEY':'__DEFAULT_SETTING__'})
258 | if default.count() == 0:
259 | result[m] = False
260 | else:
261 | result[m] = default[0]['__MOD_ENABLE__']
262 | return result
263 |
264 | def enableKeywordSetting(target, index, enable):
265 | if target in modules:
266 | if target == 'rss_feed':
267 | data = collection.find({'Target':target, 'Name':index})
268 | else:
269 | data = collection.find({'Target':target, 'Index':index})
270 | if data.count() != 0:
271 | if target == 'rss_feed':
272 | word = data[0]['Name']
273 | collection.update({'Target':target, 'Name':index}, {'$set': {'Enable': enable}})
274 | else:
275 | word = data[0]['KEY']
276 | collection.update({'Target':target, 'Index':index}, {'$set': {'Enable': enable}})
277 | return word
278 | else:
279 | return None
280 | else:
281 | return None
282 |
283 | def setSearchLevel(target, index, level):
284 | if target in modules:
285 | data = collection.find({'Target':target, 'Index':index})
286 | if data.count() != 0 and 'SearchLevel' in data[0].keys():
287 | word = data[0]['KEY']
288 | collection.update({'Target':target, 'Index':index}, {'$set': {'SearchLevel': level}})
289 | return word
290 | else:
291 | return None
292 | else:
293 | return None
294 |
295 | def setSearchRange(target, index, days):
296 | if target in modules:
297 | data = collection.find({'Target':target, 'Index':index})
298 | if data.count() != 0 and 'Time_Range' in data[0].keys():
299 | word = data[0]['KEY']
300 | collection.update({'Target':target, 'Index':index}, {'$set': {'Time_Range': days}})
301 | return word
302 | else:
303 | return None
304 | else:
305 | return None
306 |
307 | def setExpireDate(target, index, limit):
308 | if target in modules:
309 | data = collection.find({'Target':target, 'Index':index})
310 | regx = r'\d{4}-(0[0-9]|1[0-2])-([0-2][0-9]|3[01])'
311 | if data.count() != 0 and 'Expire_date' in data[0].keys() and re.match(regx, limit):
312 | word = data[0]['KEY']
313 | collection.update({'Target':target, 'Index':index}, {'$set': {'Expire_date': limit}})
314 | return word
315 | else:
316 | return None
317 | else:
318 | return None
319 |
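The pattern used by `setExpireDate` only checks the shape of the string: `re.match` is not anchored at the end, and values such as `2020-02-31` or `2020-00-10` pass. A minimal sketch of a stricter check, assuming real calendar dates are wanted; `is_valid_expire_date` is a hypothetical helper and is not part of this repository:

```python
import datetime


def is_valid_expire_date(text):
    """Return True only if text is a real calendar date written as YYYY-MM-DD."""
    try:
        parsed = datetime.datetime.strptime(text, '%Y-%m-%d')
    except (TypeError, ValueError):
        # Not a string, wrong shape, trailing characters, or an impossible date.
        return False
    # strptime also accepts unpadded fields such as '2020-1-5';
    # formatting the result back enforces the zero-padded form.
    return parsed.strftime('%Y-%m-%d') == text
```

If this approach were adopted, the helper could stand in for the `re.match(regx, limit)` test, so that a stored `Expire_date` can always be parsed again later.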
320 | def setChannel(target, index, channel):
321 | if target in modules:
322 | if target == 'rss_feed':
323 | data = collection.find({'Target':target, 'Name':index})
324 | else:
325 | data = collection.find({'Target':target, 'Index':index})
326 | channels = getUsingChannels()
327 | if data.count() != 0 and channel in channels:
328 | if target == 'rss_feed':
329 | word = data[0]['Name']
330 | collection.update({'Target':target, 'Name':index}, {'$set': {'Channel': channel}})
331 | return word
332 | else:
333 | word = data[0]['KEY']
334 | collection.update({'Target':target, 'Index':index}, {'$set': {'Channel': channel}})
335 | return word
336 | return word
337 | else:
338 | return None
339 | else:
340 | return None
341 |
342 | def addExcludeList(target, index, exclude):
343 | if target in modules:
344 | data = collection.find({'Target':target, 'Index':index})
345 | if data.count() != 0 and 'Exclude_list' in data[0].keys():
346 | word = data[0]['KEY']
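347 | newlist = data[0]['Exclude_list'] + (exclude if isinstance(exclude, list) else [exclude])  # callers pass a single word (str); list + str would raise TypeError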
348 | collection.update({'Target':target, 'Index':index}, {'$set': {'Exclude_list': newlist}})
349 | return word
350 | else:
351 | return None
352 | else:
353 | return None
354 |
355 | def clearExcludeList(target, index):
356 | if target in modules:
357 | data = collection.find({'Target':target, 'Index':index})
358 | if data.count() != 0 and 'Exclude_list' in data[0].keys():
359 | word = data[0]['KEY']
360 | collection.update({'Target':target, 'Index':index}, {'$set': {'Exclude_list': []}})
361 | return word
362 | else:
363 | return None
364 | else:
365 | return None
366 |
367 | def getKeywords(target):
368 | if target in modules:
369 | # data = list(collection.find({'Target':target}))
370 | data = list(collection.find({"$and": [{"Target": target}, {"KEY": {"$ne": '__DEFAULT_SETTING__'}}]}).sort('Index'))
371 | return data
372 | else:
373 | return None
374 |
375 | def getEnableKeywords(target):
376 | if target in modules:
377 | data = list(collection.find({"$and": [{"Target": target}, {"KEY": {"$ne": '__DEFAULT_SETTING__'}}, {'Enable': True}]}).sort('Index'))
378 | return data
379 | else:
380 | return None
381 |
382 | def getKeyword(target, index):
383 | if target in modules:
384 | if target == 'rss_feed':
385 | data = collection.find({'Target':target, 'Name':index})
386 | else:
387 | data = collection.find({'Target':target, 'Index':index})
388 | if data.count() == 0:
389 | return None
390 | else:
391 | return data[0]
392 | else:
393 | return None
394 |
395 | def getSafetyCount(target):
396 | if target in modules:
397 | data = collection.find({'Target':target, 'KEY':'__DEFAULT_SETTING__'})
398 | if data.count() != 0:
399 | return data[0]['__SAFETY__']
400 | else:
401 | return None
402 | else:
403 | return None
404 |
405 | def setSafetyCount(target, count):
406 | if target in modules:
407 | data = collection.find({'Target':target, 'KEY':'__DEFAULT_SETTING__'})
408 | if data.count() != 0:
409 | collection.update({'Target':target, 'KEY':'__DEFAULT_SETTING__'}, {'$set': {'__SAFETY__': count}})
410 | return True
411 | else:
412 | return False
413 | else:
414 | return None
415 |
416 | def setSearchedPastes(pastelist):
417 | target = 'pastebin'
418 | data = collection.find({'Target':target, 'KEY':'__DEFAULT_SETTING__'})
419 | if data.count() != 0:
420 | collection.update({'Target':target, 'KEY':'__DEFAULT_SETTING__'}, {'$set': {'__SEARCHEDPASTES__': list(pastelist)}})
421 | return True
422 | else:
423 | return False
424 |
425 | def getSearchedPastes():
426 | target = 'pastebin'
427 | data = collection.find({'Target':target, 'KEY':'__DEFAULT_SETTING__'})
428 | if data.count() != 0:
429 | if '__SEARCHEDPASTES__' in data[0].keys():
430 | return data[0]['__SEARCHEDPASTES__']
431 | else:
432 | return []
433 | else:
434 | return False
435 |
436 | def setNewRSSFeed(name, url):
437 | target = 'rss_feed'
438 | ret = False
439 | data = collection.find({'Target':target, 'KEY':'__DEFAULT_SETTING__'})
440 | if data.count() != 0:
441 | if collection.count({'Target':target, 'Name':name}) == 0:
442 | default = collection.find({'Target':target, 'KEY':'__DEFAULT_SETTING__'})[0]
443 | default['Name'] = name
444 | default['URL'] = url
445 | del default['__MOD_ENABLE__'], default['_id'], default['__SAFETY__'], default['KEY']
446 | collection.insert(default)
447 | ret = True
448 | return ret
449 |
450 | def setNewRSSFilter(name, words, channel):
451 | target = 'rss_feed'
452 | data = collection.find({'Target':target, 'Name':name})
453 | ret = None
454 | if data.count() != 0:
455 | filters = data[0]['Filters']
456 | index = 0
457 | filter = {}
458 | for f in filters:
459 | if index <= f['Index']:
460 | index = f['Index'] + 1  # next free index, so two filters never share one
461 | filter['Index'] = index
462 | filter['Channel'] = data[0]['Channel']
463 | if channel in getUsingChannels():
464 | filter['Channel'] = channel
465 | filter['Words'] = words
466 | filters.append(filter)
467 | ret = index
468 | collection.update({'Target':target, 'Name':name}, {'$set': {'Filters': filters}})
469 | return ret
470 |
471 | def editRSSFilter(name, index, words, channel):
472 | target = 'rss_feed'
473 | data = collection.find({'Target':target, 'Name':name})
474 | if data.count() != 0:
475 | filters = data[0]['Filters']
476 | filter = {}
477 | i = 0
478 | for f in filters:
479 | if index == f['Index']:
480 | filter = {'Index' : f['Index'],
481 | 'Channel' : f['Channel'],
482 | 'Words' : f['Words']}
483 | if channel != '':
484 | filter['Channel'] = channel
485 | if words != []:
486 | filter['Words'] = words
487 | filters[i] = filter
488 | ret = True
489 | break
490 | i += 1
491 | collection.update({'Target':target, 'Name':name}, {'$set': {'Filters': filters}})
492 | return True
493 | return False
494 |
495 | def removeRSSFilter(name, index):
496 | target = 'rss_feed'
497 | data = collection.find({'Target':target, 'Name':name})
498 | ret = None
499 | if data.count() != 0:
500 | filters = data[0]['Filters']
501 | i = 0
502 | for f in filters:
503 | if index == f['Index']:
504 | ret = f['Words']
505 | filters.remove(f)
506 | break
507 | collection.update({'Target':target, 'Name':name}, {'$set': {'Filters': filters}})
508 | return ret
509 |
510 | def haveSearched(target, name):
511 | keyword = 'rss_feed'
512 | if target == 'rss_feed':
513 | data = collection.find({'Target':target, 'Name':name})
514 | else:
515 | data = collection.find({'Target':target, 'Index':name})
516 | if data.count() != 0:
517 | if target == 'rss_feed':
518 | collection.update({'Target':target, 'Name':name}, {'$set': {'__INITIAL__': False}})
519 | else:
520 | collection.update({'Target':target, 'Index':name}, {'$set': {'__INITIAL__': False}})
521 | return True
522 | return False
523 |
524 | def setRSSLastPost(name, post):
525 | target = 'rss_feed'
526 | data = collection.find({'Target':target, 'Name':name})
527 | if data.count() != 0:
528 | collection.update({'Target':target, 'Name':name}, {'$set': {'Last_Post': post}})
529 | return True
530 | return False
531 |
532 | def setNewTwitterQuery(query, users):
533 | target = 'twitter'
534 | ret = False
535 | data = collection.find({'Target':target, 'KEY':'__DEFAULT_SETTING__'})
536 | if data.count() != 0:
537 | default = collection.find({'Target':target, 'KEY':'__DEFAULT_SETTING__'})[0]
538 | del default['__MOD_ENABLE__'], default['_id'], default['__SAFETY__'], default['KEY']
539 | default['Query'] = query
540 | default['Users'] = users
541 | if users != []:
542 | default['KEY'] = query + ' Users: ' + ', '.join(users)
543 | else:
544 | default['KEY'] = query
545 | index = collection.find({'Target':target}).sort('Index', -1)[0]['Index'] + 1
546 | default['Index'] = index
547 | collection.insert(default)
548 | ret = index
549 | return ret
550 |
551 | def editTwitterQuery(index, query, users):
552 | target = 'twitter'
553 | ret = None
554 | data = collection.find({'Target':target, 'Index':index})
555 | if data.count() != 0:
556 | tq = data[0]
557 | if query != '':
558 | collection.update({'Target':target, 'Index':index}, {'$set': {'Query': query}})
559 | ret = True
560 | else:
561 | query = tq['Query']
562 | if users != []:
563 | collection.update({'Target':target, 'Index':index}, {'$set': {'Users': users}})
564 | ret = True
565 | else:
566 | users = tq['Users']
567 | if ret:
568 | key = (query + ' Users: ' + ', '.join(users)) if users != [] else query  # same KEY format as setNewTwitterQuery
569 | collection.update({'Target':target, 'Index':index}, {'$set': {'KEY': key}})
570 | ret = key
571 | return ret
572 |
573 | def addUserToTwitterQuery(index, users):
574 | target = 'twitter'
575 | ret = False
576 | data = collection.find({'Target':target, 'Index':index})
577 | if data.count() != 0:
578 | tq = data[0]
579 | if users != []:
580 | newlist = list(set(tq['Users'] + users))
581 | collection.update({'Target':target, 'Index':index}, {'$set': {'Users': newlist}})
582 | key = tq['Query'] + ' Users: ' + ', '.join(newlist)
583 | collection.update({'Target':target, 'Index':index}, {'$set': {'KEY': key}})
584 | ret = key
585 | return ret
586 |
587 | def setTwitterLastPost(index, post):
588 | target = 'twitter'
589 | data = collection.find({'Target':target, 'Index':index})
590 | if data.count() != 0:
591 | collection.update({'Target':target, 'Index':index}, {'$set': {'Last_Post': post}})
592 | return True
593 | return False
594 |
--------------------------------------------------------------------------------
/master/plugins/getCommand.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | from slackbot.bot import respond_to
3 | from slackbot.bot import listen_to
4 | import re
5 | from . import edit_conf_db as ec
6 | import os.path
7 | import requests
8 | import feedparser
9 | import lxml.html
10 |
11 | targets = [
12 | 'github',
13 | 'github_code',
14 | 'gist',
15 | 'gitlab',
16 | 'gitlab_snippet',
17 | 'pastebin',
18 | 'google_custom',
19 | 'rss_feed',
20 | 'twitter'
21 | ]
22 |
23 | def getPostData(keyword, index, target):
24 | post_data = ''
25 | if index > 0:
26 | post_data = 'Set New Search Keyword : `{keyword}` (index : {index}) in _{target}_'.format(keyword=keyword, index=abs(index), target=target)
27 | elif index < 0:
28 | post_data = 'Initialize Search Keyword : `{keyword}` (index : {index}) in _{target}_'.format(keyword=keyword, index=abs(index), target=target)
29 | else:
30 | post_data = 'Error Has Occurred'
31 | return post_data
32 |
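`getPostData` decodes the sign convention of `ec.setNewKeyword`: a positive return value is the index of a newly inserted keyword, a negative value is the negated index of an existing keyword that was overwritten, and `None` means the target is unknown. Under Python 3, comparing `None > 0` raises `TypeError`, so the `None` case never reaches the error message. A minimal sketch of the same decoding with that case handled; `describe_result` is a hypothetical name, not a function in this repository:

```python
def describe_result(keyword, index, target):
    """Turn setNewKeyword's signed return value into a Slack message."""
    # None (unknown target) and 0 are both reported as errors
    # instead of being compared against 0.
    if not isinstance(index, int) or isinstance(index, bool) or index == 0:
        return 'Error Has Occurred'
    if index > 0:
        action = 'Set New Search Keyword'      # keyword was inserted
    else:
        action = 'Initialize Search Keyword'   # existing keyword was reset
    return '{action} : `{keyword}` (index : {index}) in _{target}_'.format(
        action=action, keyword=keyword, index=abs(index), target=target)
```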
33 | def getEnabledTargets():
34 | etargets = []
35 | state = ec.getAllState()
36 | for name,enable in state.items():
37 | if enable:
38 | etargets.append(name)
39 | return etargets
40 |
41 | @respond_to('setKeyword: (.*)')
42 | @respond_to('setK: (.*)')
43 | def setKeyword(message, params):
44 | target = ''
45 | enabled = getEnabledTargets()
46 | # Search a copy so the shared `targets` list does not gain another 'all' on every call
47 | for t in targets + ['all']:
48 | if params.strip().startswith(t + ';'):
49 | target = t
50 | params = params.replace(t + ';', '', 1)
51 | break
52 | if target in enabled or target == 'all':
53 | word = params.split(';')[0].strip()
54 | post_data = ''
55 | if word == '':
56 | post_data = 'Please Put a Word'
57 | else:
58 | setter = [
59 | 'github',
60 | 'github_code',
61 | 'gist',
62 | 'gitlab'
63 | ]
64 | if target in enabled:
65 | ret = ec.setNewKeyword(target, word)
66 | post_data = getPostData(word, ret, target)
67 | elif target == 'all':
68 | enabled = list(set(enabled) & set(setter))
69 | for s in enabled:
70 | ret = ec.setNewKeyword(s, word)
71 | if ret != 0:
72 | if post_data != '':
73 | post_data += '\n'
74 | post_data += getPostData(word, ret, s)
75 | else:
76 | post_data = 'Invalid Target'
77 | ret = message._client.webapi.chat.post_message(
78 | message._body['channel'],
79 | post_data,
80 | as_user=True,
81 | )
82 |
83 | @respond_to('removeKeyword: (.*)')
84 | @respond_to('removeK: (.*)')
85 | def removeKeyword(message, params):
86 | target = ''
87 | enabled = getEnabledTargets()
88 | for t in targets:
89 | if params.strip().startswith(t + ';'):
90 | target = t
91 | params = params.replace(t + ';', '', 1)
92 | break
93 | post_data = ''
94 | if target in enabled:
95 | params = params.split(';')[0].strip()
96 | if params.isdigit():
97 | index = int(params.strip())
98 | ret = ec.removeKeyword(target, index)
99 | if ret != None:
100 | post_data = '`{key}`(index : {index}) is removed in _{target}_'.format(key=ret, index=str(index), target=target)
101 | else:
102 | post_data = 'No Data'
103 | else:
104 | post_data = 'Please Put Index of the Keyword'
105 | else:
106 | post_data = 'Invalid Target'
107 | ret = message._client.webapi.chat.post_message(
108 | message._body['channel'],
109 | post_data,
110 | as_user=True,
111 | )
112 |
113 | @respond_to('disableKeyword: (.*)')
114 | @respond_to('disableK: (.*)')
115 | def disableKeyword(message, params):
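116 | target, post_data = '', 'Please Put Index of the Word'  # default reply; previously unset when the index was not a number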
117 | enabled = getEnabledTargets()
118 | for t in targets:
119 | if params.strip().startswith(t + ';'):
120 | target = t
121 | params = params.replace(t + ';', '', 1)
122 | break
123 | if target in enabled:
124 | params = params.split(';')[0]
125 | if (target != 'rss_feed' and params.strip().isdigit()) or target == 'rss_feed':
126 | if target != 'rss_feed':
127 | index = int(params.strip())
128 | else:
129 | index = params.strip()
130 | ret = ec.enableKeywordSetting(target, index, False)
131 | if ret == None:
132 | post_data = 'No Data'
133 | else:
134 | post_data = '`{keyword}` is disabled in _{target}_'.format(keyword=ret, target=target)
135 | else:
136 | post_data = 'Invalid Target'
137 | ret = message._client.webapi.chat.post_message(
138 | message._body['channel'],
139 | post_data,
140 | as_user=True,
141 | )
142 |
143 | @respond_to('enableKeyword: (.*)')
144 | @respond_to('enableK: (.*)')
145 | def enableKeyword(message, params):
146 | target = ''
147 | enabled = getEnabledTargets()
148 | for t in targets:
149 | if params.strip().startswith(t + ';'):
150 | target = t
151 | params = params.replace(t + ';', '', 1)
152 | break
153 | if target in enabled:
154 | params = params.split(';')[0]
155 | if (target != 'rss_feed' and params.strip().isdigit()) or target == 'rss_feed':
156 | if target != 'rss_feed':
157 | index = int(params.strip())
158 | else:
159 | index = params.strip()
160 | ret = ec.enableKeywordSetting(target, index, True)
161 | if ret == None:
162 | post_data = 'No Data'
163 | else:
164 | post_data = '`{keyword}` is enabled in _{target}_'.format(keyword=ret, target=target)
165 | else:
166 | post_data = 'Please Put Index of the Word'
167 | else:
168 | post_data = 'Invalid Target'
169 | ret = message._client.webapi.chat.post_message(
170 | message._body['channel'],
171 | post_data,
172 | as_user=True,
173 | )
174 |
175 | @respond_to('setSearchLevel: (.*)')
176 | @respond_to('setSL: (.*)')
177 | def setSearchLevel(message, params):
178 | target = ''
179 | enabled = getEnabledTargets()
180 | for t in targets:
181 | if params.strip().startswith(t + ';'):
182 | target = t
183 | params = params.replace(t + ';', '', 1)
184 | break
185 | words = params.strip().split(';')
186 | valid_targets = [
187 | 'github',
188 | 'github_code'
189 | ]
190 | if target in valid_targets and target in enabled:
191 | if words[0].strip().isdigit():
192 | index = int(words[0].strip())
193 | if len(words) > 1:
194 | if words[1].strip().isdigit():
195 | valid_num = [1, 2, 3, 4]
196 | if int(words[1].strip()) in valid_num:
197 | ret = ec.setSearchLevel(target, index, int(words[1].strip()))
198 | if ret is None:
199 | post_data = 'No Data'
200 | else:
201 | post_data = 'Set `{keyword}` Search Level to {level}'.format(keyword=ret, level=words[1].strip())
202 | else:
203 | post_data = 'Invalid Search Level'
204 | else:
205 | post_data = 'Please Put Index of the Word'
206 | else:
207 | post_data = 'Parameter Shortage'
208 | else:
209 | post_data = 'Please Put Index of the Word'
210 | else:
211 | post_data = 'Invalid Target'
212 | ret = message._client.webapi.chat.post_message(
213 | message._body['channel'],
214 | post_data,
215 | as_user=True,
216 | )
217 |
218 | @respond_to('setSearchTimeRange: (.*)')
219 | @respond_to('setSTR: (.*)')
220 | def setSearchTimeRange(message, params):
221 | enabled = getEnabledTargets()
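222 | target, post_data = '', 'Parameter Pattern not Match'  # default reply; previously unset when the range was not a number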
223 | for t in targets:
224 | if params.strip().startswith(t + ';'):
225 | target = t
226 | params = params.replace(t + ';', '', 1)
227 | break
228 | words = params.strip().split(';')
229 | valid_targets = [
230 | 'github',
231 | 'gist'
232 | ]
233 | if target in valid_targets and target in enabled:
234 | if words[0].strip().isdigit():
235 | index = int(words[0].strip())
236 | if len(words) > 1:
237 | if words[1].strip().isdigit():
238 | ret = ec.setSearchRange(target, index, int(words[1].strip()))
239 | if ret is None:
240 | post_data = 'No Data'
241 | else:
242 | post_data = '`{keyword}` search in _{target}_ in last {range} days'.format(keyword=ret, target=target, range=words[1].strip())
243 | else:
244 | post_data = 'Parameter Shortage'
245 | else:
246 | post_data = 'Please Put Index of the Word'
247 | else:
248 | post_data = 'Invalid Target'
249 | ret = message._client.webapi.chat.post_message(
250 | message._body['channel'],
251 | post_data,
252 | as_user=True,
253 | )
254 |
255 | @respond_to('setExpireDate: (.*)')
256 | @respond_to('setED: (.*)')
257 | def setExpireDate(message, params):
258 | enabled = getEnabledTargets()
259 | target = ''
260 | for t in targets:
261 | if params.strip().startswith(t + ';'):
262 | target = t
263 | params = params.replace(t + ';', '', 1)
264 | break
265 | words = params.strip().split(';')
266 | if 'rss_feed' in enabled:
267 | enabled.remove('rss_feed')
268 | if target in enabled:
269 | if words[0].strip().isdigit():
270 | index = int(words[0].strip())
271 | if len(words) > 1:
272 | regx = r'\d{4}-(0[0-9]|1[0-2])-([0-2][0-9]|3[01])'
273 | if re.match(regx, words[1].strip()):
274 | ret = ec.setExpireDate(target, index, words[1].strip())
275 | if ret is None:
276 | post_data = 'No Data'
277 | else:
278 | post_data = '`{keyword}` in _{target}_ will expire at {date}'.format(keyword=ret, target=target, date=words[1].strip())
279 | else:
280 | post_data = 'Parameter Pattern not Match'
281 | else:
282 | post_data = 'Parameter Shortage'
283 | else:
284 | post_data = 'Please Put Index of the Word'
285 | else:
286 | post_data = 'Invalid Target'
287 | ret = message._client.webapi.chat.post_message(
288 | message._body['channel'],
289 | post_data,
290 | as_user=True,
291 | )
292 |
293 | @respond_to('setChannel: (.*)')
294 | @respond_to('setC: (.*)')
295 | def setChannel(message, params):
296 | base = os.path.dirname(os.path.abspath(__file__))
297 | channelfile = os.path.normpath(os.path.join(base, '../settings/channellist'))
298 | enabled = getEnabledTargets()
299 | target = ''
300 | for t in targets:
301 | if params.strip().startswith(t + ';'):
302 | target = t
303 | params = params.replace(t + ';', '', 1)
304 | break
305 | if target in enabled:
306 | words = params.strip().split(';')
307 | if (target != 'rss_feed' and words[0].strip().isdigit()) or target == 'rss_feed':
308 | if target != 'rss_feed':
309 | index = int(words[0].strip())
310 | else:
311 | index = words[0].strip()
312 | if len(words) > 1:
313 | channels = ec.getUsingChannels()
314 | if words[1].strip() in channels:
315 | ret = ec.setChannel(target, index, words[1].strip())
316 | if ret is None:
317 | post_data = 'No Data'
318 | else:
319 | post_data = '`{keyword}` result in _{target}_ will notify at {channel}'.format(keyword=ret, target=target, channel=words[1].strip())
320 | else:
321 | post_data = 'Parameter Pattern not Match'
322 | else:
323 | post_data = 'Parameter Shortage'
324 | else:
325 | post_data = 'Please Put Index of the Word'
326 | else:
327 | post_data = 'Invalid Target'
328 | ret = message._client.webapi.chat.post_message(
329 | message._body['channel'],
330 | post_data,
331 | as_user=True,
332 | )
333 |
334 | @respond_to('addExcludeList: (.*)')
335 | @respond_to('addEL: (.*)')
336 | def addExcludeList(message, params):
337 | target = ''
338 | for t in targets:
339 | if params.strip().startswith(t + ';'):
340 | target = t
341 | params = params.replace(t + ';', '', 1)
342 | break
343 | words = params.strip().split(';')
344 | valid_targets = [
345 | 'github',
346 | 'github_code',
347 | 'gist',
348 | 'gitlab',
349 | 'gitlab_snippet',
350 | 'google_custom'
351 | ]
352 | enabled = list(set(getEnabledTargets()) & set(valid_targets))
353 | if target in enabled:
354 | if words[0].strip().isdigit():
355 | index = int(words[0].strip())
356 | if len(words) > 1:
357 | for word in words[1:]:
358 | if word != '':
359 | ret = ec.addExcludeList(target, index, word)
360 | if ret is None:
361 | post_data = 'No Data'
362 | break
363 | else:
364 | post_data = 'Add {words} in Exclude List of `{keyword}` in _{target}_'.format(words=','.join(words[1:]), keyword=ret, target=target)
365 | else:
366 | post_data = 'No Data'
367 | else:
368 | post_data = 'Parameter Shortage'
369 | else:
370 | post_data = 'Please Put Index of the Word'
371 | else:
372 | post_data = 'Invalid Target'
373 | ret = message._client.webapi.chat.post_message(
374 | message._body['channel'],
375 | post_data,
376 | as_user=True,
377 | )
378 |
379 | @respond_to('clearExcludeList: (.*)')
380 | @respond_to('clearEL: (.*)')
381 | def clearExcludeList(message, params):
    | target = ''  # avoids a NameError when no target prefix matches
382 | for t in targets:
383 | if params.strip().startswith(t + ';'):
384 | target = t
385 | params = params.replace(t + ';', '', 1)
386 | break
387 | valid_targets = [
388 | 'github',
389 | 'github_code',
390 | 'gist',
391 | 'gitlab',
392 | 'gitlab_snippet',
393 | 'google_custom'
394 | ]
395 | enabled = list(set(getEnabledTargets()) & set(valid_targets))
396 | params = params.split(';')[0]
397 | if target in enabled:
398 | if params.strip().isdigit():
399 | index = int(params.strip())
400 | ret = ec.clearExcludeList(target, index)
401 | if ret == None:
402 | post_data = 'No Data'
403 | else:
404 | post_data = 'Delete All Exclude List of `{keyword}` in _{target}_'.format(keyword=ret, target= target)
405 | else:
406 | post_data = 'Please Put Index of the Word'
407 | else:
408 | post_data = 'Invalid Target'
409 | ret = message._client.webapi.chat.post_message(
410 | message._body['channel'],
411 | post_data,
412 | as_user=True,
413 | )
414 |
415 | @respond_to('getKeyword: (.*)')
416 | @respond_to('getK: (.*)')
417 | def getKeyword(message, params):
418 | post_data = ''
419 | target = 'all'
420 | # Search a copy so the shared `targets` list does not gain another 'all' on every call
421 | for t in targets + ['all']:
422 | if params.strip().startswith(t + ';'):
423 | target = t
424 | params = params.replace(t + ';', '', 1)
425 | break
426 | enabled = getEnabledTargets()
427 | if target in enabled or target == 'all':
428 | for g in enabled:
429 | if target == g or target == 'all':
430 | keys = ec.getEnableKeywords(g)
431 | if keys != []:
432 | if g == 'rss_feed':
433 | post_data += '-- Enabled RSS Feeds --\n'
434 | for k in keys:
435 | post_data += str(k['Name']) + ' : `' + k['URL'] + '`\n'
436 | else:
437 | post_data += '-- Enabled Search Keyword in _{target}_ --\n'.format(target=g)
438 | for k in keys:
439 | post_data += str(k['Index']) + ' : `' + k['KEY'] + '`\n'
440 | else:
441 | post_data = 'Invalid Target'
442 | if post_data == '':
443 | post_data = 'I don\'t have any data yet'
444 | ret = message._client.webapi.chat.post_message(
445 | message._body['channel'],
446 | post_data,
447 | as_user=True,
448 | )
449 |
450 | @respond_to('getAllKeyword: (.*)')
451 | @respond_to('getAllK: (.*)')
452 | def getAllKeyword(message, params):
453 | post_data = ''
454 | target = 'all'
455 | # Search a copy so the shared `targets` list does not gain another 'all' on every call
456 | for t in targets + ['all']:
457 | if params.strip().startswith(t + ';'):
458 | target = t
459 | params = params.replace(t + ';', '', 1)
460 | break
461 | enabled = getEnabledTargets()
462 | if target in enabled or target == 'all':
463 | for g in enabled:
464 | if target == g or target == 'all':
465 | keys = ec.getKeywords(g)
466 | if keys != []:
467 | if g == 'rss_feed':
468 | post_data += '-- Enabled RSS Feeds --\n'
469 | for k in keys:
470 | post_data += str(k['Name']) + ' : `' + k['URL'] + '`\n'
471 | else:
472 | post_data += '-- Enabled Search Keyword in _{target}_ --\n'.format(target=g)
473 | for k in keys:
474 | post_data += str(k['Index']) + ' : `' + k['KEY'] + '`\n'
475 | else:
476 | post_data = 'Invalid Target'
477 | if post_data == '':
478 | post_data = 'I don\'t have any data yet'
479 | ret = message._client.webapi.chat.post_message(
480 | message._body['channel'],
481 | post_data,
482 | as_user=True,
483 | )
484 |
485 | @respond_to('getSearchSetting: (.*)')
486 | @respond_to('getSS: (.*)')
487 | def getKeywordSetting(message, params):
488 | post_data = ''
489 | target = ''
490 | for t in targets:
491 | if params.strip().startswith(t + ';'):
492 | target = t
493 | params = params.replace(t + ';', '', 1)
494 | break
495 | params = params.split(';')[0]
496 | enabled = getEnabledTargets()
497 | if target in enabled:
498 | if target == 'rss_feed':
499 | keyword = ec.getKeyword(target, params.strip())
500 | if keyword == None:
501 | post_data = 'No Data'
502 | else:
503 | conf_params = [
504 | 'Enable',
505 | 'URL',
506 | 'Filters',
507 | 'Channel',
508 | 'Last_Post'
509 | ]
510 | post_data = 'Settings of `' + keyword['Name'] + '` is :\n```'
511 | for p in conf_params:
512 | if p == 'Filters':
513 | v = ''
514 | for f in keyword['Filters']:
515 | v += '\n\tINDEX: ' + str(f['Index']) + '\n'
516 | v += '\tWORDS: ' + ', '.join(f['Words']) + '\n'
517 | v += '\tCHANNEL: ' + f['Channel']
518 | else:
519 | v = keyword[p]
520 | post_data += p.upper().replace('_', ' ') + ': ' + str(v) + '\n'
521 | post_data += '```'
522 | else:
523 | if params.strip().isdigit():
524 | index = int(params.strip())
525 | keyword = ec.getKeyword(target, index)
526 | if keyword == None:
527 | post_data = 'No Data'
528 | else:
529 | conf_params = [
530 | 'Index',
531 | 'Enable',
532 | 'Query',
533 | 'Users',
534 | 'SearchLevel',
535 | 'Time_Range',
536 | 'Expire_date',
537 | 'Channel',
538 | 'Last_Post'
539 | ]
540 | post_data = 'Settings of `' + keyword['KEY'] + '` is :\n```'
541 | for p in conf_params:
542 | if p in keyword.keys():
543 | v = keyword[p]
544 | post_data += p.upper().replace('_', ' ') + ': ' + str(v) + '\n'
545 | post_data += '```'
546 | else:
547 | post_data = 'Please Put Index of the Word'
548 | else:
549 | post_data = 'Invalid Target'
550 | ret = message._client.webapi.chat.post_message(
551 | message._body['channel'],
552 | post_data,
553 | as_user=True,
554 | )
555 |
556 | def isMatched(word, text):
557 | symbols = r'[!\"#$%&\'()*+,\-./:;<=>@\[\]^_{|}~\\]'
558 | re_symbol = re.compile(symbols)
559 | repatt = False
560 | matched = True
561 | if re.search(re_symbol, word):
562 | if re.search(word, text):  # `patt` was never defined; the keyword itself is used as the regex (assumed intent)
563 | matched = True
564 | else:
565 | matched = False
566 | else:
567 | for p in word.split(' '):
568 | if text.lower().find(p.lower()) < 0:
569 | matched = False
570 | return matched
571 |
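`isMatched` applies two rules: a keyword containing a symbol is treated as a pattern, and any other keyword matches when every space-separated word occurs in the text, ignoring case. A self-contained sketch of that logic follows. It assumes the symbol-bearing keyword is meant to be a regular expression, and it adds a literal-substring fallback for keywords that do not compile as one; both points are assumptions, and `is_matched` is a hypothetical name.

```python
import re

# Same symbol class as in isMatched above.
SYMBOLS = re.compile(r'[!"#$%&\'()*+,\-./:;<=>@\[\]^_{|}~\\]')


def is_matched(word, text):
    """Return True if the keyword `word` matches `text`."""
    if SYMBOLS.search(word):
        try:
            # Keyword contains a symbol: treat it as a regular expression.
            return re.search(word, text) is not None
        except re.error:
            # Not a valid regex (e.g. 'foo('): fall back to a literal search.
            return word in text
    # Plain keyword: every word must appear somewhere, case-insensitively.
    lowered = text.lower()
    return all(part.lower() in lowered for part in word.split())
```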
572 | @respond_to('reMatchTest: (.*)')
573 | def reMatchTest(message, params):
574 | post_data = ''
575 | target = 'pastebin'
576 | for t in targets:
577 | if params.strip().startswith(t + ';'):
578 | target = t
579 | params = params.replace(t + ';', '', 1)
580 | break
581 | if target == 'pastebin' or target == 'gitlab_snippet':
582 | words = params.strip().split(';')
583 | if words[0].strip().isdigit():
584 | index = int(words[0].strip())
585 | if len(words) > 1:
586 | candidatelist = ec.getKeywordlist(target)
587 | if index in candidatelist.values():
588 | key = ''
589 | for k,v in candidatelist.items():
590 | if v == index:
591 | key = k
592 | break
593 | word = words[1].strip()
594 | post_data = ''
595 | if re.match('[a-zA-Z0-9]{8}', word):
596 | post_data += 'Found Pastebin id pattern.\n'
597 | word = 'https://pastebin.com/raw/' + word
598 | raw_result = requests.get(word, timeout=10)
599 | if raw_result.status_code == 200:
600 | if isMatched(key, raw_result.text):
601 | post_data += 'The pattern `{keyword}` matches the contents of {url}'.format(keyword=key, url=word)
602 | else:
603 | post_data += 'The pattern `{keyword}` does not match the contents of {url}'.format(keyword=key, url=word)
604 | else:
605 | post_data += 'I couldn\'t access {url}'.format(url=word)
606 | else:
607 | if isMatched(key, word):  # no page is fetched in this branch, so test the supplied text itself
608 | post_data += 'The pattern `{keyword}` matches'.format(keyword=key)
609 | else:
610 | post_data += 'The pattern `{keyword}` does not match'.format(keyword=key)
611 | else:
612 | post_data = 'No Data'
613 | else:
614 | post_data = 'Parameter Shortage'
615 | else:
616 | post_data = 'Please Put Index of the Word'
617 | else:
618 | post_data = 'Invalid Target'
619 | ret = message._client.webapi.chat.post_message(
620 | message._body['channel'],
621 | post_data,
622 | as_user=True,
623 | )
624 |
625 | def checkRSSUrl(url):
626 | try:
627 | headers={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:50.0) Gecko/20100101 Firefox/50.0'}
628 | response = requests.get(url, timeout=4, headers=headers)
629 | rss = feedparser.parse(response.text)
630 | rssurl = None
631 | if rss['version'] == 'rss10' or rss['version'] == 'rss20' or rss['version'] == 'atom10':
632 | rssurl = url
633 | else:
634 | root = lxml.html.fromstring(response.text)
635 | for link in root.xpath('//link[@type="application/rss+xml"]'):
636 | url = link.get('href')
637 | rss = feedparser.parse(url)
638 | if rss['version'] == 'rss10' or rss['version'] == 'rss20' or rss['version'] == 'atom10':
639 | rssurl = url
640 | return rssurl
641 | except:
642 | return None
643 |
644 | @respond_to('setFeed: (.*)')
645 | @respond_to('setF: (.*)')
646 | def setNewFeed(message, params):
647 | target = ''
648 | enabled = getEnabledTargets()
649 | if 'rss_feed' in enabled:
650 | name = params.split(';')[0].strip()
651 | post_data = ''
652 | if name == '':
653 | post_data = 'Please Put a Name'
654 | else:
655 | url = ''
656 | if len(params) > params.find(';')+1:
657 | url = params[params.find(';')+1:].strip()
658 | if url == '':
659 | post_data = 'Please Put URL of RSS Feed'
660 | patt = r'https?://[A-Za-z0-9\-.]{0,62}?\.([A-Za-z0-9\-.]{1,255})/?[A-Za-z0-9.\-?=#%/]*'
661 | url = re.search(patt, url).group(0)
662 | rssurl = checkRSSUrl(url)
663 | if rssurl == None:
664 | post_data = 'Invalid RSS URL'
665 | else:
666 | if url != rssurl:
667 | post_data += 'RSS Feed Found. '
668 | ret = ec.setNewRSSFeed(name, rssurl)
669 | if not ret:
670 | post_data = '{name} is already used'.format(name=name)
671 | else:
672 | post_data += 'Set `{url}` to _{name}_'.format(url=rssurl, name=name)
673 | else:
674 | post_data = 'RSS is not enabled'
675 | ret = message._client.webapi.chat.post_message(
676 | message._body['channel'],
677 | post_data,
678 | as_user=True,
679 | )
680 |
681 | @respond_to('setFeedFilter: (.*)')
682 | @respond_to('setFF: (.*)')
683 | def setFeedFilter(message, params):
684 | target = ''
685 | enabled = getEnabledTargets()
686 | if 'rss_feed' in enabled:
687 | words = params.split(';')
688 | post_data = ''
689 | if len(words) < 2:
690 | post_data = 'Parameter Shortage'
691 | else:
692 | name = words[0].strip()
693 | filter = []
694 | for w in words[1].split(' '):
695 | if w.strip() != '':
696 | filter.append(w.strip())
697 | channel = ''
698 | if len(words) > 2:
699 | channel = words[2].strip()
700 | ret = ec.setNewRSSFilter(name, filter, channel)
701 | if ret != None:
702 | post_data = 'New Filter `[{filter}]`(index : {index}) is set to _{name}_'.format(filter=', '.join(filter), name=name, index=ret)
703 | else:
704 | post_data = 'Error has Occurred'
705 | else:
706 | post_data = 'RSS is not enabled'
707 | ret = message._client.webapi.chat.post_message(
708 | message._body['channel'],
709 | post_data,
710 | as_user=True,
711 | )
712 |
713 | @respond_to('editFeedFilter: (.*)')
714 | @respond_to('editFF: (.*)')
715 | def editFeedFilter(message, params):
716 | enabled = getEnabledTargets()
717 | if 'rss_feed' in enabled:
718 | words = params.split(';')
719 | post_data = ''
720 | if len(words) < 3:
721 | post_data = 'Parameter Shortage'
722 | else:
723 | name = words[0].strip()
724 | if words[1].strip().isdigit():
725 | filter = []
726 | index = int(words[1].strip())
727 | for w in words[2].split(' '):
728 | if w != '':
729 | filter.append(w.strip())
730 | channel = ''
731 | if len(words) > 3:
732 | channel = words[3].strip()
733 | ret = ec.editRSSFilter(name, index, filter, channel)
734 | if ret:
735 | post_data = '`[{filter}]`(index : {index}) is set in _{name}_'.format(filter=', '.join(filter), index=index, name=name)
736 | else:
737 | post_data = 'Error has Occurred'
738 | else:
739 | post_data = 'Please Put Index of the Filter'
740 | else:
741 | post_data = 'RSS is not enabled'
742 | ret = message._client.webapi.chat.post_message(
743 | message._body['channel'],
744 | post_data,
745 | as_user=True,
746 | )
747 |
748 | @respond_to('removeFeedFilter: (.*)')
749 | @respond_to('removeFF: (.*)')
750 | def removeFeedFilter(message, params):
751 | enabled = getEnabledTargets()
752 | if 'rss_feed' in enabled:
753 | words = params.split(';')
754 | post_data = ''
755 | if len(words) < 2:
756 | post_data = 'Parameter Shortage'
757 | else:
758 | name = words[0].strip()
759 | if words[1].strip().isdigit():
760 | index = int(words[1].strip())
761 | ret = ec.removeRSSFilter(name, index)
762 | if ret != None:
763 | post_data = '`[{filter}]`(index : {index}) is removed in _{name}_'.format(filter=', '.join(ret), name=name, index=index)
764 | else:
765 | post_data = 'Error has Occurred'
766 | else:
767 | post_data = 'Please Put Index of the Filter'
768 | else:
769 | post_data = 'RSS is not enabled'
770 | ret = message._client.webapi.chat.post_message(
771 | message._body['channel'],
772 | post_data,
773 | as_user=True,
774 | )
775 |
776 | @respond_to('setTwitterQuery: (.*)')
777 | @respond_to('setTQ: (.*)')
778 | def setTwitterQuery(message, params):
779 | target = ''
780 | enabled = getEnabledTargets()
781 | if 'twitter' in enabled:
782 | words = params.split(';')
783 | post_data = ''
784 | users = []
785 | query = ''
786 | continueflag = True
787 | if len(words) == 1:
788 | if words[0].strip() != '':
789 | query = words[0].strip()
790 | else:
791 | post_data = 'Query is Empty'
792 | continueflag = False
793 | elif len(words) > 1:
794 | if words[0].strip() != '' or words[1].strip() != '':
795 | query = words[0].strip()
796 | for u in words[1].split(' '):
797 | if u.strip() != '':
798 | users.append(u.strip())
799 | else:
800 | post_data = 'Query is Empty'
801 | continueflag = False
802 | else:
803 | post_data = 'Parameter Shortage'
804 | continueflag = False
805 | if continueflag:
806 | name = words[0].strip()
807 | ret = ec.setNewTwitterQuery(query, users)
808 | if ret != None:
809 | if users != []:
810 | key = query + ' Users: ' + ', '.join(users)
811 | else:
812 | key = query
813 | post_data = 'New Twitter Search Query `[{query}]`(index : {index}) was set'.format(query=key, index=ret)
814 | else:
815 | post_data = 'Error has Occurred'
816 | else:
817 | post_data = 'Twitter is not enabled'
818 | ret = message._client.webapi.chat.post_message(
819 | message._body['channel'],
820 | post_data,
821 | as_user=True,
822 | )
823 |
824 | @respond_to('editTwitterQuery: (.*)')
825 | @respond_to('editTQ: (.*)')
826 | def editTwitterQuery(message, params):
827 | enabled = getEnabledTargets()
828 | if 'twitter' in enabled:
829 | post_data = ''
830 | params = params.split(';')
831 | index = params[0].strip()
832 | if index.isdigit():
833 | index = int(index)
834 | if len(params) == 2:
835 | ret = ec.editTwitterQuery(index, params[1].strip(), [])
836 | if ret != None:
837 | post_data = '`{key}` was set in TwitterQuery (index : {index})'.format(key=ret, index=str(index))
838 | else:
839 | post_data = 'No Data'
840 | elif len(params) > 2:
841 | users = []
842 | for u in params[2].strip().split(' '):
843 | if u.strip() != '':
844 | users.append(u.strip())
845 | ret = ec.editTwitterQuery(index, params[1].strip(), users)
846 | if ret != None:
847 | post_data = '`{key}` was set in TwitterQuery (index : {index})'.format(key=ret, index=str(index))
848 | else:
849 | post_data = 'No Data'
850 | else:
851 | post_data = 'Parameter Shortage'
852 | else:
853 | post_data = 'Please Put Index of the Keyword'
854 | else:
855 | post_data = 'Twitter is not enabled'
856 | ret = message._client.webapi.chat.post_message(
857 | message._body['channel'],
858 | post_data,
859 | as_user=True,
860 | )
861 |
862 | @respond_to('addUserTwitterQuery: (.*)')
863 | @respond_to('addUserTQ: (.*)')
864 | def addUserTwitterQuery(message, params):
865 | enabled = getEnabledTargets()
866 | if 'twitter' in enabled:
867 | post_data = ''
868 | params = params.split(';')
869 | index = params[0].strip()
870 | if index.isdigit():
871 | index = int(index)
872 | if len(params) > 1:
873 | users = []
874 | for u in params[1].strip().split(' '):
875 | if u.strip() != '':
876 | users.append(u.strip())
877 | ret = ec.addUserToTwitterQuery(index, users)
878 | if ret != None:
879 | post_data = '`{key}` was set in TwitterQuery (index : {index})'.format(key=ret, index=str(index))
880 | else:
881 | post_data = 'No Data'
882 | else:
883 | post_data = 'Parameter Shortage'
884 | else:
885 | post_data = 'Please Put Index of the Keyword'
886 | else:
887 | post_data = 'Twitter is not enabled'
888 | ret = message._client.webapi.chat.post_message(
889 | message._body['channel'],
890 | post_data,
891 | as_user=True,
892 | )
893 |
894 | @respond_to('removeTwitterQuery: (.*)')
895 | @respond_to('removeTQ: (.*)')
896 | def removeTwitterQuery(message, params):
897 | target = 'twitter'
898 | enabled = getEnabledTargets()
899 | post_data = ''
900 | if target in enabled:
901 | params = params.split(';')[0].strip()
902 | if params.isdigit():
903 | index = int(params.strip())
904 | ret = ec.removeKeyword(target, index)
905 | if ret != None:
906 | post_data = '`{key}`(index : {index}) is removed in _twitter_'.format(key=ret, index=str(index))
907 | else:
908 | post_data = 'No Data'
909 | else:
910 | post_data = 'Please Put Index of the Keyword'
911 | else:
912 | post_data = 'Invalid Target'
913 | ret = message._client.webapi.chat.post_message(
914 | message._body['channel'],
915 | post_data,
916 | as_user=True,
917 | )
918 |
919 | @respond_to('help:')
920 | def getAllKeyword(message):
921 | # candidatelist = setting.getKeywordlist()
922 | post_data = '''```Command Format is as Follows:
923 | \t{Command}: {target}; {arg1}; {arg2}; ...
924 |
925 | Command List:
926 |
927 | \'setKeyword: target; [word]\'\tAdd [word] as New Search Keyword with Default Settings.
928 | (abbreviation=setK:)
929 | \'removeKeyword: target; [index]\'\tRemove the Search Keyword indicated by [index].
930 | (abbreviation=removeK:)
931 | \'enableKeyword: target; [index]\'\tEnable the Search Keyword indicated by [index].
932 | (abbreviation=enableK:)
933 | \'disableKeyword: target; [index]\'\tDisable the Search Keyword indicated by [index].
934 | (abbreviation=disableK:)
935 | \'setSearchLevel: target; [index]\'\tSet Search Level of Github Search (1-4) or Github Code Search (1-2) indicated by [index].
936 | (abbreviation=setSL:)
937 | \'setExpireDate: target; [index]; [expiration date]\'\tSet an Expiration Date of the Keyword indicated by [index]. [expiration date] Format is YYYY-mm-dd.
938 | (abbreviation=setED:)
939 | \'setChannel: target; [index];[channel]\'\tSet channel to notify the Search Keyword\'s result.
940 | (abbreviation=setC:)
941 | \'getKeyword: target;\'\tListing Enabled Search Keywords.
942 | (abbreviation=getK:)
943 | \'getAllKeyword: target;\'\tListing All Search Keyword (include Disabled Keywords).
944 | (abbreviation=getAllK:)
945 | \'getSearchSetting: target; [index]\'\tShow Setting of the Search Keyword indicated by [index].
946 | (abbreviation=getSS:)
947 |
948 | \'reMatchTest: target; [index]; [text]\'\tCheck whether the pattern indicated by [index] in [target] matches [text]. If [text] is a Pastebin ID, check the contents of that paste.
949 | \'setFeed: [name]; [url]\'\tAdd RSS Feed to [url] as [name].
950 | (abbreviation=setF:)
951 | \'setFeedFilter: [name]; [filter]\'\tAdd new RSS Feed Filter. Notify only when an entry contains the filter words.
952 | (abbreviation=setFF:)
953 | \'editFeedFilter: [name]; [index]; filter\'\tEdit Feed Filter indicated by [index] in RSS Feed of [name].
954 | (abbreviation=editFF:)
955 | \'removeFeedFilter: [name]; [index];\'\tRemove Feed Filter indicated by [index] in RSS Feed of [name].
956 | (abbreviation=removeFF:)
957 | \'setTwitterQuery: [query]; ([users];)\'\tSet [query] with Default Settings. If [users] is set, notify only about posts from these users.
958 | (abbreviation=setTQ:)
959 | \'editTwitterQuery: [index]; [query]; ([users];)\'\tEdit Twitter Query indicated by [index].
960 | (abbreviation=editTQ:)
961 | \'addUserTwitterQuery: [index]; [users];\'\tAdd User to Twitter Query indicated by [index]. That query notifies only about posts from these users.
962 | (abbreviation=addUserTQ:)
963 | \'removeTwitterQuery: [index];\'\tRemove Twitter Query indicated by [index].
964 | (abbreviation=removeTQ:)
965 |
966 | \'help:\'\tShow this Message.
967 |
968 | Target:
969 | \tgithub
970 | \tgist
971 | \tgithub_code
972 | \tgitlab
973 | \tgitlab_snippet (Use RE match)
974 | \tgoogle_custom
975 | \tpastebin (Use RE match)
976 | \trss_feed
977 | \ttwitter\n```'''
978 | ret = message._client.webapi.chat.post_message(
979 | message._body['channel'],
980 | post_data,
981 | as_user=True,
982 | )
983 |
984 | @listen_to('How are you?')
985 | def reaction(message):
986 | username = message._client.login_data['self']['name']
987 | message.send('I\'m fine, thank you.')
988 |
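The filter words accepted by `setFeedFilter:` are interpreted later by `filterFeeds` in `master/run.py`. The sketch below restates that syntax in isolation, as an illustration rather than the project's code: `field>term` restricts the search to one feed field, a leading `!` negates the term, and every word must hold for an entry to pass. The names `FilterWord`, `parse_filter_word`, and `entry_matches` are hypothetical, and the sketch uses a plain substring test (`in`) instead of `str.find`.

```python
from typing import NamedTuple


class FilterWord(NamedTuple):
    field: str    # feed entry key to search, or '__ALL__' for every field
    term: str     # text to look for (compared case-insensitively)
    negate: bool  # True when the term was written with a leading '!'


def parse_filter_word(word: str) -> FilterWord:
    """Split one filter word of the form 'field>term', 'term', or '!term'."""
    field = '__ALL__'
    term = word
    pos = word.find('>')
    # A '>' only acts as a field separator when there is text on both sides.
    if pos > 0 and len(word) > pos + 1:
        field = word[:pos].strip()
        term = word[pos + 1:].strip()
    negate = term.startswith('!')
    if negate:
        term = term[1:].strip()
    return FilterWord(field, term, negate)


def entry_matches(entry: dict, words: list) -> bool:
    """Return True when the entry satisfies every filter word."""
    for word in words:
        rule = parse_filter_word(word)
        # Unknown or unspecified fields fall back to searching all fields.
        if rule.field in entry:
            values = [entry[rule.field]]
        else:
            values = list(entry.values())
        text = ''.join(
            ''.join(map(str, v)) if isinstance(v, list) else str(v)
            for v in values
        ).lower()
        found = rule.term.lower() in text
        # A plain word must be found; a negated word must be absent.
        if found == rule.negate:
            return False
    return True
```

Under these assumptions, a filter such as `title>exploit !spam` keeps an entry only if its title contains "exploit" and no field contains "spam".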
--------------------------------------------------------------------------------
/master/run.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 |
3 | import argparse
4 | from crontab import CronTab
5 | import datetime
6 | import logging
7 | from logging import getLogger, StreamHandler, Formatter
8 | from multiprocessing import Pool
9 | import math
10 | import os.path
11 | import sys
12 | import time
13 | import traceback
14 | from slackbot.bot import Bot
15 | import master_post as master
16 | import search_api
17 | import slackbot_settings
18 | import plugins.edit_conf_db as ec
19 |
20 | logger = logging.getLogger(__name__)
21 | logger.setLevel(logging.INFO)
22 | fh = logging.FileHandler('./log/run.log')
23 | logger.addHandler(fh)
24 | formatter = logging.Formatter('%(asctime)s - %(levelname)s - line %(lineno)d - %(name)s - %(filename)s - \n*** %(message)s')
25 | fh.setFormatter(formatter)
26 |
27 | def doSpecialAct(target, channel, key, result):
28 | if target == 'github':
29 | pass
30 | elif target == 'gist':
31 | pass
32 | elif target == 'github_code':
33 | pass
34 | elif target == 'gitlab':
35 | pass
36 | elif target == 'gitlab_snippet':
37 | pass
38 | elif target == 'google_custom':
39 | pass
40 | elif target == 'pastebin':
41 | pass
42 | elif target == 'twitter':
43 | pass
44 |
45 | def getSpecialChannel():
46 | try:
47 | channel = slackbot_settings.special_action_channel
48 | if type(channel) != list:
49 | return []
50 | else:
51 | return channel
52 | except:
53 | return []
54 |
55 | def runSearchGithub():
56 | try:
57 | logger.info('--START GITHUB SEARCH--')
58 | now = datetime.date.today()
59 | today = now.strftime('%Y-%m-%d')
60 |
61 | target = 'github'
62 | keywords = ec.getEnableKeywords(target)
63 |
64 | if ec.isEnable(target) and keywords != None and keywords != []:
65 | safe_limit = 6
66 | error_safety = ec.getSafetyCount(target)
67 | for key in keywords:
68 | channel = key['Channel']
69 | limittime = datetime.datetime.strptime(key['Expire_date'], '%Y-%m-%d').date()
70 | if now < limittime:
71 | oldtime = now - datetime.timedelta(key['Time_Range'])
72 | oldday = oldtime.strftime('%Y-%m-%d')
73 | (results, statuscode) = search_api.searchGithub(key['KEY'], oldday, key['SearchLevel'])
74 | result = list(set(results) - set(key['Exclude_list']))
75 | if statuscode != 200:
76 | error_safety += 1
77 | ec.setSafetyCount(target, error_safety)
78 | postdata = '`' + key['KEY'] + '` failed to search in _github_.\nStatus Code: ' + str(statuscode)
79 | master.postAnyData(postdata, channel)
80 | logger.info(postdata)
81 | if error_safety > safe_limit:
82 | postdata = 'Too Many Errors. _Github_ Module is disabled for safety'
83 | ec.disable(target)
84 | master.postAnyData(postdata, channel)
85 | logger.info(postdata)
86 | else:
87 | ec.setSafetyCount(target, 0)
88 | if result != []:
89 | if channel in getSpecialChannel():
90 | doSpecialAct(target, channel, key['KEY'], result)
91 | master.postNewPoCFound(key['KEY'], result, channel)
92 | logger.info('keyword : ' + key['KEY'])
93 | logger.info('\n'.join(result))
94 | exclude = results
95 | ec.clearExcludeList(target, key['Index'])
96 | ec.addExcludeList(target, key['Index'], exclude)
97 | time.sleep(10)
98 | else:
99 | postdata = '`' + key['KEY'] + '` expired in _github_, and was disabled.'
100 | logger.info(postdata)
101 | master.postAnyData(postdata, channel)
102 | ec.enableKeywordSetting(target, key['Index'], False)
103 | except:
104 | logger.error('--ERROR HAS OCCURRED IN GITHUB SEARCH--')
105 | logger.error(traceback.format_exc())
106 | master.postAnyData(traceback.format_exc(), slackbot_settings.channels[0])
107 |
108 | def runSearchGithubCode():
109 | try:
110 | logger.info('--START GITHUB CODE SEARCH--')
111 | now = datetime.date.today()
112 | today = now.strftime('%Y-%m-%d')
113 |
114 | api_key = slackbot_settings.github_access_token
115 |
116 | target = 'github_code'
117 | keywords = ec.getEnableKeywords(target)
118 |
119 | if ec.isEnable(target) and keywords != None and keywords != []:
120 | safe_limit = 6
121 | error_safety = ec.getSafetyCount(target)
122 | for key in keywords:
123 | channel = key['Channel']
124 | limittime = datetime.datetime.strptime(key['Expire_date'], '%Y-%m-%d').date()
125 | if now < limittime:
126 | (results, statuscode) = search_api.searchGithubCode(key['KEY'], key['SearchLevel'], api_key)
127 | result = list(set(results) - set(key['Exclude_list']))
128 | if statuscode != 200:
129 | error_safety += 1
130 | ec.setSafetyCount('github_code', error_safety)
131 | postdata = '`' + key['KEY'] + '` failed to search in _github_code_.\nStatus Code: ' + str(statuscode)
132 | master.postAnyData(postdata, channel)
133 | logger.info(postdata)
134 | if error_safety > safe_limit:
135 | postdata = 'Too Many Errors. _Github Code_ Module is disabled for safety'
136 | ec.disable('github_code')
137 | master.postAnyData(postdata, channel)
138 | logger.info(postdata)
139 | else:
140 | ec.setSafetyCount('github_code', 0)
141 | if key['__INITIAL__'] == True:
142 | ec.haveSearched(target, key['Index'])
143 | if result != []:
144 | postdata = 'New Code Found about `' + key['KEY'] + '` in _github_code_'
145 | master.postAnyData(postdata, channel)
146 | if key['__INITIAL__'] == True:
147 | master.postAnyData(result[0], channel)
148 | else:
149 | if channel in getSpecialChannel():
150 | doSpecialAct(target, channel, key['KEY'], result)
151 | master.postAnyData('\n'.join(result), channel)
152 | logger.info('keyword : ' + key['KEY'])
153 | logger.info('\n'.join(result))
154 | exclude = results
155 | # ec.clearExcludeList('github_code', conf['Index'])
156 | ec.addExcludeList('github_code', key['Index'], exclude)
157 | time.sleep(10)
158 | else:
159 | postdata = '`' + key['KEY'] + '` expired in _github_code_, and was disabled.'
160 | logger.info(postdata)
161 | master.postAnyData(postdata, channel)
162 | ec.enableKeywordSetting('github_code', key['Index'], False)
163 | except:
164 | logger.error('--ERROR HAS OCCURRED IN GITHUB CODE SEARCH--')
165 | logger.error(traceback.format_exc())
166 | master.postAnyData(traceback.format_exc(), slackbot_settings.channels[0])
167 |
168 | def runSearchGist():
169 | try:
170 | logger.info('--START GIST SEARCH--')
171 | now = datetime.date.today()
172 | today = now.strftime('%Y-%m-%d')
173 |
174 | target = 'gist'
175 | keywords = ec.getEnableKeywords(target)
176 |
177 | if ec.isEnable(target) and keywords != None and keywords != []:
178 | safe_limit = 6
179 | error_safety = ec.getSafetyCount(target)
180 | for key in keywords:
181 | channel = key['Channel']
182 | limittime = datetime.datetime.strptime(key['Expire_date'], '%Y-%m-%d').date()
183 | if now < limittime:
184 | oldtime = now - datetime.timedelta(key['Time_Range'])
185 | oldday = oldtime.strftime('%Y-%m-%d')
186 | (results, statuscode) = search_api.searchGist(key['KEY'], oldday)
187 | result = list(set(results) - set(key['Exclude_list']))
188 | if statuscode != 200:
189 | error_safety += 1
190 | ec.setSafetyCount(target, error_safety)
191 | postdata = '`' + key['KEY'] + '` failed to search in _gist_.\nStatus Code: ' + str(statuscode)
192 | master.postAnyData(postdata, channel)
193 | logger.info(postdata)
194 | if error_safety > safe_limit:
195 | postdata = 'Too Many Errors. _Gist_ Module is disabled for safety'
196 | ec.disable(target)
197 | master.postAnyData(postdata, channel)
198 | logger.info(postdata)
199 | else:
200 | ec.setSafetyCount(target, 0)
201 | if result != []:
202 | if channel in getSpecialChannel():
203 | doSpecialAct(target, channel, key['KEY'], result)
204 | postdata = 'New Code Found about `' + key['KEY'] + '` in _gist_'
205 | master.postAnyData(postdata, channel)
206 | master.postAnyData('\n'.join(result), channel)
207 | logger.info('keyword : ' + key['KEY'])
208 | logger.info('\n'.join(result))
209 | exclude = results
210 | ec.clearExcludeList(target, key['Index'])
211 | ec.addExcludeList(target, key['Index'], exclude)
212 | time.sleep(45)
213 | else:
214 | postdata = '`' + key['KEY'] + '` is expired in _gist_, and disabled.'
215 | master.postAnyData(postdata, channel)
216 | ec.enableKeywordSetting(target, key['Index'], False)
217 | logger.info(postdata)
218 | except:
219 | logger.error('--ERROR HAS OCCURRED IN GIST SEARCH--')
220 | logger.error(traceback.format_exc())
221 | master.postAnyData(traceback.format_exc(), slackbot_settings.channels[0])
222 |
223 | def runSearchGitlab():
224 | try:
225 | logger.info('--START GITLAB SEARCH--')
226 | now = datetime.date.today()
227 | today = now.strftime('%Y-%m-%d')
228 |
229 | target = 'gitlab'
230 | keywords = ec.getEnableKeywords(target)
231 |
232 | if ec.isEnable(target) and keywords != None and keywords != []:
233 | safe_limit = 6
234 | error_safety = ec.getSafetyCount(target)
235 | for key in keywords:
236 | channel = key['Channel']
237 | limittime = datetime.datetime.strptime(key['Expire_date'], '%Y-%m-%d').date()
238 | if now < limittime:
239 | (results, statuscode) = search_api.searchGitlab(key['KEY'])
240 | result = list(set(results) - set(key['Exclude_list']))
241 | if statuscode != 200:
242 | error_safety += 1
243 | ec.setSafetyCount(target, error_safety)
244 | postdata = '`' + key['KEY'] + '` failed to search in _gitlab_.\nStatus Code: ' + str(statuscode)
245 | master.postAnyData(postdata, channel)
246 | logger.info(postdata)
247 | if error_safety > safe_limit:
248 | postdata = 'Too Many Errors. _Gitlab_ Module is disabled for safety'
249 | ec.disable(target)
250 | master.postAnyData(postdata, channel)
251 | logger.info(postdata)
252 | else:
253 | if error_safety != 0:
254 | ec.setSafetyCount(target, 0)
255 | if key['__INITIAL__'] == True:
256 | ec.haveSearched(target, key['Index'])
257 | if result != []:
258 | postdata = 'New Code Found about `' + key['KEY'] + '` in _gitlab_'
259 | master.postAnyData(postdata, channel)
260 | url = []
261 | for i in result:
262 | url.append('https://gitlab.com' + i)
263 | if key['__INITIAL__'] == True:
264 | master.postAnyData(url[0], channel)
265 | else:
266 | if channel in getSpecialChannel():
267 | doSpecialAct(target, channel, key['KEY'], url)
268 | master.postAnyData('\n'.join(url), channel)
269 | logger.info('keyword : ' + key['KEY'])
270 | logger.info('\n'.join(url))
271 | exclude = results
272 | ec.clearExcludeList(target, key['Index'])
273 | ec.addExcludeList(target, key['Index'], exclude)
274 | time.sleep(30)
275 | else:
276 | postdata = '`' + key['KEY'] + '` is expired in _gitlab_, and disabled.'
277 | master.postAnyData(postdata, channel)
278 | ec.enableKeywordSetting(target, key['Index'], False)
279 | logger.info(postdata)
280 | except:
281 | logger.error('--ERROR HAS OCCURRED IN GITLAB SEARCH--')
282 | logger.error(traceback.format_exc())
283 | master.postAnyData(traceback.format_exc(), slackbot_settings.channels[0])
284 |
285 | def runSearchGitlabSnippets():
286 | try:
287 | logger.info('--START GITLAB SNIPPETS SEARCH--')
288 | now = datetime.date.today()
289 | today = now.strftime('%Y-%m-%d')
290 |
291 | target = 'gitlab_snippet'
292 | keywords = ec.getEnableKeywords(target)
293 |
294 | if ec.isEnable(target) and keywords != None and keywords != []:
295 | safe_limit = 6
296 | error_safety = ec.getSafetyCount(target)
297 | for key in keywords:
298 | channel = key['Channel']
299 | limittime = datetime.datetime.strptime(key['Expire_date'], '%Y-%m-%d').date()
300 | if now > limittime:
301 | postdata = '`' + key['KEY'] + '` is expired in _gitlab_snippet_, and disabled.'
302 | master.postAnyData(postdata, channel)
303 | ec.enableKeywordSetting('gitlab_snippet', key['Index'], False)
304 | logger.info(postdata)
305 | keywords = ec.getEnableKeywords(target)
306 | if keywords != None and keywords != []:
307 | keylist = [d.get('KEY') for d in keywords]
308 | (results, statuscode) = search_api.searchGitlabSnippets(keylist)
309 | if statuscode != 200:
310 | error_safety += 1
311 | ec.setSafetyCount(target, error_safety)
312 | postdata = '_gitlab_snippet_ failed to search.\nStatus Code: ' + str(statuscode)
313 | master.postAnyData(postdata, channel)
314 | logger.info(postdata)
315 | if error_safety > safe_limit:
316 | postdata = 'Too Many Errors. _Gitlab Snippet_ Module is disabled for safety'
317 | ec.disable(target)
318 | master.postAnyData(postdata, channel)
319 | logger.info(postdata)
320 | else:
321 | ec.setSafetyCount(target, 0)
322 | for key in keywords:
323 | if key['KEY'] in results.keys():
324 | result = list(set(results[key['KEY']]) - set(key['Exclude_list']))
325 | if result != []:
326 | channel = key['Channel']
327 | postdata = 'New Code Found about `' + key['KEY'] + '` in _gitlab_snippet_'
328 | master.postAnyData(postdata, channel)
329 | logger.info(postdata)
330 | url = []
331 | for i in result:
332 | url.append('https://gitlab.com' + i)
333 | logger.info('https://gitlab.com' + i)
334 | # exclude = list(set(results[word]) & set(keywords[word][1]))
335 | exclude = results[key['KEY']]
336 | if channel in getSpecialChannel():
337 | doSpecialAct(target, channel, key['KEY'], url)
338 | master.postAnyData('\n'.join(url), channel)
339 | ec.clearExcludeList(target, key['Index'])
340 | ec.addExcludeList(target, key['Index'], exclude)
341 | except:
342 | logger.error('--ERROR HAS OCCURRED IN GITLAB SNIPPETS SEARCH--')
343 | logger.error(traceback.format_exc())
344 | master.postAnyData(traceback.format_exc(), slackbot_settings.channels[0])
345 |
346 | def runSearchPastebin():
347 | logger.info('--START PASTEBIN SEARCH--')
348 | while True:
349 | try:
350 | now = datetime.date.today()
351 | today = now.strftime('%Y-%m-%d')
352 | target = 'pastebin'
353 | keywords = ec.getEnableKeywords(target)
354 |
355 | if ec.isEnable(target) and keywords != None and keywords != []:
356 | safe_limit = 10
357 | error_safety = ec.getSafetyCount(target)
358 | for key in keywords:
359 | channel = key['Channel']
360 | limittime = datetime.datetime.strptime(key['Expire_date'], '%Y-%m-%d').date()
361 | if now > limittime:
362 | postdata = '`' + key['KEY'] + '` is expired in _pastebin_, and disabled.'
363 | master.postAnyData(postdata, channel)
364 | ec.enableKeywordSetting(target, key['Index'], False)
365 | logger.info(postdata)
366 | keywords = ec.getEnableKeywords(target)
367 | if keywords != None and keywords != []:
368 | (pastelist, statuscode) = search_api.getPasteList(100)
369 | if statuscode != 200:
370 | error_safety += 1
371 | ec.setSafetyCount(target, error_safety)
372 | postdata = 'pastebin search failed in _pastebin_.\nStatus Code: ' + str(statuscode)
373 | master.postAnyData(postdata, channel)
374 | logger.info(postdata)
375 | if error_safety == 5:
376 | postdata = 'Pause to access pastebin'
377 | master.postAnyData(postdata, channel)
378 | logger.info(postdata)
379 | time.sleep(300)
380 | if error_safety > safe_limit:
381 | postdata = 'Too Many Errors. _Pastebin_ Module is disabled for safety'
382 | ec.disable('pastebin')
383 | master.postAnyData(postdata, channel)
384 | logger.info(postdata)
385 | else:
386 | searchedpastes = ec.getSearchedPastes()
387 | searchlist = {}
388 | for paste, conf in pastelist.items():
389 | if not paste in searchedpastes:
390 | searchlist[paste] = conf
391 | if len(searchlist.keys()) > 30:
392 | ec.setSearchedPastes(pastelist.keys())
393 | logger.info('The number of scraping pastes is ' + str(len(searchlist.keys())))
394 | keylist = [d.get('KEY') for d in keywords]
395 | (results, statuscode) = search_api.scrapePastebin(keylist, searchlist)
396 | if statuscode != 200:
397 | error_safety += 1
398 | ec.setSafetyCount(target, error_safety)
399 | postdata = 'pastebin search failed in _pastebin_.\nStatus Code: ' + str(statuscode)
400 | master.postAnyData(postdata, channel)
401 | logger.info(postdata)
402 | if error_safety > safe_limit:
403 | postdata = 'Too Many Errors. _Pastebin_ Module is disabled for safety'
404 | ec.disable(target)
405 | master.postAnyData(postdata, channel)
406 | logger.info(postdata)
407 | else:
408 | ec.setSafetyCount(target, 0)
409 | for key in keywords:
410 | if key['KEY'] in results.keys():
411 | if results[key['KEY']] != []:
412 | channel = key['Channel']
413 | postdata = 'New Code Found about `' + key['KEY'] + '` in _pastebin_'
414 | if channel in getSpecialChannel():
415 | doSpecialAct(target, channel, key['KEY'], results[key['KEY']])
416 | master.postAnyData(postdata, channel)
417 | logger.info(postdata)
418 | exclude = results[key['KEY']]
419 | master.postAnyData('\n'.join(results[key['KEY']]), channel)
420 | time.sleep(10)
421 | except:
422 | logger.error('--ERROR HAS OCCURRED IN PASTEBIN SEARCH--')
423 | logger.error(traceback.format_exc())
424 | master.postAnyData(traceback.format_exc(), slackbot_settings.channels[0])
425 | time.sleep(10)
426 |
427 | def runSearchGoogleCustom():
428 | try:
429 | engine_id = slackbot_settings.google_custom_search_engine_id
430 | api_key = slackbot_settings.google_custom_api_key
431 | logger.info('--START GOOGLE CUSTOM SEARCH--')
432 | now = datetime.date.today()
433 | today = now.strftime('%Y-%m-%d')
434 |
435 | target = 'google_custom'
436 | keywords = ec.getEnableKeywords(target)
437 |
438 | if ec.isEnable(target) and keywords != None and keywords != []:
439 | safe_limit = 6
440 | error_safety = ec.getSafetyCount(target)
441 | for key in keywords:
442 | limittime = datetime.datetime.strptime(key['Expire_date'], '%Y-%m-%d').date()
443 | channel = key['Channel']
444 | if now < limittime:
445 | (result, statuscode) = search_api.googleCustomSearch(key['KEY'], engine_id, api_key)
446 | if statuscode != 200:
447 | error_safety += 1
448 | ec.setSafetyCount(target, error_safety)
449 | postdata = '`' + key['KEY'] + '` failed to search in _google_custom_.\nStatus Code: ' + str(statuscode)
450 | master.postAnyData(postdata, channel)
451 | logger.info(postdata)
452 | if error_safety > safe_limit:
453 | postdata = 'Too Many Errors. _Google Custom_ Module is disabled for safety'
454 | ec.disable(target)
455 | master.postAnyData(postdata, channel)
456 | logger.info(postdata)
457 | else:
458 | result_post = list(set(result.keys()) - set(key['Exclude_list']))
459 | ec.setSafetyCount(target, 0)
460 | if key['__INITIAL__'] == True:
461 | ec.haveSearched(target, key['Index'])
462 | if result_post != []:
463 | postdata = 'New Code Found about `' + key['KEY'] + '` in _google_custom_'
464 | master.postAnyData(postdata, channel)
465 | logger.info(postdata)
466 | if key['__INITIAL__'] == True:
467 | result_post = result_post[:1]
468 | for i in result_post:
469 | logger.info(i)
470 | post_code = result[i][0] + '\n' + i + '\n'
471 | if channel in getSpecialChannel():
472 | doSpecialAct(target, channel, key['KEY'], post_code)
473 | master.postAnyData(post_code, channel)
474 | exclude = list(result.keys())
475 | # ec.clearExcludeList('google_custom', conf['Index'])
476 | ec.addExcludeList(target, key['Index'], exclude)
477 | time.sleep(30)
478 | else:
479 | postdata = '`' + key['KEY'] + '` is expired in _google_custom_, and disabled.'
480 | master.postAnyData(postdata, channel)
481 | ec.enableKeywordSetting(target, key['Index'], False)
482 | logger.info(postdata)
483 | except:
484 | logger.error('--ERROR HAS OCCURRED IN GOOGLE CUSTOM SEARCH--')
485 | logger.error(traceback.format_exc())
486 | master.postAnyData(traceback.format_exc(), slackbot_settings.channels[0])
487 |
488 | def filterFeeds(feeds, filter):
489 | filtereditems = []
490 | for f in feeds:
491 | matched = True
492 | for word in filter:
493 | target = '__ALL__'
494 | name = word
495 | pos = word.find('>')
496 | if pos > 0 and len(word) > pos+1:
497 | target = word[:pos].strip()
498 | name = word[pos+1:].strip()
499 | if name.startswith('!'):
500 | name = name[1:].strip()
501 | if target in f.keys():
502 | if type(f[target]) == list:
503 | text = ''.join(map(str, f[target]))
504 | else:
505 | text = f[target]
506 | if text.lower().find(name.lower()) >= 0:
507 | matched = False
508 | else:
509 | text = ''
510 | for i in f.keys():
511 | if type(f[i]) == list:
512 | text += ''.join(map(str, f[i]))
513 | else:
514 | text += f[i]
515 | if text.lower().find(name.lower()) >= 0:
516 | matched = False
517 | else:
518 | if target in f.keys():
519 | if type(f[target]) == list:
520 | text = ''.join(map(str,f[target]))
521 | else:
522 | text = f[target]
523 | if text.lower().find(name.lower()) < 0:
524 | matched = False
525 | else:
526 | text = ''
527 | for i in f.keys():
528 | if type(f[i]) == list:
529 | text += ''.join(map(str, f[i]))
530 | else:
531 | text += f[i]
532 | if text.lower().find(name.lower()) < 0:
533 | matched = False
534 | if matched:
535 | filtereditems.append(f)
536 | return filtereditems
537 |
538 | def runRSSFeeds():
539 | try:
540 | logger.info('--GET NEW RSS FEEDS--')
541 |
542 | target = 'rss_feed'
543 | keywords = ec.getEnableKeywords(target)
544 |
545 | if ec.isEnable(target) and keywords != None and keywords != []:
546 | safe_limit = 6
547 | error_safety = ec.getSafetyCount(target)
548 |
549 | for key in keywords:
550 | channel = key['Channel']
551 | filter = key['Filters']
552 | url = key['URL']
553 | lastpost = key['Last_Post']
554 | initialstate = key['__INITIAL__']
555 | (result, statuscode) = search_api.getRSSFeeds(url, lastpost)
556 | if statuscode != 200:
557 | error_safety += 1
558 | ec.setSafetyCount(target, error_safety)
559 | postdata = '`' + key['Name'] + '` failed to get _RSS_Feeds_.\nStatus Code: ' + str(statuscode)
560 | master.postAnyData(postdata, channel)
561 | logger.info(postdata)
562 | if error_safety > safe_limit:
563 | postdata = 'Too Many Errors. _RSS_Feeds_ Module is disabled for safety'
564 | ec.disable(target)
565 | master.postAnyData(postdata, channel)
566 | logger.info(postdata)
567 | else:
568 | if error_safety != 0:
569 | ec.setSafetyCount(target, 0)
570 | if len(result) > 0:
571 | if initialstate:
572 | result = result[:1]
573 | ec.haveSearched(target, key['Name'])
574 | filteredfeeds = {}
575 | if filter != []:
576 | for f in filter:
577 | c = f['Channel']
578 | w = f['Words']
579 | ff = filterFeeds(result, w)
580 | if ff != []:
581 | if c in filteredfeeds.keys():
582 | filteredfeeds[c] += ff
583 | else:
584 | filteredfeeds[c] = ff
585 | else:
586 | if result != []:
587 | filteredfeeds[channel] = result
588 | lastpost = {'title':result[0]['title'], 'link':result[0]['link'], 'timestamp':result[0]['timestamp']}
589 | ec.setRSSLastPost(key['Name'], lastpost)
590 | if filteredfeeds != {}:
591 | for c, feeds in filteredfeeds.items():
592 | if c in getSpecialChannel():
593 | doSpecialAct(target, c, key['Name'], feeds)
594 | postdata = 'New Feed in `' + key['Name'] + '`'
595 | master.postAnyData(postdata, c)
596 | logger.info(postdata)
597 | postdata = ''
598 | for f in feeds:
599 | postdata = f['title'] + '\n'
600 | postdata += f['link']
601 | logger.info(postdata)
602 | master.postAnyData(postdata, c)
603 | time.sleep(30)
604 | except:
605 | logger.error('--ERROR HAS OCCURRED IN GETTING RSS FEEDS--')
606 | logger.error(traceback.format_exc())
607 | master.postAnyData(traceback.format_exc(), slackbot_settings.channels[0])
608 |
609 | def runTwitterSearch():
610 | try:
611 | logger.info('--START TWITTER SEARCH--')
612 |
613 | target = 'twitter'
614 | keywords = ec.getEnableKeywords(target)
615 |
616 | if ec.isEnable(target) and keywords != None and keywords != []:
617 | safe_limit = 6
618 | error_safety = ec.getSafetyCount(target)
619 |
620 | for key in keywords:
621 | channel = key['Channel']
622 | query = key['Query']
623 | users = key['Users']
624 | lastpost = key['Last_Post']
625 | initialstate = key['__INITIAL__']
626 | (result, statuscode) = search_api.getTweets(users, query, lastpost)
627 | if statuscode != 200:
628 | error_safety += 1
629 | ec.setSafetyCount(target, error_safety)
630 | postdata = '`' + key['Name'] + '` failed to get _Twitter_.\nStatus Code: ' + str(statuscode)
631 | master.postAnyData(postdata, channel)
632 | logger.info(postdata)
633 | if error_safety > safe_limit:
634 | postdata = 'Too Many Errors. _Twitter_ Module is disabled for safety'
635 | ec.disable(target)
636 | master.postAnyData(postdata, channel)
637 | logger.info(postdata)
638 | else:
639 | if error_safety != 0:
640 | ec.setSafetyCount(target, 0)
641 | if len(result) > 0:
642 | if initialstate:
643 | result = result[:1]
644 | ec.haveSearched(target, key['Index'])
645 | lastpost = result[0]
646 | ec.setTwitterLastPost(key['Index'], lastpost)
647 | postdata = 'New Tweets in `' + key['KEY'] + '`'
648 | master.postAnyData(postdata, channel)
649 | logger.info(postdata)
650 | postdata = ''
651 | for tw in result:
652 | postdata = 'https://twitter.com' + tw['link']
653 | postdata += ' (FROM: '+ tw['user'] + ')\n'
654 | postdata += '>>>' + tw['tweet'] + '\n'
655 | logger.info(postdata)
656 | if channel in getSpecialChannel():
657 | doSpecialAct(target, channel, key['KEY'], tw)
658 | master.postAnyData(postdata, channel)
659 | time.sleep(30)
660 | except:
661 | logger.error('--ERROR HAS OCCURRED IN SEARCHING TWITTER--')
662 | logger.error(traceback.format_exc())
663 | master.postAnyData(traceback.format_exc(), slackbot_settings.channels[0])
664 |
665 | def runBot():
666 | bot = Bot()
667 | bot.run()
668 |
669 | class JobConfig(object):
670 | def __init__(self, crontab, job):
671 | self._crontab = crontab
672 | self.job = job
673 |
674 | def schedule(self):
675 | crontab = self._crontab
676 | return datetime.now() + timedelta(seconds=math.ceil(crontab.next()))
677 |
678 | def next(self):
679 | crontab = self._crontab
680 | return math.ceil(crontab.next())
681 |
682 | def job_controller(jobConfig):
683 | while True:
684 | try:
685 | time.sleep(jobConfig.next())
686 | jobConfig.job()
687 | except KeyboardInterrupt:
688 | break
689 |
690 | def main():
691 | parser = argparse.ArgumentParser()
692 | parser.add_argument('--db-host', type=str, default='localhost', help='DATABASE HOST NAME')
693 | parser.add_argument('--db-port', type=int, default=27017, help='DATABASE PORT')
694 | parser.add_argument('--db-name', type=str, default='codescraper-database', help='DATABASE NAME')
695 | args = parser.parse_args()
696 | ec.setDB(args.db_host, args.db_port, args.db_name)
697 |
698 | jobConfigs = []
699 |
700 | try:
701 | slackbot_settings.API_TOKEN
702 | except AttributeError:
703 | print('Slackbot API TOKEN is required')
704 | sys.exit(1)
705 |
706 | start_state = []
707 | runpastebinflag = False
708 |
709 | try:
710 | channels = slackbot_settings.channels
711 | if type(channels) != list or channels == []:
712 | print('Set at least 1 channel')
713 | sys.exit()
714 |
715 | ec.setUsingChannels(channels)
716 |
717 | if slackbot_settings.enable_github_search:
718 | default_github = slackbot_settings.github_default_settings
719 | ret = ec.setDefaultSettings('github', default_github)
720 | if ret:
721 | github_interval = slackbot_settings.github_search_interval
722 | jobConfigs.append(JobConfig(CronTab(github_interval), runSearchGithub))
723 | message = 'Started'
724 | start_state.append(('github', 'SUCCESS', message))
725 | else:
726 | ec.disable('github')
727 | message = 'Default Setting is wrong. Disabled'
728 | start_state.append(('github', 'FAILED', message))
729 | else:
730 | ec.disable('github')
731 |
732 | if slackbot_settings.enable_github_code_search:
733 | default_github_code = slackbot_settings.github_code_default_settings
734 | ret = ec.setDefaultSettings('github_code', default_github_code)
735 | if ret:
736 | github_code_interval = slackbot_settings.github_code_search_interval
737 | jobConfigs.append(JobConfig(CronTab(github_code_interval), runSearchGithubCode))
738 | message = 'Started'
739 | start_state.append(('github_code', 'SUCCESS', message))
740 | else:
741 | ec.disable('github_code')
742 | message = 'Default Setting is wrong. Disabled'
743 | start_state.append(('github_code', 'FAILED', message))
744 | else:
745 | ec.disable('github_code')
746 |
747 | if slackbot_settings.enable_gist_search:
748 | default_gist = slackbot_settings.gist_default_settings
749 | ret = ec.setDefaultSettings('gist', default_gist)
750 | if ret:
751 | gist_interval = slackbot_settings.gist_search_interval
752 | jobConfigs.append(JobConfig(CronTab(gist_interval), runSearchGist))
753 | message = 'Started'
754 | start_state.append(('gist', 'SUCCESS', message))
755 | else:
756 | ec.disable('gist')
757 | message = 'Default Setting is wrong. Disabled'
758 | start_state.append(('gist', 'FAILED', message))
759 | else:
760 | ec.disable('gist')
761 |
762 | if slackbot_settings.enable_gitlab_search:
763 | default_gitlab = slackbot_settings.gitlab_default_settings
764 | ret = ec.setDefaultSettings('gitlab', default_gitlab)
765 | if ret:
766 | gitlab_interval = slackbot_settings.gitlab_search_interval
767 | jobConfigs.append(JobConfig(CronTab(gitlab_interval), runSearchGitlab))
768 | message = 'Started'
769 | start_state.append(('gitlab', 'SUCCESS', message))
770 | else:
771 | ec.disable('gitlab')
772 | message = 'Default Setting is wrong. Disabled'
773 | start_state.append(('gitlab', 'FAILED', message))
774 | else:
775 | ec.disable('gitlab')
776 |
777 | if slackbot_settings.enable_gitlab_snippet_search:
778 | default_gitlab_snippet = slackbot_settings.gitlab_snippet_default_settings
779 | ret = ec.setDefaultSettings('gitlab_snippet', default_gitlab_snippet)
780 | if ret:
781 | gitlab_snippet_interval = slackbot_settings.gitlab_snippet_search_interval
782 | jobConfigs.append(JobConfig(CronTab(gitlab_snippet_interval), runSearchGitlabSnippets))
783 | message = 'Started'
784 | start_state.append(('gitlab_snippet', 'SUCCESS', message))
785 | else:
786 | ec.disable('gitlab_snippet')
787 | message = 'Default Setting is wrong. Disabled'
788 | start_state.append(('gitlab_snippet', 'FAILED', message))
789 | else:
790 | ec.disable('gitlab_snippet')
791 |
792 | if slackbot_settings.enable_pastebin_search:
793 | default_pastebin = slackbot_settings.pastebin_default_settings
794 | ret = ec.setDefaultSettings('pastebin', default_pastebin)
795 | if ret:
796 | runpastebinflag = True
797 | message = 'Started'
798 | start_state.append(('pastebin', 'SUCCESS', message))
799 | else:
800 | ec.disable('pastebin')
801 | message = 'Default Setting is wrong. Disabled'
802 | start_state.append(('pastebin', 'FAILED', message))
803 | else:
804 | ec.disable('pastebin')
805 |
806 | if slackbot_settings.enable_google_custom_search:
807 | slackbot_settings.google_custom_search_engine_id
808 | slackbot_settings.google_custom_api_key
809 | default_google_custom = slackbot_settings.google_custom_default_settings
810 | ret = ec.setDefaultSettings('google_custom', default_google_custom)
811 | if ret:
812 | google_custom_interval = slackbot_settings.google_custom_search_interval
813 | jobConfigs.append(JobConfig(CronTab(google_custom_interval), runSearchGoogleCustom))
814 | message = 'Started'
815 | start_state.append(('google_custom', 'SUCCESS', message))
816 | else:
817 | ec.disable('google_custom')
818 | message = 'Default Setting is wrong. Disabled'
819 | start_state.append(('google_custom', 'FAILED', message))
820 | else:
821 | ec.disable('google_custom')
822 |
823 | if slackbot_settings.enable_rss_feed:
824 | default_channel = slackbot_settings.rss_feed_default_channel
825 | ret = ec.setDefaultSettings('rss_feed', {'Channel':default_channel})
826 | if ret:
827 | rss_interval = slackbot_settings.rss_feed_interval
828 | jobConfigs.append(JobConfig(CronTab(rss_interval), runRSSFeeds))
829 | message = 'Started'
830 | start_state.append(('rss_feed', 'SUCCESS', message))
831 | else:
832 | ec.disable('rss_feed')
833 | message = 'Default Setting is wrong. Disabled'
834 | start_state.append(('rss_feed', 'FAILED', message))
835 | else:
836 | ec.disable('rss_feed')
837 |
838 | if slackbot_settings.enable_twitter:
839 | default_channel = slackbot_settings.twitter_default_channel
840 | ret = ec.setDefaultSettings('twitter', {'Channel':default_channel})
841 | if ret:
842 | twitter_interval = slackbot_settings.twitter_interval
843 | jobConfigs.append(JobConfig(CronTab(twitter_interval), runTwitterSearch))
844 | message = 'Started'
845 | start_state.append(('twitter', 'SUCCESS', message))
846 | else:
847 | ec.disable('twitter')
848 | message = 'Default Setting is wrong. Disabled'
849 | start_state.append(('twitter', 'FAILED', message))
850 | else:
851 | ec.disable('twitter')
852 |
853 | except AttributeError:
854 | print('Something is wrong with slackbot_settings')
855 | sys.exit(1)
856 |
857 | postdata = '---CodeScraper Slackbot Started---\n```'
858 | for m in start_state:
859 | postdata += ' : '.join(m) + '\n'
860 | postdata += '```'
861 | master.postAnyData(postdata, channels[0])
862 | print(postdata)
863 |
864 | if runpastebinflag:
865 | p = Pool(len(jobConfigs) + 2)
866 | else:
867 | p = Pool(len(jobConfigs) + 1)
868 | try:
869 | p.apply_async(runBot)
870 | if runpastebinflag:
871 | p.apply_async(runSearchPastebin)
872 | p.map(job_controller, jobConfigs)
873 | except KeyboardInterrupt:
874 | pass
875 |
876 | if __name__ == "__main__":
877 | main()
878 |
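The filter words passed to `filterFeeds` in run.py follow a small syntax: an optional `field>` prefix restricts the match to one feed field, and a leading `!` on the term negates it. The sketch below restates that rule in isolation. The helper names `parse_filter_word`, `entry_text` and `entry_matches` are hypothetical and not part of CodeScraper, and unlike the original loop this version skips `None` values (for example a missing timestamp) instead of concatenating them.

```python
def parse_filter_word(word):
    """Split one filter word into (field, term, negated)."""
    field, term = '__ALL__', word
    pos = word.find('>')
    if pos > 0 and len(word) > pos + 1:
        field, term = word[:pos].strip(), word[pos + 1:].strip()
    negated = term.startswith('!')
    if negated:
        term = term[1:].strip()
    return field, term, negated


def entry_text(entry, field):
    """Lower-cased text of one field, or of every field if it is absent."""
    values = [entry[field]] if field in entry else list(entry.values())
    parts = []
    for value in values:
        if isinstance(value, list):
            parts.append(''.join(map(str, value)))
        elif value is not None:
            parts.append(str(value))
    return ''.join(parts).lower()


def entry_matches(entry, words):
    """True when every plain term is present and every negated term is absent."""
    for word in words:
        field, term, negated = parse_filter_word(word)
        found = term.lower() in entry_text(entry, field)
        if found == negated:
            return False
    return True


entry = {'title': 'Python release', 'tags': ['news']}
print(entry_matches(entry, ['title>python', '!beta']))
print(entry_matches(entry, ['tags>!news']))
```

Using `in` rather than `find(...) > 0` also treats a term at the very start of the text as a hit, which matters for the negated case.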
--------------------------------------------------------------------------------
/master/search_api.py:
--------------------------------------------------------------------------------
1 | import requests
2 | import lxml.html
3 | import datetime
4 | import time
5 | import json
6 | import os.path
7 | import urllib.parse
8 | import re
9 | import feedparser
10 | from dateutil import parser, tz
11 | #import traceback
12 | from pyquery import PyQuery
13 |
14 | '''def get_request(url, headers, tries, timeout):
15 | try:
16 | if tries > 0:
17 | r = requests.get(url, headers=headers, timeout=timeout)
18 | return r
19 | else:
20 | return None
21 | except requests.exceptions.ConnectTimeout:
22 | time.sleep(1.5)
23 | res = get_request(url, headers, tries-1, timeout)
24 | return res'''
25 |
26 | def searchGithub(word, day, level):
27 | searchlevel = {
28 | 1: ['in:name,description', 'created'],
29 | 2: ['in:name,description,readme', 'created'],
30 | 3: ['in:name,description', 'pushed'],
31 | 4: ['in:name,description,readme', 'pushed']}
32 | github_url = 'https://api.github.com/search/repositories?q='
33 | try:
34 | if word.find(' ') > 0:
35 | word = word.replace(' ', '\" \"')
36 | word = urllib.parse.quote('\"' + word + '\"')
37 | url = github_url + word + '+' + searchlevel[level][0] + '+' + searchlevel[level][1] + ':>' + day + '&sort=updated&order=asc'
38 | headers = {"Accept": "application/vnd.github.mercy-preview+json"}
39 | result = requests.get(url, timeout=10, headers=headers)
40 | statuscode = result.status_code
41 | resultdata = result.json()
42 | codes = []
43 | for a in resultdata['items']:
44 | name = a['full_name']
45 | if a['size'] > 0:
46 | codes.append(name)
47 | return codes, statuscode
48 | except:
49 | return [], -1
50 |
51 | def searchGithubCode(word, level, api_key):
52 | searchlevel = {
53 | 1: 'in:file',
54 | 2: 'in:file,path',
55 | 3: 'in:file,path',
56 | 4: 'in:file,path'}
57 | github_url = 'https://api.github.com/search/code?q='
58 | try:
59 | if word.find(' ') > 0:
60 | word = word.replace(' ', '\" \"')
61 | word = urllib.parse.quote('\"' + word + '\"')
62 | url = github_url + word + '+' + searchlevel[level] + '+sort%3Aindexed'
63 | headers = {"Accept": "application/vnd.github.mercy-preview+json", "Authorization": "token " + api_key}
64 | result = requests.get(url, timeout=10, headers=headers)
65 | statuscode = result.status_code
66 | resultdata = result.json()
67 | codes = []
68 | for a in resultdata['items']:
69 | name = a['html_url']
70 | if level == 3:
71 | if not a['name'].lower().startswith('readme'):
72 | codes.append(name)
73 | else:
74 | codes.append(name)
75 | return codes, statuscode
76 | except:
77 | return [], -1
78 |
79 | def searchGist(word, day):
80 | if word.find(' ') > 0:
81 | word = word.replace(' ', '\" \"')
82 | word = urllib.parse.quote('\"' + word + '\"')
83 | url = 'https://gist.github.com/search?utf8=%E2%9C%93&q=' + word + '+created%3A>' + day + '&ref=searchresults'
84 | try:
85 | result = requests.get(url, timeout=10)
86 | statuscode = result.status_code
87 | root = lxml.html.fromstring(result.text)
88 | codes = []
89 | for a in root.xpath('//div/a[@class="link-overlay"]'):
90 | # name = a.text_content()
91 | link = a.get('href')
92 | codes.append(link)
93 | return codes, statuscode
94 | except:
95 | return [], -1
96 |
97 | def searchGitlab(word):
98 | try:
99 | if word.find(' ') > 0:
100 | word = word.replace(' ', '\" \"')
101 | word = urllib.parse.quote('\"' + word + '\"')
102 | url = 'https://gitlab.com/explore/projects?utf8=%E2%9C%93&name=' + word + '&sort=latest_activity_desc'
103 | result = requests.get(url, timeout=10)
104 | statuscode = result.status_code
105 | root = lxml.html.fromstring(result.text)
106 | codes = []
107 | for a in root.xpath('//div/a[@class="project"]'):
108 | # name = a.text_content()
109 | link = a.get('href')
110 | codes.append(link)
111 | return codes, statuscode
112 | except:
113 | return [], -1
114 |
115 | def searchGitlabSnippets(words):
116 | try:
117 | url = 'https://gitlab.com/explore/snippets'
118 | result = requests.get(url, timeout=10)
119 | statuscode = result.status_code
120 | snippets = []
121 | root = lxml.html.fromstring(result.text)
122 | symbols = r'[!\"#$%&\'()*+,\-./:;<=>@\[\]^_{|}~\\]'
123 | re_symbol = re.compile(symbols)
124 | pattlist = {}
125 | wordlist = {}
126 | for w in words:
127 | if re.search(re_symbol, w):
128 | pattlist[w] = re.compile(w)
129 | else:
130 | wordlist[w] = w.split(' ')
131 | for a in root.xpath('//div[@class="title"]/a'):
132 | name = a.text_content()
133 | link = a.get('href')
134 | snippets.append([name, link])
135 | codes = {}
136 | if statuscode == 200:
137 | for l in snippets:
138 | raw_url = 'https://gitlab.com' + l[1] + '/raw'
139 | raw_result = requests.get(raw_url, timeout=10)
140 | if raw_result.status_code == 200:
141 | for w, patt in pattlist.items():
142 | if re.search(patt, l[0]) or re.search(patt, raw_result.text):
143 | if w in codes.keys():
144 | codes[w].append(l[1])
145 | else:
146 | codes[w] = [l[1]]
147 | for w, patt in wordlist.items():
148 | matched = True
149 | for p in patt:
150 | if l[0].lower().find(p.lower()) < 0 and raw_result.text.lower().find(p.lower()) < 0:
151 | matched = False
152 | if matched:
153 | if w in codes.keys():
154 | codes[w].append(l[1])
155 | else:
156 | codes[w] = [l[1]]
157 | else:
158 | statuscode = raw_result.status_code
159 | break
160 | time.sleep(10)
161 | return codes, statuscode # dict, int
162 | except:
163 | return {}, -1
164 |
165 | def getPasteList(limit):
166 | try:
167 | url = 'https://scrape.pastebin.com/api_scraping.php?limit=' + str(limit)
168 | result = requests.get(url, timeout=10)
169 | statuscode = result.status_code
170 | items = {}
171 | if statuscode == 200:
172 | scrape = result.json()
173 | for item in scrape:
174 | items[item["full_url"]] = [item["title"], item["scrape_url"]]
175 | return items, statuscode # dict, int
176 | except:
177 | return {}, -1
178 |
179 | def scrapePastebin(words, items):
180 | codes = {}
181 | statuscode = -1
182 | symbols = r'[!\"#$%&\'()*+,\-./:;<=>@\[\]^_{|}~\\]'
183 | re_symbol = re.compile(symbols)
184 | try:
185 | pattlist = {}
186 | wordlist = {}
187 | for w in words:
188 | if re.search(re_symbol, w):
189 | if w.startswith('.*'):
190 | w = w[2:]
191 | if w.endswith('.*'):
192 | w = w[:-2]
193 | pattlist[w] = re.compile(w)
194 | else:
195 | wordlist[w] = w.split(' ')
196 | for k,v in items.items():
197 | raw_result = requests.get(v[1], timeout=15)
198 | statuscode = raw_result.status_code
199 | if statuscode == 200:
200 | for w, patt in pattlist.items():
201 | if re.search(patt, v[0]) or re.search(patt, raw_result.text):
202 | if w in codes.keys():
203 | codes[w].append(k)
204 | else:
205 | codes[w] = [k]
206 | for w, patt in wordlist.items():
207 | matched = True
208 | for p in patt:
209 | if v[0].lower().find(p.lower()) < 0 and raw_result.text.lower().find(p.lower()) < 0:
210 | matched = False
211 | if matched:
212 | if w in codes.keys():
213 | codes[w].append(k)
214 | else:
215 | codes[w] = [k]
216 | else:
217 | return {}, statuscode
218 | time.sleep(1.5)
219 | return codes, statuscode # dict, int
220 | except:
221 | return {}, -1
222 |
223 | def googleCustomSearch(word, engine_id, api_key):
224 | try:
225 | if word.find(' ') > 0:
226 | word = word.replace(' ', '\" \"')
227 | word = urllib.parse.quote('\"' + word + '\"')
228 | headers = {"content-type": "application/json"}
229 | url = 'https://www.googleapis.com/customsearch/v1?key=' + api_key + '&rsz=filtered_cse&num=10&hl=en&prettyPrint=false&cx=' + engine_id + '&q=' + word + '&sort=date'
230 | result = requests.get(url, timeout=10, headers=headers)
231 | statuscode = result.status_code
232 | codes = {}
233 | if statuscode == 200:
234 | jsondata = result.json()
235 | if 'items' in jsondata.keys():
236 | for item in jsondata['items']:
237 | name = item['title']
238 | sub = item['snippet']
239 | link = item['link']
240 | codes[link] = [name, sub]
241 | return codes, statuscode
242 | except:
243 | return {}, -1
244 |
245 | def parseRSS(items):
246 | parseddata = []
247 | for item in items:
248 | data = {
249 | 'link' : item['link']
250 | }
251 | if 'title' in item.keys():
252 | data['title'] = item['title']
253 | if 'summary' in item.keys():
254 | data['summary'] = item['summary']
255 | if 'updated' in item.keys() and item['updated'] != '':
256 | dt = parser.parse(item['updated'])
257 | if dt.tzinfo == None:
258 | data['timestamp'] = dt.strftime('%Y-%m-%d %H:%M:%S')
259 | else:
260 | data['timestamp'] = dt.astimezone(tz.tzutc()).strftime('%Y-%m-%d %H:%M:%S')
261 | elif 'published' in item.keys() and item['published'] != '':
262 | dt = parser.parse(item['published'])
263 | if dt.tzinfo == None:
264 | data['timestamp'] = dt.strftime('%Y-%m-%d %H:%M:%S')
265 | else:
266 | data['timestamp'] = dt.astimezone(tz.tzutc()).strftime('%Y-%m-%d %H:%M:%S')
267 | else:
268 | data['timestamp'] = None
269 | taglist = []
270 | if 'tags' in item.keys():
271 | for tag in item['tags']:
272 | taglist.append(tag['term'])
273 | data['tags'] = taglist
274 | contents = []
275 | if 'content' in item.keys():
276 | for c in item['content']:
277 | content = (c['type'], c['value'])
278 | contents.append(content)
279 | data['contents'] = contents
280 | parseddata.append(data)
281 | return parseddata
282 |
283 | def getRSSFeeds(url, lastpost):
284 | try:
285 | headers={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:50.0) Gecko/20100101 Firefox/50.0'}
286 | response = requests.get(url, timeout=10, headers=headers)
287 | updateditems = []
288 | statuscode = response.status_code
289 | if statuscode == 200:
290 | rss = feedparser.parse(response.text)
291 | result = parseRSS(rss['entries'])
292 | for entry in result:
293 | if entry['link'] == lastpost['link']:
294 | break
295 | else:
296 | if entry['timestamp'] != None and lastpost['timestamp'] != None:
297 | if datetime.datetime.strptime(entry['timestamp'], '%Y-%m-%d %H:%M:%S') < datetime.datetime.strptime(lastpost['timestamp'], '%Y-%m-%d %H:%M:%S'):
298 | break
299 | updateditems.append(entry)
300 | return updateditems, statuscode
301 | except:
302 | return [], -1
303 |
304 | def getTweets(users, word, lastpost):
305 | try:
306 | query = ''
307 | if word.strip() != '':
308 | query += word
309 | if len(users) == 1:
310 | query += ' from:' + users[0]
311 | elif len(users) > 1:
312 | query += ' from:' + ' OR from:'.join(users)
313 | query = urllib.parse.quote_plus(query)
314 | url = 'https://twitter.com/i/search/timeline?f=tweets&q={query}&src=typd'.format(query=query)
315 | headers = {
316 | 'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:50.0) Gecko/20100101 Firefox/50.0',
317 | 'Accept':"application/json, text/javascript, */*; q=0.01",
318 | 'Accept-Language':"de,en-US;q=0.7,en;q=0.3",
319 | 'X-Requested-With':"XMLHttpRequest",
320 | 'Referer':url,
321 | 'Connection':"keep-alive"
322 | }
323 | response = requests.get(url, headers=headers, timeout=10)
324 | statuscode = response.status_code
325 | tweetslist = []
326 | new_tweets = []
328 | if statuscode == 200:
329 | json_response = response.json()
330 | if json_response['items_html'].strip() != '':
331 | scraped_tweets = PyQuery(json_response['items_html'])
332 | scraped_tweets.remove('div.withheld-tweet')
333 | tweets = scraped_tweets('div.js-stream-tweet')
334 | if len(tweets) != 0:
335 | for tweet_html in tweets:
336 | t = {}
337 | tweetPQ = PyQuery(tweet_html)
338 | t['user'] = tweetPQ("span:first.username.u-dir b").text()
339 | txt = re.sub(r"\s+", " ", tweetPQ("p.js-tweet-text").text())
340 | txt = txt.replace('# ', '#')
341 | txt = txt.replace('@ ', '@')
342 | t['tweet'] = txt
343 | t['id'] = tweetPQ.attr("data-tweet-id")
344 | t['link'] = tweetPQ.attr("data-permalink-path")
345 | t['timestamp'] = int(tweetPQ("small.time span.js-short-timestamp").attr("data-time"))
346 | tweetslist.append(t)
347 | for tw in tweetslist:
348 | if tw['id'] == lastpost['id']:
349 | break
350 | if 'timestamp' in tw.keys() and 'timestamp' in lastpost.keys():
351 | if tw['timestamp'] < lastpost['timestamp']:
352 | break
353 | new_tweets.append(tw)
354 | return new_tweets, statuscode
355 | except:
356 | return [], -1
357 |
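`searchGitlabSnippets` and `scrapePastebin` in search_api.py share one keyword rule: a keyword containing any punctuation symbol is compiled as a regular expression, while a plain keyword is split on spaces and every part must appear (case-insensitively) in the title or the body. The sketch below isolates that rule. The names `compile_keywords` and `match_keywords` are hypothetical helpers for illustration and do not exist in the repository.

```python
import re

# Same symbol class the module uses to decide "this keyword is a regex".
SYMBOLS = re.compile(r'[!"#$%&\'()*+,\-./:;<=>@\[\]^_{|}~\\]')


def compile_keywords(words):
    """Split keywords into regex patterns and plain word lists."""
    patterns, wordlists = {}, {}
    for w in words:
        if SYMBOLS.search(w):
            patterns[w] = re.compile(w)
        else:
            wordlists[w] = w.lower().split(' ')
    return patterns, wordlists


def match_keywords(title, body, patterns, wordlists):
    """Return the keywords that hit the given title or body."""
    hits = []
    for w, patt in patterns.items():
        if patt.search(title) or patt.search(body):
            hits.append(w)
    # A part counts as present if it occurs in the title or in the body.
    haystack = (title + '\n' + body).lower()
    for w, parts in wordlists.items():
        if all(p in haystack for p in parts):
            hits.append(w)
    return hits


patterns, wordlists = compile_keywords(['example.com', 'secret token'])
print(match_keywords('Dump', 'my SECRET api Token at exampleXcom', patterns, wordlists))
```

Because `example.com` is treated as a regex, the unescaped `.` matches any character; a keyword meant literally would need `example\.com`.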
--------------------------------------------------------------------------------
/master/settings/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/blue1616/CodeScraper/5e27c51ea645c7fb48fc3caaf290a75c8ee067ec/master/settings/.gitignore
--------------------------------------------------------------------------------
/master/slackbot_settings.py.sample:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 |
3 | # API Token for Slackbot
4 | # https://api.slack.com/bot-users
5 | # CodeScraper uses lins05/slackbot (https://github.com/lins05/slackbot)
6 | API_TOKEN = "XXXX-XXXXXXXXXXXX-XXXXXXXXXXXXXXXXXXXXXXXX"
7 |
8 | default_reply = "What do you mean?"
9 |
10 | # Do not edit
11 | PLUGINS = [
12 | 'plugins',
13 | ]
14 |
15 | # Slack channels the slackbot participates in
16 | # Create the channels and invite your slackbot to them
17 | channels = [
18 | "codescraper",
19 | "test_channel"]
20 |
21 | # You should change 'channel' name to one of the above
22 | github_default_channel = channels[0]
23 | gist_default_channel = channels[0]
24 | github_code_default_channel = channels[0]
25 | gitlab_default_channel = channels[0]
26 | gitlab_snippet_default_channel = channels[0]
27 | pastebin_default_channel = channels[0]
28 | google_custom_default_channel = channels[0]
29 | rss_feed_default_channel = channels[0]
30 | twitter_default_channel = channels[0]
31 |
32 | # Default Setting of Search Keywords
33 | github_default_settings = {
34 | 'Enable':True,
35 | 'SearchLevel':2,
36 | 'Time_Range':2,
37 | 'Expire_date':180,
38 | 'Channel': github_default_channel
39 | }
40 |
41 | gist_default_settings = {
42 | 'Enable':True,
43 | 'Time_Range':2,
44 | 'Expire_date':180,
45 | 'Channel':gist_default_channel
46 | }
47 |
48 | github_code_default_settings = {
49 | 'Enable':True,
50 | 'SearchLevel':2,
51 | 'Expire_date':180,
52 | 'Channel':github_code_default_channel
53 | }
54 |
55 | gitlab_default_settings = {
56 | 'Enable':True,
57 | 'Expire_date':180,
58 | 'Channel':gitlab_default_channel
59 | }
60 |
61 | gitlab_snippet_default_settings = {
62 | 'Enable':True,
63 | 'Expire_date':180,
64 | 'Channel':gitlab_snippet_default_channel
65 | }
66 |
67 | pastebin_default_settings = {
68 | 'Enable':True,
69 | 'Expire_date':180,
70 | 'Channel':pastebin_default_channel
71 | }
72 |
73 | google_custom_default_settings = {
74 | 'Enable':True,
75 | 'Expire_date':180,
76 | 'Channel':google_custom_default_channel
77 | }
78 |
79 | # github_access_token is required for searching github_code (not needed for github and gist search)
80 | # Access : https://github.com/settings/tokens
81 | github_access_token = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
82 |
83 | # Google Custom Search requires an API key and a search engine ID.
84 | # Google Custom Search API : https://developers.google.com/custom-search/v1/overview
85 | # Access here : https://console.developers.google.com/
86 | google_custom_api_key = 'XXXXXXXXXXXXXXXXXXXXXX-XXXXXXXXXXXXXXXX'
87 | google_custom_search_engine_id = 'XXXXXXXXXXXXXXXXXXXXX:XXXXXXXXXXX'
88 |
89 | # Enable Modules
90 | enable_github_search = True
91 | enable_github_code_search = False
92 | enable_gist_search = True
93 | enable_gitlab_search = True
94 | enable_gitlab_snippet_search = False
95 | # Pastebin Search requires a PRO account, and you must ask Pastebin to whitelist your IP
96 | enable_pastebin_search = False
97 | enable_google_custom_search = False
98 | enable_rss_feed = True
99 | enable_twitter = True
100 |
101 | # Interval (Write in Crontab Format)
102 | github_search_interval = "28 */1 * * *"
103 | github_code_search_interval = "48 */1 * * *"
104 | gist_search_interval = "02 */2 * * *"
105 | gitlab_search_interval = "13 */2 * * *"
106 | gitlab_snippet_search_interval = "43 */2 * * *"
107 | #pastebin_search_interval = "*/1 * * * *"
108 | # --*ATTENTION*--
109 | # free google api allows 100 requests per day
110 | # if you set the search interval to every 2 hours, you can register at most 8 search candidates (12 * 8 = 96 reqs)
111 | google_custom_search_interval = "53 */2 * * *"
112 | rss_feed_interval = "07 */1 * * *"
113 | twitter_interval = "14 */1 * * *"
114 |
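The quota note above `google_custom_search_interval` is simple arithmetic: runs per day times registered keywords must stay within the free limit of 100 requests per day. A minimal sketch of that calculation follows. `runs_per_day` and `max_keywords` are hypothetical helpers, not part of CodeScraper, and the parser only understands the shape used in this sample (a fixed minute and an hour field of `*` or `*/N`), not full crontab syntax.

```python
def runs_per_day(cron_expr):
    """Count daily runs for expressions like '53 */2 * * *'."""
    fields = cron_expr.split()
    if len(fields) != 5:
        raise ValueError('expected 5 crontab fields')
    minute, hour = fields[0], fields[1]
    if not minute.isdigit():
        raise ValueError('only a fixed minute is supported in this sketch')
    if hour == '*':
        return 24
    if hour.startswith('*/') and hour[2:].isdigit() and int(hour[2:]) > 0:
        # '*/N' fires at hours 0, N, 2N, ... below 24.
        return len(range(0, 24, int(hour[2:])))
    raise ValueError('unsupported hour field: ' + hour)


def max_keywords(cron_expr, daily_quota=100):
    """Largest number of keywords that fits in the daily request quota."""
    return daily_quota // runs_per_day(cron_expr)


print(runs_per_day("53 */2 * * *"))   # 12 runs per day
print(max_keywords("53 */2 * * *"))   # 8 keywords (12 * 8 = 96 requests)
```

With an hourly interval such as `"53 */1 * * *"` the same quota leaves room for only 4 keywords.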
--------------------------------------------------------------------------------
/master/startbot.sh:
--------------------------------------------------------------------------------
1 | #!/bin/sh
2 |
3 | if [ -z "$DB_HOST" ]; then
4 | echo "DB_HOST is empty"
5 | exit 1
6 | fi
7 |
8 | if [ -z "$DB_PORT" ]; then
9 | echo "DB_PORT is empty"
10 | exit 1
11 | fi
12 |
13 | if [ -z "$DB_NAME" ]; then
14 | DB_NAME=codescraper-database
15 | fi
16 |
17 | python3 run.py --db-host="$DB_HOST" --db-port="$DB_PORT" --db-name="$DB_NAME"
18 |
--------------------------------------------------------------------------------
/requirements:
--------------------------------------------------------------------------------
1 | slackbot==0.5.3
2 | lxml==4.2.4
3 | crontab==0.22.2
4 | feedparser==5.2.1
5 | python-dateutil==2.7.5
6 | pymongo==3.7.2
7 | pyquery==1.4.0
8 |
--------------------------------------------------------------------------------