├── .dockerignore ├── Dockerfile ├── LICENSE ├── README.md ├── README_ja.md ├── docker-compose.yml ├── img ├── codescraper-1.png └── codescraper-2.png ├── master ├── log │ └── .gitignore ├── master_post.py ├── plugins │ ├── __init__.py │ ├── edit_conf_db.py │ └── getCommand.py ├── run.py ├── search_api.py ├── settings │ └── .gitignore ├── slackbot_settings.py.sample └── startbot.sh └── requirements /.dockerignore: -------------------------------------------------------------------------------- 1 | .git/ 2 | master/__pycache__ 3 | master/plugins/__pycache__ 4 | img/* 5 | LICENSE 6 | README* 7 | -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | FROM python:3.7-alpine 2 | MAINTAINER blue1616 3 | 4 | COPY requirements /root/requirements 5 | 6 | RUN apk upgrade --no-cache && \ 7 | apk add --no-cache build-base && \ 8 | apk add --no-cache libxml2-dev libxslt-dev && \ 9 | pip install -r /root/requirements && \ 10 | apk del build-base 11 | 12 | RUN addgroup -g 1000 codescraper && \ 13 | adduser -D -u 1000 -G codescraper codescraper && \ 14 | mkdir -p /home/codescraper/ 15 | 16 | COPY ./master /home/codescraper/master 17 | RUN chown -R codescraper:codescraper /home/codescraper && \ 18 | chmod +x /home/codescraper/master/startbot.sh 19 | 20 | USER codescraper 21 | WORKDIR /home/codescraper/master 22 | 23 | #CMD ["python", "/home/codescraper/master/run.py"] 24 | CMD ["/home/codescraper/master/startbot.sh"] 25 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 blueblue 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including 
without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # CodeScraper 2 | 3 | ## Description 4 | CodeScraper is a Slackbot that searches sites such as Github, Gitlab, and Pastebin for pre-registered keywords and notifies you when it finds new results. 5 | 6 | Currently the following functions are implemented. 7 | 8 | |Function|Description|Notes| 9 | |---|---|---| 10 | |github|Search Github and find new repositories|| 11 | |gist|Search Github and find new Gists|| 12 | |github_code|Run a Github code search<br>Results depend on Github indexing|Github API token is required| 13 | |gitlab|Search Gitlab and find new projects|| 14 | |gitlab_snippet|Scrape new posts on Gitlab Snippets|Regular expressions are supported| 15 | |google_custom|Search with Google Custom Search and find web pages|Create a search engine and set your Engine ID and API token.<br>The free Google API is limited to 100 requests per day| 16 | |pastebin|Scrape Pastebin with the Scraping API|Pastebin PRO account (paid account) is required.<br>Regular expressions are supported| 17 | |rss_feed|Get RSS feeds<br>Feeds can be filtered by specific words|| 18 | |twitter|Search Twitter and find new tweets|| 19 | 20 | ## Requirements 21 | CodeScraper is intended to run with Docker. 22 | 23 | Otherwise you need MongoDB, Python 3, and the following libraries: 24 | - slackbot (https://github.com/lins05/slackbot) 25 | - lxml 26 | - crontab 27 | - feedparser 28 | - python-dateutil 29 | - pymongo 30 | - pyquery 31 | 32 | ## Install 33 | Using Docker 34 | 35 | ```sh 36 | docker-compose build 37 | ``` 38 | 39 | Without using Docker 40 | 41 | ```sh 42 | pip3 install -r requirements 43 | ``` 44 | 45 | ## Usage 46 | ### Run CodeScraper 47 | You need slackbot_settings.py. 48 | Create your settings file. 49 | 50 | ```sh 51 | mv ./master/slackbot_settings.py.sample ./master/slackbot_settings.py 52 | vim ./master/slackbot_settings.py 53 | ``` 54 | 55 | Edit your config file 56 | - Required settings 57 | - API_TOKEN (Line 6) 58 | - Log in to Slack, open [this page](https://my.slack.com/services/new/bot), and create your Slackbot. 59 | - Set your Slackbot API Token 60 | - channels(Line 17-19) 61 | - Set the Slack channels that your Slackbot joins 62 | - At least one channel must be listed 63 | - Enable Targets(Line 90-99) 64 | - Select whether to enable each search target (True|False) 65 | - Separate settings are required to activate the following targets 66 | - github_code : github_access_token must be set 67 | - pastebin : A Pastebin PRO account (paid account) and a static IP are required. You have to whitelist your static IP to use the Scraping API 68 | - google_custom : google_custom_api_key and google_custom_search_engine_id must be set 69 | - default_channel (Line 22-30) 70 | - Channel to which the search results of each target are sent 71 | - It can be changed for each search keyword 72 | - If a string not listed in 'channels' is set, the first channel in 'channels' is used. 
73 | 74 | - Optional settings 75 | - github_access_token(Line 81) 76 | - Required if github_code is enabled 77 | - Get it from [here](https://github.com/settings/tokens) 78 | - google_custom_api_key(Line 86) 79 | - Required if google_custom is enabled 80 | - Get it from [here](https://console.developers.google.com/) 81 | - google_custom_search_engine_id(Line 87) 82 | - Required if google_custom is enabled 83 | - Get it from [here](https://console.developers.google.com/) 84 | - Interval(Line 102-113) 85 | - Set the search execution time for each search target 86 | - Set in crontab format 87 | - default_settings(Line 33-77) 88 | - Default settings of each target 89 | - The contents of each item are as shown in the table below 90 | 91 | |Item|Description| 92 | |---|---| 93 | |Enable|Set whether a keyword is enabled (True or False)| 94 | |SearchLevel|Set the search range in github (1, 2, 3, or 4) or github_code (1 or 2). Larger numbers give more results| 95 | |Time_Range|Set the search time range in github or gist. Items created within the set number of days before the search date are searched| 96 | |Expire_date|Set the keyword expiration date. The expiration date is the number of days set here after the date of registration. Keywords that have expired are disabled| 97 | |Exclude_list|Notification exclusion list. You do not need to set this, because the scripts rewrite it automatically.| 98 | |Channel|Set the channel to be notified| 99 | 100 | Run with docker-compose. 101 | 102 | ```sh 103 | docker-compose up -d 104 | ``` 105 | 106 | After a successful launch, Slack receives the following notification 107 | > ---CodeScraper Slackbot Started--- 108 | > 109 | > github : SUCCESS : Started
110 | > gist : SUCCESS : Started
111 | > gitlab : SUCCESS : Started
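
The `default_channel` fallback rule from the required settings above (a name that is not listed in `channels` resolves to the first listed channel) can be sketched as follows. This is a minimal illustration, not the project's code: the helper `resolve_channel`, the channel names, and the assumption that `default_channel` is a dict keyed by search target are all hypothetical.

```python
def resolve_channel(requested, channels):
    """Return `requested` if it is listed in `channels`, else the first listed channel."""
    if not channels:
        # The settings require at least one channel to be listed.
        raise ValueError("at least one channel must be listed")
    return requested if requested in channels else channels[0]


# Hypothetical values; the real ones live in slackbot_settings.py.
channels = ["bot_test", "alerts"]
default_channel = {"github": "alerts", "gist": "not_a_listed_channel"}

# Resolve the notification channel for every target.
resolved = {target: resolve_channel(name, channels)
            for target, name in default_channel.items()}
print(resolved)  # {'github': 'alerts', 'gist': 'bot_test'}
```

Listing the preferred fallback channel first in `channels` keeps a mistyped channel name from silently dropping notifications.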
112 | 113 | ### CodeScraper Commands 114 | Search keywords are managed with commands sent to the Slackbot 115 | 116 | First, the following command displays the help 117 | 118 | ``` 119 | @{Slackbot name} help: 120 | ``` 121 | 122 | ``` 123 | Command Format is Following: 124 | {Command}: {target}; {arg1}; {arg2}; ... 125 | 126 | Command List: 127 | 128 | 'setKeyword: target; [word]' Add [word] as New Search Keyword with Default Settings. 129 | (abbreviation=setK:) 130 | 'removeKeyword: target; [index]'tRemove the Search Keyword indicated by [index]. 131 | (abbreviation=removeK:) 132 | 'enableKeyword: target; [index]' Enable the Search Keyword indicated by [index]. 133 | (abbreviation=enableK:) 134 | 'disableKeyword: target; [index]' Disable the Search Keyword indicated by [index]. 135 | (abbreviation=disableK:) 136 | 'setSearchLevel: target; [index]' Set Search Level of Github Search (1:easily 2:) indicated by [index]. It is used in github and github_code. 137 | (abbreviation=setSL:) 138 | 'setExpireDate: target; [index]; [expiration date]' Set a Expiration Date of the Keyword indicated by [index]. [expiration date] Format is YYYY-mm-dd. 139 | (abbreviation=setED:) 140 | 'setChannel: target; [index];[channel]' Set channel to notify the Search Keyword's result. 141 | (abbreviation=setC:) 142 | 'getKeyword: target;' Listing Enabled Search Keywords. 143 | (abbreviation=getK:) 144 | 'getAllKeyword: target;' Listing All Search Keyword (include Disabled Keywords). 145 | (abbreviation=getAllK:) 146 | 'getSearchSetting: target; [index]' Show Setting of the Search Keyword indicated by [index]. 147 | (abbreviation=getSS:) 148 | 149 | 'reMatchTest: target; [index]; [text]' Check wheaer the pattern indicated by [index] in [target] matches [text]. If set pattern to Pastebin ID, check the contens of pastebin. 150 | 'setFeed: [name]; [url]' Add RSS Feed to [url] as [name]. 151 | (abbreviation=setF:) 152 | 'setFeedFilter: [name]; [filter]' Add new RSS Feed Filter. Notily only contains filter words. 
153 | (abbreviation=setFF:) 154 | 'editFeedFilter: [name]; [index]; filter' Edit Feed Filter indicated by [index] in RSS Feed of [name]. 155 | (abbreviation=editFF:) 156 | 'removeFeedFilter: [name]; [index];' Remove Feed Filter indicated by [index] in RSS Feed of [name]. 157 | (abbreviation=removeFF:) 158 | 'setTwitterQuery: [query]; ([users];)' Set [query] with Default Settings. If set [users], notify only from these users. 159 | (abbreviation=setTQ:) 160 | 'editTwitterQuery: [index]; [query]; ([users];)' Edit Twitter Query indicated by [index]. 161 | (abbreviation=editTQ:) 162 | 'addUserTwitterQuery: [index]; [users];' Add User to Twitter Query indicated by [index]. That query notify only from these users. 163 | (abbreviation=addUserTQ:) 164 | 'removeTwitterQuery: [index];' Remove Twitter Query indicated by [index]. 165 | (abbreviation=removeTQ:) 166 | 167 | 'help:' Show this Message. 168 | 169 | Target: 170 | github 171 | gist 172 | github_code 173 | gitlab 174 | gitlab_snippet (Use RE match) 175 | google_custom 176 | pastebin (Use RE match) 177 | rss_feed 178 | twitter 179 | ``` 180 | 181 | Register keywords by sending commands to the Slackbot. 182 | 183 | The commands are listed below. 184 | See the 'help:' command for specific usage. 
185 | 186 | |Command|Description|Search targets| 187 | |---|---|---| 188 | |setKeyword:|Register a new search keyword|github, gist, github_code, gitlab, gitlab_snippet, google_custom, pastebin| 189 | |removeKeyword:|Remove the specified search keyword|github, gist, github_code, gitlab, gitlab_snippet, google_custom, pastebin, twitter| 190 | |enableKeyword:|Enable the specified search keyword|all| 191 | |disableKeyword:|Disable the specified search keyword|all| 192 | |setSearchLevel:|Set the search range of the specified search keyword|github, github_code| 193 | |setExpireDate:|Set the expiration date of the specified search keyword|github, gist, github_code, gitlab, gitlab_snippet, google_custom, pastebin| 194 | |setChannel:|Set the channel notified for the specified search keyword|all| 195 | |getKeyword:|Display the list of enabled keywords|all| 196 | |getAllKeyword:|Display the list of all registered keywords|all| 197 | |getSearchSetting:|Display the current settings of the specified search keyword|all| 198 | |setFeed:|Add a new feed to rss_feed|-| 199 | |setFeedFilter:|Set a new Feed Filter|-| 200 | |editFeedFilter:|Edit the specified Feed Filter|-| 201 | |removeFeedFilter:|Remove the specified Feed Filter|-| 202 | |setTwitterQuery:|Set a new twitter search query|-| 203 | |editTwitterQuery:|Edit the specified twitter search query|-| 204 | |addUserTwitterQuery:|Add user criteria to the specified twitter search query|-| 205 | |removeTwitterQuery:|Remove the specified twitter search query|-| 206 | |help:|Display help|-| 207 | 208 | An index is assigned to each search keyword. To change a setting, specify the index. 209 | The index of each keyword is shown when the keyword is registered and by the 'getKeyword:' command. 210 | 211 | ![CodeScraper-1](img/codescraper-1.png) 212 | 213 | Notification on finding a new Github repository 214 | 215 | ![CodeScraper-2](img/codescraper-2.png) 216 | 217 | ### Notices 218 | - Pastebin 219 | - Pastebin PRO Account(Paid Account) is required. 
220 | - Open [this page](https://pastebin.com/doc_scraping_api) and whitelist the static IP of the host that runs CodeScraper 221 | - A static IP is required. 222 | - Pastebin, Gitlab Snippet 223 | - Keywords containing symbols are searched as regular expressions (e.g. `[a-z1-7]{16}\.onion`, `example.com`). These searches are case sensitive 224 | - Keywords with no symbols are matched case-insensitively. 225 | - Long regular expressions and patterns that take time to process can put a load on the CPU. It is recommended to avoid them. 226 | - Google Custom Search 227 | - The free Google API is limited to 100 requests per day 228 | - If you set the search interval to every 2 hours, the search runs 12 times per day, so you can register at most 8 search keywords (12 * 8 = 96 requests). Be aware that the search frequency limits the number of keywords 229 | 230 | ## Author 231 | [blueblue](https://twitter.com/piedpiper1616) 232 | -------------------------------------------------------------------------------- /README_ja.md: -------------------------------------------------------------------------------- 1 | # CodeScraper 2 | 3 | ## Overview 4 | A Slackbot that searches sites such as Github, Gitlab, and Pastebin with pre-registered keywords and notifies you when it finds something new 5 | 6 | The following functions are currently implemented 7 | 8 | |Name|Description|Notes| 9 | |---|---|---| 10 | |github|Searches for new Github repositories|| 11 | |gist|Searches for new Github Gist posts|| 12 | |github_code|Runs a Github code search<br>Accuracy depends on Github's index|The Github API is required| 13 | |gitlab|Searches Gitlab|| 14 | |gitlab_snippet|Scrapes new posts on Gitlab Snippets|Keywords can be registered as regular expressions| 15 | |google_custom|Searches using Google Custom Search|You need to create a search engine in advance and set its Engine ID and API Token<br>The free tier is limited to 100 requests per day| 16 | |pastebin|Scrapes Pastebin using the Pastebin Scraping API|A Pastebin PRO Account (paid) is required<br>Keywords can be registered as regular expressions| 17 | |rss_feed|Fetches RSS feeds<br>Notified feeds can be filtered by specific words|| 18 | |twitter|Searches Twitter|| 19 | 20 | 21 | ## Requirements 22 | It is intended to be run with Docker 23 | 24 | Otherwise, you need MongoDB, Python 3, and the following libraries 25 | - slackbot (https://github.com/lins05/slackbot) 26 | - lxml 27 | - crontab 28 | - feedparser 29 | - python-dateutil 30 | - pymongo 31 | - pyquery 32 | 33 | ## Install 34 | With Docker 35 | 36 | ```sh 37 | docker-compose build 38 | ``` 39 | 40 | Without Docker 41 | 42 | ```sh 43 | pip3 install -r requirements 44 | ``` 45 | 46 | ## Usage 47 | ### Starting CodeScraper 48 | slackbot_settings.py is required to use CodeScraper 49 | Create this file first 50 | 51 | ```sh 52 | mv ./master/slackbot_settings.py.sample ./master/slackbot_settings.py 53 | vim ./master/slackbot_settings.py 54 | ``` 55 | 56 | Edit the settings file 57 | - Required settings 58 | - API_TOKEN (Line 6) 59 | - Log in to Slack, open [this page](https://my.slack.com/services/new/bot), and create a bot 60 | - Enter the API Token of the Slackbot you created 61 | - channels(Line 17-19) 62 | - Specify the Slack channels the Slackbot should post notifications to 63 | - Have the Slackbot created above join the channels listed here beforehand 64 | - At least one channel must be listed 65 | - Enable Targets(Line 90-99) 66 | - Select which search targets to enable (True|False) 67 | - The following targets require additional settings to be enabled 68 | - github_code : github_access_token must be set 69 | - pastebin : A Pastebin PRO account (paid) and a static IP are required. 
After purchase, you need to whitelist with Pastebin the static IP that will use the Scraping API 70 | - google_custom : google_custom_api_key and google_custom_search_engine_id must be set 71 | - default_channel(Line 22-30) 72 | - The channel to which the search results of each target are sent 73 | - It can also be changed per search keyword 74 | - If a string not listed in channels is given, the first channel in channels is used 75 | 76 | - Optional settings 77 | - github_access_token(Line 81) 78 | - Required when github_code is enabled 79 | - Get it from [here](https://github.com/settings/tokens) 80 | - google_custom_api_key(Line 86) 81 | - Required when google_custom is enabled 82 | - Get it from [here](https://console.developers.google.com/) 83 | - google_custom_search_engine_id(Line 87) 84 | - Required when google_custom is enabled 85 | - Get it from [here](https://console.developers.google.com/) 86 | - Interval(Line 102-113) 87 | - Sets when the search runs for each search target 88 | - Written in crontab format 89 | - default_settings(Line 33-77) 90 | - Default settings applied when a keyword is registered for each search target 91 | - Each item is described in the table below 92 | 93 | |Item|Description| 94 | |---|---| 95 | |Enable|Sets whether the keyword is enabled (True or False)| 96 | |SearchLevel|Sets the search range for the github (1, 2, 3, or 4) and github_code (1 or 2) targets. Larger numbers give more results| 97 | |Time_Range|Sets the number of days of the search range for the github and gist targets. Items created within the set number of days before the search date are searched| 98 | |Expire_date|Sets the keyword expiration period. The expiration date is the set number of days after the registration date. Expired keywords are disabled automatically| 99 | |Channel|Sets the channel to notify. No change is needed if default_channel is set| 100 | 101 | Start with docker-compose 102 | 103 | ```sh 104 | docker-compose up -d 105 | ``` 106 | 107 | After a successful start, Slack receives a notification like the following 108 | > ---CodeScraper Slackbot Started--- 109 | > 110 | > github : SUCCESS : Started
111 | > gist : SUCCESS : Started
112 | > gitlab : SUCCESS : Started
113 | 114 | ### CodeScraper Commands 115 | Search keywords are managed with commands sent through the Slackbot 116 | 117 | First, sending the following command to the Slackbot displays the command help 118 | 119 | ``` 120 | @{Slackbot name} help: 121 | ``` 122 | 123 | ``` 124 | Command Format is Following: 125 | {Command}: {target}; {arg1}; {arg2}; ... 126 | 127 | Command List: 128 | 129 | 'setKeyword: target; [word]' Add [word] as New Search Keyword with Default Settings. 130 | (abbreviation=setK:) 131 | 'removeKeyword: target; [index]'tRemove the Search Keyword indicated by [index]. 132 | (abbreviation=removeK:) 133 | 'enableKeyword: target; [index]' Enable the Search Keyword indicated by [index]. 134 | (abbreviation=enableK:) 135 | 'disableKeyword: target; [index]' Disable the Search Keyword indicated by [index]. 136 | (abbreviation=disableK:) 137 | 'setSearchLevel: target; [index]' Set Search Level of Github Search (1:easily 2:) indicated by [index]. It is used in github and github_code. 138 | (abbreviation=setSL:) 139 | 'setExpireDate: target; [index]; [expiration date]' Set a Expiration Date of the Keyword indicated by [index]. [expiration date] Format is YYYY-mm-dd. 140 | (abbreviation=setED:) 141 | 'setChannel: target; [index];[channel]' Set channel to notify the Search Keyword's result. 142 | (abbreviation=setC:) 143 | 'getKeyword: target;' Listing Enabled Search Keywords. 144 | (abbreviation=getK:) 145 | 'getAllKeyword: target;' Listing All Search Keyword (include Disabled Keywords). 146 | (abbreviation=getAllK:) 147 | 'getSearchSetting: target; [index]' Show Setting of the Search Keyword indicated by [index]. 148 | (abbreviation=getSS:) 149 | 150 | 'reMatchTest: target; [index]; [text]' Check wheaer the pattern indicated by [index] in [target] matches [text]. If set pattern to Pastebin ID, check the contens of pastebin. 151 | 'setFeed: [name]; [url]' Add RSS Feed to [url] as [name]. 152 | (abbreviation=setF:) 153 | 'setFeedFilter: [name]; [filter]' Add new RSS Feed Filter. Notily only contains filter words. 
154 | (abbreviation=setFF:) 155 | 'editFeedFilter: [name]; [index]; filter' Edit Feed Filter indicated by [index] in RSS Feed of [name]. 156 | (abbreviation=editFF:) 157 | 'removeFeedFilter: [name]; [index];' Remove Feed Filter indicated by [index] in RSS Feed of [name]. 158 | (abbreviation=removeFF:) 159 | 'setTwitterQuery: [query]; ([users];)' Set [query] with Default Settings. If set [users], notify only from these users. 160 | (abbreviation=setTQ:) 161 | 'editTwitterQuery: [index]; [query]; ([users];)' Edit Twitter Query indicated by [index]. 162 | (abbreviation=editTQ:) 163 | 'addUserTwitterQuery: [index]; [users];' Add User to Twitter Query indicated by [index]. That query notify only from these users. 164 | (abbreviation=addUserTQ:) 165 | 'removeTwitterQuery: [index];' Remove Twitter Query indicated by [index]. 166 | (abbreviation=removeTQ:) 167 | 168 | 'help:' Show this Message. 169 | 170 | Target: 171 | github 172 | gist 173 | github_code 174 | gitlab 175 | gitlab_snippet (Use RE match) 176 | google_custom 177 | pastebin (Use RE match) 178 | rss_feed 179 | twitter 180 | ``` 181 | 182 | Register keywords by sending commands to the Slackbot 183 | 184 | The commands are as follows. 
185 | See the help: command for specific usage 186 | 187 | |Command|Description|Valid search targets| 188 | |---|---|---| 189 | |setKeyword:|Registers a new keyword|github, gist, github_code, gitlab, gitlab_snippet, google_custom, pastebin| 190 | |removeKeyword:|Removes the specified keyword|github, gist, github_code, gitlab, gitlab_snippet, google_custom, pastebin, twitter| 191 | |enableKeyword:|Enables the specified keyword|all| 192 | |disableKeyword:|Disables the specified keyword|all| 193 | |setSearchLevel:|Sets the search range of the specified keyword|github, github_code| 194 | |setExpireDate:|Sets the expiration date of the specified keyword|github, gist, github_code, gitlab, gitlab_snippet, google_custom, pastebin| 195 | |setChannel:|Sets the Slack channel notified for the specified keyword|all| 196 | |getKeyword:|Lists the registered search keywords that are enabled|all| 197 | |getAllKeyword:|Lists all registered search keywords|all| 198 | |getSearchSetting:|Shows the current settings of the specified keyword|all| 199 | |setFeed:|Registers a new feed in rss_feed|-| 200 | |setFeedFilter:|Sets a notification filter for a feed|-| 201 | |editFeedFilter:|Edits a feed notification filter|-| 202 | |removeFeedFilter:|Removes a feed notification filter|-| 203 | |setTwitterQuery:|Registers a new search query in twitter|-| 204 | |editTwitterQuery:|Edits a twitter search query|-| 205 | |addUserTwitterQuery:|Adds a user condition to a twitter search query|-| 206 | |removeTwitterQuery:|Removes a twitter search query|-| 207 | |help:|Shows help|-| 208 | 209 | Each registered keyword is assigned an Index. Specify the Index to change its settings. 210 | The Index of each keyword is shown when the keyword is registered or by the getKeyword: command. 211 | 212 | ![CodeScraper-1](img/codescraper-1.png) 213 | 214 | Notification when a new Github repository is found 215 | 216 | ![CodeScraper-2](img/codescraper-2.png) 217 | 218 | ### Notices 219 | - Pastebin 220 | - A Pastebin PRO Account (paid) is required 221 | - After purchase, whitelist the IP of the host running CodeScraper on [this page](https://pastebin.com/doc_scraping_api) 222 | - A static IP is required because whitelisting is needed 223 | - Pastebin, Gitlab Snippet 224 | - Keywords containing symbols are searched as regular expressions (e.g. `[a-z1-7]{16}\.onion`, `example.com`). 
These are case sensitive 225 | - Keywords without symbols are matched case-insensitively 226 | - Long regular expressions and patterns that take time to process can put a load on the CPU, so avoid them 227 | - Google Custom Search 228 | - Free accounts are limited to 100 requests per day 229 | - If the search runs every 2 hours, it runs 12 times a day. With 9 or more registered keywords, 12 * 9 = 108 requests, which exceeds the limit. Be aware that the search frequency limits the number of keywords you can register 230 | 231 | ## Author 232 | [blueblue](https://twitter.com/piedpiper1616) 233 | -------------------------------------------------------------------------------- /docker-compose.yml: -------------------------------------------------------------------------------- 1 | version: "2" 2 | services: 3 | codescraper: 4 | image: codescraper:alpine 5 | build: . 6 | restart: always 7 | environment: 8 | DB_HOST: cs-db 9 | DB_PORT: 27017 10 | links: 11 | - cs-db 12 | volumes: 13 | - $PWD/master/slackbot_settings.py:/home/codescraper/master/slackbot_settings.py 14 | cs-db: 15 | image: mongo 16 | restart: always 17 | volumes: 18 | - db-contents:/data/db 19 | volumes: 20 | db-contents: 21 | driver: local 22 | -------------------------------------------------------------------------------- /img/codescraper-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/blue1616/CodeScraper/5e27c51ea645c7fb48fc3caaf290a75c8ee067ec/img/codescraper-1.png -------------------------------------------------------------------------------- /img/codescraper-2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/blue1616/CodeScraper/5e27c51ea645c7fb48fc3caaf290a75c8ee067ec/img/codescraper-2.png -------------------------------------------------------------------------------- /master/log/.gitignore: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/blue1616/CodeScraper/5e27c51ea645c7fb48fc3caaf290a75c8ee067ec/master/log/.gitignore 
-------------------------------------------------------------------------------- /master/master_post.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | from slacker import Slacker 4 | import slackbot_settings 5 | import traceback 6 | 7 | def postNewPoCFound(word, repos, channel): 8 | url = 'https://github.com' 9 | slack = Slacker(slackbot_settings.API_TOKEN) 10 | try: 11 | slack.chat.post_message( 12 | channel, 13 | 'New Code Found about `' + word + '` at _github_', 14 | as_user=True 15 | ) 16 | message = '' 17 | for r in repos: 18 | message += url + '/' + r + '/\n' 19 | slack.chat.post_message( 20 | channel, 21 | message, 22 | as_user=True 23 | ) 24 | except: 25 | print("Could not send slack notification.") 26 | print(traceback.format_exc()) 27 | 28 | def postAnyData(word, channel): 29 | slack = Slacker(slackbot_settings.API_TOKEN) 30 | try: 31 | slack.chat.post_message( 32 | channel, 33 | word, 34 | as_user=True 35 | ) 36 | except: 37 | print("Could not send slack notification.") 38 | print(traceback.format_exc()) 39 | 40 | #if __name__ == '__main__': 41 | # slack = Slacker(slackbot_settings.API_TOKEN) 42 | # slack.chat.post_message( 43 | # 'bot_test', 44 | # 'Hello. 
I\'m Master', 45 | # as_user=True 46 | # ) 47 | -------------------------------------------------------------------------------- /master/plugins/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/blue1616/CodeScraper/5e27c51ea645c7fb48fc3caaf290a75c8ee067ec/master/plugins/__init__.py -------------------------------------------------------------------------------- /master/plugins/edit_conf_db.py: -------------------------------------------------------------------------------- 1 | import json 2 | import datetime 3 | import shutil 4 | import time 5 | import re 6 | import os.path 7 | from pymongo import MongoClient 8 | 9 | client = None 10 | db = None 11 | collection = None 12 | collection_channel = None 13 | 14 | modules = [ 15 | 'github', 16 | 'gist', 17 | 'github_code', 18 | 'gitlab', 19 | 'gitlab_snippet', 20 | 'google_custom', 21 | 'pastebin', 22 | 'rss_feed', 23 | 'twitter' 24 | ] 25 | 26 | github_settings = { 27 | 'Target':'github', 28 | 'Enable':True, 29 | 'SearchLevel':1, 30 | 'Time_Range':1, 31 | 'Expire_date':1, 32 | 'Exclude_list':[], 33 | 'Channel':'None', 34 | '__MOD_ENABLE__' : True, 35 | '__INITIAL__' : True, 36 | '__SAFETY__': 0 37 | } 38 | 39 | gist_settings = { 40 | 'Target':'gist', 41 | 'Enable':True, 42 | 'Time_Range':1, 43 | 'Expire_date':1, 44 | 'Exclude_list':[], 45 | 'Channel':'None', 46 | '__MOD_ENABLE__' : True, 47 | '__INITIAL__' : True, 48 | '__SAFETY__': 0 49 | } 50 | 51 | github_code_settings = { 52 | 'Target':'github_code', 53 | 'Enable':True, 54 | 'SearchLevel':1, 55 | 'Expire_date':1, 56 | 'Exclude_list':[], 57 | 'Channel':'None', 58 | '__MOD_ENABLE__' : True, 59 | '__INITIAL__' : True, 60 | '__SAFETY__': 0 61 | } 62 | 63 | gitlab_settings = { 64 | 'Target':'gitlab', 65 | 'Enable':True, 66 | 'Expire_date':1, 67 | 'Exclude_list':[], 68 | 'Channel':'None', 69 | '__MOD_ENABLE__' : True, 70 | '__INITIAL__' : True, 71 | '__SAFETY__': 0 72 | } 73 | 74 | 
gitlab_snippet_settings = { 75 | 'Target':'gitlab_snippet', 76 | 'Enable':True, 77 | 'Expire_date':1, 78 | 'Exclude_list':[], 79 | 'Channel':'None', 80 | '__MOD_ENABLE__' : True, 81 | '__INITIAL__' : True, 82 | '__SAFETY__': 0 83 | } 84 | 85 | pastebin_settings = { 86 | 'Target':'pastebin', 87 | 'Enable':True, 88 | 'Expire_date':1, 89 | 'Exclude_list':[], 90 | 'Channel':'None', 91 | '__MOD_ENABLE__' : True, 92 | '__INITIAL__' : True, 93 | '__SAFETY__': 0 94 | } 95 | 96 | google_custom_settings = { 97 | 'Target':'google_custom', 98 | 'Enable':True, 99 | 'Expire_date':1, 100 | 'Exclude_list':[], 101 | 'Channel':'None', 102 | '__MOD_ENABLE__' : True, 103 | '__INITIAL__' : True, 104 | '__SAFETY__': 0 105 | } 106 | 107 | rss_feed_settings = { 108 | 'Target':'rss_feed', 109 | 'Enable' : True, 110 | 'Name' : 'None', 111 | 'URL' : 'None', 112 | 'Filters' : [], 113 | 'Channel' : 'None', 114 | 'Last_Post' : {'title':'None', 'link':'None', 'timestamp':'1970-01-01 00:00:00'}, 115 | '__MOD_ENABLE__' : True, 116 | '__INITIAL__' : True, 117 | '__SAFETY__': 0 118 | } 119 | 120 | twitter_settings = { 121 | 'Target':'twitter', 122 | 'Enable' : True, 123 | 'Query' : 'None', 124 | 'Users' : [], 125 | 'Channel' : 'None', 126 | 'Last_Post' : {'user':'None', 'text':'None', 'id':'None', 'link':'None'}, 127 | '__MOD_ENABLE__' : True, 128 | '__INITIAL__' : True, 129 | '__SAFETY__': 0 130 | } 131 | 132 | setting_set = { 133 | 'github':github_settings, 134 | 'gist':gist_settings, 135 | 'github_code':github_code_settings, 136 | 'gitlab':gitlab_settings, 137 | 'gitlab_snippet':gitlab_snippet_settings, 138 | 'google_custom':google_custom_settings, 139 | 'pastebin':pastebin_settings, 140 | 'rss_feed':rss_feed_settings, 141 | 'twitter':twitter_settings 142 | } 143 | 144 | def setDB(BD_HOST, DB_PORT, DB_NAME): 145 | global client 146 | global db 147 | global collection 148 | global collection_channel 149 | client = MongoClient(BD_HOST, DB_PORT) 150 | db = client[DB_NAME] 151 | collection = 
db['keywords'] 152 | collection_channel = db['channels'] 153 | 154 | 155 | def setUsingChannels(channels): 156 | collection_channel.remove() 157 | if isinstance(channels, list): 158 | collection_channel.insert({'channels': channels}) 159 | return True 160 | else: 161 | return False # was a bare `False`, so the caller received None 162 | 163 | def getUsingChannels(): 164 | channels = collection_channel.find()[0] 165 | return channels['channels'] 166 | 167 | def setDefaultSettings(target, default_dict): 168 | if target in modules: 169 | channels = getUsingChannels() 170 | setter = setting_set[target] 171 | for k in default_dict.keys(): 172 | if k in setter.keys(): 173 | if type(default_dict[k]) != type(setter[k]): 174 | return False 175 | if k == 'SearchLevel' and not default_dict['SearchLevel'] in [1,2,3,4]: 176 | return False 177 | if k == 'Channel' and not default_dict['Channel'] in channels: 178 | default_dict['Channel'] = channels[0] 179 | setter[k] = default_dict[k] 180 | setter['Index'] = 0 181 | setter['KEY'] = '__DEFAULT_SETTING__' 182 | collection.update({ 183 | 'Target':target, 184 | 'KEY':'__DEFAULT_SETTING__' 185 | }, setter, upsert=True) 186 | return True 187 | else: 188 | return None 189 | 190 | def isEnable(target): 191 | if target in modules: 192 | data = collection.find({'Target':target, 'KEY':'__DEFAULT_SETTING__'}) 193 | if data.count() != 0 and data[0]['__MOD_ENABLE__'] == True: 194 | return True 195 | else: 196 | return False 197 | else: 198 | return None 199 | 200 | def disable(target): 201 | if target in modules: 202 | data = collection.find({'Target':target, 'KEY':'__DEFAULT_SETTING__'}) 203 | if data.count() != 0: 204 | updatedata = data[0] 205 | updatedata['__MOD_ENABLE__'] = False 206 | collection.update({ 207 | 'Target':target, 208 | 'KEY':'__DEFAULT_SETTING__' 209 | }, updatedata) 210 | return True 211 | else: 212 | return False # was a bare `False`, so the caller received None 213 | else: 214 | return None 215 | 216 | def setNewKeyword(target, word): 217 | if target in modules: 218 | default = collection.find({'Target':target, 
'KEY':'__DEFAULT_SETTING__'})[0] 219 | default['KEY'] = word 220 | del default['__MOD_ENABLE__'], default['_id'], default['__SAFETY__'] 221 | if '__SEARCHEDPASTES__' in default.keys(): 222 | del default['__SEARCHEDPASTES__'] 223 | if 'Expire_date' in default: 224 | today = datetime.date.today() 225 | limit = (today + datetime.timedelta(int(default['Expire_date']))).strftime('%Y-%m-%d') 226 | default['Expire_date'] = limit 227 | replacedata = collection.find({'Target':target, 'KEY':word}) 228 | if replacedata.count() == 0: 229 | index = collection.find({'Target':target}).sort('Index', -1)[0]['Index'] + 1 230 | default['Index'] = index 231 | collection.insert(default) 232 | return index 233 | else: 234 | index = collection.find({'Target':target, 'KEY':word})[0]['Index'] 235 | default['Index'] = index 236 | collection.update({ 237 | 'Target':target, 238 | 'KEY':word 239 | }, default) 240 | return index * -1 241 | else: 242 | return None 243 | 244 | def removeKeyword(target, index): 245 | ret = None 246 | if target != 'rss_feed' and target in modules: 247 | data = collection.find({'Target':target, 'Index':index}) 248 | if data.count() != 0: 249 | key = data[0]['KEY'] 250 | collection.remove({'Target':target, 'Index':index}) 251 | ret = key 252 | return ret 253 | 254 | def getAllState(): 255 | result = {} 256 | for m in modules: 257 | default = collection.find({'Target':m, 'KEY':'__DEFAULT_SETTING__'}) 258 | if default.count() == 0: 259 | result[m] = False 260 | else: 261 | result[m] = default[0]['__MOD_ENABLE__'] 262 | return result 263 | 264 | def enableKeywordSetting(target, index, enable): 265 | if target in modules: 266 | if target == 'rss_feed': 267 | data = collection.find({'Target':target, 'Name':index}) 268 | else: 269 | data = collection.find({'Target':target, 'Index':index}) 270 | if data.count() != 0: 271 | if target == 'rss_feed': 272 | word = data[0]['Name'] 273 | collection.update({'Target':target, 'Name':index}, {'$set': {'Enable': enable}}) 274 | else: 
275 | word = data[0]['KEY'] 276 | collection.update({'Target':target, 'Index':index}, {'$set': {'Enable': enable}}) 277 | return word 278 | else: 279 | return None 280 | else: 281 | None 282 | 283 | def setSearchLevel(target, index, level): 284 | if target in modules: 285 | data = collection.find({'Target':target, 'Index':index}) 286 | if data.count() != 0 and 'SearchLevel' in data[0].keys(): 287 | word = data[0]['KEY'] 288 | collection.update({'Target':target, 'Index':index}, {'$set': {'SearchLevel': level}}) 289 | return word 290 | else: 291 | return None 292 | else: 293 | return None 294 | 295 | def setSearchRange(target, index, days): 296 | if target in modules: 297 | data = collection.find({'Target':target, 'Index':index}) 298 | if data.count() != 0 and 'Time_Range' in data[0].keys(): 299 | word = data[0]['KEY'] 300 | collection.update({'Target':target, 'Index':index}, {'$set': {'Time_Range': days}}) 301 | return word 302 | else: 303 | return None 304 | else: 305 | return None 306 | 307 | def setExpireDate(target, index, limit): 308 | if target in modules: 309 | data = collection.find({'Target':target, 'Index':index}) 310 | regx = '\d{4}-(0[0-9]|1[0-2])-([0-2][0-9]|3[01])' 311 | if data.count() != 0 and 'Expire_date' in data[0].keys() and re.match(regx, limit): 312 | word = data[0]['KEY'] 313 | collection.update({'Target':target, 'Index':index}, {'$set': {'Expire_date': limit}}) 314 | return word 315 | else: 316 | return None 317 | else: 318 | return None 319 | 320 | def setChannel(target, index, channel): 321 | if target in modules: 322 | if target == 'rss_feed': 323 | data = collection.find({'Target':target, 'Name':index}) 324 | else: 325 | data = collection.find({'Target':target, 'Index':index}) 326 | channels = getUsingChannels() 327 | if data.count() != 0 and channel in channels: 328 | if target == 'rss_feed': 329 | word = data[0]['Name'] 330 | collection.update({'Target':target, 'Name':index}, {'$set': {'Channel': channel}}) 331 | return word 332 | else: 
333 | word = data[0]['KEY'] 334 | collection.update({'Target':target, 'Index':index}, {'$set': {'Channel': channel}}) 335 | return word 336 | return word 337 | else: 338 | return None 339 | else: 340 | return None 341 | 342 | def addExcludeList(target, index, exclude): 343 | if target in modules: 344 | data = collection.find({'Target':target, 'Index':index}) 345 | if data.count() != 0 and 'Exclude_list' in data[0].keys(): 346 | word = data[0]['KEY'] 347 | newlist = data[0]['Exclude_list'] + (exclude if isinstance(exclude, list) else [exclude]) # accept a single word or a list of words 348 | collection.update({'Target':target, 'Index':index}, {'$set': {'Exclude_list': newlist}}) 349 | return word 350 | else: 351 | return None 352 | else: 353 | return None 354 | 355 | def clearExcludeList(target, index): 356 | if target in modules: 357 | data = collection.find({'Target':target, 'Index':index}) 358 | if data.count() != 0 and 'Exclude_list' in data[0].keys(): 359 | word = data[0]['KEY'] 360 | collection.update({'Target':target, 'Index':index}, {'$set': {'Exclude_list': []}}) 361 | return word 362 | else: 363 | return None 364 | else: 365 | return None 366 | 367 | def getKeywords(target): 368 | if target in modules: 369 | # data = list(collection.find({'Target':target})) 370 | data = list(collection.find({"$and": [{"Target": target}, {"KEY": {"$ne": '__DEFAULT_SETTING__'}}]}).sort('Index')) 371 | return data 372 | else: 373 | return None 374 | 375 | def getEnableKeywords(target): 376 | if target in modules: 377 | data = list(collection.find({"$and": [{"Target": target}, {"KEY": {"$ne": '__DEFAULT_SETTING__'}}, {'Enable': True}]}).sort('Index')) 378 | return data 379 | else: 380 | return None 381 | 382 | def getKeyword(target, index): 383 | if target in modules: 384 | if target == 'rss_feed': 385 | data = collection.find({'Target':target, 'Name':index}) 386 | else: 387 | data = collection.find({'Target':target, 'Index':index}) 388 | if data.count() == 0: 389 | return None 390 | else: 391 | return data[0] 392 | else: 393 | return None 394 | 395 | def 
getSafetyCount(target): 396 | if target in modules: 397 | data = collection.find({'Target':target, 'KEY':'__DEFAULT_SETTING__'}) 398 | if data.count() != 0: 399 | return data[0]['__SAFETY__'] 400 | else: 401 | return None 402 | else: 403 | return None 404 | 405 | def setSafetyCount(target, count): 406 | if target in modules: 407 | data = collection.find({'Target':target, 'KEY':'__DEFAULT_SETTING__'}) 408 | if data.count() != 0: 409 | collection.update({'Target':target, 'KEY':'__DEFAULT_SETTING__'}, {'$set': {'__SAFETY__': count}}) 410 | return True 411 | else: 412 | return False 413 | else: 414 | return None 415 | 416 | def setSearchedPastes(pastelist): 417 | target = 'pastebin' 418 | data = collection.find({'Target':target, 'KEY':'__DEFAULT_SETTING__'}) 419 | if data.count() != 0: 420 | collection.update({'Target':target, 'KEY':'__DEFAULT_SETTING__'}, {'$set': {'__SEARCHEDPASTES__': list(pastelist)}}) 421 | return True 422 | else: 423 | return False 424 | 425 | def getSearchedPastes(): 426 | target = 'pastebin' 427 | data = collection.find({'Target':target, 'KEY':'__DEFAULT_SETTING__'}) 428 | if data.count() != 0: 429 | if '__SEARCHEDPASTES__' in data[0].keys(): 430 | return data[0]['__SEARCHEDPASTES__'] 431 | else: 432 | return [] 433 | else: 434 | return False 435 | 436 | def setNewRSSFeed(name, url): 437 | target = 'rss_feed' 438 | ret = False 439 | data = collection.find({'Target':target, 'KEY':'__DEFAULT_SETTING__'}) 440 | if data.count() != 0: 441 | if collection.count({'Target':target, 'Name':name}) == 0: 442 | default = collection.find({'Target':target, 'KEY':'__DEFAULT_SETTING__'})[0] 443 | default['Name'] = name 444 | default['URL'] = url 445 | del default['__MOD_ENABLE__'], default['_id'], default['__SAFETY__'], default['KEY'] 446 | collection.insert(default) 447 | ret = True 448 | return ret 449 | 450 | def setNewRSSFilter(name, words, channel): 451 | target = 'rss_feed' 452 | data = collection.find({'Target':target, 'Name':name}) 453 | ret = None 454 | if data.count() != 0: 455 
| filters = data[0]['Filters'] 456 | index = 0 457 | filter = {} 458 | for f in filters: 459 | if index < f['Index']: 460 | index = f['Index'] 461 | filter['Index'] = index 462 | filter['Channel'] = data[0]['Channel'] 463 | if channel in getUsingChannels(): 464 | filter['Channel'] = channel 465 | filter['Words'] = words 466 | filters.append(filter) 467 | ret = index 468 | collection.update({'Target':target, 'Name':name}, {'$set': {'Filters': filters}}) 469 | return ret 470 | 471 | def editRSSFilter(name, index, words, channel): 472 | target = 'rss_feed' 473 | data = collection.find({'Target':target, 'Name':name}) 474 | if data.count() != 0: 475 | filters = data[0]['Filters'] 476 | filter = {} 477 | i = 0 478 | for f in filters: 479 | if index == f['Index']: 480 | filter = {'Index' : f['Index'], 481 | 'Channel' : f['Channel'], 482 | 'Words' : f['Words']} 483 | if channel != '': 484 | filter['Channel'] = channel 485 | if words != []: 486 | filter['Words'] = words 487 | filters[i] = filter 488 | ret = True 489 | break 490 | i += 1 491 | collection.update({'Target':target, 'Name':name}, {'$set': {'Filters': filters}}) 492 | return True 493 | return False 494 | 495 | def removeRSSFilter(name, index): 496 | target = 'rss_feed' 497 | data = collection.find({'Target':target, 'Name':name}) 498 | ret = None 499 | if data.count() != 0: 500 | filters = data[0]['Filters'] 501 | i = 0 502 | for f in filters: 503 | if index == f['Index']: 504 | ret = f['Words'] 505 | filters.remove(f) 506 | break 507 | collection.update({'Target':target, 'Name':name}, {'$set': {'Filters': filters}}) 508 | return ret 509 | 510 | def haveSearched(target, name): 511 | keyword = 'rss_feed' 512 | if target == 'rss_feed': 513 | data = collection.find({'Target':target, 'Name':name}) 514 | else: 515 | data = collection.find({'Target':target, 'Index':name}) 516 | if data.count() != 0: 517 | if target == 'rss_feed': 518 | collection.update({'Target':target, 'Name':name}, {'$set': {'__INITIAL__': False}}) 
519 | else: 520 | collection.update({'Target':target, 'Index':name}, {'$set': {'__INITIAL__': False}}) 521 | return True 522 | return False 523 | 524 | def setRSSLastPost(name, post): 525 | target = 'rss_feed' 526 | data = collection.find({'Target':target, 'Name':name}) 527 | if data.count() != 0: 528 | collection.update({'Target':target, 'Name':name}, {'$set': {'Last_Post': post}}) 529 | return True 530 | return False 531 | 532 | def setNewTwitterQuery(query, users): 533 | target = 'twitter' 534 | ret = False 535 | data = collection.find({'Target':target, 'KEY':'__DEFAULT_SETTING__'}) 536 | if data.count() != 0: 537 | default = collection.find({'Target':target, 'KEY':'__DEFAULT_SETTING__'})[0] 538 | del default['__MOD_ENABLE__'], default['_id'], default['__SAFETY__'], default['KEY'] 539 | default['Query'] = query 540 | default['Users'] = users 541 | if users != []: 542 | default['KEY'] = query + ' Users: ' + ', '.join(users) 543 | else: 544 | default['KEY'] = query 545 | index = collection.find({'Target':target}).sort('Index', -1)[0]['Index'] + 1 546 | default['Index'] = index 547 | collection.insert(default) 548 | ret = index 549 | return ret 550 | 551 | def editTwitterQuery(index, query, users): 552 | target = 'twitter' 553 | ret = None 554 | data = collection.find({'Target':target, 'Index':index}) 555 | if data.count() != 0: 556 | tq = data[0] 557 | if query != '': 558 | collection.update({'Target':target, 'Index':index}, {'$set': {'Query': query}}) 559 | ret = True 560 | else: 561 | query = tq['Query'] 562 | if users != []: 563 | collection.update({'Target':target, 'Index':index}, {'$set': {'Users': users}}) 564 | ret = True 565 | else: 566 | users = tq['Users'] 567 | if ret: 568 | key = query + ' User: ' + ', '.join(users) 569 | collection.update({'Target':target, 'Index':index}, {'$set': {'KEY': key}}) 570 | ret = key 571 | return ret 572 | 573 | def addUserToTwitterQuery(index, users): 574 | target = 'twitter' 575 | ret = False 576 | data = 
collection.find({'Target':target, 'Index':index}) 577 | if data.count() != 0: 578 | tq = data[0] 579 | if users != []: 580 | newlist = list(set(tq['Users'] + users)) 581 | collection.update({'Target':target, 'Index':index}, {'$set': {'Users': newlist}}) 582 | key = tq['Query'] + ' User: ' + ', '.join(newlist) 583 | collection.update({'Target':target, 'Index':index}, {'$set': {'KEY': key}}) 584 | ret = key 585 | return ret 586 | 587 | def setTwitterLastPost(index, post): 588 | target = 'twitter' 589 | data = collection.find({'Target':target, 'Index':index}) 590 | if data.count() != 0: 591 | collection.update({'Target':target, 'Index':index}, {'$set': {'Last_Post': post}}) 592 | return True 593 | return False 594 | -------------------------------------------------------------------------------- /master/plugins/getCommand.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | from slackbot.bot import respond_to 3 | from slackbot.bot import listen_to 4 | import re 5 | from . 
import edit_conf_db as ec 6 | import os.path 7 | import requests 8 | import feedparser 9 | import lxml.html 10 | 11 | targets = [ 12 | 'github', 13 | 'github_code', 14 | 'gist', 15 | 'gitlab', 16 | 'gitlab_snippet', 17 | 'pastebin', 18 | 'google_custom', 19 | 'rss_feed', 20 | 'twitter' 21 | ] 22 | 23 | def getPostData(keyword, index, target): 24 | post_data = '' 25 | if index is not None and index > 0: 26 | post_data = 'Set New Search Keyword : `{keyword}` (index : {index}) in _{target}_'.format(keyword=keyword, index=abs(index), target=target) 27 | elif index is not None and index < 0: 28 | post_data = 'Initialize Search Keyword : `{keyword}` (index : {index}) in _{target}_'.format(keyword=keyword, index=abs(index), target=target) 29 | else: # index is None (keyword could not be set) or 0 30 | post_data = 'Error has Occurred' 31 | return post_data 32 | 33 | def getEnabledTargets(): 34 | etargets = [] 35 | state = ec.getAllState() 36 | for name,enable in state.items(): 37 | if enable: 38 | etargets.append(name) 39 | return etargets 40 | 41 | @respond_to('setKeyword: (.*)') 42 | @respond_to('setK: (.*)') 43 | def setKeyword(message, params): 44 | target = '' 45 | enabled = getEnabledTargets() 46 | candidates = targets + ['all'] # copy, so the module-level list is not extended on every call 47 | for t in candidates: 48 | if params.strip().startswith(t + ';'): 49 | target = t 50 | params = params.replace(t + ';', '', 1) 51 | break 52 | if target in enabled or target == 'all': 53 | word = params.split(';')[0].strip() 54 | post_data = '' 55 | if word == '': 56 | post_data = 'Please Put a Word' 57 | else: 58 | setter = [ 59 | 'github', 60 | 'github_code', 61 | 'gist', 62 | 'gitlab' 63 | ] 64 | if target in enabled: 65 | ret = ec.setNewKeyword(target, word) 66 | post_data = getPostData(word, ret, target) 67 | elif target == 'all': 68 | enabled = list(set(enabled) & set(setter)) 69 | for s in enabled: 70 | ret = ec.setNewKeyword(s, word) 71 | if ret != 0: 72 | if post_data != '': 73 | post_data += '\n' 74 | post_data += getPostData(word, ret, s) 75 | else: 76 | post_data = 'Invalid Target' 77 | ret = 
message._client.webapi.chat.post_message( 78 | message._body['channel'], 79 | post_data, 80 | as_user=True, 81 | ) 82 | 83 | @respond_to('removeKeyword: (.*)') 84 | @respond_to('removeK: (.*)') 85 | def removeKeyword(message, params): 86 | target = '' 87 | enabled = getEnabledTargets() 88 | for t in targets: 89 | if params.strip().startswith(t + ';'): 90 | target = t 91 | params = params.replace(t + ';', '', 1) 92 | break 93 | post_data = '' 94 | if target in enabled: 95 | params = params.split(';')[0].strip() 96 | if params.isdigit(): 97 | index = int(params.strip()) 98 | ret = ec.removeKeyword(target, index) 99 | if ret != None: 100 | post_data = '`{key}`(index : {index}) is removed in _{target}_'.format(key=ret, index=str(index), target=target) 101 | else: 102 | post_data = 'No Data' 103 | else: 104 | post_data = 'Please Put Index of the Keyword' 105 | else: 106 | post_data = 'Invalid Target' 107 | ret = message._client.webapi.chat.post_message( 108 | message._body['channel'], 109 | post_data, 110 | as_user=True, 111 | ) 112 | 113 | @respond_to('disableKeyword: (.*)') 114 | @respond_to('disableK: (.*)') 115 | def disableKeyword(message, params): 116 | target = '' 117 | enabled = getEnabledTargets() 118 | for t in targets: 119 | if params.strip().startswith(t + ';'): 120 | target = t 121 | params = params.replace(t + ';', '', 1) 122 | break 123 | if target in enabled: 124 | params = params.split(';')[0] 125 | if (target != 'rss_feed' and params.strip().isdigit()) or target == 'rss_feed': 126 | if target != 'rss_feed': 127 | index = int(params.strip()) 128 | else: 129 | index = params.strip() 130 | ret = ec.enableKeywordSetting(target, index, False) 131 | if ret == None: 132 | post_data = 'No Data' 133 | else: 134 | post_data = '`{keyword}` is disabled in _{target}_'.format(keyword=ret, target=target) 135 | else: 136 | post_data = 'Invalid Target' 137 | ret = message._client.webapi.chat.post_message( 138 | message._body['channel'], 139 | post_data, 140 | 
as_user=True, 141 | ) 142 | 143 | @respond_to('enableKeyword: (.*)') 144 | @respond_to('enableK: (.*)') 145 | def enableKeyword(message, params): 146 | target = '' 147 | enabled = getEnabledTargets() 148 | for t in targets: 149 | if params.strip().startswith(t + ';'): 150 | target = t 151 | params = params.replace(t + ';', '', 1) 152 | break 153 | if target in enabled: 154 | params = params.split(';')[0] 155 | if (target != 'rss_feed' and params.strip().isdigit()) or target == 'rss_feed': 156 | if target != 'rss_feed': 157 | index = int(params.strip()) 158 | else: 159 | index = params.strip() 160 | ret = ec.enableKeywordSetting(target, index, True) 161 | if ret is None: 162 | post_data = 'No Data' 163 | else: 164 | post_data = '`{keyword}` is enabled in _{target}_'.format(keyword=ret, target=target) 165 | else: 166 | post_data = 'Please Put Index of the Word' 167 | else: 168 | post_data = 'Invalid Target' 169 | ret = message._client.webapi.chat.post_message( 170 | message._body['channel'], 171 | post_data, 172 | as_user=True, 173 | ) 174 | 175 | @respond_to('setSearchLevel: (.*)') 176 | @respond_to('setSL: (.*)') 177 | def setSearchLevel(message, params): 178 | target = '' 179 | enabled = getEnabledTargets() 180 | for t in targets: 181 | if params.strip().startswith(t + ';'): 182 | target = t 183 | params = params.replace(t + ';', '', 1) 184 | break 185 | words = params.strip().split(';') 186 | valid_targets = [ 187 | 'github', 188 | 'github_code' 189 | ] 190 | if target in valid_targets and target in enabled: 191 | if words[0].strip().isdigit(): 192 | index = int(words[0].strip()) 193 | if len(words) > 1: 194 | if words[1].strip().isdigit(): 195 | valid_num = [1, 2, 3, 4] 196 | if int(words[1].strip()) in valid_num: 197 | ret = ec.setSearchLevel(target, index, int(words[1].strip())) 198 | if ret is None: # ec.setSearchLevel returns None when no record matches 199 | post_data = 'No Data' 200 | else: 201 | post_data = 'Set `{keyword}` Search Level to {level}'.format(keyword=ret, level=words[1].strip()) 202 | else: 203 | 
post_data = 'Invalid Search Level' 204 | else: 205 | post_data = 'Invalid Search Level' 206 | else: 207 | post_data = 'Parameter Shortage' 208 | else: 209 | post_data = 'Please Put Index of the Word' 210 | else: 211 | post_data = 'Invalid Target' 212 | ret = message._client.webapi.chat.post_message( 213 | message._body['channel'], 214 | post_data, 215 | as_user=True, 216 | ) 217 | 218 | @respond_to('setSearchTimeRange: (.*)') 219 | @respond_to('setSTR: (.*)') 220 | def setSearchTimeRange(message, params): 221 | enabled = getEnabledTargets() 222 | target = '' 223 | for t in targets: 224 | if params.strip().startswith(t + ';'): 225 | target = t 226 | params = params.replace(t + ';', '', 1) 227 | break 228 | words = params.strip().split(';') 229 | valid_targets = [ 230 | 'github', 231 | 'gist' 232 | ] 233 | if target in valid_targets and target in enabled: 234 | if words[0].strip().isdigit(): 235 | index = int(words[0].strip()) 236 | if len(words) > 1 and words[1].strip().isdigit(): # a missing or non-numeric range falls through to 'Parameter Shortage' 237 | days = int(words[1].strip()) 238 | ret = ec.setSearchRange(target, index, days) 239 | if ret is None: # ec.setSearchRange returns None when no record matches 240 | post_data = 'No Data' 241 | else: 242 | post_data = '`{keyword}` search in _{target}_ in last {range} days'.format(keyword=ret, target=target, range=words[1].strip()) 243 | else: 244 | post_data = 'Parameter Shortage' 245 | else: 246 | post_data = 'Please Put Index of the Word' 247 | else: 248 | post_data = 'Invalid Target' 249 | ret = message._client.webapi.chat.post_message( 250 | message._body['channel'], 251 | post_data, 252 | as_user=True, 253 | ) 254 | 255 | @respond_to('setExpireDate: (.*)') 256 | @respond_to('setED: (.*)') 257 | def setExpireDate(message, params): 258 | enabled = getEnabledTargets() 259 | target = '' 260 | for t in targets: 261 | if params.strip().startswith(t + ';'): 262 | target = t 263 | params = params.replace(t + ';', '', 1) 264 | break 265 | words = params.strip().split(';') 266 | if 'rss_feed' in enabled: 267 | enabled.remove('rss_feed') 268 | if 
target in enabled: 269 | if words[0].strip().isdigit(): 270 | index = int(words[0].strip()) 271 | if len(words) > 1: 272 | regx = r'\d{4}-(0[0-9]|1[0-2])-([0-2][0-9]|3[01])' 273 | if re.match(regx, words[1].strip()): 274 | ret = ec.setExpireDate(target, index, words[1].strip()) 275 | if ret is None: # ec.setExpireDate returns None when no record matches 276 | post_data = 'No Data' 277 | else: 278 | post_data = '`{keyword}` in _{target}_ will expire at {date}'.format(keyword=ret, target=target, date=words[1].strip()) 279 | else: 280 | post_data = 'Parameter Pattern not Match' 281 | else: 282 | post_data = 'Parameter Shortage' 283 | else: 284 | post_data = 'Please Put Index of the Word' 285 | else: 286 | post_data = 'Invalid Target' 287 | ret = message._client.webapi.chat.post_message( 288 | message._body['channel'], 289 | post_data, 290 | as_user=True, 291 | ) 292 | 293 | @respond_to('setChannel: (.*)') 294 | @respond_to('setC: (.*)') 295 | def setChannel(message, params): 296 | base = os.path.dirname(os.path.abspath(__file__)) 297 | channelfile = os.path.normpath(os.path.join(base, '../settings/channellist')) 298 | enabled = getEnabledTargets() 299 | target = '' 300 | for t in targets: 301 | if params.strip().startswith(t + ';'): 302 | target = t 303 | params = params.replace(t + ';', '', 1) 304 | break 305 | if target in enabled: 306 | words = params.strip().split(';') 307 | if (target != 'rss_feed' and words[0].strip().isdigit()) or target == 'rss_feed': 308 | if target != 'rss_feed': 309 | index = int(words[0].strip()) 310 | else: 311 | index = words[0].strip() 312 | if len(words) > 1: 313 | channels = ec.getUsingChannels() 314 | if words[1].strip() in channels: 315 | ret = ec.setChannel(target, index, words[1].strip()) 316 | if ret is None: # ec.setChannel returns None when no record matches 317 | post_data = 'No Data' 318 | else: 319 | post_data = '`{keyword}` result in _{target}_ will notify at {channel}'.format(keyword=ret, target=target, channel=words[1].strip()) 320 | else: 321 | post_data = 'Parameter Pattern not Match' 322 | else: 323 | post_data = 'Parameter 
Shortage' 324 | else: 325 | post_data = 'Please Put Index of the Word' 326 | else: 327 | post_data = 'Invalid Target' 328 | ret = message._client.webapi.chat.post_message( 329 | message._body['channel'], 330 | post_data, 331 | as_user=True, 332 | ) 333 | 334 | @respond_to('addExcludeList: (.*)') 335 | @respond_to('addEL: (.*)') 336 | def addExcludeList(message, params): 337 | target = '' 338 | for t in targets: 339 | if params.strip().startswith(t + ';'): 340 | target = t 341 | params = params.replace(t + ';', '', 1) 342 | break 343 | words = params.strip().split(';') 344 | valid_targets = [ 345 | 'github', 346 | 'github_code', 347 | 'gist', 348 | 'gitlab', 349 | 'gitlab_snippet', 350 | 'google_custom' 351 | ] 352 | enabled = list(set(getEnabledTargets()) & set(valid_targets)) 353 | if target in enabled: 354 | if words[0].strip().isdigit(): 355 | index = int(words[0].strip()) 356 | if len(words) > 1: 357 | for word in words[1:]: 358 | if word.strip() != '': 359 | ret = ec.addExcludeList(target, index, word.strip()) 360 | if ret is None: # ec.addExcludeList returns None when no record matches 361 | post_data = 'No Data' 362 | break 363 | else: 364 | post_data = 'Add {words} in Exclude List of `{keyword}` in _{target}_'.format(words=','.join(words[1:]), keyword=ret, target=target) 365 | else: 366 | post_data = 'No Data' 367 | else: 368 | post_data = 'Parameter Shortage' 369 | else: 370 | post_data = 'Please Put Index of the Word' 371 | else: 372 | post_data = 'Invalid Target' 373 | ret = message._client.webapi.chat.post_message( 374 | message._body['channel'], 375 | post_data, 376 | as_user=True, 377 | ) 378 | 379 | @respond_to('clearExcludeList: (.*)') 380 | @respond_to('clearEL: (.*)') 381 | def clearExcludeList(message, params): 382 | for t in targets: 383 | if params.strip().startswith(t + ';'): 384 | target = t 385 | params = params.replace(t + ';', '', 1) 386 | break 387 | valid_targets = [ 388 | 'github', 389 | 'github_code', 390 | 'gist', 391 | 'gitlab', 392 | 'gitlab_snippet', 393 | 'google_custom' 394 | ] 395 | enabled = 
list(set(getEnabledTargets()) & set(valid_targets)) 396 | params = params.split(';')[0] 397 | if target in enabled: 398 | if params.strip().isdigit(): 399 | index = int(params.strip()) 400 | ret = ec.clearExcludeList(target, index) 401 | if ret == None: 402 | post_data = 'No Data' 403 | else: 404 | post_data = 'Delete All Exclude List of `{keyword}` in _{target}_'.format(keyword=ret, target= target) 405 | else: 406 | post_data = 'Please Put Index of the Word' 407 | else: 408 | post_data = 'Invalid Target' 409 | ret = message._client.webapi.chat.post_message( 410 | message._body['channel'], 411 | post_data, 412 | as_user=True, 413 | ) 414 | 415 | @respond_to('getKeyword: (.*)') 416 | @respond_to('getK: (.*)') 417 | def getKeyword(message, params): 418 | post_data = '' 419 | target = 'all' 420 | targets.append('all') 421 | for t in targets: 422 | if params.strip().startswith(t + ';'): 423 | target = t 424 | params = params.replace(t + ';', '', 1) 425 | break 426 | enabled = getEnabledTargets() 427 | if target in enabled or target == 'all': 428 | for g in enabled: 429 | if target == g or target == 'all': 430 | keys = ec.getEnableKeywords(g) 431 | if keys != []: 432 | if g == 'rss_feed': 433 | post_data += '-- Enabled RSS Feeds --\n' 434 | for k in keys: 435 | post_data += str(k['Name']) + ' : `' + k['URL'] + '`\n' 436 | else: 437 | post_data += '-- Enabled Search Keyword in _{target}_ --\n'.format(target=g) 438 | for k in keys: 439 | post_data += str(k['Index']) + ' : `' + k['KEY'] + '`\n' 440 | else: 441 | post_data = 'Invalid Target' 442 | if post_data == '': 443 | post_data = 'I don\'t have any data yet' 444 | ret = message._client.webapi.chat.post_message( 445 | message._body['channel'], 446 | post_data, 447 | as_user=True, 448 | ) 449 | 450 | @respond_to('getAllKeyword: (.*)') 451 | @respond_to('getAllK: (.*)') 452 | def getAllKeyword(message, params): 453 | post_data = '' 454 | target = 'all' 455 | targets.append('all') 456 | for t in targets: 457 | if 
params.strip().startswith(t + ';'): 458 | target = t 459 | params = params.replace(t + ';', '', 1) 460 | break 461 | enabled = getEnabledTargets() 462 | if target in enabled or target == 'all': 463 | for g in enabled: 464 | if target == g or target == 'all': 465 | keys = ec.getKeywords(g) 466 | if keys != []: 467 | if g == 'rss_feed': 468 | post_data += '-- Enabled RSS Feeds --' 469 | for k in keys: 470 | post_data += str(k['Name']) + ' : `' + k['URL'] + '`\n' 471 | else: 472 | post_data += '-- Enabled Search Keyword in _{target}_ --\n'.format(target=g) 473 | for k in keys: 474 | post_data += str(k['Index']) + ' : `' + k['KEY'] + '`\n' 475 | else: 476 | post_data = 'Invalid Target' 477 | if post_data == '': 478 | post_data = 'I don\'t have any data yet' 479 | ret = message._client.webapi.chat.post_message( 480 | message._body['channel'], 481 | post_data, 482 | as_user=True, 483 | ) 484 | 485 | @respond_to('getSearchSetting: (.*)') 486 | @respond_to('getSS: (.*)') 487 | def getKeywordSetting(message, params): 488 | post_data = '' 489 | target = '' 490 | for t in targets: 491 | if params.strip().startswith(t + ';'): 492 | target = t 493 | params = params.replace(t + ';', '', 1) 494 | break 495 | params = params.split(';')[0] 496 | enabled = getEnabledTargets() 497 | if target in enabled: 498 | if target == 'rss_feed': 499 | keyword = ec.getKeyword(target, params.strip()) 500 | if keyword == None: 501 | post_data = 'No Data' 502 | else: 503 | conf_params = [ 504 | 'Enable', 505 | 'URL', 506 | 'Filters', 507 | 'Channel', 508 | 'Last_Post' 509 | ] 510 | post_data = 'Settings of `' + keyword['Name'] + '` is :\n```' 511 | for p in conf_params: 512 | if p == 'Filters': 513 | v = '' 514 | for f in keyword['Filters']: 515 | v += '\n\tINDEX: ' + str(f['Index']) + '\n' 516 | v += '\tWORDS: ' + ', '.join(f['Words']) + '\n' 517 | v += '\tCHANNEL: ' + f['Channel'] 518 | else: 519 | v = keyword[p] 520 | post_data += p.upper().replace('_', ' ') + ': ' + str(v) + '\n' 521 | 
post_data += '```' 522 | else: 523 | if params.strip().isdigit(): 524 | index = int(params.strip()) 525 | keyword = ec.getKeyword(target, index) 526 | if keyword == None: 527 | post_data = 'No Data' 528 | else: 529 | conf_params = [ 530 | 'Index', 531 | 'Enable', 532 | 'Query', 533 | 'Users', 534 | 'SearchLevel', 535 | 'Time_Range', 536 | 'Expire_date', 537 | 'Channel', 538 | 'Last_Post' 539 | ] 540 | post_data = 'Settings of `' + keyword['KEY'] + '` is :\n```' 541 | for p in conf_params: 542 | if p in keyword.keys(): 543 | v = keyword[p] 544 | post_data += p.upper().replace('_', ' ') + ': ' + str(v) + '\n' 545 | post_data += '```' 546 | else: 547 | post_data = 'Please Put Index of the Word' 548 | else: 549 | post_data = 'Invalid Target' 550 | ret = message._client.webapi.chat.post_message( 551 | message._body['channel'], 552 | post_data, 553 | as_user=True, 554 | ) 555 | 556 | def isMatched(word, text): 557 | symbols = r'[!\"#$%&\'()*+,\-./:;<=>@\[\]^_{|}~\\]' 558 | re_symbol = re.compile(symbols) 559 | patt = re.compile(word) # assumption: a keyword containing symbols is meant as a regular expression 560 | matched = True 561 | if re.search(re_symbol, word): 562 | if re.search(patt, text): 563 | matched = True 564 | else: 565 | matched = False 566 | else: 567 | for p in word.split(' '): 568 | if text.lower().find(p.lower()) < 0: 569 | matched = False 570 | return matched 571 | 572 | @respond_to('reMatchTest: (.*)') 573 | def reMatchTest(message, params): 574 | post_data = '' 575 | target = 'pastebin' 576 | for t in targets: 577 | if params.strip().startswith(t + ';'): 578 | target = t 579 | params = params.replace(t + ';', '', 1) 580 | break 581 | if target == 'pastebin' or target == 'gitlab_snippet': 582 | words = params.strip().split(';') 583 | if words[0].strip().isdigit(): 584 | index = int(words[0].strip()) 585 | if len(words) > 1: 586 | candidatelist = ec.getKeywordlist(target) 587 | if index in candidatelist.values(): 588 | key = '' 589 | for k,v in candidatelist.items(): 590 | if v == index: 591 | key = k 592 | break 593 | word = 
words[1].strip() 594 | post_data = '' 595 | if re.match(r'[a-zA-Z0-9]{8}$', word): 596 | post_data += 'Found Pastebin id pattern.\n' 597 | word = 'https://pastebin.com/raw/' + word 598 | raw_result = requests.get(word, timeout=10) 599 | if raw_result.status_code == 200: 600 | if isMatched(key, raw_result.text): 601 | post_data += 'The pattern, `{keyword}` matches the contents of {url}'.format(keyword=key, url=word) 602 | else: 603 | post_data += 'The pattern, `{keyword}` does not match the contents of {url}'.format(keyword=key, url=word) 604 | else: 605 | post_data += 'I couldn\'t access {url}'.format(url=word) 606 | else: 607 | if isMatched(key, word): # no paste was fetched in this branch, so test the given text itself 608 | post_data += 'The pattern, `{keyword}` matches'.format(keyword=key) 609 | else: 610 | post_data += 'The pattern, `{keyword}` does not match'.format(keyword=key) 611 | else: 612 | post_data = 'No Data' 613 | else: 614 | post_data = 'Parameter Shortage' 615 | else: 616 | post_data = 'Please Put Index of the Word' 617 | else: 618 | post_data = 'Invalid Target' 619 | ret = message._client.webapi.chat.post_message( 620 | message._body['channel'], 621 | post_data, 622 | as_user=True, 623 | ) 624 | 625 | def checkRSSUrl(url): 626 | try: 627 | headers={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:50.0) Gecko/20100101 Firefox/50.0'} 628 | response = requests.get(url, timeout=4, headers=headers) 629 | rss = feedparser.parse(response.text) 630 | rssurl = None 631 | if rss['version'] == 'rss10' or rss['version'] == 'rss20' or rss['version'] == 'atom10': 632 | rssurl = url 633 | else: 634 | root = lxml.html.fromstring(response.text) 635 | for link in root.xpath('//link[@type="application/rss+xml"]'): 636 | url = link.get('href') 637 | rss = feedparser.parse(url) 638 | if rss['version'] == 'rss10' or rss['version'] == 'rss20' or rss['version'] == 'atom10': 639 | rssurl = url 640 | return rssurl 641 | except Exception: 642 | return None 643 | 644 | @respond_to('setFeed: (.*)') 645 | @respond_to('setF: (.*)') 646 | def 
setNewFeed(message, params): 647 | target = '' 648 | enabled = getEnabledTargets() 649 | if 'rss_feed' in enabled: 650 | name = params.split(';')[0].strip() 651 | post_data = '' 652 | if name == '': 653 | post_data = 'Please Put a Name' 654 | else: 655 | url = '' 656 | if len(params) > params.find(';')+1: 657 | url = params[params.find(';')+1:].strip() 658 | if url == '': 659 | post_data = 'Please Put URL if RSS Feed' 660 | patt = r'https?://[A-Za-z0-9\-.]{0,62}?\.([A-Za-z0-9\-.]{1,255})/?[A-Za-z0-9.\-?=#%/]*' 661 | url = re.search(patt, url).group(0) 662 | rssurl = checkRSSUrl(url) 663 | if rssurl == None: 664 | post_data = 'Invalid RSS URL' 665 | else: 666 | if url != rssurl: 667 | post_data += 'RSS Feed Found. ' 668 | ret = ec.setNewRSSFeed(name, rssurl) 669 | if not ret: 670 | post_data = '{name} is already used'.format(name=name) 671 | else: 672 | post_data += 'Set `{url}` to _{name}_'.format(url=rssurl, name=name) 673 | else: 674 | post_data = 'RSS is not enabled' 675 | ret = message._client.webapi.chat.post_message( 676 | message._body['channel'], 677 | post_data, 678 | as_user=True, 679 | ) 680 | 681 | @respond_to('setFeedFilter: (.*)') 682 | @respond_to('setFF: (.*)') 683 | def setFeedFilter(message, params): 684 | target = '' 685 | enabled = getEnabledTargets() 686 | if 'rss_feed' in enabled: 687 | words = params.split(';') 688 | post_data = '' 689 | if len(words) < 2: 690 | post_data = 'Parameter Shortage' 691 | else: 692 | name = words[0].strip() 693 | filter = [] 694 | for w in words[1].split(' '): 695 | if w.strip() != '': 696 | filter.append(w.strip()) 697 | channel = '' 698 | if len(words) > 2: 699 | channel = words[2].strip() 700 | ret = ec.setNewRSSFilter(name, filter, channel) 701 | if ret != None: 702 | post_data = 'New Filter `[{filter}]`(index : {index}) is set to _{name}_'.format(filter=', '.join(filter), name=name, index=ret) 703 | else: 704 | post_data = 'Error has Occured' 705 | else: 706 | post_data = 'RSS is not enabled' 707 | ret = 
message._client.webapi.chat.post_message( 708 | message._body['channel'], 709 | post_data, 710 | as_user=True, 711 | ) 712 | 713 | @respond_to('editFeedFilter: (.*)') 714 | @respond_to('editFF: (.*)') 715 | def editFeedFilter(message, params): 716 | enabled = getEnabledTargets() 717 | if 'rss_feed' in enabled: 718 | words = params.split(';') 719 | post_data = '' 720 | if len(words) < 4: 721 | post_data = 'Parameter Shortage' 722 | else: 723 | name = words[0].strip() 724 | if words[1].strip().isdigit(): 725 | filter = [] 726 | index = int(words[1].strip()) 727 | for w in words[2].split(' '): 728 | if w != '': 729 | filter.append(w.strip()) 730 | channel = '' 731 | if len(words) > 3: 732 | channel = words[3].strip() 733 | ret = ec.editRSSFilter(name, index, filter, channel) 734 | if ret: 735 | post_data = '`[{filter}]`(index : {index}) is set in _{name}_'.format(filter=', '.join(filter), index=index, name=name) 736 | else: 737 | post_data = 'Error has Occured' 738 | else: 739 | post_data = 'Please Put Index of the Filter' 740 | else: 741 | post_data = 'RSS is not enabled' 742 | ret = message._client.webapi.chat.post_message( 743 | message._body['channel'], 744 | post_data, 745 | as_user=True, 746 | ) 747 | 748 | @respond_to('removeFeedFilter: (.*)') 749 | @respond_to('removeFF: (.*)') 750 | def removeFeedFilter(message, params): 751 | enabled = getEnabledTargets() 752 | if 'rss_feed' in enabled: 753 | words = params.split(';') 754 | post_data = '' 755 | if len(words) < 2: 756 | post_data = 'Parameter Shortage' 757 | else: 758 | name = words[0].strip() 759 | if words[1].strip().isdigit(): 760 | index = int(words[1].strip()) 761 | ret = ec.removeRSSFilter(name, index) 762 | if ret != None: 763 | post_data = '`[{filter}]`(index : {index}) is removed in _{name}_'.format(filter=', '.join(ret), name=name, index=index) 764 | else: 765 | post_data = 'Error has Occured' 766 | else: 767 | post_data = 'Please Put Index of the Filter' 768 | else: 769 | post_data = 'RSS is not 
enabled' 770 | ret = message._client.webapi.chat.post_message( 771 | message._body['channel'], 772 | post_data, 773 | as_user=True, 774 | ) 775 | 776 | @respond_to('setTwitterQuery: (.*)') 777 | @respond_to('setTQ: (.*)') 778 | def setTwitterQuery(message, params): 779 | target = '' 780 | enabled = getEnabledTargets() 781 | if 'twitter' in enabled: 782 | words = params.split(';') 783 | post_data = '' 784 | users = [] 785 | query = '' 786 | continueflag = True 787 | if len(words) == 1: 788 | if words[0].strip() != '': 789 | query = words[0].strip() 790 | else: 791 | post_data = 'Query is Empty' 792 | continueflag = False 793 | elif len(words) > 1: 794 | if words[0].strip() != '' or words[1].strip() != '': 795 | query = words[0].strip() 796 | for u in words[1].split(' '): 797 | if u.strip() != '': 798 | users.append(u.strip()) 799 | else: 800 | post_data = 'Query is Empty' 801 | continueflag = False 802 | else: 803 | post_data = 'Parameter Shortage' 804 | continueflag = False 805 | if continueflag: 806 | name = words[0].strip() 807 | ret = ec.setNewTwitterQuery(query, users) 808 | if ret != None: 809 | if users != []: 810 | key = query + ' Users: ' + ', '.join(users) 811 | else: 812 | key = query 813 | post_data = 'New Twitter Search Query `[{query}]`(index : {index}) was set'.format(query=key, index=ret) 814 | else: 815 | post_data = 'Error has Occured' 816 | else: 817 | post_data = 'Twitter is not enabled' 818 | ret = message._client.webapi.chat.post_message( 819 | message._body['channel'], 820 | post_data, 821 | as_user=True, 822 | ) 823 | 824 | @respond_to('editTwitterQuery: (.*)') 825 | @respond_to('editTQ: (.*)') 826 | def editTwitterQuery(message, params): 827 | enabled = getEnabledTargets() 828 | if 'twitter' in enabled: 829 | post_data = '' 830 | params = params.split(';') 831 | index = params[0].strip() 832 | if index.isdigit(): 833 | index = int(index) 834 | if len(params) == 2: 835 | ret = ec.editTwitterQuery(index, params[1].strip(), []) 836 | if ret != 
None: 837 | post_data = '`{key}` was set in TwitterQuery (index : {index})'.format(key=ret, index=str(index)) 838 | else: 839 | post_data = 'No Data' 840 | elif len(params) > 2: 841 | users = [] 842 | for u in params[2].strip().split(' '): 843 | if u.strip() != '': 844 | users.append(u.strip()) 845 | ret = ec.editTwitterQuery(index, params[1].strip(), users) 846 | if ret != None: 847 | post_data = '`{key}` was set in TwitterQuery (index : {index})'.format(key=ret, index=str(index)) 848 | else: 849 | post_data = 'No Data' 850 | else: 851 | post_data = 'Parameter Shortage' 852 | else: 853 | post_data = 'Please Put Index of the Keyword' 854 | else: 855 | post_data = 'Twitter is not enabled' 856 | ret = message._client.webapi.chat.post_message( 857 | message._body['channel'], 858 | post_data, 859 | as_user=True, 860 | ) 861 | 862 | @respond_to('addUserTwitterQuery: (.*)') 863 | @respond_to('addUserTQ: (.*)') 864 | def addUserTwitterQuery(message, params): 865 | enabled = getEnabledTargets() 866 | if 'twitter' in enabled: 867 | post_data = '' 868 | params = params.split(';') 869 | index = params[0].strip() 870 | if index.isdigit(): 871 | index = int(index) 872 | if len(params) > 1: 873 | users = [] 874 | for u in params[1].strip().split(' '): 875 | if u.strip() != '': 876 | users.append(u.strip()) 877 | ret = ec.addUserToTwitterQuery(index, users) 878 | if ret != None: 879 | post_data = '`{key}` was set in TwitterQuery (index : {index})'.format(key=ret, index=str(index)) 880 | else: 881 | post_data = 'No Data' 882 | else: 883 | post_data = 'Parameter Shortage' 884 | else: 885 | post_data = 'Please Put Index of the Keyword' 886 | else: 887 | post_data = 'Twitter is not enabled' 888 | ret = message._client.webapi.chat.post_message( 889 | message._body['channel'], 890 | post_data, 891 | as_user=True, 892 | ) 893 | 894 | @respond_to('removeTwitterQuery: (.*)') 895 | @respond_to('removeTQ: (.*)') 896 | def removeTwitterQuery(message, params): 897 | target = 'twitter' 898 | 
enabled = getEnabledTargets() 899 | post_data = '' 900 | if target in enabled: 901 | params = params.split(';')[0].strip() 902 | if params.isdigit(): 903 | index = int(params.strip()) 904 | ret = ec.removeKeyword(target, index) 905 | if ret != None: 906 | post_data = '`{key}`(index : {index}) is removed in _twitter_'.format(key=ret, index=str(index)) 907 | else: 908 | post_data = 'No Data' 909 | else: 910 | post_data = 'Please Put Index of the Keyword' 911 | else: 912 | post_data = 'Invalid Target' 913 | ret = message._client.webapi.chat.post_message( 914 | message._body['channel'], 915 | post_data, 916 | as_user=True, 917 | ) 918 | 919 | @respond_to('help:') 920 | def getAllKeyword(message): 921 | # candidatelist = setting.getKeywordlist() 922 | post_data = '''```Command Format is as Follows: 923 | \t{Command}: {target}; {arg1}; {arg2}; ... 924 | 925 | Command List: 926 | 927 | \'setKeyword: target; [word]\'\tAdd [word] as New Search Keyword with Default Settings. 928 | (abbreviation=setK:) 929 | \'removeKeyword: target; [index]\'\tRemove the Search Keyword indicated by [index]. 930 | (abbreviation=removeK:) 931 | \'enableKeyword: target; [index]\'\tEnable the Search Keyword indicated by [index]. 932 | (abbreviation=enableK:) 933 | \'disableKeyword: target; [index]\'\tDisable the Search Keyword indicated by [index]. 934 | (abbreviation=disableK:) 935 | \'setSearchLevel: target; [index]\'\tSet Search Level of Github Search (1-4) or Github Code Search (1-2) indicated by [index]. 936 | (abbreviation=setSL:) 937 | \'setExpireDate: target; [index]; [expiration date]\'\tSet an Expiration Date of the Keyword indicated by [index]. [expiration date] Format is YYYY-mm-dd. 938 | (abbreviation=setED:) 939 | \'setChannel: target; [index];[channel]\'\tSet channel to notify the Search Keyword\'s result. 940 | (abbreviation=setC:) 941 | \'getKeyword: target;\'\tListing Enabled Search Keywords. 
942 | (abbreviation=getK:) 943 | \'getAllKeyword: target;\'\tListing All Search Keywords (including Disabled Keywords). 944 | (abbreviation=getAllK:) 945 | \'getSearchSetting: target; [index]\'\tShow Setting of the Search Keyword indicated by [index]. 946 | (abbreviation=getSS:) 947 | 948 | \'reMatchTest: target; [index]; [text]\'\tCheck whether the pattern indicated by [index] in [target] matches [text]. If [text] is a Pastebin ID, check the contents of that paste. 949 | \'setFeed: [name]; [url]\'\tAdd the RSS Feed at [url] as [name]. 950 | (abbreviation=setF:) 951 | \'setFeedFilter: [name]; [filter]\'\tAdd a new RSS Feed Filter. Notify only when an entry contains the filter words. 952 | (abbreviation=setFF:) 953 | \'editFeedFilter: [name]; [index]; [filter]\'\tEdit Feed Filter indicated by [index] in RSS Feed of [name]. 954 | (abbreviation=editFF:) 955 | \'removeFeedFilter: [name]; [index];\'\tRemove Feed Filter indicated by [index] in RSS Feed of [name]. 956 | (abbreviation=removeFF:) 957 | \'setTwitterQuery: [query]; ([users];)\'\tSet [query] with Default Settings. If [users] is set, notify only on tweets from these users. 958 | (abbreviation=setTQ:) 959 | \'editTwitterQuery: [index]; [query]; ([users];)\'\tEdit Twitter Query indicated by [index]. 960 | (abbreviation=editTQ:) 961 | \'addUserTwitterQuery: [index]; [users];\'\tAdd Users to Twitter Query indicated by [index]. That query then notifies only on tweets from these users. 962 | (abbreviation=addUserTQ:) 963 | \'removeTwitterQuery: [index];\'\tRemove Twitter Query indicated by [index]. 964 | (abbreviation=removeTQ:) 965 | 966 | \'help:\'\tShow this Message. 
967 | 968 | Target: 969 | \tgithub 970 | \tgist 971 | \tgithub_code 972 | \tgitlab 973 | \tgitlab_snippet (Use RE match) 974 | \tgoogle_custom 975 | \tpastebin (Use RE match) 976 | \trss_feed 977 | \ttwitter\n```''' 978 | ret = message._client.webapi.chat.post_message( 979 | message._body['channel'], 980 | post_data, 981 | as_user=True, 982 | ) 983 | 984 | @listen_to('How are you?') 985 | def reaction(message): 986 | username = message._client.login_data['self']['name'] 987 | message.send('I\'m fine, thank you.') 988 | -------------------------------------------------------------------------------- /master/run.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | import argparse 4 | from crontab import CronTab 5 | import datetime 6 | import logging 7 | from logging import getLogger, StreamHandler, Formatter 8 | from multiprocessing import Pool 9 | import math 10 | import os.path 11 | import sys 12 | import time 13 | import traceback 14 | from slackbot.bot import Bot 15 | import master_post as master 16 | import search_api 17 | import slackbot_settings 18 | import plugins.edit_conf_db as ec 19 | 20 | logger = logging.getLogger(__name__) 21 | logger.setLevel(logging.INFO) 22 | fh = logging.FileHandler('./log/run.log') 23 | logger.addHandler(fh) 24 | formatter = logging.Formatter('%(asctime)s - %(levelname)s - line %(lineno)d - %(name)s - %(filename)s - \n*** %(message)s') 25 | fh.setFormatter(formatter) 26 | 27 | def doSpecialAct(target, channel, key, result): 28 | if target == 'github': 29 | pass 30 | elif target == 'gist': 31 | pass 32 | elif target == 'github_code': 33 | pass 34 | elif target == 'gitlab': 35 | pass 36 | elif target == 'gitlab_snippet': 37 | pass 38 | elif target == 'google_custom': 39 | pass 40 | elif target == 'pastebin': 41 | pass 42 | elif target == 'twitter': 43 | pass 44 | 45 | def getSpecialChannel(): 46 | try: 47 | channel = slackbot_settings.special_action_channel 48 | if 
type(channel) != list: 49 | return [] 50 | else: 51 | return channel 52 | except: 53 | return [] 54 | 55 | def runSearchGithub(): 56 | try: 57 | logger.info('--START GITHUB SEARCH--') 58 | now = datetime.date.today() 59 | today = now.strftime('%Y-%m-%d') 60 | 61 | target = 'github' 62 | keywords = ec.getEnableKeywords(target) 63 | 64 | if ec.isEnable(target) and keywords != None and keywords != []: 65 | safe_limit = 6 66 | error_safety = ec.getSafetyCount(target) 67 | for key in keywords: 68 | channel = key['Channel'] 69 | limittime = datetime.datetime.strptime(key['Expire_date'], '%Y-%m-%d').date() 70 | if now < limittime: 71 | oldtime = now - datetime.timedelta(key['Time_Range']) 72 | oldday = oldtime.strftime('%Y-%m-%d') 73 | (results, statuscode) = search_api.searchGithub(key['KEY'], oldday, key['SearchLevel']) 74 | result = list(set(results) - set(key['Exclude_list'])) 75 | if statuscode != 200: 76 | error_safety += 1 77 | ec.setSafetyCount(target, error_safety) 78 | postdata = '`' + key['KEY'] + '` failed to search in _github_.\nStatus Code: ' + str(statuscode) 79 | master.postAnyData(postdata, channel) 80 | logger.info(postdata) 81 | if error_safety > safe_limit: 82 | postdata = 'Too Many Errors. _Github_ Module is disabled for safety' 83 | ec.disable(target) 84 | master.postAnyData(postdata, channel) 85 | logger.info(postdata) 86 | else: 87 | ec.setSafetyCount(target, 0) 88 | if result != []: 89 | if channel in getSpecialChannel(): 90 | doSpecialAct(target, channel, key['KEY'], result) 91 | master.postNewPoCFound(key['KEY'], result, channel) 92 | logger.info('keyword : ' + key['KEY']) 93 | logger.info('\n'.join(result)) 94 | exclude = results 95 | ec.clearExcludeList(target, key['Index']) 96 | ec.addExcludeList(target, key['Index'], exclude) 97 | time.sleep(10) 98 | else: 99 | postdata = '`' + key['KEY'] + '` expired in _github_, and was disabled.' 
100 | logger.info(postdata) 101 | master.postAnyData(postdata, channel) 102 | ec.enableKeywordSetting(target, key['Index'], False) 103 | except: 104 | logger.error('--ERROR HAS OCCURED IN GITHUB SEARCH--') 105 | logger.error(traceback.format_exc()) 106 | master.postAnyData(traceback.format_exc(), slackbot_settings.channels[0]) 107 | 108 | def runSearchGithubCode(): 109 | try: 110 | logger.info('--START GITHUB CODE SEARCH--') 111 | now = datetime.date.today() 112 | today = now.strftime('%Y-%m-%d') 113 | 114 | api_key = slackbot_settings.github_access_token 115 | 116 | target = 'github_code' 117 | keywords = ec.getEnableKeywords(target) 118 | 119 | if ec.isEnable(target) and keywords != None and keywords != []: 120 | safe_limit = 6 121 | error_safety = ec.getSafetyCount(target) 122 | for key in keywords: 123 | channel = key['Channel'] 124 | limittime = datetime.datetime.strptime(key['Expire_date'], '%Y-%m-%d').date() 125 | if now < limittime: 126 | (results, statuscode) = search_api.searchGithubCode(key['KEY'], key['SearchLevel'], api_key) 127 | result = list(set(results) - set(key['Exclude_list'])) 128 | if statuscode != 200: 129 | error_safety += 1 130 | ec.setSafetyCount('github_code', error_safety) 131 | postdata = '`' + key['KEY'] + '` failed to search in _github_code_.\nStatus Code: ' + str(statuscode) 132 | master.postAnyData(postdata, channel) 133 | logger.info(postdata) 134 | if error_safety > safe_limit: 135 | postdata = 'Too Many Errors. 
_Github Code_ Module is disabled for safety' 136 | ec.disable('github_code') 137 | master.postAnyData(postdata, channel) 138 | logger.info(postdata) 139 | else: 140 | ec.setSafetyCount('github_code', 0) 141 | if key['__INITIAL__'] == True: 142 | ec.haveSearched(target, key['Index']) 143 | if result != []: 144 | postdata = 'New Code Found about `' + key['KEY'] + '` in _github_code_' 145 | master.postAnyData(postdata, channel) 146 | if key['__INITIAL__'] == True: 147 | master.postAnyData(result[0], channel) 148 | else: 149 | if channel in getSpecialChannel(): 150 | doSpecialAct(target, channel, key['KEY'], result) 151 | master.postAnyData('\n'.join(result), channel) 152 | logger.info('keyword : ' + key['KEY']) 153 | logger.info('\n'.join(result)) 154 | exclude = results 155 | # ec.clearExcludeList('github_code', conf['Index']) 156 | ec.addExcludeList('github_code', key['Index'], exclude) 157 | time.sleep(10) 158 | else: 159 | postdata = '`' + key['KEY'] + '` expired in _github_code_, and was disabled.' 
160 | logger.info(postdata) 161 | master.postAnyData(postdata, channel) 162 | ec.enableKeywordSetting('github_code', key['Index'], False) 163 | except: 164 | logger.error('--ERROR HAS OCCURRED IN GITHUB CODE SEARCH--') 165 | logger.error(traceback.format_exc()) 166 | master.postAnyData(traceback.format_exc(), slackbot_settings.channels[0]) 167 | 168 | def runSearchGist(): 169 | try: 170 | logger.info('--START GIST SEARCH--') 171 | now = datetime.date.today() 172 | today = now.strftime('%Y-%m-%d') 173 | 174 | target = 'gist' 175 | keywords = ec.getEnableKeywords(target) 176 | 177 | if ec.isEnable(target) and keywords != None and keywords != []: 178 | safe_limit = 6 179 | error_safety = ec.getSafetyCount(target) 180 | for key in keywords: 181 | channel = key['Channel'] 182 | limittime = datetime.datetime.strptime(key['Expire_date'], '%Y-%m-%d').date() 183 | if now < limittime: 184 | oldtime = now - datetime.timedelta(key['Time_Range']) 185 | oldday = oldtime.strftime('%Y-%m-%d') 186 | (results, statuscode) = search_api.searchGist(key['KEY'], oldday) 187 | result = list(set(results) - set(key['Exclude_list'])) 188 | if statuscode != 200: 189 | error_safety += 1 190 | ec.setSafetyCount(target, error_safety) 191 | postdata = '`' + key['KEY'] + '` failed to search in _gist_.\nStatus Code: ' + str(statuscode) 192 | master.postAnyData(postdata, channel) 193 | logger.info(postdata) 194 | if error_safety > safe_limit: 195 | postdata = 'Too Many Errors. 
_Gist_ Module is disabled for safety' 196 | ec.disable(target) 197 | master.postAnyData(postdata, channel) 198 | logger.info(postdata) 199 | else: 200 | ec.setSafetyCount(target, 0) 201 | if result != []: 202 | if channel in getSpecialChannel(): 203 | doSpecialAct(target, channel, key['KEY'], result) 204 | postdata = 'New Code Found about `' + key['KEY'] + '` in _gist_' 205 | master.postAnyData(postdata, channel) 206 | master.postAnyData('\n'.join(result), channel) 207 | logger.info('keyword : ' + key['KEY']) 208 | logger.info('\n'.join(result)) 209 | exclude = results 210 | ec.clearExcludeList(target, key['Index']) 211 | ec.addExcludeList(target, key['Index'], exclude) 212 | time.sleep(45) 213 | else: 214 | postdata = '`' + key['KEY'] + '` is expired in _gist_, and disabled.' 215 | master.postAnyData(postdata, channel) 216 | ec.enableKeywordSetting(target, key['Index'], False) 217 | logger.info(postdata) 218 | except: 219 | logger.error('--ERROR HAS OCCURED IN GIST SEARCH--') 220 | logger.error(traceback.format_exc()) 221 | master.postAnyData(traceback.format_exc(), slackbot_settings.channels[0]) 222 | 223 | def runSearchGitlab(): 224 | try: 225 | logger.info('--START GITLAB SEARCH--') 226 | now = datetime.date.today() 227 | today = now.strftime('%Y-%m-%d') 228 | 229 | target = 'gitlab' 230 | keywords = ec.getEnableKeywords(target) 231 | 232 | if ec.isEnable(target) and keywords != None and keywords != []: 233 | safe_limit = 6 234 | error_safety = ec.getSafetyCount(target) 235 | for key in keywords: 236 | channel = key['Channel'] 237 | limittime = datetime.datetime.strptime(key['Expire_date'], '%Y-%m-%d').date() 238 | if now < limittime: 239 | (results, statuscode) = search_api.searchGitlab(key['KEY']) 240 | result = list(set(results) - set(key['Exclude_list'])) 241 | if statuscode != 200: 242 | error_safety += 1 243 | ec.setSafetyCount(target, error_safety) 244 | postdata = '`' + key['KEY'] + '` failed to search in _gitlab_.\nStatus Code: ' + str(statuscode) 245 
| master.postAnyData(postdata, channel) 246 | logger.info(postdata) 247 | if error_safety > safe_limit: 248 | postdata = 'Too Many Errors. _Gitlab_ Module is disabled for safety' 249 | ec.disable(target) 250 | master.postAnyData(postdata, channel) 251 | logger.info(postdata) 252 | else: 253 | if error_safety != 0: 254 | ec.setSafetyCount(target, 0) 255 | if key['__INITIAL__'] == True: 256 | ec.haveSearched(target, key['Index']) 257 | if result != []: 258 | postdata = 'New Code Found about `' + key['KEY'] + '` in _gitlab_' 259 | master.postAnyData(postdata, channel) 260 | url = [] 261 | for i in result: 262 | url.append('https://gitlab.com' + i) 263 | if key['__INITIAL__'] == True: 264 | master.postAnyData(url[0], channel) 265 | else: 266 | if channel in getSpecialChannel(): 267 | doSpecialAct(target, channel, key['KEY'], url) 268 | master.postAnyData('\n'.join(url), channel) 269 | logger.info('keyword : ' + key['KEY']) 270 | logger.info('\n'.join(url)) 271 | exclude = results 272 | ec.clearExcludeList(target, key['Index']) 273 | ec.addExcludeList(target, key['Index'], exclude) 274 | time.sleep(30) 275 | else: 276 | postdata = '`' + key['KEY'] + '` is expired in _gitlab_, and disabled.' 
277 | master.postAnyData(postdata, channel) 278 | ec.enableKeywordSetting(target, key['Index'], False) 279 | logger.info(postdata) 280 | except: 281 | logger.error('--ERROR HAS OCCURED IN GITLAB SEARCH--') 282 | logger.error(traceback.format_exc()) 283 | master.postAnyData(traceback.format_exc(), slackbot_settings.channels[0]) 284 | 285 | def runSearchGitlabSnippets(): 286 | try: 287 | logger.info('--START GITLAB SNIPPETS SEARCH--') 288 | now = datetime.date.today() 289 | today = now.strftime('%Y-%m-%d') 290 | 291 | target = 'gitlab_snippet' 292 | keywords = ec.getEnableKeywords(target) 293 | 294 | if ec.isEnable(target) and keywords != None and keywords != []: 295 | safe_limit = 6 296 | error_safety = ec.getSafetyCount(target) 297 | for key in keywords: 298 | channel = key['Channel'] 299 | limittime = datetime.datetime.strptime(key['Expire_date'], '%Y-%m-%d').date() 300 | if now > limittime: 301 | postdata = '`' + key['KEY'] + '` is expired in _gitlab_snippet_, and disabled.' 302 | master.postAnyData(postdata, channel) 303 | ec.enableKeywordSetting('gitlab_snippet', key['Index'], False) 304 | logger.info(postdata) 305 | keywords = ec.getEnableKeywords(target) 306 | if keywords != None and keywords != []: 307 | keylist = [d.get('KEY') for d in keywords] 308 | (results, statuscode) = search_api.searchGitlabSnippets(keylist) 309 | if statuscode != 200: 310 | error_safety += 1 311 | ec.setSafetyCount(target, error_safety) 312 | postdata = '_gitlab_snippet_ failed to search.\nStatus Code: ' + str(statuscode) 313 | master.postAnyData(postdata, channel) 314 | logger.info(postdata) 315 | if error_safety > safe_limit: 316 | postdata = 'Too Many Errors. 
_Gitlab Snippet_ Module is disabled for safety' 317 | ec.disable(target) 318 | master.postAnyData(postdata, channel) 319 | logger.info(postdata) 320 | else: 321 | ec.setSafetyCount(target, 0) 322 | for key in keywords: 323 | if key['KEY'] in results.keys(): 324 | result = list(set(results[key['KEY']]) - set(key['Exclude_list'])) 325 | if result != []: 326 | channel = key['Channel'] 327 | postdata = 'New Code Found about `' + key['KEY'] + '` in _gitlab_snippet_' 328 | master.postAnyData(postdata, channel) 329 | logger.info(postdata) 330 | url = [] 331 | for i in result: 332 | url.append('https://gitlab.com' + i) 333 | logger.info('https://gitlab.com' + i) 334 | # exclude = list(set(results[word]) & set(keywords[word][1])) 335 | exclude = results[key['KEY']] 336 | if channel in getSpecialChannel(): 337 | doSpecialAct(target, channel, key['KEY'], url) 338 | master.postAnyData('\n'.join(url), channel) 339 | ec.clearExcludeList(target, key['Index']) 340 | ec.addExcludeList(target, key['Index'], exclude) 341 | except: 342 | logger.error('--ERROR HAS OCCURED IN GITLAB SNIPPETS SEARCH--') 343 | logger.error(traceback.format_exc()) 344 | master.postAnyData(traceback.format_exc(), slackbot_settings.channels[0]) 345 | 346 | def runSearchPastebin(): 347 | logger.info('--START PASTEBIN SEARCH--') 348 | while True: 349 | try: 350 | now = datetime.date.today() 351 | today = now.strftime('%Y-%m-%d') 352 | target = 'pastebin' 353 | keywords = ec.getEnableKeywords(target) 354 | 355 | if ec.isEnable(target) and keywords != None and keywords != []: 356 | safe_limit = 10 357 | error_safety = ec.getSafetyCount(target) 358 | for key in keywords: 359 | channel = key['Channel'] 360 | limittime = datetime.datetime.strptime(key['Expire_date'], '%Y-%m-%d').date() 361 | if now > limittime: 362 | postdata = '`' + key['KEY'] + '` is expired in _pastebin_, and disabled.' 
363 | master.postAnyData(postdata, channel) 364 | ec.enableKeywordSetting(target, key['Index'], False) 365 | logger.info(postdata) 366 | keywords = ec.getEnableKeywords(target) 367 | if keywords != None and keywords != []: 368 | (pastelist, statuscode) = search_api.getPasteList(100) 369 | if statuscode != 200: 370 | error_safety += 1 371 | ec.setSafetyCount(target, error_safety) 372 | postdata = 'pastebin search failed in _pastebin_.\nStatus Code: ' + str(statuscode) 373 | master.postAnyData(postdata, channel) 374 | logger.info(postdata) 375 | if error_safety == 5: 376 | postdata = 'Pausing access to pastebin' 377 | master.postAnyData(postdata, channel) 378 | logger.info(postdata) 379 | time.sleep(300) 380 | if error_safety > safe_limit: 381 | postdata = 'Too Many Errors. _Pastebin_ Module is disabled for safety' 382 | ec.disable('pastebin') 383 | master.postAnyData(postdata, channel) 384 | logger.info(postdata) 385 | else: 386 | searchedpastes = ec.getSearchedPastes() 387 | searchlist = {} 388 | for paste, conf in pastelist.items(): 389 | if not paste in searchedpastes: 390 | searchlist[paste] = conf 391 | if len(searchlist.keys()) > 30: 392 | ec.setSearchedPastes(pastelist.keys()) 393 | logger.info('The number of scraping pastes is ' + str(len(searchlist.keys()))) 394 | keylist = [d.get('KEY') for d in keywords] 395 | (results, statuscode) = search_api.scrapePastebin(keylist, searchlist) 396 | if statuscode != 200: 397 | error_safety += 1 398 | ec.setSafetyCount(target, error_safety) 399 | postdata = 'pastebin search failed in _pastebin_.\nStatus Code: ' + str(statuscode) 400 | master.postAnyData(postdata, channel) 401 | logger.info(postdata) 402 | if error_safety > safe_limit: 403 | postdata = 'Too Many Errors. 
_Pastebin_ Module is disabled for safety' 404 | ec.disable(target) 405 | master.postAnyData(postdata, channel) 406 | logger.info(postdata) 407 | else: 408 | ec.setSafetyCount(target, 0) 409 | for key in keywords: 410 | if key['KEY'] in results.keys(): 411 | if results[key['KEY']] != []: 412 | channel = key['Channel'] 413 | postdata = 'New Code Found about `' + key['KEY'] + '` in _pastebin_' 414 | if channel in getSpecialChannel(): 415 | doSpecialAct(target, channel, key['KEY'], results[key['KEY']]) 416 | master.postAnyData(postdata, channel) 417 | logger.info(postdata) 418 | exclude = results[key['KEY']] 419 | master.postAnyData('\n'.join(results[key['KEY']]), channel) 420 | time.sleep(10) 421 | except: 422 | logger.error('--ERROR HAS OCCURED IN PASTEBIN SEARCH--') 423 | logger.error(traceback.format_exc()) 424 | master.postAnyData(traceback.format_exc(), slackbot_settings.channels[0]) 425 | time.sleep(10) 426 | 427 | def runSearchGoogleCustom(): 428 | try: 429 | engine_id = slackbot_settings.google_custom_search_engine_id 430 | api_key = slackbot_settings.google_custom_api_key 431 | logger.info('--START GOOGLE CUSTOM SEARCH--') 432 | now = datetime.date.today() 433 | today = now.strftime('%Y-%m-%d') 434 | 435 | target = 'google_custom' 436 | keywords = ec.getEnableKeywords(target) 437 | 438 | if ec.isEnable(target) and keywords != None and keywords != []: 439 | safe_limit = 6 440 | error_safety = ec.getSafetyCount(target) 441 | for key in keywords: 442 | limittime = datetime.datetime.strptime(key['Expire_date'], '%Y-%m-%d').date() 443 | channel = key['Channel'] 444 | if now < limittime: 445 | (result, statuscode) = search_api.googleCustomSearch(key['KEY'], engine_id, api_key) 446 | if statuscode != 200: 447 | error_safety += 1 448 | ec.setSafetyCount(target, error_safety) 449 | postdata = '`' + key['KEY'] + '` failed to search in _google_custom_.\nStatus Code: ' + str(statuscode) 450 | master.postAnyData(postdata, channel) 451 | logger.info(postdata) 452 | if 
error_safety > safe_limit: 453 | postdata = 'Too Many Errors. _Google Custom_ Module is disabled for safety' 454 | ec.disable(target) 455 | master.postAnyData(postdata, channel) 456 | logger.info(postdata) 457 | else: 458 | result_post = list(set(result.keys()) - set(key['Exclude_list'])) 459 | ec.setSafetyCount(target, 0) 460 | if key['__INITIAL__'] == True: 461 | ec.haveSearched(target, key['Index']) 462 | if result_post != []: 463 | postdata = 'New Code Found about `' + key['KEY'] + '` in _google_custom_' 464 | master.postAnyData(postdata, channel) 465 | logger.info(postdata) 466 | if key['__INITIAL__'] == True: 467 | result_post = result_post[:1] 468 | for i in result_post: 469 | logger.info(i) 470 | post_code = result[i][0] + '\n' + i + '\n' 471 | if channel in getSpecialChannel(): 472 | doSpecialAct(target, channel, key['KEY'], post_code) 473 | master.postAnyData(post_code, channel) 474 | exclude = list(result.keys()) 475 | # ec.clearExcludeList('google_custom', conf['Index']) 476 | ec.addExcludeList(target, key['Index'], exclude) 477 | time.sleep(30) 478 | else: 479 | postdata = '`' + key['KEY'] + '` is expired in _google_custom_, and disabled.' 
480 | master.postAnyData(postdata, channel) 481 | ec.enableKeywordSetting(target, key['Index'], False) 482 | logger.info(postdata) 483 | except: 484 | logger.error('--ERROR HAS OCCURRED IN GOOGLE CUSTOM SEARCH--') 485 | logger.error(traceback.format_exc()) 486 | master.postAnyData(traceback.format_exc(), slackbot_settings.channels[0]) 487 | 488 | def filterFeeds(feeds, filter): 489 | filtereditems = [] 490 | for f in feeds: 491 | matched = True 492 | for word in filter: 493 | target = '__ALL__' 494 | name = word 495 | pos = word.find('>') 496 | if pos > 0 and len(word) > pos+1: 497 | target = word[:pos].strip() 498 | name = word[pos+1:].strip() 499 | if name.startswith('!'): 500 | name = name[1:].strip() 501 | if target in f.keys(): 502 | if type(f[target]) == list: 503 | text = ''.join(map(str, f[target])) 504 | else: 505 | text = f[target] 506 | if text.lower().find(name.lower()) >= 0: 507 | matched = False 508 | else: 509 | text = '' 510 | for i in f.keys(): 511 | if type(f[i]) == list: 512 | text += ''.join(map(str, f[i])) 513 | else: 514 | text += f[i] 515 | if text.lower().find(name.lower()) >= 0: 516 | matched = False 517 | else: 518 | if target in f.keys(): 519 | if type(f[target]) == list: 520 | text = ''.join(map(str,f[target])) 521 | else: 522 | text = f[target] 523 | if text.lower().find(name.lower()) < 0: 524 | matched = False 525 | else: 526 | text = '' 527 | for i in f.keys(): 528 | if type(f[i]) == list: 529 | text += ''.join(map(str, f[i])) 530 | else: 531 | text += f[i] 532 | if text.lower().find(name.lower()) < 0: 533 | matched = False 534 | if matched: 535 | filtereditems.append(f) 536 | return filtereditems 537 | 538 | def runRSSFeeds(): 539 | try: 540 | logger.info('--GET NEW RSS FEEDS--') 541 | 542 | target = 'rss_feed' 543 | keywords = ec.getEnableKeywords(target) 544 | 545 | if ec.isEnable(target) and keywords != None and keywords != []: 546 | safe_limit = 6 547 | error_safety = ec.getSafetyCount(target) 548 | 549 | for key in keywords: 550 | 
channel = key['Channel'] 551 | filter = key['Filters'] 552 | url = key['URL'] 553 | lastpost = key['Last_Post'] 554 | initialstate = key['__INITIAL__'] 555 | (result, statuscode) = search_api.getRSSFeeds(url, lastpost) 556 | if statuscode != 200: 557 | error_safety += 1 558 | ec.setSafetyCount(target, error_safety) 559 | postdata = '`' + key['Name'] + '` failed to get _RSS_Feeds_.\nStatus Code: ' + str(statuscode) 560 | master.postAnyData(postdata, channel) 561 | logger.info(postdata) 562 | if error_safety > safe_limit: 563 | postdata = 'Too Many Errors. _RSS_Feeds_ Module is disabled for safety' 564 | ec.disable(target) 565 | master.postAnyData(postdata, channel) 566 | logger.info(postdata) 567 | else: 568 | if error_safety != 0: 569 | ec.setSafetyCount(target, 0) 570 | if len(result) > 0: 571 | if initialstate: 572 | result = result[:1] 573 | ec.haveSearched(target, key['Name']) 574 | filteredfeeds = {} 575 | if filter != []: 576 | for f in filter: 577 | c = f['Channel'] 578 | w = f['Words'] 579 | ff = filterFeeds(result, w) 580 | if ff != []: 581 | if c in filteredfeeds.keys(): 582 | filteredfeeds[c] += ff 583 | else: 584 | filteredfeeds[c] = ff 585 | else: 586 | if result != {}: 587 | filteredfeeds[channel] = result 588 | lastpost = {'title':result[0]['title'], 'link':result[0]['link'], 'timestamp':result[0]['timestamp']} 589 | ec.setRSSLastPost(key['Name'], lastpost) 590 | if filteredfeeds != {}: 591 | for c, feeds in filteredfeeds.items(): 592 | if c in getSpecialChannel(): 593 | doSpecialAct(target, c, key['Name'], feeds) 594 | postdata = 'New Feed in `' + key['Name'] + '`' 595 | master.postAnyData(postdata, c) 596 | logger.info(postdata) 597 | postdata = '' 598 | for f in feeds: 599 | postdata = f['title'] + '\n' 600 | postdata += f['link'] 601 | logger.info(postdata) 602 | master.postAnyData(postdata, c) 603 | time.sleep(30) 604 | except: 605 | logger.error('--ERROR HAS OCCURED IN GETTING RSS FEEDS--') 606 | logger.error(traceback.format_exc()) 607 | 
master.postAnyData(traceback.format_exc(), slackbot_settings.channels[0]) 608 | 609 | def runTwitterSearch(): 610 | try: 611 | logger.info('--START TWITTER SEARCH--') 612 | 613 | target = 'twitter' 614 | keywords = ec.getEnableKeywords(target) 615 | 616 | if ec.isEnable(target) and keywords != None and keywords != []: 617 | safe_limit = 6 618 | error_safety = ec.getSafetyCount(target) 619 | 620 | for key in keywords: 621 | channel = key['Channel'] 622 | query = key['Query'] 623 | users = key['Users'] 624 | lastpost = key['Last_Post'] 625 | initialstate = key['__INITIAL__'] 626 | (result, statuscode) = search_api.getTweets(users, query, lastpost) 627 | if statuscode != 200: 628 | error_safety += 1 629 | ec.setSafetyCount(target, error_safety) 630 | postdata = '`' + key['Name'] + '` failed to get _Twitter_.\nStatus Code: ' + str(statuscode) 631 | master.postAnyData(postdata, channel) 632 | logger.info(postdata) 633 | if error_safety > safe_limit: 634 | postdata = 'Too Many Errors. _Twitter_ Module is disabled for safety' 635 | ec.disable(target) 636 | master.postAnyData(postdata, channel) 637 | logger.info(postdata) 638 | else: 639 | if error_safety != 0: 640 | ec.setSafetyCount(target, 0) 641 | if len(result) > 0: 642 | if initialstate: 643 | result = result[:1] 644 | ec.haveSearched(target, key['Index']) 645 | lastpost = result[0] 646 | ec.setTwitterLastPost(key['Index'], lastpost) 647 | postdata = 'New Tweets in `' + key['KEY'] + '`' 648 | master.postAnyData(postdata, channel) 649 | logger.info(postdata) 650 | postdata = '' 651 | for tw in result: 652 | postdata = 'https://twitter.com' + tw['link'] 653 | postdata += ' (FROM: '+ tw['user'] + ')\n' 654 | postdata += '>>>' + tw['tweet'] + '\n' 655 | logger.info(postdata) 656 | if channel in getSpecialChannel(): 657 | doSpecialAct(target, channel, key['KEY'], tw) 658 | master.postAnyData(postdata, channel) 659 | time.sleep(30) 660 | except: 661 | logger.error('--ERROR HAS OCCURED IN SEARCHING TWITTER--') 662 | 
logger.error(traceback.format_exc()) 663 | master.postAnyData(traceback.format_exc(), slackbot_settings.channels[0]) 664 | 665 | def runBot(): 666 | bot = Bot() 667 | bot.run() 668 | 669 | class JobConfig(object): 670 | def __init__(self, crontab, job): 671 | self._crontab = crontab 672 | self.job = job 673 | 674 | def schedule(self): 675 | crontab = self._crontab 676 | return datetime.now() + timedelta(seconds=math.ceil(crontab.next())) 677 | 678 | def next(self): 679 | crontab = self._crontab 680 | return math.ceil(crontab.next()) 681 | 682 | def job_controller(jobConfig): 683 | while True: 684 | try: 685 | time.sleep(jobConfig.next()) 686 | jobConfig.job() 687 | except KeyboardInterrupt: 688 | break 689 | 690 | def main(): 691 | parser = argparse.ArgumentParser() 692 | parser.add_argument('--db-host', type=str, default='localhost', help='DATABASE HOST NAME') 693 | parser.add_argument('--db-port', type=int, default=27017, help='DATABASE PORT') 694 | parser.add_argument('--db-name', type=str, default='codescraper-database', help='DATABASE NAME') 695 | args = parser.parse_args() 696 | ec.setDB(args.db_host, args.db_port, args.db_name) 697 | 698 | jobConfigs = [] 699 | 700 | try: 701 | slackbot_settings.API_TOKEN 702 | except (NameError, AttributeError): 703 | print('Slackbot API TOKEN is required') 704 | sys.exit(1) 705 | 706 | start_state = [] 707 | runpastebinflag = False 708 | 709 | try: 710 | channels = slackbot_settings.channels 711 | if type(channels) != list or channels == []: 712 | print('Set at least 1 channel') 713 | sys.exit() 714 | 715 | ec.setUsingChannels(channels) 716 | 717 | if slackbot_settings.enable_github_search: 718 | default_github = slackbot_settings.github_default_settings 719 | ret = ec.setDefaultSettings('github', default_github) 720 | if ret: 721 | github_interval = slackbot_settings.github_search_interval 722 | jobConfigs.append(JobConfig(CronTab(github_interval), runSearchGithub)) 723 | message = 'Started' 724 | start_state.append(('github', 'SUCCESS', 
message)) 725 | else: 726 | ec.disable('github') 727 | message = 'Default Setting is wrong. Disabled' 728 | start_state.append(('github', 'FAILED', message)) 729 | else: 730 | ec.disable('github') 731 | 732 | if slackbot_settings.enable_github_code_search: 733 | default_github_code = slackbot_settings.github_code_default_settings 734 | ret = ec.setDefaultSettings('github_code', default_github_code) 735 | if ret: 736 | gist_interval = slackbot_settings.github_code_search_interval 737 | jobConfigs.append(JobConfig(CronTab(gist_interval), runSearchGithubCode)) 738 | message = 'Started' 739 | start_state.append(('github_code', 'SUCCESS', message)) 740 | else: 741 | ec.disable('github_code') 742 | message = 'Default Setting is wrong. Disabled' 743 | start_state.append(('github_code', 'FAILED', message)) 744 | else: 745 | ec.disable('github_code') 746 | 747 | if slackbot_settings.enable_gist_search: 748 | default_gist = slackbot_settings.gist_default_settings 749 | ret = ec.setDefaultSettings('gist', default_gist) 750 | if ret: 751 | gist_interval = slackbot_settings.gist_search_interval 752 | jobConfigs.append(JobConfig(CronTab(gist_interval), runSearchGist)) 753 | message = 'Started' 754 | start_state.append(('gist', 'SUCCESS', message)) 755 | else: 756 | ec.disable('gist') 757 | message = 'Default Setting is wrong. Disabled' 758 | start_state.append(('gist', 'FAILED', message)) 759 | else: 760 | ec.disable('gist') 761 | 762 | if slackbot_settings.enable_gitlab_search: 763 | default_gitlab = slackbot_settings.gitlab_default_settings 764 | ret = ec.setDefaultSettings('gitlab', default_gitlab) 765 | if ret: 766 | gitlab_interval = slackbot_settings.gitlab_search_interval 767 | jobConfigs.append(JobConfig(CronTab(gitlab_interval), runSearchGitlab)) 768 | message = 'Started' 769 | start_state.append(('gitlab', 'SUCCESS', message)) 770 | else: 771 | ec.disable('gitlab') 772 | message = 'Default Setting is wrong. 
Disabled' 773 | start_state.append(('gitlab', 'FAILED', message)) 774 | else: 775 | ec.disable('gitlab') 776 | 777 | if slackbot_settings.enable_gitlab_snippet_search: 778 | default_gitlab_snippet = slackbot_settings.gitlab_snippet_default_settings 779 | ret = ec.setDefaultSettings('gitlab_snippet', default_gitlab_snippet) 780 | if ret: 781 | gitlab_snippet_interval = slackbot_settings.gitlab_snippet_search_interval 782 | jobConfigs.append(JobConfig(CronTab(gitlab_snippet_interval), runSearchGitlabSnippets)) 783 | message = 'Started' 784 | start_state.append(('gitlab_snippet', 'SUCCESS', message)) 785 | else: 786 | ec.disable('gitlab_snippet') 787 | message = 'Default Setting is wrong. Disabled' 788 | start_state.append(('gitlab_snippet', 'FAILED', message)) 789 | else: 790 | ec.disable('gitlab_snippet') 791 | 792 | if slackbot_settings.enable_pastebin_search: 793 | default_pastebin = slackbot_settings.pastebin_default_settings 794 | ret = ec.setDefaultSettings('pastebin', default_pastebin) 795 | if ret: 796 | runpastebinflag = True 797 | message = 'Started' 798 | start_state.append(('pastebin', 'SUCCESS', message)) 799 | else: 800 | ec.disable('pastebin') 801 | message = 'Default Setting is wrong. 
Disabled' 802 | start_state.append(('pastebin', 'FAILED', message)) 803 | else: 804 | ec.disable('pastebin') 805 | 806 | if slackbot_settings.enable_google_custom_search: 807 | slackbot_settings.google_custom_search_engine_id 808 | slackbot_settings.google_custom_api_key 809 | default_google_custom = slackbot_settings.google_custom_default_settings 810 | ret = ec.setDefaultSettings('google_custom', default_google_custom) 811 | if ret: 812 | google_custom_interval = slackbot_settings.google_custom_search_interval 813 | jobConfigs.append(JobConfig(CronTab(google_custom_interval), runSearchGoogleCustom)) 814 | message = 'Started' 815 | start_state.append(('google_custom', 'SUCCESS', message)) 816 | else: 817 | ec.disable('google_custom') 818 | message = 'Default Setting is wrong. Disabled' 819 | start_state.append(('google_custom', 'FAILED', message)) 820 | else: 821 | ec.disable('google_custom') 822 | 823 | if slackbot_settings.enable_rss_feed: 824 | default_channel = slackbot_settings.rss_feed_default_channel 825 | ret = ec.setDefaultSettings('rss_feed', {'Channel':default_channel}) 826 | if ret: 827 | rss_interval = slackbot_settings.rss_feed_interval 828 | jobConfigs.append(JobConfig(CronTab(rss_interval), runRSSFeeds)) 829 | message = 'Started' 830 | start_state.append(('rss_feed', 'SUCCESS', message)) 831 | else: 832 | ec.disable('rss_feed') 833 | message = 'Default Setting is wrong. 
Disabled' 834 | start_state.append(('rss_feed', 'FAILED', message)) 835 | else: 836 | ec.disable('rss_feed') 837 | 838 | if slackbot_settings.enable_twitter: 839 | default_channel = slackbot_settings.twitter_default_channel 840 | ret = ec.setDefaultSettings('twitter', {'Channel':default_channel}) 841 | if ret: 842 | twitter_interval = slackbot_settings.twitter_interval 843 | jobConfigs.append(JobConfig(CronTab(twitter_interval), runTwitterSearch)) 844 | message = 'Started' 845 | start_state.append(('twitter', 'SUCCESS', message)) 846 | else: 847 | ec.disable('twitter') 848 | message = 'Default Setting is wrong. Disabled' 849 | start_state.append(('twitter', 'FAILED', message)) 850 | else: 851 | ec.disable('twitter') 852 | 853 | except AttributeError: 854 | print('slackbot_settings is something wrong') 855 | sys.exit(0) 856 | 857 | postdata = '---CodeScraper Slackbot Started---\n```' 858 | for m in start_state: 859 | postdata += ' : '.join(m) + '\n' 860 | postdata += '```' 861 | master.postAnyData(postdata, channels[0]) 862 | print(postdata) 863 | 864 | if runpastebinflag: 865 | p = Pool(len(jobConfigs) + 2) 866 | else: 867 | p = Pool(len(jobConfigs) + 1) 868 | try: 869 | p.apply_async(runBot) 870 | if runpastebinflag: 871 | p.apply_async(runSearchPastebin) 872 | p.map(job_controller, jobConfigs) 873 | except KeyboardInterrupt: 874 | pass 875 | 876 | if __name__ == "__main__": 877 | main() 878 | -------------------------------------------------------------------------------- /master/search_api.py: -------------------------------------------------------------------------------- 1 | import requests 2 | import lxml.html 3 | import datetime 4 | import time 5 | import json 6 | import os.path 7 | import urllib 8 | import re 9 | import feedparser 10 | from dateutil import parser, tz 11 | #import traceback 12 | from pyquery import PyQuery 13 | 14 | '''def get_request(url, headers, tries, timeout): 15 | try: 16 | if tries < 0: 17 | r = requests.get(url, headers=headers 
timeout=timeout) 18 | return r 19 | else: 20 | return None 21 | except requests.exceptions.ConnectTimeout: 22 | sleep(1.5) 23 | res = requestX(url, headers, tries-1, timeout) 24 | return res''' 25 | 26 | def searchGithub(word, day, level): 27 | searchlevel = { 28 | 1: ['in:name,description', 'created'], 29 | 2: ['in:name,description,readme', 'created'], 30 | 3: ['in:name,description', 'pushed'], 31 | 4: ['in:name,description,readme', 'pushed']} 32 | github_url = 'https://api.github.com/search/repositories?q=' 33 | try: 34 | if word.find(' ') > 0: 35 | word = word.replace(' ', '\" \"') 36 | word = urllib.parse.quote('\"' + word + '\"') 37 | url = github_url + word + '+' + searchlevel[level][0] + '+' + searchlevel[level][1] + ':>' + day + '&s=updated&o=asc' 38 | headers = {"Accept": "application/vnd.github.mercy-preview+json"} 39 | result = requests.get(url, timeout=10, headers=headers) 40 | statuscode = result.status_code 41 | resultdata = result.json() 42 | codes = [] 43 | for a in resultdata['items']: 44 | name = a['full_name'] 45 | if a['size'] > 0: 46 | codes.append(name) 47 | return codes, statuscode 48 | except: 49 | return [], -1 50 | 51 | def searchGithubCode(word, level, api_key): 52 | searchlevel = { 53 | 1: 'in:file', 54 | 2: 'in:file,path', 55 | 3: 'in:file,path', 56 | 4: 'in:file,path'} 57 | github_url = 'https://api.github.com/search/code?q=' 58 | try: 59 | if word.find(' ') > 0: 60 | word = word.replace(' ', '\" \"') 61 | word = urllib.parse.quote('\"' + word + '\"') 62 | url = github_url + word + '+' + searchlevel[level] + '+sort%3Aindexed&access_token=' + api_key 63 | headers = {"Accept": "application/vnd.github.mercy-preview+json"} 64 | result = requests.get(url, timeout=10, headers=headers) 65 | statuscode = result.status_code 66 | resultdata = result.json() 67 | codes = [] 68 | for a in resultdata['items']: 69 | name = a['html_url'] 70 | if level == 3: 71 | if not a['name'].lower().startswith('readme'): 72 | codes.append(name) 73 | else: 74 | 
codes.append(name) 75 | return codes, statuscode 76 | except: 77 | return [], -1 78 | 79 | def searchGist(word, day): 80 | if word.find(' ') > 0: 81 | word = word.replace(' ', '\" \"') 82 | word = urllib.parse.quote('\"' + word + '\"') 83 | url = 'https://gist.github.com/search?utf8=%E2%9C%93&q=' + word + '+created%3A>' + day + '&ref=searchresults' 84 | try: 85 | result = requests.get(url, timeout=10) 86 | statuscode = result.status_code 87 | root = lxml.html.fromstring(result.text) 88 | codes = [] 89 | for a in root.xpath('//div/a[@class="link-overlay"]'): 90 | # name = a.text_content() 91 | link = a.get('href') 92 | codes.append(link) 93 | return codes, statuscode 94 | except: 95 | return [], -1 96 | 97 | def searchGitlab(word): 98 | try: 99 | if word.find(' ') > 0: 100 | word = word.replace(' ', '\" \"') 101 | word = urllib.parse.quote('\"' + word + '\"') 102 | url = 'https://gitlab.com/explore/projects?utf8=%E2%9C%93&name=' + word + '&sort=latest_activity_desc' 103 | result = requests.get(url, timeout=10) 104 | statuscode = result.status_code 105 | root = lxml.html.fromstring(result.text) 106 | codes = [] 107 | for a in root.xpath('//div/a[@class="project"]'): 108 | # name = a.text_content() 109 | link = a.get('href') 110 | codes.append(link) 111 | return codes, statuscode 112 | except: 113 | return [], -1 114 | 115 | def searchGitlabSnippets(words): 116 | try: 117 | url = 'https://gitlab.com/explore/snippets' 118 | result = requests.get(url, timeout=10) 119 | statuscode = result.status_code 120 | snippets = [] 121 | root = lxml.html.fromstring(result.text) 122 | symbols = r'[!\"#$%&\'()*+,\-./:;<=>@\[\]^_{|}~\\]' 123 | re_symbol = re.compile(symbols) 124 | pattlist = {} 125 | wordlist = {} 126 | for w in words: 127 | if re.search(re_symbol, w): 128 | pattlist[w] = re.compile(w) 129 | else: 130 | wordlist[w] = w.split(' ') 131 | for a in root.xpath('//div[@class="title"]/a'): 132 | name = a.text_content() 133 | link = a.get('href') 134 | snippets.append([name, link]) 135 | 
codes = {} 136 | if statuscode == 200: 137 | for l in snippets: 138 | raw_url = 'https://gitlab.com' + l[1] + '/raw' 139 | raw_result = requests.get(raw_url, timeout=10) 140 | if raw_result.status_code == 200: 141 | for w, patt in pattlist.items(): 142 | if re.search(patt, l[0]) or re.search(patt, raw_result.text): 143 | if w in codes.keys(): 144 | codes[w].append(l[1]) 145 | else: 146 | codes[w] = [l[1]] 147 | for w, patt in wordlist.items(): 148 | matched = True 149 | for p in patt: 150 | if l[0].lower().find(p.lower()) < 0 and raw_result.text.lower().find(p.lower()) < 0: 151 | matched = False 152 | if matched: 153 | if w in codes.keys(): 154 | codes[w].append(l[1]) 155 | else: 156 | codes[w] = [l[1]] 157 | else: 158 | statuscode = raw_result.status_code 159 | break 160 | time.sleep(10) 161 | return codes, statuscode # dict, int 162 | except: 163 | return {}, -1 164 | 165 | def getPasteList(limit): 166 | try: 167 | url = 'https://scrape.pastebin.com/api_scraping.php?limit=' + str(limit) 168 | result = requests.get(url, timeout=10) 169 | statuscode = result.status_code 170 | items = {} 171 | if statuscode == 200: 172 | scrape = result.json() 173 | for item in scrape: 174 | items[item["full_url"]] = [item["title"], item["scrape_url"]] 175 | return items, statuscode # dict, int 176 | except: 177 | return {}, -1 178 | 179 | def scrapePastebin(words, items): 180 | codes = {} 181 | statuscode = -1 182 | symbols = r'[!\"#$%&\'()*+,\-./:;<=>@\[\]^_{|}~\\]' 183 | re_symbol = re.compile(symbols) 184 | try: 185 | pattlist = {} 186 | wordlist = {} 187 | for w in words: 188 | if re.search(re_symbol, w): 189 | if w.startswith('.*'): 190 | w = w[2:] 191 | if w.endswith('.*'): 192 | w = w[:-2] 193 | pattlist[w] = re.compile(w) 194 | else: 195 | wordlist[w] = w.split(' ') 196 | for k,v in items.items(): 197 | raw_result = requests.get(v[1], timeout=15) 198 | statuscode = raw_result.status_code 199 | if statuscode == 200: 200 | for w, patt in pattlist.items(): 201 | if 
re.search(patt, v[0]) or re.search(patt, raw_result.text): 202 | if w in codes.keys(): 203 | codes[w].append(k) 204 | else: 205 | codes[w] = [k] 206 | for w, patt in wordlist.items(): 207 | matched = True 208 | for p in patt: 209 | if v[0].lower().find(p.lower()) < 0 and raw_result.text.lower().find(p.lower()) < 0: 210 | matched = False 211 | if matched: 212 | if w in codes.keys(): 213 | codes[w].append(k) 214 | else: 215 | codes[w] = [k] 216 | else: 217 | return {}, statuscode 218 | time.sleep(1.5) 219 | return codes, statuscode # dict, int 220 | except: 221 | return {}, -1 222 | 223 | def googleCustomSearch(word, engine_id, api_key): 224 | try: 225 | if word.find(' ') > 0: 226 | word = word.replace(' ', '\" \"') 227 | word = urllib.parse.quote('\"' + word + '\"') 228 | headers = {"content-type": "application/json"} 229 | url = 'https://www.googleapis.com/customsearch/v1?key=' + api_key + '&rsz=filtered_cse&num=10&hl=en&prettyPrint=false&cx=' + engine_id + '&q=' + word + '&sort=date' 230 | result = requests.get(url, timeout=10, headers=headers) 231 | statuscode = result.status_code 232 | codes = {} 233 | if statuscode == 200: 234 | jsondata = result.json() 235 | if 'items' in jsondata.keys(): 236 | for item in jsondata['items']: 237 | name = item['title'] 238 | sub = item['snippet'] 239 | link = item['link'] 240 | codes[link] = [name, sub] 241 | return codes, statuscode 242 | except: 243 | return {}, -1 244 | 245 | def parseRSS(items): 246 | parseddata = [] 247 | for item in items: 248 | data = { 249 | 'link' : item['link'] 250 | } 251 | if 'title' in item.keys(): 252 | data['title'] = item['title'] 253 | if 'summary' in item.keys(): 254 | data['summary'] = item['summary'] 255 | if 'updated' in item.keys() and item['updated'] != '': 256 | dt = parser.parse(item['updated']) 257 | if dt.tzinfo == None: 258 | data['timestamp'] = dt.strftime('%Y-%m-%d %H:%M:%S') 259 | else: 260 | data['timestamp'] = dt.astimezone(tz.tzutc()).strftime('%Y-%m-%d %H:%M:%S') 261 | elif 
'published' in item.keys() and item['published'] != '': 262 | dt = parser.parse(item['published']) 263 | if dt.tzinfo == None: 264 | data['timestamp'] = dt.strftime('%Y-%m-%d %H:%M:%S') 265 | else: 266 | data['timestamp'] = dt.astimezone(tz.tzutc()).strftime('%Y-%m-%d %H:%M:%S') 267 | else: 268 | data['timestamp'] = None 269 | taglist = [] 270 | if 'tags'in item.keys(): 271 | for tag in item['tags']: 272 | taglist.append(tag['term']) 273 | data['tags'] = taglist 274 | contents = [] 275 | if 'content'in item.keys(): 276 | for c in item['content']: 277 | content = (c['type'], c['value']) 278 | contents.append(content) 279 | data['contents'] = contents 280 | parseddata.append(data) 281 | return parseddata 282 | 283 | def getRSSFeeds(url, lastpost): 284 | try: 285 | headers={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:50.0) Gecko/20100101 Firefox/50.0'} 286 | response = requests.get(url, timeout=10, headers=headers) 287 | updateditems = [] 288 | statuscode = response.status_code 289 | if statuscode == 200: 290 | rss = feedparser.parse(response.text) 291 | result = parseRSS(rss['entries']) 292 | for entry in result: 293 | if entry['link'] == lastpost['link']: 294 | break 295 | else: 296 | if entry['timestamp'] != None and lastpost['timestamp'] != None: 297 | if datetime.datetime.strptime(entry['timestamp'], '%Y-%m-%d %H:%M:%S') < datetime.datetime.strptime(lastpost['timestamp'], '%Y-%m-%d %H:%M:%S'): 298 | break 299 | updateditems.append(entry) 300 | return updateditems, statuscode 301 | except: 302 | return [], -1 303 | 304 | def getTweets(users, word, lastpost): 305 | try: 306 | query = '' 307 | if word.strip() != '': 308 | query += word 309 | if len(users) == 1: 310 | query += ' from:' + users[0] 311 | elif len(users) > 1: 312 | query += ' from:' + ' OR from:'.join(users) 313 | query = urllib.parse.quote_plus(query) 314 | url = 'https://twitter.com/i/search/timeline?f=tweets&q={query}&src=typd'.format(query=query) 315 | headers = { 316 | 
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:50.0) Gecko/20100101 Firefox/50.0', 317 | 'Accept':"application/json, text/javascript, */*; q=0.01", 318 | 'Accept-Language':"de,en-US;q=0.7,en;q=0.3", 319 | 'X-Requested-With':"XMLHttpRequest", 320 | 'Referer':url, 321 | 'Connection':"keep-alive" 322 | } 323 | response = requests.get(url, headers=headers) 324 | statuscode = response.status_code 325 | tweetslist = [] 326 | new_tweets = [] 327 | res = response.json() 328 | if statuscode == 200: 329 | json_response = response.json() 330 | if json_response['items_html'].strip() != '': 331 | scraped_tweets = PyQuery(json_response['items_html']) 332 | scraped_tweets.remove('div.withheld-tweet') 333 | tweets = scraped_tweets('div.js-stream-tweet') 334 | if len(tweets) != 0: 335 | for tweet_html in tweets: 336 | t = {} 337 | tweetPQ = PyQuery(tweet_html) 338 | t['user'] = tweetPQ("span:first.username.u-dir b").text() 339 | txt = re.sub(r"\s+", " ", tweetPQ("p.js-tweet-text").text()) 340 | txt = txt.replace('# ', '#') 341 | txt = txt.replace('@ ', '@') 342 | t['tweet'] = txt 343 | t['id'] = tweetPQ.attr("data-tweet-id") 344 | t['link'] = tweetPQ.attr("data-permalink-path") 345 | t['timestamp'] = int(tweetPQ("small.time span.js-short-timestamp").attr("data-time")) 346 | tweetslist.append(t) 347 | for tw in tweetslist: 348 | if tw['id'] == lastpost['id']: 349 | break 350 | if 'timestamp' in tw.keys() and 'timestamp' in lastpost.keys(): 351 | if tw['timestamp'] < lastpost['timestamp']: 352 | break 353 | new_tweets.append(tw) 354 | return new_tweets, statuscode 355 | except: 356 | return [], -1 357 | -------------------------------------------------------------------------------- /master/settings/.gitignore: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/blue1616/CodeScraper/5e27c51ea645c7fb48fc3caaf290a75c8ee067ec/master/settings/.gitignore 
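The `getRSSFeeds` and `getTweets` functions in `master/search_api.py` above both walk a newest-first list and stop once they reach the entry that was already reported. A minimal sketch of that cut-off, assuming items are dicts with an `id` and an optional integer `timestamp`; the name `items_since` and the sample data are hypothetical and not part of the repository:

```python
def items_since(items, lastpost):
    """Return the leading items that are newer than `lastpost`.

    `items` is assumed to be ordered newest first.
    """
    new_items = []
    for item in items:
        # Stop at the item that was already reported.
        if item['id'] == lastpost.get('id'):
            break
        # Stop at anything older than the last reported item.
        if 'timestamp' in item and 'timestamp' in lastpost:
            if item['timestamp'] < lastpost['timestamp']:
                break
        new_items.append(item)
    return new_items


tweets = [
    {'id': '3', 'timestamp': 300},
    {'id': '2', 'timestamp': 200},
    {'id': '1', 'timestamp': 100},
]
print(items_since(tweets, {'id': '2', 'timestamp': 200}))
```

The timestamp check acts as a fallback for the case where the previously reported item has been deleted and its `id` never shows up in the list again.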
-------------------------------------------------------------------------------- /master/slackbot_settings.py.sample: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # API Token for Slackbot 4 | # https://api.slack.com/bot-users 5 | # CodeScraper uses lins05/slackbot (https://github.com/lins05/slackbot) 6 | API_TOKEN = "XXXX-XXXXXXXXXXXX-XXXXXXXXXXXXXXXXXXXXXXXX" 7 | 8 | default_reply = "What do you mean?" 9 | 10 | # Do not edit 11 | PLUGINS = [ 12 | 'plugins', 13 | ] 14 | 15 | # Slack channels the slackbot participates in 16 | # Create these channels and invite your slackbot to them 17 | channels = [ 18 | "codescraper", 19 | "test_channel"] 20 | 21 | # Set each default channel to one of the channels above 22 | github_default_channel = channels[0] 23 | gist_default_channel = channels[0] 24 | github_code_default_channel = channels[0] 25 | gitlab_default_channel = channels[0] 26 | gitlab_snippet_default_channel = channels[0] 27 | pastebin_default_channel = channels[0] 28 | google_custom_default_channel = channels[0] 29 | rss_feed_default_channel = channels[0] 30 | twitter_default_channel = channels[0] 31 | 32 | # Default Settings of Search Keywords 33 | github_default_settings = { 34 | 'Enable':True, 35 | 'SearchLevel':2, 36 | 'Time_Range':2, 37 | 'Expire_date':180, 38 | 'Channel': github_default_channel 39 | } 40 | 41 | gist_default_settings = { 42 | 'Enable':True, 43 | 'Time_Range':2, 44 | 'Expire_date':180, 45 | 'Channel':gist_default_channel 46 | } 47 | 48 | github_code_default_settings = { 49 | 'Enable':True, 50 | 'SearchLevel':2, 51 | 'Expire_date':180, 52 | 'Channel':github_code_default_channel 53 | } 54 | 55 | gitlab_default_settings = { 56 | 'Enable':True, 57 | 'Expire_date':180, 58 | 'Channel':gitlab_default_channel 59 | } 60 | 61 | gitlab_snippet_default_settings = { 62 | 'Enable':True, 63 | 'Expire_date':180, 64 | 'Channel':gitlab_snippet_default_channel 65 | } 66 | 67 | pastebin_default_settings = { 68 | 
'Enable':True, 69 | 'Expire_date':180, 70 | 'Channel':pastebin_default_channel 71 | } 72 | 73 | google_custom_default_settings = { 74 | 'Enable':True, 75 | 'Expire_date':180, 76 | 'Channel':google_custom_default_channel 77 | } 78 | 79 | # github_access_token is required for searching github_code (not needed for github and gist search) 80 | # Create one at : https://github.com/settings/tokens 81 | github_access_token = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' 82 | 83 | # Google Custom Search requires an API key and a search engine ID. 84 | # Google Custom Search API : https://developers.google.com/custom-search/v1/overview 85 | # Create them here : https://console.developers.google.com/ 86 | google_custom_api_key = 'XXXXXXXXXXXXXXXXXXXXXX-XXXXXXXXXXXXXXXX' 87 | google_custom_search_engine_id = 'XXXXXXXXXXXXXXXXXXXXX:XXXXXXXXXXX' 88 | 89 | # Enable Modules 90 | enable_github_search = True 91 | enable_github_code_search = False 92 | enable_gist_search = True 93 | enable_gitlab_search = True 94 | enable_gitlab_snippet_search = False 95 | # Pastebin Search requires a Pastebin PRO account and asking Pastebin to whitelist your IP 96 | enable_pastebin_search = False 97 | enable_google_custom_search = False 98 | enable_rss_feed = True 99 | enable_twitter = True 100 | 101 | # Interval (Write in Crontab Format) 102 | github_search_interval = "28 */1 * * *" 103 | github_code_search_interval = "48 */1 * * *" 104 | gist_search_interval = "02 */2 * * *" 105 | gitlab_search_interval = "13 */2 * * *" 106 | gitlab_snippet_search_interval = "43 */2 * * *" 107 | #pastebin_search_interval = "*/1 * * * *" 108 | # --*ATTENTION*-- 109 | # The free Google API allows 100 requests per day 110 | # If you set the search interval to every 2 hours, you can register at most 8 search keywords (12 runs * 8 keywords = 96 requests) 111 | google_custom_search_interval = "53 */2 * * *" 112 | rss_feed_interval = "07 */1 * * *" 113 | twitter_interval = "14 */1 * * *" 114 | 
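The quota note next to `google_custom_search_interval` in the settings sample above is a small piece of arithmetic. A minimal sketch of it, assuming one API request per registered keyword per run and an interval that divides 24 hours evenly; the function name `max_keywords` is hypothetical and not part of CodeScraper:

```python
def max_keywords(daily_quota, interval_hours):
    """Largest number of keywords that stays within the daily request quota.

    Assumes each run issues exactly one request per keyword.
    """
    runs_per_day = 24 // interval_hours
    return daily_quota // runs_per_day


# 100 requests per day, one run every 2 hours: 12 runs, so 8 keywords (96 requests).
print(max_keywords(100, 2))
```

With an hourly interval the same quota would cover only 4 keywords, so lengthening the interval is the cheaper way to make room for more of them.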
-------------------------------------------------------------------------------- /master/startbot.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | if [ -z "$DB_HOST" ]; then 4 | echo "DB_HOST is empty" 5 | exit 1 6 | fi 7 | 8 | if [ -z "$DB_PORT" ]; then 9 | echo "DB_PORT is empty" 10 | exit 1 11 | fi 12 | 13 | if [ -z "$DB_NAME" ]; then 14 | DB_NAME=codescraper-database 15 | fi 16 | 17 | python3 run.py --db-host="$DB_HOST" --db-port="$DB_PORT" --db-name="$DB_NAME" 18 | -------------------------------------------------------------------------------- /requirements: -------------------------------------------------------------------------------- 1 | slackbot==0.5.3 2 | lxml==4.2.4 3 | crontab==0.22.2 4 | feedparser==5.2.1 5 | python-dateutil==2.7.5 6 | pymongo==3.7.2 7 | pyquery==1.4.0 8 | --------------------------------------------------------------------------------