├── .gitignore
├── Readme.md
├── cli
├── cli.py
├── data.json
├── history
    └── 20200910094654_post_hash.json
└── requirements.txt


/.gitignore:
--------------------------------------------------------------------------------
.idea
.vscode

--------------------------------------------------------------------------------
/Readme.md:
--------------------------------------------------------------------------------
1. [MalwareIoCHash](#malwareiochash)
2. [Repository contents](#repository-contents)
3. [Usage](#usage)
    1. [Supported commands](#supported-commands)
    2. [Print data file information](#print-data-file-information)
    3. [Search by hash](#search-by-hash)
    4. [Search article titles (content)](#search-article-titles-content)
    5. [Search article links](#search-article-links)
    6. [Search hashes from a file](#search-hashes-from-a-file)
    7. [Output search results: export, print, and field selection](#output-search-results-export-print-and-field-selection)
        1. [Export to a file](#export-to-a-file)
        2. [Select the fields to output](#select-the-fields-to-output)
4. [TODO](#todo)

# MalwareIoCHash
- Search articles containing IoC hashes, crawled from multiple sites.
- Articles currently included: `11757`
- Hashes currently included: `37871`

# Repository contents
- `data.json`: data produced by crawling articles that contain hashes from multiple websites, then validating and enriching the IoCs online (VT/HybridAnalysis)
    - Fields of each article:
        - `time`: publication date of the article (may be inaccurate)
        - `title`: article title
        - `link`: article link
        - `pending`: hashes extracted from the article content for which no matching entry was found on VT/HybridAnalysis or similar sites, so the hash has no complete `(md5, sha1, sha256)` triple
        - `confirmed`: hashes extracted from the article for which a matching entry was found on VT/HybridAnalysis or similar sites, so the `(md5, sha1, sha256)` triple was completed
        - `topic_list`: list of predefined `topic`s found in the article content.
            - `topic` sources: [MISP](https://www.misp-project.org/), [Malpedia](https://malpedia.caad.fkie.fraunhofer.de/)
            - Each `topic` is matched against the article body by substring containment
- `cli.py`: Python script that filters and exports the contents of `data.json`

# Usage
- Python 3 only
- `pip3 install -r requirements.txt`: install the dependencies
- `chmod +x cli`: on `Linux`, `cli` can then be run directly
- On `Windows`, use `python3 cli.py`

## Supported commands
- `./cli --help`: root command
```
Commands:
  info    Print information.
  search  -> Search.
```
- `./cli search --help`: search command
```
-> Search.
Commands:
  file  Read hashes from the given file and search for them.
  hash  Search by hash.
  url   Search links.
  word  Search article keywords (title only by default; pass --content to also search content).
```

## Print data file information
- `./cli info`
```
topic count: 3684
['gobrut', 'smominru', 'ismo', 'lsmo', 'botnet', 'chachaddos', 'shade', 'zebrocy', 'zekapab', 'loudminer', 'buhtrap', 'ratopak', 'fakeapp', 'winnti', 'agent.alqhi', 'etso', 'miner', 'plead', 'apt28', 'sofacy', 'sednit', 'fancybear', 'pawnstorm', 'tsarteam', 'jkeyskw', 'carberplike', 'downrage', 'jhuhugit', 'komplex', 'seduploader', 'gamefish', 'sofacycarberp', 'tg-4127', 'grey-cloud', 'group-4127', 'strontium', 'threatgroup-4127', 'swallowtail', 'irontwilight', 'tag_0700', 'group74', 'oceanlotus', 'emotet', 'gobotkr', 'android.filecoder', 'amavaldo', 'balkan', 'balkanrat', 'balkandoor', 'newsource', 'armageddon', 'systemcrypter', 'loocipher', 'freeme', 'boooamcrypt', 'eris', 'expboot', 'doppelpaymer', 'skystars', 'zerofucks', 'tflower', 'xorist', 'syrk', 'arsium', 'yobacrypt', 'plague', 'mykings', 'xrat', 'gootkit', 'talalpek', 'xswkit', 'trickbot', 'trickloader', 'trickster', 'thetrick', 'remcos', 'miraixminer', 'viagra', 'asruex', 'necro', 'agenttesla', 'nymaim', 'nymain', 'pinkslipbot', 'qakbot', 'qbot', 'ramnit', 'nimnul', 'zegost', 'lokibot', 'expiro', 'cerber', 'crbrencryptor', 'tofsee', 'gheg', 'mondera', 'kovter', 'xtremerat', 'extrat', 'gh0strat'] ...
count: 11757
pending-hash count: 419
confirmed-sha256 count: 37452
```

## Search by hash
- Only complete `md5/sha1/sha256` hashes can currently be searched
- `./cli search hash -h ec2ed8e85eb96c65c64f666a63a5e9e6`
```
Hash to search: ec2ed8e85eb96c65c64f666a63a5e9e6
Articles to search: 11757
Results: 1
------------------------------
2014-07-22
The Bank INTERAC was accepted. - virus
https://techhelplist.com/spam-list/605-the-bank-interac-was-accepted-virus
confirmed:
ec2ed8e85eb96c65c64f666a63a5e9e6
90a4e2156839d855d29952a4ebf1d54f3c9b1950
c83c891dbdd02f7f45bde586a1e802276819267904190e2f053a6f50da3513ad
```

## Search article titles (content)
- `./cli search word -w apt28`: without `--content`, only titles are matched; results: `19`
```
Word to search: apt28
Articles to search: 11757
Results: 19
------------------------------
2019-11-07
Here We GO: Crimeware & APT Journey From “RobbinHood” to APT28
https://www.sentinelone.com/blog/here-we-go-crimeware-apt-journey-from-robbinhood-to-apt28/
confirmed:
602d2901d55c2720f955503456ac2f68
80e61ba572b2c955c50d8359eb68e6c13fc16ae1
93680d34d798a22c618c96dec724517829ec3aad71215213a2dcb1eb190ff9fa
// ...
------------------------------
2019-08-10
APT28分析之X-agent样本分析
https://xz.aliyun.com/t/5898
confirmed:
6fc8602c8b3a18765bb6d2307d8a4ae1
57f455bfc074c881076f506aa8e3090f75e2e0ac
dfba21b4b7e1e6ebd162010c880c82c9b04d797893311c19faab97431bf25927
// ...
// ........
```
- `./cli search word -w apt28 --content`: with `--content`; results: `55`
```
Word to search: apt28
Articles to search: 11757
Results: 55
------------------------------
2020-03-26
The Dukes of Moscow
https://www.carbonblack.com/2020/03/26/the-dukes-of-moscow/
confirmed:
28f96a57fa5ff663926e9bad51a1d0cb
a75995f94854dea8799650a2f4a97980b71199d2
19972cc87c7653aff9620461ce459b996b1f9b030d7c8031df0c8265b73f670d
// ....
```
- `./cli search word -w keylogger`
```
Word to search: keylogger
Articles to search: 11757
Results: 46
------------------------------
2019-10-07
Dissecting Ardamax Keylogger
https://medium.com/p/f33f922d2576
confirmed:
4a57ce1565f05454e9b5a4a80d048865
24362704e540b58aafbc6d736bb99b5b1b28e784
907587a797ef5ee759534b95e6f886cdea5989129d65de5b684b6d3b4aa645dc
// ...
```

## Search article links
- `./cli search url -u alienvault.com`
```
Link to search: alienvault.com
Articles to search: 11757
Results: 43
------------------------------
2019-04-02
Xwo - A Python-based bot scanner
https://www.alienvault.com/blogs/labs-research/xwo-a-python-based-bot-scanner
confirmed:
fd67a98599b08832cf8570a641712301
1faf363809f266bb2d90fb8d3fc43c18253d0048
6408c69e802de04e949ed3047dc1174ef20125603ce7ba5c093e820cb77b1ae1
// ...
```

## Search hashes from a file
- `cat ~/tmp/xx.txt`: the file contains 2 hashes
```
fd67a98599b08832cf8570a641712301
4a57ce1565f05454e9b5a4a80d048865
```
- `./cli search file -f ~/tmp/xx.txt`
```
File path: /home/xxx/tmp/xx.txt
Lines in original file: 2
Lines remaining after filtering: 2
Valid hashes remaining after filtering: 2
Articles to search: 11757
Results: 2
------------------------------
2019-10-07
Dissecting Ardamax Keylogger
https://medium.com/p/f33f922d2576
confirmed:
4a57ce1565f05454e9b5a4a80d048865
24362704e540b58aafbc6d736bb99b5b1b28e784
907587a797ef5ee759534b95e6f886cdea5989129d65de5b684b6d3b4aa645dc
// ...
------------------------------
2019-04-02
Xwo - A Python-based bot scanner
https://www.alienvault.com/blogs/labs-research/xwo-a-python-based-bot-scanner
confirmed:
fd67a98599b08832cf8570a641712301
1faf363809f266bb2d90fb8d3fc43c18253d0048
6408c69e802de04e949ed3047dc1174ef20125603ce7ba5c093e820cb77b1ae1
// ...
```

## Output search results: export, print, and field selection
- **Applies to all of the search commands above**
- Supported options:
```
--out TEXT  Export the search results to the given directory (which must exist); if omitted, print to the console
--col TEXT  Columns to export/print, separated by commas (,)
            If omitted, export/print everything (choices: "time/title/link/hash/sha256")
--json      Whether to export/print in json format
```

### Export to a file
- `./cli search word -w apt28 --out ~/tmp`
```
Word to search: apt28
Articles to search: 11757
Results: 19
Results exported to file: /home/xxx/tmp/20200910115449_hash_output.txt
```

### Select the fields to output
- `./cli search word -w apt28 --col title,time`
```
Word to search: apt28
Articles to search: 11757
Results: 19
------------------------------
2019-11-07
Here We GO: Crimeware & APT Journey From “RobbinHood” to APT28
------------------------------
2019-08-10
APT28分析之X-agent样本分析
// ....
```
- `./cli search word -w apt28 --col hash`: output hashes only
```
Word to search: apt28
Articles to search: 11757
Results: 19
65de07fc6b821d9fd3497cfa64212df2d39935dd515a86eda80d08086b183a3f
7cd1b5f6774b25727e1d80b29979dadd1d427d3a
// ...
```
- `./cli search word -w apt28 --col sha256`: for `confirmed` hashes, output only the `sha256`
```
Word to search: apt28
Articles to search: 11757
Results: 19
e7dd9678b0a1c4881e80230ac716b21a41757648d71c538417755521438576f6
6ad3eb8b5622145a70bec67b3d14868a1c13864864afd651fe70689c95b1399a
fcf03bf5ef4babce577dd13483391344e957fd2c855624c9f0573880b8cba62e
// ....
```
- `./cli search word -w apt28 --col title,sha256`: combined
```
Word to search: apt28
Articles to search: 11757
Results: 19
------------------------------
Here We GO: Crimeware & APT Journey From “RobbinHood” to APT28
hash:
93680d34d798a22c618c96dec724517829ec3aad71215213a2dcb1eb190ff9fa
3bc78141ff3f742c5e942993adfbef39c2127f9682a303b5e786ed7f9a8d184b
------------------------------
APT28分析之X-agent样本分析
hash:
dfba21b4b7e1e6ebd162010c880c82c9b04d797893311c19faab97431bf25927
5f6b2a0d1d966fc4f1ed292b46240767f4acb06c13512b0061b434ae2a692fa1
// ....
```

# TODO
- 1. Allow searching by alias


--------------------------------------------------------------------------------
/cli:
--------------------------------------------------------------------------------
python3 cli.py "$@"
--------------------------------------------------------------------------------
/cli.py:
--------------------------------------------------------------------------------
#!/usr/bin/python3
# -*- coding: utf-8 -*-

"""Read, parse, and filter the data file."""

import json
import os
import re
from datetime import datetime

import click
from tqdm import tqdm

DEFAULT_DATA_PATH = 'data.json'


def now_datetime_str():
    """Return the current time as a string, e.g. 20190101010101."""
    return datetime.now().strftime('%Y%m%d%H%M%S')


def check_is_valid_hash(hash_):
    """Check whether hash_ is a valid hash."""
    hash_pattern = r'((?
Search."""
    pass


@search.command(name='hash')
@click.option('-h', '--hash', 'hash_', required=True, help='Full hash to search for')
@add_options(_out_options)
def search_hash(hash_, out, col, is_json):
    """Search by hash."""
    if not check_is_valid_hash(hash_):
        raise Exception(f'Invalid hash: {hash_}')

    hash_ = hash_.lower()
    tqdm.write(f'Hash to search: {hash_}')

    def cbk_is_hash_match(info):
        """Return True if the article lists hash_ among its pending or confirmed hashes."""
        if info['pending'] and hash_ in info['pending']:
            return True
        if info['confirmed'] and any(hash_ in [ic['md5'], ic['sha1'], ic['sha256']] for ic in info['confirmed']):
            return True

        return False

    cbk_search(cbk_is_hash_match, out, col, is_json)


@search.command(name='word')
@click.option('-w', '--word', required=True, help='Word to search for (title only by default)')
@click.option('--content', 'include_content', is_flag=True, default=False, show_default=True, help='Also search content')
@add_options(_out_options)
def search_word(word, include_content, out, col, is_json):
    """Search article keywords (title only by default; pass --content to also search content)."""
    word = word.lower()
    tqdm.write(f'Word to search: {word}')

    def cbk_is_word_match(info):
        """Match word against the title and, with --content, against the article's topic_list."""
        # Note: the --content branch is an exact membership test on topic_list, not a substring search.
        return word in info['title'].lower() or (include_content and info['topic_list'] and word in info['topic_list'])

    cbk_search(cbk_is_word_match, out, col, is_json)


@search.command(name='file')
@click.option('-f', '--file', 'file_', required=True, help='Text file of valid hashes (one hash per line)')
@add_options(_out_options)
def search_file(file_, out, col, is_json):
    """Read hashes from the given file and search for them."""
    if not os.path.exists(file_):
        raise Exception(f'File does not exist: {file_}')
    file_ = os.path.abspath(file_)
    tqdm.write(f'File path: {file_}')

    # Read the file
    lines = []
    try:
        with open(file_, encoding='utf-8') as f:
            lines = f.readlines()
        tqdm.write(f'Lines in original file: {len(lines)}')

        # Keep only non-empty lines that are valid hashes
        stripped = (line.strip() for line in lines)
        lines = [s for s in stripped if s and check_is_valid_hash(s)]
        if not lines:
            raise Exception('No lines left after filtering!')
        tqdm.write(f'Lines remaining after filtering: {len(lines)}')

    except Exception as e:
        raise Exception(f'Failed to read the given text file: {file_}, reason: {e.args}') from e

    # Remove duplicates
    assert lines
    h_set = {line.lower() for line in lines}
    assert h_set
    tqdm.write(f'Valid hashes remaining after filtering: {len(h_set)}')

    # Match
    def cbk_is_hash_match(info):
        """Return True if any hash in h_set is among the article's pending or confirmed hashes."""
        if info['pending'] and h_set.intersection(info['pending']):
            return True
        if info['confirmed'] and any(h_set.intersection([ic['md5'], ic['sha1'], ic['sha256']]) for ic in info['confirmed']):
            return True

        return False

    cbk_search(cbk_is_hash_match, out, col, is_json)


@search.command(name='url')
@click.option('-u', '--url', required=True, help='URL fragment to search for')
@add_options(_out_options)
def search_url(url, out, col, is_json):
    """Search article links."""
    url = url.lower()
    tqdm.write(f'Link to search: {url}')

    def cbk_is_url_match(info):
        """Return True if the URL fragment occurs in the article link (case-insensitive)."""
        return url in info['link'].lower()

    cbk_search(cbk_is_url_match, out, col, is_json)


# @cli.command(name='diff')
# def diff_json():
#     """Compare with the given json."""
#     # TODO
#     pass


cli.add_command(search)

# main


if __name__ == '__main__':
    cli()

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
click
tqdm
--------------------------------------------------------------------------------
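
The hash lookup described in the README (match a complete `md5`/`sha1`/`sha256` against each article's `pending` list and `confirmed` triples) can be sketched without the CLI. This is a minimal sketch, not the code in `cli.py`: the names `HASH_RE`, `is_full_hash`, `find_articles`, and `sample` are hypothetical, and the validity check assumes a hash is a 32-, 40-, or 64-character hex string, which may differ from what `check_is_valid_hash` actually accepts. The sample record reuses the values from the README's hash search output; its empty `pending` and `topic_list` values are placeholders.

```python
import re

# Assumption: a complete hash is a hex digest of md5 (32), sha1 (40), or sha256 (64) characters.
HASH_RE = re.compile(r'[0-9a-f]{32}|[0-9a-f]{40}|[0-9a-f]{64}')


def is_full_hash(value):
    """Return True if value looks like a complete md5/sha1/sha256 hex digest."""
    return HASH_RE.fullmatch(value.strip().lower()) is not None


def find_articles(articles, hashes):
    """Return the articles whose pending or confirmed hashes intersect the given hashes."""
    wanted = {h.strip().lower() for h in hashes if is_full_hash(h)}
    matches = []
    for info in articles:
        # Collect every hash attached to this article, pending or confirmed.
        found = set(info.get('pending') or [])
        for triple in info.get('confirmed') or []:
            found.update((triple['md5'], triple['sha1'], triple['sha256']))
        if wanted & found:
            matches.append(info)
    return matches


# Hypothetical in-memory record shaped like the fields listed in the README.
sample = [
    {
        'time': '2014-07-22',
        'title': 'The Bank INTERAC was accepted. - virus',
        'link': 'https://techhelplist.com/spam-list/605-the-bank-interac-was-accepted-virus',
        'pending': [],
        'confirmed': [{
            'md5': 'ec2ed8e85eb96c65c64f666a63a5e9e6',
            'sha1': '90a4e2156839d855d29952a4ebf1d54f3c9b1950',
            'sha256': 'c83c891dbdd02f7f45bde586a1e802276819267904190e2f053a6f50da3513ad',
        }],
        'topic_list': [],
    },
]

# Upper-case input is normalized before matching.
hits = find_articles(sample, ['EC2ED8E85EB96C65C64F666A63A5E9E6'])
print([article['title'] for article in hits])
```

Building one set of wanted hashes and intersecting it with each article's hashes handles a single hash and a file of hashes the same way, which is the approach `search_file` takes with `h_set`.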