├── .gitignore ├── LICENSE ├── README.md ├── ioc-parser.py ├── output.py ├── patterns.ini ├── whitelist.py └── whitelists ├── whitelist_CVE.ini ├── whitelist_Email.ini ├── whitelist_Filename.ini ├── whitelist_Filepath.ini ├── whitelist_Host.ini ├── whitelist_IP.ini ├── whitelist_MD5.ini ├── whitelist_Registry.ini ├── whitelist_SHA1.ini ├── whitelist_SHA256.ini └── whitelist_URL.ini /.gitignore: -------------------------------------------------------------------------------- 1 | venv 2 | *.pyc -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2015 armbues 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | 23 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # ioc-parser 2 | IOC Parser is a tool to extract indicators of compromise from security reports in PDF format. A good collection of APT related reports with many IOCs can be found here: [APTNotes](https://github.com/kbandla/APTnotes). 3 | 4 | ## Usage 5 | **ioc-parser.py [-h] [-p INI] [-i FORMAT] [-o FORMAT] [-d] [-l LIB] FILE** 6 | * *FILE* File/directory path to report(s) 7 | * *-p INI* Pattern file 8 | * *-i FORMAT* Input format (pdf/txt/html) 9 | * *-o FORMAT* Output format (csv/json/yara/autofocus) 10 | * *-d* Deduplicate matches 11 | * *-l LIB* Parsing library 12 | 13 | ## Requirements 14 | One of the following PDF parsing libraries: 15 | * [PyPDF2](https://github.com/mstamy2/PyPDF2) - *pip install pypdf2* 16 | * [pdfminer](https://github.com/euske/pdfminer) - *pip install pdfminer* 17 | 18 | For HTML parsing support: 19 | * [BeautifulSoup](http://www.crummy.com/software/BeautifulSoup/) - *pip install beautifulsoup4* 20 | 21 | For HTTP(S) support: 22 | * [requests](http://docs.python-requests.org/en/latest/) - *pip install requests* -------------------------------------------------------------------------------- /ioc-parser.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | ################################################################################################### 4 | # 5 | # Copyright (c) 2015, Armin Buescher (armin.buescher@googlemail.com) 6 | # 7 | # Permission is hereby granted, free of charge, to any person obtaining a copy 8 | # of this software and associated documentation files (the "Software"), to deal 9 | # in the Software without restriction, including without limitation the rights 10 | # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 11 | # copies of the 
Software, and to permit persons to whom the Software is 12 | # furnished to do so, subject to the following conditions: 13 | # 14 | # The above copyright notice and this permission notice shall be included in all 15 | # copies or substantial portions of the Software. 16 | # 17 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 18 | # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 19 | # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 20 | # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 21 | # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 22 | # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 23 | # SOFTWARE. 24 | # 25 | ################################################################################################### 26 | # 27 | # File: ioc-parser.py 28 | # Description: IOC Parser is a tool to extract indicators of compromise from security reports 29 | # in PDF format. 
30 | # Usage: ioc-parser.py [-h] [-p INI] [-i FORMAT] [-o FORMAT] [-d] [-l LIB] FILE 31 | # Req.: PyPDF2 (https://github.com/mstamy2/PyPDF2) 32 | # Author: Armin Buescher (@armbues) 33 | # Contributors: Angelo Dell'Aera (@angelodellaera) 34 | # Thanks to: Jose Ramon Palanco 35 | # Koen Van Impe (@cudeso) 36 | # 37 | ################################################################################################### 38 | # 39 | # 05/18/15 - Palo Alto Networks AutoFocus output format added by Christopher Clark 40 | # cclark@paloaltonetworks.com - https://github.com/Xen0ph0n/ 41 | # 42 | ################################################################################################### 43 | import os 44 | import sys 45 | import fnmatch 46 | import argparse 47 | import re 48 | from StringIO import StringIO 49 | try: 50 | import configparser as ConfigParser 51 | except ImportError: 52 | import ConfigParser 53 | 54 | # Import optional third-party libraries 55 | IMPORTS = [] 56 | try: 57 | from PyPDF2 import PdfFileReader 58 | IMPORTS.append('pypdf2') 59 | except ImportError: 60 | pass 61 | try: 62 | from pdfminer.pdfpage import PDFPage 63 | from pdfminer.pdfinterp import PDFResourceManager 64 | from pdfminer.converter import TextConverter 65 | from pdfminer.pdfinterp import PDFPageInterpreter 66 | from pdfminer.layout import LAParams 67 | IMPORTS.append('pdfminer') 68 | except ImportError: 69 | pass 70 | try: 71 | from bs4 import BeautifulSoup 72 | IMPORTS.append('beautifulsoup') 73 | except ImportError: 74 | pass 75 | try: 76 | import requests 77 | IMPORTS.append('requests') 78 | except ImportError: 79 | pass 80 | 81 | # Import additional project source files 82 | import output 83 | from whitelist import WhiteList 84 | 85 | class IOC_Parser(object): 86 | patterns = {} 87 | 88 | def __init__(self, patterns_ini, input_format = 'pdf', output_format='csv', dedup=False, library='pypdf2'): 89 | basedir = os.path.dirname(os.path.abspath(__file__)) 90 | self.load_patterns(patterns_ini) 91 | self.whitelist
= WhiteList(basedir) 92 | self.handler = output.getHandler(output_format) 93 | self.dedup = dedup 94 | 95 | self.ext_filter = "*." + input_format 96 | parser_format = "parse_" + input_format 97 | try: 98 | self.parser_func = getattr(self, parser_format) 99 | except AttributeError: 100 | e = 'Selected parser format is not supported: %s' % (input_format) 101 | raise NotImplementedError(e) 102 | 103 | self.library = library 104 | if input_format == 'pdf': 105 | if library not in IMPORTS: 106 | e = 'Selected PDF parser library not found: %s' % (library) 107 | raise ImportError(e) 108 | elif input_format == 'html': 109 | if 'beautifulsoup' not in IMPORTS: 110 | e = 'HTML parser library not found: BeautifulSoup' 111 | raise ImportError(e) 112 | 113 | def load_patterns(self, fpath): 114 | config = ConfigParser.ConfigParser() 115 | with open(fpath) as f: 116 | config.readfp(f) 117 | 118 | for ind_type in config.sections(): 119 | try: 120 | ind_pattern = config.get(ind_type, 'pattern') 121 | except: 122 | continue 123 | 124 | if ind_pattern: 125 | ind_regex = re.compile(ind_pattern) 126 | self.patterns[ind_type] = ind_regex 127 | 128 | def is_whitelisted(self, ind_match, ind_type): 129 | for w in self.whitelist[ind_type]: 130 | if w.findall(ind_match): 131 | return True 132 | 133 | return False 134 | 135 | def parse_page(self, fpath, data, page_num): 136 | for ind_type, ind_regex in self.patterns.items(): 137 | matches = ind_regex.findall(data) 138 | 139 | for ind_match in matches: 140 | if isinstance(ind_match, tuple): 141 | ind_match = ind_match[0] 142 | 143 | if self.is_whitelisted(ind_match, ind_type): 144 | continue 145 | 146 | if self.dedup: 147 | if (ind_type, ind_match) in self.dedup_store: 148 | continue 149 | 150 | self.dedup_store.add((ind_type, ind_match)) 151 | 152 | self.handler.print_match(fpath, page_num, ind_type, ind_match) 153 | 154 | def parse_pdf_pypdf2(self, f, fpath): 155 | try: 156 | pdf = PdfFileReader(f, strict = False) 157 | 158 | if self.dedup: 
159 | self.dedup_store = set() 160 | 161 | self.handler.print_header(fpath) 162 | page_num = 0 163 | for page in pdf.pages: 164 | page_num += 1 165 | 166 | data = page.extractText() 167 | 168 | self.parse_page(fpath, data, page_num) 169 | self.handler.print_footer(fpath) 170 | except (KeyboardInterrupt, SystemExit): 171 | raise 172 | except Exception as e: 173 | self.handler.print_error(fpath, e) 174 | 175 | def parse_pdf_pdfminer(self, f, fpath): 176 | try: 177 | laparams = LAParams() 178 | laparams.all_texts = True 179 | rsrcmgr = PDFResourceManager() 180 | pagenos = set() 181 | 182 | if self.dedup: 183 | self.dedup_store = set() 184 | 185 | self.handler.print_header(fpath) 186 | page_num = 0 187 | for page in PDFPage.get_pages(f, pagenos, check_extractable=True): 188 | page_num += 1 189 | 190 | retstr = StringIO() 191 | device = TextConverter(rsrcmgr, retstr, laparams=laparams) 192 | interpreter = PDFPageInterpreter(rsrcmgr, device) 193 | interpreter.process_page(page) 194 | data = retstr.getvalue() 195 | retstr.close() 196 | 197 | self.parse_page(fpath, data, page_num) 198 | self.handler.print_footer(fpath) 199 | except (KeyboardInterrupt, SystemExit): 200 | raise 201 | except Exception as e: 202 | self.handler.print_error(fpath, e) 203 | 204 | def parse_pdf(self, f, fpath): 205 | parser_format = "parse_pdf_" + self.library 206 | try: 207 | self.parser_func = getattr(self, parser_format) 208 | except AttributeError: 209 | e = 'Selected PDF parser library is not supported: %s' % (self.library) 210 | raise NotImplementedError(e) 211 | 212 | self.parser_func(f, fpath) 213 | 214 | def parse_txt(self, f, fpath): 215 | try: 216 | if self.dedup: 217 | self.dedup_store = set() 218 | 219 | data = f.read() 220 | self.handler.print_header(fpath) 221 | self.parse_page(fpath, data, 1) 222 | self.handler.print_footer(fpath) 223 | except (KeyboardInterrupt, SystemExit): 224 | raise 225 | except Exception as e: 226 | self.handler.print_error(fpath, e) 227 | 228 | def 
parse_html(self, f, fpath): 229 | try: 230 | if self.dedup: 231 | self.dedup_store = set() 232 | 233 | data = f.read() 234 | soup = BeautifulSoup(data, 'html.parser') 235 | html = soup.findAll(text=True) 236 | 237 | text = u'' 238 | for elem in html: 239 | if elem.parent.name in ['style', 'script', '[document]', 'head', 'title']: 240 | continue 241 | elif re.match('<!--.*-->', unicode(elem)): # skip HTML comment nodes 242 | continue 243 | else: 244 | text += unicode(elem) 245 | 246 | self.handler.print_header(fpath) 247 | self.parse_page(fpath, text, 1) 248 | self.handler.print_footer(fpath) 249 | except (KeyboardInterrupt, SystemExit): 250 | raise 251 | except Exception as e: 252 | self.handler.print_error(fpath, e) 253 | 254 | def parse(self, path): 255 | try: 256 | if path.startswith('http://') or path.startswith('https://'): 257 | if 'requests' not in IMPORTS: 258 | e = 'HTTP library not found: requests' 259 | raise ImportError(e) 260 | headers = { 'User-Agent': 'Mozilla/5.0 Gecko Firefox' } 261 | r = requests.get(path, headers=headers) 262 | r.raise_for_status() 263 | f = StringIO(r.content) 264 | self.parser_func(f, path) 265 | return 266 | elif os.path.isfile(path): 267 | with open(path, 'rb') as f: 268 | self.parser_func(f, path) 269 | return 270 | elif os.path.isdir(path): 271 | for walk_root, walk_dirs, walk_files in os.walk(path): 272 | for walk_file in fnmatch.filter(walk_files, self.ext_filter): 273 | fpath = os.path.join(walk_root, walk_file) 274 | with open(fpath, 'rb') as f: 275 | self.parser_func(f, fpath) 276 | return 277 | 278 | e = 'File path is not a file, directory or URL: %s' % (path) 279 | raise IOError(e) 280 | except (KeyboardInterrupt, SystemExit): 281 | raise 282 | except Exception as e: 283 | self.handler.print_error(path, e) 284 | 285 | if __name__ == "__main__": 286 | argparser = argparse.ArgumentParser() 287 | argparser.add_argument('PATH', action='store', help='File/directory/URL to report(s)') 288 | argparser.add_argument('-p', dest='INI',
default=os.path.join(os.path.dirname(os.path.abspath(__file__)), 'patterns.ini'), help='Pattern file') 289 | argparser.add_argument('-i', dest='INPUT_FORMAT', default='pdf', help='Input format (pdf/txt/html)') 290 | argparser.add_argument('-o', dest='OUTPUT_FORMAT', default='csv', help='Output format (csv/json/yara/autofocus)') 291 | argparser.add_argument('-d', dest='DEDUP', action='store_true', default=False, help='Deduplicate matches') 292 | argparser.add_argument('-l', dest='LIB', default='pdfminer', help='PDF parsing library (pypdf2/pdfminer)') 293 | 294 | args = argparser.parse_args() 295 | 296 | parser = IOC_Parser(args.INI, args.INPUT_FORMAT, args.OUTPUT_FORMAT, args.DEDUP, args.LIB) 297 | parser.parse(args.PATH) 298 | -------------------------------------------------------------------------------- /output.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import csv 4 | import json 5 | 6 | OUTPUT_FORMATS = ('csv', 'json', 'yara', 'autofocus') 7 | 8 | def getHandler(output_format): 9 | output_format = output_format.lower() 10 | if output_format not in OUTPUT_FORMATS: 11 | print("[WARNING] Invalid output format specified...
using CSV") 12 | output_format = 'csv' 13 | 14 | handler_format = "OutputHandler_" + output_format 15 | handler_class = getattr(sys.modules[__name__], handler_format) 16 | 17 | return handler_class() 18 | 19 | class OutputHandler(object): 20 | def print_match(self, fpath, page, name, match, last = False): 21 | pass 22 | 23 | def print_header(self, fpath): 24 | pass 25 | 26 | def print_footer(self, fpath): 27 | pass 28 | 29 | def print_error(self, fpath, exception): 30 | print("[ERROR] %s" % (exception)) 31 | 32 | class OutputHandler_csv(OutputHandler): 33 | def __init__(self): 34 | self.csv_writer = csv.writer(sys.stdout, delimiter = '\t') 35 | 36 | def print_match(self, fpath, page, name, match): 37 | self.csv_writer.writerow((fpath, page, name, match)) 38 | 39 | def print_error(self, fpath, exception): 40 | self.csv_writer.writerow((fpath, '0', 'error', exception)) 41 | 42 | class OutputHandler_json(OutputHandler): 43 | def print_match(self, fpath, page, name, match): 44 | data = { 45 | 'path' : fpath, 46 | 'file' : os.path.basename(fpath), 47 | 'page' : page, 48 | 'type' : name, 49 | 'match': match 50 | } 51 | 52 | print(json.dumps(data)) 53 | 54 | def print_error(self, fpath, exception): 55 | data = { 56 | 'path' : fpath, 57 | 'file' : os.path.basename(fpath), 58 | 'type' : 'error', 59 | 'exception' : exception 60 | } 61 | 62 | print(json.dumps(data)) 63 | 64 | class OutputHandler_yara(OutputHandler): 65 | def __init__(self): 66 | self.rule_enc = ''.join(chr(c) if chr(c).isupper() or chr(c).islower() or chr(c).isdigit() else '_' for c in range(256)) 67 | 68 | def print_match(self, fpath, page, name, match): 69 | if name in self.cnt: 70 | self.cnt[name] += 1 71 | else: 72 | self.cnt[name] = 1 73 | 74 | string_id = "$%s%d" % (name, self.cnt[name]) 75 | self.sids.append(string_id) 76 | string_value = match.replace('\\', '\\\\') 77 | print("\t\t%s = \"%s\"" % (string_id, string_value)) 78 | 79 | def print_header(self, fpath): 80 | rule_name = 
os.path.splitext(os.path.basename(fpath))[0].translate(self.rule_enc) 81 | 82 | print("rule %s" % (rule_name)) 83 | print("{") 84 | print("\tstrings:") 85 | 86 | self.cnt = {} 87 | self.sids = [] 88 | 89 | def print_footer(self, fpath): 90 | cond = ' or '.join(self.sids) 91 | 92 | print("\tcondition:") 93 | print("\t\t" + cond) 94 | print("}") 95 | 96 | class OutputHandler_autofocus(OutputHandler): 97 | def __init__(self): 98 | self.rule_enc = ''.join(chr(c) if chr(c).isupper() or chr(c).islower() or chr(c).isdigit() else '_' for c in range(256)) 99 | 100 | def print_match(self, fpath, page, name, match): 101 | string_value = match.replace('hxxp', 'http').replace('\\', '\\\\') 102 | 103 | if name == "MD5": 104 | auto_focus_query = '{"field":"sample.md5","operator":"is","value":\"%s\"},' % (string_value) 105 | elif name == "SHA1": 106 | auto_focus_query = '{"field":"sample.sha1","operator":"is","value":\"%s\"},' % (string_value) 107 | elif name == "SHA256": 108 | auto_focus_query = '{"field":"sample.sha256","operator":"is","value":\"%s\"},' % (string_value) 109 | elif name == "URL": 110 | auto_focus_query = '{"field":"sample.tasks.connection","operator":"contains","value":\"%s\"},' % (string_value) 111 | elif name == "Host": 112 | auto_focus_query = '{"field":"sample.tasks.dns","operator":"contains","value":\"%s\"},' % (string_value) 113 | elif name == "Registry": 114 | #auto_focus_query = '{"field":"sample.tasks.registry","operator":"is","value":\"%s\"},' % (string_value) 115 | return 116 | elif name == "Filepath": 117 | #auto_focus_query = '{"field":"sample.tasks.file","operator":"is","value":\"%s\"},' % (string_value) 118 | return 119 | elif name == "Filename": 120 | #auto_focus_query = '{"field":"alias.filename","operator":"is","value":\"%s\"},' % (string_value) 121 | return 122 | elif name == "Email": 123 | #auto_focus_query = '{"field":"alias.email","operator":"is","value":\"%s\"},' % (string_value) 124 | return 125 | elif name == "IP": 126 | auto_focus_query 
= '{"field":"sample.tasks.connection","operator":"contains","value":\"%s\"},' % (string_value) 127 | elif name == "CVE": 128 | return 129 | print(auto_focus_query) 130 | 131 | def print_header(self, fpath): 132 | rule_name = os.path.splitext(os.path.basename(fpath))[0].translate(self.rule_enc) 133 | 134 | print("AutoFocus Search for: %s" % (rule_name)) 135 | print('{"operator":"Any","children":[') 136 | 137 | 138 | def print_footer(self, fpath): 139 | rule_name = os.path.splitext(os.path.basename(fpath))[0].translate(self.rule_enc) 140 | print('{"field":"sample.tag","operator":"is in the list","value":[\"%s\"]}]}' % (rule_name)) 141 | 142 | 143 | 144 | -------------------------------------------------------------------------------- /patterns.ini: -------------------------------------------------------------------------------- 1 | [URL] 2 | pattern: \b([a-z]{3,}\:\/\/[\S]{16,})\b 3 | 4 | [Host] 5 | pattern: \b(([a-z0-9\-]{2,}\.)+(abogado|ac|academy|accountants|active|actor|ad|adult|ae|aero|af|ag|agency|ai|airforce|al|allfinanz|alsace|am|amsterdam|an|android|ao|aq|aquarelle|ar|archi|army|arpa|as|asia|associates|at|attorney|au|auction|audio|autos|aw|ax|axa|az|ba|band|bank|bar|barclaycard|barclays|bargains|bayern|bb|bd|be|beer|berlin|best|bf|bg|bh|bi|bid|bike|bingo|bio|biz|bj|black|blackfriday|bloomberg|blue|bm|bmw|bn|bnpparibas|bo|boo|boutique|br|brussels|bs|bt|budapest|build|builders|business|buzz|bv|bw|by|bz|bzh|ca|cal|camera|camp|cancerresearch|canon|capetown|capital|caravan|cards|care|career|careers|cartier|casa|cash|cat|catering|cc|cd|center|ceo|cern|cf|cg|ch|channel|chat|cheap|christmas|chrome|church|ci|citic|city|ck|cl|claims|cleaning|click|clinic|clothing|club|cm|cn|co|coach|codes|coffee|college|cologne|com|community|company|computer|condos|construction|consulting|contractors|cooking|cool|coop|country|cr|credit|creditcard|cricket|crs|cruises|cu|cuisinella|cv|cw|cx|cy|cymru|cz|dabur|dad|dance|dating|day|dclk|de|deals|degree|delivery|democrat|dental|dentist|desi|
design|dev|diamonds|diet|digital|direct|directory|discount|dj|dk|dm|dnp|do|docs|domains|doosan|durban|dvag|dz|eat|ec|edu|education|ee|eg|email|emerck|energy|engineer|engineering|enterprises|equipment|er|es|esq|estate|et|eu|eurovision|eus|events|everbank|exchange|expert|exposed|fail|farm|fashion|feedback|fi|finance|financial|firmdale|fish|fishing|fit|fitness|fj|fk|flights|florist|flowers|flsmidth|fly|fm|fo|foo|forsale|foundation|fr|frl|frogans|fund|furniture|futbol|ga|gal|gallery|garden|gb|gbiz|gd|ge|gent|gf|gg|ggee|gh|gi|gift|gifts|gives|gl|glass|gle|global|globo|gm|gmail|gmo|gmx|gn|goog|google|gop|gov|gp|gq|gr|graphics|gratis|green|gripe|gs|gt|gu|guide|guitars|guru|gw|gy|hamburg|hangout|haus|healthcare|help|here|hermes|hiphop|hiv|hk|hm|hn|holdings|holiday|homes|horse|host|hosting|house|how|hr|ht|hu|ibm|id|ie|ifm|il|im|immo|immobilien|in|industries|info|ing|ink|institute|insure|int|international|investments|io|iq|ir|irish|is|it|iwc|jcb|je|jetzt|jm|jo|jobs|joburg|jp|juegos|kaufen|kddi|ke|kg|kh|ki|kim|kitchen|kiwi|km|kn|koeln|kp|kr|krd|kred|kw|ky|kyoto|kz|la|lacaixa|land|lat|latrobe|lawyer|lb|lc|lds|lease|legal|lgbt|li|lidl|life|lighting|limited|limo|link|lk|loans|london|lotte|lotto|lr|ls|lt|ltda|lu|luxe|luxury|lv|ly|ma|madrid|maison|management|mango|market|marketing|marriott|mc|md|me|media|meet|melbourne|meme|memorial|menu|mg|mh|miami|mil|mini|mk|ml|mm|mn|mo|mobi|moda|moe|monash|money|mormon|mortgage|moscow|motorcycles|mov|mp|mq|mr|ms|mt|mu|museum|mv|mw|mx|my|mz|na|nagoya|name|navy|nc|ne|net|network|neustar|new|nexus|nf|ng|ngo|nhk|ni|ninja|nl|no|np|nr|nra|nrw|ntt|nu|nyc|nz|okinawa|om|one|ong|onl|ooo|org|organic|osaka|otsuka|ovh|pa|paris|partners|parts|party|pe|pf|pg|ph|pharmacy|photo|photography|photos|physio|pics|pictures|pink|pizza|pk|pl|place|plumbing|pm|pn|pohl|poker|porn|post|pr|praxi|press|pro|prod|productions|prof|properties|property|ps|pt|pub|pw|qa|qpon|quebec|re|realtor|recipes|red|rehab|reise|reisen|reit|ren|rentals|repair|report|republican|rest|restaurant|
reviews|rich|rio|rip|ro|rocks|rodeo|rs|rsvp|ru|ruhr|rw|ryukyu|sa|saarland|sale|samsung|sarl|sb|sc|sca|scb|schmidt|schule|schwarz|science|scot|sd|se|services|sew|sexy|sg|sh|shiksha|shoes|shriram|si|singles|sj|sk|sky|sl|sm|sn|so|social|software|sohu|solar|solutions|soy|space|spiegel|sr|st|style|su|supplies|supply|support|surf|surgery|suzuki|sv|sx|sy|sydney|systems|sz|taipei|tatar|tattoo|tax|tc|td|technology|tel|temasek|tennis|tf|tg|th|tienda|tips|tires|tirol|tj|tk|tl|tm|tn|to|today|tokyo|tools|top|toshiba|town|toys|tp|tr|trade|training|travel|trust|tt|tui|tv|tw|tz|ua|ug|uk|university|uno|uol|us|uy|uz|va|vacations|vc|ve|vegas|ventures|versicherung|vet|vg|vi|viajes|video|villas|vision|vlaanderen|vn|vodka|vote|voting|voto|voyage|vu|wales|wang|watch|webcam|website|wed|wedding|wf|whoswho|wien|wiki|williamhill|wme|work|works|world|ws|wtc|wtf|xxx|xyz|yachts|yandex|ye|yoga|yokohama|youtube|yt|za|zm|zone|zuerich|zw))\b 6 | 7 | [IP] 8 | pattern: \b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b 9 | 10 | [Email] 11 | pattern: \b([a-z][_a-z0-9-.]+@[a-z0-9-]+\.[a-z]+)\b 12 | 13 | [MD5] 14 | pattern: \b([a-f0-9]{32}|[A-F0-9]{32})\b 15 | 16 | [SHA1] 17 | pattern: \b([a-f0-9]{40}|[A-F0-9]{40})\b 18 | 19 | [SHA256] 20 | pattern: \b([a-f0-9]{64}|[A-F0-9]{64})\b 21 | 22 | [CVE] 23 | pattern: \b(CVE\-[0-9]{4}\-[0-9]{4,6})\b 24 | 25 | [Registry] 26 | pattern: \b((HKLM|HKCU)\\[\\A-Za-z0-9-_]+)\b 27 | 28 | [Filename] 29 | pattern: \b([A-Za-z0-9-_\.]+\.(exe|dll|bat|sys|htm|html|js|jar|jpg|png|vb|scr|pif|chm|zip|rar|cab|pdf|doc|docx|ppt|pptx|xls|xlsx|swf|gif))\b 30 | 31 | [Filepath] 32 | pattern: \b[A-Z]:\\[A-Za-z0-9-_\.\\]+\b 33 | -------------------------------------------------------------------------------- /whitelist.py: -------------------------------------------------------------------------------- 1 | import os 2 | import glob 3 | import re 4 | 5 | class WhiteList(dict): 6 | def __init__(self, basedir): 7 | searchdir = os.path.join(basedir, "whitelists/whitelist_*.ini") 8 | fpaths = 
glob.glob(searchdir) 9 | for fpath in fpaths: 10 | t = os.path.splitext(fpath)[0].split('_',1)[1] 11 | patterns = [line.strip() for line in open(fpath) if line.strip()] # skip blank lines: an empty pattern would whitelist every match 12 | self[t] = [re.compile(p) for p in patterns] -------------------------------------------------------------------------------- /whitelists/whitelist_CVE.ini: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PaloAltoNetworks/ioc-parser/c5ef1fc81da499039cd9554d1e11dfb434828fd1/whitelists/whitelist_CVE.ini -------------------------------------------------------------------------------- /whitelists/whitelist_Email.ini: -------------------------------------------------------------------------------- 1 | @fireeye.com 2 | @crowdstrike.com 3 | @f-secure.com 4 | @kaspersky.com 5 | @gdata.de 6 | @cylance.com -------------------------------------------------------------------------------- /whitelists/whitelist_Filename.ini: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PaloAltoNetworks/ioc-parser/c5ef1fc81da499039cd9554d1e11dfb434828fd1/whitelists/whitelist_Filename.ini -------------------------------------------------------------------------------- /whitelists/whitelist_Filepath.ini: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PaloAltoNetworks/ioc-parser/c5ef1fc81da499039cd9554d1e11dfb434828fd1/whitelists/whitelist_Filepath.ini -------------------------------------------------------------------------------- /whitelists/whitelist_Host.ini: -------------------------------------------------------------------------------- 1 | eset.com$ 2 | kaspersky.com$ 3 | trendmicro.com$ 4 | metasploit.com$ 5 | secunia.com$ 6 | symantec.com$ 7 | cisco.com$ 8 | fireeye.com$ 9 | mandiant.com$ 10 | bluecoat.com$ 11 | normanshark.com$ 12 | norman.no$ 13 | norman.com$ 14 | rsa.com$ 15 | f-secure.com$ 16 | securelist.com$ 17 | mcafee.com$ 18 |
secureworks.com$ 19 | zscaler.com$ 20 | sophos.com$ 21 | avg.com$ 22 | isightpartners.com$ 23 | eset.sk$ 24 | rapid7.com$ 25 | crowdstrike.com$ 26 | gdata.de$ 27 | gdatasoftware.com$ 28 | fortinet.com$ 29 | fidelissecurity.com$ 30 | virustotal.com$ 31 | usenix.org$ 32 | cve.mitre.org$ 33 | clean-mx.de$ 34 | malwaredomainlist.com$ 35 | contagiodump.blogspot.com$ 36 | malware.dontneedcoffee.com$ 37 | exploit-db.com$ 38 | citizenlab.org$ 39 | crysys.hu$ 40 | krebsonsecurity.com$ 41 | darkreading.com$ 42 | shadowserver.org$ 43 | google.com$ 44 | facebook.com$ 45 | youtube.com$ 46 | twitter.com$ 47 | microsoft.com$ 48 | msn.com$ 49 | live.com$ 50 | windows.com$ 51 | adobe.com$ 52 | wikipedia.org$ 53 | linkedin.com$ 54 | yahoo.com$ 55 | gmail.com$ 56 | googlemail.com$ 57 | gmx.com$ 58 | gmx.de$ 59 | hotmail.com$ 60 | outlook.com$ 61 | yandex.ru$ 62 | github.com$ 63 | arstechnica.com$ 64 | wired.com$ 65 | zdnet.com$ 66 | bbc.co.uk$ 67 | dailymail.co.uk$ 68 | spiegel.de$ 69 | reuters.com$ 70 | theregister.co.uk$ 71 | forbes.com$ 72 | heise.de$ 73 | nytimes.com$ 74 | washingtonpost.com$ 75 | cbsnews.com$ -------------------------------------------------------------------------------- /whitelists/whitelist_IP.ini: -------------------------------------------------------------------------------- 1 | 127.0.0.1 -------------------------------------------------------------------------------- /whitelists/whitelist_MD5.ini: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PaloAltoNetworks/ioc-parser/c5ef1fc81da499039cd9554d1e11dfb434828fd1/whitelists/whitelist_MD5.ini -------------------------------------------------------------------------------- /whitelists/whitelist_Registry.ini: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PaloAltoNetworks/ioc-parser/c5ef1fc81da499039cd9554d1e11dfb434828fd1/whitelists/whitelist_Registry.ini 
-------------------------------------------------------------------------------- /whitelists/whitelist_SHA1.ini: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PaloAltoNetworks/ioc-parser/c5ef1fc81da499039cd9554d1e11dfb434828fd1/whitelists/whitelist_SHA1.ini -------------------------------------------------------------------------------- /whitelists/whitelist_SHA256.ini: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PaloAltoNetworks/ioc-parser/c5ef1fc81da499039cd9554d1e11dfb434828fd1/whitelists/whitelist_SHA256.ini -------------------------------------------------------------------------------- /whitelists/whitelist_URL.ini: -------------------------------------------------------------------------------- 1 | ^http:\/\/www.fireeye.com\/ 2 | ^http:\/\/blog.fireeye.com\/ 3 | ^http:\/\/www.symantec.com\/ 4 | ^http:\/\/blog.kaspersky.com\/ 5 | ^http:\/\/blog.trendmicro.com\/ 6 | ^https?:\/\/blogs.rsa.com\/ 7 | ^http:\/\/www.trendmicro.com\/ 8 | ^http:\/\/blog.trendmicro.com\/ 9 | ^http:\/\/blogs.norman.com\/ 10 | ^http:\/\/www.securelist.com\/ 11 | ^http:\/\/www.mcafee.com\/ 12 | ^http:\/\/blog.crysys.hu\/ 13 | ^http:\/\/tools.cisco.com\/security\/ 14 | ^http:\/\/www.secureworks.com\/research\/ 15 | ^http:\/\/threatexpert.com\/ 16 | ^http:\/\/www.f-secure.com\/weblog\/ 17 | ^http:\/\/nakedsecurity.sophos.com\/ 18 | ^http:\/\/blog.eset.com\/ 19 | ^http:\/\/www.gdata.de\/ 20 | ^http:\/\/www.sophos.com\/ 21 | ^http:\/\/normanshark.com\/ 22 | ^http:\/\/www.cve.mitre.org\/ 23 | ^http:\/\/www.virusbtn.com\/pdf\/ 24 | ^http:\/\/www.blackhat.com\/presentations\/ 25 | ^https?:\/\/www.usenix.org\/ 26 | ^http:\/\/blogs.sans.org\/ 27 | ^http:\/\/www.shadowserver.org\/ 28 | ^http:\/\/contagiodump.blogspot.com\/ 29 | ^http:\/\/support.clean-mx.de\/ 30 | ^http:\/\/lists.clean-mx.com\/ 31 | ^http:\/\/citizenlab.org\/ 32 | 
^http:\/\/www.eff.org\/document\/ 33 | ^http:\/\/www.exploit-db.com\/exploits\/ 34 | ^http:\/\/www.adobe.com\/support\/security\/ 35 | ^http:\/\/krebsonsecurity.com\/ 36 | ^http:\/\/en.wikipedia.org\/wiki\/ --------------------------------------------------------------------------------
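
The core of the pipeline — regex patterns from patterns.ini applied per page, with hits dropped when any whitelist regex matches (as in `IOC_Parser.is_whitelisted()`) — can be sketched standalone. This is a minimal illustration, not repository code: the `extract_iocs` helper and the sample report text are hypothetical, and only a subset of the shipped patterns is shown.

```python
import re

# Patterns copied from patterns.ini (subset for illustration)
PATTERNS = {
    'IP':  re.compile(r'\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b'),
    'MD5': re.compile(r'\b([a-f0-9]{32}|[A-F0-9]{32})\b'),
    'CVE': re.compile(r'\b(CVE\-[0-9]{4}\-[0-9]{4,6})\b'),
}

# Whitelist entry as in whitelists/whitelist_IP.ini
WHITELIST = {'IP': [re.compile('127.0.0.1')]}

def extract_iocs(data):
    """Yield (type, match) pairs, skipping whitelisted hits."""
    for ind_type, regex in PATTERNS.items():
        for m in regex.findall(data):
            # mirror of is_whitelisted(): any whitelist regex hit suppresses the match
            if any(w.findall(m) for w in WHITELIST.get(ind_type, [])):
                continue
            yield (ind_type, m)

# Hypothetical report snippet
report = ("The dropper connects to 192.0.2.44 and exploits CVE-2015-5119; "
          "loopback 127.0.0.1 is ignored. MD5: d41d8cd98f00b204e9800998ecf8427e")
print(sorted(extract_iocs(report)))
# → [('CVE', 'CVE-2015-5119'), ('IP', '192.0.2.44'), ('MD5', 'd41d8cd98f00b204e9800998ecf8427e')]
```

Note that the whitelist check uses `findall` on the matched string, so a whitelist entry suppresses a hit on any substring match, not only on an exact match — which is why the shipped host/URL whitelists anchor their patterns with `^` and `$`.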