├── requirements.txt ├── .gitignore ├── dockerfile ├── LICENSE ├── README.md └── domainhunter.py /requirements.txt: -------------------------------------------------------------------------------- 1 | requests==2.13.0 2 | texttable==0.8.7 3 | beautifulsoup4==4.5.3 4 | lxml 5 | pillow==5.0.0 6 | pytesseract 7 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.html 2 | *.txt 3 | *.jpg 4 | Pipfile* 5 | 6 | .vscode/* 7 | !.vscode/settings.json 8 | !.vscode/tasks.json 9 | !.vscode/launch.json 10 | !.vscode/extensions.json 11 | -------------------------------------------------------------------------------- /dockerfile: -------------------------------------------------------------------------------- 1 | #build it: 2 | #docker build -t domainhunter:1.0 . 3 | #run it: 4 | #docker run -it domainhunter:1.0 [args] 5 | 6 | FROM ubuntu:16.04 7 | 8 | RUN apt-get update \ 9 | && apt-get install python3-pip -y\ 10 | && apt-get install tesseract-ocr -y\ 11 | && apt-get install python3-pil -y 12 | 13 | ADD domainhunter.py / 14 | ADD requirements.txt / 15 | 16 | RUN pip3 install -r requirements.txt 17 | 18 | ENTRYPOINT [ "python3", "./domainhunter.py" ] 19 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2017, Joe Vest, Andrew Chiles 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | * Redistributions of source code must retain the above copyright 7 | notice, this list of conditions and the following disclaimer. 
8 | * Redistributions in binary form must reproduce the above copyright 9 | notice, this list of conditions and the following disclaimer in the 10 | documentation and/or other materials provided with the distribution. 11 | * Neither the name of the Domainhunter nor the 12 | names of its contributors may be used to endorse or promote products 13 | derived from this software without specific prior written permission. 14 | 15 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 16 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 17 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 18 | DISCLAIMED. IN NO EVENT SHALL COPYRIGHT HOLDER BE LIABLE FOR ANY 19 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 20 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 21 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND 22 | ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 23 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 24 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Domain Hunter 2 | 3 | Authors Joe Vest (@joevest) & Andrew Chiles (@andrewchiles) 4 | 5 | Domain name selection is an important aspect of preparation for penetration tests and especially Red Team engagements. Commonly, domains that were used previously for benign purposes and were properly categorized can be purchased for only a few dollars. Such domains can allow a team to bypass reputation based web filters and network egress restrictions for phishing and C2 related tasks. 
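
One inexpensive pre-screen this tool performs before reporting candidates (see `downloadMalwareDomains` in `domainhunter.py` below) is testing each domain against a public malware-domain blocklist, such as the one-domain-per-line list served at http://mirror1.malwaredomains.com/files/justdomains. A minimal, self-contained sketch of that membership check, operating on already-downloaded list text rather than a live HTTP fetch:

```python
def load_blocklist(text):
    """Parse a one-domain-per-line blocklist into a set for fast membership tests."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}

def is_known_malware(domain, blocklist):
    """Return True if the candidate domain appears on the blocklist."""
    return domain.strip().lower() in blocklist

# Example with inline list text; the real tool fetches the list over HTTP.
sample = "badsite.com\nevil.org\n"
blocklist = load_blocklist(sample)
print(is_known_malware("BadSite.com", blocklist))          # True
print(is_known_malware("doginmysuitcase.com", blocklist))  # False
```

Normalizing to lowercase and using a set keeps each lookup O(1), which matters when screening hundreds of scraped domains against a list with tens of thousands of entries.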
6 | 7 | This Python-based tool was written to quickly query the Expireddomains.net search engine for expired/available domains with a previous history of use. It then optionally queries for domain reputation against services like Symantec WebPulse (BlueCoat), IBM X-Force, and Cisco Talos. The primary tool output is a timestamped HTML table-style report. 8 | 9 | ## Changes 10 | 11 | - 5 October 2018 12 | + Fixed logic for filtering domains with desirable categorizations. Previously, some error conditions weren't filtered and would result in domains without a valid categorization making it into the final list. 13 | 14 | - 4 October 2018 15 | + Tweaked parsing logic 16 | + Fixed parsed column indexes to match site changes 17 | 18 | - 17 September 2018 19 | + Fixed Symantec WebPulse Site Review parsing errors caused by service updates 20 | 21 | - 18 May 2018 22 | + Added --alexa switch to control Alexa-ranked site filtering 23 | 24 | - 16 May 2018 25 | + Updated queries to increase the probability of quickly finding a domain available for instant purchase. Previously, many reported domains had an "In Auction" or "Make an Offer" status. New criteria: .com|.net|.org + Alexa Ranked + Available for Purchase 26 | + Improved logic to filter out uncategorized and some potentially undesirable domain categorizations in the final text table and HTML output 27 | + Removed unnecessary columns from HTML report 28 | 29 | - 6 May 2018 30 | + Fixed expired domains parsing when performing a keyword search 31 | + Minor HTML and text table output updates 32 | + Filtered reputation checks to only execute for .COM, .ORG, and .NET domains and removed check for Archive.org records when performing a default or keyword search. Credit to @christruncer for the original PR and idea. 33 | 34 | - 11 April 2018 35 | + Added OCR support for CAPTCHA solving with tesseract.
Thanks to t94j0 for the idea in [AIRMASTER](https://github.com/t94j0/AIRMASTER) 36 | + Added support for input file list of potential domains (-f/--filename) 37 | + Changed -q/--query switch to -k/--keyword to better match its purpose 38 | + Added additional error checking for ExpiredDomains.net parsing 39 | 40 | - 9 April 2018 41 | + Added -t switch for timing control. -t <1-5> 42 | + Added Google SafeBrowsing and PhishTank reputation checks 43 | + Fixed bug in IBMXForce response parsing 44 | 45 | - 7 April 2018 46 | + Fixed support for Symantec WebPulse Site Review (formerly Blue Coat WebFilter) 47 | + Added Cisco Talos Domain Reputation check 48 | + Added feature to perform a reputation check against a single non-expired domain. This is useful when monitoring reputation for domains used in ongoing campaigns and engagements. 49 | 50 | - 6 June 2017 51 | + Added Python 3 support 52 | + Code cleanup and bug fixes 53 | + Added Status column (Available, Make Offer, Price, Backorder, etc.) 54 | 55 | ## Features 56 | 57 | - Retrieve a specified number of recently expired and deleted domains (.com, .net, .org) from ExpiredDomains.net 58 | - Retrieve available domains based on a keyword search from ExpiredDomains.net 59 | - Perform reputation checks against the Symantec WebPulse Site Review (BlueCoat), IBM X-Force, Cisco Talos, Google SafeBrowsing, and PhishTank services 60 | - Sort results by domain age (if known) and filter for reputation 61 | - Text-based table and HTML report output with links to reputation sources and Archive.org entry 62 | 63 | ## Installation 64 | 65 | Install Python requirements 66 | 67 | pip3 install -r requirements.txt 68 | 69 | Optional - Install additional OCR support dependencies 70 | 71 | - Debian/Ubuntu: `apt-get install tesseract-ocr python3-pil` 72 | 73 | - macOS: `brew install tesseract` 74 | 75 | ## Usage 76 | 77 | usage: domainhunter.py [-h] [-a] [-k KEYWORD] [-c] [-f FILENAME] [--ocr] 78 | [-r MAXRESULTS] [-s SINGLE] [-t
{0,1,2,3,4,5}] 79 | [-w MAXWIDTH] [-V] 80 | 81 | Finds expired domains, domain categorization, and Archive.org history to determine good candidates for C2 and phishing domains 82 | 83 | optional arguments: 84 | -h, --help show this help message and exit 85 | -a, --alexa Filter results to Alexa listings 86 | -k KEYWORD, --keyword KEYWORD 87 | Keyword used to refine search results 88 | -c, --check Perform domain reputation checks 89 | -f FILENAME, --filename FILENAME 90 | Specify input file of line delimited domain names to 91 | check 92 | --ocr Perform OCR on CAPTCHAs when challenged 93 | -r MAXRESULTS, --maxresults MAXRESULTS 94 | Number of results to return when querying latest 95 | expired/deleted domains 96 | -s SINGLE, --single SINGLE 97 | Performs detailed reputation checks against a single 98 | domain name/IP. 99 | -t {0,1,2,3,4,5}, --timing {0,1,2,3,4,5} 100 | Modifies request timing to avoid CAPTCHAs. Slowest(0) 101 | = 90-120 seconds, Default(3) = 10-20 seconds, 102 | Fastest(5) = no delay 103 | -w MAXWIDTH, --maxwidth MAXWIDTH 104 | Width of text table 105 | -V, --version show program's version number and exit 106 | 107 | Examples: 108 | ./domainhunter.py -k apples -c --ocr -t5 109 | ./domainhunter.py --check --ocr -t3 110 | ./domainhunter.py --single mydomain.com 111 | ./domainhunter.py --keyword tech --check --ocr --timing 5 --alexa 112 | ./domainhunter.py --filename inputlist.txt --ocr --timing 5 113 | 114 | Use defaults to retrieve the 100 most recently expired/deleted domains 115 | 116 | python3 ./domainhunter.py 117 | 118 | Search for the 1000 most recently expired/deleted domains, but don't check reputation 119 | 120 | python3 ./domainhunter.py -r 1000 121 | 122 | Perform all reputation checks for a single domain 123 | 124 | python3 ./domainhunter.py -s mydomain.com 125 | 126 | [*] Downloading malware domain list from http://mirror1.malwaredomains.com/files/justdomains 127 | 128 | [*] Fetching domain reputation for: mydomain.com 129 | [*] Google
SafeBrowsing and PhishTank: mydomain.com 130 | [+] mydomain.com: No issues found 131 | [*] BlueCoat: mydomain.com 132 | [+] mydomain.com: Technology/Internet 133 | [*] IBM xForce: mydomain.com 134 | [+] mydomain.com: Communication Services, Software as a Service, Cloud, (Score: 1) 135 | [*] Cisco Talos: mydomain.com 136 | [+] mydomain.com: Web Hosting (Score: Neutral) 137 | 138 | Perform all reputation checks for a list of domains at max speed with OCR of CAPTCHAs 139 | 140 | python3 ./domainhunter.py -f -t 5 --ocr 141 | 142 | Search for available domains with keyword term of "dog", max results of 25, and check reputation 143 | 144 | python3 ./domainhunter.py -k dog -r 25 -c 145 | 146 | ____ ___ __ __ _ ___ _ _ _ _ _ _ _ _ _____ _____ ____ 147 | | _ \ / _ \| \/ | / \ |_ _| \ | | | | | | | | | \ | |_ _| ____| _ \ 148 | | | | | | | | |\/| | / _ \ | || \| | | |_| | | | | \| | | | | _| | |_) | 149 | | |_| | |_| | | | |/ ___ \ | || |\ | | _ | |_| | |\ | | | | |___| _ < 150 | |____/ \___/|_| |_/_/ \_\___|_| \_| |_| |_|\___/|_| \_| |_| |_____|_| \_\ 151 | 152 | Expired Domains Reputation Checker 153 | Authors: @joevest and @andrewchiles 154 | 155 | DISCLAIMER: This is for educational purposes only! 156 | It is designed to promote education and the improvement of computer/cyber security. 157 | The authors or employers are not liable for any illegal act or misuse performed by any user of this tool. 158 | If you plan to use this content for illegal purpose, don't. Have a nice day :) 159 | 160 | [*] Downloading malware domain list from http://mirror1.malwaredomains.com/files/justdomains 161 | 162 | [*] Fetching expired or deleted domains containing "dog" 163 | [*] https://www.expireddomains.net/domain-name-search/?q=dog 164 | 165 | [*] Performing domain reputation checks for 8 domains. 166 | [*] BlueCoat: doginmysuitcase.com 167 | [+] doginmysuitcase.com: Travel 168 | [*] IBM xForce: doginmysuitcase.com 169 | [+] doginmysuitcase.com: Not found. 
170 | [*] Cisco Talos: doginmysuitcase.com 171 | [+] doginmysuitcase.com: Uncategorized 172 | -------------------------------------------------------------------------------- /domainhunter.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | ## Title: domainhunter.py 4 | ## Author: @joevest and @andrewchiles 5 | ## Description: Checks expired domains, reputation/categorization, and Archive.org history to determine 6 | ## good candidates for phishing and C2 domain names 7 | 8 | # If the expected response format from a provider changes, use the traceback module to get a full stack trace without removing try/catch blocks 9 | #import traceback 10 | #traceback.print_exc() 11 | 12 | import time 13 | import random 14 | import argparse 15 | import json 16 | import base64 17 | import os 18 | 19 | __version__ = "20181005" 20 | 21 | ## Functions 22 | 23 | def doSleep(timing): 24 | if timing == 0: 25 | time.sleep(random.randrange(90,120)) 26 | elif timing == 1: 27 | time.sleep(random.randrange(60,90)) 28 | elif timing == 2: 29 | time.sleep(random.randrange(30,60)) 30 | elif timing == 3: 31 | time.sleep(random.randrange(10,20)) 32 | elif timing == 4: 33 | time.sleep(random.randrange(5,10)) 34 | # There's no elif timing == 5 here because we don't want to sleep for -t 5 35 | 36 | def checkBluecoat(domain): 37 | try: 38 | url = 'https://sitereview.bluecoat.com/resource/lookup' 39 | postData = {'url':domain,'captcha':''} 40 | headers = {'User-Agent':useragent, 41 | 'Accept':'application/json, text/plain, */*', 42 | 'Content-Type':'application/json; charset=UTF-8', 43 | 'Referer':'https://sitereview.bluecoat.com/lookup'} 44 | 45 | print('[*] BlueCoat: {}'.format(domain)) 46 | response = s.post(url,headers=headers,json=postData,verify=False) 47 | responseJSON = json.loads(response.text) 48 | 49 | if 'errorType' in responseJSON: 50 | a = responseJSON['errorType'] 51 | else: 52 | a = 
responseJSON['categorization'][0]['name'] 53 | 54 | # Print notice if CAPTCHAs are blocking accurate results and attempt to solve if --ocr 55 | if a == 'captcha': 56 | if ocr: 57 | # This request is also performed by a browser, but is not needed for our purposes 58 | #captcharequestURL = 'https://sitereview.bluecoat.com/resource/captcha-request' 59 | 60 | print('[*] Received CAPTCHA challenge!') 61 | captcha = solveCaptcha('https://sitereview.bluecoat.com/resource/captcha.jpg',s) 62 | 63 | if captcha: 64 | b64captcha = base64.urlsafe_b64encode(captcha.encode('utf-8')).decode('utf-8') 65 | 66 | # Send CAPTCHA solution via GET since inclusion with the domain categorization request doesn't work anymore 67 | captchasolutionURL = 'https://sitereview.bluecoat.com/resource/captcha-request/{0}'.format(b64captcha) 68 | print('[*] Submitting CAPTCHA at {0}'.format(captchasolutionURL)) 69 | response = s.get(url=captchasolutionURL,headers=headers,verify=False) 70 | 71 | # Try the categorization request again 72 | response = s.post(url,headers=headers,json=postData,verify=False) 73 | 74 | responseJSON = json.loads(response.text) 75 | 76 | if 'errorType' in responseJSON: 77 | a = responseJSON['errorType'] 78 | else: 79 | a = responseJSON['categorization'][0]['name'] 80 | else: 81 | print('[-] Error: Failed to solve BlueCoat CAPTCHA with OCR! Manually solve at "https://sitereview.bluecoat.com/sitereview.jsp"') 82 | else: 83 | print('[-] Error: BlueCoat CAPTCHA received. Try --ocr flag or manually solve a CAPTCHA at "https://sitereview.bluecoat.com/sitereview.jsp"') 84 | 85 | return a 86 | 87 | except Exception as e: 88 | print('[-] Error retrieving Bluecoat reputation!
{0}'.format(e)) 89 | return "error" 90 | 91 | def checkIBMXForce(domain): 92 | try: 93 | url = 'https://exchange.xforce.ibmcloud.com/url/{}'.format(domain) 94 | headers = {'User-Agent':useragent, 95 | 'Accept':'application/json, text/plain, */*', 96 | 'x-ui':'XFE', 97 | 'Origin':url, 98 | 'Referer':url} 99 | 100 | print('[*] IBM xForce: {}'.format(domain)) 101 | 102 | url = 'https://api.xforce.ibmcloud.com/url/{}'.format(domain) 103 | response = s.get(url,headers=headers,verify=False) 104 | 105 | responseJSON = json.loads(response.text) 106 | 107 | if 'error' in responseJSON: 108 | a = responseJSON['error'] 109 | 110 | elif not responseJSON['result']['cats']: 111 | a = 'Uncategorized' 112 | 113 | ## TODO: Add a notice when the "intrusion" category is returned. This is an indication that the endpoint's rate-limit / brute-force protection was hit 114 | 115 | else: 116 | categories = '' 117 | # Parse all dictionary keys and append to single string to get Category names 118 | for key in responseJSON["result"]['cats']: 119 | categories += '{0}, '.format(str(key)) 120 | 121 | a = '{0}(Score: {1})'.format(categories,str(responseJSON['result']['score'])) 122 | 123 | return a 124 | 125 | except Exception as e: 126 | print('[-] Error retrieving IBM-Xforce reputation!
{0}'.format(e)) 127 | return "error" 128 | 129 | def checkTalos(domain): 130 | url = 'https://www.talosintelligence.com/sb_api/query_lookup?query=%2Fapi%2Fv2%2Fdetails%2Fdomain%2F&query_entry={0}&offset=0&order=ip+asc'.format(domain) 131 | headers = {'User-Agent':useragent, 132 | 'Referer':url} 133 | 134 | print('[*] Cisco Talos: {}'.format(domain)) 135 | try: 136 | response = s.get(url,headers=headers,verify=False) 137 | 138 | responseJSON = json.loads(response.text) 139 | 140 | if 'error' in responseJSON: 141 | a = str(responseJSON['error']) 142 | if a == "Unfortunately, we can't find any results for your search.": 143 | a = 'Uncategorized' 144 | 145 | elif responseJSON['category'] is None: 146 | a = 'Uncategorized' 147 | 148 | else: 149 | a = '{0} (Score: {1})'.format(str(responseJSON['category']['description']), str(responseJSON['web_score_name'])) 150 | 151 | return a 152 | 153 | except Exception as e: 154 | print('[-] Error retrieving Talos reputation! {0}'.format(e)) 155 | return "error" 156 | 157 | def checkMXToolbox(domain): 158 | url = 'https://mxtoolbox.com/Public/Tools/BrandReputation.aspx' 159 | headers = {'User-Agent':useragent, 160 | 'Origin':url, 161 | 'Referer':url} 162 | 163 | print('[*] Google SafeBrowsing and PhishTank: {}'.format(domain)) 164 | 165 | try: 166 | response = s.get(url=url, headers=headers) 167 | 168 | soup = BeautifulSoup(response.content,'lxml') 169 | 170 | viewstate = soup.select('input[name=__VIEWSTATE]')[0]['value'] 171 | viewstategenerator = soup.select('input[name=__VIEWSTATEGENERATOR]')[0]['value'] 172 | eventvalidation = soup.select('input[name=__EVENTVALIDATION]')[0]['value'] 173 | 174 | data = { 175 | "__EVENTTARGET": "", 176 | "__EVENTARGUMENT": "", 177 | "__VIEWSTATE": viewstate, 178 | "__VIEWSTATEGENERATOR": viewstategenerator, 179 | "__EVENTVALIDATION": eventvalidation, 180 | "ctl00$ContentPlaceHolder1$brandReputationUrl": domain, 181 | "ctl00$ContentPlaceHolder1$brandReputationDoLookup": "Brand Reputation Lookup", 
182 | "ctl00$ucSignIn$hfRegCode": 'missing', 183 | "ctl00$ucSignIn$hfRedirectSignUp": '/Public/Tools/BrandReputation.aspx', 184 | "ctl00$ucSignIn$hfRedirectLogin": '', 185 | "ctl00$ucSignIn$txtEmailAddress": '', 186 | "ctl00$ucSignIn$cbNewAccount": 'cbNewAccount', 187 | "ctl00$ucSignIn$txtFullName": '', 188 | "ctl00$ucSignIn$txtModalNewPassword": '', 189 | "ctl00$ucSignIn$txtPhone": '', 190 | "ctl00$ucSignIn$txtCompanyName": '', 191 | "ctl00$ucSignIn$drpTitle": '', 192 | "ctl00$ucSignIn$txtTitleName": '', 193 | "ctl00$ucSignIn$txtModalPassword": '' 194 | } 195 | 196 | response = s.post(url=url, headers=headers, data=data) 197 | 198 | soup = BeautifulSoup(response.content,'lxml') 199 | 200 | a = '' 201 | if soup.select('div[id=ctl00_ContentPlaceHolder1_noIssuesFound]'): 202 | a = 'No issues found' 203 | return a 204 | else: 205 | if soup.select('div[id=ctl00_ContentPlaceHolder1_googleSafeBrowsingIssuesFound]'): 206 | a = 'Google SafeBrowsing Issues Found. ' 207 | 208 | if soup.select('div[id=ctl00_ContentPlaceHolder1_phishTankIssuesFound]'): 209 | a += 'PhishTank Issues Found' 210 | return a 211 | 212 | except Exception as e: 213 | print('[-] Error retrieving Google SafeBrowsing and PhishTank reputation! {0}'.format(e)) 214 | return "error" 215 | 216 | def downloadMalwareDomains(malwaredomainsURL): 217 | url = malwaredomainsURL 218 | response = s.get(url=url,headers=headers,verify=False) 219 | responseText = response.text 220 | if response.status_code == 200: 221 | return responseText 222 | else: 223 | print("[-] Error reaching: {} Status: {}".format(url, response.status_code)) 224 | 225 | def checkDomain(domain): 226 | print('[*] Fetching domain reputation for: {}'.format(domain)) 227 | 228 | if domain in maldomainsList: 229 | print("[!]
{}: Identified as known malware domain (malwaredomains.com)".format(domain)) 230 | 231 | bluecoat = checkBluecoat(domain) 232 | print("[+] {}: {}".format(domain, bluecoat)) 233 | 234 | ibmxforce = checkIBMXForce(domain) 235 | print("[+] {}: {}".format(domain, ibmxforce)) 236 | 237 | ciscotalos = checkTalos(domain) 238 | print("[+] {}: {}".format(domain, ciscotalos)) 239 | 240 | mxtoolbox = checkMXToolbox(domain) 241 | print("[+] {}: {}".format(domain, mxtoolbox)) 242 | 243 | print("") 244 | 245 | results = [domain,bluecoat,ibmxforce,ciscotalos,mxtoolbox] 246 | return results 247 | 248 | def solveCaptcha(url,session): 249 | # Downloads CAPTCHA image and saves to current directory for OCR with tesseract 250 | # Returns CAPTCHA string or False if an error occurred 251 | 252 | jpeg = 'captcha.jpg' 253 | 254 | try: 255 | response = session.get(url=url,headers=headers,verify=False, stream=True) 256 | if response.status_code == 200: 257 | with open(jpeg, 'wb') as f: 258 | response.raw.decode_content = True 259 | shutil.copyfileobj(response.raw, f) 260 | else: 261 | print('[-] Error downloading CAPTCHA file!') 262 | return False 263 | 264 | # Perform basic OCR without additional image enhancement 265 | text = pytesseract.image_to_string(Image.open(jpeg)) 266 | text = text.replace(" ", "") 267 | 268 | # Remove CAPTCHA file 269 | try: 270 | os.remove(jpeg) 271 | except OSError: 272 | pass 273 | 274 | return text 275 | 276 | except Exception as e: 277 | print("[-] Error solving CAPTCHA - {0}".format(e)) 278 | 279 | return False 280 | 281 | def drawTable(header,data): 282 | 283 | data.insert(0,header) 284 | t = Texttable(max_width=maxwidth) 285 | t.add_rows(data) 286 | t.header(header) 287 | 288 | return(t.draw()) 289 | 290 | ## MAIN 291 | if __name__ == "__main__": 292 | 293 | 294 | parser = argparse.ArgumentParser( 295 | description='Finds expired domains, domain categorization, and Archive.org history to determine good candidates for C2 and phishing domains', 296 | epilog = '''
297 | Examples: 298 | ./domainhunter.py -k apples -c --ocr -t5 299 | ./domainhunter.py --check --ocr -t3 300 | ./domainhunter.py --single mydomain.com 301 | ./domainhunter.py --keyword tech --check --ocr --timing 5 --alexa 302 | ./domainhunter.py --filename inputlist.txt --ocr --timing 5''', 303 | formatter_class=argparse.RawDescriptionHelpFormatter) 304 | 305 | parser.add_argument('-a','--alexa', help='Filter results to Alexa listings', required=False, default=0, action='store_const', const=1) 306 | parser.add_argument('-k','--keyword', help='Keyword used to refine search results', required=False, default=False, type=str, dest='keyword') 307 | parser.add_argument('-c','--check', help='Perform domain reputation checks', required=False, default=False, action='store_true', dest='check') 308 | parser.add_argument('-f','--filename', help='Specify input file of line delimited domain names to check', required=False, default=False, type=str, dest='filename') 309 | parser.add_argument('--ocr', help='Perform OCR on CAPTCHAs when challenged', required=False, default=False, action='store_true') 310 | parser.add_argument('-r','--maxresults', help='Number of results to return when querying latest expired/deleted domains', required=False, default=100, type=int, dest='maxresults') 311 | parser.add_argument('-s','--single', help='Performs detailed reputation checks against a single domain name/IP.', required=False, default=False, dest='single') 312 | parser.add_argument('-t','--timing', help='Modifies request timing to avoid CAPTCHAs.
Slowest(0) = 90-120 seconds, Default(3) = 10-20 seconds, Fastest(5) = no delay', required=False, default=3, type=int, choices=range(0,6), dest='timing') 313 | parser.add_argument('-w','--maxwidth', help='Width of text table', required=False, default=400, type=int, dest='maxwidth') 314 | parser.add_argument('-V','--version', action='version',version='%(prog)s {version}'.format(version=__version__)) 315 | args = parser.parse_args() 316 | 317 | # Load dependent modules 318 | try: 319 | import requests 320 | from bs4 import BeautifulSoup 321 | from texttable import Texttable 322 | 323 | except Exception as e: 324 | print("Expired Domains Reputation Check") 325 | print("[-] Missing basic dependencies: {}".format(str(e))) 326 | print("[*] Install required dependencies by running `pip3 install -r requirements.txt`") 327 | quit(0) 328 | 329 | # Load OCR related modules if --ocr flag is set since these can be difficult to get working 330 | if args.ocr: 331 | try: 332 | import pytesseract 333 | from PIL import Image 334 | import shutil 335 | except Exception as e: 336 | print("Expired Domains Reputation Check") 337 | print("[-] Missing OCR dependencies: {}".format(str(e))) 338 | print("[*] Install required Python dependencies by running: pip3 install -r requirements.txt") 339 | print("[*] Ubuntu/Debian - Install tesseract by running: apt-get install tesseract-ocr python3-pil") 340 | print("[*] macOS - Install tesseract with homebrew by running: brew install tesseract") 341 | quit(0) 342 | 343 | ## Variables 344 | 345 | alexa = args.alexa 346 | 347 | keyword = args.keyword 348 | 349 | check = args.check 350 | 351 | filename = args.filename 352 | 353 | maxresults = args.maxresults 354 | 355 | single = args.single 356 | 357 | timing = args.timing 358 | 359 | maxwidth = args.maxwidth 360 | 361 | ocr = args.ocr 362 | 363 | malwaredomainsURL = 'http://mirror1.malwaredomains.com/files/justdomains' 364 | 365 | expireddomainsqueryURL =
'https://www.expireddomains.net/domain-name-search' 366 | 367 | timestamp = time.strftime("%Y%m%d_%H%M%S") 368 | 369 | useragent = 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)' 370 | 371 | headers = {'User-Agent':useragent} 372 | 373 | requests.packages.urllib3.disable_warnings() 374 | 375 | # HTTP Session container, used to manage cookies, session tokens and other session information 376 | s = requests.Session() 377 | 378 | title = ''' 379 | ____ ___ __ __ _ ___ _ _ _ _ _ _ _ _ _____ _____ ____ 380 | | _ \ / _ \| \/ | / \ |_ _| \ | | | | | | | | | \ | |_ _| ____| _ \ 381 | | | | | | | | |\/| | / _ \ | || \| | | |_| | | | | \| | | | | _| | |_) | 382 | | |_| | |_| | | | |/ ___ \ | || |\ | | _ | |_| | |\ | | | | |___| _ < 383 | |____/ \___/|_| |_/_/ \_\___|_| \_| |_| |_|\___/|_| \_| |_| |_____|_| \_\ ''' 384 | 385 | print(title) 386 | print("") 387 | print("Expired Domains Reputation Checker") 388 | print("Authors: @joevest and @andrewchiles\n") 389 | print("DISCLAIMER: This is for educational purposes only!") 390 | disclaimer = '''It is designed to promote education and the improvement of computer/cyber security. 391 | The authors or employers are not liable for any illegal act or misuse performed by any user of this tool. 392 | If you plan to use this content for illegal purpose, don't. Have a nice day :)''' 393 | print(disclaimer) 394 | print("") 395 | 396 | # Download known malware domains 397 | print('[*] Downloading malware domain list from {}\n'.format(malwaredomainsURL)) 398 | maldomains = downloadMalwareDomains(malwaredomainsURL) 399 | maldomainsList = maldomains.split("\n") 400 | 401 | # Retrieve reputation for a single choosen domain (Quick Mode) 402 | if single: 403 | checkDomain(single) 404 | exit(0) 405 | 406 | # Perform detailed domain reputation checks against input file, print table, and quit. 
This does not generate an HTML report 407 | if filename: 408 | # Initialize our list with an empty row for the header 409 | data = [] 410 | try: 411 | with open(filename, 'r') as domainsList: 412 | for line in domainsList.read().splitlines(): 413 | data.append(checkDomain(line)) 414 | doSleep(timing) 415 | 416 | # Print results table 417 | header = ['Domain', 'BlueCoat', 'IBM X-Force', 'Cisco Talos', 'MXToolbox'] 418 | print(drawTable(header,data)) 419 | 420 | except KeyboardInterrupt: 421 | print('Caught keyboard interrupt. Exiting!') 422 | exit(0) 423 | except Exception as e: 424 | print('[-] Error: {}'.format(e)) 425 | exit(1) 426 | exit(0) 427 | 428 | # Generic Proxy support 429 | # TODO: add as a parameter 430 | proxies = { 431 | 'http': 'http://127.0.0.1:8080', 432 | 'https': 'http://127.0.0.1:8080', 433 | } 434 | 435 | # Create an initial session 436 | domainrequest = s.get("https://www.expireddomains.net",headers=headers,verify=False) 437 | 438 | # Use proxy like Burp for debugging request/parsing errors 439 | #domainrequest = s.get("https://www.expireddomains.net",headers=headers,verify=False,proxies=proxies) 440 | 441 | # Lists for our ExpiredDomains results 442 | domain_list = [] 443 | data = [] 444 | 445 | # Generate list of URLs to query for expired/deleted domains 446 | urls = [] 447 | 448 | # Use the keyword string to narrow domain search if provided. 
This generates a list of URLs to query 449 | 450 | if keyword: 451 | print('[*] Fetching expired or deleted domains containing "{}"'.format(keyword)) 452 | for i in range (0,maxresults,25): 453 | if i == 0: 454 | urls.append("{}/?q={}&fwhois=22&ftlds[]=2&ftlds[]=3&ftlds[]=4&falexa={}".format(expireddomainsqueryURL,keyword,alexa)) 455 | headers['Referer'] ='https://www.expireddomains.net/domain-name-search/?q={}&start=1'.format(keyword) 456 | else: 457 | urls.append("{}/?start={}&q={}&ftlds[]=2&ftlds[]=3&ftlds[]=4&fwhois=22&falexa={}".format(expireddomainsqueryURL,i,keyword,alexa)) 458 | headers['Referer'] ='https://www.expireddomains.net/domain-name-search/?start={}&q={}'.format((i-25),keyword) 459 | 460 | # If no keyword provided, generate list of recently expired domain URLs (batches of 25 results). 461 | else: 462 | print('[*] Fetching expired or deleted domains...') 463 | # Calculate number of URLs to request since we're performing a request for two different resources instead of one 464 | numresults = int(maxresults / 2) 465 | for i in range (0,(numresults),25): 466 | urls.append('https://www.expireddomains.net/backorder-expired-domains?start={}&ftlds[]=2&ftlds[]=3&ftlds[]=4&falexa={}'.format(i,alexa)) 467 | urls.append('https://www.expireddomains.net/deleted-com-domains/?start={}&ftlds[]=2&ftlds[]=3&ftlds[]=4&falexa={}'.format(i,alexa)) 468 | 469 | for url in urls: 470 | 471 | print("[*] {}".format(url)) 472 | 473 | # Annoyingly, when querying specific keywords, the expireddomains.net site requires additional cookies which 474 | # are set in JavaScript and not recognized by Requests, so we add them here manually. 475 | # May not be needed, but the _pk_id.10.dd0a cookie only requires a single .
to be successful 476 | # In order to somewhat match a real cookie, but still be different, random integers are introduced 477 | 478 | r1 = random.randint(100000,999999) 479 | 480 | # Known good example _pk_id.10.dd0a cookie: 5abbbc772cbacfb1.1496760705.2.1496760705.1496760705 481 | pk_str = '5abbbc772cbacfb1' + '.1496' + str(r1) + '.2.1496' + str(r1) + '.1496' + str(r1) 482 | 483 | jar = requests.cookies.RequestsCookieJar() 484 | jar.set('_pk_ses.10.dd0a', '*', domain='expireddomains.net', path='/') 485 | jar.set('_pk_id.10.dd0a', pk_str, domain='expireddomains.net', path='/') 486 | 487 | domainrequest = s.get(url,headers=headers,verify=False,cookies=jar) 488 | #domainrequest = s.get(url,headers=headers,verify=False,cookies=jar,proxies=proxies) 489 | 490 | domains = domainrequest.text 491 | 492 | # Turn the HTML into a Beautiful Soup object 493 | soup = BeautifulSoup(domains, 'lxml') 494 | #print(soup) 495 | try: 496 | table = soup.find("table") 497 | 498 | rows = table.findAll('tr')[1:] 499 | for row in table.findAll('tr')[1:]: 500 | 501 | # Alternative way to extract domain name 502 | # domain = row.find('td').find('a').text 503 | 504 | cells = row.findAll("td") 505 | 506 | if len(cells) >= 1: 507 | if keyword: 508 | 509 | c0 = row.find('td').find('a').text # domain 510 | c1 = cells[1].find(text=True) # bl 511 | c2 = cells[2].find(text=True) # domainpop 512 | c3 = cells[3].find(text=True) # birth 513 | c4 = cells[4].find(text=True) # Archive.org entries 514 | c5 = cells[5].find(text=True) # Alexa 515 | c6 = cells[6].find(text=True) # Dmoz.org 516 | c7 = cells[7].find(text=True) # status com 517 | c8 = cells[8].find(text=True) # status net 518 | c9 = cells[9].find(text=True) # status org 519 | c10 = cells[10].find(text=True) # status de 520 | c11 = cells[11].find(text=True) # TLDs 521 | c12 = cells[12].find(text=True) # RDT 522 | c13 = cells[13].find(text=True) # List 523 | c14 = cells[14].find(text=True) # Status 524 | c15 = "" # Links 525 | 526 | # create 
available TLD list 527 | available = '' 528 | if c7 == "available": 529 | available += ".com " 530 | 531 | if c8 == "available": 532 | available += ".net " 533 | 534 | if c9 == "available": 535 | available += ".org " 536 | 537 | if c10 == "available": 538 | available += ".de " 539 | 540 | # Only grab status for keyword searches since it doesn't exist otherwise 541 | status = "" 542 | if keyword: 543 | status = c14 544 | 545 | # Only add Expired, not Pending, Backorder, etc 546 | if c13 == "Expired": 547 | # Append parsed domain data to list if it matches our criteria (.com|.net|.org and not a known malware domain) 548 | if (c0.lower().endswith(".com") or c0.lower().endswith(".net") or c0.lower().endswith(".org")) and (c0 not in maldomainsList): 549 | domain_list.append([c0,c3,c4,available,status]) 550 | 551 | # Non-keyword search table format is slightly different 552 | else: 553 | 554 | c0 = cells[0].find(text=True) # domain 555 | c1 = cells[1].find(text=True) # bl 556 | c2 = cells[2].find(text=True) # domainpop 557 | c3 = cells[3].find(text=True) # birth 558 | c4 = cells[4].find(text=True) # Archive.org entries 559 | c5 = cells[5].find(text=True) # Alexa 560 | c6 = cells[6].find(text=True) # Dmoz.org 561 | c7 = cells[7].find(text=True) # status com 562 | c8 = cells[8].find(text=True) # status net 563 | c9 = cells[9].find(text=True) # status org 564 | c10 = cells[10].find(text=True) # status de 565 | c11 = cells[11].find(text=True) # TLDs 566 | c12 = cells[12].find(text=True) # RDT 567 | c13 = cells[13].find(text=True) # End Date 568 | c14 = cells[14].find(text=True) # Links 569 | 570 | # create available TLD list 571 | available = '' 572 | if c7 == "available": 573 | available += ".com " 574 | 575 | if c8 == "available": 576 | available += ".net " 577 | 578 | if c9 == "available": 579 | available += ".org " 580 | 581 | if c10 == "available": 582 | available += ".de " 583 | 584 | status = "" 585 | 586 | # Append original parsed domain data to list if it matches 
our criteria (.com|.net|.org and not a known malware domain) 587 | if (c0.lower().endswith(".com") or c0.lower().endswith(".net") or c0.lower().endswith(".org")) and (c0 not in maldomainsList): 588 | domain_list.append([c0,c3,c4,available,status]) 589 | 590 | except Exception as e: 591 | print("[!] Error: ", e) 592 | pass 593 | 594 | # Add additional sleep on requests to ExpiredDomains.net to avoid errors 595 | time.sleep(5) 596 | 597 | # Check for valid list results before continuing 598 | if len(domain_list) == 0: 599 | print("[-] No domain results found or none are currently available for purchase!") 600 | exit(0) 601 | else: 602 | domain_list_unique = [] 603 | [domain_list_unique.append(item) for item in domain_list if item not in domain_list_unique] 604 | 605 | # Print number of domains to perform reputation checks against 606 | if check: 607 | print("\n[*] Performing reputation checks for {} domains".format(len(domain_list_unique))) 608 | 609 | for domain_entry in domain_list_unique: 610 | domain = domain_entry[0] 611 | birthdate = domain_entry[1] 612 | archiveentries = domain_entry[2] 613 | availabletlds = domain_entry[3] 614 | status = domain_entry[4] 615 | bluecoat = '-' 616 | ibmxforce = '-' 617 | ciscotalos = '-' 618 | 619 | # Perform domain reputation checks 620 | if check: 621 | 622 | bluecoat = checkBluecoat(domain) 623 | print("[+] {}: {}".format(domain, bluecoat)) 624 | ibmxforce = checkIBMXForce(domain) 625 | print("[+] {}: {}".format(domain, ibmxforce)) 626 | ciscotalos = checkTalos(domain) 627 | print("[+] {}: {}".format(domain, ciscotalos)) 628 | print("") 629 | # Sleep to avoid captchas 630 | doSleep(timing) 631 | 632 | # Append entry to new list with reputation if at least one service reports reputation 633 | if not ((bluecoat in ('Uncategorized','badurl','Suspicious','Malicious Sources/Malnets','captcha','Phishing','Placeholders','Spam','error')) \ 634 | and (ibmxforce in ('Not found.','error')) and (ciscotalos in ('Uncategorized','error'))): 
635 | 
636 |         data.append([domain,birthdate,archiveentries,availabletlds,status,bluecoat,ibmxforce,ciscotalos])
637 | 
638 | # Sort domain list by column 2 (Birth Year)
639 | sortedDomains = sorted(data, key=lambda x: x[1], reverse=True)
640 | 
641 | if check:
642 |     if len(sortedDomains) == 0:
643 |         print("[-] No domains discovered with a desirable categorization!")
644 |         exit(0)
645 |     else:
646 |         print("[*] {} of {} domains discovered with a potentially desirable categorization!".format(len(sortedDomains),len(domain_list)))
647 | 
648 | # Build HTML Table
649 | html = ''
650 | htmlHeader = '<html><head><title>Expired Domain List</title></head>'
651 | htmlBody = '<body><p>The following available domains report was generated at {}</p>'.format(timestamp)
652 | htmlTableHeader = '''
653 | 
654 | <table border="1" align="center">
655 |     <th>Domain</th>
656 |     <th>Birth</th>
657 |     <th>Entries</th>
658 |     <th>TLDs Available</th>
659 |     <th>Status</th>
660 |     <th>BlueCoat</th>
661 |     <th>IBM X-Force</th>
662 |     <th>Cisco Talos</th>
663 |     <th>WatchGuard</th>
664 |     <th>Namecheap</th>
665 |     <th>Archive.org</th>
666 | '''
667 | 
668 | htmlTableBody = ''
669 | htmlTableFooter = '</table>'
670 | htmlFooter = '</body></html>'
671 | 
672 | # Build HTML table contents
673 | for i in sortedDomains:
674 |     htmlTableBody += '<tr>'
675 |     htmlTableBody += '<td>{}</td>'.format(i[0]) # Domain
676 |     htmlTableBody += '<td>{}</td>'.format(i[1]) # Birth
677 |     htmlTableBody += '<td>{}</td>'.format(i[2]) # Entries
678 |     htmlTableBody += '<td>{}</td>'.format(i[3]) # TLDs
679 |     htmlTableBody += '<td>{}</td>'.format(i[4]) # Status
680 | 
681 |     htmlTableBody += '<td>{}</td>'.format(i[5]) # Bluecoat
682 |     htmlTableBody += '<td><a href="https://exchange.xforce.ibmcloud.com/url/{}" target="_blank">{}</a></td>'.format(i[0],i[6]) # IBM x-Force Categorization
683 |     htmlTableBody += '<td><a href="https://www.talosintelligence.com/reputation_center/lookup?search={}" target="_blank">{}</a></td>'.format(i[0],i[7]) # Cisco Talos
684 |     htmlTableBody += '<td><a href="http://www.borderware.com/domain_lookup.php?ip={}" target="_blank">WatchGuard</a></td>'.format(i[0]) # Borderware WatchGuard
685 |     htmlTableBody += '<td><a href="https://www.namecheap.com/domains/registration/results.aspx?domain={}" target="_blank">Namecheap</a></td>'.format(i[0]) # Namecheap
686 |     htmlTableBody += '<td><a href="http://web.archive.org/web/*/{}" target="_blank">Archive.org</a></td>'.format(i[0]) # Archive.org
687 |     htmlTableBody += '</tr>'
688 | 
689 | html = htmlHeader + htmlBody + htmlTableHeader + htmlTableBody + htmlTableFooter + htmlFooter
690 | 
691 | logfilename = "{}_domainreport.html".format(timestamp)
692 | log = open(logfilename,'w')
693 | log.write(html)
694 | log.close()
695 | 
696 | print("\n[*] Search complete")
697 | print("[*] Log written to {}\n".format(logfilename))
698 | 
699 | # Print Text Table
700 | header = ['Domain', 'Birth', '#', 'TLDs', 'Status', 'BlueCoat', 'IBM', 'Cisco Talos']
701 | print(drawTable(header,sortedDomains))
702 | 
--------------------------------------------------------------------------------
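The forged visitor cookie built around line 481 of `domainhunter.py` follows a fixed shape. As a minimal standalone sketch (standard library only, no request is made), the value can be constructed and inspected in isolation:

```python
import random

# Sketch of the script's Matomo/Piwik-style "_pk_id" cookie value:
# a fixed 16-hex-character visitor id followed by timestamp-like fields
# that all reuse a single random six-digit suffix.
r1 = random.randint(100000, 999999)
pk_str = '5abbbc772cbacfb1' + '.1496' + str(r1) + '.2.1496' + str(r1) + '.1496' + str(r1)

# Split into its dotted fields: [visitor_id, ts, visit_count, ts, ts]
parts = pk_str.split('.')
print(parts)
```

In the script itself this value is staged in a `requests.cookies.RequestsCookieJar` and passed to the session request via `cookies=jar`.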
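The script deduplicates scraped rows at lines 602-603 with an order-preserving membership-testing list comprehension. The same idiom is shown below against illustrative sample rows (the domains and field values are made up for the example):

```python
# Sample scraped rows: [domain, birth, archive entries, available TLDs, status]
domain_list = [
    ['example.com', '2005', '12', '.net ', ''],
    ['old-site.org', '2001', '4', '', ''],
    ['example.com', '2005', '12', '.net ', ''],  # duplicate row
]

# Append each row only if an identical row hasn't been seen yet,
# preserving first-seen order (the comprehension is used for its side effect).
domain_list_unique = []
[domain_list_unique.append(item) for item in domain_list if item not in domain_list_unique]

print(domain_list_unique)
```

Because rows are unhashable lists, a `set`-based dedup would not apply directly; the membership test keeps the row lists as-is at the cost of O(n²) comparisons, which is fine at these result sizes.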