├── requirements.txt ├── .gitignore ├── dockerfile ├── LICENSE ├── README.md └── domainhunter.py /requirements.txt: -------------------------------------------------------------------------------- 1 | requests==2.13.0 2 | texttable==0.8.7 3 | beautifulsoup4==4.5.3 4 | lxml 5 | pillow==5.0.0 6 | pytesseract 7 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.html 2 | *.txt 3 | *.jpg 4 | Pipfile* 5 | 6 | .vscode/* 7 | !.vscode/settings.json 8 | !.vscode/tasks.json 9 | !.vscode/launch.json 10 | !.vscode/extensions.json 11 | -------------------------------------------------------------------------------- /dockerfile: -------------------------------------------------------------------------------- 1 | #build it: 2 | #docker build -t domainhunter:1.0 . 3 | #run it: 4 | #docker run -it domainhunter:1.0 [args] 5 | 6 | FROM ubuntu:16.04 7 | 8 | RUN apt-get update \ 9 | && apt-get install python3-pip -y\ 10 | && apt-get install tesseract-ocr -y\ 11 | && apt-get install python3-pil -y 12 | 13 | ADD domainhunter.py / 14 | ADD requirements.txt / 15 | 16 | RUN pip3 install -r requirements.txt 17 | 18 | ENTRYPOINT [ "python3", "./domainhunter.py" ] 19 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2017, Joe Vest, Andrew Chiles 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | * Redistributions of source code must retain the above copyright 7 | notice, this list of conditions and the following disclaimer. 
8 | * Redistributions in binary form must reproduce the above copyright 9 | notice, this list of conditions and the following disclaimer in the 10 | documentation and/or other materials provided with the distribution. 11 | * Neither the name of the Domainhunter nor the 12 | names of its contributors may be used to endorse or promote products 13 | derived from this software without specific prior written permission. 14 | 15 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 16 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 17 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 18 | DISCLAIMED. IN NO EVENT SHALL COPYRIGHT HOLDER BE LIABLE FOR ANY 19 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 20 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 21 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND 22 | ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 23 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 24 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Domain Hunter 2 | 3 | Authors Joe Vest (@joevest) & Andrew Chiles (@andrewchiles) 4 | 5 | Domain name selection is an important aspect of preparation for penetration tests and especially Red Team engagements. Commonly, domains that were used previously for benign purposes and were properly categorized can be purchased for only a few dollars. Such domains can allow a team to bypass reputation based web filters and network egress restrictions for phishing and C2 related tasks. 
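
One inexpensive pre-screen this tool performs before reporting candidates (see `downloadMalwareDomains` in `domainhunter.py` below) is testing each domain against a public malware-domain blocklist, such as the one-domain-per-line list served at http://mirror1.malwaredomains.com/files/justdomains. A minimal, self-contained sketch of that membership check, operating on already-downloaded list text rather than a live HTTP fetch:

```python
def load_blocklist(text):
    """Parse a one-domain-per-line blocklist into a set for fast membership tests."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}

def is_known_malware(domain, blocklist):
    """Return True if the candidate domain appears on the blocklist."""
    return domain.strip().lower() in blocklist

# Example with inline list text; the real tool fetches the list over HTTP.
sample = "badsite.com\nevil.org\n"
blocklist = load_blocklist(sample)
print(is_known_malware("BadSite.com", blocklist))          # True
print(is_known_malware("doginmysuitcase.com", blocklist))  # False
```

Normalizing to lowercase and using a set keeps each lookup O(1), which matters when screening hundreds of scraped domains against a list with tens of thousands of entries.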
6 | 7 | This Python-based tool was written to quickly query the Expireddomains.net search engine for expired/available domains with a previous history of use. It then optionally queries for domain reputation against services like Symantec WebPulse (BlueCoat), IBM X-Force, and Cisco Talos. The primary tool output is a timestamped HTML table-style report. 8 | 9 | ## Changes 10 | 11 | - 5 October 2018 12 | + Fixed logic for filtering domains with desirable categorizations. Previously, some error conditions weren't filtered and would result in domains without a valid categorization making it into the final list. 13 | 14 | - 4 October 2018 15 | + Tweaked parsing logic 16 | + Fixed parsed column indexes to match site changes 17 | 18 | - 17 September 2018 19 | + Fixed Symantec WebPulse Site Review parsing errors caused by service updates 20 | 21 | - 18 May 2018 22 | + Added --alexa switch to control Alexa-ranked site filtering 23 | 24 | - 16 May 2018 25 | + Updated queries to increase the probability of quickly finding a domain available for instant purchase. Previously, many reported domains had an "In Auction" or "Make an Offer" status. New criteria: .com|.net|.org + Alexa Ranked + Available for Purchase 26 | + Improved logic to filter out uncategorized and some potentially undesirable domain categorizations in the final text table and HTML output 27 | + Removed unnecessary columns from HTML report 28 | 29 | - 6 May 2018 30 | + Fixed expired domains parsing when performing a keyword search 31 | + Minor HTML and text table output updates 32 | + Filtered reputation checks to only execute for .COM, .ORG, and .NET domains and removed check for Archive.org records when performing a default or keyword search. Credit to @christruncer for the original PR and idea. 33 | 34 | - 11 April 2018 35 | + Added OCR support for CAPTCHA solving with tesseract.
Thanks to t94j0 for the idea in [AIRMASTER](https://github.com/t94j0/AIRMASTER) 36 | + Added support for input file list of potential domains (-f/--filename) 37 | + Changed -q/--query switch to -k/--keyword to better match its purpose 38 | + Added additional error checking for ExpiredDomains.net parsing 39 | 40 | - 9 April 2018 41 | + Added -t switch for timing control. -t <1-5> 42 | + Added Google SafeBrowsing and PhishTank reputation checks 43 | + Fixed bug in IBMXForce response parsing 44 | 45 | - 7 April 2018 46 | + Fixed support for Symantec WebPulse Site Review (formerly Blue Coat WebFilter) 47 | + Added Cisco Talos Domain Reputation check 48 | + Added feature to perform a reputation check against a single non-expired domain. This is useful when monitoring reputation for domains used in ongoing campaigns and engagements. 49 | 50 | - 6 June 2017 51 | + Added Python 3 support 52 | + Code cleanup and bug fixes 53 | + Added Status column (Available, Make Offer, Price, Backorder, etc.) 54 | 55 | ## Features 56 | 57 | - Retrieve a specified number of recently expired and deleted domains (.com, .net, .org) from ExpiredDomains.net 58 | - Retrieve available domains based on a keyword search from ExpiredDomains.net 59 | - Perform reputation checks against the Symantec WebPulse Site Review (BlueCoat), IBM X-Force, Cisco Talos, Google SafeBrowsing, and PhishTank services 60 | - Sort results by domain age (if known) and filter for reputation 61 | - Text-based table and HTML report output with links to reputation sources and Archive.org entry 62 | 63 | ## Installation 64 | 65 | Install Python requirements 66 | 67 | pip3 install -r requirements.txt 68 | 69 | Optional - Install additional OCR support dependencies 70 | 71 | - Debian/Ubuntu: `apt-get install tesseract-ocr python3-pil` 72 | 73 | - macOS: `brew install tesseract` 74 | 75 | ## Usage 76 | 77 | usage: domainhunter.py [-h] [-a] [-k KEYWORD] [-c] [-f FILENAME] [--ocr] 78 | [-r MAXRESULTS] [-s SINGLE] [-t
{0,1,2,3,4,5}] 79 | [-w MAXWIDTH] [-V] 80 | 81 | Finds expired domains, domain categorization, and Archive.org history to determine good candidates for C2 and phishing domains 82 | 83 | optional arguments: 84 | -h, --help show this help message and exit 85 | -a, --alexa Filter results to Alexa listings 86 | -k KEYWORD, --keyword KEYWORD 87 | Keyword used to refine search results 88 | -c, --check Perform domain reputation checks 89 | -f FILENAME, --filename FILENAME 90 | Specify input file of line delimited domain names to 91 | check 92 | --ocr Perform OCR on CAPTCHAs when challenged 93 | -r MAXRESULTS, --maxresults MAXRESULTS 94 | Number of results to return when querying latest 95 | expired/deleted domains 96 | -s SINGLE, --single SINGLE 97 | Performs detailed reputation checks against a single 98 | domain name/IP. 99 | -t {0,1,2,3,4,5}, --timing {0,1,2,3,4,5} 100 | Modifies request timing to avoid CAPTCHAs. Slowest(0) 101 | = 90-120 seconds, Default(3) = 10-20 seconds, 102 | Fastest(5) = no delay 103 | -w MAXWIDTH, --maxwidth MAXWIDTH 104 | Width of text table 105 | -V, --version show program's version number and exit 106 | 107 | Examples: 108 | ./domainhunter.py -k apples -c --ocr -t5 109 | ./domainhunter.py --check --ocr -t3 110 | ./domainhunter.py --single mydomain.com 111 | ./domainhunter.py --keyword tech --check --ocr --timing 5 --alexa 112 | ./domainhunter.py --filename inputlist.txt --ocr --timing 5 113 | 114 | Use defaults to retrieve the 100 most recently expired/deleted domains 115 | 116 | python3 ./domainhunter.py 117 | 118 | Search for the 1000 most recently expired/deleted domains, but don't check reputation 119 | 120 | python3 ./domainhunter.py -r 1000 121 | 122 | Perform all reputation checks for a single domain 123 | 124 | python3 ./domainhunter.py -s mydomain.com 125 | 126 | [*] Downloading malware domain list from http://mirror1.malwaredomains.com/files/justdomains 127 | 128 | [*] Fetching domain reputation for: mydomain.com 129 | [*] Google
SafeBrowsing and PhishTank: mydomain.com 130 | [+] mydomain.com: No issues found 131 | [*] BlueCoat: mydomain.com 132 | [+] mydomain.com: Technology/Internet 133 | [*] IBM xForce: mydomain.com 134 | [+] mydomain.com: Communication Services, Software as a Service, Cloud, (Score: 1) 135 | [*] Cisco Talos: mydomain.com 136 | [+] mydomain.com: Web Hosting (Score: Neutral) 137 | 138 | Perform all reputation checks for a list of domains at max speed with OCR of CAPTCHAs 139 | 140 | python3 ./domainhunter.py -f -t 5 --ocr 141 | 142 | Search for available domains with keyword term of "dog", max results of 25, and check reputation 143 | 144 | python3 ./domainhunter.py -k dog -r 25 -c 145 | 146 | ____ ___ __ __ _ ___ _ _ _ _ _ _ _ _ _____ _____ ____ 147 | | _ \ / _ \| \/ | / \ |_ _| \ | | | | | | | | | \ | |_ _| ____| _ \ 148 | | | | | | | | |\/| | / _ \ | || \| | | |_| | | | | \| | | | | _| | |_) | 149 | | |_| | |_| | | | |/ ___ \ | || |\ | | _ | |_| | |\ | | | | |___| _ < 150 | |____/ \___/|_| |_/_/ \_\___|_| \_| |_| |_|\___/|_| \_| |_| |_____|_| \_\ 151 | 152 | Expired Domains Reputation Checker 153 | Authors: @joevest and @andrewchiles 154 | 155 | DISCLAIMER: This is for educational purposes only! 156 | It is designed to promote education and the improvement of computer/cyber security. 157 | The authors or employers are not liable for any illegal act or misuse performed by any user of this tool. 158 | If you plan to use this content for illegal purpose, don't. Have a nice day :) 159 | 160 | [*] Downloading malware domain list from http://mirror1.malwaredomains.com/files/justdomains 161 | 162 | [*] Fetching expired or deleted domains containing "dog" 163 | [*] https://www.expireddomains.net/domain-name-search/?q=dog 164 | 165 | [*] Performing domain reputation checks for 8 domains. 166 | [*] BlueCoat: doginmysuitcase.com 167 | [+] doginmysuitcase.com: Travel 168 | [*] IBM xForce: doginmysuitcase.com 169 | [+] doginmysuitcase.com: Not found. 
170 | [*] Cisco Talos: doginmysuitcase.com 171 | [+] doginmysuitcase.com: Uncategorized 172 | -------------------------------------------------------------------------------- /domainhunter.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | ## Title: domainhunter.py 4 | ## Author: @joevest and @andrewchiles 5 | ## Description: Checks expired domains, reputation/categorization, and Archive.org history to determine 6 | ## good candidates for phishing and C2 domain names 7 | 8 | # If the expected response format from a provider changes, use the traceback module to get a full stack trace without removing try/catch blocks 9 | #import traceback 10 | #traceback.print_exc() 11 | 12 | import time 13 | import random 14 | import argparse 15 | import json 16 | import base64 17 | import os 18 | 19 | __version__ = "20181005" 20 | 21 | ## Functions 22 | 23 | def doSleep(timing): 24 | if timing == 0: 25 | time.sleep(random.randrange(90,120)) 26 | elif timing == 1: 27 | time.sleep(random.randrange(60,90)) 28 | elif timing == 2: 29 | time.sleep(random.randrange(30,60)) 30 | elif timing == 3: 31 | time.sleep(random.randrange(10,20)) 32 | elif timing == 4: 33 | time.sleep(random.randrange(5,10)) 34 | # There's no elif timing == 5 here because we don't want to sleep for -t 5 35 | 36 | def checkBluecoat(domain): 37 | try: 38 | url = 'https://sitereview.bluecoat.com/resource/lookup' 39 | postData = {'url':domain,'captcha':''} 40 | headers = {'User-Agent':useragent, 41 | 'Accept':'application/json, text/plain, */*', 42 | 'Content-Type':'application/json; charset=UTF-8', 43 | 'Referer':'https://sitereview.bluecoat.com/lookup'} 44 | 45 | print('[*] BlueCoat: {}'.format(domain)) 46 | response = s.post(url,headers=headers,json=postData,verify=False) 47 | responseJSON = json.loads(response.text) 48 | 49 | if 'errorType' in responseJSON: 50 | a = responseJSON['errorType'] 51 | else: 52 | a = 
responseJSON['categorization'][0]['name'] 53 | 54 | # Print notice if CAPTCHAs are blocking accurate results and attempt to solve if --ocr 55 | if a == 'captcha': 56 | if ocr: 57 | # This request is also performed by a browser, but is not needed for our purposes 58 | #captcharequestURL = 'https://sitereview.bluecoat.com/resource/captcha-request' 59 | 60 | print('[*] Received CAPTCHA challenge!') 61 | captcha = solveCaptcha('https://sitereview.bluecoat.com/resource/captcha.jpg',s) 62 | 63 | if captcha: 64 | b64captcha = base64.urlsafe_b64encode(captcha.encode('utf-8')).decode('utf-8') 65 | 66 | # Send CAPTCHA solution via GET since inclusion with the domain categorization request doesn't work anymore 67 | captchasolutionURL = 'https://sitereview.bluecoat.com/resource/captcha-request/{0}'.format(b64captcha) 68 | print('[*] Submitting CAPTCHA at {0}'.format(captchasolutionURL)) 69 | response = s.get(url=captchasolutionURL,headers=headers,verify=False) 70 | 71 | # Try the categorization request again 72 | response = s.post(url,headers=headers,json=postData,verify=False) 73 | 74 | responseJSON = json.loads(response.text) 75 | 76 | if 'errorType' in responseJSON: 77 | a = responseJSON['errorType'] 78 | else: 79 | a = responseJSON['categorization'][0]['name'] 80 | else: 81 | print('[-] Error: Failed to solve BlueCoat CAPTCHA with OCR! Manually solve at "https://sitereview.bluecoat.com/sitereview.jsp"') 82 | else: 83 | print('[-] Error: BlueCoat CAPTCHA received. Try --ocr flag or manually solve a CAPTCHA at "https://sitereview.bluecoat.com/sitereview.jsp"') 84 | 85 | return a 86 | 87 | except Exception as e: 88 | print('[-] Error retrieving Bluecoat reputation!
{0}'.format(e)) 89 | return "error" 90 | 91 | def checkIBMXForce(domain): 92 | try: 93 | url = 'https://exchange.xforce.ibmcloud.com/url/{}'.format(domain) 94 | headers = {'User-Agent':useragent, 95 | 'Accept':'application/json, text/plain, */*', 96 | 'x-ui':'XFE', 97 | 'Origin':url, 98 | 'Referer':url} 99 | 100 | print('[*] IBM xForce: {}'.format(domain)) 101 | 102 | url = 'https://api.xforce.ibmcloud.com/url/{}'.format(domain) 103 | response = s.get(url,headers=headers,verify=False) 104 | 105 | responseJSON = json.loads(response.text) 106 | 107 | if 'error' in responseJSON: 108 | a = responseJSON['error'] 109 | 110 | elif not responseJSON['result']['cats']: 111 | a = 'Uncategorized' 112 | 113 | ## TODO: Add a notice when the "intrusion" category is returned. This is an indication that the endpoint's rate-limit / brute-force protection was hit 114 | 115 | else: 116 | categories = '' 117 | # Parse all dictionary keys and append to single string to get Category names 118 | for key in responseJSON["result"]['cats']: 119 | categories += '{0}, '.format(str(key)) 120 | 121 | a = '{0}(Score: {1})'.format(categories,str(responseJSON['result']['score'])) 122 | 123 | return a 124 | 125 | except Exception as e: 126 | print('[-] Error retrieving IBM-Xforce reputation!
{0}'.format(e)) 127 | return "error" 128 | 129 | def checkTalos(domain): 130 | url = 'https://www.talosintelligence.com/sb_api/query_lookup?query=%2Fapi%2Fv2%2Fdetails%2Fdomain%2F&query_entry={0}&offset=0&order=ip+asc'.format(domain) 131 | headers = {'User-Agent':useragent, 132 | 'Referer':url} 133 | 134 | print('[*] Cisco Talos: {}'.format(domain)) 135 | try: 136 | response = s.get(url,headers=headers,verify=False) 137 | 138 | responseJSON = json.loads(response.text) 139 | 140 | if 'error' in responseJSON: 141 | a = str(responseJSON['error']) 142 | if a == "Unfortunately, we can't find any results for your search.": 143 | a = 'Uncategorized' 144 | 145 | elif responseJSON['category'] is None: 146 | a = 'Uncategorized' 147 | 148 | else: 149 | a = '{0} (Score: {1})'.format(str(responseJSON['category']['description']), str(responseJSON['web_score_name'])) 150 | 151 | return a 152 | 153 | except Exception as e: 154 | print('[-] Error retrieving Talos reputation! {0}'.format(e)) 155 | return "error" 156 | 157 | def checkMXToolbox(domain): 158 | url = 'https://mxtoolbox.com/Public/Tools/BrandReputation.aspx' 159 | headers = {'User-Agent':useragent, 160 | 'Origin':url, 161 | 'Referer':url} 162 | 163 | print('[*] Google SafeBrowsing and PhishTank: {}'.format(domain)) 164 | 165 | try: 166 | response = s.get(url=url, headers=headers) 167 | 168 | soup = BeautifulSoup(response.content,'lxml') 169 | 170 | viewstate = soup.select('input[name=__VIEWSTATE]')[0]['value'] 171 | viewstategenerator = soup.select('input[name=__VIEWSTATEGENERATOR]')[0]['value'] 172 | eventvalidation = soup.select('input[name=__EVENTVALIDATION]')[0]['value'] 173 | 174 | data = { 175 | "__EVENTTARGET": "", 176 | "__EVENTARGUMENT": "", 177 | "__VIEWSTATE": viewstate, 178 | "__VIEWSTATEGENERATOR": viewstategenerator, 179 | "__EVENTVALIDATION": eventvalidation, 180 | "ctl00$ContentPlaceHolder1$brandReputationUrl": domain, 181 | "ctl00$ContentPlaceHolder1$brandReputationDoLookup": "Brand Reputation Lookup", 
182 | "ctl00$ucSignIn$hfRegCode": 'missing', 183 | "ctl00$ucSignIn$hfRedirectSignUp": '/Public/Tools/BrandReputation.aspx', 184 | "ctl00$ucSignIn$hfRedirectLogin": '', 185 | "ctl00$ucSignIn$txtEmailAddress": '', 186 | "ctl00$ucSignIn$cbNewAccount": 'cbNewAccount', 187 | "ctl00$ucSignIn$txtFullName": '', 188 | "ctl00$ucSignIn$txtModalNewPassword": '', 189 | "ctl00$ucSignIn$txtPhone": '', 190 | "ctl00$ucSignIn$txtCompanyName": '', 191 | "ctl00$ucSignIn$drpTitle": '', 192 | "ctl00$ucSignIn$txtTitleName": '', 193 | "ctl00$ucSignIn$txtModalPassword": '' 194 | } 195 | 196 | response = s.post(url=url, headers=headers, data=data) 197 | 198 | soup = BeautifulSoup(response.content,'lxml') 199 | 200 | a = '' 201 | if soup.select('div[id=ctl00_ContentPlaceHolder1_noIssuesFound]'): 202 | a = 'No issues found' 203 | return a 204 | else: 205 | if soup.select('div[id=ctl00_ContentPlaceHolder1_googleSafeBrowsingIssuesFound]'): 206 | a = 'Google SafeBrowsing Issues Found. ' 207 | 208 | if soup.select('div[id=ctl00_ContentPlaceHolder1_phishTankIssuesFound]'): 209 | a += 'PhishTank Issues Found' 210 | return a 211 | 212 | except Exception as e: 213 | print('[-] Error retrieving Google SafeBrowsing and PhishTank reputation! {0}'.format(e)) 214 | return "error" 215 | 216 | def downloadMalwareDomains(malwaredomainsURL): 217 | url = malwaredomainsURL 218 | response = s.get(url=url,headers=headers,verify=False) 219 | responseText = response.text 220 | if response.status_code == 200: 221 | return responseText 222 | else: 223 | print("[-] Error reaching: {} Status: {}".format(url, response.status_code)) 224 | 225 | def checkDomain(domain): 226 | print('[*] Fetching domain reputation for: {}'.format(domain)) 227 | 228 | if domain in maldomainsList: 229 | print("[!]
{}: Identified as known malware domain (malwaredomains.com)".format(domain)) 230 | 231 | bluecoat = checkBluecoat(domain) 232 | print("[+] {}: {}".format(domain, bluecoat)) 233 | 234 | ibmxforce = checkIBMXForce(domain) 235 | print("[+] {}: {}".format(domain, ibmxforce)) 236 | 237 | ciscotalos = checkTalos(domain) 238 | print("[+] {}: {}".format(domain, ciscotalos)) 239 | 240 | mxtoolbox = checkMXToolbox(domain) 241 | print("[+] {}: {}".format(domain, mxtoolbox)) 242 | 243 | print("") 244 | 245 | results = [domain,bluecoat,ibmxforce,ciscotalos,mxtoolbox] 246 | return results 247 | 248 | def solveCaptcha(url,session): 249 | # Downloads CAPTCHA image and saves to current directory for OCR with tesseract 250 | # Returns CAPTCHA string or False if an error occurred 251 | 252 | jpeg = 'captcha.jpg' 253 | 254 | try: 255 | response = session.get(url=url,headers=headers,verify=False, stream=True) 256 | if response.status_code == 200: 257 | with open(jpeg, 'wb') as f: 258 | response.raw.decode_content = True 259 | shutil.copyfileobj(response.raw, f) 260 | else: 261 | print('[-] Error downloading CAPTCHA file!') 262 | return False 263 | 264 | # Perform basic OCR without additional image enhancement 265 | text = pytesseract.image_to_string(Image.open(jpeg)) 266 | text = text.replace(" ", "") 267 | 268 | # Remove CAPTCHA file 269 | try: 270 | os.remove(jpeg) 271 | except OSError: 272 | pass 273 | 274 | return text 275 | 276 | except Exception as e: 277 | print("[-] Error solving CAPTCHA - {0}".format(e)) 278 | 279 | return False 280 | 281 | def drawTable(header,data): 282 | 283 | data.insert(0,header) 284 | t = Texttable(max_width=maxwidth) 285 | t.add_rows(data) 286 | t.header(header) 287 | 288 | return(t.draw()) 289 | 290 | ## MAIN 291 | if __name__ == "__main__": 292 | 293 | 294 | parser = argparse.ArgumentParser( 295 | description='Finds expired domains, domain categorization, and Archive.org history to determine good candidates for C2 and phishing domains', 296 | epilog = '''
297 | Examples: 298 | ./domainhunter.py -k apples -c --ocr -t5 299 | ./domainhunter.py --check --ocr -t3 300 | ./domainhunter.py --single mydomain.com 301 | ./domainhunter.py --keyword tech --check --ocr --timing 5 --alexa 302 | ./domainhunter.py --filename inputlist.txt --ocr --timing 5''', 303 | formatter_class=argparse.RawDescriptionHelpFormatter) 304 | 305 | parser.add_argument('-a','--alexa', help='Filter results to Alexa listings', required=False, default=0, action='store_const', const=1) 306 | parser.add_argument('-k','--keyword', help='Keyword used to refine search results', required=False, default=False, type=str, dest='keyword') 307 | parser.add_argument('-c','--check', help='Perform domain reputation checks', required=False, default=False, action='store_true', dest='check') 308 | parser.add_argument('-f','--filename', help='Specify input file of line delimited domain names to check', required=False, default=False, type=str, dest='filename') 309 | parser.add_argument('--ocr', help='Perform OCR on CAPTCHAs when challenged', required=False, default=False, action='store_true') 310 | parser.add_argument('-r','--maxresults', help='Number of results to return when querying latest expired/deleted domains', required=False, default=100, type=int, dest='maxresults') 311 | parser.add_argument('-s','--single', help='Performs detailed reputation checks against a single domain name/IP.', required=False, default=False, dest='single') 312 | parser.add_argument('-t','--timing', help='Modifies request timing to avoid CAPTCHAs.
Slowest(0) = 90-120 seconds, Default(3) = 10-20 seconds, Fastest(5) = no delay', required=False, default=3, type=int, choices=range(0,6), dest='timing') 313 | parser.add_argument('-w','--maxwidth', help='Width of text table', required=False, default=400, type=int, dest='maxwidth') 314 | parser.add_argument('-V','--version', action='version',version='%(prog)s {version}'.format(version=__version__)) 315 | args = parser.parse_args() 316 | 317 | # Load dependent modules 318 | try: 319 | import requests 320 | from bs4 import BeautifulSoup 321 | from texttable import Texttable 322 | 323 | except Exception as e: 324 | print("Expired Domains Reputation Check") 325 | print("[-] Missing basic dependencies: {}".format(str(e))) 326 | print("[*] Install required dependencies by running `pip3 install -r requirements.txt`") 327 | quit(0) 328 | 329 | # Load OCR related modules if --ocr flag is set since these can be difficult to get working 330 | if args.ocr: 331 | try: 332 | import pytesseract 333 | from PIL import Image 334 | import shutil 335 | except Exception as e: 336 | print("Expired Domains Reputation Check") 337 | print("[-] Missing OCR dependencies: {}".format(str(e))) 338 | print("[*] Install required Python dependencies by running: pip3 install -r requirements.txt") 339 | print("[*] Ubuntu/Debian - Install tesseract by running: apt-get install tesseract-ocr python3-pil") 340 | print("[*] macOS - Install tesseract with homebrew by running: brew install tesseract") 341 | quit(0) 342 | 343 | ## Variables 344 | 345 | alexa = args.alexa 346 | 347 | keyword = args.keyword 348 | 349 | check = args.check 350 | 351 | filename = args.filename 352 | 353 | maxresults = args.maxresults 354 | 355 | single = args.single 356 | 357 | timing = args.timing 358 | 359 | maxwidth = args.maxwidth 360 | 361 | ocr = args.ocr 362 | 363 | malwaredomainsURL = 'http://mirror1.malwaredomains.com/files/justdomains' 364 | 365 | expireddomainsqueryURL =
'https://www.expireddomains.net/domain-name-search' 366 | 367 | timestamp = time.strftime("%Y%m%d_%H%M%S") 368 | 369 | useragent = 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)' 370 | 371 | headers = {'User-Agent':useragent} 372 | 373 | requests.packages.urllib3.disable_warnings() 374 | 375 | # HTTP Session container, used to manage cookies, session tokens and other session information 376 | s = requests.Session() 377 | 378 | title = ''' 379 | ____ ___ __ __ _ ___ _ _ _ _ _ _ _ _ _____ _____ ____ 380 | | _ \ / _ \| \/ | / \ |_ _| \ | | | | | | | | | \ | |_ _| ____| _ \ 381 | | | | | | | | |\/| | / _ \ | || \| | | |_| | | | | \| | | | | _| | |_) | 382 | | |_| | |_| | | | |/ ___ \ | || |\ | | _ | |_| | |\ | | | | |___| _ < 383 | |____/ \___/|_| |_/_/ \_\___|_| \_| |_| |_|\___/|_| \_| |_| |_____|_| \_\ ''' 384 | 385 | print(title) 386 | print("") 387 | print("Expired Domains Reputation Checker") 388 | print("Authors: @joevest and @andrewchiles\n") 389 | print("DISCLAIMER: This is for educational purposes only!") 390 | disclaimer = '''It is designed to promote education and the improvement of computer/cyber security. 391 | The authors or employers are not liable for any illegal act or misuse performed by any user of this tool. 392 | If you plan to use this content for illegal purpose, don't. Have a nice day :)''' 393 | print(disclaimer) 394 | print("") 395 | 396 | # Download known malware domains 397 | print('[*] Downloading malware domain list from {}\n'.format(malwaredomainsURL)) 398 | maldomains = downloadMalwareDomains(malwaredomainsURL) 399 | maldomainsList = maldomains.split("\n") 400 | 401 | # Retrieve reputation for a single choosen domain (Quick Mode) 402 | if single: 403 | checkDomain(single) 404 | exit(0) 405 | 406 | # Perform detailed domain reputation checks against input file, print table, and quit. 
This does not generate an HTML report 407 | if filename: 408 | # Initialize our list with an empty row for the header 409 | data = [] 410 | try: 411 | with open(filename, 'r') as domainsList: 412 | for line in domainsList.read().splitlines(): 413 | data.append(checkDomain(line)) 414 | doSleep(timing) 415 | 416 | # Print results table 417 | header = ['Domain', 'BlueCoat', 'IBM X-Force', 'Cisco Talos', 'MXToolbox'] 418 | print(drawTable(header,data)) 419 | 420 | except KeyboardInterrupt: 421 | print('Caught keyboard interrupt. Exiting!') 422 | exit(0) 423 | except Exception as e: 424 | print('[-] Error: {}'.format(e)) 425 | exit(1) 426 | exit(0) 427 | 428 | # Generic Proxy support 429 | # TODO: add as a parameter 430 | proxies = { 431 | 'http': 'http://127.0.0.1:8080', 432 | 'https': 'http://127.0.0.1:8080', 433 | } 434 | 435 | # Create an initial session 436 | domainrequest = s.get("https://www.expireddomains.net",headers=headers,verify=False) 437 | 438 | # Use proxy like Burp for debugging request/parsing errors 439 | #domainrequest = s.get("https://www.expireddomains.net",headers=headers,verify=False,proxies=proxies) 440 | 441 | # Lists for our ExpiredDomains results 442 | domain_list = [] 443 | data = [] 444 | 445 | # Generate list of URLs to query for expired/deleted domains 446 | urls = [] 447 | 448 | # Use the keyword string to narrow domain search if provided. 
This generates a list of URLs to query 449 | 450 | if keyword: 451 | print('[*] Fetching expired or deleted domains containing "{}"'.format(keyword)) 452 | for i in range (0,maxresults,25): 453 | if i == 0: 454 | urls.append("{}/?q={}&fwhois=22&ftlds[]=2&ftlds[]=3&ftlds[]=4&falexa={}".format(expireddomainsqueryURL,keyword,alexa)) 455 | headers['Referer'] ='https://www.expireddomains.net/domain-name-search/?q={}&start=1'.format(keyword) 456 | else: 457 | urls.append("{}/?start={}&q={}&ftlds[]=2&ftlds[]=3&ftlds[]=4&fwhois=22&falexa={}".format(expireddomainsqueryURL,i,keyword,alexa)) 458 | headers['Referer'] ='https://www.expireddomains.net/domain-name-search/?start={}&q={}'.format((i-25),keyword) 459 | 460 | # If no keyword provided, generate list of recently expired domain URLs (batches of 25 results). 461 | else: 462 | print('[*] Fetching expired or deleted domains...') 463 | # Calculate number of URLs to request since we're performing a request for two different resources instead of one 464 | numresults = int(maxresults / 2) 465 | for i in range (0,(numresults),25): 466 | urls.append('https://www.expireddomains.net/backorder-expired-domains?start={}&ftlds[]=2&ftlds[]=3&ftlds[]=4&falexa={}'.format(i,alexa)) 467 | urls.append('https://www.expireddomains.net/deleted-com-domains/?start={}&ftlds[]=2&ftlds[]=3&ftlds[]=4&falexa={}'.format(i,alexa)) 468 | 469 | for url in urls: 470 | 471 | print("[*] {}".format(url)) 472 | 473 | # Annoyingly, when querying specific keywords, the expireddomains.net site requires additional cookies which 474 | # are set in JavaScript and not recognized by Requests, so we add them here manually. 475 | # May not be needed, but the _pk_id.10.dd0a cookie only requires a single .
to be successful 476 | # In order to somewhat match a real cookie, but still be different, random integers are introduced 477 | 478 | r1 = random.randint(100000,999999) 479 | 480 | # Known good example _pk_id.10.dd0a cookie: 5abbbc772cbacfb1.1496760705.2.1496760705.1496760705 481 | pk_str = '5abbbc772cbacfb1' + '.1496' + str(r1) + '.2.1496' + str(r1) + '.1496' + str(r1) 482 | 483 | jar = requests.cookies.RequestsCookieJar() 484 | jar.set('_pk_ses.10.dd0a', '*', domain='expireddomains.net', path='/') 485 | jar.set('_pk_id.10.dd0a', pk_str, domain='expireddomains.net', path='/') 486 | 487 | domainrequest = s.get(url,headers=headers,verify=False,cookies=jar) 488 | #domainrequest = s.get(url,headers=headers,verify=False,cookies=jar,proxies=proxies) 489 | 490 | domains = domainrequest.text 491 | 492 | # Turn the HTML into a Beautiful Soup object 493 | soup = BeautifulSoup(domains, 'lxml') 494 | #print(soup) 495 | try: 496 | table = soup.find("table") 497 | 498 | rows = table.findAll('tr')[1:] 499 | for row in table.findAll('tr')[1:]: 500 | 501 | # Alternative way to extract domain name 502 | # domain = row.find('td').find('a').text 503 | 504 | cells = row.findAll("td") 505 | 506 | if len(cells) >= 1: 507 | if keyword: 508 | 509 | c0 = row.find('td').find('a').text # domain 510 | c1 = cells[1].find(text=True) # bl 511 | c2 = cells[2].find(text=True) # domainpop 512 | c3 = cells[3].find(text=True) # birth 513 | c4 = cells[4].find(text=True) # Archive.org entries 514 | c5 = cells[5].find(text=True) # Alexa 515 | c6 = cells[6].find(text=True) # Dmoz.org 516 | c7 = cells[7].find(text=True) # status com 517 | c8 = cells[8].find(text=True) # status net 518 | c9 = cells[9].find(text=True) # status org 519 | c10 = cells[10].find(text=True) # status de 520 | c11 = cells[11].find(text=True) # TLDs 521 | c12 = cells[12].find(text=True) # RDT 522 | c13 = cells[13].find(text=True) # List 523 | c14 = cells[14].find(text=True) # Status 524 | c15 = "" # Links 525 | 526 | # create 
available TLD list 527 | available = '' 528 | if c7 == "available": 529 | available += ".com " 530 | 531 | if c8 == "available": 532 | available += ".net " 533 | 534 | if c9 == "available": 535 | available += ".org " 536 | 537 | if c10 == "available": 538 | available += ".de " 539 | 540 | # Only grab status for keyword searches since it doesn't exist otherwise 541 | status = "" 542 | if keyword: 543 | status = c14 544 | 545 | # Only add Expired, not Pending, Backorder, etc 546 | if c13 == "Expired": 547 | # Append parsed domain data to list if it matches our criteria (.com|.net|.org and not a known malware domain) 548 | if (c0.lower().endswith(".com") or c0.lower().endswith(".net") or c0.lower().endswith(".org")) and (c0 not in maldomainsList): 549 | domain_list.append([c0,c3,c4,available,status]) 550 | 551 | # Non-keyword search table format is slightly different 552 | else: 553 | 554 | c0 = cells[0].find(text=True) # domain 555 | c1 = cells[1].find(text=True) # bl 556 | c2 = cells[2].find(text=True) # domainpop 557 | c3 = cells[3].find(text=True) # birth 558 | c4 = cells[4].find(text=True) # Archive.org entries 559 | c5 = cells[5].find(text=True) # Alexa 560 | c6 = cells[6].find(text=True) # Dmoz.org 561 | c7 = cells[7].find(text=True) # status com 562 | c8 = cells[8].find(text=True) # status net 563 | c9 = cells[9].find(text=True) # status org 564 | c10 = cells[10].find(text=True) # status de 565 | c11 = cells[11].find(text=True) # TLDs 566 | c12 = cells[12].find(text=True) # RDT 567 | c13 = cells[13].find(text=True) # End Date 568 | c14 = cells[14].find(text=True) # Links 569 | 570 | # create available TLD list 571 | available = '' 572 | if c7 == "available": 573 | available += ".com " 574 | 575 | if c8 == "available": 576 | available += ".net " 577 | 578 | if c9 == "available": 579 | available += ".org " 580 | 581 | if c10 == "available": 582 | available += ".de " 583 | 584 | status = "" 585 | 586 | # Append original parsed domain data to list if it matches 
our criteria (.com|.net|.org and not a known malware domain) 587 | if (c0.lower().endswith(".com") or c0.lower().endswith(".net") or c0.lower().endswith(".org")) and (c0 not in maldomainsList): 588 | domain_list.append([c0,c3,c4,available,status]) 589 | 590 | except Exception as e: 591 | print("[!] Error: ", e) 592 | pass 593 | 594 | # Add additional sleep on requests to ExpiredDomains.net to avoid errors 595 | time.sleep(5) 596 | 597 | # Check for valid list results before continuing 598 | if len(domain_list) == 0: 599 | print("[-] No domain results found or none are currently available for purchase!") 600 | exit(0) 601 | else: 602 | domain_list_unique = [] 603 | [domain_list_unique.append(item) for item in domain_list if item not in domain_list_unique] 604 | 605 | # Print number of domains to perform reputation checks against 606 | if check: 607 | print("\n[*] Performing reputation checks for {} domains".format(len(domain_list_unique))) 608 | 609 | for domain_entry in domain_list_unique: 610 | domain = domain_entry[0] 611 | birthdate = domain_entry[1] 612 | archiveentries = domain_entry[2] 613 | availabletlds = domain_entry[3] 614 | status = domain_entry[4] 615 | bluecoat = '-' 616 | ibmxforce = '-' 617 | ciscotalos = '-' 618 | 619 | # Perform domain reputation checks 620 | if check: 621 | 622 | bluecoat = checkBluecoat(domain) 623 | print("[+] {}: {}".format(domain, bluecoat)) 624 | ibmxforce = checkIBMXForce(domain) 625 | print("[+] {}: {}".format(domain, ibmxforce)) 626 | ciscotalos = checkTalos(domain) 627 | print("[+] {}: {}".format(domain, ciscotalos)) 628 | print("") 629 | # Sleep to avoid captchas 630 | doSleep(timing) 631 | 632 | # Append entry to new list with reputation if at least one service reports reputation 633 | if not ((bluecoat in ('Uncategorized','badurl','Suspicious','Malicious Sources/Malnets','captcha','Phishing','Placeholders','Spam','error')) \ 634 | and (ibmxforce in ('Not found.','error')) and (ciscotalos in ('Uncategorized','error'))): 
635 | 
636 |         data.append([domain,birthdate,archiveentries,availabletlds,status,bluecoat,ibmxforce,ciscotalos])
637 | 
638 | # Sort domain list by column 2 (Birth Year)
639 | sortedDomains = sorted(data, key=lambda x: x[1], reverse=True)
640 | 
641 | if check:
642 |     if len(sortedDomains) == 0:
643 |         print("[-] No domains discovered with a desirable categorization!")
644 |         exit(0)
645 |     else:
646 |         print("[*] {} of {} domains discovered with a potentially desirable categorization!".format(len(sortedDomains),len(domain_list)))
647 | 
648 | # Build HTML Table
649 | html = ''
650 | htmlHeader = '<html><head><title>Expired Domain List</title></head>'
651 | htmlBody = '<body><p>The following available domains report was generated at {}</p>'.format(timestamp)
652 | htmlTableHeader = '''
653 | 
654 | <table border="1" align="center">
655 |     <th>Domain</th>
656 |     <th>Birth</th>
657 |     <th>Entries</th>
658 |     <th>TLDs Available</th>
659 |     <th>Status</th>
660 |     <th>BlueCoat</th>
661 |     <th>IBM X-Force</th>
662 |     <th>Cisco Talos</th>
663 |     <th>WatchGuard</th>
664 |     <th>Namecheap</th>
665 |     <th>Archive.org</th>
666 | '''
667 | 
668 | htmlTableBody = ''
669 | htmlTableFooter = '</table>'
670 | htmlFooter = '</body></html>'
671 | 
672 | # Build HTML table contents
673 | for i in sortedDomains:
674 |     htmlTableBody += '<tr>'
675 |     htmlTableBody += '<td>{}</td>'.format(i[0]) # Domain
676 |     htmlTableBody += '<td>{}</td>'.format(i[1]) # Birth
677 |     htmlTableBody += '<td>{}</td>'.format(i[2]) # Entries
678 |     htmlTableBody += '<td>{}</td>'.format(i[3]) # TLDs
679 |     htmlTableBody += '<td>{}</td>'.format(i[4]) # Status
680 | 
681 |     htmlTableBody += '<td>{}</td>'.format(i[5]) # Bluecoat
682 |     htmlTableBody += '<td><a href="https://exchange.xforce.ibmcloud.com/url/{}" target="_blank">{}</a></td>'.format(i[0],i[6]) # IBM x-Force Categorization
683 |     htmlTableBody += '<td><a href="https://www.talosintelligence.com/reputation_center/lookup?search={}" target="_blank">{}</a></td>'.format(i[0],i[7]) # Cisco Talos
684 |     htmlTableBody += '<td><a href="http://www.borderware.com/domain_lookup.php?ip={}" target="_blank">WatchGuard</a></td>'.format(i[0]) # Borderware WatchGuard
685 |     htmlTableBody += '<td><a href="https://www.namecheap.com/domains/registration/results.aspx?domain={}" target="_blank">Namecheap</a></td>'.format(i[0]) # Namecheap
686 |     htmlTableBody += '<td><a href="http://web.archive.org/web/*/{}" target="_blank">Archive.org</a></td>'.format(i[0]) # Archive.org
687 |     htmlTableBody += '</tr>'
688 | 
689 | html = htmlHeader + htmlBody + htmlTableHeader + htmlTableBody + htmlTableFooter + htmlFooter
690 | 
691 | logfilename = "{}_domainreport.html".format(timestamp)
692 | log = open(logfilename,'w')
693 | log.write(html)
694 | log.close()
695 | 
696 | print("\n[*] Search complete")
697 | print("[*] Log written to {}\n".format(logfilename))
698 | 
699 | # Print Text Table
700 | header = ['Domain', 'Birth', '#', 'TLDs', 'Status', 'BlueCoat', 'IBM', 'Cisco Talos']
701 | print(drawTable(header,sortedDomains))
702 | 
--------------------------------------------------------------------------------
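The forged visitor cookie built around line 481 of `domainhunter.py` follows a fixed shape. As a minimal standalone sketch (standard library only, no request is made), the value can be constructed and inspected in isolation:

```python
import random

# Sketch of the script's Matomo/Piwik-style "_pk_id" cookie value:
# a fixed 16-hex-character visitor id followed by timestamp-like fields
# that all reuse a single random six-digit suffix.
r1 = random.randint(100000, 999999)
pk_str = '5abbbc772cbacfb1' + '.1496' + str(r1) + '.2.1496' + str(r1) + '.1496' + str(r1)

# Split into its dotted fields: [visitor_id, ts, visit_count, ts, ts]
parts = pk_str.split('.')
print(parts)
```

In the script itself this value is staged in a `requests.cookies.RequestsCookieJar` and passed to the session request via `cookies=jar`.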
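The script deduplicates scraped rows at lines 602-603 with an order-preserving membership-testing list comprehension. The same idiom is shown below against illustrative sample rows (the domains and field values are made up for the example):

```python
# Sample scraped rows: [domain, birth, archive entries, available TLDs, status]
domain_list = [
    ['example.com', '2005', '12', '.net ', ''],
    ['old-site.org', '2001', '4', '', ''],
    ['example.com', '2005', '12', '.net ', ''],  # duplicate row
]

# Append each row only if an identical row hasn't been seen yet,
# preserving first-seen order (the comprehension is used for its side effect).
domain_list_unique = []
[domain_list_unique.append(item) for item in domain_list if item not in domain_list_unique]

print(domain_list_unique)
```

Because rows are unhashable lists, a `set`-based dedup would not apply directly; the membership test keeps the row lists as-is at the cost of O(n²) comparisons, which is fine at these result sizes.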