├── README.md
└── Zoominfo
    ├── README.md
    └── zoominfo-scraper.py

/README.md:
--------------------------------------------------------------------------------
# HTTPScrapers
NetSPI HTTP Scrapers
--------------------------------------------------------------------------------
/Zoominfo/README.md:
--------------------------------------------------------------------------------
# zoominfo-scraper

## Purpose
Scrapes zoominfo.com for employee names and turns them
into email addresses automagically.

## Disclaimer
This isn't using any fancy APIs, so if zoominfo.com updates their site at all, this script will fail hilariously.

## A Word on CloudFlare
The random sleep breaks within the script are an attempt to avoid CloudFlare rate-limiting.

CloudFlare DDoS protection is really the biggest hurdle to scraping zoominfo.com. Every so often CloudFlare serves a browser JavaScript challenge, which cloudscraper attempts to solve. If CloudFlare then decides you have solved its challenges too quickly or too often, it may block you from the site entirely. The only options at that point are to wait for the rate-limiting to subside or to connect from a different IP address. Both the JavaScript challenge and the full-blown rate-limiting come back as 429 responses, so if you see a 429 from the script and cloudscraper isn't broken (also possible), you are likely being rate-limited :). VPNs are a quick and easy way around the issue, and the script will also pause automatically on any 429 response.
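The sleep schedule the script uses to appease CloudFlare boils down to one pacing rule, sketched here with a hypothetical helper `pacing_delay` (not part of the script itself):

```python
import random

def pacing_delay(pages_done):
    """Return seconds to sleep after a page, mirroring the script's schedule."""
    if pages_done and pages_done % 50 == 0:
        return 300 + random.randint(1, 10)  # ~5 minute break every 50 pages
    if pages_done and pages_done % 10 == 0:
        return 60 + random.randint(1, 10)   # ~60 second break every 10 pages
    return random.randint(1, 8)             # short random sleep between pages

print(pacing_delay(7), pacing_delay(10), pacing_delay(50))
```

The jitter on every delay is deliberate: fixed intervals are easier for rate-limiting heuristics to fingerprint than randomized ones.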
## Getting started

Google dork your target "domain.tld" like so:

    site:zoominfo.com domain.tld

Pick the correct result (usually the first) and give the script everything after "https://www.zoominfo.com/c/" as **-z**, e.g.,

    -z 'netspi-inc/36078304' -d netspi.com

To have the script automatically run the Google dork, select the first result, and proceed, use the **-g** switch with **-d**, e.g.,

    -d netspi.com -g

There are four format (-f) options:
1. flast@domain.tld (e.g., jdoe@netspi.com) (default)
2. first.last@domain.tld (e.g., john.doe@netspi.com)
3. lastf@domain.tld (e.g., doej@netspi.com)
4. Full name (e.g., John Marie Doe)

CloudFlare is a bear, so a random sleep is added before each request to avoid poking the bear too much. If there are more than 10 pages, a ~60-second break is taken every 10 pages.

If you get multiple 429s back, you are likely being rate-limited by CloudFlare. You can try changing your IP and continuing with 'y'.

## Install
The tool needs the following package to run:

    pip3 install cloudscraper

Optional:

    pip3 install google

## Run instructions
Requires python3. Either (-z) or both (-g **and** -d) are required. (-z) is the zoominfo.com path described above. (-d) appends whatever domain.tld you want to the employee names. The (-f) format options are also described above. The output-file option (-o) writes the results to the given filename. Run the script with no options or with (-h) to see the help menu.
    python3 zoominfo-scraper.py -z zoominfo/path [-d domain.tld] [-f 1] [-o outputfile.txt] [-g]

## References
- https://pypi.org/project/cloudscraper/
--------------------------------------------------------------------------------
/Zoominfo/zoominfo-scraper.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
import argparse
import sys
import textwrap
import cloudscraper  # pip3 install cloudscraper
import re
from time import sleep
import random

'''
Scrapes zoominfo.com for employee names and turns them into email addresses automagically
'''

parser = argparse.ArgumentParser(formatter_class=argparse.RawDescriptionHelpFormatter,
                                 description=textwrap.dedent('''\
##########################################################
Scrapes zoominfo.com for employee names and turns them
into email addresses automagically

Getting started -

Google dork your target "domain.tld" like so:

    site:zoominfo.com domain.tld

Pick the correct instance (usually the first) and give
everything after "https://www.zoominfo.com/c/" as -z
e.g., -z 'netspi-inc/36078304'

The -g switch with a domain (-d) will automatically
take the first Google search result and proceed
e.g., -d netspi.com -g

There are four format (-f) options:
1. flast@domain.tld (e.g., jdoe@netspi.com) (default)
2. first.last@domain.tld (e.g., john.doe@netspi.com)
3. lastf@domain.tld (e.g., doej@netspi.com)
4. Full name (e.g., John Marie Doe)

CloudFlare is a bear, so random breaks are added for each
request to not poke the bear too much. If there are more
than 10 pages, a 60 second break is taken every 10 pages.

If you get multiple 429's returned, it's likely that you
are being rate limited by CloudFlare.
You can try
changing your IP and continuing with 'y'.
##########################################################
'''))
parser.add_argument('-z', metavar='zoominfo/path', help='zoominfo.com path after /c/')
parser.add_argument('-d', metavar='domain.tld', help='The domain.tld to append to addresses')
parser.add_argument('-f', metavar='format', help='1:flast(default), 2:first.last, 3:lastf, 4:full', type=int, default=1)
parser.add_argument('-g', help='switch. automatically grab first google.com result for -d', action='store_true')
parser.add_argument('-o', metavar='outputfile', help='output filename')
parser.add_argument('-p', metavar='page', help='Page number to start on, default 1', type=int, default=1)
args = vars(parser.parse_args())

if not args['z'] and not (args['g'] and args['d']):
    parser.print_help(sys.stderr)
    sys.exit()

if not 1 <= args['f'] <= 4:
    print("[-] Please double-check your format option before we start. Exiting..")
    sys.exit()
format_option = args['f']

if args['d']:
    domain_tld = "@" + args['d']
else:
    domain_tld = ""

starting_page = args['p']
zoom_url = ""

if args['z']:
    zoom_url = "https://www.zoominfo.com/pic/{0}".format(args['z'])
else:
    print('[+] Using Google search')
    import googlesearch  # pip3 install google
    for googleurl in googlesearch.search('site:zoominfo.com {0}'.format(args['d']), stop=1):
        print('[+] Using URL: {0}'.format(googleurl))
        zoom_url = "https://www.zoominfo.com/pic/{0}".format(googleurl.split("/c/", 1)[1])

if not zoom_url:
    print('[-] No URL found. Try giving the correct path manually with -z.')
    print('[-] Exiting')
    sys.exit()

# A fixed desktop Chrome User-Agent; browser UAs draw less attention from CloudFlare
random_header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}


def zoomscrape(zoomurl, randomheader, startpage):
    pageno = startpage
    counter = 0
    status_code = 200
    companynamelist = []
    # Use cloudscraper to solve CloudFlare's JavaScript challenges:
    s = cloudscraper.create_scraper()  # returns a CloudflareScraper instance
    while status_code == 200 or status_code == 429:
        nexturl = zoomurl + '?pageNum={0}'.format(pageno)
        print("[*] Requesting page {0}".format(pageno))
        r = s.get(nexturl, headers=randomheader)
        status_code = r.status_code
        if status_code == 200:
            print("[+] Found! Parsing page {0}".format(pageno))
            # This will break if they update their site at all :)
            companynamelist += re.findall(r'class="link amplitudeElement">(.*?)</a>', r.text)
            pageno += 1
            counter += 1
        elif status_code == 429:
            print("[-] Site returned status code: ", status_code)
            print("[-] Likely rate-limited by CloudFlare :/ Pausing")
            answer = input("[*] Maybe change your IP address. Continue? (y/N)")
            if answer.lower() != 'y':
                break
        elif status_code == 410:
            print("[+] Site returned status code: ", status_code)
            print("[+] We seem to be at the end! Yay!")
            break
        else:
            print("[-] Site returned status code: ", status_code)
            print("[-] Status code not 200. Not sure why.. Quitting!")
            break
        print("[*] Random sleep break to appease CloudFlare")
        sleep(random.randint(1, 8))
        if counter and not counter % 50:
            print("[*] Taking a 5 minute break after 50 pages!")
            sleep(300 + random.randint(1, 10))
        elif counter and not counter % 10:
            print("[*] Taking a 60 second break after 10 pages!")
            sleep(60 + random.randint(1, 10))
    return companynamelist
# End def zoomscrape()


def printlist(emaillist, domaintld, formatoption, outputfile):
    # Print the scraped name list in the requested email format
    if not emaillist:
        print("[-] List appears to be empty")
        return
    print("[+] Printing email address list")
    z = []
    if formatoption == 4:
        z = list(emaillist)
    else:
        for y in emaillist:
            parts = y.split()
            if len(parts) < 2:
                continue  # need at least a first and a last name
            first, last = parts[0], parts[-1]
            if formatoption == 1:
                z.append(first[0] + last + domaintld)
            elif formatoption == 2:
                z.append(first + "." + last + domaintld)
            else:
                z.append(last + first[0] + domaintld)
        z = list(map(str.lower, z))
    z = sorted(set(z))
    if outputfile:
        try:
            print("[*] Writing to file {0}".format(outputfile))
            with open(outputfile, 'w') as f:
                for x in z:
                    f.write("{0}\n".format(x))
        except OSError:
            print("[-] Write to file failed! Printing here instead:")
            for x in z:
                print(x)
    else:
        for x in z:
            print(x)
    print("[+] Found " + str(len(z)) + " names!")
# End def printlist()


email_list = zoomscrape(zoom_url, random_header, starting_page)
email_list = sorted(set(email_list))
printlist(email_list, domain_tld, format_option, args['o'])
--------------------------------------------------------------------------------