├── README.md
└── Zoominfo
    ├── README.md
    └── zoominfo-scraper.py

/README.md:
--------------------------------------------------------------------------------
# HTTPScrapers
NetSPI HTTP Scrapers
--------------------------------------------------------------------------------
/Zoominfo/README.md:
--------------------------------------------------------------------------------
# zoominfo-scraper

## Purpose
Scrapes zoominfo.com for employee names and turns them
into email addresses automagically.

## Disclaimer
This isn't using any fancy APIs, so if zoominfo.com updates their site at all, this script will fail hilariously.

## A Word on CloudFlare
The random sleep breaks within the script are an attempt to avoid CloudFlare rate-limiting.

CloudFlare DDoS protection is really the biggest hurdle to scraping zoominfo.com. Every so often CloudFlare serves a browser JavaScript challenge, which cloudscraper attempts to solve. If CloudFlare then decides you have solved its challenges too quickly or too often, it may block you from the site entirely. The only options at that point are to wait for the rate-limiting to subside or to connect from a different IP address. Both the JavaScript challenge and the full-blown rate-limiting come back as 429 responses, so if you see a 429 from the script and cloudscraper isn't broken (also possible), you are likely being rate-limited :). VPNs are a quick and easy way around the issue, and the script will also pause automatically on any 429 response.
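The sleep schedule the script uses to appease CloudFlare boils down to one pacing rule, sketched here with a hypothetical helper `pacing_delay` (not part of the script itself):

```python
import random

def pacing_delay(pages_done):
    """Return seconds to sleep after a page, mirroring the script's schedule."""
    if pages_done and pages_done % 50 == 0:
        return 300 + random.randint(1, 10)  # ~5 minute break every 50 pages
    if pages_done and pages_done % 10 == 0:
        return 60 + random.randint(1, 10)   # ~60 second break every 10 pages
    return random.randint(1, 8)             # short random sleep between pages

print(pacing_delay(7), pacing_delay(10), pacing_delay(50))
```

The jitter on every delay is deliberate: fixed intervals are easier for rate-limiting heuristics to fingerprint than randomized ones.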
## Getting started

Google dork your target "domain.tld" like so:

    site:zoominfo.com domain.tld

Pick the correct result (usually the first) and give the script everything after "https://www.zoominfo.com/c/" as **-z**, e.g.,

    -z 'netspi-inc/36078304' -d netspi.com

To have the script automatically run the Google dork, select the first result, and proceed, use the **-g** switch with **-d**, e.g.,

    -d netspi.com -g

There are four format (-f) options:
1. flast@domain.tld (e.g., jdoe@netspi.com) (default)
2. first.last@domain.tld (e.g., john.doe@netspi.com)
3. lastf@domain.tld (e.g., doej@netspi.com)
4. Full name (e.g., John Marie Doe)

CloudFlare is a bear, so a random sleep is added before each request to avoid poking the bear too much. If there are more than 10 pages, a ~60-second break is taken every 10 pages.

If you get multiple 429s back, you are likely being rate-limited by CloudFlare. You can try changing your IP and continuing with 'y'.

## Install
The tool needs the following package to run:

    pip3 install cloudscraper

Optional:

    pip3 install google

## Run instructions
Requires python3. Either (-z) or both (-g **and** -d) are required. (-z) is the zoominfo.com path described above. (-d) appends whatever domain.tld you want to the employee names. The (-f) format options are also described above. The output-file option (-o) writes the results to the given filename. Run the script with no options or with (-h) to see the help menu.
    python3 zoominfo-scraper.py -z zoominfo/path [-d domain.tld] [-f 1] [-o outputfile.txt] [-g]

## References
- https://pypi.org/project/cloudscraper/
--------------------------------------------------------------------------------
/Zoominfo/zoominfo-scraper.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
import argparse
import sys
import textwrap
import cloudscraper  # pip3 install cloudscraper
import re
from time import sleep
import random

'''
Scrapes zoominfo.com for employee names and turns them into email addresses automagically
'''

parser = argparse.ArgumentParser(formatter_class=argparse.RawDescriptionHelpFormatter,
                                 description=textwrap.dedent('''\
##########################################################
Scrapes zoominfo.com for employee names and turns them
into email addresses automagically

Getting started -

Google dork your target "domain.tld" like so:

    site:zoominfo.com domain.tld

Pick the correct instance (usually the first) and give
everything after "https://www.zoominfo.com/c/" as -z
e.g., -z 'netspi-inc/36078304'

The -g switch with a domain (-d) will automatically
take the first Google search result and proceed
e.g., -d netspi.com -g

There are four format (-f) options:
1. flast@domain.tld (e.g., jdoe@netspi.com) (default)
2. first.last@domain.tld (e.g., john.doe@netspi.com)
3. lastf@domain.tld (e.g., doej@netspi.com)
4. Full name (e.g., John Marie Doe)

CloudFlare is a bear, so random breaks are added for each
request to not poke the bear too much. If there are more
than 10 pages, a 60 second break is taken every 10 pages.

If you get multiple 429's returned, it's likely that you
are being rate limited by CloudFlare.
You can try
changing your IP and continuing with 'y'.
##########################################################
'''))
parser.add_argument('-z', metavar='zoominfo/path', help='zoominfo.com path after /c/')
parser.add_argument('-d', metavar='domain.tld', help='The domain.tld to append to addresses')
parser.add_argument('-f', metavar='format', help='1:flast(default), 2:first.last, 3:lastf, 4:full', type=int, default=1)
parser.add_argument('-g', help='switch. automatically grab first google.com result for -d', action='store_true')
parser.add_argument('-o', metavar='outputfile', help='output filename')
parser.add_argument('-p', metavar='page', help='Page number to start on, default 1', type=int, default=1)
args = vars(parser.parse_args())

if not args['z'] and not (args['g'] and args['d']):
    parser.print_help(sys.stderr)
    sys.exit()

if not 1 <= args['f'] <= 4:
    print("[-] Please double-check your format option before we start. Exiting..")
    sys.exit()
format_option = args['f']

if args['d']:
    domain_tld = "@" + args['d']
else:
    domain_tld = ""

starting_page = args['p']
zoom_url = ""

if args['z']:
    zoom_url = "https://www.zoominfo.com/pic/{0}".format(args['z'])
else:
    print('[+] Using Google search')
    import googlesearch  # pip3 install google
    for googleurl in googlesearch.search('site:zoominfo.com {0}'.format(args['d']), stop=1):
        print('[+] Using URL: {0}'.format(googleurl))
        zoom_url = "https://www.zoominfo.com/pic/{0}".format(googleurl.split("/c/", 1)[1])

if not zoom_url:
    print('[-] No URL found. Try giving the correct path manually with -z.')
    print('[-] Exiting')
    sys.exit()

# A fixed desktop Chrome User-Agent; browser UAs draw less attention from CloudFlare
random_header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}


def zoomscrape(zoomurl, randomheader, startpage):
    pageno = startpage
    counter = 0
    status_code = 200
    companynamelist = []
    # Use cloudscraper to solve CloudFlare's JavaScript challenges:
    s = cloudscraper.create_scraper()  # returns a CloudflareScraper instance
    while status_code == 200 or status_code == 429:
        nexturl = zoomurl + '?pageNum={0}'.format(pageno)
        print("[*] Requesting page {0}".format(pageno))
        r = s.get(nexturl, headers=randomheader)
        status_code = r.status_code
        if status_code == 200:
            print("[+] Found! Parsing page {0}".format(pageno))
            # This will break if they update their site at all :)
            companynamelist += re.findall(r'class="link amplitudeElement">(.*?)</a>', r.text)
            pageno += 1
            counter += 1
        elif status_code == 429:
            print("[-] Site returned status code: ", status_code)
            print("[-] Likely rate-limited by CloudFlare :/ Pausing")
            answer = input("[*] Maybe change your IP address. Continue? (y/N)")
            if answer.lower() != 'y':
                break
        elif status_code == 410:
            print("[+] Site returned status code: ", status_code)
            print("[+] We seem to be at the end! Yay!")
            break
        else:
            print("[-] Site returned status code: ", status_code)
            print("[-] Status code not 200. Not sure why.. Quitting!")
            break
        print("[*] Random sleep break to appease CloudFlare")
        sleep(random.randint(1, 8))
        if counter and not counter % 50:
            print("[*] Taking a 5 minute break after 50 pages!")
            sleep(300 + random.randint(1, 10))
        elif counter and not counter % 10:
            print("[*] Taking a 60 second break after 10 pages!")
            sleep(60 + random.randint(1, 10))
    return companynamelist
# End def zoomscrape()


def printlist(emaillist, domaintld, formatoption, outputfile):
    # Print the scraped name list in the requested email format
    if not emaillist:
        print("[-] List appears to be empty")
        return
    print("[+] Printing email address list")
    z = []
    if formatoption == 4:
        z = list(emaillist)
    else:
        for y in emaillist:
            parts = y.split()
            if len(parts) < 2:
                continue  # need at least a first and a last name
            first, last = parts[0], parts[-1]
            if formatoption == 1:
                z.append(first[0] + last + domaintld)
            elif formatoption == 2:
                z.append(first + "." + last + domaintld)
            else:
                z.append(last + first[0] + domaintld)
        z = list(map(str.lower, z))
    z = sorted(set(z))
    if outputfile:
        try:
            print("[*] Writing to file {0}".format(outputfile))
            with open(outputfile, 'w') as f:
                for x in z:
                    f.write("{0}\n".format(x))
        except OSError:
            print("[-] Write to file failed! Printing here instead:")
            for x in z:
                print(x)
    else:
        for x in z:
            print(x)
    print("[+] Found " + str(len(z)) + " names!")
# End def printlist()


email_list = zoomscrape(zoom_url, random_header, starting_page)
email_list = sorted(set(email_list))
printlist(email_list, domain_tld, format_option, args['o'])
--------------------------------------------------------------------------------