├── requirements.txt
├── README.md
└── dirscraper.py

/requirements.txt:
--------------------------------------------------------------------------------
requests
argparse
beautifulsoup4
html5lib
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
# Dirscraper
Dirscraper is an OSINT scanning tool which assists penetration testers in identifying hidden, or previously unknown, directories on a domain or subdomain. This helps greatly in the recon stage of a pentest, as it provides pentesters with a larger attack surface for the target domain.

## How does it work?
Dirscraper works by first visiting the domain provided by the user. From there, it locates all relative script tags hosted on the website, reads the source code of each of those JavaScript files, and extracts the interesting subdomains and endpoints referenced in them. Many website developers never make an endpoint publicly available but still allow users to interact with it through JavaScript when appropriate. Sometimes it takes a rare corner case for that interaction to fire (and for a tool such as Burp Suite to pick up the request to the endpoint), which makes it impractical to locate these endpoints manually. A condensed sketch of this flow is shown below.
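The sketch below is a minimal illustration of the idea, not the full tool: the `https://example.com` target is a placeholder, and the pattern is a simplified form of the one dirscraper.py actually uses.

```
import re
import requests
from bs4 import BeautifulSoup

# Hypothetical target, purely for illustration.
base = "https://example.com"
soup = BeautifulSoup(requests.get(base + "/").text, "html5lib")

# Same-origin script sources only ("/static/app.js" style paths).
srcs = [s["src"] for s in soup.find_all("script", src=True)
        if s["src"].startswith("/") and not s["src"].startswith("//")]

# Quoted, slash-prefixed strings inside each script are the candidate
# endpoints and directories.
endpoint = re.compile(r"[\"'](/[\w?/&=#.!:_-]+)[\"']")
for src in srcs:
    js = requests.get(base + src).text
    for path in sorted(set(endpoint.findall(js))):
        print(path)
```

The real script additionally trims each match down to its directory portion and de-duplicates the results before reporting them.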
# Getting Started
## Installation
To install dirscraper, download the repository and navigate in your terminal to the directory containing the files. From there, run the following installation command:

```
$ pip install -r requirements.txt
```

## Running the program
To run the program, open the directory containing the file in your terminal. From there, run the following command, supplying the URL of the site you wish to scan:

```
$ python dirscraper.py -u <url>
```

## Outputting to a file
When outputting to a file, you must select a filename (if the file already exists, results are appended to the bottom; if it does not exist, the file is created). This flag is optional.

```
$ python dirscraper.py -u <url> -o <filename>
```

## Silent mode
If you are scanning a website and do not wish to see the results displayed in the terminal, you can set this flag. Note that if you are not also outputting to a file, using this flag will make it impossible to see your results. This flag is optional.

```
$ python dirscraper.py -u <url> -o <filename> -s
```
--------------------------------------------------------------------------------

/dirscraper.py:
--------------------------------------------------------------------------------
import requests, os, argparse, re
from bs4 import BeautifulSoup

# Suppress the certificate warnings that verify=False would otherwise trigger.
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)


def regex(content):
    # Match single- or double-quoted strings that begin with "/" and look
    # like relative paths, e.g. "/api/user?id=1".
    pattern = r"(\"|')(\/[\w\d?\/&=#.!:_-]{1,})(\"|')"
    matches = re.findall(pattern, content)
    # Each match is a (quote, path, quote) tuple; keep only the path.
    return "\n".join(match[1] for match in matches)


print(" _ _ _____ \n __| (_)_ __ ___ ___ _ __ __ _ _ __|___ / _ __ \n / _` | | '__/ __|/ __| '__/ _` | '_ \ |_ \| '__|\n| (_| | | | \__ \ (__| | | (_| | |_) |__) | | \n \__,_|_|_| |___/\___|_| \__,_| .__/____/|_| \n |_|\n\n ~Cillian Collins\nOutput:")

parser = argparse.ArgumentParser(description='Extract GET parameters from javascript files.')
parser.add_argument('-u', help='URL of the website to scan.', required=True)
parser.add_argument('-o', help='Output file (for results).', nargs="?")
parser.add_argument('-s', help='Silent mode (results not printed).', action="store_true")
parser.add_argument('-d', help='Includes domain name in output.', action="store_true")

args = parser.parse_args()

url = args.u + "/"
try:
    r = requests.get(url, verify=False)
except requests.exceptions.MissingSchema:
    # No scheme was supplied; default to http:// and retry.
    args.u = "http://" + args.u
    url = args.u + "/"
    r = requests.get(url, verify=False)
soup = BeautifulSoup(r.text, 'html5lib')
scripts = soup.find_all('script')

linkArr = [args.u]
dirArr = []

# Collect same-origin script URLs: src values that start with a single "/".
# Protocol-relative ("//host/...") and external sources are skipped.
for script in scripts:
    try:
        if script['src'][0] == "/" and script['src'][1] != "/":
            # Prepend scheme and host, e.g. "http://example.com" + "/app.js".
            linkArr.append("/".join(url.split("/")[0:3]) + script['src'])
    except (KeyError, IndexError):
        # Inline scripts have no src attribute; skip them.
        pass

# Fetch the page itself plus every collected script and extract the
# quoted paths from each response.
for link in linkArr:
    res = requests.get(link, verify=False)
    for line in regex(res.text).split("\n"):
        pathArr = line.strip().split("/")
        path = ""
        # Rebuild the path, dropping a trailing filename (a final segment
        # containing a dot) so that only the directory portion remains.
        for i in range(len(pathArr)):
            if i == len(pathArr) - 1 and "." in pathArr[i]:
                continue
            path += pathArr[i] + "/"
        if path != "/" and path != "//":
            dirArr.append(path.replace("//", "/").split("#")[0])

# Report each unique directory, optionally prefixed with scheme and host
# (-d), appended to an output file (-o), and/or hidden from the terminal (-s).
domain = args.u.split("/")[0] + "//" + args.u.split("/")[2]
output = open(args.o, "a") if args.o else None
for directory in sorted(set(dirArr)):
    result = domain + directory if args.d else directory
    if output:
        output.write(result + "\n")
    if not args.s:
        print(result)
if output:
    output.close()
--------------------------------------------------------------------------------