├── _config.yml
├── BappManifest.bmf
├── BappDescription.html
├── .gitignore
├── README.md
└── blhchecker.py

--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
theme: jekyll-theme-merlot

--------------------------------------------------------------------------------
/BappManifest.bmf:
--------------------------------------------------------------------------------
Uuid: fa31e4d97b4c410aa72a8cf4d9a65d6e
ExtensionType: 2
Name: Broken Link Hijacking
RepoName: broken-link-hijacking
ScreenVersion: 1.0
SerialVersion: 1
MinPlatformVersion: 0
ProOnly: True
Author: Arbazz Hussain
ShortDescription: Discover broken links
EntryPoint: blhchecker.py

--------------------------------------------------------------------------------
/BappDescription.html:
--------------------------------------------------------------------------------
<p>By default the extension passively scans only responses that are in the Target scope, so make sure to add your targets to the scope first (this keeps noise from unrelated hosts out of the results).</p>
12 | 13 |WHITELIST_CODES - You can add status_code's to this list for more accurate results.
14 | 15 |ex: avoiding https redirects by adding 301, if the path,url redirects to https.
16 | 17 |WHITELIST_PATTERN - Regex extracting pattern based on given patterns.
18 | 19 |ex: Mainly used to avoid performing regexes in gif,img,jpg,swf etc
27 | 28 |no_of_threads - Increase no of threads , default : 15
29 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | *.egg-info/ 24 | .installed.cfg 25 | *.egg 26 | MANIFEST 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *.cover 47 | .hypothesis/ 48 | .pytest_cache/ 49 | 50 | # Translations 51 | *.mo 52 | *.pot 53 | 54 | # Django stuff: 55 | *.log 56 | local_settings.py 57 | db.sqlite3 58 | 59 | # Flask stuff: 60 | instance/ 61 | .webassets-cache 62 | 63 | # Scrapy stuff: 64 | .scrapy 65 | 66 | # Sphinx documentation 67 | docs/_build/ 68 | 69 | # PyBuilder 70 | target/ 71 | 72 | # Jupyter Notebook 73 | .ipynb_checkpoints 74 | 75 | # pyenv 76 | .python-version 77 | 78 | # celery beat schedule file 79 | celerybeat-schedule 80 | 81 | # SageMath parsed files 82 | *.sage.py 83 | 84 | # Environments 85 | .env 86 | .venv 87 | env/ 88 | venv/ 89 | ENV/ 90 | env.bak/ 91 | venv.bak/ 92 | 93 | # Spyder project settings 94 | .spyderproject 95 | .spyproject 96 | 97 | # Rope project settings 98 | .ropeproject 99 | 100 | # mkdocs documentation 101 | /site 102 | 103 | # mypy 104 | .mypy_cache/ 105 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | About Broken Link Hijacking : https://edoverflow.com/2017/broken-link-hijacking/ by @EdOverflow 2 | 3 | # BLH Plugin 4 | 5 | Burp Extension to discover broken links using IScannerCheck & synchronized threads. 6 | 7 | # Features 8 | 9 | --- 10 | 11 | - Supports various HTML elements/attributes with regex based on following 12 | 13 | [`https://github.com/stevenvachon/broken-link-checker/blob/09682b3250e1b2d01e4995bac77ec77cb612db46/test/helpers/json-generators/scrapeHtml.js`](https://github.com/stevenvachon/broken-link-checker/blob/09682b3250e1b2d01e4995bac77ec77cb612db46/test/helpers/json-generators/scrapeHtml.js) 14 | 15 | - Concurrently checks multiple links using defined threads. 16 | - Customizing **`[STATUS_CODES|PATH-PATTERN|MIME-TYPE]`** 17 | 18 | [https://github.com/arbazkiraak/BurpBLH/blob/master/blhchecker.py#L20](https://github.com/arbazkiraak/BurpBLH/blob/master/blhchecker.py#L20) 19 | 20 | # Usage 21 | 22 | --- 23 | 24 | By default it passively scans the responses with Target "Scope in" . Make sure to add the targets into the scope. (Reason: To Avoid Noise) 25 | 26 | `WHITELIST_CODES` - You can add status_code's to this list for more accurate results. 27 | 28 | ex: avoiding https redirects by adding `301`, if the path,url redirects to https. 29 | 30 | `WHITELIST_PATTERN` - Regex extracting pattern based on given patterns. 
`WHITELIST_MEMES` - Only responses whose inferred MIME type appears in this list are scanned for link patterns.

ex: mainly used to avoid running the regexes against `gif, img, jpg, swf`, etc.

`no_of_threads` - Number of concurrent checker threads; default: 15.

---

# Output

Broken links are reported in one of two buckets:

1. Broken links that belong to external origins.
2. Broken links that belong to the same origin.

- If there are no external-origin broken links, the plugin looks for same-origin broken links and returns the **same-origin broken links**.
- If both external-origin and same-origin broken links exist, it returns only the ***external-origin broken links*** (sketched below).

OUTPUT1: External Origins

OUTPUT2: Same Origins

This plugin is based on [https://github.com/stevenvachon/broken-link-checker](https://github.com/stevenvachon/broken-link-checker)
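
For reference, the reporting rule can be sketched in a few lines of plain Python (the function name is illustrative; the real logic lives in `CustomScanIssue.getIssueDetail`, which treats a link as external when the scanned host does not appear in it):

```python
def partition_broken_links(broken_links, host):
    """Illustrative only: report external-origin broken links when any
    exist, otherwise fall back to the same-origin ones."""
    external = [u for u in broken_links if host not in u]
    return external if external else broken_links

# partition_broken_links(
#     ['https://old-cdn.example.net/a.js', 'https://target.com/gone'],
#     'target.com')
# -> ['https://old-cdn.example.net/a.js']
```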
--------------------------------------------------------------------------------
/blhchecker.py:
--------------------------------------------------------------------------------
from burp import IBurpExtender, IScannerCheck, IScanIssue, IHttpService
from array import array
import re
import threading
try:
    import Queue as queue   # Python 2 / Jython
except ImportError:
    import queue            # Python 3
from java.net import URL
import urlparse
from java.io import PrintWriter
from java.lang import RuntimeException

## For URL-based extraction, use the regex below instead:
#REGEX = r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+'

## Link-based regex for HTML: captures the attribute name and its value.
regex_str = r'(href|src|icon|data|url)=[\'"]?([^\'" >;]+)'

# Status codes treated as healthy; anything else marks a link as broken.
WHITELIST_CODES = [200]
# Extracted links must start with one of these prefixes to be checked.
WHITELIST_PATTERN = ["/", "../", "http", "www"]
# Only responses with these inferred MIME types are scanned.
WHITELIST_MEMES = ['HTML', 'Script', 'Other text', 'CSS', 'JSON']


class BurpExtender(IBurpExtender, IScannerCheck):
    extension_name = "Link Hijacking"
    q = queue.Queue()
    temp_urls = []
    no_of_threads = 15

    def registerExtenderCallbacks(self, callbacks):
        try:
            self._callbacks = callbacks
            self._helpers = callbacks.getHelpers()
            self._callbacks.setExtensionName(self.extension_name)
            self._stdout = PrintWriter(self._callbacks.getStdout(), True)
            self._stderr = PrintWriter(self._callbacks.getStderr(), True)
            callbacks.registerScannerCheck(self)
            self._stdout.println("Extension successfully installed")
        except Exception as e:
            self._stderr.println("Installation problem?")
            self._stderr.println(str(e))

    def split(self, strng, sep, pos):
        strng = strng.split(sep)
        return sep.join(strng[:pos]), sep.join(strng[pos:])

    def _up_check(self, url):
        """Request the URL through Burp; return it if the status code is
        not whitelisted (i.e. the link looks broken), else None."""
        parse_object = urlparse.urlparse(url)
        hostname = parse_object.netloc
        if parse_object.scheme == 'https':
            port, SSL = 443, True
        else:
            port, SSL = 80, False
        try:
            req_bytes = self._helpers.buildHttpRequest(URL(str(url)))
            res_bytes = self._callbacks.makeHttpRequest(hostname, port, SSL, req_bytes)
            res = self._helpers.analyzeResponse(res_bytes)
            if res.getStatusCode() not in WHITELIST_CODES:
                return url
        except Exception as e:
            print('SKIPPING %s : %s' % (url, str(e)))
        return None

    def process_queue(self, res):
        # Worker: drain the queue. get_nowait() avoids the race between
        # empty() and get() when several workers run concurrently.
        while True:
            try:
                url = self.q.get_nowait()
            except queue.Empty:
                break
            furl = self._up_check(str(url))
            if furl is not None:
                # list.append is atomic, so sharing temp_urls is safe here.
                self.temp_urls.append(furl)

    def _blf(self, baseRequestResponse, regex, host):
        res = self._helpers.bytesToString(baseRequestResponse.getResponse())
        matches = regex.findall(res)
        if len(matches) == 0:
            return
        # Each match is an (attribute, value) tuple; keep the last non-empty
        # group (the link value) and de-duplicate.
        links = list(set([tuple(g for g in m if g)[-1] for m in matches]))
        links = [i for i in links if i.startswith(tuple(WHITELIST_PATTERN))]
        final = []
        for i in links:
            if i.startswith('//') and ' ' not in i:
                final.append('https:' + str(i))        # protocol-relative
            elif i.startswith('/') and ' ' not in i:
                if '.' in i.split('/')[1]:
                    final.append('https:/' + str(i))   # root path that looks like a bare domain
                else:
                    final.append(str(host) + str(i))   # absolute path on the scanned host
            elif (i.startswith('http') or i.startswith('www.')) and ' ' not in i:
                final.append(i)
            elif i.startswith('../') and ' ' not in i:
                final.append(str(host) + '/' + str(i))  # relative path
        for url in final:
            self.q.put(str(url))

        # Check all queued links concurrently, then wait for the workers.
        threads = []
        for _ in range(int(BurpExtender.no_of_threads)):
            t = threading.Thread(target=self.process_queue, args=(baseRequestResponse,))
            threads.append(t)
            t.start()
        for t in threads:
            t.join()
        return True

    def doPassiveScan(self, baseRequestResponse):
        reqInfo = self._helpers.analyzeRequest(baseRequestResponse.getHttpService(), baseRequestResponse.getRequest())
        if self._callbacks.isInScope(reqInfo.getUrl()) and str(self._helpers.analyzeResponse(baseRequestResponse.getResponse()).getInferredMimeType()) in WHITELIST_MEMES:
            _HTTP = baseRequestResponse.getHttpService()
            print('ON : ' + str(reqInfo.getUrl()))
            _host = str(_HTTP.getProtocol()) + "://" + str(_HTTP.getHost())
            regex = re.compile(regex_str)
            res = self._blf(baseRequestResponse, regex, _host)
            if res and len(self.temp_urls) > 0:
                final_urls = self.temp_urls[:]
                self.temp_urls[:] = []
                return [CustomScanIssue(baseRequestResponse.getHttpService(),
                                        self._helpers.analyzeRequest(baseRequestResponse).getUrl(),
                                        [self._callbacks.applyMarkers(baseRequestResponse, None, None)],
                                        "Broken Link Hijacking", final_urls, "Low", _HTTP.getHost())]

    def consolidateDuplicateIssues(self, existingIssue, newIssue):
        if existingIssue.getIssueName() == newIssue.getIssueName():
            return -1
        return 0

    def extensionUnloaded(self):
        self._stdout.println("Extension was unloaded")


class CustomScanIssue(IScanIssue):
    def __init__(self, httpService, url, httpMessages, name, detail, severity, hostbased):
        self._httpService = httpService
        self._url = url
        self._httpMessages = httpMessages
        self._name = name
        self._detail = detail
        self._severity = severity
        self._hostbased = hostbased

    def getUrl(self):
        return self._url

    def getIssueName(self):
        return self._name

    def getIssueType(self):
        return 0

    def getSeverity(self):
        return self._severity

    def getConfidence(self):
        return "Certain"

    def getIssueBackground(self):
        pass

    def getRemediationBackground(self):
        pass

    def getIssueDetail(self):
        # Prefer external-origin broken links; fall back to same-origin ones.
        host_b = []
        for i in self._detail:
            if self._hostbased not in i:
                host_b.append(str(i))
        if len(host_b) > 0:
            final = '