├── .github
│   └── ISSUE_TEMPLATE
│       └── bug_report.md
├── .gitignore
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── bug_report.md
├── corruptlatin
│   └── corruptlatin.py
├── requirements.txt
└── spyonweb
    └── website_connections.py

--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/bug_report.md:
--------------------------------------------------------------------------------
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: ''
assignees: ''

---

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Desktop (please complete the following information):**
- OS: [e.g. iOS]
- Browser [e.g. chrome, safari]
- Version [e.g. 22]

**Smartphone (please complete the following information):**
- Device: [e.g. iPhone6]
- OS: [e.g. iOS8.1]
- Browser [e.g. stock browser, safari]
- Version [e.g. 22]

**Additional context**
Add any other context about the problem here.

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
bin/
env/
ENV/
env.bak/
include/
lib/
pyvenv.cfg
venv/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/

# bash
.sh

# misc.
.vscode
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
# Contributor Covenant Code of Conduct

## Our Pledge

In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment
include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or
  advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
  address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
  professional setting

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.

## Scope

This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at @gitcordier. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq

--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
- **Spaces or tabs**: Spaces
- **Indent**: 4
- **Width**:
    - code ≤ 79 + CR
    - comment ≤ 72 + CR

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2020 gitcordier

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# bellingcat

Code from [Bellingcat's guides](https://www.bellingcat.com/category/resources/how-tos/)

[corruptlatin](https://www.bellingcat.com/resources/how-tos/2018/10/22/corrupt-latin-orthography-revealed-corruption-kyrgyzstan/)
You may need to install selenium-server-standalone-3.14X.Y.jar from
[the selenium website](https://www.seleniumhq.org) and export its location in
your PATH before installing Selenium with pip.


[spyonweb](https://www.bellingcat.com/resources/2017/07/31/automatically-discover-website-connections-tracking-codes/)
This is the Python 2 code from [here](https://raw.githubusercontent.com/automatingosint/osint_public/master/trackingcodes/website_connections.py), ported to Python 3.

--------------------------------------------------------------------------------
/bug_report.md:
--------------------------------------------------------------------------------
---
name: Bug report
about: Create a report to help us improve
title: BUG
labels: ''
assignees: ''

---

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:

**Expected behavior**
A clear and concise description of what you expected to happen.

**Unexpected behavior**
A clear and concise description of what you **did not** expect to happen.

**Logs**
If applicable, export outputs [e.g. with > bug.txt] to help explain your problem.

**Versions**
- OS: [e.g. MacOS 10.15]
- Python version [e.g. Python 3.8.5]

**Additional context**
Add any other context about the problem here.
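A note on the Selenium setup mentioned in the README: when corruptlatin.py creates its driver with a plain `webdriver.Firefox()`, Selenium 3 mainly needs to be able to find a geckodriver binary; the standalone server JAR is only required for remote setups. Here is a minimal sketch of pointing the driver at an explicit binary. The geckodriver path is a hypothetical placeholder and this snippet is not part of the repository.

```python
from selenium import webdriver

# Hypothetical geckodriver location; adjust it to your machine, or put the
# directory on PATH and drop executable_path entirely.
GECKODRIVER = "/usr/local/bin/geckodriver"

# Selenium 3.x accepts executable_path; Selenium 4 uses a Service object instead.
driver = webdriver.Firefox(executable_path=GECKODRIVER)
driver.get("http://zakupki.gov.kg/popp/view/order/list.xhtml")
print(driver.title)
driver.quit()
```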
--------------------------------------------------------------------------------
/corruptlatin/corruptlatin.py:
--------------------------------------------------------------------------------
import pandas as pd
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select, WebDriverWait
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support import expected_conditions as EC

PATH_TO_EXCEL_FILE = ""  # your Excel file location

driver = webdriver.Firefox()
driver.get("http://zakupki.gov.kg/popp/view/order/list.xhtml")


def date_range(begin_date, end_date):
    # open the search pane and fill in the date range
    driver.find_element_by_xpath('//a[@onclick="SerachTabToggle()"]').click()

    inputElement = driver.find_element_by_id("tv1:begin_input")
    inputElement.send_keys(begin_date)
    inputElement = driver.find_element_by_id("tv1:end_input")
    inputElement.send_keys(end_date)

    inputElement.submit()


def total_pages():
    # show 50 results per page
    time.sleep(2)
    dropdown = Select(driver.find_element_by_name("j_idt104:j_idt105:table_rppDD"))
    dropdown.select_by_visible_text("50")

    # jump to the last page to read its number, then return to the first page
    time.sleep(2)
    driver.find_element_by_xpath('//a[@aria-label="Last Page"]').click()

    time.sleep(2)
    pages = driver.find_elements_by_xpath(
        "//a[@class='ui-paginator-page ui-state-default ui-corner-all']")
    last_page = int(pages[-1].text) + 1
    driver.find_element_by_xpath('//a[@aria-label="First Page"]').click()

    return last_page


date_range('01.01.2015', '01.06.2015')
total_pages = total_pages()
print("There are", total_pages, "pages. It will take approximately",
      int(total_pages*4/60), "minutes to scrape them.")


page = 1
number = []
government_agency = []
procurement_name = []
cost_expected = []
date_published = []

while page <= total_pages:
    time.sleep(2)
    nextpage = driver.find_element_by_link_text(str(page))
    nextpage.click()
    time.sleep(3)

    html = BeautifulSoup(driver.page_source, 'html.parser')
    name_box = html.find_all("td", attrs={"role": "gridcell"})

    # each cell carries its column label, so route the value to the right list
    for line in name_box:
        if '№\n\t\t\t ' in line.text:
            number.append(line.text[26: 41])
        elif "Наименование органзаци" in line.text:
            government_agency.append(line.text.replace("\nНаименование органзаци", ""))
        elif "Наименование закупки" in line.text:
            procurement_name.append(line.text.replace("\nНаименование закупки", ""))
        elif "Планируемая сумма" in line.text:
            cost_expected.append(line.text.replace("\nПланируемая сумма", ""))
        elif "дата опубликования" in line.text:
            date_published.append(line.text.replace("\nдата опубликования", ""))
    page += 1

columns = [('number', number), ('government_agency', government_agency),
           ('procurement_name', procurement_name), ('cost_expected', cost_expected),
           ('date_published', date_published)]
# DataFrame.from_items no longer exists in current pandas, so build from a dict
df = pd.DataFrame(dict(columns))

if df.duplicated().sum() != 0:
    print("There are", df.duplicated().sum(), 'duplicate rows. Please increase "time.sleep()"')

column = [value for value in df["procurement_name"]]
iscorruptiblelatin = []

for string in column:
    string_latin_cyr = False

    for word in string.split():
        for index, char1 in enumerate(word):
            if char1.upper() in 'АЕТУОНКХСВМ':
                if index != len(word) - 1:  # if not the last char - check the right
                    char2 = word[index+1]

                    if 1039
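corruptlatin.py is cut off above at the start of the neighbour test (`if 1039` ...). The idea in the Bellingcat guide is to flag procurement titles in which a Latin letter that looks identical to a Cyrillic one sits next to Cyrillic characters. Below is a standalone sketch of that check written from the surviving fragment; the function names, the return-on-first-hit behaviour, and the sample strings are my own choices for illustration, not the author's missing code.

```python
# Latin capitals that are visually identical to Cyrillic letters
# (А/A, Е/E, Т/T, У/Y, О/O, Н/H, К/K, Х/X, С/C, В/B, М/M).
LATIN_LOOKALIKES = set("AETYOHKXCBM")

def is_cyrillic(ch):
    # The basic Cyrillic letters run from U+0410 ('А') to U+044F ('я'),
    # i.e. code points 1040..1103 - the same threshold the fragment above uses.
    return 1039 < ord(ch) < 1104

def has_corrupt_latin(text):
    """Return True if any word mixes a Latin look-alike letter with Cyrillic
    neighbours, the tell-tale sign of a doctored procurement title."""
    for word in text.split():
        for i, ch in enumerate(word):
            if ch.upper() in LATIN_LOOKALIKES:
                left = word[i - 1] if i > 0 else ""
                right = word[i + 1] if i < len(word) - 1 else ""
                if (left and is_cyrillic(left)) or (right and is_cyrillic(right)):
                    return True
    return False

# A title spelt with a Latin 'e' inside a Cyrillic word is flagged;
# the all-Cyrillic spelling is not.
print(has_corrupt_latin("Рeмонт дорог"))   # True
print(has_corrupt_latin("Ремонт дорог"))   # False
```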
--------------------------------------------------------------------------------
/spyonweb/website_connections.py:
--------------------------------------------------------------------------------
#
# Clean up a tracking code before using it as a dictionary key.
#
def clean_tracking_code(tracking_code):

    # remove the trailing dash and number from a Google Analytics ID
    if tracking_code.count("-") > 1:
        return tracking_code.rsplit("-",1)[0]
    else:
        return tracking_code
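# Worked examples for clean_tracking_code() above (illustrative calls, not part
# of the original script):
#
#   clean_tracking_code("UA-1234567-2")          -> "UA-1234567"
#       Google Analytics IDs carry a per-property suffix; dropping it lets every
#       property of the same account collapse onto a single key.
#   clean_tracking_code("pub-9876543210123456")  -> returned unchanged, because
#       AdSense publisher IDs contain only one dash.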
Failed to reach site.") 58 | 59 | continue 60 | 61 | # extract the tracking codes 62 | extracted_codes = [] 63 | extracted_codes.extend(google_adsense_pattern.findall(response.content)) 64 | extracted_codes.extend(google_analytics_pattern.findall(response.content)) 65 | 66 | # loop over the extracted tracking codes 67 | for code in extracted_codes: 68 | 69 | # remove the trailing dash and number 70 | code = clean_tracking_code(code) 71 | 72 | if code.lower() not in tracking_codes: 73 | 74 | print("[*] Discovered: %s" % code.lower()) 75 | 76 | if code not in connections.keys(): 77 | connections[code] = [domain] 78 | else: 79 | connections[code].append(domain) 80 | 81 | 82 | return connections 83 | 84 | # 85 | # Send a request off to Spy On Web 86 | # 87 | def spyonweb_request(data,request_type="domain"): 88 | 89 | params = {} 90 | params['access_token'] = spyonweb_access_token 91 | 92 | response = requests.get(spyonweb_url+request_type+"/"+data,params=params) 93 | 94 | if response.status_code == 200: 95 | 96 | result = response.json() 97 | 98 | if result['status'] != "not_found": 99 | 100 | return result 101 | 102 | return None 103 | 104 | # 105 | # Function to check the extracted analytics codes with Spyonweb 106 | # 107 | def spyonweb_analytics_codes(connections): 108 | 109 | 110 | # use any found tracking codes on Spyonweb 111 | for code in connections: 112 | 113 | # send off the tracking code to Spyonweb 114 | if code.lower().startswith("pub"): 115 | 116 | request_type = "adsense" 117 | 118 | elif code.lower().startswith("ua"): 119 | 120 | request_type = "analytics" 121 | 122 | print("[*] Trying code: %s on Spyonweb." % code) 123 | 124 | results = spyonweb_request(code,request_type) 125 | 126 | if results: 127 | 128 | for domain in results['result'][request_type][code]['items']: 129 | 130 | print("[*] Found additional domain: %s" % domain) 131 | 132 | connections[code].append(domain) 133 | 134 | return connections 135 | 136 | # 137 | # Use Spyonweb to grab full domain reports. 

#
# Use Spyonweb to grab full domain reports.
#
def spyonweb_domain_reports(connections):

    # loop over all of the domains and request a domain report for each one
    tested_domains = []
    all_codes = list(connections.keys())  # copy the keys: new codes get added below

    for code in all_codes:

        for domain in connections[code]:

            if domain not in tested_domains:

                tested_domains.append(domain)

                print("[*] Getting domain report for: %s" % domain)

                results = spyonweb_request(domain)

                if results:

                    # loop over the adsense results
                    adsense = results['result'].get("adsense")

                    if adsense:

                        for found_code in adsense:

                            cleaned = clean_tracking_code(found_code)

                            if cleaned not in connections:
                                connections[cleaned] = []

                            for found_domain in adsense[found_code]['items'].keys():

                                if found_domain not in connections[cleaned]:
                                    print("[*] Discovered new domain: %s" % found_domain)
                                    connections[cleaned].append(found_domain)

                    # and the same for the analytics results
                    analytics = results['result'].get("analytics")

                    if analytics:

                        for found_code in analytics:

                            cleaned = clean_tracking_code(found_code)

                            if cleaned not in connections:
                                connections[cleaned] = []

                            for found_domain in analytics[found_code]['items'].keys():

                                if found_domain not in connections[cleaned]:
                                    print("[*] Discovered new domain: %s" % found_domain)
                                    connections[cleaned].append(found_domain)

    return connections


#
# Graph the connections so we can visualize them in Gephi or other tools.
#
def graph_connections(connections, domains, graph_file):

    graph = networkx.Graph()

    for connection in connections:

        # add the tracking code to the graph
        graph.add_node(connection, type="tracking_code")

        for domain in connections[connection]:

            # if it was one of our original domains we set the attribute appropriately
            if domain in domains:
                graph.add_node(domain, type="source_domain")
            else:
                # this would be a discovered domain so the attribute is different
                graph.add_node(domain, type="domain")

            # now connect the tracking code to the domain
            graph.add_edge(connection, domain)

    networkx.write_gexf(graph, graph_file)

    print("[*] Wrote out graph to %s" % graph_file)

    return


# build a domain list either from the file passed in
# or create a single-item list with the domain passed in
if args.file:

    # read as text so the domains come back as str, not bytes, under Python 3
    with open(args.file, "r") as fd:
        domains = fd.read().splitlines()

else:

    domains = [args.domain]

# extract the codes from the live domains
connections = extract_tracking_codes(domains)

if len(connections.keys()):

    # use Spyonweb to find connected sites via their tracking codes
    connections = spyonweb_analytics_codes(connections)

    # request full domain reports from Spyonweb to tease out any other connections
    connections = spyonweb_domain_reports(connections)

    # now create a graph of the connections
    graph_connections(connections, domains, args.graph)

else:

    print("[!] No tracking codes found!")
    sys.exit(0)


print("[*] Finished! Open %s in Gephi and have fun!" % args.graph)
--------------------------------------------------------------------------------
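To make the output of graph_connections() concrete, here is a small self-contained sketch of the code-to-domains dictionary the script builds and the GEXF file it hands to Gephi. All tracking codes and domains below are made-up sample values; only the dictionary shape and the networkx calls mirror the script.

```python
import networkx

# connections maps one tracking code to every domain it was seen on;
# these values are fabricated purely for illustration.
connections = {
    "UA-1234567": ["example.com", "example.org"],
    "pub-9876543210123456": ["example.com"],
}
source_domains = ["example.com"]  # what the user originally supplied

graph = networkx.Graph()

for code, domains in connections.items():
    graph.add_node(code, type="tracking_code")
    for domain in domains:
        node_type = "source_domain" if domain in source_domains else "domain"
        graph.add_node(domain, type=node_type)
        graph.add_edge(code, domain)

# Same output step as the script: a GEXF file that Gephi can open directly.
networkx.write_gexf(graph, "connections.gexf")
print(graph.number_of_nodes(), "nodes,", graph.number_of_edges(), "edges")
```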