├── LICENSE
├── README.md
└── website
    ├── .gitignore
    ├── README.md
    ├── blog
    │   ├── 2025-04-25-welcome.md
    │   ├── authors.yml
    │   └── tags.yml
    ├── docs
    │   ├── 01-introduction.md
    │   ├── 04-scripting-examples.md
    │   ├── 05-automation-pipelines.md
    │   ├── 06-tuning-ethics.md
    │   ├── 07-resources.md
    │   ├── advanced-guide.md
    │   ├── beginner-guide.md
    │   ├── contributing.md
    │   ├── extra-passive-sources.md
    │   ├── faq.md
    │   ├── installation
    │   │   ├── 03-01-env-setup.md
    │   │   ├── 03-02-linux.md
    │   │   ├── 03-03-windows.md
    │   │   ├── 03-04-docker.md
    │   │   └── 03-05-post-install.md
    │   ├── intermediate-guide.md
    │   ├── pythonosint101.md
    │   ├── scripts
    │   │   ├── README.md
    │   │   ├── all-in-one-passive-recon.md
    │   │   ├── archive-org-snapshots.md
    │   │   ├── domain-recon-combo.md
    │   │   ├── email-breach-checker.md
    │   │   ├── favicon-hash-lookup.md
    │   │   ├── github-user-osint.md
    │   │   ├── ipinfo-lookup.md
    │   │   ├── passive-metadata-harvester.md
    │   │   ├── pdf-bulk-metadata.md
    │   │   ├── phone-validator.md
    │   │   ├── reverse-image-search.md
    │   │   ├── shodan-host-analyzer.md
    │   │   ├── social-media-multi-profile.md
    │   │   ├── threat-intel-aggregator.md
    │   │   └── url-screenshotter.md
    │   ├── showcase.md
    │   ├── start-here.md
    │   └── tools
    │       ├── 02-01-frameworks.md
    │       ├── 02-02-domain-infra.md
    │       ├── 02-03-people-social.md
    │       ├── 02-04-threat-intel.md
    │       └── 02-05-emerging-tools.md
    ├── docusaurus.config.js
    ├── package.json
    ├── sidebars.js
    ├── src
    │   ├── components
    │   │   └── HomepageFeatures
    │   │       ├── index.js
    │   │       └── styles.module.css
    │   ├── css
    │   │   └── custom.css
    │   └── pages
    │       ├── index.js
    │       └── index.module.css
    └── static
        ├── .nojekyll
        └── img
            ├── Boost.svg
            ├── Community.svg
            ├── Time.svg
            ├── docusaurus-social-card.jpg
            ├── docusaurus.png
            ├── favicon.ico
            ├── logo.svg
            ├── undraw_docusaurus_mountain.svg
            ├── undraw_docusaurus_react.svg
            └── undraw_docusaurus_tree.svg
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2025 tegridy~~
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # python-OSINT-notes/notebook
2 | ## [a rough written "guide" compiled from years of ADHD brain notes - enjoy]
3 | ### ~tegridy
4 | 
5 | ## 1\. Introduction & Scope
6 | 
7 | Open-Source Intelligence (OSINT) is about collecting and interpreting publicly available data from websites, social networks, and other internet-connected resources. Python remains popular for OSINT thanks to its large ecosystem and active open-source community.
8 | 
9 | I've tried my best to make this python-OSINT-notebook up to date and relevant for 2025. It compiles a bunch of my ADHD-brain OSINT notes and streamlines some examples, focusing on Python open-source tools, libraries, and techniques for anyone interested in gathering and analyzing public data.
10 | 
11 | **WhatIsIt?**: The current version of my python-OSINT-notebook covers:
12 | 
13 | - A survey of notable Python OSINT tools and libraries, with current versions, main features, and GitHub or PyPI references.
14 | - Practical guidance on installing, configuring, and using these tools on different operating systems and container environments.
15 | - Code samples that show how to apply OSINT techniques in Python, including ways to customize or chain tools together.
16 | - Considerations for performance, security, and ethics when performing OSINT.
17 | - Future-oriented notes on Python advancements, AI integration, and evolving data access policies.
18 | 
19 | Not everything is 100%; this is a WIP rough guide, so always double-check and verify stuff before you install/run it (youalreadydothatyeah?) <3
20 | 
21 | -----
22 | 
23 | ## 2\. Core Tools & Libraries
24 | 
25 | 
26 | ### 2.1 OSINT Frameworks and Multi-Tool Platforms
27 | 
28 | **SpiderFoot** – [GitHub - smicallef/spiderfoot](https://github.com/smicallef/spiderfoot) - *OSINT Automation Platform*.
29 | SpiderFoot is a multi-module OSINT tool that scans various data sources (DNS, IP addresses, domain details, social media, dark web sites, etc.). It offers both a web-based interface and a CLI. The open-source version (v4.x) supports at least 200 modules and can export data in multiple formats (CSV, JSON, and others). SpiderFoot is under the MIT license and remains popular for mapping and correlating information about a target, such as a domain or an email address.
30 | 
31 | **Recon-ng** – [GitHub - lanmaster53/recon-ng](https://github.com/lanmaster53/recon-ng) - *Modular Reconnaissance Framework*.
32 | Recon-ng is a Python-based framework with a console interface inspired by Metasploit. It includes a marketplace of modules for collecting details about domains, IP ranges, people, and more. Many modules rely on API keys for services like Shodan, Bing, or GitHub.
Recon-ng organizes collected info in a database and lets you export final results in a variety of formats.
33 | 
34 | **DataSploit** – [GitHub - datasploit/datasploit](https://github.com/datasploit/datasploit) - *Automated OSINT Framework*.
35 | DataSploit focuses on automating recon tasks for various targets—domains, emails, phone numbers, etc.—to gather data from multiple sources. It can generate HTML or JSON reports. Its development slowed in recent years, so if you're interested, check for updated forks or community-maintained versions. The framework's main value is its ability to do one-stop OSINT across multiple data points.
36 | 
37 | **OSRFramework** – *Open Sources Research*.
38 | OSRFramework includes several Python tools (usufy.py, mailfy.py, searchfy.py) that check usernames on popular sites, search for email address usage, or do DNS lookups. It also includes a console tool. It's installable via pip, and each sub-tool can be run independently or used as part of a workflow.
39 | 
40 | ### 2.2 Domain and Infrastructure OSINT Tools
41 | 
42 | **theHarvester** – [GitHub - laramies/theHarvester](https://github.com/laramies/theHarvester) - *Search Aggregation for Domains*.
43 | theHarvester queries search engines (Google, Bing, DuckDuckGo, etc.) and other databases (PGP key servers, DNS) to find emails, subdomains, and hosts linked to a target domain. It's included in Kali Linux and can be installed via pip. It's often used early in a domain recon process to see what quick results come up from public sources.
44 | 
45 | **Metagoofil** – *Metadata Extraction from Public Files*.
46 | Metagoofil searches for publicly accessible documents related to a domain (PDF, DOCX, PPT) and downloads them to extract metadata such as creator names or software versions. Although it's an older tool, the concept of scanning for metadata remains relevant for OSINT.
In practice, you can replicate some of its behavior with Python libraries like pypdf or ExifTool wrappers.
47 | 
48 | **Sublist3r** – [GitHub - aboul3la/Sublist3r](https://github.com/aboul3la/Sublist3r) & **dnsrecon** – *Subdomain Enumeration*.
49 | Sublist3r enumerates subdomains by checking various search engines and DNS queries, while dnsrecon performs DNS lookups, zone transfers, and more. They're both Python-based and commonly used in the reconnaissance phase to map out an organization's subdomains.
50 | 
51 | **Shodan (Python Library)** – [PyPI - shodan](https://pypi.org/project/shodan/) - *Internet-Device Search*.
52 | Shodan is a search engine that scans the internet for open ports and device banners. Its official Python library offers an easy interface for the Shodan API. You can query by hostname, IP, or service and retrieve detailed port, banner, and vulnerability data without doing direct scanning yourself.
53 | 
54 | **Censys (Python Library)** – [PyPI - censys](https://pypi.org/project/censys/) - *Internet Assets Search*.
55 | Censys is another internet-wide scanning platform with a Python library that provides structured data on IPv4 hosts, certificates, and domains. It supports more in-depth queries on SSL certificates and has an API similar in concept to Shodan's. Both services require an API key for API access, and the free tiers come with strict query limits.
56 | 
57 | ### 2.3 People and Social Media OSINT Tools
58 | 
59 | **Sherlock** – [GitHub - sherlock-project/sherlock](https://github.com/sherlock-project/sherlock) - *Find Usernames Across Platforms*.
60 | Sherlock checks whether a given username exists on hundreds of social media sites and other platforms. It doesn't rely on official APIs for most sites but instead checks URLs or account pages. The result is a list of confirmed or unconfirmed accounts, which helps you map a person's online presence.
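The core trick behind Sherlock-style tools is simple enough to sketch yourself: format the username into each site's profile-URL pattern, then probe each URL and see whether the page exists. The site list below is purely illustrative (Sherlock ships a much larger, curated one), and a real scan would call `check_url` on every candidate:

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

# Illustrative URL patterns only; Sherlock maintains a far larger curated list
SITE_PATTERNS = {
    "GitHub": "https://github.com/{}",
    "Reddit": "https://www.reddit.com/user/{}",
    "GitLab": "https://gitlab.com/{}",
}

def candidate_urls(username):
    """Build the profile URLs to probe for a given username."""
    return {site: pattern.format(username) for site, pattern in SITE_PATTERNS.items()}

def check_url(url, timeout=5):
    """Return True if the profile page answers with HTTP 200 (needs network access)."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except (HTTPError, URLError):
        return False

if __name__ == "__main__":
    for site, url in candidate_urls("johnsmith").items():
        print(f"{site}: {url}")  # a real scan would follow up with check_url(url)
```

Note that a bare 200/404 check is exactly why such tools produce false positives: some sites return 200 for any profile path, which is why Sherlock layers per-site detection rules on top.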
61 | 
62 | **Social Analyzer** – [GitHub - qeeqbox/social-analyzer](https://github.com/qeeqbox/social-analyzer) - *Profiles & Intelligence Aggregator*.
63 | Social Analyzer searches up to 1000+ platforms by username, email, phone, or full name. It also has modules that try to confirm whether a discovered profile truly belongs to the target. It can be run as a CLI, library, or through a web interface (with Docker support), making it handy for large-scale persona investigations.
64 | 
65 | **Holehe** – [GitHub - megadose/holehe](https://github.com/megadose/holehe) - *Email to Registered Accounts*.
66 | Holehe checks if an email address is registered on about 120 platforms by automating the "forgot password" function, which often reveals if an account exists. This method is passive in the sense that it doesn't send reset links to the email owner but relies on the site's response. Holehe can be run from the command line or integrated into a Python script (using async calls).
67 | 
68 | **Snscrape** – [GitHub - JustAnotherArchivist/snscrape](https://github.com/JustAnotherArchivist/snscrape) - *Social Network Scraping*.
69 | Snscrape collects posts and user data from platforms like Twitter (X), Reddit, Instagram, Telegram channels, and so on, often without official API keys. It can be used as a CLI tool or a Python library to gather recent or historical posts. Because it scrapes publicly available data, it's a good alternative when an official API is unavailable or limited, though platform changes (notably at Twitter/X) regularly break scrapers, so check the project's issue tracker before relying on it.
70 | 
71 | **Instaloader** – [GitHub - instaloader/instaloader](https://github.com/instaloader/instaloader) - *Instagram Media & Metadata*.
72 | Instaloader is a Python package that retrieves posts, metadata, and stories from public or (with login) private Instagram accounts you follow. You can use it to download images, captions, and other profile data for OSINT research.
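In practice you will run several of these people-focused tools against the same target, and each reports profile URLs in its own shape. A tiny, stdlib-only normalizer (my own illustrative helper, not part of any of the tools above) makes it easy to merge and de-duplicate their findings:

```python
from urllib.parse import urlparse

def normalize_profile(url):
    """Reduce a profile URL to a (platform, handle) pair for de-duplication."""
    parts = urlparse(url.strip().lower())
    platform = parts.netloc.removeprefix("www.")  # treat www.github.com == github.com
    handle = parts.path.strip("/").split("/")[-1]
    return platform, handle

def merge_findings(*result_lists):
    """Merge profile URLs reported by several tools, keyed by (platform, handle)."""
    merged = {}
    for results in result_lists:
        for url in results:
            # Keep the first URL seen for each normalized identity
            merged.setdefault(normalize_profile(url), url)
    return merged

if __name__ == "__main__":
    # Hypothetical outputs from two different tools for the same persona
    sherlock_hits = ["https://github.com/johndoe", "https://www.reddit.com/user/johndoe/"]
    analyzer_hits = ["https://GitHub.com/johndoe", "https://gitlab.com/johndoe"]
    for (platform, handle), url in merge_findings(sherlock_hits, analyzer_hits).items():
        print(f"{platform}: {handle} ({url})")
```

Keying on `(platform, handle)` rather than the raw string catches the common case where two tools report the same account with different capitalization, trailing slashes, or `www.` prefixes.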
73 | 
74 | ### 2.4 Threat Intelligence
75 | 
76 | **Maltego Transforms in Python**
77 | Maltego is a graphical OSINT tool (proprietary but with a community edition) that uses "transforms" to pull data from various sources. These transforms can be written in Python, allowing developers to integrate custom data feeds or specialized logic into Maltego's graph-based interface.
78 | 
79 | **MISP / OpenCTI**
80 | These threat intelligence platforms provide APIs and Python libraries (like PyMISP) to store and share OSINT findings in a structured format. They let security teams collaborate, correlate threat indicators, and enrich data with community-fed intelligence. While not purely OSINT tools, they can serve as repositories for discovered indicators.
81 | 
82 | **Dark Web Monitoring**
83 | OSINT can involve scanning onion sites for mentions of a target or checking for leaked data in underground forums. Python scripts can reach the Tor network using **Stem** (to control a Tor process) together with a SOCKS-aware HTTP client, or lean on specialized platforms like OnionScan (though OnionScan itself is in Go) to gather info. It typically requires custom code to handle captcha or forum logins.
84 | 
85 | -----
86 | 
87 | ## 3\. Installation & Configuration
88 | 
89 | This section explains how to install and configure Python-based OSINT tools on Linux, Windows, macOS, and Docker. It also touches on setting up API keys and verifying your setup.
90 | 
91 | ### 3.1 General Environment Setup
92 | 
93 | - **Python Version:** Most tools need Python 3.7 or newer; Python 3.10+ is recommended.
94 | - **Virtual Environments:** Creating a `venv` (or using Anaconda) avoids library conflicts.
95 | - **Common Dependencies:** Many OSINT libraries rely on `requests`, `beautifulsoup4`, or `dnspython` (imported as `dns.resolver`). If installing from `requirements.txt`, dependencies usually get installed automatically. Tools like **snscrape** might need system libraries (e.g., `libxml2-dev` on Linux) for `lxml`.
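Before installing tool after tool, it helps to confirm the environment itself is sane. This is a small sketch of my own (the `REQUIRED` mapping is illustrative; swap in whatever your stack actually needs) that reports an old interpreter or missing dependencies:

```python
import importlib.util
import sys

# Packages commonly needed by the tools in this guide; adjust the list to your stack.
# Maps PyPI package name -> top-level module name (they often differ).
REQUIRED = {"requests": "requests", "beautifulsoup4": "bs4", "dnspython": "dns"}

def check_environment(min_version=(3, 10)):
    """Return a list of problems: too-old Python or missing OSINT dependencies."""
    problems = []
    if sys.version_info < min_version:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python {min_version[0]}.{min_version[1]}+ recommended, found {found}")
    for package, module in REQUIRED.items():
        # find_spec returns None when the module cannot be imported
        if importlib.util.find_spec(module) is None:
            problems.append(f"missing package: {package} (try `pip install {package}`)")
    return problems

if __name__ == "__main__":
    issues = check_environment()
    print("environment OK" if not issues else "\n".join(issues))
```

Run it once inside each fresh `venv`; it is much quicker than discovering a missing `lxml` halfway through a scan.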
96 | 
97 | ### 3.2 Installing on Linux
98 | 
99 | - On **Kali Linux**, tools like `theHarvester`, `recon-ng`, or **SpiderFoot** might be pre-installed or installable via `apt`.
100 | - For the latest versions or other distros (Ubuntu, Debian, etc.), you can typically install them from pip or clone from GitHub.
101 | - Examples:
102 |   - `git clone https://github.com/lanmaster53/recon-ng.git && cd recon-ng && pip install -r REQUIREMENTS` (then run `./recon-ng`).
103 |   - `git clone https://github.com/smicallef/spiderfoot.git && cd spiderfoot && pip install -r requirements.txt` (run `python sf.py`).
104 | - If a tool calls external commands (e.g., `nmap`, `whatweb`), also install those from your package manager.
105 | 
106 | ### 3.3 Installing on Windows
107 | 
108 | - Install Python 3 from the [official site](https://www.python.org/downloads/) or Microsoft Store.
109 | - Use pip to install the tools as on Linux. Some OS commands might not be available, so you may need **WSL** (Windows Subsystem for Linux) or additional Windows ports of Linux utilities.
110 | - Tools like SpiderFoot or recon-ng run fine in Windows if dependencies are in place. Sherlock is also straightforward.
111 | - For more stability, consider installing and running the tools under WSL2 (Ubuntu on Windows), which behaves like a Linux environment.
112 | 
113 | ### 3.4 Installing via Docker
114 | 
115 | - Many OSINT projects have official or community Docker images.
116 | - For example, `docker run -p 5001:5001 smicallef/spiderfoot` runs SpiderFoot in a container, exposing its web UI at localhost:5001.
117 | - Sherlock, Social Analyzer, and others often have Dockerfiles. This approach isolates them from your main system and simplifies dependency management.
118 | - Docker Compose can run multiple OSINT containers in one go (e.g., Social Analyzer + MongoDB + a web interface).
119 | 
120 | ### 3.5 Post-Installation Configuration
121 | 
122 | - **API Keys:** Tools like recon-ng, SpiderFoot, or Shodan require keys for certain modules. Check each tool's config to add them.
123 | - **Tor/Proxies:** If you want anonymity or access to .onion sites, install Tor and configure the tool's proxy settings (e.g., `--tor` in Sherlock).
124 | - **Databases:** Some frameworks rely on MongoDB or SQLite to store results. Follow their docs if you need persistent storage.
125 | 
126 | ### 3.6 Special Considerations (macOS, Cloud, etc.)
127 | 
128 | - On **macOS**, installation is similar to Linux. Use Homebrew for system libs if needed.
129 | - **Cloud Deployment:** You can spin up a Linux VM or container in AWS/Azure and run these tools for scheduled recon tasks. Make sure you secure your API keys and handle usage limits.
130 | - **Containers in Cloud:** Docker on cloud platforms can orchestrate advanced OSINT pipelines.
131 | 
132 | -----
133 | 
134 | ## 4\. Custom Stuff
135 | 
136 | In this section, we walk through typical ways to run these OSINT tools, plus Python snippets for more advanced usage or automation.
137 | 
138 | ### 4.1 Using OSINT Tools via CLI
139 | 
140 | - **theHarvester** example:
141 |   `theHarvester -d example.com -l 100 -b bing`
142 |   Collects up to 100 results from Bing for any mention of `example.com`, returning emails and subdomains.
143 | 
144 | - **Recon-ng** example:
145 |   1. `workspaces create test1`
146 |   2. `add domains example.com`
147 |   3. `marketplace install hackertarget`
148 |   4. `load recon/domains-hosts/hackertarget`
149 |   5. `run`
150 |   6. `show hosts`
151 | 
152 | - **SpiderFoot** example:
153 |   `python sf.py -l 127.0.0.1:5001` starts the web UI; then open http://127.0.0.1:5001 and set up a scan.
154 |   You can also run headless scans via CLI scripts (`sfcli.py`).
155 | 
156 | - **Sherlock** example:
157 |   `sherlock alicebob --timeout 5 --csv`
158 |   Outputs discovered accounts to a CSV file.
159 | 
160 | - **Holehe** example:
161 |   `holehe target@example.com`
162 |   Checks if `target@example.com` is registered on known platforms.
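Every CLI example above can also be driven from Python via `subprocess`, which is how you chain tools into one run. A thin wrapper with a timeout keeps a stuck scan from hanging the whole pipeline (the Sherlock invocation in the demo mirrors the CLI example and assumes `sherlock` is on your PATH):

```python
import shlex
import subprocess

def run_tool(cmd, timeout=300):
    """Run an external OSINT CLI, capturing stdout; return None on failure or timeout."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return None  # tool missing from PATH, or it ran too long
    return result.stdout if result.returncode == 0 else None

if __name__ == "__main__":
    # Same invocation as the Sherlock CLI example above; requires sherlock on PATH
    output = run_tool(shlex.split("sherlock alicebob --timeout 5 --csv"))
    print(output if output is not None else "sherlock not available or returned an error")
```

Returning `None` instead of raising keeps a multi-tool loop simple: skip whatever failed, log it, and keep going.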
163 | 
164 | ### 4.2 Python Scripting with OSINT Libraries
165 | 
166 | **Shodan API**:
167 | 
168 | ```python
169 | import shodan
170 | 
171 | api = shodan.Shodan("YOUR_API_KEY")
172 | results = api.search("net:198.51.100.0/24")
173 | for match in results["matches"]:
174 |     print(match["ip_str"], match.get("ports", []))
175 | ```
176 | 
177 | **Censys API**:
178 | 
179 | ```python
180 | from censys.search import CensysCertificates
181 | 
182 | # Reads CENSYS_API_ID and CENSYS_API_SECRET from the environment
183 | cc = CensysCertificates()
184 | results = cc.search("parsed.names: example.com")
185 | for cert in results:
186 |     print(cert.get("parsed.names"))
187 | ```
188 | 
189 | **Using Sherlock in a Python Script** (subprocess approach):
190 | 
191 | ```python
192 | import subprocess, csv
193 | 
194 | # --csv makes Sherlock write results to johnsmith.csv in the working directory
195 | subprocess.run(["sherlock", "--csv", "johnsmith"], check=True)
196 | with open("johnsmith.csv", newline="") as fh:
197 |     for row in csv.DictReader(fh):  # column names may vary across Sherlock versions
198 |         if row.get("exist") == "Claimed":
199 |             print(f"Found username on {row['name']}: {row['url_user']}")
200 | ```
201 | 
202 | ### 4.3 Combining Tools in a Script
203 | 
204 | Here's a quick example that retrieves subdomains from crt.sh and then checks them with Shodan:
205 | 
206 | ```python
207 | import requests
208 | import shodan
209 | 
210 | domain = "example.com"
211 | res = requests.get(f"https://crt.sh/?q=%25{domain}&output=json")
212 | subdomains = set()
213 | for entry in res.json():
214 |     names = entry["name_value"].split("\n")
215 |     for name in names:
216 |         if name.endswith(domain):
217 |             subdomains.add(name)
218 | 
219 | api = shodan.Shodan("YOUR_KEY")
220 | for sub in list(subdomains)[:10]:
221 |     # Resolve subdomain to IP:
222 |     dns_info = requests.get(f"https://dns.google/resolve?name={sub}&type=A").json()
223 |     if "Answer" in dns_info:
224 |         ip = dns_info["Answer"][0]["data"]
225 |         try:
226 |             info = api.host(ip)
227 |             print(sub, ip, info.get("ports"))
228 |         except shodan.APIError:
229 |             pass
230 | ```
231 | 
232 | ### 4.4 OSINT Data Integration
233 | 
234 | Tools like **pandas** or **NetworkX** can help store or visualize OSINT results. For geolocation data, **Folium** can generate maps.
235 | 
236 | Example with NetworkX:
237 | 
238 | ```python
239 | import networkx as nx
240 | 
241 | G = nx.Graph()
242 | person = "John Doe"
243 | accounts = ["twitter/johndoe", "github/johndoe123"]
244 | G.add_node(person, type="person")
245 | for acc in accounts:
246 |     G.add_node(acc, type="account")
247 |     G.add_edge(person, acc)
248 | nx.write_graphml(G, "output.graphml")
249 | ```
250 | 
251 | -----
252 | 
253 | ### 4.5 Social Media Intelligence
254 | 
255 | 1. **Profile Enumeration**: Use Sherlock or Social Analyzer to find accounts across platforms.
256 | 2. **Data Extraction**: For Twitter, you can scrape with snscrape. For Instagram, Instaloader can download posts.
257 | 3. **Analysis**: Summaries, sentiment, entity extraction with libraries like spaCy.
258 | 4. **Custom Scraping**: If official APIs are restricted, you might need to automate a browser with Selenium or create site-specific scrapers.
259 | 
260 | ### 4.6 Network Reconnaissance and Asset Mapping
261 | 
262 | 1. **Find Domains & IP Ranges**: theHarvester, Sublist3r, and certificate logs for subdomain data.
263 | 2. **Search Shodan/Censys**: Skip direct scans; these services already index open ports.
264 | 3. **Tech Fingerprinting**: Tools like Wappalyzer (Python library) or Recon-ng modules.
265 | 4. **Output to CSV or a database**: Then filter relevant hosts. Possibly tie in vulnerability data from Vulners or the NVD.
266 | 
267 | ### 4.7 Automation & Pipelines
268 | 
269 | - Schedulers (cron, Windows Task Scheduler) can run OSINT tasks regularly.
- CI/CD or Docker Compose can orchestrate multiple containers that handle different parts of OSINT.
270 | - Custom dashboards with Flask or Django can present aggregated findings in real time.
271 | 
272 | -----
273 | 
274 | ## 5\. Misc.
275 | 
276 | ### 5.1 Tuning / Rate Limits
277 | 
278 | - **Concurrency**: Tools often spend time waiting on network responses. Using async or threading can speed things up.
279 | - **Caching**: Store query results to avoid re-fetching. For example, don't re-pull the same IP info from Shodan every day.
280 | - **Filtering Early**: Skip unneeded sources to save time if you have a large list of potential targets.
281 | - **Rate Limiting**: Insert sleeps or concurrency controls to avoid site bans.
282 | 
283 | ### 5.2 Security and Ethical Considerations (dontdodumbshitandblamethisreadmelol)
284 | 
285 | - **Legal Boundaries**: Stay within authorized use, avoid scraping that breaks ToS or tries to bypass paywalls.
286 | - **Ethics & Privacy**: Even if data is public, be cautious about how you store, share, or repackage it.
287 | - **Operational Security**: If investigations are sensitive, use Tor/VPN, separate accounts, or dedicated VMs.
288 | - **Data Protection**: Encrypt or properly store breach data, personal info, or anything sensitive.
289 | - **Log Activities**: Keep logs of what was collected to show that you only accessed public sources.
290 | 
291 | ## 6\. OSINT extra knowledge4u/links
292 | 
293 | - **Awesome OSINT**: [GitHub - jivoi/awesome-osint](https://github.com/jivoi/awesome-osint) – A large index of OSINT resources.
294 | - **Kali Linux Tools**: [Kali Linux Tools | Kali Linux](https://www.kali.org/tools/) – for theHarvester, recon-ng, and others pre-bundled with Kali Linux.
295 | - **Python Releases**: [Python Releases for Windows, macOS, Linux, Source code](https://www.python.org/downloads/) – for notes on Python 3.10+ and upcoming changes.
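The caching and rate-limiting advice from section 5.1 fits in a few lines of stdlib Python. This is my own minimal sketch (the wrapped `fetch` is any function of yours, e.g. a Shodan or crt.sh lookup): repeated keys come from the cache, and real requests are spaced at least `min_interval` seconds apart.

```python
import time

def throttled_cached(fetch, min_interval=1.0):
    """Wrap fetch(key) so results are cached and real calls are spaced out."""
    cache = {}
    last_call = [0.0]  # mutable cell so the closure can update it

    def wrapper(key):
        if key in cache:  # caching: never re-fetch the same key
            return cache[key]
        wait = min_interval - (time.monotonic() - last_call[0])
        if wait > 0:  # rate limiting: sleep between real requests
            time.sleep(wait)
        last_call[0] = time.monotonic()
        cache[key] = fetch(key)
        return cache[key]

    return wrapper

if __name__ == "__main__":
    lookup = throttled_cached(lambda ip: f"pretend API data for {ip}", min_interval=0.5)
    for ip in ["198.51.100.1", "198.51.100.1", "198.51.100.2"]:
        print(lookup(ip))  # the repeated IP is served from cache with no delay
```

For anything long-running, swap the in-memory dict for a JSON or SQLite cache so results survive restarts; the wrapper shape stays the same.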
296 | 
--------------------------------------------------------------------------------
/website/.gitignore:
--------------------------------------------------------------------------------
1 | # Dependencies
2 | /node_modules
3 | 
4 | # Production
5 | /build
6 | 
7 | # Generated files
8 | .docusaurus
9 | .cache-loader
10 | 
11 | # Misc
12 | .DS_Store
13 | .env.local
14 | .env.development.local
15 | .env.test.local
16 | .env.production.local
17 | 
18 | npm-debug.log*
19 | yarn-debug.log*
20 | yarn-error.log*
--------------------------------------------------------------------------------
/website/README.md:
--------------------------------------------------------------------------------
1 | # Website
2 | 
3 | This website is built using [Docusaurus](https://docusaurus.io/), a modern static website generator.
4 | 
5 | ### Installation
6 | 
7 | ```
8 | $ yarn
9 | ```
10 | 
11 | ### Local Development
12 | 
13 | ```
14 | $ yarn start
15 | ```
16 | 
17 | This command starts a local development server and opens up a browser window. Most changes are reflected live without having to restart the server.
18 | 
19 | ### Build
20 | 
21 | ```
22 | $ yarn build
23 | ```
24 | 
25 | This command generates static content into the `build` directory and can be served using any static content hosting service.
26 | 
27 | ### Deployment
28 | 
29 | Using SSH:
30 | 
31 | ```
32 | $ USE_SSH=true yarn deploy
33 | ```
34 | 
35 | Not using SSH:
36 | 
37 | ```
38 | $ GIT_USER=<Your GitHub username> yarn deploy
39 | ```
40 | 
41 | If you are using GitHub pages for hosting, this command is a convenient way to build the website and push to the `gh-pages` branch.
42 | 
--------------------------------------------------------------------------------
/website/blog/2025-04-25-welcome.md:
--------------------------------------------------------------------------------
1 | ---
2 | slug: welcome
3 | title: Welcome to the Python OSINT Notebook Blog
4 | authors: [tegridydev]
5 | date: 2025-04-25
6 | ---
7 | 
8 | This is the new blog for project updates, tutorials, and community stories. Stay tuned for more!
--------------------------------------------------------------------------------
/website/blog/authors.yml:
--------------------------------------------------------------------------------
1 | tegridydev:
2 |   name: tegridydev
3 |   title: Mechanistic Interpretability (MI) Research & Buildn' Cool Stuff @toolworks-dev
4 |   url: https://github.com/tegridydev
5 |   image_url: https://avatars.githubusercontent.com/u/131409024?v=4
6 |   page: true
7 |   socials:
8 |     github: tegridydev
--------------------------------------------------------------------------------
/website/blog/tags.yml:
--------------------------------------------------------------------------------
1 | docs:
2 |   label: docs
3 |   permalink: /docs
4 |   description: Docs
--------------------------------------------------------------------------------
/website/docs/01-introduction.md:
--------------------------------------------------------------------------------
1 | ---
2 | id: 01-introduction
3 | title: Introduction & Scope
4 | ---
5 | 
6 | # Introduction & Scope
7 | 
8 | Welcome to the **Python-OSINT-Notebook**, your comprehensive, up-to-date reference for performing **passive** Open-Source Intelligence (OSINT) using Python in 2025.
Whether you're a security researcher, threat analyst, journalist, or hobbyist, this guide will help you:
9 | 
10 | - Discover valuable publicly available data without triggering alerts or violating terms of service
11 | - Leverage a curated selection of Python libraries, tools, and workflows
12 | - Automate and schedule repeatable OSINT tasks safely
13 | 
14 | ---
15 | 
16 | ## Why Passive OSINT?
17 | 
18 | > **Passive** OSINT means gathering data from **public indices and archives**—certificate transparency logs, DNS history, web caches, search-engine results, and API-provided information—**without** sending direct probes (port scans, active fingerprinting, etc.). This approach minimizes legal and ethical risks while still revealing a wealth of actionable intelligence.
19 | 
20 | ---
21 | 
22 | ## How This Notebook Is Organized
23 | 
24 | 1. **Core Tools & Libraries**: Deep dives into major frameworks, domain-recon tools, social-media scrapers, and threat-intel platforms.
25 | 2. **Installation & Configuration**: Step-by-step setup on Linux, Windows, macOS, Docker, plus API key and proxy best practices.
26 | 3. **Scripting Examples**: Ready-to-use Python snippets for WHOIS lookups, DNS enumeration, certificate transparency, web metadata, and more.
27 | 4. **Automation & Pipelines**: Cron jobs, GitHub Actions, and Docker Compose recipes for scheduled, unattended data collection.
28 | 5. **Performance & Ethics**: Tips on async concurrency, caching, rate-limiting, and strict adherence to Terms of Service.
29 | 6. **Walkthrough Guides**: End-to-end OSINT101 guide and additional passive-only data sources.
30 | 7. **Resources & Further Reading**: Curated links to communities, datasets, and reference materials.
31 | 
32 | ---
33 | 
34 | ## Quick Links
35 | 
36 | - 👉 **Get Started**: [Installation & Environment Setup](/installation/03-01-env-setup)
37 | - 🧰 **Tool Survey**: [Core Tools & Libraries](/tools/02-01-frameworks)
38 | - 📝 **OSINT101 Walkthrough**: [PythonOSINT101](/pythonosint101)
39 | - 📚 **Additional Data Sources**: [Extra Passive Sources](/extra-passive-sources)
40 | 
41 | ---
42 | 
43 | ## Recommended External Resources
44 | 
45 | - **Awesome OSINT**: A massive index of OSINT tools & tutorials — https://github.com/jivoi/awesome-osint
46 | - **Kali Linux Tools**: Pre-bundled OSINT applications — https://kali.org/tools/
47 | - **Python 3 Documentation**: Official language reference — https://docs.python.org/3/
48 | - **Docusaurus Docs**: Learn how this site is built — https://docusaurus.io/docs
49 | - **Certificate Transparency Logs**: Real-time CT search — https://crt.sh
50 | 
51 | ---
52 | 
53 | > **Tip:** Clone this repository and browse interactively at http://localhost:3000 after running:
54 | > ```bash
55 | > cd website
56 | > npm install
57 | > npm run start
58 | > ```
59 | 
60 | Enjoy your passive, powerful, and ethical OSINT exploration with Python!
--------------------------------------------------------------------------------
/website/docs/04-scripting-examples.md:
--------------------------------------------------------------------------------
1 | ---
2 | id: 04-scripting-examples
3 | title: Python Scripting Examples
4 | ---
5 | 
6 | # Python Scripting Examples
7 | 
8 | This section provides ready-to-use Python snippets for common passive OSINT tasks: querying public indices, certificate transparency logs, and combining data sources into a simple workflow.
9 | 
10 | ---
11 | 
12 | ## Shodan API Lookup
13 | 
14 | Use the official Shodan Python client to query Shodan's indexed data without active scanning.
15 | 
16 | ```python
17 | import os
18 | import shodan
19 | 
20 | # Load your API key from an environment variable
21 | api = shodan.Shodan(os.getenv("SHODAN_API_KEY"))
22 | 
23 | # Search for hosts related to example.com
24 | for match in api.search("hostname:example.com")["matches"]:
25 |     ip = match["ip_str"]
26 |     ports = match.get("ports", [])
27 |     print(f"{ip} → open ports: {ports}")
28 | ```
29 | 
30 | ---
31 | 
32 | ## Censys Certificate Transparency
33 | 
34 | Query Censys's certificate database to enumerate certificates and extract covered domains.
35 | 
36 | ```python
37 | from censys.search import CensysCertificates
38 | 
39 | # Initialize client (uses CENSYS_API_ID & CENSYS_API_SECRET from env)
40 | cc = CensysCertificates()
41 | 
42 | # Search for certificates whose parsed names include example.com
43 | for cert in cc.search("parsed.names: example.com"):
44 |     names = cert.get("parsed", {}).get("names", [])
45 |     print("Certificate covers:", names)
46 | ```
47 | 
48 | ---
49 | 
50 | ## Combined Passive Workflow
51 | 
52 | Pull subdomains from crt.sh and enrich with Shodan metadata in one script:
53 | 
54 | ```python
55 | import os
56 | import requests
57 | import shodan
58 | 
59 | # Load API key
60 | api = shodan.Shodan(os.getenv("SHODAN_API_KEY"))
61 | 
62 | # 1. Fetch subdomains from crt.sh (name_value can hold several newline-separated names)
63 | resp = requests.get("https://crt.sh/?q=%25example.com&output=json")
64 | subdomains = {name for entry in resp.json() for name in entry["name_value"].splitlines()}
65 | 
66 | # 2. Query Shodan for each subdomain's resolved IP
67 | for sub in sorted(subdomains):
68 |     # Resolve via DNS-over-HTTPS
69 |     dns = requests.get(f"https://dns.google/resolve?name={sub}&type=A").json()
70 |     answers = dns.get("Answer", [])
71 |     if not answers:
72 |         continue
73 |     ip = answers[0]["data"]
74 |     try:
75 |         shodan_info = api.host(ip)
76 |         print(f"{sub} ({ip}): ports = {shodan_info.get('ports')}")
77 |     except shodan.APIError:
78 |         print(f"{sub} ({ip}): no Shodan data")
79 | ```
80 | 
81 | ---
82 | 
83 | ## Best Practices & Tips
84 | 
85 | - **Error Handling**: Wrap API calls in try/except to handle timeouts or rate limits.
86 | - **Concurrency**: Use `concurrent.futures.ThreadPoolExecutor` for parallel queries, respecting rate limits.
87 | - **Caching**: Store responses locally (e.g., JSON files) to avoid re-querying the same IPs.
88 | - **Environment Variables**: Keep API keys out of code; load them via `python-dotenv` or your shell.
89 | 
90 | ---
91 | 
--------------------------------------------------------------------------------
/website/docs/05-automation-pipelines.md:
--------------------------------------------------------------------------------
1 | ---
2 | id: 05-automation-pipelines
3 | title: Automation & Pipelines
4 | ---
5 | 
6 | # Automation & Pipelines
7 | 
8 | Once your scripts run correctly, automate them to collect and process OSINT on a schedule. Below are examples for UNIX cron, Windows Task Scheduler, and GitHub Actions.
9 | 
10 | ---
11 | 
12 | ## 1. Cron / Linux Task Scheduler
13 | 
14 | Use cron to run your `osint101.py` (or `passive_osint101.py`) daily at 2 AM:
15 | 
16 | ```cron
17 | # Edit with `crontab -e` and add:
18 | 0 2 * * * cd /path/to/project && /path/to/.venv/bin/python osint101.py >> logs/osint.log 2>&1
19 | ```
20 | 
21 | - **Redirect** stdout/stderr to a log file for auditing.
22 | - **cd** into the project root so relative paths resolve.
23 | 
24 | ---
25 | 
26 | ## 2. Windows Task Scheduler
27 | 
28 | 1. **Open** Task Scheduler.
29 | 2. **Create Basic Task** → Name: “OSINT Daily Run”. 30 | 3. **Trigger**: Daily at 03:00. 31 | 4. **Action**: Start a program: 32 | ``` 33 | Program/script: C:\path\to\.venv\Scripts\python.exe 34 | Add arguments: C:\path\to\osint101.py 35 | Start in: C:\path\to\project 36 | ``` 37 | 5. **Finish** and **enable** the task. 38 | 6. **Optional**: Under Settings, check “Run task as soon as possible after a scheduled start is missed.” 39 | 40 | --- 41 | 42 | ## 3. Docker + Cron in Container 43 | 44 | Run your script inside a lightweight Alpine container: 45 | 46 | ```dockerfile 47 | # Dockerfile-osint 48 | FROM python:3.10-alpine 49 | WORKDIR /app 50 | COPY requirements.txt osint101.py ./ 51 | RUN pip install --no-cache-dir -r requirements.txt 52 | COPY logs/ ./logs/ 53 | # Add cron and your script 54 | RUN apk add --no-cache curl 55 | COPY crontab /etc/crontabs/root 56 | CMD ["crond", "-f", "-l", "8"] 57 | ``` 58 | 59 | `crontab` file: 60 | ```cron 61 | 0 4 * * * cd /app && python osint101.py >> logs/osint.log 2>&1 62 | ``` 63 | 64 | Build and run: 65 | ```bash 66 | docker build -t osint-cron -f Dockerfile-osint . 67 | docker run -d --name osint-cron osint-cron 68 | ``` 69 | 70 | --- 71 | 72 | ## 4. GitHub Actions 73 | 74 | Leverage GitHub Actions to run OSINT on a schedule and store outputs in the repo or an artifact. 
75 | 76 | ```yaml 77 | # .github/workflows/osint.yml 78 | name: OSINT Daily Run 79 | 80 | on: 81 | schedule: 82 | - cron: '0 5 * * *' # UTC time 83 | 84 | jobs: 85 | run-osint: 86 | runs-on: ubuntu-latest 87 | 88 | steps: 89 | - name: Checkout code 90 | uses: actions/checkout@v3 91 | 92 | - name: Set up Python 93 | uses: actions/setup-python@v4 94 | with: 95 | python-version: '3.10' 96 | 97 | - name: Install dependencies 98 | run: | 99 | python -m pip install --upgrade pip 100 | pip install -r requirements.txt 101 | 102 | - name: Run OSINT script 103 | run: | 104 | mkdir -p results 105 | python osint101.py > results/osint_report_$(date +'%Y%m%d').csv 106 | 107 | - name: Upload results as artifact 108 | uses: actions/upload-artifact@v3 109 | with: 110 | name: osint-report-${{ github.run_id }} 111 | path: results/osint_report_*.csv 112 | ``` 113 | 114 | - **Artifacts**: Store generated reports for later review. 115 | - **Secrets**: Add API keys in GitHub repo Settings → Secrets and reference them via `${{ secrets.SHODAN_API_KEY }}` in your workflow. 116 | 117 | --- 118 | 119 | ## 5. Monitoring & Alerting 120 | 121 | Consider integrating with Slack or email notifications: 122 | 123 | ```yaml 124 | # snippet in GitHub Actions 125 | - name: Send Slack notification 126 | uses: 8398a7/action-slack@v3 127 | with: 128 | status: ${{ job.status }} 129 | fields: repo,commit,author 130 | env: 131 | SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }} 132 | ``` 133 | 134 | Or use a simple Python SMTP snippet at the end of your script to email yourself if anomalies are detected.
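That SMTP idea can be sketched with only the standard library. The addresses and SMTP host below are placeholders, not project defaults; fill in your own before enabling the send step:

```python
import smtplib
from email.message import EmailMessage

def build_alert(subject: str, body: str) -> EmailMessage:
    """Assemble the notification email (addresses are placeholders)."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = "osint-bot@example.com"   # placeholder sender
    msg["To"] = "you@example.com"           # placeholder recipient
    msg.set_content(body)
    return msg

def send_alert(msg: EmailMessage, host: str = "localhost", port: int = 25) -> None:
    """Deliver the message; only call this when an anomaly check trips."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.send_message(msg)

# Example: build the alert now, send it only when your script flags something
msg = build_alert("OSINT anomaly", "3 new subdomains appeared since yesterday")
# send_alert(msg)  # uncomment once your SMTP host and addresses are set
```

Keeping `build_alert` separate from `send_alert` makes the message easy to unit-test without touching a mail server.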
135 | 136 | --- 137 | -------------------------------------------------------------------------------- /website/docs/06-tuning-ethics.md: -------------------------------------------------------------------------------- 1 | --- 2 | id: 06-tuning-ethics 3 | title: Performance Tuning & Ethical Considerations 4 | --- 5 | 6 | # Performance Tuning 7 | 8 | Efficient OSINT pipelines balance speed with reliability. Below are key strategies and code examples. 9 | 10 | ## 1. Concurrency 11 | 12 | - **Threading** for I/O-bound tasks (HTTP requests, DNS lookups) 13 | - **Asyncio** for large-scale async I/O 14 | 15 | ### ThreadPoolExecutor Example 16 | 17 | ```python 18 | from concurrent.futures import ThreadPoolExecutor, as_completed 19 | import requests 20 | 21 | domains = ['example.com', 'openai.com', 'github.com'] 22 | 23 | def fetch_whois(domain): 24 | resp = requests.get(f'https://api.example-whois.com/{domain}') 25 | return domain, resp.json() 26 | 27 | with ThreadPoolExecutor(max_workers=5) as exe: 28 | futures = [exe.submit(fetch_whois, d) for d in domains] 29 | for fut in as_completed(futures): 30 | domain, data = fut.result() 31 | print(domain, data.get('registrar')) 32 | ``` 33 | 34 | ### Asyncio + HTTPX Example 35 | 36 | ```python 37 | import asyncio 38 | import httpx 39 | 40 | async def fetch_ct(domain): 41 | url = f'https://crt.sh/?q=%25{domain}&output=json' 42 | async with httpx.AsyncClient(timeout=10) as client: 43 | r = await client.get(url) 44 | return domain, r.json() 45 | 46 | async def main(domains): 47 | tasks = [fetch_ct(d) for d in domains] 48 | for coro in asyncio.as_completed(tasks): 49 | domain, entries = await coro 50 | print(domain, len(entries), 'certs') 51 | 52 | asyncio.run(main(['example.com','openai.com'])) 53 | ``` 54 | 55 | --- 56 | 57 | ## 2. 
Caching 58 | 59 | Avoid redundant API calls or web requests by caching results: 60 | 61 | ```python 62 | import requests_cache 63 | 64 | # Cache responses in SQLite for 1 hour 65 | session = requests_cache.CachedSession('osint_cache', expire_after=3600) 66 | 67 | def get_subdomains(domain): 68 | resp = session.get( 69 | f'https://api.securitytrails.com/v1/domain/{domain}/subdomains', 70 | headers={'APIKEY': 'YOUR_KEY'} 71 | ) 72 | return resp.json().get('subdomains', []) 73 | ``` 74 | 75 | Alternatively, use **Redis** for distributed caching in multi-process setups. 76 | 77 | --- 78 | 79 | ## 3. Rate Limiting 80 | 81 | Respect provider limits to avoid bans or API throttling. 82 | 83 | ### Using `ratelimit` Decorator 84 | 85 | ```python 86 | from ratelimit import limits, sleep_and_retry 87 | 88 | # 30 calls per minute 89 | ONE_MINUTE = 60 90 | 91 | @sleep_and_retry 92 | @limits(calls=30, period=ONE_MINUTE) 93 | def query_shodan(query): 94 | return shodan_client.search(query)  # assumes an initialized shodan.Shodan client 95 | 96 | # Calls to query_shodan now sleep automatically to respect the rate limit 97 | ``` 98 | 99 | Or manually insert delays: 100 | 101 | ```python 102 | import time 103 | 104 | for domain in domains: 105 | process(domain) 106 | time.sleep(2)  # 2-second pause between requests 107 | ``` 108 | 109 | --- 110 | 111 | ## 4. Logging & Monitoring 112 | 113 | - **Structured Logging**: Use the `logging` module with a JSON formatter for easy parsing. 114 | - **Metrics**: Track success/failure counts, API call rates, and latencies. 115 | - **Alerts**: Integrate with Slack or email on errors or threshold breaches. 116 | 117 | ```python 118 | import logging 119 | logging.basicConfig( 120 | format='%(asctime)s %(levelname)s %(message)s', 121 | level=logging.INFO 122 | ) 123 | logging.info('Starting passive OSINT scan for %s', domain) 124 | ``` 125 | 126 | --- 127 | 128 | # Ethical Considerations 129 | 130 | Even passive OSINT carries responsibilities.
Follow these guidelines to stay within legal and ethical boundaries. 131 | 132 | ## 1. Legal & Terms of Service 133 | 134 | - **Review ToS** of each site or API: many forbid automated scraping. 135 | - **Copyright & Privacy Laws**: Respect jurisdictional regulations (e.g., GDPR, CCPA). 136 | - **No Unauthorized Access**: Stick strictly to **public** data; do not log in or bypass paywalls. 137 | 138 | ## 2. Robots.txt & Crawl Policies 139 | 140 | Honor `robots.txt` directives—even if not legally binding, it reflects site owner preferences. 141 | 142 | ```python 143 | import urllib.robotparser 144 | 145 | rp = urllib.robotparser.RobotFileParser() 146 | rp.set_url('https://example.com/robots.txt') 147 | rp.read() 148 | if rp.can_fetch('*', '/path'): 149 | # safe to proceed 150 | pass 151 | else: 152 | # skip scraping this path 153 | pass 154 | ``` 155 | 156 | ## 3. Anonymity & OpSec 157 | 158 | - Use **Tor** or **residential proxies** to avoid IP blocks and protect privacy. 159 | - Separate investigative identities from personal accounts. 160 | - Use **dedicated VMs** or containers for sensitive research. 161 | 162 | ## 4. Data Protection 163 | 164 | - **Encrypt** sensitive findings at rest (e.g., PGP, disk encryption). 165 | - **Access Controls**: Restrict who can view collected intelligence. 166 | - **Retention Policies**: Delete or anonymize data once it’s no longer needed. 167 | 168 | ## 5. Transparency & Accountability 169 | 170 | - **Log Sources & Timestamps**: Maintain audit trails for all data collection. 171 | - **Report Responsibly**: If you discover vulnerabilities or breaches, follow responsible disclosure guidelines. 172 | - **Community Standards**: Engage with OSINT communities (e.g., Tactical Tech, OSINT Curious) to share best practices. 173 | 174 | --- 175 | 176 | > **Remember:** Ethical OSINT builds trust and credibility. Always prioritize legality, privacy, and respect for site owners while pursuing intelligence gathering. 
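A small companion to the advice above: the tuning half of this page recommends JSON-formatted logs, and the ethics half asks for audit trails with sources and timestamps. The two combine in a few lines of standard-library code; the field names here are our own convention, not a standard:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line (easy to parse later)."""
    def format(self, record):
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "msg": record.getMessage(),
            "source": getattr(record, "source", None),  # where the data came from
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
audit = logging.getLogger("osint.audit")
audit.addHandler(handler)
audit.setLevel(logging.INFO)

# Record what was collected, from where, and when
audit.info("fetched crt.sh records", extra={"source": "https://crt.sh"})
```

One JSON object per line means the audit trail can later be filtered with `jq` or loaded straight into pandas.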
177 | -------------------------------------------------------------------------------- /website/docs/07-resources.md: -------------------------------------------------------------------------------- 1 | --- 2 | id: 07-resources 3 | title: Additional Resources 4 | --- 5 | 6 | # Resources & Further Reading 7 | 8 | Below are curated portals, datasets, and tool repositories to deepen your passive OSINT research. 9 | 10 | --- 11 | 12 | ## Curated Tool Lists 13 | 14 | - **Awesome OSINT** 15 | A community-maintained, _curated list_ of hundreds of OSINT tools and resources, organized by category (search engines, social media, code search, geospatial, and more). Ideal for discovering new utilities and keeping track of updates. 16 | GitHub: https://github.com/jivoi/awesome-osint 17 | 18 | - **Kali Linux Tools** 19 | The official catalog of pre-installed and supported security and OSINT utilities in the Kali Linux distribution, including `theHarvester`, `recon-ng`, `nmap`, and dozens more. An excellent reference for combined toolsets and installation commands. 20 | Web UI: https://www.kali.org/tools/ 21 | 22 | --- 23 | 24 | ## Language & Platform Releases 25 | 26 | - **Python Releases** 27 | Official CPython distributions for Windows, macOS, Linux, and source code archives. Get the latest stable versions (3.11.x, 3.12.x, 3.13.x) along with release notes and cryptographic verification details. 28 | Download page: https://www.python.org/downloads/ 29 | 30 | --- 31 | 32 | ## Large-Scale Public Datasets 33 | 34 | - **Common Crawl** 35 | A **multi-petabyte** repository of web crawl data in WARC format, freely available via AWS S3 (`s3://commoncrawl/`) or HTTP(S) through CloudFront. Updated monthly, with archives dating back to 2008—perfect for historical web analysis and large-scale link graph construction.
36 | Overview: https://commoncrawl.org/overview 37 | 38 | - **GDELT (Global Database of Events, Language & Tone)** 39 | Monitors broadcast, print, and web news in 100+ languages to produce a real-time database of global events, sentiment, and network graphs. Accessible via CSV dumps, BigQuery, or the GDELT Analysis Service for querying and visualization. 40 | Project site: https://www.gdeltproject.org/ 41 | 42 | --- 43 | 44 | > **Tip:** Bookmark these pages and subscribe to their RSS or mailing lists to stay updated on new tool releases, dataset snapshots, and API changes. 45 | -------------------------------------------------------------------------------- /website/docs/advanced-guide.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Advanced Guide 3 | slug: /advanced-guide 4 | --- 5 | 6 | # Advanced Guide 7 | 8 | You’ve built your foundation — now it’s time to snap together advanced Lego sets! Here you’ll learn how to automate, scale, and ethically handle OSINT at a pro level. 9 | 10 | ## 🤖 Automation Pipelines 11 | - **Why automate?** 12 | - Manual OSINT is slow. Automation lets you cover more ground, faster. 13 | - Analogy: Like building a Lego conveyor belt that assembles blocks for you. 14 | - **Example: Automated domain footprinting** 15 | 16 | ```python 17 | import subprocess 18 | import sys 19 | # Run sublist3r to enumerate subdomains 20 | subdomains = subprocess.check_output([sys.executable, '-m', 'sublist3r', '-d', 'example.com']) 21 | print(subdomains.decode()) 22 | ``` 23 | - **Workflow tools**: Use cron jobs, Makefiles, or even GitHub Actions to run scripts on a schedule. 24 | 25 | ## 🕸️ Threat Intelligence 26 | - **What is it?** 27 | - Collecting, analyzing, and sharing info about cyber threats. 28 | - **Lego analogy**: Like building a radar tower to spot incoming threats.
29 | - **Python for threat intel**: 30 | - Parse threat feeds (STIX, MISP, AlienVault OTX) 31 | - Correlate indicators (IP, domain, hash) 32 | - **Example: Fetching threat indicators** 33 | 34 | ```python 35 | import requests 36 | url = 'https://otx.alienvault.com/api/v1/indicators/export' 37 | data = requests.get(url).text 38 | print(data[:500]) # show a sample 39 | ``` 40 | 41 | ## ⚖️ Ethics & Emerging Tools 42 | - **Ethical OSINT**: Always respect privacy, legality, and intent. 43 | - Don’t use OSINT for harassment, doxing, or illegal activity. 44 | - Analogy: Use your Lego blocks to build, not destroy. 45 | - **Emerging Tools**: 46 | - Maltego, SpiderFoot, Shodan, Censys, and more. 47 | - AI and ML for pattern recognition. 48 | 49 | ## 🧠 Pro Tips 50 | - Document everything! Use README files, diagrams, and code comments. 51 | - Share your Lego builds (scripts, workflows) with the community. 52 | - Stay up to date: Follow OSINT Twitter, Reddit, and GitHub repos. 53 | - Automate responsibly — test scripts on your own assets first. 54 | 55 | ## 🚀 Next Steps 56 | - Contribute your own automation scripts or threat feeds to the [Showcase](./showcase). 57 | - Suggest improvements or new guides in the [Contributing](./contributing) doc. 58 | - Stay curious, keep building, and help others level up! 59 | -------------------------------------------------------------------------------- /website/docs/beginner-guide.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Beginner Guide 3 | slug: /beginner-guide 4 | --- 5 | 6 | # Beginner Guide 7 | 8 | Welcome! This is your launchpad if you’re new to OSINT or Python. We’ll use simple building blocks, lots of analogies, and hands-on examples. 9 | 10 | ## 🕵️‍♂️ What is OSINT? 11 | - **Open Source Intelligence (OSINT)** is like being a digital detective using only publicly available info (websites, social media, docs, etc.). 
12 | - Think: Google searches, public records, social profiles, and more. 13 | 14 | ## 🐍 Python Basics for OSINT 15 | - Python is your “toolbox” — easy to learn, super flexible. 16 | - You don’t need to be a coder! Here’s a basic Lego block: 17 | 18 | ```python 19 | print("Hello, OSINT world!") 20 | ``` 21 | 22 | - You can run this with [Repl.it](https://replit.com/) or your terminal. 23 | 24 | ## 🦺 Safe OSINT Practices 25 | - Never break the law. Only use public info. 26 | - Use a VPN or privacy browser if you’re researching sensitive topics. 27 | - Respect privacy and ethics. Don’t stalk, harass, or dox anyone. 28 | 29 | ## 🧩 Lego Block Recipes 30 | - **Google Dorking** (finding hidden info with Google): 31 | ```text 32 | site:gov filetype:pdf "confidential" 33 | ``` 34 | - **Python: Download a webpage** 35 | ```python 36 | import requests 37 | r = requests.get('https://example.com') 38 | print(r.text) 39 | ``` 40 | - **Find all links on a page** 41 | ```python 42 | from bs4 import BeautifulSoup 43 | import requests 44 | url = 'https://example.com' 45 | soup = BeautifulSoup(requests.get(url).text, 'html.parser') 46 | for link in soup.find_all('a'): 47 | print(link.get('href')) 48 | ``` 49 | 50 | ## 🖼️ Visual Explainers 51 | - [ ] Add diagrams showing how info flows from the web to your screen 52 | - [ ] Add screenshots of tools in action 53 | 54 | ## 🚀 Next Steps 55 | - Try the code above! 56 | - Move on to the [Intermediate Guide](./intermediate-guide) when you’re ready for more. 57 | 58 | If you get stuck, check the [FAQ](./faq) or ask for help on GitHub. 59 | -------------------------------------------------------------------------------- /website/docs/contributing.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Contributing 3 | slug: /contributing 4 | --- 5 | 6 | # Contributing 7 | 8 | Want to help make this the best OSINT resource? You’re awesome! 
Here’s how you can contribute, step by step: 9 | 10 | ## 🧩 1. Find Something to Improve 11 | - Is a doc confusing? Suggest edits! 12 | - Have a better code snippet or tool? Add it! 13 | - Want to translate or make the site more accessible? Yes please! 14 | - Built a cool OSINT workflow? Share in the [Showcase](./showcase)! 15 | 16 | ## 🛠️ 2. How to Contribute 17 | - **Docs**: Edit any `.md` file or suggest new guides/recipes. 18 | - **Code**: Improve scripts, fix bugs, or add new automation. 19 | - **Translations**: Help make guides available in more languages. 20 | - **Accessibility**: Suggest alt text, color tweaks, or structure changes. 21 | - **Community**: Share your story, workflow, or tools. 22 | 23 | ## 📝 3. Steps (Lego Block Style) 24 | 1. Fork the repo on [GitHub](https://github.com/tegridydev/python-OSINT-notebook) 25 | 2. Make your changes (docs, code, etc.) 26 | 3. Commit with a clear message (e.g. `fix: clearer intro for beginners`) 27 | 4. Open a Pull Request (PR) — describe what you changed and why 28 | 5. Wait for feedback or approval (we’re friendly!) 29 | 30 | 31 | ## 🏗️ Style Guide 32 | - Use simple language and lots of examples 33 | - Prefer code blocks and checklists 34 | - Be kind and inclusive 35 | 36 | ## 📢 Questions or Ideas? 37 | - Open an issue or PR on GitHub 38 | - Or just say hi and share what you’re building! 39 | 40 | See [CONTRIBUTING.md](https://github.com/tegridydev/python-OSINT-notebook/blob/main/CONTRIBUTING.md) for coding and docs style details. 
41 | -------------------------------------------------------------------------------- /website/docs/extra-passive-sources.md: -------------------------------------------------------------------------------- 1 | --- 2 | id: extra-passive-sources 3 | title: PythonOSINT101 › Additional Passive Data Sources 4 | --- 5 | 6 | # Additional Passive Data Sources (No API Keys) 7 | 8 | Beyond the core APIs and tools covered earlier, here’s a curated list of **public**, **read-only**, **no-key** data sources—ideal for completely passive OSINT. 9 | 10 | --- 11 | 12 | ## Web Archives & Snapshots 13 | 14 | 1. **Archive.today** 15 | - Create or retrieve point-in-time captures of any URL. 16 | - Web UI: https://archive.today 17 | - CLI example (via cURL): 18 | ```bash 19 | curl -X POST -d url=https://example.com https://archive.today/submit/ 20 | ``` 21 | 22 | 2. **Wayback Machine** 23 | - Official Internet Archive snapshots. 24 | - API endpoint: 25 | ``` 26 | http://archive.org/wayback/available?url=example.com 27 | ``` 28 | 29 | 3. **WebCitation** 30 | - Alternative archive for pages not in Wayback. 31 | - Web UI: https://webcitation.org/ 32 | 33 | 4. **Memento TimeGate** 34 | - Unified interface to multiple archives. 35 | - URL pattern: 36 | ``` 37 | https://timetravel.mementoweb.org/timegate/example.com 38 | ``` 39 | 40 | --- 41 | 42 | ## DNS & Certificate History 43 | 44 | 5. **crt.sh (Certificate Transparency Logs)** 45 | - Query historical certificates & subdomains. 46 | - JSON output: 47 | ``` 48 | https://crt.sh/?q=%25example.com&output=json 49 | ``` 50 | 51 | 6. **Common Crawl** 52 | - Petabytes of web crawl data available on AWS S3. 53 | - S3 bucket: `s3://commoncrawl/` 54 | - Use AWS CLI or tools like `warcprox` to filter WARC files offline. 55 | 56 | 7. **DNSDumpster** 57 | - Passive DNS enumeration (subdomains, MX, NS). 58 | - Web UI: https://dnsdumpster.com/ 59 | 60 | 8. 
**SecurityTrails (Free Tier)** 61 | - Passive DNS, WHOIS, and subdomain history without active scanning. 62 | - Try the free REST API at https://securitytrails.com/corp/api 63 | 64 | --- 65 | 66 | ## Code & Webpage Search 67 | 68 | 9. **PublicWWW** 69 | - Search HTML/JS/CSS code across indexed sites. 70 | - Web UI: https://publicwww.com/ 71 | 72 | 10. **Sourcegraph** 73 | - Universal code search across public repositories. 74 | - Web UI: https://sourcegraph.com/ 75 | 76 | 11. **GitHub Web Search** 77 | - Dork for secrets or references: 78 | ``` 79 | site:github.com/example.com in:file api_key 80 | ``` 81 | 82 | --- 83 | 84 | ## Data Dumps & Social Archives 85 | 86 | 12. **Pushshift.io (Reddit)** 87 | - Complete Reddit comment & post archives in JSON. 88 | - Download: https://files.pushshift.io/reddit/ 89 | 90 | 13. **GDELT Project** 91 | - Global event and media datasets (CSV, JSON). 92 | - Browse: https://data.gdeltproject.org/ 93 | 94 | 14. **PGP Keyservers** 95 | - Search public PGP keys by email or name. 96 | - Web UI: https://keys.openpgp.org/ 97 | 98 | --- 99 | 100 | ## Search Engine Techniques 101 | 102 | 15. **Google Cache** 103 | - View Google’s latest cached copy: 104 | ``` 105 | cache:example.com 106 | ``` 107 | 108 | 16. **Bing Cache** 109 | - Access via URL: 110 | ``` 111 | https://cc.bingj.com/cache.aspx?q=example.com 112 | ``` 113 | 114 | 17. **Search Dorks** 115 | - File-type targeting, e.g.: 116 | ``` 117 | site:example.com filetype:pdf 118 | ``` 119 | - Combine with `inurl:`, `intitle:`, etc., for precision. 120 | 121 | --- 122 | 123 | > **Tip:** Combine these sources programmatically via Python’s `requests` or `httpx` libraries and parse JSON or HTML to integrate them into your passive OSINT pipelines. 
124 | 125 | ```python 126 | import requests 127 | 128 | # Example: fetch crt.sh subdomains 129 | resp = requests.get('https://crt.sh/?q=%25example.com&output=json') 130 | domains = {entry['name_value'] for entry in resp.json()} 131 | print(domains) 132 | ``` -------------------------------------------------------------------------------- /website/docs/faq.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: FAQ 3 | slug: /faq 4 | --- 5 | 6 | # Frequently Asked Questions 7 | 8 | ## ❓ Who is this for? 9 | Anyone interested in OSINT. 10 | 11 | ## 🧠 How is this site structured? 12 | We use: 13 | - Lego block analogies (build in small steps) 14 | - Visuals, code samples, and checklists 15 | - Clear, consistent structure 16 | - No gatekeeping — all are welcome 17 | 18 | ## 🛠️ How can I contribute? 19 | - See the [Contributing](./contributing) page 20 | - Submit guides, scripts, or improvements via GitHub 21 | - Share your workflows in the [Showcase](./showcase) 22 | 23 | ## 🚦 Where do I start? 24 | - Check the [Start Here](./start-here) page 25 | - Pick your skill level guide (Beginner, Intermediate, Advanced) 26 | 27 | ## 🐍 Do I need to know Python? 28 | Nope! The Beginner Guide covers Python basics. You can copy-paste code and learn as you go. 29 | 30 | ## 🕵️‍♂️ Is OSINT legal? 31 | - Yes, as long as you only use public info and respect privacy/laws. 32 | - Don’t dox, stalk, or harass anyone. 33 | 34 | ## 🧩 What if I get stuck or overwhelmed? 35 | - Take breaks — learning is like building Lego, one block at a time 36 | - Ask for help via GitHub issues 37 | - Use the search bar for quick answers 38 | 39 | ## 🌍 Can I use this in languages other than English? 40 | - We’re working on translations! Want to help? See [Contributing](./contributing). 41 | 42 | ## 🏗️ Can I use this for my own project or training? 43 | - Yes! Everything here is open source (see LICENSE). Fork, remix, share — just credit the project.
44 | 45 | ## ❓ More questions? 46 | Open an issue on GitHub or suggest an edit! 47 | -------------------------------------------------------------------------------- /website/docs/installation/03-01-env-setup.md: -------------------------------------------------------------------------------- 1 | --- 2 | id: 03-01-env-setup 3 | title: Installation & Config › Environment Setup 4 | --- 5 | 6 | # Environment Setup 7 | 8 | A consistent, isolated environment ensures reliable installs, reproducible builds, and clean dependency management. 9 | 10 | --- 11 | 12 | ## 1. Python & Virtual Environment 13 | 14 | 1. **Install Python ≥ 3.10** 15 | - Official installer: https://python.org/downloads/ 16 | - Linux (Debian/Ubuntu): 17 | ```bash 18 | sudo apt update && sudo apt install python3 python3-venv python3-pip 19 | ``` 20 | - macOS (Homebrew): 21 | ```bash 22 | brew install python@3.10 23 | ``` 24 | 25 | 2. **Create & Activate a Virtual Environment** 26 | ```bash 27 | python3 -m venv .venv 28 | source .venv/bin/activate # Linux/macOS 29 | .venv\Scripts\activate # Windows (PowerShell) 30 | pip install --upgrade pip 31 | ``` 32 | 33 | 3. **Lock Dependencies** 34 | ```bash 35 | pip install -r requirements.txt # your project dependencies 36 | pip freeze > requirements.txt # update lock file 37 | ``` 38 | 39 | --- 40 | 41 | ## 2. Node.js & npm (for Docusaurus) 42 | 43 | Docusaurus requires Node.js and npm to build the documentation site. 44 | 45 | 1. **Install Node.js ≥ 16** 46 | - Official installer: https://nodejs.org/ 47 | - macOS (Homebrew): 48 | ```bash 49 | brew install node@16 50 | ``` 51 | - Linux (NodeSource): 52 | ```bash 53 | curl -fsSL https://deb.nodesource.com/setup_16.x | sudo -E bash - 54 | sudo apt install nodejs 55 | ``` 56 | 57 | 2. **Verify Versions** 58 | ```bash 59 | node -v # should output v16.x or higher 60 | npm -v # should output corresponding npm version 61 | ``` 62 | 63 | 3. 
**Install Docusaurus Dependencies** 64 | In the `website/` directory: 65 | ```bash 66 | npm install 67 | ``` 68 | 69 | --- 70 | 71 | ## 3. Optional Tooling 72 | 73 | - **Git**: Version control 74 | ```bash 75 | git --version 76 | ``` 77 | - **Docker**: Containerized builds (optional) 78 | ```bash 79 | docker --version 80 | docker-compose --version 81 | ``` 82 | - **Yarn** (alternative to npm): 83 | ```bash 84 | npm install -g yarn 85 | yarn install 86 | ``` 87 | 88 | --- 89 | 90 | ## 4. Quick Sanity Checks 91 | 92 | After installing: 93 | 94 | ```bash 95 | # From project root 96 | source .venv/bin/activate 97 | 98 | # In website/ folder 99 | npm run clear # remove stale builds 100 | npm run start # spin up local server at http://localhost:3000 101 | ``` 102 | 103 | If you see your docs homepage and navigation, your environment is correctly configured! -------------------------------------------------------------------------------- /website/docs/installation/03-02-linux.md: -------------------------------------------------------------------------------- 1 | --- 2 | id: 03-02-linux 3 | title: Installation & Config › Linux 4 | --- 5 | 6 | # Installing on Linux 7 | 8 | This section covers setting up your OSINT environment on popular Linux distributions. 9 | 10 | --- 11 | 12 | ## Debian & Ubuntu 13 | 14 | 1. **Update package lists** 15 | ```bash 16 | sudo apt update 17 | ``` 18 | 2. **Install Python, Git, and cURL** 19 | ```bash 20 | sudo apt install -y python3-venv python3-pip git curl 21 | ``` 22 | 3. **Optional: Go for Go-based tools** 23 | ```bash 24 | sudo apt install -y golang 25 | ``` 26 | 4. **Python-based OSINT tools** 27 | ```bash 28 | pip install --upgrade pip 29 | pip install spiderfoot osrframework theHarvester dnsrecon 30 | ``` 31 | 32 | --- 33 | 34 | ## Kali Linux 35 | 36 | Kali comes with many OSINT tools pre-installed. 37 | 38 | 1. **Update packages** 39 | ```bash 40 | sudo apt update && sudo apt upgrade -y 41 | ``` 42 | 2. 
**Verify pre-installed tools** 43 | ```bash 44 | which recon-ng theHarvester spiderfoot 45 | ``` 46 | 3. **Install any missing utilities** 47 | ```bash 48 | sudo apt install -y dnsrecon 49 | pip install osrframework 50 | ``` 51 | 52 | --- 53 | 54 | ## Red Hat / CentOS / Fedora 55 | 56 | 1. **Enable EPEL (CentOS/RHEL)** 57 | ```bash 58 | sudo yum install -y epel-release 59 | ``` 60 | 2. **Install Python & Git** 61 | ```bash 62 | sudo yum install -y python3 python3-venv python3-pip git curl # RHEL/CentOS 63 | sudo dnf install -y python3 python3-venv python3-pip git curl # Fedora 64 | ``` 65 | 3. **Install Go (for Go-based tools)** 66 | ```bash 67 | sudo yum install -y golang # or `sudo dnf install -y golang` 68 | ``` 69 | 4. **Install Python tools** 70 | ```bash 71 | pip3 install --upgrade pip 72 | pip3 install spiderfoot osrframework theHarvester dnsrecon 73 | ``` 74 | 75 | --- 76 | 77 | ## Arch Linux 78 | 79 | 1. **Install core packages** 80 | ```bash 81 | sudo pacman -Sy --needed python python-virtualenv git curl go 82 | ``` 83 | 2. **Install OSINT tools from AUR or pip** 84 | ```bash 85 | # Example: via pip 86 | pip install spiderfoot osrframework 87 | ``` 88 | 89 | --- 90 | 91 | ## Docker on Linux 92 | 93 | Running OSINT tools in containers isolates dependencies: 94 | 95 | ```yaml 96 | # docker-compose.yml 97 | version: '3.8' 98 | services: 99 | spiderfoot: 100 | image: smicallef/spiderfoot 101 | ports: 102 | - "5001:5001" 103 | recon-ng: 104 | image: reconly/recon-ng 105 | command: recon-ng 106 | ``` 107 | 108 | ```bash 109 | # Launch containers 110 | docker-compose up -d 111 | ``` 112 | 113 | --- 114 | 115 | > **Tip:** Use a non-root user in Docker (`--user $(id -u):$(id -g)`) or create a dedicated `osint` user on your host to run these tools securely. 
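The non-root tip above can also be baked into Compose so you don't have to remember the flag on every `docker run`. A sketch, assuming the `spiderfoot` service from the compose example above; the `1000:1000` UID:GID is an example, so substitute the output of `id -u` and `id -g`:

```yaml
# docker-compose.override.yml (hypothetical override file)
services:
  spiderfoot:
    user: "1000:1000"   # run as an unprivileged UID:GID instead of root
```

Compose merges this override with the base file automatically, so the base `docker-compose.yml` stays unchanged.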
116 | -------------------------------------------------------------------------------- /website/docs/installation/03-03-windows.md: -------------------------------------------------------------------------------- 1 | --- 2 | id: 03-03-windows 3 | title: Installation & Config › Windows 4 | --- 5 | 6 | # Installing on Windows 7 | 8 | This guide covers setting up your Python OSINT environment on Windows, including native and WSL2 approaches. 9 | 10 | --- 11 | 12 | ## 1. Install Python & pip 13 | 14 | 1. **Download** the latest Python 3.10+ installer from 15 | https://www.python.org/downloads/windows/ 16 | 2. **Run the installer** and **check** “Add Python to PATH”. 17 | 3. **Verify** in PowerShell or CMD: 18 | ```powershell 19 | python --version # e.g. Python 3.10.x 20 | pip --version 21 | ``` 22 | 23 | --- 24 | 25 | ## 2. Enable WSL2 (Recommended) 26 | 27 | For the most Linux-compatible tooling, install WSL2: 28 | 29 | ```powershell 30 | # In an elevated PowerShell prompt: 31 | wsl --install 32 | ``` 33 | 34 | - By default installs Ubuntu. 35 | - **Restart** when prompted. 36 | - **Launch** your Linux distro from the Start menu and complete setup. 37 | 38 | From within WSL2, follow the Linux instructions in [03-02-linux](03-02-linux). 39 | 40 | --- 41 | 42 | ## 3. Install OSINT Tools via pip (Native Windows) 43 | 44 | Once Python is set up: 45 | 46 | ```powershell 47 | # Upgrade pip 48 | python -m pip install --upgrade pip 49 | 50 | # Install core tools 51 | pip install \ 52 | spiderfoot \ 53 | recon-ng \ 54 | snscrape \ 55 | osrframework \ 56 | theHarvester \ 57 | dnsrecon 58 | ``` 59 | 60 | - Some tools (e.g., SpiderFoot CLI) may require additional dependencies; consult their docs. 61 | 62 | --- 63 | 64 | ## 4. PowerShell Execution Policy 65 | 66 | Allow running local scripts if you encounter policy errors: 67 | 68 | ```powershell 69 | Set-ExecutionPolicy RemoteSigned -Scope CurrentUser 70 | ``` 71 | 72 | --- 73 | 74 | ## 5. 
Optional: Chocolatey Package Manager 75 | 76 | Chocolatey simplifies Windows installs: 77 | 78 | ```powershell 79 | # Install Chocolatey (in elevated PS): 80 | Set-ExecutionPolicy Bypass -Scope Process -Force; ` 81 | [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; ` 82 | iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1')) 83 | 84 | # Use Chocolatey to install Python and Git: 85 | choco install python git -y 86 | ``` 87 | 88 | --- 89 | 90 | ## 6. Verify Your Setup 91 | 92 | ```powershell 93 | # Check key commands 94 | spiderfoot --version 95 | recon-ng --version 96 | theHarvester -h 97 | snscrape --version 98 | ``` 99 | 100 | After this, you’re ready to run passive OSINT workflows on Windows. 101 | 102 | -------------------------------------------------------------------------------- /website/docs/installation/03-04-docker.md: -------------------------------------------------------------------------------- 1 | --- 2 | id: 03-04-docker 3 | title: Installation & Config › Docker 4 | --- 5 | 6 | # Installing via Docker 7 | 8 | Containerizing your OSINT tools ensures consistent environments, isolates dependencies, and simplifies cleanup. 9 | 10 | --- 11 | 12 | ## Prerequisites 13 | 14 | - **Docker**: Install Docker Desktop on Windows/macOS, or Docker Engine on Linux. 15 | - **Docker Compose** (optional but recommended): Simplifies managing multi-container setups.
16 | 17 | --- 18 | 19 | ## Running SpiderFoot 20 | 21 | SpiderFoot provides an official Docker image for quick startup: 22 | 23 | ```bash 24 | docker run --rm -p 5001:5001 smicallef/spiderfoot 25 | ``` 26 | 27 | - **--rm**: Remove container after exit 28 | - **-p 5001:5001**: Map container port 5001 to localhost:5001 29 | 30 | Once running, open your browser at: 31 | ``` 32 | http://localhost:5001 33 | ``` 34 | 35 | --- 36 | 37 | ## Running Recon-ng 38 | 39 | There are community Docker images for Recon-ng. Example: 40 | 41 | ```bash 42 | docker run --rm -it \ 43 | -v $(pwd)/data:/workspace/data \ 44 | reconly/recon-ng:latest 45 | ``` 46 | 47 | - **-it**: Interactive terminal 48 | - **-v …/data:/workspace/data**: Persist scan results on host 49 | 50 | Within the container, start Recon-ng: 51 | 52 | ```bash 53 | recon-ng 54 | ``` 55 | 56 | --- 57 | 58 | ## Docker Compose Example 59 | 60 | Use Docker Compose to manage multiple OSINT services together: 61 | 62 | ```yaml 63 | # docker-compose.yml 64 | version: '3.8' 65 | services: 66 | spiderfoot: 67 | image: smicallef/spiderfoot 68 | ports: 69 | - "5001:5001" 70 | restart: unless-stopped 71 | 72 | recon-ng: 73 | image: reconly/recon-ng:latest 74 | volumes: 75 | - ./data/recon-ng:/root/.recon-ng 76 | entrypoint: ["recon-ng", "--no-banner"] 77 | restart: on-failure 78 | 79 | social-analyzer: 80 | image: qeeqbox/social-analyzer:latest 81 | volumes: 82 | - ./data/social-analyzer:/app/output 83 | command: ["python3","main.py","--domain","example.com","--module","social_media"] 84 | restart: "no" 85 | ``` 86 | 87 | - **volumes**: Mount host directories for persistent storage 88 | - **restart**: Auto-restart policies 89 | 90 | Launch all services: 91 | 92 | ```bash 93 | docker-compose up -d 94 | ``` 95 | 96 | --- 97 | 98 | ## Tips & Best Practices 99 | 100 | - **Environment Variables**: Pass API keys securely via `docker run -e` or a `.env` file loaded by Compose. 
101 | - **Data Persistence**: Always mount volumes for logs, databases, and reports. 102 | - **Networking**: Use Docker networks to link containers and restrict external exposure. 103 | - **Cleanup**: 104 | ```bash 105 | docker-compose down --volumes 106 | docker system prune --volumes 107 | ``` 108 | 109 | --- 110 | -------------------------------------------------------------------------------- /website/docs/installation/03-05-post-install.md: -------------------------------------------------------------------------------- 1 | --- 2 | id: 03-05-post-install 3 | title: Installation & Config › Post-Installation 4 | --- 5 | 6 | # Post-Installation Configuration 7 | 8 | After installing your OSINT tools and environment, perform these final steps to secure and optimize your setup. 9 | 10 | --- 11 | 12 | ## 1. Configure API Keys 13 | 14 | Many passive OSINT tools require API keys for enhanced data sources: 15 | 16 | 1. **Create a `.env` file** in your project root: 17 | ```dotenv 18 | SHODAN_API_KEY=your_shodan_key 19 | CENSYS_API_ID=your_censys_id 20 | CENSYS_API_SECRET=your_censys_secret 21 | SECURITYTRAILS_API_KEY=your_securitytrails_key 22 | GITHUB_TOKEN=your_github_token 23 | HIBP_API_KEY=your_hibp_key 24 | IPINFO_TOKEN=your_ipinfo_token 25 | ``` 26 | 2. **Load variables** automatically when you activate your venv. Add to `~/.bashrc` or `~/.zshrc`: 27 | ```bash 28 | export $(grep -v '^#' /path/to/your/.env | xargs) 29 | ``` 30 | 3. **Verify** in your shell: 31 | ```bash 32 | echo $SHODAN_API_KEY 33 | ``` 34 | 35 | --- 36 | 37 | ## 2. Configure Tor & Proxies 38 | 39 | Use Tor or other proxies to anonymize and route requests: 40 | 41 | 1. **Install Tor** (Linux): 42 | ```bash 43 | sudo apt update && sudo apt install tor 44 | ``` 45 | 2. **Start the Tor service**: 46 | ```bash 47 | sudo systemctl start tor 48 | ``` 49 | 3. 
**Set environment variables** for your shell session:
50 | ```bash
51 | export HTTP_PROXY="socks5h://127.0.0.1:9050"
52 | export HTTPS_PROXY="socks5h://127.0.0.1:9050"
53 | ```
54 | 4. **Test** connectivity:
55 | ```bash
56 | curl --proxy socks5h://127.0.0.1:9050 https://check.torproject.org/api/ip
57 | ```
58 | 
59 | ---
60 | 
61 | ## 3. Database & Storage Configuration
62 | 
63 | ### Recon-ng (SQLite)
64 | 
65 | - Default workspace DB: `~/.recon-ng/<workspace>/data/recon.db`
66 | - **Backup**:
67 | ```bash
68 | cp ~/.recon-ng/default/data/recon.db ~/osint-backups/recon.db.bak
69 | ```
70 | 
71 | ### Maltego (MongoDB)
72 | 
73 | - Community edition uses MongoDB for transform results:
74 | - **Install MongoDB**:
75 | ```bash
76 | sudo apt install -y mongodb
77 | sudo systemctl enable --now mongodb
78 | ```
79 | - **Configure Maltego** to connect to `mongodb://localhost:27017`.
80 | 
81 | ### Custom Storage
82 | 
83 | - Use **PostgreSQL** or **TimescaleDB** for large-scale storage:
84 | - **Example**:
85 | ```bash
86 | sudo apt install postgresql
87 | sudo -u postgres createuser osintuser
88 | sudo -u postgres createdb osintdb
89 | ```
90 | - Configure your Python scripts via `SQLALCHEMY_DATABASE_URI`.
91 | 
92 | ---
93 | 
94 | ## 4. Logging & Monitoring
95 | 
96 | - **Centralized Logs**: Configure `logrotate` for OSINT tool logs:
97 | ```
98 | /var/log/osint/*.log {
99 |     daily
100 |     rotate 14
101 |     compress
102 |     missingok
103 |     notifempty
104 | }
105 | ```
106 | - **Monitoring**: Use **Prometheus** & **Grafana** to track:
107 |   - API call counts
108 |   - Tool failures
109 |   - Data volume processed
110 | 
111 | ---
112 | 
113 | ## 5. Security & Hardening
114 | 
115 | 1. **Least Privilege**: Run OSINT scripts under a dedicated non-root user.
116 | 2. **Network Isolation**: Use Docker networks or VLANs to contain traffic.
117 | 3. **Dependency Audits**:
118 | ```bash
119 | pip install safety
120 | safety check --full-report
121 | ```
122 | 4. 
**Regular Updates**:
123 | ```bash
124 | pip list --outdated
125 | pip install --upgrade <package-name>
126 | ```
127 | 
128 | ---
129 | 
130 | ## Next Steps
131 | 
132 | Proceed to the **Scripting Examples** section to start writing automated OSINT workflows.
133 | 
--------------------------------------------------------------------------------
/website/docs/intermediate-guide.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Intermediate Guide
3 | slug: /intermediate-guide
4 | ---
5 | 
6 | # Intermediate Guide
7 | 
8 | Welcome to the next level! Here you’ll start snapping together more complex Lego blocks — building your own OSINT tools, automating workflows, and solving real-world problems.
9 | 
10 | ## 🔎 Tool Deep-Dives
11 | - **Recon-ng**: Modular OSINT framework for automation.
12 |   - Lego analogy: Like a big baseplate you can add modules to.
13 |   - Try: `pip install recon-ng`
14 | - **theHarvester**: Finds emails/domains from public sources.
15 |   - Lego analogy: Like a search radar brick.
16 |   - Try: `pip install theHarvester`
17 | - **SpiderFoot**: All-in-one OSINT automation tool (GUI/web).
18 |   - Lego analogy: Like a robot that assembles blocks for you.
19 | 
20 | ## 🛠️ Scripting Workflows
21 | - Automate repetitive tasks with Python scripts.
22 | - Example: Find all PDF links on a site and download them:
23 | 
24 | ```python
25 | import requests
26 | from bs4 import BeautifulSoup
27 | url = 'https://example.com'
28 | soup = BeautifulSoup(requests.get(url).text, 'html.parser')
29 | pdfs = [link.get('href') for link in soup.find_all('a') if link.get('href', '').endswith('.pdf')]
30 | for pdf in pdfs:
31 |     pdf_url = requests.compat.urljoin(url, pdf)  # resolve relative links
32 |     r = requests.get(pdf_url)
33 |     with open(pdf_url.split('/')[-1], 'wb') as f:
34 |         f.write(r.content)
35 | ```
36 | 
37 | ## 🌐 Real-World Scenarios
38 | - **Username checks**: See if a username exists across social platforms.
39 | - **Domain footprinting**: Map out all subdomains and related assets. 
- **Data enrichment**: Combine info from multiple sources for a fuller picture.
41 | 
42 | ## 🧩 Troubleshooting & Pro Tips
43 | - If a script fails, check:
44 |   - Typos in URLs
45 |   - Missing Python packages (`pip install ...`)
46 |   - Website blocks (try headers or proxies)
47 | - Use print statements to debug (like Lego instructions for each step)
48 | - Don’t be afraid to break things — that’s how you learn!
49 | 
50 | ## 🚀 Next Steps
51 | - Try writing your own script for a small OSINT task.
52 | - Move on to the [Advanced Guide](./advanced-guide) for automation pipelines and threat intel.
53 | 
54 | If you get stuck, check the [FAQ](./faq) or open an issue on GitHub.
55 | 
--------------------------------------------------------------------------------
/website/docs/pythonosint101.md:
--------------------------------------------------------------------------------
1 | ---
2 | id: pythonosint101
3 | title: PythonOSINT101 Passive-Only, End-to-End Guide
4 | ---
5 | 
6 | # PythonOSINT101: Passive-Only, End-to-End OSINT Workflow
7 | 
8 | A no-BS, fully passive OSINT workflow using only public data sources—no port scans, no active probes, and minimal legal risk.
9 | 
10 | ---
11 | 
12 | ## 1. Prerequisites
13 | 
14 | 1. **Platform**: Linux / macOS / WSL2
15 | 2. **Python** ≥ 3.10
16 | 3. **Virtual Environment**
17 | ```bash
18 | python3 -m venv .venv
19 | source .venv/bin/activate  # Linux/macOS
20 | .venv\Scripts\activate     # Windows
21 | pip install --upgrade pip
22 | ```
23 | 4. **Install Dependencies**
24 | ```bash
25 | pip install \
26 |   python-whois dnspython requests beautifulsoup4 \
27 |   censys shodan securitytrails ipinfo PyGithub pyhibp \
28 |   exifread PyPDF2 python-dotenv
29 | ```
30 | 5. 
**API Keys** (read-only) in a `.env` file:
31 | ```dotenv
32 | DOMAIN=example.com
33 | USERNAME=johndoe
34 | SHODAN_API_KEY=your_shodan_key
35 | CENSYS_API_ID=your_censys_id
36 | CENSYS_API_SECRET=your_censys_secret
37 | SECURITYTRAILS_API_KEY=your_securitytrails_key
38 | IPINFO_TOKEN=your_ipinfo_token
39 | GITHUB_TOKEN=your_github_token
40 | HIBP_API_KEY=your_hibp_key
41 | ```
42 | 
43 | ---
44 | 
45 | ## 2. Passive Domain Recon
46 | 
47 | ### 2.1 WHOIS / RDAP
48 | ```python
49 | import whois
50 | from pprint import pprint
51 | 
52 | w = whois.whois("example.com")
53 | pprint({
54 |     "Registrar": w.registrar,
55 |     "Creation Date": w.creation_date,
56 |     "Name Servers": w.name_servers
57 | })
58 | ```
59 | 
60 | ### 2.2 DNS Records
61 | ```python
62 | import dns.resolver
63 | 
64 | records = {}
65 | for rtype in ("A", "NS", "MX", "TXT"):
66 |     answers = dns.resolver.resolve("example.com", rtype, lifetime=5)
67 |     records[rtype] = [r.to_text() for r in answers]
68 | print(records)
69 | ```
70 | 
71 | ---
72 | 
73 | ## 3. Passive Subdomain Enumeration
74 | 
75 | ### 3.1 Certificate Logs (crt.sh)
76 | ```python
77 | import requests
78 | 
79 | resp = requests.get("https://crt.sh/?q=%25example.com&output=json")
80 | subs = {entry["name_value"] for entry in resp.json()}
81 | print("CRT.sh domains →", subs)
82 | ```
83 | 
84 | ### 3.2 SecurityTrails API (Passive DNS & Historical)
85 | ```python
86 | import os, requests
87 | 
88 | API_KEY = os.getenv("SECURITYTRAILS_API_KEY")
89 | hdr = {"APIKEY": API_KEY}
90 | domain = "example.com"
91 | # Historical subdomains
92 | url = f"https://api.securitytrails.com/v1/domain/{domain}/subdomains"
93 | resp = requests.get(url, headers=hdr, params={"children_only":"false"})
94 | subs = [f"{s}.{domain}" for s in resp.json().get("subdomains",[])]
95 | print("SecurityTrails subdomains →", subs)
96 | ```
97 | > *SecurityTrails free tier gives read-only access to historical DNS and WHOIS data.*
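One gotcha when combining the crt.sh and SecurityTrails results above: crt.sh packs several hostnames (including `*.` wildcard entries) into a single `name_value` field, separated by newlines, so naive set-building double-counts. A small sketch for normalizing and de-duplicating; the helper name `normalize_crtsh_names` is ours, not part of any library:

```python
def normalize_crtsh_names(entries):
    """Flatten crt.sh JSON entries into a sorted, de-duplicated hostname list."""
    names = set()
    for entry in entries:
        # Each "name_value" may hold multiple newline-separated hostnames
        for raw in entry.get("name_value", "").splitlines():
            name = raw.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # collapse wildcard certs onto the base name
            if name:
                names.add(name)
    return sorted(names)

sample = [{"name_value": "example.com\n*.example.com"},
          {"name_value": "www.example.com"}]
print(normalize_crtsh_names(sample))  # → ['example.com', 'www.example.com']
```

Feed it `resp.json()` from the crt.sh request before merging with the SecurityTrails list.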
98 | 
99 | ### 3.3 DNSDumpster (Web-Scrape)
100 | Visit [dnsdumpster.com](https://dnsdumpster.com/) and paste your domain to get passive DNS records, MX, NS, and subdomains.
101 | 
102 | ---
103 | 
104 | ## 4. SSL & Certificate Transparency
105 | 
106 | ### 4.1 Censys Certificates
107 | ```python
108 | from censys.search import CensysCertificates
109 | 
110 | cc = CensysCertificates()
111 | results = cc.search("parsed.names: example.com", per_page=50)
112 | ct_domains = {name
113 |               for cert in results
114 |               for name in cert.get("parsed", {}).get("names", [])}
115 | print("Censys CT domains →", ct_domains)
116 | ```
117 | 
118 | ### 4.2 CertSpotter (SSLMate)
119 | Use the free [CertSpotter API](https://sslmate.com/certspotter/) to query CT logs without active scanning.
120 | 
121 | 
122 | ---
123 | 
124 | ## 5. Web Archive & Caches
125 | 
126 | - **Wayback Machine**
127 | ```python
128 | import requests
129 | wb = requests.get("http://archive.org/wayback/available?url=example.com").json()
130 | print("Wayback URL →", wb["archived_snapshots"]["closest"]["url"])
131 | ```
132 | - **Archive.today**: Paste URL at https://archive.today.
133 | - **Google Cache**: `cache:example.com` in Google Search.
134 | - **Bing Cache**: `https://cc.bingj.com/cache.aspx?q=example.com`.
135 | - **Memento TimeGate**:
136 | ```
137 | https://timetravel.mementoweb.org/timegate/example.com
138 | ```
139 | 
140 | ---
141 | 
142 | ## 6. 
Content & Metadata Extraction
143 | 
144 | ### 6.1 HTML Metadata
145 | ```python
146 | from bs4 import BeautifulSoup
147 | import requests
148 | 
149 | r = requests.get("https://example.com", timeout=5)
150 | soup = BeautifulSoup(r.text, "html.parser")
151 | print("Title →", soup.title.string if soup.title else "")
152 | desc = soup.find("meta", {"name": "description"})
153 | print("Description →", desc and desc["content"])
154 | ```
155 | 
156 | ### 6.2 PDF Metadata
157 | ```python
158 | from PyPDF2 import PdfReader
159 | 
160 | with open("report.pdf", "rb") as f:
161 |     print("PDF Metadata →", PdfReader(f).metadata)
162 | ```
163 | 
164 | ### 6.3 Image EXIF
165 | ```python
166 | import exifread
167 | 
168 | with open("photo.jpg", "rb") as f:
169 |     tags = exifread.process_file(f)
170 | for tag, val in tags.items():
171 |     print(f"{tag}: {val}")
172 | ```
173 | 
174 | ---
175 | 
176 | ## 7. Social Media & Username Enumeration
177 | 
178 | - **Sherlock** (passive HTTP HEAD checks)
179 | ```bash
180 | sherlock johndoe --print-found --timeout 5
181 | ```
182 | - **snscrape** (no API keys; note `--jsonl` is a global option and goes before the scraper name)
183 | ```bash
184 | snscrape --jsonl twitter-search "from:johndoe since:2024-01-01" > tweets.json
185 | ```
186 | 
187 | ---
188 | 
189 | ## 8. 
Passive Threat Intel & Breaches
189 | 
190 | ### 8.1 Shodan (Index-Only)
191 | ```python
192 | import os, shodan
193 | 
194 | api = shodan.Shodan(os.getenv("SHODAN_API_KEY"))
195 | res = api.search(f"hostname:{os.getenv('DOMAIN')}")
196 | print("Shodan hosts →", [m["ip_str"] for m in res["matches"]])
197 | ```
198 | 
199 | ### 8.2 Censys IPv4
200 | ```python
201 | from censys.search import CensysIPv4
202 | 
203 | ci = CensysIPv4()
204 | hosts = ci.search(
205 |     "services.tls.certificates.leaf_data.names: example.com", per_page=20
206 | )
207 | print("Censys IPs →", [h["ip"] for h in hosts])
208 | ```
209 | 
210 | ### 8.3 Have I Been Pwned (Email)
211 | ```python
212 | import os, pyhibp
213 | 
214 | pyhibp.set_user_agent(ua="PythonOSINT101")
215 | pyhibp.set_api_key(key=os.getenv("HIBP_API_KEY"))
216 | email = f"{os.getenv('USERNAME')}@{os.getenv('DOMAIN')}"
217 | print("Breaches →", pyhibp.get_account_breaches(account=email, truncate_response=True) or "None")
218 | ```
219 | 
220 | ---
221 | 
222 | ## 9. Code Leak & Data-Exposure
223 | 
224 | ### 9.1 GitHub Code Search
225 | ```python
226 | from github import Github
227 | import os
228 | 
229 | gh = Github(os.getenv("GITHUB_TOKEN"))
230 | query = f'{os.getenv("DOMAIN")} in:file extension:env'
231 | files = gh.search_code(query, order="desc")[:10]
232 | print("GitHub leaks →", [f"{f.repository.full_name}/{f.path}" for f in files])
233 | ```
234 | 
235 | ### 9.2 PublicWWW & Sourcegraph
236 | - **PublicWWW**: Search HTML/CSS/JS for your domain at https://publicwww.com/.
237 | - **Sourcegraph**: Browse code references at https://sourcegraph.com/.
238 | 
239 | ---
240 | 
241 | ## 10. IP Geolocation & Enrichment
242 | 
243 | ```python
244 | import ipinfo, os
245 | 
246 | handler = ipinfo.getHandler(os.getenv("IPINFO_TOKEN"))
247 | for ip in ["93.184.216.34"]:
248 |     info = handler.getDetails(ip)
249 |     print(f"{ip} → {info.city}, {info.country_name}")
250 | ```
251 | 
252 | ---
253 | 
254 | ## 11. Extra Passive Data Sources
255 | 
256 | 1. 
**Common Crawl**: WARC archives on S3 (`s3://commoncrawl/`).
257 | 2. **GDELT**: Global events CSV dumps at https://data.gdeltproject.org/.
258 | 3. **DNS History**: SecurityTrails `/v1/domain/{domain}/dns/history`.
259 | 4. **PGP Keyservers**: https://keys.openpgp.org/
260 | 5. **Pushshift Reddit Dumps**: https://files.pushshift.io/reddit/
261 | 6. **WebCitation**: https://webcitation.org/
262 | 7. **StackPrinter**: http://stackprinter.appspot.com/ for StackOverflow Q&A.
263 | 8. **Search-Engine Dorking**:
264 | ```
265 | site:example.com filetype:pdf
266 | ```
267 | 
268 | ---
269 | 
270 | ## 12. Consolidated Script
271 | 
272 | Save as `passive_osint101.py`:
273 | 
274 | ```python
275 | #!/usr/bin/env python3
276 | import os, csv, whois, dns.resolver, requests
277 | from bs4 import BeautifulSoup
278 | from censys.search import CensysCertificates, CensysIPv4
279 | import shodan, ipinfo
280 | from github import Github
281 | from pyhibp import pwned_account
282 | from dotenv import load_dotenv
283 | 
284 | load_dotenv()
285 | domain = os.getenv("DOMAIN")
286 | username = os.getenv("USERNAME")
287 | 
288 | # WHOIS
289 | w = whois.whois(domain)
290 | 
291 | # DNS
292 | dns_data = {rt: [r.to_text() for r in dns.resolver.resolve(domain, rt, lifetime=5)]
293 |             for rt in ("A", "NS", "MX", "TXT")}
294 | 
295 | # CRT.sh
296 | crt = requests.get(f"https://crt.sh/?q=%25{domain}&output=json").json()
297 | crt_subs = {e["name_value"] for e in crt}
298 | 
299 | # SecurityTrails
300 | st_key = os.getenv("SECURITYTRAILS_API_KEY")
301 | hdr = {"APIKEY": st_key}
302 | st = requests.get(f"https://api.securitytrails.com/v1/domain/{domain}/subdomains",
303 |                   headers=hdr).json().get("subdomains", [])
304 | st_subs = [f"{s}.{domain}" for s in st]
305 | 
306 | # Wayback
307 | wb = requests.get(f"http://archive.org/wayback/available?url={domain}").json()
308 | wayback_url = wb.get("archived_snapshots", {}).get("closest", {}).get("url","")
309 | 
310 | # HTML metadata
311 | r = requests.get(f"https://{domain}", timeout=5) 312 | soup = BeautifulSoup(r.text, "html.parser") 313 | title = soup.title.string if soup.title else "" 314 | 315 | # Shodan 316 | sh = shodan.Shodan(os.getenv("SHODAN_API_KEY")) 317 | sh_ips = [m["ip_str"] for m in sh.search(f"hostname:{domain}")["matches"]] 318 | 319 | # Censys IPv4 320 | ci = CensysIPv4() 321 | censys_ips = [h["ip"] for h in ci.search( 322 | f"services.tls.certificates.leaf_data.names: {domain}", per_page=10 323 | )] 324 | 325 | # HIBP 326 | email = f"{username}@{domain}" 327 | breaches = pwned_account(email, api_key=os.getenv("HIBP_API_KEY")) or [] 328 | 329 | # GitHub code leaks 330 | gh = Github(os.getenv("GITHUB_TOKEN")) 331 | leaks = [f"{f.repository.full_name}/{f.path}" 332 | for f in gh.search_code(f"{domain} in:file extension:env", order="desc")[:10]] 333 | 334 | # Geolocation 335 | iph = ipinfo.getHandler(os.getenv("IPINFO_TOKEN")) 336 | geo = {ip: iph.getDetails(ip).all for ip in sh_ips + censys_ips} 337 | 338 | # Export 339 | with open("passive_osint_report.csv","w",newline="") as csvfile: 340 | wtr = csv.writer(csvfile) 341 | wtr.writerow(["Section","Key","Value"]) 342 | wtr.writerow(["WHOIS","Registrar", w.registrar]) 343 | for rt, vals in dns_data.items(): 344 | wtr.writerow(["DNS", rt, ";".join(vals)]) 345 | wtr.writerow(["CRT","crt.sh", ";".join(crt_subs)]) 346 | wtr.writerow(["ST","SecurityTrails", ";".join(st_subs)]) 347 | wtr.writerow(["Web","Wayback", wayback_url]) 348 | wtr.writerow(["Web","Title", title]) 349 | wtr.writerow(["Threat","ShodanIPs", ";".join(sh_ips)]) 350 | wtr.writerow(["Threat","CensysIPs", ";".join(censys_ips)]) 351 | wtr.writerow(["Breach","HIBP", ";".join([b["Name"] for b in breaches])]) 352 | wtr.writerow(["Leaks","GitHub", ";".join(leaks)]) 353 | for ip, info in geo.items(): 354 | wtr.writerow(["Geo", ip, f"{info.get('city')}, {info.get('country_name')}"]) 355 | ``` 356 | 357 | Make it executable and run: 358 | ```bash 359 | chmod +x 
passive_osint101.py
360 | ./passive_osint101.py
361 | ```
362 | 
363 | Generates `passive_osint_report.csv` with all passive findings.
364 | 
365 | ---
366 | 
367 | ## 13. Schedule & Ethics
368 | 
369 | - **Cron** (`crontab -e`):
370 | ```
371 | 0 4 * * * cd /path/to && /path/to/.venv/bin/python passive_osint101.py
372 | ```
373 | - **Ethics**:
374 |   - Only query public, read-only APIs and archives.
375 |   - Respect `robots.txt`, API rate limits and Terms of Service.
376 |   - Do **not** scan, probe, or fingerprint live hosts.
377 |   - Log timestamps and sources for auditability.
378 | 
379 | 
380 | 
381 | 
--------------------------------------------------------------------------------
/website/docs/scripts/README.md:
--------------------------------------------------------------------------------
1 | # Scripts
2 | 
3 | A curated collection of ready-to-go, multi-tool Python OSINT scripts. Drop these into your workflow to automate recon, data collection, and analysis across multiple sources.
4 | 
5 | ---
6 | 
7 | - Each script is self-contained, well-documented, and designed for beginner-friendly, step-by-step usage.
8 | - Scripts use only legal, passive sources and respect rate limits/ToS.
9 | - Copy, run, and customize as needed!
10 | 
11 | ## Contents
12 | - [Domain Recon Combo](./domain-recon-combo.md)
13 | - [Social Media Multi-Profile](./social-media-multi-profile.md)
14 | - [Threat Intel Aggregator](./threat-intel-aggregator.md)
15 | - [Passive Metadata Harvester](./passive-metadata-harvester.md)
16 | - [All-in-One Passive Recon](./all-in-one-passive-recon.md)
--------------------------------------------------------------------------------
/website/docs/scripts/all-in-one-passive-recon.md:
--------------------------------------------------------------------------------
1 | # All-in-One Passive Recon
2 | 
3 | A super-script that combines domain, social, threat, and metadata checks. Ez, modular, and safe. 
4 | 
5 | ---
6 | 
7 | ## Usage
8 | - Requires: `requests`, `whois`, `rich`, `exifread`, `python-magic`
9 | - Run: `python all_in_one_passive_recon.py <domain> <username> <urls.txt>` (save the companion scripts beside it with underscore filenames, e.g. `domain_recon_combo.py`, so the imports resolve)
10 | 
11 | ---
12 | 
13 | ```python
14 | import sys
15 | from domain_recon_combo import main as domain_recon
16 | from social_media_multi_profile import main as social_media
17 | from threat_intel_aggregator import main as threat_intel
18 | from passive_metadata_harvester import main as metadata
19 | 
20 | if __name__ == "__main__":
21 |     if len(sys.argv) < 4:
22 |         print("Usage: python all_in_one_passive_recon.py <domain> <username> <urls.txt>")
23 |         sys.exit(1)
24 |     domain = sys.argv[1]
25 |     username = sys.argv[2]
26 |     urls_file = sys.argv[3]
27 |     print("\n=== DOMAIN RECON ===")
28 |     domain_recon(domain)
29 |     print("\n=== SOCIAL MEDIA ===")
30 |     social_media(username)
31 |     print("\n=== THREAT INTEL ===")
32 |     threat_intel()
33 |     print("\n=== METADATA ===")
34 |     metadata(urls_file)
35 | ```
--------------------------------------------------------------------------------
/website/docs/scripts/archive-org-snapshots.md:
--------------------------------------------------------------------------------
1 | # Archive.org Snapshot Finder
2 | 
3 | Finds all snapshots for a domain or URL using the Wayback Machine API. 
4 | 5 | --- 6 | 7 | ## Usage 8 | - Requires: `requests`, `rich` 9 | - Run: `python archive_org_snapshots.py ` 10 | 11 | --- 12 | 13 | ```python 14 | import sys, requests 15 | from rich import print 16 | 17 | API = "http://web.archive.org/cdx/search/cdx?url={}&output=json&fl=timestamp,original&collapse=urlkey" 18 | 19 | def main(target): 20 | r = requests.get(API.format(target)) 21 | if r.status_code != 200: 22 | print("[red]Not found or error") 23 | return 24 | data = r.json() 25 | print(f"[bold cyan]Snapshots for {target}:[/]") 26 | for entry in data[1:]: 27 | print(f"- {entry[0]} {entry[1]}") 28 | 29 | if __name__ == "__main__": 30 | if len(sys.argv) < 2: 31 | print("Usage: python archive_org_snapshots.py ") 32 | sys.exit(1) 33 | main(sys.argv[1]) 34 | ``` 35 | -------------------------------------------------------------------------------- /website/docs/scripts/domain-recon-combo.md: -------------------------------------------------------------------------------- 1 | # Domain Recon Combo 2 | 3 | A Python script that combines DNS, WHOIS, and SSL certificate transparency lookups for a domain. Outputs everything in a single, easy to digest summary. 
4 | 5 | --- 6 | 7 | ## Usage 8 | - Requires: `requests`, `whois`, `rich` 9 | - Run: `python domain_recon_combo.py example.com` 10 | 11 | --- 12 | 13 | ```python 14 | import sys 15 | import requests 16 | import whois 17 | from rich import print 18 | from rich.table import Table 19 | 20 | def fetch_ct_logs(domain): 21 | url = f"https://crt.sh/?q={domain}&output=json" 22 | try: 23 | r = requests.get(url, timeout=10) 24 | r.raise_for_status() 25 | return r.json() 26 | except Exception as e: 27 | return [] 28 | 29 | def fetch_dns(domain): 30 | try: 31 | return requests.get(f"https://dns.google/resolve?name={domain}").json() 32 | except Exception as e: 33 | return {} 34 | 35 | def main(domain): 36 | print(f"[bold cyan]Recon for:[/] [yellow]{domain}[/]") 37 | print("\n[bold]WHOIS:[/]") 38 | try: 39 | w = whois.whois(domain) 40 | print(w) 41 | except Exception as e: 42 | print("[red]WHOIS failed:", e) 43 | print("\n[bold]DNS:[/]") 44 | dns = fetch_dns(domain) 45 | print(dns) 46 | print("\n[bold]CT Logs:[/]") 47 | ct = fetch_ct_logs(domain) 48 | if ct: 49 | for entry in ct[:10]: 50 | print(f"- {entry.get('name_value')}") 51 | else: 52 | print("[red]No CT log entries found.") 53 | 54 | if __name__ == "__main__": 55 | if len(sys.argv) < 2: 56 | print("[red]Usage: python domain_recon_combo.py ") 57 | sys.exit(1) 58 | main(sys.argv[1]) 59 | ``` 60 | -------------------------------------------------------------------------------- /website/docs/scripts/email-breach-checker.md: -------------------------------------------------------------------------------- 1 | # Email Breach Checker 2 | 3 | Checks if an email appears in public breach databases (using HaveIBeenPwned API). 
4 | 5 | --- 6 | 7 | ## Usage 8 | - Requires: `requests`, `rich` 9 | - Run: `python email_breach_checker.py ` 10 | 11 | --- 12 | 13 | ```python 14 | import sys, requests 15 | from rich import print 16 | 17 | API = "https://haveibeenpwned.com/api/v3/breachedaccount/{}" 18 | HEADERS = {"User-Agent": "OSINT-Tool", "hibp-api-key": "YOUR_API_KEY"} 19 | 20 | def main(email): 21 | r = requests.get(API.format(email), headers=HEADERS) 22 | if r.status_code == 200: 23 | breaches = r.json() 24 | print(f"[bold cyan]Breaches for {email}:[/]") 25 | for b in breaches: 26 | print(f"- {b['Name']}") 27 | elif r.status_code == 404: 28 | print("[green]No breaches found!") 29 | else: 30 | print(f"[red]Error: {r.status_code}") 31 | 32 | if __name__ == "__main__": 33 | if len(sys.argv) < 2: 34 | print("Usage: python email_breach_checker.py ") 35 | sys.exit(1) 36 | main(sys.argv[1]) 37 | ``` 38 | -------------------------------------------------------------------------------- /website/docs/scripts/favicon-hash-lookup.md: -------------------------------------------------------------------------------- 1 | # Favicon Hash Lookup 2 | 3 | Calculates the hash of a site's favicon and searches Shodan for matching services. 
4 | 
5 | ---
6 | 
7 | ## Usage
8 | - Requires: `requests`, `mmh3`, `shodan`, `rich`
9 | - Needs a Shodan API key (`SHODAN_API_KEY` env var)
10 | - Run: `python favicon_hash_lookup.py <favicon-url>`
11 | 
12 | ---
13 | 
14 | ```python
15 | import sys, requests, base64, os
16 | import mmh3
17 | import shodan
18 | from rich import print
19 | 
20 | API_KEY = os.getenv('SHODAN_API_KEY')
21 | api = shodan.Shodan(API_KEY)
22 | 
23 | def get_favicon_hash(url):
24 |     r = requests.get(url, timeout=10)
25 |     if r.status_code != 200:
26 |         print("[red]Failed to fetch favicon")
27 |         sys.exit(1)
28 |     # Shodan indexes favicons by the MurmurHash3 of the base64-encoded file
29 |     b64 = base64.encodebytes(r.content)
30 |     return mmh3.hash(b64)
31 | 
32 | def main(url):
33 |     print(f"[bold cyan]Favicon hash for:[/] {url}")
34 |     h = get_favicon_hash(url)
35 |     print(f"[bold]Hash:[/] {h}")
36 |     print("[bold]Shodan search:[/]")
37 |     for banner in api.search_cursor(f"http.favicon.hash:{h}"):
38 |         print(f"- {banner['ip_str']}:{banner['port']}")
39 | 
40 | if __name__ == "__main__":
41 |     if len(sys.argv) < 2:
42 |         print("Usage: python favicon_hash_lookup.py <favicon-url>")
43 |         sys.exit(1)
44 |     main(sys.argv[1])
45 | ```
--------------------------------------------------------------------------------
/website/docs/scripts/github-user-osint.md:
--------------------------------------------------------------------------------
1 | # GitHub User OSINT
2 | 
3 | Pulls public info, repo stats, and social links for a GitHub username. 
4 | 5 | --- 6 | 7 | ## Usage 8 | - Requires: `requests`, `rich` 9 | - Run: `python github_user_osint.py ` 10 | 11 | --- 12 | 13 | ```python 14 | import sys, requests 15 | from rich import print 16 | 17 | API_URL = "https://api.github.com/users/{}" 18 | 19 | def main(username): 20 | r = requests.get(API_URL.format(username)) 21 | if r.status_code != 200: 22 | print("[red]User not found") 23 | return 24 | data = r.json() 25 | print(f"[bold cyan]GitHub User:[/] {data['login']}") 26 | print(f"[bold]Name:[/] {data.get('name')}") 27 | print(f"[bold]Bio:[/] {data.get('bio')}") 28 | print(f"[bold]Location:[/] {data.get('location')}") 29 | print(f"[bold]Blog:[/] {data.get('blog')}") 30 | print(f"[bold]Public Repos:[/] {data['public_repos']}") 31 | print(f"[bold]Followers:[/] {data['followers']}") 32 | print(f"[bold]Following:[/] {data['following']}") 33 | 34 | if __name__ == "__main__": 35 | if len(sys.argv) < 2: 36 | print("Usage: python github_user_osint.py ") 37 | sys.exit(1) 38 | main(sys.argv[1]) 39 | ``` 40 | -------------------------------------------------------------------------------- /website/docs/scripts/ipinfo-lookup.md: -------------------------------------------------------------------------------- 1 | # IPInfo Lookup 2 | 3 | Fetches geolocation, ASN, and abuse info for an IP using ipinfo.io. 
4 | 5 | --- 6 | 7 | ## Usage 8 | - Requires: `requests`, `rich` 9 | - Run: `python ipinfo_lookup.py ` 10 | 11 | --- 12 | 13 | ```python 14 | import sys, requests 15 | from rich import print 16 | 17 | API = "https://ipinfo.io/{}/json" 18 | 19 | def main(ip): 20 | r = requests.get(API.format(ip)) 21 | if r.status_code != 200: 22 | print("[red]Not found or error") 23 | return 24 | data = r.json() 25 | print(f"[bold cyan]IP:[/] {data['ip']}") 26 | print(f"[bold]City:[/] {data.get('city')}") 27 | print(f"[bold]Region:[/] {data.get('region')}") 28 | print(f"[bold]Country:[/] {data.get('country')}") 29 | print(f"[bold]Org:[/] {data.get('org')}") 30 | print(f"[bold]ASN:[/] {data.get('asn', {}).get('asn', 'N/A')}") 31 | print(f"[bold]Abuse Contact:[/] {data.get('abuse', {}).get('address', 'N/A')}") 32 | 33 | if __name__ == "__main__": 34 | if len(sys.argv) < 2: 35 | print("Usage: python ipinfo_lookup.py ") 36 | sys.exit(1) 37 | main(sys.argv[1]) 38 | ``` 39 | -------------------------------------------------------------------------------- /website/docs/scripts/passive-metadata-harvester.md: -------------------------------------------------------------------------------- 1 | # Passive Metadata Harvester 2 | 3 | Extracts metadata from public files (PDF, DOCX, images) from a given URL list. Simple, safe, passive. 
4 | 
5 | ---
6 | 
7 | ## Usage
8 | - Requires: `requests`, `exifread`, `rich`
9 | - Run: `python passive_metadata_harvester.py urls.txt`
10 | 
11 | ---
12 | 
13 | ```python
14 | import sys
15 | from io import BytesIO
16 | 
17 | import requests
18 | import exifread
19 | from rich import print
20 | 
21 | 
22 | def fetch_file(url):
23 |     try:
24 |         r = requests.get(url, timeout=10)
25 |         r.raise_for_status()
26 |         return r.content
27 |     except Exception as e:
28 |         print(f"[red]Could not fetch {url}: {e}")
29 |         return None
30 | 
31 | def extract_metadata(data):
32 |     try:
33 |         # exifread expects a file-like object, so wrap the raw bytes
34 |         tags = exifread.process_file(BytesIO(data))
35 |         return tags
36 |     except Exception as e:
37 |         print(f"[red]EXIF extraction failed: {e}")
38 |         return {}
39 | 
40 | def main(urls_file):
41 |     with open(urls_file) as f:
42 |         urls = [line.strip() for line in f if line.strip()]
43 |     for url in urls:
44 |         print(f"[bold cyan]URL:[/] {url}")
45 |         data = fetch_file(url)
46 |         if data:
47 |             meta = extract_metadata(data)
48 |             if meta:
49 |                 for k, v in meta.items():
50 |                     print(f"[yellow]{k}[/]: {v}")
51 |             else:
52 |                 print("[red]No metadata found.")
53 |         print()
54 | 
55 | if __name__ == "__main__":
56 |     if len(sys.argv) < 2:
57 |         print("[red]Usage: python passive_metadata_harvester.py urls.txt")
58 |         sys.exit(1)
59 |     main(sys.argv[1])
60 | ```
--------------------------------------------------------------------------------
/website/docs/scripts/pdf-bulk-metadata.md:
--------------------------------------------------------------------------------
1 | # PDF Bulk Metadata Extractor
2 | 
3 | Extracts metadata from a list of PDF files (local or URLs). 
4 | 5 | --- 6 | 7 | ## Usage 8 | - Requires: `requests`, `PyPDF2`, `rich` 9 | - Run: `python pdf_bulk_metadata.py files.txt` 10 | 11 | --- 12 | 13 | ```python 14 | import sys, requests 15 | from io import BytesIO 16 | from PyPDF2 import PdfReader 17 | from rich import print 18 | 19 | def get_pdf_meta(path): 20 | try: 21 | if path.startswith('http'): 22 | r = requests.get(path, timeout=15) 23 | r.raise_for_status() 24 | pdf = PdfReader(BytesIO(r.content)) 25 | else: 26 | pdf = PdfReader(path) 27 | return pdf.metadata or {}  # metadata is None for PDFs without an Info dictionary 28 | except Exception as e: 29 | print(f"[red]Error: {e}") 30 | return {} 31 | 32 | def main(files_txt): 33 | with open(files_txt) as f: 34 | for line in f: 35 | path = line.strip() 36 | if not path: 37 | continue 38 | meta = get_pdf_meta(path) 39 | print(f"[bold cyan]{path}[/]") 40 | for k, v in meta.items(): 41 | print(f"[yellow]{k}[/]: {v}") 42 | print() 43 | 44 | if __name__ == "__main__": 45 | if len(sys.argv) < 2: 46 | print("Usage: python pdf_bulk_metadata.py files.txt") 47 | sys.exit(1) 48 | main(sys.argv[1]) 49 | ``` 50 | -------------------------------------------------------------------------------- /website/docs/scripts/phone-validator.md: -------------------------------------------------------------------------------- 1 | # Phone Validator & OSINT 2 | 3 | Checks whether a phone number is valid, looks up its carrier, and builds a public web-source search link for it.
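`phonenumbers` does the real validation, but it needs a parseable string first. A stdlib pre-flight that normalizes common punctuation into an E.164-style string (a sketch, not a substitute for the library):

```python
import re

def normalize_e164(raw):
    # Strip spaces, dots, dashes and parentheses, keeping a leading "+".
    # E.164 numbers have at most 15 digits; anything else is rejected here.
    cleaned = re.sub(r"[\s.\-()]", "", raw)
    if not re.fullmatch(r"\+?[0-9]{7,15}", cleaned):
        return None
    return cleaned if cleaned.startswith("+") else "+" + cleaned
```

Feeding the normalized value to `phonenumbers.parse()` avoids parse errors on numbers typed with local punctuation.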
4 | 5 | --- 6 | 7 | ## Usage 8 | - Requires: `phonenumbers`, `rich` 9 | - Run: `python phone_validator.py <number>` 10 | 11 | --- 12 | 13 | ```python 14 | import sys 15 | import phonenumbers 16 | from phonenumbers import carrier 17 | from urllib.parse import quote 18 | from rich import print 19 | 20 | def validate_number(number): 21 | try: 22 | num = phonenumbers.parse(number)  # expects E.164 input, e.g. +15550109999 23 | except phonenumbers.NumberParseException: 24 | return False, None 25 | return phonenumbers.is_valid_number(num), carrier.name_for_number(num, "en") 26 | 27 | def check_pastes(number): 28 | # publicwww indexes page source code, not pastes; the quoted query finds pages embedding the number 29 | url = f"https://publicwww.com/websites/%22{quote(number)}%22/" 30 | print(f"[cyan]Web-source search:[/] {url}") 31 | 32 | def main(number): 33 | valid, carrier_name = validate_number(number) 34 | print(f"[bold cyan]Number:[/] {number}") 35 | print(f"[bold]Valid:[/] {valid}") 36 | print(f"[bold]Carrier:[/] {carrier_name or 'Unknown'}") 37 | check_pastes(number) 38 | 39 | if __name__ == "__main__": 40 | if len(sys.argv) < 2: 41 | print("Usage: python phone_validator.py <number>") 42 | sys.exit(1) 43 | main(sys.argv[1]) 44 | ``` 45 | -------------------------------------------------------------------------------- /website/docs/scripts/reverse-image-search.md: -------------------------------------------------------------------------------- 1 | # Reverse Image Searcher 2 | 3 | Opens TinEye and Google reverse-image searches in your browser for a list of image URLs.
4 | 5 | --- 6 | 7 | ## Usage 8 | - Requires: `rich` (plus the stdlib `webbrowser` module) 9 | - Run: `python reverse_image_search.py images.txt` 10 | 11 | --- 12 | 13 | ```python 14 | import sys, webbrowser 15 | from urllib.parse import quote_plus 16 | from rich import print 17 | 18 | def google_search(img_url): 19 | # searchbyimage may redirect to Google Lens but still lands on a reverse search 20 | q = f"https://www.google.com/searchbyimage?image_url={quote_plus(img_url)}" 21 | print(f"[cyan]Google:[/] {q}") 22 | webbrowser.open(q) 23 | 24 | def tineye_search(img_url): 25 | q = f"https://tineye.com/search?url={quote_plus(img_url)}" 26 | print(f"[cyan]TinEye:[/] {q}") 27 | webbrowser.open(q) 28 | 29 | def main(images_txt): 30 | with open(images_txt) as f: 31 | for url in f: 32 | url = url.strip() 33 | if url: 34 | google_search(url) 35 | tineye_search(url) 36 | 37 | if __name__ == "__main__": 38 | if len(sys.argv) < 2: 39 | print("Usage: python reverse_image_search.py images.txt") 40 | sys.exit(1) 41 | main(sys.argv[1]) 42 | ``` 43 | -------------------------------------------------------------------------------- /website/docs/scripts/shodan-host-analyzer.md: -------------------------------------------------------------------------------- 1 | # Shodan Host Analyzer 2 | 3 | Query Shodan for a given IP and summarize open ports, services, and metadata (resolve domains to an IP first).
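Shodan's host lookup takes an IP address rather than a hostname, so domains need resolving first. A small stdlib helper (a sketch) handles both inputs:

```python
import ipaddress
import socket

def resolve_target(target: str) -> str:
    # Return the target unchanged if it is already an IPv4/IPv6 address,
    # otherwise resolve the hostname via DNS.
    try:
        ipaddress.ip_address(target)
        return target
    except ValueError:
        return socket.gethostbyname(target)
```

Passing `resolve_target(sys.argv[1])` into `api.host(...)` lets the script accept either form.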
4 | 5 | --- 6 | 7 | ## Usage 8 | - Requires: `shodan`, `rich` (`pip install shodan rich`) 9 | - Needs a Shodan API key (`SHODAN_API_KEY` env var) 10 | - Run: `python shodan_host_analyzer.py <ip>` 11 | 12 | --- 13 | 14 | ```python 15 | import sys, os 16 | import shodan 17 | from rich import print 18 | 19 | API_KEY = os.getenv('SHODAN_API_KEY') 20 | if not API_KEY: 21 | print("[red]Set the SHODAN_API_KEY environment variable first.") 22 | sys.exit(1) 23 | api = shodan.Shodan(API_KEY) 24 | 25 | def main(target): 26 | print(f"[bold cyan]Shodan scan for:[/] [yellow]{target}") 27 | try: 28 | result = api.host(target)  # expects an IP address 29 | print(f"[bold]IP:[/] {result['ip_str']}") 30 | print(f"[bold]Org:[/] {result.get('org')}") 31 | print("[bold]Open Ports:[/]") 32 | for item in result['data']: 33 | print(f"- {item['port']} {item.get('product', '')}") 34 | except Exception as e: 35 | print(f"[red]Error: {e}") 36 | 37 | if __name__ == "__main__": 38 | if len(sys.argv) < 2: 39 | print("Usage: python shodan_host_analyzer.py <ip>") 40 | sys.exit(1) 41 | main(sys.argv[1]) 42 | ``` 43 | -------------------------------------------------------------------------------- /website/docs/scripts/social-media-multi-profile.md: -------------------------------------------------------------------------------- 1 | # Social Media Multi-Profile OSINT 2 | 3 | A Python script that checks a username across multiple social media platforms and prints easy-to-digest output.
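A bare HTTP 200 is a weak existence signal; some platforms (Instagram and TikTok, for example) return 200 or a redirect to a login page even for missing users. A hedged stdlib heuristic that also inspects the final URL and response body:

```python
def looks_like_profile(status_code, final_url, body, username):
    # Heuristic only; a robust check needs per-site "not found" markers.
    if status_code != 200:
        return False
    lowered = body.lower()
    if "page not found" in lowered or "login" in final_url:
        return False
    # A real profile page usually echoes the username in its URL or content.
    return username.lower() in final_url.lower() or username.lower() in lowered
```

With `requests`, the arguments would come from `r.status_code`, `r.url` (the post-redirect URL), and `r.text`.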
4 | 5 | --- 6 | 7 | ## Usage 8 | - Requires: `requests`, `rich` 9 | - Run: `python social_media_multi_profile.py username` 10 | 11 | --- 12 | 13 | ```python 14 | import sys 15 | import requests 16 | from rich import print 17 | 18 | PLATFORMS = { 19 | "Twitter": "https://twitter.com/{}", 20 | "GitHub": "https://github.com/{}", 21 | "Reddit": "https://reddit.com/user/{}", 22 | "Instagram": "https://instagram.com/{}", 23 | "TikTok": "https://www.tiktok.com/@{}", 24 | } 25 | 26 | HEADERS = {"User-Agent": "Mozilla/5.0"}  # some sites block the default requests user agent 27 | 28 | def check_profile(url): 29 | # NOTE: HTTP 200 is only a hint; some platforms return 200 even for missing users 30 | try: 31 | r = requests.get(url, timeout=5, headers=HEADERS) 32 | return r.status_code == 200 33 | except requests.RequestException: 34 | return False 35 | 36 | def main(username): 37 | print(f"[bold cyan]Checking username:[/] [yellow]{username}[/]") 38 | for plat, url_fmt in PLATFORMS.items(): 39 | url = url_fmt.format(username) 40 | if check_profile(url): 41 | print(f"[green]✔ {plat}:[/] {url}") 42 | else: 43 | print(f"[red]✘ {plat}:[/] Not found") 44 | 45 | if __name__ == "__main__": 46 | if len(sys.argv) < 2: 47 | print("[red]Usage: python social_media_multi_profile.py <username>") 48 | sys.exit(1) 49 | main(sys.argv[1]) 50 | ``` 51 | -------------------------------------------------------------------------------- /website/docs/scripts/threat-intel-aggregator.md: -------------------------------------------------------------------------------- 1 | # Threat Intel Aggregator 2 | 3 | Pulls threat intelligence indicators (IPs/domains/hashes) from multiple public feeds and outputs a deduped, pretty table.
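When mixing feeds, it helps to tag each indicator by type before deduping, since IP, domain, and hash indicators are handled differently downstream. A stdlib classification sketch:

```python
import ipaddress
import re

def classify_indicator(value: str) -> str:
    # Order matters: try IP first, then hash lengths, then a loose domain check.
    try:
        ipaddress.ip_address(value)
        return "ip"
    except ValueError:
        pass
    if re.fullmatch(r"[A-Fa-f0-9]{32}", value):
        return "md5"
    if re.fullmatch(r"[A-Fa-f0-9]{40}", value):
        return "sha1"
    if re.fullmatch(r"[A-Fa-f0-9]{64}", value):
        return "sha256"
    if re.fullmatch(r"(?=.{4,253}$)([a-z0-9-]+\.)+[a-z]{2,}", value.lower()):
        return "domain"
    return "unknown"
```

The hash checks rely only on hex length, so a 32-character hex domain label could in principle misclassify; for feed triage the trade-off is usually acceptable.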
4 | 5 | --- 6 | 7 | ## Usage 8 | - Requires: `requests`, `rich` 9 | - Run: `python threat_intel_aggregator.py` 10 | 11 | --- 12 | 13 | ```python 14 | import requests 15 | from rich import print 16 | from rich.table import Table 17 | 18 | FEEDS = [ 19 | "https://feodotracker.abuse.ch/downloads/ipblocklist.txt", 20 | "https://urlhaus.abuse.ch/downloads/text/", 21 | # "https://malc0de.com/bl/IP_Blacklist.txt",  # feed has been offline for years; kept for reference 22 | ] 23 | 24 | def fetch_feed(url): 25 | try: 26 | resp = requests.get(url, timeout=10) 27 | resp.raise_for_status() 28 | return resp.text.splitlines() 29 | except requests.RequestException as e: 30 | print(f"[red]Error fetching {url}: {e}") 31 | return [] 32 | 33 | def main(): 34 | indicators = set() 35 | for feed in FEEDS: 36 | indicators.update(fetch_feed(feed)) 37 | # Drop blanks and feed comment lines, then dedupe on the stripped value 38 | indicators = {s for s in (i.strip() for i in indicators) if s and not s.startswith('#')} 39 | table = Table(title="Threat Intel Aggregator") 40 | table.add_column("Indicator", style="cyan") 41 | for ind in sorted(indicators): 42 | table.add_row(ind) 43 | print(table) 44 | 45 | if __name__ == "__main__": 46 | main() 47 | ``` 48 | -------------------------------------------------------------------------------- /website/docs/scripts/url-screenshotter.md: -------------------------------------------------------------------------------- 1 | # URL Screenshotter 2 | 3 | Takes screenshots of a list of URLs using ScreenshotAPI (or a similar service).
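When saving one image per URL, deriving filenames by naive string replacement can produce illegal or colliding names (long URLs, query strings, repeated paths). A stdlib sketch that slugs the URL safely:

```python
import hashlib
import re
from urllib.parse import urlparse

def url_to_filename(url: str, ext: str = "png") -> str:
    # Keep the host for readability, then append a short hash of the full URL
    # so distinct paths/queries on the same host never collide.
    host = re.sub(r"[^A-Za-z0-9.-]", "_", urlparse(url).netloc) or "unknown"
    digest = hashlib.sha256(url.encode()).hexdigest()[:10]
    return f"{host}_{digest}.{ext}"
```

The output is deterministic, so re-running the script overwrites the same file for the same URL rather than creating duplicates.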
4 | 5 | --- 6 | 7 | ## Usage 8 | - Requires: `requests`, `rich` 9 | - Needs API key for screenshot API 10 | - Run: `python url_screenshotter.py urls.txt` 11 | 12 | --- 13 | 14 | ```python 15 | import sys, requests, os 16 | from rich import print 17 | 18 | API_KEY = os.getenv('SCREENSHOT_API_KEY') 19 | API_URL = "https://shot.screenshotapi.net/screenshot" 20 | 21 | def screenshot_url(url): 22 | params = { 23 | "token": API_KEY, 24 | "url": url, 25 | "output": "image", 26 | } 27 | r = requests.get(API_URL, params=params, timeout=60)  # rendering can be slow 28 | if r.status_code == 200: 29 | fname = url.replace('://', '_').replace('/', '_') + ".png" 30 | with open(fname, 'wb') as f: 31 | f.write(r.content) 32 | print(f"[green]Saved:[/] {fname}") 33 | else: 34 | print(f"[red]Failed for {url} (HTTP {r.status_code})") 35 | 36 | def main(urls_file): 37 | with open(urls_file) as f: 38 | for url in f: 39 | url = url.strip() 40 | if url: 41 | screenshot_url(url) 42 | 43 | if __name__ == "__main__": 44 | if len(sys.argv) < 2: 45 | print("Usage: python url_screenshotter.py urls.txt") 46 | sys.exit(1) 47 | if not API_KEY: 48 | print("[red]Set the SCREENSHOT_API_KEY environment variable first.") 49 | sys.exit(1) 50 | main(sys.argv[1]) 51 | ``` 52 | -------------------------------------------------------------------------------- /website/docs/showcase.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Showcase 3 | slug: /showcase 4 | --- 5 | 6 | # Community Showcase 7 | 8 | Welcome to the OSINT Community Showcase! Here we highlight cool projects, scripts, workflows, and stories from people like you. 9 | 10 | ## 🧩 How to Get Featured 11 | - Built a script, tool, or workflow using this Notebook? 12 | - Have a write-up, case study, or visual guide?
13 | 14 | **Submit your entry by:** 15 | - Opening a Pull Request (PR) with your addition 16 | - Opening an issue describing your project 17 | 18 | ## 🌟 Example Showcase Entry 19 | --- 20 | **Project Name:** Social Media Scraper Lego 21 | **Author:** Ada Lovelace 22 | **Skill Level:** Intermediate 23 | **Summary:** 24 | > "I built a Python script to collect public Twitter bios for OSINT. It uses requests and BeautifulSoup, and I added a checklist for safe/ethical use." 25 | 26 | **Code Sample:** 27 | ```python 28 | import requests 29 | from bs4 import BeautifulSoup 30 | # ...your code here... 31 | ``` 32 | 33 | **Tips:** 34 | - Use a VPN 35 | - Respect rate limits 36 | - Document your steps 37 | --- 38 | 39 | ## 🏗️ Add Your Own! 40 | Copy the template above, fill in your details, and submit via PR or issue. We love all skill levels and backgrounds! 41 | 42 | ## 💬 Questions? 43 | Ask in your PR/issue or reach out on GitHub. Let’s build the OSINT Lego city together! 44 | -------------------------------------------------------------------------------- /website/docs/start-here.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Start Here 3 | slug: /start-here 4 | --- 5 | 6 | # Start Here 7 | 8 | Welcome to the **Python OSINT Notebook**, a straightforward open-source hub for learning Open Source Intelligence (OSINT) with Python! 9 | 10 | ## 🧩 How This Site Works 11 | Think of this Notebook as a big box of Lego blocks: 12 | - Each doc is a block you can snap together 13 | - You can build simple things (beginner) or complex structures (advanced) 14 | - If you get stuck, you can always rebuild or swap out blocks 15 | 16 | ## 🧠 Who Is This For? 17 | - **Absolute beginners**: No Python or OSINT experience? Start here! 18 | - **Intermediate tinkerers**: Ready to build your own tools and scripts. 19 | - **DevChads**: Automation, pipelines, threat intel, and beyond.
20 | - **Neurodivergent learners**: We use simple words, visuals, and analogies. 21 | 22 | ## 🚦 Choose Your Path 23 | - 🟢 [Beginner Guide](./beginner-guide): What is OSINT? Python basics, safe searching, simple code recipes. 24 | - 🟡 [Intermediate Guide](./intermediate-guide): Tool deep-dives, scripting, real-world workflows, troubleshooting. 25 | - 🔴 [Advanced Guide](./advanced-guide): Automation pipelines, threat intelligence, ethics, pro tips. 26 | 27 | ## 🛟 If You Get Stuck 28 | - Check the [FAQ](./faq) or [Contributing](./contributing) for help. 29 | - Use the search bar (top right) for instant answers. 30 | - Open an issue or PR on [GitHub](https://github.com/tegridydev/python-OSINT-notebook). 31 | 32 | ## 🏗️ How to Use This Notebook 33 | - Read docs in any order — build your own learning path 34 | - Copy-paste code snippets and try them out 35 | - Share your own Lego blocks (guides, scripts) with the community 36 | 37 | --- 38 | Ready? Pick your track and start building! 39 | -------------------------------------------------------------------------------- /website/docs/tools/02-01-frameworks.md: -------------------------------------------------------------------------------- 1 | --- 2 | id: 02-01-frameworks 3 | title: Core Tools & Libraries › Frameworks & Platforms 4 | --- 5 | 6 | # OSINT Frameworks & Multi-Tool Platforms 7 | 8 | Modern OSINT workflows often begin with a **framework** or **platform** that orchestrates multiple modules and data sources. These tools provide: 9 | 10 | - **Modular Architecture**: Load only the components you need (subdomains, social media, metadata extraction, etc.) 
11 | - **Unified Interface**: Web UI or CLI to manage scans, workspaces, and results 12 | - **Data Persistence & Export**: Built-in databases (SQLite, MongoDB) and exports to JSON, CSV, or HTML 13 | - **Extensibility**: Custom transforms, API integrations, or plugin systems 14 | 15 | Below are the most popular, actively maintained Python-based OSINT platforms as of 2025. 16 | 17 | --- 18 | 19 | ## SpiderFoot (v4.0.1) 20 | 21 | - **Description** 22 | SpiderFoot is a highly modular OSINT automation platform with **200+ reconnaissance modules** covering DNS, IP, domains, social media, breach data, dark web, and more. It supports both a Web UI for interactive scans and a CLI for scripting. 23 | 24 | - **Install** 25 | ```bash 26 | git clone https://github.com/smicallef/spiderfoot.git 27 | cd spiderfoot 28 | pip install -r requirements.txt 29 | ``` 30 | 31 | - **Usage** 32 | - **Web UI**: 33 | ```bash 34 | python3 sf.py -l 127.0.0.1:5001 35 | # then browse http://127.0.0.1:5001 36 | ``` 37 | - **CLI scan**: 38 | ```bash 39 | # passive-only scan with modules chosen automatically by use case 40 | python3 sf.py -s example.com \ 41 | -u passive \ 42 | -o json > sf_report.json 43 | ``` 44 | 45 | - **Links** 46 | - GitHub: https://github.com/smicallef/spiderfoot 47 | - Official Docs & API: https://spiderfoot.net/documentation/ 48 | 49 | --- 50 | 51 | ## Recon-ng (v5.1.2) 52 | 53 | - **Description** 54 | Recon-ng is inspired by Metasploit: a **console-based** framework with a “marketplace” of modules. It maintains a workspace-backed SQLite database and makes chaining modules (e.g., from DNS to WHOIS to social profiles) straightforward.
55 | 56 | - **Install** 57 | ```bash 58 | git clone https://github.com/lanmaster53/recon-ng && cd recon-ng && pip install -r REQUIREMENTS 59 | ``` 60 | 61 | - **Quickstart** 62 | ```bash 63 | recon-ng 64 | workspaces create myproject 65 | marketplace install all 66 | db insert domains example.com 67 | modules load recon/domains-hosts/bing_domain_web 68 | run 69 | show hosts 70 | ``` 71 | 72 | - **Links** 73 | - GitHub: https://github.com/lanmaster53/recon-ng 74 | - Docs: https://recon-ng.readthedocs.io 75 | 76 | --- 77 | 78 | ## OSRFramework (v0.20.5) 79 | 80 | - **Description** 81 | A **collection of lightweight tools** (usufy, mailfy, searchfy, phonefy) for username checks, email lookups, DNS queries, and more. Each utility can run standalone or be combined in a script. 82 | 83 | - **Install** 84 | ```bash 85 | pip install osrframework 86 | ``` 87 | 88 | - **Usage Examples** 89 | ```bash 90 | # Check if the nick 'alice' exists on known sites 91 | usufy -n alice 92 | 93 | # Check which mail providers have an account for the nick 'alice' 94 | mailfy -n alice 95 | 96 | # Search pages for keywords 97 | searchfy -q "example.com breach" 98 | ``` 99 | 100 | - **Links** 101 | - GitHub: https://github.com/i3visio/osrframework 102 | - PyPI: https://pypi.org/project/osrframework/ 103 | 104 | --- 105 | 106 | ## (Optional) DataSploit 107 | 108 | > **Note:** DataSploit (v1.0) has not been updated since 2017. You may encounter forks or community patches, but consider replacing it with scripted workflows using the above frameworks. 109 | 110 | - GitHub: https://github.com/datasploit/datasploit 111 | 112 | --- 113 | 114 | ### Additional Resources 115 | 116 | - **Awesome-OSINT Frameworks** (community-curated list): 117 | https://github.com/jivoi/awesome-osint#frameworks 118 | - **OSINT Framework** (interactive map of OSINT tools): 119 | https://osintframework.com/ 120 | 121 | > **Tip:** For maximum flexibility, mix and match modules from these platforms via custom Python scripts or orchestration tools like `invoke` or `Makefile`.
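That mix-and-match tip can be sketched with the standard library alone; the tool names and flags below are illustrative stand-ins for whatever you have installed:

```python
import shlex
import subprocess

# Each stage is a shell command template; {target} is filled in at run time.
PIPELINE = [
    "theHarvester -d {target} -b bing -l 100",
    "subfinder -d {target} -o subfinder.txt",
]

def build_commands(target: str) -> list:
    # Quote the target so a hostile input string cannot inject shell syntax.
    return [shlex.split(step.format(target=shlex.quote(target))) for step in PIPELINE]

def run_pipeline(target: str) -> None:
    for cmd in build_commands(target):
        # A failing stage is reported but does not stop the remaining stages.
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(f"[{cmd[0]}] exit={result.returncode}")
```

Calling `run_pipeline("example.com")` runs each stage in order, assuming the tools are on `PATH`; an `invoke` tasks file or a `Makefile` expresses the same idea with per-stage targets.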
-------------------------------------------------------------------------------- /website/docs/tools/02-02-domain-infra.md: -------------------------------------------------------------------------------- 1 | --- 2 | id: 02-02-domain-infra 3 | title: Core Tools & Libraries › Domain & Infrastructure 4 | --- 5 | 6 | # Domain & Infrastructure Reconnaissance 7 | 8 | Passive domain and infrastructure reconnaissance lets you collect subdomains, DNS records, SSL certificate data, and other network metadata without active scanning. These tools tap public indices, certificate transparency logs, and passive DNS services. 9 | 10 | --- 11 | 12 | ## OWASP Amass (v4.2.0) 13 | 14 | - **Function:** Attack-surface mapping, passive DNS and certificate enumeration, subdomain discovery. 15 | - **Install:** 16 | ```bash 17 | go install github.com/owasp-amass/amass/v4/...@latest 18 | ``` 19 | - **Usage Example:** 20 | ```bash 21 | # Passive enumeration of subdomains 22 | amass enum -passive -d example.com -o amass.txt 23 | ``` 24 | - **Link:** https://github.com/owasp-amass/amass 25 | 26 | --- 27 | 28 | ## Subfinder (v2.7.0) 29 | 30 | - **Function:** Ultra-fast passive subdomain enumeration using certificate transparency logs, search engines, and third-party APIs. 31 | - **Install:** 32 | ```bash 33 | go install github.com/projectdiscovery/subfinder/v2/cmd/subfinder@latest 34 | ``` 35 | - **Usage Example:** 36 | ```bash 37 | subfinder -d example.com -o subfinder.txt 38 | ``` 39 | - **Link:** https://github.com/projectdiscovery/subfinder 40 | 41 | --- 42 | 43 | ## MassDNS (v0.2.1) 44 | 45 | - **Function:** High-performance bulk DNS resolver for processing large lists of names.
46 | - **Install:** 47 | ```bash 48 | git clone https://github.com/blechschmidt/massdns.git 49 | cd massdns && make 50 | ``` 51 | - **Usage Example:** 52 | ```bash 53 | # Resolve a list of candidate subdomains 54 | massdns -r resolvers.txt -t A -o S subdomains.txt > resolved.txt 55 | ``` 56 | - **Link:** https://github.com/blechschmidt/massdns 57 | 58 | --- 59 | 60 | ## Legacy Tools 61 | 62 | ### theHarvester 63 | 64 | - **Function:** Aggregate email addresses, hostnames, and subdomains from public sources (search engines, PGP servers). 65 | - **Install & Usage:** 66 | ```bash 67 | pip install theHarvester 68 | theHarvester -d example.com -b bing -l 100 69 | ``` 70 | 71 | ### dnsrecon 72 | 73 | - **Function:** Passive and brute-force DNS enumeration (A, MX, NS, zone transfers, reverse lookups). 74 | - **Install & Usage:** 75 | ```bash 76 | pip install dnsrecon 77 | dnsrecon -d example.com -t brt 78 | ``` 79 | 80 | ### Metagoofil 81 | 82 | - **Function:** Download and extract document metadata (PDF, DOCX, PPTX) to reveal authors, software versions, and internal paths. 83 | - **Usage Note:** 84 | - No active repo; use Python libraries like `PyPDF2` or `ExifTool` wrappers to replicate functionality. 85 | - Example with `PyPDF2`: 86 | ```python 87 | from PyPDF2 import PdfReader 88 | reader = PdfReader("report.pdf") 89 | print(reader.metadata) 90 | ``` 91 | 92 | --- 93 | 94 | -------------------------------------------------------------------------------- /website/docs/tools/02-03-people-social.md: -------------------------------------------------------------------------------- 1 | --- 2 | id: 02-03-people-social 3 | title: Core Tools & Libraries › People & Social Media 4 | --- 5 | 6 | # People & Social Media OSINT 7 | 8 | Social platforms harbor rich troves of publicly available data—profiles, posts, connections, and metadata. 
The following Python tools enable passive enumeration, scraping, and analysis of user presence across multiple networks without requiring official API access. 9 | 10 | --- 11 | 12 | ## Sherlock (v0.16.8) 13 | 14 | - **Function:** Checks if a username exists on 300+ social media and content sites using HTTP HEAD/GET requests. 15 | - **Install:** 16 | ```bash 17 | pip install sherlock-project  # the PyPI name; "sherlock" is an unrelated package 18 | ``` 19 | - **Usage:** 20 | ```bash 21 | sherlock targetusername --timeout 5 --print-found 22 | ``` 23 | - **Output:** CSV or table of found/unfound accounts. 24 | - **Link:** https://github.com/sherlock-project/sherlock 25 | 26 | --- 27 | 28 | ## Social-Analyzer (v1.0.0) 29 | 30 | - **Function:** Aggregates profile data by username, email, or phone across 1,000+ platforms. Offers CLI, Python library, and Docker support. 31 | - **Install & Run:** 32 | ```bash 33 | git clone https://github.com/qeeqbox/social-analyzer.git 34 | cd social-analyzer 35 | pip install -r requirements.txt 36 | python3 app.py --cli --username "johndoe" --websites "all" 37 | ``` 38 | - **Output:** JSON report with matched profiles and metadata. 39 | - **Link:** https://github.com/qeeqbox/social-analyzer 40 | 41 | --- 42 | 43 | ## Holehe (v1.2.0) 44 | 45 | - **Function:** Uses “forgot password” endpoints to passively check if an email is registered on ~120 platforms. 46 | - **Install:** 47 | ```bash 48 | pip install holehe 49 | ``` 50 | - **Usage:** 51 | ```bash 52 | holehe target@example.com 53 | ``` 54 | - **Link:** https://github.com/megadose/holehe 55 | 56 | --- 57 | 58 | ## snscrape (v0.5.0) 59 | 60 | - **Function:** Scrapes public posts and profiles from Twitter (X), Reddit, Telegram, and more—without API keys.
61 | - **Install:** 62 | ```bash 63 | pip install snscrape 64 | ``` 65 | - **Usage:** 66 | ```bash 67 | # Fetch recent tweets by user as JSON Lines 68 | snscrape --jsonl twitter-user johndoe > tweets.jsonl 69 | ``` 70 | - **Link:** https://github.com/JustAnotherArchivist/snscrape 71 | 72 | --- 73 | 74 | ## Instaloader (v4.10) 75 | 76 | - **Function:** Downloads Instagram profiles, posts, stories, and metadata (public or accessible private). 77 | - **Install:** 78 | ```bash 79 | pip install instaloader 80 | ``` 81 | - **Usage:** 82 | ```bash 83 | # Download all posts from a profile (pass the profile name as the target) 84 | instaloader johndoe 85 | ``` 86 | - **Link:** https://github.com/instaloader/instaloader 87 | 88 | --- 89 | 90 | -------------------------------------------------------------------------------- /website/docs/tools/02-04-threat-intel.md: -------------------------------------------------------------------------------- 1 | --- 2 | id: 02-04-threat-intel 3 | title: Core Tools & Libraries › Threat Intelligence 4 | --- 5 | 6 | # Threat Intelligence & Leaked Credentials 7 | 8 | Threat intelligence tools help you collect, store, and analyze indicators of compromise (IoCs), leaked credentials, and underground chatter. These platforms and utilities enable passive ingestion of community-fed data or scanning of code repositories for secrets. 9 | 10 | --- 11 | 12 | ## PyMISP (v2.4.154) 13 | 14 | - **Function:** Python client for the MISP threat-intelligence platform. Create, query, and export events and attributes via API.
15 | - **Install:** 16 | ```bash 17 | pip install pymisp 18 | ``` 19 | - **Usage Example:** 20 | ```python 21 | from pymisp import ExpandedPyMISP 22 | 23 | misp = ExpandedPyMISP( 24 | 'https://misp.example.com', 25 | 'YOUR_API_KEY', 26 | ssl=False  # disables TLS verification; use True outside of testing 27 | ) 28 | # Search for events related to example.com 29 | events = misp.search(controller='events', value='example.com') 30 | for e in events: 31 | print(e['Event']['info']) 32 | ``` 33 | - **Link:** https://github.com/MISP/PyMISP 34 | 35 | --- 36 | 37 | ## OpenCTI (client v5.2.0) 38 | 39 | - **Function:** Client library for OpenCTI, a collaborative threat-intelligence platform supporting STIX/TAXII. 40 | - **Install:** 41 | ```bash 42 | pip install pycti 43 | ``` 44 | - **Usage Example:** 45 | ```python 46 | from pycti import OpenCTIConnectorHelper, OpenCTIApiClient 47 | 48 | client = OpenCTIApiClient('https://opencti.example.com', 'YOUR_API_TOKEN') 49 | # Fetch threat actors 50 | actors = client.threat_actor.list() 51 | for actor in actors: 52 | print(actor['name']) 53 | ``` 54 | - **Link:** https://github.com/OpenCTI-Platform/client-python 55 | 56 | --- 57 | 58 | ## TruffleHog (v3.0.0) 59 | 60 | - **Function:** Deep-history search for high-entropy strings (API keys, passwords) in Git repositories. 61 | - **Install (Go):** 62 | ```bash 63 | go install github.com/trufflesecurity/trufflehog/v3@latest  # v3 is a Go binary; `pip install trufflehog` gets the legacy v2 64 | ``` 65 | - **Usage Example:** 66 | ```bash 67 | # Scan a remote Git repo 68 | trufflehog git https://github.com/org/repo.git 69 | ``` 70 | - **Link:** https://github.com/trufflesecurity/trufflehog 71 | 72 | --- 73 | 74 | ## GitLeaks (v8.21.0) 75 | 76 | - **Function:** Detect hardcoded credentials and secrets in Git repos using regex rules. Supports CLI and GitHub Actions. 77 | - **Install (Go):** 78 | ```bash 79 | go install github.com/zricethezav/gitleaks/v8@latest 80 | ``` 81 | - **Usage Example:** 82 | ```bash 83 | # Scan current repo 84 | gitleaks detect --source .
85 | ``` 86 | - **Link:** https://github.com/gitleaks/gitleaks 87 | 88 | --- 89 | 90 | ## Dark Web Monitoring 91 | 92 | ### OnionScan (v0.0.7) 93 | 94 | - **Function:** Crawl and analyze .onion sites to enumerate links, files, and metadata. 95 | - **Install (Go):** 96 | ```bash 97 | go install github.com/s-rah/onionscan@latest 98 | ``` 99 | - **Usage Example:** 100 | ```bash 101 | # Pass the bare .onion hostname, no scheme 102 | onionscan --jsonReport exampleonion.onion > onionscan.json 103 | ``` 104 | - **Link:** https://github.com/s-rah/onionscan 105 | 106 | ### Stem (v1.8.0) 107 | 108 | - **Function:** Python controller library for Tor, useful for automating .onion requests and tunneling traffic. 109 | - **Install:** 110 | ```bash 111 | pip install stem 112 | ``` 113 | - **Usage Example:** 114 | ```python 115 | from stem import Signal 116 | from stem.control import Controller 117 | 118 | # Requires ControlPort 9051 (with authentication configured) in torrc 119 | with Controller.from_port(port=9051) as controller: 120 | controller.authenticate() 121 | controller.signal(Signal.NEWNYM) 122 | ``` 123 | - **Link:** https://stem.torproject.org/ 124 | 125 | --- 126 | 127 | -------------------------------------------------------------------------------- /website/docs/tools/02-05-emerging-tools.md: -------------------------------------------------------------------------------- 1 | --- 2 | id: 02-05-emerging-tools 3 | title: Core Tools & Libraries › Emerging Tools 4 | --- 5 | 6 | # Emerging OSINT Tools (2025) 7 | 8 | The OSINT landscape evolves rapidly. Here are some of the most promising **new** or **up-and-coming** Python-based tools and platforms (and related utilities) for passive intelligence gathering in 2025. 9 | 10 | --- 11 | 12 | ## OSINTgram 13 | 14 | - **Function:** Interactive analysis of Instagram profiles: extracts follower/following relationships, post metadata, and media (requires logging in with an Instagram account).
15 | - **Install:** 16 | ```bash 17 | git clone https://github.com/Datalux/OSINTgram.git 18 | cd OSINTgram 19 | pip install -r requirements.txt 20 | ``` 21 | - **Usage Example:** 22 | ```bash 23 | python3 main.py johndoe --command info 24 | ``` 25 | - **Output:** Text/JSON dumps of followers, followings, captions, and other profile data. 26 | - **Link:** https://github.com/Datalux/OSINTgram 27 | 28 | --- 29 | 30 | ## TikTok-Scraper 31 | 32 | - **Function:** Fetch TikTok video metadata (views, likes, comments) and download media without using the official API. 33 | - **Install:** 34 | ```bash 35 | npm install -g tiktok-scraper 36 | ``` 37 | - **Usage Example:** 38 | ```bash 39 | tiktok-scraper user @johndoe --download --metadata --filepath ./tiktoks 40 | ``` 41 | - **Link:** https://www.npmjs.com/package/tiktok-scraper 42 | 43 | --- 44 | 45 | ## ShadowSearch 46 | 47 | - **Function:** AI-driven passive search tool that submits queries to darknet search engines and aggregates results using natural-language prompts. 48 | - **Install:** 49 | ```bash 50 | pip install shadowsearch 51 | ``` 52 | - **Usage Example:** 53 | ```python 54 | from shadowsearch import ShadowSearch 55 | ss = ShadowSearch() 56 | results = ss.search("example.com breach") 57 | print(results) 58 | ``` 59 | - **Link:** https://github.com/your-org/shadowsearch 60 | 61 | --- 62 | 63 | ## XRay-OSINT 64 | 65 | - **Function:** Leverages machine learning (NER, classification) to automatically extract entities (people, organizations, locations) from large-scale scrape results.
66 | - **Install:** 67 | ```bash 68 | pip install xray-osint 69 | ``` 70 | - **Usage Example:** 71 | ```bash 72 | xray scan --input tweets.json --model en_core_web_trf --output entities.csv 73 | ``` 74 | - **Link:** https://github.com/your-org/xray-osint 75 | 76 | --- 77 | 78 | ## CertGraph 79 | 80 | - **Function:** Builds an interactive graph of certificate transparency (CT) logs, connecting domains, intermediate CAs, and certificates over time. 81 | - **Install:** 82 | ```bash 83 | pip install certgraph 84 | ``` 85 | - **Usage Example:** 86 | ```python 87 | from certgraph import CertGraph 88 | cg = CertGraph(domain="example.com") 89 | cg.build_graph(output="certgraph.gml") 90 | ``` 91 | - **Link:** https://github.com/your-org/certgraph 92 | 93 | --- 94 | -------------------------------------------------------------------------------- /website/docusaurus.config.js: -------------------------------------------------------------------------------- 1 | // website/docusaurus.config.js 2 | // @ts-check 3 | import { themes as prismThemes } from 'prism-react-renderer'; 4 | 5 | /** @type {import('@docusaurus/types').Config} */ 6 | const config = { 7 | title: 'Python-OSINT-Notebook', 8 | tagline: 'Passive OSINT via Python', 9 | favicon: 'img/favicon.ico', 10 | 11 | url: 'https://tegridydev.github.io', 12 | baseUrl: '/python-OSINT-notebook/', 13 | organizationName: 'tegridydev', 14 | projectName: 'python-OSINT-notebook', 15 | deploymentBranch: 'gh-pages', 16 | 17 | onBrokenLinks: 'throw', 18 | onBrokenMarkdownLinks: 'warn', 19 | trailingSlash: false, 20 | 21 | i18n: { 22 | defaultLocale: 'en', 23 | locales: ['en', 'es', 'fr', 'de'], // Scaffold for future translations 24 | }, 25 | 26 | presets: [ 27 | [ 28 | 'classic', 29 | /** @type {import('@docusaurus/preset-classic').Options} */ 30 | ({ 31 | docs: { 32 | routeBasePath: '/', 33 | sidebarPath: require.resolve('./sidebars.js'), 34 | editUrl: 35 | 'https://github.com/tegridydev/python-OSINT-notebook/edit/main/website/', 36 | 
showLastUpdateAuthor: true, 37 | showLastUpdateTime: true, 38 | }, 39 | blog: { 40 | showReadingTime: true, 41 | editUrl: 42 | 'https://github.com/tegridydev/python-OSINT-notebook/edit/main/website/', 43 | }, 44 | theme: { 45 | customCss: require.resolve('./src/css/custom.css'), 46 | }, 47 | sitemap: { 48 | changefreq: 'weekly', 49 | priority: 0.5, 50 | }, 51 | gtag: { 52 | trackingID: 'G-XXXXXXX', 53 | anonymizeIP: true, 54 | }, 55 | }), 56 | ], 57 | ], 58 | 59 | themeConfig: 60 | /** @type {import('@docusaurus/preset-classic').ThemeConfig} */ 61 | ({ 62 | colorMode: { 63 | defaultMode: 'dark', 64 | disableSwitch: false, 65 | respectPrefersColorScheme: true, 66 | }, 67 | image: 'img/docusaurus-social-card.jpg', 68 | navbar: { 69 | title: 'Python-OSINT-Notebook', 70 | logo: { 71 | alt: 'Python OSINT Logo', 72 | src: 'img/favicon.ico', 73 | }, 74 | items: [ 75 | {to: '/', label: 'Docs', position: 'left'}, 76 | {to: '/blog', label: 'Blog', position: 'left'}, 77 | {to: '/showcase', label: 'Showcase', position: 'left'}, 78 | {to: '/scripts/', label: 'Scripts', position: 'left'}, 79 | {to: '/contributing', label: 'Contributing', position: 'right'}, 80 | {to: '/start-here', label: 'Start Here', position: 'right'}, 81 | {href: 'https://github.com/tegridydev/python-OSINT-notebook', label: 'GitHub', position: 'right'}, 82 | ], 83 | }, 84 | 85 | prism: { 86 | theme: prismThemes.github, 87 | darkTheme: prismThemes.dracula, 88 | }, 89 | announcementBar: { 90 | id: 'star-us', 91 | content: 92 | '⭐️ If this Notebook helps you, please ⭐ the GitHub repo!', 93 | backgroundColor: '#fafbfc', 94 | textColor: '#091E42', 95 | isCloseable: true, 96 | }, 97 | 98 | docs: { 99 | sidebar: { 100 | hideable: true, 101 | autoCollapseCategories: true, 102 | }, 103 | }, 104 | footer: { 105 | style: 'dark', 106 | links: [ 107 | { 108 | title: 'Docs', 109 | items: [ 110 | { label: 'Introduction', to: '01-introduction' }, 111 | { label: 'OSINT101', to: 'pythonosint101' }, 112 | ], 113 | }, 114 | { 
115 | title: 'Community', 116 | items: [ 117 | { 118 | label: 'Awesome-OSINT', 119 | href: 'https://github.com/jivoi/awesome-osint', 120 | }, 121 | { 122 | label: 'Stack Overflow', 123 | href: 'https://stackoverflow.com/questions/tagged/osint', 124 | }, 125 | { 126 | label: 'Reddit', 127 | href: 'https://www.reddit.com/r/OSINT/', 128 | }, 129 | ], 130 | }, 131 | { 132 | title: 'More', 133 | items: [ 134 | { 135 | label: 'GitHub', 136 | href: 'https://github.com/tegridydev/python-OSINT-notebook', 137 | }, 138 | { label: 'Docusaurus', href: 'https://docusaurus.io' }, 139 | ], 140 | }, 141 | ], 142 | copyright: `© ${new Date().getFullYear()} tegridydev`, 143 | }, 144 | 145 | tableOfContents: { 146 | minHeadingLevel: 2, 147 | maxHeadingLevel: 3, 148 | }, 149 | }), 150 | }; 151 | 152 | export default config; 153 | -------------------------------------------------------------------------------- /website/package.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "website", 3 | "version": "0.0.0", 4 | "private": true, 5 | "scripts": { 6 | "docusaurus": "docusaurus", 7 | "start": "docusaurus start", 8 | "build": "docusaurus build", 9 | "swizzle": "docusaurus swizzle", 10 | "deploy": "docusaurus deploy", 11 | "clear": "docusaurus clear", 12 | "serve": "docusaurus serve", 13 | "write-translations": "docusaurus write-translations", 14 | "write-heading-ids": "docusaurus write-heading-ids" 15 | }, 16 | "dependencies": { 17 | "@docusaurus/core": "3.7.0", 18 | "@mdx-js/react": "^3.0.0", 19 | "clsx": "^2.0.0", 20 | "react": "^19.0.0", 21 | "react-dom": "^19.0.0" 22 | }, 23 | "devDependencies": { 24 | "@docusaurus/module-type-aliases": "3.7.0", 25 | "@docusaurus/plugin-content-docs": "^3.7.0", 26 | "@docusaurus/preset-classic": "^3.7.0", 27 | "@docusaurus/types": "^3.7.0", 28 | "@types/node": "^22.15.2", 29 | "prism-react-renderer": "^2.4.1" 30 | }, 31 | "browserslist": { 32 | "production": [ 33 | ">0.5%", 34 | "not dead", 35 |
"not op_mini all" 36 | ], 37 | "development": [ 38 | "last 3 chrome version", 39 | "last 3 firefox version", 40 | "last 5 safari version" 41 | ] 42 | }, 43 | "engines": { 44 | "node": ">=18.0" 45 | } 46 | } 47 | -------------------------------------------------------------------------------- /website/sidebars.js: -------------------------------------------------------------------------------- 1 | // website/sidebars.js 2 | // @ts-check 3 | 4 | /** 5 | * @type {import('@docusaurus/plugin-content-docs').SidebarsConfig} 6 | */ 7 | const sidebars = { 8 | tutorialSidebar: [ 9 | 'start-here', 10 | '01-introduction', 11 | { 12 | type: 'category', 13 | label: 'Core Tools & Libraries', 14 | items: [ 15 | 'tools/02-01-frameworks', 16 | 'tools/02-02-domain-infra', 17 | 'tools/02-03-people-social', 18 | 'tools/02-04-threat-intel', 19 | 'tools/02-05-emerging-tools', 20 | ], 21 | }, 22 | { 23 | type: 'category', 24 | label: 'Installation & Configuration', 25 | items: [ 26 | 'installation/03-01-env-setup', 27 | 'installation/03-02-linux', 28 | 'installation/03-03-windows', 29 | 'installation/03-04-docker', 30 | 'installation/03-05-post-install', 31 | ], 32 | }, 33 | '04-scripting-examples', 34 | '05-automation-pipelines', 35 | '06-tuning-ethics', 36 | '07-resources', 37 | { 38 | type: 'category', 39 | label: 'Skill Level Guides', 40 | items: [ 41 | 'beginner-guide', 42 | 'intermediate-guide', 43 | 'advanced-guide', 44 | ], 45 | }, 46 | { 47 | type: 'category', 48 | label: 'Scripts', 49 | items: [ 50 | 'scripts/README', 51 | 'scripts/domain-recon-combo', 52 | 'scripts/social-media-multi-profile', 53 | 'scripts/threat-intel-aggregator', 54 | 'scripts/passive-metadata-harvester', 55 | 'scripts/all-in-one-passive-recon', 56 | 'scripts/shodan-host-analyzer', 57 | 'scripts/github-user-osint', 58 | 'scripts/email-breach-checker', 59 | 'scripts/url-screenshotter', 60 | 'scripts/ipinfo-lookup', 61 | 'scripts/pdf-bulk-metadata', 62 | 'scripts/reverse-image-search', 63 | 
'scripts/phone-validator', 64 | 'scripts/archive-org-snapshots', 65 | 'scripts/favicon-hash-lookup', 66 | ], 67 | }, 68 | 'faq', 69 | 'contributing', 70 | 'showcase', 71 | { 72 | type: 'category', 73 | label: 'Walkthrough Guides', 74 | items: [ 75 | 'pythonosint101', 76 | 'extra-passive-sources', 77 | ], 78 | }, 79 | ], 80 | }; 81 | 82 | module.exports = sidebars; 83 | -------------------------------------------------------------------------------- /website/src/components/HomepageFeatures/index.js: -------------------------------------------------------------------------------- 1 | import clsx from 'clsx'; 2 | import Heading from '@theme/Heading'; 3 | import styles from './styles.module.css'; 4 | 5 | const FeatureList = []; 6 | 7 | function Feature({Svg, title, description}) { 8 | return ( 9 |
<div className={clsx('col col--4')}> 10 | <div className="text--center"> 11 | <Svg className={styles.featureSvg} role="img" /> 12 | </div> 13 | <div className="text--center padding-horiz--md"> 14 | <Heading as="h3">{title}</Heading> 15 | <p>{description}</p> 16 | </div> 17 | </div> 18 | ); 19 | } 20 | 21 | export default function HomepageFeatures() { 22 | return ( 23 | <section className={styles.features}> 24 | <div className="container"> 25 | <div className="row"> 26 | {FeatureList.map((props, idx) => ( 27 | <Feature key={idx} {...props} /> 28 | ))} 29 | </div> 30 | </div> 31 | </section>
32 | ); 33 | } 34 | -------------------------------------------------------------------------------- /website/src/components/HomepageFeatures/styles.module.css: -------------------------------------------------------------------------------- 1 | .features { 2 | display: flex; 3 | align-items: center; 4 | padding: 2rem 0; 5 | width: 100%; 6 | } 7 | 8 | .featureSvg { 9 | height: 200px; 10 | width: 200px; 11 | } 12 | -------------------------------------------------------------------------------- /website/src/css/custom.css: -------------------------------------------------------------------------------- 1 | /** 2 | * Any CSS included here will be global. The classic template 3 | * bundles Infima by default. Infima is a CSS framework designed to 4 | * work well for content-centric websites. 5 | */ 6 | 7 | /* You can override the default Infima variables here. */ 8 | :root { 9 | --ifm-color-primary: #2e8555; 10 | --ifm-color-primary-dark: #29784c; 11 | --ifm-color-primary-darker: #277148; 12 | --ifm-color-primary-darkest: #205d3b; 13 | --ifm-color-primary-light: #33925d; 14 | --ifm-color-primary-lighter: #359962; 15 | --ifm-color-primary-lightest: #3cad6e; 16 | --ifm-code-font-size: 95%; 17 | --docusaurus-highlighted-code-line-bg: rgba(0, 0, 0, 0.1); 18 | } 19 | 20 | /* For readability concerns, you should choose a lighter palette in dark mode. 
*/ 21 | [data-theme='dark'] { 22 | --ifm-color-primary: #25c2a0; 23 | --ifm-color-primary-dark: #21af90; 24 | --ifm-color-primary-darker: #1fa588; 25 | --ifm-color-primary-darkest: #1a8870; 26 | --ifm-color-primary-light: #29d5b0; 27 | --ifm-color-primary-lighter: #32d8b4; 28 | --ifm-color-primary-lightest: #4fddbf; 29 | --docusaurus-highlighted-code-line-bg: rgba(0, 0, 0, 0.3); 30 | } 31 | -------------------------------------------------------------------------------- /website/src/pages/index.js: -------------------------------------------------------------------------------- 1 | // website/src/pages/index.js 2 | import React from 'react'; 3 | import clsx from 'clsx'; 4 | import Layout from '@theme/Layout'; 5 | import Link from '@docusaurus/Link'; 6 | import useDocusaurusContext from '@docusaurus/useDocusaurusContext'; 7 | import styles from './index.module.css'; 8 | 9 | function HomepageHeader() { 10 | const { siteConfig } = useDocusaurusContext(); 11 | return ( 12 |
<header className={clsx('hero', styles.heroBanner)}> 13 | <div className="container"> 14 | <h1 className={styles.heroTitle}>Python-OSINT-Notebook</h1> 15 | <p className={styles.heroSubtitle}>Your modular toolkit for passive OSINT workflows, scripts, and guides.</p> 16 | <div className={styles.buttons}> 17 | <Link className="button button--secondary button--lg" to="/start-here">Get Started</Link> 18 | <Link className="button button--secondary button--lg" to="/scripts/">Scripts & Tools</Link> 19 | <Link className="button button--secondary button--lg" href="https://github.com/tegridydev/python-OSINT-notebook">GitHub</Link> 20 | </div> 21 | </div> 22 | </header> 23 | ); 24 | } 25 | 26 | export default function Home() { 27 | const { siteConfig } = useDocusaurusContext(); 28 | return ( 29 | <Layout 30 | title={siteConfig.title} 31 | description={siteConfig.tagline}> 32 | <HomepageHeader /> 33 | <main> 34 | 35 | {/* Quick Mission/Intro Block */} 36 | <section className={styles.quickLinks}> 37 | <div className="container"> 38 | <h2 className={styles.sectionTitle}>What is this?</h2> 39 | <p className={styles.ctaDesc}> 40 | Python-OSINT-Notebook is your all-in-one, plug-and-play resource for passive OSINT. Whether you’re new to OSINT or just want a no-BS toolkit, you’ll find modular guides and ready-to-run scripts for every skill level. 41 | </p> 42 | </div> 43 | </section> 44 | 45 | {/* Visual Features Grid */} 46 | <section className={styles.features}> 47 | <div className="container"> 48 | <h2 className={styles.sectionTitle}>Core Features</h2> 49 | <ul className={styles.featureGrid}> 50 | <li className={styles.featureCard}><strong>Passive Recon:</strong> DNS, WHOIS, CT logs, and metadata with zero noise.</li> 51 | <li className={styles.featureCard}><strong>Automation:</strong> Multi-tool scripts for domains, social, threat intel, and more.</li> 52 | <li className={styles.featureCard}><strong>Community & Learning:</strong> Step-by-step docs and real-world use cases.</li> 53 | </ul> 54 | </div> 55 | </section> 56 | 57 | {/* Scripts Section Preview */} 58 | <section className={styles.quickLinks}> 59 | <div className="container"> 60 | <h2 className={styles.sectionTitle}>Featured Scripts</h2> 61 | <ul className={styles.linkGrid}> 62 | <li><strong>Domain Recon Combo:</strong> DNS, WHOIS, CT logs in one shot.</li> 63 | <li><strong>Social Multi-Profile:</strong> Check usernames across major platforms.</li> 64 | <li><strong>All-in-One Recon:</strong> Full passive recon workflow.</li> 65 | <li><strong>Shodan Host Analyzer:</strong> Open ports, banners, and metadata.</li> 66 | <li><strong>PDF Bulk Metadata:</strong> Extracts metadata from many PDFs.</li> 67 | </ul> 68 | <div className={styles.buttons}> 69 | <Link className="button button--primary" to="/scripts/">See All Scripts</Link> 70 | </div> 71 | </div> 72 | </section> 73 | 74 | {/* Community Section */} 75 | <section className={styles.cta}> 76 | <div className="container"> 77 | <h2 className={styles.sectionTitle}>Join the Community</h2> 78 | <p className={styles.ctaDesc}>Share your workflows, ask questions, and contribute to the project.</p> 79 | <div className={styles.buttons}> 80 | <Link className="button button--primary" href="https://github.com/tegridydev/python-OSINT-notebook">GitHub</Link> 81 | </div> 82 | </div> 83 | </section> 84 | </main> 85 | </Layout>
86 | ); 87 | } 88 | -------------------------------------------------------------------------------- /website/src/pages/index.module.css: -------------------------------------------------------------------------------- 1 | /* website/src/pages/index.module.css */ 2 | 3 | .heroBanner { 4 | background: linear-gradient(135deg, #1e3a8a 0%, #2563eb 100%); 5 | color: white; 6 | padding: 4rem 0; 7 | text-align: center; 8 | } 9 | .heroTitle { 10 | font-size: 3rem; 11 | margin-bottom: 1rem; 12 | } 13 | .heroSubtitle { 14 | font-size: 1.25rem; 15 | max-width: 600px; 16 | margin: 0 auto 2rem; 17 | } 18 | .buttons { 19 | display: flex; 20 | justify-content: center; 21 | gap: 1rem; 22 | } 23 | 24 | .features { 25 | padding: 4rem 0; 26 | background: #f5f7ff; 27 | } 28 | .sectionTitle { 29 | text-align: center; 30 | margin-bottom: 2rem; 31 | } 32 | .featureGrid { 33 | display: flex; 34 | flex-wrap: wrap; 35 | justify-content: center; 36 | gap: 2rem; 37 | } 38 | .featureCard { 39 | flex: 1 1 250px; 40 | background: white; 41 | padding: 1.5rem; 42 | border-radius: 8px; 43 | box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1); 44 | } 45 | .featureTitle { 46 | margin-bottom: 0.5rem; 47 | } 48 | .featureDesc { 49 | margin: 0; 50 | } 51 | 52 | .quickLinks { 53 | padding: 4rem 0; 54 | } 55 | .linkGrid { 56 | display: flex; 57 | flex-wrap: wrap; 58 | justify-content: center; 59 | list-style: none; 60 | padding: 0; 61 | gap: 1rem; 62 | } 63 | 64 | .cta { 65 | padding: 4rem 0; 66 | background: #eef2ff; 67 | } 68 | .ctaDesc { 69 | max-width: 600px; 70 | margin: 0 auto 2rem; 71 | } 72 | -------------------------------------------------------------------------------- /website/static/.nojekyll: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tegridydev/python-OSINT-notebook/87e4ec6c32b4a3ea58d2389c74e09be86717490e/website/static/.nojekyll -------------------------------------------------------------------------------- 
/website/static/img/Boost.svg: -------------------------------------------------------------------------------- 1 | 🚀 2 | -------------------------------------------------------------------------------- /website/static/img/Community.svg: -------------------------------------------------------------------------------- 1 | 🤝 2 | -------------------------------------------------------------------------------- /website/static/img/Time.svg: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /website/static/img/docusaurus-social-card.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tegridydev/python-OSINT-notebook/87e4ec6c32b4a3ea58d2389c74e09be86717490e/website/static/img/docusaurus-social-card.jpg -------------------------------------------------------------------------------- /website/static/img/docusaurus.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tegridydev/python-OSINT-notebook/87e4ec6c32b4a3ea58d2389c74e09be86717490e/website/static/img/docusaurus.png -------------------------------------------------------------------------------- /website/static/img/favicon.ico: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tegridydev/python-OSINT-notebook/87e4ec6c32b4a3ea58d2389c74e09be86717490e/website/static/img/favicon.ico -------------------------------------------------------------------------------- /website/static/img/logo.svg: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /website/static/img/undraw_docusaurus_mountain.svg: -------------------------------------------------------------------------------- 1 | 2 | Easy to Use 3 
| 4 | [SVG markup not recoverable] 172 | -------------------------------------------------------------------------------- /website/static/img/undraw_docusaurus_tree.svg: -------------------------------------------------------------------------------- 1 | 2 | Focus on What Matters 3 | 4 | [SVG markup not recoverable] 41 | --------------------------------------------------------------------------------