├── .env.example
├── .github
    ├── ISSUE_TEMPLATE
    │   └── bug_report.md
    └── pull_request_template.md
├── .gitignore
├── .replit
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── README_WEBAPP.md
├── client
    ├── package-lock.json
    ├── package.json
    ├── postcss.config.js
    ├── public
    │   ├── index.html
    │   └── manifest.json
    ├── src
    │   ├── App.js
    │   ├── components
    │   │   ├── AnalysisView.js
    │   │   ├── ResultsList.js
    │   │   └── SearchForm.js
    │   ├── index.css
    │   └── index.js
    └── tailwind.config.js
├── dotenv.py
├── fix_dependencies.bat
├── fix_linux_dotenv.sh
├── fix_windows_dotenv.bat
├── generated-icon.png
├── main.py
├── osint_assistant.py
├── osint_web_app.py
├── package-lock.json
├── package.json
├── pyproject.toml
├── replit.nix
├── requirements.txt
├── run.sh
├── run_windows.bat
└── uv.lock


/.env.example:
--------------------------------------------------------------------------------
1 | # OSINT Assistant API Keys
2 | # Add your API key here for AI capabilities
3 | PERPLEXITY_API_KEY=your_api_key_here


--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/bug_report.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: Bug report
3 | about: Create a report to help us improve
4 | title: "[BUG]"
5 | labels: bug
6 | assignees: ''
7 |
8 | ---
9 |
10 | **Describe the bug**
11 | A clear and concise description of what the bug is.
12 |
13 | **To Reproduce**
14 | Steps to reproduce the behavior:
15 | 1. Run command '...'
16 | 2. Use query '....'
17 | 3. See error
18 |
19 | **Expected behavior**
20 | A clear and concise description of what you expected to happen.
21 |
22 | **Screenshots**
23 | If applicable, add screenshots to help explain your problem.
24 |
25 | **Environment (please complete the following information):**
26 | - OS: [e.g. Ubuntu 20.04]
27 | - Python Version: [e.g. 3.8.5]
28 | - Tool Version [e.g. 
1.0.0] 29 | 30 | **Additional context** 31 | Add any other context about the problem here. -------------------------------------------------------------------------------- /.github/pull_request_template.md: -------------------------------------------------------------------------------- 1 | ## Description 2 | Please include a summary of the change and which issue is fixed. Include relevant motivation and context. 3 | 4 | Fixes # (issue) 5 | 6 | ## Type of change 7 | - [ ] Bug fix (non-breaking change which fixes an issue) 8 | - [ ] New feature (non-breaking change which adds functionality) 9 | - [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected) 10 | - [ ] Documentation update 11 | 12 | ## Checklist 13 | - [ ] My code follows the style guidelines of this project 14 | - [ ] I have performed a self-review of my own code 15 | - [ ] I have commented my code, particularly in hard-to-understand areas 16 | - [ ] I have made corresponding changes to the documentation 17 | - [ ] My changes generate no new warnings 18 | - [ ] Any sensitive API keys or credentials have been removed 19 | - [ ] The .env file has not been committed -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Environment variables 2 | .env 3 | 4 | # Python 5 | __pycache__/ 6 | *.py[cod] 7 | *$py.class 8 | *.so 9 | .Python 10 | build/ 11 | develop-eggs/ 12 | dist/ 13 | downloads/ 14 | eggs/ 15 | .eggs/ 16 | lib/ 17 | lib64/ 18 | parts/ 19 | sdist/ 20 | var/ 21 | wheels/ 22 | *.egg-info/ 23 | .installed.cfg 24 | *.egg 25 | 26 | # Jupyter Notebook 27 | .ipynb_checkpoints 28 | 29 | # Virtual Environment 30 | venv/ 31 | ENV/ 32 | env/ 33 | virtualenv/ 34 | 35 | # IDE files 36 | .idea/ 37 | .vscode/ 38 | *.swp 39 | *.swo 40 | 41 | # OSINT Assistant output files 42 | osint_data*.json 43 | *_results.json 44 | 45 | # Distribution / 
packaging 46 | .Python 47 | build/ 48 | develop-eggs/ 49 | dist/ 50 | downloads/ 51 | eggs/ 52 | .eggs/ 53 | lib/ 54 | lib64/ 55 | parts/ 56 | sdist/ 57 | var/ 58 | wheels/ 59 | share/python-wheels/ 60 | *.egg-info/ 61 | .installed.cfg 62 | *.egg 63 | MANIFEST 64 | 65 | # PyInstaller 66 | # Usually these files are written by a python script from a template 67 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 68 | *.manifest 69 | *.spec 70 | 71 | # Installer logs 72 | pip-log.txt 73 | pip-delete-this-directory.txt 74 | 75 | # Unit test / coverage reports 76 | htmlcov/ 77 | .tox/ 78 | .nox/ 79 | .coverage 80 | .coverage.* 81 | .cache 82 | nosetests.xml 83 | coverage.xml 84 | *.cover 85 | *.py,cover 86 | .hypothesis/ 87 | .pytest_cache/ 88 | cover/ 89 | 90 | # Translations 91 | *.mo 92 | *.pot 93 | 94 | # Django stuff: 95 | *.log 96 | local_settings.py 97 | db.sqlite3 98 | db.sqlite3-journal 99 | 100 | # Flask stuff: 101 | instance/ 102 | .webassets-cache 103 | 104 | # Scrapy stuff: 105 | .scrapy 106 | 107 | # Sphinx documentation 108 | docs/_build/ 109 | 110 | # PyBuilder 111 | .pybuilder/ 112 | target/ 113 | 114 | # IPython 115 | profile_default/ 116 | ipython_config.py 117 | 118 | # pyenv 119 | # For a library or package, you might want to ignore these files since the code is 120 | # intended to run in multiple environments; otherwise, check them in: 121 | # .python-version 122 | 123 | # pipenv 124 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 125 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 126 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 127 | # install all needed dependencies. 128 | #Pipfile.lock 129 | 130 | # poetry 131 | # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. 
132 | # This is especially recommended for binary packages to ensure reproducibility, and is more 133 | # commonly ignored for libraries. 134 | # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control 135 | #poetry.lock 136 | 137 | # pdm 138 | # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. 139 | #pdm.lock 140 | # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it 141 | # in version control. 142 | # https://pdm.fming.dev/#use-with-ide 143 | .pdm.toml 144 | 145 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm 146 | __pypackages__/ 147 | 148 | # Celery stuff 149 | celerybeat-schedule 150 | celerybeat.pid 151 | 152 | # SageMath parsed files 153 | *.sage.py 154 | 155 | # Environments 156 | .env 157 | .venv 158 | env/ 159 | venv/ 160 | ENV/ 161 | env.bak/ 162 | venv.bak/ 163 | 164 | # Spyder project settings 165 | .spyderproject 166 | .spyproject 167 | 168 | # Rope project settings 169 | .ropeproject 170 | 171 | # mkdocs documentation 172 | /site 173 | 174 | # mypy 175 | .mypy_cache/ 176 | .dmypy.json 177 | dmypy.json 178 | 179 | # Pyre type checker 180 | .pyre/ 181 | 182 | # pytype static type analyzer 183 | .pytype/ 184 | 185 | # Cython debug symbols 186 | cython_debug/ 187 | 188 | # PyCharm 189 | # JetBrains specific template is maintained in a separate JetBrains.gitignore that can 190 | # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore 191 | # and can be added to the global gitignore or merged into this file. For a more nuclear 192 | # option (not recommended) you can uncomment the following to ignore the entire idea folder. 
193 | #.idea/


--------------------------------------------------------------------------------
/.replit:
--------------------------------------------------------------------------------
1 | entrypoint = "main.py"
2 | modules = ["nodejs-20", "python-3.11"]
3 |
4 | [nix]
5 | channel = "stable-24_05"
6 |
7 | [unitTest]
8 | language = "python3"
9 |
10 | [gitHubImport]
11 | requiredFiles = [".replit", "replit.nix"]
12 |
13 | [deployment]
14 | run = ["python3", "main.py"]
15 | deploymentTarget = "cloudrun"
16 |
17 | [[ports]]
18 | localPort = 5000
19 | externalPort = 80
20 |


--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing to OSINT Assistant
2 |
3 | First off, thank you for considering contributing to OSINT Assistant! It's people like you that make this tool better for everyone.
4 |
5 | ## Code of Conduct
6 |
7 | By participating in this project, you are expected to uphold our Code of Conduct, which is to treat everyone with respect and courtesy.
8 |
9 | ## How Can I Contribute?
10 |
11 | ### Reporting Bugs
12 |
13 | This section guides you through submitting a bug report. Following these guidelines helps maintainers understand your report and reproduce the issue.
14 |
15 | Before creating bug reports, please check the issue list as you might find that the bug has already been reported. When you are creating a bug report, please include as many details as possible.
16 |
17 | **How Do I Submit A Bug Report?**
18 |
19 | Bugs are tracked as GitHub issues. 
Create an issue and provide the following information: 20 | 21 | * Use a clear and descriptive title 22 | * Describe the exact steps to reproduce the problem with as much detail as possible 23 | * Provide specific examples to demonstrate the steps 24 | * Describe the behavior you observed after following the steps 25 | * Explain which behavior you expected to see instead and why 26 | * Include screenshots if possible 27 | * Include details about your environment 28 | 29 | ### Suggesting Enhancements 30 | 31 | This section guides you through submitting an enhancement suggestion, including completely new features and minor improvements to existing functionality. 32 | 33 | **How Do I Submit An Enhancement Suggestion?** 34 | 35 | Enhancement suggestions are tracked as GitHub issues. Create an issue and provide the following information: 36 | 37 | * Use a clear and descriptive title 38 | * Provide a detailed description of the suggested enhancement 39 | * Provide specific examples to demonstrate how the suggestion would work 40 | * Describe the current behavior and explain which behavior you expected to see instead and why 41 | * Include screenshots if possible 42 | 43 | ### Pull Requests 44 | 45 | * Fill in the required template 46 | * Do not include issue numbers in the PR title 47 | * Follow the style guides 48 | * Update documentation as needed 49 | * End all files with a newline 50 | * Avoid platform-dependent code 51 | 52 | ## Style Guides 53 | 54 | ### Git Commit Messages 55 | 56 | * Use the present tense ("Add feature" not "Added feature") 57 | * Use the imperative mood ("Move cursor to..." 
not "Moves cursor to...") 58 | * Limit the first line to 72 characters or less 59 | * Reference issues and pull requests liberally after the first line 60 | 61 | ### Python Style Guide 62 | 63 | * Follow PEP 8 64 | * Use 4 spaces for indentation 65 | * Use docstrings for all public modules, functions, classes, and methods 66 | * Keep line length to a maximum of 100 characters 67 | * Use meaningful variable names 68 | 69 | ### Documentation Style Guide 70 | 71 | * Use Markdown for documentation 72 | * Keep documentation updated when you change code 73 | * Include examples when appropriate 74 | 75 | ## Additional Notes 76 | 77 | ### Security 78 | 79 | If you find a security vulnerability, do NOT open an issue. Email the maintainers directly. 80 | 81 | ### API Keys and Credentials 82 | 83 | * Never commit any API keys, tokens, or credentials 84 | * Always use environment variables or configuration files for sensitive information 85 | * Include sample configuration files with placeholders -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2023 OSINT Assistant Contributors 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 
14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # OSINT Assistant 2 | 3 | An AI-enhanced OSINT (Open Source Intelligence) tool for gathering, analyzing, and reporting on information from various web sources. This tool leverages the power of Perplexity AI to provide deeper insights and more accurate analysis. 4 | 5 | This project includes both a command-line interface and a full-featured web application for easier interaction. 6 | 7 | ## Features 8 | 9 | - 🔍 **Web Search:** Collect information from multiple sources based on specific queries 10 | - 🧠 **AI-Powered Analysis:** Use Perplexity AI for enhanced content analysis 11 | - 📊 **Entity Recognition:** Identify key people, organizations, and concepts from collected data 12 | - 🔗 **Connection Analysis:** Map relationships between identified entities 13 | - 📈 **Pattern Recognition:** Identify trends and patterns in the collected data 14 | - 📝 **Comprehensive Reporting:** Generate structured reports with visualizations and actionable insights 15 | - 📤 **Data Export:** Save results in JSON format with proper serialization using Pydantic 16 | 17 | ## Installation 18 | 19 | ### Prerequisites 20 | 21 | - Python 3.8+ 22 | - pip package manager 23 | - Node.js 14+ and npm (for web application) 24 | 25 | ### Setup 26 | 27 | 1. 
Clone the repository:
28 | ```bash
29 | git clone https://github.com/AXRoux/OSINT-Assistant.git
30 | cd OSINT-Assistant
31 | ```
32 |
33 | 2. Install dependencies:
34 | ```bash
35 | pip install -r requirements.txt
36 | ```
37 |
38 | 3. Set up your environment variables:
39 | ```bash
40 | cp .env.example .env
41 | ```
42 |
43 | 4. Edit the `.env` file and add your Perplexity API key:
44 | ```
45 | PERPLEXITY_API_KEY=your_api_key_here
46 | ```
47 |
48 | ## Usage
49 |
50 | ### Command Line Interface
51 |
52 | Run a search query:
53 | ```bash
54 | python osint_assistant.py --query "quantum computing advances"
55 | ```
56 |
57 | ### Web Application
58 |
59 | Start the web application:
60 |
61 | #### Linux/Mac
62 | ```bash
63 | bash run.sh
64 | ```
65 | or make it executable first:
66 | ```bash
67 | chmod +x run.sh
68 | ./run.sh
69 | ```
70 |
71 | #### Windows
72 | ```
73 | run_windows.bat
74 | ```
75 |
76 | The web application will automatically:
77 | 1. Install required dependencies
78 | 2. Set up the environment file if not present
79 | 3. Build the React frontend if needed
80 | 4. Start the Flask server
81 |
82 | On Windows, the script will automatically open your browser to http://localhost:5000.
83 | On Linux/Mac, you'll need to open your browser and navigate to http://localhost:5000 manually.
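
Since the browser step is manual on Linux/Mac, a small helper can approximate what the Windows script does. The following is only a sketch under stated assumptions: `wait_for_port` and `open_when_ready` are hypothetical names that do not exist in the repository, and the host and port are taken from the URL quoted above.

```python
import socket
import time
import webbrowser


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Nothing listening yet; retry shortly.
            time.sleep(0.5)
    return False


def open_when_ready(host: str = "localhost", port: int = 5000) -> bool:
    """Open the default browser once the server accepts connections.

    Hypothetical helper: the defaults mirror http://localhost:5000 from the README.
    """
    if wait_for_port(host, port):
        return webbrowser.open(f"http://{host}:{port}")
    return False
```

Calling `open_when_ready()` from a second terminal after starting `run.sh` would open the page as soon as the Flask server is reachable. Polling the port rather than sleeping for a fixed time avoids guessing how long the frontend build takes.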
84 |
85 | ### Advanced Options
86 |
87 | Save the results to a file:
88 | ```bash
89 | python osint_assistant.py --query "quantum computing advances" --save
90 | ```
91 |
92 | Specify the number of results to collect:
93 | ```bash
94 | python osint_assistant.py --query "quantum computing advances" --results 15
95 | ```
96 |
97 | Output results as JSON:
98 | ```bash
99 | python osint_assistant.py --query "quantum computing advances" --json
100 | ```
101 |
102 | Override the API key from the environment file:
103 | ```bash
104 | python osint_assistant.py --query "quantum computing advances" --api-key "your-api-key"
105 | ```
106 |
107 | ### Command Line Arguments
108 |
109 | | Argument | Short | Description |
110 | |----------|-------|-------------|
111 | | `--query` | `-q` | The search query to investigate |
112 | | `--results` | `-r` | Number of results to collect (default: 10) |
113 | | `--save` | `-s` | Save the collected data to a file |
114 | | `--api-key` | `-k` | Perplexity API key (overrides .env file) |
115 | | `--json` | `-j` | Output results as JSON |
116 |
117 | ## API Key Setup
118 |
119 | This tool uses the Perplexity AI API for enhanced intelligence gathering and analysis. To use this feature:
120 |
121 | 1. Sign up for an account at [Perplexity AI](https://www.perplexity.ai/)
122 | 2. Navigate to the API section to generate an API key
123 | 3. Add the key to your `.env` file or use the `--api-key` command line argument
124 |
125 | Note: The tool will still function without an API key, but will fall back to simulated data rather than real AI-powered analysis.
126 |
127 | ## Data Models
128 |
129 | The tool uses Pydantic models for data validation and serialization:
130 |
131 | - `SearchResult`: Represents a single search result with title, URL, snippet, etc. 
132 | - `ContentAnalysis`: Contains analysis of a specific URL including credibility, entities, and sentiment 133 | - `OSINTReport`: The complete report with all collected data and analyses 134 | 135 | ## Contributing 136 | 137 | Contributions are welcome! Please feel free to submit a Pull Request. 138 | 139 | 1. Fork the repository 140 | 2. Create your feature branch (`git checkout -b feature/amazing-feature`) 141 | 3. Commit your changes (`git commit -m 'Add some amazing feature'`) 142 | 4. Push to the branch (`git push origin feature/amazing-feature`) 143 | 5. Open a Pull Request 144 | 145 | ### Development Guidelines 146 | 147 | When contributing to this project, please: 148 | 149 | 1. Ensure code follows PEP 8 style guide for Python code 150 | 2. Write unit tests for new features 151 | 3. Update documentation as needed 152 | 4. Make sure not to commit any API keys or sensitive information 153 | 5. Verify that all tests pass before submitting a PR 154 | 155 | ## Troubleshooting 156 | 157 | ### Windows Users 158 | 159 | #### "No module named 'dotenv'" Error 160 | If you encounter this error when running the application: 161 | ``` 162 | ModuleNotFoundError: No module named 'dotenv' 163 | ``` 164 | 165 | There are several ways to fix this issue: 166 | 167 | 1. **Run the diagnostic script** (most comprehensive): 168 | ``` 169 | fix_windows_dotenv.bat 170 | ``` 171 | This script will: 172 | - Diagnose your Python environment 173 | - Try multiple installation methods 174 | - Provide detailed troubleshooting guidance 175 | 176 | 2. **Use the included standalone module**: 177 | No installation required - we've included a standalone `dotenv.py` file directly in the project folder. This will work even if pip fails. 178 | 179 | 3. **Manual installation methods**: 180 | ``` 181 | pip install python-dotenv 182 | ``` 183 | or 184 | ``` 185 | python -m pip install python-dotenv 186 | ``` 187 | or 188 | ``` 189 | pip install --user python-dotenv 190 | ``` 191 | 192 | 4. 
**Administrator privileges**: 193 | Run Command Prompt as Administrator, then try: 194 | ``` 195 | pip install python-dotenv 196 | ``` 197 | 198 | 5. **Multiple Python installations**: 199 | If you have multiple Python versions, specify the version: 200 | ``` 201 | py -3.9 -m pip install python-dotenv 202 | ``` 203 | 204 | #### Browser Not Opening Automatically 205 | When running the application directly with `python osint_web_app.py`, the browser may not open automatically. Use `run_windows.bat` instead to automatically open the browser with the application. 206 | 207 | ### Linux/Mac Users 208 | 209 | #### "No module named 'dotenv'" or "module 'dotenv' has no attribute" Errors 210 | If you encounter either of these errors when running the application: 211 | ``` 212 | ModuleNotFoundError: No module named 'dotenv' 213 | ``` 214 | or 215 | ``` 216 | AttributeError: module 'dotenv' has no attribute 'dotenv_values' 217 | ``` 218 | 219 | There are several ways to fix this issue: 220 | 221 | 1. **Run the diagnostic script** (recommended): 222 | ```bash 223 | ./fix_linux_dotenv.sh 224 | ``` 225 | or if it's not executable: 226 | ```bash 227 | bash fix_linux_dotenv.sh 228 | ``` 229 | This script will: 230 | - Diagnose your Python environment 231 | - Try multiple installation methods 232 | - Check if the installed module has all required functions 233 | - Provide Linux-specific troubleshooting guidance 234 | 235 | 2. **Use the included standalone module**: 236 | No installation required - we've included a standalone `dotenv.py` file directly in the project folder that implements all necessary functions including `dotenv_values`. 237 | 238 | 3. 
**Manual installation methods**: 239 | ```bash 240 | pip install python-dotenv 241 | ``` 242 | or 243 | ```bash 244 | pip3 install python-dotenv 245 | ``` 246 | or 247 | ```bash 248 | sudo pip install python-dotenv 249 | ``` 250 | 251 | For general dependency issues, you can also run: 252 | ```bash 253 | pip install -r requirements.txt 254 | ``` 255 | 256 | ## Security Notes 257 | 258 | ⚠️ **IMPORTANT**: This tool requires API keys to function properly. 259 | 260 | - Never commit your `.env` file with real API keys to GitHub 261 | - Always use the `.env.example` file as a template 262 | - Consider using GitHub Secrets for CI/CD workflows if adding automation 263 | 264 | ## License 265 | 266 | This project is licensed under the MIT License - see the LICENSE file for details. 267 | 268 | ## Disclaimer 269 | 270 | This tool is for educational and research purposes only. Always ensure you comply with relevant laws and regulations when conducting OSINT research. The authors are not responsible for any misuse of this tool. 271 | -------------------------------------------------------------------------------- /README_WEBAPP.md: -------------------------------------------------------------------------------- 1 | # OSINT Assistant Web Application 2 | 3 | This repository contains both the OSINT Assistant core functionality and a web application interface built with Flask and React. 
4 | 5 | ## Project Structure 6 | 7 | - `osint_assistant.py` - Core OSINT functionality 8 | - `osint_web_app.py` - Flask backend server 9 | - `client/` - React frontend application 10 | - `requirements.txt` - Python dependencies 11 | 12 | ## Features 13 | 14 | - 🔍 **Web Search:** Collect information from multiple sources based on specific queries 15 | - 🧠 **AI-Powered Analysis:** Use Perplexity AI for enhanced content analysis 16 | - 📊 **Entity Recognition:** Identify key people, organizations, and concepts 17 | - 🔗 **Connection Analysis:** Map relationships between identified entities 18 | - 📊 **Modern UI:** React and TailwindCSS frontend for easy interaction 19 | - 🌐 **API Access:** RESTful API endpoints for programmatic access 20 | 21 | ## Installation 22 | 23 | ### Prerequisites 24 | 25 | - Python 3.8+ 26 | - Node.js 14+ and npm 27 | - pip package manager 28 | 29 | ### Backend Setup 30 | 31 | 1. Install Python dependencies: 32 | ```bash 33 | pip install -r requirements.txt 34 | ``` 35 | 36 | 2. Set up your environment variables: 37 | ```bash 38 | cp .env.example .env 39 | ``` 40 | 41 | 3. Edit the `.env` file and add your Perplexity API key: 42 | ``` 43 | PERPLEXITY_API_KEY=your_api_key_here 44 | ``` 45 | 46 | ### Frontend Setup 47 | 48 | 1. Navigate to the client directory: 49 | ```bash 50 | cd client 51 | ``` 52 | 53 | 2. Install Node.js dependencies: 54 | ```bash 55 | npm install 56 | ``` 57 | 58 | 3. Build the frontend: 59 | ```bash 60 | npm run build 61 | ``` 62 | 63 | ## Running the Application 64 | 65 | ### Development Mode 66 | 67 | Run the backend server: 68 | ```bash 69 | python osint_web_app.py 70 | ``` 71 | 72 | In a separate terminal, run the React development server: 73 | ```bash 74 | cd client 75 | npm start 76 | ``` 77 | 78 | Access the application at http://localhost:3000 79 | 80 | ### Production Mode 81 | 82 | 1. Build the React frontend: 83 | ```bash 84 | cd client 85 | npm run build 86 | ``` 87 | 88 | 2. 
Run the Flask server, which will serve both the API and the built React frontend:
89 | ```bash
90 | python osint_web_app.py
91 | ```
92 |
93 | 3. Access the application at http://localhost:5000
94 |
95 | ## API Documentation
96 |
97 | The web application exposes the following API endpoint:
98 |
99 | ### POST /api/search
100 |
101 | Performs an OSINT search and analysis based on a query.
102 |
103 | **Request Body:**
104 | ```json
105 | {
106 |   "query": "your search query",
107 |   "num_results": 10,
108 |   "api_key": "optional_api_key"
109 | }
110 | ```
111 |
112 | **Response:**
113 | ```json
114 | {
115 |   "collected_data": [...],
116 |   "analysis_results": {...},
117 |   "query_info": {
118 |     "query": "your search query",
119 |     "results_requested": 10,
120 |     "results_found": 3
121 |   }
122 | }
123 | ```
124 |
125 | ## Contributing
126 |
127 | Contributions are welcome! Please feel free to submit a Pull Request.
128 |
129 | ## License
130 |
131 | This project is licensed under the MIT License - see the LICENSE file for details.
132 |
133 | ## Disclaimer
134 |
135 | This tool is for educational and research purposes only. Always ensure you comply with relevant laws and regulations when conducting OSINT research.
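
The `POST /api/search` endpoint documented in this README can be called from a short standard-library client. This is a minimal sketch, not code from the repository: `build_search_request` and `search` are hypothetical helper names, and the base URL assumes the Flask server is running locally on port 5000 as in Production Mode. Only the request and response fields shown in the README are used.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:5000"  # assumption: local Flask server from the README


def build_search_request(query: str, num_results: int = 10,
                         api_key: Optional[str] = None,
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /api/search request."""
    payload = {"query": query, "num_results": num_results}
    if api_key:
        # The README marks api_key as optional, so it is omitted when unset.
        payload["api_key"] = api_key
    return urllib.request.Request(
        url=f"{base_url}/api/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query: str, **kwargs) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_search_request(query, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


# Example usage (requires the server to be running):
#   report = search("quantum computing advances", num_results=5)
#   print(report["query_info"]["results_found"])
```

Separating request construction from sending keeps the payload easy to inspect without a running server, which is convenient when checking that the field names match the documented request body.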
-------------------------------------------------------------------------------- /client/package.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "osint-assistant-client", 3 | "version": "0.1.0", 4 | "private": true, 5 | "dependencies": { 6 | "react": "^18.2.0", 7 | "react-dom": "^18.2.0", 8 | "autoprefixer": "^10.4.17", 9 | "postcss": "^8.4.35", 10 | "tailwindcss": "^3.4.1" 11 | }, 12 | "scripts": { 13 | "start": "react-scripts start", 14 | "build": "react-scripts build", 15 | "test": "react-scripts test", 16 | "eject": "react-scripts eject" 17 | }, 18 | "eslintConfig": { 19 | "extends": [ 20 | "react-app", 21 | "react-app/jest" 22 | ] 23 | }, 24 | "browserslist": { 25 | "production": [ 26 | ">0.2%", 27 | "not dead", 28 | "not op_mini all" 29 | ], 30 | "development": [ 31 | "last 1 chrome version", 32 | "last 1 firefox version", 33 | "last 1 safari version" 34 | ] 35 | }, 36 | "devDependencies": { 37 | "react-scripts": "5.0.1" 38 | }, 39 | "proxy": "http://localhost:5000" 40 | } -------------------------------------------------------------------------------- /client/postcss.config.js: -------------------------------------------------------------------------------- 1 | module.exports = { 2 | plugins: { 3 | tailwindcss: {}, 4 | autoprefixer: {}, 5 | }, 6 | } -------------------------------------------------------------------------------- /client/public/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 12 | 13 | 14 | OSINT Assistant 15 | 16 | 17 | 18 |
19 | 20 | -------------------------------------------------------------------------------- /client/public/manifest.json: -------------------------------------------------------------------------------- 1 | { 2 | "short_name": "OSINT Assistant", 3 | "name": "OSINT Assistant - AI-Enhanced Intelligence Tool", 4 | "icons": [ 5 | { 6 | "src": "favicon.ico", 7 | "sizes": "64x64 32x32 24x24 16x16", 8 | "type": "image/x-icon" 9 | }, 10 | { 11 | "src": "logo192.png", 12 | "type": "image/png", 13 | "sizes": "192x192" 14 | }, 15 | { 16 | "src": "logo512.png", 17 | "type": "image/png", 18 | "sizes": "512x512" 19 | } 20 | ], 21 | "start_url": ".", 22 | "display": "standalone", 23 | "theme_color": "#4F46E5", 24 | "background_color": "#ffffff" 25 | } -------------------------------------------------------------------------------- /client/src/App.js: -------------------------------------------------------------------------------- 1 | // This file will not be used as we're creating a server-side HTML app instead -------------------------------------------------------------------------------- /client/src/components/AnalysisView.js: -------------------------------------------------------------------------------- 1 | import React from 'react'; 2 | 3 | const AnalysisView = ({ result, analysis }) => { 4 | if (!result || !analysis) { 5 | return ( 6 |
7 |

No analysis data available.

8 |
9 | ); 10 | } 11 | 12 | const getCredibilityColor = (score) => { 13 | if (score >= 0.7) return 'bg-green-100 text-green-800'; 14 | if (score >= 0.4) return 'bg-yellow-100 text-yellow-800'; 15 | return 'bg-red-100 text-red-800'; 16 | }; 17 | 18 | const getSentimentColor = (sentiment) => { 19 | switch (sentiment.toLowerCase()) { 20 | case 'positive': 21 | return 'text-green-500'; 22 | case 'negative': 23 | return 'text-red-500'; 24 | default: 25 | return 'text-gray-500'; 26 | } 27 | }; 28 | 29 | return ( 30 |
31 |
32 |

{result.title}

33 | 40 | {result.url} 41 | 42 |
43 | 44 |
45 |
46 |
47 |

Source Details

48 |
49 |
50 | Domain: 51 | {analysis.domain} 52 |
53 |
54 | Credibility Score: 55 | 56 | {(analysis.credibility_score * 100).toFixed(0)}% 57 | 58 |
59 |
60 | Sentiment: 61 | 62 | {analysis.sentiment.charAt(0).toUpperCase() + analysis.sentiment.slice(1)} 63 | 64 |
65 |
66 | Published: 67 | {analysis.timestamps.published} 68 |
69 |
70 | Last Updated: 71 | {analysis.timestamps.last_updated} 72 |
73 |
74 |
75 | 76 |
77 |

Key Entities

78 |
79 | {analysis.key_entities.map((entity, index) => ( 80 | 84 | {entity} 85 | 86 | ))} 87 |
88 |
89 |
90 | 91 |
92 |

Content Summary

93 |

{result.snippet}

94 |
95 | 96 | {analysis.connections && analysis.connections.length > 0 && ( 97 |
98 |

Entity Connections

99 |
100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 | {analysis.connections.map((connection, index) => ( 110 | 111 | 112 | 113 | 114 | 115 | ))} 116 | 117 |
FromRelationshipTo
{connection.from}{connection.relationship}{connection.to}
118 |
119 |
120 | )} 121 |
122 |
123 | ); 124 | }; 125 | 126 | export default AnalysisView; -------------------------------------------------------------------------------- /client/src/components/ResultsList.js: -------------------------------------------------------------------------------- 1 | import React from 'react'; 2 | 3 | const ResultsList = ({ results, onSelectResult, selectedUrl }) => { 4 | if (!results || results.length === 0) { 5 | return null; 6 | } 7 | 8 | const handleResultClick = (result) => { 9 | onSelectResult(result); 10 | }; 11 | 12 | const handleKeyDown = (e, result) => { 13 | if (e.key === 'Enter' || e.key === ' ') { 14 | onSelectResult(result); 15 | } 16 | }; 17 | 18 | return ( 19 |
20 |

Search Results

21 | 45 |
46 | ); 47 | }; 48 | 49 | export default ResultsList; -------------------------------------------------------------------------------- /client/src/components/SearchForm.js: -------------------------------------------------------------------------------- 1 | import React, { useState } from 'react'; 2 | 3 | const SearchForm = ({ onSearch }) => { 4 | const [query, setQuery] = useState(''); 5 | const [apiKey, setApiKey] = useState(''); 6 | const [numResults, setNumResults] = useState(10); 7 | const [showAdvanced, setShowAdvanced] = useState(false); 8 | 9 | const handleSubmit = (e) => { 10 | e.preventDefault(); 11 | onSearch(query, apiKey, numResults); 12 | }; 13 | 14 | const handleKeyDown = (e) => { 15 | if (e.key === 'Enter') { 16 | handleSubmit(e); 17 | } 18 | }; 19 | 20 | return ( 21 |
22 |
23 |
24 | 30 | setQuery(e.target.value)} 35 | onKeyDown={handleKeyDown} 36 | placeholder="Enter your OSINT search query" 37 | className="w-full px-4 py-2 border border-gray-300 rounded-md focus:outline-none focus:ring-2 focus:ring-indigo-500" 38 | aria-label="Search query" 39 | tabIndex="0" 40 | required 41 | /> 42 |
43 | 44 |
45 | 62 |
63 | 64 | {showAdvanced && ( 65 |
66 |
67 | 73 | setApiKey(e.target.value)} 78 | placeholder="Enter your API key" 79 | className="w-full px-4 py-2 border border-gray-300 rounded-md focus:outline-none focus:ring-2 focus:ring-indigo-500" 80 | aria-label="API key" 81 | tabIndex="0" 82 | /> 83 |

84 | Leave empty to use the API key from the .env file 85 |

86 |
87 | 88 |
89 | 95 | setNumResults(parseInt(e.target.value, 10))} 100 | min="1" 101 | max="50" 102 | className="w-full px-4 py-2 border border-gray-300 rounded-md focus:outline-none focus:ring-2 focus:ring-indigo-500" 103 | aria-label="Number of results" 104 | tabIndex="0" 105 | /> 106 |
107 |
108 | )} 109 | 110 |
111 | 119 |
120 |
121 |
  );
};

export default SearchForm;
--------------------------------------------------------------------------------
/client/src/index.css:
--------------------------------------------------------------------------------
@tailwind base;
@tailwind components;
@tailwind utilities;

body {
  margin: 0;
  font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', 'Roboto', 'Oxygen',
    'Ubuntu', 'Cantarell', 'Fira Sans', 'Droid Sans', 'Helvetica Neue',
    sans-serif;
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
  @apply bg-gray-50;
}

code {
  font-family: source-code-pro, Menlo, Monaco, Consolas, 'Courier New',
    monospace;
}
--------------------------------------------------------------------------------
/client/src/index.js:
--------------------------------------------------------------------------------
import React from 'react';
import ReactDOM from 'react-dom/client';
import './index.css';
import App from './App';

const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(
  <React.StrictMode>
    <App />
  </React.StrictMode>
);
--------------------------------------------------------------------------------
/client/tailwind.config.js:
--------------------------------------------------------------------------------
/** @type {import('tailwindcss').Config} */
module.exports = {
  content: [
    "./src/**/*.{js,jsx,ts,tsx}",
  ],
  theme: {
    extend: {},
  },
  plugins: [],
}
--------------------------------------------------------------------------------
/dotenv.py:
--------------------------------------------------------------------------------
"""
Python-dotenv

Loads environment variables from .env file

Usage:
    from dotenv import load_dotenv
    load_dotenv()  # Load from .env in current directory
    load_dotenv('/path/to/.env')  # Load from specified file

This is a simplified
standalone version of python-dotenv that can be placed
directly in your project folder if you're unable to install the package.
"""

import os
import io
import sys
from typing import Dict, Optional, Union, TextIO


def load_dotenv(dotenv_path: Union[str, os.PathLike, None] = None,
                override: bool = False,
                encoding: Optional[str] = None) -> bool:
    """
    Load the environment variables from a .env file.

    Args:
        dotenv_path: Path to the .env file. If not provided, looks for .env in current directory
        override: Whether to override existing environment variables
        encoding: Encoding of the .env file

    Returns:
        True if the file was loaded successfully, False otherwise
    """
    if dotenv_path is None:
        dotenv_path = os.path.join(os.getcwd(), '.env')

    if not os.path.exists(dotenv_path):
        return False

    try:
        with io.open(dotenv_path, encoding=encoding or 'utf-8') as f:
            env_vars = parse_dotenv(f)
    except Exception:
        return False

    for key, value in env_vars.items():
        if key in os.environ and not override:
            continue
        os.environ[key] = value

    return True


def dotenv_values(dotenv_path: Union[str, os.PathLike, None] = None,
                  encoding: Optional[str] = None) -> Dict[str, str]:
    """
    Parse a .env file and return a dictionary of key-value pairs.
    This is the function that Flask CLI uses.

    Args:
        dotenv_path: Path to the .env file.
            If not provided, looks for .env in current directory
        encoding: Encoding of the .env file

    Returns:
        Dictionary of key-value pairs from the .env file
    """
    if dotenv_path is None:
        dotenv_path = os.path.join(os.getcwd(), '.env')

    if not os.path.exists(dotenv_path):
        return {}

    try:
        with io.open(dotenv_path, encoding=encoding or 'utf-8') as f:
            return parse_dotenv(f)
    except Exception:
        return {}


def parse_dotenv(file_obj: TextIO) -> Dict[str, str]:
    """Parse the contents of a .env file and return a dictionary of key-value pairs."""
    result = {}

    for line in file_obj:
        line = line.strip()
        if not line or line.startswith('#'):
            continue

        # Handle basic KEY=VALUE format
        if '=' in line:
            key, value = line.split('=', 1)
            key = key.strip()
            value = value.strip()

            # Remove matching surrounding quotes if present; the length check
            # keeps a lone quote character from being treated as a quoted pair
            if len(value) >= 2 and (
                (value[0] == value[-1] == '"') or
                (value[0] == value[-1] == "'")
            ):
                value = value[1:-1]

            result[key] = value

    return result


def find_dotenv(filename: str = '.env',
                raise_error_if_not_found: bool = False,
                usecwd: bool = False) -> str:
    """
    Search for a .env file in parent directories.

    Args:
        filename: Name of the file to search for
        raise_error_if_not_found: Whether to raise an error if file is not found
        usecwd: Whether to use the current working directory

    Returns:
        Path to the found file

    Raises:
        IOError: If file is not found and raise_error_if_not_found is True
    """
    if usecwd:
        path = os.getcwd()
    else:
        frame = sys._getframe()
        current_file = __file__

        # Walk up the call stack to the first caller outside this module;
        # stop at the outermost frame so f_back is never dereferenced as None
        while frame.f_back is not None and frame.f_code.co_filename == current_file:
            frame = frame.f_back
        frame_filename = frame.f_code.co_filename
        path = os.path.dirname(os.path.abspath(frame_filename))

    for _ in range(10):  # Limit the number of parent directories to search
        check_path = os.path.join(path, filename)
        if os.path.exists(check_path):
            return check_path

        parent_path = os.path.dirname(path)
        if parent_path == path:
            break
        path = parent_path

    if raise_error_if_not_found:
        raise IOError(f"File {filename} not found")
    return ""


if __name__ == "__main__":
    print("This is a standalone version of python-dotenv.")
    print("To use it, import the module and call load_dotenv().")
    print("For example: from dotenv import load_dotenv; load_dotenv()")
--------------------------------------------------------------------------------
/fix_dependencies.bat:
--------------------------------------------------------------------------------
@echo off
echo Installing OSINT Assistant dependencies...

REM Check if Python is installed
python --version >nul 2>&1
if %ERRORLEVEL% NEQ 0 (
    echo Python is required but not installed. Please install Python 3.8+ and try again.
    pause
    exit /b 1
)

REM Install dependencies
echo Installing required Python packages...
pip install python-dotenv
pip install -r requirements.txt

echo Dependencies installed successfully!
echo.
echo If you were getting "No module named 'dotenv'" error, it should be fixed now.
echo You can now run the application using run_windows.bat or directly with:
echo python osint_web_app.py
echo.
pause
--------------------------------------------------------------------------------
/fix_linux_dotenv.sh:
--------------------------------------------------------------------------------
#!/bin/bash

echo "===== OSINT Assistant Python Environment Diagnostic ====="
echo

# Show Python environment information
echo "PYTHON ENVIRONMENT INFORMATION:"
echo "------------------------------"
which python3
python3 --version
echo
echo "Python executable path:"
python3 -c "import sys; print(sys.executable)"
echo
echo "Python site-packages location:"
python3 -c "import site; print(site.getsitepackages()[0])"
echo

# Try different pip installation methods
echo "ATTEMPTING MULTIPLE INSTALLATION METHODS:"
echo "------------------------------"
echo "Method 1: Standard pip install"
pip install python-dotenv
echo

echo "Method 2: Specific Python executable"
python3 -m pip install python-dotenv
echo

echo "Method 3: Using pip directly with --user flag"
pip install --user python-dotenv
echo

echo "Method 4: Force reinstallation"
pip install --force-reinstall python-dotenv
echo

# Verify installation
echo "VERIFYING INSTALLATION:"
echo "------------------------------"
echo "Checking if dotenv is now installed:"
python3 -c "import importlib.util; print('python-dotenv is INSTALLED' if importlib.util.find_spec('dotenv') else 'python-dotenv is NOT installed')"
echo

# Run the actual application with error handling
echo "TESTING OSINT ASSISTANT:"
echo "------------------------------"
echo "Trying to import dotenv module..."
# A try/except block cannot be written as a semicolon-separated one-liner,
# so the check is passed to Python on stdin instead of via -c
python3 - <<'EOF'
try:
    import dotenv
    print('Successfully imported dotenv module. Now checking for dotenv_values function...')
    print('dotenv_values function exists' if hasattr(dotenv, 'dotenv_values') else 'dotenv_values function MISSING')
except ImportError as e:
    print(f'Error: {e}. Module still not installed correctly.')
EOF
echo

echo "==== TROUBLESHOOTING RECOMMENDATIONS ===="
echo
echo "If the module is still not found:"
echo "1. Try using pip3 instead of pip:"
echo "   pip3 install python-dotenv"
echo
echo "2. Try installing with sudo (if you have admin privileges):"
echo "   sudo pip install python-dotenv"
echo
echo "3. If you're using a virtual environment, make sure it's activated"
echo
echo "4. Manually add the site-packages path to PYTHONPATH:"
echo "   export PYTHONPATH=\$PYTHONPATH:[site-packages-path-shown-above]"
echo
echo "5. We've included a standalone dotenv.py module in the project directory"
echo "   This should work even without installing python-dotenv"
echo

# Make the script executable
chmod +x run.sh
--------------------------------------------------------------------------------
/fix_windows_dotenv.bat:
--------------------------------------------------------------------------------
@echo off
echo ===== OSINT Assistant Python Environment Diagnostic =====
echo.

REM Show Python environment information
echo PYTHON ENVIRONMENT INFORMATION:
echo ------------------------------
where python
python --version
echo.
echo Python executable path:
python -c "import sys; print(sys.executable)"
echo.
echo Python site-packages location:
python -c "import site; print(site.getsitepackages()[0])"
echo.

REM Try different pip installation methods
echo ATTEMPTING MULTIPLE INSTALLATION METHODS:
echo ------------------------------
echo Method 1: Standard pip install
pip install python-dotenv
echo.

echo Method 2: Specific Python executable
python -m pip install python-dotenv
echo.

echo Method 3: Using pip directly with --user flag
pip install --user python-dotenv
echo.

echo Method 4: Force reinstallation
pip install --force-reinstall python-dotenv
echo.

REM Verify installation
echo VERIFYING INSTALLATION:
echo ------------------------------
echo Checking if dotenv is now installed:
python -c "import importlib.util; print('python-dotenv is INSTALLED' if importlib.util.find_spec('dotenv') else 'python-dotenv is NOT installed')"
echo.

REM Test the import with error handling
echo TESTING OSINT ASSISTANT:
echo ------------------------------
echo Trying to import dotenv module...
REM A try/except block is not valid as a semicolon-separated one-liner,
REM so rely on the interpreter's exit code instead
python -c "import dotenv; print('Successfully imported dotenv module.')" 2>nul || echo Error: dotenv module still not installed correctly.
echo.

echo ==== TROUBLESHOOTING RECOMMENDATIONS ====
echo.
echo If the module is still not found:
echo 1. You may have multiple Python installations. Try using 'py' instead of 'python'
echo    py -m pip install python-dotenv
echo.
echo 2. Try installing with administrator privileges (Run Command Prompt as Administrator)
echo.
echo 3. If you're using Anaconda/virtual environments, make sure to activate the correct environment
echo.
echo 4. Manually add the site-packages path to PYTHONPATH:
echo    set PYTHONPATH=%%PYTHONPATH%%;[site-packages-path-shown-above]
echo.
echo 5. As a last resort, use the standalone dotenv.py module included in the project directory
echo    This should work even without installing python-dotenv
echo.

pause
--------------------------------------------------------------------------------
/generated-icon.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AXRoux/OSINT-Assistant/44c470d8f23c047b92d4a95dbbea4e55956d73eb/generated-icon.png
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AXRoux/OSINT-Assistant/44c470d8f23c047b92d4a95dbbea4e55956d73eb/main.py
--------------------------------------------------------------------------------
/osint_assistant.py:
--------------------------------------------------------------------------------
import requests
import re
import json
import argparse
import os
from datetime import datetime, timedelta
from typing import List, Dict, Any, Optional, Union, Tuple
from urllib.parse import urlparse
from rich.console import Console
from rich.table import Table
from pydantic import BaseModel, Field
from openai import OpenAI
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Pydantic models for structured data
class Entity(BaseModel):
    name: str
    type: Optional[str] = None
    confidence: float = 1.0

class Connection(BaseModel):
    from_entity: str = Field(..., alias="from")
    to_entity: str = Field(..., alias="to")
    relationship: str

class Timestamp(BaseModel):
    published: Optional[str] = None
    last_updated: Optional[str] = None

class ContentAnalysis(BaseModel):
    domain: str
    credibility_score: float
    key_entities: List[str]
    sentiment: str
    timestamps: Dict[str, str]
    connections: List[Dict[str, str]]

class SearchResult(BaseModel):
    title: str
    url: str
    snippet: str
    source_type: str
    timestamp: str

class OSINTReport(BaseModel):
    collected_data: List[SearchResult]
    analysis_results: Dict[str, ContentAnalysis]
    timestamp: str
    query_info: Dict[str, Any]

class ApiClient:
    """Handles API communication with AI services"""

    def __init__(self, api_key: Optional[str] = None, base_url: str = "https://api.perplexity.ai"):
        self.api_key = api_key
        self.base_url = base_url
        self.console = Console()

    def call_api(self, messages: List[Dict[str, str]], model: str = "sonar-pro",
                 temperature: float = 0.2, max_tokens: int = 2000) -> Optional[str]:
        """Make API call to the AI service"""
        if not self.api_key:
            self.console.print("[yellow]API capabilities not available: No API key provided[/yellow]")
            return None

        try:
            data = {
                "model": model,
                "messages": messages,
                "temperature": temperature,
                "max_tokens": max_tokens
            }

            headers = {
                "accept": "application/json",
                "content-type": "application/json",
                "Authorization": f"Bearer {self.api_key}"
            }

            response = requests.post(
                f"{self.base_url}/chat/completions",
                headers=headers,
                json=data,
                timeout=60  # Avoid hanging indefinitely if the service stalls
            )
            response.raise_for_status()
            result = response.json()

            if "choices" in result and result["choices"] and "message" in result["choices"][0]:
                return result["choices"][0]["message"]["content"]
            else:
                self.console.print("[yellow]Unexpected API response format[/yellow]")
                return None

        except Exception as e:
            error_message = str(e)
            if "401" in error_message or "Authorization" in error_message:
                self.console.print("[yellow]Authorization error with API. Check your API key.[/yellow]")
                self.console.print(f"Error details: {error_message}")
                self.api_key = None  # Disable API for future calls
            else:
                self.console.print(f"[red]Error using AI service: {error_message}[/red]")
            return None

class JsonHelper:
    """Helper class for JSON operations"""

    @staticmethod
    def extract_json_from_text(text: str, console: Optional[Console] = None) -> Union[Dict, List, str]:
        """Extract JSON from text that may contain markdown or other non-JSON content."""
        if console is None:
            console = Console()

        # Check if the text is already valid JSON
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            pass

        # Try to extract JSON from markdown code blocks
        for pattern in ["```json", "```"]:
            if pattern in text:
                try:
                    parts = text.split(pattern)
                    if len(parts) > 1:
                        json_content = parts[1].split("```")[0].strip() if pattern == "```json" else parts[1].strip()
                        return json.loads(json_content)
                except (json.JSONDecodeError, IndexError):
                    pass

        # Try to match JSON array or object patterns
        for pattern, is_array in [
            (r'\[\s*{[\s\S]*}\s*\]', True),  # Array pattern
            (r'{[\s\S]*}', False)  # Object pattern
        ]:
            matches = re.findall(pattern, text)
            if matches:
                for match in matches:
                    try:
                        return json.loads(match)
                    except json.JSONDecodeError:
                        continue

        if console:
            console.print("[yellow]Could not extract valid JSON from text. Using text as-is.[/yellow]")
        return text

class OSINTAssistant:
    """
    A simple OSINT tool that collects and analyzes information from web sources
    based on specific search terms and parameters. Enhanced with AI capabilities.
    """

    def __init__(self, api_key: Optional[str] = None):
        self.console = Console()
        self.collected_data = []
        self.analysis_results = {}

        # Get API key from environment if not provided
        if not api_key:
            api_key = os.getenv("PERPLEXITY_API_KEY")

        # Initialize API client
        self.api_client = ApiClient(api_key) if api_key else None
        if not api_key:
            self.console.print("[yellow]No API key provided. AI capabilities will be disabled.[/yellow]")

    def ask_ai(self, query: str, system_prompt: Optional[str] = None) -> Optional[str]:
        """Use AI to enhance search and analysis capabilities with web search."""
        if not self.api_client:
            self.console.print("[yellow]AI capabilities not available: No API key provided[/yellow]")
            return None

        # Default system prompt if none provided
        if not system_prompt:
            system_prompt = """
            You are an OSINT (Open Source Intelligence) research assistant. Your task is to gather factual information about the query from the web.

            Rules:
            1. Search the web for reliable, up-to-date information on the topic.
            2. Provide only the final answer based on factual information found online.
            3. Be precise, accurate, and comprehensive.
            4. Cite sources where possible.
            5. Organize information in a structured way.
            6. Remain objective and avoid speculation.

            Steps:
            1. Search for reliable information across multiple sources.
            2. Verify facts by cross-referencing sources.
            3. Organize facts into a coherent response.
            4. Format your response as JSON when requested.
            """

        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query}
        ]

        return self.api_client.call_api(messages)

    def search_web(self, query: str, num_results: int = 10) -> List[SearchResult]:
        """Search the web for information related to a specific query, enhanced with AI."""
        self.console.print(f"[bold blue]Searching for:[/bold blue] {query}")

        # Check if we have AI capabilities
        if not self.api_client:
            self.console.print("[yellow]AI capabilities not available. No results available.[/yellow]")
            self.collected_data = []
            return []

        # Make multiple search queries to get the required number of results
        ai_results = []
        max_attempts = min(3, num_results)  # Limit to 3 attempts to avoid excessive API calls

        # First query - direct search
        direct_results = self._perform_search(query, num_results)
        if direct_results:
            ai_results.extend(direct_results)

        # If we still need more results, try different variations of the query
        if len(ai_results) < num_results and max_attempts > 1:
            variations = [
                f"latest news about {query}",
                f"recent developments regarding {query}",
                f"current information on {query}"
            ]

            # Use .get() so a result missing its "url" field does not raise KeyError
            existing_urls = {result.get("url") for result in ai_results if isinstance(result, dict)}

            for i, variation in enumerate(variations):
                if len(ai_results) >= num_results or i >= max_attempts - 1:
                    break

                self.console.print(f"[blue]Searching for additional results with variation: {variation}[/blue]")
                additional_results = self._perform_search(variation, num_results - len(ai_results))

                # Add only unique results (checking by URL)
                if additional_results:
                    for result in additional_results:
                        url = result.get("url") if isinstance(result, dict) else None
                        if url and url not in existing_urls:
                            ai_results.append(result)
                            existing_urls.add(url)

        # Limit results to the requested number
        ai_results = ai_results[:num_results]

        # Convert to Pydantic models and store in collected_data
        search_results = self._process_search_results(ai_results)

        self.console.print(f"[green]Found {len(search_results)} results[/green]")
        return search_results

    def _process_search_results(self, results: List[Dict]) -> List[SearchResult]:
        """Process and validate search results"""
        search_results = []
        self.collected_data = []  # Reset collected data

        for result in results:
            try:
                # Ensure each result has all required fields
                if "timestamp" not in result or not result["timestamp"]:
                    result["timestamp"] = datetime.now().strftime("%Y-%m-%d")

                search_result = SearchResult(**result)
                search_results.append(search_result)
                # Add to collected data list (as dict for compatibility with existing code)
                self.collected_data.append(result)
            except Exception as e:
                self.console.print(f"[red]Error parsing result: {str(e)}[/red]")
                self.console.print(f"[dim]Result data: {result}[/dim]")

        return search_results

    def _perform_search(self, query: str, num_results: int) -> List[Dict]:
        """Perform a single search with AI API."""
        system_prompt = f"""
        You are an OSINT JSON response generator. Your ONLY job is to return search results in JSON format.

        IMPORTANT: Your response MUST be ONLY a valid JSON array with {num_results} search results.
        DO NOT include any explanations, markdown formatting, or text before or after the JSON.

        JSON format should be an array of objects with these exact fields:
        - "title": The title of the result
        - "url": The complete URL of the source
        - "snippet": A short excerpt or summary (1-3 sentences)
        - "source_type": The type of source (e.g., "News", "Academic", "Government")
        - "timestamp": Current date in YYYY-MM-DD format

        Example of EXACTLY how your response should look:
        [
            {{
                "title": "Example Title",
                "url": "https://example.com/article",
                "snippet": "Brief excerpt from the source.",
                "source_type": "News",
                "timestamp": "2023-04-15"
            }},
            ... more results ...
        ]

        Search for timely, factual information from credible sources. Focus on recent content (last 6 months).
        """
        try:
            messages = [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": f"Return EXACTLY {num_results} search results about '{query}' in ONLY JSON format with no other text."}
            ]

            content = self.api_client.call_api(
                messages,
                temperature=0.1,  # Lower temperature for more reliable formatting
                max_tokens=4000  # Increase token limit to accommodate more results
            )

            if not content:
                return self._fallback_search(query, num_results)

            # Process the response
            json_content = JsonHelper.extract_json_from_text(content, self.console)

            if isinstance(json_content, list):
                self.console.print(f"[green]Successfully retrieved {len(json_content)} results from AI[/green]")
                return json_content
            elif isinstance(json_content, dict):
                # Try to get a list from a field in the dictionary if available
                for field in ["results", "data", "items", "content"]:
                    if field in json_content and isinstance(json_content[field], list):
                        return json_content[field]
                # Fall back to wrapping the dict in a list
                self.console.print("[green]Successfully retrieved 1 result after JSON extraction[/green]")
                return [json_content]
            else:
                self.console.print("[yellow]Could not parse API response as JSON. Trying alternate approach.[/yellow]")
                return self._fallback_search(query, num_results)

        except Exception as e:
            self.console.print(f"[yellow]Error in AI search: {str(e)}. Trying fallback approach.[/yellow]")
            return self._fallback_search(query, num_results)

    def _fallback_search(self, query: str, num_results: int) -> List[Dict]:
        """Fallback method for search when JSON formatting fails."""
        try:
            results = []

            for i in range(min(3, num_results)):
                prompt = f"""
                What is a single important, recent result about "{query}"?
                Provide only a JSON object with these fields:
                - title: The title
                - url: The URL
                - snippet: A brief excerpt (1-2 sentences)
                - source_type: The type of source (News, Academic, etc.)
                - timestamp: Today's date
                """
                ai_response = self.ask_ai(prompt, "You must respond with only a single JSON object, nothing else.")
                if ai_response:
                    json_content = JsonHelper.extract_json_from_text(ai_response, self.console)

                    if isinstance(json_content, dict):
                        # Make sure it has required fields
                        required_fields = ["title", "url", "snippet", "source_type"]
                        if all(field in json_content for field in required_fields):
                            # Add timestamp if missing
                            if "timestamp" not in json_content:
                                json_content["timestamp"] = datetime.now().strftime("%Y-%m-%d")
                            # Check if this URL is already in results
                            if not any(result.get("url") == json_content.get("url") for result in results):
                                results.append(json_content)

                if len(results) >= num_results:
                    break

            if results:
                self.console.print(f"[green]Retrieved {len(results)} results using fallback approach[/green]")
                return results

            # If still no results, fall back to simulated data
            self.console.print("[yellow]No results from AI. Using simulated data.[/yellow]")
            return self._generate_simulated_results(query, num_results)

        except Exception as e:
            self.console.print(f"[red]Fallback search failed: {str(e)}[/red]")
            return self._generate_simulated_results(query, num_results)

    def analyze_content(self, url: str) -> Optional[ContentAnalysis]:
        """Analyze the content of a specific URL to extract relevant information, enhanced with AI."""
        self.console.print(f"[bold blue]Analyzing content from:[/bold blue] {url}")

        # Use AI for content analysis if available
        ai_analysis = None
        if self.api_client:
            system_prompt = """
            You are an OSINT content analyzer tasked with analyzing a URL and extracting structured information.

            Rules:
            1. Search for and analyze the content at the provided URL.
            2. Provide a detailed, evidence-based analysis.
            3. Format your response as a JSON object with these exact fields:
               - domain: The domain name of the URL (e.g., "example.com")
               - credibility_score: A float from 0.0 to 1.0 indicating source credibility (higher = more credible)
               - key_entities: Array of strings for key people, organizations, or concepts mentioned
               - sentiment: One of "positive", "negative", or "neutral"
               - timestamps: Object with two string fields in YYYY-MM-DD format:
                 * "published": When the content was first published
                 * "last_updated": When the content was last updated
               - connections: Array of objects showing relationships between entities:
                 * "from": The source entity
                 * "to": The target entity
                 * "relationship": The type of relationship

            Return ONLY valid JSON with no additional text or explanation.
            """
            ai_response = self.ask_ai(f"Analyze this URL and extract structured information: {url}", system_prompt)

            if ai_response:
                # Extract JSON from the response
                json_content = JsonHelper.extract_json_from_text(ai_response, self.console)
                if isinstance(json_content, dict):
                    ai_analysis = json_content
                    self.console.print("[green]Successfully analyzed content with AI[/green]")
                else:
                    self.console.print("[yellow]Could not parse AI analysis as JSON. Using fallback analysis[/yellow]")
                    if isinstance(json_content, str) and len(json_content) > 50:
                        self.console.print(f"[dim]Response: {json_content[:100]}...[/dim]")
            else:
                self.console.print("[yellow]No response from AI. Using fallback analysis[/yellow]")

        # If no AI analysis, use simulated analysis
        if not ai_analysis:
            self.console.print("[yellow]Using simulated content analysis[/yellow]")
            domain = urlparse(url).netloc
            ai_analysis = self._generate_simulated_analysis(url, domain)

        # Validate and normalize analysis data
        normalized_analysis = self._normalize_analysis_data(ai_analysis)

        # Convert to Pydantic model and store
        try:
            content_analysis = ContentAnalysis(**normalized_analysis)
            self.analysis_results[url] = normalized_analysis  # Store as dict for compatibility
            self.console.print("[green]Analysis complete[/green]")
            return content_analysis
        except Exception as e:
            self.console.print(f"[red]Error in content analysis: {str(e)}[/red]")
            return None

    def _generate_simulated_analysis(self, url: str, domain: str) -> Dict[str, Any]:
        """Generate simulated content analysis when AI is not available"""
        return {
            "domain": domain,
            "credibility_score": self._calculate_credibility(domain),
            "key_entities": self._extract_entities(url),
            "sentiment": self._analyze_sentiment(url),
            "timestamps": self._extract_timestamps(url),
            "connections": self._find_connections(url)
        }

    def _normalize_analysis_data(self, analysis: Dict[str, Any]) -> Dict[str, Any]:
        """Normalize analysis data to ensure it conforms to the ContentAnalysis model"""
        fixed = analysis.copy()

        # Use urlparse to get domain if needed
        if "domain" not in fixed or not isinstance(fixed["domain"], str):
            if self.collected_data:
                fixed["domain"] = urlparse(next(iter(self.collected_data), {}).get("url", "unknown.com")).netloc
            else:
                fixed["domain"] = "unknown.com"

        # Normalize credibility_score to a float between 0 and 1
        if "credibility_score" not in fixed or not isinstance(fixed["credibility_score"], (int, float)):
            fixed["credibility_score"] = 0.5
        else:
            fixed["credibility_score"] = max(0.0, min(1.0, float(fixed["credibility_score"])))

        # Ensure key_entities is a list of strings
        if "key_entities" not in fixed or not isinstance(fixed["key_entities"], list):
            fixed["key_entities"] = ["Unknown Entity"]

        # Normalize sentiment to one of the valid values
        valid_sentiments = ["positive", "negative", "neutral"]
        if "sentiment" not in fixed or fixed["sentiment"] not in valid_sentiments:
            fixed["sentiment"] = "neutral"

        # Normalize timestamps to the required format
        fixed["timestamps"] = self._normalize_timestamps(fixed.get("timestamps", {}))

        # Normalize connections to the required format
        fixed["connections"] = self._normalize_connections(fixed.get("connections", []))

        return fixed

    def _normalize_timestamps(self, timestamps: Union[Dict[str, str], None]) -> Dict[str, str]:
        """Normalize timestamps to ensure they are in the correct format"""
        if not timestamps or not isinstance(timestamps, dict):
            return {"published": "N/A", "last_updated": "N/A"}

        normalized = {}
        for field in
["published", "last_updated"]: 501 | if field not in timestamps or not isinstance(timestamps[field], str) or timestamps[field] == "None": 502 | normalized[field] = "N/A" 503 | else: 504 | normalized[field] = timestamps[field] 505 | 506 | return normalized 507 | 508 | def _normalize_connections(self, connections: Union[List[Dict[str, str]], None]) -> List[Dict[str, str]]: 509 | """Normalize connections to ensure they are in the correct format""" 510 | if not connections or not isinstance(connections, list): 511 | return [{"from": "Entity A", "to": "Entity B", "relationship": "related to"}] 512 | 513 | normalized = [] 514 | for conn in connections: 515 | if isinstance(conn, dict): 516 | valid_conn = {} 517 | # Add required fields with defaults if missing 518 | valid_conn["from"] = conn.get("from", "Entity A") if isinstance(conn.get("from"), str) else "Entity A" 519 | valid_conn["to"] = conn.get("to", "Entity B") if isinstance(conn.get("to"), str) else "Entity B" 520 | valid_conn["relationship"] = conn.get("relationship", "related to") if isinstance(conn.get("relationship"), str) else "related to" 521 | normalized.append(valid_conn) 522 | 523 | return normalized if normalized else [{"from": "Entity A", "to": "Entity B", "relationship": "related to"}] 524 | 525 | def _calculate_credibility(self, domain: str) -> float: 526 | """Calculate a basic credibility score for a domain.""" 527 | # This would use heuristics or a database of known sources in a real tool 528 | credibility_db = { 529 | "example.com": 0.6, 530 | "dataresearch.org": 0.8, 531 | "gov.reports.org": 0.9 532 | } 533 | return credibility_db.get(domain, 0.5) # Default score for unknown domains 534 | 535 | def _extract_entities(self, url: str) -> List[str]: 536 | """Extract key entities mentioned in the content.""" 537 | # Simulated entity extraction 538 | if "analysis" in url: 539 | return ["Organization X", "Technology Y", "Process Z"] 540 | elif "stats" in url: 541 | return ["Dataset A", "Trend B", "Factor 
C"] 542 | else: 543 | return ["Policy D", "Department E", "Initiative F"] 544 | 545 | def _analyze_sentiment(self, url: str) -> str: 546 | """Analyze the sentiment of the content.""" 547 | # Simulated sentiment analysis 548 | sentiments = ["positive", "negative", "neutral"] 549 | return sentiments[hash(url) % 3] 550 | 551 | def _extract_timestamps(self, url: str) -> Dict[str, str]: 552 | """Extract relevant timestamps from the content.""" 553 | # Simulated timestamp extraction 554 | return { 555 | "published": (datetime.now() - timedelta(days=5)).strftime("%Y-%m-%d"), 556 | "last_updated": datetime.now().strftime("%Y-%m-%d") 557 | } 558 | 559 | def _find_connections(self, url: str) -> List[Dict[str, str]]: 560 | """Find connections between entities in the content.""" 561 | # Simulated connection finding 562 | connections = [ 563 | {"from": "Entity A", "to": "Entity B", "relationship": "funds"}, 564 | {"from": "Entity C", "to": "Entity D", "relationship": "collaborates with"} 565 | ] 566 | return connections 567 | 568 | def generate_report(self) -> None: 569 | """Generate a comprehensive report from the collected data.""" 570 | self.console.print("\n[bold yellow]===== OSINT ANALYSIS REPORT =====[/bold yellow]") 571 | 572 | # Summary table 573 | self._generate_summary_table() 574 | 575 | # Key findings 576 | self._generate_key_findings() 577 | 578 | # Generate insights using AI or fallback 579 | self._generate_insights() 580 | 581 | # Export options 582 | self._show_export_options() 583 | 584 | def _generate_summary_table(self) -> None: 585 | """Generate and display the summary table""" 586 | table = Table(title="Data Collection Summary") 587 | table.add_column("Source", style="cyan") 588 | table.add_column("Type", style="green") 589 | table.add_column("Credibility", style="yellow") 590 | 591 | for item in self.collected_data: 592 | domain = urlparse(item["url"]).netloc 593 | credibility = self._calculate_credibility(domain) 594 | table.add_row(domain, 
item["source_type"], f"{credibility:.2f}") 595 | 596 | self.console.print(table) 597 | 598 | def _generate_key_findings(self) -> Dict[str, int]: 599 | """Generate and display key findings""" 600 | self.console.print("\n[bold blue]Key Findings:[/bold blue]") 601 | all_entities = [] 602 | for item in self.collected_data: 603 | entities = self._extract_entities(item["url"]) 604 | all_entities.extend(entities) 605 | 606 | # Find most frequent entities 607 | entity_counts = {} 608 | for entity in all_entities: 609 | entity_counts[entity] = entity_counts.get(entity, 0) + 1 610 | 611 | for entity, count in sorted(entity_counts.items(), key=lambda x: x[1], reverse=True): 612 | self.console.print(f"- [bold]{entity}[/bold] mentioned {count} times") 613 | 614 | return entity_counts 615 | 616 | def _generate_insights(self) -> None: 617 | """Generate and display insights""" 618 | if not self.api_client: 619 | # Fallback recommendations when AI is not available 620 | self.console.print("\n[bold blue]Recommendations:[/bold blue]") 621 | self.console.print("- Further investigate connections between Entity A and Entity C") 622 | self.console.print("- Monitor developments in Technology Y over the next 30 days") 623 | self.console.print("- Cross-reference findings with Dataset A for validation") 624 | return 625 | 626 | # Use AI to generate insights 627 | entity_counts = self._generate_key_findings() 628 | data_summary = self._prepare_data_summary(entity_counts) 629 | 630 | system_prompt = """ 631 | You are an OSINT analyst tasked with generating insights from collected intelligence data. 632 | 633 | Rules: 634 | 1. Analyze the OSINT data summary provided. 635 | 2. Generate 3-5 key insights based on the data. 636 | 3. Each insight should be specific, actionable, and data-driven. 637 | 4. Include a brief conclusion and suggested next steps. 638 | 5. Be objective and analytical in your assessment. 639 | 640 | Format your response in a structured way with clear section headings. 
641 | """ 642 | 643 | ai_insights = self.ask_ai(f"Analyze this OSINT data summary and provide insights: {json.dumps(data_summary)}", system_prompt) 644 | 645 | if ai_insights: 646 | self.console.print("\n[bold blue]AI-Generated Insights:[/bold blue]") 647 | self.console.print(ai_insights) 648 | else: 649 | # Fallback recommendations if AI insights not available 650 | self.console.print("\n[bold blue]Recommendations:[/bold blue]") 651 | self.console.print("- Further investigate connections between entities") 652 | self.console.print("- Monitor developments over the next 30 days") 653 | self.console.print("- Cross-reference findings with additional sources for validation") 654 | 655 | def _prepare_data_summary(self, entity_counts: Dict[str, int]) -> Dict[str, Any]: 656 | """Prepare data summary for AI insights generation""" 657 | # Extract query info from first item's URL 658 | query_info = "N/A" 659 | if self.collected_data and len(self.collected_data) > 0: 660 | url = self.collected_data[0]["url"] 661 | query_parts = url.split('/')[-1].replace('-', ' ').replace('_', ' ') 662 | query_info = query_parts 663 | 664 | # Get top entities 665 | top_entities = [e for e, _ in sorted(entity_counts.items(), key=lambda x: x[1], reverse=True)[:5]] 666 | 667 | # Calculate timespan from timestamps 668 | timespan = self._calculate_timespan() 669 | 670 | # Prepare complete data summary 671 | return { 672 | "query": query_info, 673 | "sources": len(self.collected_data), 674 | "top_entities": top_entities, 675 | "timespan": timespan, 676 | "domains": [urlparse(item["url"]).netloc for item in self.collected_data], 677 | "sentiments": [self.analysis_results[item["url"]]["sentiment"] 678 | for item in self.collected_data 679 | if item["url"] in self.analysis_results] 680 | } 681 | 682 | def _calculate_timespan(self) -> str: 683 | """Calculate the timespan of data from timestamps""" 684 | timespan = "Recent" 685 | dates = [] 686 | 687 | for url, analysis in self.analysis_results.items(): 
688 | if "timestamps" in analysis and "published" in analysis["timestamps"]: 689 | pub_date = analysis["timestamps"]["published"] 690 | if pub_date and pub_date != "N/A": 691 | dates.append(pub_date) 692 | 693 | if dates: 694 | try: 695 | min_date = min(dates) 696 | max_date = max(dates) 697 | timespan = f"{min_date} to {max_date}" 698 | except Exception: 699 | pass 700 | 701 | return timespan 702 | 703 | def _show_export_options(self) -> None: 704 | """Display export options""" 705 | self.console.print("\n[bold blue]Export Options:[/bold blue]") 706 | self.console.print("1. Export as JSON (Pydantic model_dump_json)") 707 | self.console.print("2. Export as CSV") 708 | self.console.print("3. Generate visualization") 709 | 710 | def save_data(self, filename: str = "osint_data.json") -> None: 711 | """Save the collected data to a file using Pydantic for JSON serialization.""" 712 | # Create a Pydantic model for the full report 713 | report = OSINTReport( 714 | collected_data=self.collected_data, 715 | analysis_results=self.analysis_results, 716 | timestamp=datetime.now().strftime("%Y-%m-%d %H:%M:%S"), 717 | query_info={ 718 | "queries_executed": len(self.collected_data) 719 | } 720 | ) 721 | 722 | # Use model_dump_json to properly serialize the data 723 | with open(filename, 'w') as f: 724 | f.write(report.model_dump_json(indent=4)) 725 | 726 | self.console.print(f"[green]Data saved to {filename} using Pydantic serialization[/green]") 727 | 728 | def _generate_simulated_results(self, query: str, count: int) -> List[Dict[str, str]]: 729 | """Generate simulated search results.""" 730 | simulated_results = [] 731 | domains = ["example.com", "newsportal.org", "research.edu", "dataresearch.org", 732 | "gov.reports.org", "analysis.net", "insights.io", "factfinder.com", 733 | "academicjournal.edu", "trustednews.com"] 734 | source_types = ["Article", "Research", "News", "Blog", "Government", "Academic", "Analysis"] 735 | title_prefixes = ["Analysis of", "Report on", "Latest 
on", "Deep Dive into", 736 | "Investigation of", "Overview of", "Update on", "Trends in"] 737 | 738 | for i in range(count): 739 | domain = domains[i % len(domains)] 740 | source_type = source_types[i % len(source_types)] 741 | title_prefix = title_prefixes[i % len(title_prefixes)] 742 | 743 | result = { 744 | "title": f"{title_prefix} {query}: Part {i+1}", 745 | "url": f"https://{domain}/analysis/{query.lower().replace(' ', '-')}-part-{i+1}", 746 | "snippet": f"This {source_type.lower()} provides insight {i+1} about {query}, revealing important details and context...", 747 | "source_type": source_type, 748 | "timestamp": datetime.now().strftime("%Y-%m-%d") 749 | } 750 | simulated_results.append(result) 751 | 752 | return simulated_results 753 | 754 | def main(): 755 | parser = argparse.ArgumentParser(description="OSINT Assistant - A tool for collecting and analyzing open-source intelligence with AI capabilities") 756 | parser.add_argument("--query", "-q", type=str, help="The search query to investigate") 757 | parser.add_argument("--results", "-r", type=int, default=10, help="Number of results to collect") 758 | parser.add_argument("--save", "-s", action="store_true", help="Save the collected data") 759 | parser.add_argument("--api-key", "-k", type=str, help="Perplexity API key for AI-enhanced capabilities (overrides .env file)") 760 | parser.add_argument("--json", "-j", action="store_true", help="Output results as JSON") 761 | 762 | args = parser.parse_args() 763 | 764 | # Initialize with API key if provided 765 | assistant = OSINTAssistant(api_key=args.api_key) 766 | 767 | if args.query: 768 | search_results = assistant.search_web(args.query, args.results) 769 | 770 | # Analyze each result 771 | for item in assistant.collected_data: 772 | analysis = assistant.analyze_content(item["url"]) 773 | 774 | # Generate report or output JSON 775 | if args.json: 776 | # Create report object 777 | report = OSINTReport( 778 | collected_data=assistant.collected_data, 779 | 
analysis_results=assistant.analysis_results, 780 | timestamp=datetime.now().strftime("%Y-%m-%d %H:%M:%S"), 781 | query_info={ 782 | "query": args.query, 783 | "results_requested": args.results, 784 | "results_found": len(assistant.collected_data) 785 | } 786 | ) 787 | 788 | # Output as JSON 789 | print(report.model_dump_json(indent=2)) 790 | else: 791 | # Generate human-readable report 792 | assistant.generate_report() 793 | 794 | # Save data if requested 795 | if args.save: 796 | assistant.save_data() 797 | else: 798 | print("Please provide a search query using the --query or -q option") 799 | 800 | if __name__ == "__main__": 801 | main() -------------------------------------------------------------------------------- /osint_web_app.py: -------------------------------------------------------------------------------- 1 | from flask import Flask, request, jsonify, render_template_string, redirect, url_for 2 | from flask_cors import CORS 3 | import os 4 | import json 5 | import time 6 | from datetime import datetime 7 | from osint_assistant import OSINTAssistant 8 | 9 | app = Flask(__name__) 10 | CORS(app) # Enable CORS for all routes 11 | 12 | # HTML template for the application 13 | APP_HTML = """ 14 | 15 | 16 | 17 | 18 | 19 | OSINT Assistant 20 | 21 | 22 | 23 | 552 | 553 | 554 |
OSINT Assistant
AI-Enhanced Open Source Intelligence Tool
Secure Intelligence

Begin Your Intelligence Search

{% if is_loading %}
Gathering intelligence data...
{% endif %}

{% if error %}
Error Occurred
{{ error }}
{% endif %}

{% if results %}
Search Results
{% for result in results.collected_data %}
{{ result.title }}
{{ result.url }}
{{ result.snippet }}
{{ result.source_type }}
{{ result.timestamp }}
{% endfor %}

{% if selected_result and selected_analysis %}
{{ selected_result.title }}
{{ selected_result.url }}

Source Details
Domain: {{ selected_analysis.domain }}
Credibility: {{ (selected_analysis.credibility_score * 100) | int }}%
{% if selected_analysis.credibility_score >= 0.7 %}
{% elif selected_analysis.credibility_score >= 0.4 %}
{% else %}
{% endif %}
Sentiment: {{ selected_analysis.sentiment | capitalize }}
{% if selected_analysis.sentiment == 'positive' %}
{% elif selected_analysis.sentiment == 'negative' %}
{% else %}
{% endif %}
Published: {{ selected_analysis.timestamps.published }}
Last Updated: {{ selected_analysis.timestamps.last_updated }}

Key Entities
{% for entity in selected_analysis.key_entities %}
{{ entity }}
{% endfor %}

Content Summary
{{ selected_result.snippet }}

{% if selected_analysis.connections and selected_analysis.connections|length > 0 %}
Entity Connections
From | Relationship | To
{% for connection in selected_analysis.connections %}
{{ connection.from }} | {{ connection.relationship }} | {{ connection.to }}
{% endfor %}
{% endif %}
{% else %}
{% if results %}
Select a result to view analysis
{% else %}
Start your intelligence search
{% endif %}
{% if results %}
Click on any search result from the list to view detailed analysis and insights.
{% else %}
Enter your search query above to begin gathering open-source intelligence data.
{% endif %}
{% endif %}
{% endif %}
"""

# Store session data (normally you'd use a proper database)
session_results = {}

@app.route('/', methods=['GET'])
def index():
    """Render the main application page"""
    return render_template_string(APP_HTML,
        results=None,
        selected_result=None,
        selected_analysis=None,
        query=None,
        is_loading=False,
        error=None,
        current_year=datetime.now().year)

@app.route('/search', methods=['POST'])
def search():
    """Handle search form submission"""
    # Get form data
    query = request.form.get('query')
    api_key = request.form.get('api_key', '')
    num_results = int(request.form.get('num_results', 10))

    if not query:
        return render_template_string(APP_HTML,
            results=None,
            selected_result=None,
            selected_analysis=None,
            query=None,
            is_loading=False,
            error="No query provided",
            current_year=datetime.now().year)

    try:
        # Initialize the OSINT Assistant
        assistant = OSINTAssistant(api_key=api_key if api_key else None)

        # Perform the search
        search_results = assistant.search_web(query, num_results)

        # Analyze results
        for item in assistant.collected_data:
            analysis = assistant.analyze_content(item["url"])

        # Create the report
        report = {
            "collected_data": assistant.collected_data,
            "analysis_results": assistant.analysis_results,
            "query_info": {
                "query": query,
                "results_requested": num_results,
                "results_found": len(assistant.collected_data)
            }
        }

        # Store in session
        session_id = str(int(time.time()))
        session_results[session_id] = report

        # Redirect to the results page
        return redirect(f'/results/{session_id}')

    except Exception as e:
        return render_template_string(APP_HTML,
            results=None,
            selected_result=None,
            selected_analysis=None,
            query=query,
            is_loading=False,
            error=str(e),
            current_year=datetime.now().year)

# The <session_id> URL variable supplies the view function's argument
@app.route('/results/<session_id>', methods=['GET'])
def results(session_id):
    """Show search results"""
    if session_id not in session_results:
        return redirect('/')

    results = session_results[session_id]

    return render_template_string(APP_HTML,
        results=results,
        selected_result=None,
        selected_analysis=None,
        query=results['query_info']['query'],
        is_loading=False,
        error=None,
        selected_url=None,
        session_id=session_id,
        current_year=datetime.now().year)

# The int converter ensures result_index can be compared and used as a list index
@app.route('/results/<session_id>/<int:result_index>', methods=['GET'])
def search_results(session_id, result_index):
    """Show a specific search result with analysis"""
    if session_id not in session_results:
        return redirect('/')

    results = session_results[session_id]

    if result_index >= len(results['collected_data']):
        return redirect(f'/results/{session_id}')

    selected_result = results['collected_data'][result_index]
    selected_analysis = results['analysis_results'].get(selected_result['url'])

    return render_template_string(APP_HTML,
        results=results,
        selected_result=selected_result,
        selected_analysis=selected_analysis,
        query=results['query_info']['query'],
        is_loading=False,
        error=None,
        selected_url=selected_result['url'],
        session_id=session_id,
        current_year=datetime.now().year)

@app.route('/api/search', methods=['POST'])
def api_search():
    """API endpoint to perform OSINT search and analysis"""
    data = request.json
    query = data.get('query')
    num_results = data.get('num_results', 10)
    api_key = data.get('api_key', None)

    if not query:
        return jsonify({"error": "No query provided"}), 400

    try:
        # Initialize the OSINT Assistant
        assistant = OSINTAssistant(api_key=api_key)

        # Perform the search
        search_results = assistant.search_web(query, num_results)

        # Analyze results
        for item in assistant.collected_data:
            analysis = assistant.analyze_content(item["url"])

        # Create the report
        report = {
            "collected_data": assistant.collected_data,
            "analysis_results": assistant.analysis_results,
            "query_info": {
                "query": query,
                "results_requested": num_results,
                "results_found": len(assistant.collected_data)
            }
        }

        return jsonify(report)

    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=5000)

--------------------------------------------------------------------------------
/package-lock.json:
--------------------------------------------------------------------------------
{
  "name": "workspace",
  "version": "1.0.0",
  "lockfileVersion": 3,
  "requires": true,
  "packages": {
    "": {
      "name": "workspace",
      "version": "1.0.0",
      "license": "ISC"
    }
  }
}

--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------
{
  "name": "workspace",
  "version": "1.0.0",
  "description": "An AI-enhanced OSINT (Open Source Intelligence) tool for gathering, analyzing, and reporting on information from various web sources. This tool leverages the power of Perplexity AI to provide deeper insights and more accurate analysis.",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC"
}

--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
[project]
name = "python-template"
version = "0.1.0"
description = ""
authors = ["Your Name "]
requires-python = ">=3.11"
dependencies = []

--------------------------------------------------------------------------------
/replit.nix:
--------------------------------------------------------------------------------
{pkgs}: {
  deps = [
    pkgs.yakut
  ];
}

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
requests
beautifulsoup4
rich
pydantic
openai
python-dotenv
flask
flask-cors
--------------------------------------------------------------------------------
/run.sh:
--------------------------------------------------------------------------------
#!/bin/bash

# Check if Python is installed
if ! command -v python &> /dev/null; then
    echo "Python is required but not installed. Please install Python 3.8+ and try again."
    exit 1
fi

# Check if Flask is installed
python -c "import flask" &> /dev/null
if [ $? -ne 0 ]; then
    echo "Flask is not installed. Installing dependencies..."
    pip install -r requirements.txt
fi

# Check if .env file exists
if [ ! -f .env ]; then
    echo "Creating .env file from example..."
    cp .env.example .env
    echo "Please edit the .env file and add your API key."
fi

# Check if React app is built
if [ ! -d "client/build" ]; then
    echo "React app is not built. Building now..."

    # Check if Node.js is installed
    if ! command -v npm &> /dev/null; then
        echo "Node.js and npm are required to build the frontend. Please install them and try again."
        echo "Running with backend only..."
    else
        cd client
        npm install
        npm run build
        cd ..
    fi
fi

# Start the Flask server
echo "Starting OSINT Assistant Web App..."
python osint_web_app.py
--------------------------------------------------------------------------------
/run_windows.bat:
--------------------------------------------------------------------------------
@echo off
echo Starting OSINT Assistant setup...

REM Check if Python is installed
python --version >nul 2>&1
if %ERRORLEVEL% NEQ 0 (
    echo Python is required but not installed. Please install Python 3.8+ and try again.
    pause
    exit /b 1
)

REM Install key dependencies first to avoid common errors
echo Installing critical dependencies...
pip install python-dotenv

REM Install all other dependencies
echo Installing remaining dependencies...
pip install -r requirements.txt

REM Check if .env file exists
if not exist .env (
    echo Creating .env file from example...
    copy .env.example .env
    echo Please edit the .env file and add your API key.
)

REM Check if React app is built
if not exist "client\build" (
    echo React app is not built. Building now...

    REM Check if Node.js is installed
    where npm >nul 2>&1
    if %ERRORLEVEL% NEQ 0 (
        echo Node.js and npm are required to build the frontend. Running with backend only...
    ) else (
        cd client
        call npm install
        call npm run build
        cd ..
    )
)

REM Start the Flask server and open browser
echo Starting OSINT Assistant Web App...
start "" http://localhost:5000
python osint_web_app.py

pause
--------------------------------------------------------------------------------
/uv.lock:
--------------------------------------------------------------------------------
version = 1
requires-python = ">=3.11"

[[package]]
name = "python-template"
version = "0.1.0"
source = { virtual = "." }

--------------------------------------------------------------------------------