├── .github ├── ISSUE_TEMPLATE │ ├── bug_report.md │ └── feature_request.md └── workflows │ └── main.yml ├── .gitignore ├── CODE_OF_CONDUCT.md ├── LICENSE ├── README.md ├── codeargos ├── Constants.py ├── __init__.py ├── __main__.py ├── __version__.py ├── app.py ├── codediffer.py ├── datastore.py ├── diff.py ├── displaydiff.py ├── enums.py ├── in-scope.txt ├── scrapedpage.py ├── scraper.py ├── webcrawler.py └── webhook.py ├── images ├── CodeArgos-Discord-Notification.png ├── CodeArgos-LogicApp.png ├── CodeArgos-Slack-Notification.png └── CodeArgos-Teams-Notification.png ├── requirements.txt ├── setup.py ├── sonar-project.properties └── test_site ├── about.html ├── evenmorecode.js ├── external.html ├── index.html ├── launch_test_site.sh ├── site.css └── somecode.js /.github/ISSUE_TEMPLATE/bug_report.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Bug report 3 | about: Create a report to help us improve 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Describe the bug** 11 | A clear and concise description of what the bug is. 12 | 13 | **To Reproduce** 14 | Steps to reproduce the behavior: 15 | 1. Go to '...' 16 | 2. Click on '....' 17 | 3. Scroll down to '....' 18 | 4. See error 19 | 20 | **Expected behavior** 21 | A clear and concise description of what you expected to happen. 22 | 23 | **Screenshots** 24 | If applicable, add screenshots to help explain your problem. 25 | 26 | **Desktop (please complete the following information):** 27 | - OS: [e.g. iOS] 28 | - Browser [e.g. chrome, safari] 29 | - Version [e.g. 22] 30 | 31 | **Smartphone (please complete the following information):** 32 | - Device: [e.g. iPhone6] 33 | - OS: [e.g. iOS8.1] 34 | - Browser [e.g. stock browser, safari] 35 | - Version [e.g. 22] 36 | 37 | **Additional context** 38 | Add any other context about the problem here. 
39 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/feature_request.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Feature request 3 | about: Suggest an idea for this project 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Is your feature request related to a problem? Please describe.** 11 | A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] 12 | 13 | **Describe the solution you'd like** 14 | A clear and concise description of what you want to happen. 15 | 16 | **Describe alternatives you've considered** 17 | A clear and concise description of any alternative solutions or features you've considered. 18 | 19 | **Additional context** 20 | Add any other context or screenshots about the feature request here. 21 | -------------------------------------------------------------------------------- /.github/workflows/main.yml: -------------------------------------------------------------------------------- 1 | # This is a basic workflow to help you get started with Actions 2 | 3 | name: CI 4 | 5 | # Controls when the action will run. 
Triggers the workflow on push or pull request 6 | # events but only for the master branch 7 | on: 8 | push: 9 | branches: [ master ] 10 | pull_request: 11 | branches: [ master ] 12 | 13 | # A workflow run is made up of one or more jobs that can run sequentially or in parallel 14 | jobs: 15 | # This workflow contains a single job called "build" 16 | build: 17 | # The type of runner that the job will run on 18 | runs-on: ubuntu-latest 19 | 20 | # Steps represent a sequence of tasks that will be executed as part of the job 21 | steps: 22 | # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it 23 | - uses: actions/checkout@v3 24 | with: 25 | fetch-depth: 0 26 | 27 | # Run SonarCloud scan on codebase 28 | - name: SonarCloud Scan 29 | uses: SonarSource/sonarcloud-github-action@master 30 | env: 31 | GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} 32 | SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }} 33 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # VS code 10 | .vscode 11 | 12 | # Ignore test databases 13 | *.db 14 | 15 | # Distribution / packaging 16 | .Python 17 | build/ 18 | develop-eggs/ 19 | dist/ 20 | downloads/ 21 | eggs/ 22 | .eggs/ 23 | lib/ 24 | lib64/ 25 | parts/ 26 | sdist/ 27 | var/ 28 | wheels/ 29 | share/python-wheels/ 30 | *.egg-info/ 31 | .installed.cfg 32 | *.egg 33 | MANIFEST 34 | 35 | # PyInstaller 36 | # Usually these files are written by a python script from a template 37 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
38 | *.manifest 39 | *.spec 40 | 41 | # Installer logs 42 | pip-log.txt 43 | pip-delete-this-directory.txt 44 | 45 | # Unit test / coverage reports 46 | htmlcov/ 47 | .tox/ 48 | .nox/ 49 | .coverage 50 | .coverage.* 51 | .cache 52 | nosetests.xml 53 | coverage.xml 54 | *.cover 55 | *.py,cover 56 | .hypothesis/ 57 | .pytest_cache/ 58 | cover/ 59 | 60 | # Translations 61 | *.mo 62 | *.pot 63 | 64 | # Django stuff: 65 | *.log 66 | local_settings.py 67 | db.sqlite3 68 | db.sqlite3-journal 69 | 70 | # Flask stuff: 71 | instance/ 72 | .webassets-cache 73 | 74 | # Scrapy stuff: 75 | .scrapy 76 | 77 | # Sphinx documentation 78 | docs/_build/ 79 | 80 | # PyBuilder 81 | .pybuilder/ 82 | target/ 83 | 84 | # Jupyter Notebook 85 | .ipynb_checkpoints 86 | 87 | # IPython 88 | profile_default/ 89 | ipython_config.py 90 | 91 | # pyenv 92 | # For a library or package, you might want to ignore these files since the code is 93 | # intended to run in multiple environments; otherwise, check them in: 94 | # .python-version 95 | 96 | # pipenv 97 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 98 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 99 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 100 | # install all needed dependencies. 101 | #Pipfile.lock 102 | 103 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow 104 | __pypackages__/ 105 | 106 | # Celery stuff 107 | celerybeat-schedule 108 | celerybeat.pid 109 | 110 | # SageMath parsed files 111 | *.sage.py 112 | 113 | # Environments 114 | .env 115 | .venv 116 | env/ 117 | venv/ 118 | ENV/ 119 | env.bak/ 120 | venv.bak/ 121 | 122 | # Spyder project settings 123 | .spyderproject 124 | .spyproject 125 | 126 | # Rope project settings 127 | .ropeproject 128 | 129 | # mkdocs documentation 130 | /site 131 | 132 | # mypy 133 | .mypy_cache/ 134 | .dmypy.json 135 | dmypy.json 136 | 137 | # Pyre type checker 138 | .pyre/ 139 | 140 | # pytype static type analyzer 141 | .pytype/ 142 | 143 | # Cython debug symbols 144 | cython_debug/ 145 | 146 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Covenant Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | In the interest of fostering an open and welcoming environment, we as 6 | contributors and maintainers pledge to making participation in our project and 7 | our community a harassment-free experience for everyone, regardless of age, body 8 | size, disability, ethnicity, sex characteristics, gender identity and expression, 9 | level of experience, education, socio-economic status, nationality, personal 10 | appearance, race, religion, or sexual identity and orientation. 
11 | 12 | ## Our Standards 13 | 14 | Examples of behavior that contributes to creating a positive environment 15 | include: 16 | 17 | * Using welcoming and inclusive language 18 | * Being respectful of differing viewpoints and experiences 19 | * Gracefully accepting constructive criticism 20 | * Focusing on what is best for the community 21 | * Showing empathy towards other community members 22 | 23 | Examples of unacceptable behavior by participants include: 24 | 25 | * The use of sexualized language or imagery and unwelcome sexual attention or 26 | advances 27 | * Trolling, insulting/derogatory comments, and personal or political attacks 28 | * Public or private harassment 29 | * Publishing others' private information, such as a physical or electronic 30 | address, without explicit permission 31 | * Other conduct which could reasonably be considered inappropriate in a 32 | professional setting 33 | 34 | ## Our Responsibilities 35 | 36 | Project maintainers are responsible for clarifying the standards of acceptable 37 | behavior and are expected to take appropriate and fair corrective action in 38 | response to any instances of unacceptable behavior. 39 | 40 | Project maintainers have the right and responsibility to remove, edit, or 41 | reject comments, commits, code, wiki edits, issues, and other contributions 42 | that are not aligned to this Code of Conduct, or to ban temporarily or 43 | permanently any contributor for other behaviors that they deem inappropriate, 44 | threatening, offensive, or harmful. 45 | 46 | ## Scope 47 | 48 | This Code of Conduct applies both within project spaces and in public spaces 49 | when an individual is representing the project or its community. Examples of 50 | representing a project or community include using an official project e-mail 51 | address, posting via an official social media account, or acting as an appointed 52 | representative at an online or offline event. 
Representation of a project may be 53 | further defined and clarified by project maintainers. 54 | 55 | ## Enforcement 56 | 57 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 58 | reported by contacting the project team at dana@vulscan.com. All 59 | complaints will be reviewed and investigated and will result in a response that 60 | is deemed necessary and appropriate to the circumstances. The project team is 61 | obligated to maintain confidentiality with regard to the reporter of an incident. 62 | Further details of specific enforcement policies may be posted separately. 63 | 64 | Project maintainers who do not follow or enforce the Code of Conduct in good 65 | faith may face temporary or permanent repercussions as determined by other 66 | members of the project's leadership. 67 | 68 | ## Attribution 69 | 70 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, 71 | available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html 72 | 73 | [homepage]: https://www.contributor-covenant.org 74 | 75 | For answers to common questions about this code of conduct, see 76 | https://www.contributor-covenant.org/faq 77 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 Dana Epp 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies 
or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # CodeArgos 2 | [![Reliability Rating](https://sonarcloud.io/api/project_badges/measure?project=codeargos-github&metric=reliability_rating)](https://sonarcloud.io/dashboard?id=codeargos-github) 3 | [![Maintainability Rating](https://sonarcloud.io/api/project_badges/measure?project=codeargos-github&metric=sqale_rating)](https://sonarcloud.io/dashboard?id=codeargos-github) 4 | [![Security Rating](https://sonarcloud.io/api/project_badges/measure?project=codeargos-github&metric=security_rating)](https://sonarcloud.io/dashboard?id=codeargos-github) 5 | [![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=codeargos-github&metric=alert_status)](https://sonarcloud.io/dashboard?id=codeargos-github) 6 | 7 | [![Bugs](https://sonarcloud.io/api/project_badges/measure?project=codeargos-github&metric=bugs)](https://sonarcloud.io/dashboard?id=codeargos-github) 8 | [![Vulnerabilities](https://sonarcloud.io/api/project_badges/measure?project=codeargos-github&metric=vulnerabilities)](https://sonarcloud.io/dashboard?id=codeargos-github) 9 | [![Code Smells](https://sonarcloud.io/api/project_badges/measure?project=codeargos-github&metric=code_smells)](https://sonarcloud.io/dashboard?id=codeargos-github) 10 | 11 | This tool supports the continuous recon
of scripts and script blocks in an active web application. 12 | 13 | It populates and maintains an internal database by web crawling a target, detecting JavaScript files and HTML script blocks and watching for changes as they are published. 14 | 15 | The tool can then produce change diffs between scansets to allow security researchers to pinpoint the changing attack surface of the target web application. 16 | ## Install 17 | Install using: 18 | ```bash 19 | git clone https://github.com/DanaEpp/CodeArgos.git 20 | cd CodeArgos 21 | python3 setup.py install 22 | ``` 23 | Dependencies will be installed and `codeargos` will be added to your path. 24 | 25 | To create a cron job that will run CodeArgos every day: 26 | ```bash 27 | crontab -e 28 | ``` 29 | Then create an entry that looks something like this: 30 | 31 | ```bash 32 | @daily python3 -m codeargos -u https://yourtarget.com 33 | ``` 34 | 35 | This will run CodeArgos once a day, at midnight, against your target web app. You can adjust the schedule to meet your needs, and add additional arguments as needed (defined below). 36 | 37 | **NOTE:** If you are using CodeArgos against several different targets, try to schedule recon scan windows at least 30 minutes apart. This will allow CodeArgos to maximize your CPU, threads and bandwidth during the web crawling of each target. 38 | 39 | ## Usage 40 | When used for RECON: 41 | ```bash 42 | python3 -m codeargos -u target.com 43 | [-t thread_cnt] [-d] [--stats] [-f /path/to/your/file.db] 44 | [-w generic|slack|teams|discord --wurl https://hook.slack.com/some/webhook] 45 | [-p id|diff|both|none] 46 | [--scope /path/to/scope/file] 47 | ``` 48 | 49 | When used to REVIEW a diff after detection during RECON: 50 | ```bash 51 | python3 -m codeargos -f /path/to/your/file.db --diff id_num 52 | ``` 53 | 54 | * `-u`, `--url` : The target base URL (it will crawl anything it finds underneath it) 55 | * `-t`, `--threads` [optional] : The number of threads to run.
By default it is calculated as 5 times your total CPU count 56 | * `-d`, `--debug` [optional] : Write out debug information to a local log file (codeargos.log) 57 | * `--stats` [optional] : Dump stats to stdout to show the progress of the crawl 58 | * `-f`, `--file` [optional] : Reads and stores data across runs using an SQLite database you point to. If not used, the default is `target.com.db`, where **target** is the hostname of the URL passed in. 59 | * `-w`, `--webhook` [optional] : Enables notifications to a webhook. Possible options are *slack*, *teams*, *discord* and *generic*. Requires the `--wurl` param. Use generic when sending to Zapier, IFTTT, Microsoft Logic Apps or Microsoft Flow. 60 | * `--wurl` or `--webhookurl` [optional] : The fully qualified path to your webhook endpoint. You need to generate this in your favorite web app (Slack/Teams/Discord, etc.). 61 | * `--diff` : The diff id sent to you by the webhook notification service. Also requires the `-f` option to know which db to read the diff from. 62 | * `-p` or `--print` [optional] : Determines how results are displayed. Options include: 63 | * **id** : Shows a list of diff ids 64 | * **diff** : Shows the actual diffs between scans 65 | * **both** : Shows both the diffs and then the list of ids (default) 66 | * **none** : Useful on a first run, or when you expect to use notifications to send results 67 | * `-s` or `--scope` [optional] : Defines the file that will be used for a fixed-scope scan. In this mode the crawler will **NOT** add links it discovers to the queue for deeper scanning. It will ONLY scan the files defined. It will also remove whatever target you added in the `-u` param from the queue. You still need to pass `-u` though, as it's the seed used to define the database filename unless you pass in `-f`.
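To illustrate the fixed-scope mode, here is a hypothetical end-to-end sketch. The URLs and file names below are placeholders, not real endpoints; substitute your own targets:

```shell
# Build a scope file listing only the resources to watch.
# These URLs are made-up examples -- use your own.
printf '%s\n' \
  'https://yourtarget.com/static/app.js' \
  'https://yourtarget.com/static/vendor.js' > scoped-targets.txt

# Run a fixed-scope scan. -u is still required: it seeds the default
# database filename (yourtarget.com.db) unless -f overrides it.
# python3 -m codeargos -u https://yourtarget.com --scope scoped-targets.txt
wc -l scoped-targets.txt
```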
68 | 69 | ## Webhooks support 70 | To assist in notifying your red team of recent code changes, or to get ahead of other bug bounty hunters who may be competing on the same target, consider using webhook notifications. Here is a real-life example that got me a $1,000 bounty because I was able to 'get there first'. 71 | 72 | > ![Slack notification](images/CodeArgos-Slack-Notification.png) 73 | 74 | Here is another example in Microsoft Teams. Note the "View Code" button launches a browser directly to the affected page: 75 | 76 | > ![Teams notification](images/CodeArgos-Teams-Notification.png) 77 | 78 | Finally, here is an example of a message being sent to Discord. 79 | 80 | > ![Discord notification](images/CodeArgos-Discord-Notification.png) 81 | 82 | For more information on setting up webhook notifications for your favorite apps, please see: 83 | * **Slack** : [Detailed instructions](https://api.slack.com/messaging/webhooks). To set up your first one, [go here](https://my.slack.com/services/new/incoming-webhook/). 84 | * **Microsoft Teams** : [Detailed instructions](https://docs.microsoft.com/en-us/microsoftteams/platform/webhooks-and-connectors/how-to/add-incoming-webhook) 85 | * **Discord** : [Detailed instructions](https://support.discord.com/hc/en-us/articles/228383668-Intro-to-Webhooks) 86 | 87 | ### Working with generic webhooks like Microsoft Logic Apps 88 | If you want to get notifications on a different device or via email, consider using the "generic" webhook option and configure it to point to a [Microsoft Logic App](https://azure.microsoft.com/en-us/services/logic-apps/).
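To smoke-test a generic receiver before pointing CodeArgos at it, you can post the same JSON shape by hand. This is only a sketch: the payload values are invented, and the endpoint URL is a placeholder you must replace with your own hook:

```shell
# Write a payload carrying the three fields of the generic webhook schema.
cat > payload.json <<'EOF'
{
  "username": "CodeArgos",
  "code_url": "https://yourtarget.com/static/app.js",
  "content": "Code change detected in https://yourtarget.com/static/app.js"
}
EOF

# Confirm it is valid JSON, then deliver it to your endpoint:
python3 -m json.tool payload.json > /dev/null && echo "payload OK"
# curl -s -X POST 'https://<your-logic-app-endpoint>' \
#      -H 'Content-Type: application/json' --data @payload.json
```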
When defining the HTTP receive endpoint in Azure, use the following Request Body JSON Schema: 89 | 90 | ```json 91 | { 92 | "properties": { 93 | "code_url": { 94 | "type": "string" 95 | }, 96 | "content": { 97 | "type": "string" 98 | }, 99 | "username": { 100 | "type": "string" 101 | } 102 | }, 103 | "type": "object" 104 | } 105 | ``` 106 | 107 | By defining it that way, the Logic App will parse out the payload and expose direct dynamic content variables for use in your workflow. From there you can do anything with the payload, from sending it via SMS to your phone to forwarding it directly to email. 108 | 109 | Here is a sample workflow that will send it to a Google Gmail account: 110 | 111 | ![Microsoft Logic App](images/CodeArgos-LogicApp.png) 112 | 113 | Have fun with it. Generic webhooks and Logic Apps can do some pretty powerful things. 114 | 115 | ## Tips 116 | If you are having difficulty crawling your target web app, consider dialing back the threads used. By default it will select five times the number of CPUs you have. I've found the most success with `-t 10` on targets behind difficult WAFs and load balancers. While there is an incremental backoff retry pattern in the tool, the reality is that CodeArgos can be aggressive on its initial scan as it populates its database. 117 | 118 | If you aren't sure what's going on, use the `-d` argument and look through the `codeargos.log` file, e.g. `tail -f codeargos.log` 119 | 120 | If you find the tool is tripping up on a target, please open an [issue](https://github.com/DanaEpp/CodeArgos/issues) and include your target URL and any log data you are comfortable sharing. I'll try to take a look at it ASAP. 121 | 122 | Don't want to crawl the whole site and only look at one or two JavaScript files? No problem.
Try something like: 123 | ```bash 124 | python3 -m codeargos -u http://yourtarget.com --scope /path/to/scoped-targets.txt 125 | ``` 126 | 127 | ## Dev Tips 128 | You can evaluate the scanner and parser by jumping into the test_site dir and running the launcher. It will load a test web server on port 9000 for you. 129 | 130 | ```bash 131 | cd test_site 132 | ./launch_test_site.sh 133 | ``` 134 | In another shell window execute: 135 | ```bash 136 | python3 -m codeargos -u http://localhost:9000 -d -t 10 -f test.db 137 | ``` 138 | The test site will continue to be expanded as we find in-field issues with the parsing and data management. If you wish to contribute, this would be a great place to add complex and weird script blocks that we can evaluate and make sure get parsed correctly. 139 | 140 | -------------------------------------------------------------------------------- /codeargos/Constants.py: -------------------------------------------------------------------------------- 1 | # Colors 2 | # -------------- 3 | RED = '\33[31m' 4 | CYAN = '\33[36m' 5 | GREEN = '\33[32m' 6 | WHITE = '\33[0m' 7 | # -------------- -------------------------------------------------------------------------------- /codeargos/__init__.py: -------------------------------------------------------------------------------- 1 | # __init__.py 2 | from .app import CodeArgos -------------------------------------------------------------------------------- /codeargos/__main__.py: -------------------------------------------------------------------------------- 1 | # __main__.py 2 | 3 | import sys 4 | import signal 5 | from .app import CodeArgos 6 | 7 | def exit_gracefully(sig, frame): 8 | sys.exit(0) 9 | 10 | if __name__ == '__main__': 11 | signal.signal(signal.SIGINT, exit_gracefully) 12 | CodeArgos.run(sys.argv[1:]) 13 | 14 | -------------------------------------------------------------------------------- /codeargos/__version__.py:
-------------------------------------------------------------------------------- 1 | __version__ = '0.2' -------------------------------------------------------------------------------- /codeargos/app.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import os 4 | import sys 5 | import getopt 6 | import re 7 | from codeargos.__version__ import __version__ 8 | import codeargos.Constants as Constants 9 | from codeargos.enums import CodeArgosMode, CodeArgosPrintMode 10 | from codeargos.webcrawler import WebCrawler 11 | from codeargos.displaydiff import DisplayDiff 12 | from datetime import tzinfo, timedelta, datetime, timezone 13 | import logging 14 | 15 | class CodeArgos: 16 | 17 | def __init__(self): 18 | self.visited = set() 19 | 20 | @classmethod 21 | def print_banner(cls): 22 | print("\n{0}===========================================".format(Constants.CYAN)) 23 | print(" {0}CodeArgos{1} ({0}v{2}{1}) - Developed by @danaepp".format(Constants.WHITE, Constants.CYAN, __version__)) 24 | print(" https://github.com/danaepp/CodeArgos") 25 | print("==========================================={0} \n".format(Constants.WHITE)) 26 | 27 | @classmethod 28 | def display_usage(cls): 29 | print( ' RECON: codeargos.py -u example.com\n\t[-t thread_cnt] [-d] [-s] [-f /path/to/your.db]\n\t[-w generic|slack|teams|discord --wurl=https://hook.slack.com/some/webhook]\n\t[-p diff|id|both|none]' ) 30 | print( 'REVIEW: codeargos.py --diff 123 -f /path/to/your.db' ) 31 | print('\n') 32 | 33 | @classmethod 34 | def setup_logging(cls, log_level): 35 | if log_level is None: 36 | logging.basicConfig( 37 | stream=sys.stdout, 38 | level=log_level, 39 | format='%(asctime)s [%(levelname)s] %(message)s', 40 | datefmt='%m/%d/%Y %I:%M:%S %p' ) 41 | else: 42 | logging.basicConfig( 43 | filename="codeargos.log", 44 | level=log_level, 45 | format='%(asctime)s [%(levelname)s] %(message)s', 46 | datefmt='%m/%d/%Y %I:%M:%S %p' ) 47 | 48 | 
@classmethod 49 | def get_print_mode(cls, pmode): 50 | if pmode == "none": 51 | print_mode = CodeArgosPrintMode.NONE 52 | elif pmode == "id": 53 | print_mode = CodeArgosPrintMode.ID 54 | elif pmode == "diff": 55 | print_mode = CodeArgosPrintMode.DIFF 56 | else: 57 | print_mode = CodeArgosPrintMode.BOTH 58 | 59 | return print_mode 60 | 61 | @classmethod 62 | def get_scoped_targets(cls, file_path): 63 | scope = [] 64 | try: 65 | scope_file = open( file_path, 'r') 66 | targets = scope_file.readlines() 67 | scope_file.close() 68 | for target in targets: 69 | scope.append(target.strip()) 70 | except FileNotFoundError: 71 | print( "A valid scope target list file could not be found.") 72 | except Exception as e: 73 | logging.exception(e) 74 | return scope 75 | 76 | @staticmethod 77 | def run(argv): 78 | CodeArgos.print_banner() 79 | 80 | diff_id = 0 81 | print_mode = CodeArgosPrintMode.BOTH 82 | targets = [] 83 | seed_url = "" 84 | 85 | try: 86 | opts, args = getopt.getopt( argv, 87 | "hu:t:ds:f:w:p:", 88 | ["help", "url=", "threads=", "debug", "stats", "file=", "webhook=", "wurl=", "webhookurl=", "diff=", "print=", "scope="]) 89 | except getopt.GetoptError as err: 90 | logging.exception(err) 91 | # NOTE: opts/args are unbound when getopt fails, so don't log them here 92 | CodeArgos.display_usage() 93 | sys.exit(2) 94 | 95 | threads = os.cpu_count() * 5 96 | log_level = None 97 | show_stats = False 98 | db_file_path = "" 99 | webhook_type = "" 100 | webhook_url = "" 101 | 102 | for opt, arg in opts: 103 | if opt in ( "-h", "--help"): 104 | CodeArgos.display_usage() 105 | sys.exit() 106 | elif opt in ( "-u", "--url"): 107 | seed_url = arg 108 | elif opt in ( "-t", "--threads"): 109 | try: 110 | threads = int(arg, base=10) 111 | except ValueError: 112 | print( "Invalid thread count.
Using defaults") 113 | threads = os.cpu_count() * 5 114 | elif opt in ( "-d", "--debug" ): 115 | log_level = logging.DEBUG 116 | elif opt == "--stats": 117 | show_stats = True 118 | elif opt in ( "-f", "--file" ): 119 | db_file_path = arg 120 | elif opt in ( "-w", "--webhook" ): 121 | webhook_type = arg.lower() 122 | elif opt in ( "--wurl", "--webhookurl" ): 123 | webhook_url = arg 124 | elif opt == "--diff": 125 | try: 126 | diff_id = int(arg) 127 | except Exception as e: 128 | logging.exception(e) 129 | elif opt in ("-p", "--print"): 130 | print_mode = CodeArgos.get_print_mode(arg.lower()) 131 | elif opt in ("-s", "--scope"): 132 | targets = CodeArgos.get_scoped_targets(arg) 133 | 134 | CodeArgos.setup_logging(log_level) 135 | 136 | if diff_id > 0 and db_file_path: 137 | diff_viewer = DisplayDiff(db_file_path) 138 | diff_viewer.show(diff_id) 139 | else: 140 | if seed_url: 141 | scan_start = datetime.now(timezone.utc) 142 | print( "Attempting to scan {0} across {1} threads...".format(seed_url, threads)) 143 | print( "Starting scan at {0} UTC".format(scan_start.strftime("%Y-%m-%d %H:%M")) ) 144 | 145 | crawler = WebCrawler(seed_url, threads, show_stats, db_file_path, webhook_type, webhook_url, print_mode) 146 | crawler.start(targets) 147 | 148 | scan_end = datetime.now(timezone.utc) 149 | elapsed_time = scan_end - scan_start 150 | 151 | print( "Scan complete: reviewed {0} pages in {1}.".format( crawler.processed, elapsed_time ) ) 152 | else: 153 | print( "Missing target (-u) parameter.
Aborting!") 154 | CodeArgos.display_usage() 155 | 156 | 157 | -------------------------------------------------------------------------------- /codeargos/codediffer.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import difflib 4 | import sys 5 | from colorama import Fore, Back, Style, init 6 | import pprint 7 | import jsbeautifier 8 | from enum import IntEnum 9 | from codeargos.enums import CodeDifferMode 10 | 11 | class CodeDiffer: 12 | def __init__(self, console=False, mode=CodeDifferMode.UNIFIED): 13 | self.console = console 14 | self.mode = mode 15 | 16 | def diff(self, url, original_code, new_code): 17 | diff_data = "" 18 | 19 | # Prep the two code blocks, after expanding into readable segments 20 | options = { 21 | "keep_function_indentation": True, 22 | "keep_array_indentation": True, 23 | "jslint_happy": True 24 | } 25 | 26 | beautiful_old_code = jsbeautifier.beautify(original_code, options).splitlines() 27 | beautiful_new_code = jsbeautifier.beautify(new_code, options).splitlines() 28 | 29 | if self.mode == CodeDifferMode.UNIFIED: 30 | delta = difflib.unified_diff( 31 | beautiful_old_code, beautiful_new_code, 32 | fromfile="before : {0}".format(url), tofile="after : {0}".format(url), 33 | lineterm="") 34 | 35 | diff_data = '\n'.join(delta) 36 | 37 | return diff_data 38 | else: 39 | # Generate an HTML diff file 40 | differ = difflib.HtmlDiff() 41 | diff_data = differ.make_file( beautiful_old_code, beautiful_new_code ) 42 | 43 | if self.console: 44 | print( "[CHANGED FILE] {0}".format(url)) 45 | print(diff_data) 46 | 47 | return diff_data -------------------------------------------------------------------------------- /codeargos/datastore.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import sqlite3 3 | from codeargos.scrapedpage import ScrapedPage 4 | from codeargos.diff import Diff 5 | import threading 6 | from os.path import 
isfile, getsize 7 | import logging 8 | import json 9 | 10 | class DataStore: 11 | def __init__(self, db_file_name): 12 | # To handle the fact python doesn't like recursive cursors 13 | # for sqlite, we have to use thread locking to prevent mangling 14 | self.lock = threading.Lock() 15 | 16 | # sqlite3 does not like multithreading in python. 17 | # We have to remove the thread id check. 18 | self.conn = sqlite3.connect( db_file_name, check_same_thread=False ) 19 | self.conn.row_factory = sqlite3.Row 20 | 21 | self.db = self.conn.cursor() 22 | self.create_datastore() 23 | 24 | def close(self): 25 | self.db.close() 26 | self.conn.close() 27 | 28 | def create_datastore(self): 29 | try: 30 | self.lock.acquire(True) 31 | self.db.execute( 32 | """CREATE TABLE IF NOT EXISTS 33 | pages ( 34 | url TEXT NOT NULL PRIMARY KEY, 35 | sig TEXT NOT NULL, 36 | content TEXT, 37 | last_update DATETIME DEFAULT CURRENT_TIMESTAMP) 38 | """ ) 39 | self.conn.commit() 40 | 41 | self.db.execute( 42 | """CREATE TABLE IF NOT EXISTS 43 | diffs ( 44 | id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, 45 | url TEXT, 46 | content TEXT, 47 | last_update DATETIME DEFAULT CURRENT_TIMESTAMP) 48 | """ ) 49 | self.conn.commit() 50 | finally: 51 | self.lock.release() 52 | 53 | def add_page(self, page): 54 | # As sqlite doesn't have UPSERT, this will do the trick 55 | try: 56 | self.lock.acquire(True) 57 | self.db.execute( 58 | """INSERT INTO pages VALUES( :url, :sig, :content, CURRENT_TIMESTAMP ) 59 | ON CONFLICT(url) 60 | DO UPDATE SET sig=:sig, content=:content, last_update=CURRENT_TIMESTAMP 61 | """, 62 | { 63 | 'url': page.url, 64 | 'sig': page.signature, 65 | 'content': page.content 66 | }) 67 | self.conn.commit() 68 | finally: 69 | self.lock.release() 70 | 71 | def add_diff(self, url, diff_content): 72 | last_id = -1 73 | try: 74 | self.lock.acquire(True) 75 | # We store the history of all code changes. 
76 | self.db.execute( 77 | """INSERT INTO diffs VALUES( :id, :url, :content, CURRENT_TIMESTAMP )""", 78 | { 79 | 'id': None, 80 | 'url': url, 81 | 'content': diff_content 82 | }) 83 | self.conn.commit() 84 | 85 | last_id = self.db.lastrowid 86 | finally: 87 | self.lock.release() 88 | 89 | return last_id 90 | 91 | def get_page(self, url): 92 | page = None 93 | try: 94 | self.lock.acquire(True) 95 | self.db.execute("SELECT * FROM pages WHERE url=:u", {'u': url}) 96 | data = self.db.fetchone() 97 | if data: 98 | page = ScrapedPage( 99 | data['url'], 100 | data['sig'], 101 | data['content']) 102 | finally: 103 | self.lock.release() 104 | return page 105 | 106 | def get_diff(self, diff_id): 107 | diff = None 108 | try: 109 | self.lock.acquire(True) 110 | self.db.execute("SELECT * FROM diffs WHERE id=:id", {'id': diff_id}) 111 | data = self.db.fetchone() 112 | if data: 113 | diff = Diff( 114 | diff_id, 115 | data['url'], 116 | data['content'], 117 | data['last_update'] 118 | ) 119 | finally: 120 | self.lock.release() 121 | return diff 122 | 123 | def dump_pages(self): 124 | try: 125 | self.lock.acquire(True) 126 | self.db.execute( "SELECT * FROM pages") 127 | pages = self.db.fetchall() 128 | finally: 129 | self.lock.release() 130 | 131 | for page in pages: 132 | print(page) -------------------------------------------------------------------------------- /codeargos/diff.py: -------------------------------------------------------------------------------- 1 | class Diff: 2 | def __init__(self, id, url, content, date): 3 | self.__diff_id = id 4 | self.__diff_url = url 5 | self.__diff_content = content 6 | self.__diff_date = date 7 | 8 | @property 9 | def url(self): 10 | return self.__diff_url 11 | 12 | @property 13 | def id(self): 14 | return self.__diff_id 15 | 16 | @property 17 | def content(self): 18 | return self.__diff_content 19 | 20 | @property 21 | def date(self): 22 | return self.__diff_date 23 | 24 | def __repr__(self): 25 | # Literal braces must be doubled so str.format doesn't treat them as replacement fields 26 | return "{{ id:\"{0}\", url:\"{1}\", content: \"{2}\", date: \"{3}\" }}".format(self.__diff_id, self.__diff_url, self.__diff_content, self.__diff_date) 27 | 28 | def __str__(self): 29 | return "Diff('{0}', '{1}', '{2}', '{3}')".format(self.__diff_id, self.__diff_url, self.__diff_content, self.__diff_date) 30 | -------------------------------------------------------------------------------- /codeargos/displaydiff.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import os.path 4 | import sys 5 | import logging 6 | from codeargos.datastore import DataStore 7 | from codeargos.diff import Diff 8 | from colorama import Fore, Back, Style, init 9 | 10 | class DisplayDiff: 11 | def __init__(self, db_file_name): 12 | try: 13 | if os.path.isfile(db_file_name): 14 | self.data_store = DataStore(db_file_name) 15 | else: 16 | msg = "Unable to find db file to read diff data. Aborting." 17 | logging.debug(msg) 18 | print(msg) 19 | sys.exit() 20 | except Exception as e: 21 | logging.exception(e) 22 | print("Exception thrown while trying to load the db file to read diff data. 
Aborting.") 23 | sys.exit() 24 | 25 | def show(self, diff_id): 26 | try: 27 | diff = self.data_store.get_diff(diff_id) 28 | 29 | if diff: 30 | print("\n[DIFF #{0}] {1} (Detected: {2}) ".format(diff.id, diff.url, diff.date) ) 31 | lines = diff.content.split('\n') 32 | for line in lines: 33 | if line.startswith('+'): 34 | print( Fore.GREEN + line + Fore.RESET ) 35 | elif line.startswith('-'): 36 | print( Fore.RED + line + Fore.RESET ) 37 | elif line.startswith('^'): 38 | print( Fore.BLUE + line + Fore.RESET ) 39 | else: 40 | print( line ) 41 | else: 42 | print( "No diff by that id exists!") 43 | except Exception as e: 44 | logging.exception(e) 45 | -------------------------------------------------------------------------------- /codeargos/enums.py: -------------------------------------------------------------------------------- 1 | from enum import IntEnum 2 | 3 | class CodeArgosMode(IntEnum): 4 | RECON = 0 5 | REVIEW = 1 6 | 7 | class CodeArgosPrintMode(IntEnum): 8 | NONE = 0 9 | ID = 1 10 | DIFF = 2 11 | BOTH = 3 12 | 13 | class CodeDifferMode(IntEnum): 14 | UNIFIED = 0 15 | HTML = 1 16 | 17 | class WebHookType(IntEnum): 18 | NONE = 0 19 | GENERIC = 1 20 | SLACK = 2 21 | TEAMS = 3 22 | DISCORD = 4 -------------------------------------------------------------------------------- /codeargos/in-scope.txt: -------------------------------------------------------------------------------- 1 | http://localhost:9000/about.html 2 | http://localhost:9000/evenmorecode.js 3 | -------------------------------------------------------------------------------- /codeargos/scrapedpage.py: -------------------------------------------------------------------------------- 1 | class ScrapedPage: 2 | def __init__(self, url, sig, content): 3 | self.__page_url = url 4 | self.__page_sig = sig 5 | self.__page_content = content 6 | 7 | @property 8 | def url(self): 9 | return self.__page_url 10 | 11 | @property 12 | def signature(self): 13 | return self.__page_sig 14 | 15 | @property 16 | def 
content(self): 17 | return self.__page_content 18 | 19 | def __repr__(self): 20 | # Literal braces must be doubled so str.format doesn't treat them as replacement fields 21 | return "{{ url:\"{0}\", sig: \"{1}\" }}".format(self.__page_url, self.__page_sig) 22 | 23 | def __str__(self): 24 | return "Page('{0}', '{1}')".format(self.__page_url, self.__page_sig) 25 | 26 | -------------------------------------------------------------------------------- /codeargos/scraper.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import requests 3 | from requests.adapters import HTTPAdapter 4 | from urllib3.util.retry import Retry 5 | from urllib.parse import urlparse, urljoin 6 | from bs4 import BeautifulSoup 7 | from pprint import pprint 8 | import hashlib 9 | from codeargos.scrapedpage import ScrapedPage 10 | from codeargos.codediffer import CodeDiffer, CodeDifferMode 11 | 12 | class Scraper: 13 | 14 | allowed_content = ( 15 | 'text/plain', 16 | 'text/html', 17 | 'application/x-httpd-php', 18 | 'application/xhtml+xml', 19 | 'application/javascript', 20 | 'application/ecmascript', 21 | 'application/x-ecmascript', 22 | 'application/x-javascript', 23 | 'text/javascript', 24 | 'text/ecmascript', 25 | 'text/javascript1.0', 26 | 'text/javascript1.1', 27 | 'text/javascript1.2', 28 | 'text/javascript1.3', 29 | 'text/javascript1.4', 30 | 'text/javascript1.5', 31 | 'text/jscript', 32 | 'text/livescript', 33 | 'text/x-ecmascript', 34 | 'text/x-javascript' 35 | ) 36 | 37 | def __init__(self, url, scraped_page, scoped_scan): 38 | self.url = url 39 | self.old_scraped_page = scraped_page 40 | self.internal_urls = None 41 | self.scoped_scan = scoped_scan 42 | 43 | def get_page(self, session, url): 44 | response = None 45 | 46 | try: 47 | # This fetches twice, but the first request lets us determine if it's a 48 | # filetype we can actually process or not, which saves overall on 49 | # bandwidth for files and images we just don't care about 50 | head = session.head(url) 51 | if head.status_code == 200: 52 | content_type = 
head.headers.get('Content-Type', '') 53 | if content_type.startswith(self.allowed_content): 54 | response = session.get(url) 55 | except Exception as ex: 56 | logging.exception(ex) 57 | 58 | return response 59 | 60 | def get_links(self, url, parsed_html): 61 | 62 | links = [] 63 | 64 | try: 65 | # Get links to other pages within the app 66 | for link in parsed_html.find_all("a", href=True): 67 | link_url = link.get('href').strip() 68 | if not link_url: 69 | continue 70 | 71 | parsed_link = urlparse(link_url) 72 | 73 | full_url = link_url 74 | 75 | # Need to account for malformed URLs, i.e. http:/singleslashisbad.com 76 | if parsed_link.scheme and not parsed_link.netloc: 77 | continue 78 | elif not parsed_link.scheme and not parsed_link.netloc and parsed_link.path: 79 | full_url = urljoin(url, link_url) 80 | 81 | links.append( full_url ) 82 | 83 | # Get links to javascript files used by the app 84 | for script in parsed_html.find_all("script"): 85 | 86 | if 'src' in script.attrs: 87 | script_url = script['src'] 88 | parsed_url = urlparse(script_url) 89 | 90 | # In case we have relative path javascript files 91 | if not parsed_url.netloc: 92 | script_url = urljoin(self.url, script_url) 93 | 94 | links.append( script_url ) 95 | except Exception as e: 96 | logging.exception(e) 97 | 98 | return links 99 | 100 | def check_changes(self, new_sig, content): 101 | # Check signature to see if the page has changed 102 | # and if we even have to process the content further 103 | new_content = True 104 | diff_content = "" 105 | if self.old_scraped_page: 106 | if self.old_scraped_page.signature == new_sig: 107 | new_content = False 108 | else: 109 | msg = "Changes detected on {0}".format(self.url) 110 | logging.debug(msg) 111 | differ = CodeDiffer(True, CodeDifferMode.UNIFIED) 112 | diff_content = differ.diff( self.url, self.old_scraped_page.content, content ) 113 | 114 | return new_content, diff_content 115 | 116 | def get_script_blocks(self, html): 117 | scripts_blocks = "" 118 | 
for script in html.find_all("script"): 119 | if 'src' not in script.attrs: 120 | # OK, it's a local code block. Add it to what we have so we can hash all the blocks 121 | # as one entity. Guard against empty script tags, where .string is None. 122 | scripts_blocks += script.string or "" 123 | 124 | return scripts_blocks 125 | 126 | def scrape(self): 127 | parsed_html = "" 128 | new_page_sig = raw_content = "" 129 | session = requests.session() 130 | 131 | # Add incremental backoff retry logic so we aren't slamming servers 132 | retry_strategy = Retry( 133 | total=10, 134 | status_forcelist=[104, 429, 500, 502, 503, 504], 135 | method_whitelist=["HEAD", "GET", "OPTIONS"], 136 | backoff_factor=1 137 | ) 138 | 139 | adapter = HTTPAdapter(max_retries=retry_strategy) 140 | session.mount('http://', adapter) 141 | session.mount('https://', adapter) 142 | 143 | try: 144 | response = self.get_page(session, self.url) 145 | if response and response.ok: 146 | raw_content = "" 147 | parsed_html = BeautifulSoup(response.content, features='html.parser') 148 | 149 | # We only want to hash the code blocks in the page, not the entire page, unless of course
Otherwise any HTML code change affects an change diff, when we really 151 | # only want to be alerted of Javascript changes 152 | raw_content = response.text if self.url.lower().endswith(".js") else self.get_script_blocks(parsed_html) 153 | 154 | new_page_sig = hashlib.sha256(raw_content.encode('utf-8')).hexdigest() if raw_content else "NOSCRIPTS" 155 | else: 156 | status_code = response.status_code if response else "unknown" 157 | logging.debug( "Received error status {0} when fetching {1}".format(status_code, self.url)) 158 | except Exception as e: 159 | logging.exception(e) 160 | 161 | new_content, diff_content = self.check_changes(new_page_sig, raw_content) 162 | 163 | scraped_urls = [] 164 | if self.scoped_scan == False: 165 | scraped_urls = self.get_links(self.url, parsed_html) 166 | self.internal_urls = set(scraped_urls) 167 | 168 | session.close() 169 | 170 | return self.internal_urls, self.url, new_page_sig, new_content, raw_content, diff_content 171 | -------------------------------------------------------------------------------- /codeargos/webcrawler.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import re 3 | import sys 4 | import signal 5 | from queue import Queue, Empty 6 | from typing import List 7 | from concurrent.futures import ThreadPoolExecutor, Future, ALL_COMPLETED 8 | import concurrent 9 | import logging 10 | import time 11 | import pprint 12 | from codeargos.scraper import Scraper 13 | from codeargos.datastore import DataStore 14 | from codeargos.scrapedpage import ScrapedPage 15 | from codeargos.displaydiff import DisplayDiff 16 | from codeargos.webhook import WebHookType, WebHook 17 | from codeargos.enums import CodeArgosPrintMode 18 | from urllib.parse import urlparse 19 | 20 | class WebCrawler: 21 | def __init__(self, seed_url, threads, stats, db_file_path, webhook_type, webhook_url, print_mode): 22 | self.seed_url = seed_url 23 | self.pool = 
ThreadPoolExecutor(max_workers=threads) 24 | self.processed_urls = set([]) 25 | self.queued_urls = Queue() 26 | self.queued_urls.put(self.seed_url) 27 | self.show_stats = stats 28 | self.scripts_found = 0 29 | self.diff_list = set([]) 30 | self.print_mode = print_mode 31 | self.scoped_scan = False 32 | 33 | # Setup local sqlite database 34 | self.db_name = "unknown.db" 35 | if db_file_path is None: 36 | self.db_name = self.gen_db_name(seed_url) 37 | else: 38 | self.db_name = db_file_path 39 | self.data_store = DataStore(self.db_name) 40 | 41 | # Setup optional webhook for notifications 42 | if self.setup_webhook(webhook_url, webhook_type): 43 | self.webhook = WebHook(self.webhook_url, self.webhook_type) 44 | else: 45 | self.webhook = None 46 | 47 | signal.signal(signal.SIGINT, self.dump_data) 48 | 49 | def __del__(self): 50 | if( self.data_store ): 51 | try: 52 | self.data_store.close() 53 | except Exception as e: 54 | logging.exception(e) 55 | 56 | def gen_db_name(self, url): 57 | target = "unknown" 58 | try: 59 | parsed_url = urlparse(url) 60 | if parsed_url.netloc: 61 | target = parsed_url.hostname 62 | except Exception as e: 63 | logging.exception(e) 64 | 65 | return target + ".db" 66 | 67 | def setup_webhook(self, url, hooktype): 68 | 69 | webhook_enabled = False 70 | 71 | if hooktype and url: 72 | msg = "" 73 | self.webhook_url = url 74 | if hooktype == "slack": 75 | self.webhook_type = WebHookType.SLACK 76 | msg = "Configured SLACK webhook to {0}".format(self.webhook_url) 77 | elif hooktype == "teams": 78 | self.webhook_type = WebHookType.TEAMS 79 | msg = "Configured TEAMS webhook to {0}".format(self.webhook_url) 80 | elif hooktype == "discord": 81 | self.webhook_type = WebHookType.DISCORD 82 | msg = "Configured DISCORD webhook to {0}".format(self.webhook_url) 83 | elif hooktype == "generic": 84 | self.webhook_type = WebHookType.GENERIC 85 | msg = "Configured GENERIC webhook to {0}".format(self.webhook_url) 86 | else: 87 | self.webhook_type = 
WebHookType.NONE 88 | self.webhook_url = "" 89 | msg = "Couldn't properly parse out the webhook settings. Ignoring and will not send webhook notifications." 90 | 91 | logging.debug( msg ) 92 | 93 | if self.webhook_url: 94 | webhook_enabled = True 95 | else: 96 | self.webhook_type = WebHookType.NONE 97 | self.webhook_url = "" 98 | logging.debug( "No webhooks configured.") 99 | webhook_enabled = False 100 | 101 | return webhook_enabled 102 | 103 | def dump_data(self, signal, frame): 104 | choice = input( "\nEarly abort detected. Dump data already collected? (to processed.txt): [y/N] ") 105 | choice = choice.lower() 106 | if choice == 'y': 107 | with open('processed.txt', 'w') as f: 108 | for item in self.processed_urls: 109 | f.write("%s\n" % item) 110 | sys.exit() 111 | 112 | def process_scraper_results(self, future): 113 | # Get the items of interest via the future's public result() accessor 114 | internal_urls = future.result()[0] 115 | url = future.result()[1] 116 | sig = future.result()[2] 117 | new_content = future.result()[3] 118 | raw_content = future.result()[4] 119 | diff_content = future.result()[5] 120 | 121 | # There are occasions when an unknown media type gets through and 122 | # can't be properly hashed, which leaves sig empty. Instead of b0rking, 123 | # let's just let it go and move on. 124 | if new_content and sig: 125 | page = ScrapedPage(url, sig, raw_content) 126 | self.data_store.add_page(page) 127 | if diff_content: 128 | diff_id = self.data_store.add_diff(url, diff_content) 129 | self.diff_list.add(diff_id) 130 | self.notify_webhook(url, diff_id) 131 | 132 | # also add scraped links to queue if they 133 | # aren't already queued or already processed 134 | for link_url in internal_urls: 135 | # We have to account for not just internal pages, but external scripts foreign to 136 | # the target app.
ie: jQuery, Angular etc 137 | if link_url.startswith(self.seed_url) or link_url.lower().endswith(".js"): 138 | if link_url not in self.queued_urls.queue and link_url not in self.processed_urls: 139 | self.queued_urls.put(link_url) 140 | 141 | @property 142 | def processed(self): 143 | return len(self.processed_urls) 144 | 145 | def dump_pages(self): 146 | self.data_store.dump_pages() 147 | 148 | def notify_webhook(self, url, diff_id): 149 | if self.webhook: 150 | message = "Changes detected on {0}. Review in {1} [#diff: {2}]".format(url, self.db_name, diff_id) 151 | self.webhook.notify(message, url) 152 | 153 | def add_targets(self, targets): 154 | for target in targets: 155 | if self.valid_url(target): 156 | logging.debug( "Adding {0} to queue...".format(target) ) 157 | self.queued_urls.put(target) 158 | 159 | def valid_url(self, x): 160 | try: 161 | result = urlparse(x) 162 | return all([result.scheme in ["http", "https", "ftp"], result.netloc]) 163 | except Exception: 164 | return False 165 | 166 | def start(self, targets): 167 | LOG_EVERY_N = 500 168 | i = 0 169 | 170 | if targets: 171 | # If --scope is used, we need to remove the seed_url and ONLY scan those 172 | # targets defined 173 | self.queued_urls.queue.clear() 174 | self.scoped_scan = True 175 | self.add_targets(targets) 176 | 177 | jobs = [] 178 | 179 | while True: 180 | try: 181 | # get a url from the queue 182 | target_url = self.queued_urls.get(timeout=15) 183 | 184 | # check that the url hasn't already been processed 185 | if target_url not in self.processed_urls: 186 | # add url to the processed list 187 | self.processed_urls.add(target_url) 188 | 189 | logging.debug(f'[URL] {target_url}') 190 | 191 | # Check the datastore to see if we have a sig for this page 192 | scraped_page = self.data_store.get_page(target_url) 193 | 194 | job = self.pool.submit(Scraper(target_url, scraped_page, self.scoped_scan).scrape) 195 | job.add_done_callback(self.process_scraper_results) 196 | jobs.append(job) 197 | 
198 | if self.show_stats and i % LOG_EVERY_N == 0: 199 | print("Processed: {0:<8} | Queue: {1:<8} | Scheduled Jobs: {2:<8}".format( 200 | len(self.processed_urls), 201 | self.queued_urls.qsize(), 202 | self.pool._work_queue.qsize())) 203 | 204 | i=i+1 205 | except Empty: 206 | logging.debug("All queues and jobs complete.") 207 | 208 | # We need to wait until all child threads/jobs have completed before we can dump 209 | # the diffs. If we don't wait, we may miss a few children still being processed and 210 | # could cause a runtime exception due to the diff_list size changing. 211 | concurrent.futures.wait(jobs, timeout=None, return_when=ALL_COMPLETED) 212 | self.display_results() 213 | return 214 | except Exception as e: 215 | logging.exception(e) 216 | continue 217 | 218 | def display_results(self): 219 | if len(self.diff_list) > 0: 220 | diff_viewer = DisplayDiff(self.db_name) 221 | 222 | try: 223 | if self.print_mode == CodeArgosPrintMode.DIFF or self.print_mode == CodeArgosPrintMode.BOTH: 224 | for diff_id in self.diff_list.copy(): 225 | diff_viewer.show(diff_id) 226 | except Exception as e: 227 | logging.exception(e) 228 | finally: 229 | if self.print_mode == CodeArgosPrintMode.ID or self.print_mode == CodeArgosPrintMode.BOTH: 230 | print( "diffs: {0}".format(self.diff_list)) 231 | -------------------------------------------------------------------------------- /codeargos/webhook.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | from enum import IntEnum 3 | from codeargos.enums import WebHookType 4 | import logging 5 | import requests 6 | import json 7 | import pymsteams 8 | 9 | class WebHook: 10 | DEFAULT_HEADERS = {'Content-Type': 'application/json'} 11 | 12 | def __init__(self, url, hooktype=WebHookType.GENERIC): 13 | self.url = url 14 | self.hooktype = hooktype 15 | 16 | def notify(self, message, code_url = ""): 17 | if message: 18 | # Why oh why can't python support switch/case??? 
>:( 19 | if self.hooktype == WebHookType.GENERIC: 20 | self.__send_to_generic_webhook(message, code_url) 21 | elif self.hooktype == WebHookType.SLACK: 22 | self.__send_to_slack(message) 23 | elif self.hooktype == WebHookType.TEAMS: 24 | self.__send_to_teams(message, code_url) 25 | elif self.hooktype == WebHookType.DISCORD: 26 | self.__send_to_discord(message) 27 | 28 | def __send_to_generic_webhook(self, message, code_url): 29 | logging.debug( "[WEBHOOK] {0}".format(message)) 30 | 31 | data = { 32 | 'content': message, 33 | 'username': 'codeargos', 34 | 'code_url': code_url 35 | } 36 | 37 | try: 38 | response = requests.post( self.url, data=json.dumps(data), headers=WebHook.DEFAULT_HEADERS) 39 | if not response.ok: 40 | logging.debug( "Failed to send notification via Generic webhook. Server response: {0}".format(response.text)) 41 | except Exception as e: 42 | logging.exception(e) 43 | 44 | # See https://api.slack.com/messaging/webhooks for more info 45 | def __send_to_slack(self, message): 46 | logging.debug( "[SLACK] {0}".format(message)) 47 | 48 | # Remember to set your webhook up at https://my.slack.com/services/new/incoming-webhook/ 49 | data = { 50 | 'text': message, 51 | 'username': 'CodeArgos', 52 | 'icon_emoji': ':skull_and_crossbones:' 53 | } 54 | 55 | try: 56 | response = requests.post( self.url, data=json.dumps(data), headers=WebHook.DEFAULT_HEADERS) 57 | if not response.ok: 58 | logging.debug( "Failed to send notification via Slack. 
Server response: {0}".format(response.text)) 59 | except Exception as e: 60 | logging.exception(e) 61 | 62 | # See https://docs.microsoft.com/en-us/microsoftteams/platform/webhooks-and-connectors/how-to/add-incoming-webhook 63 | def __send_to_teams(self, message, code_url): 64 | logging.debug( "[TEAMS] {0}".format(message)) 65 | 66 | try: 67 | teams = pymsteams.connectorcard(self.url) 68 | teams.title( "☠ Code changes detected!") 69 | teams.addLinkButton( "View code", code_url) 70 | teams.text(message) 71 | teams.send() 72 | except Exception as e: 73 | logging.exception(e) 74 | 75 | def __send_to_discord(self, message): 76 | logging.debug( "[DISCORD] {0}".format(message)) 77 | 78 | data = { 79 | 'content': message, 80 | 'username': 'codeargos' 81 | } 82 | 83 | try: 84 | response = requests.post( self.url, data=json.dumps(data), headers=WebHook.DEFAULT_HEADERS) 85 | if not response.ok: 86 | logging.debug( "Failed to send notification via Discord. Server response: {0}".format(response.text)) 87 | 88 | except Exception as e: 89 | logging.exception(e) -------------------------------------------------------------------------------- /images/CodeArgos-Discord-Notification.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DanaEpp/CodeArgos/6189b15952bfe8cb95a0aa6dbf39a7eaecb4acc2/images/CodeArgos-Discord-Notification.png -------------------------------------------------------------------------------- /images/CodeArgos-LogicApp.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DanaEpp/CodeArgos/6189b15952bfe8cb95a0aa6dbf39a7eaecb4acc2/images/CodeArgos-LogicApp.png -------------------------------------------------------------------------------- /images/CodeArgos-Slack-Notification.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/DanaEpp/CodeArgos/6189b15952bfe8cb95a0aa6dbf39a7eaecb4acc2/images/CodeArgos-Slack-Notification.png -------------------------------------------------------------------------------- /images/CodeArgos-Teams-Notification.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DanaEpp/CodeArgos/6189b15952bfe8cb95a0aa6dbf39a7eaecb4acc2/images/CodeArgos-Teams-Notification.png -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | urllib3==1.26.5 2 | colorama==0.4.3 3 | jsbeautifier==1.11.0 4 | requests==2.31.0 5 | pymsteams==0.1.13 6 | beautifulsoup4==4.9.1 7 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | from setuptools import setup 3 | 4 | setup( 5 | name='CodeArgos', 6 | version='0.2', 7 | description='A python module for red teams to support the continuous recon of JavaScript files and HTML script blocks in an active web application.', 8 | long_description=open('README.md').read(), 9 | author='Dana Epp', 10 | author_email='dana@vulscan.com', 11 | url='https://github.com/danaepp/codeargos', 12 | license='MIT', 13 | packages=['codeargos'], 14 | install_requires=[ 'requests', 'colorama', 'jsbeautifier', 'urllib3', 'beautifulsoup4', 'pymsteams' ] 15 | ) -------------------------------------------------------------------------------- /sonar-project.properties: -------------------------------------------------------------------------------- 1 | sonar.projectKey=codeargos-github 2 | sonar.organization=danaepp-github 3 | sonar.projectName=CodeArgos 4 | sonar.host.url=https://sonarcloud.io 5 | sonar.python.version=3 6 | 
-------------------------------------------------------------------------------- /test_site/about.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | About Test Site 7 | 8 | 9 | 10 | 11 | 15 | 16 | 17 |

About Test Site

18 |

19 | Why are you even reading this? It's a test site for the parser. Go away. 20 |

21 |

22 | You're still here. OK fine. Check this out. 23 |

24 |

25 | Now go away, or I will have to get my fighting trousers. 26 |

27 | 28 | 32 | 33 | -------------------------------------------------------------------------------- /test_site/evenmorecode.js: -------------------------------------------------------------------------------- 1 | var even = "more code" 2 | var answer = 0 + 42 -------------------------------------------------------------------------------- /test_site/external.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | About Test Site 7 | 8 | 9 | 10 | 11 | 12 |

External Code

13 |

14 | Call code from an external file locally. 15 |

16 | 17 | -------------------------------------------------------------------------------- /test_site/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | Test Site 7 | 8 | 9 | 10 |

Test Site

11 |

12 | This site is designed to test CodeArgos locally. This should NOT be used in production. 13 |

14 |

15 |

19 | 20 |

21 | 22 | -------------------------------------------------------------------------------- /test_site/launch_test_site.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | python3 -m http.server 9000 -------------------------------------------------------------------------------- /test_site/site.css: -------------------------------------------------------------------------------- 1 | body { 2 | font-family:Arial, Helvetica, sans-serif 3 | } 4 | -------------------------------------------------------------------------------- /test_site/somecode.js: -------------------------------------------------------------------------------- 1 | var fubar = "Oh yeah!" 2 | var x = 1 + 2 + 3 3 | --------------------------------------------------------------------------------