├── .github └── dependabot.yml ├── .gitignore ├── .vscode └── settings.json ├── CHANGELOG.txt ├── CONTRIBUTING.md ├── Dockerfile ├── LICENSE.txt ├── Makefile ├── README.md ├── VERSION.txt ├── app ├── __init__.py ├── auth.py ├── main.py ├── misc.py ├── settings.py └── summarizer.py ├── docker-compose.yml ├── env.example ├── quickview.py ├── requirements.txt ├── sample-data.txt ├── static ├── apple-touch-icon-precomposed.png ├── apple-touch-icon.png ├── limits.html ├── main.css ├── markdown-styles.css ├── quote-150x150.png ├── summary-example.png ├── text-example.png └── tlp-clear.png └── templates └── index.html /.github/dependabot.yml: -------------------------------------------------------------------------------- 1 | # To get started with Dependabot version updates, you'll need to specify which 2 | # package ecosystems to update and where the package manifests are located. 3 | # Please see the documentation for all configuration options: 4 | # https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates 5 | 6 | version: 2 7 | updates: 8 | - package-ecosystem: "pip" 9 | directory: "/" # Location of package manifests 10 | schedule: 11 | interval: "weekly" 12 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | venv/ 2 | .python-version 3 | 4 | *.pyc 5 | __pycache__/ 6 | 7 | instance/ 8 | 9 | .pytest_cache/ 10 | .coverage 11 | htmlcov/ 12 | 13 | dist/ 14 | build/ 15 | *.egg-info/ 16 | 17 | .DS_STORE 18 | 19 | .env 20 | *.gif 21 | 22 | env.backup 23 | -------------------------------------------------------------------------------- /.vscode/settings.json: -------------------------------------------------------------------------------- 1 | { 2 | "python.analysis.typeCheckingMode": "basic", 3 | "python.linting.pylintEnabled": false, 4 | "python.linting.pycodestyleEnabled": true, 5 | 
"python.linting.enabled": true, 6 | "python.linting.pycodestyleArgs": ["--max-line-length=128", "--ignore=E501", ], 7 | "python.linting.pylintArgs": [ "--disable=F0401" ] 8 | } -------------------------------------------------------------------------------- /CHANGELOG.txt: -------------------------------------------------------------------------------- 1 | Version 0.5 (204/3/19) 2 | ========================== 3 | 4 | * added PDF upload functionality 5 | * cleaned up stuff pylint was complaining about 6 | 7 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing 2 | 3 | TL;DR: 4 | 1. Pull requests welcome 5 | 2. Please use the `develop` branch . Use the fork & PR model 6 | 3. The `main` branch is protected and will be used to merge in new things from the `develop` branch 7 | 4. Make sure, you have unit tests 8 | 9 | Thank you ❤️ for contributing! Let's make this a great CTI summarization tool. 10 | 11 | ## What are we currently looking for? 12 | 13 | * Take a look at [the issues](https://github.com/aaronkaplan/openai-cti-summarizer/issues) 14 | * Overall goal for version 2: 15 | - change everything to use langchain 16 | - allow for local LLMs (llama-2-70b or similar) in addition to MS Azure's OpenAI (middle level of sensitivity) or openAI's API (TLP:CLEAR) 17 | - promptsDB 18 | - enhance UI so that a few proposals get presented to the user. The user shall be able to select the best one and possibly edit the final result. 19 | - Automate QA control. 20 | 21 | Any help in these directions is much appreciated. 
22 | -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | FROM python:3.11-bullseye 2 | 3 | # create a working directory 4 | RUN mkdir /app 5 | WORKDIR /app 6 | 7 | # copy the requirements.txt file 8 | COPY requirements.txt . 9 | 10 | # install the dependencies 11 | RUN pip install -r requirements.txt 12 | 13 | # copy the main files 14 | COPY app /app 15 | COPY templates /templates 16 | COPY static /static 17 | COPY .env / 18 | COPY VERSION.txt / 19 | 20 | # expose the port for the FastAPI application 21 | EXPOSE 9999 22 | 23 | # run the FastAPI application 24 | CMD ["uvicorn", "main:app", "--access-log", "--reload", "--host", "0.0.0.0", "--port", "9999"] 25 | 26 | -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | EUROPEAN UNION PUBLIC LICENCE v. 1.2 2 | EUPL © the European Union 2007, 2016 3 | 4 | This European Union Public Licence (the ‘EUPL’) applies to the Work (as defined 5 | below) which is provided under the terms of this Licence. Any use of the Work, 6 | other than as authorised under this Licence is prohibited (to the extent such 7 | use is covered by a right of the copyright holder of the Work). 8 | 9 | The Work is provided under the terms of this Licence when the Licensor (as 10 | defined below) has placed the following notice immediately following the 11 | copyright notice for the Work: 12 | 13 | Licensed under the EUPL 14 | 15 | or has expressed by any other means his willingness to license under the EUPL. 16 | 17 | 1. Definitions 18 | 19 | In this Licence, the following terms have the following meaning: 20 | 21 | - ‘The Licence’: this Licence. 
22 | 23 | - ‘The Original Work’: the work or software distributed or communicated by the 24 | Licensor under this Licence, available as Source Code and also as Executable 25 | Code as the case may be. 26 | 27 | - ‘Derivative Works’: the works or software that could be created by the 28 | Licensee, based upon the Original Work or modifications thereof. This Licence 29 | does not define the extent of modification or dependence on the Original Work 30 | required in order to classify a work as a Derivative Work; this extent is 31 | determined by copyright law applicable in the country mentioned in Article 15. 32 | 33 | - ‘The Work’: the Original Work or its Derivative Works. 34 | 35 | - ‘The Source Code’: the human-readable form of the Work which is the most 36 | convenient for people to study and modify. 37 | 38 | - ‘The Executable Code’: any code which has generally been compiled and which is 39 | meant to be interpreted by a computer as a program. 40 | 41 | - ‘The Licensor’: the natural or legal person that distributes or communicates 42 | the Work under the Licence. 43 | 44 | - ‘Contributor(s)’: any natural or legal person who modifies the Work under the 45 | Licence, or otherwise contributes to the creation of a Derivative Work. 46 | 47 | - ‘The Licensee’ or ‘You’: any natural or legal person who makes any usage of 48 | the Work under the terms of the Licence. 49 | 50 | - ‘Distribution’ or ‘Communication’: any act of selling, giving, lending, 51 | renting, distributing, communicating, transmitting, or otherwise making 52 | available, online or offline, copies of the Work or providing access to its 53 | essential functionalities at the disposal of any other natural or legal 54 | person. 55 | 56 | 2. 
Scope of the rights granted by the Licence 57 | 58 | The Licensor hereby grants You a worldwide, royalty-free, non-exclusive, 59 | sublicensable licence to do the following, for the duration of copyright vested 60 | in the Original Work: 61 | 62 | - use the Work in any circumstance and for all usage, 63 | - reproduce the Work, 64 | - modify the Work, and make Derivative Works based upon the Work, 65 | - communicate to the public, including the right to make available or display 66 | the Work or copies thereof to the public and perform publicly, as the case may 67 | be, the Work, 68 | - distribute the Work or copies thereof, 69 | - lend and rent the Work or copies thereof, 70 | - sublicense rights in the Work or copies thereof. 71 | 72 | Those rights can be exercised on any media, supports and formats, whether now 73 | known or later invented, as far as the applicable law permits so. 74 | 75 | In the countries where moral rights apply, the Licensor waives his right to 76 | exercise his moral right to the extent allowed by law in order to make effective 77 | the licence of the economic rights here above listed. 78 | 79 | The Licensor grants to the Licensee royalty-free, non-exclusive usage rights to 80 | any patents held by the Licensor, to the extent necessary to make use of the 81 | rights granted on the Work under this Licence. 82 | 83 | 3. Communication of the Source Code 84 | 85 | The Licensor may provide the Work either in its Source Code form, or as 86 | Executable Code. If the Work is provided as Executable Code, the Licensor 87 | provides in addition a machine-readable copy of the Source Code of the Work 88 | along with each copy of the Work that the Licensor distributes or indicates, in 89 | a notice following the copyright notice attached to the Work, a repository where 90 | the Source Code is easily and freely accessible for as long as the Licensor 91 | continues to distribute or communicate the Work. 92 | 93 | 4. 
Limitations on copyright 94 | 95 | Nothing in this Licence is intended to deprive the Licensee of the benefits from 96 | any exception or limitation to the exclusive rights of the rights owners in the 97 | Work, of the exhaustion of those rights or of other applicable limitations 98 | thereto. 99 | 100 | 5. Obligations of the Licensee 101 | 102 | The grant of the rights mentioned above is subject to some restrictions and 103 | obligations imposed on the Licensee. Those obligations are the following: 104 | 105 | Attribution right: The Licensee shall keep intact all copyright, patent or 106 | trademarks notices and all notices that refer to the Licence and to the 107 | disclaimer of warranties. The Licensee must include a copy of such notices and a 108 | copy of the Licence with every copy of the Work he/she distributes or 109 | communicates. The Licensee must cause any Derivative Work to carry prominent 110 | notices stating that the Work has been modified and the date of modification. 111 | 112 | Copyleft clause: If the Licensee distributes or communicates copies of the 113 | Original Works or Derivative Works, this Distribution or Communication will be 114 | done under the terms of this Licence or of a later version of this Licence 115 | unless the Original Work is expressly distributed only under this version of the 116 | Licence — for example by communicating ‘EUPL v. 1.2 only’. The Licensee 117 | (becoming Licensor) cannot offer or impose any additional terms or conditions on 118 | the Work or Derivative Work that alter or restrict the terms of the Licence. 119 | 120 | Compatibility clause: If the Licensee Distributes or Communicates Derivative 121 | Works or copies thereof based upon both the Work and another work licensed under 122 | a Compatible Licence, this Distribution or Communication can be done under the 123 | terms of this Compatible Licence. 
For the sake of this clause, ‘Compatible 124 | Licence’ refers to the licences listed in the appendix attached to this Licence. 125 | Should the Licensee's obligations under the Compatible Licence conflict with 126 | his/her obligations under this Licence, the obligations of the Compatible 127 | Licence shall prevail. 128 | 129 | Provision of Source Code: When distributing or communicating copies of the Work, 130 | the Licensee will provide a machine-readable copy of the Source Code or indicate 131 | a repository where this Source will be easily and freely available for as long 132 | as the Licensee continues to distribute or communicate the Work. 133 | 134 | Legal Protection: This Licence does not grant permission to use the trade names, 135 | trademarks, service marks, or names of the Licensor, except as required for 136 | reasonable and customary use in describing the origin of the Work and 137 | reproducing the content of the copyright notice. 138 | 139 | 6. Chain of Authorship 140 | 141 | The original Licensor warrants that the copyright in the Original Work granted 142 | hereunder is owned by him/her or licensed to him/her and that he/she has the 143 | power and authority to grant the Licence. 144 | 145 | Each Contributor warrants that the copyright in the modifications he/she brings 146 | to the Work are owned by him/her or licensed to him/her and that he/she has the 147 | power and authority to grant the Licence. 148 | 149 | Each time You accept the Licence, the original Licensor and subsequent 150 | Contributors grant You a licence to their contributions to the Work, under the 151 | terms of this Licence. 152 | 153 | 7. Disclaimer of Warranty 154 | 155 | The Work is a work in progress, which is continuously improved by numerous 156 | Contributors. It is not a finished work and may therefore contain defects or 157 | ‘bugs’ inherent to this type of development. 
158 | 159 | For the above reason, the Work is provided under the Licence on an ‘as is’ basis 160 | and without warranties of any kind concerning the Work, including without 161 | limitation merchantability, fitness for a particular purpose, absence of defects 162 | or errors, accuracy, non-infringement of intellectual property rights other than 163 | copyright as stated in Article 6 of this Licence. 164 | 165 | This disclaimer of warranty is an essential part of the Licence and a condition 166 | for the grant of any rights to the Work. 167 | 168 | 8. Disclaimer of Liability 169 | 170 | Except in the cases of wilful misconduct or damages directly caused to natural 171 | persons, the Licensor will in no event be liable for any direct or indirect, 172 | material or moral, damages of any kind, arising out of the Licence or of the use 173 | of the Work, including without limitation, damages for loss of goodwill, work 174 | stoppage, computer failure or malfunction, loss of data or any commercial 175 | damage, even if the Licensor has been advised of the possibility of such damage. 176 | However, the Licensor will be liable under statutory product liability laws as 177 | far such laws apply to the Work. 178 | 179 | 9. Additional agreements 180 | 181 | While distributing the Work, You may choose to conclude an additional agreement, 182 | defining obligations or services consistent with this Licence. However, if 183 | accepting obligations, You may act only on your own behalf and on your sole 184 | responsibility, not on behalf of the original Licensor or any other Contributor, 185 | and only if You agree to indemnify, defend, and hold each Contributor harmless 186 | for any liability incurred by, or claims asserted against such Contributor by 187 | the fact You have accepted any warranty or additional liability. 188 | 189 | 10. 
Acceptance of the Licence 190 | 191 | The provisions of this Licence can be accepted by clicking on an icon ‘I agree’ 192 | placed under the bottom of a window displaying the text of this Licence or by 193 | affirming consent in any other similar way, in accordance with the rules of 194 | applicable law. Clicking on that icon indicates your clear and irrevocable 195 | acceptance of this Licence and all of its terms and conditions. 196 | 197 | Similarly, you irrevocably accept this Licence and all of its terms and 198 | conditions by exercising any rights granted to You by Article 2 of this Licence, 199 | such as the use of the Work, the creation by You of a Derivative Work or the 200 | Distribution or Communication by You of the Work or copies thereof. 201 | 202 | 11. Information to the public 203 | 204 | In case of any Distribution or Communication of the Work by means of electronic 205 | communication by You (for example, by offering to download the Work from a 206 | remote location) the distribution channel or media (for example, a website) must 207 | at least provide to the public the information requested by the applicable law 208 | regarding the Licensor, the Licence and the way it may be accessible, concluded, 209 | stored and reproduced by the Licensee. 210 | 211 | 12. Termination of the Licence 212 | 213 | The Licence and the rights granted hereunder will terminate automatically upon 214 | any breach by the Licensee of the terms of the Licence. 215 | 216 | Such a termination will not terminate the licences of any person who has 217 | received the Work from the Licensee under the Licence, provided such persons 218 | remain in full compliance with the Licence. 219 | 220 | 13. Miscellaneous 221 | 222 | Without prejudice of Article 9 above, the Licence represents the complete 223 | agreement between the Parties as to the Work. 
224 | 225 | If any provision of the Licence is invalid or unenforceable under applicable 226 | law, this will not affect the validity or enforceability of the Licence as a 227 | whole. Such provision will be construed or reformed so as necessary to make it 228 | valid and enforceable. 229 | 230 | The European Commission may publish other linguistic versions or new versions of 231 | this Licence or updated versions of the Appendix, so far this is required and 232 | reasonable, without reducing the scope of the rights granted by the Licence. New 233 | versions of the Licence will be published with a unique version number. 234 | 235 | All linguistic versions of this Licence, approved by the European Commission, 236 | have identical value. Parties can take advantage of the linguistic version of 237 | their choice. 238 | 239 | 14. Jurisdiction 240 | 241 | Without prejudice to specific agreement between parties, 242 | 243 | - any litigation resulting from the interpretation of this License, arising 244 | between the European Union institutions, bodies, offices or agencies, as a 245 | Licensor, and any Licensee, will be subject to the jurisdiction of the Court 246 | of Justice of the European Union, as laid down in article 272 of the Treaty on 247 | the Functioning of the European Union, 248 | 249 | - any litigation arising between other parties and resulting from the 250 | interpretation of this License, will be subject to the exclusive jurisdiction 251 | of the competent court where the Licensor resides or conducts its primary 252 | business. 253 | 254 | 15. Applicable Law 255 | 256 | Without prejudice to specific agreement between parties, 257 | 258 | - this Licence shall be governed by the law of the European Union Member State 259 | where the Licensor has his seat, resides or has his registered office, 260 | 261 | - this licence shall be governed by Belgian law if the Licensor has no seat, 262 | residence or registered office inside a European Union Member State. 
263 | 264 | Appendix 265 | 266 | ‘Compatible Licences’ according to Article 5 EUPL are: 267 | 268 | - GNU General Public License (GPL) v. 2, v. 3 269 | - GNU Affero General Public License (AGPL) v. 3 270 | - Open Software License (OSL) v. 2.1, v. 3.0 271 | - Eclipse Public License (EPL) v. 1.0 272 | - CeCILL v. 2.0, v. 2.1 273 | - Mozilla Public Licence (MPL) v. 2 274 | - GNU Lesser General Public Licence (LGPL) v. 2.1, v. 3 275 | - Creative Commons Attribution-ShareAlike v. 3.0 Unported (CC BY-SA 3.0) for 276 | works other than software 277 | - European Union Public Licence (EUPL) v. 1.1, v. 1.2 278 | - Québec Free and Open-Source Licence — Reciprocity (LiLiQ-R) or Strong 279 | Reciprocity (LiLiQ-R+). 280 | 281 | The European Commission may update this Appendix to later versions of the above 282 | licences without producing a new version of the EUPL, as long as they provide 283 | the rights granted in Article 2 of this Licence and protect the covered Source 284 | Code from exclusive appropriation. 285 | 286 | All other changes or additions to this Appendix require the production of a new 287 | EUPL version. 288 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | VERSION=$(shell cat VERSION.txt) 2 | 3 | 4 | restart: 5 | docker compose down && docker compose --env-file .env up -d 6 | 7 | all: app/*.py requirements.txt Dockerfile docker-compose.yml static/* templates/* 8 | @echo "Building version $(VERSION)" 9 | docker build -t openai-summarizer:$(VERSION) . 
--network=host && docker compose down && docker compose --env-file .env up -d 10 | 11 | tests: 12 | @echo "Running tests" 13 | pytest -v 14 | 15 | clean: 16 | @echo "Cleaning up" 17 | docker compose down && docker rmi openai-summarizer:$(VERSION) -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # OpenAI and FastAPI - Text summarization 2 | 3 | This text summarizer is a web-based frontend for summarising Cyber Threat Intelligence (CTI) reports. 4 | It uses OpenAI's GPT-3.5 and GPT-4 API to generate meaningful summaries for management as well as to extract IP addresses, domains, URLs, hashes etc. from a CTI report. 5 | 6 | However, if this task is not what you need, you can give it a different system prompt. GPT-3.5 and GPT-4 are flexible enough for that. 7 | 8 | One of the main benefits of using the API is that, according to OpenAI's documentation, they delete the queries after some time and don't use them for training the next models. See [here](https://platform.openai.com/docs/guides/chat/chat-vs-completions) and [here](https://openai.com/policies/usage-policies). 9 | 10 | 11 | 12 | Input: 13 | 14 | ![Example of a (public) CTI blog post](static/text-example.png) 15 | 16 | 17 | Output: 18 | 19 | ![Example of a GPT-4 generated summary](static/summary-example.png) 20 | 21 | 22 | 23 | 24 | This code is loosely based on [Oikosohn's](https://github.com/oikosohn/openai-quickstart-fastapi) openai quickstart fastapi repo, which in turn was based on [openai-quickstart-python](https://github.com/openai/openai-quickstart-python). 25 | 26 | 27 | It uses the OpenAI API [quickstart tutorial](https://beta.openai.com/docs/quickstart) and the [FastAPI](https://fastapi.tiangolo.com/) web framework. 28 | 29 | With prompt engineering, we ask OpenAI's GPT-4 model to summarize a CTI text for upper management. 
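
The prompt-engineering step can be pictured roughly as follows. This is only a minimal sketch of how a system prompt and a CTI text might be combined into chat-style messages; `build_messages`, the default prompt text and the `max_chars` limit are hypothetical and are not taken from `app/summarizer.py`. The actual API call is intentionally left out.

```python
"""Sketch: combine a system prompt and a CTI report into chat messages."""

# Hypothetical default; the app itself reads its prompt from SYSTEM_PROMPT.
DEFAULT_SYSTEM_PROMPT = (
    "You are a Cyber Threat Intelligence analyst. "
    "Summarize the following report for upper management."
)


def build_messages(text, system_prompt=DEFAULT_SYSTEM_PROMPT, max_chars=20000):
    """Return a chat-style message list: system prompt first, then the report.

    max_chars is an assumed, crude guard against oversized inputs; it counts
    characters, not tokens.
    """
    if not text or not text.strip():
        raise ValueError("nothing to summarize")
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": text.strip()[:max_chars]},
    ]


if __name__ == "__main__":
    # Made-up sample text, for illustration only.
    for message in build_messages("An actor used phishing mails to deliver a loader."):
        print(f"{message['role']}: {message['content']}")
```

Swapping in another system prompt (for example one that asks for a list of indicators instead of a management summary) changes the task without touching the rest of the code.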
30 | 31 | **Note**: you will have to get your own API key for this. 32 | 33 | 34 | ## Setup 35 | 36 | 1. First make a copy of the example environment variables file 37 | 38 | ```bash 39 | # Linux 40 | $ cp env.example .env 41 | ``` 42 | 43 | ```shell 44 | # Windows 45 | copy env.example .env 46 | ``` 47 | 48 | 2. Add your [API key](https://beta.openai.com/account/api-keys) to the newly created `.env` file 49 | *Note*: while developing, you may not want to send a request to OpenAI on every page reload. In that case, set `DRY_RUN=1` in `.env`. 50 | 51 | 52 | 3. Then build the image: 53 | 54 | ```bash 55 | docker build -t openai-summarizer:0.1 . --network=host 56 | ``` 57 | 58 | (The `.env` file, including your API key, will be copied into the image as well, so treat the image as containing secrets.) 59 | 60 | 61 | 4. Run the dockerized app 62 | 63 | ```bash 64 | $ docker compose up -d 65 | ``` 66 | 67 | 68 | You should now be able to access the app at [http://localhost:9999](http://localhost:9999)! 69 | 70 | 71 | ## Reference 72 | 73 | - [openai/openai-quickstart-python](https://github.com/openai/openai-quickstart-python) 74 | - [Oikosohn's fastapi openai demo](https://github.com/oikosohn/openai-quickstart-fastapi) 75 | 76 | 77 | ## License 78 | 79 | This code is released under the [EUPL license 1.2](https://joinup.ec.europa.eu/collection/eupl/eupl-text-eupl-12) 80 | -------------------------------------------------------------------------------- /VERSION.txt: -------------------------------------------------------------------------------- 1 | 0.5 2 | -------------------------------------------------------------------------------- /app/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EC-DIGIT-CSIRC/openai-cti-summarizer/f42675459fcb523c83706e6c354fcfe2ba4a2f65/app/__init__.py -------------------------------------------------------------------------------- /app/auth.py: -------------------------------------------------------------------------------- 1 | 
"""Authorization helper.""" 2 | 3 | import os 4 | 5 | from fastapi import Depends, HTTPException 6 | from fastapi.security import HTTPBasic, HTTPBasicCredentials 7 | 8 | 9 | security = HTTPBasic() 10 | 11 | # fake users to simulate authentication 12 | fake_users = { 13 | os.getenv('BASIC_AUTH_USER'): os.getenv('BASIC_AUTH_PASSWORD') 14 | } 15 | 16 | 17 | # dependency to check if the credentials are valid 18 | def get_current_username(credentials: HTTPBasicCredentials = Depends(security)): 19 | """Check if user in the allowed list""" 20 | username = credentials.username 21 | password = credentials.password 22 | if username in fake_users and password == fake_users[username]: 23 | return username 24 | raise HTTPException(status_code=401, detail="Invalid credentials") 25 | -------------------------------------------------------------------------------- /app/main.py: -------------------------------------------------------------------------------- 1 | """Main FastAPI file. Provides the app WSGI entry point.""" 2 | import os 3 | import sys 4 | import tempfile 5 | from urllib.parse import urlparse 6 | from distutils.util import strtobool # pylint: disable=deprecated-module 7 | 8 | import requests 9 | 10 | import fitz # PyMuPDF 11 | 12 | import uvicorn 13 | from fastapi import FastAPI, Request, Form, Depends, UploadFile, File 14 | from fastapi.responses import HTMLResponse 15 | from fastapi.templating import Jinja2Templates 16 | from fastapi.staticfiles import StaticFiles 17 | from starlette.middleware.base import BaseHTTPMiddleware 18 | import markdown 19 | 20 | from bs4 import BeautifulSoup 21 | from dotenv import load_dotenv, find_dotenv 22 | 23 | from summarizer import Summarizer # pylint: ignore=import-error 24 | from auth import get_current_username # pylint: ignore=import-error 25 | 26 | from settings import log # pylint: ignore=import-error 27 | 28 | 29 | # first get the env parametting 30 | if not load_dotenv(find_dotenv(), verbose=True, override=False): # read 
local .env file 31 | log.warning("Could not find .env file! Assuming ENV vars work") 32 | 33 | try: 34 | with open('../VERSION.txt', encoding='utf-8') as _f: 35 | VERSION = _f.readline().rstrip('\n') 36 | except Exception as e: 37 | log.error("could not find VERSION.txt, bailing out.") 38 | sys.exit(-1) 39 | 40 | 41 | app = FastAPI(version=VERSION) 42 | templates = Jinja2Templates(directory="/templates") 43 | app.mount("/static", StaticFiles(directory="/static"), name="static") 44 | GO_AZURE = bool(strtobool(os.getenv('USE_AZURE', 'true'))) 45 | OUTPUT_JSON = bool(strtobool(os.getenv('OUTPUT_JSON', 'false'))) 46 | DRY_RUN = bool(strtobool(os.getenv('DRY_RUN', 'false'))) 47 | OPENAI_MODEL = os.getenv('OPENAI_MODEL', 'gpt-3.5-turbo') 48 | 49 | # First detect if we should invoke OpenAI via MS Azure or directly 50 | try: 51 | GO_AZURE = bool(strtobool(os.getenv('USE_AZURE', 'false'))) 52 | except Exception as e: 53 | log.warning( 54 | f"Could not read 'USE_AZURE' env var. Reason: '{str(e)}'. Reverting to false.") 55 | GO_AZURE = False 56 | 57 | # print out settings 58 | log.info(f"{GO_AZURE=}") 59 | log.info(f"{OUTPUT_JSON=}") 60 | log.info(f"{DRY_RUN=}") 61 | log.info(f"{OPENAI_MODEL=}") 62 | 63 | 64 | class HTTPSRedirectMiddleware(BaseHTTPMiddleware): 65 | """HTTP to HTTPS redirection""" 66 | async def dispatch(self, request: Request, call_next): 67 | if 'X-Forwarded-Proto' in request.headers and request.headers['X-Forwarded-Proto'] == 'https': 68 | request.scope['scheme'] = 'https' 69 | response = await call_next(request) 70 | return response 71 | 72 | 73 | app.add_middleware(HTTPSRedirectMiddleware) 74 | 75 | summarizer = Summarizer(go_azure=GO_AZURE, model=OPENAI_MODEL, 76 | max_tokens=8192, output_json=OUTPUT_JSON) 77 | 78 | 79 | async def fetch_text_from_url(url: str) -> str: 80 | """Fetch the text behind url and try to extract it via beautiful soup. 81 | Returns text or raises an exception. 
82 | """ 83 | parsed_url = urlparse(url) 84 | if not all([parsed_url.scheme, parsed_url.netloc]): 85 | raise ValueError("Invalid URL") 86 | 87 | response = requests.get(url, timeout=5) 88 | soup = BeautifulSoup(response.text, 'html.parser') 89 | text = soup.get_text() 90 | return text 91 | 92 | 93 | @app.get("/", response_class=HTMLResponse) 94 | def get_index(request: Request, username: str = Depends(get_current_username)): 95 | """Return the default page.""" 96 | return templates.TemplateResponse("index.html", {"request": request, "system_prompt": os.environ['SYSTEM_PROMPT'], "username": username}) 97 | 98 | 99 | def convert_pdf_to_markdown(filename: str) -> str: 100 | """Convert the PDF file given by filename to markdown-like text. 101 | 102 | Args: 103 | filename: str the file on the filesystem 104 | 105 | Returns: 106 | the extracted text, one block per page, separated by horizontal rules 107 | """ 108 | # Open the PDF file 109 | doc = fitz.open(filename) 110 | 111 | # Initialize a variable to hold the text 112 | markdown_content = "" 113 | 114 | # Iterate through each page of the PDF 115 | for page_num in range(len(doc)): 116 | # Get the page 117 | page = doc.load_page(page_num) 118 | 119 | # Extract text from the page 120 | text = page.get_text() 121 | 122 | # Add the text to our markdown content, followed by a page break 123 | markdown_content += text + "\n\n---\n\n" 124 | 125 | return markdown_content 126 | 127 | 128 | # The main POST method. Input can either be a URL or a PDF file or a textarea text 129 | @app.post("/", response_class=HTMLResponse) 130 | async def index(request: Request, # request object 131 | text: str = Form(None), # the text in the textarea 132 | url: str = Form(None), # alternatively the URL 133 | pdffile: UploadFile = File(None), 134 | system_prompt: str = Form(None), model: str = Form('model'), token_count: int = Form(100), 135 | username: str = Depends(get_current_username)): 136 | """HTTP POST method for the default page. 
This gets called when the user already HTTP POSTs a text which should be summarized.""" 137 | 138 | if url: 139 | log.warning(f"Got request with url: {url[:20]}") 140 | elif pdffile: 141 | log.warning(f"Got request with pdffile: {pdffile.filename}") 142 | elif text: 143 | log.warning(f"Got request with text: {text[:100]}") 144 | else: 145 | log.error("no pdffile, no text, no url. Bailing out.") 146 | error = "Expected either url field or text field or a PDF file. Please specify one at least." 147 | result = None 148 | return templates.TemplateResponse("index.html", {"request": request, "text": text, "system_prompt": system_prompt, "result": error, "success": False, "username": username}, status_code=400) 149 | 150 | summarizer.model = model 151 | summarizer.max_tokens = token_count 152 | 153 | if url: 154 | try: 155 | text = await fetch_text_from_url(url) 156 | except Exception as ex: 157 | return templates.TemplateResponse("index.html", {"request": request, "text": url, "system_prompt": system_prompt, "result": f"Could not fetch URL. Reason {str(ex)}", "success": False}, status_code=400) 158 | 159 | elif pdffile: 160 | log.warning("we got a pdffile") 161 | try: 162 | suffix = ".pdf" 163 | with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp: 164 | tmp.write(pdffile.file.read()) 165 | tmp_pdf_path = tmp.name # Temp file path 166 | log.warning(f"stored as {tmp_pdf_path}") 167 | 168 | # Convert PDF to Markdown 169 | text = convert_pdf_to_markdown(tmp_pdf_path) 170 | log.warning(f"converted as {text[:100]}") 171 | 172 | # Cleanup the temporary file 173 | os.unlink(tmp_pdf_path) 174 | except Exception as ex: 175 | return templates.TemplateResponse("index.html", {"request": request, "text": text, "system_prompt": system_prompt, "result": f"Could not process the PDF file. Reason {str(ex)}", "success": False}, status_code=400) 176 | 177 | # we got the text from the URL or the pdffile was converted... 
now check if we should actually summarize 178 | if DRY_RUN: 179 | result = "This is a sample response, we are in dry-run mode. We don't want to waste money for querying the API." 180 | error = None 181 | else: 182 | result, error = summarizer.summarize(text, system_prompt) 183 | 184 | if error: 185 | return templates.TemplateResponse("index.html", {"request": request, "text": text, "system_prompt": system_prompt, "result": error, "success": False, "username": username}, status_code=400) 186 | 187 | result = markdown.markdown(result) 188 | return templates.TemplateResponse("index.html", { 189 | "request": request, 190 | "text": text, 191 | "system_prompt": system_prompt, 192 | "result": result, 193 | "success": True, 194 | "model": model, 195 | "username": username, 196 | "token_count": token_count}) 197 | 198 | 199 | if __name__ == "__main__": 200 | uvicorn.run('main:app', host="localhost", port=9999, reload=True) 201 | -------------------------------------------------------------------------------- /app/misc.py: -------------------------------------------------------------------------------- 1 | """ 2 | misc.py - collection of all kinds of stuff 3 | """ 4 | 5 | LORE_IPSUM = """ 6 | Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Malesuada pellentesque elit eget gravida cum. Sed id semper risus in hendrerit. Dui vivamus arcu felis bibendum ut tristique et. Felis imperdiet proin fermentum leo vel orci. Sit amet facilisis magna etiam tempor orci eu lobortis. Enim ut tellus elementum sagittis. Nulla at volutpat diam ut venenatis tellus in metus vulputate. Phasellus faucibus scelerisque eleifend donec pretium vulputate. Cras adipiscing enim eu turpis egestas pretium aenean. Tincidunt augue interdum velit euismod in pellentesque massa. Malesuada fames ac turpis egestas maecenas pharetra convallis posuere. Auctor augue mauris augue neque gravida in. 
Tempus imperdiet nulla malesuada pellentesque elit eget gravida cum. Morbi tristique senectus et netus et malesuada fames ac. 7 | 8 | Amet tellus cras adipiscing enim eu turpis egestas pretium. Orci nulla pellentesque dignissim enim sit. Suspendisse potenti nullam ac tortor vitae purus faucibus ornare. Enim neque volutpat ac tincidunt vitae semper quis. Ullamcorper eget nulla facilisi etiam. Proin sed libero enim sed. Tortor pretium viverra suspendisse potenti nullam ac tortor vitae. Adipiscing bibendum est ultricies integer quis auctor elit sed vulputate. Elit at imperdiet dui accumsan sit amet nulla facilisi. Augue eget arcu dictum varius duis. Tortor posuere ac ut consequat semper viverra nam libero. Phasellus faucibus scelerisque eleifend donec pretium vulputate sapien nec. Magna fermentum iaculis eu non diam phasellus vestibulum lorem. 9 | 10 | Faucibus interdum posuere lorem ipsum dolor sit. Nulla pellentesque dignissim enim sit. Tincidunt praesent semper feugiat nibh sed pulvinar proin gravida. Cras ornare arcu dui vivamus. Interdum varius sit amet mattis vulputate. Enim nulla aliquet porttitor lacus luctus. Vitae justo eget magna fermentum. Auctor elit sed vulputate mi sit amet mauris commodo quis. Congue eu consequat ac felis donec et odio pellentesque diam. Urna molestie at elementum eu facilisis sed. In metus vulputate eu scelerisque felis imperdiet proin. Sollicitudin ac orci phasellus egestas. 11 | 12 | Consectetur a erat nam at lectus urna duis convallis convallis. Justo donec enim diam vulputate ut pharetra sit. Sagittis purus sit amet volutpat consequat mauris. Placerat vestibulum lectus mauris ultrices. Feugiat nibh sed pulvinar proin. Placerat orci nulla pellentesque dignissim enim. Dui accumsan sit amet nulla facilisi. Magna sit amet purus gravida quis. Id donec ultrices tincidunt arcu non sodales. Volutpat sed cras ornare arcu dui. 13 | 14 | Pharetra pharetra massa massa ultricies. Lacus luctus accumsan tortor posuere ac. 
Libero justo laoreet sit amet cursus. Posuere morbi leo urna molestie at elementum eu facilisis. Arcu ac tortor dignissim convallis. Euismod nisi porta lorem mollis aliquam ut. Id aliquet lectus proin nibh nisl condimentum id venenatis a. Praesent semper feugiat nibh sed pulvinar. Fermentum posuere urna nec tincidunt praesent semper feugiat nibh. Sed ullamcorper morbi tincidunt ornare massa eget egestas. Cras semper auctor neque vitae tempus quam pellentesque. 15 | 16 | Nibh praesent tristique magna sit amet. Id porta nibh venenatis cras. Dictumst vestibulum rhoncus est pellentesque elit. Tempus imperdiet nulla malesuada pellentesque elit eget gravida cum. Purus gravida quis blandit turpis cursus in. Mattis pellentesque id nibh tortor id aliquet. A diam sollicitudin tempor id eu nisl nunc mi. At volutpat diam ut venenatis tellus in metus vulputate. Imperdiet dui accumsan sit amet nulla facilisi morbi. Morbi tristique senectus et netus. Sit amet massa vitae tortor condimentum. Fusce ut placerat orci nulla pellentesque. In hac habitasse platea dictumst vestibulum rhoncus est. Lectus urna duis convallis convallis tellus id. Euismod in pellentesque massa placerat duis ultricies. Vitae purus faucibus ornare suspendisse sed nisi lacus sed viverra. Imperdiet proin fermentum leo vel orci porta non. Auctor neque vitae tempus quam pellentesque nec nam. 17 | 18 | Pharetra pharetra massa massa ultricies mi quis hendrerit dolor. Neque ornare aenean euismod elementum nisi quis eleifend quam adipiscing. Adipiscing at in tellus integer feugiat scelerisque varius. Maecenas ultricies mi eget mauris pharetra et. Luctus venenatis lectus magna fringilla. Aliquet risus feugiat in ante metus dictum at tempor commodo. Tellus at urna condimentum mattis pellentesque id nibh. Nec feugiat in fermentum posuere urna nec tincidunt. Adipiscing elit ut aliquam purus sit amet luctus. Lobortis feugiat vivamus at augue eget arcu dictum varius duis. 
Interdum consectetur libero id faucibus nisl tincidunt. 19 | 20 | Metus dictum at tempor commodo ullamcorper a lacus vestibulum. Nullam ac tortor vitae purus faucibus. Mattis rhoncus urna neque viverra justo. Et egestas quis ipsum suspendisse. Erat velit scelerisque in dictum non consectetur a erat nam. Pulvinar etiam non quam lacus suspendisse faucibus interdum posuere. Faucibus scelerisque eleifend donec pretium vulputate sapien. Nunc sed id semper risus in. Nibh nisl condimentum id venenatis a condimentum vitae sapien. Nullam vehicula ipsum a arcu cursus vitae congue. 21 | 22 | Aenean vel elit scelerisque mauris. Leo vel fringilla est ullamcorper eget nulla facilisi etiam. Magna fermentum iaculis eu non diam phasellus vestibulum lorem sed. Lectus urna duis convallis convallis tellus id. Facilisi nullam vehicula ipsum a arcu cursus. Tincidunt eget nullam non nisi est sit. Dictum at tempor commodo ullamcorper a lacus vestibulum sed arcu. Blandit massa enim nec dui nunc mattis enim ut tellus. Duis ut diam quam nulla porttitor massa id neque. Blandit aliquam etiam erat velit scelerisque in dictum. At consectetur lorem donec massa sapien faucibus et. Tempor commodo ullamcorper a lacus vestibulum sed arcu non odio. Netus et malesuada fames ac turpis egestas integer. Consectetur lorem donec massa sapien. Urna porttitor rhoncus dolor purus non. 23 | 24 | Ipsum dolor sit amet consectetur adipiscing elit ut aliquam. Purus sit amet luctus venenatis lectus magna fringilla. Pulvinar neque laoreet suspendisse interdum consectetur libero id faucibus nisl. Nulla facilisi morbi tempus iaculis urna id volutpat lacus laoreet. Orci dapibus ultrices in iaculis nunc sed augue lacus. Nulla facilisi nullam vehicula ipsum a arcu cursus vitae congue. Velit egestas dui id ornare arcu odio ut sem nulla. Sed odio morbi quis commodo. Sagittis orci a scelerisque purus semper eget duis at tellus. Nunc id cursus metus aliquam eleifend mi in nulla. 
Sit amet nisl suscipit adipiscing bibendum est ultricies. Felis eget nunc lobortis mattis aliquam. 25 | """ -------------------------------------------------------------------------------- /app/settings.py: -------------------------------------------------------------------------------- 1 | """General settings config.""" 2 | 3 | import logging 4 | 5 | 6 | # Configure logging 7 | logging.basicConfig(level=logging.INFO, 8 | format='%(asctime)s - %(name)s - %(levelname)s - %(message)s', 9 | datefmt='%Y-%m-%d %H:%M:%S') 10 | 11 | log = logging.getLogger(__name__) 12 | -------------------------------------------------------------------------------- /app/summarizer.py: -------------------------------------------------------------------------------- 1 | """The summarizer class, abstracting away the LLM.""" 2 | import os 3 | from typing import Tuple 4 | 5 | import openai 6 | from openai import AzureOpenAI 7 | 8 | from settings import log # pylint: disable=import-error 9 | 10 | # first get the env parameters 11 | from dotenv import load_dotenv, find_dotenv 12 | _ = load_dotenv(find_dotenv()) # read local .env file 13 | 14 | 15 | class Summarizer: 16 | """Wrapper to summarize texts via OpenAI or MS Azure's OpenAI.""" 17 | 18 | client: openai._base_client.BaseClient 19 | 20 | def __init__(self, model: str, max_tokens: int, system_prompt: str = "", go_azure: bool = False, output_json: bool = False): 21 | if system_prompt: 22 | self.system_prompt = system_prompt 23 | else: 24 | self.system_prompt = "You are a Cyber Threat Intelligence Analyst and need to summarise a report for upper management. The report shall be nicely formatted with two sections: one Executive Summary section and one 'TTPs and IoCs' section. The second section shall list all IP addresses, domains, URLs, tools and hashes (sha-1, sha256, md5, etc.) which can be found in the report. Nicely format the report as markdown. Use newlines between markdown headings."
25 | self.model = model 26 | self.max_tokens = max_tokens 27 | self.go_azure = go_azure 28 | self.output_json = output_json 29 | 30 | if self.go_azure: 31 | api_version = os.environ['OPENAI_API_VERSION'] 32 | azure_endpoint = os.environ['OPENAI_API_BASE'] 33 | azure_deployment = os.environ['ENGINE'] 34 | api_key = os.environ['AZURE_OPENAI_API_KEY'] 35 | log.debug(f""" 36 | {api_version=}, 37 | {azure_endpoint=}, 38 | {azure_deployment=}, 39 | api_key=*** (not logged) 40 | """) 41 | self.client = AzureOpenAI(api_version=api_version, 42 | azure_endpoint=azure_endpoint, 43 | azure_deployment=azure_deployment, 44 | api_key=api_key) 45 | 46 | # TODO: The 'openai.api_base' option isn't read in the client API. You will need to pass it when you instantiate the client, e.g. 'OpenAI(api_base=os.environ['OPENAI_API_BASE'])' 47 | # openai.api_base = os.environ['OPENAI_API_BASE'] # Your Azure OpenAI resource's endpoint value. 48 | # "2023-05-15" 49 | 50 | """ 51 | openai.api_type = os.environ['OPENAI_API_TYPE'] 52 | openai.api_base = os.environ['OPENAI_API_BASE'] # "https://devmartiopenai.openai.azure.com/" 53 | openai.api_version = os.environ['OPENAI_API_VERSION'] # "2023-05-15" 54 | """ 55 | log.info(f"Using Azure client {self.client._version}") 56 | else: 57 | self.client = openai.OpenAI(api_key=os.environ['OPENAI_API_KEY']) 58 | 59 | def summarize(self, text: str, system_prompt: str = "") -> Tuple[str, str]: 60 | """Send to OpenAI and get a summary back. 61 | Returns a tuple: (result, error). On success, error is None; on failure, result is None.
62 | """ 63 | if not system_prompt: 64 | system_prompt = self.system_prompt 65 | messages = [ 66 | {"role": "system", "content": system_prompt}, # single shot 67 | {"role": "user", "content": text} 68 | ] 69 | 70 | try: 71 | if self.go_azure: 72 | log.info("Using MS AZURE!") 73 | response = self.client.chat.completions.create(model=os.environ['ENGINE'], 74 | messages=messages, 75 | temperature=0.3, 76 | top_p=0.95, 77 | stop=None, 78 | max_tokens=self.max_tokens, 79 | n=1) 80 | else: # go directly via OpenAI's API 81 | log.info("Using OpenAI directly!") 82 | if self.output_json: 83 | response_format = {"type": "json_object"} 84 | else: 85 | response_format = None 86 | response = self.client.chat.completions.create(model=self.model, 87 | messages=messages, 88 | temperature=0.3, 89 | top_p=0.95, 90 | stop=None, 91 | max_tokens=self.max_tokens, 92 | response_format=response_format, 93 | n=1) 94 | 95 | log.debug(f"Full Response (OpenAI): {response}") 96 | log.debug(f"response.choices[0].text: {response.choices[0].message}") 97 | log.debug(response.model_dump_json(indent=2)) 98 | result = response.choices[0].message.content 99 | error = None # Or move the error handling back to main.py, not sure 100 | except openai.APIConnectionError as e: 101 | result = None 102 | error = f"The server could not be reached. Reason {e.__cause__}" 103 | log.error(error) 104 | except openai.RateLimitError as e: 105 | result = None 106 | error = f"A 429 status code was received; we should back off a bit. {str(e)}" 107 | log.error(error) 108 | except openai.APIStatusError as e: 109 | result = None 110 | error = f"Another non-200-range status code was received. Status code: {e.status_code}. \n\nResponse: {e.message}" 111 | log.error(error) 112 | except Exception as e: 113 | result = None 114 | error = f"Unknown error! 
Error = '{str(e)}'" 115 | log.error(error) 116 | 117 | return result, error # type: ignore 118 | -------------------------------------------------------------------------------- /docker-compose.yml: -------------------------------------------------------------------------------- 1 | services: 2 | # The main microservice, serving the OpenAI summarizer 3 | # 4 | openai-summarizer: 5 | image: openai-summarizer:${VERSION} 6 | build: 7 | context: . 8 | dockerfile: Dockerfile 9 | args: 10 | - VERSION=${VERSION} 11 | environment: 12 | PYTHON_PATH: /app 13 | env_file: .env 14 | ports: 15 | - "9001:9999" 16 | dns: 8.8.8.8 17 | # network_mode: host 18 | volumes: 19 | - ./app:/app 20 | labels: 21 | - "traefik.enable=true" 22 | - "traefik.http.routers.munch-cti.rule=Host(`cti-summarizer.malware.lab`)" 23 | - "traefik.http.routers.munch-cti.entrypoints=websecure" 24 | - "traefik.http.routers.munch-cti.tls.certresolver=myresolver" 25 | - "traefik.http.services.munch-cti.loadbalancer.server.port=9999" 26 | networks: 27 | - web 28 | 29 | networks: 30 | web: 31 | external: true 32 | -------------------------------------------------------------------------------- /env.example: -------------------------------------------------------------------------------- 1 | # The version of this service 2 | VERSION=0.2 3 | 4 | # First decide: are you using OpenAI's API directly or via MS Azure? 5 | 6 | # if USE_MS_AZURE=1 then go via MS. Otherwise, OpenAI's API is the default 7 | USE_MS_AZURE=1 8 | MS_AZURE_DEPLOYMENT= 9 | 10 | # Insert your OpenAI API key here: 11 | OPENAI_API_KEY=sk-.... 12 | 13 | # System Prompt: Tell GPT* its system-role. Example below 14 | SYSTEM_PROMPT="You are a Cyber Threat Intelligence Analyst and need to summarise a report for upper management. The report shall be nicely formatted with two sections: one Executive Summary section and one 'TTPs and IoCs' section. The second section shall list all IP addresses, domains, URLs, tools and hashes (sha-1, sha256, md5, etc.) 
which can be found in the report. Nicely format the report as markdown. Use newlines between markdown headings." 15 | 16 | # DRY_RUN = 1 means: don't send a request to OpenAI, simulate it with a sample text. 17 | DRY_RUN=0 18 | 19 | # HTTP Basic auth simple protection 20 | BASIC_AUTH_USER= 21 | BASIC_AUTH_PASSWORD= 22 | -------------------------------------------------------------------------------- /quickview.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- coding: utf-8 -*- 3 | 4 | """Quickview the plain_text ORKL field""" 5 | 6 | import sys 7 | from pprint import pprint 8 | 9 | with open(sys.argv[1]) as fp: 10 | data = fp.readlines() 11 | for line in data: 12 | data2 = line.split('\\n') 13 | data2 = list(filter(None, data2)) 14 | pprint(data2) 15 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | fastapi 2 | multidict 3 | openai 4 | openpyxl 5 | platformdirs 6 | pydantic 7 | pydantic-settings 8 | python-multipart 9 | requests 10 | starlette 11 | tqdm 12 | types-pytz 13 | typing_extensions 14 | urllib3 15 | wincertstore 16 | yarl 17 | uvicorn 18 | virtualenv 19 | Jinja2 20 | Markdown 21 | MarkupSafe 22 | beautifulsoup4 23 | python-dotenv 24 | pytest 25 | pytest-cov 26 | pymupdf 27 | -------------------------------------------------------------------------------- /sample-data.txt: -------------------------------------------------------------------------------- 1 | Since 2004, Mandiant has investigated computer security breaches at hundreds of organizations around the world. The majority of these security breaches are attributed to advanced threat actors referred to as the "Advanced Persistent Threat" (APT). We first published details about the APT in our January 2010 M-Trends report.
As we stated in the report, our position was that "The Chinese government may authorize this activity, but there’s no way to determine the extent of its involvement." Now, three years later, we have the evidence required to change our assessment. The details we have analyzed during hundreds of investigations convince us that the groups conducting these activities are based primarily in China and that the Chinese Government is aware of them.[3] 2 | Mandiant continues to track dozens of APT groups around the world; however, this report is focused on the most prolific of these groups. We refer to this group as "APT1" and it is one of more than 20 APT groups with origins in China. APT1 is a single organization of operators that has conducted a cyber espionage campaign against a broad range of victims since at least 2006. From our observations, it is one of the most prolific cyber espionage groups in terms of the sheer quantity of information stolen. The scale and impact of APT1’s operations compelled us to write this report. 3 | The activity we have directly observed likely represents only a small fraction of the cyber espionage that APT1 has conducted. Though our visibility of APT1’s activities is incomplete, we have analyzed the group’s intrusions against nearly 150 victims over seven years. From our unique vantage point responding to victims, we tracked APT1 back to four large networks in Shanghai, two of which are allocated directly to the Pudong New Area. We uncovered a substantial amount of APT1’s attack infrastructure, command and control, and modus operandi (tools, tactics, and procedures). In an effort to underscore there are actual individuals behind the keyboard, Mandiant is revealing three personas we have attributed to APT1. These operators, like soldiers, may merely be following orders given to them by others. Our analysis has led us to conclude that APT1 is likely government-sponsored and one of the most persistent of China’s cyber threat actors. 
We believe that APT1 is able to wage such a long-running and extensive cyber espionage campaign in large part because it receives direct government support. In seeking to identify the organization behind this activity, our research found that People’s Liberation Army (PLA’s) Unit 61398 is similar to APT1 in its mission, capabilities, and resources. PLA Unit 61398 is also located in precisely the same area from which APT1 activity appears to originate. 4 | 5 | KEY FINDINGS 6 | 7 | APT1 is believed to be the 2nd Bureau of the People’s Liberation Army (PLA) General Staff Department’s (GSD) 3rd Department (总参三部二局), which is most commonly known by its Military Unit Cover Designator (MUCD) as Unit 61398 (61398部队). 8 | 9 | » The nature of "Unit 61398’s" work is considered by China to be a state secret; however, we believe it engages in harmful "Computer Network Operations." 10 | » Unit 61398 is partially situated on Datong Road (大同路) in Gaoqiaozhen (高桥镇), which is located in the Pudong New Area (浦东新区) of Shanghai (上海). The central building in this compound is a 130,663 square foot facility that is 12 stories high and was built in early 2007. 11 | » We estimate that Unit 61398 is staffed by hundreds, and perhaps thousands of people based on the size of Unit 61398’s physical infrastructure. 12 | » China Telecom provided special fiber optic communications infrastructure for the unit in the name of national defense. 13 | » Unit 61398 requires its personnel to be trained in computer security and computer network operations and also requires its personnel to be proficient in the English language. 14 | » Mandiant has traced APT1’s activity to four large networks in Shanghai, two of which serve the Pudong New Area where Unit 61398 is based. 
APT1 has systematically stolen hundreds of terabytes of data from at least 141 organizations, and has demonstrated the capability and intent to steal from dozens of organizations simultaneously.4 15 | » Since 2006, Mandiant has observed APT1 compromise 141 companies spanning 20 major industries. 16 | » APT1 has a well-defined attack methodology, honed over years and designed to steal large volumes of valuable intellectual property. 17 | » Once APT1 has established access, they periodically revisit the victim’s network over several months or years and steal broad categories of intellectual property, including technology blueprints, proprietary manufacturing processes, test results, business plans, pricing documents, partnership agreements, and emails and contact lists from victim organizations’ leadership. 18 | » APT1 uses some tools and techniques that we have not yet observed being used by other groups including two utilities designed to steal email — GETMAIL and MAPIGET. 19 | » APT1 maintained access to victim networks for an average of 356 days.5 The longest time period APT1 maintained access to a victim’s network was 1,764 days, or four years and ten months. 20 | » Among other large-scale thefts of intellectual property, we have observed APT1 stealing 6.5 terabytes of compressed data from a single organization over a ten-month time period. 21 | » In the first month of 2011, APT1 successfully compromised at least 17 new victims operating in 10 different industries. 22 | 23 | 24 | 25 | [3] Our conclusions are based exclusively on unclassified, open source information derived from Mandiant observations. None of the information in this report involves access to or confirmation by classified intelligence. 
26 | -------------------------------------------------------------------------------- /static/apple-touch-icon-precomposed.png: -------------------------------------------------------------------------------- 1 | quote-150x150.png -------------------------------------------------------------------------------- /static/apple-touch-icon.png: -------------------------------------------------------------------------------- 1 | quote-150x150.png -------------------------------------------------------------------------------- /static/limits.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | GPT Limitations in Text Summarization 5 | 6 | 7 | 8 |
9 |
10 |

Limitations of GPT in Text Summarization

11 |

12 |

A human needs to check the results of GPT's output.

13 |
    14 |
  • OpenAI might use any data sent to it via the API, so only send TLP:CLEAR, open-source data.
  • 15 |
  • GPT may miss important details or context that a human would recognize as relevant.
  • 16 |
  • GPT may generate summaries that are factually incorrect or contain misleading information.
  • 17 |
  • GPT may struggle with summarizing complex or technical language.
  • 18 |
  • GPT may generate summaries that are too brief or too long.
  • 19 |
  • GPT may occasionally produce harmful instructions or biased content.
  • 20 |
  • GPT has limited knowledge of the world and of events after 2021.
  • 21 |
22 |
 
23 |

Note: It is important to have a human review and verify the output of any text summarization generated by GPT.

24 |
25 |
26 | 27 | 28 | 29 | -------------------------------------------------------------------------------- /static/main.css: -------------------------------------------------------------------------------- 1 | @font-face { 2 | font-family: "ColfaxAI"; 3 | src: url(https://cdn.openai.com/API/fonts/ColfaxAIRegular.woff2) 4 | format("woff2"), 5 | url(https://cdn.openai.com/API/fonts/ColfaxAIRegular.woff) format("woff"); 6 | font-weight: normal; 7 | font-style: normal; 8 | } 9 | @font-face { 10 | font-family: "ColfaxAI"; 11 | src: url(https://cdn.openai.com/API/fonts/ColfaxAIBold.woff2) format("woff2"), 12 | url(https://cdn.openai.com/API/fonts/ColfaxAIBold.woff) format("woff"); 13 | font-weight: bold; 14 | font-style: normal; 15 | } 16 | body, 17 | input { 18 | font-size: 16px; 19 | line-height: 24px; 20 | color: #353740; 21 | font-family: "ColfaxAI", Helvetica, sans-serif; 22 | } 23 | body { 24 | display: flex; 25 | flex-direction: column; 26 | align-items: center; 27 | padding-top: 60px; 28 | } 29 | .page { 30 | width: 800px; 31 | align-items: center; 32 | display: flex; 33 | flex-direction: column; 34 | } 35 | .icon { 36 | width: 34px; 37 | height: 34px; 38 | } 39 | h3 { 40 | font-size: 32px; 41 | line-height: 40px; 42 | font-weight: bold; 43 | color: #202123; 44 | margin: 16px 0 40px; 45 | } 46 | form { 47 | display: flex; 48 | flex-direction: column; 49 | /* width: 320px; */ 50 | width: 800px; 51 | } 52 | input[type="text"] { 53 | padding: 12px 16px; 54 | border: 1px solid #10a37f; 55 | border-radius: 4px; 56 | margin-bottom: 24px; 57 | } 58 | input[type="url"] { 59 | padding: 12px 16px; 60 | border: 1px solid #10a37f; 61 | border-radius: 4px; 62 | margin-bottom: 24px; 63 | } 64 | /* input [type="textarea"] { */ 65 | .textarea { 66 | /* padding: 24px 32px; */ 67 | resize: none; 68 | border: 1px solid #10a37f; 69 | border-radius: 4px; 70 | margin-top: 32px; 71 | margin-bottom: 32px; 72 | width: 100%; 73 | height: 30px; 74 | box-sizing: border-box; 75 | overflow: auto; 
76 | /* box-sizing: content-box; */ 77 | } 78 | ::placeholder { 79 | color: #8e8ea0; 80 | opacity: 1; 81 | } 82 | input[type="submit"] { 83 | padding: 8px; 84 | color: #fff; 85 | background-color: #10a37f; 86 | border: none; 87 | border-radius: 4px; 88 | text-align: center; 89 | cursor: pointer; 90 | } 91 | .result { 92 | font-style: italic; 93 | margin-top: 40px; 94 | width: 800px; 95 | display: flex; 96 | flex-direction: column; 97 | align-items: center; 98 | } 99 | -------------------------------------------------------------------------------- /static/markdown-styles.css: -------------------------------------------------------------------------------- 1 | .markdown { 2 | line-height: 1.5; 3 | font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; 4 | } 5 | 6 | .markdown h1, 7 | .markdown h2, 8 | .markdown h3, 9 | .markdown h4, 10 | .markdown h5, 11 | .markdown h6 { 12 | margin-top: 1.25rem; 13 | margin-bottom: 0.625rem; 14 | } 15 | 16 | .markdown h1 { 17 | font-size: 2rem; 18 | border-bottom: 1px solid #ccc; 19 | padding-bottom: 0.3em; 20 | } 21 | 22 | .markdown h2 { 23 | font-size: 1.75rem; 24 | } 25 | 26 | .markdown h3 { 27 | font-size: 1.5rem; 28 | } 29 | 30 | .markdown h4 { 31 | font-size: 1.25rem; 32 | } 33 | 34 | .markdown h5 { 35 | font-size: 1rem; 36 | } 37 | 38 | .markdown h6 { 39 | font-size: 0.875rem; 40 | } 41 | 42 | .markdown p { 43 | margin-bottom: 1.25rem; 44 | } 45 | 46 | .markdown blockquote { 47 | border-left: 4px solid #ccc; 48 | padding-left: 1.25rem; 49 | margin-left: 0; 50 | font-style: italic; 51 | margin-bottom: 1.25rem; 52 | } 53 | 54 | .markdown pre, 55 | .markdown code { 56 | font-family: "SFMono-Regular", Consolas, "Liberation Mono", Menlo, Courier, monospace; 57 | font-size: 0.875rem; 58 | } 59 | 60 | .markdown pre { 61 | background-color: #f5f5f5; 62 | padding: 1.25rem; 63 | overflow-x: auto; 64 | border-radius: 4px; 65 | margin-bottom: 1.25rem; 66 | } 67 | 68 | .markdown code { 69 | background-color: #f5f5f5; 70 | padding:
0.125rem 0.25rem; 71 | border-radius: 4px; 72 | } 73 | 74 | .markdown ul, 75 | .markdown ol { 76 | padding-left: 2rem; 77 | margin-bottom: 1.25rem; 78 | } 79 | 80 | .markdown img { 81 | max-width: 100%; 82 | height: auto; 83 | display: block; 84 | margin-left: auto; 85 | margin-right: auto; 86 | margin-bottom: 1.25rem; 87 | } 88 | 89 | .markdown a { 90 | color: #3273dc; 91 | text-decoration: none; 92 | } 93 | 94 | .markdown a:hover { 95 | color: #363636; 96 | text-decoration: underline; 97 | } 98 | 99 | -------------------------------------------------------------------------------- /static/quote-150x150.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EC-DIGIT-CSIRC/openai-cti-summarizer/f42675459fcb523c83706e6c354fcfe2ba4a2f65/static/quote-150x150.png -------------------------------------------------------------------------------- /static/summary-example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EC-DIGIT-CSIRC/openai-cti-summarizer/f42675459fcb523c83706e6c354fcfe2ba4a2f65/static/summary-example.png -------------------------------------------------------------------------------- /static/text-example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EC-DIGIT-CSIRC/openai-cti-summarizer/f42675459fcb523c83706e6c354fcfe2ba4a2f65/static/text-example.png -------------------------------------------------------------------------------- /static/tlp-clear.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EC-DIGIT-CSIRC/openai-cti-summarizer/f42675459fcb523c83706e6c354fcfe2ba4a2f65/static/tlp-clear.png -------------------------------------------------------------------------------- /templates/index.html: -------------------------------------------------------------------------------- 1 | 2 | 
3 | 4 | Summarize this (via OpenAI) 5 | 6 | 8 | 9 | 10 | 11 | 12 | 29 | 39 | 121 | 122 | 123 | 124 |
125 | 126 |
127 |
128 |
129 | 130 |

Summarize this

131 |
132 |
133 |
134 |
135 | 136 |
137 | 138 |   139 | 142 |

143 | 144 |   145 | 147 |

148 | 150 |   151 | 153 |

154 | 155 | 156 |
157 |
158 |
159 |

160 | 161 |

162 |   163 |

164 | 170 | 174 | 175 |

176 |
177 | 178 |

179 | 180 |

181 |   182 |

183 | 184 |

185 | {% if url %}{{ url }}{% endif %} 187 |

188 |   189 |

190 |

191 | 192 |
193 | 194 |
195 | 206 |
207 |
208 | 213 |

214 |   215 |

216 |

217 | 218 | 219 |
220 | 221 | 222 |

223 | 226 |
227 |
228 | 229 |
230 |
231 | 240 |
241 |
242 |
243 |
244 | 245 |
246 | 247 |
248 |
4096
249 |
250 |
251 | 252 |
253 |
254 |
255 |
256 |

  257 |

258 | {% if result %} 259 | {% if success %} 260 |

261 | {% else %} 262 |
263 | {% endif %} 264 |
265 |

Results:

266 | 271 |
272 |
273 | {{ result|safe }} 274 | 275 |
276 |
277 | {% endif %} 278 | {% if metainfo %} 279 |
{{ metainfo }}
280 | {% endif %} 281 |

  282 |

283 |

284 |
285 |
286 |
287 |
288 |
289 | Version: 0.5. Copyright 2023-2024 (C) by Aaron Kaplan. All rights reserved. E-mail.

291 |

Made with in Vienna

292 |
293 |
294 |
295 |
296 |
297 |
298 | 299 | 328 | 329 | 330 | --------------------------------------------------------------------------------
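Note on the PDF branch of the POST handler in /app/main.py: the temporary file is only removed when the conversion succeeds, because os.unlink() sits inside the same try block as the conversion. Below is a minimal sketch of one way to guarantee cleanup with try/finally. The helper name convert_upload and its convert parameter are hypothetical and not part of the repository; convert stands in for convert_pdf_to_markdown.

```python
import os
import tempfile
from typing import Callable


def convert_upload(data: bytes, convert: Callable[[str], str]) -> str:
    """Write uploaded bytes to a temp .pdf file, convert it, always clean up.

    `convert` is a stand-in (assumption) for a function such as
    convert_pdf_to_markdown(filename) that takes a path and returns text.
    """
    # delete=False so the file can be reopened by path after the with-block
    with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
        tmp.write(data)
        tmp_path = tmp.name
    try:
        return convert(tmp_path)
    finally:
        # runs on success and on failure, so no temp files are left behind
        os.unlink(tmp_path)
```

In the handler this could be called as convert_upload(pdffile.file.read(), convert_pdf_to_markdown), keeping the surrounding except block for the error page.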