├── .gitignore ├── .pre-commit-config.yaml ├── LICENSE ├── README.md ├── data ├── evaded │ ├── .gitkeep │ ├── ember │ │ └── .gitkeep │ └── malconv │ │ └── .gitkeep └── logs │ └── .gitkeep ├── dockerfile.cpu ├── download_deps.py ├── malware_rl ├── __init__.py └── envs │ ├── __init__.py │ ├── controls │ ├── __init__.py │ ├── good_strings │ │ └── .gitkeep │ ├── modifier.py │ ├── section_names.txt │ ├── small_dll_imports.json │ └── trusted │ │ └── .gitkeep │ ├── ember_gym.py │ ├── malconv_gym.py │ ├── sorel_gym.py │ └── utils │ ├── __init__.py │ ├── ember.py │ ├── interface.py │ ├── malconv.h5 │ ├── malconv.py │ ├── samples │ └── .gitkeep │ └── sorel.py ├── ppo.py ├── random_agent.py ├── requirements.txt └── stable_baselines_env_check.py /.gitignore: -------------------------------------------------------------------------------- 1 | * 2 | !/**/ 3 | !*.* 4 | 5 | __pycache__/ 6 | *.py[cod] 7 | .DS_Store 8 | .vscode/ 9 | data/* 10 | *.ipynb_checkpoints/ 11 | *.exe 12 | *.txt 13 | .idea 14 | *.model 15 | -------------------------------------------------------------------------------- /.pre-commit-config.yaml: -------------------------------------------------------------------------------- 1 | repos: 2 | - repo: https://github.com/pre-commit/pre-commit-hooks 3 | rev: v4.0.1 4 | hooks: 5 | - id: trailing-whitespace 6 | - id: end-of-file-fixer 7 | - id: check-docstring-first 8 | - id: check-json 9 | - id: check-yaml 10 | - id: debug-statements 11 | - id: requirements-txt-fixer 12 | - repo: https://github.com/asottile/pyupgrade 13 | rev: v2.19.4 14 | hooks: 15 | - id: pyupgrade 16 | args: [--py36-plus] 17 | - repo: https://github.com/asottile/add-trailing-comma 18 | rev: v2.1.0 19 | hooks: 20 | - id: add-trailing-comma 21 | args: [--py36-plus] 22 | - repo: meta 23 | hooks: 24 | - id: check-hooks-apply 25 | - id: check-useless-excludes 26 | - repo: https://github.com/pre-commit/mirrors-isort 27 | rev: v5.8.0 28 | hooks: 29 | - id: isort 30 | - repo: 
https://github.com/ambv/black 31 | rev: 21.6b0 32 | hooks: 33 | - id: black 34 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 Bobby Filar 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # MalwareRL 2 | > Malware Bypass Research using Reinforcement Learning 3 | 4 | ## Background 5 | This is a malware manipulation environment using OpenAI's gym environments. The core idea is based on paper "Learning to Evade Static PE Machine Learning Malware Models via Reinforcement Learning" 6 | ([paper](https://arxiv.org/abs/1801.08917)). I am extending the original repo because: 7 | 1. 
It is no longer maintained 8 | 2. It uses Python 2 and an outdated version of LIEF 9 | 3. I wanted to integrate new Malware gym environments and additional manipulations 10 | 11 | Over the past three years there have been breakthrough open-source projects published in the security ML space. In particular, [Ember](https://github.com/endgameinc/ember) (Endgame Malware BEnchmark for Research) ([paper](https://arxiv.org/abs/1804.04637)) and MalConv ("Malware Detection by Eating a Whole EXE", [paper](https://arxiv.org/abs/1710.09435)) have given security researchers the ability to develop sophisticated, reproducible models that emulate the features/techniques found in NGAVs. 12 | 13 | ## MalwareRL Gym Environment 14 | MalwareRL exposes `gym` environments for the Ember, MalConv and SOREL-20M classifiers to allow researchers to develop Reinforcement Learning agents that bypass malware classifiers. Actions include a variety of non-breaking (i.e. the binaries will still execute) modifications to the PE header, sections, imports and overlay, and are listed below.
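Each action name in the table below maps one-to-one onto a method of the `ModifyBinary` class in `malware_rl/envs/controls/modifier.py`. The snippet below is a runnable sketch of that dispatch pattern only; the stub class and its byte-level `pad_overlay` stand in for the real LIEF-backed modifier and are not the repo's implementation.

```python
import random

# Stub standing in for malware_rl.envs.controls.modifier.ModifyBinary,
# whose real methods rewrite the PE with LIEF. Only one action is
# sketched here, at the raw-byte level, so the example stays runnable.
class ModifyBinary:
    def __init__(self, bytez):
        self.bytez = bytez

    def pad_overlay(self):
        # Append a run of one randomly chosen byte value to the overlay
        self.bytez += bytes([random.randrange(256)]) * 1000
        return self.bytez

# Action names resolve to method names on the modifier (as in ACTION_TABLE)
ACTION_TABLE = {"pad_overlay": "pad_overlay"}

def take_action(bytez, action_name):
    # Look up the method named by the action and apply it to the sample
    modifier = ModifyBinary(bytez)
    return getattr(modifier, ACTION_TABLE[action_name])()
```

Every action returns the full modified byte string, so actions compose: the output of one step is the input to the next.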
15 | 16 | ### Action Space 17 | ``` 18 | ACTION_TABLE = { 19 | 'modify_machine_type': 'modify_machine_type', 20 | 'pad_overlay': 'pad_overlay', 21 | 'append_benign_data_overlay': 'append_benign_data_overlay', 22 | 'append_benign_binary_overlay': 'append_benign_binary_overlay', 23 | 'add_bytes_to_section_cave': 'add_bytes_to_section_cave', 24 | 'add_section_strings': 'add_section_strings', 25 | 'add_section_benign_data': 'add_section_benign_data', 26 | 'add_strings_to_overlay': 'add_strings_to_overlay', 27 | 'add_imports': 'add_imports', 28 | 'rename_section': 'rename_section', 29 | 'remove_debug': 'remove_debug', 30 | 'modify_optional_header': 'modify_optional_header', 31 | 'modify_timestamp': 'modify_timestamp', 32 | 'break_optional_header_checksum': 'break_optional_header_checksum', 33 | 'upx_unpack': 'upx_unpack', 34 | 'upx_pack': 'upx_pack' 35 | } 36 | ``` 37 | 38 | ### Observation Space 39 | The `observation_space` of the `gym` environments is an array representing the feature vector. For Ember this is a `numpy` array of length 2381; for MalConv it is of length 1024**2 (the binary's raw bytes). The MalConv gym presents an opportunity to try RL techniques that generalize learning across large state spaces. 40 | 41 | ### Agents 42 | A baseline agent, `RandomAgent`, is provided to demonstrate how to interact with the `gym` environments and what output to expect. This agent attempts to evade the classifier by randomly selecting an action. This process is repeated up to the length of a game (e.g. 50 mods). If the modified binary scores below the classifier threshold, we register it as an evasion. In many ways the `RandomAgent` acts as a fuzzer, trying a bunch of actions with no regard for minimizing the number of modifications to the resulting binary. 43 | 44 | Additional agents will be developed and made available (both model and code) in the coming weeks.
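The `RandomAgent` episode loop described above can be sketched in a few lines. Everything in this snippet is illustrative: `score_fn` is a toy stand-in for the Ember/MalConv classifier score (not the real models), and the 50-turn budget matches the `MAXTURNS` default used when the environments are registered.

```python
import random

MAXTURNS = 50  # max modifications per episode, matching the repo default

def random_agent_episode(score_fn, actions, threshold, rng):
    """Apply randomly chosen actions until the classifier score drops
    below the threshold (an evasion) or the turn budget runs out.
    Returns (evaded, episode_length)."""
    score = 1.0  # stand-in score for the unmodified malware sample
    for turn in range(1, MAXTURNS + 1):
        action = rng.choice(actions)
        score = score_fn(score, action)  # re-score the modified binary
        if score < threshold:
            return True, turn
    return False, MAXTURNS

# Toy scorer: every modification knocks a fixed amount off the score.
evaded, ep_len = random_agent_episode(
    score_fn=lambda score, action: score - 0.5,
    actions=["pad_overlay", "add_imports"],
    threshold=0.8,
    rng=random.Random(0),
)
```

An evasion rate is then just the fraction of sampled binaries whose episodes end with `evaded == True`, and `avg_ep_len` the mean episode length, which is how the numbers in Table 1 are tallied.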
45 | 46 | **Table 1:** _Evasion Rate against Ember Holdout Dataset_* 47 | | gym | agent | evasion_rate | avg_ep_len | 48 | | --- | ----- | ------------ | ---------- | 49 | | ember | RandomAgent | 89.2% | 8.2 | 50 | | malconv | RandomAgent | 88.5% | 16.33 | 51 | 52 | \ 53 | \* _250 random samples_ 54 | 55 | ## Setup 56 | To get `malware_rl` up and running you will need the following external dependencies: 57 | - [LIEF](https://lief.quarkslab.com/) 58 | - [Ember](https://github.com/Azure/2020-machine-learning-security-evasion-competition/blob/master/defender/defender/models/ember_model.txt.gz), [Malconv](https://github.com/endgameinc/ember/blob/master/malconv/malconv.h5) and [SOREL-20M](https://github.com/sophos-ai/SOREL-20M) models. All of these need to be placed into the `malware_rl/envs/utils/` directory. 59 | > The SOREL-20M model requires the `aws-cli` to download. When accessing the AWS S3 bucket, look in the `sorel-20m-model/checkpoints/lightGBM` folder and fish out any of the models in the `seed` folders. The model file will need to be renamed to `sorel.model` and placed into `malware_rl/envs/utils` alongside the other models. 60 | - UPX, which has been added to support the pack/unpack modifications. Download the binary [here](https://upx.github.io/) and place it in the `malware_rl/envs/controls` directory. 61 | - Benign binaries - a small set of "trusted" binaries (e.g. grabbed from a base Windows installation); you can also download some from the MSFT website ([example](https://download.microsoft.com/download/a/c/1/ac1ac039-088b-4024-833e-28f61e01f102/NETFX1.1_bootstrapper.exe)). Store these binaries in `malware_rl/envs/controls/trusted` 62 | - Run the `strings` command on those binaries and save the output as `.txt` files in `malware_rl/envs/controls/good_strings` 63 | - Download a set of malware from VirusShare or VirusTotal.
I just used a list of hashes from the Ember dataset 64 | 65 | **Note:** The helper script `download_deps.py` can be used as a quickstart to get most of the key dependencies set up. 66 | 67 | I used a [conda](https://docs.conda.io/en/latest/) env set up for Python 3.7: 68 | 69 | `conda create -n malware_rl python=3.7` 70 | 71 | Finally, install the Python 3 dependencies from `requirements.txt`: 72 | 73 | `pip3 install -r requirements.txt` 74 | 75 | ## References 76 | There are a bunch of good papers/blog posts on manipulating binaries to evade ML classifiers. I compiled a few that inspired portions of this project below. Also, I have inevitably left out other pertinent research, so if there is something that should be in here, let me know in a Git Issue or hit me up on Twitter ([@filar](https://twitter.com/filar)). 77 | ### Papers 78 | - Demetrio, Luca, et al. "Efficient Black-box Optimization of Adversarial Windows Malware with Constrained Manipulations." arXiv preprint arXiv:2003.13526 (2020). ([paper](https://arxiv.org/abs/2003.13526)) 79 | - Demetrio, Luca, et al. "Adversarial EXEmples: A Survey and Experimental Evaluation of Practical Attacks on Machine Learning for Windows Malware Detection." arXiv preprint arXiv:2008.07125 (2020). ([paper](https://arxiv.org/abs/2008.07125)) 80 | - Song, Wei, et al. "Automatic Generation of Adversarial Examples for Interpreting Malware Classifiers." arXiv preprint arXiv:2003.03100 (2020). 81 | ([paper](https://arxiv.org/abs/2003.03100)) 82 | - Suciu, Octavian, Scott E. Coull, and Jeffrey Johns. "Exploring adversarial examples in malware detection." 2019 IEEE Security and Privacy Workshops (SPW). IEEE, 2019. ([paper](https://arxiv.org/abs/1810.08280)) 83 | - Fleshman, William, et al. "Static malware detection & subterfuge: Quantifying the robustness of machine learning and current anti-virus." 2018 13th International Conference on Malicious and Unwanted Software (MALWARE). IEEE, 2018.
([paper](https://arxiv.org/abs/1806.04773)) 84 | - Pierazzi, Fabio, et al. "Intriguing properties of adversarial ML attacks in the problem space." 2020 IEEE Symposium on Security and Privacy (SP). IEEE, 2020. ([paper/code](https://s2lab.kcl.ac.uk/projects/intriguing/)) 85 | - Fang, Zhiyang, et al. "Evading anti-malware engines with deep reinforcement learning." IEEE Access 7 (2019): 48867-48879. ([paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8676031)) 86 | 87 | ### Blog Posts 88 | - [Evading Machine Learning Malware Classifiers: for fun and profit!](https://towardsdatascience.com/evading-machine-learning-malware-classifiers-ce52dabdb713) 89 | - [Cylance, I Kill You!](https://skylightcyber.com/2019/07/18/cylance-i-kill-you/) 90 | - [Machine Learning Security Evasion Competition 2020](https://msrc-blog.microsoft.com/2020/06/01/machine-learning-security-evasion-competition-2020-invites-researchers-to-defend-and-attack/) 91 | - [ML evasion contest – the AV tester’s perspective](https://www.mrg-effitas.com/research/machine-learning-evasion-contest-the-av-testers-perspective/) 92 | 93 | ### Talks 94 | - 42: The answer to life the universe and everything offensive security by Will Pearce, Nick Landers ([slides](https://github.com/moohax/Talks/blob/master/slides/DerbyCon19.pdf)) 95 | - Bot vs. 
Bot: Evading Machine Learning Malware Detection by Hyrum Anderson ([slides](https://www.blackhat.com/docs/us-17/thursday/us-17-Anderson-Bot-Vs-Bot-Evading-Machine-Learning-Malware-Detection.pdf)) 96 | - Trying to Make Meterpreter into an Adversarial Example by Andy Applebaum ([slides](https://www.camlis.org/2019/talks/applebaum)) 97 | -------------------------------------------------------------------------------- /data/evaded/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bfilar/malware_rl/300f47ff2240132a449277283807426d34271993/data/evaded/.gitkeep -------------------------------------------------------------------------------- /data/evaded/ember/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bfilar/malware_rl/300f47ff2240132a449277283807426d34271993/data/evaded/ember/.gitkeep -------------------------------------------------------------------------------- /data/evaded/malconv/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bfilar/malware_rl/300f47ff2240132a449277283807426d34271993/data/evaded/malconv/.gitkeep -------------------------------------------------------------------------------- /data/logs/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bfilar/malware_rl/300f47ff2240132a449277283807426d34271993/data/logs/.gitkeep -------------------------------------------------------------------------------- /dockerfile.cpu: -------------------------------------------------------------------------------- 1 | # Dockerfile to stand up malware-rl and all associated dependencies. 2 | # Only uses CPU tensorflow. Would need to be rebuilt in order to take 3 | # advantage of a GPU. 
4 | FROM ubuntu:20.04 5 | 6 | LABEL maintainer="br0kej@protonmail.com" 7 | 8 | WORKDIR /home 9 | 10 | RUN apt-get update 11 | 12 | RUN apt-get install -y git python3 python3-pip python3-virtualenv upx subversion binutils 13 | 14 | RUN git clone https://github.com/bfilar/malware_rl.git 15 | 16 | RUN cd malware_rl && pip3 install -r requirements.txt && python3 download_deps.py --accept --force 17 | 18 | CMD bash 19 | -------------------------------------------------------------------------------- /download_deps.py: -------------------------------------------------------------------------------- 1 | """ 2 | This script does the following: 3 | 4 | 1. Downloads a small repository of known bad stuff (14 bad things) and 5 | saves to temporary directory. The ransomware folder from the 6 | https://github.com/Endermanch/MalwareDatabase/ repo. 7 | 2. Unzips the samples into the correct directory for the environment 8 | (malware_rl/envs/utils/samples). 9 | 3. Renames each sample to its corresponding SHA256 hash. 10 | 4. 
Removes temporary malware directory 11 | """ 12 | 13 | import argparse 14 | import glob 15 | import gzip 16 | import hashlib 17 | import os 18 | import shutil 19 | import subprocess 20 | import sys 21 | import urllib.request 22 | import zipfile 23 | 24 | # Third Party Libraries 25 | import svn.remote 26 | 27 | MODULE_PATH = os.path.split(os.path.abspath(sys.modules[__name__].__file__))[0] 28 | UTIL_PATH = os.path.join(MODULE_PATH, "malware_rl/envs/utils/") 29 | SAMPLE_PATH = os.path.join(MODULE_PATH, "malware_rl/envs/utils/samples/") 30 | ZIP_PASSWORD = "mysubsarethebest" 31 | DEFAULT_MALWARE_REPOS = [ 32 | "https://github.com/Endermanch/MalwareDatabase/trunk/ransomwares", 33 | "https://github.com/Endermanch/MalwareDatabase/trunk/rogues", 34 | "https://github.com/Endermanch/MalwareDatabase/trunk/trojans", 35 | "https://github.com/Endermanch/MalwareDatabase/trunk/jokes", 36 | ] 37 | TEMP_SAMPLE_PATHS = ["ransomwares/", "rogues/", "trojans/", "jokes/"] 38 | BENIGN_REPO = "https://github.com/xournalpp/xournalpp/releases/download/1.0.18/xournalpp-1.0.18-windows.zip" 39 | EMBER_MODEL_PATH = "https://raw.githubusercontent.com/Azure/2020-machine-learning-security-evasion-competition/master/defender/defender/models/ember_model.txt.gz" 40 | 41 | 42 | def retrive_url(source_file_url=None, filename=None): 43 | """ 44 | Retrieves a file from a URL, skipping the download if it is already present 45 | """ 46 | if os.path.exists(filename): 47 | print(f"[-] {filename} already present. Skipping") 48 | else: 49 | urllib.request.urlretrieve(source_file_url, filename) 50 | 51 | 52 | def download_specific_github_file( 53 | source_file_url=None, 54 | filename=None, 55 | storage_directory=None, 56 | ): 57 | """ 58 | Downloads a specific file from a github repo.
59 | If gzipped, decompresses and drops into directory 60 | """ 61 | retrive_url(source_file_url, filename) 62 | shutil.move( 63 | os.path.join(os.getcwd(), filename), 64 | os.path.join(storage_directory, filename), 65 | ) 66 | 67 | if os.path.join(storage_directory, filename).endswith(".gz"): 68 | split_filename = os.path.splitext(filename)[0] 69 | 70 | with gzip.open(os.path.join(UTIL_PATH, filename), "r") as f_in, open( 71 | os.path.join(UTIL_PATH, split_filename), 72 | "wb", 73 | ) as f_out: 74 | shutil.copyfileobj(f_in, f_out) 75 | os.remove(os.path.join(UTIL_PATH, filename)) 76 | print("[+] Success - Ember Model downloaded") 77 | 78 | 79 | def download_specific_git_repo_directory(temp_path=None, source_repo=None): 80 | """ 81 | Downloads a specific directory within a git repo. 82 | """ 83 | if os.path.exists(temp_path) is False: 84 | repo = svn.remote.RemoteClient(source_repo) 85 | 86 | try: 87 | repo.checkout(source_repo) 88 | print( 89 | "[+] Success - Samples Downloaded " "Placed into Temp Directory", 90 | ) 91 | 92 | except svn.exception.SvnException: 93 | print( 94 | """ 95 | Subversion not found. In order to download the sample malware, 96 | Subversion (svn) needs to be installed. This provides a method of 97 | downloading only the target folder rather than the whole repo. 
98 | """ 99 | ) 100 | 101 | 102 | def unzip_file(filename=None, source_zip=None, password=False): 103 | """ 104 | Unzips a .zip file 105 | """ 106 | try: 107 | if password: 108 | with zipfile.ZipFile(filename, "r") as file: 109 | file.extractall( 110 | source_zip, 111 | pwd=bytes(ZIP_PASSWORD, "utf-8"), 112 | ) 113 | else: 114 | with zipfile.ZipFile(filename, "r") as file: 115 | file.extractall(source_zip) 116 | except (zipfile.BadZipFile, RuntimeError):  # corrupt archive or wrong password 117 | pass 118 | 119 | 120 | def unzip_samples(temp_sample_path=None, sample_path=None): 121 | """ 122 | Unzips all .zip's within the target directory 123 | """ 124 | if os.path.exists(temp_sample_path): 125 | target_path_contents = glob.glob( 126 | os.path.join( 127 | os.getcwd(), 128 | temp_sample_path + "*.zip", 129 | ), 130 | ) 131 | for filename in target_path_contents: 132 | unzip_file(filename, sample_path, password=True) 133 | 134 | print("[+] Success - Samples Unzipped") 135 | 136 | 137 | def rename_samples_to_sha256_hash(sample_path=None): 138 | """ 139 | Renames all malware files within a target directory to their 140 | SHA256 hash 141 | """ 142 | for files in glob.glob(os.path.join(sample_path, "*")): 143 | sha256_hash = hashlib.sha256() 144 | with open(files, "rb") as file: 145 | for byte_block in iter(lambda: file.read(4096), b""): 146 | sha256_hash.update(byte_block) 147 | computed_hash = sha256_hash.hexdigest() 148 | os.rename(files, os.path.join(sample_path, computed_hash)) 149 | print("[+] Success - Samples renamed to their SHA256 hash") 150 | 151 | 152 | def clean_up_temp_samples_dir(directory_to_remove=None): 153 | """ 154 | Clean up temporary samples directory 155 | """ 156 | if os.path.exists(directory_to_remove): 157 | shutil.rmtree(directory_to_remove) 158 | print(f"[+] Cleanup Complete - {directory_to_remove} has been removed") 159 | 160 | 161 | def check_if_samples_exist(directory_to_check=None): 162 | """ 163 | Returns True if the samples directory is empty 164 | """ 165 | if len(os.listdir(directory_to_check)) == 0: 166 | 
return True 167 | else: 168 | return False 169 | 170 | 171 | def generate_example_benign_strings_output(benign_repo=None, output_dir=None): 172 | """ 173 | Downloads a sample open source windows application and 174 | generates strings output 175 | """ 176 | output_zip = benign_repo.split("/")[-1] 177 | output_filename = "".join(output_zip.split(".")[:-1]) 178 | 179 | retrive_url(benign_repo, output_zip) 180 | unzip_file(output_zip) 181 | os.remove(output_zip) 182 | 183 | file = open( 184 | "./malware_rl/envs/controls/good_strings/xournal-strings.txt", 185 | "w", 186 | ) 187 | unzipped_filename = glob.glob("xournalpp-*")[0] 188 | subprocess.run(["strings", unzipped_filename], stdout=file) 189 | shutil.move( 190 | os.path.join(MODULE_PATH, unzipped_filename), 191 | os.path.join( 192 | MODULE_PATH, 193 | "malware_rl/envs/controls/trusted/" + unzipped_filename, 194 | ), 195 | ) 196 | 197 | 198 | if __name__ == "__main__": 199 | parser = argparse.ArgumentParser( 200 | description="A small utility that helps with the downloading of the requirements for the malware-rl environment", 201 | ) 202 | parser.add_argument( 203 | "--accept", 204 | help="accept liability for downloading bad things", 205 | required=False, 206 | action="store_true", 207 | ) 208 | parser.add_argument( 209 | "--force", 210 | help="forces the download even if samples directory is" "not empty", 211 | action="store_true", 212 | ) 213 | parser.add_argument( 214 | "--clean", 215 | help="deletes the contents of the samples directory", 216 | action="store_true", 217 | ) 218 | parser.add_argument( 219 | "--strings", 220 | help="download goodware windows executable and generate text file containing strings output", 221 | action="store_true", 222 | ) 223 | args = parser.parse_args() 224 | 225 | if args.clean: 226 | for sample in glob.glob(os.path.join(SAMPLE_PATH, "*")): 227 | os.remove(sample) 228 | 229 | if args.strings: 230 | generate_example_benign_strings_output( 231 | benign_repo=BENIGN_REPO, 232 | 
output_dir=MODULE_PATH, 233 | ) 234 | 235 | if args.accept: 236 | if check_if_samples_exist(directory_to_check=SAMPLE_PATH) or args.force: 237 | for temp_sample_path, malware_repo in zip( 238 | TEMP_SAMPLE_PATHS, 239 | DEFAULT_MALWARE_REPOS, 240 | ): 241 | print( 242 | f"[*] Attempting to Download {temp_sample_path} Samples & Place in Temp Directory", 243 | ) 244 | download_specific_git_repo_directory( 245 | temp_path=temp_sample_path, 246 | source_repo=malware_repo, 247 | ) 248 | print("[*] Attempting to Unzip Samples") 249 | unzip_samples( 250 | temp_sample_path=temp_sample_path, 251 | sample_path=SAMPLE_PATH, 252 | ) 253 | print("[*] Attempting to Rename Files to SHA256 Hash") 254 | rename_samples_to_sha256_hash(sample_path=SAMPLE_PATH) 255 | print("[*] Attempting Clean Up") 256 | clean_up_temp_samples_dir(directory_to_remove=temp_sample_path) 257 | 258 | print("[*] Attempting to Download Ember Model") 259 | download_specific_github_file( 260 | source_file_url=EMBER_MODEL_PATH, 261 | filename="ember_model.txt.gz", 262 | storage_directory=UTIL_PATH, 263 | ) 264 | print("[+] Success - Ember Model Downloaded") 265 | print("[*] Attempting to generate example benign strings output") 266 | generate_example_benign_strings_output( 267 | benign_repo=BENIGN_REPO, 268 | output_dir=MODULE_PATH, 269 | ) 270 | print("[+] Success - Example Strings Output Generated") 271 | 272 | else: 273 | print( 274 | "[-] It looks like there is something in your samples " 275 | "directory (malware_rl/envs/utils/samples) already, aborting " 276 | "download. 
Use the --force flag to continue download", 277 | ) 278 | -------------------------------------------------------------------------------- /malware_rl/__init__.py: -------------------------------------------------------------------------------- 1 | from gym.envs.registration import register 2 | from sklearn.model_selection import train_test_split 3 | 4 | from malware_rl.envs.utils import interface 5 | 6 | # create a holdout set 7 | sha256 = interface.get_available_sha256() 8 | sha256_train, sha256_holdout = train_test_split(sha256, test_size=40) 9 | 10 | MAXTURNS = 50 11 | 12 | register( 13 | id="malconv-train-v0", 14 | entry_point="malware_rl.envs:MalConvEnv", 15 | kwargs={ 16 | "random_sample": True, 17 | "maxturns": MAXTURNS, 18 | "sha256list": sha256_train, 19 | }, 20 | ) 21 | 22 | register( 23 | id="malconv-test-v0", 24 | entry_point="malware_rl.envs:MalConvEnv", 25 | kwargs={ 26 | "random_sample": False, 27 | "maxturns": MAXTURNS, 28 | "sha256list": sha256_holdout, 29 | }, 30 | ) 31 | 32 | register( 33 | id="ember-train-v0", 34 | entry_point="malware_rl.envs:EmberEnv", 35 | kwargs={ 36 | "random_sample": True, 37 | "maxturns": MAXTURNS, 38 | "sha256list": sha256_train, 39 | }, 40 | ) 41 | 42 | register( 43 | id="ember-test-v0", 44 | entry_point="malware_rl.envs:EmberEnv", 45 | kwargs={ 46 | "random_sample": False, 47 | "maxturns": MAXTURNS, 48 | "sha256list": sha256_holdout, 49 | }, 50 | ) 51 | 52 | register( 53 | id="sorel-train-v0", 54 | entry_point="malware_rl.envs:SorelEnv", 55 | kwargs={ 56 | "random_sample": True, 57 | "maxturns": MAXTURNS, 58 | "sha256list": sha256_train, 59 | }, 60 | ) 61 | 62 | register( 63 | id="sorel-test-v0", 64 | entry_point="malware_rl.envs:SorelEnv", 65 | kwargs={ 66 | "random_sample": False, 67 | "maxturns": MAXTURNS, 68 | "sha256list": sha256_holdout, 69 | }, 70 | ) 71 | -------------------------------------------------------------------------------- /malware_rl/envs/__init__.py: 
-------------------------------------------------------------------------------- 1 | from malware_rl.envs import utils 2 | from malware_rl.envs.ember_gym import EmberEnv 3 | from malware_rl.envs.malconv_gym import MalConvEnv 4 | from malware_rl.envs.sorel_gym import SorelEnv 5 | -------------------------------------------------------------------------------- /malware_rl/envs/controls/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bfilar/malware_rl/300f47ff2240132a449277283807426d34271993/malware_rl/envs/controls/__init__.py -------------------------------------------------------------------------------- /malware_rl/envs/controls/good_strings/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bfilar/malware_rl/300f47ff2240132a449277283807426d34271993/malware_rl/envs/controls/good_strings/.gitkeep -------------------------------------------------------------------------------- /malware_rl/envs/controls/modifier.py: -------------------------------------------------------------------------------- 1 | import array 2 | import json 3 | import os 4 | import random 5 | import subprocess 6 | import sys 7 | import tempfile 8 | from os import listdir 9 | from os.path import isfile, join 10 | 11 | import lief 12 | 13 | module_path = os.path.split(os.path.abspath(sys.modules[__name__].__file__))[0] 14 | 15 | COMMON_SECTION_NAMES = ( 16 | open( 17 | os.path.join( 18 | module_path, 19 | "section_names.txt", 20 | ), 21 | ) 22 | .read() 23 | .rstrip() 24 | .split("\n") 25 | ) 26 | COMMON_IMPORTS = json.load( 27 | open(os.path.join(module_path, "small_dll_imports.json")), 28 | ) 29 | 30 | 31 | class ModifyBinary: 32 | def __init__(self, bytez): 33 | self.bytez = bytez 34 | self.trusted_path = module_path + "/trusted/" 35 | self.good_str_path = module_path + "/good_strings/" 36 | 37 | def _randomly_select_trusted_file(self): 38 | 
return random.choice( 39 | [ 40 | join(self.trusted_path, f) 41 | for f in listdir(self.trusted_path) 42 | if (f != ".gitkeep") and (isfile(join(self.trusted_path, f))) 43 | ], 44 | ) 45 | 46 | def _randomly_select_good_strings(self): 47 | good_strings = random.choice( 48 | [ 49 | join(self.good_str_path, f) 50 | for f in listdir(self.good_str_path) 51 | if (f != ".gitkeep") and (isfile(join(self.good_str_path, f))) 52 | ], 53 | ) 54 | 55 | with open(good_strings) as f: 56 | strings = f.read() 57 | 58 | return strings 59 | 60 | def _random_length(self): 61 | return 2 ** random.randint(5, 8) 62 | 63 | def _search_cave( 64 | self, 65 | name, 66 | body, 67 | file_offset, 68 | vaddr, 69 | cave_size=128, 70 | _bytes=b"\x00", 71 | ): 72 | found_caves = [] 73 | null_count = 0 74 | size = len(body) 75 | 76 | for offset in range(size): 77 | byte = body[offset] 78 | check = False 79 | 80 | if byte in _bytes: 81 | null_count += 1 82 | else: 83 | check = True 84 | 85 | if offset == size - 1: 86 | check = True 87 | offset += 1 88 | 89 | if check: 90 | if null_count >= cave_size: 91 | cave_start = file_offset + offset - null_count 92 | cave_end = file_offset + offset 93 | cave_size = null_count 94 | found_caves.append([cave_start, cave_end, cave_size]) 95 | null_count = 0 96 | return found_caves 97 | 98 | def _binary_to_bytez(self, binary, imports=False): 99 | # Write modified binary to disk 100 | builder = lief.PE.Builder(binary) 101 | builder.build_imports(imports) 102 | builder.build() 103 | 104 | self.bytez = array.array("B", builder.get_build()).tobytes() 105 | return self.bytez 106 | 107 | def rename_section(self): 108 | binary = lief.PE.parse(list(self.bytez)) 109 | targeted_section = random.choice(binary.sections) 110 | targeted_section.name = random.choice(COMMON_SECTION_NAMES)[:5] 111 | 112 | self.bytez = self._binary_to_bytez(binary) 113 | return self.bytez 114 | 115 | def add_bytes_to_section_cave(self): 116 | caves = [] 117 | binary = lief.PE.parse(list(self.bytez)) 
118 | base_addr = binary.optional_header.imagebase 119 | for section in binary.sections: 120 | section_offset = section.pointerto_raw_data 121 | vaddr = section.virtual_address + base_addr 122 | body = bytearray(section.content) 123 | 124 | if section.sizeof_raw_data > section.virtual_size: 125 | body.extend( 126 | list(b"\x00" * (section.sizeof_raw_data - section.virtual_size)), 127 | ) 128 | 129 | caves.extend( 130 | self._search_cave( 131 | section.name, 132 | body, 133 | section_offset, 134 | vaddr, 135 | ), 136 | ) 137 | 138 | if caves: 139 | random_selected_cave = random.choice(caves) 140 | upper = random.randrange(256) 141 | add_bytes = bytearray( 142 | random.randint(0, upper) for _ in range(random_selected_cave[-1]) 143 | ) 144 | self.bytez = ( 145 | self.bytez[: random_selected_cave[0]] 146 | + add_bytes 147 | + self.bytez[random_selected_cave[1] :] 148 | ) 149 | 150 | return self.bytez 151 | 152 | def modify_machine_type(self): 153 | binary = lief.PE.parse(list(self.bytez)) 154 | binary.header.machine = random.choice( 155 | [ 156 | lief.PE.MACHINE_TYPES.AMD64, 157 | lief.PE.MACHINE_TYPES.IA64, 158 | lief.PE.MACHINE_TYPES.ARM64, 159 | lief.PE.MACHINE_TYPES.POWERPC, 160 | ], 161 | ) 162 | 163 | self.bytez = self._binary_to_bytez(binary) 164 | 165 | return self.bytez 166 | 167 | def modify_timestamp(self): 168 | binary = lief.PE.parse(list(self.bytez)) 169 | binary.header.time_date_stamps = random.choice( 170 | [ 171 | 0, 172 | 868967292, 173 | 993636360, 174 | 587902357, 175 | 872078556, 176 | ], 177 | ) 178 | 179 | self.bytez = self._binary_to_bytez(binary) 180 | 181 | return self.bytez 182 | 183 | def pad_overlay(self): 184 | byte_pattern = random.choice([i for i in range(256)]) 185 | overlay = bytearray([byte_pattern] * 100000) 186 | self.bytez += overlay 187 | 188 | return self.bytez 189 | 190 | def append_benign_data_overlay(self): 191 | random_benign_file = self._randomly_select_trusted_file() 192 | benign_binary = lief.PE.parse(random_benign_file) 
193 | benign_binary_section_content = benign_binary.get_section( 194 | ".text", 195 | ).content 196 | overlay = bytearray(benign_binary_section_content) 197 | self.bytez += overlay 198 | 199 | return self.bytez 200 | 201 | def append_benign_binary_overlay(self): 202 | random_benign_file = self._randomly_select_trusted_file() 203 | 204 | with open(random_benign_file, "rb") as f: 205 | benign_binary = f.read() 206 | self.bytez += benign_binary 207 | 208 | return self.bytez 209 | 210 | def add_section_benign_data(self): 211 | random_benign_file = self._randomly_select_trusted_file() 212 | benign_binary = lief.PE.parse(random_benign_file) 213 | benign_binary_section_content = benign_binary.get_section( 214 | ".text", 215 | ).content 216 | 217 | binary = lief.PE.parse(list(self.bytez)) 218 | 219 | current_section_names = [section.name for section in binary.sections] 220 | available_section_names = list( 221 | set(COMMON_SECTION_NAMES) - set(current_section_names), 222 | ) 223 | section = lief.PE.Section(random.choice(available_section_names)) 224 | section.content = benign_binary_section_content 225 | binary.add_section(section, lief.PE.SECTION_TYPES.DATA) 226 | 227 | self.bytez = self._binary_to_bytez(binary) 228 | return self.bytez 229 | 230 | def add_section_strings(self): 231 | good_strings = self._randomly_select_good_strings() 232 | binary = lief.PE.parse(list(self.bytez)) 233 | 234 | current_section_names = [section.name for section in binary.sections] 235 | available_section_names = list( 236 | set(COMMON_SECTION_NAMES) - set(current_section_names), 237 | ) 238 | section = lief.PE.Section(random.choice(available_section_names)) 239 | section.content = [ord(c) for c in good_strings] 240 | binary.add_section(section, lief.PE.SECTION_TYPES.DATA) 241 | 242 | self.bytez = self._binary_to_bytez(binary) 243 | return self.bytez 244 | 245 | def add_strings_to_overlay(self): 246 | """ 247 | Open a txt file of strings from low scoring binaries. 
248 | https://skylightcyber.com/2019/07/18/cylance-i-kill-you/ 249 | """ 250 | good_strings = self._randomly_select_good_strings() 251 | self.bytez += bytes(good_strings, encoding="ascii") 252 | 253 | return self.bytez 254 | 255 | def add_imports(self): 256 | binary = lief.PE.parse(list(self.bytez)) 257 | 258 | # draw a library at random 259 | libname = random.choice(list(COMMON_IMPORTS.keys())) 260 | funcname = random.choice(list(COMMON_IMPORTS[libname])) 261 | lowerlibname = libname.lower() 262 | 263 | # find this lib in the imports, if it exists 264 | lib = None 265 | for im in binary.imports: 266 | if im.name.lower() == lowerlibname: 267 | lib = im 268 | break 269 | 270 | if lib is None: 271 | # add a new library 272 | lib = binary.add_library(libname) 273 | 274 | # get current names 275 | names = {e.name for e in lib.entries} 276 | if funcname not in names: 277 | lib.add_entry(funcname) 278 | 279 | self.bytez = self._binary_to_bytez(binary, imports=True) 280 | 281 | return self.bytez 282 | 283 | def remove_debug(self): 284 | binary = lief.PE.parse(list(self.bytez)) 285 | 286 | if binary.has_debug: 287 | for i, e in enumerate(binary.data_directories): 288 | if e.type == lief.PE.DATA_DIRECTORY.DEBUG: 289 | e.rva = 0 290 | e.size = 0 291 | self.bytez = self._binary_to_bytez(binary) 292 | return self.bytez 293 | # no debug found 294 | return self.bytez 295 | 296 | def modify_optional_header(self): 297 | binary = lief.PE.parse(list(self.bytez)) 298 | 299 | oh = { 300 | "major_linker_version": [2, 6, 7, 9, 11, 14], 301 | "minor_linker_version": [0, 16, 20, 22, 25], 302 | "major_operating_system_version": [4, 5, 6, 10], 303 | "minor_operating_system_version": [0, 1, 3], 304 | "major_image_version": [0, 1, 5, 6, 10], 305 | "minor_image_version": [0, 1, 3], 306 | } 307 | 308 | key = random.choice(list(oh.keys())) 309 | 310 | modified_val = random.choice(oh[key]) 311 | binary.optional_header.__setattr__(key, modified_val) 312 | 313 | self.bytez = 
self._binary_to_bytez(binary) 314 | return self.bytez 315 | 316 | def break_optional_header_checksum(self): 317 | binary = lief.PE.parse(list(self.bytez)) 318 | binary.optional_header.checksum = 0 319 | self.bytez = self._binary_to_bytez(binary) 320 | return self.bytez 321 | 322 | def upx_unpack(self): 323 | # dump bytez to a temporary file 324 | tmpfilename = os.path.join( 325 | tempfile._get_default_tempdir(), 326 | next(tempfile._get_candidate_names()), 327 | ) 328 | 329 | with open(tmpfilename, "wb") as outfile: 330 | outfile.write(self.bytez) 331 | 332 | with open(os.devnull, "w") as DEVNULL: 333 | retcode = subprocess.call( 334 | ["upx", tmpfilename, "-d", "-o", tmpfilename + "_unpacked"], 335 | stdout=DEVNULL, 336 | stderr=DEVNULL, 337 | ) 338 | 339 | os.unlink(tmpfilename) 340 | 341 | if retcode == 0: # successfully unpacked 342 | with open(tmpfilename + "_unpacked", "rb") as result: 343 | self.bytez = result.read() 344 | 345 | os.unlink(tmpfilename + "_unpacked") 346 | 347 | return self.bytez 348 | 349 | def upx_pack(self): 350 | # tested with UPX 3.94 351 | # WARNING: upx compression only works on binaries over 100KB 352 | tmpfilename = os.path.join( 353 | tempfile._get_default_tempdir(), 354 | next(tempfile._get_candidate_names()), 355 | ) 356 | 357 | # dump bytez to a temporary file 358 | with open(tmpfilename, "wb") as outfile: 359 | outfile.write(self.bytez) 360 | 361 | options = ["--force", "--overlay=copy"] 362 | compression_level = random.randint(1, 9) 363 | options += [f"-{compression_level}"] 364 | options += [f"--compress-exports={random.randint(0, 1)}"] 365 | options += [f"--compress-icons={random.randint(0, 3)}"] 366 | options += [f"--compress-resources={random.randint(0, 1)}"] 367 | options += [f"--strip-relocs={random.randint(0, 1)}"] 368 | 369 | with open(os.devnull, "w") as DEVNULL: 370 | retcode = subprocess.call( 371 | ["upx"] + options + [tmpfilename, "-o", tmpfilename + "_packed"], 372 | stdout=DEVNULL, 373 | stderr=DEVNULL, 374 | ) 375
| 376 | os.unlink(tmpfilename) 377 | 378 | if retcode == 0: # successfully packed 379 | 380 | with open(tmpfilename + "_packed", "rb") as infile: 381 | self.bytez = infile.read() 382 | 383 | os.unlink(tmpfilename + "_packed") 384 | 385 | return self.bytez 386 | 387 | 388 | def modify_sample(bytez, action): 389 | bytez = ModifyBinary(bytez).__getattribute__(action)() 390 | return bytez 391 | 392 | 393 | ACTION_TABLE = { 394 | "modify_machine_type": "modify_machine_type", 395 | "pad_overlay": "pad_overlay", 396 | "append_benign_data_overlay": "append_benign_data_overlay", 397 | "append_benign_binary_overlay": "append_benign_binary_overlay", 398 | "add_bytes_to_section_cave": "add_bytes_to_section_cave", 399 | "add_section_strings": "add_section_strings", 400 | "add_section_benign_data": "add_section_benign_data", 401 | "add_strings_to_overlay": "add_strings_to_overlay", 402 | "add_imports": "add_imports", 403 | "rename_section": "rename_section", 404 | "remove_debug": "remove_debug", 405 | "modify_optional_header": "modify_optional_header", 406 | "modify_timestamp": "modify_timestamp", 407 | "break_optional_header_checksum": "break_optional_header_checksum", 408 | "upx_unpack": "upx_unpack", 409 | "upx_pack": "upx_pack", 410 | } 411 | 412 | if __name__ == "__main__": 413 | # use for testing/debugging actions 414 | import hashlib 415 | 416 | from IPython import embed 417 | 418 | # filename = '../utils/samples/e090668cfbbe44474cc979f09c1efe82a644a351c5b1a2e16009be273118e053' # upx packed sample 419 | filename = "../utils/samples/7a5d1bb166c07ed101f2ee9cb43b3a8ce0d90d52788a0d9791a040d2cdcc8057" 420 | with open(filename, "rb") as f: 421 | bytez = f.read() 422 | 423 | m = hashlib.sha256() 424 | m.update(bytez) 425 | print(f"original hash: {m.hexdigest()}") 426 | 427 | action = "upx_pack" 428 | bytez = modify_sample(bytez, action) 429 | 430 | m = hashlib.sha256() 431 | m.update(bytez) 432 | print(f"modified hash: {m.hexdigest()}") 433 | 434 | embed() 435 | 
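modifier.py's `add_bytes_to_section_cave` relies on a `_search_cave` helper that is defined outside this excerpt. As a rough standalone sketch of the idea — scanning a section body for runs of null bytes ("code caves") large enough to absorb padding — one might write something like the function below. `find_null_runs` and its `min_size` parameter are hypothetical illustrations, not the repo's actual implementation:

```python
def find_null_runs(body, base_offset, min_size=128):
    """Return (start, end, size) file offsets of null-byte runs >= min_size.

    Hypothetical sketch of cave searching; not the repo's _search_cave.
    """
    caves = []
    run_start = None
    for i, b in enumerate(body):
        if b == 0:
            if run_start is None:
                run_start = i  # a run of nulls begins here
        else:
            if run_start is not None and i - run_start >= min_size:
                caves.append(
                    (base_offset + run_start, base_offset + i, i - run_start),
                )
            run_start = None
    # handle a run that extends to the end of the section body
    if run_start is not None and len(body) - run_start >= min_size:
        caves.append(
            (base_offset + run_start, base_offset + len(body), len(body) - run_start),
        )
    return caves


# toy section body: 10 data bytes, 200 nulls, 5 data bytes
buf = b"\x41" * 10 + b"\x00" * 200 + b"\x42" * 5
print(find_null_runs(buf, base_offset=0))  # [(10, 210, 200)]
```

The (start, end, size) tuple shape mirrors how `add_bytes_to_section_cave` indexes its chosen cave (`random_selected_cave[0]`, `[1]`, and `[-1]`) when splicing random bytes into `self.bytez`.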
-------------------------------------------------------------------------------- /malware_rl/envs/controls/section_names.txt: -------------------------------------------------------------------------------- 1 | .text 2 | .rsrc 3 | .reloc 4 | .data 5 | .rdata 6 | .idata 7 | .tls 8 | .brdata 9 | .bss 10 | .pdata 11 | .xdata 12 | DATA 13 | CODE 14 | BSS 15 | rdata 16 | .rmnet 17 | .CRT 18 | .edata 19 | .extrel 20 | .sdata 21 | .code 22 | .vmp0 23 | .itext 24 | .data2 25 | .data1 26 | .vmp1 27 | .adata 28 | .gfids 29 | .data3 30 | INIT 31 | .extjmp 32 | .didat 33 | .didata 34 | PAGE 35 | .orpc 36 | vryeypb 37 | camztlf 38 | tkjdelw 39 | dgbwqbp 40 | odyqxub 41 | .tsuarch 42 | .tsustub 43 | .textbss 44 | .sxdata 45 | .zrdata 46 | qxejodg 47 | .data-co 48 | .text-co 49 | gumrkvc 50 | rqvmxkb 51 | kakxcjb 52 | .cdata 53 | ExeS 54 | .rrdata 55 | -------------------------------------------------------------------------------- /malware_rl/envs/controls/trusted/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bfilar/malware_rl/300f47ff2240132a449277283807426d34271993/malware_rl/envs/controls/trusted/.gitkeep -------------------------------------------------------------------------------- /malware_rl/envs/ember_gym.py: -------------------------------------------------------------------------------- 1 | import hashlib 2 | import os 3 | import random 4 | import sys 5 | from collections import OrderedDict 6 | 7 | import gym 8 | import numpy as np 9 | from gym import spaces 10 | from malware_rl.envs.controls import modifier 11 | from malware_rl.envs.utils import ember, interface 12 | 13 | random.seed(0) 14 | module_path = os.path.split(os.path.abspath(sys.modules[__name__].__file__))[0] 15 | 16 | ACTION_LOOKUP = {i: act for i, act in enumerate(modifier.ACTION_TABLE.keys())} 17 | 18 | ember_model = ember.EmberModel() 19 | malicious_threshold = ember_model.threshold 20 | 21 | 22 | class EmberEnv(gym.Env): 23 
| """Create MalConv gym interface""" 24 | 25 | metadata = {"render.modes": ["human"]} 26 | 27 | def __init__( 28 | self, 29 | sha256list, 30 | random_sample=True, 31 | maxturns=5, 32 | output_path="data/evaded/ember", 33 | ): 34 | super().__init__() 35 | self.available_sha256 = sha256list 36 | self.action_space = spaces.Discrete(len(ACTION_LOOKUP)) 37 | observation_high = np.finfo(np.float32).max 38 | self.observation_space = spaces.Box( 39 | low=-observation_high, 40 | high=observation_high, 41 | shape=(2381,), 42 | dtype=np.float32, 43 | ) 44 | self.maxturns = maxturns 45 | self.feature_extractor = ember_model.extract 46 | self.output_path = output_path 47 | self.random_sample = random_sample 48 | self.history = OrderedDict() 49 | self.sample_iteration_index = 0 50 | 51 | self.output_path = os.path.join( 52 | os.path.dirname( 53 | os.path.dirname( 54 | os.path.dirname( 55 | os.path.abspath(__file__), 56 | ), 57 | ), 58 | ), 59 | output_path, 60 | ) 61 | 62 | def step(self, action_ix): 63 | # Execute one time step within the environment 64 | self.turns += 1 65 | self._take_action(action_ix) 66 | self.observation_space = self.feature_extractor(self.bytez) 67 | self.score = ember_model.predict_sample(self.observation_space) 68 | 69 | if self.score < malicious_threshold: 70 | reward = 10.0 71 | episode_over = True 72 | self.history[self.sha256]["evaded"] = True 73 | self.history[self.sha256]["reward"] = reward 74 | 75 | # save off file to evasion directory 76 | m = hashlib.sha256() 77 | m.update(self.bytez) 78 | sha256 = m.hexdigest() 79 | evade_path = os.path.join(self.output_path, sha256) 80 | 81 | with open(evade_path, "wb") as out: 82 | out.write(self.bytez) 83 | 84 | self.history[self.sha256]["evade_path"] = evade_path 85 | 86 | elif self.turns >= self.maxturns: 87 | # game over - max turns hit 88 | reward = self.original_score - self.score 89 | episode_over = True 90 | self.history[self.sha256]["evaded"] = False 91 | self.history[self.sha256]["reward"] = reward 
92 | else: 93 | reward = self.original_score - self.score 94 | episode_over = False 95 | 96 | if episode_over: 97 | print(f"Episode over: reward = {reward}") 98 | 99 | return self.observation_space, reward, episode_over, self.history[self.sha256] 100 | 101 | def _take_action(self, action_ix): 102 | action = ACTION_LOOKUP[action_ix] 103 | self.history[self.sha256]["actions"].append(action) 104 | self.bytez = modifier.modify_sample(self.bytez, action) 105 | 106 | def reset(self): 107 | # Reset the state of the environment to an initial state 108 | self.turns = 0 109 | while True: 110 | # grab a new sample (TODO) 111 | if self.random_sample: 112 | self.sha256 = random.choice(self.available_sha256) 113 | else: 114 | self.sha256 = self.available_sha256[ 115 | self.sample_iteration_index % len(self.available_sha256) 116 | ] 117 | self.sample_iteration_index += 1 118 | 119 | self.history[self.sha256] = {"actions": [], "evaded": False} 120 | self.bytez = interface.fetch_file( 121 | os.path.join( 122 | module_path, 123 | "utils/samples/", 124 | ) 125 | + self.sha256, 126 | ) 127 | 128 | self.observation_space = self.feature_extractor(self.bytez) 129 | self.original_score = ember_model.predict_sample( 130 | self.observation_space, 131 | ) 132 | if self.original_score < malicious_threshold: 133 | # already labeled benign, skip 134 | continue 135 | 136 | break 137 | print(f"Sample: {self.sha256}") 138 | return self.observation_space 139 | 140 | def render(self, mode="human", close=False): 141 | # Render the environment to the screen 142 | pass 143 | -------------------------------------------------------------------------------- /malware_rl/envs/malconv_gym.py: -------------------------------------------------------------------------------- 1 | import hashlib 2 | import os 3 | import random 4 | import sys 5 | from collections import OrderedDict 6 | 7 | import gym 8 | import numpy as np 9 | from gym import spaces 10 | from malware_rl.envs.controls import modifier 11 | from 
malware_rl.envs.utils import interface, malconv 12 | 13 | random.seed(0) 14 | module_path = os.path.split(os.path.abspath(sys.modules[__name__].__file__))[0] 15 | 16 | ACTION_LOOKUP = {i: act for i, act in enumerate(modifier.ACTION_TABLE.keys())} 17 | 18 | mc = malconv.MalConv() 19 | malicious_threshold = mc.malicious_threshold 20 | 21 | 22 | class MalConvEnv(gym.Env): 23 | """Create MalConv gym interface""" 24 | 25 | metadata = {"render.modes": ["human"]} 26 | 27 | def __init__( 28 | self, 29 | sha256list, 30 | random_sample=True, 31 | maxturns=5, 32 | output_path="data/evaded/malconv", 33 | ): 34 | super().__init__() 35 | self.available_sha256 = sha256list 36 | self.action_space = spaces.Discrete(len(ACTION_LOOKUP)) 37 | self.observation_space = spaces.Box( 38 | low=0, 39 | high=256, 40 | shape=(1048576,), 41 | dtype=np.int16, 42 | ) 43 | self.maxturns = maxturns 44 | self.feature_extractor = mc.extract 45 | self.output_path = output_path 46 | self.random_sample = random_sample 47 | self.history = OrderedDict() 48 | self.sample_iteration_index = 0 49 | 50 | self.output_path = os.path.join( 51 | os.path.dirname( 52 | os.path.dirname( 53 | os.path.dirname( 54 | os.path.abspath(__file__), 55 | ), 56 | ), 57 | ), 58 | output_path, 59 | ) 60 | 61 | def step(self, action_ix): 62 | # Execute one time step within the environment 63 | self.turns += 1 64 | self._take_action(action_ix) 65 | self.observation_space = self.feature_extractor(self.bytez) 66 | self.score = mc.predict_sample(self.observation_space) 67 | 68 | if self.score < malicious_threshold: 69 | reward = 10.0 70 | episode_over = True 71 | self.history[self.sha256]["evaded"] = True 72 | self.history[self.sha256]["reward"] = reward 73 | 74 | # save off file to evasion directory 75 | m = hashlib.sha256() 76 | m.update(self.bytez) 77 | sha256 = m.hexdigest() 78 | evade_path = os.path.join(self.output_path, sha256) 79 | 80 | with open(evade_path, "wb") as out: 81 | out.write(self.bytez) 82 | 83 | 
self.history[self.sha256]["evade_path"] = evade_path 84 | 85 | elif self.turns >= self.maxturns: 86 | # game over - max turns hit 87 | reward = self.original_score - self.score 88 | episode_over = True 89 | self.history[self.sha256]["evaded"] = False 90 | self.history[self.sha256]["reward"] = reward 91 | 92 | else: 93 | reward = float(self.original_score - self.score) 94 | episode_over = False 95 | 96 | if episode_over: 97 | print(f"Episode over: reward = {reward}") 98 | 99 | return self.observation_space, reward, episode_over, self.history[self.sha256] 100 | 101 | def _take_action(self, action_ix): 102 | action = ACTION_LOOKUP[action_ix] 103 | # print("ACTION:", action) 104 | self.history[self.sha256]["actions"].append(action) 105 | self.bytez = modifier.modify_sample(self.bytez, action) 106 | 107 | def reset(self): 108 | # Reset the state of the environment to an initial state 109 | self.turns = 0 110 | while True: 111 | # grab a new sample (TODO) 112 | if self.random_sample: 113 | self.sha256 = random.choice(self.available_sha256) 114 | else: 115 | self.sha256 = self.available_sha256[ 116 | self.sample_iteration_index % len(self.available_sha256) 117 | ] 118 | self.sample_iteration_index += 1 119 | 120 | self.history[self.sha256] = {"actions": [], "evaded": False} 121 | self.bytez = interface.fetch_file( 122 | os.path.join( 123 | module_path, 124 | "utils/samples/", 125 | ) 126 | + self.sha256, 127 | ) 128 | 129 | self.observation_space = self.feature_extractor(self.bytez) 130 | self.original_score = mc.predict_sample(self.observation_space) 131 | if self.original_score < malicious_threshold: 132 | # already labeled benign, skip 133 | continue 134 | 135 | break 136 | print(f"Sample: {self.sha256}") 137 | 138 | return self.observation_space 139 | 140 | def render(self, mode="human", close=False): 141 | # Render the environment to the screen 142 | pass 143 | -------------------------------------------------------------------------------- 
/malware_rl/envs/sorel_gym.py: -------------------------------------------------------------------------------- 1 | import hashlib 2 | import os 3 | import random 4 | import sys 5 | from collections import OrderedDict 6 | 7 | import gym 8 | import numpy as np 9 | from gym import spaces 10 | from malware_rl.envs.controls import modifier 11 | from malware_rl.envs.utils import interface, sorel 12 | 13 | random.seed(0) 14 | module_path = os.path.split(os.path.abspath(sys.modules[__name__].__file__))[0] 15 | 16 | ACTION_LOOKUP = {i: act for i, act in enumerate(modifier.ACTION_TABLE.keys())} 17 | 18 | sorel_model = sorel.SorelModel() 19 | malicious_threshold = sorel_model.threshold 20 | 21 | 22 | class SorelEnv(gym.Env): 23 | """Creates the Sorel Gym Interface""" 24 | 25 | metadata = {"render.modes": ["human"]} 26 | 27 | def __init__( 28 | self, 29 | sha256list, 30 | random_sample=True, 31 | maxturns=5, 32 | output_path="data/evaded/sorel", 33 | ): 34 | super().__init__() 35 | self.available_sha256 = sha256list 36 | self.action_space = spaces.Discrete(len(ACTION_LOOKUP)) 37 | observation_high = np.finfo(np.float32).max 38 | self.observation_space = spaces.Box( 39 | low=-observation_high, 40 | high=observation_high, 41 | shape=(2381,), 42 | dtype=np.float32, 43 | ) 44 | self.maxturns = maxturns 45 | self.feature_extractor = sorel_model.extract 46 | self.output_path = output_path 47 | self.random_sample = random_sample 48 | self.history = OrderedDict() 49 | self.sample_iteration_index = 0 50 | 51 | self.output_path = os.path.join( 52 | os.path.dirname( 53 | os.path.dirname( 54 | os.path.dirname( 55 | os.path.abspath(__file__), 56 | ), 57 | ), 58 | ), 59 | output_path, 60 | ) 61 | 62 | def step(self, action_ix): 63 | # Execute one time step within the environment 64 | self.turns += 1 65 | self._take_action(action_ix) 66 | self.observation_space = self.feature_extractor(self.bytez) 67 | self.score = sorel_model.predict_sample(self.observation_space) 68 | 69 | if self.score < 
malicious_threshold: 70 | reward = 10.0 71 | episode_over = True 72 | self.history[self.sha256]["evaded"] = True 73 | self.history[self.sha256]["reward"] = reward 74 | 75 | # save off file to evasion directory 76 | m = hashlib.sha256() 77 | m.update(self.bytez) 78 | sha256 = m.hexdigest() 79 | evade_path = os.path.join(self.output_path, sha256) 80 | 81 | with open(evade_path, "wb") as out: 82 | out.write(self.bytez) 83 | 84 | self.history[self.sha256]["evade_path"] = evade_path 85 | 86 | elif self.turns >= self.maxturns: 87 | # game over - max turns hit 88 | reward = self.original_score - self.score 89 | episode_over = True 90 | self.history[self.sha256]["evaded"] = False 91 | self.history[self.sha256]["reward"] = reward 92 | else: 93 | reward = self.original_score - self.score 94 | episode_over = False 95 | 96 | if episode_over: 97 | print(f"Episode over: reward = {reward}") 98 | 99 | return self.observation_space, reward, episode_over, self.history[self.sha256] 100 | 101 | def _take_action(self, action_ix): 102 | action = ACTION_LOOKUP[action_ix] 103 | self.history[self.sha256]["actions"].append(action) 104 | self.bytez = modifier.modify_sample(self.bytez, action) 105 | 106 | def reset(self): 107 | # Reset the state of the environment to an initial state 108 | self.turns = 0 109 | while True: 110 | # grab a new sample (TODO) 111 | if self.random_sample: 112 | self.sha256 = random.choice(self.available_sha256) 113 | else: 114 | self.sha256 = self.available_sha256[ 115 | self.sample_iteration_index % len(self.available_sha256) 116 | ] 117 | self.sample_iteration_index += 1 118 | 119 | self.history[self.sha256] = {"actions": [], "evaded": False} 120 | self.bytez = interface.fetch_file( 121 | os.path.join( 122 | module_path, 123 | "utils/samples/", 124 | ) 125 | + self.sha256, 126 | ) 127 | 128 | self.observation_space = self.feature_extractor(self.bytez) 129 | self.original_score = sorel_model.predict_sample( 130 | self.observation_space, 131 | ) 132 | if 
self.original_score < malicious_threshold: 133 | # already labeled benign, skip 134 | continue 135 | 136 | break 137 | print(f"Sample: {self.sha256}") 138 | return self.observation_space 139 | 140 | def render(self, mode="human", close=False): 141 | # Render the environment to the screen 142 | pass 143 | -------------------------------------------------------------------------------- /malware_rl/envs/utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bfilar/malware_rl/300f47ff2240132a449277283807426d34271993/malware_rl/envs/utils/__init__.py -------------------------------------------------------------------------------- /malware_rl/envs/utils/ember.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | """ Extracts some basic features from PE files. Many of the features 3 | implemented have been used in previously published works. For more information, 4 | check out the following resources: 5 | * Schultz, et al., 2001: http://128.59.14.66/sites/default/files/binaryeval-ieeesp01.pdf 6 | * Kolter and Maloof, 2006: http://www.jmlr.org/papers/volume7/kolter06a/kolter06a.pdf 7 | * Shafiq et al., 2009: https://www.researchgate.net/profile/Fauzan_Mirza/publication/242084613_A_Framework_for_Efficient_Mining_of_Structural_Information_to_Detect_Zero-Day_Malicious_Portable_Executables/links/0c96052e191668c3d5000000.pdf 8 | * Raman, 2012: http://2012.infosecsouthwest.com/files/speaker_materials/ISSW2012_Selecting_Features_to_Classify_Malware.pdf 9 | * Saxe and Berlin, 2015: https://arxiv.org/pdf/1508.03096.pdf 10 | 11 | It may be useful to do feature selection to reduce this set of features to a meaningful set 12 | for your modeling problem. 
13 | """ 14 | import hashlib 15 | import os 16 | import re 17 | import sys 18 | 19 | import lief 20 | import lightgbm as lgb 21 | import numpy as np 22 | from sklearn.feature_extraction import FeatureHasher 23 | 24 | LIEF_MAJOR, LIEF_MINOR, _ = lief.__version__.split(".") 25 | LIEF_EXPORT_OBJECT = int(LIEF_MAJOR) > 0 or ( 26 | int(LIEF_MAJOR) == 0 and int(LIEF_MINOR) >= 10 27 | ) 28 | 29 | module_path = os.path.split(os.path.abspath(sys.modules[__name__].__file__))[0] 30 | model_path = os.path.join(module_path, "ember_model.txt") 31 | 32 | 33 | class FeatureType: 34 | """Base class from which each feature type may inherit""" 35 | 36 | name = "" 37 | dim = 0 38 | 39 | def __repr__(self): 40 | return f"{self.name}({self.dim})" 41 | 42 | def raw_features(self, bytez, lief_binary): 43 | """Generate a JSON-able representation of the file""" 44 | raise (NotImplementedError) 45 | 46 | def process_raw_features(self, raw_obj): 47 | """Generate a feature vector from the raw features""" 48 | raise (NotImplementedError) 49 | 50 | def feature_vector(self, bytez, lief_binary): 51 | """Directly calculate the feature vector from the sample itself. 
This should only be implemented differently 52 | if there are significant speedups to be gained from combining the two functions.""" 53 | return self.process_raw_features(self.raw_features(bytez, lief_binary)) 54 | 55 | 56 | class ByteHistogram(FeatureType): 57 | """Byte histogram (count + non-normalized) over the entire binary file""" 58 | 59 | name = "histogram" 60 | dim = 256 61 | 62 | def __init__(self): 63 | super(FeatureType, self).__init__() 64 | 65 | def raw_features(self, bytez, lief_binary): 66 | counts = np.bincount( 67 | np.frombuffer( 68 | bytez, 69 | dtype=np.uint8, 70 | ), 71 | minlength=256, 72 | ) 73 | return counts.tolist() 74 | 75 | def process_raw_features(self, raw_obj): 76 | counts = np.array(raw_obj, dtype=np.float32) 77 | sum = counts.sum() 78 | normalized = counts / sum 79 | return normalized 80 | 81 | 82 | class ByteEntropyHistogram(FeatureType): 83 | """2d byte/entropy histogram based loosely on (Saxe and Berlin, 2015). 84 | This roughly approximates the joint probability of byte value and local entropy. 85 | See Section 2.1.1 in https://arxiv.org/pdf/1508.03096.pdf for more info. 86 | """ 87 | 88 | name = "byteentropy" 89 | dim = 256 90 | 91 | def __init__(self, step=1024, window=2048): 92 | super(FeatureType, self).__init__() 93 | self.window = window 94 | self.step = step 95 | 96 | def _entropy_bin_counts(self, block): 97 | # coarse histogram, 16 bytes per bin 98 | c = np.bincount(block >> 4, minlength=16) # 16-bin histogram 99 | p = c.astype(np.float32) / self.window 100 | wh = np.where(c)[0] 101 | H = ( 102 | np.sum( 103 | -p[wh] 104 | * np.log2( 105 | p[wh], 106 | ), 107 | ) 108 | * 2 109 | ) # * x2 b.c. 
we reduced information by half: 256 bins (8 bits) to 16 bins (4 bits) 110 | 111 | Hbin = int(H * 2) # up to 16 bins (max entropy is 8 bits) 112 | if Hbin == 16: # handle entropy = 8.0 bits 113 | Hbin = 15 114 | 115 | return Hbin, c 116 | 117 | def raw_features(self, bytez, lief_binary): 118 | output = np.zeros((16, 16), dtype=int) # np.int alias is removed in modern NumPy 119 | a = np.frombuffer(bytez, dtype=np.uint8) 120 | if a.shape[0] < self.window: 121 | Hbin, c = self._entropy_bin_counts(a) 122 | output[Hbin, :] += c 123 | else: 124 | # strided trick from here: http://www.rigtorp.se/2011/01/01/rolling-statistics-numpy.html 125 | shape = a.shape[:-1] + (a.shape[-1] - self.window + 1, self.window) 126 | strides = a.strides + (a.strides[-1],) 127 | blocks = np.lib.stride_tricks.as_strided( 128 | a, 129 | shape=shape, 130 | strides=strides, 131 | )[:: self.step, :] 132 | 133 | # from the blocks, compute histogram 134 | for block in blocks: 135 | Hbin, c = self._entropy_bin_counts(block) 136 | output[Hbin, :] += c 137 | 138 | return output.flatten().tolist() 139 | 140 | def process_raw_features(self, raw_obj): 141 | counts = np.array(raw_obj, dtype=np.float32) 142 | sum = counts.sum() 143 | normalized = counts / sum 144 | return normalized 145 | 146 | 147 | class SectionInfo(FeatureType): 148 | """Information about section names, sizes and entropy. Uses hashing trick 149 | to summarize all this section info into a feature vector.
150 | """ 151 | 152 | name = "section" 153 | dim = 5 + 50 + 50 + 50 + 50 + 50 154 | 155 | def __init__(self): 156 | super(FeatureType, self).__init__() 157 | 158 | @staticmethod 159 | def _properties(s): 160 | return [str(c).split(".")[-1] for c in s.characteristics_lists] 161 | 162 | def raw_features(self, bytez, lief_binary): 163 | if lief_binary is None: 164 | return {"entry": "", "sections": []} 165 | 166 | # properties of entry point, or if invalid, the first executable section 167 | try: 168 | entry_section = lief_binary.section_from_offset( 169 | lief_binary.entrypoint, 170 | ).name 171 | except lief.not_found: 172 | # bad entry point, let's find the first executable section 173 | entry_section = "" 174 | for s in lief_binary.sections: 175 | if ( 176 | lief.PE.SECTION_CHARACTERISTICS.MEM_EXECUTE 177 | in s.characteristics_lists 178 | ): 179 | entry_section = s.name 180 | break 181 | 182 | raw_obj = {"entry": entry_section} 183 | raw_obj["sections"] = [ 184 | { 185 | "name": s.name, 186 | "size": s.size, 187 | "entropy": s.entropy, 188 | "vsize": s.virtual_size, 189 | "props": self._properties(s), 190 | } 191 | for s in lief_binary.sections 192 | ] 193 | return raw_obj 194 | 195 | def process_raw_features(self, raw_obj): 196 | sections = raw_obj["sections"] 197 | general = [ 198 | len(sections), # total number of sections 199 | # number of sections with nonzero size 200 | sum(1 for s in sections if s["size"] == 0), 201 | # number of sections with an empty name 202 | sum(1 for s in sections if s["name"] == ""), 203 | # number of RX 204 | sum( 205 | 1 206 | for s in sections 207 | if "MEM_READ" in s["props"] and "MEM_EXECUTE" in s["props"] 208 | ), 209 | # number of W 210 | sum(1 for s in sections if "MEM_WRITE" in s["props"]), 211 | ] 212 | # gross characteristics of each section 213 | section_sizes = [(s["name"], s["size"]) for s in sections] 214 | section_sizes_hashed = ( 215 | FeatureHasher(50, input_type="pair") 216 | .transform( 217 | [ 218 | 
section_sizes, 219 | ], 220 | ) 221 | .toarray()[0] 222 | ) 223 | section_entropy = [(s["name"], s["entropy"]) for s in sections] 224 | section_entropy_hashed = ( 225 | FeatureHasher(50, input_type="pair") 226 | .transform( 227 | [ 228 | section_entropy, 229 | ], 230 | ) 231 | .toarray()[0] 232 | ) 233 | section_vsize = [(s["name"], s["vsize"]) for s in sections] 234 | section_vsize_hashed = ( 235 | FeatureHasher(50, input_type="pair") 236 | .transform( 237 | [ 238 | section_vsize, 239 | ], 240 | ) 241 | .toarray()[0] 242 | ) 243 | entry_name_hashed = ( 244 | FeatureHasher(50, input_type="string") 245 | .transform( 246 | [ 247 | raw_obj["entry"], 248 | ], 249 | ) 250 | .toarray()[0] 251 | ) 252 | characteristics = [ 253 | p for s in sections for p in s["props"] if s["name"] == raw_obj["entry"] 254 | ] 255 | characteristics_hashed = ( 256 | FeatureHasher(50, input_type="string") 257 | .transform( 258 | [ 259 | characteristics, 260 | ], 261 | ) 262 | .toarray()[0] 263 | ) 264 | 265 | return np.hstack( 266 | [ 267 | general, 268 | section_sizes_hashed, 269 | section_entropy_hashed, 270 | section_vsize_hashed, 271 | entry_name_hashed, 272 | characteristics_hashed, 273 | ], 274 | ).astype(np.float32) 275 | 276 | 277 | class ImportsInfo(FeatureType): 278 | """Information about imported libraries and functions from the 279 | import address table. Note that the total number of imported 280 | functions is contained in GeneralFileInfo. 
281 | """ 282 | 283 | name = "imports" 284 | dim = 1280 285 | 286 | def __init__(self): 287 | super(FeatureType, self).__init__() 288 | 289 | def raw_features(self, bytez, lief_binary): 290 | imports = {} 291 | if lief_binary is None: 292 | return imports 293 | 294 | for lib in lief_binary.imports: 295 | if lib.name not in imports: 296 | # libraries can be duplicated in listing, extend instead of overwrite 297 | imports[lib.name] = [] 298 | 299 | # Clipping assumes there are diminishing returns on the discriminatory power of imported functions 300 | # beyond the first 10000 characters, and this will help limit the dataset size 301 | for entry in lib.entries: 302 | if entry.is_ordinal: 303 | imports[lib.name].append("ordinal" + str(entry.ordinal)) 304 | else: 305 | imports[lib.name].append(entry.name[:10000]) 306 | 307 | return imports 308 | 309 | def process_raw_features(self, raw_obj): 310 | # unique libraries 311 | libraries = list({l.lower() for l in raw_obj.keys()}) 312 | libraries_hashed = ( 313 | FeatureHasher( 314 | 256, 315 | input_type="string", 316 | ) 317 | .transform([libraries]) 318 | .toarray()[0] 319 | ) 320 | 321 | # A string like "kernel32.dll:CreateFileMappingA" for each imported function 322 | imports = [ 323 | lib.lower() + ":" + e for lib, elist in raw_obj.items() for e in elist 324 | ] 325 | imports_hashed = ( 326 | FeatureHasher( 327 | 1024, 328 | input_type="string", 329 | ) 330 | .transform([imports]) 331 | .toarray()[0] 332 | ) 333 | 334 | # Two separate elements: libraries (alone) and fully-qualified names of imported functions 335 | return np.hstack([libraries_hashed, imports_hashed]).astype(np.float32) 336 | 337 | 338 | class ExportsInfo(FeatureType): 339 | """Information about exported functions. Note that the total number of exported 340 | functions is contained in GeneralFileInfo. 
341 | """ 342 | 343 | name = "exports" 344 | dim = 128 345 | 346 | def __init__(self): 347 | super(FeatureType, self).__init__() 348 | 349 | def raw_features(self, bytez, lief_binary): 350 | if lief_binary is None: 351 | return [] 352 | 353 | # Clipping assumes there are diminishing returns on the discriminatory power of exports beyond 354 | # the first 10000 characters, and this will help limit the dataset size 355 | if LIEF_EXPORT_OBJECT: 356 | # export is an object with .name attribute (0.10.0 and later) 357 | clipped_exports = [ 358 | export.name[:10000] for export in lief_binary.exported_functions 359 | ] 360 | else: 361 | # export is a string (LIEF 0.9.0 and earlier) 362 | clipped_exports = [ 363 | export[:10000] for export in lief_binary.exported_functions 364 | ] 365 | 366 | return clipped_exports 367 | 368 | def process_raw_features(self, raw_obj): 369 | exports_hashed = ( 370 | FeatureHasher( 371 | 128, 372 | input_type="string", 373 | ) 374 | .transform([raw_obj]) 375 | .toarray()[0] 376 | ) 377 | return exports_hashed.astype(np.float32) 378 | 379 | 380 | class GeneralFileInfo(FeatureType): 381 | """General information about the file""" 382 | 383 | name = "general" 384 | dim = 10 385 | 386 | def __init__(self): 387 | super(FeatureType, self).__init__() 388 | 389 | def raw_features(self, bytez, lief_binary): 390 | if lief_binary is None: 391 | return { 392 | "size": len(bytez), 393 | "vsize": 0, 394 | "has_debug": 0, 395 | "exports": 0, 396 | "imports": 0, 397 | "has_relocations": 0, 398 | "has_resources": 0, 399 | "has_signature": 0, 400 | "has_tls": 0, 401 | "symbols": 0, 402 | } 403 | 404 | return { 405 | "size": len(bytez), 406 | "vsize": lief_binary.virtual_size, 407 | "has_debug": int(lief_binary.has_debug), 408 | "exports": len(lief_binary.exported_functions), 409 | "imports": len(lief_binary.imported_functions), 410 | "has_relocations": int(lief_binary.has_relocations), 411 | "has_resources": int(lief_binary.has_resources), 412 | "has_signature": 
int(lief_binary.has_signature), 413 | "has_tls": int(lief_binary.has_tls), 414 | "symbols": len(lief_binary.symbols), 415 | } 416 | 417 | def process_raw_features(self, raw_obj): 418 | return np.asarray( 419 | [ 420 | raw_obj["size"], 421 | raw_obj["vsize"], 422 | raw_obj["has_debug"], 423 | raw_obj["exports"], 424 | raw_obj["imports"], 425 | raw_obj["has_relocations"], 426 | raw_obj["has_resources"], 427 | raw_obj["has_signature"], 428 | raw_obj["has_tls"], 429 | raw_obj["symbols"], 430 | ], 431 | dtype=np.float32, 432 | ) 433 | 434 | 435 | class HeaderFileInfo(FeatureType): 436 | """Machine, architecture, OS, linker and other information extracted from header""" 437 | 438 | name = "header" 439 | dim = 62 440 | 441 | def __init__(self): 442 | super(FeatureType, self).__init__() 443 | 444 | def raw_features(self, bytez, lief_binary): 445 | raw_obj = {} 446 | raw_obj["coff"] = { 447 | "timestamp": 0, 448 | "machine": "", 449 | "characteristics": [], 450 | } 451 | raw_obj["optional"] = { 452 | "subsystem": "", 453 | "dll_characteristics": [], 454 | "magic": "", 455 | "major_image_version": 0, 456 | "minor_image_version": 0, 457 | "major_linker_version": 0, 458 | "minor_linker_version": 0, 459 | "major_operating_system_version": 0, 460 | "minor_operating_system_version": 0, 461 | "major_subsystem_version": 0, 462 | "minor_subsystem_version": 0, 463 | "sizeof_code": 0, 464 | "sizeof_headers": 0, 465 | "sizeof_heap_commit": 0, 466 | } 467 | if lief_binary is None: 468 | return raw_obj 469 | 470 | raw_obj["coff"]["timestamp"] = lief_binary.header.time_date_stamps 471 | raw_obj["coff"]["machine"] = str(lief_binary.header.machine).split( 472 | ".", 473 | )[-1] 474 | raw_obj["coff"]["characteristics"] = [ 475 | str(c).split( 476 | ".", 477 | )[-1] 478 | for c in lief_binary.header.characteristics_list 479 | ] 480 | raw_obj["optional"]["subsystem"] = str( 481 | lief_binary.optional_header.subsystem, 482 | ).split(".")[-1] 483 | raw_obj["optional"]["dll_characteristics"] = [
484 | str(c).split(".")[-1] 485 | for c in lief_binary.optional_header.dll_characteristics_lists 486 | ] 487 | raw_obj["optional"]["magic"] = str(lief_binary.optional_header.magic).split( 488 | ".", 489 | )[-1] 490 | raw_obj["optional"][ 491 | "major_image_version" 492 | ] = lief_binary.optional_header.major_image_version 493 | raw_obj["optional"][ 494 | "minor_image_version" 495 | ] = lief_binary.optional_header.minor_image_version 496 | raw_obj["optional"][ 497 | "major_linker_version" 498 | ] = lief_binary.optional_header.major_linker_version 499 | raw_obj["optional"][ 500 | "minor_linker_version" 501 | ] = lief_binary.optional_header.minor_linker_version 502 | raw_obj["optional"][ 503 | "major_operating_system_version" 504 | ] = lief_binary.optional_header.major_operating_system_version 505 | raw_obj["optional"][ 506 | "minor_operating_system_version" 507 | ] = lief_binary.optional_header.minor_operating_system_version 508 | raw_obj["optional"][ 509 | "major_subsystem_version" 510 | ] = lief_binary.optional_header.major_subsystem_version 511 | raw_obj["optional"][ 512 | "minor_subsystem_version" 513 | ] = lief_binary.optional_header.minor_subsystem_version 514 | raw_obj["optional"]["sizeof_code"] = lief_binary.optional_header.sizeof_code 515 | raw_obj["optional"][ 516 | "sizeof_headers" 517 | ] = lief_binary.optional_header.sizeof_headers 518 | raw_obj["optional"][ 519 | "sizeof_heap_commit" 520 | ] = lief_binary.optional_header.sizeof_heap_commit 521 | return raw_obj 522 | 523 | def process_raw_features(self, raw_obj): 524 | return np.hstack( 525 | [ 526 | raw_obj["coff"]["timestamp"], 527 | FeatureHasher(10, input_type="string") 528 | .transform( 529 | [[raw_obj["coff"]["machine"]]], 530 | ) 531 | .toarray()[0], 532 | FeatureHasher(10, input_type="string") 533 | .transform( 534 | [raw_obj["coff"]["characteristics"]], 535 | ) 536 | .toarray()[0], 537 | FeatureHasher(10, input_type="string") 538 | .transform( 539 | [[raw_obj["optional"]["subsystem"]]], 540 | ) 
541 | .toarray()[0], 542 | FeatureHasher(10, input_type="string") 543 | .transform( 544 | [raw_obj["optional"]["dll_characteristics"]], 545 | ) 546 | .toarray()[0], 547 | FeatureHasher(10, input_type="string") 548 | .transform( 549 | [[raw_obj["optional"]["magic"]]], 550 | ) 551 | .toarray()[0], 552 | raw_obj["optional"]["major_image_version"], 553 | raw_obj["optional"]["minor_image_version"], 554 | raw_obj["optional"]["major_linker_version"], 555 | raw_obj["optional"]["minor_linker_version"], 556 | raw_obj["optional"]["major_operating_system_version"], 557 | raw_obj["optional"]["minor_operating_system_version"], 558 | raw_obj["optional"]["major_subsystem_version"], 559 | raw_obj["optional"]["minor_subsystem_version"], 560 | raw_obj["optional"]["sizeof_code"], 561 | raw_obj["optional"]["sizeof_headers"], 562 | raw_obj["optional"]["sizeof_heap_commit"], 563 | ], 564 | ).astype(np.float32) 565 | 566 | 567 | class StringExtractor(FeatureType): 568 | """Extracts strings from raw byte stream""" 569 | 570 | name = "strings" 571 | dim = 1 + 1 + 1 + 96 + 1 + 1 + 1 + 1 + 1 572 | 573 | def __init__(self): 574 | super(FeatureType, self).__init__() 575 | # all consecutive runs of 0x20 - 0x7f that are 5+ characters 576 | self._allstrings = re.compile(b"[\x20-\x7f]{5,}") 577 | # occurrences of the string 'C:\'. Not actually extracting the path 578 | self._paths = re.compile(b"c:\\\\", re.IGNORECASE) 579 | # occurrences of http:// or https://. Not actually extracting the URLs 580 | self._urls = re.compile(b"https?://", re.IGNORECASE) 581 | # occurrences of the string prefix HKEY_. Not actually extracting registry names 582 | self._registry = re.compile(b"HKEY_") 583 | # crude evidence of an MZ header (dropper?)
somewhere in the byte stream 584 | self._mz = re.compile(b"MZ") 585 | 586 | def raw_features(self, bytez, lief_binary): 587 | allstrings = self._allstrings.findall(bytez) 588 | if allstrings: 589 | # statistics about strings: 590 | string_lengths = [len(s) for s in allstrings] 591 | avlength = sum(string_lengths) / len(string_lengths) 592 | # map printable characters 0x20 - 0x7f to an int array consisting of 0-95, inclusive 593 | as_shifted_string = [b - ord(b"\x20") for b in b"".join(allstrings)] 594 | c = np.bincount(as_shifted_string, minlength=96) # histogram count 595 | # distribution of characters in printable strings 596 | csum = c.sum() 597 | p = c.astype(np.float32) / csum 598 | wh = np.where(c)[0] 599 | H = np.sum(-p[wh] * np.log2(p[wh])) # entropy 600 | else: 601 | avlength = 0 602 | c = np.zeros((96,), dtype=np.float32) 603 | H = 0 604 | csum = 0 605 | 606 | return { 607 | "numstrings": len(allstrings), 608 | "avlength": avlength, 609 | "printabledist": c.tolist(), # store non-normalized histogram 610 | "printables": int(csum), 611 | "entropy": float(H), 612 | "paths": len(self._paths.findall(bytez)), 613 | "urls": len(self._urls.findall(bytez)), 614 | "registry": len(self._registry.findall(bytez)), 615 | "MZ": len(self._mz.findall(bytez)), 616 | } 617 | 618 | def process_raw_features(self, raw_obj): 619 | hist_divisor = ( 620 | float( 621 | raw_obj["printables"], 622 | ) 623 | if raw_obj["printables"] > 0 624 | else 1.0 625 | ) 626 | return np.hstack( 627 | [ 628 | raw_obj["numstrings"], 629 | raw_obj["avlength"], 630 | raw_obj["printables"], 631 | np.asarray(raw_obj["printabledist"]) / hist_divisor, 632 | raw_obj["entropy"], 633 | raw_obj["paths"], 634 | raw_obj["urls"], 635 | raw_obj["registry"], 636 | raw_obj["MZ"], 637 | ], 638 | ).astype(np.float32) 639 | 640 | 641 | class DataDirectories(FeatureType): 642 | """Extracts size and virtual address of the first 15 data directories""" 643 | 644 | name = "datadirectories" 645 | dim = 15 * 2 646 | 647 | 
def __init__(self): 648 | super(FeatureType, self).__init__() 649 | self._name_order = [ 650 | "EXPORT_TABLE", 651 | "IMPORT_TABLE", 652 | "RESOURCE_TABLE", 653 | "EXCEPTION_TABLE", 654 | "CERTIFICATE_TABLE", 655 | "BASE_RELOCATION_TABLE", 656 | "DEBUG", 657 | "ARCHITECTURE", 658 | "GLOBAL_PTR", 659 | "TLS_TABLE", 660 | "LOAD_CONFIG_TABLE", 661 | "BOUND_IMPORT", 662 | "IAT", 663 | "DELAY_IMPORT_DESCRIPTOR", 664 | "CLR_RUNTIME_HEADER", 665 | ] 666 | 667 | def raw_features(self, bytez, lief_binary): 668 | output = [] 669 | if lief_binary is None: 670 | return output 671 | 672 | for data_directory in lief_binary.data_directories: 673 | output.append( 674 | { 675 | "name": str(data_directory.type).replace("DATA_DIRECTORY.", ""), 676 | "size": data_directory.size, 677 | "virtual_address": data_directory.rva, 678 | }, 679 | ) 680 | return output 681 | 682 | def process_raw_features(self, raw_obj): 683 | features = np.zeros(2 * len(self._name_order), dtype=np.float32) 684 | for i in range(len(self._name_order)): 685 | if i < len(raw_obj): 686 | features[2 * i] = raw_obj[i]["size"] 687 | features[2 * i + 1] = raw_obj[i]["virtual_address"] 688 | return features 689 | 690 | 691 | class PEFeatureExtractor: 692 | """Extract useful features from a PE file, and return as a vector of fixed size.""" 693 | 694 | def __init__(self, feature_version=2): 695 | self.features = [ 696 | ByteHistogram(), 697 | ByteEntropyHistogram(), 698 | StringExtractor(), 699 | GeneralFileInfo(), 700 | HeaderFileInfo(), 701 | SectionInfo(), 702 | ImportsInfo(), 703 | ExportsInfo(), 704 | ] 705 | if feature_version == 1: 706 | if not lief.__version__.startswith("0.8.3"): 707 | print( 708 | "WARNING: EMBER feature version 1 was computed using lief version 0.8.3-18d5b75", 709 | ) 710 | print( 711 | f"WARNING: lief version {lief.__version__} found instead.
There may be slight inconsistencies", 712 | ) 713 | print("WARNING: in the feature calculations.") 714 | elif feature_version == 2: 715 | self.features.append(DataDirectories()) 716 | if not lief.__version__.startswith("0.9.0"): 717 | print( 718 | "WARNING: EMBER feature version 2 was computed using lief version 0.9.0-", 719 | ) 720 | print( 721 | f"WARNING: lief version {lief.__version__} found instead. There may be slight inconsistencies", 722 | ) 723 | print("WARNING: in the feature calculations.") 724 | else: 725 | raise Exception( 726 | f"EMBER feature version must be 1 or 2. Not {feature_version}", 727 | ) 728 | self.dim = sum(fe.dim for fe in self.features) 729 | 730 | def raw_features(self, bytez): 731 | lief_errors = ( 732 | lief.bad_format, 733 | lief.bad_file, 734 | lief.pe_error, 735 | lief.parser_error, 736 | lief.read_out_of_bound, 737 | RuntimeError, 738 | ) 739 | try: 740 | lief_binary = lief.PE.parse(list(bytez)) 741 | except lief_errors as e: 742 | print("lief error: ", str(e)) 743 | lief_binary = None 744 | # everything else (KeyboardInterrupt, SystemExit, ValueError): 745 | except Exception: 746 | raise 747 | 748 | features = {"sha256": hashlib.sha256(bytez).hexdigest()} 749 | features.update( 750 | {fe.name: fe.raw_features(bytez, lief_binary) for fe in self.features}, 751 | ) 752 | return features 753 | 754 | def process_raw_features(self, raw_obj): 755 | feature_vectors = [ 756 | fe.process_raw_features( 757 | raw_obj[fe.name], 758 | ) 759 | for fe in self.features 760 | ] 761 | return np.hstack(feature_vectors).astype(np.float32) 762 | 763 | def feature_vector(self, bytez): 764 | return self.process_raw_features(self.raw_features(bytez)) 765 | 766 | 767 | class EmberModel: 768 | def __init__(self): 769 | self.model = lgb.Booster(model_file=model_path) 770 | self.threshold = 0.8336 # Ember 1% FPR 771 | self.feature_version = 2 772 | self.extractor = PEFeatureExtractor(self.feature_version) 773 | 774 | def extract(self, bytez): 775 |
return np.array(self.extractor.feature_vector(bytez), dtype=np.float32) 776 | 777 | def predict_sample(self, features): 778 | return self.model.predict([features])[0] 779 | -------------------------------------------------------------------------------- /malware_rl/envs/utils/interface.py: -------------------------------------------------------------------------------- 1 | import glob 2 | import os.path 3 | import re 4 | import sys 5 | 6 | module_path = os.path.dirname(os.path.abspath(sys.modules[__name__].__file__)) 7 | SAMPLE_PATH = os.path.join(module_path, "samples") 8 | 9 | 10 | def fetch_file(sample_path): 11 | with open(sample_path, "rb") as f: 12 | bytez = f.read() 13 | return bytez 14 | 15 | 16 | def get_available_sha256(): 17 | sha256list = [] 18 | for fp in glob.glob(os.path.join(SAMPLE_PATH, "*")): 19 | fn = os.path.split(fp)[-1] 20 | # require filenames to be sha256 21 | result = re.match(r"^[0-9a-fA-F]{64}$", fn) 22 | if result: 23 | sha256list.append(result.group(0)) 24 | # no files found in SAMPLE_PATH with sha256 names 25 | assert len(sha256list) > 0 26 | return sha256list 27 | -------------------------------------------------------------------------------- /malware_rl/envs/utils/malconv.h5: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bfilar/malware_rl/300f47ff2240132a449277283807426d34271993/malware_rl/envs/utils/malconv.h5 -------------------------------------------------------------------------------- /malware_rl/envs/utils/malconv.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | """ 3 | Defines the MalConv architecture.
4 | Adapted from https://arxiv.org/pdf/1710.09435.pdf 5 | Differences between our implementation and the original paper: 6 | * The paper uses batch_size = 256 and 7 | SGD(lr=0.01, momentum=0.9, decay=UNDISCLOSED, nesterov=True ) 8 | * The paper didn't have a special EOF symbol 9 | * The paper allowed for up to 2MB malware sizes, 10 | we use 1.0MB because of memory on a Titan X 11 | """ 12 | import os 13 | import sys 14 | 15 | import numpy as np 16 | import tensorflow as tf 17 | from keras import metrics 18 | from keras.models import load_model 19 | from keras.optimizers import SGD 20 | 21 | module_path = os.path.split(os.path.abspath(sys.modules[__name__].__file__))[0] 22 | model_path = os.path.join(module_path, "malconv.h5") 23 | 24 | tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR) 25 | 26 | 27 | class MalConv: 28 | def __init__(self): 29 | self.batch_size = 100 30 | self.input_dim = 257 # every byte plus a special padding symbol 31 | self.padding_char = 256 32 | self.malicious_threshold = 0.5 33 | 34 | self.model = load_model(model_path) 35 | _, self.maxlen, self.embedding_size = self.model.layers[1].output_shape 36 | 37 | self.model.compile( 38 | loss="binary_crossentropy", 39 | optimizer=SGD(lr=0.01, momentum=0.9, nesterov=True, decay=1e-3), 40 | metrics=[metrics.binary_accuracy], 41 | ) 42 | 43 | def extract(self, bytez): 44 | b = np.ones((self.maxlen,), dtype=np.int16) * self.padding_char 45 | bytez = np.frombuffer(bytez[: self.maxlen], dtype=np.uint8) 46 | b[: len(bytez)] = bytez 47 | return b 48 | 49 | def predict_sample(self, bytez): 50 | return self.model.predict(bytez.reshape(1, -1))[0][0] 51 | -------------------------------------------------------------------------------- /malware_rl/envs/utils/samples/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bfilar/malware_rl/300f47ff2240132a449277283807426d34271993/malware_rl/envs/utils/samples/.gitkeep
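The byte-level featurization in `MalConv.extract` above can be exercised without loading the Keras model at all. Below is a minimal sketch of that padding scheme; the helper name `malconv_featurize` and the tiny `maxlen` are illustrative only, not part of the repo:

```python
import numpy as np


def malconv_featurize(bytez, maxlen=2**20, padding_char=256):
    """Pad/truncate raw bytes into a fixed-length int array, as MalConv.extract does.

    Bytes past maxlen are dropped; the tail is filled with the padding symbol
    256, which sits outside the 0-255 byte range so the embedding layer can
    distinguish padding from real file content.
    """
    b = np.ones((maxlen,), dtype=np.int16) * padding_char
    raw = np.frombuffer(bytez[:maxlen], dtype=np.uint8)
    b[: len(raw)] = raw
    return b


# A 4-byte "file" against a toy maxlen of 8: the MZ magic lands up front,
# and the remaining slots hold the padding symbol.
features = malconv_featurize(b"MZ\x90\x00", maxlen=8)
```

Using `int16` rather than `uint8` is what makes room for the 257th symbol; the real model then embeds each of the 257 token values before the convolutional layers.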
-------------------------------------------------------------------------------- /malware_rl/envs/utils/sorel.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | 4 | import lightgbm as lgb 5 | import numpy as np 6 | from malware_rl.envs.utils.ember import PEFeatureExtractor 7 | 8 | module_path = os.path.split(os.path.abspath(sys.modules[__name__].__file__))[0] 9 | 10 | model_path = os.path.join(module_path, "sorel.model") 11 | 12 | if not os.path.exists(model_path): 13 | print("The model path provided does not exist") 14 | 15 | 16 | class SorelModel: 17 | def __init__(self): 18 | self.model = lgb.Booster(model_file=model_path) 19 | self.threshold = 0.8336 # Ember 1% FPR 20 | self.feature_version = 2 21 | self.extractor = PEFeatureExtractor(self.feature_version) 22 | 23 | def extract(self, bytez): 24 | return np.array(self.extractor.feature_vector(bytez), dtype=np.float32) 25 | 26 | def predict_sample(self, features): 27 | return self.model.predict([features])[0] 28 | -------------------------------------------------------------------------------- /ppo.py: -------------------------------------------------------------------------------- 1 | import os 2 | import random 3 | import sys 4 | 5 | import gym 6 | import numpy as np 7 | from gym import wrappers 8 | from stable_baselines3 import PPO 9 | 10 | import malware_rl 11 | 12 | random.seed(0) 13 | module_path = os.path.split(os.path.abspath(sys.modules[__name__].__file__))[0] 14 | outdir = os.path.join(module_path, "data/logs/ppo-agent-results") 15 | 16 | # Setting up environment 17 | env = gym.make("sorel-train-v0") 18 | env = wrappers.Monitor(env, directory=outdir, force=True) 19 | env.seed(0) 20 | 21 | # Setting up training parameters and holding variables 22 | episode_count = 250 23 | done = False 24 | reward = 0 25 | evasions = 0 26 | evasion_history = {} 27 | 28 | # Train the agent 29 | agent = PPO("MlpPolicy", env, verbose=1) 30 |
agent.learn(total_timesteps=2500) 31 | 32 | 33 | # Test the agent 34 | for i in range(episode_count): 35 | ob = env.reset() 36 | sha256 = env.env.sha256 37 | while True: 38 | action, _states = agent.predict(ob) 39 | ob, reward, done, ep_history = env.step(action) 40 | if done and reward >= 10.0: 41 | evasions += 1 42 | evasion_history[sha256] = ep_history 43 | break 44 | 45 | elif done: 46 | break 47 | 48 | # Output metrics/evaluation stuff 49 | evasion_rate = (evasions / episode_count) * 100 50 | mean_action_count = np.mean(env.get_episode_lengths()) 51 | print(f"{evasion_rate}% samples evaded model.") 52 | print(f"Average of {mean_action_count} moves to evade model.") 53 | -------------------------------------------------------------------------------- /random_agent.py: -------------------------------------------------------------------------------- 1 | import os 2 | import random 3 | import sys 4 | 5 | import gym 6 | import numpy as np 7 | from gym import wrappers 8 | from IPython import embed 9 | 10 | import malware_rl 11 | 12 | random.seed(0) 13 | module_path = os.path.split(os.path.abspath(sys.modules[__name__].__file__))[0] 14 | 15 | 16 | class RandomAgent: 17 | """The world's simplest agent!""" 18 | 19 | def __init__(self, action_space): 20 | self.action_space = action_space 21 | 22 | def act(self, observation, reward, done): 23 | return self.action_space.sample() 24 | 25 | 26 | # gym setup 27 | outdir = os.path.join(module_path, "data/logs/random-agent-results") 28 | env = gym.make("malconv-train-v0") 29 | env = wrappers.Monitor(env, directory=outdir, force=True) 30 | env.seed(0) 31 | episode_count = 250 32 | done = False 33 | reward = 0 34 | 35 | # metric tracking 36 | evasions = 0 37 | evasion_history = {} 38 | 39 | agent = RandomAgent(env.action_space) 40 | 41 | for i in range(episode_count): 42 | ob = env.reset() 43 | sha256 = env.env.sha256 44 | while True: 45 | action = agent.act(ob, reward, done) 46 | ob, reward, done, ep_history =
env.step(action) 47 | if done and reward >= 10.0: 48 | evasions += 1 49 | evasion_history[sha256] = ep_history 50 | break 51 | 52 | elif done: 53 | break 54 | 55 | evasion_rate = (evasions / episode_count) * 100 56 | mean_action_count = np.mean(env.get_episode_lengths()) 57 | print(f"{evasion_rate}% samples evaded model.") 58 | print(f"Average of {mean_action_count} moves to evade model.") 59 | embed() 60 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | absl-py==0.10.0 2 | appdirs==1.4.3 3 | appnope==0.1.0 4 | astunparse==1.6.3 5 | backcall==0.2.0 6 | CacheControl==0.12.6 7 | cachetools==4.1.1 8 | certifi>=2022.12.07 9 | chardet==3.0.4 10 | cloudpickle==1.3.0 11 | colorama==0.4.3 12 | contextlib2==0.6.0 13 | decorator==4.4.2 14 | distlib==0.3.0 15 | distro==1.4.0 16 | gast==0.3.3 17 | google-auth==1.21.0 18 | google-auth-oauthlib==0.4.1 19 | google-pasta==0.2.0 20 | grpcio==1.53.2 21 | gym==0.17.2 22 | h5py==2.10.0 23 | html5lib==1.0.1 24 | idna==2.10 25 | importlib-metadata==1.7.0 26 | ipaddr==2.2.0 27 | ipython==8.10.0 28 | ipython-genutils==0.2.0 29 | jedi==0.17.2 30 | joblib>=1.2.0 31 | Keras==2.4.3 32 | Keras-Preprocessing==1.1.2 33 | lief>=0.12.2 34 | lightgbm==2.3.1 35 | lockfile==0.12.2 36 | Markdown==3.2.2 37 | mccabe==0.6.1 38 | msgpack==0.6.2 39 | nose==1.3.7 40 | numpy==1.22.0 41 | oauthlib==3.1.0 42 | opt-einsum==3.3.0 43 | packaging==20.3 44 | parso==0.7.1 45 | pep517==0.8.2 46 | pexpect==4.8.0 47 | pickleshare==0.7.5 48 | progress==1.5 49 | prompt-toolkit==3.0.6 50 | protobuf>=3.18.3 51 | ptyprocess==0.6.0 52 | pyasn1==0.4.8 53 | pyasn1-modules==0.2.8 54 | pycodestyle==2.6.0 55 | pyflakes==2.2.0 56 | pyglet==1.5.0 57 | Pygments>=2.7.4 58 | pyparsing==2.4.6 59 | python-dateutil==2.8.1 60 | pytoml==0.1.21 61 | PyYAML>=5.4 62 | requests==2.24.0 63 | requests-oauthlib==1.3.0 64 | retrying==1.3.3 65 | rsa>=4.7 66 | 
scikit-learn==1.0.1 67 | scipy==1.11.1 68 | six==1.15.0 69 | stable-baselines3 70 | svn==1.0.1 71 | tensorboard==2.3.0 72 | tensorboard-plugin-wit==1.7.0 73 | tensorflow>=2.3.1 74 | tensorflow-estimator==2.3.0 75 | termcolor==1.1.0 76 | threadpoolctl==2.1.0 77 | traitlets==4.3.3 78 | urllib3>=1.26.5 79 | wcwidth==0.2.5 80 | webencodings==0.5.1 81 | Werkzeug==2.3.8 82 | wrapt==1.12.1 83 | zipp==3.1.0 84 | -------------------------------------------------------------------------------- /stable_baselines_env_check.py: -------------------------------------------------------------------------------- 1 | import gym 2 | from stable_baselines3.common.env_checker import check_env 3 | 4 | import malware_rl # Needs to be included in order to make the environment using gym 5 | 6 | 7 | def test_env(env_name): 8 | 9 | print(f"TESTING {env_name}!") 10 | env = gym.make(env_name) 11 | print("Checking environment . . .") 12 | check_env(env) 13 | env.close() 14 | 15 | 16 | environments = ["sorel-train-v0", "malconv-train-v0", "ember-train-v0"] 17 | 18 | for e in environments: 19 | test_env(e) 20 | --------------------------------------------------------------------------------
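Both ppo.py and random_agent.py above close with the same evaluation arithmetic: an episode counts as an evasion when it finishes with reward >= 10.0, and the evasion rate is reported as a percentage of episodes alongside the mean episode length. A self-contained sketch of that bookkeeping — the helper name `summarize_runs` is illustrative, not from the repo:

```python
import numpy as np


def summarize_runs(final_rewards, episode_lengths, evasion_reward=10.0):
    """Replicate the metrics printed at the end of ppo.py / random_agent.py.

    final_rewards holds the reward each episode terminated with; an episode
    is an evasion when that reward reaches the 10.0 threshold the scripts use.
    """
    evasions = sum(1 for r in final_rewards if r >= evasion_reward)
    evasion_rate = (evasions / len(final_rewards)) * 100
    mean_action_count = float(np.mean(episode_lengths))
    return evasion_rate, mean_action_count


# Four toy episodes: two evaded, taking 3 and 2 modification actions.
rate, moves = summarize_runs([10.0, 0.0, 10.0, 0.0], [3, 5, 2, 4])
print(f"{rate}% samples evaded model.")
print(f"Average of {moves} moves to evade model.")
```

In the scripts themselves, `episode_lengths` comes from `env.get_episode_lengths()` on the `wrappers.Monitor` instance, so the average covers all episodes, not just the evasions.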