├── .gitignore
├── README.md
├── Simulation1.ipynb
├── Simulation1_1 - Analysis.ipynb
├── Simulation1_1 - Training.ipynb
├── Simulation2_batch32 - Analysis.ipynb
├── Simulation2_batch32 - Training.ipynb
├── Simulation2_batch51 - Analysis.ipynb
├── Simulation2_batch51 - Training.ipynb
├── agent.py
├── const.py
├── evaluate.py
├── evaluate_extended.py
├── generate_actions.py
├── mockSQLenv.py
└── utilities.py

/.gitignore:
--------------------------------------------------------------------------------
1 | .ipynb_checkpoints/
2 | __pycache__/
3 | ignore_*
4 | models/
5 | *.zip
6 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # CTF-SQL
2 | Modelling SQL Injection Using Reinforcement Learning
3 |
4 | ### Requirements
5 | The code requires *numpy, scipy, matplotlib* and [OpenAI gym](https://github.com/openai/gym); [stable-baselines3](https://stable-baselines3.readthedocs.io/en/master/) (together with [pytorch](https://pytorch.org/)) is used to train the reinforcement learning agents. The notebooks additionally import *joblib* and *tqdm*.
6 |
7 | **Warning:** Simulation1 and Simulation2 currently rely on synthetic SQL server simulators. Simulation1 uses the module *mockSQLenv.py*. Simulation2 uses the OpenAI gym environment [gym-CTF-SQL](https://github.com/avalds/gym-CTF-SQL) (see that repository for instructions on installing and running the environment).
8 |
9 | ### Content
10 | The project *CTF-SQL* contains simulations in which reinforcement learning agents are run on a CTF challenge with a simple SQL injection vulnerability. Each *SimulationX* file contains a simulation, including training and analysis.
11 | - *Simulation1* runs a tabular Q-learning agent;
12 | - *Simulation2* runs a deep Q-learning agent (with different batch settings).
13 | Details about the setup and the interpretation of the results may be found in [1].
14 |
15 | ### References
16 |
17 | \[1\] Erdodi, L., Sommervoll, A.A.
and Zennaro, F.M., 2020. Simulating SQL Injection Vulnerability Exploitation Using Q-Learning Reinforcement Learning Agents. arXiv preprint. -------------------------------------------------------------------------------- /Simulation1_1 - Analysis.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Simulation 1.1 - Analyzing the agents\n", 8 | "\n", 9 | "\n", 10 | "## Importing libraries" 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": 1, 16 | "metadata": {}, 17 | "outputs": [], 18 | "source": [ 19 | "import numpy as np\n", 20 | "import matplotlib.pyplot as plt\n", 21 | "from scipy.ndimage import uniform_filter1d\n", 22 | "from IPython.display import clear_output\n", 23 | "import datetime\n", 24 | "import joblib\n", 25 | "from tqdm import tqdm\n", 26 | "\n", 27 | "import const\n", 28 | "import utilities as ut\n", 29 | "import mockSQLenv as SQLenv\n", 30 | "import agent as agn" 31 | ] 32 | }, 33 | { 34 | "cell_type": "markdown", 35 | "metadata": {}, 36 | "source": [ 37 | "## Defining the parameters of the simulations" 38 | ] 39 | }, 40 | { 41 | "cell_type": "code", 42 | "execution_count": 2, 43 | "metadata": {}, 44 | "outputs": [], 45 | "source": [ 46 | "n_simulations = 10\n", 47 | "n_episodes_training = 10**6\n", 48 | "n_episodes_test = 10**2\n", 49 | "\n", 50 | "exploration_train = 0.1\n", 51 | "exploration_test = 0\n", 52 | "learningrate = 0.1\n", 53 | "discount = 0.9\n", 54 | "max_steps = 1000\n", 55 | "\n", 56 | "flag_reward = 10\n", 57 | "query_reward = -1" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "metadata": {}, 63 | "source": [ 64 | "## Loading the statistics" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": 3, 70 | "metadata": {}, 71 | "outputs": [], 72 | "source": [ 73 | "train_data = joblib.load('ignore_simul1_traindata_20210217232217657942.pkl')\n", 74 | 
"test_data = joblib.load('ignore_simul1_testdata_20210217232217657942.pkl')" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | "## Analyzing the agent\n", 82 | "\n", 83 | "### Training: number of states\n", 84 | "\n", 85 | "Plotting the evolution in the number of states (average and standard deviation over 10 repetitions)." 86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": 11, 91 | "metadata": {}, 92 | "outputs": [ 93 | { 94 | "data": { 95 | "text/plain": [ 96 | "array([2000000. , 2000000.1, 2000000.2, ..., 2099999.7, 2099999.8,\n", 97 | " 2099999.9])" 98 | ] 99 | }, 100 | "execution_count": 11, 101 | "metadata": {}, 102 | "output_type": "execute_result" 103 | } 104 | ], 105 | "source": [ 106 | "0.1*np.arange(0,n_episodes_training)+2*10**6" 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": 13, 112 | "metadata": {}, 113 | "outputs": [ 114 | { 115 | "data": { 116 | "text/plain": [ 117 | "array([4.76759389e+00, 1.54385359e+06])" 118 | ] 119 | }, 120 | "execution_count": 13, 121 | "metadata": {}, 122 | "output_type": "execute_result" 123 | } 124 | ], 125 | "source": [ 126 | "params = np.polyfit(range(200000, n_episodes_training), np.mean(train_data[:,2,200000:],axis=0),1)" 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": 14, 132 | "metadata": {}, 133 | "outputs": [ 134 | { 135 | "name": "stdout", 136 | "output_type": "stream", 137 | "text": [ 138 | "Linear regression params: slope = 4.767593893285314 ; intercept = 1543853.5924806432\n" 139 | ] 140 | }, 141 | { 142 | "name": "stderr", 143 | "output_type": "stream", 144 | "text": [ 145 | "/home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/IPython/core/events.py:88: UserWarning: Creating legend with loc=\"best\" can be slow with large amounts of data.\n", 146 | " func(*args, **kwargs)\n", 147 | 
"/home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/IPython/core/pylabtools.py:128: UserWarning: Creating legend with loc=\"best\" can be slow with large amounts of data.\n", 148 | " fig.canvas.print_figure(bytes_io, **kw)\n" 149 | ] 150 | }, 151 | { 152 | "data": { 153 | "image/png": "\n", 154 | "text/plain": [ 155 | "
" 156 | ] 157 | }, 158 | "metadata": { 159 | "needs_background": "light" 160 | }, 161 | "output_type": "display_data" 162 | } 163 | ], 164 | "source": [ 165 | "plt.errorbar(range(n_episodes_training),np.mean(train_data[:,2,:],axis=0),yerr=np.std(train_data[:,2,:],axis=0),ecolor='cyan')\n", 166 | "plt.xlabel('episodes')\n", 167 | "plt.ylabel('number of states')\n", 168 | "\n", 169 | "params = np.polyfit(range(200000, n_episodes_training), np.mean(train_data[:,2,200000:],axis=0),1)\n", 170 | "plt.plot(range(n_episodes_training), np.arange(n_episodes_training)*params[0]+params[1], c='red',ls='--',label='linear regression')\n", 171 | "plt.legend()\n", 172 | "\n", 173 | "print('Linear regression params: slope = {0} ; intercept = {1}'.format(params[0],params[1]))" 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": {}, 179 | "source": [ 180 | "### Training: number of steps\n", 181 | "\n", 182 | "We first plot the number of steps (average and standard deviation over 10 repetitions)" 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": {}, 188 | "source": [ 189 | "### Training: number of steps\n", 190 | "\n", 191 | "We first plot the number of steps (average and standard deviation over 10 repetitions)" 192 | ] 193 | }, 194 | { 195 | "cell_type": "code", 196 | "execution_count": 5, 197 | "metadata": {}, 198 | "outputs": [ 199 | { 200 | "data": { 201 | "text/plain": [ 202 | "Text(0, 0.5, 'number of steps per episode')" 203 | ] 204 | }, 205 | "execution_count": 5, 206 | "metadata": {}, 207 | "output_type": "execute_result" 208 | }, 209 | { 210 | "data": { 211 | "image/png": "\n", 212 | "text/plain": [ 213 | "
" 214 | ] 215 | }, 216 | "metadata": { 217 | "needs_background": "light" 218 | }, 219 | "output_type": "display_data" 220 | } 221 | ], 222 | "source": [ 223 | "plt.errorbar(range(n_episodes_training),np.mean(train_data[:,0,:],axis=0),yerr=np.std(train_data[:,0,:],axis=0),ecolor='cyan')\n", 224 | "plt.xlabel('episodes')\n", 225 | "plt.ylabel('number of steps per episode')" 226 | ] 227 | }, 228 | { 229 | "cell_type": "markdown", 230 | "metadata": {}, 231 | "source": [ 232 | "We then smooth each repetition with a $1000$-episode moving-average window, and compute the mean and standard deviation of the number of steps across repetitions." 233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": 6, 238 | "metadata": {}, 239 | "outputs": [ 240 | { 241 | "data": { 242 | "text/plain": [ 243 | "Text(0, 0.5, 'number of steps per episode')" 244 | ] 245 | }, 246 | "execution_count": 6, 247 | "metadata": {}, 248 | "output_type": "execute_result" 249 | }, 250 | { 251 | "data": { 252 | "image/png": "\n", 253 | "text/plain": [ 254 | "
" 255 | ] 256 | }, 257 | "metadata": { 258 | "needs_background": "light" 259 | }, 260 | "output_type": "display_data" 261 | } 262 | ], 263 | "source": [ 264 | "mean_smoothed = np.mean(uniform_filter1d(train_data[:,0,:],axis=1,size=1000,mode='nearest'),axis=0)\n", 265 | "std_smoothed = np.std(uniform_filter1d(train_data[:,0,:],axis=1,size=1000,mode='nearest'),axis=0)\n", 266 | "\n", 267 | "plt.errorbar(range(n_episodes_training),mean_smoothed,yerr=std_smoothed,ecolor='cyan')\n", 268 | "plt.xlabel('episodes')\n", 269 | "plt.ylabel('number of steps per episode')" 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": {}, 275 | "source": [ 276 | "### Training: total number of steps\n", 277 | "\n", 278 | "We compute the total number of steps taken by each agent." 279 | ] 280 | }, 281 | { 282 | "cell_type": "code", 283 | "execution_count": 7, 284 | "metadata": { 285 | "scrolled": true 286 | }, 287 | "outputs": [ 288 | { 289 | "name": "stdout", 290 | "output_type": "stream", 291 | "text": [ 292 | "Average total number of steps: 65984107.0\n" 293 | ] 294 | }, 295 | { 296 | "data": { 297 | "image/png": "\n", 298 | "text/plain": [ 299 | "
" 300 | ] 301 | }, 302 | "metadata": { 303 | "needs_background": "light" 304 | }, 305 | "output_type": "display_data" 306 | } 307 | ], 308 | "source": [ 309 | "total_steps = np.sum(train_data[:,0,:],axis=1)\n", 310 | "\n", 311 | "plt.bar(range(1,n_simulations+1), total_steps)\n", 312 | "plt.axhline(np.mean(total_steps),c='red',ls='--',label='mean')\n", 313 | "plt.legend()\n", 314 | "plt.xlabel('agent')\n", 315 | "plt.ylabel('total number of steps')\n", 316 | "\n", 317 | "print('Average total number of steps: {0}'.format(np.mean(total_steps)))" 318 | ] 319 | }, 320 | { 321 | "cell_type": "markdown", 322 | "metadata": {}, 323 | "source": [ 324 | "### Test: number of steps\n", 325 | "\n", 326 | "We first plot the number of steps (average and standard deviation over 10 repetitions)" 327 | ] 328 | }, 329 | { 330 | "cell_type": "code", 331 | "execution_count": 8, 332 | "metadata": {}, 333 | "outputs": [ 334 | { 335 | "data": { 336 | "text/plain": [ 337 | "" 338 | ] 339 | }, 340 | "execution_count": 8, 341 | "metadata": {}, 342 | "output_type": "execute_result" 343 | }, 344 | { 345 | "data": { 346 | "image/png": "\n", 347 | "text/plain": [ 348 | "
" 349 | ] 350 | }, 351 | "metadata": { 352 | "needs_background": "light" 353 | }, 354 | "output_type": "display_data" 355 | } 356 | ], 357 | "source": [ 358 | "plt.errorbar(range(n_episodes_test),np.mean(test_data[:,0,:],axis=0),yerr=np.std(test_data[:,0,:],axis=0))\n", 359 | "plt.xlabel('episodes')\n", 360 | "plt.ylabel('number of steps per episode')\n", 361 | "plt.axhline(np.mean(test_data[:,0,:]),c='red',ls='--',label='mean')\n", 362 | "plt.legend()" 363 | ] 364 | } 365 | ], 366 | "metadata": { 367 | "kernelspec": { 368 | "display_name": "Python 3", 369 | "language": "python", 370 | "name": "python3" 371 | }, 372 | "language_info": { 373 | "codemirror_mode": { 374 | "name": "ipython", 375 | "version": 3 376 | }, 377 | "file_extension": ".py", 378 | "mimetype": "text/x-python", 379 | "name": "python", 380 | "nbconvert_exporter": "python", 381 | "pygments_lexer": "ipython3", 382 | "version": "3.7.5" 383 | } 384 | }, 385 | "nbformat": 4, 386 | "nbformat_minor": 4 387 | } 388 | -------------------------------------------------------------------------------- /Simulation2_batch32 - Training.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Simulation 2_b32 - Training the agents\n", 8 | "\n", 9 | "\n", 10 | "## Importing libraries" 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": 1, 16 | "metadata": {}, 17 | "outputs": [], 18 | "source": [ 19 | "import numpy as np\n", 20 | "import matplotlib.pyplot as plt\n", 21 | "from scipy.signal import savgol_filter as SGfilter\n", 22 | "from IPython.display import clear_output, display\n", 23 | "import datetime\n", 24 | "import joblib\n", 25 | "from tqdm import tqdm\n", 26 | "\n", 27 | "import const\n", 28 | "import utilities as ut\n", 29 | "\n", 30 | "\n", 31 | "import gym\n", 32 | "import ctfsql\n", 33 | "from stable_baselines3.common.vec_env import DummyVecEnv\n", 34 | 
"from stable_baselines3 import DQN\n", 35 | "import evaluate as ev" 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "## Defining the parameters of the simulations" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 2, 48 | "metadata": {}, 49 | "outputs": [], 50 | "source": [ 51 | "n_simulations = 10\n", 52 | "n_episodes_training = 10**6\n", 53 | "n_episodes_test = 10**2" 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": 3, 59 | "metadata": {}, 60 | "outputs": [ 61 | { 62 | "name": "stdout", 63 | "output_type": "stream", 64 | "text": [ 65 | "Using cpu device\n", 66 | "Wrapping the env in a DummyVecEnv.\n", 67 | "Using cpu device\n", 68 | "Wrapping the env in a DummyVecEnv.\n", 69 | "Using cpu device\n", 70 | "Wrapping the env in a DummyVecEnv.\n", 71 | "Using cpu device\n", 72 | "Wrapping the env in a DummyVecEnv.\n", 73 | "Using cpu device\n", 74 | "Wrapping the env in a DummyVecEnv.\n", 75 | "Using cpu device\n", 76 | "Wrapping the env in a DummyVecEnv.\n", 77 | "Using cpu device\n", 78 | "Wrapping the env in a DummyVecEnv.\n", 79 | "Using cpu device\n", 80 | "Wrapping the env in a DummyVecEnv.\n", 81 | "Using cpu device\n", 82 | "Wrapping the env in a DummyVecEnv.\n", 83 | "Using cpu device\n", 84 | "Wrapping the env in a DummyVecEnv.\n" 85 | ] 86 | } 87 | ], 88 | "source": [ 89 | "env = gym.make('ctfsql-v0')\n", 90 | "\n", 91 | "dqn_models = [DQN('MlpPolicy', env, verbose=1) for i in range(n_simulations)]" 92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": {}, 97 | "source": [ 98 | "## Running the simulations" 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": 134, 104 | "metadata": {}, 105 | "outputs": [ 106 | { 107 | "data": { 108 | "text/plain": [ 109 | "'train_data = np.zeros((n_simulations,3,n_episodes_training))\\ntest_data = np.zeros((n_simulations,3,n_episodes_test))\\n\\nfor i in tqdm(range(n_simulations)):\\n 
dqn_models[i].learn(total_timesteps=10**6)\\n timestamp = datetime.datetime.now().strftime(\"%Y%m%d%H%M%S%f\")\\n dqn_models[i].save(str(i) + \\'ignore_simul2_\\'+timestamp)\\n'" 110 | ] 111 | }, 112 | "execution_count": 134, 113 | "metadata": {}, 114 | "output_type": "execute_result" 115 | } 116 | ], 117 | "source": [ 118 | "train_data = np.zeros((n_simulations,3,n_episodes_training))\n", 119 | "test_data = np.zeros((n_simulations,3,n_episodes_test))\n", 120 | "\n", 121 | "for i in tqdm(range(n_simulations)):\n", 122 | " dqn_models[i].learn(total_timesteps=10**6)\n", 123 | " timestamp = datetime.datetime.now().strftime(\"%Y%m%d%H%M%S%f\")\n", 124 | " dqn_models[i].save(str(i) + 'ignore_simul2_'+timestamp)\n" 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": 7, 130 | "metadata": {}, 131 | "outputs": [ 132 | { 133 | "name": "stderr", 134 | "output_type": "stream", 135 | "text": [ 136 | "\r", 137 | " 0%| | 0/10 [00:00= self.syntaxmin and action_number < self.syntaxmax): 52 | if(action_number == self.flag_cols*2 + self.setup[1] + 1 or action_number == self.flag_cols*2 + self.setup[1] + 2): 53 | if(self.verbose): print('Query with correct number of rows') 54 | return 4,self.query_reward, self.termination, "Server response is 4" 55 | 56 | if(self.verbose): print('Query has the correct escape, but contains the wrong number of rows. I return 0') 57 | return 0,self.query_reward, self.termination,'Server response is 0' 58 | else: 59 | if(self.verbose): print('Query is syntactically wrong. 
I return -1')
60 |             return -1,self.query_reward, self.termination,'Server response is -1'
61 |
62 |
63 |     def reset(self):
64 |         self.termination = False
65 |         if(self.verbose): print('Game reset (but not reinitialized with a new random query!)')
66 |         return None,0,self.termination,'Game reset'
67 |
68 |     def reveal_solution(self):
69 |         #For debugging only
70 |         print('Correct escapes are: \n [{0}]: {1} \n [{2}]: {3}'.format(self.setup[0],self.A[self.setup[0]],self.setup[1],self.A[self.setup[1]]))
71 |         print('Correct SQL injection is: \n [{0}]: {1}'.format(self.setup[2],self.A[self.setup[2]]))
72 |
--------------------------------------------------------------------------------
/utilities.py:
--------------------------------------------------------------------------------
1 | from itertools import chain, combinations
2 |
3 | def powerset(iterable):
4 |     #based on: https://stackoverflow.com/questions/18035595/powersets-in-python-using-itertools
5 |     "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
6 |     s = list(iterable)
7 |     return list(chain.from_iterable(combinations(s, r) for r in range(len(s)+1)))
8 |
9 |
10 | def getdictshape(d):
11 |     return (len(d.keys()), d[()].shape)
12 |
13 |
14 | if __name__ == "__main__":
15 |     x = powerset([0,1,2,3,4,5])
16 |     print(x)
17 |     print(len(x))
18 |
--------------------------------------------------------------------------------
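The two helpers in `utilities.py` can be exercised together, as in the minimal sketch below. It builds a dictionary keyed by every subset of a small set and then queries its shape. Reading that dictionary as a Q-table with one array of action values per state is an assumption made for illustration only; the names `n_actions` and `q_table` are hypothetical and are not taken from `agent.py`.

```python
from itertools import chain, combinations

import numpy as np


def powerset(iterable):
    # Same logic as utilities.powerset: all subsets, ordered by size.
    s = list(iterable)
    return list(chain.from_iterable(combinations(s, r) for r in range(len(s) + 1)))


def getdictshape(d):
    # Same logic as utilities.getdictshape: number of keys, plus the shape
    # of the array stored under the empty-tuple key.
    return (len(d.keys()), d[()].shape)


# Hypothetical setup: 4 actions, states indexed by subsets of {0, 1, 2}.
n_actions = 4
q_table = {state: np.zeros(n_actions) for state in powerset([0, 1, 2])}

print(powerset([0, 1]))       # [(), (0,), (1,), (0, 1)]
print(getdictshape(q_table))  # (8, (4,))
```

Note that `getdictshape` looks up the empty tuple `()`, so it only works on dictionaries that contain that key, which any dictionary built from `powerset` does.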