├── Figures ├── LCCDE_Overview.jpg ├── MTH-IDS_Overview.png ├── README.md └── Tree-based_IDS_Overview.jpg ├── LCCDE_IDS_GlobeCom22.ipynb ├── LCCDE_IDS_GlobeCom22_paper.pdf ├── LICENSE ├── MTH_IDS_IoTJ.ipynb ├── MTH_IDS_paper.pdf ├── README.md ├── Tree-based_IDS_GlobeCom19.ipynb ├── Tree-based_IDS_paper.pdf └── data ├── CICIDS2017_sample.csv ├── CICIDS2017_sample_km.csv └── README.md /Figures/LCCDE_Overview.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Western-OC2-Lab/Intrusion-Detection-System-Using-Machine-Learning/1b8b79d1711a97fee33d83639b3ae4e19f980ba6/Figures/LCCDE_Overview.jpg -------------------------------------------------------------------------------- /Figures/MTH-IDS_Overview.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Western-OC2-Lab/Intrusion-Detection-System-Using-Machine-Learning/1b8b79d1711a97fee33d83639b3ae4e19f980ba6/Figures/MTH-IDS_Overview.png -------------------------------------------------------------------------------- /Figures/README.md: -------------------------------------------------------------------------------- 1 | # The figures used in this GitHub repository 2 | -------------------------------------------------------------------------------- /Figures/Tree-based_IDS_Overview.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Western-OC2-Lab/Intrusion-Detection-System-Using-Machine-Learning/1b8b79d1711a97fee33d83639b3ae4e19f980ba6/Figures/Tree-based_IDS_Overview.jpg -------------------------------------------------------------------------------- /LCCDE_IDS_GlobeCom22.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# LCCDE: A Decision-Based Ensemble Framework for Intrusion Detection in 
The Internet of Vehicles\n", 8 | "This is the code for the paper entitled \"**LCCDE: A Decision-Based Ensemble Framework for Intrusion Detection in The Internet of Vehicles**\" accepted in 2022 IEEE Global Communications Conference (GLOBECOM). \n", 9 | "Authors: Li Yang (lyang339@uwo.ca), Abdallah Shami (Abdallah.Shami@uwo.ca), Gary Stevens, and Stephen de Rusett \n", 10 | "Organization: The Optimized Computing and Communications (OC2) Lab, ECE Department, Western University, Ontario, Canada; S2E Technologies, St. Jacobs, Ontario, Canada \n", 11 | "\n", 12 | "If you find this repository useful in your research, please cite: \n", 13 | "L. Yang, A. Shami, G. Stevens, and S. DeRusett, “LCCDE: A Decision-Based Ensemble Framework for Intrusion Detection in The Internet of Vehicles,\" in 2022 IEEE Global Communications Conference (GLOBECOM), 2022, pp. 1-6." 14 | ] 15 | }, 16 | { 17 | "cell_type": "markdown", 18 | "metadata": {}, 19 | "source": [ 20 | "## Import libraries" 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": 1, 26 | "metadata": {}, 27 | "outputs": [], 28 | "source": [ 29 | "import warnings\n", 30 | "warnings.filterwarnings(\"ignore\")" 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": 2, 36 | "metadata": {}, 37 | "outputs": [], 38 | "source": [ 39 | "import pandas as pd\n", 40 | "import numpy as np\n", 41 | "import matplotlib.pyplot as plt\n", 42 | "import seaborn as sns\n", 43 | "from sklearn.model_selection import train_test_split\n", 44 | "from sklearn.metrics import classification_report,confusion_matrix,accuracy_score, precision_score, recall_score, f1_score\n", 45 | "import lightgbm as lgb\n", 46 | "import catboost as cbt\n", 47 | "import xgboost as xgb\n", 48 | "import time\n", 49 | "from river import stream\n", 50 | "from statistics import mode" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "## Read the sampled CICIDS2017 dataset\n", 58 | "The CICIDS2017 
dataset is publicly available at: https://www.unb.ca/cic/datasets/ids-2017.html \n",
59 | "Due to the large size of this dataset, sampled subsets of CICIDS2017 are used. The subsets are in the \"data\" folder. \n",
60 | "If you want to use this code on other datasets (e.g., the CAN-intrusion dataset), just change the dataset name and follow the same steps. The models in this code are generic and can be applied to any intrusion detection or network traffic dataset."
61 | ]
62 | },
63 | {
64 | "cell_type": "code",
65 | "execution_count": 3,
66 | "metadata": {},
67 | "outputs": [],
68 | "source": [
69 | "df = pd.read_csv(\"./data/CICIDS2017_sample_km.csv\")"
70 | ]
71 | },
72 | {
73 | "cell_type": "code",
74 | "execution_count": 4,
75 | "metadata": {},
76 | "outputs": [
77 | {
78 | "data": {
79 | "text/plain": [
80 | "0 18225\n",
81 | "3 3042\n",
82 | "6 2180\n",
83 | "1 1966\n",
84 | "5 1255\n",
85 | "2 96\n",
86 | "4 36\n",
87 | "Name: Label, dtype: int64"
88 | ]
89 | },
90 | "execution_count": 4,
91 | "metadata": {},
92 | "output_type": "execute_result"
93 | }
94 | ],
95 | "source": [
96 | "df.Label.value_counts()"
97 | ]
98 | },
99 | {
100 | "cell_type": "markdown",
101 | "metadata": {},
102 | "source": [
103 | "**Corresponding Attack Types:** \n",
104 | "0 BENIGN   18225 \n",
105 | "3 DoS     3042 \n",
106 | "6 WebAttack   2180 \n",
107 | "1 Bot     1966 \n",
108 | "5 PortScan   1255 \n",
109 | "2 BruteForce   96 \n",
110 | "4 Infiltration   36 "
111 | ]
112 | },
113 | {
114 | "cell_type": "markdown",
115 | "metadata": {},
116 | "source": [
117 | "## Split train set and test set"
118 | ]
119 | },
120 | {
121 | "cell_type": "code",
122 | "execution_count": 5,
123 | "metadata": {},
124 | "outputs": [],
125 | "source": [
126 | "X = df.drop(['Label'],axis=1)\n",
127 | "y = df['Label']\n",
128 | "X_train, X_test, y_train, y_test = train_test_split(X,y, train_size = 0.8, test_size = 0.2, random_state = 0) #shuffle=False"
129 | ]
130 | },
131 | {
132 | "cell_type": 
"markdown", 133 | "metadata": { 134 | "collapsed": true 135 | }, 136 | "source": [ 137 | "## SMOTE to solve class-imbalance" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": 6, 143 | "metadata": {}, 144 | "outputs": [ 145 | { 146 | "data": { 147 | "text/plain": [ 148 | "0 14569\n", 149 | "3 2430\n", 150 | "6 1728\n", 151 | "1 1579\n", 152 | "5 1024\n", 153 | "2 82\n", 154 | "4 28\n", 155 | "Name: Label, dtype: int64" 156 | ] 157 | }, 158 | "execution_count": 6, 159 | "metadata": {}, 160 | "output_type": "execute_result" 161 | } 162 | ], 163 | "source": [ 164 | "pd.Series(y_train).value_counts()" 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": 7, 170 | "metadata": {}, 171 | "outputs": [], 172 | "source": [ 173 | "from imblearn.over_sampling import SMOTE\n", 174 | "smote=SMOTE(n_jobs=-1,sampling_strategy={2:1000,4:1000})" 175 | ] 176 | }, 177 | { 178 | "cell_type": "code", 179 | "execution_count": 8, 180 | "metadata": {}, 181 | "outputs": [], 182 | "source": [ 183 | "X_train, y_train = smote.fit_resample(X_train, y_train)" 184 | ] 185 | }, 186 | { 187 | "cell_type": "code", 188 | "execution_count": 9, 189 | "metadata": {}, 190 | "outputs": [ 191 | { 192 | "data": { 193 | "text/plain": [ 194 | "0 14569\n", 195 | "3 2430\n", 196 | "6 1728\n", 197 | "1 1579\n", 198 | "5 1024\n", 199 | "4 1000\n", 200 | "2 1000\n", 201 | "Name: Label, dtype: int64" 202 | ] 203 | }, 204 | "execution_count": 9, 205 | "metadata": {}, 206 | "output_type": "execute_result" 207 | } 208 | ], 209 | "source": [ 210 | "pd.Series(y_train).value_counts()" 211 | ] 212 | }, 213 | { 214 | "cell_type": "markdown", 215 | "metadata": {}, 216 | "source": [ 217 | "## Machine Learning (ML) model training\n", 218 | "### Training three base learners: LightGBM, XGBoost, CatBoost" 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": 10, 224 | "metadata": {}, 225 | "outputs": [ 226 | { 227 | "name": "stdout", 228 | "output_type": 
"stream", 229 | "text": [ 230 | " precision recall f1-score support\n", 231 | "\n", 232 | " 0 1.00 1.00 1.00 3656\n", 233 | " 1 0.99 0.99 0.99 387\n", 234 | " 2 1.00 1.00 1.00 14\n", 235 | " 3 1.00 1.00 1.00 612\n", 236 | " 4 1.00 0.75 0.86 8\n", 237 | " 5 0.99 1.00 0.99 231\n", 238 | " 6 1.00 1.00 1.00 452\n", 239 | "\n", 240 | " accuracy 1.00 5360\n", 241 | " macro avg 1.00 0.96 0.98 5360\n", 242 | "weighted avg 1.00 1.00 1.00 5360\n", 243 | "\n", 244 | "Accuracy of LightGBM: 0.9970149253731343\n", 245 | "Precision of LightGBM: 0.9970231077536348\n", 246 | "Recall of LightGBM: 0.9970149253731343\n", 247 | "Average F1 of LightGBM: 0.9969877746104384\n", 248 | "F1 of LightGBM for each type of attack: [0.99795054 0.99092088 1. 0.997543 0.85714286 0.99354839\n", 249 | " 0.99778271]\n" 250 | ] 251 | }, 252 | { 253 | "data": { 254 | "image/png": "\n", 255 | "text/plain": [ 256 | "
" 257 | ] 258 | }, 259 | "metadata": { 260 | "needs_background": "light" 261 | }, 262 | "output_type": "display_data" 263 | }, 264 | { 265 | "name": "stdout", 266 | "output_type": "stream", 267 | "text": [ 268 | "Wall time: 2.7 s\n" 269 | ] 270 | } 271 | ], 272 | "source": [ 273 | "%%time\n", 274 | "# Train the LightGBM algorithm\n", 275 | "import lightgbm as lgb\n", 276 | "lg = lgb.LGBMClassifier()\n", 277 | "lg.fit(X_train, y_train)\n", 278 | "y_pred = lg.predict(X_test)\n", 279 | "print(classification_report(y_test,y_pred))\n", 280 | "print(\"Accuracy of LightGBM: \"+ str(accuracy_score(y_test, y_pred)))\n", 281 | "print(\"Precision of LightGBM: \"+ str(precision_score(y_test, y_pred, average='weighted')))\n", 282 | "print(\"Recall of LightGBM: \"+ str(recall_score(y_test, y_pred, average='weighted')))\n", 283 | "print(\"Average F1 of LightGBM: \"+ str(f1_score(y_test, y_pred, average='weighted')))\n", 284 | "print(\"F1 of LightGBM for each type of attack: \"+ str(f1_score(y_test, y_pred, average=None)))\n", 285 | "lg_f1=f1_score(y_test, y_pred, average=None)\n", 286 | "\n", 287 | "# Plot the confusion matrix\n", 288 | "cm=confusion_matrix(y_test,y_pred)\n", 289 | "f,ax=plt.subplots(figsize=(5,5))\n", 290 | "sns.heatmap(cm,annot=True,linewidth=0.5,linecolor=\"red\",fmt=\".0f\",ax=ax)\n", 291 | "plt.xlabel(\"y_pred\")\n", 292 | "plt.ylabel(\"y_true\")\n", 293 | "plt.show()" 294 | ] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": 11, 299 | "metadata": {}, 300 | "outputs": [ 301 | { 302 | "name": "stdout", 303 | "output_type": "stream", 304 | "text": [ 305 | " precision recall f1-score support\n", 306 | "\n", 307 | " 0 1.00 1.00 1.00 3656\n", 308 | " 1 1.00 0.99 0.99 387\n", 309 | " 2 1.00 1.00 1.00 14\n", 310 | " 3 1.00 1.00 1.00 612\n", 311 | " 4 1.00 0.75 0.86 8\n", 312 | " 5 0.99 1.00 0.99 231\n", 313 | " 6 1.00 1.00 1.00 452\n", 314 | "\n", 315 | " accuracy 1.00 5360\n", 316 | " macro avg 1.00 0.96 0.98 5360\n", 317 | "weighted avg 1.00 
1.00 1.00 5360\n", 318 | "\n", 319 | "Accuracy of XGBoost: 0.9975746268656717\n", 320 | "Precision of XGBoost: 0.9975807247413017\n", 321 | "Recall of XGBoost: 0.9975746268656717\n", 322 | "Average F1 of XGBoost: 0.9975482770384609\n", 323 | "F1 of XGBoost for each type of attack: [0.99836021 0.99351492 1. 0.99836334 0.85714286 0.99137931\n", 324 | " 0.99889258]\n" 325 | ] 326 | }, 327 | { 328 | "data": { 329 | "image/png": "\n", 330 | "text/plain": [ 331 | "
" 332 | ] 333 | }, 334 | "metadata": { 335 | "needs_background": "light" 336 | }, 337 | "output_type": "display_data" 338 | }, 339 | { 340 | "name": "stdout", 341 | "output_type": "stream", 342 | "text": [ 343 | "Wall time: 7.91 s\n" 344 | ] 345 | } 346 | ], 347 | "source": [ 348 | "%%time\n", 349 | "# Train the XGBoost algorithm\n", 350 | "import xgboost as xgb\n", 351 | "xg = xgb.XGBClassifier()\n", 352 | "\n", 353 | "X_train_x = X_train.values\n", 354 | "X_test_x = X_test.values\n", 355 | "\n", 356 | "xg.fit(X_train_x, y_train)\n", 357 | "\n", 358 | "y_pred = xg.predict(X_test_x)\n", 359 | "print(classification_report(y_test,y_pred))\n", 360 | "print(\"Accuracy of XGBoost: \"+ str(accuracy_score(y_test, y_pred)))\n", 361 | "print(\"Precision of XGBoost: \"+ str(precision_score(y_test, y_pred, average='weighted')))\n", 362 | "print(\"Recall of XGBoost: \"+ str(recall_score(y_test, y_pred, average='weighted')))\n", 363 | "print(\"Average F1 of XGBoost: \"+ str(f1_score(y_test, y_pred, average='weighted')))\n", 364 | "print(\"F1 of XGBoost for each type of attack: \"+ str(f1_score(y_test, y_pred, average=None)))\n", 365 | "xg_f1=f1_score(y_test, y_pred, average=None)\n", 366 | "\n", 367 | "# Plot the confusion matrix\n", 368 | "cm=confusion_matrix(y_test,y_pred)\n", 369 | "f,ax=plt.subplots(figsize=(5,5))\n", 370 | "sns.heatmap(cm,annot=True,linewidth=0.5,linecolor=\"red\",fmt=\".0f\",ax=ax)\n", 371 | "plt.xlabel(\"y_pred\")\n", 372 | "plt.ylabel(\"y_true\")\n", 373 | "plt.show()" 374 | ] 375 | }, 376 | { 377 | "cell_type": "code", 378 | "execution_count": 12, 379 | "metadata": { 380 | "scrolled": false 381 | }, 382 | "outputs": [ 383 | { 384 | "name": "stdout", 385 | "output_type": "stream", 386 | "text": [ 387 | " precision recall f1-score support\n", 388 | "\n", 389 | " 0 1.00 1.00 1.00 3656\n", 390 | " 1 0.99 0.99 0.99 387\n", 391 | " 2 1.00 1.00 1.00 14\n", 392 | " 3 1.00 1.00 1.00 612\n", 393 | " 4 1.00 0.75 0.86 8\n", 394 | " 5 0.99 1.00 0.99 231\n", 395 | " 
6 1.00 0.99 0.99 452\n", 396 | "\n", 397 | " accuracy 1.00 5360\n", 398 | " macro avg 1.00 0.96 0.98 5360\n", 399 | "weighted avg 1.00 1.00 1.00 5360\n", 400 | "\n", 401 | "Accuracy of CatBoost: 0.996455223880597\n", 402 | "Precision of CatBoost: 0.9964590935743635\n", 403 | "Recall of CatBoost: 0.996455223880597\n", 404 | "Average F1 of CatBoost: 0.9964290021228678\n", 405 | "F1 of CatBoost for each type of attack: [0.99781241 0.99222798 1. 0.99591837 0.85714286 0.99137931\n", 406 | " 0.9944629 ]\n" 407 | ] 408 | }, 409 | { 410 | "data": { 411 | "image/png": "\n", 412 | "text/plain": [ 413 | "
" 414 | ] 415 | }, 416 | "metadata": { 417 | "needs_background": "light" 418 | }, 419 | "output_type": "display_data" 420 | }, 421 | { 422 | "name": "stdout", 423 | "output_type": "stream", 424 | "text": [ 425 | "Wall time: 49.3 s\n" 426 | ] 427 | } 428 | ], 429 | "source": [ 430 | "%%time\n", 431 | "# Train the CatBoost algorithm\n", 432 | "import catboost as cbt\n", 433 | "cb = cbt.CatBoostClassifier(verbose=0,boosting_type='Plain')\n", 434 | "#cb = cbt.CatBoostClassifier()\n", 435 | "\n", 436 | "cb.fit(X_train, y_train)\n", 437 | "y_pred = cb.predict(X_test)\n", 438 | "print(classification_report(y_test,y_pred))\n", 439 | "print(\"Accuracy of CatBoost: \"+ str(accuracy_score(y_test, y_pred)))\n", 440 | "print(\"Precision of CatBoost: \"+ str(precision_score(y_test, y_pred, average='weighted')))\n", 441 | "print(\"Recall of CatBoost: \"+ str(recall_score(y_test, y_pred, average='weighted')))\n", 442 | "print(\"Average F1 of CatBoost: \"+ str(f1_score(y_test, y_pred, average='weighted')))\n", 443 | "print(\"F1 of CatBoost for each type of attack: \"+ str(f1_score(y_test, y_pred, average=None)))\n", 444 | "cb_f1=f1_score(y_test, y_pred, average=None)\n", 445 | "\n", 446 | "# Plot the confusion matrix\n", 447 | "cm=confusion_matrix(y_test,y_pred)\n", 448 | "f,ax=plt.subplots(figsize=(5,5))\n", 449 | "sns.heatmap(cm,annot=True,linewidth=0.5,linecolor=\"red\",fmt=\".0f\",ax=ax)\n", 450 | "plt.xlabel(\"y_pred\")\n", 451 | "plt.ylabel(\"y_true\")\n", 452 | "plt.show()" 453 | ] 454 | }, 455 | { 456 | "cell_type": "markdown", 457 | "metadata": {}, 458 | "source": [ 459 | "## Proposed ensemble model: Leader Class and Confidence Decision Ensemble (LCCDE)" 460 | ] 461 | }, 462 | { 463 | "cell_type": "markdown", 464 | "metadata": {}, 465 | "source": [ 466 | "LCCDE aims to achieve optimal model performance by identifying the best-performing base ML model with the highest prediction confidence for each class. 
" 467 | ] 468 | }, 469 | { 470 | "cell_type": "markdown", 471 | "metadata": {}, 472 | "source": [ 473 | "### Find the best-performing (leading) model for each type of attack among the three ML models" 474 | ] 475 | }, 476 | { 477 | "cell_type": "code", 478 | "execution_count": 13, 479 | "metadata": {}, 480 | "outputs": [], 481 | "source": [ 482 | "# Leading model list for each class\n", 483 | "model=[]\n", 484 | "for i in range(len(lg_f1)):\n", 485 | " if max(lg_f1[i],xg_f1[i],cb_f1[i]) == lg_f1[i]:\n", 486 | " model.append(lg)\n", 487 | " elif max(lg_f1[i],xg_f1[i],cb_f1[i]) == xg_f1[i]:\n", 488 | " model.append(xg)\n", 489 | " else:\n", 490 | " model.append(cb)" 491 | ] 492 | }, 493 | { 494 | "cell_type": "code", 495 | "execution_count": 14, 496 | "metadata": { 497 | "scrolled": false 498 | }, 499 | "outputs": [ 500 | { 501 | "data": { 502 | "text/plain": [ 503 | "[XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n", 504 | " colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,\n", 505 | " importance_type='gain', interaction_constraints='',\n", 506 | " learning_rate=0.300000012, max_delta_step=0, max_depth=6,\n", 507 | " min_child_weight=1, missing=nan, monotone_constraints='()',\n", 508 | " n_estimators=100, n_jobs=0, num_parallel_tree=1,\n", 509 | " objective='multi:softprob', random_state=0, reg_alpha=0,\n", 510 | " reg_lambda=1, scale_pos_weight=None, subsample=1,\n", 511 | " tree_method='exact', validate_parameters=1, verbosity=None),\n", 512 | " XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n", 513 | " colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,\n", 514 | " importance_type='gain', interaction_constraints='',\n", 515 | " learning_rate=0.300000012, max_delta_step=0, max_depth=6,\n", 516 | " min_child_weight=1, missing=nan, monotone_constraints='()',\n", 517 | " n_estimators=100, n_jobs=0, num_parallel_tree=1,\n", 518 | " objective='multi:softprob', random_state=0, reg_alpha=0,\n", 519 | " 
reg_lambda=1, scale_pos_weight=None, subsample=1,\n", 520 | " tree_method='exact', validate_parameters=1, verbosity=None),\n", 521 | " LGBMClassifier(),\n", 522 | " XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n", 523 | " colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,\n", 524 | " importance_type='gain', interaction_constraints='',\n", 525 | " learning_rate=0.300000012, max_delta_step=0, max_depth=6,\n", 526 | " min_child_weight=1, missing=nan, monotone_constraints='()',\n", 527 | " n_estimators=100, n_jobs=0, num_parallel_tree=1,\n", 528 | " objective='multi:softprob', random_state=0, reg_alpha=0,\n", 529 | " reg_lambda=1, scale_pos_weight=None, subsample=1,\n", 530 | " tree_method='exact', validate_parameters=1, verbosity=None),\n", 531 | " LGBMClassifier(),\n", 532 | " LGBMClassifier(),\n", 533 | " XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n", 534 | " colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,\n", 535 | " importance_type='gain', interaction_constraints='',\n", 536 | " learning_rate=0.300000012, max_delta_step=0, max_depth=6,\n", 537 | " min_child_weight=1, missing=nan, monotone_constraints='()',\n", 538 | " n_estimators=100, n_jobs=0, num_parallel_tree=1,\n", 539 | " objective='multi:softprob', random_state=0, reg_alpha=0,\n", 540 | " reg_lambda=1, scale_pos_weight=None, subsample=1,\n", 541 | " tree_method='exact', validate_parameters=1, verbosity=None)]" 542 | ] 543 | }, 544 | "execution_count": 14, 545 | "metadata": {}, 546 | "output_type": "execute_result" 547 | } 548 | ], 549 | "source": [ 550 | "model" 551 | ] 552 | }, 553 | { 554 | "cell_type": "markdown", 555 | "metadata": {}, 556 | "source": [ 557 | "**Leading Model for Each Type of Attack:** \n", 558 | "0 BENIGN:   XGBClassifier \n", 559 | "1 Bot:     XGBClassifier \n", 560 | "2 BruteForce:   LGBMClassifier \n", 561 | "3 DoS:     XGBClassifier \n", 562 | "4 Infiltration:   LGBMClassifier \n", 563 | "5 PortScan:   
LGBMClassifier \n",
564 | "6 WebAttack:   XGBClassifier "
565 | ]
566 | },
567 | {
568 | "cell_type": "markdown",
569 | "metadata": {
570 | "collapsed": true
571 | },
572 | "source": [
573 | "## LCCDE Prediction"
574 | ]
575 | },
576 | {
577 | "cell_type": "code",
578 | "execution_count": 15,
579 | "metadata": {},
580 | "outputs": [],
581 | "source": [
582 | "def LCCDE(X_test, y_test, m1, m2, m3):\n",
583 | "    i = 0\n",
584 | "    t = []\n",
585 | "    m = []\n",
586 | "    yt = []\n",
587 | "    yp = []\n",
588 | "\n",
589 | "    # Classify the test samples one at a time\n",
590 | "    for xi, yi in stream.iter_pandas(X_test, y_test):\n",
591 | "        l = []  # Models that lead the class they predicted (reset for every sample)\n",
592 | "        pred_l = []  # Classes predicted by those models\n",
593 | "        pro_l = []  # Confidences of those predictions\n",
594 | "\n",
595 | "        xi2=np.array(list(xi.values()))\n",
596 | "        y_pred1 = m1.predict(xi2.reshape(1, -1))  # model 1 (LightGBM) makes a prediction on test sample xi\n",
597 | "        y_pred1 = int(y_pred1[0])\n",
598 | "        y_pred2 = m2.predict(xi2.reshape(1, -1))  # model 2 (XGBoost) makes a prediction on test sample xi\n",
599 | "        y_pred2 = int(y_pred2[0])\n",
600 | "        y_pred3 = m3.predict(xi2.reshape(1, -1))  # model 3 (CatBoost) makes a prediction on test sample xi\n",
601 | "        y_pred3 = int(y_pred3[0])\n",
602 | "\n",
603 | "        p1 = m1.predict_proba(xi2.reshape(1, -1))  # The prediction probability (confidence) list of model 1\n",
604 | "        p2 = m2.predict_proba(xi2.reshape(1, -1))  # The prediction probability (confidence) list of model 2\n",
605 | "        p3 = m3.predict_proba(xi2.reshape(1, -1))  # The prediction probability (confidence) list of model 3\n",
606 | "\n",
607 | "        # Find the highest prediction probability among all classes for each ML model\n",
608 | "        y_pred_p1 = np.max(p1)\n",
609 | "        y_pred_p2 = np.max(p2)\n",
610 | "        y_pred_p3 = np.max(p3)\n",
611 | "\n",
612 | "        if y_pred1 == y_pred2 == y_pred3:  # If the predicted classes of all the three models are the same\n",
613 | "            y_pred = y_pred1  # Use this predicted class as the final predicted class\n",
614 | "\n",
615 | "        elif y_pred1 != y_pred2 and y_pred2 != y_pred3 and y_pred1 != y_pred3:  # If the predicted classes of all the three models are different\n",
616 | "            # For each prediction model, check if the predicted class’s original ML model is the same as its leader model\n",
617 | "            if model[y_pred1]==m1:  # If they are the same and the leading model is model 1 (LightGBM)\n",
618 | "                l.append(m1)\n",
619 | "                pred_l.append(y_pred1)  # Save the predicted class\n",
620 | "                pro_l.append(y_pred_p1)  # Save the confidence\n",
621 | "\n",
622 | "            if model[y_pred2]==m2:  # If they are the same and the leading model is model 2 (XGBoost)\n",
623 | "                l.append(m2)\n",
624 | "                pred_l.append(y_pred2)\n",
625 | "                pro_l.append(y_pred_p2)\n",
626 | "\n",
627 | "            if model[y_pred3]==m3:  # If they are the same and the leading model is model 3 (CatBoost)\n",
628 | "                l.append(m3)\n",
629 | "                pred_l.append(y_pred3)\n",
630 | "                pro_l.append(y_pred_p3)\n",
631 | "\n",
632 | "            if len(l)==0:  # No model leads its predicted class: compare the confidences of all three models\n",
633 | "                pro_l=[y_pred_p1,y_pred_p2,y_pred_p3]\n",
634 | "\n",
635 | "            if len(l)==1:  # If only one pair of the original model and the leader model for each predicted class is the same\n",
636 | "                y_pred=pred_l[0]  # Use the predicted class of the leader model as the final prediction class\n",
637 | "\n",
638 | "            else:  # If no pair or multiple pairs of the original prediction model and the leader model for each predicted class are the same\n",
639 | "                max_p = max(pro_l)  # Find the highest confidence\n",
640 | "\n",
641 | "                # Use the predicted class with the highest confidence as the final prediction class\n",
642 | "                if max_p == y_pred_p1:\n",
643 | "                    y_pred = y_pred1\n",
644 | "                elif max_p == y_pred_p2:\n",
645 | "                    y_pred = y_pred2\n",
646 | "                else:\n",
647 | "                    y_pred = y_pred3\n",
648 | "\n",
649 | "        else:  # If two predicted classes are the same and the other one is different\n",
650 | "            n = mode([y_pred1,y_pred2,y_pred3])  # Find the predicted class with the majority vote\n",
651 | "            y_pred = model[n].predict(xi2.reshape(1, -1))  # Use the predicted class of the leader model as the final prediction class\n",
652 | "            y_pred = int(y_pred[0])\n",
653 | "\n",
654 | "        yt.append(yi)\n",
655 | "        yp.append(y_pred)  # Save the predicted classes for all tested samples\n",
656 | "    return yt, yp"
657 | ]
658 | },
659 | {
660 | "cell_type": "code",
661 | "execution_count": 16,
662 | "metadata": {},
663 | "outputs": [
664 | {
665 | "name": "stdout",
666 | "output_type": "stream",
667 | "text": [
668 | "Wall time: 59.1 s\n"
669 | ]
670 | }
671 | ],
672 | "source": [
673 | "%%time\n",
674 | "# Implementing LCCDE\n",
675 | "yt, yp = LCCDE(X_test, y_test, m1 = lg, m2 = xg, m3 = cb)"
676 | ]
677 | },
678 | {
679 | "cell_type": "code",
680 | "execution_count": 17,
681 | "metadata": {},
682 | "outputs": [
683 | {
684 | "name": "stdout",
685 | "output_type": "stream",
686 | "text": [
687 | "Accuracy of LCCDE: 0.9975746268656717\n",
688 | "Precision of LCCDE: 0.9975807247413017\n",
689 | "Recall of LCCDE: 0.9975746268656717\n",
690 | "Average F1 of LCCDE: 0.9975482770384609\n",
691 | "F1 of LCCDE for each type of attack: [0.99836021 0.99351492 1. 
0.99836334 0.85714286 0.99137931\n",
692 | " 0.99889258]\n"
693 | ]
694 | }
695 | ],
696 | "source": [
697 | "# The performance of the proposed LCCDE model\n",
698 | "print(\"Accuracy of LCCDE: \"+ str(accuracy_score(yt, yp)))\n",
699 | "print(\"Precision of LCCDE: \"+ str(precision_score(yt, yp, average='weighted')))\n",
700 | "print(\"Recall of LCCDE: \"+ str(recall_score(yt, yp, average='weighted')))\n",
701 | "print(\"Average F1 of LCCDE: \"+ str(f1_score(yt, yp, average='weighted')))\n",
702 | "print(\"F1 of LCCDE for each type of attack: \"+ str(f1_score(yt, yp, average=None)))"
703 | ]
704 | },
705 | {
706 | "cell_type": "code",
707 | "execution_count": 18,
708 | "metadata": {
709 | "scrolled": true
710 | },
711 | "outputs": [
712 | {
713 | "name": "stdout",
714 | "output_type": "stream",
715 | "text": [
716 | "F1 of LightGBM for each type of attack: [0.99795054 0.99092088 1. 0.997543 0.85714286 0.99354839\n",
717 | " 0.99778271]\n",
718 | "F1 of XGBoost for each type of attack: [0.99836021 0.99351492 1. 0.99836334 0.85714286 0.99137931\n",
719 | " 0.99889258]\n",
720 | "F1 of CatBoost for each type of attack: [0.99781241 0.99222798 1. 0.99591837 0.85714286 0.99137931\n",
721 | " 0.9944629 ]\n"
722 | ]
723 | }
724 | ],
725 | "source": [
726 | "# Comparison: The F1-scores for each base model\n",
727 | "print(\"F1 of LightGBM for each type of attack: \"+ str(lg_f1))\n",
728 | "print(\"F1 of XGBoost for each type of attack: \"+ str(xg_f1))\n",
729 | "print(\"F1 of CatBoost for each type of attack: \"+ str(cb_f1))"
730 | ]
731 | },
732 | {
733 | "cell_type": "markdown",
734 | "metadata": {
735 | "collapsed": true
736 | },
737 | "source": [
738 | "**Conclusion**: In the recorded run, the per-class F1-scores of the proposed LCCDE ensemble model match those of the strongest base model (XGBoost) on every class. They equal or exceed the F1-scores of LightGBM and CatBoost on every class except PortScan (class 5), where LightGBM scores slightly higher."
739 | ] 740 | }, 741 | { 742 | "cell_type": "code", 743 | "execution_count": null, 744 | "metadata": { 745 | "collapsed": true 746 | }, 747 | "outputs": [], 748 | "source": [] 749 | } 750 | ], 751 | "metadata": { 752 | "anaconda-cloud": {}, 753 | "kernelspec": { 754 | "display_name": "Python 3", 755 | "language": "python", 756 | "name": "python3" 757 | }, 758 | "language_info": { 759 | "codemirror_mode": { 760 | "name": "ipython", 761 | "version": 3 762 | }, 763 | "file_extension": ".py", 764 | "mimetype": "text/x-python", 765 | "name": "python", 766 | "nbconvert_exporter": "python", 767 | "pygments_lexer": "ipython3", 768 | "version": "3.6.8" 769 | } 770 | }, 771 | "nbformat": 4, 772 | "nbformat_minor": 2 773 | } 774 | -------------------------------------------------------------------------------- /LCCDE_IDS_GlobeCom22_paper.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Western-OC2-Lab/Intrusion-Detection-System-Using-Machine-Learning/1b8b79d1711a97fee33d83639b3ae4e19f980ba6/LCCDE_IDS_GlobeCom22_paper.pdf -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2021 Western OC2 Lab 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/MTH_IDS_paper.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Western-OC2-Lab/Intrusion-Detection-System-Using-Machine-Learning/1b8b79d1711a97fee33d83639b3ae4e19f980ba6/MTH_IDS_paper.pdf
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Intrusion-Detection-System-Using-Machine-Learning
2 |
3 | This repository contains the code for the project "IDS-ML: Intrusion Detection System Development Using Machine Learning". The code and the proposed Intrusion Detection Systems (IDSs) are general models that can be used in any IDS or anomaly detection application. Three papers have been published from this project:
4 | * L. Yang, A. Moubayed, I. Hamieh and A. Shami, "[Tree-Based Intelligent Intrusion Detection System in Internet of Vehicles](https://arxiv.org/pdf/1910.08635.pdf)," in 2019 IEEE Global Communications Conference (GLOBECOM), 2019, pp. 1-6, doi: 10.1109/GLOBECOM38437.2019.9013892.
5 | * L. Yang, A. Moubayed, and A. Shami, “[MTH-IDS: A Multi-Tiered Hybrid Intrusion Detection System for Internet of Vehicles](https://arxiv.org/pdf/2105.13289.pdf),” IEEE Internet of Things Journal, vol. 9, no. 1, pp. 616-632, Jan.1, 2022, doi: 10.1109/JIOT.2021.3084796.
6 | * L. Yang, A. Shami, G. Stevens, and S. 
DeRusett, "[LCCDE: A Decision-Based Ensemble Framework for Intrusion Detection in The Internet of Vehicles](https://arxiv.org/pdf/2208.03399.pdf)," in 2022 IEEE Global Communications Conference (GLOBECOM), 2022, pp. 1-6, doi: 10.1109/GLOBECOM48099.2022.10001280.
7 |
8 |
9 | An introduction to the code in this repository is publicly available at:
10 | * L. Yang and A. Shami, "[IDS-ML: An open source code for Intrusion Detection System development using Machine Learning](https://www.sciencedirect.com/science/article/pii/S2665963822001300)," Software Impacts, vol. 14, pp. 1-4, 2022, doi: 10.1016/j.simpa.2022.100446.
11 |
12 | This repository proposes three **intrusion detection systems** by implementing many **machine learning** algorithms, including tree-based algorithms (**decision tree, random forest, XGBoost, LightGBM, CatBoost, etc.**), unsupervised learning algorithms (**k-means**), ensemble learning algorithms (**stacking, the proposed LCCDE**), and hyperparameter optimization techniques (**Bayesian optimization**).
13 |
14 | - Another **intrusion detection system development code** using **convolutional neural networks (CNNs)** and **transfer learning** techniques can be found in: [Intrusion-Detection-System-Using-CNN-and-Transfer-Learning](https://github.com/Western-OC2-Lab/Intrusion-Detection-System-Using-CNN-and-Transfer-Learning)
15 |
16 | - A comprehensive **hyperparameter optimization** tutorial code can be found in: [Hyperparameter-Optimization-of-Machine-Learning-Algorithms](https://github.com/LiYangHart/Hyperparameter-Optimization-of-Machine-Learning-Algorithms)
17 |
18 |
19 | ## Paper Abstracts
20 | ### Paper 1: Tree-Based Intelligent Intrusion Detection System in Internet of Vehicles
21 |   The use of autonomous vehicles (AVs) is a promising technology in Intelligent Transportation Systems (ITSs) to improve safety and driving efficiency. Vehicle-to-everything (V2X) technology enables communication among vehicles and other infrastructures. 
However, AVs and Internet of Vehicles (IoV) are vulnerable to different types of cyber-attacks such as denial of service, spoofing, and sniffing attacks. An intelligent IDS is proposed in this paper for network attack detection that can be applied to not only Controller Area Network (CAN) bus of AVs but also on general IoVs. The proposed IDS utilizes tree-based ML algorithms including decision tree (DT), random forest (RF), extra trees (ET), and Extreme Gradient Boosting (XGBoost). The results from the implementation of the proposed intrusion detection system on standard data sets indicate that the system has the ability to identify various cyber-attacks in the AV networks. Furthermore, the proposed ensemble learning and feature selection approaches enable the proposed system to achieve high detection rate and low computational cost simultaneously. 22 | 23 | **

Figure 1: The overview of the tree-based IDS model.

** 24 |

25 | 26 |

27 | 28 | ### Paper 2: MTH-IDS: A Multi-Tiered Hybrid Intrusion Detection System for Internet of Vehicles 29 |   Modern vehicles, including connected vehicles and autonomous vehicles, nowadays involve many electronic control units connected through intra-vehicle networks to implement various functionalities and perform actions. Modern vehicles are also connected to external networks through vehicle-to-everything technologies, enabling their communications with other vehicles, infrastructures, and smart devices. However, the improving functionality and connectivity of modern vehicles also increase their vulnerabilities to cyber-attacks targeting both intra-vehicle and external networks due to the large attack surfaces. To secure vehicular networks, many researchers have focused on developing intrusion detection systems (IDSs) that capitalize on machine learning methods to detect malicious cyber-attacks. In this paper, the vulnerabilities of intra-vehicle and external networks are discussed, and a multi-tiered hybrid IDS that incorporates a signature-based IDS and an anomaly-based IDS is proposed to detect both known and unknown attacks on vehicular networks. Experimental results illustrate that the proposed system can accurately detect various types of known attacks on the CAN-intrusion-dataset representing the intra-vehicle network data and the CICIDS2017 dataset illustrating the external vehicular network data. 30 |   The proposed MTH-IDS framework consists of two traditional ML stages (data pre-processing and feature engineering) and four tiers of learning models: 31 | 1. Four tree-based supervised learners — decision tree (DT), random forest (RF), extra trees (ET), and extreme gradient boosting (XGBoost) — used as multi-class classifiers for known attack detection; 32 | 2. A stacking ensemble model and a Bayesian optimization with tree Parzen estimator (BO-TPE) method for supervised learner optimization; 33 | 3. 
A cluster labeling (CL) k-means used as an unsupervised learner for zero-day attack detection; 34 | 4. Two biased classifiers and a Bayesian optimization with Gaussian process (BO-GP) method for unsupervised learner optimization. 35 | 36 | **

Figure 2: The overview of the MTH-IDS model.

** 37 |

38 | 39 |

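The cluster labeling (CL) step of tier 3 can be illustrated with a small, dependency-free sketch. It assumes that cluster assignments have already been produced by some k-means implementation and that each cluster takes the majority class of its training samples. The function names, the `"ATTACK"`/`"BENIGN"` label strings, and the rule of flagging unseen clusters as attacks are hypothetical choices made for illustration; the notebook `MTH_IDS_IoTJ.ipynb` remains the reference implementation.

```python
from collections import Counter, defaultdict


def label_clusters(cluster_ids, labels):
    """Map each cluster id to the most common class label among its training samples."""
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, labels):
        members[cid].append(label)
    return {cid: Counter(lbls).most_common(1)[0][0] for cid, lbls in members.items()}


def predict_from_clusters(cluster_ids, cluster_label, default="ATTACK"):
    """Predict a label for each sample from the label of the cluster it falls into.

    `default` is an assumption of this sketch: a sample assigned to a cluster
    that never appeared in training is conservatively flagged as an attack.
    """
    return [cluster_label.get(cid, default) for cid in cluster_ids]


# Toy example: cluster assignments are hard-coded instead of coming from k-means.
train_clusters = [0, 0, 0, 1, 1, 2]
train_labels = ["BENIGN", "BENIGN", "ATTACK", "ATTACK", "ATTACK", "BENIGN"]
cluster_label = label_clusters(train_clusters, train_labels)
predictions = predict_from_clusters([1, 0, 5], cluster_label)
```

In a full pipeline the cluster ids would come from a fitted k-means model (for example, its `predict` output), and the number of clusters would be one of the hyperparameters tuned in tier 4.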
40 | 41 | 42 | ### Paper 3: LCCDE: A Decision-Based Ensemble Framework for Intrusion Detection in The Internet of Vehicles 43 |   Modern vehicles, including autonomous vehicles and connected vehicles, have adopted an increasing variety of functionalities through connections and communications with other vehicles, smart devices, and infrastructures. However, the growing connectivity of the Internet of Vehicles (IoV) also increases the vulnerabilities to network attacks. To protect IoV systems against cyber threats, Intrusion Detection Systems (IDSs) that can identify malicious cyber-attacks have been developed using Machine Learning (ML) approaches. To accurately detect various types of attacks in IoV networks, we propose a novel ensemble IDS framework named Leader Class and Confidence Decision Ensemble (LCCDE). It is constructed by determining the best-performing ML model among three advanced ML algorithms (XGBoost, LightGBM, and CatBoost) for every class or type of attack. The class leader models with their prediction confidence values are then utilized to make accurate decisions regarding the detection of various types of cyber-attacks. Experiments on two public IoV security datasets (Car-Hacking and CICIDS2017 datasets) demonstrate the effectiveness of the proposed LCCDE for intrusion detection on both intra-vehicle and external networks. 44 | 45 | **

Figure 3: The overview of the LCCDE IDS model.

** 46 |

47 | 48 |

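The leader-class idea in the abstract above can be sketched in plain Python. This is a simplified reading, not the authors' exact procedure: it assumes per-class F1-scores for each base model are already available, picks the best model per class as that class's leader, and then combines one sample's predictions using agreement, leadership, and confidence. The model names, the helper names, and the tie-breaking rules are assumptions made for illustration; `LCCDE_IDS_GlobeCom22.ipynb` is the authoritative implementation.

```python
from collections import Counter


def pick_leaders(f1_by_model):
    """Return, for each class index, the name of the model with the highest F1-score.

    f1_by_model maps a model name to a list of per-class F1-scores.
    """
    n_classes = len(next(iter(f1_by_model.values())))
    return [max(f1_by_model, key=lambda m: f1_by_model[m][c]) for c in range(n_classes)]


def lccde_decide(preds, confs, leaders):
    """Combine one sample's predictions (assumed decision rules, see lead-in).

    preds maps model name -> predicted class index.
    confs maps model name -> confidence of that prediction.
    """
    votes = Counter(preds.values())
    top_class, top_count = votes.most_common(1)[0]
    if top_count == len(preds):
        return top_class  # all models agree
    if top_count > 1:
        # A majority exists: defer to the leader model of the majority class.
        return preds[leaders[top_class]]
    # All models disagree: prefer models that lead the class they predicted,
    # then break ties by prediction confidence.
    trusted = [m for m in preds if leaders[preds[m]] == m]
    candidates = trusted or list(preds)
    best = max(candidates, key=lambda m: confs[m])
    return preds[best]


# Hypothetical per-class F1-scores for three base learners and three classes.
f1_scores = {
    "xgboost": [0.99, 0.80, 0.70],
    "lightgbm": [0.98, 0.95, 0.60],
    "catboost": [0.97, 0.90, 0.85],
}
leaders = pick_leaders(f1_scores)

confidences = {"xgboost": 0.9, "lightgbm": 0.8, "catboost": 0.6}
unanimous = lccde_decide({"xgboost": 0, "lightgbm": 0, "catboost": 0}, confidences, leaders)
majority = lccde_decide({"xgboost": 1, "lightgbm": 2, "catboost": 1}, confidences, leaders)
split = lccde_decide({"xgboost": 1, "lightgbm": 0, "catboost": 2}, confidences, leaders)
```

In the `majority` case two models vote for class 1, but the leader of class 1 (`lightgbm` in this toy data) predicts class 2, so the sketch follows the leader rather than the vote count. That is the main design difference from plain majority voting.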
49 | 50 | 51 | ## Implementation 52 | ### Dataset 53 | CICIDS2017 dataset, a popular network traffic dataset for intrusion detection problems 54 | * Publicly available at: https://www.unb.ca/cic/datasets/ids-2017.html 55 | * To keep the experimental results displayable in Jupyter Notebook, sampled subsets of CICIDS2017 are used in the sample code. The subsets are in the "data" folder. 56 | 57 | CAN-intrusion dataset, a benchmark network security dataset for intra-vehicle intrusion detection 58 | * Publicly available at: https://ocslab.hksecurity.net/Datasets/CAN-intrusion-dataset 59 | * Can be processed using the same code 60 | 61 | ### Code 62 | * [Tree-based_IDS_GlobeCom19.ipynb](https://github.com/Western-OC2-Lab/Intrusion-Detection-System-Using-Machine-Learning/blob/main/Tree-based_IDS_GlobeCom19.ipynb): code for the paper "Tree-Based Intelligent Intrusion Detection System in Internet of Vehicles" 63 | * [MTH_IDS_IoTJ.ipynb](https://github.com/Western-OC2-Lab/Intrusion-Detection-System-Using-Machine-Learning/blob/main/MTH_IDS_IoTJ.ipynb): code for the paper "MTH-IDS: A Multi-Tiered Hybrid Intrusion Detection System for Internet of Vehicles" 64 | * [LCCDE_IDS_GlobeCom22.ipynb](https://github.com/Western-OC2-Lab/Intrusion-Detection-System-Using-Machine-Learning/blob/main/LCCDE_IDS_GlobeCom22.ipynb): code for the paper "LCCDE: A Decision-Based Ensemble Framework for Intrusion Detection in The Internet of Vehicles" 65 | 66 | ### Machine Learning Algorithms 67 | * Decision tree (DT) 68 | * Random forest (RF) 69 | * Extra trees (ET) 70 | * XGBoost 71 | * LightGBM 72 | * CatBoost 73 | * Stacking 74 | * K-means 75 | 76 | ### Hyperparameter Optimization Methods 77 | * Bayesian Optimization with Gaussian Processes (BO-GP) 78 | * Bayesian Optimization with Tree-structured Parzen Estimator (BO-TPE) 79 | 80 | If you are interested in hyperparameter tuning of machine learning algorithms, please see the code in the following link: 81 | 
https://github.com/LiYangHart/Hyperparameter-Optimization-of-Machine-Learning-Algorithms 82 | 83 | ### Requirements & Libraries 84 | * Python 3.6+ 85 | * [scikit-learn](https://scikit-learn.org/stable/) 86 | * [XGBoost](https://xgboost.readthedocs.io/en/latest/python/python_intro.html) 87 | * [LightGBM](https://lightgbm.readthedocs.io/en/v3.3.2/Python-Intro.html) 88 | * [CatBoost](https://catboost.ai/) 89 | * [FCBF](https://github.com/SantiagoEG/FCBF_module) 90 | * [scikit-optimize](https://github.com/scikit-optimize/scikit-optimize) 91 | * [hyperopt](https://github.com/hyperopt/hyperopt) 92 | * [River](https://riverml.xyz/dev/) 93 | 94 | ## Contact-Info 95 | Please feel free to contact us with any questions or cooperation opportunities. We will be happy to help. 96 | * Email: [liyanghart@gmail.com](mailto:liyanghart@gmail.com) 97 | * GitHub: [LiYangHart](https://github.com/LiYangHart) and [Western OC2 Lab](https://github.com/Western-OC2-Lab/) 98 | * LinkedIn: [Li Yang](https://www.linkedin.com/in/li-yang-phd-65a190176/) 99 | * Google Scholar: [Li Yang](https://scholar.google.com.eg/citations?user=XEfM7bIAAAAJ&hl=en) and [OC2 Lab](https://scholar.google.com.eg/citations?user=oiebNboAAAAJ&hl=en) 100 | 101 | ## Citation 102 | If you find this repository useful in your research, please cite one of the following three articles: 103 | 104 | L. Yang, A. Moubayed, I. Hamieh and A. Shami, "Tree-Based Intelligent Intrusion Detection System in Internet of Vehicles," 2019 IEEE Global Communications Conference (GLOBECOM), 2019, pp. 1-6, doi: 10.1109/GLOBECOM38437.2019.9013892. 
105 | ``` 106 | @INPROCEEDINGS{9013892, 107 | author={Yang, Li and Moubayed, Abdallah and Hamieh, Ismail and Shami, Abdallah}, 108 | booktitle={2019 IEEE Global Communications Conference (GLOBECOM)}, 109 | title={Tree-Based Intelligent Intrusion Detection System in Internet of Vehicles}, 110 | year={2019}, 111 | pages={1-6}, 112 | doi={10.1109/GLOBECOM38437.2019.9013892} 113 | } 114 | ``` 115 | 116 | L. Yang, A. Moubayed, and A. Shami, “MTH-IDS: A Multi-Tiered Hybrid Intrusion Detection System for Internet of Vehicles,” IEEE Internet of Things Journal, vol. 9, no. 1, pp. 616-632, Jan.1, 2022, doi: 10.1109/JIOT.2021.3084796. 117 | ``` 118 | @ARTICLE{9443234, 119 | author={Yang, Li and Moubayed, Abdallah and Shami, Abdallah}, 120 | journal={IEEE Internet of Things Journal}, 121 | title={MTH-IDS: A Multitiered Hybrid Intrusion Detection System for Internet of Vehicles}, 122 | year={2022}, 123 | volume={9}, 124 | number={1}, 125 | pages={616-632}, 126 | doi={10.1109/JIOT.2021.3084796}} 127 | ``` 128 | 129 | L. Yang, A. Shami, G. Stevens, and S. DeRusett, “LCCDE: A Decision-Based Ensemble Framework for Intrusion Detection in The Internet of Vehicles," in 2022 IEEE Global Communications Conference (GLOBECOM), 2022, pp. 1-6, doi: 10.1109/GLOBECOM48099.2022.10001280. 
130 | ``` 131 | @INPROCEEDINGS{10001280, 132 | author={Yang, Li and Shami, Abdallah and Stevens, Gary and de Rusett, Stephen}, 133 | booktitle={GLOBECOM 2022 - 2022 IEEE Global Communications Conference}, 134 | title={LCCDE: A Decision-Based Ensemble Framework for Intrusion Detection in The Internet of Vehicles}, 135 | year={2022}, 136 | pages={3545-3550}, 137 | doi={10.1109/GLOBECOM48099.2022.10001280}} 138 | ``` 139 | -------------------------------------------------------------------------------- /Tree-based_IDS_paper.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Western-OC2-Lab/Intrusion-Detection-System-Using-Machine-Learning/1b8b79d1711a97fee33d83639b3ae4e19f980ba6/Tree-based_IDS_paper.pdf -------------------------------------------------------------------------------- /data/README.md: -------------------------------------------------------------------------------- 1 | # The sampled datasets used for the experiments in the sample code 2 | 3 | **CICIDS2017_sample.csv**: The randomly sampled subset of CICIDS2017 4 | **CICIDS2017_sample_km.csv**: The subset of CICIDS2017 sampled by k-means clustering 5 | --------------------------------------------------------------------------------