├── README.md ├── LSTM_inDeep_binary_class.ipynb ├── MLP_inDeep_binary_class.ipynb ├── InDeep_binary_class.ipynb ├── inDeep_Feature_study.ipynb └── RFE_feature_selection.ipynb /README.md: -------------------------------------------------------------------------------- 1 | # CNN_UNSW-NB15 2 | DESIGN AND EVALUATION OF CONVOLUTIONAL NEURAL NETWORKS FOR AN INTRUSION DETECTION SYSTEM. 3 | ## Description 🚀 4 | Year after year, communication between devices, people, corporations, governments, and any other entity that relies on modern communication mechanisms has grown exponentially. Staying connected in a globalized world is indispensable. As a result, protecting communications against the continuous growth of cyberattacks matters more and more. These attacks improve day by day, grow stronger, and become harder for cybersecurity equipment to detect. Among that equipment, the intrusion detection system (IDS) is the focus of this work, as it is a fundamental tool for detecting network intrusions. Despite its many benefits, the IDS has shortcomings, most notably the high rate of false positives it generates. Other mechanisms are therefore needed to mitigate this shortcoming. This study uses a convolutional neural network as an IDS, with the aim of lowering the false positive rate that traditional IDSs have long suffered from. The network is also compared against two other neural networks: an MLP and an LSTM. In addition, several techniques are used to preprocess the dataset (UNSW-NB15) used to train the convolutional neural network. 
The results of this study show a notable improvement in the false positive rate compared with other networks and previous studies, an improvement in multiclass classification, and good training efficiency for the network. 5 | 6 | ### Prerequisites 📋 7 | 8 | Upload the following files to Google Drive (the notebooks read them from the `My Drive/TFG/` folder): 9 | 10 | * [UNSW_NB15_training-set.csv](https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/) 11 | * [UNSW_NB15_testing-set.csv](https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/) 12 | 13 | ### Installation 🔧 14 | 15 | _The Google Drive account must be linked to Google Colab_ 16 | 17 | ``` 18 | from google.colab import drive 19 | drive.mount('/content/drive') 20 | ``` 21 | 22 | ## Built with 🛠️ 23 | 24 | * [Google Colab](https://colab.research.google.com/) - Browser-based environment for writing and running Python. 25 | 26 | 27 | 28 | -------------------------------------------------------------------------------- /LSTM_inDeep_binary_class.ipynb: -------------------------------------------------------------------------------- 1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"LSTM_inDeep_binary_class.ipynb","provenance":[],"collapsed_sections":[],"authorship_tag":"ABX9TyMzr38kJvukSYgEd4hqWT2I"},"kernelspec":{"name":"python3","display_name":"Python 3"}},"cells":[{"cell_type":"code","metadata":{"id":"oHUbEfIrroJU","colab_type":"code","colab":{}},"source":["#Use the second version of TensorFlow\n","%tensorflow_version 2.x"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"OniWnG5-rui7","colab_type":"code","colab":{}},"source":["#Install the dependencies that Google Colab does not provide.\n","!pip install bayesian-optimization\n","!pip install mlxtend --upgrade --no-deps"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"WXLBhGDEruk9","colab_type":"code","colab":{}},"source":["#Import the 
packages needed for the project\n","import pandas as pd\n","import numpy as np\n","import matplotlib.pyplot as plt\n","from tensorflow.keras.utils import get_file\n","from sklearn import preprocessing\n","from sklearn.preprocessing import LabelEncoder\n","from PIL import Image\n","from sklearn.model_selection import StratifiedShuffleSplit\n","from sklearn import metrics\n","from tensorflow.keras import backend as K\n","from tensorflow.keras.utils import get_custom_objects\n","from bayes_opt import BayesianOptimization\n","from sklearn.model_selection import StratifiedKFold\n","from mlxtend.plotting import plot_confusion_matrix\n","from sklearn.model_selection import train_test_split\n","from imblearn.over_sampling import SMOTE\n","\n","\n","from tensorflow.keras.models import Sequential,Model\n","from tensorflow.keras.layers import Dense, Conv2D,LSTM, MaxPooling2D,UpSampling2D,Conv2DTranspose, Dropout, Flatten, Activation, LeakyReLU, ReLU, Input,concatenate,BatchNormalization\n","from tensorflow.keras.optimizers import Adam\n","from tensorflow.keras.regularizers import l1\n","from tensorflow.keras.callbacks import EarlyStopping"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"h2k63oqrrum_","colab_type":"code","colab":{}},"source":["#Load the dataset previously uploaded to Google Drive (faster than downloading it directly from the website).\n","path_train = \"/content/drive/My Drive/TFG/UNSW_NB15_training-set.csv\"\n","path_test = \"/content/drive/My Drive/TFG/UNSW_NB15_testing-set.csv\"\n","\n","#Read the data\n","df_train=pd.read_csv(path_train,dtype='unicode')\n","df_test=pd.read_csv(path_test,dtype='unicode')\n","\n","#Drop the columns that are not needed\n","df_train.drop('id', axis=1, inplace=True)\n","df_train.drop('attack_cat', axis=1, inplace=True)\n","df_test.drop('id', axis=1, inplace=True)\n","df_test.drop('attack_cat', axis=1, 
inplace=True)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"WBJfZogHruo9","colab_type":"code","colab":{}},"source":["#Concatenate the training and test sets so the whole dataset can be processed at once.\n","df = pd.concat([df_train,df_test],axis=0)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"oqwvw6OEruqw","colab_type":"code","colab":{}},"source":["#Split the dataset by class.\n","df_class_Normal = df[df['label'] == '0']\n","df_class_Attack = df[df['label'] == '1']\n","\n","#Under-sample the attack class until it matches the size of the normal class\n","df_class_Attack = df_class_Attack.sample(df_class_Normal.shape[0])\n","\n","#Concatenate both classes\n","df = pd.concat([df_class_Normal,df_class_Attack], axis=0)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"HbArosMZrutK","colab_type":"code","colab":{}},"source":["#Pie chart showing the share of each class\n","print(\"Shape of dataFrame: {} \\n\".format(df.shape))\n","print(\"Number of attack samples\")\n","display(df['label'].value_counts())\n","print(\"\")\n","print(\"Plotting balance of dataFrame\")\n","df_plot = (df['label'].value_counts(normalize=True) *100)\n","df_plot.plot(kind='pie',figsize=(10,10),title='Balance of dataset (%)')"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"9VTk_5g7ruwh","colab_type":"code","colab":{}},"source":["#Convert string data to integers in the range 0 to 255 (1 byte of information)\n","def encode_string_byte (df,name):\n"," df[name] = LabelEncoder().fit_transform(df[name])\n","\n","encode_string_byte (df,'proto')\n","encode_string_byte (df,'state') \n","encode_string_byte (df,'service') "],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"SetOj5kjru0J","colab_type":"code","colab":{}},"source":["#Normalize the integer values to decimal values in the range 0 to 1\n","def 
numerical_minmax_normalization (df, name):\n"," x = df[name].values.reshape(-1,1)\n"," min_max_scaler = preprocessing.MinMaxScaler()\n"," x_scaled = min_max_scaler.fit_transform(x)\n"," df[name] = x_scaled\n","\n","numerical_minmax_normalization(df,'dur')\n","numerical_minmax_normalization(df,'spkts')\n","numerical_minmax_normalization(df,'dpkts')\n","numerical_minmax_normalization(df,'sbytes')\n","numerical_minmax_normalization(df,'dbytes')\n","numerical_minmax_normalization(df,'rate')\n","numerical_minmax_normalization(df,'sttl')\n","numerical_minmax_normalization(df,'dttl')\n","numerical_minmax_normalization(df,'sload')\n","numerical_minmax_normalization(df,'dload')\n","numerical_minmax_normalization(df,'sloss')\n","numerical_minmax_normalization(df,'dloss')\n","numerical_minmax_normalization(df,'sinpkt')\n","numerical_minmax_normalization(df,'dinpkt')\n","numerical_minmax_normalization(df,'sjit')\n","numerical_minmax_normalization(df,'djit')\n","numerical_minmax_normalization(df,'swin')\n","numerical_minmax_normalization(df,'stcpb')\n","numerical_minmax_normalization(df,'dtcpb')\n","numerical_minmax_normalization(df,'dwin')\n","numerical_minmax_normalization(df,'tcprtt')\n","numerical_minmax_normalization(df,'synack')\n","numerical_minmax_normalization(df,'ackdat')\n","numerical_minmax_normalization(df,'smean')\n","numerical_minmax_normalization(df,'dmean')\n","numerical_minmax_normalization(df,'trans_depth')\n","numerical_minmax_normalization(df,'response_body_len')\n","numerical_minmax_normalization(df,'ct_srv_src')\n","numerical_minmax_normalization(df,'ct_state_ttl')\n","numerical_minmax_normalization(df,'ct_dst_ltm')\n","numerical_minmax_normalization(df,'ct_src_dport_ltm')\n","numerical_minmax_normalization(df,'ct_dst_sport_ltm')\n","numerical_minmax_normalization(df,'ct_dst_src_ltm')\n","numerical_minmax_normalization(df,'is_ftp_login')\n","numerical_minmax_normalization(df,'ct_ftp_cmd')\n","numerical_minmax_normalization(df,'ct_flw_http_mthd')\n","numeri
cal_minmax_normalization(df,'ct_src_ltm')\n","numerical_minmax_normalization(df,'ct_srv_dst')\n","numerical_minmax_normalization(df,'is_sm_ips_ports')"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"6ScZjFeAru2E","colab_type":"code","colab":{}},"source":["#Map the normalized values from the previous step to integers between 0 and 255 (1 byte of information)\n","def numerical_split_ohe (df,name):\n","    pd_to_np = df[name].tolist()\n","    np_split = []\n"," \n","    categories = np.linspace(0, 1, num=256,endpoint=False)\n","    quantization = range(0,256)\n","\n","    for value in pd_to_np:\n","        for i in range(len(categories)-1):\n","            if (categories[i] <= float(value) <= categories[i+1]):\n","                np_split.append(quantization[i])\n","                break\n","            if (float(value) > categories[-1]):\n","                np_split.append(quantization[-1])\n","                break\n"," \n","    df[name] = np_split\n","\n","\n","numerical_split_ohe(df,'dur')\n","numerical_split_ohe(df,'spkts')\n","numerical_split_ohe(df,'dpkts')\n","numerical_split_ohe(df,'sbytes')\n","numerical_split_ohe(df,'dbytes')\n","numerical_split_ohe(df,'rate')\n","numerical_split_ohe(df,'sttl')\n","numerical_split_ohe(df,'dttl')\n","numerical_split_ohe(df,'sload')\n","numerical_split_ohe(df,'dload')\n","numerical_split_ohe(df,'sloss')\n","numerical_split_ohe(df,'dloss')\n","numerical_split_ohe(df,'sinpkt')\n","numerical_split_ohe(df,'dinpkt')\n","numerical_split_ohe(df,'sjit')\n","numerical_split_ohe(df,'djit')\n","numerical_split_ohe(df,'swin')\n","numerical_split_ohe(df,'stcpb')\n","numerical_split_ohe(df,'dtcpb')\n","numerical_split_ohe(df,'dwin')\n","numerical_split_ohe(df,'tcprtt')\n","numerical_split_ohe(df,'synack')\n","numerical_split_ohe(df,'ackdat')\n","numerical_split_ohe(df,'smean')\n","numerical_split_ohe(df,'dmean')\n","numerical_split_ohe(df,'trans_depth')\n","numerical_split_ohe(df,'response_body_len')\n","numerical_split_ohe(df,'ct_srv_src')\n","numerical_split_ohe(df,'ct_state_ttl')\n","numerical_split_ohe(df,'ct_d
st_ltm')\n","numerical_split_ohe(df,'ct_src_dport_ltm')\n","numerical_split_ohe(df,'ct_dst_sport_ltm')\n","numerical_split_ohe(df,'ct_dst_src_ltm')\n","numerical_split_ohe(df,'is_ftp_login')\n","numerical_split_ohe(df,'ct_ftp_cmd')\n","numerical_split_ohe(df,'ct_flw_http_mthd')\n","numerical_split_ohe(df,'ct_src_ltm')\n","numerical_split_ohe(df,'ct_srv_dst')\n","numerical_split_ohe(df,'is_sm_ips_ports')"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"SC83D1aGru4M","colab_type":"code","colab":{}},"source":["#Remove the label column and store it, one-hot encoded, in the variable y.\n","y_column = df['label']\n","df.drop('label',axis=1,inplace=True)\n","dummies = pd.get_dummies(y_column) \n","y = dummies.values"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"MQMi0pQmru6l","colab_type":"code","colab":{}},"source":["#Scale the values to the range -0.5 to 0.5\n","x = []\n","for image in np.array(df.to_numpy()):\n"," x.append((image/255 - 0.5))\n","x = np.array(x)\n"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"XxIke5Xpru83","colab_type":"code","colab":{}},"source":["#Split the dataset into a training set and a validation set.\n","sss = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=42)\n","\n","for train_index, test_index in sss.split(x,y):\n"," x_train, x_test = x[train_index], x[test_index]\n"," y_train, y_test = y[train_index], y[test_index]"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"k4J7O6pmru_J","colab_type":"code","colab":{}},"source":["#Definition of the final model.\n","def LSTM_model():\n","\n"," input_img = Input(shape = (None, 1))\n"," output = LSTM(128,activation='relu',activity_regularizer=l1(1e-5))(input_img)\n"," output = Dense(128, activation='relu',activity_regularizer=l1(1e-5))(output)\n"," output = Dense(64, activation='relu',activity_regularizer=l1(1e-5))(output)\n"," out = Dense(2, activation='sigmoid')(output)\n","\n"," 
model = Model(inputs = input_img, outputs = out)\n"," model.compile(optimizer='adam', loss=\"binary_crossentropy\", metrics=[\"accuracy\"])\n","\n"," return model"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"AZJ4hQ4BrvBn","colab_type":"code","colab":{}},"source":["#Training process\n","model = LSTM_model()\n","es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, min_delta=0.01, patience=10, restore_best_weights=True)\n","history = model.fit(x_train,y_train,validation_data=(x_test,y_test), verbose=1, batch_size=256, epochs=200, callbacks=[es])"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"Jy9SpA_prvEW","colab_type":"code","colab":{}},"source":["#Predictions of the trained network on the full dataset, measured with accuracy, precision, recall and F1.\n","y_pred = model.predict(x)\n","y_pred = np.argmax(y_pred,axis=1) \n","y_true = np.argmax(y,axis=1)\n","\n","print(\"Accuracy: {}\" .format(metrics.accuracy_score(y_true, y_pred)))\n","print(\"Precision: {}\" .format(metrics.precision_score(y_true, y_pred, average='macro')))\n","print(\"Recall: {}\" .format(metrics.recall_score(y_true, y_pred, average='macro')))\n","print(\"F1: {}\" .format(metrics.f1_score(y_true, y_pred, average='macro')))"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"Wx05OXJ3sYGT","colab_type":"code","colab":{}},"source":["#Function to plot the confusion matrix.\n","def plot_confusing_matrix (y_compare,pred,n_categories,outcome_labels):\n","\n"," cm = metrics.confusion_matrix(y_compare, pred, labels = list(range(n_categories)))\n"," plot_confusion_matrix(conf_mat=cm,figsize=(13,13),class_names = outcome_labels,show_normed=True)\n"," plt.title('Confusion Matrix')\n"," plt.ylabel('Target')\n"," plt.xlabel('Predicted')\n"," plt.show()"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"Vlg-OAdbsYLp","colab_type":"code","colab":{}},"source":["#Confusion matrix.\n","outcome_labels = 
[\"Normal\",\"Attack\"]\n","plot_confusing_matrix(y_true,y_pred,2,outcome_labels)"],"execution_count":0,"outputs":[]}]} -------------------------------------------------------------------------------- /MLP_inDeep_binary_class.ipynb: -------------------------------------------------------------------------------- 1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"MLP_inDeep_binary_class.ipynb","provenance":[],"collapsed_sections":[],"mount_file_id":"1L7TurSo8Ah0Kun02XoC_cZRfbjgr7WQ9","authorship_tag":"ABX9TyPWhoA7VnaDeOCG5Uc6oIIK"},"kernelspec":{"name":"python3","display_name":"Python 3"},"accelerator":"GPU"},"cells":[{"cell_type":"code","metadata":{"id":"O1I89gm7yEwC","colab_type":"code","colab":{}},"source":["#Use the second version of TensorFlow\n","\n","%tensorflow_version 2.x"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"F6saNUBLyND5","colab_type":"code","colab":{}},"source":["#Install the dependencies that Google Colab does not provide.\n","!pip install bayesian-optimization\n","!pip install mlxtend --upgrade --no-deps"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"_P9Ro5MnyNF-","colab_type":"code","colab":{}},"source":["#Import the packages needed for the project\n","import pandas as pd\n","import numpy as np\n","import matplotlib.pyplot as plt\n","from tensorflow.keras.utils import get_file\n","from sklearn import preprocessing\n","from sklearn.preprocessing import LabelEncoder\n","from PIL import Image\n","from sklearn.model_selection import StratifiedShuffleSplit\n","from sklearn import metrics\n","from tensorflow.keras import backend as K\n","from tensorflow.keras.utils import get_custom_objects\n","from bayes_opt import BayesianOptimization\n","from sklearn.model_selection import StratifiedKFold\n","from mlxtend.plotting import plot_confusion_matrix\n","from sklearn.model_selection import train_test_split\n","from imblearn.over_sampling import SMOTE\n","\n","\n","from 
tensorflow.keras.models import Sequential,Model\n","from tensorflow.keras.layers import Dense, Conv2D,LSTM, MaxPooling2D,UpSampling2D,Conv2DTranspose, Dropout, Flatten, Activation, LeakyReLU, ReLU, Input,concatenate,BatchNormalization\n","from tensorflow.keras.optimizers import Adam\n","from tensorflow.keras.regularizers import l1\n","from tensorflow.keras.callbacks import EarlyStopping"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"NHaznnfOyNH5","colab_type":"code","colab":{}},"source":["#Load the dataset previously uploaded to Google Drive (faster than downloading it directly from the website).\n","path_train = \"/content/drive/My Drive/TFG/UNSW_NB15_training-set.csv\"\n","path_test = \"/content/drive/My Drive/TFG/UNSW_NB15_testing-set.csv\"\n","\n","#Read the data\n","df_train=pd.read_csv(path_train,dtype='unicode')\n","df_test=pd.read_csv(path_test,dtype='unicode')\n","\n","#Drop the columns that are not needed\n","df_train.drop('id', axis=1, inplace=True)\n","df_train.drop('attack_cat', axis=1, inplace=True)\n","df_test.drop('id', axis=1, inplace=True)\n","df_test.drop('attack_cat', axis=1, inplace=True)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"T7FuJYI9yNJw","colab_type":"code","colab":{}},"source":["#Concatenate the training and test sets so the whole dataset can be processed at once.\n","df = pd.concat([df_train,df_test],axis=0)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"OM78CLysy1kN","colab_type":"code","colab":{}},"source":["#Split the dataset by class.\n","df_class_Normal = df[df['label'] == '0']\n","df_class_Attack = df[df['label'] == '1']\n","\n","#Under-sample the attack class until it matches the size of the normal class\n","df_class_Attack = df_class_Attack.sample(df_class_Normal.shape[0])\n","\n","#Concatenate both classes\n","df = pd.concat([df_class_Normal,df_class_Attack], 
axis=0)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"FhOiR9yOyNL2","colab_type":"code","colab":{}},"source":["#Pie chart showing the share of each class\n","print(\"Shape of dataFrame: {} \\n\".format(df.shape))\n","print(\"Number of attack samples\")\n","display(df['label'].value_counts())\n","print(\"\")\n","print(\"Plotting balance of dataFrame\")\n","df_plot = (df['label'].value_counts(normalize=True) *100)\n","df_plot.plot(kind='pie',figsize=(10,10),title='Balance of dataset (%)')"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"WDKVmdvByNNx","colab_type":"code","colab":{}},"source":["#Convert string data to integers in the range 0 to 255 (1 byte of information)\n","def encode_string_byte (df,name):\n"," df[name] = LabelEncoder().fit_transform(df[name])\n"," #df[name] = [(x).to_bytes(1,byteorder='big') for x in df[name]]\n","\n","encode_string_byte (df,'proto')\n","encode_string_byte (df,'state') \n","encode_string_byte (df,'service') "],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"3uLeix38yNQG","colab_type":"code","colab":{}},"source":["#Normalize the integer values to decimal values in the range 0 to 1\n","def numerical_minmax_normalization (df, name):\n"," x = df[name].values.reshape(-1,1)\n"," min_max_scaler = preprocessing.MinMaxScaler()\n"," x_scaled = min_max_scaler.fit_transform(x)\n"," df[name] = 
x_scaled\n","\n","numerical_minmax_normalization(df,'dur')\n","numerical_minmax_normalization(df,'spkts')\n","numerical_minmax_normalization(df,'dpkts')\n","numerical_minmax_normalization(df,'sbytes')\n","numerical_minmax_normalization(df,'dbytes')\n","numerical_minmax_normalization(df,'rate')\n","numerical_minmax_normalization(df,'sttl')\n","numerical_minmax_normalization(df,'dttl')\n","numerical_minmax_normalization(df,'sload')\n","numerical_minmax_normalization(df,'dload')\n","numerical_minmax_normalization(df,'sloss')\n","numerical_minmax_normalization(df,'dloss')\n","numerical_minmax_normalization(df,'sinpkt')\n","numerical_minmax_normalization(df,'dinpkt')\n","numerical_minmax_normalization(df,'sjit')\n","numerical_minmax_normalization(df,'djit')\n","numerical_minmax_normalization(df,'swin')\n","numerical_minmax_normalization(df,'stcpb')\n","numerical_minmax_normalization(df,'dtcpb')\n","numerical_minmax_normalization(df,'dwin')\n","numerical_minmax_normalization(df,'tcprtt')\n","numerical_minmax_normalization(df,'synack')\n","numerical_minmax_normalization(df,'ackdat')\n","numerical_minmax_normalization(df,'smean')\n","numerical_minmax_normalization(df,'dmean')\n","numerical_minmax_normalization(df,'trans_depth')\n","numerical_minmax_normalization(df,'response_body_len')\n","numerical_minmax_normalization(df,'ct_srv_src')\n","numerical_minmax_normalization(df,'ct_state_ttl')\n","numerical_minmax_normalization(df,'ct_dst_ltm')\n","numerical_minmax_normalization(df,'ct_src_dport_ltm')\n","numerical_minmax_normalization(df,'ct_dst_sport_ltm')\n","numerical_minmax_normalization(df,'ct_dst_src_ltm')\n","numerical_minmax_normalization(df,'is_ftp_login')\n","numerical_minmax_normalization(df,'ct_ftp_cmd')\n","numerical_minmax_normalization(df,'ct_flw_http_mthd')\n","numerical_minmax_normalization(df,'ct_src_ltm')\n","numerical_minmax_normalization(df,'ct_srv_dst')\n","numerical_minmax_normalization(df,'is_sm_ips_ports')"],"execution_count":0,"outputs":[]},{"cell_typ
e":"code","metadata":{"id":"0oPb1jXuyNT3","colab_type":"code","colab":{}},"source":["#Map the normalized values from the previous step to integers between 0 and 255 (1 byte of information)\n","def numerical_split_ohe (df,name):\n","    pd_to_np = df[name].tolist()\n","    np_split = []\n"," \n","    categories = np.linspace(0, 1, num=256,endpoint=False)\n","    quantization = range(0,256)\n","\n","    for value in pd_to_np:\n","        for i in range(len(categories)-1):\n","            if (categories[i] <= float(value) <= categories[i+1]):\n","                np_split.append(quantization[i])\n","                break\n","            if (float(value) > categories[-1]):\n","                np_split.append(quantization[-1])\n","                break\n"," \n","    df[name] = np_split\n","\n","\n","numerical_split_ohe(df,'dur')\n","numerical_split_ohe(df,'spkts')\n","numerical_split_ohe(df,'dpkts')\n","numerical_split_ohe(df,'sbytes')\n","numerical_split_ohe(df,'dbytes')\n","numerical_split_ohe(df,'rate')\n","numerical_split_ohe(df,'sttl')\n","numerical_split_ohe(df,'dttl')\n","numerical_split_ohe(df,'sload')\n","numerical_split_ohe(df,'dload')\n","numerical_split_ohe(df,'sloss')\n","numerical_split_ohe(df,'dloss')\n","numerical_split_ohe(df,'sinpkt')\n","numerical_split_ohe(df,'dinpkt')\n","numerical_split_ohe(df,'sjit')\n","numerical_split_ohe(df,'djit')\n","numerical_split_ohe(df,'swin')\n","numerical_split_ohe(df,'stcpb')\n","numerical_split_ohe(df,'dtcpb')\n","numerical_split_ohe(df,'dwin')\n","numerical_split_ohe(df,'tcprtt')\n","numerical_split_ohe(df,'synack')\n","numerical_split_ohe(df,'ackdat')\n","numerical_split_ohe(df,'smean')\n","numerical_split_ohe(df,'dmean')\n","numerical_split_ohe(df,'trans_depth')\n","numerical_split_ohe(df,'response_body_len')\n","numerical_split_ohe(df,'ct_srv_src')\n","numerical_split_ohe(df,'ct_state_ttl')\n","numerical_split_ohe(df,'ct_dst_ltm')\n","numerical_split_ohe(df,'ct_src_dport_ltm')\n","numerical_split_ohe(df,'ct_dst_sport_ltm')\n","numerical_split_ohe(df,'ct_dst_src_ltm')\n","numerical_split_ohe(df,'is_ftp_login')\n","nu
merical_split_ohe(df,'ct_ftp_cmd')\n","numerical_split_ohe(df,'ct_flw_http_mthd')\n","numerical_split_ohe(df,'ct_src_ltm')\n","numerical_split_ohe(df,'ct_srv_dst')\n","numerical_split_ohe(df,'is_sm_ips_ports')"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"dSkLL7xkyNV_","colab_type":"code","colab":{}},"source":["#Remove the label column and store it, one-hot encoded, in the variable y.\n","y_column = df['label']\n","df.drop('label',axis=1,inplace=True)\n","dummies = pd.get_dummies(y_column) \n","y = dummies.values"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"WqSHxrDMzcBR","colab_type":"code","colab":{}},"source":["#Scale the values to the range -0.5 to 0.5\n","x = []\n","for image in np.array(df.to_numpy()):\n"," x.append((image/255 - 0.5))\n","x = np.array(x)\n"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"At6RHtetyNYJ","colab_type":"code","colab":{}},"source":["#Split the dataset into a training set and a validation set.\n","sss = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=42)\n","\n","for train_index, test_index in sss.split(x,y):\n"," x_train, x_test = x[train_index], x[test_index]\n"," y_train, y_test = y[train_index], y[test_index]"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"9wRKt9kcyNad","colab_type":"code","colab":{}},"source":["#Definition of the final model.\n","def MLP_model():\n","\n"," input_img = Input(shape = (42,))\n"," output = Dense(42, activation= 'relu')(input_img)\n"," output = Dense(512, activation= 'relu')(output)\n"," output = Dense(256, activation= 'relu')(output)\n"," output = Dense(128, activation= 'relu')(output)\n"," out = Dense(2, activation='sigmoid')(output)\n","\n"," model = Model(inputs = input_img, outputs = out)\n"," model.compile(optimizer=Adam(learning_rate=0.001), loss=\"binary_crossentropy\", metrics=[\"accuracy\"])\n","\n"," return 
model"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"Bz9Lkh--zFSU","colab_type":"code","colab":{}},"source":["#Training process\n","model = MLP_model()\n","es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, min_delta=0.01, patience=10, restore_best_weights=True)\n","history = model.fit(x_train,y_train,validation_data=(x_test,y_test), verbose=1, batch_size=512, epochs=200, callbacks=[es])"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"LX6DV17_zRtT","colab_type":"code","colab":{}},"source":["#Predictions of the trained network on the full dataset, measured with accuracy, precision, recall and F1.\n","y_pred = model.predict(x)\n","y_pred = np.argmax(y_pred,axis=1) \n","y_true = np.argmax(y,axis=1)\n","\n","print(\"Accuracy: {}\" .format(metrics.accuracy_score(y_true, y_pred)))\n","print(\"Precision: {}\" .format(metrics.precision_score(y_true, y_pred, average='macro')))\n","print(\"Recall: {}\" .format(metrics.recall_score(y_true, y_pred, average='macro')))\n","print(\"F1: {}\" .format(metrics.f1_score(y_true, y_pred, average='macro')))"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"-JM6IiJgzRrG","colab_type":"code","colab":{}},"source":["#Function to plot the confusion matrix.\n","def plot_confusing_matrix (y_compare,pred,n_categories,outcome_labels):\n","\n"," cm = metrics.confusion_matrix(y_compare, pred, labels = list(range(n_categories)))\n"," plot_confusion_matrix(conf_mat=cm,figsize=(13,13),class_names = outcome_labels,show_normed=True)\n"," plt.title('Confusion Matrix')\n"," plt.ylabel('Target')\n"," plt.xlabel('Predicted')\n"," plt.show()"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"yUpngcauzRvm","colab_type":"code","colab":{}},"source":["#Confusion matrix.\n","outcome_labels = [\"Normal\",\"Attack\"]\n","plot_confusing_matrix(y_true,y_pred,2,outcome_labels)"],"execution_count":0,"outputs":[]}]} 
-------------------------------------------------------------------------------- /InDeep_binary_class.ipynb: -------------------------------------------------------------------------------- 1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"InDeep_binary_class.ipynb","provenance":[],"collapsed_sections":[],"mount_file_id":"1_122ttph4oGcyizT8U-l_9vDEhN0jLbd","authorship_tag":"ABX9TyMzAWZzg3WmGWx5Qk41MxGU"},"kernelspec":{"name":"python3","display_name":"Python 3"},"accelerator":"GPU"},"cells":[{"cell_type":"code","metadata":{"id":"cEuuR-z_BmeD","colab_type":"code","colab":{}},"source":["#Use the second version of TensorFlow\n","%tensorflow_version 2.x"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"-zzLBI0iBs33","colab_type":"code","colab":{}},"source":["#Install the dependencies that Google Colab does not provide.\n","!pip install bayesian-optimization\n","!pip install mlxtend --upgrade --no-deps"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"ZdterqtlBs6B","colab_type":"code","colab":{}},"source":["#Import the packages needed for the project\n","import pandas as pd\n","import numpy as np\n","import matplotlib.pyplot as plt\n","from tensorflow.keras.utils import get_file\n","from sklearn import preprocessing\n","from sklearn.preprocessing import LabelEncoder\n","from PIL import Image\n","from sklearn.model_selection import StratifiedShuffleSplit\n","from sklearn import metrics\n","from tensorflow.keras import backend as K\n","from tensorflow.keras.utils import get_custom_objects\n","from bayes_opt import BayesianOptimization\n","from sklearn.model_selection import StratifiedKFold\n","from mlxtend.plotting import plot_confusion_matrix\n","from sklearn.model_selection import train_test_split\n","from imblearn.over_sampling import SMOTE\n","\n","\n","from tensorflow.keras.models import Sequential,Model\n","from tensorflow.keras.layers import Dense, Conv2D,LSTM, 
MaxPooling2D,UpSampling2D,Conv2DTranspose, Dropout, Flatten, Activation, LeakyReLU, ReLU, Input,concatenate,BatchNormalization\n","from tensorflow.keras.optimizers import Adam\n","from tensorflow.keras.regularizers import l1\n","from tensorflow.keras.callbacks import EarlyStopping"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"MrSpVkOxBs8N","colab_type":"code","colab":{}},"source":["#Load the dataset previously uploaded to Google Drive (faster than downloading it directly from the website).\n","path_train = \"/content/drive/My Drive/TFG/UNSW_NB15_training-set.csv\"\n","path_test = \"/content/drive/My Drive/TFG/UNSW_NB15_testing-set.csv\"\n","\n","#Read the data\n","df_train=pd.read_csv(path_train,dtype='unicode')\n","df_test=pd.read_csv(path_test,dtype='unicode')\n","\n","#Drop the columns that are not needed\n","df_train.drop('id', axis=1, inplace=True)\n","df_train.drop('attack_cat', axis=1, inplace=True)\n","df_test.drop('id', axis=1, inplace=True)\n","df_test.drop('attack_cat', axis=1, inplace=True)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"xhDayPhNBs-c","colab_type":"code","colab":{}},"source":["#Concatenate the training and test sets so the whole dataset can be processed at once.\n","df = pd.concat([df_train,df_test],axis=0)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"QtizAFEQ7lYM","colab_type":"code","colab":{}},"source":["#Split the dataset by class.\n","df_class_Normal = df[df['label'] == '0']\n","df_class_Attack = df[df['label'] == '1']\n","\n","#Under-sample the attack class until it matches the size of the normal class\n","df_class_Attack = df_class_Attack.sample(df_class_Normal.shape[0])\n","\n","#Concatenate both classes\n","df = pd.concat([df_class_Normal,df_class_Attack], axis=0)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"eYw3fPlNBtDv","colab_type":"code","colab":{}},"source":["#Pie chart showing the share 
of each class\n","print(\"Shape of dataFrame: {} \\n\".format(df.shape))\n","print(\"Number of attack samples\")\n","display(df['label'].value_counts())\n","print(\"\")\n","print(\"Plotting balance of dataFrame\")\n","df_plot = (df['label'].value_counts(normalize=True) *100)\n","df_plot.plot(kind='pie',figsize=(10,10),title='Balance of dataset (%)')"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"F8GyxFhhkTyR","colab_type":"code","colab":{}},"source":["#Convert string data to integers in the range 0 to 255 (1 byte of information)\n","def encode_string_byte (df,name):\n"," df[name] = LabelEncoder().fit_transform(df[name])\n","\n","encode_string_byte (df,'proto')\n","encode_string_byte (df,'state') \n","encode_string_byte (df,'service') "],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"ZWFV0AcMBtF8","colab_type":"code","colab":{}},"source":["#Normalize the integer values to decimal values in the range 0 to 1\n","def numerical_minmax_normalization (df, name):\n"," x = df[name].values.reshape(-1,1)\n"," min_max_scaler = preprocessing.MinMaxScaler()\n"," x_scaled = min_max_scaler.fit_transform(x)\n"," df[name] = 
x_scaled\n","\n","numerical_minmax_normalization(df,'dur')\n","numerical_minmax_normalization(df,'spkts')\n","numerical_minmax_normalization(df,'dpkts')\n","numerical_minmax_normalization(df,'sbytes')\n","numerical_minmax_normalization(df,'dbytes')\n","numerical_minmax_normalization(df,'rate')\n","numerical_minmax_normalization(df,'sttl')\n","numerical_minmax_normalization(df,'dttl')\n","numerical_minmax_normalization(df,'sload')\n","numerical_minmax_normalization(df,'dload')\n","numerical_minmax_normalization(df,'sloss')\n","numerical_minmax_normalization(df,'dloss')\n","numerical_minmax_normalization(df,'sinpkt')\n","numerical_minmax_normalization(df,'dinpkt')\n","numerical_minmax_normalization(df,'sjit')\n","numerical_minmax_normalization(df,'djit')\n","numerical_minmax_normalization(df,'swin')\n","numerical_minmax_normalization(df,'stcpb')\n","numerical_minmax_normalization(df,'dtcpb')\n","numerical_minmax_normalization(df,'dwin')\n","numerical_minmax_normalization(df,'tcprtt')\n","numerical_minmax_normalization(df,'synack')\n","numerical_minmax_normalization(df,'ackdat')\n","numerical_minmax_normalization(df,'smean')\n","numerical_minmax_normalization(df,'dmean')\n","numerical_minmax_normalization(df,'trans_depth')\n","numerical_minmax_normalization(df,'response_body_len')\n","numerical_minmax_normalization(df,'ct_srv_src')\n","numerical_minmax_normalization(df,'ct_state_ttl')\n","numerical_minmax_normalization(df,'ct_dst_ltm')\n","numerical_minmax_normalization(df,'ct_src_dport_ltm')\n","numerical_minmax_normalization(df,'ct_dst_sport_ltm')\n","numerical_minmax_normalization(df,'ct_dst_src_ltm')\n","numerical_minmax_normalization(df,'is_ftp_login')\n","numerical_minmax_normalization(df,'ct_ftp_cmd')\n","numerical_minmax_normalization(df,'ct_flw_http_mthd')\n","numerical_minmax_normalization(df,'ct_src_ltm')\n","numerical_minmax_normalization(df,'ct_srv_dst')\n","numerical_minmax_normalization(df,'is_sm_ips_ports')"],"execution_count":0,"outputs":[]},{"cell_typ
e":"code","metadata":{"id":"xElC18PzBtKE","colab_type":"code","colab":{}},"source":["#Mapeo de los valores normalizados del paso anterior a valores enteros entre 0 y 255 (1 Byte de información)\n","def numerical_split_ohe (df,name):\n"," pd_to_np = df[name].tolist()\n"," np_split = []\n"," \n"," categories = np.linspace(0, 1, num=256,endpoint=False)\n"," quantization = range(0,256)\n","\n"," for value in pd_to_np:\n"," for i in range(len(categories)-1):\n"," if (categories[i] <= float(value) <= categories[i+1]):\n"," np_split.append(quantization[i])\n"," break\n"," if (float(value) > categories[-1]):\n"," np_split.append(quantization[-1])\n"," break\n"," \n"," df[name] = np_split\n","\n","\n","numerical_split_ohe(df,'dur')\n","numerical_split_ohe(df,'spkts')\n","numerical_split_ohe(df,'dpkts')\n","numerical_split_ohe(df,'sbytes')\n","numerical_split_ohe(df,'dbytes')\n","numerical_split_ohe(df,'rate')\n","numerical_split_ohe(df,'sttl')\n","numerical_split_ohe(df,'dttl')\n","numerical_split_ohe(df,'sload')\n","numerical_split_ohe(df,'dload')\n","numerical_split_ohe(df,'sloss')\n","numerical_split_ohe(df,'dloss')\n","numerical_split_ohe(df,'sinpkt')\n","numerical_split_ohe(df,'dinpkt')\n","numerical_split_ohe(df,'sjit')\n","numerical_split_ohe(df,'djit')\n","numerical_split_ohe(df,'swin')\n","numerical_split_ohe(df,'stcpb')\n","numerical_split_ohe(df,'dtcpb')\n","numerical_split_ohe(df,'dwin')\n","numerical_split_ohe(df,'tcprtt')\n","numerical_split_ohe(df,'synack')\n","numerical_split_ohe(df,'ackdat')\n","numerical_split_ohe(df,'smean')\n","numerical_split_ohe(df,'dmean')\n","numerical_split_ohe(df,'trans_depth')\n","numerical_split_ohe(df,'response_body_len')\n","numerical_split_ohe(df,'ct_srv_src')\n","numerical_split_ohe(df,'ct_state_ttl')\n","numerical_split_ohe(df,'ct_dst_ltm')\n","numerical_split_ohe(df,'ct_src_dport_ltm')\n","numerical_split_ohe(df,'ct_dst_sport_ltm')\n","numerical_split_ohe(df,'ct_dst_src_ltm')\n","numerical_split_ohe(df,'is_ftp_login')\n","nu
merical_split_ohe(df,'ct_ftp_cmd')\n","numerical_split_ohe(df,'ct_flw_http_mthd')\n","numerical_split_ohe(df,'ct_src_ltm')\n","numerical_split_ohe(df,'ct_srv_dst')\n","numerical_split_ohe(df,'is_sm_ips_ports')"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"DlYHxzb6BtMF","colab_type":"code","colab":{}},"source":["#Remove the label column and store it, one-hot encoded, in the variable y.\n","y_column = df['label']\n","df.drop('label',axis=1,inplace=True)\n","dummies = pd.get_dummies(y_column) \n","y = dummies.values"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"vdAXl3F2BtSm","colab_type":"code","colab":{}},"source":["#Zero-pad each row so that it has 64 values.\n","byte_images = np.pad(df.to_numpy(), ((0,0),(0,22)), 'constant')"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"dsjSkRzUBtU6","colab_type":"code","colab":{}},"source":["#Normalize the values to the range -0.5 to 0.5\n","x = []\n","for image in np.array(byte_images):\n"," x.append((image/255 - 0.5).reshape(8,8))\n","x = np.array(x)\n","x = x.reshape(x.shape[0],8,8,1)\n"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"ArRvwJJxBtWo","colab_type":"code","colab":{}},"source":["#Swish activation function\n","def swish(x):\n"," return (K.sigmoid(x) * x)\n","\n","get_custom_objects().update({'swish': Activation(swish)})"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"HQJZUO8KBtY3","colab_type":"code","colab":{}},"source":["#Split the dataset into a training set and a validation set.\n","sss = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=42)\n","\n","for train_index, test_index in sss.split(x,y):\n"," x_train, x_test = x[train_index], x[test_index]\n"," y_train, y_test = y[train_index], y[test_index]"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"SW-tGkVYBtbI","colab_type":"code","colab":{}},"source":["#Definición del modelo 
final.\n","def InDeep_model():\n","\n"," input_img = Input(shape = (8, 8, 1))\n","\n"," block_1 = BatchNormalization()(input_img)\n"," block_1 = Activation(swish)(block_1)\n"," block_1 = Conv2D(16, (3,3), padding='same')(block_1)\n"," block_1 = BatchNormalization()(block_1)\n"," block_1 = Activation(swish)(block_1)\n"," block_1 = Conv2D(16, (3,3), padding='same')(block_1)\n","\n"," concat_1 = concatenate([input_img, block_1], axis = 3)\n","\n"," block_2 = BatchNormalization()(concat_1)\n"," block_2 = Activation(swish)(block_2)\n"," block_2 = Conv2D(32, (3,3), padding='same')(block_2)\n"," block_2 = BatchNormalization()(block_2)\n"," block_2 = Activation(swish)(block_2)\n"," block_2 = Conv2D(32, (3,3), padding='same')(block_2)\n","\n"," concat_2 = concatenate([input_img, block_1,block_2], axis = 3)\n","\n"," block_3 = BatchNormalization()(concat_2)\n"," block_3 = Activation(swish)(block_3)\n"," block_3 = Conv2D(64, (3,3), padding='same')(block_3)\n"," block_3 = BatchNormalization()(block_3)\n"," block_3 = Activation(swish)(block_3)\n"," block_3 = Conv2D(64, (3,3), padding='same',strides=(2,2))(block_3)\n","\n"," output = Flatten()(block_3)\n"," output = Dense(128, activation=Activation(swish))(output)\n"," output = Dropout(rate=0.2)(output)\n"," output = Dense(64, activation=Activation(swish))(output)\n"," output = Dropout(rate=0.2)(output)\n"," out = Dense(2, activation='sigmoid')(output)\n","\n"," model = Model(inputs = input_img, outputs = out)\n"," model.compile(optimizer='adam', loss=\"binary_crossentropy\", metrics=[\"accuracy\"])\n","\n"," return model"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"RqiyIccHlev6","colab_type":"code","colab":{}},"source":["#Proceso de entrenamiento\n","model = InDeep_model()\n","es = EarlyStopping(monitor='accuracy', mode='max', verbose=1, min_delta=0.005, patience=10, restore_best_weights=True)\n","history = model.fit(x_train,y_train,validation_data=(x_test,y_test), verbose=1, batch_size=256, 
epochs=200, callbacks=[es])"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"ERmJEAf9rR-0","colab_type":"code","colab":{}},"source":["#Predictions of the trained network, measured with accuracy, precision, recall and F1.\n","y_pred = model.predict(x)\n","y_pred = np.argmax(y_pred,axis=1) \n","y_true = np.argmax(y,axis=1)\n","\n","print(\"Accuracy: {}\" .format(metrics.accuracy_score(y_true, y_pred)))\n","print(\"Precision: {}\" .format(metrics.precision_score(y_true, y_pred, average='macro')))\n","print(\"Recall: {}\" .format(metrics.recall_score(y_true, y_pred, average='macro')))\n","print(\"F1: {}\" .format(metrics.f1_score(y_true, y_pred, average='macro')))"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"MOn0UlgClex9","colab_type":"code","colab":{}},"source":["#Function to plot the confusion matrix.\n","def plot_confusing_matrix (y_compare,pred,n_categories,outcome_labels):\n","\n"," cm = metrics.confusion_matrix(y_compare, pred, labels = list(range(n_categories)))\n"," plot_confusion_matrix(conf_mat=cm,figsize=(13,13),class_names = outcome_labels,show_normed=True)\n"," plt.title('Confusion Matrix')\n"," plt.ylabel('Target')\n"," plt.xlabel('Predicted')\n"," plt.show()"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"SZEGsohLle0E","colab_type":"code","colab":{}},"source":["#Confusion matrix.\n","outcome_labels = [\"Normal\",\"Attack\"]\n","plot_confusing_matrix(y_true,y_pred,2,outcome_labels)"],"execution_count":0,"outputs":[]}]} -------------------------------------------------------------------------------- /inDeep_Feature_study.ipynb: -------------------------------------------------------------------------------- 1 | 
{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"inDeep_Feature_study.ipynb","provenance":[],"collapsed_sections":[],"mount_file_id":"1jV-vAKbDunozKHHtpUOvkgD30cRexNxt","authorship_tag":"ABX9TyPMW3k9fKm3oDaVT4kDX+ra"},"kernelspec":{"name":"python3","display_name":"Python 3"},"accelerator":"GPU"},"cells":[{"cell_type":"code","metadata":{"id":"MMjylng_Y75w","colab_type":"code","colab":{}},"source":["#Importamos la segunda versión de tensorflow\n","%tensorflow_version 2.x"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"HUCdrpC0ZBq6","colab_type":"code","colab":{}},"source":["#Instalamos las dependencias necesarias que no posee Google Colab.\n","!pip install bayesian-optimization\n","!pip install mlxtend --upgrade --no-deps"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"jMTqRdAWZBtC","colab_type":"code","colab":{}},"source":["#Importamos los paquetes necesarios para el proyecto\n","import pandas as pd\n","import numpy as np\n","import matplotlib.pyplot as plt\n","from tensorflow.keras.utils import get_file\n","from sklearn import preprocessing\n","from sklearn.preprocessing import LabelEncoder\n","from PIL import Image\n","from sklearn.model_selection import StratifiedShuffleSplit\n","from sklearn import metrics\n","from keras import backend as K\n","from keras.utils.generic_utils import get_custom_objects\n","from bayes_opt import BayesianOptimization\n","from sklearn.model_selection import StratifiedKFold\n","from mlxtend.plotting import plot_confusion_matrix\n","from sklearn.model_selection import train_test_split\n","from imblearn.over_sampling import SMOTE\n","\n","\n","from tensorflow.keras.models import Sequential,Model\n","from tensorflow.keras.layers import Dense, Conv2D,LSTM, MaxPooling2D,UpSampling2D,Conv2DTranspose, Dropout, Flatten, Activation, LeakyReLU, ReLU, Input,concatenate,BatchNormalization\n","from tensorflow.keras.optimizers import Adam\n","from keras.regularizers import l1\n","from 
tensorflow.keras.callbacks import EarlyStopping"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"aBXIuEb6ZBu9","colab_type":"code","colab":{}},"source":["#Descargamos el set de datos subido previamente al Google Drive (Tarda menos que descargarlo directamente de la página).\n","path_train = \"/content/drive/My Drive/TFG/UNSW_NB15_training-set.csv\"\n","path_test = \"/content/drive/My Drive/TFG/UNSW_NB15_testing-set.csv\"\n","\n","#Leemos los datos\n","df_train=pd.read_csv(path_train,dtype='unicode')\n","df_test=pd.read_csv(path_test,dtype='unicode')\n","\n","#Quitamos las columnas no necesarias\n","df_train.drop('id', axis=1, inplace=True)\n","df_train.drop('label', axis=1, inplace=True)\n","df_test.drop('id', axis=1, inplace=True)\n","df_test.drop('label', axis=1, inplace=True)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"LzEjcA3bZBw4","colab_type":"code","colab":{}},"source":["#Concatenamos el set de entrenamiento y el de test para manipular directamente el conjunto.\n","df = pd.concat([df_train,df_test],axis=0)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"wSIPRImnZBzA","colab_type":"code","colab":{}},"source":["#Conversión de los datos tipo string a enteros en un rango entre 0 y 255 (1 Byte de información)\n","def encode_string_byte (df,name):\n"," df[name] = LabelEncoder().fit_transform(df[name])\n","\n","encode_string_byte (df,'proto')\n","encode_string_byte (df,'state') \n","encode_string_byte (df,'service') \n","\n","display(df.head())\n","print(\"\")\n","print(\"Proto column --> Max value: {} Min value: {} \".format(max(df['proto']),min(df['proto'])))\n","print(\"State column --> Max value: {} Min value: {} \".format(max(df['state']),min(df['state'])))\n","print(\"Service column --> Max value: {} Min value: {} 
\".format(max(df['service']),min(df['service'])))"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"9F-r932LZB06","colab_type":"code","colab":{}},"source":["#Normalización de los números enteros en valores decimales en rango entre 0 y 1\n","def numerical_minmax_normalization (df, name):\n"," x = df[name].values.reshape(-1,1)\n"," min_max_scaler = preprocessing.MinMaxScaler()\n"," x_scaled = min_max_scaler.fit_transform(x)\n"," df[name] = x_scaled\n","\n","numerical_minmax_normalization(df,'dur')\n","numerical_minmax_normalization(df,'spkts')\n","numerical_minmax_normalization(df,'dpkts')\n","numerical_minmax_normalization(df,'sbytes')\n","numerical_minmax_normalization(df,'dbytes')\n","numerical_minmax_normalization(df,'rate')\n","numerical_minmax_normalization(df,'sttl')\n","numerical_minmax_normalization(df,'dttl')\n","numerical_minmax_normalization(df,'sload')\n","numerical_minmax_normalization(df,'dload')\n","numerical_minmax_normalization(df,'sloss')\n","numerical_minmax_normalization(df,'dloss')\n","numerical_minmax_normalization(df,'sinpkt')\n","numerical_minmax_normalization(df,'dinpkt')\n","numerical_minmax_normalization(df,'sjit')\n","numerical_minmax_normalization(df,'djit')\n","numerical_minmax_normalization(df,'swin')\n","numerical_minmax_normalization(df,'stcpb')\n","numerical_minmax_normalization(df,'dtcpb')\n","numerical_minmax_normalization(df,'dwin')\n","numerical_minmax_normalization(df,'tcprtt')\n","numerical_minmax_normalization(df,'synack')\n","numerical_minmax_normalization(df,'ackdat')\n","numerical_minmax_normalization(df,'smean')\n","numerical_minmax_normalization(df,'dmean')\n","numerical_minmax_normalization(df,'trans_depth')\n","numerical_minmax_normalization(df,'response_body_len')\n","numerical_minmax_normalization(df,'ct_srv_src')\n","numerical_minmax_normalization(df,'ct_state_ttl')\n","numerical_minmax_normalization(df,'ct_dst_ltm')\n","numerical_minmax_normalization(df,'ct_src_dport_ltm')\n","numerical_minma
x_normalization(df,'ct_dst_sport_ltm')\n","numerical_minmax_normalization(df,'ct_dst_src_ltm')\n","numerical_minmax_normalization(df,'is_ftp_login')\n","numerical_minmax_normalization(df,'ct_ftp_cmd')\n","numerical_minmax_normalization(df,'ct_flw_http_mthd')\n","numerical_minmax_normalization(df,'ct_src_ltm')\n","numerical_minmax_normalization(df,'ct_srv_dst')\n","numerical_minmax_normalization(df,'is_sm_ips_ports')\n","\n","df.head()"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"IRd_C7OAZB2-","colab_type":"code","colab":{}},"source":["#Mapeo de los valores normalizados del paso anterior a valores enteros entre 0 y 255 (1 Byte de información)\n","def numerical_split_ohe (df,name):\n"," pd_to_np = df[name].tolist()\n"," np_split = []\n"," \n"," categories = np.linspace(0, 1, num=256,endpoint=False)\n"," quantization = range(0,256)\n","\n"," for value in pd_to_np:\n"," for i in range(len(categories)-1):\n"," if (categories[i] <= float(value) <= categories[i+1]):\n"," np_split.append(quantization[i])\n"," break\n"," if (float(value) > categories[-1]):\n"," np_split.append(quantization[-1])\n"," break\n"," \n"," df[name] = 
np_split\n","\n","\n","numerical_split_ohe(df,'dur')\n","numerical_split_ohe(df,'spkts')\n","numerical_split_ohe(df,'dpkts')\n","numerical_split_ohe(df,'sbytes')\n","numerical_split_ohe(df,'dbytes')\n","numerical_split_ohe(df,'rate')\n","numerical_split_ohe(df,'sttl')\n","numerical_split_ohe(df,'dttl')\n","numerical_split_ohe(df,'sload')\n","numerical_split_ohe(df,'dload')\n","numerical_split_ohe(df,'sloss')\n","numerical_split_ohe(df,'dloss')\n","numerical_split_ohe(df,'sinpkt')\n","numerical_split_ohe(df,'dinpkt')\n","numerical_split_ohe(df,'sjit')\n","numerical_split_ohe(df,'djit')\n","numerical_split_ohe(df,'swin')\n","numerical_split_ohe(df,'stcpb')\n","numerical_split_ohe(df,'dtcpb')\n","numerical_split_ohe(df,'dwin')\n","numerical_split_ohe(df,'tcprtt')\n","numerical_split_ohe(df,'synack')\n","numerical_split_ohe(df,'ackdat')\n","numerical_split_ohe(df,'smean')\n","numerical_split_ohe(df,'dmean')\n","numerical_split_ohe(df,'trans_depth')\n","numerical_split_ohe(df,'response_body_len')\n","numerical_split_ohe(df,'ct_srv_src')\n","numerical_split_ohe(df,'ct_state_ttl')\n","numerical_split_ohe(df,'ct_dst_ltm')\n","numerical_split_ohe(df,'ct_src_dport_ltm')\n","numerical_split_ohe(df,'ct_dst_sport_ltm')\n","numerical_split_ohe(df,'ct_dst_src_ltm')\n","numerical_split_ohe(df,'is_ftp_login')\n","numerical_split_ohe(df,'ct_ftp_cmd')\n","numerical_split_ohe(df,'ct_flw_http_mthd')\n","numerical_split_ohe(df,'ct_src_ltm')\n","numerical_split_ohe(df,'ct_srv_dst')\n","numerical_split_ohe(df,'is_sm_ips_ports')\n","\n","display(df.head())"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"xd_QqL0aZB9X","colab_type":"code","colab":{}},"source":["#Quitando la columna attack_cat y guardandola en la variable y.\n","y_column = df['attack_cat']\n","df.drop('attack_cat',axis=1,inplace=True)\n","dummies = pd.get_dummies(y_column) \n","y = 
dummies.values\n","\n","print(y[:5])\n","display(dummies.head())"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"t8I57rAXZB7I","colab_type":"code","colab":{}},"source":["#Función de activación Swish\n","def swish(x):\n"," return (K.sigmoid(x) * x)\n","\n","get_custom_objects().update({'swish': Activation(swish)})"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"AGAivxe7bC7t","colab_type":"code","colab":{}},"source":["#Lista de columnas seleccionadas por el RFE.\n","list_features = [['smean'],\n","['smean', 'dmean'],\n","['smean', 'dmean', 'ct_srv_dst'],\n","['synack', 'smean', 'dmean', 'ct_srv_dst'],\n","['proto', 'synack', 'smean', 'dmean', 'ct_srv_dst'],\n","['proto', 'synack', 'smean', 'dmean', 'ct_dst_src_ltm', 'ct_srv_dst'],\n","['proto', 'sttl', 'synack', 'smean', 'dmean', 'ct_dst_src_ltm', 'ct_srv_dst'],\n","['proto', 'service', 'sttl', 'synack', 'smean', 'dmean', 'ct_dst_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'sttl', 'synack', 'smean', 'dmean', 'ct_dst_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'sttl', 'sload', 'synack', 'smean', 'dmean', 'ct_dst_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'sttl', 'sload', 'synack', 'smean', 'dmean', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'sttl', 'sload', 'synack', 'smean', 'dmean', 'ct_srv_src', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'sttl', 'sload', 'synack', 'smean', 'dmean', 'ct_srv_src', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'sttl', 'sload', 'tcprtt', 'synack', 'smean', 'dmean', 'ct_srv_src', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'rate', 'sttl', 'sload', 'tcprtt', 'synack', 'smean', 'dmean', 'ct_srv_src', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'rate', 
'sttl', 'sload', 'tcprtt', 'synack', 'smean', 'dmean', 'ct_srv_src', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'rate', 'sttl', 'sload', 'dtcpb', 'tcprtt', 'synack', 'smean', 'dmean', 'ct_srv_src', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'rate', 'sttl', 'sload', 'dtcpb', 'tcprtt', 'synack', 'smean', 'dmean', 'ct_srv_src', 'ct_state_ttl', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'rate', 'sttl', 'sload', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'smean', 'dmean', 'ct_srv_src', 'ct_state_ttl', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'rate', 'sttl', 'sload', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'smean', 'dmean', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'rate', 'sttl', 'sload', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'smean', 'dmean', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'rate', 'sttl', 'sload', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'rate', 'sttl', 'sload', 'sjit', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'rate', 'sttl', 'dttl', 'sload', 'sjit', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'ackdat', 
'smean', 'dmean', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'rate', 'sttl', 'dttl', 'sload', 'sjit', 'djit', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'rate', 'sttl', 'dttl', 'sload', 'sjit', 'djit', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'state', 'rate', 'sttl', 'dttl', 'sload', 'sjit', 'djit', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'state', 'spkts', 'rate', 'sttl', 'dttl', 'sload', 'sjit', 'djit', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'state', 'spkts', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'sjit', 'djit', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'state', 'spkts', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'sinpkt', 'sjit', 'djit', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'ct_srv_src', 
'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'state', 'spkts', 'dpkts', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'sinpkt', 'sjit', 'djit', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'state', 'spkts', 'dpkts', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'sinpkt', 'sjit', 'djit', 'swin', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'state', 'spkts', 'dpkts', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'sinpkt', 'sjit', 'djit', 'swin', 'stcpb', 'dtcpb', 'dwin', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'state', 'spkts', 'dpkts', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'dloss', 'sinpkt', 'sjit', 'djit', 'swin', 'stcpb', 'dtcpb', 'dwin', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'state', 'spkts', 'dpkts', 'dbytes', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'dloss', 'sinpkt', 'sjit', 'djit', 'swin', 'stcpb', 'dtcpb', 'dwin', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 
'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'state', 'spkts', 'dpkts', 'dbytes', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'dloss', 'sinpkt', 'dinpkt', 'sjit', 'djit', 'swin', 'stcpb', 'dtcpb', 'dwin', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'state', 'spkts', 'dpkts', 'dbytes', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'sloss', 'dloss', 'sinpkt', 'dinpkt', 'sjit', 'djit', 'swin', 'stcpb', 'dtcpb', 'dwin', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'state', 'spkts', 'dpkts', 'dbytes', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'sloss', 'dloss', 'sinpkt', 'dinpkt', 'sjit', 'djit', 'swin', 'stcpb', 'dtcpb', 'dwin', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'response_body_len', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'state', 'spkts', 'dpkts', 'sbytes', 'dbytes', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'sloss', 'dloss', 'sinpkt', 'dinpkt', 'sjit', 'djit', 'swin', 'stcpb', 'dtcpb', 'dwin', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'response_body_len', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst'],\n","['dur', 'proto', 'service', 'state', 'spkts', 'dpkts', 'sbytes', 'dbytes', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'sloss', 'dloss', 'sinpkt', 'dinpkt', 'sjit', 'djit', 'swin', 'stcpb', 'dtcpb', 'dwin', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 
'response_body_len', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst', 'is_sm_ips_ports'],\n","['dur', 'proto', 'service', 'state', 'spkts', 'dpkts', 'sbytes', 'dbytes', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'sloss', 'dloss', 'sinpkt', 'dinpkt', 'sjit', 'djit', 'swin', 'stcpb', 'dtcpb', 'dwin', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'response_body_len', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_ftp_cmd', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst', 'is_sm_ips_ports']]"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"H0jJ1TrRZfUG","colab_type":"code","colab":{}},"source":["#Definición del modelo final.\n","def InDeep_model():\n","\n"," input_img = Input(shape = (8, 8, 1))\n","\n"," block_1 = BatchNormalization()(input_img)\n"," block_1 = Activation(swish)(block_1)\n"," block_1 = Conv2D(16, (3,3), padding='same')(block_1)\n"," block_1 = BatchNormalization()(block_1)\n"," block_1 = Activation(swish)(block_1)\n"," block_1 = Conv2D(16, (3,3), padding='same')(block_1)\n","\n"," concat_1 = concatenate([input_img, block_1], axis = 3)\n","\n"," block_2 = BatchNormalization()(concat_1)\n"," block_2 = Activation(swish)(block_2)\n"," block_2 = Conv2D(32, (3,3), padding='same')(block_2)\n"," block_2 = BatchNormalization()(block_2)\n"," block_2 = Activation(swish)(block_2)\n"," block_2 = Conv2D(32, (3,3), padding='same')(block_2)\n","\n"," concat_2 = concatenate([input_img, block_1,block_2], axis = 3)\n","\n"," block_3 = BatchNormalization()(concat_2)\n"," block_3 = Activation(swish)(block_3)\n"," block_3 = Conv2D(64, (3,3), padding='same')(block_3)\n"," block_3 = BatchNormalization()(block_3)\n"," block_3 = Activation(swish)(block_3)\n"," block_3 = Conv2D(64, (3,3), padding='same',strides=(2,2))(block_3)\n","\n"," output = Flatten()(block_3)\n"," output = 
Dense(128, activation=Activation(swish))(output)\n"," output = Dropout(rate=0.2)(output)\n"," output = Dense(64, activation=Activation(swish))(output)\n"," output = Dropout(rate=0.2)(output)\n"," out = Dense(10, activation='softmax')(output)\n","\n"," model = Model(inputs = input_img, outputs = out)\n"," model.compile(optimizer=Adam(learning_rate=0.001), loss=\"categorical_crossentropy\", metrics=[\"accuracy\"])\n","\n"," return model\n"," \n","model = InDeep_model()"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"Hca39UpAizS8","colab_type":"code","colab":{}},"source":["#Balance the dataset with SMOTE (93000 samples per class) and normalize to the range -0.5 to 0.5\n","x=[]\n","for image in df.to_numpy():\n"," x.append((image/255 - 0.5))\n","sm = SMOTE(random_state=0)\n","x, y = sm.fit_resample(x, y)\n","x = np.array(x)\n","df_x = pd.DataFrame(data=x[:],columns=df.columns) "],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"N02ziEqlar7u","colab_type":"code","colab":{}},"source":["#Iteratively train and evaluate the CNN on each feature subset selected by the RFE\n","import warnings\n","warnings.filterwarnings('ignore')\n","\n","for feature in list_features:\n"," df_features = df_x.loc[:,feature]\n"," x = np.pad(df_features.to_numpy(), ((0,0),(0,64-len(feature))), 'constant')\n"," x = x.reshape(x.shape[0],8,8,1)\n"," print(\"Number of features: {}\".format(len(feature)))\n"," display(df_features.head())\n","\n"," #Splitting to train and test input/output\n"," sss = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=42)\n","\n"," for train_index, test_index in sss.split(x,y):\n"," x_train, x_test = x[train_index], x[test_index]\n"," y_train, y_test = y[train_index], y[test_index]\n","\n"," \n"," history = model.fit(x_train,y_train,validation_data=(x_test,y_test),verbose=1,batch_size=256,epochs=25)\n"," \n"," y_pred = model.predict(x)\n"," y_pred = np.argmax(y_pred,axis=1) 
\n"," y_true = np.argmax(y,axis=1)\n","\n"," print(\"Accuracy: {}\" .format(metrics.accuracy_score(y_true, y_pred)))\n"," print(\"Precision: {}\" .format(metrics.precision_score(y_true, y_pred, average='macro')))\n"," print(\"Recall: {}\" .format(metrics.recall_score(y_true, y_pred, average='macro')))\n"," print(\"F1: {}\" .format(metrics.f1_score(y_true, y_pred, average='macro')))"],"execution_count":0,"outputs":[]}]} -------------------------------------------------------------------------------- /RFE_feature_selection.ipynb: -------------------------------------------------------------------------------- 1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"RFE_feature_selection.ipynb","provenance":[],"collapsed_sections":[],"mount_file_id":"1svMAvL8w8AAW3sHMWC0KlBRc9ZYDHbhu","authorship_tag":"ABX9TyMlWYoFQdvU42qMOBnnkJtd"},"kernelspec":{"name":"python3","display_name":"Python 3"},"accelerator":"GPU"},"cells":[{"cell_type":"code","metadata":{"id":"8nCj1d2NAhPA","colab_type":"code","colab":{}},"source":["#Importamos la segunda versión de tensorflow\n","%tensorflow_version 2.x"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"Y7Nd06bVAnYr","colab_type":"code","colab":{}},"source":["#Instalamos las dependencias necesarias que no posee Google Colab.\n","!pip install mlxtend --upgrade --no-deps\n","!pip install bayesian-optimization"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"YaKGQHxYAo4U","colab_type":"code","colab":{}},"source":["#Importamos los paquetes necesarios para el proyecto\n","import pandas as pd\n","import numpy as np\n","import matplotlib.pyplot as plt\n","from tensorflow.keras.utils import get_file\n","from sklearn import preprocessing\n","from scipy import stats\n","from array import array\n","from sklearn.model_selection import StratifiedKFold\n","from sklearn import metrics\n","from imblearn.over_sampling import SMOTE\n","from mlxtend.plotting import plot_confusion_matrix\n","from 
bayes_opt import BayesianOptimization\n","from sklearn.utils import shuffle\n","from sklearn.ensemble import RandomForestClassifier\n","from sklearn.feature_selection import RFE\n","from sklearn.preprocessing import LabelEncoder\n","\n","from sklearn.model_selection import StratifiedShuffleSplit\n","from keras.regularizers import l1\n","from keras import backend as K\n","from keras.utils.generic_utils import get_custom_objects\n","from tensorflow.keras.optimizers import Adam\n","from tensorflow.keras.models import Sequential\n","from tensorflow.keras.layers import Dense, Conv2D,LSTM, MaxPool2D, Dropout, Flatten, Activation, LeakyReLU, ReLU\n","from tensorflow.keras.callbacks import EarlyStopping"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"mYkYWXvKAqrW","colab_type":"code","colab":{}},"source":["#Load the dataset previously uploaded to Google Drive (faster than downloading it directly from the website).\n","path_train = \"/content/drive/My Drive/TFG/UNSW_NB15_training-set.csv\"\n","path_test = \"/content/drive/My Drive/TFG/UNSW_NB15_testing-set.csv\"\n","\n","#Read the data\n","df_train=pd.read_csv(path_train,dtype='unicode')\n","df_test=pd.read_csv(path_test,dtype='unicode')\n","\n","#Drop the unneeded columns\n","df_train.drop('id', axis=1, inplace=True)\n","df_train.drop('label', axis=1, inplace=True)\n","df_test.drop('id', axis=1, inplace=True)\n","df_test.drop('label', axis=1, inplace=True)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"aJFeunEGCOIZ","colab_type":"code","colab":{}},"source":["#Concatenate the training and test sets so the whole dataset can be processed at once.\n","df = pd.concat([df_train,df_test],axis=0)\n","print(df.shape)\n","df[:5]"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"pDg5SANHElSj","colab_type":"code","colab":{}},"source":["#Number of samples per class\n","attack_cat = 
df['attack_cat'].value_counts()\n","\n","#Look counts up by label: value_counts() is sorted by frequency, so positional indices do not map to fixed classes\n","count_class_Normal = attack_cat['Normal']\n","count_class_Generic = attack_cat['Generic']\n","count_class_Exploits = attack_cat['Exploits']\n","count_class_Fuzzers = attack_cat['Fuzzers']\n","count_class_Reconnaissance = attack_cat['Reconnaissance']\n","count_class_DoS = attack_cat['DoS']\n","count_class_Backdoors = attack_cat['Backdoor']\n","count_class_Analysis = attack_cat['Analysis']\n","count_class_Shellcode = attack_cat['Shellcode']\n","count_class_Worms = attack_cat['Worms']\n","\n","#Split the dataset by class.\n","df_class_Normal = df[df['attack_cat'] == 'Normal']\n","df_class_Generic = df[df['attack_cat'] == 'Generic']\n","df_class_Exploits = df[df['attack_cat'] == 'Exploits']\n","df_class_Fuzzers = df[df['attack_cat'] == 'Fuzzers']\n","df_class_DoS = df[df['attack_cat'] == 'DoS']\n","df_class_Reconnaissance = df[df['attack_cat'] == 'Reconnaissance']\n","df_class_Backdoor = df[df['attack_cat'] == 'Backdoor']\n","df_class_Analysis = df[df['attack_cat'] == 'Analysis']\n","df_class_Shellcode = df[df['attack_cat'] == 'Shellcode']\n","df_class_Worms = df[df['attack_cat'] == 'Worms']\n","\n","#Randomly under-sample the Normal, Generic, Exploits, Fuzzers, DoS, Reconnaissance, Backdoor, Analysis and Shellcode categories to 1000 samples each\n","df_class_Normal = df_class_Normal.sample(1000)\n","df_class_Generic = df_class_Generic.sample(1000)\n","df_class_Exploits = df_class_Exploits.sample(1000)\n","df_class_Fuzzers = df_class_Fuzzers.sample(1000)\n","df_class_DoS = df_class_DoS.sample(1000)\n","df_class_Reconnaissance = df_class_Reconnaissance.sample(1000)\n","df_class_Backdoor = df_class_Backdoor.sample(1000)\n","df_class_Analysis = df_class_Analysis.sample(1000)\n","df_class_Shellcode = df_class_Shellcode.sample(1000)\n","\n","#Concatenate\n","df = pd.concat([df_class_Normal,df_class_Generic,df_class_Exploits,df_class_Fuzzers,df_class_DoS,df_class_Reconnaissance,df_class_Backdoor,df_class_Analysis,df_class_Shellcode,df_class_Worms], 
axis=0)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"lfETf-ZYFauX","colab_type":"code","outputId":"c91b9df1-c3bc-4270-c8ca-f74172aec419","executionInfo":{"status":"ok","timestamp":1591807480692,"user_tz":-120,"elapsed":1145,"user":{"displayName":"Farid Bagheri-Gisour Marandyn","photoUrl":"","userId":"05978557582753685866"}},"colab":{"base_uri":"https://localhost:8080/","height":880}},"source":["#Pie chart of each class's share of the dataset\n","print(\"Shape of dataFrame: {} \\n\".format(df.shape))\n","print(\"Number of samples per attack\")\n","display(df['attack_cat'].value_counts())\n","print(\"\")\n","print(\"Plotting balance of dataFrame\")\n","df_plot = (df['attack_cat'].value_counts(normalize=True) *100)\n","df_plot.plot(kind='pie',figsize=(10,10),title='Balance of dataset (%)')"],"execution_count":0,"outputs":[{"output_type":"stream","text":["Shape of dataFrame: (9174, 43) \n","\n","Number of samples per attack\n"],"name":"stdout"},{"output_type":"display_data","data":{"text/plain":["Shellcode 1000\n","Normal 1000\n","Analysis 1000\n","DoS 1000\n","Exploits 1000\n","Generic 1000\n","Fuzzers 1000\n","Reconnaissance 1000\n","Backdoor 1000\n","Worms 174\n","Name: attack_cat, dtype: int64"]},"metadata":{"tags":[]}},{"output_type":"stream","text":["\n","Plotting balance of dataFrame\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/plain":[""]},"metadata":{"tags":[]},"execution_count":10},{"output_type":"display_data","data":{"image/png":"\n","text/plain":["
"]},"metadata":{"tags":[]}}]},{"cell_type":"code","metadata":{"id":"Cjyy9-p9I1tP","colab_type":"code","colab":{}},"source":["#Convert string-typed columns to integers in the range 0 to 255 (1 byte of information)\n","def encode_string_byte (df,name):\n"," df[name] = LabelEncoder().fit_transform(df[name])\n","\n","encode_string_byte (df,'proto')\n","encode_string_byte (df,'state') \n","encode_string_byte (df,'service') \n","\n","display(df.head())\n","print(\"\")\n","print(\"Proto column --> Max value: {} Min value: {} \".format(max(df['proto']),min(df['proto'])))\n","print(\"State column --> Max value: {} Min value: {} \".format(max(df['state']),min(df['state'])))\n","print(\"Service column --> Max value: {} Min value: {} \".format(max(df['service']),min(df['service'])))"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"K5PK4TkmNZMN","colab_type":"code","colab":{}},"source":["#Normalize the numeric values to decimals in the range 0 to 1\n","def numerical_minmax_normalization (df, name):\n"," x = df[name].values.reshape(-1,1)\n"," min_max_scaler = preprocessing.MinMaxScaler()\n"," x_scaled = min_max_scaler.fit_transform(x)\n"," df[name] = 
x_scaled\n","\n","numeric_columns = ['dur', 'spkts', 'dpkts', 'sbytes', 'dbytes', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'sloss', 'dloss', 'sinpkt', 'dinpkt', 'sjit', 'djit', 'swin', 'stcpb', 'dtcpb', 'dwin', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'response_body_len', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'is_ftp_login', 'ct_ftp_cmd', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst', 'is_sm_ips_ports']\n","\n","for name in numeric_columns:\n"," numerical_minmax_normalization(df, name)\n","\n","df.head()"],"execution_count":0,"outp
uts":[]},{"cell_type":"code","metadata":{"id":"dZf6lDHLOWvW","colab_type":"code","colab":{}},"source":["#Map the normalized values from the previous step to integers between 0 and 255 (1 byte of information)\n","def numerical_split_ohe (df,name):\n"," pd_to_np = df[name].tolist()\n"," np_split = []\n"," \n"," categories = np.linspace(0, 1, num=256,endpoint=False)\n"," quantization = range(0,256)\n","\n"," for value in pd_to_np:\n","  for i in range(len(categories)-1):\n","   if (categories[i] <= float(value) <= categories[i+1]):\n","    np_split.append(quantization[i])\n","    break\n","   if (float(value) > categories[-1]):\n","    np_split.append(quantization[-1])\n","    break\n"," \n"," df[name] = np_split\n","\n","\n","numerical_split_ohe(df,'dur')\n","numerical_split_ohe(df,'spkts')\n","numerical_split_ohe(df,'dpkts')\n","numerical_split_ohe(df,'sbytes')\n","numerical_split_ohe(df,'dbytes')\n","numerical_split_ohe(df,'rate')\n","numerical_split_ohe(df,'sttl')\n","numerical_split_ohe(df,'dttl')\n","numerical_split_ohe(df,'sload')\n","numerical_split_ohe(df,'dload')\n","numerical_split_ohe(df,'sloss')\n","numerical_split_ohe(df,'dloss')\n","numerical_split_ohe(df,'sinpkt')\n","numerical_split_ohe(df,'dinpkt')\n","numerical_split_ohe(df,'sjit')\n","numerical_split_ohe(df,'djit')\n","numerical_split_ohe(df,'swin')\n","numerical_split_ohe(df,'stcpb')\n","numerical_split_ohe(df,'dtcpb')\n","numerical_split_ohe(df,'dwin')\n","numerical_split_ohe(df,'tcprtt')\n","numerical_split_ohe(df,'synack')\n","numerical_split_ohe(df,'ackdat')\n","numerical_split_ohe(df,'smean')\n","numerical_split_ohe(df,'dmean')\n","numerical_split_ohe(df,'trans_depth')\n","numerical_split_ohe(df,'response_body_len')\n","numerical_split_ohe(df,'ct_srv_src')\n","numerical_split_ohe(df,'ct_state_ttl')\n","numerical_split_ohe(df,'ct_dst_ltm')\n","numerical_split_ohe(df,'ct_src_dport_ltm')\n","numerical_split_ohe(df,'ct_dst_sport_ltm')\n","numerical_split_ohe(df,'ct_dst_src_ltm')\n","numerical_split_ohe(df,'is
_ftp_login')\n","numerical_split_ohe(df,'ct_ftp_cmd')\n","numerical_split_ohe(df,'ct_flw_http_mthd')\n","numerical_split_ohe(df,'ct_src_ltm')\n","numerical_split_ohe(df,'ct_srv_dst')\n","numerical_split_ohe(df,'is_sm_ips_ports')\n","\n","display(df.head())"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"HESpjmI5l7tj","colab_type":"code","colab":{}},"source":["#Remove the attack_cat column and store it in the variable y.\n","y_column = df['attack_cat']\n","df.drop('attack_cat',axis=1,inplace=True)\n","dummies = pd.get_dummies(y_column) \n","y = dummies.values\n","\n","print(y[:5])\n","display(dummies.head())"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"f7JSiCRO_9Nk","colab_type":"code","colab":{}},"source":["#Balance the dataset with SMOTE (1000 per class) and normalize to the range -0.5 to 0.5\n","byte_images = df.to_numpy()\n","x = []\n","for image in np.array(byte_images):\n"," x.append((image/255 - 0.5))\n","sm = SMOTE(random_state=0)\n","x, y = sm.fit_resample(x, y)\n","x = np.array(x)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"8XFDF3PiIbb5","colab_type":"code","colab":{}},"source":["#Split the dataset into a training set and a validation set.\n","sss = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=42)\n","\n","for train_index, test_index in sss.split(x,y):\n"," x_train, x_test = x[train_index], x[test_index]\n"," y_train, y_test = y[train_index], y[test_index]"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"wsWqobWQNWAx","colab_type":"code","colab":{}},"source":["#Convert each one-hot row of 'y' into a single class index\n","def y_transform(y_train):\n"," y_train_rfe = []\n"," for value in y_train:\n","  y_train_rfe.append(list(value).index(1))\n"," return 
y_train_rfe\n"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"4Qb_5PRardhC","colab_type":"code","outputId":"4340b090-e4f2-41cf-8665-fec31afd2220","executionInfo":{"status":"ok","timestamp":1591810329799,"user_tz":-120,"elapsed":1064441,"user":{"displayName":"Farid Bagheri-Gisour Marandyn","photoUrl":"","userId":"05978557582753685866"}},"colab":{"base_uri":"https://localhost:8080/","height":1000}},"source":["#Feature selection with RFE, using RandomForestClassifier as the estimator.\n","y_rfe = y_transform(y)\n","for index in range(1,42):\n"," sel = RFE(RandomForestClassifier(n_estimators=100,random_state=0,n_jobs=-1),n_features_to_select=index)\n"," sel.fit(x,y_rfe)\n"," print(\"Number of selected features: {}\".format(index))\n"," features =[]\n"," for i,value in enumerate(sel.support_):\n","  if value:\n","   features.append(i)\n"," print(\"Features: {}\".format([list(list(df.columns)[index_value] for index_value in features)]))"],"execution_count":0,"outputs":[{"output_type":"stream","text":["Number of selected features: 1\n","Features: [['smean']]\n","Number of selected features: 2\n","Features: [['smean', 'dmean']]\n","Number of selected features: 3\n","Features: [['smean', 'dmean', 'ct_srv_dst']]\n","Number of selected features: 4\n","Features: [['synack', 'smean', 'dmean', 'ct_srv_dst']]\n","Number of selected features: 5\n","Features: [['proto', 'synack', 'smean', 'dmean', 'ct_srv_dst']]\n","Number of selected features: 6\n","Features: [['proto', 'synack', 'smean', 'dmean', 'ct_dst_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 7\n","Features: [['proto', 'sttl', 'synack', 'smean', 'dmean', 'ct_dst_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 8\n","Features: [['proto', 'service', 'sttl', 'synack', 'smean', 'dmean', 'ct_dst_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 9\n","Features: [['dur', 'proto', 'service', 'sttl', 'synack', 'smean', 'dmean', 'ct_dst_src_ltm', 
'ct_srv_dst']]\n","Number of selected features: 10\n","Features: [['dur', 'proto', 'service', 'sttl', 'sload', 'synack', 'smean', 'dmean', 'ct_dst_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 11\n","Features: [['dur', 'proto', 'service', 'sttl', 'sload', 'synack', 'smean', 'dmean', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 12\n","Features: [['dur', 'proto', 'service', 'sttl', 'sload', 'synack', 'smean', 'dmean', 'ct_srv_src', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 13\n","Features: [['dur', 'proto', 'service', 'sttl', 'sload', 'synack', 'smean', 'dmean', 'ct_srv_src', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 14\n","Features: [['dur', 'proto', 'service', 'sttl', 'sload', 'tcprtt', 'synack', 'smean', 'dmean', 'ct_srv_src', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 15\n","Features: [['dur', 'proto', 'service', 'rate', 'sttl', 'sload', 'tcprtt', 'synack', 'smean', 'dmean', 'ct_srv_src', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 16\n","Features: [['dur', 'proto', 'service', 'rate', 'sttl', 'sload', 'tcprtt', 'synack', 'smean', 'dmean', 'ct_srv_src', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 17\n","Features: [['dur', 'proto', 'service', 'rate', 'sttl', 'sload', 'dtcpb', 'tcprtt', 'synack', 'smean', 'dmean', 'ct_srv_src', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 18\n","Features: [['dur', 'proto', 'service', 'rate', 'sttl', 'sload', 'dtcpb', 'tcprtt', 'synack', 'smean', 'dmean', 'ct_srv_src', 'ct_state_ttl', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 19\n","Features: 
[['dur', 'proto', 'service', 'rate', 'sttl', 'sload', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'smean', 'dmean', 'ct_srv_src', 'ct_state_ttl', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 20\n","Features: [['dur', 'proto', 'service', 'rate', 'sttl', 'sload', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'smean', 'dmean', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 21\n","Features: [['dur', 'proto', 'service', 'rate', 'sttl', 'sload', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'smean', 'dmean', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 22\n","Features: [['dur', 'proto', 'service', 'rate', 'sttl', 'sload', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 23\n","Features: [['dur', 'proto', 'service', 'rate', 'sttl', 'sload', 'sjit', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 24\n","Features: [['dur', 'proto', 'service', 'rate', 'sttl', 'dttl', 'sload', 'sjit', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 25\n","Features: [['dur', 'proto', 'service', 'rate', 'sttl', 'dttl', 'sload', 'sjit', 'djit', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'ackdat', 
'smean', 'dmean', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 26\n","Features: [['dur', 'proto', 'service', 'rate', 'sttl', 'dttl', 'sload', 'sjit', 'djit', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 27\n","Features: [['dur', 'proto', 'service', 'state', 'rate', 'sttl', 'dttl', 'sload', 'sjit', 'djit', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 28\n","Features: [['dur', 'proto', 'service', 'state', 'spkts', 'rate', 'sttl', 'dttl', 'sload', 'sjit', 'djit', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 29\n","Features: [['dur', 'proto', 'service', 'state', 'spkts', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'sjit', 'djit', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 30\n","Features: [['dur', 'proto', 'service', 'state', 'spkts', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'sinpkt', 'sjit', 'djit', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 
'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 31\n","Features: [['dur', 'proto', 'service', 'state', 'spkts', 'dpkts', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'sinpkt', 'sjit', 'djit', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 32\n","Features: [['dur', 'proto', 'service', 'state', 'spkts', 'dpkts', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'sinpkt', 'sjit', 'djit', 'swin', 'stcpb', 'dtcpb', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 33\n","Features: [['dur', 'proto', 'service', 'state', 'spkts', 'dpkts', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'sinpkt', 'sjit', 'djit', 'swin', 'stcpb', 'dtcpb', 'dwin', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 34\n","Features: [['dur', 'proto', 'service', 'state', 'spkts', 'dpkts', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'dloss', 'sinpkt', 'sjit', 'djit', 'swin', 'stcpb', 'dtcpb', 'dwin', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 35\n","Features: [['dur', 'proto', 'service', 'state', 'spkts', 'dpkts', 'dbytes', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'dloss', 'sinpkt', 'sjit', 'djit', 'swin', 'stcpb', 'dtcpb', 'dwin', 'tcprtt', 'synack', 'ackdat', 
'smean', 'dmean', 'trans_depth', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 36\n","Features: [['dur', 'proto', 'service', 'state', 'spkts', 'dpkts', 'dbytes', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'dloss', 'sinpkt', 'dinpkt', 'sjit', 'djit', 'swin', 'stcpb', 'dtcpb', 'dwin', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 37\n","Features: [['dur', 'proto', 'service', 'state', 'spkts', 'dpkts', 'dbytes', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'sloss', 'dloss', 'sinpkt', 'dinpkt', 'sjit', 'djit', 'swin', 'stcpb', 'dtcpb', 'dwin', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 38\n","Features: [['dur', 'proto', 'service', 'state', 'spkts', 'dpkts', 'dbytes', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'sloss', 'dloss', 'sinpkt', 'dinpkt', 'sjit', 'djit', 'swin', 'stcpb', 'dtcpb', 'dwin', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'response_body_len', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 39\n","Features: [['dur', 'proto', 'service', 'state', 'spkts', 'dpkts', 'sbytes', 'dbytes', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'sloss', 'dloss', 'sinpkt', 'dinpkt', 'sjit', 'djit', 'swin', 'stcpb', 'dtcpb', 'dwin', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'response_body_len', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 
'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst']]\n","Number of selected features: 40\n","Features: [['dur', 'proto', 'service', 'state', 'spkts', 'dpkts', 'sbytes', 'dbytes', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'sloss', 'dloss', 'sinpkt', 'dinpkt', 'sjit', 'djit', 'swin', 'stcpb', 'dtcpb', 'dwin', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'response_body_len', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst', 'is_sm_ips_ports']]\n","Number of selected features: 41\n","Features: [['dur', 'proto', 'service', 'state', 'spkts', 'dpkts', 'sbytes', 'dbytes', 'rate', 'sttl', 'dttl', 'sload', 'dload', 'sloss', 'dloss', 'sinpkt', 'dinpkt', 'sjit', 'djit', 'swin', 'stcpb', 'dtcpb', 'dwin', 'tcprtt', 'synack', 'ackdat', 'smean', 'dmean', 'trans_depth', 'response_body_len', 'ct_srv_src', 'ct_state_ttl', 'ct_dst_ltm', 'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'ct_ftp_cmd', 'ct_flw_http_mthd', 'ct_src_ltm', 'ct_srv_dst', 'is_sm_ips_ports']]\n"],"name":"stdout"}]}]} --------------------------------------------------------------------------------
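
The RFE output recorded above lists one feature subset per target size, and each subset appears to extend the previous one by a single feature. A minimal sketch, assuming that nesting holds, of how the order in which features enter the selection could be recovered from those printed subsets. The helper name `ranking_from_nested_subsets` is hypothetical and not part of the notebook, and the sample input is only the first five subsets from the output.

```python
def ranking_from_nested_subsets(subsets):
    """Return features in the order they first appear across nested subsets.

    Assumes subsets[k] has k + 1 features and contains subsets[k - 1];
    raises ValueError when that assumption does not hold.
    """
    ranking = []
    seen = set()
    for subset in subsets:
        # Features present in this subset but in none of the smaller ones.
        new = [name for name in subset if name not in seen]
        if len(new) != 1 or len(subset) != len(seen) + 1:
            raise ValueError("subsets are not nested one feature at a time")
        ranking.append(new[0])
        seen.add(new[0])
    return ranking


# First five subsets copied from the recorded RFE output.
subsets = [
    ['smean'],
    ['smean', 'dmean'],
    ['smean', 'dmean', 'ct_srv_dst'],
    ['synack', 'smean', 'dmean', 'ct_srv_dst'],
    ['proto', 'synack', 'smean', 'dmean', 'ct_srv_dst'],
]

ranking = ranking_from_nested_subsets(subsets)
print(ranking)  # ['smean', 'dmean', 'ct_srv_dst', 'synack', 'proto']
```

Read this way, the list is an importance ordering under the RandomForestClassifier estimator used in the notebook: the earlier a feature enters, the longer it survived elimination.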