├── README.md
└── Data_Extraction_Revised.py


/README.md:
--------------------------------------------------------------------------------
# Intrusion Detection System using Deep Learning

VGG-19 deep learning model trained using the ISCX 2012 IDS Dataset

# Frameworks & APIs

* TensorFlow-GPU
* Keras
* NVIDIA CUDA Toolkit 9.0
* NVIDIA cuDNN 7.0

# Tools

* Anaconda (Python 3.6)
* PyCharm


# How to use
Download the ISCX 2012 dataset from this link:

http://www.unb.ca/cic/datasets/ids.html

Then run the Java program ISCX FlowMeter, which is available on GitHub. You can use any IDE for this.

https://github.com/ISCX/CICFlowMeter (if this doesn't convert .PCAP to .XML, try the link below)

https://github.com/ISCX/ISCXFlowMeter

Next, make sure that your system is capable of running deep learning software. To check, you can follow this guide that I wrote:

https://towardsdatascience.com/python-environment-setup-for-deep-learning-on-windows-10-c373786e36d1

#### Note: If your system is inadequate, I humbly request that you stop here, as the program will not perform efficiently and a great deal of time will be wasted.

Next, run the following Python file on the pre-processed data (change the import directory and the location of the save file in the code first). It extracts the relevant data fields from each XML file and converts them into NumPy arrays:

Data_Extraction_Revised.py

When it has finished, you can run the training program (assuming you have Jupyter Notebook installed).
You have to change the location of the save file in the notebook code so that it points to the output of the revised data extraction program:

FYP-Revised.ipynb

And you can begin training.

## GOOD LUCK :)

--------------------------------------------------------------------------------
/Data_Extraction_Revised.py:
--------------------------------------------------------------------------------
import os
import time
import xml.etree.ElementTree as ET

import numpy as np

# Change these two locations to match your system.
import_directory = r'C:\Users\Tamim Mirza\Documents\ISCX\labeled_flows_xml'
export_directory = 'Database2'
os.makedirs(export_directory, exist_ok=True)

files = os.listdir(import_directory)

errors = []  # names of files that could not be opened or parsed
counter = 0  # total number of payloads extracted
actual = (50 ** 2) * 3  # fixed payload length in bytes (7500)

for file in files:
    start_time = time.time()
    print('Reading File ', file)
    try:
        tree = ET.parse(os.path.join(import_directory, file))
        root = tree.getroot()
    except (ET.ParseError, OSError):
        errors.append(file)
        continue

    rows = []  # (payload, label) pairs for this file
    for child in root:
        for next_child in child:
            # Skip missing or empty payloads (an empty one can never be padded).
            if next_child.tag == 'destinationPayloadAsUTF' and next_child.text:
                # Encode first so that the length is counted in bytes.
                x = next_child.text.encode('utf-8')
                # Repeat short payloads, then cut to exactly `actual` bytes.
                while len(x) < actual:
                    x += x
                x = x[:actual]
                # Label: 0 = Normal, 1 = anything else.
                label = 0 if child.find('Tag').text == 'Normal' else 1
                rows.append((np.frombuffer(x, dtype=np.uint8), label))
                counter += 1

    # Object array with one [payload, label] row per flow.
    data_array = np.empty((len(rows), 2), dtype=object)
    for row, (payload, label) in enumerate(rows):
        data_array[row, 0] = payload
        data_array[row, 1] = label

    print('Time taken: {}'.format(time.time() - start_time))
    np.save(os.path.join(export_directory, 'destinationPayload_' + file), data_array)

print('Error in Opening Files = ', errors)
print('Counter = ', counter)
print('DONE!')
--------------------------------------------------------------------------------
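
The padding step in Data_Extraction_Revised.py can be tried in isolation. The following is a minimal sketch, assuming the same 7500-byte target length as `actual` in the script. `fix_length` and `PAYLOAD_LENGTH` are hypothetical names that do not appear in the repository, and the 50 x 50 x 3 reshape at the end is only a guess at how the notebook might consume the data.

```python
import numpy as np

PAYLOAD_LENGTH = (50 ** 2) * 3  # 7500 bytes, same value as `actual` in the script


def fix_length(payload: str, length: int = PAYLOAD_LENGTH) -> np.ndarray:
    """Hypothetical helper: repeat or cut a payload to exactly `length` bytes."""
    raw = payload.encode('utf-8')
    if not raw:
        raise ValueError('an empty payload cannot be padded by repetition')
    repeats = -(-length // len(raw))  # ceiling division
    return np.frombuffer((raw * repeats)[:length], dtype=np.uint8)


short = fix_length('GET / HTTP/1.1')  # shorter than 7500 bytes: repeated
long_ = fix_length('A' * 10000)       # longer than 7500 bytes: truncated
print(short.shape, long_.shape)       # (7500,) (7500,)

# Assumption: the 7500 values are meant to be viewed as a 50 x 50 x 3 image.
image = short.reshape(50, 50, 3)
print(image.shape)                    # (50, 50, 3)
```

Repeating the bytes and then truncating gives the same result as the doubling loop in the script, and rejecting empty payloads avoids a loop that would never terminate.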