├── README.md
└── Data_Extraction_Revised.py


/README.md:
--------------------------------------------------------------------------------
 1 | # Intrusion Detection System using Deep Learning
 2 | 
 3 | A VGG-19 deep learning model trained on the ISCX 2012 IDS dataset.
 4 | 
 5 | # Frameworks & APIs
 6 | 
 7 | * Tensorflow-GPU
 8 | * Keras
 9 | * NVIDIA CUDA Toolkit 9.0
10 | * NVIDIA cuDNN 7.0
11 | 
12 | # Tools
13 | 
14 | * Anaconda (Python 3.6)
15 | * PyCharm
16 | 
17 | 
18 | # How to use
19 | Download the ISCX 2012 dataset from:
20 | 
21 | http://www.unb.ca/cic/datasets/ids.html
22 | 
23 | Then convert the .PCAP captures to .XML with the Java program ISCX FlowMeter, which is available on GitHub. Any Java IDE will do.
24 | 
25 | https://github.com/ISCX/CICFlowMeter (if this doesn't convert .PCAP to .XML, try the link below)
26 | 
27 | https://github.com/ISCX/ISCXFlowMeter
28 | 
29 | Next, make sure that your system is capable of running deep learning software. To check, you can follow this guide that I wrote:
30 | 
31 | https://towardsdatascience.com/python-environment-setup-for-deep-learning-on-windows-10-c373786e36d1
32 | 
33 | #### Note: If your system is inadequate, please stop here: the program will not run efficiently and a great deal of time will be wasted.
34 | 
35 | Next, run the extraction script on the pre-processed XML files (change the import directory and the save location in the code first). It pulls the relevant data fields out of each XML file and saves them as NumPy arrays:
36 | 
37 |     Data_Extraction_Revised.py
38 | 
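The padding step inside the script can be sketched on its own. This is a minimal illustration, not the script itself: `payload_to_vector` and `FRAME_BYTES` are hypothetical names, and the 50 x 50 x 3 reshape is an assumption based on the `(50**2) * 3` length used in the code.

```python
import numpy as np

FRAME_BYTES = (50 ** 2) * 3  # 7500, the same value as `actual` in the script


def payload_to_vector(text, size=FRAME_BYTES):
    """Encode a payload string and repeat or truncate it to exactly `size` bytes."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("an empty payload cannot be repeated to fill the frame")
    repeats = -(-size // len(raw))  # ceiling division
    return np.frombuffer((raw * repeats)[:size], dtype=np.uint8)


vec = payload_to_vector("GET / HTTP/1.1\r\n")
image = vec.reshape(50, 50, 3)  # assumed layout; the notebook may reshape differently
```

Short payloads are repeated cyclically and long ones are cut off, so every flow ends up with the same number of bytes regardless of its original size.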
39 | When it completes, you can run the notebook (assuming you have Jupyter Notebook installed).
40 | First change the data location in the notebook to the save location used by the extraction script:
41 | 
42 |     FYP-Revised.ipynb
43 | 
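Each saved file holds an object array of `[payload, label]` rows, so recent NumPy versions need `allow_pickle=True` to read it back. The sketch below shows one possible way to split such an array into inputs and labels; `split_rows` is a hypothetical helper, the toy rows stand in for real data, and an in-memory buffer is used in place of a file from the save directory.

```python
import io

import numpy as np


def split_rows(rows):
    """Split an (N, 2) object array of [payload, label] rows into X and y."""
    X = np.stack(list(rows[:, 0])).astype(np.uint8)
    y = rows[:, 1].astype(np.int64)
    return X, y


# Toy stand-in for one saved file: two rows of 7500 bytes each.
rows = np.empty((2, 2), dtype=object)
rows[0, 0], rows[0, 1] = np.zeros(7500, dtype=np.uint8), 0
rows[1, 0], rows[1, 1] = np.ones(7500, dtype=np.uint8), 1

buffer = io.BytesIO()
np.save(buffer, rows, allow_pickle=True)
buffer.seek(0)

loaded = np.load(buffer, allow_pickle=True)  # object arrays are stored with pickle
X, y = split_rows(loaded)
images = X.reshape(-1, 50, 50, 3)  # assumed input layout for the model
```

With a real file, the same `np.load(path, allow_pickle=True)` call applies, with `path` pointing at one of the `.npy` files written by the extraction script.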
44 | You can then begin training.
45 | 
46 | ## GOOD LUCK :)
47 | 


--------------------------------------------------------------------------------
/Data_Extraction_Revised.py:
--------------------------------------------------------------------------------
 1 | import xml.etree.ElementTree as ET
 2 | import numpy as np
 3 | import os
 4 | import time
 5 | 
 6 | import_directory = 'C:\\Users\\Tamim Mirza\\Documents\\ISCX\\labeled_flows_xml\\'
 7 | 
 8 | files = os.listdir(import_directory)
 9 | 
10 | errors = []
11 | 
12 | start_time = time.time()
13 | i = -1
14 | data_array = np.empty((0, 2))
15 | counter = 0
16 | actual = (50**2) * 3  # target payload length in bytes: 50 * 50 * 3 = 7500
17 | for file in files:
18 |     print(file)
19 |     try:
20 |         tree = ET.parse(import_directory + file)
21 |         print('Reading File ', file)
22 |         root = tree.getroot()
23 |     except (ET.ParseError, OSError):
24 |         errors.append(file)  # record the name of each file that could not be read
25 |         continue
26 |     for child in root:
27 |         for next_child in child:
28 |             if next_child.tag == 'destinationPayloadAsUTF':
29 |                 if next_child.text:  # skip missing and empty payloads
30 |                     x = next_child.text.encode('utf-8')  # work on bytes so the length is exact
31 |                     if len(x) > actual:
32 |                         x = x[:actual]
33 |                     else:
34 |                         while len(x) < actual:  # repeat short payloads until they fill the frame
35 |                             x += x
36 |                         x = x[:actual]
37 |                     label = 0 if child.find('Tag').text == 'Normal' else 1
38 |                     row = np.empty(2, dtype=object)  # [payload bytes as uint8, label]
39 |                     row[0], row[1] = np.frombuffer(x, dtype=np.uint8), label
40 |                     data_array = np.vstack((data_array, row))
41 |                     counter += 1
42 |     print('Time taken: {}'.format(time.time() - start_time))
43 |     start_time = time.time()
44 |     np.save(os.path.join('Database2', 'destinationPayload_' + file), data_array)
45 |     data_array = np.empty((0, 2))
46 | 
47 | print('Error in Opening Files = ', errors)
48 | print('Counter = ', counter)
49 | print('DONE!')


--------------------------------------------------------------------------------