├── .gitignore ├── LICENSE ├── Master_Thesis.pdf ├── README.md ├── convert_pcap_to_h5.py ├── data_exploration.py ├── datasets ├── LinuxChrome │ ├── 8 │ │ └── extracted_8-2104_1422.h5 │ └── 16 │ │ └── extracted_16-2104_1523.h5 ├── WindowsAndreas │ ├── 8 │ │ └── extracted_8-2304_0930.h5 │ └── 16 │ │ └── extracted_16-2304_0932.h5 ├── WindowsChrome │ ├── 8 │ │ └── extracted_8-2004_1553.h5 │ └── 16 │ │ └── extracted_16-2004_1615.h5 ├── WindowsFirefox │ ├── 8 │ │ └── extracted_8-2004_2104.h5 │ └── 16 │ │ └── extracted_16-2004_2210.h5 └── WindowsSalik │ ├── 8 │ └── extracted_8-2004_1525.h5 │ └── 16 │ └── extracted_16-2004_1542.h5 ├── extract_headers.py ├── extract_payload.py ├── filter_http_https.py ├── ip_header_test.py ├── pca ├── dataanalyzer.py ├── pca.py └── summarystats.py ├── pcap └── pcaptools.py ├── tf ├── confusionmatrix.py ├── dataset.py ├── early_stopping.py └── tf_utils.py ├── trafficgen ├── PyTgen │ ├── config.py │ ├── core │ │ ├── __init__.py │ │ ├── generator.py │ │ ├── runner.py │ │ └── scheduler.py │ ├── nslookup.py │ └── run.py └── Streaming │ ├── streaming_generator.py │ ├── streaming_types.py │ ├── unix_capture.py │ └── win_capture.py ├── train ├── train_header.py ├── train_logistic.py └── train_payload.py ├── utils.py └── visualization ├── classes_module.py ├── heatmap.py ├── histogram.py ├── pca_plots.py ├── t-sne_compare.py ├── t_sne.py ├── vis_utils.py └── visualize_activations.py /.gitignore: -------------------------------------------------------------------------------- 1 | trafficgen/__pycache__/ 2 | trafficgen/core/__pycache__/ 3 | __pycache__/ 4 | *.pcap 5 | .idea/ 6 | .vscode/ 7 | constants.py 8 | trained_models/ 9 | tensorboard/ 10 | *.png 11 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 Salik Lennert Pedersen 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /Master_Thesis.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SalikLP/classification-of-encrypted-traffic/3c86e098aab58941f9339bb64945c1112ab556ef/Master_Thesis.pdf -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Classification of encrypted traffic using deep learning 2 | This repository contains the code used and developed during a master thesis at DTU Compute in 2018. 3 | Professor [Ole Winther](http://cogsys.imm.dtu.dk/staff/winther/) has been supervisor for this master thesis. 4 | Alex Omø Agerholm from [Napatech](https://www.napatech.com/) has been co-supervisor for this project. 5 | 6 | In this thesis we examined and evaluated different ways of classifying encrypted network traffic by use of neural networks. For this purpose we created a dataset with a streaming/non-streaming focus. The dataset comprises seven different classes, five streaming and two non-streaming. 7 | The thesis serves as a preliminary proof-of-concept for Napatech A/S. 8 | 9 | We propose a novel approach where the unencrypted parts of network traffic, namely the headers are utilized. This is done by concatenating the initial headers from a session thus forming a signature datapoint as shown in the following figure: 10 | 11 | Header datapoint 12 | 13 | The datasets created by use of the first 8 and 16 headers are available in the datasets folder in this repository. 14 | We explored the dataset by running t-SNE on the concatenated headers dataset. As can be seen in the t-SNE plot below, which shows all the individual datasets merged, it seems possible to perform classification of individual classes. 15 | 16 | t-SNE plot 17 | 18 | In experiments using the header-based approach we achieve very promising results, showing that a simple neural network with a single hidden layer of less than 50 units, can predict the individual classes with an accuracy of 96.4\% and an AUC of 0.99 to 1.00 for the individual classes, as shown in the following figures. 19 | 20 | Confusion matrix of all 7 classes ROC Plot 21 | 22 | The thesis hereby provides a solution to network traffic classification using the unencrypted headers. 
23 | -------------------------------------------------------------------------------- /convert_pcap_to_h5.py: -------------------------------------------------------------------------------- 1 | import pcap.pcaptools as pcap 2 | 3 | 4 | 5 | if __name__ == '__main__': 6 | pcap.process_pcap_to_h5('/home/mclrn/Data/', '/home/mclrn/Data/h5/', session_threshold=5000) -------------------------------------------------------------------------------- /data_exploration.py: -------------------------------------------------------------------------------- 1 | import pca.dataanalyzer 2 | import utils 3 | import glob 4 | import os 5 | import pandas as pd 6 | import numpy as np 7 | from matplotlib import pyplot as plt 8 | from pca.dataanalyzer import byteindextoheaderfield 9 | 10 | num_headers = 8 11 | train_dir = 'C:/Users/salik/Documents/Data/LinuxChrome/{}/'.format(num_headers) 12 | dataframes = [] 13 | for fullname in glob.iglob(train_dir + '*.h5'): 14 | filename = os.path.basename(fullname) 15 | df = utils.load_h5(train_dir, filename) 16 | dataframes.append(df) 17 | # create one large dataframe 18 | data = pd.concat(dataframes) 19 | print(len(data)) 20 | print("drtv:", len(data[data['label'] == 'drtv'])) 21 | print("hbo:", len(data[data['label'] == 'hbo'])) 22 | print("http:", len(data[data['label'] == 'http'])) 23 | print("https:", len(data[data['label'] == 'https'])) 24 | print("netflix:", len(data[data['label'] == 'netflix'])) 25 | print("twitch:", len(data[data['label'] == 'twitch'])) 26 | print("youtube:", len(data[data['label'] == 'youtube'])) 27 | 28 | 29 | # dr_mean, dr_mean_sub, dr_std = getmeanstd(data, 'drtv') 30 | # nf_mean, nf_mean_sub, nf_std = getmeanstd(data, 'netflix') 31 | # mean_diff = dr_mean - nf_mean 32 | # sort_diff = (-abs(mean_diff)).argsort() #Sort on absolute values in decending order 33 | # for i in range(10): 34 | # packetnumber = math.ceil(sort_diff[i] / 54) 35 | # bytenumber = sort_diff[i] % 54 36 | # print('Index %i is bytenumber %i in packet: %i' % (sort_diff[i],bytenumber, packetnumber), byteindextoheaderfield(sort_diff[i])) 37 | # 38 | # bytes = getbytes(data) 39 | # labels = data['label'] 40 | # pca = p.runpca(bytes, 50) 41 | # # p.plotvarianceexp(pca, 50) 42 | # Z = p.componentprojection(bytes, pca) 43 | # for pc in range(17): 44 | # p.plotprojection(Z, pc, labels) 45 | # p.showplots() 46 | # id_max = np.argmax(mean_diff) 47 | # print(byteindextoheaderfield(id_max)) 48 | # id_min = np.argmin(mean_diff) 49 | # print(dr_mean-nf_mean) 50 | 51 | def createBoxplotsFromColumns(title,param, param1): 52 | 53 | plt.title(title) 54 | plt.boxplot([param,param1]) 55 | 56 | plt.savefig('boxplots/boxplot:%s.png' % title,dpi=300) 57 | plt.gcf().clear() 58 | 59 | def compare_data(path1, path2, nrheaders): 60 | df1 = pd.DataFrame() 61 | df2 = pd.DataFrame() 62 | 63 | for fullname in glob.iglob(path1 + '*.h5'): 64 | filename = os.path.basename(fullname) 65 | df1 = utils.load_h5(path1, filename) 66 | 67 | for fullname in glob.iglob(path2 + '*.h5'): 68 | filename = os.path.basename(fullname) 69 | df2 = utils.load_h5(path2, filename) 70 | 71 | classes = [] 72 | # find all classes 73 | for label in set(df1['label']): 74 | classes.append(label) 75 | print(set(df1['label'])) 76 | print(set(df2['label'])) 77 | # filter on classes 78 | for c in classes: 79 | ## Exclude youtube as it contains both UDP and TCP 80 | if(c == 'youtube'): 81 | continue 82 | 83 | 84 | # create selector 85 | df1_selector = df1['label'] == c 86 | df2_selector = df2['label'] == c 87 | 88 | 89 | df1_values = 
df1[df1_selector]['bytes'].values 90 | df2_values = df2[df2_selector]['bytes'].values 91 | 92 | df1_bytes = np.zeros((df1_values.shape[0], nrheaders * 54)) 93 | df2_bytes = np.zeros((df2_values.shape[0], nrheaders * 54)) 94 | 95 | for i, v in enumerate(df1_values): 96 | payload = np.zeros(nrheaders * 54, dtype=np.uint8) 97 | payload[:v.shape[0]] = v 98 | df1_bytes[i] = payload 99 | 100 | for i, v in enumerate(df2_values): 101 | payload = np.zeros(nrheaders * 54, dtype=np.uint8) 102 | payload[:v.shape[0]] = v 103 | df2_bytes[i] = payload 104 | 105 | 106 | 107 | # Extract byte 23 to determine the protocol. 108 | TCP = True if int(df2_bytes[0][23]) == 6 else False 109 | 110 | 111 | df1_mean = np.mean(df1_bytes, axis=0) 112 | df2_mean = np.mean(df2_bytes, axis=0) 113 | 114 | df1_min = np.min(df1_bytes, axis=0) 115 | df2_min = np.min(df2_bytes, axis=0) 116 | 117 | df1_max = np.max(df1_bytes, axis=0) 118 | df2_max = np.max(df2_bytes, axis=0) 119 | 120 | 121 | 122 | for index, mean in enumerate(df1_mean): 123 | if(index % 25 == 0): 124 | print(c,index) 125 | if df1_mean[index] > 0 or df2_mean[index] > 0: 126 | if(int(df1_min[index]) != int(df1_max[index]) or int(df2_min[index]) != int(df2_max[index])): 127 | if(int(df1_mean[index]) != int(df2_mean[index]) or int(df1_min[index]) != int(df2_min[index]) or int(df1_max[index]) != int(df2_max[index])): 128 | print(index, " : ", int(df1_mean[index]), ' : ' , int(df2_mean[index]), int(df1_min[index]), int(df2_min[index]), int(df1_max[index]), int(df2_max[index])) 129 | headername = byteindextoheaderfield(index, TCP) 130 | headername = headername.replace('/',' ') 131 | createBoxplotsFromColumns(c+headername +':'+str(index), df1_bytes[:,index], df2_bytes[:,index]) 132 | 133 | 134 | # compare_data('/home/mclrn/Data/linux/no_checksum/8/', '/home/mclrn/Data/windows_firefox/no_checksum/8/',8) 135 | 136 | -------------------------------------------------------------------------------- /datasets/LinuxChrome/16/extracted_16-2104_1523.h5: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SalikLP/classification-of-encrypted-traffic/3c86e098aab58941f9339bb64945c1112ab556ef/datasets/LinuxChrome/16/extracted_16-2104_1523.h5 -------------------------------------------------------------------------------- /datasets/LinuxChrome/8/extracted_8-2104_1422.h5: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SalikLP/classification-of-encrypted-traffic/3c86e098aab58941f9339bb64945c1112ab556ef/datasets/LinuxChrome/8/extracted_8-2104_1422.h5 -------------------------------------------------------------------------------- /datasets/WindowsAndreas/16/extracted_16-2304_0932.h5: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SalikLP/classification-of-encrypted-traffic/3c86e098aab58941f9339bb64945c1112ab556ef/datasets/WindowsAndreas/16/extracted_16-2304_0932.h5 -------------------------------------------------------------------------------- /datasets/WindowsAndreas/8/extracted_8-2304_0930.h5: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SalikLP/classification-of-encrypted-traffic/3c86e098aab58941f9339bb64945c1112ab556ef/datasets/WindowsAndreas/8/extracted_8-2304_0930.h5 -------------------------------------------------------------------------------- /datasets/WindowsChrome/16/extracted_16-2004_1615.h5: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/SalikLP/classification-of-encrypted-traffic/3c86e098aab58941f9339bb64945c1112ab556ef/datasets/WindowsChrome/16/extracted_16-2004_1615.h5 -------------------------------------------------------------------------------- /datasets/WindowsChrome/8/extracted_8-2004_1553.h5: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SalikLP/classification-of-encrypted-traffic/3c86e098aab58941f9339bb64945c1112ab556ef/datasets/WindowsChrome/8/extracted_8-2004_1553.h5 -------------------------------------------------------------------------------- /datasets/WindowsFirefox/16/extracted_16-2004_2210.h5: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SalikLP/classification-of-encrypted-traffic/3c86e098aab58941f9339bb64945c1112ab556ef/datasets/WindowsFirefox/16/extracted_16-2004_2210.h5 -------------------------------------------------------------------------------- /datasets/WindowsFirefox/8/extracted_8-2004_2104.h5: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SalikLP/classification-of-encrypted-traffic/3c86e098aab58941f9339bb64945c1112ab556ef/datasets/WindowsFirefox/8/extracted_8-2004_2104.h5 -------------------------------------------------------------------------------- /datasets/WindowsSalik/16/extracted_16-2004_1542.h5: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SalikLP/classification-of-encrypted-traffic/3c86e098aab58941f9339bb64945c1112ab556ef/datasets/WindowsSalik/16/extracted_16-2004_1542.h5 -------------------------------------------------------------------------------- /datasets/WindowsSalik/8/extracted_8-2004_1525.h5: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SalikLP/classification-of-encrypted-traffic/3c86e098aab58941f9339bb64945c1112ab556ef/datasets/WindowsSalik/8/extracted_8-2004_1525.h5 -------------------------------------------------------------------------------- /extract_headers.py: -------------------------------------------------------------------------------- 1 | import datetime 2 | 3 | import utils 4 | 5 | if __name__ == '__main__': 6 | loaddir = "/home/mclrn/Data/h5/" 7 | headers = [1, 2, 4, 8, 16] 8 | for headersize in headers: 9 | savedir= "/home/mclrn/Data/h5/" + str(headersize) + "/" 10 | now = datetime.datetime.now() 11 | savename = "extracted_%d-%.2d%.2d_%.2d%.2d" % (headersize, now.day, now.month, now.hour, now.minute) 12 | utils.saveextractedheaders(loaddir, savedir, savename, num_headers=headersize) -------------------------------------------------------------------------------- /extract_payload.py: -------------------------------------------------------------------------------- 1 | import datetime 2 | import utils 3 | import glob 4 | import os 5 | import numpy as np 6 | import pandas as pd 7 | 8 | 9 | if __name__ == '__main__': 10 | loaddir = "E:/Data/h5/" 11 | labels = ['https', 'netflix'] 12 | max_packet_length = 1514 13 | for label in labels: 14 | print("Starting label: " + label) 15 | savedir = loaddir + label + "/" 16 | now = datetime.datetime.now() 17 | savename = "payload_%s-%.2d%.2d_%.2d%.2d" % (label, now.day, now.month, now.hour, now.minute) 18 | filelist = glob.glob(loaddir + label + '*.h5') 19 | # Try only 
one of each file 20 | fullname = filelist[0] 21 | # for fullname in filelist: 22 | load_dir, filename = os.path.split(fullname) 23 | print("Loading: {0}".format(filename)) 24 | df = utils.load_h5(load_dir, filename) 25 | packets = df['bytes'].values 26 | payloads = [] 27 | labels = [] 28 | filenames = [] 29 | for packet in packets: 30 | if len(packet) == max_packet_length: 31 | # Extract the payload from the packet should have length 1460 32 | payload = packet[54:] 33 | p = np.fromstring(payload, dtype=np.uint8) 34 | payloads.append(p) 35 | labels.append(label) 36 | filenames.append(filename) 37 | d = {'filename': filenames, 'bytes': payloads, 'label': labels} 38 | dataframe = pd.DataFrame(data=d) 39 | key = savename.split('-')[0] 40 | dataframe.to_hdf(savedir + savename + '.h5', key=key, mode='w') 41 | # utils.saveextractedheaders(loaddir, savedir, savename, num_headers=headersize) 42 | print("Done with label: " + label) 43 | -------------------------------------------------------------------------------- /filter_http_https.py: -------------------------------------------------------------------------------- 1 | import utils 2 | import trafficgen.PyTgen.config as cf 3 | import socket 4 | 5 | 6 | def save_dataframe_h5(df, dir, filename): 7 | key = filename.split('-')[0] 8 | df.to_hdf(dir + filename + '.h5', key=key) 9 | 10 | 11 | Conf = cf.Conf 12 | all_ips = [] 13 | for http in Conf.http_urls: 14 | url = http.split('/')[2] 15 | ip_list = [] 16 | ais = socket.getaddrinfo(url, 0, 0, 0, 0) 17 | for result in ais: 18 | ip_list.append(result[-1][0]) 19 | ip_list = list(set(ip_list)) 20 | all_ips.append(ip_list) 21 | 22 | http_list = [ip for ips in all_ips for ip in ips] 23 | 24 | all_ips = [] 25 | for http in Conf.https_urls: 26 | url = http.split('/')[2] 27 | ip_list = [] 28 | ais = socket.getaddrinfo(url, 0, 0, 0, 0) 29 | for result in ais: 30 | ip_list.append(result[-1][0]) 31 | ip_list = list(set(ip_list)) 32 | all_ips.append(ip_list) 33 | https_list = [ip for ips in all_ips for ip in ips] 34 | 35 | dir = 'E:/Data/' 36 | filename = 'http_https-browse' 37 | http_dataframe = utils.filter_pcap_by_ip(dir, filename, http_list, 'http') 38 | save_dataframe_h5(http_dataframe, dir, 'http-browse-1104_2010') 39 | 40 | https_dataframe = utils.filter_pcap_by_ip(dir, filename, https_list, 'https') 41 | save_dataframe_h5(https_dataframe, dir, 'https-browse-1104_2020') 42 | -------------------------------------------------------------------------------- /ip_header_test.py: -------------------------------------------------------------------------------- 1 | import glob 2 | import os 3 | import utils 4 | import numpy as np 5 | import multiprocessing 6 | 7 | num_headers = 1 8 | 9 | load_dir = 'E:/Data/h5/' 10 | 11 | 12 | def ipheadertask(filelist): 13 | j = 1 14 | for fullname in filelist: 15 | print("Loading filenr: {}".format(j)) 16 | load_dir, filename = os.path.split(fullname) 17 | df = utils.load_h5(load_dir, filename) 18 | frames = df['bytes'].values 19 | for i, frame in enumerate(frames): 20 | p = np.fromstring(frame, dtype=np.uint8) 21 | if p[14] != 69: 22 | print("IP Header length not 20! 
in file {0}".format(filename)) 23 | j += 1 24 | 25 | if __name__ == '__main__': 26 | filelist = glob.glob(load_dir + '*.h5') 27 | filesplits = utils.split_list(filelist, 4) 28 | 29 | threads = [] 30 | for split in filesplits: 31 | # create a thread for each 32 | t = multiprocessing.Process(target=ipheadertask, args=(split,)) 33 | threads.append(t) 34 | t.start() 35 | # create one large dataframe 36 | 37 | for t in threads: 38 | t.join() 39 | print("Process joined: ", t) 40 | -------------------------------------------------------------------------------- /pca/dataanalyzer.py: -------------------------------------------------------------------------------- 1 | import utils 2 | import glob 3 | import os 4 | import pandas as pd 5 | import numpy as np 6 | import math 7 | import pca as p 8 | 9 | 10 | def getbytes(dataframe, payload_length=810): 11 | values = dataframe['bytes'].values 12 | bytes = np.zeros((values.shape[0], payload_length)) 13 | for i, v in enumerate(values): 14 | payload = np.zeros(payload_length, dtype=np.uint8) 15 | payload[:v.shape[0]] = v 16 | bytes[i] = payload 17 | return bytes 18 | 19 | 20 | def getmeanstd(dataframe, label): 21 | labels = dataframe['label'] == label 22 | bytes = getbytes(dataframe[labels]) 23 | # values = dataframe[labels]['bytes'].values 24 | # bytes = np.zeros((values.shape[0], values[0].shape[0])) 25 | # for i, v in enumerate(values): 26 | # bytes[i] = v 27 | 28 | # Ys = (X - np.mean(X, axis=0)) / np.std(X, axis=0) 29 | mean = np.mean(bytes, axis=0) 30 | mean_sub = np.subtract(bytes, mean) 31 | std = mean_sub / np.std(bytes, axis=0) 32 | return mean, mean_sub, std 33 | 34 | 35 | def byteindextoheaderfield(number, TCP=True): 36 | if TCP: 37 | bytenumber = number % 54 38 | else: 39 | bytenumber = number % 42 40 | 41 | if bytenumber in range(6): 42 | return "Destination MAC" 43 | if bytenumber in range(6, 12): 44 | return "Source MAC" 45 | if bytenumber in (12, 13): 46 | return "Eth. 
Type" 47 | if bytenumber == 14: 48 | return "IP Version and header length" 49 | if bytenumber == 15: 50 | return "Explicit Congestion Notification" 51 | if bytenumber in (16, 17): 52 | return "Total Length (IP header)" 53 | if bytenumber in (18, 19): 54 | return "Identification (IP header)" 55 | if bytenumber in (20, 21): 56 | return "Fragment offset (IP header)" 57 | if bytenumber == 22: 58 | return "Time to live (IP header)" 59 | if bytenumber == 23: 60 | return "Protocol (IP header)" 61 | if bytenumber in (24, 25): 62 | return "Header checksum (IP header)" 63 | if bytenumber in range(26, 30): 64 | return "Source IP (IP header)" 65 | if bytenumber in range(30, 34): 66 | return "Destination IP (IP header)" 67 | if bytenumber in (34, 35): 68 | return "Source Port (TCP/UDP header)" 69 | if bytenumber in (36, 37): 70 | return "Destination Port (TCP/UDP header)" 71 | if bytenumber in range(38, 42): 72 | if TCP: 73 | return "Sequence number (TCP header)" 74 | elif bytenumber in (38, 39): 75 | return "Length of data (UDP Header)" 76 | else: 77 | return "UDP Checksum (UDP Header)" 78 | if bytenumber in range(42, 46): 79 | return "ACK number (TCP header)" 80 | if bytenumber == 46: 81 | return "TCP Header length or Nonce (TCP header)" 82 | if bytenumber == 47: 83 | return "TCP FLAGS (CWR, ECN-ECHO, ACK, PUSH, RST, SYN, FIN) (TCP header)" 84 | if bytenumber in (48, 49): 85 | return "Window size (TCP header)" 86 | if bytenumber in (50, 51): 87 | return "Checksum (TCP header)" 88 | if bytenumber in (52, 53): 89 | return "Urgent Pointer (TCP header)" 90 | 91 | 92 | 93 | 94 | -------------------------------------------------------------------------------- /pca/pca.py: -------------------------------------------------------------------------------- 1 | from sklearn.decomposition import PCA 2 | import matplotlib.pyplot as plt 3 | import numpy as np 4 | 5 | 6 | def runpca(X, num_comp=None): 7 | pca = PCA(n_components=num_comp, svd_solver='full') 8 | pca.fit(X) 9 | # print(pca.n_components_) 10 | # print(pca.explained_variance_ratio_) 11 | # print(sum(pca.explained_variance_ratio_)) 12 | return pca 13 | 14 | 15 | def plotvarianceexp(pca, num_comp): 16 | # plt.ion() 17 | rho = pca.explained_variance_ratio_ 18 | rhosum = np.empty(len(rho)) 19 | for i in range(1, len(rho) + 1): 20 | rhosum[i - 1] = np.sum(rho[0:i]) 21 | ind = np.arange(num_comp) 22 | width = 0.35 23 | opacity = 0.8 24 | fig, ax = plt.subplots() 25 | ax.set_ylim(0, 1.1) 26 | # Variance explained by single component 27 | bars1 = ax.bar(ind, rho[0:num_comp], width, alpha=opacity, color="xkcd:blue", label='Single') 28 | # Variance explained by cummulative component 29 | bars2 = ax.bar(ind + width, rhosum[0:num_comp], width, alpha=opacity, color="g", label='Cummulative') 30 | # add some text for labels, title and axes ticks 31 | ax.set_ylabel('Variance Explained', fontsize=16) 32 | # ax.set_title('Variance Explained by principal components') 33 | ax.set_xticks(ind + width / 2) 34 | 35 | labels = ["" for x in range(num_comp)] 36 | for k in range(0, num_comp): 37 | labels[k] = ('$v{0}$'.format(k + 1)) 38 | ax.set_xticklabels(labels, fontsize=16) 39 | ax.legend(fontsize=16) 40 | autolabel(bars2, ax) 41 | plt.yticks(fontsize=16) 42 | plt.draw() 43 | 44 | 45 | def componentprojection(data, pca): 46 | V = pca.components_.T 47 | Z = data @ V 48 | return Z 49 | 50 | def plotprojection(Z, pc, labels, class_labels): 51 | diff_labels = np.unique(labels) 52 | opacity = 0.8 53 | fig, ax = plt.subplots() 54 | color_map = {0: 'orangered', 1: 'royalblue', 2: 
'lightgreen', 3: 'darkorchid', 4: 'teal', 5: 'darkslategrey', 55 | 6: 'darkgreen', 7: 'darkgrey'} 56 | for label in diff_labels: 57 | idx = labels == label 58 | ax.plot(Z[idx, pc], Z[idx, pc + 1], 'o', alpha=opacity, c=color_map[label], label='{label}'.format(label=class_labels[label])) 59 | # ax.plot(Z[idx_below, pc], Z[idx_below, pc + 1], 'o', alpha=opacity, 60 | # label='{name} below mean'.format(name=attributeNames[att])) 61 | ax.set_ylabel('$v{0}$'.format(pc + 2)) 62 | ax.set_xlabel('$v{0}$'.format(pc + 1)) 63 | ax.legend() 64 | ax.set_title('Data projected on v{0} and v{1}'.format(pc+1, pc+2)) 65 | # fig.savefig('v{0}_v{1}_{att}.png'.format(pc + 1, pc + 2, att=attributeNames[att]), dpi=300) 66 | plt.draw() 67 | 68 | 69 | def showplots(): 70 | plt.show() 71 | 72 | 73 | def autolabel(rects, ax): 74 | """ 75 | Attach a text label above each bar displaying its height 76 | """ 77 | for rect in rects: 78 | height = rect.get_height() 79 | ax.text(rect.get_x() + rect.get_width()/2., 1.05*height, 80 | '%4.2f' % height, 81 | ha='center', va='bottom', fontsize=16) -------------------------------------------------------------------------------- /pca/summarystats.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import pandas as pd 3 | import utils 4 | # import seaborn as sns 5 | import glob 6 | import os 7 | from scipy import stats, integrate 8 | import matplotlib.pyplot as plt 9 | # sns.set(color_codes=True) 10 | 11 | 12 | def remove_checksum(nrheaders): 13 | # nrheaders = 1 14 | read_dir = '/home/mclrn/Data/linux/' 15 | headers = '{0}/'.format(nrheaders) 16 | dataframes = [] 17 | for fullname in glob.iglob(read_dir + headers + '*.h5'): 18 | filename = os.path.basename(fullname) 19 | df = utils.load_h5(read_dir + headers, filename) 20 | dataframes.append(df) 21 | # create one large dataframe 22 | df = pd.concat(dataframes) 23 | # df = pd.read_hdf('/home/mclrn/Data/salik_windows/{0}/'.format(nrheaders), key="extracted_{0}".format(nrheaders)) 24 | df = df.sample(frac=1).reset_index(drop=True) 25 | values, counts = np.unique(df['label'], return_counts=True) 26 | print(values, counts) 27 | 28 | #selector = df['label'] == 'youtube' 29 | 30 | values = df['bytes'].values 31 | bytes = np.zeros((values.shape[0], nrheaders * 54)) 32 | for i, v in enumerate(values): 33 | payload = np.zeros(nrheaders * 54, dtype=np.uint8) 34 | payload[:v.shape[0]] = v 35 | bytes[i] = payload 36 | 37 | #mean = np.mean(bytes, axis=0) 38 | #min = np.min(bytes, axis=0) 39 | #max = np.max(bytes, axis=0) 40 | #print(np.max(bytes[0:, 23])) # Protocol field if value = 6 then TCP if value = 17 the UDP 41 | bytes_no_checksum = [] 42 | for j, b in enumerate(bytes): 43 | if b[23] == 6: 44 | # TCP 45 | # if bytenumber in (50, 51): 46 | # return "Checksum (TCP header)" 47 | for i in range(nrheaders): 48 | b[i * 54 + 50] = 0 49 | b[i * 54 + 51] = 0 50 | b[i * 54 + 24] = 0 51 | b[i * 54 + 25] = 0 52 | elif b[23] == 17: 53 | # UDP 54 | # if bytenumber in (40,41) 55 | # return "UDP Checksum (UDP Header)" 56 | for i in range(nrheaders): 57 | b[i * 42 + 40] = 0 58 | b[i * 42 + 41] = 0 59 | b[i * 42 + 24] = 0 60 | b[i * 42 + 25] = 0 61 | else: 62 | print("Byte was not 6 nor 17 but: %d" % bytes[23]) 63 | 64 | bytes_no_checksum.append(b) 65 | new_data = {'bytes': bytes_no_checksum, 'label': df['label'].values} 66 | new_df = pd.DataFrame(new_data) 67 | # print(df) 68 | # print(new_df) 69 | save_dir = read_dir + "no_checksum/" + headers 70 | # if not os.path.exists(save_dir): 71 | 
os.makedirs(save_dir, exist_ok=True) 72 | new_df.to_hdf(save_dir + 73 | "extracted_{0}-no_checksum".format(nrheaders) + '.h5', 74 | key='extracted_{0}'.format(nrheaders), mode='w') 75 | 76 | 77 | nr = [1,2,4,8,16] 78 | for n in nr: 79 | remove_checksum(n) 80 | 81 | 82 | #sns.distplot(bytes[0:,370], kde=False, rug=True) 83 | 84 | ''' 85 | for index, m in enumerate(mean): 86 | if m > 0 and min[index] != max[index]: 87 | print(index, min[index], max[index], mean[index]) 88 | sns.distplot(bytes[0:,index], kde=False, rug=True) 89 | plt.show() 90 | ''' 91 | #plt.show() 92 | #plt.savefig("dist.png") 93 | 94 | #for i in mean: 95 | # print("%.2f" % i) 96 | 97 | 98 | -------------------------------------------------------------------------------- /pcap/pcaptools.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | from scapy.all import * 3 | import glob 4 | import os 5 | import multiprocessing 6 | 7 | from utils import split_list 8 | 9 | 10 | def pcap_cleaner(dir): 11 | """ 12 | This method can be used for cleaning pcap files. 13 | :param dir: Directory containing the pcap files that should be filtered 14 | :return: None 15 | """ 16 | for fullname in glob.iglob(dir + '*.pcap'): 17 | dir, filename = os.path.split(fullname) 18 | command = 'tshark -r %s -2 -R "!(eth.dst[0]&1) && !(tcp.port==5901) && ip" -w %s/filtered/%s' % (fullname,dir, filename) 19 | os.system(command) 20 | 21 | 22 | def save_pcap_task(files, save_dir, session_threshold): 23 | """ 24 | This method takes all files in a list (full path names) and uses the method save_pcap that converts a pcap to h5 format 25 | :param files: 26 | :param session_threshold: 27 | :return: 28 | """ 29 | for fullname in files: 30 | print('Currently saving file: ', fullname) 31 | 32 | save_pcap(fullname, save_dir, session_threshold) 33 | 34 | 35 | def process_pcap_to_h5(read_dir, save_dir, session_threshold=5000): 36 | """ 37 | Use this method to process all pcap files in a directory to a h5 format. 38 | Session threshold is used to filter out all sessions containing fewer packets 39 | :param save_dir: 40 | :param read_dir: Directory containing pcap files that should be converted into h5 format 41 | :param session_threshold: Threshold to filter out session with less packets 42 | :return: None 43 | """ 44 | h5files = [] 45 | 46 | for h5 in glob.iglob(save_dir + '*.h5'): 47 | h5files.append(os.path.basename(h5)) 48 | # Load all files 49 | files = [] 50 | for fullname in glob.iglob(read_dir + '*.pcap'): 51 | filename = os.path.basename(fullname) 52 | h5name = filename +'.h5' 53 | if h5name in h5files: 54 | os.rename(fullname, read_dir + '/processed_pcap/' + filename) 55 | else: 56 | files.append(fullname) 57 | 58 | splits = 4 59 | files_splits = split_list(files, splits) 60 | processes = [] 61 | for file_split in files_splits: 62 | # create a thread for each 63 | t1 = multiprocessing.Process(target=save_pcap_task, args=(file_split, save_dir, session_threshold)) 64 | print("Starting process", t1) 65 | processes.append(t1) 66 | t1.start() 67 | 68 | for process in processes: 69 | process.join() 70 | print("Process joined", process) 71 | 72 | def save_pcap(fullname, save_dir, session_threshold=0): 73 | """ 74 | This method read a pcap file and saves it to an h5 dataframe. 75 | The file is overwritten if it already exists. 
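    A minimal usage sketch (the pcap path below is hypothetical); per the code, the
    HDF key is the part of the filename before the first '-' and the output file is
    the filename with '.h5' appended in save_dir:

    >>> save_pcap('/home/mclrn/Data/netflix-2004_1553.pcap',
    ...           '/home/mclrn/Data/h5/', session_threshold=5000)
    # -> writes /home/mclrn/Data/h5/netflix-2004_1553.pcap.h5 with key 'netflix'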
76 | :param dir: The folder containing the pcap file 77 | :param filename: The name of the pcap file 78 | :return: Nothing 79 | """ 80 | dir_n, filename = os.path.split(fullname) 81 | df = read_pcap(fullname, filename, session_threshold) 82 | key = filename.split('-')[0] 83 | df.to_hdf(save_dir + filename + '.h5', key=key, mode='w') 84 | 85 | 86 | def read_pcap(fullname, filename, session_threshold=0): 87 | """ 88 | This method will extract the packets of the major session within the pcap file. It will label the packets according 89 | to the filename. 90 | The method excludes packets between local/internal ip adresses (ip.src and ip.dst startswith 10.....) 91 | The method finds the major sessions by counting the packets for each session and calculate a threshold dependent 92 | on the session with most packets. All sessions with more packets than the threshold value is extracted and placed 93 | in the dataframe. 94 | 95 | :param dir: The directory in which the pcap file is located. Should end with a / 96 | :param filename: The name of the pcap file. It is expected to contain the label of the data before the first - char 97 | :return: A dataframe containing the extracted packets. 98 | """ 99 | time_s = time.clock() 100 | label = filename.split('-')[0] 101 | print("Read PCAP, label is %s" % label) 102 | if not filename.endswith('.pcap'): 103 | filename += '.pcap' 104 | data = rdpcap(fullname) 105 | # Workaround/speedup for pandas append to dataframe 106 | frametimes =[] 107 | dsts = [] 108 | srcs = [] 109 | protocols = [] 110 | dports = [] 111 | sports = [] 112 | bytes = [] 113 | labels = [] 114 | 115 | time_r = time.clock() 116 | time_read = time_r-time_s 117 | print("Time to read PCAP: "+ str(time_read)) 118 | sessions = data.sessions(session_extractor=session_extractor) 119 | for id, session in sessions.items(): 120 | if session_threshold is not None: 121 | if len(session) < session_threshold: 122 | continue 123 | for packet in session: 124 | # Check that the packet is transferred by either UDP or TCP and ensure that it is not a packet between to local/internal IP adresses (occurs when using vnc and such) 125 | if IP in packet and (UDP in packet or TCP in packet) and not (packet[IP].dst.startswith('10.') and packet[IP].src.startswith('10.')): 126 | ip_layer = packet[IP] 127 | transport_layer = ip_layer.payload 128 | frametimes.append(packet.time) 129 | dsts.append(ip_layer.dst) 130 | srcs.append(ip_layer.src) 131 | protocols.append(transport_layer.name) 132 | dports.append(transport_layer.dport) 133 | sports.append(transport_layer.sport) 134 | # Save the raw byte string 135 | raw_payload = raw(packet) 136 | bytes.append(raw_payload) 137 | labels.append(label) 138 | time_t = time.clock() 139 | print("Time spend: %ds" % (time_t-time_r)) 140 | d = {'time': frametimes, 141 | 'ip.dst': dsts, 142 | 'ip.src': srcs, 143 | 'protocol': protocols, 144 | 'port.dst': dports, 145 | 'port.src': sports, 146 | 'bytes': bytes, 147 | 'label': labels} 148 | df = pd.DataFrame(data=d) 149 | time_e = time.clock() 150 | total_time = time_e - time_s 151 | print("Time to convert PCAP to dataframe: " + str(total_time)) 152 | return df 153 | 154 | 155 | def session_extractor(p): 156 | """ 157 | Custom session extractor to use for scapy to group bi directional sessions instead of a uni directional flows. 
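    Illustrative example (addresses and ports are made up): both directions of a TCP
    connection between a local client (10.x.x.x) and a remote server collapse to the
    same key, so request and reply packets land in one session:

    >>> session_extractor(Ether()/IP(src='10.0.0.5', dst='151.101.1.69')/TCP(sport=50432, dport=443))
    'TCP 10.0.0.5:50432 > 151.101.1.69:443'
    >>> session_extractor(Ether()/IP(src='151.101.1.69', dst='10.0.0.5')/TCP(sport=443, dport=50432))
    'TCP 10.0.0.5:50432 > 151.101.1.69:443'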
158 | :param p: packet as used by scapy 159 | :return: session string to use a key in dict 160 | """ 161 | sess = "Other" 162 | if 'Ether' in p: 163 | if 'IP' in p: 164 | src = p[IP].src 165 | dst = p[IP].dst 166 | if NTP in p: 167 | if src.startswith('10.') or src.startswith('192.168.'): 168 | sess = p.sprintf("NTP %IP.src%:%r,UDP.sport% > %IP.dst%:%r,UDP.dport%") 169 | elif dst.startswith('10.') or dst.startswith('192.168.'): 170 | sess = p.sprintf("NTP %IP.dst%:%r,UDP.dport% > %IP.src%:%r,UDP.sport%") 171 | elif 'TCP' in p: 172 | if src.startswith('10.') or src.startswith('192.168.'): 173 | sess = p.sprintf("TCP %IP.src%:%r,TCP.sport% > %IP.dst%:%r,TCP.dport%") 174 | elif dst.startswith('10.') or dst.startswith('192.168.'): 175 | sess = p.sprintf("TCP %IP.dst%:%r,TCP.dport% > %IP.src%:%r,TCP.sport%") 176 | elif 'UDP' in p: 177 | if src.startswith('10.') or src.startswith('192.168.'): 178 | sess = p.sprintf("UDP %IP.src%:%r,UDP.sport% > %IP.dst%:%r,UDP.dport%") 179 | elif dst.startswith('10.') or dst.startswith('192.168.'): 180 | sess = p.sprintf("UDP %IP.dst%:%r,UDP.dport% > %IP.src%:%r,UDP.sport%") 181 | elif 'ICMP' in p: 182 | sess = p.sprintf("ICMP %IP.src% > %IP.dst% type=%r,ICMP.type% code=%r,ICMP.code% id=%ICMP.id%") 183 | else: 184 | sess = p.sprintf("IP %IP.src% > %IP.dst% proto=%IP.proto%") 185 | elif 'ARP' in p: 186 | sess = p.sprintf("ARP %ARP.psrc% > %ARP.pdst%") 187 | else: 188 | sess = p.sprintf("Ethernet type=%04xr,Ether.type%") 189 | return sess -------------------------------------------------------------------------------- /tf/confusionmatrix.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import matplotlib 3 | 4 | 5 | class ConfusionMatrix: 6 | """ 7 | Simple confusion matrix class 8 | row is the true class, column is the predicted class 9 | """ 10 | def __init__(self, num_classes, class_names=None): 11 | self.n_classes = num_classes 12 | if class_names is None: 13 | self.class_names = map(str, range(num_classes)) 14 | else: 15 | self.class_names = class_names 16 | 17 | # find max class_name and pad 18 | max_len = max(map(len, self.class_names)) 19 | self.max_len = max_len 20 | for idx, name in enumerate(self.class_names): 21 | if len(self.class_names) < max_len: 22 | self.class_names[idx] = name + " "*(max_len-len(name)) 23 | 24 | self.mat = np.zeros((num_classes, num_classes), dtype='int') 25 | 26 | def __str__(self): 27 | # calucate row and column sums 28 | col_sum = np.sum(self.mat, axis=1) 29 | row_sum = np.sum(self.mat, axis=0) 30 | 31 | s = [] 32 | 33 | mat_str = self.mat.__str__() 34 | mat_str = mat_str.replace('[','').replace(']','').split('\n') 35 | 36 | for idx, row in enumerate(mat_str): 37 | if idx == 0: 38 | pad = " " 39 | else: 40 | pad = "" 41 | class_name = self.class_names[idx] 42 | class_name = " {:10s}".format(class_name) 43 | class_name += " |" 44 | row_str = class_name + pad + row 45 | row_str += " |" + str(col_sum[idx]) 46 | s.append(row_str) 47 | 48 | row_sum = [(self.max_len+7)*" "+" ".join(map(str, row_sum))] 49 | hline = [(1+self.max_len)*" "+"-"*len(row_sum[0])] 50 | 51 | s = hline + s + hline + row_sum 52 | 53 | # add linebreaks 54 | s_out = [line+'\n' for line in s] 55 | return "".join(s_out) 56 | 57 | def batch_add(self, targets, preds): 58 | assert targets.shape == preds.shape 59 | assert len(targets) == len(preds) 60 | assert max(targets) < self.n_classes 61 | assert max(preds) < self.n_classes 62 | targets = targets.flatten() 63 | preds = preds.flatten() 64 | for i in 
range(len(targets)): 65 | self.mat[targets[i], preds[i]] += 1 66 | 67 | def get_errors(self): 68 | tp = np.asarray(np.diag(self.mat).flatten(), dtype='float') 69 | fn = np.asarray(np.sum(self.mat, axis=1).flatten(), dtype='float') - tp 70 | fp = np.asarray(np.sum(self.mat, axis=0).flatten(), dtype='float') - tp 71 | tn = np.asarray(np.sum(self.mat)*np.ones(self.n_classes).flatten(), 72 | dtype='float') - tp - fn - fp 73 | return tp, fn, fp, tn 74 | 75 | def accuracy(self): 76 | """ 77 | Calculates global accuracy 78 | :return: accuracy 79 | :example: >>> conf = ConfusionMatrix(3) 80 | >>> conf.batchAdd([0,0,1],[0,0,2]) 81 | >>> print conf.accuracy() 82 | """ 83 | tp, _, _, _ = self.get_errors() 84 | n_samples = np.sum(self.mat) 85 | return np.sum(tp) / n_samples 86 | 87 | def sensitivity(self): 88 | tp, tn, fp, fn = self.get_errors() 89 | res = tp / (tp + fn) 90 | res = res[~np.isnan(res)] 91 | return res 92 | 93 | def specificity(self): 94 | tp, tn, fp, fn = self.get_errors() 95 | res = tn / (tn + fp) 96 | res = res[~np.isnan(res)] 97 | return res 98 | 99 | def positive_predictive_value(self): 100 | tp, tn, fp, fn = self.get_errors() 101 | res = tp / (tp + fp) 102 | res = res[~np.isnan(res)] 103 | return res 104 | 105 | def negative_predictive_value(self): 106 | tp, tn, fp, fn = self.get_errors() 107 | res = tn / (tn + fn) 108 | res = res[~np.isnan(res)] 109 | return res 110 | 111 | def false_positive_rate(self): 112 | tp, tn, fp, fn = self.get_errors() 113 | res = fp / (fp + tn) 114 | res = res[~np.isnan(res)] 115 | return res 116 | 117 | def false_discovery_rate(self): 118 | tp, tn, fp, fn = self.get_errors() 119 | res = fp / (tp + fp) 120 | res = res[~np.isnan(res)] 121 | return res 122 | 123 | def F1(self): 124 | tp, tn, fp, fn = self.get_errors() 125 | res = (2*tp) / (2*tp + fp + fn) 126 | res = res[~np.isnan(res)] 127 | return res 128 | 129 | def matthews_correlation(self): 130 | tp, tn, fp, fn = self.get_errors() 131 | numerator = tp*tn - fp*fn 132 | denominator = np.sqrt((tp + fp)*(tp + fn)*(tn + fp)*(tn + fn)) 133 | res = numerator / denominator 134 | res = res[~np.isnan(res)] 135 | return res -------------------------------------------------------------------------------- /tf/dataset.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from tensorflow.python.framework import dtypes 3 | from tensorflow.python.framework import random_seed 4 | from tensorflow.contrib.learn.python.learn.datasets import base 5 | import glob 6 | import os 7 | import utils 8 | import pandas as pd 9 | from sklearn.preprocessing import LabelEncoder 10 | 11 | _label_encoder = LabelEncoder() 12 | 13 | 14 | class DataSet(object): 15 | 16 | def __init__(self, 17 | payloads, 18 | labels, 19 | dtype=dtypes.float32, 20 | seed=None): 21 | """Construct a DataSet. 22 | one_hot arg is used only if fake_data is true. `dtype` can be either 23 | `uint8` to leave the input as `[0, 255]`, or `float32` to rescale into 24 | `[0, 1]`. Seed arg provides for convenient deterministic testing. 
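    A minimal construction sketch (shapes are illustrative: 8 headers of 54 bytes
    give 432 features, and the 7 traffic classes are one-hot encoded; uses the
    module-level numpy import and dense_to_one_hot defined below):

    >>> payloads = np.random.randint(0, 256, size=(1000, 432)).astype(np.uint8)
    >>> labels = dense_to_one_hot(np.random.randint(0, 7, size=1000), num_classes=7)
    >>> data = DataSet(payloads, labels, seed=42)
    >>> x_batch, y_batch = data.next_batch(100)  # shuffled mini-batch of 100 examples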
25 | """ 26 | seed1, seed2 = random_seed.get_seed(seed) 27 | # If op level seed is not set, use whatever graph level seed is returned 28 | np.random.seed(seed1 if seed is None else seed2) 29 | dtype = dtypes.as_dtype(dtype).base_dtype 30 | if dtype not in (dtypes.uint8, dtypes.float32): 31 | raise TypeError('Invalid payload dtype %r, expected uint8 or float32' % 32 | dtype) 33 | 34 | assert payloads.shape[0] == labels.shape[0], ( 35 | 'payloads.shape: %s labels.shape: %s' % (payloads.shape, labels.shape)) 36 | self._num_examples = payloads.shape[0] 37 | 38 | if dtype == dtypes.float32: 39 | # Convert from [0, 255] -> [0.0, 1.0]. 40 | payloads = payloads.astype(np.float32) 41 | payloads = np.multiply(payloads, 1.0 / 255.0) 42 | 43 | self._payloads = payloads 44 | self._labels = labels 45 | self._epochs_completed = 0 46 | self._index_in_epoch = 0 47 | 48 | @property 49 | def payloads(self): 50 | return self._payloads 51 | 52 | @property 53 | def labels(self): 54 | return self._labels 55 | 56 | @property 57 | def num_examples(self): 58 | return self._num_examples 59 | 60 | @property 61 | def epochs_completed(self): 62 | return self._epochs_completed 63 | 64 | def next_batch(self, batch_size, shuffle=True): 65 | """Return the next `batch_size` examples from this data set.""" 66 | start = self._index_in_epoch 67 | # Shuffle for the first epoch 68 | 69 | if self._epochs_completed == 0 and start == 0 and shuffle: 70 | perm0 = np.arange(self._num_examples) 71 | np.random.shuffle(perm0) 72 | self._payloads = self.payloads[perm0] 73 | self._labels = self.labels[perm0] 74 | # Go to the next epoch 75 | if start + batch_size > self._num_examples: 76 | # Finished epoch 77 | self._epochs_completed += 1 78 | # Get the rest examples in this epoch 79 | rest_num_examples = self._num_examples - start 80 | payloads_rest_part = self._payloads[start:self._num_examples] 81 | labels_rest_part = self._labels[start:self._num_examples] 82 | # Shuffle the data 83 | if shuffle: 84 | perm = np.arange(self._num_examples) 85 | np.random.shuffle(perm) 86 | self._payloads = self.payloads[perm] 87 | self._labels = self.labels[perm] 88 | # Start next epoch 89 | start = 0 90 | self._index_in_epoch = batch_size - rest_num_examples 91 | end = self._index_in_epoch 92 | images_new_part = self._payloads[start:end] 93 | labels_new_part = self._labels[start:end] 94 | return np.concatenate((payloads_rest_part, images_new_part), axis=0), np.concatenate( 95 | (labels_rest_part, labels_new_part), axis=0) 96 | else: 97 | self._index_in_epoch += batch_size 98 | end = self._index_in_epoch 99 | return self._payloads[start:end], self._labels[start:end] 100 | 101 | 102 | def dense_to_one_hot(labels_dense, num_classes): 103 | """Convert class labels from scalars to one-hot vectors.""" 104 | num_labels = labels_dense.shape[0] 105 | index_offset = np.arange(num_labels) * num_classes 106 | labels_one_hot = np.zeros((num_labels, num_classes), dtype=np.int8) 107 | labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1 108 | return labels_one_hot 109 | 110 | 111 | def extract_labels(dataframe, one_hot=False, num_classes=10): 112 | """Extract the labels into a 1D uint8 numpy array [index]. 113 | 114 | Args: 115 | dataframe: A pandas dataframe object. 116 | one_hot: Does one hot encoding for the result. 117 | num_classes: Number of classes for the one hot encoding. 118 | 119 | Returns: 120 | labels: a 1D uint8 numpy array. 
121 | """ 122 | print('Extracting labels', ) 123 | labels = dataframe['label'].values 124 | labels = _label_encoder.fit_transform(labels) 125 | if one_hot: 126 | return dense_to_one_hot(labels, num_classes) 127 | return labels 128 | 129 | 130 | def read_data_sets(train_dirs=[], test_dirs=None, 131 | merge_data=True, 132 | one_hot=False, 133 | dtype=dtypes.float32, 134 | validation_size=0.2, 135 | test_size=0.2, 136 | seed=None, 137 | balance_classes=False, 138 | payload_length=810): 139 | trainframes = [] 140 | testframes = [] 141 | for train_dir in train_dirs: 142 | for fullname in glob.iglob(train_dir + '*.h5'): 143 | filename = os.path.basename(fullname) 144 | df = utils.load_h5(train_dir, filename) 145 | trainframes.append(df) 146 | # create one large dataframe 147 | train_data = pd.concat(trainframes) 148 | if test_dirs != train_dirs: 149 | for test_dir in test_dirs: 150 | for fullname in glob.iglob(test_dir + '*.h5'): 151 | filename = os.path.basename(fullname) 152 | df = utils.load_h5(test_dir, filename) 153 | testframes.append(df) 154 | test_data = pd.concat(testframes) 155 | else: 156 | test_data = pd.DataFrame() 157 | 158 | if merge_data: 159 | train_data = pd.concat([test_data, train_data]) 160 | 161 | num_classes = len(train_data['label'].unique()) 162 | 163 | if balance_classes: 164 | values, counts = np.unique(train_data['label'], return_counts=True) 165 | smallest_class = np.argmin(counts) 166 | amount = counts[smallest_class] 167 | new_data = [] 168 | for v in values: 169 | sample = train_data.loc[train_data['label'] == v].sample(n=amount) 170 | new_data.append(sample) 171 | train_data = new_data 172 | train_data = pd.concat(train_data) 173 | 174 | 175 | # shuffle the dataframe and reset the index 176 | train_data = train_data.sample(frac=1, random_state=seed).reset_index(drop=True) 177 | # 178 | # youtube_selector = train_data['label'] == 'youtube' 179 | # youtube_data = train_data[youtube_selector] 180 | # for index, row in youtube_data.iterrows(): 181 | # bytes = row[0] 182 | # if bytes[23] == 17.0: 183 | # train_data.loc[index, 'label'] = 'youtube_udp' 184 | # else: 185 | # train_data.loc[index, 'label'] = 'youtube_tcp' 186 | 187 | if test_dirs != train_dirs: 188 | test_data = test_data.sample(frac=1, random_state=seed).reset_index(drop=True) 189 | test_labels = extract_labels(test_data, one_hot=one_hot, num_classes=num_classes) 190 | test_payloads = test_data['bytes'].values 191 | test_payloads = utils.pad_arrays_with_zero(test_payloads, payload_length=payload_length) 192 | train_labels = extract_labels(train_data, one_hot=one_hot, num_classes=num_classes) 193 | train_payloads = train_data['bytes'].values 194 | # pad with zero up to payload_length length 195 | train_payloads = utils.pad_arrays_with_zero(train_payloads, payload_length=payload_length) 196 | 197 | # TODO make seperate TEST SET ONCE ready 198 | total_length = len(train_payloads) 199 | validation_amount = int(total_length * validation_size) 200 | if merge_data: 201 | test_amount = int(total_length * test_size) 202 | test_payloads = train_payloads[:test_amount] 203 | test_labels = train_labels[:test_amount] 204 | val_payloads = train_payloads[test_amount:(validation_amount + test_amount)] 205 | val_labels = train_labels[test_amount:(validation_amount + test_amount)] 206 | train_payloads = train_payloads[(validation_amount + test_amount):] 207 | train_labels = train_labels[(validation_amount + test_amount):] 208 | else: 209 | val_payloads = train_payloads[:validation_amount] 210 | val_labels = 
train_labels[:validation_amount] 211 | train_payloads = train_payloads[validation_amount:] 212 | train_labels = train_labels[validation_amount:] 213 | 214 | options = dict(dtype=dtype, seed=seed) 215 | print("Training set size: {0}".format(len(train_payloads))) 216 | print("Validation set size: {0}".format(len(val_payloads))) 217 | print("Test set size: {0}".format(len(test_payloads))) 218 | train = DataSet(train_payloads, train_labels, **options) 219 | validation = DataSet(val_payloads, val_labels, **options) 220 | test = DataSet(test_payloads, test_labels, **options) 221 | 222 | return base.Datasets(train=train, validation=validation, test=test) 223 | -------------------------------------------------------------------------------- /tf/early_stopping.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | class EarlyStopping: 5 | """Stop training when a monitored quantity has stopped improving. 6 | Arguments: 7 | min_delta: minimum change in the monitored quantity 8 | to qualify as an improvement, i.e. an absolute 9 | change of less than min_delta, will count as no 10 | improvement. 11 | patience: number of epochs with no improvement 12 | after which training will be stopped. 13 | """ 14 | 15 | def __init__(self, 16 | min_delta=0, 17 | patience=0): 18 | self.stop_training = False 19 | self.patience = patience 20 | self.min_delta = min_delta 21 | self.wait = 0 22 | self.stopped_epoch = 0 23 | 24 | self.monitor_op = np.less 25 | self.min_delta *= -1 26 | self.best = np.Inf 27 | 28 | def on_train_begin(self): 29 | # Allow instances to be re-used 30 | self.wait = 0 31 | self.stopped_epoch = 0 32 | self.best = np.Inf 33 | self.stop_training = False 34 | 35 | def on_epoch_end(self, epoch, current): 36 | if self.monitor_op(current - self.min_delta, self.best): 37 | self.best = current 38 | self.wait = 0 39 | else: 40 | if self.wait >= self.patience: 41 | self.stopped_epoch = epoch 42 | self.stop_training = True 43 | self.wait += 1 44 | 45 | def on_train_end(self): 46 | if self.stopped_epoch > 0: 47 | print('Epoch {}: early stopping'.format(self.stopped_epoch)) 48 | -------------------------------------------------------------------------------- /tf/tf_utils.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | 3 | 4 | def train_input_fn(features, labels, batch_size): 5 | """An input function for training""" 6 | # Convert the inputs to a Dataset. 7 | dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels)) 8 | # Shuffle, repeat, and batch the examples. 9 | dataset = dataset.shuffle(1000).repeat().batch(batch_size) 10 | # Build the Iterator, and return the read end of the pipeline. 11 | return dataset.make_one_shot_iterator().get_next() 12 | 13 | 14 | def ffn_layer(name, inputs, hidden_units, activation=tf.nn.relu, seed=None): 15 | """Reusable code for making a simple neural net layer. 16 | 17 | It does a matrix multiply, bias add, and then uses relu to nonlinearize. 18 | It also sets up name scoping so that the resultant graph is easy to read, 19 | and adds a number of summary ops. 
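    Usage sketch (sizes are illustrative, matching the header-based setup of
    8 headers x 54 bytes = 432 inputs and 7 traffic classes):

    >>> x_pl = tf.placeholder(tf.float32, [None, 432], name='xPlaceholder')
    >>> hidden = ffn_layer('hidden1', x_pl, hidden_units=50, activation=tf.nn.relu)
    >>> y = ffn_layer('output_layer', hidden, hidden_units=7, activation=tf.nn.softmax)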
20 | """ 21 | input_dim = inputs.get_shape().as_list()[1] 22 | # use xavier glorot intitializer as regular uniform dist did not work 23 | weight_initializer = tf.contrib.layers.xavier_initializer(uniform=True, seed=seed, dtype=tf.float32) 24 | with tf.variable_scope(name): 25 | # This Variable will hold the state of the weights for the layer 26 | with tf.name_scope('weights'): 27 | weights = tf.get_variable('W', [input_dim, hidden_units], initializer=weight_initializer) 28 | variable_summaries(weights) 29 | with tf.name_scope('biases'): 30 | biases = tf.get_variable('b', [hidden_units], initializer=tf.zeros_initializer) 31 | variable_summaries(biases) 32 | with tf.name_scope('Wx_plus_b'): 33 | preactivate = tf.matmul(inputs, weights) + biases 34 | tf.summary.histogram('pre_activations', preactivate) 35 | activations = activation(preactivate, name='activation') 36 | tf.summary.histogram('activations', activations) 37 | return activations 38 | # layer = tf.layers.dense(inputs=inputs, units=hidden_units, activation=activation) 39 | # return layer 40 | 41 | 42 | def conv_layer_1d(name, inputs, num_filters=1, filter_size=(1, 1), strides=1, activation=None): 43 | """"A simple function to create a 1D conv layer""" 44 | with tf.variable_scope(name): 45 | # TensorFlow operation for convolution 46 | layer = tf.layers.conv1d(inputs=inputs, 47 | filters=num_filters, 48 | kernel_size=filter_size, 49 | strides=strides, 50 | padding='same', 51 | activation=activation) 52 | return layer 53 | 54 | 55 | def conv_layer_2d(name, inputs, num_filters=1, filter_size=(1, 1), strides=(1,1), activation=None): 56 | """"A simple function to create a 2D conv layer""" 57 | with tf.variable_scope(name): 58 | # TensorFlow operation for convolution 59 | layer = tf.layers.conv2d(inputs=inputs, 60 | filters=num_filters, 61 | kernel_size=filter_size, 62 | strides=strides, 63 | padding='same', 64 | activation=activation) 65 | return layer 66 | 67 | 68 | def max_pool_layer(inputs, name, pool_size=(1, 1), strides=(1, 1), padding='same'): 69 | """"A simple function to create a max pool layer""" 70 | with tf.variable_scope(name): 71 | # TensorFlow operation for max pooling 72 | layer = tf.layers.max_pooling2d(inputs=inputs, 73 | pool_size=pool_size, 74 | strides=strides, 75 | padding=padding) 76 | return layer 77 | 78 | def max_pool_layer1d(inputs, name, pool_size=(1, 1), strides=(1, 1), padding='same'): 79 | """"A simple function to create a max pool layer""" 80 | with tf.variable_scope(name): 81 | # TensorFlow operation for max pooling 82 | layer = tf.layers.max_pooling1d(inputs=inputs, 83 | pool_size=pool_size, 84 | strides=strides, 85 | padding=padding) 86 | return layer 87 | def dropout(inputs, keep_prob=1.0): 88 | with tf.name_scope('dropout'): 89 | # keep_prob = tf.placeholder(tf.float32) 90 | tf.summary.scalar('dropout_keep_probability', keep_prob) 91 | return tf.nn.dropout(inputs, keep_prob) 92 | 93 | 94 | def variable_summaries(var): 95 | """Attach a lot of summaries to a Tensor (for TensorBoard visualization).""" 96 | with tf.name_scope('summaries'): 97 | mean = tf.reduce_mean(var) 98 | tf.summary.scalar('mean', mean) 99 | with tf.name_scope('stddev'): 100 | stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean))) 101 | tf.summary.scalar('stddev', stddev) 102 | tf.summary.scalar('max', tf.reduce_max(var)) 103 | tf.summary.scalar('min', tf.reduce_min(var)) 104 | tf.summary.histogram('histogram', var) 105 | 106 | # 107 | # mnist_data = input_data.read_data_sets('MNIST_data', 108 | # one_hot=True, # Convert the labels into 
one hot encoding 109 | # dtype='float32', # rescale images to `[0, 1]` 110 | # reshape=False, # Don't flatten the images to vectors 111 | # ) 112 | 113 | # # Simple MNIST test of the layers 114 | # 115 | # tf.reset_default_graph() 116 | # num_classes = 10 117 | # cm = conf.ConfusionMatrix(num_classes) 118 | # height, width, nchannels = 28, 28, 1 119 | # gpu_opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.45) 120 | # filters_1 = 16 121 | # kernel_size_1 = (5,5) 122 | # pool_size_1 = (2,2) 123 | # x_pl = tf.placeholder(tf.float32, [None, height, width, nchannels], name='xPlaceholder') 124 | # y_pl = tf.placeholder(tf.float64, [None, num_classes], name='yPlaceholder') 125 | # y_pl = tf.cast(y_pl, tf.float32) 126 | # 127 | # x = conv_layer_2d('layer1', x_pl, filters_1, kernel_size_1, activation=tf.nn.relu) 128 | # x = max_pool_layer(x, 'max_pool', pool_size = pool_size_1, strides=pool_size_1) 129 | # x = tf.contrib.layers.flatten(x) 130 | # 131 | # y = ffn_layer('output_layer', x, hidden_units=num_classes, activation=tf.nn.softmax) 132 | # 133 | # with tf.variable_scope('loss'): 134 | # # computing cross entropy per sample 135 | # cross_entropy = -tf.reduce_sum(y_pl * tf.log(y + 1e-8), reduction_indices=[1]) 136 | # 137 | # # averaging over samples 138 | # cross_entropy = tf.reduce_mean(cross_entropy) 139 | # 140 | # with tf.variable_scope('training'): 141 | # # defining our optimizer 142 | # optimizer = tf.train.AdamOptimizer(learning_rate=0.001) 143 | # 144 | # # applying the gradients 145 | # train_op = optimizer.minimize(cross_entropy) 146 | # 147 | # with tf.variable_scope('performance'): 148 | # # making a one-hot encoded vector of correct (1) and incorrect (0) predictions 149 | # correct_prediction = tf.equal(tf.argmax(y, axis=1), tf.argmax(y_pl, axis=1)) 150 | # 151 | # # averaging the one-hot encoded vector 152 | # accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)) 153 | # 154 | # 155 | # # Training Loop 156 | # batch_size = 100 157 | # max_epochs = 10 158 | # 159 | # valid_loss, valid_accuracy = [], [] 160 | # train_loss, train_accuracy = [], [] 161 | # test_loss, test_accuracy = [], [] 162 | # 163 | # with tf.Session() as sess: 164 | # sess.run(tf.global_variables_initializer()) 165 | # print('Begin training loop') 166 | # 167 | # try: 168 | # while mnist_data.train.epochs_completed < max_epochs: 169 | # _train_loss, _train_accuracy = [], [] 170 | # 171 | # ## Run train op 172 | # x_batch, y_batch = mnist_data.train.next_batch(batch_size) 173 | # fetches_train = [train_op, cross_entropy, accuracy] 174 | # feed_dict_train = {x_pl: x_batch, y_pl: y_batch} 175 | # _, _loss, _acc = sess.run(fetches_train, feed_dict_train) 176 | # 177 | # _train_loss.append(_loss) 178 | # _train_accuracy.append(_acc) 179 | # 180 | # ## Compute validation loss and accuracy 181 | # if mnist_data.train.epochs_completed % 1 == 0 \ 182 | # and mnist_data.train._index_in_epoch <= batch_size: 183 | # train_loss.append(np.mean(_train_loss)) 184 | # train_accuracy.append(np.mean(_train_accuracy)) 185 | # 186 | # fetches_valid = [cross_entropy, accuracy] 187 | # 188 | # feed_dict_valid = {x_pl: mnist_data.validation.images, y_pl: mnist_data.validation.labels} 189 | # _loss, _acc = sess.run(fetches_valid, feed_dict_valid) 190 | # 191 | # valid_loss.append(_loss) 192 | # valid_accuracy.append(_acc) 193 | # print( 194 | # "Epoch {} : Train Loss {:6.3f}, Train acc {:6.3f}, Valid loss {:6.3f}, Valid acc {:6.3f}".format( 195 | # mnist_data.train.epochs_completed, train_loss[-1], 
train_accuracy[-1], valid_loss[-1], 196 | # valid_accuracy[-1])) 197 | # 198 | # test_epoch = mnist_data.test.epochs_completed 199 | # while mnist_data.test.epochs_completed == test_epoch: 200 | # x_batch, y_batch = mnist_data.test.next_batch(batch_size) 201 | # feed_dict_test = {x_pl: x_batch, y_pl: y_batch} 202 | # _loss, _acc = sess.run(fetches_valid, feed_dict_test) 203 | # y_preds = sess.run(fetches=y, feed_dict=feed_dict_test) 204 | # y_preds = tf.argmax(y_preds, axis=1).eval() 205 | # y_true = tf.argmax(y_batch, axis=1).eval() 206 | # cm.batch_add(y_true,y_preds) 207 | # test_loss.append(_loss) 208 | # test_accuracy.append(_acc) 209 | # print('Test Loss {:6.3f}, Test acc {:6.3f}'.format( 210 | # np.mean(test_loss), np.mean(test_accuracy))) 211 | # 212 | # 213 | # except KeyboardInterrupt: 214 | # pass 215 | # 216 | # 217 | # print(cm.accuracy()) 218 | # epoch = np.arange(len(train_loss)) 219 | # plt.figure() 220 | # plt.plot(epoch, train_accuracy,'r', epoch, valid_accuracy,'b') 221 | # plt.legend(['Train Acc','Val Acc'], loc=4) 222 | # plt.xlabel('Epochs'), plt.ylabel('Acc'), plt.ylim([0.75,1.03]) 223 | # plt.show() -------------------------------------------------------------------------------- /trafficgen/PyTgen/config.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Config file adaptation/modification of the PyTgen generator config.py 3 | ''' 4 | 5 | import logging 6 | 7 | 8 | # 9 | # This is a default configuration for the classification of encrypted traffic project 10 | # 11 | class Conf(object): 12 | # maximum number of worker threads that can be used to execute the jobs. 13 | # the program will start using 3 threads and spawn new ones if needed. 14 | # this setting depends on the number of jobs that have to be executed 15 | # simultaneously (not the number of jobs given in the config file). 
16 | maxthreads = 15 17 | 18 | # set to "logging.INFO" or "logging.DEBUG" 19 | loglevel = logging.DEBUG 20 | 21 | # ssh commands that will be randomly executed by the ssh traffic generator 22 | ssh_commands = ['ls', 'cd', 'cd /etc', 'ps ax', 'date', 'mount', 'free', 'vmstat', 23 | 'touch /tmp/tmpfile', 'rm /tmp/tmpfile', 'ls /tmp/tmpfile', 24 | 'tail /etc/hosts', 'tail /etc/passwd', 'tail /etc/fstab', 25 | 'cat /var/log/messages', 'cat /etc/group', 'cat /etc/mtab'] 26 | 27 | # urls the http generator will randomly fetch from 28 | https_urls = ['https://www.dr.dk/', 'https://da.wikipedia.org/wiki/Forside', 29 | 'https://en.wikipedia.org/wiki/Main_Page', 'https://www.dk-hostmaster.dk/', 30 | 'https://www.cph.dk/', 'https://translate.google.com/', 'https://www.borger.dk/', 31 | 'https://www.sdu.dk/da/', 'https://www.sundhed.dk/', 'https://www.facebook.com/', 32 | 'https://www.ug.dk/', 'https://erhvervsstyrelsen.dk/', 'https://www.nets.eu/dk-da', 33 | 'https://www.jobindex.dk/', 'https://www.rejseplanen.dk/webapp/index.html', 'https://yousee.dk/', 34 | 'https://www.sparnord.dk/', 'https://gigahost.dk/', 'https://www.information.dk/', 35 | 'https://stps.dk/', 'https://www.skat.dk/', 'https://danskebank.dk/privat', 'https://www.sst.dk/'] 36 | 37 | http_urls = ['http://naturstyrelsen.dk/', 'http://www.valutakurser.dk/', 'http://ordnet.dk/ddo/forside', 38 | 'http://www.speedtest.net/', 'http://bygningsreglementet.dk/', 'http://www.ft.dk/', 'http://tv2.dk/', 39 | 'http://www.kl.dk/', 'http://www.symbiosis.dk/', 'http://www.noegletal.dk/', 40 | 'http://novonordiskfonden.dk/da', 'http://frida.fooddata.dk/', 41 | 'http://www.arbejdsmiljoforskning.dk/da', 'http://www.su.dk/', 'http://www.trafikstyrelsen.dk/da.aspx', 42 | 'http://www.regioner.dk/', 'http://www.geus.dk/UK/Pages/default.aspx', 'http://bm.dk/', 43 | 'http://www.m.dk/#!/', 'http://www.regionsjaelland.dk/Sider/default.aspx', 44 | 'http://www.trafikstyrelsen.dk/da.aspx'] 45 | # http_intern = ['http://web.intern.ndsec'] 46 | 47 | # a number of files that will randomly be used for ftp upload 48 | ftp_put = ['~/files/file%s' % i for i in range(0, 9)] 49 | 50 | # a number of files that will randomly be used for ftp download 51 | ftp_get = ['~/files/file%s' % i for i in range(0, 9)] 52 | 53 | # array of source-destination tuples for sftp upload 54 | sftp_put = [('~/files/file%s' % i, '/tmp/file%s' % i) for i in range(0, 9)] 55 | 56 | # array of source-destination tuples for sftp download 57 | sftp_get = [('/media/share/files/file%s' % i, '~/files/tmp/file%s' % i) for i in range(0, 9)] 58 | 59 | # significant part of the shell prompt to be able to recognize 60 | # the end of a telnet data transmission 61 | telnet_prompt = "$ " 62 | 63 | # job configuration (see config.example.py) 64 | jobdef = [ 65 | # http (intern) 66 | # ('http_gen', [(start_hour, start_min), (end_hour, end_min), (interval_min, interval_sec)], [urls, retry, sleep_multiply]) 67 | # ('http_gen', [(9, 0), (16, 30), (60, 0)], [http_intern, 2, 30]), 68 | # ('http_gen', [(9, 55), (9, 30), (5, 0)], [http_intern, 5, 20]), 69 | # ('http_gen', [(12, 0), (12, 30), (2, 0)], [http_intern, 6, 10]), 70 | # ('http_gen', [(10, 50), (12, 0), (10, 0)], [http_intern, 2, 10]), 71 | # ('http_gen', [(15, 0), (17, 30), (30, 0)], [http_intern, 8, 20]), 72 | # 73 | # http (extern) 74 | # ('http_gen', [(12, 0), (12, 30), (5, 0)], [http_extern, 10, 20]), 75 | # ('http_gen', [(9, 0), (17, 0), (30, 0)], [http_extern, 5, 30]), 76 | ('http_gen', [(11, 0), (13, 0), (0, 10)], [http_urls, 1, 5]), 77 | ('http_gen', 
[(11, 0), (13, 0), (0, 10)], [https_urls, 1, 5]), 78 | # ('http_gen', [(9, 0), (17, 0), (90, 0)], [http_extern, 10, 30]), 79 | # ('http_gen', [(12, 0), (12, 10), (5, 0)], [http_extern, 15, 20]), 80 | # 81 | # smtp 82 | # ('smtp_gen', [(9, 0), (18, 0), (120, 0)], ['mail.extern.ndsec', 'mail2', 'mail', 'mail2@mail.extern.ndsec', 'mail52@mail.extern.ndsec']), 83 | # ('smtp_gen', [(12, 0), (13, 0), (30, 0)], ['mail.extern.ndsec', 'mail20', 'mail', 'mail20@mail.extern.ndsec', 'mail51@mail.extern.ndsec']), 84 | # 85 | # ftp 86 | # ('ftp_gen', [(9, 0), (11, 0), (15, 0)], ['ftp.intern.ndsec', 'ndsec', 'ndsec', ftp_put, ftp_get, 10, False, 5]), 87 | # ('ftp_gen', [(10, 0), (18, 0), (135, 0)], ['ftp.intern.ndsec', 'ndsec', 'ndsec', ftp_put, [], 2, False]), 88 | # 89 | # nfs / smb 90 | # ('copy_gen', [(9, 0), (12, 0), (90, 0)], [None, 'Z:/tmp/dummyfile.txt', 30]), 91 | # ('copy_gen', [(10, 0), (16, 0), (120, 0)], [None, 'Z:/tmp/dummyfile.txt', 80]), 92 | # ('copy_gen', [(12, 0), (17, 0), (160, 0)], [None, 'Z:/tmp/dummyfile.txt', 180]), 93 | # ('copy_gen', [(9, 0), (18, 0), (0, 10)], ['file1', 'file2']), 94 | # 95 | # telnet 96 | # ('telnet_gen', [(9, 0), (18, 0), (60, 0)], ['telnet.intern.ndsec', None, 'ndsec', 'ndsec', 5, ssh_commands, telnet_prompt, 10]), 97 | # ('telnet_gen', [(9, 0), (18, 0), (240, 0)], ['telnet.intern.ndsec', 23, 'ndsec', 'ndsec', 2, [], telnet_prompt]), 98 | # ('telnet_gen', [(16, 0), (18, 0), (120, 0)], ['telnet.intern.ndsec', 23, 'ndsec', 'wrongpass', 2, [], telnet_prompt]), 99 | # 100 | # ssh 101 | # ('ssh_gen', [(9, 0), (18, 0), (120, 0)], ['ssh.intern.ndsec', 22, 'ndsec', 'ndsec', 10, ssh_commands]), 102 | # ('ssh_gen', [(9, 0), (18, 0), (240, 0)], ['ssh.intern.ndsec', 22, 'ndsec', 'ndsec', 60, [], 30]), 103 | # ('ssh_gen', [(9, 0), (18, 0), (120, 0)], ['192.168.10.50', 22, 'dummy1', 'dummy1', 5, ssh_commands]), 104 | # ('ssh_gen', [(12, 0), (14, 0), (120, 0)], ['ssh.intern.ndsec', 22, 'dummy1', 'wrongpass', 5, ssh_commands]), 105 | # 106 | # sftp 107 | # ('sftp_gen', [(17, 0), (18, 0), (60, 0)], ['127.0.0.1', 22, 'user', 'pass', sftp_put, sftp_get, 5, 1]), 108 | ] -------------------------------------------------------------------------------- /trafficgen/PyTgen/core/__init__.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Python 3 adaption/modification of the PyTgen __init__ 3 | ''' 4 | from .runner import runner 5 | from .scheduler import scheduler -------------------------------------------------------------------------------- /trafficgen/PyTgen/core/generator.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Python 3 adaption/modification of the PyTgen generator 3 | ''' 4 | import logging 5 | import random 6 | import datetime 7 | import time 8 | import base64 9 | import string 10 | import os 11 | 12 | import smtplib # Used for smtp/mail 13 | import ftplib # Used for ftp 14 | import shutil # Used for network copy 15 | import telnetlib # Used for telnet traffic 16 | import urllib3 # Used for http/https 17 | # import paramiko # Used for ssh + sftp 18 | urllib3.disable_warnings() 19 | logging.getLogger("urllib3").setLevel(logging.WARNING) 20 | 21 | class http_gen(): 22 | ''' 23 | http and https generator 24 | send HTTP GET requests to a webserver to retrieve a given URL n times. 25 | delay the requests by a random time controlled by a multiplier. 
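    Example (illustrative only; params mirrors the third element of a
    Conf.jobdef entry, i.e. [urls, num_rounds, sleep_multiplier]):

        gen = http_gen([Conf.http_urls, 1, 5])
        gen()   # each round picks a random URL and GETs it a random number of times
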
26 | to get best results with this generator, the size of the http answers 27 | should differ (send random data) 28 | ''' 29 | __generator__ = 'http' 30 | 31 | def __init__(self, 32 | params): 33 | self._urls = params[0] 34 | self._num = params[1] 35 | self._multiplier = 5 36 | 37 | if len(params) == 3: 38 | self._multiplier = params[2] 39 | 40 | def __call__(self): 41 | for _ in range(self._num): 42 | url = self._urls[random.randint(0, (len(self._urls) - 1))] 43 | logging.getLogger(self.__generator__).info("Requesting: %s", url) 44 | 45 | try: 46 | for __ in range(int(random.random() * 10 + 1)): 47 | http = urllib3.PoolManager() 48 | response = http.request('GET', url) 49 | logging.getLogger(self.__generator__).debug("Recieved %s bytes from %s", 50 | str(len(response.data)), 51 | url) 52 | 53 | except: 54 | logging.getLogger(self.__generator__).debug("Failed to request %s", 55 | url) 56 | 57 | time.sleep(random.random() * self._multiplier) 58 | 59 | class smtp_gen(): 60 | ''' 61 | smtp generator 62 | connect to an smtp server and send an email containing a random length 63 | string to a destination email address. 64 | ''' 65 | __generator__ = "smtp" 66 | 67 | def __init__(self, 68 | params): 69 | self._host = params[0] 70 | self._user = params[1] 71 | self._pass = params[2] 72 | self._from = params[3] 73 | self._to = params[4] 74 | 75 | def __call__(self): 76 | rnd = ''.join(random.choice(string.ascii_letters) for _ in range(int(3000 * random.random()))) 77 | 78 | msg = "From: " + self._from + "\r\n" \ 79 | + "To: " + self._to + "\r\n" \ 80 | + "Subject: PyTgen " + str(datetime.datetime.now()) + "\r\n\r\n" \ 81 | + rnd + "\r\n" 82 | 83 | logging.getLogger(self.__generator__).info("Sending email to %s (size: %s)", 84 | self._host, 85 | len(rnd)) 86 | 87 | try: 88 | sender = smtplib.SMTP(self._host, 25) 89 | 90 | try: 91 | logging.getLogger(self.__generator__).debug("Using TLS") 92 | sender.starttls() 93 | 94 | except: 95 | pass 96 | 97 | try: 98 | sender.login(self._user, self._pass) 99 | 100 | except smtplib.SMTPAuthenticationError: 101 | logging.getLogger(self.__generator__).debug("Using PLAIN auth") 102 | sender.docmd("AUTH LOGIN", base64.b64encode(self._user)) 103 | sender.docmd(base64.b64encode(self._pass), "") 104 | 105 | sender.sendmail(self._from, self._to, msg) 106 | logging.getLogger(self.__generator__).debug("Email sent successful") 107 | 108 | except: 109 | raise 110 | 111 | else: 112 | sender.quit() 113 | 114 | class ftp_gen(): 115 | ''' 116 | ftp and ftp_tls generator 117 | connect to a host using ftp and start uploading and downloading files. 118 | The files to be put or retrieved are specified in an array and are 119 | randomly choosen. An empty array will skip upload or download. 
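    Example (illustrative values; params is positional:
    [host, user, password, put_files, get_files, num_rounds, use_tls]):

        gen = ftp_gen(['ftp.example.org', 'user', 'secret', [], ['~/files/file0'], 2, False])
        gen()   # list the remote directory, then download a randomly chosen file in each round
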
120 | ''' 121 | __generator__ = 'ftp' 122 | 123 | def __init__(self, 124 | params): 125 | self._host = params[0] 126 | self._user = params[1] 127 | self._pass = params[2] 128 | self._put = params[3] 129 | self._get = params[4] 130 | self._num = params[5] 131 | self._tls = params[6] 132 | self._multiplier = 5 133 | 134 | if len(params) == 7: 135 | self._multiplier = params[6] 136 | 137 | def __call__(self): 138 | ftp = None 139 | if self._tls == True: 140 | logging.getLogger(self.__generator__).info("Connecting to ftps://%s", 141 | self._host) 142 | try: 143 | # 20% chanche to login with a wrong password at first try 144 | if random.random() > 0.8: 145 | try: 146 | logging.getLogger(self.__generator__).debug("Logging in with wrong credentials") 147 | ftp = ftplib.FTP_TLS(self._host, 148 | self._user, 149 | "wrongpass") 150 | except: 151 | pass 152 | 153 | time.sleep(2 * random.random()) 154 | 155 | ftp = ftplib.FTP_TLS(self._host, 156 | self._user, 157 | self._pass) 158 | ftp.prot_p() 159 | 160 | except: 161 | logging.getLogger(self.__generator__).debug("Error connecting to ftps://%s", 162 | self._host) 163 | 164 | else: 165 | logging.getLogger(self.__generator__).info("Connecting to ftp://%s", 166 | self._host) 167 | try: 168 | # 20% chanche to login with a wrong password at first try 169 | if random.random() > 0.8: 170 | try: 171 | logging.getLogger(self.__generator__).debug("Logging in with wrong credentials") 172 | ftp = ftplib.FTP(self._host, 173 | self._user, 174 | "wrongpass") 175 | except: 176 | pass 177 | 178 | time.sleep(2 * random.random()) 179 | 180 | ftp = ftplib.FTP(self._host, 181 | self._user, 182 | self._pass) 183 | 184 | except: 185 | logging.getLogger(self.__generator__).debug("Error connecting to ftp://%s", 186 | self._host) 187 | 188 | if ftp is not None: 189 | ftp.retrlines('LIST') 190 | 191 | for _ in range(self._num): 192 | if len(self._put) is not 0: 193 | ressource = self._put[random.randint(0, (len(self._put) - 1))] 194 | (path, filename) = os.path.split(ressource) 195 | 196 | logging.getLogger(self.__generator__).debug("Uploading %s", 197 | ressource) 198 | 199 | f = open(ressource, 'r') 200 | ftp.storbinary("STOR " + filename, f) 201 | f.close() 202 | 203 | time.sleep(self._multiplier * random.random()) 204 | 205 | if len(self._get) is not 0: 206 | ressource = self._get[random.randint(0, (len(self._get) - 1))] 207 | 208 | logging.getLogger(self.__generator__).debug("Downloading %s", 209 | ressource) 210 | 211 | ftp.retrbinary('RETR ' + ressource, self._getfile) 212 | 213 | time.sleep(self._multiplier * random.random()) 214 | 215 | ftp.quit() 216 | 217 | def _getfile(self, 218 | ressource): 219 | pass 220 | 221 | class copy_gen(): 222 | ''' 223 | copy generator. 224 | copy files or directories from a source to a destination. this generator 225 | can be used to generate traffic on network filesystems like nfs or smb. 226 | A random source file will be generated if the source parameter is set to 227 | "None". 
The size of the generated source file can be controlled by an 228 | optional size parameter (default = 8192 byte) 229 | ''' 230 | __generator__ = "copy" 231 | 232 | def __init__(self, 233 | params): 234 | self._src = params[0] 235 | self._dst = params[1] 236 | self._size = 8192 237 | 238 | if len(params) == 3: 239 | self._size = params[2] * 1024 240 | 241 | def __call__(self): 242 | if self._src is not None: 243 | logging.getLogger(self.__generator__).info("Copying from %s to %s", 244 | self._src, 245 | self._dst) 246 | 247 | if os.path.isdir(self._src): 248 | dst = self._dst + "/" + self._src 249 | 250 | if os.path.exists(dst): 251 | logging.getLogger(self.__generator__).debug("Destination %s exists. Deleting it.", 252 | dst) 253 | shutil.rmtree(dst) 254 | 255 | try: 256 | shutil.copytree(self._src, dst) 257 | except: 258 | logging.getLogger(self.__generator__).debug("Error copying %s to %s", 259 | self._src, 260 | dst) 261 | 262 | else: 263 | try: 264 | shutil.copy2(self._src, self._dst) 265 | except: 266 | logging.getLogger(self.__generator__).debug("Error copying %s to %s", 267 | self._src, 268 | self._dst) 269 | 270 | else: 271 | if (not os.path.exists(self._dst)) or (os.path.isfile(self._dst)): 272 | rnd = ''.join(random.choice(string.ascii_letters) for _ in range(int(self._size * random.random()))) 273 | 274 | logging.getLogger(self.__generator__).info("Writing %s byte to %s", 275 | len(rnd), 276 | self._dst) 277 | 278 | try: 279 | f = open(self._dst, "w") 280 | f.write(rnd) 281 | 282 | except: 283 | logging.getLogger(self.__generator__).debug("Error writing to %s", 284 | self._dst) 285 | 286 | else: 287 | f.close() 288 | 289 | else: 290 | logging.getLogger(self.__generator__).info("Destination %s is not a file", 291 | self._dst) 292 | 293 | 294 | class telnet_gen(): 295 | ''' 296 | telnet generator. 297 | connect to a host using telnet and start sending commands to the host. The 298 | connection will be kept open until the connection time provided in the 299 | config is over. If the commands array is empty, the connection will idle 300 | until connection time is over. 
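    Example (illustrative values; params is positional:
    [host, port, user, password, minutes, commands, prompt], optionally
    followed by a delay multiplier):

        gen = telnet_gen(['telnet.example.org', 23, 'user', 'secret', 5, [], '$ '])
        gen()   # with no commands, idles the session for a random time of up to 2 * 5 minutes
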
301 | ''' 302 | __generator__ = "telnet" 303 | 304 | def __init__(self, 305 | params): 306 | self._host = params[0] 307 | self._port = params[1] 308 | self._user = params[2] 309 | self._pass = params[3] 310 | self._time = params[4] 311 | self._cmds = params[5] 312 | self._prompt = params[6] 313 | self._multiplier = 60 314 | 315 | if len(params) == 8: 316 | self._multiplier = params[7] 317 | 318 | def __call__(self): 319 | logging.getLogger(self.__generator__).info("Connecting to %s", 320 | self._host) 321 | 322 | realmin = self._time * 2 * random.random() 323 | endtime = datetime.datetime.now() + datetime.timedelta(minutes = realmin) 324 | 325 | try: 326 | tn = telnetlib.Telnet(self._host, self._port) 327 | 328 | try: 329 | tn.read_until("login: ") 330 | tn.write(self._user + "\n") 331 | if self._pass is not None: 332 | tn.read_until("Password: ") 333 | tn.write(self._pass + "\n") 334 | 335 | except: 336 | logging.getLogger(self.__generator__).debug("Error logging in") 337 | 338 | except: 339 | logging.getLogger(self.__generator__).debug("Error connecting to %s", 340 | self._host) 341 | 342 | else: 343 | while datetime.datetime.now() < endtime: 344 | if len(self._cmds) is not 0: 345 | self._send_cmds(tn) 346 | 347 | else: 348 | time.sleep(realmin) 349 | break 350 | 351 | time.sleep(self._multiplier * random.random()) 352 | 353 | tn.write("exit\n") 354 | tn.read_all() 355 | 356 | def _send_cmds(self, 357 | tn): 358 | for _ in range(int(6 * random.random())): 359 | cmd = self._cmds[random.randint(0, (len(self._cmds) - 1))] 360 | logging.getLogger(self.__generator__).debug("Sending command: %s", 361 | cmd) 362 | tn.read_very_eager() 363 | tn.write(cmd + '\n') 364 | tn.read_eager() 365 | time.sleep(5 * random.random()) 366 | 367 | class ssh_gen(): 368 | ''' 369 | ssh generator. 370 | connect to a host using ssh and start sending commands to the host. The 371 | connection will be kept open until the connection time provided in the 372 | config is over. If the commands array is empty, the connection will idle 373 | until connection time is over. 
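    Example (illustrative values; params is positional:
    [host, port, user, password, minutes, commands], optionally followed by a
    delay multiplier). Note that the paramiko import at the top of this module
    is commented out and has to be re-enabled before ssh_gen/sftp_gen can run:

        gen = ssh_gen(['ssh.example.org', 22, 'user', 'secret', 10, ['ls', 'date']])
        gen()   # runs random commands from the list until the connection time is up
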
374 | ''' 375 | __generator__ = "ssh" 376 | 377 | def __init__(self, 378 | params): 379 | self._host = params[0] 380 | self._port = params[1] 381 | self._user = params[2] 382 | self._pass = params[3] 383 | self._time = params[4] 384 | self._cmds = params[5] 385 | self._multiplier = 60 386 | 387 | if len(params) == 7: 388 | self._multiplier = params[6] 389 | 390 | def __call__(self): 391 | logging.getLogger("paramiko").setLevel(logging.INFO) 392 | logging.getLogger(self.__generator__).info("Connecting to %s", 393 | self._host) 394 | 395 | realmin = self._time * 2 * random.random() 396 | endtime = datetime.datetime.now() + datetime.timedelta(minutes = realmin) 397 | 398 | client = paramiko.SSHClient() 399 | client.load_system_host_keys() 400 | client.set_missing_host_key_policy(paramiko.AutoAddPolicy()) 401 | 402 | try: 403 | # 20% chanche to login with a wrong password at first try 404 | if random.random() > 0.8: 405 | try: 406 | client.connect(self._host, 407 | self._port, 408 | self._user, 409 | "wrongpass") 410 | except: 411 | pass 412 | 413 | time.sleep(2 * random.random()) 414 | 415 | client.connect(self._host, 416 | self._port, 417 | self._user, 418 | self._pass) 419 | 420 | except: 421 | logging.getLogger(self.__generator__).debug("Error connecting to %s", 422 | self._host) 423 | 424 | else: 425 | while datetime.datetime.now() < endtime: 426 | if len(self._cmds) is not 0: 427 | self._send_cmds(client) 428 | 429 | else: 430 | time.sleep(realmin) 431 | break 432 | 433 | time.sleep(self._multiplier * random.random()) 434 | 435 | client.close() 436 | 437 | def _send_cmds(self, 438 | client): 439 | for _ in range(3): 440 | cmd = self._cmds[random.randint(0, (len(self._cmds) - 1))] 441 | logging.getLogger(self.__generator__).debug("Sending command: %s", 442 | cmd) 443 | client.exec_command(cmd) 444 | time.sleep(5 * random.random()) 445 | 446 | class sftp_gen(): 447 | ''' 448 | sftp generator. 449 | this generator will connect to a host using ssh subsystem sftp. It then 450 | starts sending uploading and downloading files provided via the _get and 451 | _put parameters. 
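    Example (illustrative values; params is positional:
    [host, port, user, password, put_pairs, get_pairs, minutes], optionally
    followed by a delay multiplier):

        gen = sftp_gen(['sftp.example.org', 22, 'user', 'secret',
                        [('~/files/file0', '/tmp/file0')], [], 5])
        gen()   # uploads the listed file, then idles until the connection time is up
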
452 | ''' 453 | __generator__ = "sftp" 454 | 455 | def __init__(self, 456 | params): 457 | self._host = params[0] 458 | self._port = params[1] 459 | self._user = params[2] 460 | self._pass = params[3] 461 | self._put = params[4] 462 | self._get = params[5] 463 | self._time = params[6] 464 | self._multiplier = 60 465 | 466 | if len(params) == 8: 467 | self._multiplier = params[7] 468 | 469 | def __call__(self): 470 | logging.getLogger("paramiko").setLevel(logging.INFO) 471 | logging.getLogger(self.__generator__).info("Connecting to %s", self._host) 472 | 473 | realmin = self._time * 2 * random.random() 474 | endtime = datetime.datetime.now() + datetime.timedelta(minutes = realmin) 475 | 476 | try: 477 | transport = paramiko.Transport((self._host, 478 | self._port)) 479 | transport.connect(username = self._user, 480 | password = self._pass) 481 | 482 | sftp = paramiko.SFTPClient.from_transport(transport) 483 | 484 | except: 485 | logging.getLogger(self.__generator__).debug("Error connecting to %s", 486 | self._host) 487 | 488 | else: 489 | while datetime.datetime.now() < endtime: 490 | put = self._put 491 | get = self._get 492 | 493 | if len(put) is 0 and len(get) is 0: 494 | time.sleep(realmin) 495 | break 496 | 497 | while len(put) is not 0: 498 | (src, dst) = put.pop(len(put) - 1) 499 | 500 | logging.getLogger(self.__generator__).debug("Uploading %s to %s", 501 | src, dst) 502 | sftp.put(src, dst) 503 | time.sleep(2 * random.random()) 504 | 505 | while len(get) is not 0: 506 | (src, dst) = get.pop(len(get) - 1) 507 | 508 | logging.getLogger(self.__generator__).debug("Downloading %s to %s", 509 | src, dst) 510 | sftp.get(src, dst) 511 | time.sleep(2 * random.random()) 512 | 513 | time.sleep(self._multiplier * random.random()) 514 | 515 | sftp.close() 516 | time.sleep(0.2) 517 | transport.close() -------------------------------------------------------------------------------- /trafficgen/PyTgen/core/runner.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Python 3 adaption/modification of the PyTgen runner 3 | ''' 4 | 5 | import threading 6 | import queue 7 | import logging 8 | 9 | class worker(threading.Thread): 10 | def __init__(self, 11 | name, 12 | queue, 13 | create, 14 | destroy): 15 | threading.Thread.__init__(self) 16 | 17 | self.__name = name 18 | self.__queue = queue 19 | 20 | self.__create = create 21 | self.__destroy = destroy 22 | 23 | self.__dismissed = threading.Event() 24 | 25 | self.daemon = True 26 | self.name = self.__name 27 | 28 | self.start() 29 | 30 | def run(self): 31 | if self.__create: 32 | self.__create() 33 | 34 | logging.getLogger(self.__name).debug('main loop started') 35 | 36 | while True: 37 | if self.__dismissed.is_set(): 38 | break 39 | 40 | try: 41 | action = self.__queue.get(block=True, 42 | timeout=10) 43 | 44 | except: 45 | continue 46 | 47 | else: 48 | try: 49 | action() 50 | 51 | except: 52 | raise 53 | 54 | logging.getLogger(self.__name).debug('main loop finished') 55 | 56 | if self.__destroy: 57 | self.__destroy() 58 | 59 | def dismiss(self): 60 | self.__dismissed.set() 61 | 62 | class runner(object): 63 | def __init__(self, 64 | maxthreads=10, 65 | thread_create=None, 66 | thread_destroy=None): 67 | self.__queue = queue.Queue() 68 | self.__maxthreads = maxthreads 69 | 70 | logging.getLogger('runner').info('creating runner with 3 initial threads') 71 | 72 | self.__workers = [] 73 | for i in range(0, 3): 74 | name = 'worker_%d' % i 75 | 76 | logging.getLogger('runner').debug('creating worker thread: %s', 
77 | name) 78 | 79 | self.__workers.append(worker(name=name, 80 | queue=self.__queue, 81 | create=thread_create, 82 | destroy=thread_destroy)) 83 | 84 | def __call__(self, 85 | action): 86 | self.__queue.put(action) 87 | 88 | if (self.__queue.qsize() > 2): 89 | if (len(self.__workers) < self.__maxthreads): 90 | self._spawn() 91 | else: 92 | logging.getLogger('Runner').warning('Not enough worker threads to handle queue') 93 | 94 | def _spawn(self): 95 | name = 'worker_%d' % (len(self.__workers) + 1) 96 | logging.getLogger('Runner').info('spawning new worker thread: %s', name) 97 | self.__workers.append(worker(name=name, 98 | queue=self.__queue, 99 | create=None, 100 | destroy=None)) 101 | 102 | def stop(self): 103 | for worker in self.__workers: 104 | worker.dismiss() 105 | 106 | for worker in self.__workers: 107 | worker.join() -------------------------------------------------------------------------------- /trafficgen/PyTgen/core/scheduler.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Python 3 adaption/modification of the PyTgen scheduler 3 | ''' 4 | 5 | import datetime 6 | import threading 7 | import heapq 8 | import random 9 | import logging 10 | 11 | class scheduler(threading.Thread): 12 | class job(object): 13 | def __init__(self, name, action, interval, start, end): 14 | self.__name = name 15 | self.__action = action 16 | self.__interval = interval 17 | self.__start = start 18 | self.__end = end 19 | self.__exec_time = datetime.datetime.now() + datetime.timedelta(seconds = self.__interval[1] * random.random(), 20 | minutes = self.__interval[0] * random.random()) 21 | 22 | def __call__(self): 23 | today = datetime.datetime.now() 24 | start = today.replace(hour = self.__start[0], 25 | minute = self.__start[1], 26 | second = 0, microsecond = 0) 27 | end = today.replace(hour = self.__end[0], 28 | minute = self.__end[1], 29 | second = 0, microsecond = 0) 30 | 31 | if start <= self.__exec_time < end: 32 | # enqueue job for "random() * 2 * interval" 33 | # in average the job will run every interval but differing randomly 34 | self.__exec_time += datetime.timedelta(seconds = self.__interval[1] * random.random() * 2, 35 | minutes = self.__interval[0] * random.random() * 2) 36 | 37 | if self.__exec_time < datetime.datetime.now(): 38 | logging.getLogger("scheduler").warning('scheduler is overloaded!') 39 | 40 | return self.__action 41 | 42 | else: 43 | # enqueue the job until next start time 44 | if self.__exec_time < start and self.__exec_time.day == start.day: 45 | self.__exec_time = start + datetime.timedelta(seconds = 1) 46 | else: 47 | self.__exec_time = start + datetime.timedelta(days = 1) 48 | 49 | logging.getLogger("scheduler").info("enqueueing %s until %s", 50 | self.__name, self.__exec_time) 51 | return False 52 | 53 | 54 | def __lt__(self, other): 55 | try: 56 | if type(other) == scheduler.job: 57 | return self.__exec_time < other.__exec_time 58 | 59 | elif type(other) == datetime.datetime: 60 | return self.__exec_time < other 61 | else: 62 | raise Exception() 63 | except Exception as e: 64 | raise 65 | 66 | def __sub__(self, other): 67 | try: 68 | if type(other) == scheduler.job: 69 | return self.__exec_time - other.__exec_time 70 | 71 | elif type(other) == datetime.datetime: 72 | return self.__exec_time - other 73 | else: 74 | raise Exception() 75 | except Exception as e: 76 | raise 77 | 78 | def __init__(self, jobs, runner): 79 | threading.Thread.__init__(self) 80 | self.setName('scheduler') 81 | 82 | self.__runner = runner 83 | 
self.__jobs = jobs 84 | heapq.heapify(self.__jobs) 85 | 86 | self.__running = False 87 | self.__signal = threading.Condition() 88 | 89 | def run(self): 90 | self.__running = True 91 | while self.__running: 92 | self.__signal.acquire() 93 | if not self.__jobs: 94 | self.__signal.wait() 95 | 96 | else: 97 | now = datetime.datetime.now() 98 | while (self.__jobs[0] < now): 99 | job = heapq.heappop(self.__jobs) 100 | 101 | action = job() 102 | if action is not False: 103 | self.__runner(action) 104 | 105 | heapq.heappush(self.__jobs, job) 106 | 107 | logging.getLogger("scheduler").debug("Sleeping %s seconds", (self.__jobs[0] - now)) 108 | self.__signal.wait((self.__jobs[0] - now).total_seconds()) 109 | 110 | self.__signal.release() 111 | 112 | def stop(self): 113 | self.__running = False 114 | 115 | self.__signal.acquire() 116 | self.__signal.notify_all() 117 | self.__signal.release() 118 | 119 | def set_jobs(self, jobs): 120 | self.__signal.acquire() 121 | 122 | self.__jobs = jobs 123 | heapq.heapify(self.__jobs) 124 | 125 | self.__signal.notify_all() 126 | self.__signal.release() -------------------------------------------------------------------------------- /trafficgen/PyTgen/nslookup.py: -------------------------------------------------------------------------------- 1 | import socket 2 | import config 3 | 4 | Conf = config.Conf 5 | # ip_list = [] 6 | # ais = socket.getaddrinfo('en.wikipedia.org', 0, 0, 0, 0) 7 | # for result in ais: 8 | # ip_list.append(result[-1][0]) 9 | # ip_list = list(set(ip_list)) 10 | # print(ip_list) 11 | 12 | count = 0 13 | for http in Conf.https_urls: 14 | url = http.split('/')[2] 15 | ip_list = [] 16 | ais = socket.getaddrinfo(url, 0, 0, 0, 0) 17 | for result in ais: 18 | ip_list.append(result[-1][0]) 19 | ip_list = list(set(ip_list)) 20 | count +=1 21 | print(http, ip_list, count) -------------------------------------------------------------------------------- /trafficgen/PyTgen/run.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Python 3 adaption/modification of the PyTgen run.py 3 | ''' 4 | 5 | import logging 6 | import signal 7 | import platform 8 | 9 | import core.runner 10 | import core.scheduler 11 | import config 12 | 13 | from core.generator import * 14 | 15 | 16 | def create_jobs(): 17 | logging.getLogger('main').info('Creating jobs') 18 | 19 | jobs = [] 20 | for next_job in Conf.jobdef: 21 | logging.getLogger('main').info('creating %s', next_job) 22 | 23 | job = core.scheduler.job(name = next_job[0], 24 | action = eval(next_job[0])(next_job[2]), 25 | interval = next_job[1][2], 26 | start = next_job[1][0], 27 | end = next_job[1][1]) 28 | jobs.append(job) 29 | 30 | return jobs 31 | 32 | if __name__ == '__main__': 33 | # set hostbased parameters 34 | hostname = platform.node() 35 | 36 | log_file = 'logs/' + hostname + '.log' 37 | config_file = "config.py" 38 | # file = open(log_file, 'a+') 39 | # file.close() 40 | # load the hostbased configuration file 41 | # _Conf = __import__(config_file, globals(), locals(), ['Conf'], -1) 42 | # Conf = _Conf.Conf 43 | Conf = config.Conf 44 | 45 | # start logger 46 | logging.basicConfig(level = Conf.loglevel, 47 | format = '%(asctime)s %(name)-12s %(levelname)-8s %(message)s', 48 | datefmt = '%Y-%m-%d %H:%M:%S', 49 | filename = log_file) 50 | 51 | logging.getLogger('main').info('Configuration %s loaded', config_file) 52 | 53 | # start runner, create jobs, start scheduling 54 | runner = core.runner(maxthreads = Conf.maxthreads) 55 | 56 | jobs = create_jobs() 57 | 58 | 
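    # For reference, create_jobs() above expands each Conf.jobdef entry into a
    # scheduler.job (values below are illustrative, taken from config.py):
    #
    #   ('http_gen', [(11, 0), (13, 0), (0, 10)], [http_urls, 1, 5])
    #
    # becomes roughly
    #
    #   core.scheduler.job(name='http_gen',
    #                      action=http_gen([http_urls, 1, 5]),
    #                      interval=(0, 10),   # (minutes, seconds) between runs
    #                      start=(11, 0),      # daily window start (hour, minute)
    #                      end=(13, 0))        # daily window end (hour, minute)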
scheduler = core.scheduler(jobs = jobs, 59 | runner = runner) 60 | 61 | # Stop scheduler on exit 62 | def signal_int(signal, frame): 63 | logging.getLogger('main').info('Stopping scheduler') 64 | scheduler.stop() 65 | 66 | signal.signal(signal.SIGINT, signal_int) 67 | 68 | # Run the scheduler 69 | logging.getLogger('main').info('Starting scheduler') 70 | scheduler.start() 71 | scheduler.join(timeout=None) 72 | 73 | # Stop the runner 74 | runner.stop() -------------------------------------------------------------------------------- /trafficgen/Streaming/streaming_generator.py: -------------------------------------------------------------------------------- 1 | import sys 2 | sys.path.insert(0, "/home/mclrn/dlproject/") 3 | import datetime 4 | from threading import Thread 5 | from selenium import webdriver 6 | from slackclient import SlackClient 7 | import traceback 8 | import os 9 | from selenium.webdriver.support.ui import WebDriverWait 10 | 11 | 12 | # import trafficgen.Streaming.win_capture as cap 13 | # import trafficgen.Streaming.streaming_types as stream 14 | 15 | import unix_capture as cap 16 | import streaming_types as stream 17 | from constants import SLACK_TOKEN 18 | 19 | def notifySlack(message): 20 | sc = SlackClient(SLACK_TOKEN) 21 | try: 22 | sc.api_call("chat.postMessage", channel="#server", text=message) 23 | except: 24 | sc.api_call("chat.postMessage", channel="#server", text="Could not send stacktrace") 25 | 26 | 27 | def generate_streaming(duration, dir, total_iterations, options=None): 28 | iterations = 0 29 | stream_types = { 30 | # 'hbo': (stream.HboNordic, 1), 31 | # 'netflix': (stream.Netflix, 1), 32 | 'twitch': (stream.Twitch, 5), 33 | 'youtube': (stream.Youtube, 5), 34 | 'drtv': (stream.DrTv, 5), 35 | } 36 | while iterations < total_iterations: 37 | print("Iteration:", iterations) 38 | if iterations % 25 == 0: 39 | notifySlack("Starting iteration: " + str(iterations)) 40 | try: 41 | for stream_type in stream_types.keys(): 42 | browsers, capture_thread, file, streaming_threads, = [], [], [], [] 43 | type = stream_types[stream_type][0] 44 | num_threads = stream_types[stream_type][1] 45 | 46 | browsers, capture_thread, file, streaming_threads = generate_threaded_streaming(type, stream_type, dir, 47 | duration, options, 48 | num_threads=num_threads) 49 | try: 50 | capture_thread.start() 51 | for thread in streaming_threads: 52 | # Start streaming threads 53 | thread.start() 54 | print("streaming started", stream_type) 55 | capture_thread.join() # Stream until the capture thread joins 56 | print("capture done - thread has joined") 57 | except Exception as e: 58 | notifySlack("Something went wrong %s" % traceback.format_exc()) 59 | # Wait for capture thread 60 | capture_thread.join() 61 | # Do a cleanup since somthing went wrong 62 | cap.cleanup(file) 63 | try: 64 | for browser in browsers: 65 | browser.close() 66 | browser.quit() 67 | except Exception as e: 68 | notifySlack("Something went wrong %s" % traceback.format_exc()) 69 | # os.system("killall chrome") 70 | # os.system("killall chromedriver") 71 | except Exception as ex: 72 | notifySlack("Something went wrong when setting up the threads \n %s" % traceback.format_exc()) 73 | 74 | iterations += 1 75 | 76 | 77 | def generate_threaded_streaming(obj: stream.Streaming, stream_name, dir, duration, chrome_options=None, num_threads=5): 78 | #### STREAMING #### 79 | # Create filename 80 | now = datetime.datetime.now() 81 | file = dir + "/%s-%.2d%.2d_%.2d%.2d%.2d.pcap" % (stream_name, now.day, now.month, now.hour, 
now.minute, now.second)
82 |     # Instantiate thread
83 |     capture_thread = Thread(target=cap.captureTraffic, args=(1, duration, dir, file))
84 |     # Create five threads for streaming
85 |     streaming_threads = []
86 |     browsers = []
87 |     for i in range(num_threads):
88 |         browser = webdriver.Chrome(options=chrome_options)
89 |         browser.implicitly_wait(10)
90 |         browsers.append(browser)
91 |         t = Thread(target=obj.stream_video, args=(obj, browser))
92 |         streaming_threads.append(t)
93 | 
94 |     return browsers, capture_thread, file, streaming_threads
95 | 
96 | 
97 | def get_clear_browsing_button(driver):
98 |     """Find the "CLEAR BROWSING BUTTON" on the Chrome settings page. /deep/ to go past shadow roots"""
99 |     return driver.find_element_by_css_selector('* /deep/ #clearBrowsingDataConfirm')
100 | 
101 | 
102 | def clear_cache(driver, timeout=60):
103 |     """Clear the cookies and cache for the ChromeDriver instance."""
104 |     # navigate to the settings page
105 |     driver.get('chrome://settings/clearBrowserData')
106 | 
107 |     # wait for the button to appear
108 |     wait = WebDriverWait(driver, timeout)
109 |     wait.until(get_clear_browsing_button)
110 | 
111 |     # click the button to clear the cache
112 |     get_clear_browsing_button(driver).click()
113 | 
114 |     # wait for the button to be gone before returning
115 |     wait.until_not(get_clear_browsing_button)
116 | 
117 | 
118 | if __name__ == "__main__":
119 |     #netflixuser = os.environ["netflixuser"]
120 |     #netflixpassword = os.environ["netflixpassword"]
121 |     #hbouser = os.environ["hbouser"]
122 |     #hbopassword = os.environ["hbopassword"]
123 |     # slack_token = os.environ['slack_token']
124 |     # Specify duration in seconds
125 |     duration = 60 * 1
126 |     total_iterations = 1000
127 |     save_dir = '/home/mclrn/Data'
128 |     chrome_profile_dir = "/home/mclrn/.config/google-chrome/"
129 |     options = webdriver.ChromeOptions()
130 |     #options.add_argument('user-data-dir=' + chrome_profile_dir)
131 |     options.add_argument("--enable-quic")
132 |     # options.add_argument('headless')
133 |     generate_streaming(duration, save_dir, total_iterations, options)
134 |     print("something")
--------------------------------------------------------------------------------
/trafficgen/Streaming/streaming_types.py:
--------------------------------------------------------------------------------
1 | from random import randint
2 | import time
3 | class Streaming:
4 | 
5 |     def stream_video(self, browser):
6 |         pass
7 | 
8 | 
9 | class Twitch(Streaming):
10 | 
11 |     def stream_video(self, browser):
12 |         browser.get('https://www.twitch.tv/directory/game/League%20of%20Legends/videos/all')
13 |         # Choose a random video (randint's upper bound is inclusive, so exclude len(videos) to avoid an IndexError)
14 |         time.sleep(1)
15 |         videos = browser.find_elements_by_css_selector("a[href*='/videos/']")
16 |         video = videos[randint(0, len(videos) - 1)]
17 |         link = video.get_attribute('href')
18 |         browser.get(link)
19 | 
20 | 
21 | class Youtube(Streaming):
22 | 
23 |     def stream_video(self, browser):
24 |         browser.get('https://www.youtube.com')
25 |         # Choose a random video
26 |         time.sleep(1)
27 |         videos = browser.find_elements_by_css_selector("ytd-grid-video-renderer a[href*='/watch?v=']")
28 |         video = videos[randint(0, len(videos) - 1)]
29 |         # thumbnail
30 |         link = video.get_attribute('href')
31 |         browser.get(link)
32 | 
33 | 
34 | class Netflix(Streaming):
35 | 
36 |     def stream_video(self, browser):
37 |         # Change to correct profile
38 |         browser.get("https://www.netflix.com/SwitchProfile?tkn=I42P4G75VVDM7LV626VKTXTXGI")
39 |         # Choose a random video
40 |         videos = browser.find_elements_by_css_selector("div.title-card-container a[href*='/watch/']")
41 |         video = videos[randint(0, len(videos) - 1)]
42 |         link = video.get_attribute('href')
43 |         browser.get(link)
44 | 
45 | 
46 | class DrTv(Streaming):
47 | 
48 |     def stream_video(self, browser):
49 |         browser.get('https://www.dr.dk/tv')
50 |         # Choose a random video
51 |         videos = browser.find_elements_by_class_name('program-link')
52 |         video = videos[randint(0, len(videos) - 1)]
53 |         link = video.get_attribute('href')
54 |         browser.get(link)
55 |         play_button = browser.find_element_by_css_selector('button[title="Afspil"]')
56 |         play_button.click()
57 | 
58 | 
59 | class HboNordic(Streaming):
60 |     def stream_video(self, browser):
61 |         browser.get("https://dk.hbonordic.com/home")
62 |         videos = browser.find_elements_by_css_selector("a[data-automation='play-button']")
63 |         video = videos[randint(0, len(videos) - 1)]
64 |         video_url = video.get_attribute("href")
65 |         browser.get(video_url)
66 | 
--------------------------------------------------------------------------------
/trafficgen/Streaming/unix_capture.py:
--------------------------------------------------------------------------------
1 | import os
2 | 
3 | 
4 | def captureTraffic(interfaceNumber, duration, dir, file):
5 |     '''
6 |     interfaceNumber: specifies the network interface that should be captured (use tshark -D to list the options)
7 |     duration: specifies the number of seconds that the capture should go on
8 |     dir: the folder to which the pcap file of the capture should be saved
9 |     file: the full path of the pcap file to write (the caller builds this name from a date and time stamp)
10 |     '''
11 |     # makedir if it does not exist
12 |     if not os.path.isdir(dir):
13 |         os.mkdir(dir)
14 |     #open(file, "w") # overwrites if file already exists
15 |     os.system("echo %s |sudo -S tshark -i %d -a duration:%d -w %s port not 5901 and ip and not broadcast and not multicast" % ('Napatech10', interfaceNumber, duration, file))
16 |     os.system("echo %s |sudo -S chown mclrn:mclrn %s" % ('Napatech10', file))
17 | 
18 | 
19 | 
20 | '''
21 | To test
22 | captureTraffic(1,10, 'C:/users/arhjo/desktop', "test")
23 | '''
24 | 
25 | 
26 | def cleanup(file):
27 |     os.system("echo %s |sudo -S rm %s" % ('Napatech10', file))
28 | 
--------------------------------------------------------------------------------
/trafficgen/Streaming/win_capture.py:
--------------------------------------------------------------------------------
1 | import os
2 | import datetime
3 | 
4 | 
5 | def captureTraffic(interfaceNumber, duration, dir, file):
6 |     '''
7 |     interfaceNumber: specifies the network interface that should be captured (use tshark -D to list the options)
8 |     duration: specifies the number of seconds that the capture should go on
9 |     dir: the folder to which the pcap file of the capture should be saved
10 |     file: the full path of the pcap file to write (the caller builds this name from a date and time stamp)
11 |     '''
12 | 
13 |     # makedir if it does not exist
14 |     if not os.path.isdir(dir):
15 |         os.mkdir(dir)
16 | 
17 |     open(file, "w") # overwrites if file already exists
18 |     os.system("tshark -i %d -a duration:%d -w %s port not 5901 and ((port 80) or (port 443)) and ip and not broadcast and not multicast" % (interfaceNumber, duration, file))
19 | 
20 | 
21 | def cleanup(file):
22 |     os.system("del %s" % file)
23 | 
24 | '''
25 | To test
26 | captureTraffic(1,10, 'C:/users/arhjo/desktop', "test")
27 | '''
--------------------------------------------------------------------------------
/train/train_header.py:
--------------------------------------------------------------------------------
1 | import tensorflow as tf
2 | from
tf import tf_utils as tfu, confusionmatrix as conf, dataset, early_stopping as es 3 | import numpy as np 4 | import datetime 5 | from sklearn import metrics 6 | from sklearn.preprocessing import label_binarize 7 | import utils 8 | import os 9 | 10 | now = datetime.datetime.now() 11 | 12 | summaries_dir = '../tensorboard' 13 | 14 | # hidden_units = 12 15 | units = [5] 16 | acc_list = [] 17 | num_headers_train = [] 18 | hidden_units_train = [] 19 | num_headers = [8] 20 | for num_header in num_headers: 21 | for hidden_units in units: 22 | hidden_units_train.append(hidden_units) 23 | train_dirs = ["E:/Data/LinuxChrome/{}/".format(num_header), 24 | "E:/Data/WindowsSalik/{}/".format(num_header), 25 | "E:/Data/WindowsAndreas/{}/".format(num_header), 26 | "E:/Data/WindowsFirefox/{}/".format(num_header) 27 | ] 28 | 29 | test_dirs = ["E:/Data/WindowsChrome/{}/".format(num_header)] 30 | 31 | trainstr = "train:" 32 | for traindir in train_dirs: 33 | trainstr += traindir.split('Data/')[1].split("/")[0] 34 | trainstr += ":" 35 | teststr = "test:" 36 | for testdir in test_dirs: 37 | teststr += testdir.split('Data/')[1].split("/")[0] 38 | teststr += ":" 39 | timestamp = "%.2d%.2d_%.2d%.2d" % (now.day, now.month, now.hour, now.minute) 40 | save_dir = "../trained_models/{0}/{1}/{2}/".format(num_header, hidden_units, timestamp) 41 | os.makedirs(save_dir, exist_ok=True) 42 | seed = 0 43 | namestr = trainstr+teststr+str(num_header)+":"+str(hidden_units) 44 | # Beta for L2 regularization 45 | beta = 1.0 46 | val_size = [0.1] 47 | 48 | early_stop = es.EarlyStopping(patience=20, min_delta=0.05) 49 | subdir = "/{0}/{1}/{2}/".format(num_header, hidden_units, timestamp) 50 | input_size = num_header*54 51 | data = dataset.read_data_sets(test_dirs, train_dirs, merge_data=True, one_hot=True, 52 | validation_size=0.1, 53 | test_size=0.1, 54 | balance_classes=False, 55 | payload_length=input_size, 56 | seed=seed) 57 | tf.reset_default_graph() 58 | num_headers_train.append(num_header) 59 | num_classes = len(dataset._label_encoder.classes_) 60 | labels = dataset._label_encoder.classes_ 61 | 62 | # cm = conf.ConfusionMatrix(num_classes, class_names=labels) 63 | gpu_opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.85) 64 | 65 | x_pl = tf.placeholder(tf.float32, [None, input_size], name='xPlaceholder') 66 | y_pl = tf.placeholder(tf.float64, [None, num_classes], name='yPlaceholder') 67 | y_pl = tf.cast(y_pl, tf.float32) 68 | 69 | x = tfu.ffn_layer('layer1', x_pl, hidden_units, activation=tf.nn.relu, seed=seed) 70 | # x = tf.layers.dense(x_pl, hidden_units, tf.nn.relu) 71 | # x = tfu.dropout(x, 0.5) 72 | # x = tfu.ffn_layer('layer2', x, hidden_units, activation=tf.nn.relu) 73 | # x = tfu.ffn_layer('layer2', x, 50, activation=tf.nn.sigmoid) 74 | # x = tfu.ffn_layer('layer3', x, 730, activation=tf.nn.relu) 75 | y = tfu.ffn_layer('output_layer', x, hidden_units=num_classes, activation=tf.nn.softmax, seed=seed) 76 | y_ = tf.argmax(y, axis=1) 77 | # with tf.name_scope('cross_entropy'): 78 | # # The raw formulation of cross-entropy, 79 | # # 80 | # # tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(tf.softmax(y)), 81 | # # reduction_indices=[1])) 82 | # # 83 | # # can be numerically unstable. 84 | # # 85 | # # So here we use tf.losses.sparse_softmax_cross_entropy on the 86 | # # raw logit outputs of the nn_layer above. 
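# An alternative sketch of that numerically stable formulation (not what the
# active code below does; it assumes the output layer is changed to emit raw
# logits, i.e. ffn_layer(..., activation=None), with tf.nn.softmax applied
# separately when class probabilities are needed):
#
#   logits = tfu.ffn_layer('output_layer', x, hidden_units=num_classes, activation=None, seed=seed)
#   cross_entropy = tf.losses.softmax_cross_entropy(onehot_labels=y_pl, logits=logits)
#   y = tf.nn.softmax(logits)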
87 | # with tf.name_scope('total'): 88 | # cross_entropy = tf.losses.softmax_cross_entropy(onehot_labels=y_pl, logits=y) 89 | 90 | 91 | with tf.variable_scope('loss'): 92 | # computing cross entropy per sample 93 | cross_entropy = -tf.reduce_sum(y_pl * tf.log(y + 1e-8), reduction_indices=[1]) 94 | 95 | W1 = tf.get_default_graph().get_tensor_by_name("layer1/W:0") 96 | loss = tf.nn.l2_loss(W1) 97 | # averaging over samples 98 | cross_entropy = tf.reduce_mean(cross_entropy) 99 | loss = tf.reduce_mean(cross_entropy + loss * beta) 100 | 101 | tf.summary.scalar('cross_entropy', cross_entropy) 102 | 103 | with tf.variable_scope('training'): 104 | # defining our optimizer 105 | optimizer = tf.train.AdamOptimizer(learning_rate=0.001) 106 | 107 | # applying the gradients 108 | train_op = optimizer.minimize(cross_entropy) 109 | 110 | with tf.variable_scope('performance'): 111 | # making a one-hot encoded vector of correct (1) and incorrect (0) predictions 112 | correct_prediction = tf.equal(tf.argmax(y, axis=1), tf.argmax(y_pl, axis=1)) 113 | 114 | # averaging the one-hot encoded vector 115 | accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)) 116 | 117 | tf.summary.scalar('accuracy', accuracy) 118 | 119 | # Merge all the summaries and write them out 120 | merged = tf.summary.merge_all() 121 | train_writer = tf.summary.FileWriter(summaries_dir + '/train/' + subdir) 122 | val_writer = tf.summary.FileWriter(summaries_dir + '/validation/' + subdir) 123 | 124 | # Training Loop 125 | batch_size = 100 126 | max_epochs = 200 127 | 128 | valid_loss, valid_accuracy = [], [] 129 | train_loss, train_accuracy = [], [] 130 | test_loss, test_accuracy = [], [] 131 | epochs = [] 132 | with tf.Session() as sess: 133 | early_stop.on_train_begin() 134 | train_writer.add_graph(sess.graph) 135 | sess.run(tf.global_variables_initializer()) 136 | total_parameters = 0 137 | print("Calculating trainable parameters!") 138 | for variable in tf.trainable_variables(): 139 | # shape is an array of tf.Dimension 140 | shape = variable.get_shape() 141 | print("Shape: {}".format(shape)) 142 | variable_parameters = 1 143 | for dim in shape: 144 | variable_parameters *= dim.value 145 | print("Shape {0} gives {1} trainable parameters".format(shape, variable_parameters)) 146 | total_parameters += variable_parameters 147 | print("Trainable parameters: {}".format(total_parameters)) 148 | print('Begin training loop') 149 | saver = tf.train.Saver() 150 | try: 151 | while data.train.epochs_completed < max_epochs: 152 | _train_loss, _train_accuracy = [], [] 153 | ## Run train op 154 | x_batch, y_batch = data.train.next_batch(batch_size) 155 | fetches_train = [train_op, cross_entropy, accuracy, merged] 156 | feed_dict_train = {x_pl: x_batch, y_pl: y_batch} 157 | _, _loss, _acc, _summary = sess.run(fetches_train, feed_dict_train) 158 | 159 | _train_loss.append(_loss) 160 | _train_accuracy.append(_acc) 161 | ## Compute validation loss and accuracy 162 | if data.train.epochs_completed % 1 == 0 \ 163 | and data.train._index_in_epoch <= batch_size: 164 | 165 | train_writer.add_summary(_summary, data.train.epochs_completed) 166 | 167 | train_loss.append(np.mean(_train_loss)) 168 | train_accuracy.append(np.mean(_train_accuracy)) 169 | 170 | fetches_valid = [cross_entropy, accuracy, merged] 171 | 172 | feed_dict_valid = {x_pl: data.validation.payloads, y_pl: data.validation.labels} 173 | _loss, _acc, _summary = sess.run(fetches_valid, feed_dict_valid) 174 | epochs.append(data.train.epochs_completed) 175 | valid_loss.append(_loss) 176 | 
valid_accuracy.append(_acc) 177 | val_writer.add_summary(_summary, data.train.epochs_completed) 178 | current = valid_loss[-1] 179 | early_stop.on_epoch_end(data.train.epochs_completed, current) 180 | print("Epoch {} : Train Loss {:6.3f}, Train acc {:6.3f}, Valid loss {:6.3f}, Valid acc {:6.3f}" 181 | .format(data.train.epochs_completed, train_loss[-1], train_accuracy[-1], valid_loss[-1], 182 | valid_accuracy[-1])) 183 | if early_stop.stop_training: 184 | early_stop.on_train_end() 185 | break 186 | 187 | 188 | test_epoch = data.test.epochs_completed 189 | while data.test.epochs_completed == test_epoch: 190 | batch_size = 1000 191 | x_batch, y_batch = data.test.next_batch(batch_size) 192 | feed_dict_test = {x_pl: x_batch, y_pl: y_batch} 193 | _loss, _acc, _summary = sess.run(fetches_valid, feed_dict_test) 194 | y_preds = sess.run(fetches=y, feed_dict=feed_dict_test) 195 | y_preds = tf.argmax(y_preds, axis=1).eval() 196 | y_true = tf.argmax(y_batch, axis=1).eval() 197 | # cm.batch_add(y_true, y_preds) 198 | test_loss.append(_loss) 199 | test_accuracy.append(_acc) 200 | # test_writer.add_summary(_summary, data.train.epochs_completed) 201 | print('Test Loss {:6.3f}, Test acc {:6.3f}'.format( 202 | np.mean(test_loss), np.mean(test_accuracy))) 203 | namestr += ":acc{:.3f}".format(np.mean(test_accuracy)) 204 | acc_list.append("{:.3f}".format(np.mean(test_accuracy))) 205 | saver.save(sess, save_dir+'header_{0}_{1}_units.ckpt'.format(num_header, hidden_units)) 206 | feed_dict_test = {x_pl: data.test.payloads, y_pl: data.test.labels} 207 | # Create ROC curve for all classes 208 | y_preds = sess.run(fetches=y, feed_dict=feed_dict_test) 209 | y_true = data.test.labels 210 | utils.plot_ROC(y_true, y_preds, num_classes, labels, micro=False, macro=False) 211 | # Compute different metrics for confusionmatrix and more 212 | y_preds = sess.run(fetches=y_, feed_dict=feed_dict_test) 213 | y_true = tf.argmax(data.test.labels, axis=1).eval() 214 | y_true = [labels[i] for i in y_true] 215 | y_preds = [labels[i] for i in y_preds] 216 | conf = metrics.confusion_matrix(y_true, y_preds, labels=labels) 217 | report = metrics.classification_report(y_true, y_preds, labels=labels) 218 | nostream_dict = ['http', 'https'] 219 | y_stream_true = [] 220 | y_stream_preds = [] 221 | for i, v in enumerate(y_true): 222 | pred = y_preds[i] 223 | if v in nostream_dict: 224 | y_stream_true.append('non-streaming') 225 | else: 226 | y_stream_true.append('streaming') 227 | if pred in nostream_dict: 228 | y_stream_preds.append('non-streaming') 229 | else: 230 | y_stream_preds.append('streaming') 231 | stream_acc = len([v for i, v in enumerate(y_stream_preds) if v == y_stream_true[i]]) / len( 232 | y_stream_true) 233 | 234 | # Binarize the output 235 | y_stream_true1 = label_binarize(y_stream_true, classes=['non-streaming', 'streaming']) 236 | y_stream_preds1 = label_binarize(y_stream_preds, classes=['non-streaming', 'streaming']) 237 | n_classes = y_stream_true1.shape[1] 238 | # Stream/non-stream ROC curve 239 | utils.plot_ROC(y_stream_true1, y_stream_preds1, n_classes, ['non-streaming', 'streaming'], micro=False, macro=False) 240 | 241 | conf1 = metrics.confusion_matrix(y_true, y_preds, labels=labels) 242 | conf2 = metrics.confusion_matrix(y_stream_true, y_stream_preds, labels=['non-streaming', 'streaming']) 243 | report = metrics.classification_report(y_true, y_preds, labels=labels) 244 | report2 = metrics.classification_report(y_stream_true, y_stream_preds, labels=['non-streaming', 'streaming']) 245 | 246 | except KeyboardInterrupt: 
247 | pass 248 | print(namestr) 249 | utils.plot_confusion_matrix(conf1, labels, save=False, title=namestr) 250 | utils.plot_confusion_matrix(conf2, ['non-streaming', 'streaming'], save=False, 251 | title="StreamNoStream_acc{}".format(stream_acc)) 252 | print(report) 253 | print(report2) 254 | # utils.plot_metric_graph(x_list=epochs, y_list=valid_accuracy, save=False, x_label="epochs", y_label="Accuracy", title="Accuracy") 255 | # utils.plot_metric_graph(x_list=epochs, y_list=valid_loss, save=False, x_label="epochs", y_label="loss", title="Loss") 256 | # utils.plot_confusion_matrix(conf, labels, save=False, title=namestr) 257 | utils.show_plot() 258 | 259 | acc_list = list(map(float, acc_list)) 260 | print(acc_list, hidden_units_train) 261 | 262 | 263 | -------------------------------------------------------------------------------- /train/train_logistic.py: -------------------------------------------------------------------------------- 1 | from sklearn.linear_model import LogisticRegression 2 | import tensorflow as tf 3 | from tf import tf_utils as tfu, dataset, early_stopping as es 4 | import numpy as np 5 | import datetime 6 | from sklearn import metrics 7 | import utils 8 | import operator 9 | from sklearn.metrics import classification_report 10 | import pandas as pd 11 | from collections import Counter 12 | 13 | def input_fn(data): 14 | return (data.payloads, data.labels) 15 | 16 | 17 | now = datetime.datetime.now() 18 | 19 | num_headers = 16 20 | hidden_units = 15 21 | train_dirs = ['/home/mclrn/Data/WindowsSalik/{0}/'.format(num_headers), 22 | '/home/mclrn/Data/WindowsChrome/{0}/'.format(num_headers), 23 | '/home/mclrn/Data/WindowsLinux/{0}/'.format(num_headers), 24 | '/home/mclrn/Data/WindowsFirefox/{0}/'.format(num_headers)] 25 | 26 | test_dirs = ['/home/mclrn/Data/WindowsAndreas/{0}/'.format(num_headers)] 27 | 28 | val = 0.1 29 | input_size = num_headers * 54 30 | seed = 0 31 | train_size = [] 32 | data = dataset.read_data_sets(train_dirs, test_dirs, merge_data=True, one_hot=False, 33 | validation_size=val, 34 | test_size=0.1, 35 | balance_classes=False, 36 | payload_length=input_size, 37 | seed=seed) 38 | 39 | train_size.append(len(data.train.payloads)) 40 | num_classes = len(dataset._label_encoder.classes_) 41 | labels = dataset._label_encoder.classes_ 42 | 43 | classifier = LogisticRegression(verbose=2) 44 | 45 | classifier.fit(data.train.payloads, data.train.labels) 46 | 47 | score = classifier.score(data.test.payloads, data.test.labels) 48 | 49 | predict_proba = classifier.predict_proba(data.test.payloads) 50 | 51 | predict = classifier.predict(data.test.payloads) 52 | 53 | correct = list(map(operator.sub,predict,data.test.labels)).count(0) 54 | 55 | 56 | confusion = metrics.confusion_matrix(data.test.labels, predict) 57 | utils.plot_confusion_matrix(confusion, labels, save=True, title="Logistic regression confusion matrixacc") 58 | accuracy = correct/len(predict) 59 | 60 | report = classification_report(data.test.labels, predict) 61 | print(accuracy) 62 | skleranacc = metrics.accuracy_score(data.test.labels,predict) 63 | print("Accuracy calculated by sklearn.metrics: {}".format(skleranacc)) 64 | most_occurences = Counter(data.test.labels).most_common(1) 65 | print("Naive classifier that just guesses the most frequents label will obtain an accuracy of: %s" % ((most_occurences[0][1])/len(data.test.labels))) 66 | print(report) -------------------------------------------------------------------------------- /train/train_payload.py: 
-------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tf import tf_utils as tfu, confusionmatrix as conf, dataset, early_stopping as es 3 | import numpy as np 4 | from sklearn import metrics 5 | import utils 6 | summaries_dir = '../tensorboard' 7 | 8 | train_dirs = ["E:/Data/h5/https/"] 9 | test_dirs = ["E:/Data/h5/netflix/"] 10 | pay_len = 1460 11 | hidden_units = 2000 12 | save_dir = "../trained_models/" 13 | seed = 1 14 | 15 | data = dataset.read_data_sets(train_dirs, test_dirs, merge_data=True, one_hot=True, validation_size=0.1, test_size=0.1, balance_classes=True, payload_length=pay_len, seed=seed) 16 | tf.reset_default_graph() 17 | num_classes = len(dataset._label_encoder.classes_) 18 | labels = dataset._label_encoder.classes_ 19 | 20 | early_stop = es.EarlyStopping(patience=100, min_delta=0.05) 21 | 22 | cm = conf.ConfusionMatrix(num_classes, class_names=dataset._label_encoder.classes_) 23 | gpu_opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.45) 24 | 25 | x_pl = tf.placeholder(tf.float32, [None, pay_len], name='xPlaceholder') 26 | y_pl = tf.placeholder(tf.float64, [None, num_classes], name='yPlaceholder') 27 | y_pl = tf.cast(y_pl, tf.float32) 28 | 29 | x = tfu.ffn_layer('layer1', x_pl, hidden_units, activation=tf.nn.relu, seed=seed) 30 | x = tfu.ffn_layer('layer2', x, hidden_units, activation=tf.nn.relu, seed=seed) 31 | x = tfu.ffn_layer('layer3', x, hidden_units, activation=tf.nn.relu, seed=seed) 32 | y = tfu.ffn_layer('output_layer', x, hidden_units=num_classes, activation=tf.nn.softmax, seed=seed) 33 | y_ = tf.argmax(y, axis=1) 34 | # with tf.name_scope('cross_entropy'): 35 | # # The raw formulation of cross-entropy, 36 | # # 37 | # # tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(tf.softmax(y)), 38 | # # reduction_indices=[1])) 39 | # # 40 | # # can be numerically unstable. 41 | # # 42 | # # So here we use tf.losses.sparse_softmax_cross_entropy on the 43 | # # raw logit outputs of the nn_layer above. 
44 | # with tf.name_scope('total'): 45 | # cross_entropy = tf.losses.softmax_cross_entropy(onehot_labels=y_pl, logits=y) 46 | 47 | 48 | with tf.variable_scope('loss'): 49 | # computing cross entropy per sample 50 | cross_entropy = -tf.reduce_sum(y_pl * tf.log(y + 1e-8), reduction_indices=[1]) 51 | 52 | # averaging over samples 53 | cross_entropy = tf.reduce_mean(cross_entropy) 54 | 55 | tf.summary.scalar('cross_entropy', cross_entropy) 56 | 57 | with tf.variable_scope('training'): 58 | # defining our optimizer 59 | optimizer = tf.train.AdamOptimizer(learning_rate=0.001) 60 | 61 | # applying the gradients 62 | train_op = optimizer.minimize(cross_entropy) 63 | 64 | with tf.variable_scope('performance'): 65 | # making a one-hot encoded vector of correct (1) and incorrect (0) predictions 66 | correct_prediction = tf.equal(tf.argmax(y, axis=1), tf.argmax(y_pl, axis=1)) 67 | 68 | # averaging the one-hot encoded vector 69 | accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)) 70 | 71 | tf.summary.scalar('accuracy', accuracy) 72 | 73 | # Merge all the summaries and write them out 74 | merged = tf.summary.merge_all() 75 | train_writer = tf.summary.FileWriter(summaries_dir + '/train') 76 | val_writer = tf.summary.FileWriter(summaries_dir + '/validation') 77 | test_writer = tf.summary.FileWriter(summaries_dir + '/test') 78 | 79 | # Training Loop 80 | batch_size = 100 81 | max_epochs = 200 82 | 83 | valid_loss, valid_accuracy = [], [] 84 | train_loss, train_accuracy = [], [] 85 | test_loss, test_accuracy = [], [] 86 | 87 | with tf.Session() as sess: 88 | early_stop.on_train_begin() 89 | train_writer.add_graph(sess.graph) 90 | sess.run(tf.global_variables_initializer()) 91 | total_parameters = 0 92 | print("Calculating trainable parameters!") 93 | for variable in tf.trainable_variables(): 94 | # shape is an array of tf.Dimension 95 | shape = variable.get_shape() 96 | # print("Shape: {}".format(shape)) 97 | variable_parameters = 1 98 | for dim in shape: 99 | variable_parameters *= dim.value 100 | print("Shape {0} gives {1} trainable parameters".format(shape, variable_parameters)) 101 | total_parameters += variable_parameters 102 | print("Trainable parameters: {}".format(total_parameters)) 103 | print('Begin training loop') 104 | saver = tf.train.Saver() 105 | try: 106 | while data.train.epochs_completed < max_epochs: 107 | _train_loss, _train_accuracy = [], [] 108 | 109 | ## Run train op 110 | x_batch, y_batch = data.train.next_batch(batch_size) 111 | fetches_train = [train_op, cross_entropy, accuracy, merged] 112 | feed_dict_train = {x_pl: x_batch, y_pl: y_batch} 113 | _, _loss, _acc, _summary = sess.run(fetches_train, feed_dict_train) 114 | 115 | _train_loss.append(_loss) 116 | _train_accuracy.append(_acc) 117 | ## Compute validation loss and accuracy 118 | if data.train.epochs_completed % 1 == 0 \ 119 | and data.train._index_in_epoch <= batch_size: 120 | 121 | train_writer.add_summary(_summary, data.train.epochs_completed) 122 | 123 | train_loss.append(np.mean(_train_loss)) 124 | train_accuracy.append(np.mean(_train_accuracy)) 125 | 126 | fetches_valid = [cross_entropy, accuracy, merged] 127 | 128 | feed_dict_valid = {x_pl: data.validation.payloads, y_pl: data.validation.labels} 129 | _loss, _acc, _summary = sess.run(fetches_valid, feed_dict_valid) 130 | 131 | valid_loss.append(_loss) 132 | valid_accuracy.append(_acc) 133 | val_writer.add_summary(_summary, data.train.epochs_completed) 134 | current = valid_loss[-1] 135 | early_stop.on_epoch_end(data.train.epochs_completed, current) 136 
| print("Epoch {} : Train Loss {:6.3f}, Train acc {:6.3f}, Valid loss {:6.3f}, Valid acc {:6.3f}" 137 | .format(data.train.epochs_completed, train_loss[-1], train_accuracy[-1], valid_loss[-1], 138 | valid_accuracy[-1])) 139 | if early_stop.stop_training: 140 | early_stop.on_train_end() 141 | break 142 | 143 | 144 | test_epoch = data.test.epochs_completed 145 | while data.test.epochs_completed == test_epoch: 146 | batch_size = 1000 147 | x_batch, y_batch = data.test.next_batch(batch_size) 148 | feed_dict_test = {x_pl: x_batch, y_pl: y_batch} 149 | _loss, _acc, _summary = sess.run(fetches_valid, feed_dict_test) 150 | y_preds = sess.run(fetches=y, feed_dict=feed_dict_test) 151 | y_preds = tf.argmax(y_preds, axis=1).eval() 152 | y_true = tf.argmax(y_batch, axis=1).eval() 153 | # cm.batch_add(y_true, y_preds) 154 | test_loss.append(_loss) 155 | test_accuracy.append(_acc) 156 | # test_writer.add_summary(_summary, data.train.epochs_completed) 157 | print('Test Loss {:6.3f}, Test acc {:6.3f}'.format( 158 | np.mean(test_loss), np.mean(test_accuracy))) 159 | saver.save(sess, save_dir+'payload_{0}_{1}_units.ckpt'.format(pay_len, hidden_units)) 160 | feed_dict_test = {x_pl: data.test.payloads, y_pl: data.test.labels} 161 | y_preds = sess.run(fetches=y_, feed_dict=feed_dict_test) 162 | y_true = tf.argmax(data.test.labels, axis=1).eval() 163 | y_true = [labels[i] for i in y_true] 164 | y_preds = [labels[i] for i in y_preds] 165 | conf = metrics.confusion_matrix(y_true, y_preds, labels=labels) 166 | 167 | except KeyboardInterrupt: 168 | pass 169 | # print(namestr) 170 | utils.plot_confusion_matrix(conf, labels, save=True, title='payload_{0}_{1}_units_acc{2}'.format(pay_len, hidden_units, np.mean(test_accuracy))) 171 | # acc_list = list(map(float, acc_list)) 172 | # print(acc_list, train_size) 173 | # 174 | # with tf.Session() as sess: 175 | # sess.run(tf.global_variables_initializer()) 176 | # print('Begin training loop') 177 | # 178 | # try: 179 | # while data.train.epochs_completed < max_epochs: 180 | # _train_loss, _train_accuracy = [], [] 181 | # 182 | # ## Run train op 183 | # x_batch, y_batch = data.train.next_batch(batch_size) 184 | # fetches_train = [train_op, cross_entropy, accuracy, merged] 185 | # feed_dict_train = {x_pl: x_batch, y_pl: y_batch} 186 | # _, _loss, _acc, _summary = sess.run(fetches_train, feed_dict_train) 187 | # 188 | # _train_loss.append(_loss) 189 | # _train_accuracy.append(_acc) 190 | # ## Compute validation loss and accuracy 191 | # if data.train.epochs_completed % 1 == 0 \ 192 | # and data.train._index_in_epoch <= batch_size: 193 | # 194 | # train_writer.add_summary(_summary, data.train.epochs_completed) 195 | # 196 | # train_loss.append(np.mean(_train_loss)) 197 | # train_accuracy.append(np.mean(_train_accuracy)) 198 | # 199 | # fetches_valid = [cross_entropy, accuracy, merged] 200 | # 201 | # feed_dict_valid = {x_pl: data.validation.payloads, y_pl: data.validation.labels} 202 | # _loss, _acc, _summary = sess.run(fetches_valid, feed_dict_valid) 203 | # 204 | # valid_loss.append(_loss) 205 | # valid_accuracy.append(_acc) 206 | # val_writer.add_summary(_summary, data.train.epochs_completed) 207 | # print( 208 | # "Epoch {} : Train Loss {:6.3f}, Train acc {:6.3f}, Valid loss {:6.3f}, Valid acc {:6.3f}".format( 209 | # data.train.epochs_completed, train_loss[-1], train_accuracy[-1], valid_loss[-1], 210 | # valid_accuracy[-1])) 211 | # test_epoch = data.test.epochs_completed 212 | # while data.test.epochs_completed == test_epoch: 213 | # x_batch, y_batch = 
data.test.next_batch(batch_size) 214 | # feed_dict_test = {x_pl: x_batch, y_pl: y_batch} 215 | # _loss, _acc, _summary = sess.run(fetches_valid, feed_dict_test) 216 | # y_preds = sess.run(fetches=y, feed_dict=feed_dict_test) 217 | # y_preds = tf.argmax(y_preds, axis=1).eval() 218 | # y_true = tf.argmax(y_batch, axis=1).eval() 219 | # cm.batch_add(y_true, y_preds) 220 | # test_loss.append(_loss) 221 | # test_accuracy.append(_acc) 222 | # # test_writer.add_summary(_summary, data.train.epochs_completed) 223 | # print('Test Loss {:6.3f}, Test acc {:6.3f}'.format( 224 | # np.mean(test_loss), np.mean(test_accuracy))) 225 | # 226 | # 227 | # except KeyboardInterrupt: 228 | # pass 229 | 230 | 231 | # print(cm) 232 | # 233 | # # df = utils.load_h5(dir + filename+'.h5', key=filename.split('-')[0]) 234 | # # print("Load done!") 235 | # # print(df.shape) 236 | # # payloads = df['payload'].values 237 | # # payloads = utils.pad_elements_with_zero(payloads) 238 | # # df['payload'] = payloads 239 | # # 240 | # # # Converting hex string to list of int... Maybe takes too long? 241 | # # payloads = [[int(i, 16) for i in list(x)] for x in payloads] 242 | # # np_payloads = numpy.array(payloads) 243 | # # # dataset = DataSet(np_payloads, df['label'].values) 244 | # # x, y = dataset.next_batch(10) 245 | # # batch_size = 100 246 | # # features = {'payload': payloads} 247 | # # 248 | # # 249 | # # gb = df.groupby(['ip.dst', 'ip.src', 'port.dst', 'port.src']) 250 | # # 251 | # # 252 | # # l = dict(list(gb)) 253 | # # 254 | # # 255 | # # s = [[k, len(v)] for k, v in sorted(l.items(), key=lambda x: len(x[1]), reverse=True)] 256 | # 257 | # 258 | # print("DONE") 259 | -------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | import multiprocessing 2 | from sklearn import metrics 3 | from scapy.all import * 4 | import matplotlib.pyplot as plt 5 | import os 6 | import numpy as np 7 | import time 8 | import pandas as pd 9 | import glob 10 | from scipy import interp 11 | 12 | 13 | def filter_pcap_by_ip(dir, filename, ip_list, label): 14 | ''' 15 | This method can be used to extract certain packets (associated with specified IP addresses) from a pcap file. 16 | The method expects user knowledge of the communication protocol used by the specified IP addresses for the label to be correct. 17 | 18 | The label is intended to be either http or https. 19 | :param dir: The directory in which the pcap file is located 20 | :param filename: The name of the pcap file that should be loaded 21 | :param ip_list: A list of IP addresses of interest. Packets with either ip.src or ip.dst in ip_list will be extracted 22 | :param label: The label that should be applied to the extracted packets. Note that the ip_list should contain 23 | addresses that we know communicate over http or https in order to match with the label 24 | :return: A dataframe containing information about the extracted packets from the pcap file. 
25 | ''' 26 | time_s = time.clock() 27 | count = 0 28 | print("Read PCAP, label is %s" % label) 29 | data = rdpcap(dir + filename + '.pcap') 30 | totalPackets = len(data) 31 | percentage = int(totalPackets / 100) 32 | # Workaround/speedup for pandas append to dataframe 33 | frametimes = [] 34 | dsts = [] 35 | srcs = [] 36 | protocols = [] 37 | dports = [] 38 | sports = [] 39 | bytes = [] 40 | labels = [] 41 | 42 | print("Total packets: %d" % totalPackets) 43 | time_r = time.clock() 44 | time_read = time_r - time_s 45 | print("Time to read PCAP: " + str(time_read)) 46 | for packet in data: 47 | if IP in packet and \ 48 | (UDP in packet or TCP in packet) and \ 49 | (packet[IP].dst in ip_list or packet[IP].src in ip_list): 50 | ip_layer = packet[IP] 51 | transport_layer = ip_layer.payload 52 | frametimes.append(packet.time) 53 | dsts.append(ip_layer.dst) 54 | srcs.append(ip_layer.src) 55 | protocols.append(transport_layer.name) 56 | dports.append(transport_layer.dport) 57 | sports.append(transport_layer.sport) 58 | # Save the raw byte string 59 | raw_payload = raw(packet) 60 | bytes.append(raw_payload) 61 | labels.append(label) 62 | if (count % (percentage * 5) == 0): 63 | print(str(count / percentage) + '%') 64 | count += 1 65 | time_t = time.clock() 66 | print("Time spent: %ds" % (time_t - time_r)) 67 | d = {'time': frametimes, 68 | 'ip.dst': dsts, 69 | 'ip.src': srcs, 70 | 'protocol': protocols, 71 | 'port.dst': dports, 72 | 'port.src': sports, 73 | 'bytes': bytes, 74 | 'label': labels} 75 | df = pd.DataFrame(data=d) 76 | time_e = time.clock() 77 | total_time = time_e - time_s 78 | print("Time to convert PCAP to dataframe: " + str(total_time)) 79 | return df 80 | 81 | 82 | def load_h5(dir, filename): 83 | timeS = time.clock() 84 | df = pd.read_hdf(dir + "/" + filename, key=filename.split('-')[0]) 85 | timeE = time.clock() 86 | loadTime = timeE - timeS 87 | print("Time to load " + filename + ": " + str(loadTime)) 88 | return df 89 | 90 | 91 | def plotHex(hexvalues, filename): 92 | ''' 93 | Plot an example as an image 94 | hexvalues: a list of byte values, or a list of such lists, in which case the average over all of them is plotted 95 | filename: used as the plot title 96 | ''' 97 | 98 | size = 39 99 | hex_placeholder = [0] * (size * size) # create placeholder of correct size 100 | 101 | if (type(hexvalues[0]) is np.ndarray): 102 | print("Multiple payloads") 103 | for hex_list in hexvalues: 104 | hex_placeholder[0:len(hex_list)] += hex_list # overwrite zero values with the payload values 105 | hex_placeholder = np.array(hex_placeholder) / len(hexvalues) # average the elements of the placeholder 106 | else: 107 | print("Single payload") 108 | hex_placeholder[0:len(hexvalues)] = hexvalues # overwrite zero values with the payload values 109 | 110 | canvas = np.reshape(np.array(hex_placeholder), (size, size)) 111 | plt.figure(figsize=(4, 4)) 112 | plt.axis('off') 113 | plt.imshow(canvas, cmap='gray') 114 | plt.title(filename) 115 | plt.show() 116 | return canvas 117 | 118 | 119 | def pad_string_elements_with_zero(payloads): 120 | # Assume max payload to be 1460 bytes but as each byte is now 2 hex digits we take double length 121 | max_payload_len = 1460 * 2 122 | # Pad with '0' 123 | payloads = [s.ljust(max_payload_len, '0') for s in payloads] 124 | return payloads 125 | 126 | 127 | def pad_arrays_with_zero(payloads, payload_length=810): 128 | tmp_payloads = [] 129 | for x in payloads: 130 | payload = np.zeros(payload_length, dtype=np.uint8) 131 | # pl = np.fromstring(x, dtype=np.uint8) 132 | payload[:x.shape[0]] = x 133 | 
tmp_payloads.append(payload) 134 | 135 | # payloads = [np.fromstring(x) for x in payloads] 136 | return np.array(tmp_payloads) 137 | 138 | 139 | def hash_elements(payloads): 140 | return payloads 141 | 142 | 143 | def packetanonymizer(packet): 144 | """ 145 | Takes a packet as a raw byte string and converts it to unsigned 8-bit integers [0-255] 146 | Sets the header fields which contain MAC, IP and port information to 0 147 | """ 148 | # Should work with TCP and UDP 149 | 150 | p = np.fromstring(packet, dtype=np.uint8) 151 | # set MACs to 0 152 | p[0:12] = 0 153 | # Remove IP checksum 154 | p[24:26] = 0 155 | # set IPs to 0 156 | p[26:34] = 0 157 | # set ports to 0 158 | p[34:36] = 0 159 | p[36:38] = 0 160 | 161 | # IP protocol field check if TCP 162 | if p[23] == 6: 163 | # Remove TCP checksum 164 | p[50:52] = 0 165 | else: 166 | # Remove UDP checksum 167 | p[40:42] = 0 168 | return p 169 | 170 | 171 | def extractdatapoints(dataframe, filename, num_headers=15, session=True): 172 | """ 173 | Extracts the concatenated header datapoints from a dataframe while anonymizing the individual headers 174 | :returns a dataframe with datapoints (bytes) and labels 175 | """ 176 | group_by = dataframe.sort_values(['time']).groupby(['ip.dst', 'ip.src', 'port.dst', 'port.src']) 177 | gb_dict = dict(list(group_by)) 178 | data_points = [] 179 | labels = [] 180 | filenames = [] 181 | sessions = [] 182 | done = set() 183 | num_too_short = 0 184 | # Iterate over sessions 185 | for k, v in gb_dict.items(): 186 | # v is a DataFrame 187 | # k is a tuple (src, dst, sport, dport) 188 | if k in done: 189 | continue 190 | done.add(k) 191 | if session: 192 | other_direction_key = (k[1], k[0], k[3], k[2]) 193 | other_direction = gb_dict[other_direction_key] 194 | v = pd.concat([v, other_direction]).sort_values(['time']) 195 | done.add(other_direction_key) 196 | if len(v) < num_headers: 197 | num_too_short += 1 198 | continue 199 | # extract the first num_headers packets (one row each) 200 | packets = v['bytes'].values[:num_headers] 201 | headers = [] 202 | label = v['label'].iloc[0] 203 | protocol = v['protocol'].iloc[0] 204 | 205 | # Skip session if UDP and not youtube 206 | if protocol == 'UDP' and label != 'youtube': 207 | continue 208 | # For each packet 209 | packetindex = 0 210 | headeradded = 0 211 | 212 | while headeradded < num_headers: 213 | if packetindex < len(packets): 214 | p = packets[packetindex] 215 | packetindex += 1 216 | else: 217 | break 218 | p_an = packetanonymizer(p) 219 | 220 | # assuming a session utilizes the same protocol throughout 221 | # Extract headers (TCP = 54 Bytes, UDP = 42 Bytes - Maybe + 4 Bytes for VLAN tagging) from x first packets of session/flow 222 | header = np.zeros(54, dtype=np.uint8) 223 | if protocol == 'TCP': 224 | # TCP 225 | header[:54] = p_an[:54] 226 | else: 227 | # UDP 228 | header[:42] = p_an[:42] # remaining bytes stay zero-padded 229 | 230 | # Skip if header packet is fragmented 231 | if (0 < header[20] < 64) or header[21] != 0: 232 | continue 233 | headers.append(header) 234 | headeradded += 1 235 | 236 | # Concatenate headers as the feature vector 237 | if len(headers) == num_headers: 238 | feature_vector = np.concatenate(headers).ravel() 239 | data_points.append(feature_vector) 240 | labels.append(label) 241 | filenames.append(filename) 242 | sessions.append(k) 243 | d = {'filename': filenames, 'session': sessions, 'bytes': data_points, 'label': labels} 244 | return pd.DataFrame(data=d) 245 | 246 | 247 | def saveheaderstask(filelist, num_headers, session, dataframes): 248 | 
datapointslist = [] 249 | for fullname in filelist: 250 | load_dir, filename = os.path.split(fullname) 251 | print("Loading: {0}".format(filename)) 252 | df = load_h5(load_dir, filename) 253 | datapoints = extractdatapoints(df, filename, num_headers, session) 254 | datapointslist.append(datapoints) 255 | 256 | # Extend the shared dataframe 257 | dataframes.extend(datapointslist) 258 | 259 | 260 | def saveextractedheaders(load_dir, save_dir, savename, num_headers=15, session=True): 261 | """ 262 | Extracts datapoints from all .h5 files in load_dir and saves them in a new .h5 file 263 | :param load_dir: The directory to load from 264 | :param save_dir: The directory to save the extracted headers 265 | :param savename: The filename to save 266 | :param num_headers: The number of headers to concatenate per datapoint 267 | :param session: use both directions (session) if True, otherwise a single direction (flow) 268 | """ 269 | manager = multiprocessing.Manager() 270 | dataframes = manager.list() 271 | filelist = glob.glob(load_dir + '*.h5') 272 | filesplits = split_list(filelist, 4) 273 | 274 | threads = [] 275 | for split in filesplits: 276 | # create a process for each chunk of files 277 | t = multiprocessing.Process(target=saveheaderstask, args=(split, num_headers, session, dataframes)) 278 | threads.append(t) 279 | t.start() 280 | # create one large dataframe 281 | 282 | for t in threads: 283 | t.join() 284 | print("Process joined: ", t) 285 | data = pd.concat(dataframes) 286 | key = savename.split('-')[0] 287 | if not os.path.exists(save_dir): 288 | os.makedirs(save_dir) 289 | data.to_hdf(save_dir + savename + '.h5', key=key, mode='w') 290 | 291 | 292 | # saveextractedheaders('./', 'extracted-0103_1136') 293 | # read_pcap('../Data/', 'drtv-2302_1031') 294 | def split_list(lst, chunks): 295 | ''' 296 | Takes a list and splits it into equal-sized chunks. 297 | :param lst: list to split 298 | :param chunks: number of chunks (int) 299 | :return: a list containing chunks (lists) as elements 300 | ''' 301 | avg = len(lst) / float(chunks) 302 | out = [] 303 | last = 0.0 304 | 305 | while last < len(lst): 306 | out.append(lst[int(last):int(last + avg)]) 307 | last += avg 308 | 309 | return out 310 | 311 | 312 | def plot_confusion_matrix(cm, classes, 313 | normalize=False, 314 | title='Confusion matrix', 315 | cmap=plt.cm.get_cmap(name='Blues'), save=False): 316 | """ 317 | This function prints and plots the confusion matrix. 318 | Normalization can be applied by setting `normalize=True`. 
319 | """ 320 | import itertools 321 | from matplotlib import rcParams 322 | # Make room for xlabel which is otherwise cut off 323 | rcParams.update({'figure.autolayout': True}) 324 | 325 | if normalize: 326 | cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis] 327 | print("Normalized confusion matrix") 328 | else: 329 | print('Confusion matrix, without normalization') 330 | 331 | print(cm) 332 | plt.figure() 333 | plt.imshow(cm, interpolation='nearest', cmap=cmap) 334 | plt.title("Accuracy: {0}".format(title.split("acc")[1])) 335 | plt.colorbar() 336 | tick_marks = np.arange(len(classes)) 337 | plt.xticks(tick_marks, classes, rotation='vertical') 338 | plt.yticks(tick_marks, classes) 339 | 340 | fmt = '.2f' if normalize else 'd' 341 | thresh = cm.max() / 1.5 342 | # thresh = 2260 343 | for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])): 344 | plt.text(j, i, format(cm[i, j], fmt), 345 | horizontalalignment="center", 346 | color="white" if cm[i, j] > thresh else "black") 347 | 348 | plt.tight_layout() 349 | plt.ylabel('True label') 350 | plt.xlabel('Predicted label') 351 | if save: 352 | i = 0 353 | filename = "{}".format(title) 354 | while os.path.exists('{}{:d}.png'.format(filename, i)): 355 | i += 1 356 | plt.savefig('{}{:d}.png'.format(filename, i), dpi=300) 357 | plt.draw() 358 | # plt.gcf().clear() 359 | 360 | 361 | def plot_metric_graph(x_list, y_list, x_label="Datapoints", y_label="Accuracy", 362 | title='Metric list', save=False): 363 | from matplotlib import rcParams 364 | # Make room for xlabel which is otherwise cut off 365 | rcParams.update({'figure.autolayout': True}) 366 | plt.figure() 367 | plt.plot(x_list, y_list, label="90/10 split") 368 | # plt.plot(x_list2, y_list2, label="Seperate testset") 369 | # Calculate min and max of y scale 370 | ymin = np.min(y_list) 371 | ymin = np.floor(ymin * 10) / 10 372 | ymax = np.max(y_list) 373 | ymax = np.ceil(ymax * 10) / 10 374 | plt.ylim(ymin, ymax) 375 | plt.title("{0}".format(title)) 376 | plt.tight_layout() 377 | plt.ylabel(y_label) 378 | plt.xlabel(x_label) 379 | # plt.legend() 380 | if save: 381 | i = 0 382 | filename = "{}".format(title) 383 | while os.path.exists('{}{:d}.png'.format(filename, i)): 384 | i += 1 385 | plt.savefig('{}{:d}.png'.format(filename, i), dpi=300) 386 | plt.draw() 387 | # plt.gcf().clear() 388 | 389 | def plot_class_ROC(fpr, tpr, roc_auc, class_idx, labels): 390 | from matplotlib import rcParams 391 | # Make room for xlabel which is otherwise cut off 392 | rcParams.update({'figure.autolayout': True}) 393 | plt.figure() 394 | lw = 2 395 | plt.plot(fpr[class_idx], tpr[class_idx], color='darkorange', 396 | lw=lw, label='ROC curve (area = %0.2f)' % roc_auc[class_idx]) 397 | plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--') 398 | plt.xlim([0.0, 1.0]) 399 | plt.ylim([0.0, 1.05]) 400 | plt.xlabel('False Positive Rate') 401 | plt.ylabel('True Positive Rate') 402 | plt.title('Receiver operating characteristic of class {}'.format(labels[class_idx])) 403 | plt.legend(loc="lower right") 404 | plt.tight_layout() 405 | 406 | def plot_multi_ROC(fpr, tpr, roc_auc, num_classes, labels, micro=True, macro=True): 407 | from matplotlib import rcParams 408 | # Make room for xlabel which is otherwise cut off 409 | rcParams.update({'figure.autolayout': True}) 410 | # Plot all ROC curves 411 | plt.figure() 412 | lw = 2 413 | if micro: 414 | plt.plot(fpr["micro"], tpr["micro"], 415 | label='micro-average ROC curve (area = {0:0.2f})' 416 | ''.format(roc_auc["micro"]), 417 | color='deeppink', 
linestyle=':', linewidth=4) 418 | if macro: 419 | plt.plot(fpr["macro"], tpr["macro"], 420 | label='macro-average ROC curve (area = {0:0.2f})' 421 | ''.format(roc_auc["macro"]), 422 | color='navy', linestyle=':', linewidth=4) 423 | color_map = {0: '#487fff', 1: '#2ee3ff', 2: '#4eff4e', 3: '#ffca43', 4: '#ff365e', 5: '#d342ff', 6: '#626663'} 424 | # colors = cycle(['aqua', 'darkorange', 'cornflowerblue']) 425 | for i in range(num_classes): 426 | plt.plot(fpr[i], tpr[i], color=color_map[i], lw=lw, 427 | label='ROC curve of {0} (area = {1:0.2f})' 428 | ''.format(labels[i], roc_auc[i])) 429 | 430 | plt.plot([0, 1], [0, 1], 'k--', lw=lw) 431 | plt.xlim([0.0, 1.0]) 432 | plt.ylim([0.0, 1.05]) 433 | plt.xlabel('False Positive Rate') 434 | plt.ylabel('True Positive Rate') 435 | plt.title('Receiver operating characteristic of all classes') 436 | plt.legend(loc="lower right") 437 | plt.tight_layout() 438 | 439 | 440 | def plot_ROC(y_true, y_preds, num_classes, labels, micro=True, macro=True): 441 | # Compute ROC curve and ROC area for each class 442 | fpr = dict() 443 | tpr = dict() 444 | roc_auc = dict() 445 | for i in range(num_classes): 446 | fpr[i], tpr[i], _ = metrics.roc_curve(y_true[:, i], y_preds[:, i]) 447 | roc_auc[i] = metrics.auc(fpr[i], tpr[i]) 448 | if micro: 449 | # Compute micro-average ROC curve and ROC area 450 | fpr["micro"], tpr["micro"], _ = metrics.roc_curve(y_true.ravel(), y_preds.ravel()) 451 | roc_auc["micro"] = metrics.auc(fpr["micro"], tpr["micro"]) 452 | if macro: 453 | # Compute macro-average ROC curve and ROC area 454 | # First aggregate all false positive rates 455 | all_fpr = np.unique(np.concatenate([fpr[i] for i in range(num_classes)])) 456 | # Then interpolate all ROC curves at this points 457 | mean_tpr = np.zeros_like(all_fpr) 458 | for i in range(num_classes): 459 | mean_tpr += interp(all_fpr, fpr[i], tpr[i]) 460 | # Finally average it and compute AUC 461 | mean_tpr /= num_classes 462 | fpr["macro"] = all_fpr 463 | tpr["macro"] = mean_tpr 464 | roc_auc["macro"] = metrics.auc(fpr["macro"], tpr["macro"]) 465 | for i in range(num_classes): 466 | plot_class_ROC(fpr, tpr, roc_auc, i, labels) 467 | plot_multi_ROC(fpr, tpr, roc_auc, num_classes, labels, micro, macro) 468 | 469 | 470 | def show_plot(): 471 | plt.show() 472 | # y_list_test, x_list_test = [0.265, 0.572, 0.636, 0.704, 0.746, 0.763, 0.769, 0.784, 0.789, 0.818, 0.82, 0.817, 0.831, 0.845, 0.846, 0.848, 0.859, 0.887, 0.876], [54, 268, 536, 1072, 1608, 2144, 2680, 3216, 3752, 4288, 4824, 5360, 8039, 13398, 18758, 24117, 29476, 34835, 40194] 473 | # y_list_merge, x_list_merge = [0.269, 0.75, 0.79, 0.818, 0.833, 0.844, 0.856, 0.866, 0.87, 0.868, 0.883, 0.881, 0.902, 0.925, 0.923, 0.941, 0.946, 0.943], [59, 291, 582, 1163, 1744, 2325, 2906, 3487, 4068, 4649, 5230, 5811, 8717, 14528, 20339, 26150, 31961, 37772] 474 | # plot_metric_graph(x_list_merge, y_list_merge, x_list_test, y_list_test, title="#Training Datapoints vs. 
Accuracy", save=True) -------------------------------------------------------------------------------- /visualization/classes_module.py: -------------------------------------------------------------------------------- 1 | import copy 2 | import numpy 3 | from visualization import vis_utils 4 | 5 | # ------------------------- 6 | # Feed-forward network 7 | # ------------------------- 8 | class Network: 9 | 10 | def __init__(self, layers): 11 | self.layers = layers 12 | 13 | def forward(self, Z): 14 | for l in self.layers: 15 | Z = l.forward(Z) 16 | return Z 17 | 18 | def gradprop(self, DZ): 19 | for l in self.layers[::-1]: 20 | DZ = l.gradprop(DZ) 21 | return DZ 22 | 23 | def relprop(self, R): 24 | for l in self.layers[::-1]: 25 | R = l.relprop(R) 26 | return R 27 | 28 | 29 | 30 | # ------------------------- 31 | # ReLU activation layer 32 | # ------------------------- 33 | class ReLU: 34 | 35 | def forward(self, X): 36 | self.Z = X > 0 37 | return X*self.Z 38 | 39 | def gradprop(self, DY): 40 | return DY*self.Z 41 | 42 | def relprop(self, R): 43 | return R 44 | 45 | # ------------------------- 46 | # Fully-connected layer 47 | # ------------------------- 48 | class Linear: 49 | 50 | def __init__(self, W, b): 51 | self.W = W 52 | self.B = b 53 | 54 | def forward(self, X): 55 | self.X = X 56 | return numpy.dot(self.X, self.W)+self.B 57 | 58 | def gradprop(self, DY): 59 | self.DY = DY 60 | return numpy.dot(self.DY, self.W.T) 61 | 62 | def relprop(self, R): 63 | V = numpy.maximum(0, self.W) 64 | Z = numpy.dot(self.X, V) + 1e-9 65 | S = R / Z 66 | C = numpy.dot(S, V.T) 67 | R = self.X * C 68 | return R 69 | 70 | 71 | class AlphaBetaLinear: 72 | 73 | def __init__(self, W, b, alpha): 74 | self.W = W 75 | self.B = b 76 | self.alpha = alpha 77 | self.beta = alpha -1 78 | 79 | def forward(self, X): 80 | self.X = X 81 | return numpy.dot(self.X, self.W)+self.B 82 | 83 | def gradprop(self, DY): 84 | self.DY = DY 85 | return numpy.dot(self.DY, self.W.T) 86 | 87 | def relprop(self, R): 88 | pself = copy.deepcopy(self) 89 | nself = copy.deepcopy(self) 90 | pself.B *= 0 91 | pself.W = numpy.maximum( 1e-9, pself.W) 92 | nself.B *= 0 93 | nself.W = numpy.minimum(-1e-9, pself.W) 94 | 95 | X = self.X + 1e-9 96 | ZA = pself.forward(X) 97 | SA = self.alpha*R/ZA 98 | ZB = nself.forward(X) 99 | SB = -self.beta *R/ZB 100 | R = X * (pself.gradprop(SA) + nself.gradprop(SB)) 101 | return R 102 | 103 | class FirstLinear(Linear): 104 | 105 | def __init__(self, W, b): 106 | self.W = W 107 | self.B = b 108 | 109 | def relprop(self, R): 110 | W, V, U = self.W, numpy.maximum(0, self.W), numpy.minimum(0, self.W) 111 | X, L, H = self.X, self.X * 0 + vis_utils.lowest, self.X * 0 + vis_utils.highest 112 | 113 | Z = numpy.dot(X, W) - numpy.dot(L, V) - numpy.dot(H, U) + 1e-9; 114 | S = R / Z 115 | R = X * numpy.dot(S, W.T) - L * numpy.dot(S, V.T) - H * numpy.dot(S, U.T) 116 | return R 117 | 118 | 119 | # # ------------------------- 120 | # # Sum-pooling layer 121 | # # ------------------------- 122 | # class Pooling: 123 | # 124 | # def forward(self, X): 125 | # self.X = X 126 | # self.Y = 0.5*(X[:, ::2, ::2, :]+X[:, ::2, 1::2, :]+X[:, 1::2, ::2, :]+X[:, 1::2, 1::2, :]) 127 | # return self.Y 128 | # 129 | # def gradprop(self, DY): 130 | # self.DY = DY 131 | # DX = self.X*0 132 | # for i, j in [(0, 0), (0, 1), (1, 0), (1, 1)]: 133 | # DX[:, i::2, j::2, :] += DY*0.5 134 | # return DX 135 | # 136 | # # ------------------------- 137 | # # Convolution layer 138 | # # ------------------------- 139 | # class Convolution: 140 | # 141 | # 
def __init__(self,name): 142 | # wshape = map(int,list(name.split("-")[-1].split("x"))) 143 | # self.W = numpy.loadtxt(name+'-W.txt').reshape(wshape) 144 | # self.B = numpy.loadtxt(name+'-B.txt') 145 | # 146 | # def forward(self,X): 147 | # 148 | # self.X = X 149 | # mb,wx,hx,nx = X.shape 150 | # ww,hw,nx,ny = self.W.shape 151 | # wy,hy = wx-ww+1,hx-hw+1 152 | # 153 | # Y = numpy.zeros([mb,wy,hy,ny],dtype='float32') 154 | # 155 | # for i in range(ww): 156 | # for j in range(hw): 157 | # Y += numpy.dot(X[:,i:i+wy,j:j+hy,:],self.W[i,j,:,:]) 158 | # 159 | # return Y+self.B 160 | # 161 | # def gradprop(self,DY): 162 | # 163 | # self.DY = DY 164 | # mb,wy,hy,ny = DY.shape 165 | # ww,hw,nx,ny = self.W.shape 166 | # 167 | # DX = self.X*0 168 | # 169 | # for i in range(ww): 170 | # for j in range(hw): 171 | # DX[:,i:i+wy,j:j+hy,:] += numpy.dot(DY,self.W[i,j,:,:].T) 172 | # 173 | # return DX 174 | -------------------------------------------------------------------------------- /visualization/heatmap.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | from sklearn.preprocessing import LabelEncoder 4 | from sklearn.cross_validation import train_test_split 5 | import utils 6 | import glob, os 7 | import pca.dataanalyzer as da 8 | 9 | 10 | # visulaize the important characteristics of the dataset 11 | import matplotlib.pyplot as plt 12 | 13 | data_len = 1460 14 | seed = 0 15 | # dirs = ["C:/Users/salik/Documents/Data/LinuxChrome/{}/".format(num_headers), 16 | # "C:/Users/salik/Documents/Data/WindowsFirefox/{}/".format(num_headers), 17 | # "C:/Users/salik/Documents/Data/WindowsChrome/{}/".format(num_headers), 18 | # "C:/Users/salik/Documents/Data/WindowsSalik/{}/".format(num_headers), 19 | # "C:/Users/salik/Documents/Data/WindowsAndreas/{}/".format(num_headers)] 20 | dirs = ["E:/Data/h5/https/", "E:/Data/h5/netflix/"] 21 | 22 | # step 1: get the data 23 | dataframes = [] 24 | num_examples = 0 25 | for dir in dirs: 26 | for fullname in glob.iglob(dir + '*.h5'): 27 | filename = os.path.basename(fullname) 28 | df = utils.load_h5(dir, filename) 29 | dataframes.append(df) 30 | num_examples = len(df.values) 31 | # create one large dataframe 32 | data = pd.concat(dataframes) 33 | data.sample(frac=1, random_state=seed).reset_index(drop=True) 34 | num_rows = data.shape[0] 35 | columns = data.columns 36 | print(columns) 37 | 38 | # step 2: get x and convert it to numpy array 39 | x = da.getbytes(data, data_len) 40 | 41 | # step 3: get class labels y and then encode it into number 42 | # get class label data 43 | y = data['label'].values 44 | # encode the class label 45 | class_labels = np.unique(y) 46 | label_encoder = LabelEncoder() 47 | y = label_encoder.fit_transform(y) 48 | # step 4: split the data into training set and test set 49 | test_percentage = 0.5 50 | x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=test_percentage, random_state=seed) 51 | plot_savename = "histogram_payload" 52 | 53 | from matplotlib import rcParams 54 | # Make room for xlabel which is otherwise cut off 55 | rcParams.update({'figure.autolayout': True}) 56 | # Heatmap plot how plot the sample points among 5 classes 57 | for idx, cl in enumerate(np.unique(y_test)): 58 | plt.figure() 59 | print("Starting class: " + class_labels[cl] +" With len:" + str(len(x_test[y_test == cl]))) 60 | positioncounts = np.zeros(shape=(256, data_len)) 61 | for x in x_test[y_test == cl]: 62 | for i, v in enumerate(x): 63 | positioncounts[int(v), i] += 1 64 
| plt.imshow(positioncounts, cmap="YlGnBu", interpolation='nearest') 65 | plt.title('Heatmap of : {}'.format(class_labels[cl])) 66 | plt.colorbar() 67 | plt.tight_layout() 68 | # plt.savefig('{0}{1}.png'.format(plot_savename, int(perplexity)), dpi=300) 69 | plt.show() -------------------------------------------------------------------------------- /visualization/histogram.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | from sklearn.preprocessing import LabelEncoder 4 | from sklearn.preprocessing import StandardScaler 5 | from sklearn.cross_validation import train_test_split 6 | import utils 7 | import glob, os 8 | import pca.dataanalyzer as da, pca.pca as pca 9 | from sklearn.metrics import accuracy_score 10 | 11 | # visulaize the important characteristics of the dataset 12 | import matplotlib.pyplot as plt 13 | seed = 0 14 | num_headers = 16 15 | data_len = 54*num_headers #1460 16 | dirs = ["C:/Users/salik/Documents/Data/LinuxChrome/{}/".format(num_headers), 17 | "C:/Users/salik/Documents/Data/WindowsFirefox/{}/".format(num_headers), 18 | "C:/Users/salik/Documents/Data/WindowsChrome/{}/".format(num_headers), 19 | "C:/Users/salik/Documents/Data/WindowsSalik/{}/".format(num_headers), 20 | "C:/Users/salik/Documents/Data/WindowsAndreas/{}/".format(num_headers)] 21 | # dirs = ["E:/Data/h5/https/", "E:/Data/h5/netflix/"] 22 | 23 | # step 1: get the data 24 | dataframes = [] 25 | num_examples = 0 26 | for dir in dirs: 27 | for fullname in glob.iglob(dir + '*.h5'): 28 | filename = os.path.basename(fullname) 29 | df = utils.load_h5(dir, filename) 30 | dataframes.append(df) 31 | num_examples = len(df.values) 32 | # create one large dataframe 33 | data = pd.concat(dataframes) 34 | data.sample(frac=1, random_state=seed).reset_index(drop=True) 35 | num_rows = data.shape[0] 36 | columns = data.columns 37 | print(columns) 38 | 39 | # step 2: get features (x) and convert it to numpy array 40 | x = da.getbytes(data, data_len) 41 | 42 | # step 3: get class labels y and then encode it into number 43 | # get class label data 44 | y = data['label'].values 45 | 46 | # encode the class label 47 | class_labels = np.unique(y) 48 | label_encoder = LabelEncoder() 49 | y = label_encoder.fit_transform(y) 50 | 51 | # step 4: split the data into training set and test set 52 | test_percentage = 0.5 53 | x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=test_percentage, random_state=seed) 54 | 55 | plot_savename = "histogram_payload" 56 | 57 | from matplotlib import rcParams 58 | # Make room for xlabel which is otherwise cut off 59 | rcParams.update({'figure.autolayout': True}) 60 | 61 | 62 | 63 | # scatter plot the sample points among 5 classes 64 | # markers = ('s', 'd', 'o', '^', 'v', ".", ",", "<", ">", "8", "p", "P", "*", "h", "H", "+", "x", "X", "D", "|", "_") 65 | color_map = {0: '#487fff', 1: '#d342ff', 2: '#4eff4e', 3: '#2ee3ff', 4: '#ffca43', 5:'#ff365e', 6:'#626663'} 66 | plt.figure() 67 | for idx, cl in enumerate(np.unique(y_test)): 68 | # Get count of unique values 69 | values, counts = np.unique(x_test[y_test == cl], return_counts=True) 70 | # Maybe remove zero as there is a lot of zeros in the header 71 | # values = values[1:] 72 | # counts = counts[1:] 73 | n, bins, patches = plt.hist(values, weights=counts, bins=256, facecolor=color_map[idx], label=class_labels[cl], alpha=0.8) 74 | 75 | plt.legend(loc='upper right') 76 | plt.title('Histogram of : {}'.format(class_labels)) 77 | plt.tight_layout() 78 | 
# plt.savefig('{0}{1}.png'.format(plot_savename, int(perplexity)), dpi=300) 79 | plt.show() -------------------------------------------------------------------------------- /visualization/pca_plots.py: -------------------------------------------------------------------------------- 1 | import pca.dataanalyzer as da, pca.pca as pca 2 | import glob 3 | import os 4 | import pandas as pd 5 | import numpy as np 6 | from sklearn.preprocessing import StandardScaler, LabelEncoder 7 | import utils 8 | 9 | 10 | seed = 0 11 | num_headers = 16 12 | dirs = ["E:/Data/LinuxChrome/{}/".format(num_headers), 13 | "E:/Data/WindowsSalik/{}/".format(num_headers), 14 | "E:/Data/WindowsAndreas/{}/".format(num_headers), 15 | "E:/Data/WindowsFirefox/{}/".format(num_headers), 16 | "E:/Data/WindowsChrome/{}/".format(num_headers), 17 | ] 18 | # dirs = ["C:/Users/salik/Documents/Data/h5/https/", "C:/Users/salik/Documents/Data/h5/netflix/"] 19 | # dirs = ["C:/Users/salik/Documents/Data/WindowsAndreas/{}/".format(num_headers)] 20 | # step 1: get the data 21 | dataframes = [] 22 | num_examples = 0 23 | for dir in dirs: 24 | for fullname in glob.iglob(dir + '*.h5'): 25 | filename = os.path.basename(fullname) 26 | df = utils.load_h5(dir, filename) 27 | dataframes.append(df) 28 | num_examples = len(df.values) 29 | # create one large dataframe 30 | data = pd.concat(dataframes) 31 | data = data.sample(frac=0.1, random_state=seed).reset_index(drop=True) 32 | num_rows = data.shape[0] 33 | columns = data.columns 34 | print(columns) 35 | 36 | # step 3: get features (x) and scale the features 37 | # get x and convert it to numpy array 38 | x = da.getbytes(data, num_headers*54) 39 | standard_scaler = StandardScaler() 40 | # x_stds = [] 41 | # ys = [] 42 | # for data in dataframes: 43 | # x = da.getbytes(data, 1460) 44 | 45 | # x = [[-0.5, -0.5], 46 | # [-0.5, 0.5], 47 | # [0.5, -0.5], 48 | # [0.5, 0.5], 49 | # [-0.4, -0.4], 50 | # [-0.4, 0.4], 51 | # [0.4, -0.4], 52 | # [0.4, 0.4] 53 | # ] 54 | x_std = standard_scaler.fit_transform(x) 55 | y = data['label'].values 56 | # y = ['drtv', 'netflix', 'youtube', 'twitch', 'twitch', 'youtube', 'netflix', 'drtv'] 57 | # encode the class label 58 | class_labels = np.unique(y) 59 | label_encoder = LabelEncoder() 60 | y = label_encoder.fit_transform(y) 61 | 62 | 63 | p = pca.runpca(x_std, num_comp=25) 64 | z = pca.componentprojection(x_std, p) 65 | pca.plotprojection(z, 0, y, class_labels) 66 | pca.plotvarianceexp(p, 25) 67 | pca.showplots() 68 | # plot_savename = "PCA_header_all_cumulative" 69 | # plt.savefig('{0}.png'.format(plot_savename), dpi=300) -------------------------------------------------------------------------------- /visualization/t-sne_compare.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | from sklearn.preprocessing import LabelEncoder 4 | from sklearn.preprocessing import StandardScaler 5 | from sklearn.cross_validation import train_test_split 6 | import utils 7 | import glob, os 8 | import pca.dataanalyzer as da, pca.pca as pca 9 | from sklearn.metrics import accuracy_score 10 | 11 | # visulaize the important characteristics of the dataset 12 | import matplotlib.pyplot as plt 13 | seed = 0 14 | num_headers = 16 15 | dirs = ["E:/Data/WindowsFirefox/{}/".format(num_headers), 16 | "E:/Data/WindowsChrome/{}/".format(num_headers)] 17 | # dirs = ["E:/Data/h5/https/", "E:/Data/h5/netflix/"] 18 | # dirs = ["C:/Users/salik/Documents/Data/WindowsAndreas/{}/".format(num_headers)] 19 | # step 1: get 
the data 20 | dataframes = [] 21 | num_examples = 0 22 | for dir in dirs: 23 | for fullname in glob.iglob(dir + '*.h5'): 24 | filename = os.path.basename(fullname) 25 | df = utils.load_h5(dir, filename) 26 | dataframes.append(df) 27 | num_examples = len(df.values) 28 | # create one large dataframe 29 | data = pd.concat(dataframes) 30 | num_rows = data.shape[0] 31 | columns = data.columns 32 | print(columns) 33 | 34 | # step 2: get features (x) and convert it to numpy array 35 | standard_scaler = StandardScaler() 36 | x_stds = [] 37 | ys = [] 38 | class_labels = [] 39 | for data in dataframes: 40 | x = da.getbytes(data, num_headers*54) 41 | # x = da.getbytes(data, 1460) 42 | x_std = standard_scaler.fit_transform(x) 43 | x_stds.append(x_std) 44 | # step 4: get class labels y and then encode it into number 45 | # get class label data 46 | y = data['label'].values 47 | # encode the class label 48 | class_labels = np.unique(y) 49 | label_encoder = LabelEncoder() 50 | y = label_encoder.fit_transform(y) 51 | ys.append(y) 52 | class_labels1 = ["Firefox " + x for x in class_labels] 53 | class_labels2 = ["Chrome " + x for x in class_labels] 54 | # step 5: split the data into training set and test set 55 | test_percentage = 0.1 56 | x_tests = [] 57 | y_tests = [] 58 | for i, x_std in enumerate(x_stds): 59 | x_train, x_test, y_train, y_test = train_test_split(x_std, ys[i], test_size=test_percentage, random_state=seed) 60 | x_tests.append(x_test) 61 | y_tests.append(y_test) 62 | 63 | x_test = np.append(x_tests[0], x_tests[1], axis=0) 64 | y_test = np.append(y_tests[0], y_tests[1], axis=0) 65 | first_set_length = len(y_tests[0]) 66 | print(first_set_length) 67 | 68 | # t-distributed Stochastic Neighbor Embedding (t-SNE) visualization 69 | plot_savename = "t-sne_16headers_windows_linux_perplexity" 70 | from sklearn.manifold import TSNE 71 | perplexities = [30.0] 72 | 73 | from matplotlib import rcParams 74 | # Make room for xlabel which is otherwise cut off 75 | rcParams.update({'figure.autolayout': True}) 76 | 77 | for perplexity in perplexities: 78 | print("Starting perplexity: {}".format(perplexity)) 79 | tsne = TSNE(n_components=2, perplexity=perplexity, n_iter=1000, random_state=seed, verbose=2) 80 | x_test_2d = tsne.fit_transform(x_test) 81 | x_test_2d_0 = x_test_2d[:first_set_length, :] 82 | x_test_2d_1 = x_test_2d[first_set_length:, :] 83 | 84 | # scatter plot the sample points among 7 classes 85 | # markers = ('s', 'd', 'o', '^', 'v', ".", ",", "<", ">", "8", "p", "P", "*", "h", "H", "+", "x", "X", "D", "|", "_") 86 | color_map = {0: '#487fff', 1: '#2ee3ff', 2: '#4eff4e', 3: '#ffca43', 4: '#ff365e', 5: '#d342ff', 6:'#626663'} 87 | plt.figure() 88 | for idx, cl in enumerate(np.unique(y_test)): 89 | # Plot first dataset as + 90 | plt.scatter(x=x_test_2d_0[y_tests[0] == cl, 0], y=x_test_2d_0[y_tests[0] == cl, 1], 91 | marker="+", s=30, 92 | c=color_map[idx], 93 | label=class_labels1[cl]) 94 | # Plot second dataset as o with no fill 95 | plt.scatter(x=x_test_2d_1[y_tests[1] == cl, 0], y=x_test_2d_1[y_tests[1] == cl, 1], 96 | marker="o", facecolors="None", s=30, linewidths=1, 97 | edgecolors=color_map[idx], 98 | label=class_labels2[cl]) 99 | plt.xlabel('X in t-SNE') 100 | plt.ylabel('Y in t-SNE') 101 | plt.legend(loc='lower right') 102 | plt.title('t-SNE visualization with perplexity: {}'.format(perplexity)) 103 | plt.tight_layout() 104 | # plt.savefig('{0}{1}.png'.format(plot_savename, int(perplexity)), dpi=300) 105 | plt.show() 
-------------------------------------------------------------------------------- /visualization/t_sne.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | from sklearn.preprocessing import LabelEncoder 4 | from sklearn.preprocessing import StandardScaler 5 | from sklearn.cross_validation import train_test_split 6 | import utils 7 | import glob, os 8 | import pca.dataanalyzer as da, pca.pca as pca 9 | from sklearn.metrics import accuracy_score 10 | 11 | # visulaize the important characteristics of the dataset 12 | import matplotlib.pyplot as plt 13 | import matplotlib.markers as ms 14 | seed = 0 15 | num_headers = 16 16 | dirs = ["C:/Users/salik/Documents/Data/LinuxChrome/{}/".format(num_headers), 17 | "C:/Users/salik/Documents/Data/WindowsFirefox/{}/".format(num_headers), 18 | "C:/Users/salik/Documents/Data/WindowsChrome/{}/".format(num_headers), 19 | "C:/Users/salik/Documents/Data/WindowsSalik/{}/".format(num_headers), 20 | "C:/Users/salik/Documents/Data/WindowsAndreas/{}/".format(num_headers)] 21 | # dirs = ["E:/Data/h5/https/", "E:/Data/h5/netflix/"] 22 | # dirs = ["C:/Users/salik/Documents/Data/WindowsAndreas/{}/".format(num_headers)] 23 | # step 1: get the data 24 | dataframes = [] 25 | num_examples = 0 26 | for dir in dirs: 27 | for fullname in glob.iglob(dir + '*.h5'): 28 | filename = os.path.basename(fullname) 29 | df = utils.load_h5(dir, filename) 30 | dataframes.append(df) 31 | num_examples = len(df.values) 32 | # create one large dataframe 33 | data = pd.concat(dataframes) 34 | data.sample(frac=1, random_state=seed).reset_index(drop=True) 35 | num_rows = data.shape[0] 36 | columns = data.columns 37 | print(columns) 38 | 39 | # step 3: get features (x) and scale the features 40 | # get x and convert it to numpy array 41 | # x = da.getbytes(data, 1460) 42 | standard_scaler = StandardScaler() 43 | x = da.getbytes(data, num_headers*54) 44 | x_std = standard_scaler.fit_transform(x) 45 | # step 4: get class labels y and then encode it into number 46 | # get class label data 47 | y = data['label'].values 48 | # encode the class label 49 | class_labels = np.unique(y) 50 | label_encoder = LabelEncoder() 51 | y = label_encoder.fit_transform(y) 52 | # step 5: split the data into training set and test set 53 | test_percentage = 0.1 54 | x_tests = [] 55 | y_tests = [] 56 | x_train, x_test, y_train, y_test = train_test_split(x_std, y, test_size=test_percentage, random_state=seed) 57 | # t-distributed Stochastic Neighbor Embedding (t-SNE) visualization 58 | plot_savename = "t-sne_16headers_windows_linux_perplexity" 59 | from sklearn.manifold import TSNE 60 | perplexities = [30.0] 61 | 62 | from matplotlib import rcParams 63 | # Make room for xlabel which is otherwise cut off 64 | rcParams.update({'figure.autolayout': True}) 65 | 66 | for perplexity in perplexities: 67 | print("Starting perplexity: {}".format(perplexity)) 68 | tsne = TSNE(n_components=2, perplexity=perplexity, n_iter=1000, random_state=seed, verbose=2) 69 | x_test_2d = tsne.fit_transform(x_test) 70 | # # scatter plot the sample points among 7 classes 71 | # # markers = ('s', 'd', 'o', '^', 'v', ".", ",", "<", ">", "8", "p", "P", "*", "h", "H", "+", "x", "X", "D", "|", "_") 72 | color_map = {0: '#487fff', 1: '#2ee3ff', 2: '#4eff4e', 3: '#ffca43', 4: '#ff365e', 5: '#d342ff', 6:'#626663'} 73 | plt.figure() 74 | for idx, cl in enumerate(np.unique(y_test)): 75 | plt.scatter(x=x_test_2d[y_test == cl, 0], 76 | y=x_test_2d[y_test == cl, 1], 77 | 
marker=".", 78 | c=color_map[idx], 79 | label=class_labels[cl]) 80 | plt.xlabel('X in t-SNE') 81 | plt.ylabel('Y in t-SNE') 82 | plt.legend(loc='lower right') 83 | # plt.title('t-SNE visualization with perplexity: {}'.format(perplexity)) 84 | plt.tight_layout() 85 | # plt.savefig('{0}{1}.png'.format(plot_savename, int(perplexity)), dpi=300) 86 | plt.show() -------------------------------------------------------------------------------- /visualization/vis_utils.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | # import PIL.Image 3 | import matplotlib.pyplot as plt 4 | import matplotlib.cm as cm 5 | # lowest = -1.0 6 | lowest = 0.0 7 | highest = 1.0 8 | 9 | # -------------------------------------- 10 | # Color maps ([-1,1] -> [0,1]^3) 11 | # -------------------------------------- 12 | 13 | 14 | def heatmap(x): 15 | 16 | x = x[..., np.newaxis] 17 | 18 | # positive relevance 19 | hrp = 0.9 - np.clip(x-0.3, 0, 0.7)/0.7*0.5 20 | hgp = 0.9 - np.clip(x-0.0, 0, 0.3)/0.3*0.5 - np.clip(x-0.3, 0, 0.7)/0.7*0.4 21 | hbp = 0.9 - np.clip(x-0.0, 0, 0.3)/0.3*0.5 - np.clip(x-0.3, 0, 0.7)/0.7*0.4 22 | 23 | # negative relevance 24 | hrn = 0.9 - np.clip(-x-0.0, 0, 0.3)/0.3*0.5 - np.clip(-x-0.3, 0, 0.7)/0.7*0.4 25 | hgn = 0.9 - np.clip(-x-0.0, 0, 0.3)/0.3*0.5 - np.clip(-x-0.3, 0, 0.7)/0.7*0.4 26 | hbn = 0.9 - np.clip(-x-0.3, 0, 0.7)/0.7*0.5 27 | 28 | r = hrp*(x >= 0)+hrn*(x < 0) 29 | g = hgp*(x >= 0)+hgn*(x < 0) 30 | b = hbp*(x >= 0)+hbn*(x < 0) 31 | 32 | return np.concatenate([r, g, b], axis=-1) 33 | 34 | 35 | def graymap(x): 36 | 37 | x = x[..., np.newaxis] 38 | return np.concatenate([x, x, x], axis=-1)*0.5+0.5 39 | 40 | # -------------------------------------- 41 | # Visualizing data 42 | # -------------------------------------- 43 | 44 | # def visualize(x,colormap,name): 45 | # 46 | # N = len(x) 47 | # assert(N <= 16) 48 | # 49 | # x = colormap(x/np.abs(x).max()) 50 | # 51 | # # Create a mosaic and upsample 52 | # x = x.reshape([1, N, 29, 29, 3]) 53 | # x = np.pad(x, ((0, 0), (0, 0), (2, 2), (2, 2), (0, 0)), 'constant', constant_values=1) 54 | # x = x.transpose([0, 2, 1, 3, 4]).reshape([1*33, N*33, 3]) 55 | # x = np.kron(x, np.ones([2, 2, 1])) 56 | # 57 | # PIL.Image.fromarray((x*255).astype('byte'), 'RGB').save(name) 58 | 59 | 60 | def plt_vector(x, colormap, num_headers): 61 | N = len(x) 62 | assert (N <= 16) 63 | len_x = 54 64 | len_y = num_headers 65 | # size = int(np.ceil(np.sqrt(len(x[0])))) 66 | length = len_y*len_x 67 | data = np.zeros((N, length), dtype=np.float64) 68 | data[:, :x.shape[1]] = x 69 | data = colormap(data / np.abs(data).max()) 70 | # data = data.reshape([1, N, size, size, 3]) 71 | data = data.reshape([1, N, len_y, len_x, 3]) 72 | # data = np.pad(data, ((0, 0), (0, 0), (2, 2), (2, 2), (0, 0)), 'constant', constant_values=1) 73 | data = data.transpose([0, 2, 1, 3, 4]).reshape([1 * (len_y), N * (len_x), 3]) 74 | return data 75 | # data = np.kron(data, np.ones([2, 2, 1])) # scales 76 | 77 | 78 | def add_subplot(data, num_plots, plot_index, title, figure): 79 | fig = figure 80 | ax = fig.add_subplot(num_plots, 1, plot_index) 81 | cax = ax.imshow(data, interpolation='nearest', aspect='auto') 82 | # cbar = fig.colorbar(cax, ticks=[0, 1]) 83 | # cbar.ax.set_yticklabels(['0', '> 1']) # vertically oriented colorbar 84 | ax.set_title(title) 85 | 86 | def plot_data(data, title): 87 | plt.figure(figsize=(6.4, 2.5)) # figuresize to make 16 headers plot look good 88 | # plt.axis('scaled') 89 | plt.imshow(data, interpolation='nearest', 
aspect='auto') 90 | plt.title(title) 91 | plt.tight_layout() 92 | 93 | 94 | def plotNNFilter(units): 95 | filters = units.shape[3] 96 | plt.figure(1, figsize=(20, 20)) 97 | n_columns = 6 98 | n_rows = np.ceil(filters / n_columns) + 1 99 | for i in range(filters): 100 | plt.subplot(n_rows, n_columns, i+1) 101 | plt.title('Filter ' + str(i)) 102 | plt.imshow(units[0, :, :, i], interpolation="nearest", cmap="gray") 103 | 104 | -------------------------------------------------------------------------------- /visualization/visualize_activations.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import matplotlib.pyplot as plt 3 | import numpy as np 4 | from tf import dataset 5 | from visualization import classes_module as md, vis_utils 6 | 7 | num_headers = 16 8 | hidden_units = 12 9 | train_dirs = ['C:/Users/salik/Documents/Data/LinuxChrome/{0}/'.format(num_headers), 10 | 'C:/Users/salik/Documents/Data/WindowsFirefox/{0}/'.format(num_headers), 11 | 'C:/Users/salik/Documents/Data/WindowsAndreas/{0}/'.format(num_headers), 12 | 'C:/Users/salik/Documents/Data/WindowsSalik/{0}/'.format(num_headers)] 13 | 14 | test_dirs = ['C:/Users/salik/Documents/Data/WindowsChrome/{0}/'.format(num_headers)] 15 | seed = 0 16 | input_size = 54*num_headers 17 | data = dataset.read_data_sets(train_dirs, test_dirs, merge_data=True, one_hot=True, 18 | validation_size=0.1, 19 | test_size=0.1, 20 | balance_classes=False, 21 | payload_length=input_size, 22 | seed=seed) 23 | load_dir = "../trained_models/" 24 | model_name = 'header_{0}_{1}_units.ckpt'.format(num_headers, hidden_units) 25 | sess = tf.Session() 26 | # First let's load meta graph and restore weights 27 | saver = tf.train.import_meta_graph(load_dir + model_name + ".meta") 28 | saver.restore(sess, tf.train.latest_checkpoint(load_dir)) 29 | 30 | # Now, let's access and create placeholders variables and 31 | # create feed-dict to feed new data 32 | graph = tf.get_default_graph() 33 | names = [tensor.name for tensor in graph.as_graph_def().node] 34 | x_pl = graph.get_tensor_by_name("xPlaceholder:0") 35 | y_pl = graph.get_tensor_by_name("yPlaceholder:0") 36 | layer1 = graph.get_tensor_by_name('layer1/activation:0') 37 | W1 = graph.get_tensor_by_name("layer1/W:0") 38 | b1 = graph.get_tensor_by_name("layer1/b:0") 39 | W_out = graph.get_tensor_by_name("output_layer/W:0") 40 | b_out = graph.get_tensor_by_name("output_layer/b:0") 41 | y = graph.get_tensor_by_name("output_layer/activation:0") 42 | # Get index of prediction 43 | y_ = tf.argmax(y, axis=1) 44 | 45 | feed_dict = {x_pl: data.test.payloads, y_pl: data.test.labels} 46 | y_preds = sess.run(fetches=y_, feed_dict=feed_dict) 47 | y_true = tf.argmax(data.test.labels, axis=1).eval(session=sess) 48 | 49 | # Number of samples to plot 50 | sample_size = 10 51 | num_samples_picked = 0 52 | sample_payloads = [] 53 | sample_labels = [] 54 | classes = np.array(dataset._label_encoder.classes_) 55 | # Which class do we want to visualise 56 | class_name = "drtv" 57 | class_number = np.argmax(classes == class_name) 58 | # Get all the places where that class label is the true label 59 | true_idx = [i for i, v in enumerate(y_true) if v == class_number] 60 | # What predictions did the network make at those indicies 61 | preds_at_idx = y_preds[true_idx] 62 | # Iterate over predictions and pick #sample_size where prediction was right 63 | for i, v in enumerate(preds_at_idx): 64 | if v == class_number and num_samples_picked < sample_size: 65 | 
sample_payloads.append(data.test.payloads[true_idx[i]]) 66 | sample_labels.append(data.test.labels[true_idx[i]]) 67 | num_samples_picked += 1 68 | 69 | 70 | X = np.array(sample_payloads) 71 | T = np.array(sample_labels) 72 | # Extract the weights and biases from the network 73 | l1_weights, l1_biases, out_weights, out_biases = sess.run([W1, b1, W_out, b_out]) 74 | # Create visualisation network in sequential manner 75 | nn = md.Network([md.Linear(l1_weights, l1_biases), md.ReLU(), 76 | md.Linear(out_weights, out_biases), md.ReLU() 77 | ]) 78 | # Make a sensitivity and relevance plot for each sample 79 | for i, v in enumerate(X): 80 | t = T[i] 81 | classname = classes[np.argmax(t)] 82 | Y = np.array([nn.forward(v)]) 83 | S = np.array([nn.gradprop(t)**2]) 84 | D = nn.relprop(Y*t) 85 | s_data = vis_utils.plt_vector(S, vis_utils.heatmap, num_headers) 86 | s_title = "Sensitivity for {0}".format(classname) 87 | rel_data = vis_utils.plt_vector(D, vis_utils.heatmap, num_headers) 88 | r_title = "Relevance for {0}".format(classname) 89 | vis_utils.plot_data(s_data, s_title) 90 | vis_utils.plot_data(rel_data, r_title) 91 | plt.show() 92 | --------------------------------------------------------------------------------