├── .gitignore ├── LICENSE ├── README.md ├── ad_simple.py ├── ad_tf.py ├── ad_tf_autoencoder.ipynb └── res ├── anomaly.pcap.gz ├── attack-trace.pcap.gz ├── attack-trace.pcap.readme ├── input.pcap.gz ├── input_normal_anomaly.pcap.readme ├── normal.pcap.gz └── trace.json /.gitignore: -------------------------------------------------------------------------------- 1 | .ipynb_checkpoints 2 | ad_test.pcap 3 | checkpoint 4 | model.* 5 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright 2015 The TensorFlow Authors. All rights reserved. 2 | 3 | Apache License 4 | Version 2.0, January 2004 5 | http://www.apache.org/licenses/ 6 | 7 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 8 | 9 | 1. Definitions. 10 | 11 | "License" shall mean the terms and conditions for use, reproduction, 12 | and distribution as defined by Sections 1 through 9 of this document. 13 | 14 | "Licensor" shall mean the copyright owner or entity authorized by 15 | the copyright owner that is granting the License. 16 | 17 | "Legal Entity" shall mean the union of the acting entity and all 18 | other entities that control, are controlled by, or are under common 19 | control with that entity. For the purposes of this definition, 20 | "control" means (i) the power, direct or indirect, to cause the 21 | direction or management of such entity, whether by contract or 22 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 23 | outstanding shares, or (iii) beneficial ownership of such entity. 24 | 25 | "You" (or "Your") shall mean an individual or Legal Entity 26 | exercising permissions granted by this License. 27 | 28 | "Source" form shall mean the preferred form for making modifications, 29 | including but not limited to software source code, documentation 30 | source, and configuration files. 
31 | 32 | "Object" form shall mean any form resulting from mechanical 33 | transformation or translation of a Source form, including but 34 | not limited to compiled object code, generated documentation, 35 | and conversions to other media types. 36 | 37 | "Work" shall mean the work of authorship, whether in Source or 38 | Object form, made available under the License, as indicated by a 39 | copyright notice that is included in or attached to the work 40 | (an example is provided in the Appendix below). 41 | 42 | "Derivative Works" shall mean any work, whether in Source or Object 43 | form, that is based on (or derived from) the Work and for which the 44 | editorial revisions, annotations, elaborations, or other modifications 45 | represent, as a whole, an original work of authorship. For the purposes 46 | of this License, Derivative Works shall not include works that remain 47 | separable from, or merely link (or bind by name) to the interfaces of, 48 | the Work and Derivative Works thereof. 49 | 50 | "Contribution" shall mean any work of authorship, including 51 | the original version of the Work and any modifications or additions 52 | to that Work or Derivative Works thereof, that is intentionally 53 | submitted to Licensor for inclusion in the Work by the copyright owner 54 | or by an individual or Legal Entity authorized to submit on behalf of 55 | the copyright owner. For the purposes of this definition, "submitted" 56 | means any form of electronic, verbal, or written communication sent 57 | to the Licensor or its representatives, including but not limited to 58 | communication on electronic mailing lists, source code control systems, 59 | and issue tracking systems that are managed by, or on behalf of, the 60 | Licensor for the purpose of discussing and improving the Work, but 61 | excluding communication that is conspicuously marked or otherwise 62 | designated in writing by the copyright owner as "Not a Contribution." 
63 | 64 | "Contributor" shall mean Licensor and any individual or Legal Entity 65 | on behalf of whom a Contribution has been received by Licensor and 66 | subsequently incorporated within the Work. 67 | 68 | 2. Grant of Copyright License. Subject to the terms and conditions of 69 | this License, each Contributor hereby grants to You a perpetual, 70 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 71 | copyright license to reproduce, prepare Derivative Works of, 72 | publicly display, publicly perform, sublicense, and distribute the 73 | Work and such Derivative Works in Source or Object form. 74 | 75 | 3. Grant of Patent License. Subject to the terms and conditions of 76 | this License, each Contributor hereby grants to You a perpetual, 77 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 78 | (except as stated in this section) patent license to make, have made, 79 | use, offer to sell, sell, import, and otherwise transfer the Work, 80 | where such license applies only to those patent claims licensable 81 | by such Contributor that are necessarily infringed by their 82 | Contribution(s) alone or by combination of their Contribution(s) 83 | with the Work to which such Contribution(s) was submitted. If You 84 | institute patent litigation against any entity (including a 85 | cross-claim or counterclaim in a lawsuit) alleging that the Work 86 | or a Contribution incorporated within the Work constitutes direct 87 | or contributory patent infringement, then any patent licenses 88 | granted to You under this License for that Work shall terminate 89 | as of the date such litigation is filed. 90 | 91 | 4. Redistribution. 
You may reproduce and distribute copies of the 92 | Work or Derivative Works thereof in any medium, with or without 93 | modifications, and in Source or Object form, provided that You 94 | meet the following conditions: 95 | 96 | (a) You must give any other recipients of the Work or 97 | Derivative Works a copy of this License; and 98 | 99 | (b) You must cause any modified files to carry prominent notices 100 | stating that You changed the files; and 101 | 102 | (c) You must retain, in the Source form of any Derivative Works 103 | that You distribute, all copyright, patent, trademark, and 104 | attribution notices from the Source form of the Work, 105 | excluding those notices that do not pertain to any part of 106 | the Derivative Works; and 107 | 108 | (d) If the Work includes a "NOTICE" text file as part of its 109 | distribution, then any Derivative Works that You distribute must 110 | include a readable copy of the attribution notices contained 111 | within such NOTICE file, excluding those notices that do not 112 | pertain to any part of the Derivative Works, in at least one 113 | of the following places: within a NOTICE text file distributed 114 | as part of the Derivative Works; within the Source form or 115 | documentation, if provided along with the Derivative Works; or, 116 | within a display generated by the Derivative Works, if and 117 | wherever such third-party notices normally appear. The contents 118 | of the NOTICE file are for informational purposes only and 119 | do not modify the License. You may add Your own attribution 120 | notices within Derivative Works that You distribute, alongside 121 | or as an addendum to the NOTICE text from the Work, provided 122 | that such additional attribution notices cannot be construed 123 | as modifying the License. 
124 | 125 | You may add Your own copyright statement to Your modifications and 126 | may provide additional or different license terms and conditions 127 | for use, reproduction, or distribution of Your modifications, or 128 | for any such Derivative Works as a whole, provided Your use, 129 | reproduction, and distribution of the Work otherwise complies with 130 | the conditions stated in this License. 131 | 132 | 5. Submission of Contributions. Unless You explicitly state otherwise, 133 | any Contribution intentionally submitted for inclusion in the Work 134 | by You to the Licensor shall be under the terms and conditions of 135 | this License, without any additional terms or conditions. 136 | Notwithstanding the above, nothing herein shall supersede or modify 137 | the terms of any separate license agreement you may have executed 138 | with Licensor regarding such Contributions. 139 | 140 | 6. Trademarks. This License does not grant permission to use the trade 141 | names, trademarks, service marks, or product names of the Licensor, 142 | except as required for reasonable and customary use in describing the 143 | origin of the Work and reproducing the content of the NOTICE file. 144 | 145 | 7. Disclaimer of Warranty. Unless required by applicable law or 146 | agreed to in writing, Licensor provides the Work (and each 147 | Contributor provides its Contributions) on an "AS IS" BASIS, 148 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 149 | implied, including, without limitation, any warranties or conditions 150 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 151 | PARTICULAR PURPOSE. You are solely responsible for determining the 152 | appropriateness of using or redistributing the Work and assume any 153 | risks associated with Your exercise of permissions under this License. 154 | 155 | 8. Limitation of Liability. 
In no event and under no legal theory, 156 | whether in tort (including negligence), contract, or otherwise, 157 | unless required by applicable law (such as deliberate and grossly 158 | negligent acts) or agreed to in writing, shall any Contributor be 159 | liable to You for damages, including any direct, indirect, special, 160 | incidental, or consequential damages of any character arising as a 161 | result of this License or out of the use or inability to use the 162 | Work (including but not limited to damages for loss of goodwill, 163 | work stoppage, computer failure or malfunction, or any and all 164 | other commercial damages or losses), even if such Contributor 165 | has been advised of the possibility of such damages. 166 | 167 | 9. Accepting Warranty or Additional Liability. While redistributing 168 | the Work or Derivative Works thereof, You may choose to offer, 169 | and charge a fee for, acceptance of support, warranty, indemnity, 170 | or other liability obligations and/or rights consistent with this 171 | License. However, in accepting such obligations, You may act only 172 | on Your own behalf and on Your sole responsibility, not on behalf 173 | of any other Contributor, and only if You agree to indemnify, 174 | defend, and hold each Contributor harmless for any liability 175 | incurred by, or claims asserted against, such Contributor by reason 176 | of your accepting any such warranty or additional liability. 177 | 178 | END OF TERMS AND CONDITIONS 179 | 180 | APPENDIX: How to apply the Apache License to your work. 181 | 182 | To apply the Apache License to your work, attach the following 183 | boilerplate notice, with the fields enclosed by brackets "[]" 184 | replaced with your own identifying information. (Don't include 185 | the brackets!) The text should be enclosed in the appropriate 186 | comment syntax for the file format. 
We also recommend that a 187 | file or class name and description of purpose be included on the 188 | same "printed page" as the copyright notice for easier 189 | identification within third-party archives. 190 | 191 | Copyright 2017, H21 lab, Martin Kacer 192 | 193 | Based on tensorflow classifier example wide_n_deep_tutorial.py 194 | Copyright 2017, The TensorFlow Authors. 195 | 196 | Licensed under the Apache License, Version 2.0 (the "License"); 197 | you may not use this file except in compliance with the License. 198 | You may obtain a copy of the License at 199 | 200 | http://www.apache.org/licenses/LICENSE-2.0 201 | 202 | Unless required by applicable law or agreed to in writing, software 203 | distributed under the License is distributed on an "AS IS" BASIS, 204 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 205 | See the License for the specific language governing permissions and 206 | limitations under the License. 207 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Unsupervised Anomaly Detection using tensorflow and tshark 2 | Unsupervised learning using an autoencoder neural network built with tensorflow. 3 | 4 | See the [ad_tf_autoencoder.ipynb](https://github.com/H21lab/Anomaly-Detection/blob/master/ad_tf_autoencoder.ipynb) 5 | 6 | 7 | # Supervised Anomaly Detection using tensorflow and tshark 8 | ```shell-session 9 | Script to help detect anomalies in a pcap file. 10 | Uses a tensorflow neural network classifier and tshark -T ek -x input. 11 | 12 | Input is tshark ek json generated by: 13 | ./tshark -T ek -x -r trace.pcap > input.json 14 | 15 | Run script: 16 | cat input.pcap.json | python ad_tf.py -i normal.pcap.json \ 17 | -a anomaly.pcap.json -f field_1 field_2 .... 
field_n 18 | 19 | For fields, use the field names from the ek json, e.g.: 20 | tshark -T ek -x -r ./res/input.pcap.gz | python ad_tf.py \ 21 | -i res/normal.json -a res/anomaly.json -f tcp_tcp_flags_raw \ 22 | tcp_tcp_dstport_raw 23 | 24 | Output pcap 25 | ad_test.pcap 26 | 27 | The script uses the tshark ek jsons including the raw hex data generated 28 | from pcaps by the command described above. The fields arguments are used for 29 | anomaly detection. The fields are used as columns, hashed, and fed as input 30 | to a tensorflow neural network classifier. 31 | 32 | The classifier is first trained with the normal.pcap.json input 33 | with label 0 and with the anomaly.pcap.json input with label 1. After training, 34 | input.pcap.json is read from stdin and evaluated. The neural 35 | network predicts the label. 36 | 37 | The output pcap then contains the frames predicted by the neural network as 38 | anomalies with label 1. 39 | ``` 40 | 41 | # Simple Anomaly Detection using tshark 42 | ```shell-session 43 | Simple script to help detect anomalies in a pcap file. 44 | 45 | Input is tshark ek json generated by: 46 | ./tshark -T ek -x -r trace.pcap > input.json 47 | 48 | Run script: 49 | cat input.json | python ad_simple.py field_1 field_2 .... field_n 50 | 51 | For fields, use the field names from the ek json, e.g.: 52 | cat input.json | python ad_simple.py ip_ip_src ip_ip_dst 53 | 54 | Output pcap 55 | ad_test.pcap 56 | 57 | The script reads the tshark ek json including the raw hex data. The input is 58 | generated from the pcap using tshark. The fields arguments are used for simple 59 | anomaly detection. The behavior is similar to an SQL GROUP BY command. The 60 | fields are hashed together and the output pcap contains the frames 61 | beginning with the rarest combinations of the selected fields and descending to 62 | the most frequent ones. 
63 | 64 | The following example 65 | cat input.json | python ad_simple.py ip_ip_src ip_ip_dst 66 | will generate a pcap starting with the least frequent combinations of source and 67 | dest IP pairs and descending to frames with common 68 | combinations. 69 | ``` 70 | 71 | ## Limitations 72 | 73 | The program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY. 74 | 75 | ## Attribution 76 | 77 | This code was created by Martin Kacer, H21 lab, Copyright 2020. 78 | https://www.h21lab.com 79 | 80 | -------------------------------------------------------------------------------- /ad_simple.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | # Simple anomaly detection using tshark 4 | # Input is tshark -T ek -x json and output is pcap 5 | # 6 | # Copyright 2020, H21 lab, Martin Kacer 7 | # All the content and resources have been provided in the hope that it will be useful. 8 | # The author does not take responsibility for any misapplication of it. 9 | # 10 | # Licensed under the Apache License, Version 2.0 (the "License"); 11 | # you may not use this file except in compliance with the License. 12 | # You may obtain a copy of the License at 13 | # 14 | # http://www.apache.org/licenses/LICENSE-2.0 15 | # 16 | # Unless required by applicable law or agreed to in writing, software 17 | # distributed under the License is distributed on an "AS IS" BASIS, 18 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 19 | # See the License for the specific language governing permissions and 20 | # limitations under the License. 
21 | # 22 | 23 | import sys 24 | import json 25 | import argparse 26 | import operator 27 | import subprocess 28 | import os 29 | import hashlib 30 | 31 | def to_pcap_file(filename, output_pcap_file): 32 | FNULL = open(os.devnull, 'w') 33 | subprocess.call(["text2pcap", filename, output_pcap_file], stdout=FNULL, stderr=subprocess.STDOUT) 34 | 35 | def hex_to_txt(hexstring, output_file): 36 | h = hexstring.lower() 37 | 38 | file = open(output_file, 'a') 39 | 40 | for i in range(0, len(h), 2): 41 | if(i%32 == 0): 42 | file.write(format(int(i/2), '06x') + ' ') 43 | 44 | file.write(h[i:i+2] + ' ') 45 | 46 | if(i%32 == 30): 47 | file.write('\n') 48 | 49 | file.write('\n') 50 | file.close() 51 | 52 | def json_collector(dict, name): 53 | r = [] 54 | if hasattr(dict, 'items'): 55 | for k, v in dict.items(): 56 | if (k in name): 57 | r.append(v) 58 | else: 59 | val = json_collector(v, name) 60 | if (len(val) > 0): 61 | r = r + val 62 | 63 | return r 64 | 65 | 66 | def main(_): 67 | 68 | df = [] # Table storing frames, score, hash_key, ... 
69 | d = {} # Hash dict storing counters 70 | i = 0 71 | 72 | # Read from stdin line by line 73 | for line in sys.stdin: 74 | # trim end of lines 75 | line = line.rstrip('\n') 76 | # skip empty lines 77 | if (line.rstrip() == ""): 78 | continue 79 | j = json.loads(line) 80 | 81 | # packet found in ek json input 82 | if ('layers' in j): 83 | 84 | # calculate one hash-key over all selected fields and store counters in d dict 85 | fj = json_collector(j, _) 86 | #print(fj) 87 | k = '' 88 | if len(fj) > 0: 89 | m = hashlib.md5() 90 | for f in fj: 91 | m.update(str(f).encode('utf-8')) 92 | k = m.hexdigest() 93 | # count each field combination once per frame 94 | if k in d: 95 | d[k] = d[k] + 1 96 | else: 97 | d[k] = 1 98 | 99 | 100 | 101 | 102 | 103 | 104 | # store in df list all the columns 105 | layers = j['layers'] 106 | 107 | linux_cooked_header = False 108 | if ('sll_raw' in layers): 109 | linux_cooked_header = True 110 | if ('frame_raw' in layers): 111 | # columns: frame_id, frame_raw, score, linux_cooked_header_flag, hash_key 112 | df.append([i, layers['frame_raw'], 0, linux_cooked_header, k]) 113 | i = i + 1 114 | 115 | #print(d) 116 | 117 | # Calculate score column in df table 118 | for index in range(0, len(df)): 119 | # score = frequency of the frame's field-combination hash 120 | df[index][2] = d[df[index][4]] 121 | 122 | #print(df) 123 | 124 | # sort the df table by score ascending 125 | sorted_df = sorted(df, key=operator.itemgetter(2), reverse=False) 126 | 127 | #print(sorted_df) 128 | 129 | # Generate output pcap 130 | # open TMP file used by text2pcap 131 | infile = 'ad_test' 132 | file = infile + '.tmp' 133 | f = open(file, 'w') 134 | 135 | # Iterate over packets in JSON 136 | for index in range(0, len(sorted_df)): 137 | 138 | linux_cooked_header = False 139 | 140 | frame_raw = sorted_df[index][1] 141 | 142 | # for Linux cooked header replace dest MAC and remove two bytes to reconstruct normal frame using text2pcap 143 | if (sorted_df[index][3]): 144 | frame_raw = "000000000000" + frame_raw[6*2:] # replace dest MAC 145 | frame_raw = frame_raw[:12*2] + frame_raw[14*2:] # remove two bytes before Protocol 146 | 147 | hex_to_txt(frame_raw, file) 148 | 149 | f.close() 150 | # Write out pcap 151 | to_pcap_file(infile + '.tmp', infile + '.pcap') 152 | print("Generated " + infile + ".pcap") 153 | os.remove(infile + '.tmp') 154 | 155 | 156 | if __name__ == "__main__": 157 | parser = argparse.ArgumentParser(description=""" 158 | Simple script to help detect anomalies in a pcap file. 159 | 160 | Input is tshark ek json generated by: 161 | ./tshark -T ek -x -r trace.pcap > input.json 162 | 163 | Run script: 164 | cat input.json | python ad_simple.py field_1 field_2 .... field_n 165 | 166 | For fields, use the field names from the ek json, e.g.: 167 | cat input.json | python ad_simple.py ip_ip_src ip_ip_dst 168 | 169 | Output pcap 170 | ad_test.pcap 171 | 172 | The script reads the tshark ek json including the raw hex data. The input is 173 | generated from the pcap using tshark. The fields arguments are used for simple 174 | anomaly detection. The behavior is similar to an SQL GROUP BY command. The 175 | fields are hashed together and the output pcap contains the frames 176 | beginning with the rarest combinations of the selected fields and descending to 177 | the most frequent ones. 178 | 179 | The following example 180 | cat input.json | python ad_simple.py ip_ip_src ip_ip_dst 181 | will generate a pcap starting with the least frequent combinations of source and 182 | dest IP pairs and descending to frames with common 183 | combinations. 
184 | 185 | """, formatter_class=argparse.RawTextHelpFormatter) 186 | parser.register("type", "bool", lambda v: v.lower() == "true") 187 | FLAGS, unparsed = parser.parse_known_args() 188 | main(unparsed) 189 | -------------------------------------------------------------------------------- /ad_tf.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | # 4 | # Anomaly detection using tensorflow and tshark 5 | # Supervised learning using neural network classifier 6 | # 7 | # Copyright 2020, H21 lab, Martin Kacer 8 | # All the content and resources have been provided in the hope that it will be useful. 9 | # The author does not take responsibility for any misapplication of it. 10 | # 11 | # Based on tensorflow classifier example wide_n_deep_tutorial.py 12 | # Copyright 2017, The TensorFlow Authors. 13 | # 14 | # Licensed under the Apache License, Version 2.0 (the "License"); 15 | # you may not use this file except in compliance with the License. 16 | # You may obtain a copy of the License at 17 | # 18 | # http://www.apache.org/licenses/LICENSE-2.0 19 | # 20 | # Unless required by applicable law or agreed to in writing, software 21 | # distributed under the License is distributed on an "AS IS" BASIS, 22 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 23 | # See the License for the specific language governing permissions and 24 | # limitations under the License. 
25 | # 26 | 27 | import sys 28 | import json 29 | import argparse 30 | import tempfile 31 | import pandas as pd 32 | import operator 33 | import subprocess 34 | import os 35 | import hashlib 36 | import tensorflow as tf 37 | tf.estimator.Estimator._validate_features_in_predict_input = lambda *args: None 38 | 39 | tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.DEBUG) 40 | 41 | COLUMNS = [] 42 | LABEL_COLUMN = "label" 43 | CATEGORICAL_COLUMNS = [] 44 | CONTINUOUS_COLUMNS = [] 45 | 46 | FLAGS = None 47 | 48 | def build_model_columns(): 49 | """Builds a set of wide and deep feature columns.""" 50 | 51 | # Wide columns and deep columns. 52 | wide_columns = [] 53 | 54 | deep_columns = [] 55 | 56 | for c in COLUMNS: 57 | # Sparse base columns. 58 | print(">>>>>>>>>>>>>>>>>>>") 59 | print(c) 60 | column = tf.feature_column.categorical_column_with_hash_bucket(c, hash_bucket_size=10000) 61 | deep_columns.append(tf.feature_column.embedding_column(column, dimension=8)) 62 | #wide_columns.append(column) 63 | 64 | return wide_columns, deep_columns 65 | 66 | def build_estimator(model_dir, model_type): 67 | """Build an estimator appropriate for the given model type.""" 68 | wide_columns, deep_columns = build_model_columns() 69 | hidden_units = [100, 75, 50, 25] 70 | 71 | run_config = tf.estimator.RunConfig().replace(keep_checkpoint_max = 5, 72 | log_step_count_steps=20, save_checkpoints_steps=200) 73 | 74 | if model_type == 'wide': 75 | return tf.estimator.LinearClassifier( 76 | model_dir=model_dir, 77 | feature_columns=wide_columns, 78 | config=run_config) 79 | elif model_type == 'deep': 80 | return tf.estimator.DNNClassifier( 81 | model_dir=model_dir, 82 | feature_columns=deep_columns, 83 | hidden_units=hidden_units, 84 | config=run_config) 85 | else: 86 | return tf.estimator.DNNLinearCombinedClassifier( 87 | model_dir=model_dir, 88 | linear_feature_columns=wide_columns, 89 | dnn_feature_columns=deep_columns, 90 | dnn_hidden_units=hidden_units, 91 | config=run_config) 
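Each selected tshark field is turned by build_model_columns above into a hash-bucket categorical column and then embedded. The bucketing idea can be illustrated with a plain-Python sketch (md5 here stands in for TensorFlow's internal fingerprint function, so bucket ids will not match TensorFlow's; 10000 matches the script's hash_bucket_size):

```python
import hashlib

def hash_bucket(value, num_buckets=10000):
    # Map a raw field value (e.g. a tcp_tcp_flags_raw string) to one of
    # num_buckets ids. Collisions are possible and accepted, just as with
    # tf.feature_column.categorical_column_with_hash_bucket.
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets
```

Every occurrence of the same value lands in the same bucket, so the downstream embedding learns one vector per bucket rather than per distinct string.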
92 | 93 | def input_fn(df, num_epochs, shuffle, batch_size): 94 | """Input builder function.""" 95 | dataset = tf.data.Dataset.from_tensor_slices((dict(df[COLUMNS]), df['label'])) 96 | 97 | if shuffle: 98 | dataset = dataset.shuffle(1000) 99 | 100 | dataset = dataset.repeat(num_epochs) 101 | dataset = dataset.batch(batch_size) 102 | return dataset 103 | 104 | def df_to_pcap(j, df_predict, file): 105 | linux_cooked_header = df_predict.at[j, 'linux_cooked_header'] 106 | frame_raw = df_predict.at[j, 'frame_raw'] 107 | # for Linux cooked header replace dest MAC and remove two bytes to reconstruct normal frame using text2pcap 108 | if (linux_cooked_header): 109 | frame_raw = "000000000000" + frame_raw[6*2:] # replace dest MAC 110 | frame_raw = frame_raw[:12*2] + frame_raw[14*2:] # remove two bytes before Protocol 111 | hex_to_txt(frame_raw, file) 112 | 113 | def to_pcap_file(filename, output_pcap_file): 114 | FNULL = open(os.devnull, 'w') 115 | subprocess.call(["text2pcap", filename, output_pcap_file], stdout=FNULL, stderr=subprocess.STDOUT) 116 | 117 | def hex_to_txt(hexstring, output_file): 118 | h = hexstring.lower() 119 | 120 | file = open(output_file, 'a') 121 | 122 | for i in range(0, len(h), 2): 123 | if(i%32 == 0): 124 | file.write(format(int(i/2), '06x') + ' ') 125 | 126 | file.write(h[i:i+2] + ' ') 127 | 128 | if(i%32 == 30): 129 | file.write('\n') 130 | 131 | file.write('\n') 132 | file.close() 133 | 134 | def json_collector(dict, name): 135 | r = [] 136 | if hasattr(dict, 'items'): 137 | for k, v in dict.items(): 138 | if (k in name): 139 | r.append(v) 140 | else: 141 | val = json_collector(v, name) 142 | if (len(val) > 0): 143 | r = r + val 144 | 145 | return r 146 | 147 | def readJsonEKLine(df, line, label): 148 | # trim end of lines 149 | line = line.rstrip('\n') 150 | # skip empty lines 151 | if (line.rstrip() == ""): 152 | return 153 | 154 | j = json.loads(line) 155 | 156 | # frames 157 | if ('layers' in j): 158 | layers = j['layers'] 159 | 160 | 
linux_cooked_header = False 161 | if ('sll_raw' in layers): 162 | linux_cooked_header = True 163 | if ('frame_raw' in layers): 164 | 165 | i = len(df) 166 | 167 | df.loc[i, 'frame_raw'] = layers['frame_raw'] 168 | df.loc[i, 'linux_cooked_header'] = linux_cooked_header 169 | 170 | for c in COLUMNS: 171 | v = json_collector(j, [c]) 172 | if (len(v) > 0): 173 | v = v[0] 174 | else: 175 | v = '' 176 | df.loc[i, c] = v 177 | 178 | df.loc[i, 'label'] = label 179 | 180 | def readJsonEK(df, filename, label, limit = 0): 181 | i = 0 182 | while i <= limit: 183 | with open(filename) as f: 184 | for line in f: 185 | if (limit != 0 and i > limit): 186 | return i 187 | readJsonEKLine(df, line, label) 188 | i = i + 1 189 | return i 190 | 191 | def main(_): 192 | 193 | global COLUMNS 194 | global CATEGORICAL_COLUMNS 195 | COLUMNS = FLAGS.fields 196 | CATEGORICAL_COLUMNS = COLUMNS 197 | 198 | print('===============') 199 | print(COLUMNS) 200 | print(CATEGORICAL_COLUMNS) 201 | print(CONTINUOUS_COLUMNS) 202 | print('===============') 203 | 204 | df = pd.DataFrame() 205 | 206 | ln = readJsonEK(df, FLAGS.normal_tshark_ek_x_json, 0) 207 | readJsonEK(df, FLAGS.anomaly_tshark_ek_x_json, 1, ln) 208 | 209 | df = df.sample(frac=1).reset_index(drop=True) 210 | 211 | print(df) 212 | 213 | ##################################### 214 | # train neural network and evaluate # 215 | ##################################### 216 | model_dir = tempfile.mkdtemp() 217 | print("model directory = %s" % model_dir) 218 | 219 | print(">>>>>>>>>>>>>>>" + str(COLUMNS)) 220 | model = build_estimator(model_dir, 'wide_n_deep') 221 | 222 | # Train and evaluate the model every `FLAGS.epochs_per_eval` epochs. 
223 | train_epochs = 100 224 | epochs_per_eval = 20 225 | train_steps = 400 226 | for n in range(train_epochs // epochs_per_eval): 227 | model.train(input_fn=lambda: input_fn(df, train_epochs, True, train_steps)) 228 | 229 | results = model.evaluate(input_fn=lambda: input_fn(df, train_epochs, True, train_steps)) 230 | 231 | # Display evaluation metrics 232 | print('Results at epoch', (n + 1) * epochs_per_eval) 233 | print('-' * 60) 234 | 235 | for key in sorted(results): 236 | print('%s: %s' % (key, results[key])) 237 | 238 | ##################################### 239 | # read from stdin and predict # 240 | ##################################### 241 | # Generate pcap 242 | # open TMP file used by text2pcap 243 | 244 | infile = 'ad_test' 245 | file = infile + '.tmp' 246 | f = open(file, 'w') 247 | 248 | df_predict = pd.DataFrame() 249 | 250 | i = 0 251 | for line in sys.stdin: 252 | readJsonEKLine(df_predict, line, 0) 253 | 254 | i = i + 1 255 | 256 | #print(df_predict) 257 | 258 | # flush every 200 lines, EK JSON contains also index lines, not packets 259 | if (i%200) == 0: 260 | y = model.predict(input_fn=lambda: input_fn(df_predict, 1, False, 100)) 261 | #print("=======================") 262 | #print(y) 263 | #print("=======================") 264 | 265 | j = 0 266 | for val in y: 267 | #print("****") 268 | #print(val) 269 | #print("****") 270 | if (val['class_ids'][0] == 1): 271 | print(str(df_predict.iloc[[j]])) 272 | # pcap 273 | df_to_pcap(j, df_predict, file) 274 | 275 | j = j + 1 276 | 277 | # check predicted labels 278 | if len(df_predict) > 0: 279 | y = model.predict(input_fn=lambda: input_fn(df_predict, 1, False, 100)) 280 | j = 0 281 | for val in y: 282 | label = val['class_ids'][0] 283 | if (label == 1): 284 | print("index = " + str(j)) 285 | print("label = " + str(label)) 286 | print("Probability = " + str(val['probabilities'][label])) 287 | print(str(df_predict.iloc[[j]])) 288 | # pcap 289 | df_to_pcap(j, df_predict, file) 290 | 291 | j = j + 1 292 | 293 | # flush 294 
| df_predict = pd.DataFrame() 295 | 296 | # pcap 297 | f.close() 298 | to_pcap_file(infile + '.tmp', infile + '.pcap') 299 | os.remove(infile + '.tmp') 300 | print("Generated " + infile + ".pcap") 301 | 302 | 303 | if __name__ == "__main__": 304 | parser = argparse.ArgumentParser(description=""" 305 | Script to help detect anomalies in a pcap file. 306 | Uses a tensorflow neural network classifier and tshark -T ek -x input. 307 | 308 | Input is tshark ek json generated by: 309 | ./tshark -T ek -x -r trace.pcap > input.json 310 | 311 | Run script: 312 | cat input.pcap.json | python ad_tf.py -i normal.pcap.json \\ 313 | -a anomaly.pcap.json -f field_1 field_2 .... field_n 314 | 315 | For fields, use the field names from the ek json, e.g.: 316 | tshark -T ek -x -r ./res/input.pcap.gz | python ad_tf.py \\ 317 | -i res/normal.json -a res/anomaly.json -f tcp_tcp_flags_raw \\ 318 | tcp_tcp_dstport_raw 319 | 320 | Output pcap 321 | ad_test.pcap 322 | 323 | The script uses the tshark ek jsons including the raw hex data generated 324 | from pcaps by the command described above. The fields arguments are used for 325 | anomaly detection. The fields are used as columns, hashed, and fed as input 326 | to a tensorflow neural network classifier. 327 | 328 | The classifier is first trained with the normal.pcap.json input 329 | with label 0 and with the anomaly.pcap.json input with label 1. After training, 330 | input.pcap.json is read from stdin and evaluated. The neural 331 | network predicts the label. 332 | 333 | The output pcap then contains the frames predicted by the neural network as 334 | anomalies with label 1. 335 | """, formatter_class=argparse.RawTextHelpFormatter) 336 | parser.register("type", "bool", lambda v: v.lower() == "true") 337 | parser.add_argument( 338 | "-a", 339 | "--anomaly_tshark_ek_x_json", 340 | type=str, 341 | default="", 342 | help="Anomaly traffic. 
Json created by tshark -T ek -x from pcap.\nShall contain only frames considered as anomalies.", 343 | required=True 344 | ) 345 | parser.add_argument( 346 | "-i", 347 | "--normal_tshark_ek_x_json", 348 | type=str, 349 | default="", 350 | help="Regular traffic. Json created by tshark -T ek -x from pcap.\nShall contain only frames considered as normal.", 351 | required=True 352 | ) 353 | parser.add_argument( 354 | "-f", 355 | "--fields", 356 | nargs='+', 357 | help='field_1 field_2 .... field_n (e.g. ip_ip_src ip_ip_dst)', 358 | required=True 359 | ) 360 | 361 | FLAGS, unparsed = parser.parse_known_args() 362 | 363 | print("============") 364 | print(FLAGS.anomaly_tshark_ek_x_json) 365 | print(FLAGS.normal_tshark_ek_x_json) 366 | print(FLAGS.fields) 367 | print("============") 368 | 369 | main([sys.argv[0]] + unparsed) 370 | -------------------------------------------------------------------------------- /res/anomaly.pcap.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/H21lab/Anomaly-Detection/070c01bb6de2601c78dc6aea0a558a990c04acc2/res/anomaly.pcap.gz -------------------------------------------------------------------------------- /res/attack-trace.pcap.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/H21lab/Anomaly-Detection/070c01bb6de2601c78dc6aea0a558a990c04acc2/res/attack-trace.pcap.gz -------------------------------------------------------------------------------- /res/attack-trace.pcap.readme: -------------------------------------------------------------------------------- 1 | Created from pcaps publicly available on https://pcapr.net 2 | 3 | attack-trace.pcap 4 | 5 | Created by the following command 6 | tshark -T ek -x -r attack-trace.pcap > trace.json 7 | -------------------------------------------------------------------------------- /res/input.pcap.gz: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/H21lab/Anomaly-Detection/070c01bb6de2601c78dc6aea0a558a990c04acc2/res/input.pcap.gz -------------------------------------------------------------------------------- /res/input_normal_anomaly.pcap.readme: -------------------------------------------------------------------------------- 1 | input.pcap, normal.pcap, anomaly.pcap 2 | 3 | Original pcap was "DEF CON 23 ICS Village" 4 | Downloaded from https://www.netresec.com/?page=PcapFiles 5 | https://media.defcon.org/DEF%20CON%2023/DEF%20CON%2023%20villages/DEF%20CON%2023%20ics%20village/DEF%20CON%2023%20ICS%20Village%20packet%20captures.rar 6 | 7 | The pcap was afterwards split into 3 pcap files. 8 | -------------------------------------------------------------------------------- /res/normal.pcap.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/H21lab/Anomaly-Detection/070c01bb6de2601c78dc6aea0a558a990c04acc2/res/normal.pcap.gz --------------------------------------------------------------------------------
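The GROUP BY-style scoring that ad_simple.py applies to pcaps like the ones above can be sketched in a few self-contained lines (the frame dicts and field values below are illustrative, not taken from the repository's captures):

```python
import hashlib
from collections import Counter

def rarity_rank(frames, fields):
    # Hash the selected fields of each frame together, count how often each
    # combination occurs, then order frames from rarest to most common --
    # the same GROUP BY-like scoring ad_simple.py uses to build ad_test.pcap.
    def key(frame):
        m = hashlib.md5()
        for f in fields:
            m.update(str(frame.get(f, "")).encode("utf-8"))
        return m.hexdigest()

    counts = Counter(key(fr) for fr in frames)
    return sorted(frames, key=lambda fr: counts[key(fr)])

frames = [
    {"ip_ip_src": "10.0.0.1", "ip_ip_dst": "10.0.0.2"},
    {"ip_ip_src": "10.0.0.1", "ip_ip_dst": "10.0.0.2"},
    {"ip_ip_src": "192.168.1.9", "ip_ip_dst": "10.0.0.2"},
]
ranked = rarity_rank(frames, ["ip_ip_src", "ip_ip_dst"])
# The unique 192.168.1.9 -> 10.0.0.2 frame sorts first.
```

Frames with a one-off field combination surface at the start of the output, which is what makes the ordering useful for eyeballing anomalies.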